1.1 --- a/README.txt Wed Jul 17 00:20:21 2013 +0200
1.2 +++ b/README.txt Wed Jul 17 01:02:28 2013 +0200
1.3 @@ -2,8 +2,8 @@
1.4 ------------
1.5
1.6 ConfluenceConverter is a distribution of software that converts exported data
1.7 -from Confluence Wiki instances, provided in the form of an XML file, to a
1.8 -collection of Wiki pages and resources that can be imported into a MoinMoin
1.9 +from Confluence wiki instances, provided in the form of an XML file, to a
1.10 +collection of wiki pages and resources that can be imported into a MoinMoin
1.11 instance as a page package.
1.12
1.13 Prerequisites
1.14 @@ -21,7 +21,7 @@
1.15 in the MoinMoin distribution.
1.16
1.17 The moinsetup program is highly recommended for the installation of page
1.18 -packages and the management of MoinMoin Wiki instances:
1.19 +packages and the management of MoinMoin wiki instances:
1.20
1.21 http://moinmo.in/ScriptMarket/moinsetup
1.22
1.23 @@ -30,6 +30,18 @@
1.24
1.25 http://moinmo.in/HelpOnPackageInstaller
1.26
1.27 +MoinMoin Prerequisites
1.28 +----------------------
1.29 +
1.30 +The page package installer does not preserve user information when installing
1.31 +page revisions. This can be modified by applying a patch to MoinMoin as
1.32 +follows while at the top level of the MoinMoin source distribution:
1.33 +
1.34 +patch -p1 < $CCDIR/patches/patch-moin-1.9-MoinMoin-packages.diff
1.35 +
1.36 +Here, CCDIR is the path to the top level of this source distribution where
1.37 +this README.txt file is found.
1.38 +
1.39 Wiki Content Prerequisites
1.40 --------------------------
1.41
1.42 @@ -43,17 +55,17 @@
1.43 Quick Start
1.44 -----------
1.45
1.46 -Given an XML file for a Confluence Wiki instance (in the example below, the
1.47 -file is called com_entities.xml), the following command can be used to prepare
1.48 -a page package for MoinMoin:
1.49 +Given an XML export archive file for a Confluence wiki instance (in the
1.50 +example below, the file is called COM-123456-789012.zip), the following
1.51 +command can be used to prepare a page package for MoinMoin:
1.52
1.53 -python convert.py com_entities.xml COM attachments
1.54 +python convert.py COM-123456-789012.zip COM
1.55
1.56 In addition to the filename, a workspace name is required. Confluence appears
1.57 to require a workspace as a container for collections of pages, but this also
1.58 -permits us to selectively import parts of a Wiki into MoinMoin. If a directory
1.59 -of attachments is also specified, these will be imported into the page
1.60 -package.
1.61 +permits us to selectively import parts of a wiki into MoinMoin. If attachments
1.62 +were included in the export from Confluence, these will be imported into the
1.63 +page package.
1.64
1.65 The result of the above command will be a directory having the same name as
1.66 the chosen workspace, together with a zip archive for that directory's
1.67 @@ -66,6 +78,50 @@
1.68
1.69 This requires a suitable moinsetup.cfg file in the working directory.
1.70
1.71 +Mappings from Identifiers to Pages
1.72 +----------------------------------
1.73 +
1.74 +Confluence uses numbers to label content revisions, and links to Confluence
1.75 +sites sometimes use these numbers instead of a readable page name. MoinMoin,
1.76 +meanwhile, only uses page names and has no external numeric identifier scheme.
1.77 +Consequently, it is necessary to produce a mapping from Confluence identifiers
1.78 +to MoinMoin page names. In addition to numeric identifiers, Confluence also
1.79 +provides "tiny URLs" which are an alphanumeric encoding of the numeric
1.80 +identifiers.
1.81 +
1.82 +To generate mappings for the Confluence content, use the mappings script as
1.83 +follows:
1.84 +
1.85 +tools/mappings COM
1.86 +
1.87 +Here, COM is the name of a directory containing converted Confluence content,
1.88 +corresponding to a space name in the original Confluence wiki. More than one
1.89 +space name can be used to generate a complete mapping for a site.
1.90 +
1.91 +The following files are generated:
1.92 +
1.93 + * mapping-id-to-page.txt
1.94 + * mapping-tiny-to-id.txt
1.95 + * mapping-tiny-to-page.txt
1.96 +
1.97 +The most useful of these is the first as it includes all the necessary
1.98 +information provided by the arbitrary mapping from identifiers to page names.
1.99 +The second mapping merely converts the "tiny URLs" to identifiers, which can
1.100 +be done by applying an algorithm without any external knowledge of the wiki
1.101 +structure. The third mapping is provided as a convenience, combining the "tiny
1.102 +URL" conversion and the arbitrary mapping to page names.
1.103 +
1.104 +Where Web server facilities such as RewriteMap are available for use, the
1.105 +first and third mapping files can be used directly. See the Apache
1.106 +documentation for details of RewriteMap:
1.107 +
1.108 +http://httpd.apache.org/docs/2.4/rewrite/rewritemap.html
1.109 +
1.110 +Otherwise, it is more likely that the first file is used by a program that can
1.111 +perform a redirect to the appropriate wiki page, and the "tiny URL" decoding
1.112 +is also done by this program when deployed in a suitable location to receive
1.113 +such requests.
1.114 +
1.115 Output Structure
1.116 ----------------
1.117
1.118 @@ -82,6 +138,8 @@
1.119
1.120 Each page directory contains the following things:
1.121
1.122 + * pagetype (either "Page", "Comment" or "BlogPost")
1.123 +
1.124 * manifest (a list of version entries in a format similar to the MoinMoin
1.125 page package manifest format)
1.126
1.127 @@ -93,6 +151,8 @@
1.128
1.129 * children (a list of child page names defined for the page)
1.130
1.131 + * comments (a list of creation date plus comment page identifier pairs)
1.132 +
1.133 In the output structure, content items such as comments are represented as
1.134 pages and each reference a content version. Since comments will ultimately be
1.135 represented as subpages of some parent page, they will have a pagetitle file
1.136 @@ -105,14 +165,14 @@
1.137 The page package import activity in particular can be a source of problems.
1.138 Generally, any error occurring when attempting to import a package is likely
1.139 to be due to insufficient privileges when writing to the pages directory of a
1.140 -Wiki or to its edit-log file.
1.141 +wiki or to its edit-log file.
1.142
1.143 -The moinsetup software can generate scripts that set the ownership of Wiki
1.144 +The moinsetup software can generate scripts that set the ownership of wiki
1.145 files or apply ACLs (access control lists) to those files in order to make
1.146 -access to Wiki data more convenient. Where the ownership of the files must be
1.147 +access to wiki data more convenient. Where the ownership of the files must be
1.148 set (to www-data or nobody), the import step can be run as that user given
1.149 sufficient privileges. However, the easiest solution is to apply ACLs, thus
1.150 -allowing the user who created the Wiki to retain write access to it.
1.151 +allowing the user who created the wiki to retain write access to it.
1.152
1.153 Contact, Copyright and Licence Information
1.154 ------------------------------------------
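
The "Mappings from Identifiers to Pages" section added to README.txt above
mentions a program that uses mapping-id-to-page.txt to redirect requests to
the appropriate wiki page. The following is a minimal sketch of the lookup
part of such a program. It assumes that the mapping file holds one
whitespace-separated "identifier page-name" pair per line, as the plain-text
RewriteMap format does; the format actually written by tools/mappings should
be checked first. The function names and the example.org base URL are
hypothetical. The sample identifier and page name are those noted in
TO_DO.txt.

```python
# Sketch only: the "key value" line format is an assumption based on the
# plain-text RewriteMap format, not on the output of tools/mappings.

def load_mapping(lines):
    """Return a dict built from whitespace-separated key/value lines."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip lines without a value
        key, value = parts
        mapping[key] = value
    return mapping

def redirect_target(mapping, page_id, base_url):
    """Return the wiki URL for a Confluence identifier, or None if unknown."""
    page_name = mapping.get(str(page_id))
    if page_name is None:
        return None
    return "%s/%s" % (base_url.rstrip("/"), page_name)

# In practice the lines would come from open("mapping-id-to-page.txt").
sample = ["# identifier page-name", "4816921 COM/Home"]
mapping = load_mapping(sample)
print(redirect_target(mapping, 4816921, "http://example.org/wiki/"))
```

A deployed program would send an HTTP redirect to the returned URL and
respond with "not found" when the result is None.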
2.1 --- a/TO_DO.txt Wed Jul 17 00:20:21 2013 +0200
2.2 +++ b/TO_DO.txt Wed Jul 17 01:02:28 2013 +0200
2.3 @@ -1,3 +1,41 @@
2.4 +Enhancements
2.5 +============
2.6 +
2.7 +User and timestamp recording
2.8 +
2.9 + (Attempt to enforce user-switching in the page import script)
2.10 +
2.11 +Comment ownership and presentation
2.12 +
2.13 + (Generate an ACL for each comment page)
2.14 +
2.15 + Generate a user profile box linking to a user's profile image
2.16 +
2.17 +User imports
2.18 +
2.19 + Investigate user information in any export files
2.20 +
2.21 + Probably absent, see:
2.22 + https://confluence.atlassian.com/display/CONFKB/How+to+Export+User+Data+to+CSV+in+Confluence
2.23 +
2.24 + The following may be JIRA-related:
2.25 + https://confluence.atlassian.com/display/AOD/Exporting+wiki+data
2.26 +
2.27 + Alternatively, just collect user details from the combined history details
2.28 +
2.29 + Make ACL-protected pages for all users, including any profile images as
2.30 + attachments
2.31 +
2.32 +User activity
2.33 +
2.34 + Consider incorporating activity information for each user using a macro
2.35 + placed on user home pages
2.36 +
2.37 +
2.38 +
2.39 +Issues
2.40 +======
2.41 +
2.42 DEV/GSoC 2011 - Conversion from Confluence wiki to Moin (11960378)
2.43
2.44 End tags for strong, em immediately followed by start tags for strong, em
2.45 @@ -28,17 +66,15 @@
2.46
2.47 Make redirect pages so that COM redirects to COM/Home, and so on.
2.48
2.49 -Comment ownership and presentation.
2.50 -
2.51 Page identifier links: http://wiki.list.org/pages/viewpage.action?pageId=4816921 (COM/Home)
2.52
2.53 -Macros: {toc}
2.54 -
2.55
2.56
2.57 Mostly Handled
2.58 ==============
2.59
2.60 +Macros: {toc}
2.61 +
2.62 DEV/A 5 Minute Guide to Get the Mailman Web UI Running (only for development) (13303877)
2.63
2.64 (Preformatted regions on their own line might be converted into proper sections)
3.1 --- a/convert.py Wed Jul 17 00:20:21 2013 +0200
3.2 +++ b/convert.py Wed Jul 17 01:02:28 2013 +0200
3.3 @@ -115,6 +115,10 @@
3.4 title = "%s/%s" % (self.space, title)
3.5 write(join(pages_dir, pageid, "pagetitle"), title)
3.6
3.7 + # Note the type of the page.
3.8 +
3.9 + write(join(pages_dir, pageid, "pagetype"), objecttype)
3.10 +
3.11 # See sort_manifest for access to this data.
3.12
3.13 append(join(pages_dir, pageid, "manifest"),
3.14 @@ -419,12 +423,15 @@
3.15 titles will be appended to the file having that filename.
3.16 """
3.17
3.18 + pagetype = join(pages_dir, pageid, "pagetype")
3.19 manifest = join(pages_dir, pageid, "manifest")
3.20 attachments = join(pages_dir, pageid, "attachments")
3.21 pagetitle = join(pages_dir, pageid, "pagetitle")
3.22 children = join(pages_dir, pageid, "children")
3.23 comments = join(pages_dir, pageid, "comments")
3.24
3.25 + type = exists(pagetype) and read(pagetype) or None
3.26 +
3.27 if exists(pagetitle):
3.28 title = read(pagetitle)
3.29 space, _page_name = get_space_and_name(title)
3.30 @@ -443,6 +450,13 @@
3.31 # Modify the content to include child pages and comments.
3.32
3.33 for _action, _archive_filename, filename, new_title, username, comment in result:
3.34 + text = read(filename)
3.35 +
3.36 + # Add an ACL to comment pages so that people cannot change other
3.37 + # people's comments.
3.38 +
3.39 + if type == "Comment":
3.40 + text = "#acl %s:read,write,delete,revert All:read\n%s" % (username, text)
3.41
3.42 # Add child page information to the content.
3.43
3.44 @@ -462,12 +476,16 @@
3.45
3.46 child_pages.append(" * [[%s|%s]]" % (child_page_name, child_page_label))
3.47
3.48 - append(filename, child_page_section % "\n".join(child_pages))
3.49 + text += child_page_section % "\n".join(child_pages)
3.50
3.51 # Add comments to the content.
3.52
3.53 if exists(comments) and title and not no_translate:
3.54 - append(filename, comment_section % title)
3.55 + text += comment_section % title
3.56 +
3.57 + # Rewrite the file.
3.58 +
3.59 + write(filename, text)
3.60
3.61 # Add the attachments to the manifest.
3.62
3.63 @@ -521,14 +539,20 @@
3.64 Please specify an XML file containing Wiki data, a workspace name, and an
3.65 optional attachments directory location. For example:
3.66
3.67 -com_entities.xml COM attachments
3.68 +%(progname)s com_entities.xml COM attachments
3.69
3.70 Adding --no-translate will unpack the Wiki but not translate the content.
3.71 When doing so without an attachments directory, add an empty argument as
3.72 follows:
3.73
3.74 -com_entities.xml COM '' --no-translate
3.75 -"""
3.76 +%(progname)s com_entities.xml COM '' --no-translate
3.77 +
3.78 +An archive can be used instead of the XML file, and since this may include
3.79 +attachments, no additional attachments directory needs to be specified:
3.80 +
3.81 +%(progname)s COM-123456-789012.zip COM
3.82 +""" % {"progname" : split(sys.argv[0])[-1]}
3.83 +
3.84 sys.exit(1)
3.85
3.86 no_translate = "--no-translate" in sys.argv
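
The convert.py changes above replace the append calls with a sequence that
reads each page file, modifies the text in memory, and writes it back once.
This appears to be needed because the "#acl" line for comment pages has to
come before the existing content, which appending alone cannot achieve. The
following sketch isolates that logic as a pure function so that it can be
tried without any files. The function names and the sample section text are
hypothetical and do not appear in convert.py; only the ACL line itself is
taken from the diff.

```python
# Sketch only: add_comment_acl and build_page_text are illustrative names,
# not functions defined in convert.py.

def add_comment_acl(text, username, pagetype):
    """Prefix comment pages with an ACL line; leave other page types alone."""
    if pagetype == "Comment":
        return "#acl %s:read,write,delete,revert All:read\n%s" % (username, text)
    return text

def build_page_text(text, username, pagetype, extra_sections=()):
    """Assemble the final page text in memory so it is written only once."""
    text = add_comment_acl(text, username, pagetype)
    for section in extra_sections:
        text += section  # e.g. child page or comment sections
    return text

result = build_page_text("Hello\n", "alice", "Comment", ["\n== Comments ==\n"])
print(result)
```

With a page type other than "Comment", the text is returned with only the
extra sections appended and no ACL line.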