# HG changeset patch
# User Paul Boddie
# Date 1374015748 -7200
# Node ID 5749ec315d4db6ba359b115eb60a0ae2ff5bf924
# Parent 9ba8a7ddb583e57a3409a8c9047ccae966b95518
Added ACLs to comments so that only their authors can edit them.
Added more documentation, including details of generating identifier mappings
and information about preserving user details when installing pages in a wiki.
Improved the converter's help text.

diff -r 9ba8a7ddb583 -r 5749ec315d4d README.txt
--- a/README.txt	Wed Jul 17 00:20:21 2013 +0200
+++ b/README.txt	Wed Jul 17 01:02:28 2013 +0200
@@ -2,8 +2,8 @@
 ------------
 
 ConfluenceConverter is a distribution of software that converts exported data
-from Confluence Wiki instances, provided in the form of an XML file, to a
-collection of Wiki pages and resources that can be imported into a MoinMoin
+from Confluence wiki instances, provided in the form of an XML file, to a
+collection of wiki pages and resources that can be imported into a MoinMoin
 instance as a page package.
 
 Prerequisites
@@ -21,7 +21,7 @@
 in the MoinMoin distribution.
 
 The moinsetup program is highly recommended for the installation of page
-packages and the management of MoinMoin Wiki instances:
+packages and the management of MoinMoin wiki instances:
 
 http://moinmo.in/ScriptMarket/moinsetup
 
@@ -30,6 +30,18 @@
 
 http://moinmo.in/HelpOnPackageInstaller
 
+MoinMoin Prerequisites
+----------------------
+
+The page package installer does not preserve user information when installing
+page revisions. This can be modified by applying a patch to MoinMoin as
+follows while at the top level of the MoinMoin source distribution:
+
+patch -p1 < $CCDIR/patches/patch-moin-1.9-MoinMoin-packages.diff
+
+Here, CCDIR is the path to the top level of this source distribution where
+this README.txt file is found.
+
 Wiki Content Prerequisites
 --------------------------
 
@@ -43,17 +55,17 @@
 Quick Start
 -----------
 
-Given an XML file for a Confluence Wiki instance (in the example below, the
-file is called com_entities.xml), the following command can be used to prepare
-a page package for MoinMoin:
+Given an XML export archive file for a Confluence wiki instance (in the
+example below, the file is called COM-123456-789012.zip), the following
+command can be used to prepare a page package for MoinMoin:
 
-python convert.py com_entities.xml COM attachments
+python convert.py COM-123456-789012.zip COM
 
 In addition to the filename, a workspace name is required. Confluence appears
 to require a workspace as a container for collections of pages, but this also
-permits us to selectively import parts of a Wiki into MoinMoin. If a directory
-of attachments is also specified, these will be imported into the page
-package.
+permits us to selectively import parts of a wiki into MoinMoin. If attachments
+were included in the export from Confluence, these will be imported into the
+page package.
 
 The result of the above command will be a directory having the same name as
 the chosen workspace, together with a zip archive for that directory's
@@ -66,6 +78,50 @@
 
 This requires a suitable moinsetup.cfg file in the working directory.
 
+Mappings from Identifiers to Pages
+----------------------------------
+
+Confluence uses numbers to label content revisions, and links to Confluence
+sites sometimes use these numbers instead of a readable page name. MoinMoin,
+meanwhile, only uses page names and has no external numeric identifier scheme.
+Consequently, it is necessary to produce a mapping from Confluence identifiers
+to MoinMoin page names. In addition to numeric identifiers, Confluence also
+provides "tiny URLs" which are an alphanumeric encoding of the numeric
+identifiers.
+
+To generate mappings for the Confluence content, use the mappings script as
+follows:
+
+tools/mappings COM
+
+Here, COM is a directory name containing converted Confluence content,
+corresponding to a space name in the original Confluence wiki. More than one
+space name can be used to generate a complete mapping for a site.
+
+The following files are generated:
+
+ * mapping-id-to-page.txt
+ * mapping-tiny-to-id.txt
+ * mapping-tiny-to-page.txt
+
+The most useful of these is the first as it includes all the necessary
+information provided by the arbitrary mapping from identifiers to page names.
+The second mapping merely converts the "tiny URLs" to identifiers, which can
+be done by applying an algorithm without any external knowledge of the wiki
+structure. The third mapping is provided as a convenience, combining the "tiny
+URL" conversion and the arbitrary mapping to page names.
+
+Where Web server facilities such as RewriteMap are available for use, the
+first and third mapping files can be used directly. See the Apache
+documentation for details of RewriteMap:
+
+http://httpd.apache.org/docs/2.4/rewrite/rewritemap.html
+
+Otherwise, it is more likely that the first file is used by a program that can
+perform a redirect to the appropriate wiki page, and the "tiny URL" decoding
+is also done by this program when deployed in a suitable location to receive
+such requests.
+
 Output Structure
 ----------------
 
@@ -82,6 +138,8 @@
 
 Each page directory contains the following things:
 
+ * pagetype (either "Page", "Comment" or "BlogPost")
+
  * manifest (a list of version entries in a format similar to the MoinMoin
    page package manifest format)
 
@@ -93,6 +151,8 @@
 
  * children (a list of child page names defined for the page)
 
+ * comments (a list of creation date plus comment page identifier pairs)
+
 In the output structure, content items such as comments are represented as
 pages and each reference a content version. Since comments will ultimately be
 represented as subpages of some parent page, they will have a pagetitle file
@@ -105,14 +165,14 @@
 The page package import activity in particular can be a source of problems.
 Generally, any error occurring when attempting to import a package is likely
 to be due to insufficient privileges when writing to the pages directory of a
-Wiki or to its edit-log file.
+wiki or to its edit-log file.
 
-The moinsetup software can generate scripts that set the ownership of Wiki
+The moinsetup software can generate scripts that set the ownership of wiki
 files or apply ACLs (access control lists) to those files in order to make
-access to Wiki data more convenient. Where the ownership of the files must be
+access to wiki data more convenient. Where the ownership of the files must be
 set (to www-data or nobody), the import step can be run as that user given
 sufficient privileges. However, the easiest solution is to apply ACLs, thus
-allowing the user who created the Wiki to retain write access to it.
+allowing the user who created the wiki to retain write access to it.
 
 Contact, Copyright and Licence Information
 ------------------------------------------
diff -r 9ba8a7ddb583 -r 5749ec315d4d TO_DO.txt
--- a/TO_DO.txt	Wed Jul 17 00:20:21 2013 +0200
+++ b/TO_DO.txt	Wed Jul 17 01:02:28 2013 +0200
@@ -1,3 +1,41 @@
+Enhancements
+============
+
+User and timestamp recording
+
+  (Attempt to enforce user-switching in the page import script)
+
+Comment ownership and presentation
+
+  (Generate an ACL for each comment page)
+
+  Generate a user profile box linking to a user's profile image
+
+User imports
+
+  Investigate user information in any export files
+
+  Probably absent, see:
+  https://confluence.atlassian.com/display/CONFKB/How+to+Export+User+Data+to+CSV+in+Confluence
+
+  The following may be JIRA-related:
+  https://confluence.atlassian.com/display/AOD/Exporting+wiki+data
+
+  Alternatively, just collect user details from the combined history details
+
+  Make ACL-protected pages for all users, including any profile images as
+  attachments
+
+User activity
+
+  Consider incorporating activity information for each user using a macro
+  placed on user home pages
+
+
+
+Issues
+======
+
 DEV/GSoC 2011 - Conversion from Confluence wiki to Moin (11960378)
 
 End tags for strong, em immediately followed by start tags for strong, em
@@ -28,17 +66,15 @@
 
 Make redirect pages so that COM redirects to COM/Home, and so on.
 
-Comment ownership and presentation.
-
 Page identifier links:
 
 http://wiki.list.org/pages/viewpage.action?pageId=4816921 (COM/Home)
 
-Macros: {toc}
-
 Mostly Handled
 ==============
 
+Macros: {toc}
+
 DEV/A 5 Minute Guide to Get the Mailman Web UI Running (only for development) (13303877)
 (Preformatted regions on their own line might be converted into proper
 sections)
diff -r 9ba8a7ddb583 -r 5749ec315d4d convert.py
--- a/convert.py	Wed Jul 17 00:20:21 2013 +0200
+++ b/convert.py	Wed Jul 17 01:02:28 2013 +0200
@@ -115,6 +115,10 @@
             title = "%s/%s" % (self.space, title)
             write(join(pages_dir, pageid, "pagetitle"), title)
 
+            # Note the type of the page.
+
+            write(join(pages_dir, pageid, "pagetype"), objecttype)
+
             # See sort_manifest for access to this data.
 
             append(join(pages_dir, pageid, "manifest"),
@@ -419,12 +423,15 @@
     titles will be appended to the file having that filename.
     """
 
+    pagetype = join(pages_dir, pageid, "pagetype")
     manifest = join(pages_dir, pageid, "manifest")
     attachments = join(pages_dir, pageid, "attachments")
     pagetitle = join(pages_dir, pageid, "pagetitle")
     children = join(pages_dir, pageid, "children")
     comments = join(pages_dir, pageid, "comments")
 
+    type = exists(pagetype) and read(pagetype) or None
+
     if exists(pagetitle):
         title = read(pagetitle)
         space, _page_name = get_space_and_name(title)
@@ -443,6 +450,13 @@
     # Modify the content to include child pages and comments.
 
     for _action, _archive_filename, filename, new_title, username, comment in result:
+        text = read(filename)
+
+        # Add an ACL to comment pages so that people cannot change other
+        # people's comments.
+
+        if type == "Comment":
+            text = "#acl %s:read,write,delete,revert All:read\n%s" % (username, text)
 
         # Add child page information to the content.
 
@@ -462,12 +476,16 @@
 
                 child_pages.append(" * [[%s|%s]]" % (child_page_name, child_page_label))
 
-            append(filename, child_page_section % "\n".join(child_pages))
+            text += child_page_section % "\n".join(child_pages)
 
         # Add comments to the content.
 
         if exists(comments) and title and not no_translate:
-            append(filename, comment_section % title)
+            text += comment_section % title
+
+        # Rewrite the file.
+
+        write(filename, text)
 
     # Add the attachments to the manifest.
 
@@ -521,14 +539,20 @@
 Please specify an XML file containing Wiki data, a workspace name, and an
 optional attachments directory location. For example:
 
-com_entities.xml COM attachments
+%(progname)s com_entities.xml COM attachments
 
 Adding --no-translate will unpack the Wiki but not translate the content.
 When doing so without an attachments directory, add an empty argument as
 follows:
 
-com_entities.xml COM '' --no-translate
-"""
+%(progname)s com_entities.xml COM '' --no-translate
+
+An archive can be used instead of the XML file, and since this may include
+attachments, no additional attachments directory needs to be specified:
+
+%(progname)s COM-123456-789012.zip COM
+""" % {"progname" : split(sys.argv[0])[-1]}
+
         sys.exit(1)
 
     no_translate = "--no-translate" in sys.argv
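
The convert.py change above stops appending sections to each page file and instead builds the page text in memory before writing it once, which lets the "#acl" line be placed at the very top of comment pages. A minimal standalone sketch of that flow follows; the function names add_comment_acl and finalise_page, and the sample user name, are hypothetical and do not appear in convert.py. Only the ACL format string is taken from the patch.

```python
# Hypothetical helper names; only the ACL format string comes from the patch.

def add_comment_acl(username, text):
    """Prefix text with an ACL line granting write access to username only."""
    return "#acl %s:read,write,delete,revert All:read\n%s" % (username, text)


def finalise_page(pagetype, username, text, extra_sections=()):
    """Assemble the final page text in memory before a single write."""
    # Only comment pages are protected; other page types are left unchanged.
    if pagetype == "Comment":
        text = add_comment_acl(username, text)
    # Child page and comment sections are added after the original content.
    for section in extra_sections:
        text += section
    return text


if __name__ == "__main__":
    # "ExampleUser" is an illustrative name, not one from the source.
    print(finalise_page("Comment", "ExampleUser", "A comment.\n"))
```

Assembling the text first keeps the ACL ahead of the page body while the child page and comment sections still follow it, which appending to the file alone could not achieve.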