ConfluenceConverter (file README.txt at 3255ed8e2426)

Introduction
------------

ConfluenceConverter is a distribution of software that converts exported data
from Confluence wiki instances, provided in the form of an XML file, to a
collection of wiki pages and resources that can be imported into a MoinMoin
instance as a page package.

Prerequisites
-------------

ConfluenceConverter requires a library called xmlread that can be found at the
following location:

http://hgweb.boddie.org.uk/xmlread

The xmlread.py file from the xmlread distribution can be copied into the
ConfluenceConverter directory.

ConfluenceConverter also requires access to the MoinMoin.wikiutil module found
in the MoinMoin distribution.

The moinsetup program is highly recommended for the installation of page
packages and the management of MoinMoin wiki instances:

http://moinmo.in/ScriptMarket/moinsetup

If moinsetup is not being used, the page package installer documentation
should be consulted:

http://moinmo.in/HelpOnPackageInstaller

To read Confluence user profiles on live Confluence sites using the
get_profiles.py program, the libxml2dom library is required:

http://hgweb.boddie.org.uk/libxml2dom

MoinMoin Prerequisites
----------------------

The page package installer does not preserve user information when installing
page revisions. This behaviour can be changed by applying a patch to MoinMoin
as follows while at the top level of the MoinMoin source distribution:

patch -p1 < $CCDIR/patches/patch-moin-1.9-MoinMoin-packages.diff

Here, CCDIR is the path to the top level of this source distribution where
this README.txt file is found.

Wiki Content Prerequisites
--------------------------

For the output of the converter, the following MoinMoin extensions are
required:

http://moinmo.in/ParserMarket/ImprovedTableParser
http://hgweb.boddie.org.uk/MoinSupport
http://moinmo.in/MacroMarket/Color2

Quick Start
-----------

Given an XML export archive file for a Confluence wiki instance (in the
example below, the file is called COM-123456-789012.zip), the following
command can be used to prepare a page package for MoinMoin:

python convert.py COM-123456-789012.zip COM

In addition to the filename, a workspace name is required. Confluence appears
to require a workspace as a container for collections of pages, but this also
permits us to selectively import parts of a wiki into MoinMoin. If attachments
were included in the export from Confluence, these will be imported into the
page package.

The result of the above command will be a directory having the same name as
the chosen workspace, together with a zip archive for that directory's
contents. Thus, the above command would produce a directory called COM and an
archive called COM.zip.

To import the result, use moinsetup as follows:

python moinsetup.py -m install_page_package COM.zip

This requires a suitable moinsetup.cfg file in the working directory.
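The two commands above can also be driven from a small script. The following
is a minimal sketch and is not part of this distribution: the function names
conversion_commands and run_all are hypothetical, and the sketch assumes that
convert.py and moinsetup.py are both reachable from the working directory and
that "python" is the interpreter used in the commands above.

```python
import subprocess

def conversion_commands(archive, workspace):
    """Return the two documented commands as argument lists."""
    # The converter names the page package after the workspace
    # (workspace COM produces COM.zip, as described above).
    package = workspace + ".zip"
    return [
        ["python", "convert.py", archive, workspace],
        ["python", "moinsetup.py", "-m", "install_page_package", package],
    ]

def run_all(archive, workspace):
    """Run the conversion and then the import, stopping on any failure."""
    for command in conversion_commands(archive, workspace):
        # check_call raises CalledProcessError on a non-zero exit status,
        # so the import is never attempted after a failed conversion.
        subprocess.check_call(command)
```

Calling run_all("COM-123456-789012.zip", "COM") would then issue the same two
commands in order.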

Mappings from Identifiers to Pages
----------------------------------

Confluence uses numbers to label content revisions, and links to Confluence
sites sometimes use these numbers instead of a readable page name. MoinMoin,
meanwhile, only uses page names and has no external numeric identifier scheme.
Consequently, it is necessary to produce a mapping from Confluence identifiers
to MoinMoin page names. In addition to numeric identifiers, Confluence also
provides "tiny URLs" which are an alphanumeric encoding of the numeric
identifiers.

To generate mappings for the Confluence content, use the mappings script as
follows:

tools/mappings.sh COM

Here, COM is the name of a directory containing converted Confluence content,
corresponding to a space name in the original Confluence wiki. More than one
space name can be given to generate a complete mapping for a site.

The following files are generated:

  * mapping-id-to-page.txt
  * mapping-tiny-to-id.txt
  * mapping-tiny-to-page.txt

The most useful of these is the first as it includes all the necessary
information provided by the arbitrary mapping from identifiers to page names.
The second mapping merely converts the "tiny URLs" to identifiers, which can
be done by applying an algorithm without any external knowledge of the wiki
structure. The third mapping is provided as a convenience, combining the "tiny
URL" conversion and the arbitrary mapping to page names.
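The "tiny URL" algorithm itself is not described here. The following sketch
assumes the scheme commonly documented for Confluence: the identifier is
packed as a 32-bit little-endian integer and base64-encoded, the "=" padding
and any trailing "A" characters are dropped, and "/" and "+" are replaced by
"-" and "_" respectively. These details, and the function names tiny_to_id and
id_to_tiny, are assumptions made for illustration and are not taken from the
scripts in this distribution.

```python
import base64
import struct

def tiny_to_id(tiny):
    """Decode a "tiny URL" code into a numeric identifier (32-bit assumed)."""
    # Undo the URL-safe character substitutions.
    text = tiny.replace("-", "/").replace("_", "+")
    # Restore the stripped trailing "A" characters (zero bits) and the
    # base64 padding: four bytes encode to six characters plus "==".
    text = text.ljust(6, "A") + "=="
    return struct.unpack("<L", base64.b64decode(text))[0]

def id_to_tiny(identifier):
    """Encode a numeric identifier as a "tiny URL" code (inverse of above)."""
    text = base64.b64encode(struct.pack("<L", identifier)).decode("ascii")
    # Drop the padding first, then the trailing zero-valued characters.
    text = text.rstrip("=").rstrip("A")
    return text.replace("/", "-").replace("+", "_")
```

Under these assumptions, the code A4AB corresponds to the identifier 98307,
and the two functions are inverses of each other for any 32-bit identifier.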

Translating Requests Using the Mappings
---------------------------------------

Where Web server facilities such as RewriteMap are available for use, the
first and third mapping files can be used directly. See the Apache
documentation for details of RewriteMap:

http://httpd.apache.org/docs/2.4/rewrite/rewritemap.html

Otherwise, it is more likely that the first file is used by a program that can
perform a redirect to the appropriate wiki page, and the "tiny URL" decoding
is also done by this program when deployed in a suitable location to receive
such requests. To support this, the following resources are provided:

  * scripts/redirect.py
  * config/mailmanwiki-redirect

The latter configuration file should be combined with the Web server
configuration file such that the appropriate aliases are able to capture
requests and invoke the redirect.py script before the main wiki aliases are
consulted. The script itself should be placed in a suitable filesystem
location. The mapping-id-to-page.txt file should either be placed alongside
the script or be placed in a different location, in which case the
MAPPING_ID_TO_PAGE variable in the script should be changed to refer to that
location.
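The lookup that such a program performs can be sketched as follows. This is
not the code of scripts/redirect.py. It assumes that the mapping files follow
the plain text RewriteMap layout, with one whitespace-separated key and value
per line and "#" introducing a comment; the function names and the page names
in the usage note are hypothetical.

```python
def load_mapping(lines):
    """Build a dictionary from lines of the form "<key> <value>"."""
    mapping = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the first run of whitespace only.
        parts = line.split(None, 1)
        if len(parts) == 2:
            mapping[parts[0]] = parts[1]
    return mapping

def page_for_id(mapping, identifier, default=None):
    """Return the page name recorded for an identifier, or the default."""
    return mapping.get(str(identifier), default)
```

An open file can be passed directly, as in load_mapping(open(filename)), and a
missing identifier yields the default so that the caller can respond with a
"not found" page instead of a redirect.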

Identifying and Migrating Users
-------------------------------

Confluence export archives do not contain user profile information, but page
versions are marked with user identifiers. Therefore, a list of user
identifiers can be obtained by running a script extracting these identifiers.
The following command writes to standard output the users involved with
editing the wiki in four different spaces (exported to four directories):

tools/users.sh COM DEV DOC SEC

This output can be edited and then passed to a program which fetches other
profile details as follows:

tools/users.sh COM DEV DOC SEC > users.txt # for editing
cat users.txt | tools/get_profiles.py http://wiki.list.org/

If no users are to be removed in migration, the following command could be
issued:

tools/users.sh COM DEV DOC SEC | tools/get_profiles.py http://wiki.list.org/

The get_profiles.py program needs to be told the URL of the original
Confluence site. Note that it accesses the site at a default rate of around
one request per second; a different delay between requests can be specified
using an additional argument.
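The effect of such a delay can be illustrated with a short generator. This is
only a sketch of fixed-delay throttling in general and does not reproduce the
code of get_profiles.py; the name throttled and its parameters are
hypothetical.

```python
import time

def throttled(items, delay=1.0, sleep=time.sleep):
    """Yield each item, pausing for delay seconds between items."""
    first = True
    for item in items:
        # No pause is needed before the very first item.
        if not first:
            sleep(delay)
        first = False
        yield item
```

A loop such as "for user in throttled(users, delay=2.0)" would then process
one user roughly every two seconds, plus the time taken by each request.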

The output of the get_profiles.py program can be passed to another program
which adds users to MoinMoin, and so the following commands can be used:

  cat users.txt \
| tools/get_profiles.py http://wiki.list.org/ \
| tools/addusers.py wiki

Or, as a single command:

  tools/users.sh COM DEV DOC SEC \
| tools/get_profiles.py http://wiki.list.org/ \
| tools/addusers.py wiki

The addusers.py program needs to be told the directory containing the wiki
configuration.

Output Structure
----------------

The structure of a converted workspace is a directory hierarchy containing the
following directories:

  * pages     (a collection of directories defining each page or content item,
               corresponding to Page, Comment and BlogPost elements in the XML
               exported from Confluence)

  * versions  (a collection of files, each defining a revision or version of
               some content, corresponding to BodyContent elements in the XML
               exported from Confluence)

Each page directory contains the following things:

  * pagetype    (either "Page", "Comment" or "BlogPost")

  * manifest    (a list of version entries in a format similar to the MoinMoin
                 page package manifest format)

  * attachments (a list of attachment version entries in a format similar to
                 the MoinMoin page package manifest format)

  * pagetitle   (an optional page title imposed on the page by another content
                 item)

  * children    (a list of child page names defined for the page)

  * comments    (a list of creation date plus comment page identifier pairs)

In the output structure, content items such as comments are represented as
pages, and each references a content version. Since comments will ultimately
be represented as subpages of some parent page, they will have a pagetitle
file in their directory with an appropriate subpage name written according to
the parent page's name and comment details.
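The layout described above can be inspected with a few lines of code. The
following sketch assumes that pagetype is a plain text file holding only the
type name; the function name page_types is hypothetical and the code is not
part of this distribution.

```python
import os

def page_types(workspace):
    """Map each page directory name to the contents of its pagetype file."""
    pages_dir = os.path.join(workspace, "pages")
    types = {}
    for name in sorted(os.listdir(pages_dir)):
        path = os.path.join(pages_dir, name, "pagetype")
        # Ignore any entry that lacks a pagetype file.
        if os.path.isfile(path):
            with open(path) as f:
                types[name] = f.read().strip()
    return types
```

For a converted workspace called COM, page_types("COM") would give a quick
overview of how many pages, comments and blog posts were produced.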

Troubleshooting
---------------

The page package import activity in particular can be a source of problems.
Generally, any error occurring when attempting to import a package is likely
to be due to insufficient privileges when writing to the pages directory of a
wiki or to its edit-log file.

The moinsetup software can generate scripts that set the ownership of wiki
files or apply ACLs (access control lists) to those files in order to make
access to wiki data more convenient. Where the ownership of the files must be
set (to www-data or nobody), the import step can be run as that user given
sufficient privileges. However, the easiest solution is to apply ACLs, thus
allowing the user who created the wiki to retain write access to it.
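Before attempting an import, write access can be checked with a short script.
The sketch below assumes that the pages directory and the edit-log file are
both found directly inside the wiki's data directory; the function name
unwritable_paths is hypothetical and the code is not part of moinsetup or of
this distribution.

```python
import os

def unwritable_paths(data_dir):
    """Return the import-related paths the current user cannot write to."""
    candidates = [
        os.path.join(data_dir, "pages"),
        os.path.join(data_dir, "edit-log"),
    ]
    # os.access also reports False for paths that do not exist.
    return [path for path in candidates if not os.access(path, os.W_OK)]
```

An empty list suggests that the current user can write to both locations. Any
path that is returned either does not exist or needs its ownership or ACLs
adjusted as described above.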

Contact, Copyright and Licence Information
------------------------------------------

The current Web page for ConfluenceConverter at the time of release is:

http://hgweb.boddie.org.uk/ConfluenceConverter

Copyright and licence information can be found in the docs directory - see
docs/COPYING.txt and docs/LICENCE.txt for more information.