ConfluenceConverter (file README.txt at 914efdc23592)

Introduction
------------

ConfluenceConverter is a distribution of software that converts exported data
from Confluence wiki instances, provided in the form of an XML file, to a
collection of wiki pages and resources that can be imported into a MoinMoin
instance as a page package.

Prerequisites
-------------

ConfluenceConverter requires a library called xmlread that can be found at the
following location:

http://hgweb.boddie.org.uk/xmlread

The xmlread.py file from the xmlread distribution can be copied into the
ConfluenceConverter directory.

ConfluenceConverter also requires access to the MoinMoin.wikiutil module found
in the MoinMoin distribution.

The moinsetup program is highly recommended for the installation of page
packages and the management of MoinMoin wiki instances:

http://moinmo.in/ScriptMarket/moinsetup

If moinsetup is not being used, the page package installer documentation
should be consulted:

http://moinmo.in/HelpOnPackageInstaller

MoinMoin Prerequisites
----------------------

The page package installer does not preserve user information when installing
page revisions. This behaviour can be changed by applying a patch to MoinMoin
as follows while at the top level of the MoinMoin source distribution:

patch -p1 < $CCDIR/patches/patch-moin-1.9-MoinMoin-packages.diff

Here, CCDIR is the path to the top level of this source distribution where
this README.txt file is found.

Wiki Content Prerequisites
--------------------------

For the output of the converter, the following MoinMoin extensions are
required:

http://moinmo.in/ParserMarket/ImprovedTableParser
http://hgweb.boddie.org.uk/MoinSupport
http://moinmo.in/MacroMarket/Color2

Quick Start
-----------

Given an XML export archive file for a Confluence wiki instance (in the
example below, the file is called COM-123456-789012.zip), the following
command can be used to prepare a page package for MoinMoin:

python convert.py COM-123456-789012.zip COM

In addition to the filename, a workspace name is required. Confluence appears
to require a workspace as a container for collections of pages, but this also
permits us to selectively import parts of a wiki into MoinMoin. If attachments
were included in the export from Confluence, these will be imported into the
page package.

The result of the above command will be a directory having the same name as
the chosen workspace, together with a zip archive for that directory's
contents. Thus, the above command would produce a directory called COM and an
archive called COM.zip.

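Before importing, it can be useful to confirm that the archive looks like a
page package. The following is a minimal Python 3 sketch and is not part of
this distribution: it assumes only that a MoinMoin page package is a zip
archive containing a script file named MOIN_PACKAGE, and the function name
package_summary is hypothetical.

```python
import zipfile


def package_summary(path):
    """Return (has_manifest, member_count) for a page package archive."""
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
    # Assumption: MoinMoin expects a member called MOIN_PACKAGE that lists
    # the installation steps for the other members of the archive.
    return "MOIN_PACKAGE" in names, len(names)
```

For the example above, package_summary("COM.zip") would be expected to report
that the manifest is present, together with the number of archive members.
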
To import the result, use moinsetup as follows:

python moinsetup.py -m install_page_package COM.zip

This requires a suitable moinsetup.cfg file in the working directory.

Mappings from Identifiers to Pages
----------------------------------

Confluence uses numbers to label content revisions, and links to Confluence
sites sometimes use these numbers instead of a readable page name. MoinMoin,
meanwhile, only uses page names and has no external numeric identifier scheme.
Consequently, it is necessary to produce a mapping from Confluence identifiers
to MoinMoin page names. In addition to numeric identifiers, Confluence also
provides "tiny URLs" which are an alphanumeric encoding of the numeric
identifiers.

To generate mappings for the Confluence content, use the mappings script as
follows:

tools/mappings COM

Here, COM is the name of a directory containing converted Confluence content,
corresponding to a space name in the original Confluence wiki. More than one
space name can be used to generate a complete mapping for a site.

The following files are generated:

  * mapping-id-to-page.txt
  * mapping-tiny-to-id.txt
  * mapping-tiny-to-page.txt

The most useful of these is the first, as it includes all the necessary
information provided by the arbitrary mapping from identifiers to page names.
The second mapping merely converts the "tiny URLs" to identifiers, which can
be done by applying an algorithm without any external knowledge of the wiki
structure. The third mapping is provided as a convenience, combining the "tiny
URL" conversion and the arbitrary mapping to page names.

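As an illustration of the kind of algorithm involved, the following Python 3
sketch decodes a "tiny URL" code into a numeric identifier. It is not taken
from tools/mappings and rests on assumptions: that the code is the base64
encoding of the identifier's bytes, least significant byte first, with
trailing "A" characters and "=" padding removed, and with "-" and "_" standing
in for "/" and "+". The function name decode_tiny is hypothetical, and any
results should be checked against mapping-tiny-to-id.txt before being relied
upon.

```python
import base64


def decode_tiny(code):
    """Decode a "tiny URL" code to an integer under the assumed scheme."""
    # Assumption: "-" and "_" replace the base64 characters "/" and "+".
    standard = code.replace("-", "/").replace("_", "+")
    # Restore stripped trailing zero digits ("A" encodes six zero bits) so
    # that the length is a multiple of four, as base64 decoding requires.
    standard += "A" * (-len(standard) % 4)
    raw = base64.b64decode(standard)
    # Assumption: the identifier bytes are stored least significant first.
    return int.from_bytes(raw, "little")
```

Under these assumptions, decode_tiny("D4AB") returns 98319.
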
Translating Requests Using the Mappings
---------------------------------------

Where Web server facilities such as RewriteMap are available for use, the
first and third mapping files can be used directly. See the Apache
documentation for details of RewriteMap:

http://httpd.apache.org/docs/2.4/rewrite/rewritemap.html

Otherwise, the first file is more likely to be used by a program that performs
a redirect to the appropriate wiki page and that also decodes "tiny URLs"
itself, deployed in a suitable location to receive such requests. To support
this, the following resources are provided:

  * scripts/redirect.py
  * config/mailmanwiki-redirect

The latter configuration file should be combined with the Web server
configuration file such that the appropriate aliases are able to capture
requests and invoke the redirect.py script before the main wiki aliases are
consulted. The script itself should be placed in a suitable filesystem
location. The mapping-id-to-page.txt file should either be placed alongside
the script or be placed elsewhere, in which case the MAPPING_ID_TO_PAGE
variable in the script must be changed to refer to that location.

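The lookup performed by such a program can be sketched as follows. This is a
hedged Python 3 illustration rather than the contents of scripts/redirect.py:
it assumes that mapping-id-to-page.txt uses the plain-text RewriteMap layout
of one whitespace-separated key and value per line, with lines beginning with
"#" treated as comments, and the names load_mapping and page_for are
hypothetical.

```python
def load_mapping(lines):
    """Build a dictionary from an iterable of "key value" lines."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) == 2:
            mapping[parts[0]] = parts[1]
    return mapping


def page_for(identifier, mapping):
    """Return the page name for an identifier, or None if it is unknown."""
    return mapping.get(str(identifier))
```

An open file object can be passed to load_mapping directly. A None result
from page_for would correspond to a "not found" response instead of a
redirect.
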
Output Structure
----------------

The structure of a converted workspace is a directory hierarchy containing the
following directories:

  * pages     (a collection of directories defining each page or content item,
               corresponding to Page, Comment and BlogPost elements in the XML
               exported from Confluence)

  * versions  (a collection of files, each defining a revision or version of
               some content, corresponding to BodyContent elements in the XML
               exported from Confluence)

Each page directory contains the following things:

  * pagetype    (either "Page", "Comment" or "BlogPost")

  * manifest    (a list of version entries in a format similar to the MoinMoin
                 page package manifest format)

  * attachments (a list of attachment version entries in a format similar to
                 the MoinMoin page package manifest format)

  * pagetitle   (an optional page title imposed on the page by another content
                 item)

  * children    (a list of child page names defined for the page)

  * comments    (a list of creation date plus comment page identifier pairs)

In the output structure, content items such as comments are represented as
pages, each referencing a content version. Since comments will ultimately be
represented as subpages of some parent page, they will have a pagetitle file
in their directory with an appropriate subpage name written according to the
parent page's name and comment details.

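The layout described above can be inspected with a short script. The following
Python 3 sketch counts the content items of each type in a converted
workspace. It assumes that each pagetype file contains just the type name, and
the function name count_page_types is hypothetical.

```python
import os
from collections import Counter


def count_page_types(workspace):
    """Count content items by type in a converted workspace directory."""
    counts = Counter()
    pages_dir = os.path.join(workspace, "pages")
    for name in sorted(os.listdir(pages_dir)):
        pagetype_file = os.path.join(pages_dir, name, "pagetype")
        if not os.path.isfile(pagetype_file):
            continue  # not a page directory, or no type recorded
        with open(pagetype_file) as f:
            # Assumption: the file holds "Page", "Comment" or "BlogPost".
            counts[f.read().strip()] += 1
    return counts
```

For the Quick Start example, count_page_types("COM") would summarise the
contents of the COM directory.
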
Troubleshooting
---------------

The page package import activity in particular can be a source of problems.
Generally, any error occurring when attempting to import a package is likely
to be due to insufficient privileges when writing to the pages directory of a
wiki or to its edit-log file.

The moinsetup software can generate scripts that set the ownership of wiki
files or apply ACLs (access control lists) to those files in order to make
access to wiki data more convenient. Where the ownership of the files must be
set (to www-data or nobody), the import step can be run as that user given
sufficient privileges. However, the easiest solution is to apply ACLs, thus
allowing the user who created the wiki to retain write access to it.

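A quick check for the privilege problems described above can be sketched in
Python 3 as follows. The sketch assumes the usual MoinMoin layout in which the
wiki's data directory contains a pages directory and an edit-log file, and the
function name unwritable_paths is hypothetical. It reports only what the
current user may write to, so it should be run as the user performing the
import.

```python
import os


def unwritable_paths(data_dir):
    """Return the import-related paths that the current user cannot write."""
    candidates = [
        os.path.join(data_dir, "pages"),     # page directories and revisions
        os.path.join(data_dir, "edit-log"),  # log updated by each import
    ]
    # os.access also returns False for paths that do not exist.
    return [path for path in candidates if not os.access(path, os.W_OK)]
```

An empty list suggests that the import should not fail for lack of write
access to these two locations.
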
Contact, Copyright and Licence Information
------------------------------------------

The current Web page for ConfluenceConverter at the time of release is:

http://hgweb.boddie.org.uk/ConfluenceConverter

Copyright and licence information can be found in the docs directory - see
docs/COPYING.txt and docs/LICENCE.txt for more information.