Introduction
------------

ConfluenceConverter is a software distribution that converts data exported
from Confluence wiki instances, provided in the form of an XML file, into a
collection of wiki pages and resources that can be imported into a MoinMoin
instance as a page package.

Prerequisites
-------------

ConfluenceConverter requires a library called xmlread that can be found at the
following location:

http://hgweb.boddie.org.uk/xmlread

The xmlread.py file from the xmlread distribution can be copied into the
ConfluenceConverter directory.

ConfluenceConverter also requires access to the MoinMoin.wikiutil module found
in the MoinMoin distribution.

The moinsetup program is highly recommended for the installation of page
packages and the management of MoinMoin wiki instances:

http://moinmo.in/ScriptMarket/moinsetup

If moinsetup is not being used, the page package installer documentation
should be consulted:

http://moinmo.in/HelpOnPackageInstaller

MoinMoin Prerequisites
----------------------

The page package installer does not preserve user information when installing
page revisions. This behaviour can be changed by applying a patch to MoinMoin.
Run the following command at the top level of the MoinMoin source
distribution:

patch -p1 < $CCDIR/patches/patch-moin-1.9-MoinMoin-packages.diff

Here, CCDIR is the path to the top level of this source distribution, where
this README.txt file is found.

Wiki Content Prerequisites
--------------------------

For the output of the converter, the following MoinMoin extensions are
required:

http://moinmo.in/ParserMarket/ImprovedTableParser
http://hgweb.boddie.org.uk/MoinSupport
http://moinmo.in/MacroMarket/Color2

Quick Start
-----------

Given an XML export archive file for a Confluence wiki instance (in the
example below, the file is called COM-123456-789012.zip), the following
command can be used to prepare a page package for MoinMoin:

python convert.py COM-123456-789012.zip COM

In addition to the filename, a workspace name is required (COM in the example
above). Confluence appears to require a workspace as a container for
collections of pages, and naming one also makes it possible to import parts of
a wiki into MoinMoin selectively. If attachments were included in the export
from Confluence, they will be included in the page package.

The result of the above command will be a directory having the same name as
the chosen workspace, together with a zip archive of that directory's
contents. Thus, the above command would produce a directory called COM and an
archive called COM.zip.

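Before importing, it may be worth checking that the archive looks like a page
package. The following is a minimal sketch and not part of this distribution:
the function name summarise_package is hypothetical, and the sketch assumes
that the archive contains the MOIN_PACKAGE script file that MoinMoin's package
installer looks for.

```python
import zipfile

# Name of the script file that MoinMoin's package installer reads. That the
# archive written by convert.py contains it is an assumption to verify.
PACKAGE_SCRIPT = "MOIN_PACKAGE"

def summarise_package(archive):
    """Return (has_script, member_count) for a zip file path or file object."""
    with zipfile.ZipFile(archive) as package:
        names = package.namelist()
    return PACKAGE_SCRIPT in names, len(names)
```

Calling summarise_package("COM.zip") returns a pair indicating whether the
script file was found and how many members the archive holds.
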
To import the result, use moinsetup as follows:

python moinsetup.py -m install_page_package COM.zip

This requires a suitable moinsetup.cfg file in the working directory.

Mappings from Identifiers to Pages
----------------------------------

Confluence uses numbers to label content revisions, and links to Confluence
sites sometimes use these numbers instead of a readable page name. MoinMoin,
meanwhile, only uses page names and has no external numeric identifier scheme.
Consequently, it is necessary to produce a mapping from Confluence identifiers
to MoinMoin page names. In addition to numeric identifiers, Confluence also
provides "tiny URLs", which are an alphanumeric encoding of the numeric
identifiers.

To generate mappings for the Confluence content, use the mappings script as
follows:

tools/mappings COM

Here, COM is the name of a directory containing converted Confluence content,
corresponding to a space name in the original Confluence wiki. More than one
space name can be given in order to generate a complete mapping for a site.

The following files are generated:

* mapping-id-to-page.txt
* mapping-tiny-to-id.txt
* mapping-tiny-to-page.txt

The first of these is the most useful because it records the arbitrary mapping
from identifiers to page names, which cannot be derived from the identifiers
alone. The second mapping merely converts the "tiny URLs" to identifiers,
which can be done by applying an algorithm without any external knowledge of
the wiki structure. The third mapping is provided as a convenience, combining
the "tiny URL" conversion and the arbitrary mapping to page names.

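The "tiny URL" conversion can be sketched as follows. This is an illustration
only and is not taken from the tools/mappings script: it assumes the commonly
described Confluence scheme in which the identifier is packed as little-endian
bytes, base64-encoded, stripped of padding and trailing "A" characters, and
made URL-friendly by writing "-" for "/" and "_" for "+". The function names
are hypothetical, and the results should be checked against the generated
mapping-tiny-to-id.txt file.

```python
import base64
import struct

def tiny_to_id(tiny):
    """Decode a "tiny URL" code into a numeric identifier."""
    # Undo the URL-friendly substitutions (assumed: "-" for "/", "_" for "+").
    text = tiny.replace("-", "/").replace("_", "+")
    # Restore the stripped trailing "A" characters (zero bits) so that the
    # text describes eight bytes, then add the base64 padding character.
    text = text.ljust(11, "A") + "="
    # Interpret the bytes as a little-endian unsigned integer.
    return struct.unpack("<Q", base64.b64decode(text))[0]

def id_to_tiny(identifier):
    """Encode a numeric identifier as a "tiny URL" code (inverse of above)."""
    raw = struct.pack("<Q", identifier)
    text = base64.b64encode(raw).decode("ascii")
    # Drop the padding and the trailing characters that only encode zeros.
    text = text.rstrip("=").rstrip("A")
    return text.replace("/", "-").replace("+", "_")
```

Under these assumptions, tiny_to_id("D4AB") gives 98319 and id_to_tiny(98319)
gives "D4AB".
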
Where Web server facilities such as RewriteMap are available for use, the
first and third mapping files can be used directly. See the Apache
documentation for details of RewriteMap:

http://httpd.apache.org/docs/2.4/rewrite/rewritemap.html

Otherwise, the first file is more likely to be used by a program that
redirects requests to the appropriate wiki page. Such a program, deployed in a
suitable location to receive these requests, can also decode the "tiny URLs"
itself.

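The lookup at the heart of such a redirecting program might look like the
following sketch. It assumes that each line of mapping-id-to-page.txt holds an
identifier and a page name separated by whitespace, as the plain text form of
RewriteMap expects, and that page names can be appended to the wiki URL
unchanged. The function names and the example URL are hypothetical.

```python
def load_mapping(lines):
    """Build a dictionary from lines of the assumed form "key value"."""
    mapping = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        # Ignore lines that do not provide both a key and a value.
        if len(parts) == 2:
            mapping[parts[0]] = parts[1]
    return mapping

def redirect_location(mapping, identifier, wiki_url):
    """Return the URL to redirect to, or None for an unknown identifier."""
    page_name = mapping.get(str(identifier))
    if page_name is None:
        return None
    return wiki_url.rstrip("/") + "/" + page_name
```

A program would call load_mapping on the open mapping file once, then answer
each request with a redirect to the value returned by redirect_location, or
with a "not found" response when it returns None.
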
Output Structure
----------------

The structure of a converted workspace is a directory hierarchy containing the
following directories:

* pages (a collection of directories defining each page or content item,
  corresponding to Page, Comment and BlogPost elements in the XML
  exported from Confluence)

* versions (a collection of files, each defining a revision or version of
  some content, corresponding to BodyContent elements in the XML
  exported from Confluence)

Each page directory contains the following files:

* pagetype (either "Page", "Comment" or "BlogPost")

* manifest (a list of version entries in a format similar to the MoinMoin
  page package manifest format)

* attachments (a list of attachment version entries in a format similar to
  the MoinMoin page package manifest format)

* pagetitle (an optional page title imposed on the page by another content
  item)

* children (a list of child page names defined for the page)

* comments (a list of creation date plus comment page identifier pairs)

In the output structure, content items such as comments are represented as
pages, and each references a content version. Since comments will ultimately
be represented as subpages of some parent page, each has a pagetitle file in
its directory containing an appropriate subpage name derived from the parent
page's name and the comment details.

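As an illustration of this layout, the following sketch counts the content
items of each type in a converted workspace. It assumes only what is described
above: a pages directory containing one directory per content item, each with
a pagetype file holding the type as text. The function name count_page_types
is hypothetical.

```python
import os

def count_page_types(workspace_dir):
    """Count the page directories of each type in a converted workspace."""
    counts = {}
    pages_dir = os.path.join(workspace_dir, "pages")
    for name in sorted(os.listdir(pages_dir)):
        pagetype_file = os.path.join(pages_dir, name, "pagetype")
        # Skip anything that does not look like a page directory.
        if not os.path.isfile(pagetype_file):
            continue
        with open(pagetype_file) as f:
            pagetype = f.read().strip()
        counts[pagetype] = counts.get(pagetype, 0) + 1
    return counts
```

Calling count_page_types("COM") returns a dictionary mapping each type found,
such as "Page" or "Comment", to the number of items of that type.
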
Troubleshooting
---------------

Importing the page package is the step most likely to cause problems.
Generally, any error occurring when attempting to import a package is likely
to be due to insufficient privileges when writing to the pages directory of a
wiki or to its edit-log file.

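A quick check for this kind of problem can be sketched as follows. The sketch
assumes that the pages directory and the edit-log file both reside directly
inside the wiki's data directory; the function name unwritable_paths is
hypothetical. It should be run as the user who will perform the import, since
the check applies to the current user.

```python
import os

def unwritable_paths(data_dir):
    """Return the paths under a wiki data directory that cannot be written."""
    problems = []
    # Assumed locations of the pages directory and edit-log file.
    for name in ("pages", "edit-log"):
        path = os.path.join(data_dir, name)
        if not os.path.exists(path) or not os.access(path, os.W_OK):
            problems.append(path)
    return problems
```

An empty list means that both locations exist and appear to be writable by the
current user; otherwise, the listed paths are missing or not writable.
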
The moinsetup software can generate scripts that set the ownership of wiki
files or apply ACLs (access control lists) to those files in order to make
access to wiki data more convenient. Where the ownership of the files must be
set (to www-data or nobody, for example), the import step can be run as that
user, given sufficient privileges. However, the easiest solution is to apply
ACLs, thus allowing the user who created the wiki to retain write access to
it.

Contact, Copyright and Licence Information
------------------------------------------

The current Web page for ConfluenceConverter at the time of release is:

http://hgweb.boddie.org.uk/ConfluenceConverter

Copyright and licence information can be found in the docs directory: see
docs/COPYING.txt and docs/LICENCE.txt for more information.