Introduction
------------

ConfluenceConverter is a distribution of software that converts exported data
from Confluence wiki instances, provided in the form of an XML file, to a
collection of wiki pages and resources that can be imported into a MoinMoin
instance as a page package.

Migration Activities
--------------------

The following activities are involved in a migration from Confluence to
MoinMoin. First, the activities that can be performed from any location:

* Export of Confluence content
* Conversion of Confluence content to MoinMoin content
* Confluence page identifier extraction and mapping to MoinMoin identifiers
* Acquisition of Confluence user profile details

Then, the activities that are performed on the server:

* Installation of MoinMoin
* Initialisation of a MoinMoin wiki instance
* Import of MoinMoin content into the new wiki instance
* Installation of MoinMoin extensions
* Initialisation of user profiles in MoinMoin
* Installation of scripts and identifier mappings
* Filesystem permission adjustments

Prerequisites
-------------

ConfluenceConverter requires a library called xmlread that can be found at the
following location:

http://hgweb.boddie.org.uk/xmlread

The xmlread.py file from the xmlread distribution can be copied into the
ConfluenceConverter directory.

ConfluenceConverter also requires access to the MoinMoin.wikiutil module found
in the MoinMoin distribution.

The moinsetup program is highly recommended for the installation of page
packages and the management of MoinMoin wiki instances:

http://moinmo.in/ScriptMarket/moinsetup

If moinsetup is not being used, the page package installer documentation
should be consulted:

http://moinmo.in/HelpOnPackageInstaller

To read Confluence user profiles on live Confluence sites using the
get_profiles.py program, the libxml2dom library is required:

http://hgweb.boddie.org.uk/libxml2dom

MoinMoin Prerequisites
----------------------

The page package installer does not preserve user information or the last
modified time when installing page revisions. This behaviour can be changed by
applying a patch to MoinMoin as follows while at the top level of the MoinMoin
source distribution:

  patch -p1 < $CCDIR/patches/patch-moin-1.9-MoinMoin-packages.diff

Here, CCDIR is the path to the top level of this source distribution where
this README.txt file is found.

When importing users, MoinMoin may be unable to handle user information
containing non-ASCII characters. Another patch to solve such problems can be
applied to MoinMoin as follows:

  patch -p1 < $CCDIR/patches/patch-moin-1.9-MoinMoin-user.diff

Wiki Content Prerequisites
--------------------------

For the output of the converter, the following MoinMoin extensions are
required:

http://moinmo.in/ParserMarket/ImprovedTableParser
http://moinmo.in/ActionMarket/SubpageComments
http://moinmo.in/MacroMarket/Color2

A common dependency of various extensions is provided by MoinSupport:

http://hgweb.boddie.org.uk/MoinSupport

Additional Software
-------------------

PDF export support requires the ExportPDF action:

http://moinmo.in/ActionMarket/ExportPDF

This in turn requires Apache FOP for PDF production using XSL-FO:

http://xmlgraphics.apache.org/fop/

(On Debian systems, the fop package provides this tool.)

To produce XSL-FO from DocBook output, xsltproc is required from the libxslt
distribution:

http://xmlsoft.org/XSLT/

(On Debian systems, the xsltproc package provides this tool.)

And DocBook output requires the DocBook resources to be installed, described
in the following guide:

http://www.sagehill.net/docbookxsl/ToolsSetup.html

(On Debian systems, the docbook-xsl package provides these resources.)

Quick Start
-----------

(!) The acquisition of Confluence wiki content and its conversion can be
performed from any location, not necessarily on the server.

To obtain XML export archives from a Confluence wiki instance, the
exportspacexml.action resource is visited and the "Export" button selected.
For example, for the Mailman Wiki, the appropriate resource (with the COM
namespace selected) is as follows:

http://wiki.list.org/spaces/exportspacexml.action?key=COM

For your own instance, adjust the above URL accordingly. Alternatively, you
can find your way to the export page by selecting a namespace, then choosing
"Advanced" from the "Browse" menu, and then choosing "XML Export" from the
"Export" sidebar.

Given an XML export archive file for a Confluence wiki instance (in the
example below, the file is called COM-123456-789012.zip), the following
command can be used to prepare a page package for MoinMoin:

  python convert.py COM-123456-789012.zip COM

In addition to the filename, a workspace name is required. Confluence appears
to require a workspace as a container for collections of pages, but this also
permits us to selectively import parts of a wiki into MoinMoin. If attachments
were included in the export from Confluence, these will be imported into the
page package.

The result of the above command will be a directory having the same name as
the chosen workspace, together with a zip archive for that directory's
contents. Thus, the above command would produce a directory called COM and an
archive called COM.zip.
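
Where several export archives need converting, the command lines can be
prepared in one go. The following is only a sketch for Python 3: it assumes
that each archive name starts with the space key followed by a hyphen, as in
the example above, and the helper names are hypothetical rather than part of
this distribution.

```python
import os

def workspace_from_archive(filename):
    """Guess the space key from a name such as COM-123456-789012.zip.

    Assumption: the text before the first hyphen is the space key.
    """
    base = os.path.basename(filename)
    return base.split("-", 1)[0]

def conversion_commands(archives):
    """Return one convert.py command (as an argument list) per archive."""
    return [["python", "convert.py", archive, workspace_from_archive(archive)]
            for archive in archives]

if __name__ == "__main__":
    # Print the commands for inspection instead of running them.
    for command in conversion_commands(["COM-123456-789012.zip"]):
        print(" ".join(command))
```

Printing the commands first makes it easy to check the guessed workspace
names before any conversion is actually run.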

(!) The following step is performed on the server.

To import the result (although you may wish to process other namespaces
first), use moinsetup as follows:

  python moinsetup.py -m install_page_package COM.zip

This requires a suitable moinsetup.cfg file in the working directory.

Importing Many Workspaces/Namespaces
------------------------------------

Where more than one namespace is to be imported, the page packages should be
merged so that the resulting history information is ordered correctly.

(!) This process can be performed from any location and the result uploaded to
the server for eventual import.

To merge packages, use a command of the following form:

  python merge.py OUT COM.zip DEV.zip DOC.zip SEC.zip

A directory called OUT and a page package called OUT.zip will be produced. The
latter can then be imported into MoinMoin as described above.
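
The reason for merging can be illustrated with a small sketch. It is not the
implementation used by merge.py: the entries below are hypothetical
(timestamp, page name, revision) tuples, and each per-space list is assumed to
be sorted by timestamp already. Interleaving the lists by timestamp gives a
single history in chronological order.

```python
import heapq

def merge_histories(*histories):
    """Interleave already-sorted revision lists by their timestamps."""
    return list(heapq.merge(*histories, key=lambda entry: entry[0]))

# Hypothetical revision entries for two spaces.
com = [(100, "COM/FrontPage", 1), (300, "COM/FrontPage", 2)]
dev = [(200, "DEV/Guide", 1)]

merged = merge_histories(com, dev)
```

Importing the packages one after another would instead record all revisions
from one space before any from the next, whatever their original dates.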

Mappings from Identifiers to Pages
----------------------------------

Confluence uses numbers to label content revisions, and links to Confluence
sites sometimes use these numbers instead of a readable page name. MoinMoin,
meanwhile, only uses page names and has no external numeric identifier scheme.
Consequently, it is necessary to produce a mapping from Confluence identifiers
to MoinMoin page names. In addition to numeric identifiers, Confluence also
provides "tiny URLs" which are an alphanumeric encoding of the numeric
identifiers.

(!) This process can be performed with the converted content from any
location, with the generated files uploaded to the server for eventual
deployment.

To generate mappings for the Confluence content, use the mappings script as
follows:

  tools/mappings.sh COM

Here, COM is a directory name containing converted Confluence content,
corresponding to a space name in the original Confluence wiki. More than one
space name can be used to generate a complete mapping for a site. For example:

  tools/mappings.sh COM DEV DOC SEC

The following files are generated:

* mapping-id-to-page.txt
* mapping-tiny-to-id.txt
* mapping-tiny-to-page.txt

The most useful of these is the first as it includes all the necessary
information provided by the arbitrary mapping from identifiers to page names.
The second mapping merely converts the "tiny URLs" to identifiers, which can
be done by applying an algorithm without any external knowledge of the wiki
structure. The third mapping is provided as a convenience, combining the "tiny
URL" conversion and the arbitrary mapping to page names.
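
To show why the "tiny URL" conversion needs no knowledge of the wiki, here is
a sketch of one commonly described encoding. It is an assumption, not taken
from this distribution: the identifier is packed as a 32-bit little-endian
integer and Base64-encoded, the "=" padding and trailing "A" characters are
dropped, and "/" and "+" are replaced by "-" and "_". The function names are
hypothetical, and the result should be checked against real links from the
site being migrated.

```python
import base64
import struct

def encode_tiny(page_id):
    """Encode a numeric identifier as a tiny URL code (assumed scheme)."""
    raw = struct.pack("<L", page_id)
    text = base64.b64encode(raw).decode("ascii")
    # Drop the Base64 padding, then the trailing "A"s that encode zero bits.
    text = text.rstrip("=").rstrip("A")
    return text.replace("/", "-").replace("+", "_")

def decode_tiny(code):
    """Recover the numeric identifier from a tiny URL code (assumed scheme)."""
    text = code.replace("-", "/").replace("_", "+")
    # Four bytes occupy six Base64 characters plus two padding characters.
    text = text.ljust(6, "A") + "=="
    return struct.unpack("<L", base64.b64decode(text))[0]
```

Under these assumptions decode_tiny reverses encode_tiny for any identifier
that fits in 32 bits.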

Translating Requests Using the Mappings
---------------------------------------

Where Web server facilities such as RewriteMap are available for use, the
first and third mapping files can be used directly. See the Apache
documentation for details of RewriteMap:

http://httpd.apache.org/docs/2.4/rewrite/rewritemap.html

Otherwise, it is more likely that the first file is used by a program that can
perform a redirect to the appropriate wiki page, and the "tiny URL" decoding
is also done by this program when deployed in a suitable location to receive
such requests. To support this, the following resources are provided:

* scripts/redirect.py
* config/mailmanwiki-redirect

The latter configuration file should be combined with the Web server
configuration file such that the appropriate aliases are able to capture
requests and invoke the redirect.py script before the main wiki aliases are
consulted. The script itself should be placed in a suitable filesystem
location, with the mapping-id-to-page.txt file placed alongside it.
Alternatively, the mapping file can be placed elsewhere and the
MAPPING_ID_TO_PAGE variable in the script changed to refer to that location.
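
The kind of lookup such a program performs can be sketched as follows. This
is not the code of redirect.py. It assumes that the mapping file holds one
whitespace-separated "identifier page-name" pair per line, which is the plain
text layout that RewriteMap accepts, and the fallback page name is a
hypothetical choice.

```python
def load_mapping(lines):
    """Build a dictionary from lines of the form 'key value'."""
    mapping = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            mapping[parts[0]] = parts[1]
    return mapping

def page_for_identifier(mapping, identifier, default="FrontPage"):
    """Return the page name for an identifier, or a fallback page."""
    return mapping.get(str(identifier), default)
```

A deployed script would call load_mapping on the opened
mapping-id-to-page.txt file and then issue a redirect to the page returned.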

Supporting Confluence Action URLs
---------------------------------

Besides the "viewpage" action mapping identifiers to pages (covered by the
mapping described above), some other action URLs may be used in wiki content
and must either be translated or supported using redirects. Since external
sites may also employ such actions, a redirect strategy perhaps makes more
sense. To support this, the following resources are involved:

* scripts/dashboard.py
* scripts/redirect.py
* scripts/search.py
* config/mailmanwiki-redirect

The last of these, the configuration file, is also involved in
identifier-to-page mapping, but in this case it causes requests to the
"dashboard", "doexportpage" and "dosearchsite" actions to be directed to the
dashboard.py, redirect.py and search.py scripts respectively.

The dashboard.py script merely redirects requests to the root of the site,
thus assuming that the front page is configured to show dashboard-like
information.

The redirect.py script, apart from supporting identifier-to-page redirects,
also supports attachment downloads and PDF page exports, since both kinds of
resource employ identifiers to indicate which page is involved. In an
environment that uses .htaccess and mod_rewrite, the redirect.py script should
also be deployed under separate names (such as export.py and exportpdf.py) so
that it can discover whether it should be exporting a page instead of just
showing it.

The search.py script redirects search requests in a suitable form to the
MoinMoin "fullsearch" action.
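
A rough sketch of such a translation is given below. It is not the code of
search.py. It assumes that the Confluence search URL carries its terms in a
"queryString" parameter and that the MoinMoin "fullsearch" action reads them
from a "value" parameter. Both names should be checked against the sites
involved, and the function name is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def fullsearch_url(confluence_url, wiki_root="/"):
    """Translate a Confluence search URL into a MoinMoin fullsearch URL."""
    query = parse_qs(urlsplit(confluence_url).query)
    # Assumed Confluence parameter name; empty if it is missing.
    terms = query.get("queryString", [""])[0]
    params = urlencode([("action", "fullsearch"), ("value", terms)])
    return "%s?%s" % (wiki_root, params)
```

The returned URL would then be sent in the Location header of a redirect
response.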

Identifying and Migrating Users
-------------------------------

Confluence export archives do not contain user profile information, but page
versions are marked with user identifiers. Therefore, a list of user
identifiers can be obtained by running a script extracting these identifiers.
The following command writes to standard output the users involved with
editing the wiki in four different spaces (exported to four directories):

  tools/users.sh COM DEV DOC SEC

This output can be edited and then passed to a program which fetches other
profile details as follows:

  tools/users.sh COM DEV DOC SEC > users.txt

After editing...

  cat users.txt \
  | tools/get_profiles.py http://wiki.list.org/ \
  > profiles.txt

If no users are to be removed in migration, the following command could be
issued:

  tools/users.sh COM DEV DOC SEC \
  | tools/get_profiles.py http://wiki.list.org/ \
  > profiles.txt

The get_profiles.py program needs to be told the URL of the original
Confluence site. Note that it accesses the site at a default rate of around
one request per second; a different delay between requests can be specified
using an additional argument.
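
The effect of the delay can be pictured with the following sketch. It is not
the code of get_profiles.py: the fetch function is supplied by the caller and
all names are hypothetical. It only shows the general pattern of pausing
between consecutive requests so that the original site is not overloaded.

```python
import time

def fetch_all(usernames, fetch, delay=1.0, sleep=time.sleep):
    """Call fetch(username) for each user, pausing between requests."""
    results = []
    for index, username in enumerate(usernames):
        # No pause is needed before the very first request.
        if index:
            sleep(delay)
        results.append((username, fetch(username)))
    return results
```

With the default delay, fetching profiles for several hundred users takes a
few minutes, which is worth allowing for when planning a migration.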

(!) The above steps can be performed from any location, but the command
pipelines below need to be run on the server due to the use of a program that
updates the deployed wiki.

The output of the get_profiles.py program can be passed to another program
which adds users to MoinMoin, and so the following commands can be used:

  cat profiles.txt \
  | tools/addusers.py wiki

Alternatively, the users can be converted to profiles and immediately added
without creating a profiles file:

  cat users.txt \
  | tools/get_profiles.py http://wiki.list.org/ \
  | tools/addusers.py wiki

Or, using a single command without inspecting the users or profiles at all:

  tools/users.sh COM DEV DOC SEC \
  | tools/get_profiles.py http://wiki.list.org/ \
  | tools/addusers.py wiki

The addusers.py program needs to be told the directory containing the wiki
configuration.

Output Structure
----------------

The structure of a converted workspace is a directory hierarchy containing the
following directories:

* pages (a collection of directories defining each page or content item,
  corresponding to Page, Comment and BlogPost elements in the XML
  exported from Confluence)

* versions (a collection of files, each defining a revision or version of
  some content, corresponding to BodyContent elements in the XML
  exported from Confluence)

Each page directory contains the following items:

* pagetype (either "Page", "Comment" or "BlogPost")

* manifest (a list of version entries in a format similar to the MoinMoin
  page package manifest format)

* attachments (a list of attachment version entries in a format similar to
  the MoinMoin page package manifest format)

* pagetitle (an optional page title imposed on the page by another content
  item)

* children (a list of child page names defined for the page)

* comments (a list of creation date plus comment page identifier pairs)

In the output structure, content items such as comments are represented as
pages, and each references a content version. Since comments will ultimately
be represented as subpages of some parent page, they will have a pagetitle
file in their directory with an appropriate subpage name written according to
the parent page's name and comment details.
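
A converted workspace can be inspected with a few lines of Python. The sketch
below assumes that pagetype is a plain text file holding just the type name.
The function name is hypothetical and nothing here is part of the converter
itself.

```python
import os

def page_types(workspace):
    """Map each page directory name to the content of its pagetype file."""
    pages_dir = os.path.join(workspace, "pages")
    types = {}
    for name in sorted(os.listdir(pages_dir)):
        path = os.path.join(pages_dir, name, "pagetype")
        # Ignore entries that lack a pagetype file.
        if os.path.isfile(path):
            with open(path) as f:
                types[name] = f.read().strip()
    return types
```

Calling page_types("COM") after a conversion would show, for instance, how
many content items are comments rather than ordinary pages.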

Troubleshooting
---------------

The page package import activity in particular can be a source of problems.
Generally, any error occurring when attempting to import a package is likely
to be due to insufficient privileges when writing to the pages directory of a
wiki or to its edit-log file.

The moinsetup software can generate scripts that set the ownership of wiki
files or apply ACLs (access control lists) to those files in order to make
access to wiki data more convenient. Where the ownership of the files must be
set (to www-data or nobody), the import step can be run as that user given
sufficient privileges. However, the easiest solution is to apply ACLs, thus
allowing the user who created the wiki to retain write access to it.
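
Before attempting an import, write access can be checked with a short sketch
like the following. It assumes the usual layout in which a wiki's data
directory contains a pages directory and an edit-log file, and the function
name is hypothetical.

```python
import os

def unwritable_paths(data_dir):
    """Return the paths under data_dir that are missing or not writable."""
    candidates = [
        os.path.join(data_dir, "pages"),
        os.path.join(data_dir, "edit-log"),
    ]
    return [path for path in candidates
            if not (os.path.exists(path) and os.access(path, os.W_OK))]
```

An empty result suggests that the current user can write to both locations.
Any path listed is a candidate for an ownership or ACL adjustment.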

Contact, Copyright and Licence Information
------------------------------------------

The current Web page for ConfluenceConverter at the time of release is:

http://hgweb.boddie.org.uk/ConfluenceConverter

Copyright and licence information can be found in the docs directory - see
docs/COPYING.txt and docs/LICENCE.txt for more information.