Introduction
============

ConfluenceConverter is a distribution of software that converts exported data
from Confluence wiki instances, provided in the form of an XML file, to a
collection of wiki pages and resources that can be imported into a MoinMoin
instance as a page package.

Migration Activities
--------------------

The following activities are involved in a migration from Confluence to
MoinMoin. First, the activities that can be performed from any location:

* Export of Confluence content
* Conversion of Confluence content to MoinMoin content
* Confluence page identifier extraction and mapping to MoinMoin identifiers
* Acquisition of Confluence user profile details

Then, the activities that are performed on the server:

* Installation of MoinMoin
* Initialisation of a MoinMoin wiki instance
* Import of MoinMoin content into the new wiki instance
* Installation of MoinMoin extensions
* Initialisation of user profiles in MoinMoin
* Installation of scripts and identifier mappings
* Filesystem permission adjustments



Prerequisites
=============

ConfluenceConverter requires a library called xmlread that can be found at the
following location:

http://hgweb.boddie.org.uk/xmlread

The xmlread.py file from the xmlread distribution can be copied into the
ConfluenceConverter directory.

ConfluenceConverter also requires access to the MoinMoin.wikiutil module found
in the MoinMoin distribution. Setting the PYTHONPATH environment variable to
the location of the MoinMoin package should be sufficient for access to this
module.

The moinsetup program is highly recommended for the installation of page
packages and the management of MoinMoin wiki instances:

http://moinmo.in/ScriptMarket/moinsetup

If moinsetup is not being used, the page package installer documentation
should be consulted:

http://moinmo.in/HelpOnPackageInstaller

To read Confluence user profiles on live Confluence sites using the
get_profiles.py program, the libxml2dom library is required:

http://hgweb.boddie.org.uk/libxml2dom

MoinMoin Prerequisites
----------------------

The page package installer does not preserve user information or the last
modified time when installing page revisions. This behaviour can be changed by
applying a patch to MoinMoin as follows, from the top level of the MoinMoin
source distribution:

  patch -p1 < $CCDIR/patches/patch-moin-1.9-MoinMoin-packages.diff

Here, CCDIR is the path to the top level of this source distribution where
this README.txt file is found.

When importing users, MoinMoin may be unable to handle user information
containing non-ASCII characters. Another patch to solve such problems can be
applied to MoinMoin as follows:

  patch -p1 < $CCDIR/patches/patch-moin-1.9-MoinMoin-user.diff

Wiki Content Prerequisites
--------------------------

For the output of the converter, the following MoinMoin extensions are
required:

http://moinmo.in/ParserMarket/ImprovedTableParser
http://moinmo.in/ActionMarket/SubpageComments
http://moinmo.in/MacroMarket/Color2

A common dependency of various extensions is provided by MoinSupport:

http://hgweb.boddie.org.uk/MoinSupport



Additional Software
===================

PDF export support requires the ExportPDF action:

http://moinmo.in/ActionMarket/ExportPDF

This in turn requires Apache FOP for PDF production using XSL-FO:

http://xmlgraphics.apache.org/fop/

(On Debian systems, the fop package provides this tool.)

To produce XSL-FO from DocBook output, xsltproc is required from the libxslt
distribution:

http://xmlsoft.org/XSLT/

(On Debian systems, the xsltproc package provides this tool.)

And DocBook output requires the DocBook resources to be installed, described
in the following guide:

http://www.sagehill.net/docbookxsl/ToolsSetup.html

(On Debian systems, the docbook-xsl package provides these resources.)



Quick Start
===========

(!) The acquisition of Confluence wiki content and its conversion can be
performed from any location, not necessarily on the server.

To obtain XML export archives from a Confluence wiki instance, visit the
exportspacexml.action resource and select the "Export" button. For example,
for the Mailman Wiki, the appropriate resource (with the COM namespace
selected) is as follows:

http://wiki.list.org/spaces/exportspacexml.action?key=COM

For your own instance, adjust the above URL accordingly. Alternatively, you
can find your way to the export page by selecting a namespace, then choosing
"Advanced" from the "Browse" menu, and then choosing "XML Export" from the
"Export" sidebar.

Given an XML export archive file for a Confluence wiki instance (in the
example below, the file is called COM-123456-789012.zip), the following
command can be used to prepare a page package for MoinMoin:

  python convert.py COM-123456-789012.zip COM

In addition to the filename, a workspace name is required. (Confluence calls
these containers "spaces"; this document also refers to them as workspaces
and namespaces.) Confluence appears to require a workspace as a container for
collections of pages, but this also permits us to selectively import parts of
a wiki into MoinMoin. If attachments were included in the export from
Confluence, these will be imported into the page package.

The result of the above command will be a directory having the same name as
the chosen workspace, together with a zip archive for that directory's
contents. Thus, the above command would produce a directory called COM and an
archive called COM.zip.
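
Before uploading the archive, it can be useful to check that it looks like a
page package. The following is a minimal sketch and not part of this
distribution: it assumes that the manifest is stored in an archive member
named MOIN_PACKAGE, as is the case for MoinMoin page packages, and the
describe_package function name is purely illustrative.

```python
import zipfile


def describe_package(path, preview=5):
    """Return (has_manifest, first_manifest_lines, member_count).

    Assumption: the manifest is the archive member called MOIN_PACKAGE.
    """
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        if "MOIN_PACKAGE" not in names:
            return False, [], len(names)
        text = archive.read("MOIN_PACKAGE").decode("utf-8", "replace")
        return True, text.splitlines()[:preview], len(names)
```

Calling describe_package("COM.zip") would then report whether a manifest was
found, show its first few lines, and give the number of archive members.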

(!) The following step is performed on the server.

To import the result (although you may wish to process other namespaces
first), use moinsetup as follows:

  python moinsetup.py -m install_page_package COM.zip

This requires a suitable moinsetup.cfg file in the working directory.

Importing Many Workspaces/Namespaces
------------------------------------

Where more than one namespace is to be imported, the page packages should be
merged so that the resulting history information is ordered correctly.

(!) This process can be performed from any location and the result uploaded to
the server for eventual import.

To merge packages, use a command of the following form:

  python merge.py OUT COM.zip DEV.zip DOC.zip SEC.zip

A directory called OUT and a page package called OUT.zip will be produced. The
latter can then be imported into MoinMoin as described above.
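
The reason for merging can be illustrated with a small sketch. This is not how
merge.py is implemented: it merely assumes that the history of each package
can be represented as a list of hypothetical (timestamp, page name) pairs,
already sorted by time, and shows how such lists interleave into a single
ordered history.

```python
import heapq


def merge_histories(*histories):
    """Interleave several time-sorted revision lists into one ordered list.

    Each entry is a hypothetical (timestamp, page_name) pair.
    """
    return list(heapq.merge(*histories, key=lambda entry: entry[0]))


com = [(100, "COM/Home"), (300, "COM/News")]
dev = [(200, "DEV/Home"), (400, "DEV/API")]
merged = merge_histories(com, dev)
```

Importing the packages one after another would instead record all COM
revisions before any DEV revisions, regardless of when they were made.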

Mappings from Identifiers to Pages
----------------------------------

Confluence uses numbers to label content revisions, and links to Confluence
sites sometimes use these numbers instead of a readable page name. MoinMoin,
meanwhile, only uses page names and has no external numeric identifier scheme.
Consequently, it is necessary to produce a mapping from Confluence identifiers
to MoinMoin page names. In addition to numeric identifiers, Confluence also
provides "tiny URLs" which are an alphanumeric encoding of the numeric
identifiers.

(!) This process can be performed with the converted content from any
location, with the generated files uploaded to the server for eventual
deployment.

To generate mappings for the Confluence content, use the mappings script as
follows:

  tools/mappings.sh COM

Here, COM is a directory name containing converted Confluence content,
corresponding to a space name in the original Confluence wiki. More than one
space name can be used to generate a complete mapping for a site. For example:

  tools/mappings.sh COM DEV DOC SEC

The following files are generated:

* mapping-id-to-page.txt
* mapping-tiny-to-id.txt
* mapping-tiny-to-page.txt

The most useful of these is the first as it includes all the necessary
information provided by the arbitrary mapping from identifiers to page names.
The second mapping merely converts the "tiny URLs" to identifiers, which can
be done by applying an algorithm without any external knowledge of the wiki
structure. The third mapping is provided as a convenience, combining the "tiny
URL" conversion and the arbitrary mapping to page names.
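
A program consulting one of these files might load it as in the following
sketch. The file layout is an assumption here: each line is taken to hold a
key and a value separated by whitespace, which is the layout expected of plain
text maps by Apache's RewriteMap. The load_mapping function name is
illustrative and does not appear in this distribution.

```python
def load_mapping(path):
    """Read a "key value" text file into a dictionary.

    Assumption: one whitespace-separated pair per line; blank lines and
    lines starting with "#" are ignored, as are lines without a value.
    """
    mapping = {}
    with open(path, encoding="utf-8") as source:
        for line in source:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            if len(parts) == 2:
                mapping[parts[0]] = parts[1]
    return mapping
```

A lookup then becomes load_mapping("mapping-id-to-page.txt").get(identifier),
with identifiers kept as strings.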

Translating Requests Using the Mappings
---------------------------------------

Where Web server facilities such as RewriteMap are available for use, the
first and third mapping files can be used directly. See the Apache
documentation for details of RewriteMap:

http://httpd.apache.org/docs/2.4/rewrite/rewritemap.html

Otherwise, it is more likely that the first file is used by a program that can
perform a redirect to the appropriate wiki page, and the "tiny URL" decoding
is also done by this program when deployed in a suitable location to receive
such requests. To support this, the following resources are provided:

* scripts/redirect.py
* config/mailmanwiki-redirect

The latter configuration file should be combined with the Web server
configuration file such that the appropriate aliases are able to capture
requests and invoke the redirect.py script before the main wiki aliases are
consulted. The script itself should be placed in a suitable filesystem
location, with the mapping-id-to-page.txt file either placed alongside it or
placed elsewhere, in which case the MAPPING_ID_TO_PAGE variable in the script
must be changed to refer to that location.
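
The general idea of such a redirect can be sketched as follows. This is not
the code of redirect.py: it assumes that requests carry the numeric identifier
in a pageId query parameter (as Confluence "viewpage" URLs do), that the
mapping has already been loaded into a dictionary, and that page names in the
mapping are not already URL-encoded. The redirect_headers function and the
wiki_root parameter are hypothetical.

```python
from urllib.parse import parse_qs, quote


def redirect_headers(query_string, mapping, wiki_root="/"):
    """Return CGI-style header lines for a request with a pageId parameter."""
    page_id = parse_qs(query_string).get("pageId", [""])[0]
    page_name = mapping.get(page_id)
    if page_name is None:
        # Unknown identifier: report that nothing was found.
        return ["Status: 404 Not Found", "Content-Type: text/plain"]
    # quote() leaves "/" intact, so subpage names keep their structure.
    return ["Status: 302 Found", "Location: " + wiki_root + quote(page_name)]
```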

Supporting Confluence Action URLs
---------------------------------

Besides the "viewpage" action mapping identifiers to pages (covered by the
mapping described above), some other action URLs may be used in wiki content
and must either be translated or supported using redirects. Since external
sites may also employ such actions, a redirect strategy perhaps makes more
sense. To support this, the following resources are involved:

* scripts/dashboard.py
* scripts/redirect.py
* scripts/search.py
* config/mailmanwiki-redirect

The configuration file is also involved in identifier-to-page mapping, but in
this case it causes requests to the "dashboard", "doexportpage" and
"dosearchsite" actions to be directed to the dashboard.py, redirect.py and
search.py scripts respectively.

The dashboard.py script merely redirects requests to the root of the site,
thus assuming that the front page is configured to show dashboard-like
information.

The redirect.py script, apart from supporting identifier-to-page redirects,
also supports attachment downloads and PDF page exports, since both kinds of
resource employ identifiers to indicate which page is involved. In an
environment that uses .htaccess and mod_rewrite, the redirect.py script should
also be deployed under separate names (such as export.py and exportpdf.py) so
that it can discover whether it should be exporting a page instead of just
showing it.

The search.py script redirects search requests in a suitable form to the
MoinMoin "fullsearch" action.
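
A search redirect of this kind might build its target URL as in the sketch
below. This is not the code of search.py, and two details are assumptions: that
the Confluence request supplies its search terms in a queryString parameter,
and that the MoinMoin "fullsearch" action reads them from a value parameter.
The fullsearch_url function and wiki_root parameter are hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def fullsearch_url(query_string, wiki_root="/"):
    """Translate a Confluence-style search query string into a MoinMoin URL.

    Assumptions: terms arrive as "queryString" and are passed on as "value".
    """
    terms = parse_qs(query_string).get("queryString", [""])[0]
    return wiki_root + "?" + urlencode({"action": "fullsearch", "value": terms})
```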

Identifying and Migrating Users
-------------------------------

Confluence export archives do not contain user profile information, but page
versions are marked with user identifiers. Therefore, a list of user
identifiers can be obtained by running a script extracting these identifiers.
The following command writes to standard output the users involved with
editing the wiki in four different spaces (exported to four directories):

  tools/users.sh COM DEV DOC SEC

This output can be edited and then passed to a program which fetches other
profile details as follows:

  tools/users.sh COM DEV DOC SEC > users.txt

After editing...

  cat users.txt \
  | tools/get_profiles.py http://wiki.list.org/ \
  > profiles.txt

If no users are to be removed in migration, the following command could be
issued:

  tools/users.sh COM DEV DOC SEC \
  | tools/get_profiles.py http://wiki.list.org/ \
  > profiles.txt

The get_profiles.py program needs to be told the URL of the original
Confluence site. Note that it accesses the site at a default rate of around
one request per second; a different delay between requests can be specified
using an additional argument.
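
The effect of the delay can be pictured with the following sketch of a
throttled loop. It is not taken from get_profiles.py: the fetch_all function,
its parameters and the idea of passing in the fetching function are all
illustrative assumptions.

```python
import time


def fetch_all(identifiers, fetch, delay=1.0, sleep=time.sleep):
    """Call fetch() for each identifier, pausing between successive requests.

    The sleep function is a parameter so that the pause can be replaced
    when experimenting or testing.
    """
    results = []
    for index, identifier in enumerate(identifiers):
        if index:
            # No pause is needed before the very first request.
            sleep(delay)
        results.append(fetch(identifier))
    return results
```

With the default delay, a list of 600 users would take around ten minutes to
process, which is worth bearing in mind for large sites.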

(!) The above steps can be performed from any location, but the command
pipelines below need to be run on the server due to the use of a program that
updates the deployed wiki.

The output of the get_profiles.py program can be passed to another program
which adds users to MoinMoin, and so the following commands can be used:

  cat profiles.txt \
  | tools/addusers.py wiki

Alternatively, the users can be converted to profiles and immediately added
without creating a profiles file:

  cat users.txt \
  | tools/get_profiles.py http://wiki.list.org/ \
  | tools/addusers.py wiki

Or, using a single command without inspecting the users or profiles at all:

  tools/users.sh COM DEV DOC SEC \
  | tools/get_profiles.py http://wiki.list.org/ \
  | tools/addusers.py wiki

The addusers.py program needs to be told the directory containing the wiki
configuration.

Output Structure
----------------

The structure of a converted workspace is a directory hierarchy containing the
following directories:

* pages (a collection of directories defining each page or content item,
  corresponding to Page, Comment and BlogPost elements in the XML
  exported from Confluence)

* versions (a collection of files, each defining a revision or version of
  some content, corresponding to BodyContent elements in the XML
  exported from Confluence)

Each page directory contains the following items:

* pagetype (either "Page", "Comment" or "BlogPost")

* manifest (a list of version entries in a format similar to the MoinMoin
  page package manifest format)

* attachments (a list of attachment version entries in a format similar to
  the MoinMoin page package manifest format)

* pagetitle (an optional page title imposed on the page by another content
  item)

* children (a list of child page names defined for the page)

* comments (a list of creation date plus comment page identifier pairs)

In the output structure, content items such as comments are represented as
pages, each referencing a content version. Since comments will ultimately be
represented as subpages of some parent page, they will have a pagetitle file
in their directory with an appropriate subpage name written according to the
parent page's name and comment details.
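
As a way of inspecting a converted workspace, the following sketch counts the
content items of each type. It assumes that pagetype is a small text file
inside each page directory holding one of the type names listed above; the
count_page_types function is illustrative and not part of this distribution.

```python
import os
from collections import Counter


def count_page_types(workspace):
    """Count content items by type in a converted workspace directory.

    Assumption: each directory under "pages" may hold a text file called
    "pagetype" containing "Page", "Comment" or "BlogPost".
    """
    counts = Counter()
    pages_dir = os.path.join(workspace, "pages")
    for name in sorted(os.listdir(pages_dir)):
        type_file = os.path.join(pages_dir, name, "pagetype")
        if os.path.isfile(type_file):
            with open(type_file, encoding="utf-8") as source:
                counts[source.read().strip()] += 1
    return counts
```

For example, count_page_types("COM") would summarise the COM directory
produced by the conversion step.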

Troubleshooting
---------------

The page package import activity in particular can be a source of problems.
Generally, any error occurring when attempting to import a package is likely
to be due to insufficient privileges when writing to the pages directory of a
wiki or to its edit-log file.

The moinsetup software can generate scripts that set the ownership of wiki
files or apply ACLs (access control lists) to those files in order to make
access to wiki data more convenient. Where the ownership of the files must be
set (to www-data or nobody), the import step can be run as that user given
sufficient privileges. However, the easiest solution is to apply ACLs, thus
allowing the user who created the wiki to retain write access to it.
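
A quick check for the privilege problems described above might look like the
following sketch. It assumes that the pages directory and the edit-log file
both reside directly inside the wiki's data directory; the unwritable_paths
function is illustrative, and the check applies to whichever user runs it.

```python
import os


def unwritable_paths(data_dir):
    """Return the import-related paths that the current user cannot write to.

    Assumption: "pages" and "edit-log" are found directly in data_dir.
    Missing paths are reported as well, since they cannot be written either.
    """
    candidates = [
        os.path.join(data_dir, "pages"),
        os.path.join(data_dir, "edit-log"),
    ]
    return [path for path in candidates if not os.access(path, os.W_OK)]
```

An empty list suggests that the import can proceed as the current user; any
path returned is a candidate for an ownership or ACL adjustment.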



Contact, Copyright and Licence Information
==========================================

The current Web page for ConfluenceConverter at the time of release is:

http://hgweb.boddie.org.uk/ConfluenceConverter

Copyright and licence information can be found in the docs directory - see
docs/COPYING.txt and docs/LICENCE.txt for more information.



Resources
=========

"Confluence Data Model"
https://confluence.atlassian.com/doc/confluence-data-model-127369837.html

"Confluence Storage Format"
https://confluence.atlassian.com/doc/confluence-storage-format-790796544.html

"Confluence Wiki Markup"
https://confluence.atlassian.com/doc/confluence-wiki-markup-251003035.html

"Macros"
https://confluence.atlassian.com/doc/macros-139387.html