Introduction
------------

ConfluenceConverter is a distribution of software that converts exported data
from Confluence wiki instances, provided in the form of an XML file, to a
collection of wiki pages and resources that can be imported into a MoinMoin
instance as a page package.

Prerequisites
-------------

ConfluenceConverter requires a library called xmlread that can be found at the
following location:

http://hgweb.boddie.org.uk/xmlread

The xmlread.py file from the xmlread distribution can be copied into the
ConfluenceConverter directory.

ConfluenceConverter also requires access to the MoinMoin.wikiutil module found
in the MoinMoin distribution.

The moinsetup program is highly recommended for the installation of page
packages and the management of MoinMoin wiki instances:

http://moinmo.in/ScriptMarket/moinsetup

If moinsetup is not being used, the page package installer documentation
should be consulted:

http://moinmo.in/HelpOnPackageInstaller

To read Confluence user profiles on live Confluence sites using the
get_profiles.py program, the libxml2dom library is required:

http://hgweb.boddie.org.uk/libxml2dom

MoinMoin Prerequisites
----------------------

The page package installer does not preserve user information when installing
page revisions. This behaviour can be changed by applying a patch to MoinMoin.
Run the following command from the top level of the MoinMoin source
distribution:

patch -p1 < $CCDIR/patches/patch-moin-1.9-MoinMoin-packages.diff

Here, CCDIR is the path to the top level of this source distribution, where
this README.txt file is found.

Wiki Content Prerequisites
--------------------------

For the output of the converter, the following MoinMoin extensions are
required:

http://moinmo.in/ParserMarket/ImprovedTableParser
http://hgweb.boddie.org.uk/MoinSupport
http://moinmo.in/MacroMarket/Color2

In addition, extensions are provided in this distribution to support various
Confluence features, notably comments on pages. These extensions are installed
as follows:

python moinsetup.py -m install_actions $CCDIR/actions
python moinsetup.py -m install_macros $CCDIR/macros
python moinsetup.py -m install_theme_resources $CCDIR
python moinsetup.py -m edit_theme_stylesheet screen.css includecomments.css
python moinsetup.py -m edit_theme_stylesheet print.css includecomments.css

Additional Software
-------------------

PDF export support requires the ExportPDF action:

http://moinmo.in/ActionMarket/ExportPDF

This in turn requires Apache FOP for PDF production using XSL-FO:

http://xmlgraphics.apache.org/fop/

(On Debian systems, the fop package provides this tool.)

To produce XSL-FO from DocBook output, xsltproc is required from the libxslt
distribution:

http://xmlsoft.org/XSLT/

(On Debian systems, the xsltproc package provides this tool.)

DocBook output also requires the DocBook resources to be installed, as
described in the following guide:

http://www.sagehill.net/docbookxsl/ToolsSetup.html

(On Debian systems, the docbook-xsl package provides these resources.)

Quick Start
-----------

Given an XML export archive file for a Confluence wiki instance (in the
example below, the file is called COM-123456-789012.zip), the following
command can be used to prepare a page package for MoinMoin:

python convert.py COM-123456-789012.zip COM

In addition to the filename, a workspace name is required. Confluence appears
to require a workspace as a container for collections of pages, but this also
permits us to selectively import parts of a wiki into MoinMoin. If attachments
were included in the export from Confluence, these will be imported into the
page package.

The result of the above command will be a directory having the same name as
the chosen workspace, together with a zip archive for that directory's
contents. Thus, the above command would produce a directory called COM and an
archive called COM.zip.

To import the result, use moinsetup as follows:

python moinsetup.py -m install_page_package COM.zip

This requires a suitable moinsetup.cfg file in the working directory.

Mappings from Identifiers to Pages
----------------------------------

Confluence uses numbers to label content revisions, and links to Confluence
sites sometimes use these numbers instead of a readable page name. MoinMoin,
meanwhile, only uses page names and has no external numeric identifier scheme.
Consequently, it is necessary to produce a mapping from Confluence identifiers
to MoinMoin page names. In addition to numeric identifiers, Confluence also
provides "tiny URLs", which are an alphanumeric encoding of the numeric
identifiers.

To generate mappings for the Confluence content, use the mappings script as
follows:

tools/mappings.sh COM

Here, COM is the name of a directory containing converted Confluence content,
corresponding to a space name in the original Confluence wiki. More than one
space name can be given in order to generate a complete mapping for a site.

The following files are generated:

 * mapping-id-to-page.txt
 * mapping-tiny-to-id.txt
 * mapping-tiny-to-page.txt

The most useful of these is the first, as it includes all the necessary
information provided by the arbitrary mapping from identifiers to page names.
The second mapping merely converts the "tiny URLs" to identifiers, which can
be done by applying an algorithm without any external knowledge of the wiki
structure. The third mapping is provided as a convenience, combining the "tiny
URL" conversion and the arbitrary mapping to page names.
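
As an illustration of the second mapping, the following Python sketch converts
between "tiny URL" codes and numeric identifiers. It assumes the commonly
described Confluence scheme, which this document does not spell out: the
identifier is packed as a little-endian integer, encoded as base64, stripped
of "=" padding and trailing "A" characters, and has "/" and "+" replaced by
"-" and "_". The function names are hypothetical and are not part of this
distribution.

```python
import base64
import struct

def tiny_to_id(tiny):
    """Decode a tiny URL code into a numeric identifier (assumed scheme)."""
    # Undo the URL-safe substitutions assumed to be applied by Confluence.
    text = tiny.replace("-", "/").replace("_", "+")
    # Restore the stripped zero digits ("A" encodes six zero bits), then add
    # the "=" needed to form a valid 12-character base64 string.
    text = text.ljust(11, "A") + "="
    # The decoded data is eight bytes, least significant byte first.
    return struct.unpack("<Q", base64.b64decode(text))[0]

def id_to_tiny(identifier):
    """Encode a numeric identifier as a tiny URL code (assumed scheme)."""
    text = base64.b64encode(struct.pack("<Q", identifier)).decode("ascii")
    # Drop the padding and the trailing zero digits.
    text = text.rstrip("=").rstrip("A")
    return text.replace("/", "-").replace("+", "_")
```

If the assumption holds, the output of tiny_to_id could be compared with the
entries in mapping-tiny-to-id.txt as a sanity check.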

Translating Requests Using the Mappings
---------------------------------------

Where Web server facilities such as RewriteMap are available for use, the
first and third mapping files can be used directly. See the Apache
documentation for details of RewriteMap:

http://httpd.apache.org/docs/2.4/rewrite/rewritemap.html

Otherwise, it is more likely that the first file is used by a program that can
perform a redirect to the appropriate wiki page, and the "tiny URL" decoding
is also done by this program when deployed in a suitable location to receive
such requests. To support this, the following resources are provided:

 * scripts/redirect.py
 * config/mailmanwiki-redirect

The latter configuration file should be combined with the Web server
configuration file such that the appropriate aliases are able to capture
requests and invoke the redirect.py script before the main wiki aliases are
consulted. The script itself should be placed in a suitable filesystem
location, and the mapping-id-to-page.txt file should be placed alongside it,
or it should be placed in a different location and the MAPPING_ID_TO_PAGE
variable changed in the script to refer to this different location.
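
To make the role of the mapping file concrete, here is a minimal Python sketch
of an identifier lookup. It is not the code in scripts/redirect.py. It assumes
that each line of mapping-id-to-page.txt holds an identifier and a page name
separated by whitespace (the plain-text layout that RewriteMap accepts) and
that page names are already in a form usable in URLs. The function names and
the example base URL are hypothetical.

```python
def load_mapping(lines):
    """Build a dictionary from lines of the form "identifier pagename"."""
    mapping = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            mapping[parts[0]] = parts[1]
    return mapping

def redirect_target(mapping, identifier, base_url):
    """Return the URL to redirect to, or None for an unknown identifier."""
    page = mapping.get(identifier)
    if page is None:
        return None
    return base_url.rstrip("/") + "/" + page
```

A caller might load the file once with load_mapping(open(filename)) and then
answer each request with redirect_target(mapping, identifier,
"http://example.org/wiki"), returning a "not found" response when the result
is None.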

Supporting Confluence Action URLs
---------------------------------

Besides the "viewpage" action mapping identifiers to pages (covered by the
mapping described above), some other action URLs may be used in wiki content
and must either be translated or supported using redirects. Since external
sites may also employ such actions, a redirect strategy perhaps makes more
sense. To support this, the following resources are involved:

 * scripts/dashboard.py
 * scripts/redirect.py
 * scripts/search.py
 * config/mailmanwiki-redirect

The last of these, the configuration file, is also involved in
identifier-to-page mapping, but in this case it causes requests to the
"dashboard", "doexportpage" and "dosearchsite" actions to be directed to the
dashboard.py, redirect.py and search.py scripts respectively.

The dashboard.py script merely redirects requests to the root of the site,
thus assuming that the front page is configured to show dashboard-like
information.

The redirect.py script, apart from supporting identifier-to-page redirects,
also supports PDF page exports since the "doexportpage" action uses
identifiers to indicate which page is to be exported.

The search.py script redirects search requests in a suitable form to the
MoinMoin "fullsearch" action.
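
The kind of translation performed for search requests can be sketched in
Python 3 as follows. This is not the code in scripts/search.py. It assumes
that the Confluence request carries its search terms in a "queryString"
parameter and that the MoinMoin "fullsearch" action reads them from a "value"
parameter. The function name and the example wiki URL are hypothetical.

```python
from urllib.parse import parse_qs, urlencode

def search_redirect(query_string, wiki_root):
    """Translate a Confluence search query string into a MoinMoin search URL."""
    params = parse_qs(query_string)
    # Use the first "queryString" value, or an empty search if it is absent.
    terms = params.get("queryString", [""])[0]
    return wiki_root + "?" + urlencode({"action": "fullsearch", "value": terms})
```

For example, a request ending in "?queryString=page+package" would be sent to
the wiki root with "?action=fullsearch&value=page+package" appended.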

Identifying and Migrating Users
-------------------------------

Confluence export archives do not contain user profile information, but page
versions are marked with user identifiers. Therefore, a list of user
identifiers can be obtained by running a script extracting these identifiers.
The following command writes to standard output the users involved with
editing the wiki in four different spaces (exported to four directories):

tools/users.sh COM DEV DOC SEC

This output can be edited and then passed to a program which fetches other
profile details as follows:

tools/users.sh COM DEV DOC SEC > users.txt # for editing
cat users.txt | tools/get_profiles.py http://wiki.list.org/

If no users are to be removed in migration, the following command could be
issued:

tools/users.sh COM DEV DOC SEC | tools/get_profiles.py http://wiki.list.org/

The get_profiles.py program needs to be told the URL of the original
Confluence site. Note that it accesses the site at a default rate of around
one request per second; a different delay between requests can be specified
using an additional argument.
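
The pacing described above can be illustrated with a small Python 3 generator
that enforces a minimum delay between successive items. This is a sketch of
the general technique, not the code used by get_profiles.py. The name
throttled and its parameters are hypothetical.

```python
import time

def throttled(items, delay=1.0, sleep=time.sleep, clock=time.monotonic):
    """Yield items, leaving at least delay seconds between successive items."""
    last = None
    for item in items:
        if last is not None:
            # Wait only for the part of the delay that has not yet elapsed.
            remaining = delay - (clock() - last)
            if remaining > 0:
                sleep(remaining)
        last = clock()
        yield item
```

A program fetching profiles could iterate over throttled(identifiers) and
issue one request per item, so that a slow response does not add to the delay
before the next request.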

The output of the get_profiles.py program can be passed to another program
which adds users to MoinMoin, and so the following commands can be used:

cat users.txt \
| tools/get_profiles.py http://wiki.list.org/ \
| tools/addusers.py wiki

Or, as a single command:

tools/users.sh COM DEV DOC SEC \
| tools/get_profiles.py http://wiki.list.org/ \
| tools/addusers.py wiki

The addusers.py program needs to be told the directory containing the wiki
configuration.

Output Structure
----------------

The structure of a converted workspace is a directory hierarchy containing the
following directories:

 * pages (a collection of directories defining each page or content item,
   corresponding to Page, Comment and BlogPost elements in the XML
   exported from Confluence)

 * versions (a collection of files, each defining a revision or version of
   some content, corresponding to BodyContent elements in the XML
   exported from Confluence)

Each page directory contains the following items:

 * pagetype (either "Page", "Comment" or "BlogPost")

 * manifest (a list of version entries in a format similar to the MoinMoin
   page package manifest format)

 * attachments (a list of attachment version entries in a format similar to
   the MoinMoin page package manifest format)

 * pagetitle (an optional page title imposed on the page by another content
   item)

 * children (a list of child page names defined for the page)

 * comments (a list of creation date plus comment page identifier pairs)

In the output structure, content items such as comments are represented as
pages, and each references a content version. Since comments will ultimately
be represented as subpages of some parent page, they will have a pagetitle
file in their directory with an appropriate subpage name written according to
the parent page's name and comment details.
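
The layout described above can be explored with a short Python sketch that
counts the content items of each type in a converted workspace. It assumes
that pagetype is a plain text file holding one of the names listed above. The
function name is hypothetical and is not part of this distribution.

```python
import os
from collections import Counter

def count_page_types(workspace):
    """Count the items in workspace/pages according to their pagetype."""
    counts = Counter()
    pages_dir = os.path.join(workspace, "pages")
    for name in sorted(os.listdir(pages_dir)):
        pagetype_file = os.path.join(pages_dir, name, "pagetype")
        # Ignore anything that does not look like a page directory.
        if os.path.isfile(pagetype_file):
            with open(pagetype_file) as f:
                counts[f.read().strip()] += 1
    return counts
```

Calling count_page_types("COM") after a conversion gives a quick impression of
how many pages, comments and blog posts were found in the export.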

Troubleshooting
---------------

Importing page packages, in particular, can be a source of problems.
Generally, any error occurring when attempting to import a package is likely
to be due to insufficient privileges when writing to the pages directory of a
wiki or to its edit-log file.

The moinsetup software can generate scripts that set the ownership of wiki
files or apply ACLs (access control lists) to those files in order to make
access to wiki data more convenient. Where the ownership of the files must be
set (to www-data or nobody), the import step can be run as that user given
sufficient privileges. However, the easiest solution is to apply ACLs, thus
allowing the user who created the wiki to retain write access to it.
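
Before attempting an import, the privileges mentioned above can be checked
with a short Python sketch. It assumes that the pages directory and the
edit-log file sit side by side in the wiki's data directory. The function name
is hypothetical, and the check must be run as the user who will perform the
import.

```python
import os

def unwritable_paths(data_dir):
    """Return those import targets that the current user cannot write to."""
    candidates = [
        os.path.join(data_dir, "pages"),
        os.path.join(data_dir, "edit-log"),
    ]
    # os.access also reports False for paths that do not exist.
    return [path for path in candidates if not os.access(path, os.W_OK)]
```

An empty result suggests that write access is not the cause of an import
failure; any paths returned are candidates for an ownership change or an ACL.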

Contact, Copyright and Licence Information
------------------------------------------

The current Web page for ConfluenceConverter at the time of release is:

http://hgweb.boddie.org.uk/ConfluenceConverter

Copyright and licence information can be found in the docs directory - see
docs/COPYING.txt and docs/LICENCE.txt for more information.