<?xml version="1.0" encoding="iso-8859-1"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>libxml2dom</title>
<meta name="generator" content="amaya 8.1a, see http://www.w3.org/Amaya/" />
<link href="styles.css" rel="stylesheet" type="text/css" />
</head>

<body>
<h1>libxml2dom</h1>

<p>The libxml2dom package provides a traditional DOM wrapper around the
Python bindings for <a href="http://www.xmlsoft.org">libxml2</a>. In contrast
to the libxml2 bindings, libxml2dom provides an API reminiscent of minidom,
pxdom and other Python-based and Python-related XML toolkits. Performance is
disappointing, given the typical high speed of libxml2 processing, but this
is to be expected since large numbers of Python objects are instantiated at
two levels of document tree representation. However, serialisation of
documents is much faster than in many other toolkits because it can make
direct use of libxml2.</p>

<h2>Experiments</h2>

<p>The main libxml2dom package is relatively slow, even when compared to
Python-only XML toolkits, but previous experiments into source code analysis
suggested that, with a slightly altered coding style, programs could be
transformed into a form which uses the underlying libxml2mod API
directly; this API employs opaque handles which are exposed to Python but
which can only be inspected through the functions in the API. One
significant advantage of accessing the libxml2mod API directly is that the
libxml2 wrapper objects do not need to be instantiated, let alone the
additional libxml2dom wrapper objects, and the consequences are obvious:
reduced memory consumption and improved performance.</p>

<p>The libxml2macro approach is as follows:</p>
<ul>
<li>Write code using the PyXML-inspired DOM-style API, but giving node
variables and attributes a distinct prefix.</li>
<li>Run the supplied tool <code>libxml2macro.py</code> on the source
file.</li>
<li>Invoke the compiled module directly or import it into programs as
usual.</li>
</ul>

<p>A description of the process is given in the <code>README.txt</code> file
within the source code distribution. In brief, libxml2macro takes code like
this...</p>
<pre>for my_node in my_element.childNodes:
    if my_node.nodeType == TEXT_NODE:
        print my_node.nodeValue</pre>

<p>...and transforms it into something more or less like this (in practice
the actual libxml2mod calls are provided in a library, but more aggressive
transformations could produce code exactly like this):</p>
<pre>for my_node in libxml2mod.children(my_element):
    if libxml2mod.type(my_node) == "text":
        print libxml2mod.xmlNodeGetContent(my_node)</pre>

<p>The result is that developers can still write DOM-style code without being
penalised for the object-related overhead that such an approach typically
incurs.</p>
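
<p>The kind of rewriting described above can be sketched in a few lines using
the <code>ast</code> module of a modern Python 3 (3.9 or later, for
<code>ast.unparse</code>). This is only an illustration under stated
assumptions, not how <code>libxml2macro.py</code> is implemented: the
<code>my_</code> prefix is borrowed from the example above, and the
<code>Node_*</code> helper names are hypothetical stand-ins for library
functions wrapping libxml2mod.</p>

```python
import ast

PREFIX = "my_"  # assumption: the prefix that marks node variables


class NodeAccessRewriter(ast.NodeTransformer):
    """Rewrite reads of `my_x.attr` into `Node_attr(my_x)`.

    The Node_* function names are hypothetical; they stand in for
    helper functions that would call libxml2mod on an opaque handle.
    """

    def visit_Attribute(self, node):
        self.generic_visit(node)  # rewrite nested accesses first
        target = node.value
        if (isinstance(node.ctx, ast.Load)
                and isinstance(target, ast.Name)
                and target.id.startswith(PREFIX)):
            call = ast.Call(
                func=ast.Name(id="Node_" + node.attr, ctx=ast.Load()),
                args=[target],
                keywords=[],
            )
            return ast.copy_location(call, node)
        return node  # anything else is left untouched


def transform(source):
    """Return the source text with prefixed attribute reads rewritten."""
    tree = NodeAccessRewriter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

<p>With this sketch, <code>transform("value = my_node.nodeValue")</code>
returns <code>value = Node_nodeValue(my_node)</code>, whereas attribute
accesses on names without the prefix pass through unchanged. Only reads are
rewritten here; assignments to attributes would need separate handling.</p>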

<h2>Copyright and Licence</h2>

<p>For reasons of consistency, libxml2dom uses the same MIT-style licence as
libxml2. See the file <code>COPYING.txt</code> in the <code>docs</code>
directory within the source code distribution.</p>

<h2>Installation</h2>

<p>Given the availability of libxml2, libxml2dom only needs to reside on the
PYTHONPATH and can be installed using the <code>setup.py</code> script
provided:</p>
<pre>python setup.py install</pre>

<h2>Dependencies and Installation Issues</h2>

<p>The following sections identify each dependency and describe any
associated installation issues:</p>

<h3>libxml2 2.6.16</h3>

<p>Building libxml2 from source and configuring the Python bindings can be
done as follows:</p>
<pre>cd libxml2-2.6.16
./configure --with-python=/usr/local/bin/python
make</pre>

<p>If you intend to use a Python installation located outside
<code>/usr/local</code> then adjust the path given to
<code>--with-python</code> accordingly. Install (possibly as
<code>root</code>) in the usual way:</p>
<pre>make install</pre>

<p>Previous releases of libxml2 in the 2.6 series may work, but some bugs
were observed with the previously recommended 2.6.0 and these may not have
been fixed until 2.6.16 or slightly earlier.</p>

<h4>Issues</h4>

<p>The <code>patches</code> directory in the source code distribution
contains a patch against libxml2 2.5.7 which resolves an issue exposed by
libxml2dom. Although it is recommended that later releases of libxml2 are
used instead, the source code distribution of libxml2 2.5.7 can be patched as
follows:</p>
<pre>patch -p0 &lt; libxml2dom/patches/libxml2/libxml.c.diff</pre>

<p>The command should be run from the parent of the
<code>libxml2-2.5.7</code> directory, with the path to the patch file
adjusted accordingly.</p>

<h3>Python 2.2</h3>

<p>Python releases from 2.2 onwards should be compatible with libxml2dom. The
principal requirement from such releases is the new-style class support which
permits the use of properties in the libxml2dom implementation, thus
simplifying the code somewhat.</p>
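
<p>As a rough illustration of why properties help, the following sketch shows
a new-style class exposing DOM-style attributes that are computed on access.
The class, its method names and the dictionary used in place of a real
libxml2 node are all hypothetical; this is not the libxml2dom
implementation.</p>

```python
class Node(object):
    """Hypothetical DOM-style wrapper around an underlying node.

    A plain dictionary stands in for the real libxml2 node so that
    the sketch is self-contained.
    """

    def __init__(self, raw):
        self._raw = raw

    def _get_node_name(self):
        return self._raw["name"]

    def _get_child_nodes(self):
        # A fresh wrapper object is created for every child on every access.
        return [Node(child) for child in self._raw.get("children", [])]

    # property() requires new-style classes, available from Python 2.2.
    nodeName = property(_get_node_name)
    childNodes = property(_get_child_nodes)


root = Node({"name": "doc", "children": [{"name": "p"}, {"name": "ul"}]})
names = [child.nodeName for child in root.childNodes]
```

<p>Callers read <code>root.childNodes</code> as if it were a stored
attribute, while the class decides how the value is produced. If wrappers
are created on each access, as in this sketch, the convenience comes at the
cost of many short-lived Python objects.</p>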

<h2>Changelog</h2>

<h3>New in libxml2dom 0.1.3 (Changes since libxml2dom 0.1.2)</h3>
<ul>
<li>Fixed createElement.</li>
<li>Introduced experimental libxml2macro tools, tests and libraries.</li>
</ul>

<h3>New in libxml2dom 0.1.2 (Changes since libxml2dom 0.1.1)</h3>
<ul>
<li>Fixed getAttributeNode and getAttributeNodeNS.</li>
<li>Added comment node creation.</li>
<li>Fixed empty namespace usage with elements and attributes.</li>
<li>Introduced usage of the libxml2 file and memory parsing features.</li>
<li>Introduced suppression of DTD retrieval and validation as the default
behaviour.</li>
<li>Added experimental XPath method support.</li>
</ul>

<h3>New in libxml2dom 0.1.1 (Changes since libxml2dom 0.1)</h3>
<ul>
<li>Fixed text node creation.</li>
<li>Fixed setAttributeNS.</li>
<li>Added encoding parameters to convenience methods.</li>
<li>Added the missing previousSibling property.</li>
<li>Added release number to the package.</li>
</ul>
</body>
</html>