# HG changeset patch # User Paul Boddie # Date 1219702624 -7200 # Node ID 86911bef5658e5b83d0da634b849fb48b3bf1dee # Parent d735943d4c3de16bad31f6039f8de87da31649f5 Changed the mechanism employed by getElementById to use the libxml2 built-in support for ID-typed attributes. Added support for validation and for overriding the remote resource usage policy. Added an errors module whose DOMError class is used by exceptions raised when parsing. Updated release notes. Added an invalid SVG document to test validation. diff -r d735943d4c3d -r 86911bef5658 PKG-INFO --- a/PKG-INFO Mon Aug 25 22:30:19 2008 +0200 +++ b/PKG-INFO Tue Aug 26 00:17:04 2008 +0200 @@ -1,19 +1,19 @@ Metadata-Version: 1.1 Name: libxml2dom -Version: 0.4.6 +Version: 0.4.7 Author: Paul Boddie Author-email: paul at boddie org uk Maintainer: Paul Boddie Maintainer-email: paul at boddie org uk Home-page: http://www.boddie.org.uk/python/libxml2dom.html -Download-url: http://www.boddie.org.uk/python/downloads/libxml2dom-0.4.6.tar.gz +Download-url: http://www.boddie.org.uk/python/downloads/libxml2dom-0.4.7.tar.gz Summary: PyXML-style API for the libxml2 Python bindings License: LGPL (version 3 or later) Description: The libxml2dom package provides a traditional DOM wrapper around the Python bindings for libxml2. In contrast to the libxml2 bindings, libxml2dom provides an API reminiscent of minidom, pxdom and other Python-based and Python-related XML toolkits. -Keywords: XML libxml2 SVG XMPP SOAP XPath XInclude +Keywords: XML libxml2 SVG XMPP SOAP XPath XInclude Events validation validator Platform: Any Classifier: Development Status :: 3 - Alpha Classifier: License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL) diff -r d735943d4c3d -r 86911bef5658 README.txt --- a/README.txt Mon Aug 25 22:30:19 2008 +0200 +++ b/README.txt Tue Aug 26 00:17:04 2008 +0200 @@ -65,6 +65,16 @@ libxml2dom.macrolib implementation, too). A way is needed to get libxml2 to do the node copying itself. 
+New in libxml2dom 0.4.7 (Changes since libxml2dom 0.4.6) +-------------------------------------------------------- + + * Fixed the ownerElement of attributes created by XPath queries, and in all + other situations involving the implementation's get_node method. + * Replaced the getElementById implementation with one based on libxml2's + own support for finding attributes declared as identifiers. + * Introduced support for validation, together with the libxml2dom.errors + module. + New in libxml2dom 0.4.6 (Changes since libxml2dom 0.4.5) -------------------------------------------------------- diff -r d735943d4c3d -r 86911bef5658 libxml2dom/__init__.py --- a/libxml2dom/__init__.py Mon Aug 25 22:30:19 2008 +0200 +++ b/libxml2dom/__init__.py Tue Aug 26 00:17:04 2008 +0200 @@ -19,7 +19,7 @@ with this program. If not, see . """ -__version__ = "0.4.6" +__version__ = "0.4.7" from libxml2dom.macrolib import * from libxml2dom.macrolib import \ @@ -29,7 +29,6 @@ toString as Node_toString, toStream as Node_toStream, \ toFile as Node_toFile import urllib # for parseURI in HTML mode -import xml.dom # for getElementById # Standard namespaces. 
@@ -427,12 +426,11 @@ return tmp def getElementById(self, identifier): - nodes = self.xpath(".//*[@xml:id='" + identifier.replace("'", "'") + "']", - namespaces={"xml" : xml.dom.XML_NAMESPACE}) - if nodes: - return nodes[0] + _node = Node_getElementById(self.ownerDocument.as_native_node(), identifier) + if _node is None: + return None else: - return None + return self.impl.get_node(_node, self) def getElementsByTagName(self, tagName): return self.xpath(".//" + tagName) @@ -619,7 +617,7 @@ def createDocument(namespaceURI, localName, doctype): return default_impl.createDocument(namespaceURI, localName, doctype) -def parse(stream_or_string, html=0, htmlencoding=None, unfinished=0, impl=None): +def parse(stream_or_string, html=0, htmlencoding=None, unfinished=0, validate=0, remote=0, impl=None): """ Parse the given 'stream_or_string', where the supplied object can either be @@ -636,6 +634,13 @@ documents will be parsed, even though such documents may be missing content such as closing tags. + If the optional 'validate' parameter is set to a true value, an attempt will + be made to validate the parsed document. + + If the optional 'remote' parameter is set to a true value, references to + remote documents (such as DTDs) will be followed in order to obtain such + documents. + A document object is returned by this function. 
""" @@ -643,11 +648,13 @@ if hasattr(stream_or_string, "read"): stream = stream_or_string - return parseString(stream.read(), html=html, htmlencoding=htmlencoding, unfinished=unfinished, impl=impl) + return parseString(stream.read(), html=html, htmlencoding=htmlencoding, + unfinished=unfinished, validate=validate, remote=remote, impl=impl) else: - return parseFile(stream_or_string, html=html, htmlencoding=htmlencoding, unfinished=unfinished, impl=impl) + return parseFile(stream_or_string, html=html, htmlencoding=htmlencoding, + unfinished=unfinished, validate=validate, remote=remote, impl=impl) -def parseFile(filename, html=0, htmlencoding=None, unfinished=0, impl=None): +def parseFile(filename, html=0, htmlencoding=None, unfinished=0, validate=0, remote=0, impl=None): """ Parse the file having the given 'filename'. The optional parameters @@ -662,13 +669,21 @@ documents will be parsed, even though such documents may be missing content such as closing tags. + If the optional 'validate' parameter is set to a true value, an attempt will + be made to validate the parsed document. + + If the optional 'remote' parameter is set to a true value, references to + remote documents (such as DTDs) will be followed in order to obtain such + documents. + A document object is returned by this function. """ impl = impl or default_impl - return impl.adoptDocument(Node_parseFile(filename, html=html, htmlencoding=htmlencoding, unfinished=unfinished)) + return impl.adoptDocument(Node_parseFile(filename, html=html, htmlencoding=htmlencoding, + unfinished=unfinished, validate=validate, remote=remote)) -def parseString(s, html=0, htmlencoding=None, unfinished=0, impl=None): +def parseString(s, html=0, htmlencoding=None, unfinished=0, validate=0, remote=0, impl=None): """ Parse the content of the given string 's'. The optional parameters described @@ -683,13 +698,21 @@ documents will be parsed, even though such documents may be missing content such as closing tags. 
+ If the optional 'validate' parameter is set to a true value, an attempt will + be made to validate the parsed document. + + If the optional 'remote' parameter is set to a true value, references to + remote documents (such as DTDs) will be followed in order to obtain such + documents. + A document object is returned by this function. """ impl = impl or default_impl - return impl.adoptDocument(Node_parseString(s, html=html, htmlencoding=htmlencoding, unfinished=unfinished)) + return impl.adoptDocument(Node_parseString(s, html=html, htmlencoding=htmlencoding, + unfinished=unfinished, validate=validate, remote=remote)) -def parseURI(uri, html=0, htmlencoding=None, unfinished=0, impl=None): +def parseURI(uri, html=0, htmlencoding=None, unfinished=0, validate=0, remote=0, impl=None): """ Parse the content found at the given 'uri'. The optional parameters @@ -704,6 +727,13 @@ documents will be parsed, even though such documents may be missing content such as closing tags. + If the optional 'validate' parameter is set to a true value, an attempt will + be made to validate the parsed document. + + If the optional 'remote' parameter is set to a true value, references to + remote documents (such as DTDs) will be followed in order to obtain such + documents. + XML documents are retrieved using libxml2's own network capabilities; HTML documents are retrieved using the urllib module provided by Python. 
To retrieve either kind of document using Python's own modules for this purpose @@ -721,12 +751,14 @@ if html: f = urllib.urlopen(uri) try: - return parse(f, html=html, htmlencoding=htmlencoding, unfinished=unfinished, impl=impl) + return parse(f, html=html, htmlencoding=htmlencoding, unfinished=unfinished, + validate=validate, remote=remote, impl=impl) finally: f.close() else: impl = impl or default_impl - return impl.adoptDocument(Node_parseURI(uri, html=html, htmlencoding=htmlencoding, unfinished=unfinished)) + return impl.adoptDocument(Node_parseURI(uri, html=html, htmlencoding=htmlencoding, + unfinished=unfinished, validate=validate, remote=remote)) def toString(node, encoding=None, prettyprint=0): diff -r d735943d4c3d -r 86911bef5658 libxml2dom/errors.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/libxml2dom/errors.py Tue Aug 26 00:17:04 2008 +0200 @@ -0,0 +1,45 @@ +#!/usr/bin/env python + +""" +Errors for DOM Level 3. +See: http://www.w3.org/TR/DOM-Level-3-Core/core.html#ERROR-Interfaces-DOMError + +Copyright (C) 2008 Paul Boddie + +This program is free software; you can redistribute it and/or modify it under +the terms of the GNU Lesser General Public License as published by the Free +Software Foundation; either version 3 of the License, or (at your option) any +later version. + +This program is distributed in the hope that it will be useful, but WITHOUT +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS +FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more +details. + +You should have received a copy of the GNU Lesser General Public License along +with this program. If not, see . +""" + +class DOMError: + + "DOM Level 3 Core exception." 
+ + SEVERITY_WARNING = 1 + SEVERITY_ERROR = 2 + SEVERITY_FATAL_ERROR = 3 + + def __init__(self, severity=None, message=None, type=None, relatedException=None, relatedData=None, location=None): + self.severity = severity + self.message = message + self.type = type + self.relatedException = relatedException + self.relatedData = relatedData + self.location = location + + def __repr__(self): + return "DOMError(%d, %r, %r)" % (self.severity, self.message, self.type) + + def __str__(self): + return repr(self) + +# vim: tabstop=4 expandtab shiftwidth=4 diff -r d735943d4c3d -r 86911bef5658 libxml2dom/macrolib/__init__.py --- a/libxml2dom/macrolib/__init__.py Mon Aug 25 22:30:19 2008 +0200 +++ b/libxml2dom/macrolib/__init__.py Tue Aug 26 00:17:04 2008 +0200 @@ -19,7 +19,7 @@ with this program. If not, see . """ -__version__ = "0.4.6" +__version__ = "0.4.7" # Expose all functions here. diff -r d735943d4c3d -r 86911bef5658 libxml2dom/macrolib/macrolib.py --- a/libxml2dom/macrolib/macrolib.py Mon Aug 25 22:30:19 2008 +0200 +++ b/libxml2dom/macrolib/macrolib.py Tue Aug 26 00:17:04 2008 +0200 @@ -20,6 +20,7 @@ """ import xml.dom +from libxml2dom.errors import DOMError # Try the conventional import first. @@ -488,6 +489,9 @@ "Node type '%s' (%d) not supported." 
% (_reverseNodeTypes[other.nodeType], other.nodeType) ) +def Node_getElementById(doc, identifier): + return libxml2mod.xmlGetID(doc, identifier) + def Node_xpath(node, expr, variables=None, namespaces=None): expr = from_unicode(expr) @@ -523,16 +527,17 @@ SERIALIZE_ERR = 82 def __repr__(self): - return str(self) + exctype, excdata = self.args[0:2] + return "LSException(%d, %r)" % (exctype, excdata) def __str__(self): - exctype = self.args[0] + exctype, excdata = self.args[0:2] if exctype == self.PARSE_ERR: - return "Parse error: LSException(%d)" % exctype + return "Parse error: %r" % self elif exctype == self.SERIALIZE_ERR: - return "Serialize error: LSException(%d)" % exctype + return "Serialize error: %r" % self else: - return Exception.__repr__(self) + return repr(self) class XIncludeException(Exception): @@ -560,57 +565,52 @@ else: return parseFile(stream_or_string, html=html, htmlencoding=htmlencoding, unfinished=unfinished) -def parseFile(s, html=0, htmlencoding=None, unfinished=0): - # NOTE: Switching off validation and remote DTD resolution. +def parseFile(s, html=0, htmlencoding=None, unfinished=0, validate=0, remote=0): if not html: context = libxml2mod.xmlCreateFileParserCtxt(s) - if context is None: - raise LSException(LSException.PARSE_ERR) - Parser_configure(context) - Parser_parse(context) - doc = Parser_document(context) - if unfinished or Parser_well_formed(context): - return doc - else: - raise LSException(LSException.PARSE_ERR) + return _parseXML(context, unfinished, validate, remote) else: - return libxml2mod.htmlReadFile(s, htmlencoding, HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING | HTML_PARSE_NONET) + return libxml2mod.htmlReadFile(s, htmlencoding, + HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING | html_net_flag(remote)) -def parseString(s, html=0, htmlencoding=None, unfinished=0): - # NOTE: Switching off validation and remote DTD resolution. 
+def parseString(s, html=0, htmlencoding=None, unfinished=0, validate=0, remote=0): if not html: context = libxml2mod.xmlCreateMemoryParserCtxt(s, len(s)) - if context is None: - raise LSException(LSException.PARSE_ERR) - Parser_configure(context) - Parser_parse(context) - doc = Parser_document(context) - if unfinished or Parser_well_formed(context): - return doc - else: - raise LSException(LSException.PARSE_ERR) + return _parseXML(context, unfinished, validate, remote) else: # NOTE: URL given as None. html_url = None return libxml2mod.htmlReadMemory(s, len(s), html_url, htmlencoding, - HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING | HTML_PARSE_NONET) + HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING | html_net_flag(remote)) -def parseURI(uri, html=0, htmlencoding=None, unfinished=0): - # NOTE: Switching off validation and remote DTD resolution. +def parseURI(uri, html=0, htmlencoding=None, unfinished=0, validate=0, remote=0): if not html: context = libxml2mod.xmlCreateURLParserCtxt(uri, 0) - if context is None: - raise LSException(LSException.PARSE_ERR) - Parser_configure(context) - Parser_parse(context) - doc = Parser_document(context) - if unfinished or Parser_well_formed(context): - return doc - else: - raise LSException(LSException.PARSE_ERR) + return _parseXML(context, unfinished, validate, remote) else: raise NotImplementedError, "parseURI does not yet support HTML" +def _parseXML(context, unfinished, validate, remote): + if context is None: + raise LSException(LSException.PARSE_ERR, DOMError(DOMError.SEVERITY_FATAL_ERROR)) + + Parser_configure(context, validate, remote) + Parser_parse(context) + doc = Parser_document(context) + + if validate and not Parser_valid(context): + + # NOTE: May not be the correct exception. 
+ + raise LSException(LSException.PARSE_ERR, + DOMError(DOMError.SEVERITY_FATAL_ERROR, "Document did not validate")) + + elif unfinished or Parser_well_formed(context): + return doc + else: + raise LSException(LSException.PARSE_ERR, + DOMError(DOMError.SEVERITY_FATAL_ERROR, "Document not well-formed")) + def toString(node, encoding=None, prettyprint=0): return libxml2mod.serializeNode(node, encoding, prettyprint) @@ -625,17 +625,37 @@ HTML_PARSE_NOERROR = 32 HTML_PARSE_NOWARNING = 64 HTML_PARSE_NONET = 2048 +XML_PARSE_DTDVALID = 16 XML_PARSE_NOERROR = 32 XML_PARSE_NOWARNING = 64 XML_PARSE_NONET = 2048 +def html_net_flag(remote): + if remote: + return 0 + else: + return HTML_PARSE_NONET + +def xml_net_flag(remote): + if remote: + return 0 + else: + return XML_PARSE_NONET + +def xml_validate_flag(validate): + if validate: + return XML_PARSE_DTDVALID + else: + return 0 + def Parser_push(): return libxml2mod.xmlCreatePushParser(None, "", 0, None) -def Parser_configure(context): +def Parser_configure(context, validate, remote): libxml2mod.xmlParserSetPedantic(context, 0) - libxml2mod.xmlParserSetValidate(context, 0) - libxml2mod.xmlCtxtUseOptions(context, XML_PARSE_NOERROR | XML_PARSE_NOWARNING | XML_PARSE_NONET) + #libxml2mod.xmlParserSetValidate(context, validate) + libxml2mod.xmlCtxtUseOptions(context, + XML_PARSE_NOERROR | XML_PARSE_NOWARNING | xml_net_flag(remote) | xml_validate_flag(validate)) def Parser_feed(context, s): libxml2mod.xmlParseChunk(context, s, len(s), 1) @@ -643,6 +663,9 @@ def Parser_well_formed(context): return libxml2mod.xmlParserGetWellFormed(context) +def Parser_valid(context): + return libxml2mod.xmlParserGetIsValid(context) + def Parser_document(context): return libxml2mod.xmlParserGetDoc(context) diff -r d735943d4c3d -r 86911bef5658 packages/debian-etch/python-libxml2dom/debian/changelog --- a/packages/debian-etch/python-libxml2dom/debian/changelog Mon Aug 25 22:30:19 2008 +0200 +++ b/packages/debian-etch/python-libxml2dom/debian/changelog 
Tue Aug 26 00:17:04 2008 +0200 @@ -1,3 +1,16 @@ +libxml2dom (0.4.7-0ubuntu1) stable; urgency=low + + * Fixed the ownerElement of attributes created by XPath + queries, and in all other situations involving the + implementation's get_node method. + * Replaced the getElementById implementation with one + based on libxml2's own support for finding attributes + declared as identifiers. + * Introduced support for validation, together with the + libxml2dom.errors module. + + -- Paul Boddie Tue, 26 Aug 2008 00:12:15 +0200 + libxml2dom (0.4.6-0ubuntu1) stable; urgency=low * Exposed the libxml2 support for processing XInclude declarations. diff -r d735943d4c3d -r 86911bef5658 packages/debian-sarge/python2.3-libxml2dom/debian/changelog --- a/packages/debian-sarge/python2.3-libxml2dom/debian/changelog Mon Aug 25 22:30:19 2008 +0200 +++ b/packages/debian-sarge/python2.3-libxml2dom/debian/changelog Tue Aug 26 00:17:04 2008 +0200 @@ -1,3 +1,16 @@ +libxml2dom (0.4.7-0ubuntu1) stable; urgency=low + + * Fixed the ownerElement of attributes created by XPath + queries, and in all other situations involving the + implementation's get_node method. + * Replaced the getElementById implementation with one + based on libxml2's own support for finding attributes + declared as identifiers. + * Introduced support for validation, together with the + libxml2dom.errors module. + + -- Paul Boddie Tue, 26 Aug 2008 00:12:33 +0200 + libxml2dom (0.4.6-0ubuntu1) stable; urgency=low * Exposed the libxml2 support for processing XInclude declarations. 
diff -r d735943d4c3d -r 86911bef5658 packages/ubuntu-feisty/python-libxml2dom/debian/changelog --- a/packages/ubuntu-feisty/python-libxml2dom/debian/changelog Mon Aug 25 22:30:19 2008 +0200 +++ b/packages/ubuntu-feisty/python-libxml2dom/debian/changelog Tue Aug 26 00:17:04 2008 +0200 @@ -1,3 +1,16 @@ +libxml2dom (0.4.7-0ubuntu1) feisty; urgency=low + + * Fixed the ownerElement of attributes created by XPath + queries, and in all other situations involving the + implementation's get_node method. + * Replaced the getElementById implementation with one + based on libxml2's own support for finding attributes + declared as identifiers. + * Introduced support for validation, together with the + libxml2dom.errors module. + + -- Paul Boddie Tue, 26 Aug 2008 00:11:32 +0200 + libxml2dom (0.4.6-0ubuntu1) feisty; urgency=low * Exposed the libxml2 support for processing XInclude declarations. diff -r d735943d4c3d -r 86911bef5658 packages/ubuntu-hoary/python2.4-libxml2dom/debian/changelog --- a/packages/ubuntu-hoary/python2.4-libxml2dom/debian/changelog Mon Aug 25 22:30:19 2008 +0200 +++ b/packages/ubuntu-hoary/python2.4-libxml2dom/debian/changelog Tue Aug 26 00:17:04 2008 +0200 @@ -1,3 +1,16 @@ +libxml2dom (0.4.7-0ubuntu1) hoary; urgency=low + + * Fixed the ownerElement of attributes created by XPath + queries, and in all other situations involving the + implementation's get_node method. + * Replaced the getElementById implementation with one + based on libxml2's own support for finding attributes + declared as identifiers. + * Introduced support for validation, together with the + libxml2dom.errors module. + + -- Paul Boddie Tue, 26 Aug 2008 00:10:37 +0200 + libxml2dom (0.4.6-0ubuntu1) hoary; urgency=low * Exposed the libxml2 support for processing XInclude declarations. diff -r d735943d4c3d -r 86911bef5658 tests/test_svg_invalid.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tests/test_svg_invalid.xml Tue Aug 26 00:17:04 2008 +0200 @@ -0,0 +1,12 @@ + + + + + Oh yeah! 
+ + + + +
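Note on the new 'validate' and 'remote' parameters: the sketch below restates how the macrolib.py hunk combines them into a single libxml2 option mask. The constants and the two flag functions are copied from the patch. xml_parser_options is a hypothetical helper added only for illustration; in the patch the same expression is written inline in Parser_configure. The block runs without libxml2 installed and is not the library's own code.

```python
# Option values as defined in the macrolib.py hunk of this changeset.
XML_PARSE_DTDVALID = 16
XML_PARSE_NOERROR = 32
XML_PARSE_NOWARNING = 64
XML_PARSE_NONET = 2048

def xml_net_flag(remote):
    # Network access stays blocked unless the caller opts in with a true 'remote'.
    if remote:
        return 0
    else:
        return XML_PARSE_NONET

def xml_validate_flag(validate):
    # DTD validation is requested only when 'validate' is true.
    if validate:
        return XML_PARSE_DTDVALID
    else:
        return 0

def xml_parser_options(validate=0, remote=0):
    # Hypothetical helper (not in the patch): the mask that Parser_configure
    # passes to xmlCtxtUseOptions, factored out so it can be inspected.
    return (XML_PARSE_NOERROR | XML_PARSE_NOWARNING
            | xml_net_flag(remote) | xml_validate_flag(validate))

if __name__ == "__main__":
    for validate in (0, 1):
        for remote in (0, 1):
            print("validate=%d remote=%d options=%d"
                  % (validate, remote, xml_parser_options(validate, remote)))
```

With both parameters left at their defaults the mask keeps the previous behaviour (errors and warnings silenced, network access disabled). Going by the _parseXML hunk, a call such as libxml2dom.parse(filename, validate=1) on a document that fails validation should raise LSException with a DOMError carrying the message "Document did not validate". Whether remote=1 is also needed depends on where the document's DTD lives, which this patch does not show.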