# HG changeset patch
# User Paul Boddie
# Date 1692308050 -7200
# Node ID e6e38816ffac9a1de1df1a9f18aefe0805e5b54e
# Parent 9f6181276b350eb813f5006090e4963350538e8d
Added initial support for document translation.

diff -r 9f6181276b35 -r e6e38816ffac moinformat/__init__.py
--- a/moinformat/__init__.py Thu Aug 17 23:33:18 2023 +0200
+++ b/moinformat/__init__.py Thu Aug 17 23:34:10 2023 +0200
@@ -3,7 +3,7 @@
 """
 Moin wiki format tools.
 
-Copyright (C) 2017, 2018, 2019 Paul Boddie
+Copyright (C) 2017, 2018, 2019, 2023 Paul Boddie
 
 This program is free software; you can redistribute it and/or modify it under
 the terms of the GNU General Public License as published by the Free Software
@@ -26,6 +26,7 @@
 from moinformat.parsers import get_parser, make_parser, parse
 from moinformat.serialisers import get_serialiser, make_serialiser, serialise
 from moinformat.themes import make_theme
+from moinformat.translators import get_translator, make_translator, translate
 from moinformat.utils.copying import copy_attachments
 
 import moinformat.errors as errors
diff -r 9f6181276b35 -r e6e38816ffac moinformat/metadata.py
--- a/moinformat/metadata.py Thu Aug 17 23:33:18 2023 +0200
+++ b/moinformat/metadata.py Thu Aug 17 23:34:10 2023 +0200
@@ -25,6 +25,7 @@
 from moinformat.parsers import get_parser, parsers
 from moinformat.serialisers import get_serialiser, serialisers
 from moinformat.themes import get_theme
+from moinformat.translators import get_translator, translators
 
 class Metadata:
 
@@ -44,11 +45,11 @@
 
     effects = {
         "input_context" : ["input"],
-        "input_format" : ["parser", "serialiser"],
+        "input_format" : ["parser", "serialiser", "translator"],
         "input_separator" : ["input"],
         "link_format" : ["linker"],
         "output_context" : ["output"],
-        "output_format" : ["serialiser"],
+        "output_format" : ["serialiser", "translator"],
         "theme_name" : ["theme"],
         }
 
@@ -221,4 +222,18 @@
 
         return self.make_object("theme", cls)
 
+    def get_translator(self, name=None):
+
+        """
+        Make a translator using any given 'name' or otherwise using the
+        "output_format" setting which will be replaced by any given 'name'.
+        """
+
+        cls = get_translator(self.get_update("output_format", name),
+                             self.get("input_format"))
+
+        translator = self.make_object("translator", cls)
+        translator.translators = translators
+        return translator
+
 # vim: tabstop=4 expandtab shiftwidth=4
diff -r 9f6181276b35 -r e6e38816ffac moinformat/translators/__init__.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/moinformat/translators/__init__.py Thu Aug 17 23:34:10 2023 +0200
@@ -0,0 +1,52 @@
+#!/usr/bin/env python
+
+"""
+Document format translators.
+
+Copyright (C) 2017, 2018, 2023 Paul Boddie
+
+This program is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free Software
+Foundation; either version 3 of the License, or (at your option) any later
+version.
+
+This program is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
+details.
+
+You should have received a copy of the GNU General Public License along with
+this program. If not, see .
+"""
+
+from moinformat.translators.manifest import translators
+
+# Top-level functions.
+
+def get_translator(name, doctype=None):
+
+    """
+    Return a translator class producing nodes whose document type has the given
+    'name'. If 'doctype' is indicated, obtain a translator class specific to
+    that document type. Otherwise, a general translator for Moin content is
+    obtained.
+    """
+
+    return translators["%s.%s" % (name, doctype or "moin")]
+
+def make_translator(metadata, doctype=None):
+
+    """
+    Return a translator instance using the given 'metadata' and optional output
+    'doctype'.
+    """
+
+    return metadata.get_translator(doctype)
+
+def translate(doc, translator):
+
+    "Translate 'doc' using the given 'translator' instance."
+
+    return doc.visit(translator)
+
+# vim: tabstop=4 expandtab shiftwidth=4
diff -r 9f6181276b35 -r e6e38816ffac moinformat/translators/common.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/moinformat/translators/common.py Thu Aug 17 23:34:10 2023 +0200
@@ -0,0 +1,75 @@
+#!/usr/bin/env python
+
+"""
+Document translator support.
+
+Copyright (C) 2017, 2018, 2019, 2021, 2023 Paul Boddie
+
+This program is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free Software
+Foundation; either version 3 of the License, or (at your option) any later
+version.
+
+This program is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
+details.
+
+You should have received a copy of the GNU General Public License along with
+this program. If not, see .
+"""
+
+class Translator:
+
+    "General translator support."
+
+    input_formats = None # defined by subclasses
+    formats = None # defined by subclasses
+
+    def __init__(self, metadata):
+
+        """
+        Initialise the translator with the given 'metadata'.
+        """
+
+        self.metadata = metadata
+
+        # Initialisation of any other state.
+
+        self.init()
+
+    def init(self):
+
+        "Initialisation method to be overridden by subclasses."
+
+        pass
+
+    def __repr__(self):
+        return "%s(%r)" % (self.__class__.__name__, self.metadata)
+
+    # Translation visitor methods.
+
+    def visit(self, node):
+
+        """
+        Visit the 'node', invoking the appropriate serialisation handler, and
+        returning the result of the handler.
+        """
+
+        return node.visit(self)
+
+    def container(self, container):
+
+        "Visit all nodes in 'container', returning a list of translated nodes."
+
+        nodes = []
+
+        if container.nodes:
+            for node in container.nodes:
+                n = self.visit(node)
+                if n:
+                    nodes.append(n)
+
+        return nodes
+
+# vim: tabstop=4 expandtab shiftwidth=4
diff -r 9f6181276b35 -r e6e38816ffac moinformat/translators/manifest.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/moinformat/translators/manifest.py Thu Aug 17 23:34:10 2023 +0200
@@ -0,0 +1,48 @@
+#!/usr/bin/env python
+
+"""
+Document format translator manifest.
+
+Copyright (C) 2017, 2018, 2021, 2023 Paul Boddie
+
+This program is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free Software
+Foundation; either version 3 of the License, or (at your option) any later
+version.
+
+This program is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
+details.
+
+You should have received a copy of the GNU General Public License along with
+this program. If not, see .
+"""
+
+from moinformat.imports import get_extensions, get_mapping, get_modules
+
+# Define an attribute mapping names to modules.
+
+modules = get_modules(__file__, __name__)
+
+# Obtain all translators.
+
+# Use module paths to register the handlers:
+# output_format.input_format -> translator
+
+def get_formats(n, m):
+
+    """
+    Given module name 'n', inspect the translator in module 'm', returning a
+    list of format names.
+    """
+
+    l = []
+    for output_format in m.translator.formats:
+        for input_format in m.translator.input_formats:
+            l.append("%s.%s" % (output_format, input_format))
+    return l
+
+translators = get_mapping(modules, get_formats, lambda m: m.translator)
+
+# vim: tabstop=4 expandtab shiftwidth=4
diff -r 9f6181276b35 -r e6e38816ffac moinformat/translators/moin/__init__.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/moinformat/translators/moin/__init__.py Thu Aug 17 23:34:10 2023 +0200
@@ -0,0 +1,22 @@
+#!/usr/bin/env python
+
+"""
+A package of translators producing Moin documents.
+
+Copyright (C) 2023 Paul Boddie
+
+This program is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free Software
+Foundation; either version 3 of the License, or (at your option) any later
+version.
+
+This program is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
+details.
+
+You should have received a copy of the GNU General Public License along with
+this program. If not, see .
+"""
+
+# vim: tabstop=4 expandtab shiftwidth=4
diff -r 9f6181276b35 -r e6e38816ffac moinformat/translators/moin/html.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/moinformat/translators/moin/html.py Thu Aug 17 23:34:10 2023 +0200
@@ -0,0 +1,99 @@
+#!/usr/bin/env python
+
+"""
+HTML-to-Moin translator.
+
+Copyright (C) 2023 Paul Boddie
+
+This program is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free Software
+Foundation; either version 3 of the License, or (at your option) any later
+version.
+
+This program is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
+details.
+
+You should have received a copy of the GNU General Public License along with
+this program. If not, see .
+"""
+
+from moinformat.translators.common import Translator
+from moinformat.tree.moin import Block, Heading, Region, Text
+
+def int_or_default(s, default):
+    if not s:
+        return default
+    try:
+        return int(s)
+    except ValueError:
+        return default
+
+class HTMLToMoinTranslator(Translator):
+
+    "Translation of HTML document nodes to Moin document nodes."
+
+    input_formats = ["html"]
+    formats = ["moin"]
+
+    def _get_attribute(self, element, name):
+        for attribute in element.attributes:
+            if attribute.name == name:
+                return attribute.value and attribute.value.value
+        return None
+
+    def _get_class_values(self, element):
+        class_value = self._get_attribute(element, "class")
+        if not class_value:
+            return {}
+
+        d = {}
+        for token in class_value.split():
+            if token and token.startswith("region-"):
+                _region, name, value = token.split("-", 2)
+                d[name] = value
+        return d
+
+    def element(self, element):
+        if not element.name:
+            return None
+        elif element.name[0] == "h" and element.name[1:].isdigit():
+            return Heading(self.container(element), int(element.name[1:]),
+                           start_pad=" ", end_pad=" ", end_extra="\n",
+                           identifier=self._get_attribute(element, "id"))
+        elif element.name == "p":
+            return Block(self.container(element))
+        elif element.name == "span":
+            d = self._get_class_values(element)
+            if d.has_key("type"):
+                return Region(self.container(element),
+                              int_or_default(d.get("level"), 0),
+                              int_or_default(d.get("indent"), 0),
+                              d.get("type"),
+                              extra="\n")
+            else:
+                return Block(self.container(element))
+        else:
+            return None
+
+    def fragment(self, fragment):
+        return self.container(fragment)
+
+    def text(self, text):
+        return Text(text.value)
+
+    # Some nodes are not directly translated.
+
+    def node(self, node):
+        return None
+
+    attribute = node
+    attribute_value = node
+    comment = node
+    directive = node
+    inclusion = node
+
+translator = HTMLToMoinTranslator
+
+# vim: tabstop=4 expandtab shiftwidth=4
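
The class-token handling in `_get_class_values` can be tried outside the translator as a plain function. The following is a minimal sketch rather than part of the changeset: the function name `region_class_values` is hypothetical, and it takes the class attribute string directly instead of an element node. It assumes, as the patch does, that region details are encoded as `region-NAME-VALUE` tokens.

```python
def region_class_values(class_value):
    """Collect NAME/VALUE pairs from 'region-NAME-VALUE' class tokens.

    Hypothetical standalone helper mirroring the logic of
    HTMLToMoinTranslator._get_class_values in the patch.
    """
    values = {}
    # A missing or empty class attribute yields no tokens.
    for token in (class_value or "").split():
        if token.startswith("region-"):
            # Split at most twice so that VALUE may itself contain hyphens.
            _region, name, value = token.split("-", 2)
            values[name] = value
    return values


if __name__ == "__main__":
    print(region_class_values("region-type-python region-level-3 note"))
```

As in the patch, a token with only one hyphen (for example `region-type`) does not unpack into three parts and raises `ValueError` in this sketch; whether such tokens can occur depends on how the HTML was produced.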