# HG changeset patch
# User Paul Boddie
# Date 1481578395 -3600
# Node ID 0538ce057931e1a2de31385c7f8ead0720fe781b
# Parent  f326b1d00b210f347c6e6aaa7f46a4495911478b
Added initial support for Unicode strings based on byte strings.

diff -r f326b1d00b21 -r 0538ce057931 lib/__builtins__/__init__.py
--- a/lib/__builtins__/__init__.py	Mon Dec 12 22:30:09 2016 +0100
+++ b/lib/__builtins__/__init__.py	Mon Dec 12 22:33:15 2016 +0100
@@ -60,7 +60,6 @@
     ValueError
     )
 
-
 # Classes.
 
 from __builtins__.boolean import bool, False, True
@@ -77,8 +76,9 @@
 from __builtins__.none import None, NoneType
 from __builtins__.notimplemented import NotImplemented, NotImplementedType
 from __builtins__.set import frozenset, set
-from __builtins__.str import basestring, str, string, unicode
+from __builtins__.str import basestring, str, string
 from __builtins__.tuple import tuple
+from __builtins__.unicode import unicode, utf8string
 
 # Functions.
 
diff -r f326b1d00b21 -r 0538ce057931 lib/__builtins__/str.py
--- a/lib/__builtins__/str.py	Mon Dec 12 22:30:09 2016 +0100
+++ b/lib/__builtins__/str.py	Mon Dec 12 22:33:15 2016 +0100
@@ -19,7 +19,7 @@
 this program. If not, see .
 """
 
-from __builtins__.int import maxint, minint
+from __builtins__.int import maxint
 from __builtins__.operator import _negate
 from __builtins__.sequence import itemaccess
 from __builtins__.types import check_int
@@ -33,21 +33,30 @@
     _p = maxint / 32
     _a = 31
 
-    def __init__(self):
+    def __init__(self, other=None):
 
-        "Initialise the string."
+        "Initialise the string, perhaps from 'other'."
 
         # Note the __data__ member. Since strings are either initialised from
         # literals or converted using routines defined for other types, no form
         # of actual initialisation is performed here.
 
-        self.__data__ = None
+        # NOTE: Cannot perform "other and other.__data__ or None" since the
+        # NOTE: __data__ attribute is not a normal attribute.
+
+        if other:
+            self.__data__ = other.__data__
+        else:
+            self.__data__ = None
 
         # Note the __key__ member. This is also initialised statically. Where
         # a string is the same as an attribute name, the __key__ member contains
         # attribute position and code details.
 
-        self.__key__ = None
+        if other:
+            self.__key__ = other.__key__
+        else:
+            self.__key__ = None
 
     def __hash__(self):
 
@@ -212,10 +221,10 @@
         return str_substr(self.__data__, start, end, step)
 
 class string(basestring):
-    pass
 
-class unicode(basestring):
-    def encode(self, encoding): pass
+    "A plain string of bytes."
+
+    pass
 
 def str(obj):
 
diff -r f326b1d00b21 -r 0538ce057931 lib/__builtins__/types.py
--- a/lib/__builtins__/types.py	Mon Dec 12 22:30:09 2016 +0100
+++ b/lib/__builtins__/types.py	Mon Dec 12 22:33:15 2016 +0100
@@ -32,7 +32,7 @@
 
     "Check the given string 's'."
 
-    if not _isinstance(s, string):
+    if not _isinstance(s, basestring):
         raise ValueError(s)
 
 # vim: tabstop=4 expandtab shiftwidth=4
diff -r f326b1d00b21 -r 0538ce057931 lib/__builtins__/unicode.py
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/lib/__builtins__/unicode.py	Mon Dec 12 22:33:15 2016 +0100
@@ -0,0 +1,60 @@
+#!/usr/bin/env python
+
+"""
+Unicode objects.
+
+Copyright (C) 2015, 2016 Paul Boddie
+
+This program is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free Software
+Foundation; either version 3 of the License, or (at your option) any later
+version.
+
+This program is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
+details.
+
+You should have received a copy of the GNU General Public License along with
+this program. If not, see .
+"""
+
+from __builtins__.str import basestring
+from posix.iconv import Converter
+
+class utf8string(basestring):
+
+    "A character string representation based on UTF-8."
+
+    def encode(self, encoding):
+
+        "Encode the string to the given 'encoding'."
+
+        from_utf8 = Converter("UTF-8", encoding)
+        try:
+            from_utf8.feed(self)
+            return str(from_utf8)
+        finally:
+            from_utf8.close()
+
+def unicode(s, encoding):
+
+    "Convert 's' to a Unicode object, interpreting 's' as using 'encoding'."
+
+    if isinstance(s, utf8string):
+        return s
+
+    # Obtain a string representation.
+
+    s = s.__str__()
+
+    # Convert the string to UTF-8.
+
+    to_utf8 = Converter(encoding, "UTF-8")
+    try:
+        to_utf8.feed(s)
+        return utf8string(str(to_utf8))
+    finally:
+        to_utf8.close()
+
+# vim: tabstop=4 expandtab shiftwidth=4