2017-02-04 | Paul Boddie | Removed recoding to UTF-8, since this failed for ISO-8859-15: byte sequences were recoded as UTF-8, and ISO-8859-1 only avoided producing such undesirable data because it was special-cased. This change may break other ASCII-incompatible encodings, because UTF-8 is likely to be the safe form of such data, permitting the parser to understand it; without such recoding, the parser will no longer recognise the grammar's tokens.
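The two effects described in the commit message can be illustrated with standard Python 3 codecs. This is only a sketch of the byte-level behaviour, not the parser's code; the helper name `recode_to_utf8` is hypothetical. It assumes that 0xA4 is the euro sign in ISO-8859-15 and that UTF-16 is a typical ASCII-incompatible encoding.

```python
# Illustrative sketch (Python 3); recode_to_utf8 is a hypothetical helper,
# not part of pyparser.

def recode_to_utf8(source: bytes, encoding: str) -> bytes:
    """Decode source bytes using the declared encoding and re-encode as UTF-8."""
    return source.decode(encoding).encode("utf-8")

# 1. One ISO-8859-15 byte (0xA4, the euro sign) becomes three UTF-8 bytes,
#    so a byte-string literal no longer holds the original byte value.
recoded = recode_to_utf8(b"\xa4", "iso-8859-15")
print(recoded == b"\xe2\x82\xac")  # True

# 2. In UTF-16, keywords are not stored as their ASCII bytes, so a
#    byte-oriented tokenizer cannot match them without recoding first.
utf16_source = "def f(): pass".encode("utf-16-le")
print(b"def" in utf16_source)                               # False
print(b"def" in recode_to_utf8(utf16_source, "utf-16-le"))  # True
```

The first case is why the recoding was removed. The second is why its removal may break ASCII-incompatible encodings.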
from pyparser.automata import DFA, DEFAULT

def test_states():
    # Explicit transitions only: max_char is 2, one row of two entries per
    # state, and every unlisted (state, char) entry is "\xff".
    d = DFA([{"\x00": 1}, {"\x01": 0}], [False, True])
    assert d.states == "\x01\xff\xff\x00"
    assert d.defaults == "\xff\xff"
    assert d.max_char == 2

    # A DEFAULT transition in state 1 fills both its row and its defaults entry.
    d = DFA([{"\x00": 1}, {DEFAULT: 0}], [False, True])
    assert d.states == "\x01\x00"
    assert d.defaults == "\xff\x00"
    assert d.max_char == 1
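The assertions above pin down how `DFA` flattens its transition dicts: one row of `max_char` entries per state, with `"\xff"` where no transition is listed, and a `DEFAULT` target filling both the state's row and its entry in `defaults`. The following is a minimal, hypothetical Python 3 re-implementation of that layout, useful for seeing why each expected string has the value it does. The names `build_tables`, `next_state` and `NO_STATE`, and the `DEFAULT` sentinel, are assumptions for illustration and not pyparser's own code. The lookup rule in `next_state` is likewise an assumption about how such a table would be consumed. The `accepts` list is ignored because it does not affect the tables.

```python
# Hypothetical sketch of the table layout implied by test_states();
# not the pyparser implementation.
DEFAULT = object()  # stand-in sentinel for pyparser.automata.DEFAULT
NO_STATE = 255      # byte value meaning "no transition"

def build_tables(states):
    """Flatten per-state transition dicts into (states, defaults, max_char)."""
    max_char = 1 + max(
        (ord(key) for state in states for key in state if key is not DEFAULT),
        default=-1,
    )
    rows, defaults = [], []
    for state in states:
        default = chr(state.get(DEFAULT, NO_STATE))
        defaults.append(default)
        row = [default] * max_char  # unlisted characters use the default
        for key, target in state.items():
            if key is not DEFAULT:
                row[ord(key)] = chr(target)
        rows.append("".join(row))
    return "".join(rows), "".join(defaults), max_char

def next_state(tables, state, char):
    """Return the next state number, or -1 if there is no transition."""
    flat, defaults, max_char = tables
    code = ord(char)
    entry = flat[state * max_char + code] if code < max_char else defaults[state]
    return -1 if ord(entry) == NO_STATE else ord(entry)

tables = build_tables([{"\x00": 1}, {"\x01": 0}])
print(tables == ("\x01\xff\xff\x00", "\xff\xff", 2))  # True
print(next_state(tables, 0, "\x00"))                   # 1
```

Storing rows as one flat string keeps each lookup to a single index computation, `state * max_char + ord(char)`. Characters beyond `max_char` fall back to the per-state default.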