# HG changeset patch
# User Paul Boddie
# Date 1495549887 -7200
# Node ID 68e9e6ca22840b7306567551301823e57f40e1c6
# Parent  33f7424021efbed3fc0fe6069dbd6cd12bb750bd
Added some explanatory comments.

diff -r 33f7424021ef -r 68e9e6ca2284 imiptools/text.py
--- a/imiptools/text.py	Tue May 16 00:41:37 2017 +0200
+++ b/imiptools/text.py	Tue May 23 16:31:27 2017 +0200
@@ -3,7 +3,7 @@
 """
 Parsing of textual content.
 
-Copyright (C) 2014, 2015, 2016 Paul Boddie
+Copyright (C) 2014, 2015, 2016, 2017 Paul Boddie
 
 This program is free software; you can redistribute it and/or modify it under
 the terms of the GNU General Public License as published by the Free Software
@@ -24,14 +24,17 @@
 
 # Parsing of lines to obtain functions and arguments.
 
-line_pattern_str = r"(?:" \
-                   r"(?:'(.*?)')" \
-                   r"|" \
-                   r'(?:"(.*?)")' \
-                   r"|" \
-                   r"([^\s]+)" \
-                   r")+" \
-                   r"(?:\s+|$)"
+line_pattern_str = (
+    r"(?:"
+    r"(?:'(.*?)')"      # single-quoted text
+    r"|"
+    r'(?:"(.*?)")'      # double-quoted text
+    r"|"
+    r"([^\s]+)"         # non-whitespace characters
+    r")+"
+    r"(?:\s+|$)"        # optional trailing whitespace before line end
+    )
+
 line_pattern = re.compile(line_pattern_str)
 
 def parse_line(text):
@@ -40,6 +43,10 @@
     Parse the given 'text', returning a list of words separated by whitespace in
     the input, where whitespace may occur inside words if quoted using single or
     double quotes.
+
+    Hello world         -> ['Hello', 'world']
+    Hello ' world'      -> ['Hello', ' world']
+    Hello' 'world       -> ["'Hello'", "'world']
     """
 
     parts = []
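
Reviewer's note: the examples added to the docstring can be checked against the pattern with a short standalone sketch. The body of parse_line is not shown in this patch, so `split_words` below is a hypothetical stand-in that assumes each word is built by joining the groups captured by one match of the pattern; the real function may combine them differently.

```python
import re

# Pattern copied from the patch: one or more quoted or unquoted fragments,
# followed by whitespace or the end of the line.
line_pattern = re.compile(
    r"(?:"
    r"(?:'(.*?)')"      # single-quoted text
    r"|"
    r'(?:"(.*?)")'      # double-quoted text
    r"|"
    r"([^\s]+)"         # non-whitespace characters
    r")+"
    r"(?:\s+|$)"        # whitespace or end of line
)


def split_words(text):
    """Hypothetical stand-in for parse_line (an assumption, not the original)."""
    words = []
    for match in line_pattern.finditer(text):
        # Each group holds the last fragment it captured, or None if unused.
        words.append("".join(group or "" for group in match.groups()))
    return words


print(split_words("Hello world"))     # ['Hello', 'world']
print(split_words("Hello ' world'"))  # ['Hello', ' world']
```

Under this assumption the first two docstring examples are reproduced. A quote only opens a quoted fragment at the start of a word, so for the third input, `Hello' 'world`, the sketch returns the two fragments with their stray quotes intact (`Hello'` and `'world`), which may be worth comparing with the third docstring example.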