# HG changeset patch
# User Paul Boddie
# Date 1317477682 -7200
# Node ID 48a194ecc68c42b32cb80f7b43a598c67582cfc0
# Parent c120a1af7f6c785940cf75d0dfbc9cf879bdd11b
Added short-circuiting of failed searches.
Added term/key conversion support so that numerically sorted keys can be used.

diff -r c120a1af7f6c -r 48a194ecc68c simplex/__init__.py
--- a/simplex/__init__.py	Sat Oct 01 15:07:17 2011 +0200
+++ b/simplex/__init__.py	Sat Oct 01 16:01:22 2011 +0200
@@ -101,9 +101,15 @@
     """
 
     for record in reader.get_records():
-        if term == accessor.get_key(record):
+        key = accessor.get_key(record)
+        if term == key:
             return record
 
+        # Short-circuit failed searches.
+
+        elif term < key:
+            return None
+
     return None
 
 def groups(l, length):
diff -r c120a1af7f6c -r 48a194ecc68c simplex/readers.py
--- a/simplex/readers.py	Sat Oct 01 15:07:17 2011 +0200
+++ b/simplex/readers.py	Sat Oct 01 16:01:22 2011 +0200
@@ -35,20 +35,25 @@
 
     "An accessor using a delimiter to split a record."
 
-    def __init__(self, keys=None, delimiter=None):
+    def __init__(self, keys=None, delimiter=None, numeric=0):
 
         """
         Initialise the accessor using a sequence of 'keys' indicating the
         columns in each record that provide the values in the eventual compound
         key provided by each record, along with a 'delimiter' indicating how
-        such columns are identified.
+        such columns are identified. If 'numeric' is set to a true value, keys
+        will be interpreted as numbers.
         """
 
         self.keys = keys or [0]
         self.delimiter = delimiter
+        self.convert = numeric and self.convert_numeric or (lambda x: x)
+
+    def convert_numeric(self, term):
+        return map(int, term)
 
     def get_key(self, record):
         values = record.split(self.delimiter)
-        return [values[key] for key in self.keys]
+        return self.convert([values[key] for key in self.keys])
 
 # vim: tabstop=4 expandtab shiftwidth=4
diff -r c120a1af7f6c -r 48a194ecc68c test_indexed.py
--- a/test_indexed.py	Sat Oct 01 15:07:17 2011 +0200
+++ b/test_indexed.py	Sat Oct 01 16:01:22 2011 +0200
@@ -5,8 +5,8 @@
 
 try:
     separator = sys.argv.index("--")
-    filename, interval = sys.argv[1:3]
-    keys = map(int, sys.argv[3:separator])
+    filename, numeric, interval = sys.argv[1:4]
+    keys = map(int, sys.argv[4:separator])
     terms = groups(sys.argv[separator+1:], len(keys))
 except (IndexError, ValueError):
     print >>sys.stderr, "Usage: %s ... -- ..." % sys.argv[0]
@@ -14,7 +14,7 @@
 
 f = open(filename)
 reader = TextFile(f)
-accessor = DelimitedRecord(keys)
+accessor = DelimitedRecord(keys, numeric=(numeric == "true"))
 try:
     t = time.time()
     l = make_index(reader, accessor, int(interval))
@@ -24,7 +24,7 @@
 
     for term in terms:
         t = time.time()
-        line = find_with_index(reader, accessor, l, term)
+        line = find_with_index(reader, accessor, l, accessor.convert(term))
         if line:
             print "Found (at %s seconds)...\n%s" % (time.time() - t, line)
 
diff -r c120a1af7f6c -r 48a194ecc68c test_scan.py
--- a/test_scan.py	Sat Oct 01 15:07:17 2011 +0200
+++ b/test_scan.py	Sat Oct 01 16:01:22 2011 +0200
@@ -5,8 +5,8 @@
 
 try:
     separator = sys.argv.index("--")
-    filename = sys.argv[1]
-    keys = map(int, sys.argv[2:separator])
+    filename, numeric = sys.argv[1:3]
+    keys = map(int, sys.argv[3:separator])
     terms = groups(sys.argv[separator+1:], len(keys))
 except (IndexError, ValueError):
     print >>sys.stderr, "Usage: %s ... -- ..." % sys.argv[0]
@@ -14,13 +14,13 @@
 
 f = open(filename)
 reader = TextFile(f)
-accessor = DelimitedRecord(keys)
+accessor = DelimitedRecord(keys, numeric=(numeric == "true"))
 try:
     for term in terms:
         reader.seek(0)
         t = time.time()
-        line = find_in_file(reader, accessor, term)
+        line = find_in_file(reader, accessor, accessor.convert(term))
         if line:
             print "Found (at %s seconds)...\n%s" % (time.time() - t, line)
 finally:
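
Illustrative note, not part of the changeset: the two behaviours named in the commit message (stopping a scan once the key has passed the search term, and converting keys to integers so that numerically sorted files compare correctly) can be sketched in isolation roughly as follows. The names `make_key` and `find_sorted` and the sample records are hypothetical and do not come from the simplex package. The sketch uses a list comprehension instead of `map(int, ...)` so that it also behaves under Python 3, where `map` returns an iterator and not a list.

```python
# Hypothetical stand-alone sketch; these helpers are not part of simplex.

def make_key(record, columns, delimiter=None, numeric=False):
    """Build a compound key from the chosen columns of a delimited record."""
    values = record.split(delimiter)
    key = [values[c] for c in columns]
    # Convert to integers only when numeric ordering is requested.
    return [int(v) for v in key] if numeric else key

def find_sorted(records, term, columns, delimiter=None, numeric=False):
    """Scan records sorted by key, giving up once the key exceeds 'term'."""
    for record in records:
        key = make_key(record, columns, delimiter, numeric)
        if term == key:
            return record
        elif term < key:
            # Input is sorted, so no later record can match.
            return None
    return None

# Invented sample data, sorted numerically on the first column.
records = ["2 pear", "10 apple", "33 plum"]

print(find_sorted(records, [10], [0], numeric=True))  # match found
print(find_sorted(records, [5], [0], numeric=True))   # stops at key [10]
```

With string keys the same file is not in sorted order, since `"10" < "2"` when compared as text. A search for `["10"]` without `numeric=True` would therefore short-circuit at the first record and return `None`, which is presumably why the conversion support accompanies the short-circuiting in this changeset.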