# HG changeset patch
# User Paul Boddie
# Date 1317420064 -7200
# Node ID 6397ceae2ddaa3aa1ea4c2d28a03338d580ff979
# Parent  baee3f51e8f51bee6308dec3182d74f6a0e33b8a
Added support for duplicate keys so that an index will always refer to the
first record of any group of records sharing such a key, thus ensuring that use
of the index does not cause records to be missed (because some of them occur
before the referenced record).

diff -r baee3f51e8f5 -r 6397ceae2dda simplex.py
--- a/simplex.py	Fri Sep 30 23:46:27 2011 +0200
+++ b/simplex.py	Sat Oct 01 00:01:04 2011 +0200
@@ -65,9 +65,24 @@
     l = []
     pos = 0
 
+    current_key = None
+    start_pos = 0
+
     for i, record in enumerate(f.get_records()):
+        key = f.get_key(record)
+
+        # Where duplicate keys are permitted, the first record employing the key
+        # must be available as an index entry. Otherwise, records preceding the
+        # one referenced by the entry may have the same key and be missed when
+        # seeking using the index.
+
+        if key != current_key:
+            current_key = key
+            start_pos = pos
+
         if i % interval == 0:
-            l.append((f.get_key(record), pos))
+            l.append((current_key, start_pos))
+
         pos += len(record)
 
     return l
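
Illustration (not part of the changeset): a minimal, standalone sketch of the
indexing behaviour described in the commit message. It assumes records are
text lines sorted by key, with the key as the first whitespace-separated
field. The names build_index and find are hypothetical and do not appear in
simplex.py, and the lookup strategy in find (start at the last index entry
whose key is not greater than the wanted key, then scan forward) is an
assumption about how such an index might be consulted.

```python
import bisect


def build_index(records, get_key, interval):
    """Return (key, pos) entries, sampling every `interval` records.

    Each entry points at the first record of the run sharing its key,
    mirroring the current_key/start_pos logic in the patch.
    """
    index = []
    pos = 0
    current_key = None
    start_pos = 0
    for i, record in enumerate(records):
        key = get_key(record)
        # A new key starts a new run; remember where that run begins.
        if key != current_key:
            current_key = key
            start_pos = pos
        if i % interval == 0:
            index.append((current_key, start_pos))
        pos += len(record)
    return index


def find(data, index, wanted):
    """Hypothetical lookup: seek using the index, then scan forward."""
    keys = [k for k, _ in index]
    n = bisect.bisect_right(keys, wanted) - 1
    if n < 0:
        return []  # wanted sorts before the first record
    found = []
    for line in data[index[n][1]:].splitlines(True):
        key = line.split()[0]
        if key > wanted:
            break
        if key == wanted:
            found.append(line)
    return found


def first_field(record):
    return record.split()[0]


records = ["a 1\n", "b 1\n", "b 2\n", "b 3\n", "c 1\n"]
data = "".join(records)

index = build_index(records, first_field, 2)

# What the pre-patch code would have produced for the same input: the
# entry for "b" points at the sampled record (offset 8), not at the
# start of the run (offset 4).
old_index = [("a", 0), ("b", 8), ("c", 16)]

print(index)
print(find(data, index, "b"))
print(find(data, old_index, "b"))
```

With the old-style entry the scan starts at offset 8 and skips the record
"b 1" at offset 4; with the entry produced by the patched logic all three
"b" records are found.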