# HG changeset patch
# User Paul Boddie
# Date 1317580983 -7200
# Node ID 94f93801356cb6e8b451e362b3b231b6e944b3fe
# Parent  fc8abb1911d0ac8074c52561056e76ec66e77706
Converted the indexer into a sequence-like class.

diff -r fc8abb1911d0 -r 94f93801356c simplex/__init__.py
--- a/simplex/__init__.py	Sun Oct 02 19:52:30 2011 +0200
+++ b/simplex/__init__.py	Sun Oct 02 20:43:03 2011 +0200
@@ -29,42 +29,9 @@
 """
 
 from simplex.readers import *
+from simplex.indexers import *
 import bisect
 
-def make_index(reader, get_key, interval):
-
-    """
-    Index a resource whose 'reader' provides records, using a 'get_key'
-    operation to yield the key for such records, creating an index entry for a
-    record after a given number of records, defined by 'interval', have been
-    read since the last entry was produced.
-    """
-
-    l = []
-    pos = 0
-
-    current_key = None
-    start_pos = 0
-
-    for i, record in enumerate(reader):
-        key = get_key(record)
-
-        # Where duplicate keys are permitted, the first record employing the key
-        # must be available as an index entry. Otherwise, records preceding the
-        # one referenced by the entry may have the same key and be missed when
-        # seeking using the index.
-
-        if key != current_key:
-            current_key = key
-            start_pos = pos
-
-        if i % interval == 0:
-            l.append((current_key, start_pos))
-
-        pos += len(record)
-
-    return l
-
 def find_with_index(reader, get_key, l, term):
 
     """
diff -r fc8abb1911d0 -r 94f93801356c simplex/indexers.py
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/simplex/indexers.py	Sun Oct 02 20:43:03 2011 +0200
@@ -0,0 +1,86 @@
+#!/usr/bin/env python
+
+"""
+Indexing classes.
+
+Copyright (C) 2011 Paul Boddie
+
+This program is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free Software
+Foundation; either version 3 of the License, or (at your option) any later
+version.
+
+This program is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
+PARTICULAR PURPOSE. See the GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License along
+with this program. If not, see .
+"""
+
+class Indexer:
+
+    "An indexer which records an entry periodically."
+
+    def __init__(self, output, get_key, interval):
+
+        """
+        Index a resource, recording entries in the given 'output' sequence,
+        using a 'get_key' operation to yield the key for each record, creating
+        an index entry for a record after a given number of records, defined by
+        'interval', have been appended since the last entry was produced.
+        """
+
+        self.output = output
+        self.interval = interval
+        self.get_key = get_key
+
+        self.count = 0
+        self.pos = 0
+
+        # Information about the current group.
+
+        self.start_pos = 0
+        self.current_key = None
+
+    def append(self, record):
+
+        """
+        Present the given 'record' to the indexer, recording it if appropriate.
+        """
+
+        key = self.get_key(record)
+
+        # Where duplicate keys are permitted, the first record employing the key
+        # must be available as an index entry. Otherwise, records preceding the
+        # one referenced by the entry may have the same key and be missed when
+        # seeking using the index.
+
+        if key != self.current_key:
+            self.current_key = key
+            self.start_pos = self.pos
+
+        if self.count % self.interval == 0:
+            self.output.append((self.current_key, self.start_pos))
+
+        self.count += 1
+        self.pos += len(record)
+
+def make_index(reader, get_key, interval):
+
+    """
+    Index a resource whose 'reader' provides records, using a 'get_key'
+    operation to yield the key for such records, creating an index entry for a
+    record after a given number of records, defined by 'interval', have been
+    read since the last entry was produced.
+    """
+
+    l = []
+    indexer = Indexer(l, get_key, interval)
+
+    for record in reader:
+        indexer.append(record)
+
+    return l
+
+# vim: tabstop=4 expandtab shiftwidth=4
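
A minimal sketch of how the new class might be exercised, assuming tab-separated text records. The Indexer body is condensed from the patch above (docstrings omitted); the sample records and the get_key helper are hypothetical and are not part of the changeset. It illustrates the duplicate-key rule from the comments: an entry emitted in the middle of a run of equal keys points back to the first record of that run.

```python
# Condensed copy of the Indexer class from the patch (docstrings omitted).
class Indexer:
    def __init__(self, output, get_key, interval):
        self.output = output
        self.interval = interval
        self.get_key = get_key
        self.count = 0
        self.pos = 0
        # Information about the current group of equal keys.
        self.start_pos = 0
        self.current_key = None

    def append(self, record):
        key = self.get_key(record)
        # A new key starts a new group at the current position.
        if key != self.current_key:
            self.current_key = key
            self.start_pos = self.pos
        # Every 'interval' records, emit an entry for the group start.
        if self.count % self.interval == 0:
            self.output.append((self.current_key, self.start_pos))
        self.count += 1
        self.pos += len(record)


def get_key(record):
    # Hypothetical key function: the text before the first tab.
    return record.split("\t")[0]


# Hypothetical sample data: sorted records with duplicate keys.
records = [
    "apple\t1\n",
    "apple\t2\n",
    "banana\t3\n",
    "cherry\t4\n",
    "cherry\t5\n",
]

index = []
indexer = Indexer(index, get_key, 2)
for record in records:
    indexer.append(record)

# The entry produced at the second "cherry" record refers to offset 25,
# where the first "cherry" record starts, not to offset 34.
print(index)
```

Because the output is any object with an append method, a caller could presumably pass something other than a list, which appears to be the motivation for making the indexer sequence-like.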