This module contains classes for writing and reading postings.
The PostIterator interface is the base interface for the two “cursor” interfaces (PostingReader and QueryScorer). It defines the basic methods for moving through the posting list (e.g. reset(), next(), skip_to()).
The PostingReader interface allows reading raw posting information. Individual backends must provide a PostingReader implementation that will be returned by the backend’s whoosh.reading.IndexReader.postings() method. PostingReader subclasses in this module provide synthetic readers or readers that wrap other readers and modify their behavior.
The QueryScorer interface allows retrieving and scoring search results. QueryScorer objects will be returned by the scorer() method on whoosh.query.Query objects. QueryScorer subclasses in this module provide synthetic scorers or scorers that wrap other scorers and modify their behavior.
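The cursor methods named above (reset(), next(), skip_to()) can be illustrated with a small in-memory sketch. ListCursor is a hypothetical class written for this example, not part of Whoosh, and its end-of-list behavior (id becomes None) is an assumption:

```python
class ListCursor:
    """Hypothetical cursor over a sorted list of IDs; not a Whoosh class."""

    def __init__(self, ids):
        self._ids = sorted(ids)
        self.reset()

    def _update(self):
        # Assumption of this sketch: id is None once the cursor has
        # moved past the last posting.
        self.id = self._ids[self._i] if self._i < len(self._ids) else None

    def reset(self):
        # Move back to the first posting.
        self._i = 0
        self._update()

    def next(self):
        # Advance by exactly one posting.
        self._i += 1
        self._update()

    def skip_to(self, target):
        # Advance to the first ID >= target; never move backwards.
        while self.id is not None and self.id < target:
            self.next()


cursor = ListCursor([1, 5, 10, 80])
cursor.skip_to(7)
after_skip = cursor.id
cursor.next()
after_next = cursor.id
cursor.reset()
after_reset = cursor.id
```

The point of skip_to() in a design like this is that a caller can jump ahead to a candidate ID without inspecting every posting in between.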
Base class for posting readers.
“Postings” are used for two purposes in Whoosh.
For each term in the index, the postings are the list of documents the term appears in and any associated value for each document. For example, if the field format is Frequency, the postings for the field might look like:
[(0, 1), (10, 3), (12, 5)]
...where 0, 10, and 12 are document numbers, and 1, 3, and 5 are the frequencies of the term in those documents.
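As a rough illustration of where such a list comes from, the sketch below builds frequency postings for one term from tokenized documents. The function name and the shape of the docs mapping are made up for this example and do not reflect how Whoosh stores or builds its index:

```python
def frequency_postings(docs, term):
    """Return sorted (docnum, frequency) pairs for term.

    docs maps document numbers to token lists (a hypothetical input shape).
    """
    postings = []
    for docnum in sorted(docs):
        freq = docs[docnum].count(term)
        if freq:
            # Only documents that contain the term get a posting.
            postings.append((docnum, freq))
    return postings


docs = {
    0: ["render"],
    5: ["shade"],
    10: ["render", "render", "render"],
    12: ["render"] * 5,
}
postings = frequency_postings(docs, "render")
```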
To get a PostingReader object for a term, use the postings() method on an IndexReader or Searcher object.
>>> # Get a PostingReader for the term "render" in the "content" field.
>>> r = myindex.reader()
>>> preader = r.postings("content", u"render")
For fields with term vectors, the vector postings are the list of terms that appear in the field and any associated value for each term. For example, if the term vector format is Frequency, the postings for the term vector might look like:
[(u"apple", 1), (u"bear", 5), (u"cab", 2)]
...where “apple”, “bear”, and “cab” are the terms in the document field, and 1, 5, 2 are the frequencies of those terms in the document field.
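A frequency vector of this shape can be sketched in plain Python by counting the tokens of one document field. The helper name frequency_vector and the sample tokens are hypothetical and only illustrate the data layout:

```python
from collections import Counter


def frequency_vector(tokens):
    """Return sorted (term, frequency) pairs for one document field."""
    return sorted(Counter(tokens).items())


tokens = ["bear", "cab", "apple", "bear", "bear", "cab", "bear", "bear"]
vector = frequency_vector(tokens)
```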
To get a PostingReader object for a vector, use the vector() method on an IndexReader or Searcher object.
>>> # Get a PostingReader for the vector of the "content" field
>>> # of document 100
>>> r = myindex.reader()
>>> vreader = r.vector(100, "content")
PostingReader defines a fairly simple interface.
In addition, PostingReader supports a few convenience methods:
all_ids(), all_items(), and all_as() are similar, but return iterators of all IDs/items in the reader, regardless of the current position of the reader.
Different implementations may leave the reader in different positions during and after use of the iteration methods; that is, the effect of the iterators on the reader’s position is undefined and may be different in different PostingReader subclasses and different backend implementations.
Returns the value for the current id as the given type.
Parameter: astype – a string, such as “weight” or “positions”. The Format object associated with this reader must have a corresponding “as_*” method, e.g. as_weight(), for decoding the value.
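The name-based lookup described above might work roughly as follows. This is a sketch under assumptions: FrequencyLikeFormat and value_as are hypothetical stand-ins, and the encoded value is taken to be a plain integer rather than whatever encoding a real Format uses:

```python
class FrequencyLikeFormat:
    """Hypothetical format whose encoded value is a plain int frequency."""

    def as_frequency(self, encoded):
        return encoded

    def as_weight(self, encoded):
        return float(encoded)


def value_as(fmt, astype, encoded):
    # Look up the decoder by name, e.g. "weight" -> fmt.as_weight.
    decoder = getattr(fmt, "as_" + astype, None)
    if decoder is None:
        raise ValueError("format cannot decode %r" % astype)
    return decoder(encoded)


weight = value_as(FrequencyLikeFormat(), "weight", 3)
```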
This posting reader concatenates the results from serial sub-readers. This is useful for backends that use a segmented index.
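Concatenating serial sub-readers can be pictured with the generator below. It assumes each segment numbers its documents from zero and carries a document offset that is added to its IDs; that offset scheme, and the name concat_postings, are assumptions of this sketch rather than something stated here:

```python
def concat_postings(segments):
    """Yield (docnum, value) pairs from each segment in turn.

    segments is a list of (doc_offset, postings) pairs in offset order
    (a hypothetical input shape).
    """
    for offset, postings in segments:
        for docnum, value in postings:
            # Shift segment-local document numbers into the global range.
            yield (docnum + offset, value)


segments = [
    (0, [(0, 1), (3, 2)]),
    (100, [(2, 4), (7, 1)]),
]
merged = list(concat_postings(segments))
```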
PostingReader that removes certain IDs from a sub-reader.
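The filtering behavior can be sketched over plain (id, value) pairs. exclude_ids is a hypothetical helper, not the class documented here:

```python
def exclude_ids(postings, excluded):
    """Yield (id, value) pairs whose id is not in excluded."""
    excluded = frozenset(excluded)
    return ((id_, value) for id_, value in postings if id_ not in excluded)


kept = list(exclude_ids([(0, 1), (10, 3), (12, 5)], {10}))
```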
Reads postings from a list in memory instead of from storage.
>>> preader = ixreader.postings("content", u"render")
>>> creader = CachedPostingReader(preader.all_items())
Parameter: items – a sequence of (id, encodedvalue) pairs. If this is not a list or tuple, it is converted using tuple().
QueryScorer extends the PostIterator interface with two methods:
Takes two QueryScorers and pulls items from the first, skipping items that also appear in the second.
THIS SCORER IS NOT ACTUALLY USED, since it turns out to be slightly faster to simply create an “excluded_docs” filter from the “not” query and pass that into the “positive” query.
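The "pull from the first, skip what appears in the second" logic can be sketched over two ascending ID sequences. The function and_not is hypothetical and works on plain iterables rather than QueryScorer objects:

```python
def and_not(positive_ids, negative_ids):
    """Yield IDs from positive_ids that are absent from negative_ids.

    Both inputs must be sorted in ascending order.
    """
    negatives = iter(negative_ids)
    current = next(negatives, None)
    for id_ in positive_ids:
        # Advance the negative stream until it reaches or passes id_.
        while current is not None and current < id_:
            current = next(negatives, None)
        if current != id_:
            yield id_


result = list(and_not([1, 5, 10, 80], [5, 6, 80, 90]))
```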
This is a fake query scorer for testing purposes. You create the object with the posting IDs as arguments, and it then returns them as you call next() or skip_to().
>>> fpr = FakeScorer(1, 5, 10, 80)
>>> fpr.id
1
>>> fpr.next()
>>> fpr.id
5