postings module

This module contains classes for writing and reading postings.

The PostIterator interface is the base interface for the two “cursor” interfaces (PostingReader and QueryScorer). It defines the basic methods for moving through the posting list (e.g. reset(), next(), skip_to()).

The PostingReader interface allows reading raw posting information. Individual backends must provide a PostingReader implementation that will be returned by the backend’s whoosh.reading.IndexReader.postings() method. PostingReader subclasses in this module provide synthetic readers or readers that wrap other readers and modify their behavior.

The QueryScorer interface allows retrieving and scoring search results. QueryScorer objects will be returned by the scorer() method on whoosh.query.Query objects. QueryScorer subclasses in this module provide synthetic scorers or scorers that wrap other scorers and modify their behavior.

Posting writer

class whoosh.postings.PostingWriter
finish()
Called when the current set of postings is finished.
write(id, value)

Write the given id and value to the posting store.

Parameters:
  • id – The identifier for this posting.
  • value – The encoded value string for this posting.
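The write()/finish() contract can be illustrated with a minimal in-memory writer in plain Python. This is a hypothetical stand-in: the class name ListPostingWriter is made up and is not part of Whoosh. The check that IDs arrive in increasing order is an assumption of the sketch, not documented behavior.

```python
# Hypothetical in-memory writer; illustrates the write()/finish() contract only.
class ListPostingWriter:
    """Collects postings in memory, one list per finish() call."""

    def __init__(self):
        self.finished = []  # completed posting lists
        self._current = []  # postings written since the last finish()

    def write(self, id, value):
        # Assumption of this sketch: IDs must arrive in increasing order.
        if self._current and id <= self._current[-1][0]:
            raise ValueError("posting IDs must be written in increasing order")
        self._current.append((id, value))

    def finish(self):
        # Close off the current set of postings and start a new one.
        self.finished.append(self._current)
        self._current = []


writer = ListPostingWriter()
for docnum, encoded in [(0, b"\x01"), (10, b"\x03"), (12, b"\x05")]:
    writer.write(docnum, encoded)
writer.finish()
print(len(writer.finished))  # 1
```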

Posting readers

class whoosh.postings.PostingReader

Base class for posting readers.

“Postings” are used for two purposes in Whoosh.

For each term in the index, the postings are the list of documents the term appears in and any associated value for each document. For example, if the field format is Frequency, the postings for the term might look like:

[(0, 1), (10, 3), (12, 5)]

...where 0, 10, and 12 are document numbers, and 1, 3, and 5 are the frequencies of the term in those documents.

To get a PostingReader object for a term, use the postings() method on an IndexReader or Searcher object.

>>> # Get a PostingReader for the term "render" in the "content" field.
>>> r = myindex.reader()
>>> preader = r.postings("content", u"render")
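The (document number, frequency) shape of term postings can be reproduced with a toy inverted index. The documents and the whitespace tokenization below are invented for illustration; this is not how Whoosh builds its index.

```python
from collections import Counter, defaultdict

# Hypothetical toy documents, keyed by document number.
docs = {
    0: "render the scene",
    10: "render render render",
    12: "render render render render render",
}


def build_postings(docs):
    """Map each term to a list of (docnum, frequency) pairs sorted by docnum."""
    postings = defaultdict(list)
    for docnum in sorted(docs):
        counts = Counter(docs[docnum].split())  # naive whitespace tokenizer
        for term, freq in counts.items():
            postings[term].append((docnum, freq))
    return dict(postings)


postings = build_postings(docs)
print(postings["render"])  # [(0, 1), (10, 3), (12, 5)]
```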

For fields with term vectors, the vector postings are the list of terms that appear in the field and any associated value for each term. For example, if the term vector format is Frequency, the postings for the term vector might look like:

[(u"apple", 1), (u"bear", 5), (u"cab", 2)]

...where “apple”, “bear”, and “cab” are the terms in the document field, and 1, 5, and 2 are the frequencies of those terms in the document field.

To get a PostingReader object for a vector, use the vector() method on an IndexReader or Searcher object.

>>> # Get a PostingReader for the vector of the "content" field
>>> # of document 100 
>>> r = myindex.reader()
>>> vreader = r.vector(100, "content")
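Similarly, the vector for one document field can be sketched as that field's (term, frequency) pairs sorted by term. The field text and the build_vector() helper are hypothetical.

```python
from collections import Counter


def build_vector(field_text):
    """Return (term, frequency) pairs sorted by term, like vector postings."""
    return sorted(Counter(field_text.split()).items())


# Hypothetical contents of the "content" field of a single document.
text = "bear apple cab bear bear cab bear bear"
print(build_vector(text))  # [('apple', 1), ('bear', 5), ('cab', 2)]
```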

PostingReader defines a fairly simple interface.

  • The current posting ID is in the reader.id attribute.
  • reader.value() returns the encoded posting payload.
  • reader.value_as(astype) returns the decoded posting payload.
  • reader.next() moves the reader to the next posting.
  • reader.skip_to(id) moves the reader to that ID in the list.
  • reader.reset() moves the reader back to the beginning.
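The cursor interface listed above can be sketched with a small list-backed class. Everything here is hypothetical: ListPostingReader and the local ReadTooFar class are stand-ins, and marking an exhausted reader with id = None, as well as skip_to() stopping at the first ID greater than or equal to the target, are assumptions of the sketch.

```python
class ReadTooFar(Exception):
    """Local stand-in for whoosh.postings.ReadTooFar."""


class ListPostingReader:
    """Hypothetical reader over an in-memory list of (id, encoded_value) pairs."""

    def __init__(self, items):
        self._items = list(items)
        self.reset()

    def _load(self):
        # Set self.id from the current position, or None when exhausted.
        if self._i < len(self._items):
            self.id = self._items[self._i][0]
        else:
            self.id = None

    def reset(self):
        self._i = 0
        self._load()

    def value(self):
        return self._items[self._i][1]

    def next(self):
        if self.id is None:
            raise ReadTooFar
        self._i += 1
        self._load()

    def skip_to(self, id):
        # Advance until the current ID is >= the target or the list ends.
        if self.id is None:
            raise ReadTooFar
        while self.id is not None and self.id < id:
            self.next()


reader = ListPostingReader([(0, "f1"), (10, "f3"), (12, "f5")])
reader.skip_to(10)
print(reader.id, reader.value())  # 10 f3
```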

In addition, PostingReader supports a few convenience methods:

  • ids() returns an iterator of the remaining IDs.
  • items() returns an iterator of the remaining (id, encoded_value) pairs.
  • items_as(astype) returns an iterator of the remaining (id, decoded_value) pairs.

all_ids(), all_items(), and all_as() are similar, but return iterators of all IDs/items in the reader, regardless of the current position of the reader.

The effect of these iteration methods on the reader’s position is undefined: different PostingReader subclasses and backend implementations may leave the reader at different positions during and after iteration.
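The difference between the "remaining" and "all" iteration methods can be shown with a hypothetical list-backed reader (MiniReader is made up for this example). In this sketch the iterators never move the cursor, which is only one of the behaviors the paragraph above allows.

```python
class MiniReader:
    """Hypothetical list-backed reader showing remaining vs. all iteration."""

    def __init__(self, items):
        self._items = list(items)
        self._i = 0  # cursor position

    def next(self):
        self._i += 1

    def items(self):
        # Remaining (id, encoded_value) pairs; this sketch leaves the cursor alone.
        return iter(self._items[self._i:])

    def ids(self):
        return (pid for pid, _ in self.items())

    def all_items(self):
        # Every pair, regardless of the current position.
        return iter(self._items)

    def all_ids(self):
        return (pid for pid, _ in self.all_items())


r = MiniReader([(0, "a"), (10, "b"), (12, "c")])
r.next()
print(list(r.ids()))      # [10, 12]
print(list(r.all_ids()))  # [0, 10, 12]
```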

all_as(astype)
Yields a series of (id, decoded_value) pairs for each posting. This may or may not change the cursor position, depending on the subclass and backend implementations.
all_items()
Yields all (id, encoded_value) pairs in the reader. Use all_as() to get decoded values. This may or may not change the cursor position, depending on the subclass and backend implementations.
items()
Yields the remaining (id, encoded_value) pairs in the reader. Use items_as() to get decoded values. This may or may not change the cursor position, depending on the subclass and backend implementations.
items_as(astype)
Yields the remaining (id, decoded_value) pairs in the reader. This may or may not change the cursor position, depending on the subclass and backend implementations.
value()
Returns the encoded value string for the current id.
value_as(astype)

Returns the value for the current id as the given type.

Parameter: astype – a string, such as “weight” or “positions”. The Format object associated with this reader must have a corresponding “as_*” method, e.g. as_weight(), for decoding the value.
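The as_* naming convention suggests a getattr-based dispatch. This is a sketch under the assumption that a hypothetical FrequencyFormat stores the frequency as a decimal string; Whoosh's real Format classes use their own encodings.

```python
class FrequencyFormat:
    """Hypothetical format: the encoded value is the frequency as ASCII digits."""

    def as_weight(self, valuestring):
        return float(valuestring)

    def as_frequency(self, valuestring):
        return int(valuestring)


def value_as(fmt, valuestring, astype):
    # Look up the decoder named "as_" + astype, e.g. as_weight().
    decoder = getattr(fmt, "as_" + astype, None)
    if decoder is None:
        raise ValueError("format cannot decode values as %r" % astype)
    return decoder(valuestring)


fmt = FrequencyFormat()
print(value_as(fmt, "3", "weight"))     # 3.0
print(value_as(fmt, "3", "frequency"))  # 3
```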
class whoosh.postings.MultiPostingReader(format, readers, idoffsets)

This posting reader concatenates the results from serial sub-readers. This is useful for backends that use a segmented index.

Parameters:
  • format – the Format object for the field being read.
  • readers – a list of PostingReader objects.
  • idoffsets – a list of integers, where each item in the list represents the ID offset of the corresponding reader in the ‘readers’ list.
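The ID-offset concatenation can be sketched with plain lists standing in for the sub-readers. The segment contents and the concatenate_postings() helper are invented, and the sketch assumes each offset is at least as large as the previous segment's document count, so the concatenated IDs stay sorted.

```python
def concatenate_postings(readers, idoffsets):
    """Yield (id, value) pairs from each sub-reader, shifted by its ID offset.

    Each element of `readers` is a plain list of (local_id, value) pairs
    standing in for one segment's PostingReader.
    """
    for items, offset in zip(readers, idoffsets):
        for local_id, value in items:
            yield local_id + offset, value


# Two hypothetical segments; the second starts at global document 100.
seg_a = [(0, "x"), (7, "y")]
seg_b = [(2, "z"), (5, "w")]
print(list(concatenate_postings([seg_a, seg_b], [0, 100])))
# [(0, 'x'), (7, 'y'), (102, 'z'), (105, 'w')]
```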
class whoosh.postings.Exclude(postreader, excludes)

PostingReader that removes certain IDs from a sub-reader.

Parameters:
  • postreader – the PostingReader object to read from.
  • excludes – a collection of ids to exclude (may be any object, such as a BitVector or set, that implements __contains__).
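The filtering behavior can be sketched as a generator over (id, value) pairs. A Python set stands in for the excludes collection, since any object that implements __contains__ works; the exclude() helper is hypothetical.

```python
def exclude(items, excludes):
    """Yield (id, value) pairs whose id is not in `excludes`.

    `excludes` can be any container that supports `in` (a set here).
    """
    for pid, value in items:
        if pid not in excludes:
            yield pid, value


postings = [(0, 1), (10, 3), (12, 5)]
print(list(exclude(postings, {10})))  # [(0, 1), (12, 5)]
```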
class whoosh.postings.CachedPostingReader(items)

Reads postings from a list in memory instead of from storage.

>>> preader = ixreader.postings("content", "render")
>>> creader = CachedPostingReader(preader.all_items())

Parameter: items – a sequence of (id, encoded_value) pairs. If this is not a list or tuple, it is converted using tuple().

QueryScorers

class whoosh.postings.QueryScorer

QueryScorer extends the PostIterator interface with two methods:

  • score() returns the score for the current item.
  • __iter__() returns an iterator of (id, score) pairs.
all_ids()
Yields all posting IDs. This may or may not change the cursor position, depending on the subclass and backend implementations.
ids()
Yields the remaining IDs in the reader. This may or may not change the cursor position, depending on the subclass and backend implementations.
next()
Moves to the next posting.
reset()
Resets the reader to the beginning of the postings.
score()
Returns the score for the current document.
skip_to(id)
Skips ahead to the given id. The default implementation simply calls next() repeatedly until it gets to the id, but subclasses will often be more clever.
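A hypothetical list-backed scorer shows the naive default skip_to() described above, together with score() and the (id, score) iteration. ListScorer and its convention of reporting id as None once exhausted are illustrative assumptions only.

```python
class ListScorer:
    """Hypothetical scorer over a sorted list of (id, score) pairs."""

    def __init__(self, pairs):
        self._pairs = list(pairs)
        self.reset()

    def reset(self):
        self._i = 0

    @property
    def id(self):
        # None once the scorer is exhausted (an assumption of this sketch).
        return self._pairs[self._i][0] if self._i < len(self._pairs) else None

    def next(self):
        self._i += 1

    def score(self):
        return self._pairs[self._i][1]

    def skip_to(self, target):
        # The naive default: call next() until reaching target or the end.
        while self.id is not None and self.id < target:
            self.next()

    def __iter__(self):
        # Yield (id, score) pairs from the current position onward.
        while self.id is not None:
            yield self.id, self.score()
            self.next()


s = ListScorer([(1, 0.5), (4, 1.25), (9, 2.0)])
s.skip_to(3)
print(s.id, s.score())  # 4 1.25
print(list(s))          # [(4, 1.25), (9, 2.0)]
```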
class whoosh.postings.AndMaybeScorer(required, optional)
Takes two sub-scorers and returns the documents that appear in the first; if a document also appears in the second, its scores from both sub-scorers are added together.
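The AndMaybe rule can be sketched with sorted (id, score) lists. A dictionary lookup replaces cursor movement on the optional side for brevity, so this shows the matching and scoring rule rather than how AndMaybeScorer walks its sub-scorers; and_maybe() is a hypothetical helper.

```python
def and_maybe(required, optional):
    """Yield (id, score) for every id in `required`, adding the optional
    score when the same id also appears in `optional`.

    Both arguments are sorted lists of (id, score) pairs.
    """
    optional_scores = dict(optional)
    for pid, score in required:
        yield pid, score + optional_scores.get(pid, 0.0)


required = [(1, 1.0), (4, 2.0), (9, 0.5)]
optional = [(4, 0.25), (7, 3.0)]
print(list(and_maybe(required, optional)))  # [(1, 1.0), (4, 2.25), (9, 0.5)]
```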
class whoosh.postings.AndNotScorer(positive, negative)

Takes two QueryScorers and pulls items from the first, skipping items that also appear in the second.

THIS SCORER IS NOT ACTUALLY USED, since it turns out to be slightly faster to simply create an “excluded_docs” filter from the “not” query and pass that into the “positive” query.

Parameters:
  • positive – a QueryScorer from which to take items.
  • negative – a QueryScorer, the IDs of which will be removed from the ‘positive’ scorer.
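A set-based sketch of the same result follows. Because it collects the negative IDs up front and filters the positive list against them, it is closer to the “excluded_docs” filter approach mentioned above than to a cursor-based AndNotScorer. The and_not() helper is hypothetical.

```python
def and_not(positive, negative):
    """Return (id, score) pairs from `positive` whose id is absent from `negative`."""
    negative_ids = {pid for pid, _ in negative}  # acts like an excluded_docs set
    return [(pid, score) for pid, score in positive if pid not in negative_ids]


positive = [(1, 1.0), (4, 2.0), (9, 0.5)]
negative = [(4, 0.7), (8, 0.1)]
print(and_not(positive, negative))  # [(1, 1.0), (9, 0.5)]
```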
class whoosh.postings.EmptyScorer
A QueryScorer representing a query that doesn’t match any documents.
class whoosh.postings.FakeScorer(*ids)

This is a fake query scorer for testing purposes. You create the object with the posting IDs as arguments, and it returns them in order as you call next() or skip_to().

>>> fpr = FakeScorer(1, 5, 10, 80)
>>> fpr.id
1
>>> fpr.next()
>>> fpr.id
5
class whoosh.postings.IntersectionScorer(scorers, boost=1.0)
Acts like the intersection of the items in a set of QueryScorers.
class whoosh.postings.RequireScorer(scorer, required)
Takes the intersection of two sub-scorers, but only takes scores from the first.
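Intersecting sorted posting lists is commonly done by leapfrogging: each cursor skips to the largest ID seen so far until all cursors agree. The sketch below uses bisect_left() in place of skip_to() on plain lists and ignores boost; it then derives a RequireScorer-style filter that keeps only the first scorer's scores. The function names intersect() and require() are hypothetical.

```python
from bisect import bisect_left


def intersect(id_lists):
    """Return the IDs present in every sorted list in id_lists.

    Each list acts as a cursor, and bisect_left() stands in for skip_to().
    """
    if not id_lists or any(not ids for ids in id_lists):
        return []
    pos = [0] * len(id_lists)
    result = []
    target = id_lists[0][0]
    while True:
        agreed = True
        for i, ids in enumerate(id_lists):
            # skip_to(target): jump to the first ID >= target.
            pos[i] = bisect_left(ids, target, pos[i])
            if pos[i] == len(ids):
                return result  # this cursor is exhausted: no more matches
            if ids[pos[i]] != target:
                target = ids[pos[i]]  # overshot: chase the larger ID
                agreed = False
                break
        if agreed:
            result.append(target)
            pos[0] += 1
            if pos[0] == len(id_lists[0]):
                return result
            target = id_lists[0][pos[0]]


def require(scored, required_ids):
    """Keep the (id, score) pairs of `scored` whose ID is also in required_ids.

    Scores come only from the first argument.
    """
    scores = dict(scored)
    matched = intersect([[pid for pid, _ in scored], sorted(required_ids)])
    return [(pid, scores[pid]) for pid in matched]


print(intersect([[1, 4, 9, 12], [4, 5, 9, 12], [2, 4, 9]]))  # [4, 9]
print(require([(1, 0.5), (4, 1.0), (9, 2.0)], [4, 9, 30]))   # [(4, 1.0), (9, 2.0)]
```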
class whoosh.postings.UnionScorer(scorers, boost=1.0, minmatch=0)
Acts like the union of a set of QueryScorers.
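A union of sorted (id, score) lists can be sketched with heapq.merge() and itertools.groupby(). Summing the scores of an ID matched by several sub-scorers is an assumption of this sketch, the union() helper is hypothetical, and the boost and minmatch parameters are left out.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def union(*scored_lists):
    """Merge sorted (id, score) lists; an ID matched by several lists gets
    the sum of their scores (an assumption of this sketch)."""
    merged = heapq.merge(*scored_lists, key=itemgetter(0))
    return [(pid, sum(score for _, score in group))
            for pid, group in groupby(merged, key=itemgetter(0))]


a = [(1, 1.0), (4, 2.0)]
b = [(2, 0.5), (4, 0.25), (9, 1.0)]
print(union(a, b))  # [(1, 1.0), (2, 0.5), (4, 2.25), (9, 1.0)]
```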

Exceptions

exception whoosh.postings.ReadTooFar
Raised if a user calls next() or skip_to() on a reader that has reached the end of its items.
