scoring module

This module contains classes for scoring (and sorting) search results.

Scoring algorithm classes

class whoosh.scoring.Weighting

Abstract base class for weighting objects. A weighting object implements a scoring algorithm.

Concrete subclasses must implement the score() method, which returns a score given a term and a document in which that term appears.

avg_field_length(ixreader, fieldnum)
Returns the average length of the field per document, i.e. total field length / total number of documents.
final(searcher, docnum, score)

Returns a final score for each document. You can use this method in subclasses to apply document-level adjustments to the score, for example using the value of a stored field to influence the score (although reading stored fields at scoring time would be slow).

Parameters:
  • searcher – whoosh.searching.Searcher for the index.
  • docnum – the doc number of the document being scored.
  • score – the document’s accumulated term score.
Return type: float
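
As a minimal sketch of a document-level adjustment, the stand-in class below mirrors the documented final(searcher, docnum, score) signature without importing whoosh. The class name BoostedWeighting and the doc_boosts mapping are hypothetical; the boosts are looked up from a precomputed dictionary instead of a stored field, to avoid the slowness mentioned above.

```python
class BoostedWeighting:
    """Stand-in for a Weighting subclass (hypothetical; does not import whoosh)."""

    def __init__(self, doc_boosts):
        # Hypothetical mapping of docnum -> multiplier, built once up front.
        self.doc_boosts = doc_boosts

    def final(self, searcher, docnum, score):
        # Scale the accumulated term score by the document's boost.
        # Documents without an entry keep their score unchanged.
        return score * self.doc_boosts.get(docnum, 1.0)


w = BoostedWeighting({7: 2.0})
adjusted = w.final(None, 7, 1.5)   # document 7 has a boost of 2.0
unchanged = w.final(None, 3, 1.5)  # document 3 has no boost entry
```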

fl_over_avfl(ixreader, docnum, fieldnum)
Returns the length of the current field in the current document divided by the average length of the field across all documents. This is used by some scoring algorithms.
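
The two length helpers reduce to simple arithmetic. The sketch below reproduces that arithmetic on a hypothetical dictionary of per-document field lengths; the real methods take an index reader and a field number, which are left out here.

```python
# Hypothetical field lengths (number of terms) for three documents.
field_lengths = {0: 120, 1: 80, 2: 100}


def avg_field_length(lengths):
    """Total field length divided by the total number of documents."""
    return sum(lengths.values()) / len(lengths)


def fl_over_avfl(lengths, docnum):
    """Field length in one document divided by the average field length."""
    return lengths[docnum] / avg_field_length(lengths)


avfl = avg_field_length(field_lengths)   # 300 / 3
ratio = fl_over_avfl(field_lengths, 0)   # 120 / average
```

A ratio above 1 means the field in that document is longer than average, which length-normalizing algorithms typically penalize.
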
score(searcher, fieldnum, text, docnum, weight, QTF=1)

Returns the score for a given term in the given document.

Parameters:
  • searcher – whoosh.searching.Searcher for the index.
  • fieldnum – the field number of the term being scored.
  • text – the text of the term being scored.
  • docnum – the doc number of the document being scored.
  • weight – the frequency * boost of the term in this document.
  • QTF – the frequency of the term in the query.
Return type: float
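
To illustrate how per-term scores might combine into a document score, the sketch below uses a stand-in class with the documented score() signature and sums its results per document. The class name, the postings list, and the summing loop are assumptions made for illustration; they are not taken from whoosh itself.

```python
class WeightTimesQTF:
    """Stand-in with the documented score() signature (not a real whoosh class)."""

    def score(self, searcher, fieldnum, text, docnum, weight, QTF=1):
        # weight is the frequency * boost of the term in this document.
        return float(weight * QTF)


weighting = WeightTimesQTF()

# Hypothetical matches: (fieldnum, term text, docnum, weight).
postings = [
    (0, "render", 4, 2.0),
    (0, "shader", 4, 1.0),
    (0, "render", 9, 1.0),
]

# Accumulate the term scores for each document (assumed to be a plain sum).
totals = {}
for fieldnum, text, docnum, weight in postings:
    term_score = weighting.score(None, fieldnum, text, docnum, weight)
    totals[docnum] = totals.get(docnum, 0.0) + term_score
```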

class whoosh.scoring.BM25F(B=0.75, K1=1.2, field_B=None)

Generates a BM25F score.

Parameters:
  • B – free parameter, see the BM25 literature.
  • K1 – free parameter, see the BM25 literature.
  • field_B – If given, a dictionary mapping fieldnums to field-specific B values.
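
For orientation, the sketch below shows the textbook BM25 term score and how B and K1 enter it. It is an assumption that this matches the class's internals: the function names are hypothetical, the idf value is passed in to avoid committing to one idf definition, and full BM25F combines per-field statistics in ways this single-field version does not.

```python
def bm25_term_score(tf, idf, fl_over_avfl, B=0.75, K1=1.2):
    """Textbook BM25 score for one term in one document.

    tf           -- term frequency in the document
    idf          -- inverse document frequency of the term (computed elsewhere)
    fl_over_avfl -- field length divided by average field length
    """
    # B controls how strongly length normalization applies (0 disables it).
    norm = 1.0 - B + B * fl_over_avfl
    # K1 controls how quickly repeated occurrences saturate.
    return idf * (tf * (K1 + 1.0)) / (tf + K1 * norm)


def pick_B(fieldnum, B=0.75, field_B=None):
    """Use the field-specific B value when one is given, else the default."""
    if field_B and fieldnum in field_B:
        return field_B[fieldnum]
    return B


average_doc = bm25_term_score(tf=1, idf=2.0, fl_over_avfl=1.0)
long_doc = bm25_term_score(tf=1, idf=2.0, fl_over_avfl=5.0)
```

With the default B, the same term frequency scores lower in a longer-than-average field; with B=0 the field length has no effect.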
class whoosh.scoring.Cosine
A cosine vector-space scoring algorithm, translated into Python from Terrier’s Java implementation.
class whoosh.scoring.DFree
The DFree probabilistic weighting algorithm, translated into Python from Terrier’s Java implementation.
class whoosh.scoring.DLH13(k=0.5)
The DLH13 probabilistic weighting algorithm, translated into Python from Terrier’s Java implementation.
class whoosh.scoring.Hiemstra_LM(c=0.15)
The Hiemstra LM probabilistic weighting algorithm, translated into Python from Terrier’s Java implementation.
class whoosh.scoring.InL2(c=1.0)
The InL2 probabilistic weighting algorithm, translated into Python from Terrier’s Java implementation.
class whoosh.scoring.TF_IDF
Instead of doing any real scoring, this simply returns tf * idf.
class whoosh.scoring.Frequency
Instead of doing any real scoring, this simply returns the term frequency. This may be useful when you don’t care about normalization and weighting.
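
The two simplest schemes can be sketched in a few lines. The idf definition used here (natural log of document count over document frequency) is one common choice and an assumption, since the text above does not say which definition the classes use; the function names are hypothetical.

```python
import math


def idf(doc_count, doc_freq):
    """Inverse document frequency (assumed definition: ln(N / df))."""
    return math.log(doc_count / doc_freq)


def tf_idf(tf, doc_count, doc_freq):
    """Term frequency multiplied by inverse document frequency."""
    return tf * idf(doc_count, doc_freq)


def frequency(tf):
    """Raw term frequency, with no normalization or weighting."""
    return float(tf)


rare = tf_idf(tf=2, doc_count=100, doc_freq=10)     # term in 10 of 100 docs
common = tf_idf(tf=3, doc_count=100, doc_freq=100)  # term in every doc
```

Under this idf definition a term that appears in every document contributes nothing, however often it occurs.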

Sorting classes

class whoosh.scoring.Sorter

Abstract base class for sorter objects. See the ‘sortedby’ keyword argument to the Searcher object’s search() method.

Concrete subclasses must implement the order() method, which takes a sequence of doc numbers and returns a sorted sequence.

class whoosh.scoring.NullSorter
Sorter that does nothing.
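
The order() contract can be sketched with two stand-in classes. The single docnums argument is an assumption: the text above only says that order() takes a sequence of doc numbers, so the real method may accept further arguments. Both class names are hypothetical.

```python
class ReverseDocnumSorter:
    """Stand-in sorter that orders documents by descending doc number."""

    def order(self, docnums):
        # Return a new sorted sequence; the input is left untouched.
        return sorted(docnums, reverse=True)


class DoNothingSorter:
    """Stand-in for a sorter that leaves the sequence as it is."""

    def order(self, docnums):
        return docnums


hits = [3, 9, 1]
reversed_hits = ReverseDocnumSorter().order(hits)
same_hits = DoNothingSorter().order(hits)
```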
class whoosh.scoring.FieldSorter(fieldname, key=None, missingfirst=False)

Used by searching.Searcher to sort document results based on the value of an indexed field, rather than score. See the ‘sortedby’ keyword argument to the Searcher’s search() method.

This object creates a cache of document orders for the given field. Creating the cache may make the first sorted search of a field seem slow, but subsequent sorted searches of the same field will be much faster.

Parameters:
  • fieldname – The name of the field to sort by.
  • missingfirst – Place documents which don’t have the given field first in the sorted results. The default is to put those documents last (after all documents that have the given field).
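
The caching and missingfirst behavior described above can be approximated as follows. This is a sketch under stated assumptions, not the class's actual implementation: CachedFieldOrder and its field_values dictionary (docnum to field value, with documents lacking the field simply absent) are hypothetical.

```python
class CachedFieldOrder:
    """Stand-in that caches a rank per document for one field."""

    def __init__(self, field_values, missingfirst=False):
        # Build the cache once: sort docnums by field value, then
        # remember each document's position in that ordering.
        ordered = sorted(field_values, key=field_values.get)
        self.rank = {docnum: i for i, docnum in enumerate(ordered)}
        # Documents without the field sort before or after all others.
        self.missing_rank = -1 if missingfirst else len(ordered)

    def order(self, docnums):
        # Later calls only look up cached ranks, so they are cheap.
        return sorted(docnums, key=lambda d: self.rank.get(d, self.missing_rank))


values = {1: "banana", 2: "apple", 4: "cherry"}  # document 3 lacks the field
missing_last = CachedFieldOrder(values).order([1, 2, 3, 4])
missing_first = CachedFieldOrder(values, missingfirst=True).order([1, 2, 3, 4])
```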
class whoosh.scoring.MultiFieldSorter(sorters, missingfirst=False)

Used by searching.Searcher to sort document results based on the values of several indexed fields, rather than score. See the ‘sortedby’ keyword argument to the Searcher’s search() method.

This sorter uses multiple fields, so if for two documents the first field has the same value, it will use the second field to sort them, and so on.

Parameters:
  • sorters – A list of sorter objects, one per field to sort by, in order of priority.
  • missingfirst – Place documents which don’t have the given field first in the sorted results. The default is to put those documents last (after all documents that have the given field).
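
Falling back to the next field when the earlier ones tie is the same as sorting by a tuple of values. The sketch below shows that idea on hypothetical data; the order_by_fields helper and the stored dictionary are illustrative and are not part of the module.

```python
# Hypothetical field values per document.
stored = {
    0: {"author": "lee", "year": 2009},
    1: {"author": "kim", "year": 2010},
    2: {"author": "lee", "year": 2007},
}


def order_by_fields(docnums, stored, fieldnames):
    """Sort by the first field, breaking ties with each following field."""
    return sorted(
        docnums,
        key=lambda d: tuple(stored[d][name] for name in fieldnames),
    )


by_author_then_year = order_by_fields([0, 1, 2], stored, ["author", "year"])
```

Documents 0 and 2 share an author, so their relative order is decided by the year.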
