scoring module
This module contains classes for scoring (and sorting) search results.
Scoring algorithm classes
-
class whoosh.scoring.Weighting
Abstract base class for weighting objects. A weighting
object implements a scoring algorithm.
Concrete subclasses must implement the score() method, which
returns a score given a term and a document in which that term
appears.
-
avg_field_length(ixreader, fieldnum)
Returns the average length of the field per document
(i.e. total field length / total number of documents).
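As a worked example of this formula, here is a plain-Python sketch. The `field_lengths` list (one field length per document, with 0 for documents that lack the field) is a hypothetical stand-in for what the index reader stores, not a Whoosh API.

```python
def avg_field_length(field_lengths):
    """Average length of a field per document: total field length / number of documents.

    `field_lengths` is a hypothetical list holding the length of the field
    (in terms) for each document in the index.
    """
    if not field_lengths:
        return 0.0  # assumption: an empty index averages to 0
    return sum(field_lengths) / len(field_lengths)


# Three documents whose field holds 10, 20 and 30 terms.
print(avg_field_length([10, 20, 30]))  # 20.0
```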
-
final(searcher, docnum, score)
Returns a final score for each document. You can use this method
in subclasses to apply document-level adjustments to the score, for
example using the value of a stored field to influence the score
(although that would be slow).
Parameters:
- searcher – whoosh.searching.Searcher for the index.
- docnum – the doc number of the document being scored.
- score – the document’s accumulated term score.
Return type: float
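A minimal sketch of a document-level adjustment in `final()`. To keep it runnable without Whoosh, `BaseWeighting` is a hypothetical stand-in for `whoosh.scoring.Weighting`, and the `popularity` dictionary stands in for a stored field. In real code you would subclass a Whoosh weighting class and obtain the stored value through the searcher, which, as noted above, is slow.

```python
class BaseWeighting:
    """Hypothetical stand-in for whoosh.scoring.Weighting (not the real class)."""

    def score(self, searcher, fieldnum, text, docnum, weight, QTF=1):
        raise NotImplementedError

    def final(self, searcher, docnum, score):
        # Default: leave the accumulated term score unchanged.
        return score


class PopularityBoost(BaseWeighting):
    """Scales each document's accumulated score by a per-document factor."""

    def __init__(self, popularity):
        # Hypothetical mapping of docnum -> boost factor (e.g. read from a stored field).
        self.popularity = popularity

    def score(self, searcher, fieldnum, text, docnum, weight, QTF=1):
        return weight * QTF

    def final(self, searcher, docnum, score):
        # Documents missing from the mapping keep their score (factor 1.0).
        return score * self.popularity.get(docnum, 1.0)


w = PopularityBoost({0: 2.0, 1: 0.5})
print(w.final(None, 0, 3.0))  # 6.0
print(w.final(None, 1, 3.0))  # 1.5
print(w.final(None, 7, 3.0))  # 3.0
```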
-
fl_over_avfl(ixreader, docnum, fieldnum)
Returns the length of the current field in the current
document divided by the average length of the field
across all documents. This is used by some scoring algorithms.
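The ratio is easiest to read with numbers: a value above 1 means the field is longer than average in this document, below 1 means shorter. The sketch below assumes a hypothetical `field_lengths` list rather than a Whoosh index reader.

```python
def fl_over_avfl(doc_field_length, field_lengths):
    """Length of the field in one document divided by the field's average length.

    `field_lengths` is a hypothetical list of the field's length in every document.
    """
    avfl = sum(field_lengths) / len(field_lengths)
    return doc_field_length / avfl


lengths = [10, 20, 30]  # average length is 20
print(fl_over_avfl(30, lengths))  # 1.5 (longer than average)
print(fl_over_avfl(10, lengths))  # 0.5 (shorter than average)
```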
-
score(searcher, fieldnum, text, docnum, weight, QTF=1)
Returns the score for a given term in the given document.
Parameters:
- searcher – whoosh.searching.Searcher for the index.
- fieldnum – the field number of the term being scored.
- text – the text of the term being scored.
- docnum – the doc number of the document being scored.
- weight – the frequency * boost of the term in this document.
- QTF – the frequency of the term in the query.
Return type: float
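To illustrate the `score()` contract, the sketch below implements a toy weighting that uses only the `weight` and `QTF` arguments and applies sublinear (logarithmic) term-frequency damping. `ToyWeighting` is hypothetical and deliberately does not import Whoosh; a real subclass would inherit from `whoosh.scoring.Weighting`, and Whoosh would supply the searcher, field number, term text and doc number during a search.

```python
import math


class ToyWeighting:
    """Hypothetical weighting with the same score() signature as whoosh.scoring.Weighting."""

    def score(self, searcher, fieldnum, text, docnum, weight, QTF=1):
        # `weight` is the term's frequency * boost in this document;
        # dampen it logarithmically, then scale by the term's query frequency.
        if weight <= 0:
            return 0.0
        return (1.0 + math.log(weight)) * QTF


w = ToyWeighting()
print(w.score(None, 0, "whoosh", 5, weight=1))         # 1.0
print(w.score(None, 0, "whoosh", 5, weight=1, QTF=2))  # 2.0
```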
-
class whoosh.scoring.BM25F(B=0.75, K1=1.2, field_B=None)
Generates a BM25F score.
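Parameters:
- B – free parameter controlling document-length normalization
(0 disables it, 1 applies it fully); see the BM25 literature.
- K1 – free parameter controlling how quickly the contribution of
term frequency saturates; see the BM25 literature.
- field_B – If given, a dictionary mapping fieldnums to
field-specific B values.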
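The sketch below evaluates the standard BM25 term-weight formula with a per-field `B` override, mirroring the documented `B`, `K1` and `field_B` parameters. It is the textbook formulation, not necessarily the exact arithmetic inside Whoosh's `BM25F.score()`; here the caller supplies `idf`, `tf`, `fl` and `avfl`, whereas Whoosh derives them from the index.

```python
def bm25_term_score(idf, tf, fl, avfl, fieldnum, B=0.75, K1=1.2, field_B=None):
    """Textbook BM25 weight for one term in one field of one document.

    idf:  inverse document frequency of the term (computed elsewhere)
    tf:   term frequency (times boost) in this field of this document
    fl:   length of the field in this document
    avfl: average length of the field across all documents
    """
    # A field-specific B, if supplied, overrides the global B.
    if field_B and fieldnum in field_B:
        B = field_B[fieldnum]
    # Length normalization: exactly 1.0 for a field of average length.
    norm = (1 - B) + B * (fl / avfl)
    # Saturating term-frequency component, scaled by idf.
    return idf * (tf * (K1 + 1)) / (tf + K1 * norm)


print(bm25_term_score(idf=1.0, tf=2, fl=10, avfl=10, fieldnum=0))  # 1.375
# B=0 for field 0 switches off length normalization for that field.
print(bm25_term_score(idf=1.0, tf=2, fl=40, avfl=10, fieldnum=0, field_B={0: 0.0}))  # 1.375
```

Because `tf` appears in both numerator and denominator, the score approaches `idf * (K1 + 1)` as term frequency grows, which is the saturation that `K1` tunes.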
-
class whoosh.scoring.Cosine
A cosine vector-space scoring algorithm, translated into Python
from Terrier’s Java implementation.
-
class whoosh.scoring.DFree
The DFree probabilistic weighting algorithm, translated into Python
from Terrier’s Java implementation.
-
class whoosh.scoring.DLH13(k=0.5)
The DLH13 probabilistic weighting algorithm, translated into Python
from Terrier’s Java implementation.
-
class whoosh.scoring.Hiemstra_LM(c=0.15)
The Hiemstra LM probabilistic weighting algorithm, translated into Python
from Terrier’s Java implementation.
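For reference, the sketch below evaluates the form in which Hiemstra's language model is commonly written, with `c` as the smoothing weight: log(1 + (c * tf * total_tokens) / ((1 - c) * cf * dl)). The input names are assumptions, and the code illustrates the textbook formula rather than copying Terrier's or Whoosh's implementation.

```python
import math


def hiemstra_lm(tf, dl, cf, total_tokens, c=0.15):
    """Hiemstra LM term score (textbook form).

    tf:           frequency of the term in the document
    dl:           length of the document (or field) in tokens
    cf:           frequency of the term across the whole collection
    total_tokens: total number of tokens in the collection
    c:            smoothing weight given to the document model
    """
    return math.log(1 + (c * tf * total_tokens) / ((1 - c) * cf * dl))


# With c = 0.5 and all counts equal to 1, the ratio is 1, so the score is log(2).
print(hiemstra_lm(tf=1, dl=1, cf=1, total_tokens=1, c=0.5))  # 0.6931471805599453
```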
-
class whoosh.scoring.InL2(c=1.0)
The InL2 probabilistic weighting algorithm, translated into Python
from Terrier’s Java implementation.
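InL2 belongs to the divergence-from-randomness (DFR) family: inverse document frequency (In), Laplace after-effect (L) and term-frequency normalization 2. The sketch evaluates the textbook InL2 formula with `c` as the normalization-2 parameter; the input names are assumptions, and the code is not taken from Terrier or Whoosh.

```python
import math


def inl2(tf, dl, avgdl, N, df, c=1.0, QTF=1):
    """Textbook InL2 (DFR) term score.

    tf: term frequency in the document
    dl: document length; avgdl: average document length
    N:  number of documents; df: number of documents containing the term
    c:  normalization-2 parameter
    """
    # Normalization 2: rescale tf as if the document had average length.
    tfn = tf * math.log2(1 + c * avgdl / dl)
    # Laplace after-effect tfn / (tfn + 1), times the In factor log2((N + 1) / (df + 0.5)).
    return QTF * (tfn / (tfn + 1)) * math.log2((N + 1) / (df + 0.5))


# Average-length document, tf = 1, c = 1: tfn = 1, after-effect = 0.5, In = log2(12 / 1.5) = 3.
print(inl2(tf=1, dl=100, avgdl=100, N=11, df=1))  # 1.5
```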
-
class whoosh.scoring.TF_IDF
Instead of doing any real scoring, this simply returns tf * idf.
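A sketch of the tf * idf product. The idf definition used here, log(N / df), is one common choice and an assumption; this page does not state which idf formula Whoosh's searcher uses.

```python
import math


def tf_idf(tf, N, df):
    """tf * idf, with idf = log(N / df) (assumed definition).

    tf: term frequency in the document
    N:  number of documents; df: number of documents containing the term
    """
    idf = math.log(N / df)
    return tf * idf


print(tf_idf(3, N=100, df=100))  # 0.0: a term found in every document carries no weight
print(tf_idf(2, N=100, df=10))   # rarer terms score higher
```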
-
class whoosh.scoring.Frequency
Instead of doing any real scoring, this simply returns the term frequency.
This may be useful when you don’t care about normalization and weighting.
Sorting classes
-
class whoosh.scoring.Sorter
Abstract base class for sorter objects. See the ‘sortedby’
keyword argument to the Searcher object’s
search() method.
Concrete subclasses must implement the order() method, which
takes a sequence of doc numbers and returns a sorted sequence.
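A minimal sketch of the sorter contract as described here: `order()` receives document numbers and returns them in sorted order. Both classes are hypothetical stand-ins written without Whoosh, and the signature is simplified to match this description; the real method may take additional arguments, such as the searcher. The "newest first" reading assumes doc numbers grow as documents are added.

```python
class BaseSorter:
    """Hypothetical stand-in for whoosh.scoring.Sorter."""

    def order(self, docnums):
        raise NotImplementedError


class NewestFirstSorter(BaseSorter):
    """Orders documents by descending doc number."""

    def order(self, docnums):
        return sorted(docnums, reverse=True)


print(NewestFirstSorter().order([3, 10, 7]))  # [10, 7, 3]
```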
-
class whoosh.scoring.NullSorter
Sorter that does nothing.
-
class whoosh.scoring.FieldSorter(fieldname, key=None, missingfirst=False)
Used by searching.Searcher to sort document results based on the
value of an indexed field, rather than score. See the ‘sortedby’
keyword argument to the Searcher’s
search() method.
This object creates a cache of document orders for the given field.
Creating the cache may make the first sorted search of a field
seem slow, but subsequent sorted searches of the same field will
be much faster.
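The caching idea can be sketched in plain Python: the first call ranks every document by its field value once, and later calls only look up that rank. `CachedFieldSorter` and its `field_values` mapping (doc number to value) are hypothetical; Whoosh builds its cache from the index itself.

```python
class CachedFieldSorter:
    """Sorts doc numbers by a field value, caching per-document ranks."""

    def __init__(self, field_values, missingfirst=False):
        # Hypothetical docnum -> field value mapping; absent docnums lack the field.
        self.field_values = field_values
        self.missingfirst = missingfirst
        self.cache = None  # docnum -> rank, built lazily on first use

    def _make_cache(self):
        ranked = sorted(self.field_values, key=self.field_values.__getitem__)
        self.cache = {docnum: rank for rank, docnum in enumerate(ranked)}

    def order(self, docnums):
        if self.cache is None:
            self._make_cache()  # the expensive step, done only once
        # Documents without the field get rank -1 (first) or infinity (last).
        missing = -1 if self.missingfirst else float("inf")
        return sorted(docnums, key=lambda d: self.cache.get(d, missing))


values = {0: "pear", 1: "apple", 2: "fig"}
print(CachedFieldSorter(values).order([0, 1, 2, 3]))                     # [1, 2, 0, 3]
print(CachedFieldSorter(values, missingfirst=True).order([0, 1, 2, 3]))  # [3, 1, 2, 0]
```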
Parameters:
- fieldname – The name of the field to sort by.
- missingfirst – Place documents which don’t have the given
field first in the sorted results. The default is to put those
documents last (after all documents that have the given field).
-
class whoosh.scoring.MultiFieldSorter(sorters, missingfirst=False)
Used by searching.Searcher to sort document results based on the
values of several indexed fields, rather than score. See the ‘sortedby’
keyword argument to the Searcher’s search()
method.
This sorter uses multiple fields, so if for two documents the first
field has the same value, it will use the second field to sort them,
and so on.
Parameters:
- sorters – A list of FieldSorter objects, one per field to sort by,
in order of precedence.
- missingfirst – Place documents which don’t have the given
field first in the sorted results. The default is to put those
documents last (after all documents that have the given field).
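Tie-breaking across several fields amounts to sorting by a tuple of per-field keys, which Python compares element by element. The sketch below is a simplified, hypothetical stand-in that takes field names rather than sorter objects: `records` maps doc numbers to dictionaries of field values, and documents missing a field go to the end (or start) of that field's ordering, following the `missingfirst` convention above.

```python
def multi_field_order(docnums, records, fieldnames, missingfirst=False):
    """Sort docnums by several fields; later fields break ties in earlier ones."""
    def key(docnum):
        fields = records.get(docnum, {})
        parts = []
        for name in fieldnames:
            present = name in fields
            # False sorts before True, so this flag puts missing values first or last.
            flag = present if missingfirst else not present
            parts.append((flag, fields.get(name, "")))
        return tuple(parts)

    return sorted(docnums, key=key)


records = {
    0: {"author": "Smith", "year": 2005},
    1: {"author": "Jones", "year": 2010},
    2: {"author": "Smith", "year": 1999},
    3: {"year": 2001},
}
print(multi_field_order([0, 1, 2, 3], records, ["author", "year"]))  # [1, 2, 0, 3]
print(multi_field_order([0, 1, 2, 3], records, ["author", "year"], missingfirst=True))  # [3, 1, 2, 0]
```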