fields module
Contains functions and classes related to fields.
Schema class
-
class whoosh.fields.Schema(**fields)
Represents the collection of fields in an index. Maps field names to
FieldType objects which define the behavior of each field.
Low-level parts of the index use field numbers instead of field names for
compactness. This class has several methods for converting between the
field name, field number, and field object itself.
All keyword arguments to the constructor are treated as fieldname =
fieldtype pairs. The fieldtype can be an instantiated FieldType object,
or a FieldType subclass (in which case the Schema will instantiate it
with the default constructor before adding it).
For example:
    from whoosh.fields import Schema, TEXT, KEYWORD

    s = Schema(content=TEXT,
               title=TEXT(stored=True),
               tags=KEYWORD(stored=True))
-
add(name, fieldtype)
Adds a field to this schema. This is a low-level method; use keyword
arguments to the Schema constructor to create the fields instead.
Parameters:
- name – The name of the field.
- fieldtype – An instantiated fields.FieldType object, or a
FieldType subclass. If you pass an instantiated object, the schema
will use that as the field configuration for this field. If you
pass a FieldType subclass, the schema will automatically
instantiate it with the default constructor.
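The class-or-instance rule for fieldtype can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's code: the FieldType and TEXT classes below are hypothetical stand-ins, normalize_fieldtype is an invented helper name, and the real library may signal a bad argument with a different exception than TypeError.

```python
class FieldType:
    """Hypothetical stand-in for fields.FieldType; not the real class."""

    def __init__(self, stored=False):
        self.stored = stored


class TEXT(FieldType):
    """Hypothetical stand-in for a pre-made field type."""


def normalize_fieldtype(fieldtype):
    # A FieldType subclass is instantiated with its default constructor;
    # an already-instantiated object is used unchanged.
    if isinstance(fieldtype, type):
        if not issubclass(fieldtype, FieldType):
            raise TypeError("%r is not a FieldType subclass" % (fieldtype,))
        return fieldtype()
    if isinstance(fieldtype, FieldType):
        return fieldtype
    raise TypeError("%r is not a FieldType" % (fieldtype,))


from_class = normalize_fieldtype(TEXT)         # class: instantiated with defaults
configured = TEXT(stored=True)
from_instance = normalize_fieldtype(configured)  # instance: kept as configured
```

Passing the class gives a field with default settings, while passing an instance keeps whatever configuration you gave it.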
-
analyzer(fieldname)
- Returns the content analyzer for the given fieldname, or None if
the field has no analyzer.
-
field_by_name(name)
Returns the field object associated with the given name.
Parameter: name – The name of the field to retrieve.
-
field_by_number(number)
Returns the field object associated with the given number.
Parameter: number – The number of the field to retrieve.
-
field_names()
- Returns a list of the names of the fields in this schema.
-
fields()
- Yields (“fieldname”, field_object) pairs for the fields in this
schema.
-
has_vectored_fields()
- Returns True if any of the fields in this schema store term vectors.
-
name_to_number(name)
- Given a field name, returns the field’s number.
-
number_to_name(number)
- Given a field number, returns the field’s name.
-
scorable_fields()
- Returns a list of field numbers corresponding to the fields that
store length information.
-
stored_field_names()
- Returns the names, in order, of fields that are stored.
-
stored_fields()
- Returns a list of field numbers corresponding to the fields that are stored.
-
to_name(id)
- Given a field name or number, returns the field’s name.
-
to_number(id)
- Given a field name or number, returns the field’s number.
-
vectored_fields()
- Returns a list of field numbers corresponding to the fields that are
vectored.
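The name/number conversion methods above can be pictured with a small stand-in class. This is a sketch, not the Schema implementation: MiniSchema is a hypothetical name, and the numbering rule (numbers handed out in the order fields are added) is an assumption made for the example.

```python
class MiniSchema:
    """Hypothetical illustration only; not whoosh.fields.Schema."""

    def __init__(self):
        self._names = []    # field number -> field name
        self._numbers = {}  # field name -> field number
        self._fields = []   # field number -> field object

    def add(self, name, field):
        if name in self._numbers:
            raise ValueError("duplicate field name: %r" % name)
        # Assumption: numbers are assigned in the order fields are added.
        self._numbers[name] = len(self._names)
        self._names.append(name)
        self._fields.append(field)

    def name_to_number(self, name):
        return self._numbers[name]

    def number_to_name(self, number):
        return self._names[number]

    def to_number(self, id):
        # Accept a name or a number and always return the number.
        return id if isinstance(id, int) else self._numbers[id]

    def to_name(self, id):
        # Accept a name or a number and always return the name.
        return self._names[self.to_number(id)]

    def field_by_name(self, name):
        return self._fields[self._numbers[name]]

    def field_by_number(self, number):
        return self._fields[number]


schema = MiniSchema()
schema.add("content", "content-field")  # placeholder field objects
schema.add("title", "title-field")
```

The to_name and to_number variants are convenient when calling code may hold either form of identifier.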
FieldType base class
-
class whoosh.fields.FieldType(format, vector=None, scorable=False, stored=False, unique=False)
Represents a field configuration.
The FieldType object supports the following attributes:
- format (fields.Format): the storage format for the field’s contents.
- vector (fields.Format): the storage format for the field’s vectors
(forward index), or None if the field should not store vectors.
- scorable (boolean): whether searches against this field may be scored.
This controls whether the index stores per-document field lengths for
this field.
- stored (boolean): whether the content of this field is stored for each
document. For example, in addition to indexing the title of a document,
you usually want to store the title so it can be presented as part of
the search results.
- unique (boolean): whether this field’s value is unique to each document.
For example, ‘path’ or ‘ID’. IndexWriter.update_document() will use
fields marked as ‘unique’ to find the previous version of a document
being updated.
The constructor for the base field type simply lets you supply your own
configured field format, vector format, and scorable, stored, and unique
values. Subclasses may configure some or all of this for you.
-
clean()
- Clears any cached information in the field and any child objects.
-
index(value, **kwargs)
- Returns an iterator of (termtext, frequency, encoded_value) tuples.
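To make the shape of the index() result concrete, here is a minimal sketch that yields (termtext, frequency, encoded_value) tuples. Everything in it is an assumption for illustration: index_terms is a hypothetical function, the tokenization is a plain lowercase whitespace split, and the encoded value is just the frequency packed as a 4-byte integer. In the library the encoding is determined by the field's format.

```python
import struct
from collections import Counter


def index_terms(value):
    # Assumption: tokens are lowercased, whitespace-separated words.
    counts = Counter(value.lower().split())
    for term in sorted(counts):
        freq = counts[term]
        # Assumption: the "encoded value" is the frequency as a
        # big-endian unsigned 32-bit integer.
        yield term, freq, struct.pack("!I", freq)


postings = list(index_terms("Hello hello world"))
```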
Pre-made field types
-
class whoosh.fields.ID(stored=False, unique=False, field_boost=1.0)
Configured field type that indexes the entire value of the field as one
token. This is useful for data you don’t want to tokenize, such as the path
of a file.
Parameter: stored – Whether the value of this field is stored with the document.
-
class whoosh.fields.IDLIST(stored=False, unique=False, expression=None, field_boost=1.0)
Configured field type for fields containing IDs separated by whitespace
and/or punctuation.
Parameters:
- stored – Whether the value of this field is stored with the
document.
- unique – Whether the value of this field is unique per-document.
- expression – The regular expression object to use to extract
tokens. The default expression breaks tokens on CRs, LFs, tabs,
spaces, commas, and semicolons.
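The effect of the default expression can be approximated as follows. The pattern below is an assumption written to match the description (break on CRs, LFs, tabs, spaces, commas, and semicolons); it is not copied from the library, and split_ids is a hypothetical helper.

```python
import re

# Assumption: a pattern equivalent to the documented default behavior.
ID_TOKEN = re.compile(r"[^\r\n\t ,;]+")


def split_ids(value, expression=ID_TOKEN):
    # Each maximal run of non-separator characters is one ID token.
    return expression.findall(value)


ids = split_ids("a1, b2;c3\td4\r\ne5  f6")
```

Supplying a different compiled pattern as expression changes which characters count as separators.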
-
class whoosh.fields.STORED
- Configured field type for fields you want to store but not index.
-
class whoosh.fields.KEYWORD(stored=False, lowercase=False, commas=False, scorable=False, unique=False, field_boost=1.0)
Configured field type for fields containing space-separated or
comma-separated keyword-like data (such as tags). The default is to not
store positional information (so phrase searching is not allowed in this
field) and to not make the field scorable.
Parameters:
- stored – Whether to store the value of the field with the
document.
- commas – Whether this is a comma-separated field. If this is False
(the default), it is treated as a space-separated field.
- scorable – Whether this field is scorable.
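The difference between space-separated and comma-separated keyword fields can be sketched like this. split_keywords is a hypothetical helper, and the details (stripping whitespace around comma-separated items, dropping empty items, lowercasing before splitting) are assumptions for the example rather than documented behavior.

```python
def split_keywords(value, commas=False, lowercase=False):
    if lowercase:
        value = value.lower()
    if commas:
        # Comma mode: a keyword may contain internal spaces.
        parts = (part.strip() for part in value.split(","))
    else:
        # Space mode: any run of whitespace separates keywords.
        parts = value.split()
    return [part for part in parts if part]


space_tags = split_keywords("Python search Index", lowercase=True)
comma_tags = split_keywords("full text, search engine, ,Python", commas=True)
```

Comma mode is the natural choice when individual tags can be multi-word phrases.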
-
class whoosh.fields.TEXT(analyzer=None, phrase=True, vector=None, stored=False, field_boost=1.0)
Configured field type for text fields (for example, the body text of an
article). The default is to store positional information to allow phrase
searching. This field type is always scorable.
Parameters:
- stored – Whether to store the value of this field with the
document. Since this field type generally contains a lot of text,
you should avoid storing it with the document unless you need to,
for example to allow fast excerpts in the search results.
- phrase – Whether to store positional information to allow phrase
searching.
- analyzer – The analysis.Analyzer to use to index the field
contents. See the analysis module for more information. If you omit
this argument, the field uses analysis.StandardAnalyzer.
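Why positional information enables phrase searching can be shown with a toy positional index. This is a conceptual sketch only: build_positions and phrase_match are hypothetical names, and the library's storage format and matching code are not shown here.

```python
def build_positions(tokens):
    # Map each term to the list of positions where it occurs.
    index = {}
    for pos, term in enumerate(tokens):
        index.setdefault(term, []).append(pos)
    return index


def phrase_match(index, phrase):
    # A phrase matches if its terms occur at consecutive positions.
    if not phrase:
        return False
    first, rest = phrase[0], phrase[1:]
    return any(
        all(start + offset + 1 in index.get(term, ())
            for offset, term in enumerate(rest))
        for start in index.get(first, ())
    )


index = build_positions("the quick brown fox".split())
in_order = phrase_match(index, ["quick", "brown"])
reversed_order = phrase_match(index, ["brown", "quick"])
```

Without stored positions, only the presence of each term is known, so the two queries above could not be told apart.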
-
class whoosh.fields.NGRAM(minsize=2, maxsize=4, stored=False, field_boost=1.0)
Configured field that indexes text as N-grams. For example, with a field
type NGRAM(3,4), the value “hello” will be indexed as tokens
“hel”, “hell”, “ell”, “ello”, “llo”.
Parameters:
- stored – Whether to store the value of this field with the
document. Since this field type generally contains a lot of text,
you should avoid storing it with the document unless you need to,
for example to allow fast excerpts in the search results.
- minsize – The minimum length of the N-grams.
- maxsize – The maximum length of the N-grams.
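The N-gram example above can be reproduced with a few lines of plain Python. This is a sketch of the general technique, not the library's tokenizer; the ngrams function is a hypothetical helper, and only the token order shown in the "hello" example is taken from the description.

```python
def ngrams(text, minsize=2, maxsize=4):
    # At each start position, emit every substring whose length is
    # between minsize and maxsize and that fits inside the text.
    for start in range(len(text)):
        for size in range(minsize, maxsize + 1):
            end = start + size
            if end > len(text):
                break
            yield text[start:end]


tokens = list(ngrams("hello", 3, 4))
```

With minsize=3 and maxsize=4, the value "hello" produces "hel", "hell", "ell", "ello", "llo", matching the tokens listed above.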
Exceptions
-
exception whoosh.fields.FieldConfigurationError
-
exception whoosh.fields.UnknownFieldError