writing module

Globals

whoosh.writing.DOCLENGTH_TYPE
The data type (“H” or “i”) used to store field lengths on disk. The default is “H”, but if you are indexing very large documents and need to be able to store field lengths longer than 65535 tokens, you can change this to “i”.
whoosh.writing.DOCLENGTH_LIMIT
The highest possible value representable by DOCLENGTH_TYPE. For “H” this is 2 ** 16 - 1. Remember to set this if you change DOCLENGTH_TYPE.
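
As a rough check of the limits quoted above, the following sketch derives the largest value for each type code with the standard library's struct module. It assumes the codes follow struct's standard sizes (2 bytes for "H", 4 bytes for "i"); the helper name max_value_for_typecode is hypothetical and is not part of Whoosh.

```python
import struct


def max_value_for_typecode(typecode):
    """Return the largest integer representable by a struct integer type code.

    Assumes struct's standard sizes (the "!" prefix), where uppercase
    integer codes are unsigned and lowercase ones are signed.
    """
    bits = struct.calcsize("!" + typecode) * 8
    if typecode.isupper():
        return 2 ** bits - 1        # unsigned: all bits carry magnitude
    return 2 ** (bits - 1) - 1      # signed: one bit is the sign


H_LIMIT = max_value_for_typecode("H")
I_LIMIT = max_value_for_typecode("i")
```

If you switch DOCLENGTH_TYPE to "i", the matching value for DOCLENGTH_LIMIT under these assumptions would be I_LIMIT.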

Writers

class whoosh.writing.IndexWriter

High-level object for writing to an index.

To get a writer for a particular index, call writer() on the Index object.

>>> writer = my_index.writer()

You can use this object as a context manager. If an exception is raised inside the context, the writer calls cancel(); otherwise it calls commit() when the context exits.
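
The commit-or-cancel behaviour can be illustrated with a small stand-in class. This is only a sketch of the protocol described above, not Whoosh's implementation: WriterContextSketch and its log attribute are hypothetical, and letting the exception propagate after cancel() is an assumption.

```python
class WriterContextSketch:
    """Hypothetical stand-in that mimics the documented context-manager rule."""

    def __init__(self):
        self.log = []  # records which method the context exit triggered

    def commit(self):
        self.log.append("commit")

    def cancel(self):
        self.log.append("cancel")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is not None:
            self.cancel()   # an exception escaped the block: discard changes
        else:
            self.commit()   # clean exit: finish writing
        return False        # assumption: the exception is not swallowed


ok = WriterContextSketch()
with ok:
    pass  # documents would be added here

failed = WriterContextSketch()
try:
    with failed:
        raise ValueError("indexing went wrong")
except ValueError:
    pass
```

With a real writer the equivalent usage would be a with block around the add_document() calls, with no explicit commit() afterwards.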

add_document(**fields)

Adds all the fields of a document at once. This is an alternative to calling start_document(), add_field() [...], end_document().

The keyword arguments map field names to the values to index/store.

For fields that are both indexed and stored, you can specify an alternate value to store using a keyword argument in the form “_stored_<fieldname>”. For example, if you have a field named “title” and you want to index the text “a b c” but store the text “e f g”, use keyword arguments like this:

writer.add_document(title=u"a b c", _stored_title=u"e f g")
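
To make the “_stored_<fieldname>” convention concrete, here is a sketch of how such keyword arguments could be separated into values to index and values to store. The helper split_stored_overrides is hypothetical and ignores the schema entirely (in practice, whether a field is indexed or stored depends on the schema), so treat it as an illustration of the naming rule only.

```python
STORED_PREFIX = "_stored_"


def split_stored_overrides(fields):
    """Split keyword arguments into (values to index, values to store)."""
    indexed = {}
    stored = {}
    for name, value in fields.items():
        if name.startswith(STORED_PREFIX):
            # "_stored_title" overrides the stored value of "title"
            stored[name[len(STORED_PREFIX):]] = value
        else:
            indexed[name] = value
    # Fields without an override store the same value that is indexed
    for name, value in indexed.items():
        stored.setdefault(name, value)
    return indexed, stored


indexed, stored = split_stored_overrides(
    {"title": u"a b c", "_stored_title": u"e f g", "path": u"/a"}
)
```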
cancel()
Cancels any documents/deletions added by this object and unlocks the index.
commit()
Finishes writing and unlocks the index.
delete_document(docnum, delete=True)
Deletes a document by number.
searcher(**kwargs)
Returns a searcher for the existing index.
update_document(**fields)

Adds or replaces a document. At least one of the fields for which you supply values must be marked as ‘unique’ in the index’s schema.

The keyword arguments map field names to the values to index/store.

Note that this method will only replace a committed document; currently it cannot replace documents you’ve added to the IndexWriter but haven’t yet committed. For example, if you do this:

>>> writer.update_document(unique_id=u"1", content=u"Replace me")
>>> writer.update_document(unique_id=u"1", content=u"Replacement")

...this will add two documents with the same value of unique_id, instead of the second document replacing the first.
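
One way to sidestep this limitation is to collapse pending updates by their unique field before passing them to the writer, so that each unique value is written only once per commit. The sketch below assumes the documents are held as plain dictionaries; collapse_updates is a hypothetical helper, not part of Whoosh.

```python
def collapse_updates(docs, key):
    """Keep only the last pending document for each value of the unique field."""
    latest = {}
    for doc in docs:
        # A later document with the same key replaces the earlier one;
        # the dict keeps the position of the key's first appearance.
        latest[doc[key]] = doc
    return list(latest.values())


pending = [
    {"unique_id": u"1", "content": u"Replace me"},
    {"unique_id": u"2", "content": u"Other"},
    {"unique_id": u"1", "content": u"Replacement"},
]
collapsed = collapse_updates(pending, "unique_id")
```

Each dictionary in collapsed could then be passed on with writer.update_document(**doc) before a single commit().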

For fields that are both indexed and stored, you can specify an alternate value to store using a keyword argument in the form “_stored_<fieldname>”. For example, if you have a field named “title” and you want to index the text “a b c” but store the text “e f g”, use keyword arguments like this:

writer.update_document(title=u"a b c", _stored_title=u"e f g")
class whoosh.writing.AsyncWriter(writerfn, delay=0.25, **writerargs)

Convenience wrapper for a writer object that might fail to open because the index is locked (for example, the filedb writer). This object tries once to obtain the underlying writer; if it succeeds, it simply passes method calls through to it.

If this object can’t obtain a writer immediately, it buffers delete, add, and update method calls in memory until you call commit(). At that point, it starts running in a separate thread, repeatedly trying to obtain the writer; once it succeeds, it “replays” all the buffered method calls on the writer.

In a typical scenario where you’re adding a single or a few documents to the index as the result of a Web transaction, this lets you just create the writer, add, and commit, without having to worry about index locks, retries, etc.

The first argument is a callable which returns the actual writer. Usually this will be the writer method of your Index object. Any additional keyword arguments to the initializer are passed into the callable.

For example, to get an asynchronous writer, instead of this:

>>> writer = myindex.writer(postlimit=128 * 1024 * 1024)

Do this:

>>> from whoosh.writing import AsyncWriter
>>> writer = AsyncWriter(myindex.writer, postlimit=128 * 1024 * 1024)
Parameters:
  • writerfn – a callable object (function or method) which returns the actual writer.
  • delay – the delay (in seconds) between attempts to instantiate the actual writer.
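
The buffer-and-replay behaviour described for AsyncWriter can be sketched in plain Python. This is an illustration of the pattern, not Whoosh's code: BufferedWriterSketch, LockError, FakeWriter, flaky_factory, and the join() method are all hypothetical names, and the assumption that a locked index surfaces as an exception from writerfn is mine.

```python
import threading
import time


class LockError(Exception):
    """Stand-in for whatever error the writer factory raises when locked."""


class BufferedWriterSketch:
    def __init__(self, writerfn, delay=0.25, **writerargs):
        self._writerfn, self._writerargs, self._delay = writerfn, writerargs, delay
        self._events = []   # buffered (method name, args, kwargs) tuples
        self._thread = None
        try:
            self._writer = writerfn(**writerargs)   # single immediate attempt
        except LockError:
            self._writer = None

    def _record(self, name, *args, **kwargs):
        if self._writer is not None:
            getattr(self._writer, name)(*args, **kwargs)   # pass straight through
        else:
            self._events.append((name, args, kwargs))      # remember for later

    def add_document(self, **fields):
        self._record("add_document", **fields)

    def update_document(self, **fields):
        self._record("update_document", **fields)

    def delete_document(self, docnum, delete=True):
        self._record("delete_document", docnum, delete=delete)

    def _replay(self):
        writer = None
        while writer is None:            # keep retrying until the lock is free
            try:
                writer = self._writerfn(**self._writerargs)
            except LockError:
                time.sleep(self._delay)
        for name, args, kwargs in self._events:
            getattr(writer, name)(*args, **kwargs)
        writer.commit()

    def commit(self):
        if self._writer is not None:
            self._writer.commit()
        else:
            self._thread = threading.Thread(target=self._replay)
            self._thread.start()

    def join(self):
        """Wait for the background replay, if one was started."""
        if self._thread is not None:
            self._thread.join()


# Demonstration with a fake writer whose factory is "locked" for two attempts.
calls = []
attempts = {"count": 0}


class FakeWriter:
    def add_document(self, **fields):
        calls.append(("add", fields))

    def delete_document(self, docnum, delete=True):
        calls.append(("delete", docnum))

    def commit(self):
        calls.append(("commit",))


def flaky_factory(**kwargs):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LockError()
    return FakeWriter()


w = BufferedWriterSketch(flaky_factory, delay=0.01)
w.add_document(title=u"a")
w.delete_document(5)
w.commit()
w.join()
```

The caller never sees the lock failures: the calls are queued, and the background thread applies them in their original order once a writer becomes available.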

Exceptions

exception whoosh.writing.IndexingError
