engine

Pythonic wrapper around PyLucene search engine.

Provides high-level interfaces to indexes and documents, abstracting away Java lucene primitives.

indexers

Wrappers for lucene Index{Read,Search,Writ}ers.

The final Indexer class exposes a high-level Searcher and Writer.

Analyzer

class engine.indexers.Analyzer(tokenizer, *filters)
Return a lucene Analyzer which chains together a tokenizer (or analyzer) and filters.

IndexReader

class engine.indexers.IndexReader(directory)

Delegated lucene IndexReader, with a mapping interface of ids to document objects.

Parameter: directory – lucene IndexReader or directory
__len__()
__contains__(id)
__iter__()
__getitem__(id)
__delitem__(id)
Acquires a write lock. Deleting from an IndexWriter is encouraged instead.
comparator(name, *names, **kwargs)

Return sequence of documents’ field values suitable for sorting.

Parameters:
  • name – field name
  • names – additional field names, in which case tuples of values are returned
  • default – keyword only default value
count(name, value)
Return number of documents with given term.
delete(name, value)
Delete documents with given term. Acquires a write lock. Deleting from an IndexWriter is encouraged instead.
directory
reader’s lucene Directory
docs(name, value, counts=False)
Generate doc ids which contain given term, optionally with frequency counts.
names(option='all')
Return field names, given option description.
positions(name, value, payloads=False)
Generate doc ids and positions which contain given term, optionally only with payloads.
positionvector(id, field, offsets=False)
Generate terms and positions for given doc id and field, optionally with character offsets.
spans(query, positions=False)

Generate docs with occurrence counts for a span query.

Parameters:
  • query – lucene SpanQuery
  • positions – optionally include slice positions instead of counts
terms(name, start='', stop=None, counts=False)
Generate a slice of term values, optionally with frequency counts.
termvector(id, field, counts=False)
Generate terms for given doc id and field, optionally with frequency counts.

Searcher

class engine.indexers.Searcher(arg, analyzer=None)

Mixin interface common among searchers.

__getitem__(id)
Return Document
__del__()
Closes index.
count(*query, **options)

Return number of hits for given query or term.

Parameters:
  • query – search() compatible query, or optimally a name and value
  • options – additional search() options
highlight(query, text, count=1, span=True, formatter=None, encoder=None, field='', **attrs)

Return highlighted text fragments which match the query.

Parameters:
  • query – query string or lucene Query
  • text – text string to be searched
  • count – maximum number of fragments
  • span – only highlight terms which would contribute to a hit
  • formatter – optional lucene Formatter
  • encoder – optional lucene Encoder
  • field – default query field name
  • attrs – additional attributes to set on the highlighter
parse(query, field='', op='', **attrs)

Return lucene parsed Query.

Parameters:
  • field – default query field name
  • op – default query operator (‘or’, ‘and’)
  • attrs – additional attributes to set on the parser
search(query=None, filter=None, count=None, sort=None, reverse=False, **parser)

Run query and return Hits.

Parameters:
  • query – query string or lucene Query
  • filter – doc ids or lucene Filter
  • count – maximum number of hits to retrieve
  • sort – if count is given, lucene Sort parameters, else a callable key
  • reverse – reverse flag used with sort
  • parser – parse() options

IndexSearcher

class engine.indexers.IndexSearcher(directory, analyzer=None)

Bases: engine.indexers.Searcher, IndexSearcher, engine.indexers.IndexReader

Inherited lucene IndexSearcher, with a mixed-in IndexReader.

Parameters:
  • directory – directory path or lucene Directory
  • analyzer – lucene Analyzer, default StandardAnalyzer
filters
Mapping of cached filters, which are also used for facet counts.
facets(ids, *keys)

Return mapping of document counts for the intersection with each facet.

Parameters:
  • ids – document ids
  • keys – field names, term tuples, or any keys to previously cached filters
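
The facet counting described for facets() can be pictured with plain sets. This is only a sketch under stated assumptions: the `filters` dict of id sets and the standalone `facets` function below are hypothetical stand-ins for the cached lucene filters, not the library's implementation.

```python
def facets(ids, filters, *keys):
    """Return {key: count} of ids intersecting each cached filter."""
    ids = set(ids)
    return {key: len(ids & filters[key]) for key in keys}


# Hypothetical cached filters: key -> set of matching document ids.
filters = {
    ('state', 'CA'): {0, 1, 2, 5},
    ('state', 'NY'): {3, 4},
}
counts = facets([1, 2, 3, 7], filters, ('state', 'CA'), ('state', 'NY'))
print(counts)
```

Each count is the size of the intersection between the given ids and one filter, so ids that match no filter (7 here) simply do not contribute.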

MultiSearcher

class engine.indexers.MultiSearcher(searchers, analyzer=None)

Bases: engine.indexers.Searcher, MultiSearcher

Inherited lucene MultiSearcher.

Parameters:
  • searchers – lucene Searchers or directories
  • analyzer – lucene Analyzer, default StandardAnalyzer

ParallelMultiSearcher

class engine.indexers.ParallelMultiSearcher(searchers, analyzer=None)

Bases: engine.indexers.MultiSearcher, ParallelMultiSearcher

Inherited lucene ParallelMultiSearcher.

IndexWriter

class engine.indexers.IndexWriter(directory=None, mode='a', analyzer=None, mfl=10000)

Bases: IndexWriter

Inherited lucene IndexWriter. Supports setting field parameters explicitly, so documents can be represented as dictionaries.

Parameters:
  • directory – directory path or lucene Directory, default RAMDirectory
  • mode – file mode ('r', 'w', or 'a'); updating ('+') is implied
  • analyzer – lucene Analyzer, default StandardAnalyzer
  • mfl – MaxFieldLength, default IndexWriter.DEFAULT_MAX_FIELD_LENGTH
fields
Mapping of assigned fields. May be used directly, instead of set() method, for further customization.
__del__()
Closes index.
__len__()
__iadd__(directory)
Add directory (or reader, searcher, writer) to index.
add(document=(), **terms)

Add document to index. A document consists of name: value pairs, where each value may be one or multiple strings.

Parameters:
  • document – optional document terms as a dict or items
  • terms – additional terms to document
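
To make the "one or multiple strings" rule for add() concrete, the sketch below flattens a document into individual name, value pairs. The `flatten` helper is hypothetical and purely illustrative; the real method builds lucene Fields from the registered field parameters.

```python
def flatten(document=(), **terms):
    """Yield (name, value) pairs, expanding multi-valued fields."""
    for name, values in dict(document, **terms).items():
        if isinstance(values, str):
            values = [values]  # a single string is one value, not a sequence
        for value in values:
            yield name, value


pairs = sorted(flatten({'title': 'Moby Dick'}, tag=['novel', 'whaling']))
print(pairs)
```

The document may be given as a dict or as items, and keyword terms are merged in, mirroring the documented signature.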
delete(*query, **options)

Remove documents which match given query or term.

Parameters:
  • query – query, or a name and value identifying a term
  • options – additional keyword options
parse(query, field='', op='', **attrs)

Return lucene parsed Query.

Parameters:
  • field – default query field name
  • op – default query operator (‘or’, ‘and’)
  • attrs – additional attributes to set on the parser
segments
segment filenames with document counts
set(name, cls=<class 'engine.documents.Field'>, **params)

Assign parameters to field name.

Parameters:
  • name – registered name of field
  • cls – optional Field constructor
  • params – store, index, termvector options compatible with Field

Indexer

class engine.indexers.Indexer(*args, **kwargs)

Bases: engine.indexers.IndexWriter

An all-purpose interface to an index. Creates an IndexWriter with a delegated IndexSearcher.

commit()
Commit writes and refresh searcher. Not thread-safe.

documents

Wrappers for lucene Fields and Documents.

Document

class engine.documents.Document(doc=None)

Delegated lucene Document. Provides mapping interface of field names to values, but duplicate field names are allowed.

Parameter: doc – optional lucene Document
__len__()
__contains__(name)
__iter__()
__getitem__(name)
__delitem__(name)
add(name, value, cls=<class 'engine.documents.Field'>, **params)
Add field to document with given parameters.
dict(*names, **defaults)

Return dict representation of document.

Parameters:
  • names – names of multi-valued fields to return as a list
  • defaults – include only given fields, using default values as necessary
fields()
Generate lucene Fields.
get(name, default=None)
Return field value if present, else default.
getlist(name)
Return list of all values for given field.
items()
Generate name, value pairs for all fields.
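
The mapping interface with duplicate field names can be modeled in a few lines of plain Python. This is a toy sketch, not the delegated lucene Document: the class name `SketchDocument` is hypothetical, and the handling of `defaults` (keeping only the named fields) is an assumption drawn from the parameter description.

```python
class SketchDocument:
    """Toy stand-in for a document whose field names may repeat."""

    def __init__(self, pairs=()):
        self.pairs = list(pairs)  # ordered (name, value) pairs

    def add(self, name, value):
        self.pairs.append((name, value))

    def getlist(self, name):
        return [value for key, value in self.pairs if key == name]

    def get(self, name, default=None):
        values = self.getlist(name)
        return values[0] if values else default

    def dict(self, *names, **defaults):
        # Assumption: when defaults are given, only those fields
        # (plus the multi-valued names) are included.
        result = {}
        result.update(defaults)
        for name in names:
            result[name] = self.getlist(name)
        for name, value in self.pairs:
            if name not in names and (not defaults or name in defaults):
                result[name] = value
        return result


doc = SketchDocument([('name', 'a'), ('tag', 'x')])
doc.add('tag', 'y')
print(doc.get('tag'), doc.getlist('tag'), doc.dict('tag'))
```

Naming a field in `names` returns all of its values as a list, while other fields collapse to a single value.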

Hit

class engine.documents.Hit(doc, id, score)

A Document with an id and score, from a search result.

dict(*names, **defaults)
Return dict representation of document with __id__ and __score__.

Hits

class engine.documents.Hits(searcher, ids, scores, count=0)

Search results: lazily evaluated and memory efficient. Provides a read-only sequence interface to hit objects.

Parameters:
  • searcher – Searcher which can retrieve documents
  • ids – ordered doc ids
  • scores – ordered doc scores
  • count – total number of hits
__len__()
__getitem__(index)
items()
Generate zipped ids and scores.
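
The lazy evaluation of Hits can be illustrated with a small read-only sequence that only fetches a document when it is indexed. The `LazyHits` class and the `fetch` callable are hypothetical; in the library the searcher retrieves the documents.

```python
class LazyHits:
    """Toy read-only sequence that fetches documents only on access."""

    def __init__(self, fetch, ids, scores, count=0):
        self.fetch, self.ids, self.scores = fetch, ids, scores
        self.count = count or len(ids)  # total hits may exceed retrieved ids

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return LazyHits(self.fetch, self.ids[index], self.scores[index], self.count)
        id = self.ids[index]
        return self.fetch(id), id, self.scores[index]

    def items(self):
        return zip(self.ids, self.scores)


fetched = []  # records which documents were actually loaded


def fetch(id):
    fetched.append(id)
    return {'id': id}


hits = LazyHits(fetch, [7, 3, 9], [1.5, 1.0, 0.5], count=42)
first_two = hits[:2]  # slicing does not load any documents
second = hits[1]      # only now is document 3 fetched
print(len(hits), hits.count, second, fetched)
```

Only ids and scores are held up front, which is what keeps a large result set memory efficient.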

Field

class engine.documents.Field(name, store=False, index='analyzed', termvector=False, **attrs)

Saved parameters which can generate lucene Fields given values.

Parameters:
  • name – name of field
  • store, index, termvector – field parameters, expressed as bools or strs, with lucene defaults
  • attrs – additional attributes to set on the field
items(*values)
Generate lucene Fields suitable for adding to a document.

FormatField

class engine.documents.FormatField(name, format='{0}', **kwargs)

Bases: engine.documents.Field

Field which uses string formatting on its values.

Parameter: format – format string
format(value)
Return formatted value.
items(*values)
Generate fields with formatted values.
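
One plausible use of a format string is zero-padding numbers so that their indexed string forms sort in numeric order. The sketch below shows only the string formatting step; the `formatted` helper is hypothetical and does not create lucene Fields.

```python
def formatted(format, *values):
    """Return each value rendered through the given format string."""
    return [format.format(value) for value in values]


prices = formatted('{0:08.2f}', 120, 3.5)
print(prices, sorted(prices))
```

With a fixed width, lexicographic order of the strings agrees with numeric order of the values.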

NumericField

class engine.documents.NumericField(name, step=None, store=False, index=True)

Bases: engine.documents.Field

Field which indexes numbers in a prefix tree.

Parameters:
  • name – name of field
  • step – precision step
items(*values)
Generate lucene NumericFields suitable for adding to a document.
range(start, stop, lower=True, upper=False)
Return lucene NumericRangeQuery.

PrefixField

class engine.documents.PrefixField(name, start=1, stop=None, step=1, store=False, index=True, termvector=False)

Bases: engine.documents.Field

Field which indexes every prefix of a value into a separate component field. The customizable component field names are expressed as slices. Original value may be stored for convenience.

Parameter: start, stop, step – optional slice parameters of the prefix depths (not indices)
getname(depth)
Return prefix field name for given depth.
indices(depth)
Return range of valid depth indices.
items(*values)
Generate indexed component fields. Optimized to handle duplicate values.
join(words)
Return text from separate words.
prefix(value)
Return prefix query of the closest possible prefixed field.
range(start, stop, lower=True, upper=False)
Return range query of the closest possible prefixed field.
split(text)
Return immutable sequence of words from name or value.
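
The idea of indexing every prefix into its own component field can be sketched as follows. The `prefix_items` function is hypothetical, and the `name[:depth]` naming scheme is an assumption chosen to match the statement that component names are expressed as slices; the library's actual names may differ.

```python
def prefix_items(name, value, start=1, stop=None, step=1):
    """Yield (component field name, prefix) for each selected depth."""
    depths = range(len(value) + 1)[slice(start, stop, step)]
    for depth in depths:
        yield '{0}[:{1}]'.format(name, depth), value[:depth]


items = list(prefix_items('code', 'abcd'))
print(items)
```

A prefix search for 'ab' then becomes an exact term lookup in the depth-2 component field, which is the optimization the prefix tree provides.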

NestedField

class engine.documents.NestedField(name, sep=':', **kwargs)

Bases: engine.documents.PrefixField

Field which indexes every component into its own field.

Parameter: sep – field separator used on name and values
getname(depth)
Return component field name for given depth.
join(words)
Return text from separate words.
split(text)
Return immutable sequence of words from name or value.
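
For a nested field, the same separator splits both the field name and each value, so every depth gets its own component field. The functions below are a hypothetical sketch of that behavior, assuming the name has at least as many components as the value.

```python
def split(text, sep=':'):
    """Return immutable sequence of words from name or value."""
    return tuple(text.split(sep))


def join(words, sep=':'):
    """Return text from separate words."""
    return sep.join(words)


def nested_items(name, value, sep=':'):
    """Yield (component field name, component value) for each depth."""
    names, words = split(name, sep), split(value, sep)
    for depth in range(1, len(words) + 1):
        yield join(names[:depth], sep), join(words[:depth], sep)


items = list(nested_items('tag:year:month', 'news:2009:11'))
print(items)
```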

DateTimeField

class engine.documents.DateTimeField(name, start=1, stop=None, step=1, store=False, index=True, termvector=False)

Bases: engine.documents.PrefixField

Field which indexes each datetime component in sortable ISO format: Y-m-d H:M:S. Works with datetimes, dates, and any object whose string form is a prefix of ISO.

getname(depth)
Return component field name for given depth.
join(words)
Return datetime components in ISO format.
prefix(date)
Return prefix query of the datetime.
range(start, stop, lower=True, upper=False)
Return optimal union of date range queries. May produce invalid dates, but the query is still correct.
split(text)
Return immutable sequence of datetime components.
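
Splitting and rejoining datetime components might look like the sketch below. These standalone `split` and `join` functions are hypothetical and assume values without microseconds; they rely only on the ISO string form Y-m-d H:M:S named in the description.

```python
import datetime
import re


def split(text):
    """Return tuple of datetime components from an ISO-like string."""
    return tuple(re.findall(r'\d+', str(text)))


def join(words):
    """Return components in ISO format: Y-m-d H:M:S."""
    date, time = words[:3], words[3:]
    text = '-'.join(date)
    return text + ' ' + ':'.join(time) if time else text


stamp = datetime.datetime(2009, 11, 10, 8, 30)
words = split(stamp)
print(words, join(words[:2]), join(words))
```

Because any leading run of components is itself a valid ISO prefix, a year or month query can be answered as a prefix of the full timestamp.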
within(days=0, weeks=0, utc=False, **delta)

Return date range query within current time and delta. If the delta is an exact number of days, then dates will be used.

Parameters:
  • days, weeks – number of days to offset from today
  • utc – optionally use utc instead of local time
  • delta – additional timedelta parameters
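
The rule that whole-day deltas use dates can be sketched with the standard datetime module. This standalone `within` only computes the range endpoints rather than a lucene query, and its `now` parameter is a hypothetical addition to make the example deterministic.

```python
import datetime


def within(days=0, weeks=0, now=None, **delta):
    """Return (start, stop) between the current time and a delta."""
    now = now or datetime.datetime.now()
    delta = datetime.timedelta(days, weeks=weeks, **delta)
    if delta == datetime.timedelta(delta.days):
        now = now.date()  # exact number of days: compare dates only
    start, stop = sorted([now, now + delta])
    return start, stop


now = datetime.datetime(2009, 11, 10, 8, 30)
last_week = within(days=-7, now=now)
last_half_day = within(hours=-12, now=now)
print(last_week, last_half_day)
```

Sorting the two endpoints lets the same function handle both past (negative) and future (positive) deltas.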

queries

Query wrappers and search utilities.

Query

class engine.queries.Query(base, *args)

Inherited lucene Query, with dynamic base class acquisition. Uses class methods and operator overloading for convenient query construction.

__and__(other)
<BooleanQuery +self +other>
__or__(other)
<BooleanQuery self other>
__sub__(other)
<BooleanQuery self -other>
classmethod all(*queries, **terms)
Return lucene BooleanQuery (AND) from queries and terms.
classmethod any(*queries, **terms)
Return lucene BooleanQuery (OR) from queries and terms.
filter(cache=True)
Return lucene CachingWrapperFilter, optionally just QueryWrapperFilter.
classmethod fuzzy(name, value, minimumSimilarity=0.5, prefixLength=0)
Return lucene FuzzyQuery.
classmethod multiphrase(name, *values)
Return lucene MultiPhraseQuery. None may be used as a placeholder.
classmethod phrase(name, *values)
Return lucene PhraseQuery. None may be used as a placeholder.
classmethod prefix(name, value)
Return lucene PrefixQuery.
classmethod range(name, start, stop, lower=True, upper=False)
Return lucene ConstantScoreRangeQuery, by default with a half-open interval.
classmethod span(name, value)
Return lucene SpanTermQuery.
classmethod term(name, value)
Return lucene TermQuery.
classmethod wildcard(name, value)
Return lucene WildcardQuery.
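
The operator overloading and class-method constructors can be mimicked without lucene by a toy class that renders queries as strings. The class `Q` is hypothetical and models only the clause notation shown for `__and__`, `__or__`, and `__sub__`; it does not build real BooleanQuery objects.

```python
class Q:
    """Toy query which renders boolean combinations as text."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return self.text

    @classmethod
    def term(cls, name, value):
        return cls('{0}:{1}'.format(name, value))

    @classmethod
    def all(cls, *queries, **terms):
        queries += tuple(cls.term(n, v) for n, v in sorted(terms.items()))
        return cls('(' + ' '.join('+{0}'.format(q) for q in queries) + ')')

    @classmethod
    def any(cls, *queries, **terms):
        queries += tuple(cls.term(n, v) for n, v in sorted(terms.items()))
        return cls('(' + ' '.join('{0}'.format(q) for q in queries) + ')')

    def __and__(self, other):
        return Q('(+{0} +{1})'.format(self, other))  # both required

    def __or__(self, other):
        return Q('({0} {1})'.format(self, other))    # either optional

    def __sub__(self, other):
        return Q('({0} -{1})'.format(self, other))   # other prohibited


whale, ship = Q.term('text', 'whale'), Q.term('text', 'ship')
print(whale & ship, whale | ship, whale - ship, Q.all(whale, author='melville'))
```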

BooleanQuery

class engine.queries.BooleanQuery(base, *args)
__len__()
__iter__()
__getitem__(index)
__iand__(other)
add +other
__ior__(other)
add other
__isub__(other)
add -other

SpanQuery

class engine.queries.SpanQuery(base, *args)
__getitem__(slc)
<SpanFirstQuery: spanFirst(self, other.stop)>
__sub__(other)
<SpanNotQuery: spanNot(self, other)>
__or__(*spans)
<SpanOrQuery: spanOr(spans)>
near(*spans, **kwargs)

Return lucene SpanNearQuery.

Parameters:
  • slop – default 0
  • inOrder – default True

Filter

class engine.queries.Filter(ids)

Inherited lucene Filter with a cached BitSet of ids.

bits(reader=None)

Return cached BitSet. Although this method is deprecated in Lucene, it is still used by PyLucene.

Parameter: reader – ignored IndexReader, necessary for lucene api

spatial

Geospatial fields.

Latitude/longitude coordinates are encoded into the quadkeys of MS Virtual Earth, which are also compatible with Google Maps and OSGEO Tile Map Service. See http://www.maptiler.org/google-maps-coordinates-tile-bounds-projection/.

The quadkeys are then indexed using a prefix tree, creating a cartesian tier of tiles.
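
As a rough illustration of the quadkey encoding, the sketch below follows the publicly documented MS Virtual Earth (Bing Maps) tile scheme. It is an independent, hypothetical `encode` function written for this example; the library's own Tiler may order or derive its digits differently.

```python
import math

MAX_LAT = 85.05112878  # latitude limit of the Spherical Mercator square


def encode(lat, lng, precision):
    """Return quadkey string of the given length for a WGS84 point."""
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    sin = math.sin(math.radians(lat))
    x = (lng + 180.0) / 360.0
    y = 0.5 - math.log((1 + sin) / (1 - sin)) / (4 * math.pi)
    size = 1 << precision  # tiles per side at this zoom level
    col = min(size - 1, max(0, int(x * size)))
    row = min(size - 1, max(0, int(y * size)))
    digits = []
    for bit in range(precision - 1, -1, -1):
        mask = 1 << bit
        digits.append(str(bool(col & mask) + 2 * bool(row & mask)))
    return ''.join(digits)


print(encode(45, -100, 1), encode(45, -100, 2), encode(-45, 90, 1))
```

Each additional digit subdivides the parent tile into four, so a key at one precision is always a prefix of the key for the same point at a higher precision. That property is what allows the keys to be indexed with a prefix tree.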

Tiler

class engine.spatial.Tiler(tileSize=256)

Utilities for transforming lat/lngs, projected coordinates, and tile coordinates.

coords(tile)
Return TMS coordinates of tile.
decode(tile)
Return lat/lng bounding box (bottom, left, top, right) of tile.
encode(lat, lng, precision)
Return tile from latitude, longitude and precision level.
project(lat, lon)
Convert given lat/lon in WGS84 datum to XY in Spherical Mercator (EPSG:900913).
radiate(lat, lng, distance, precision, limit=inf)
Generate tile keys within distance of given point, adjusting precision to limit the number considered.
walk(bottomleft, topright, precision)
Generate tile keys which span bounding box.
zoom(tiles)
Return reduced number of tiles, by zooming out where all sub-tiles are present.
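
Zooming out where all sub-tiles are present can be sketched on quadkey strings, assuming each tile has exactly four children formed by appending a digit 0-3. The standalone `zoom` function below is a hypothetical illustration, not the library's code.

```python
def zoom(tiles):
    """Merge complete groups of four sibling tiles into their parent."""
    tiles = set(tiles)
    changed = True
    while changed:  # repeat, since a merge may complete a higher group
        changed = False
        for parent in {tile[:-1] for tile in tiles if tile}:
            children = {parent + digit for digit in '0123'}
            if children <= tiles:
                tiles -= children
                tiles.add(parent)
                changed = True
    return sorted(tiles)


print(zoom(['00', '01', '02', '03', '10']))
```

Replacing four sibling keys with their parent covers the same area with fewer prefix queries.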

PointField

class engine.spatial.PointField(name, precision=30, **kwargs)

Bases: engine.documents.PrefixField, engine.spatial.Tiler

Geospatial points, which create a tiered index of tiles. Points must still be stored if exact distances are required upon retrieval.

Parameter: precision – zoom level, i.e., length of encoded value
items(*points)
Generate tiles from points (lng, lat).
near(lng, lat, precision=None)
Return prefix query for point at given precision.
within(lng, lat, distance, limit=4)

Return prefix queries for any tiles which could be within distance of given point.

Parameters:
  • lng, lat – point
  • distance – search radius in meters
  • limit – maximum number of tiles to consider

PolygonField

class engine.spatial.PolygonField(name, precision=30, **kwargs)

Bases: engine.spatial.PointField

PointField which implicitly supports polygons (technically linear rings of points). Differs from points in that all necessary tiles are included to match the points’ boundary. As with PointField, the tiered tiles are a search optimization, not a distance calculator.

items(*polygons)
Generate all covered tiles from polygons.