What’s new?
Version 0.7 is out!
gensim now computes LSI of the English Wikipedia (3.2 million documents) in 5 hours 14 minutes on a MacBook Pro laptop, using a one-pass incremental SVD algorithm (see the NIPS workshop paper). Be sure to check out the distributed mode, too.
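To make "one-pass" concrete, here is a minimal pure-Python sketch of a *different*, much simpler one-pass technique. It is not the incremental SVD algorithm gensim uses. The corpus is streamed exactly once while the term-by-term matrix G = A·Aᵀ is accumulated. The top eigenvectors of G (the left singular vectors of A) are then extracted by power iteration with deflation, and each singular value is the square root of the matching eigenvalue. The names `accumulate_gram` and `top_singular_pairs` are hypothetical. Because G is dense and grows with the square of the vocabulary, this sketch only suits toy vocabularies, nowhere near Wikipedia scale.

```python
import math
import random
from typing import Iterable, List, Tuple

SparseDoc = List[Tuple[int, float]]  # (term_id, weight) pairs, bag-of-words style


def accumulate_gram(corpus: Iterable[SparseDoc], num_terms: int) -> List[List[float]]:
    """Hypothetical helper: build G = A A^T in a single streaming pass over the documents."""
    gram = [[0.0] * num_terms for _ in range(num_terms)]
    for doc in corpus:  # each document is read exactly once
        for i, wi in doc:
            for j, wj in doc:
                gram[i][j] += wi * wj  # add this document's outer product
    return gram


def top_singular_pairs(gram: List[List[float]], k: int, iters: int = 200,
                       seed: int = 0) -> List[Tuple[float, List[float]]]:
    """Hypothetical helper: return the k largest (singular value, left singular vector)
    pairs of A via power iteration with deflation on the symmetric matrix G = A A^T."""
    n = len(gram)
    g = [row[:] for row in gram]  # work on a copy, since deflation modifies it
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.uniform(0.5, 1.5) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]  # start from a random unit vector
        for _ in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # the remaining matrix is (numerically) zero
            v = [x / norm for x in w]
        # Rayleigh quotient v^T G v gives the eigenvalue for the unit vector v
        gv = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * gv[i] for i in range(n))
        pairs.append((math.sqrt(max(lam, 0.0)), v))
        # deflate: subtract the found component so the next round finds the next one
        for i in range(n):
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return pairs


corpus = [[(0, 3.0)], [(1, 4.0)]]  # toy corpus: 2 terms, 2 documents
for sigma, u in top_singular_pairs(accumulate_gram(corpus, num_terms=2), k=2):
    print(round(sigma, 6), [round(x, 6) for x in u])
```

For this toy corpus the term-document matrix is diagonal with entries 3 and 4, so the sketch should report singular values close to 4 and 3. The trade-off it illustrates is memory: the single pass over documents is cheap, but the accumulated G must fit in RAM.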
For an overview of what you can (or cannot) do with gensim, go to the introduction.
For examples of how to use gensim, try the tutorials.
>>> from gensim import corpora, models, similarities
>>>
>>> # load corpus iterator from a Matrix Market file on disk
>>> corpus = corpora.MmCorpus('/path/to/corpus.mm')
>>>
>>> # initialize a transformation (Latent Semantic Indexing with 200 latent dimensions)
>>> lsi = models.LsiModel(corpus, numTopics=200)
>>>
>>> # convert the same corpus to latent space and index it
>>> index = similarities.MatrixSimilarity(lsi[corpus])
>>>
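>>> # transform a new bag-of-words document into LSI space (term ids 0 and 3 are a hypothetical example)
>>> query = lsi[[(0, 1.0), (3, 2.0)]]
>>>
>>> # perform a similarity query of this LSI vector against the whole corpus
>>> sims = index[query]  # cosine similarity of the query to each indexed document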
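The lookup `index[query]` returns one similarity score per indexed document. As a rough illustration of what that scoring means, here is a minimal brute-force sketch of cosine similarity in pure Python. It assumes vectors in gensim's sparse list-of-`(id, weight)` format. `CosineIndex` and `to_unit` are hypothetical names, and the sketch does none of the dense-matrix work the real `MatrixSimilarity` does for speed.

```python
import math
from typing import Dict, Iterable, List, Tuple

SparseVec = List[Tuple[int, float]]  # (feature_id, weight) pairs


def to_unit(vec: SparseVec) -> Dict[int, float]:
    """Hypothetical helper: sum duplicate ids, then scale the vector to unit length."""
    dense: Dict[int, float] = {}
    for fid, w in vec:
        dense[fid] = dense.get(fid, 0.0) + w
    norm = math.sqrt(sum(w * w for w in dense.values()))
    if norm == 0.0:
        return {}  # an all-zero vector scores 0 against everything
    return {fid: w / norm for fid, w in dense.items()}


class CosineIndex:
    """Hypothetical brute-force index that stores a unit-length copy of every document."""

    def __init__(self, corpus: Iterable[SparseVec]):
        self.docs = [to_unit(doc) for doc in corpus]

    def __getitem__(self, query: SparseVec) -> List[float]:
        q = to_unit(query)
        # the dot product of two unit vectors is the cosine of the angle between them
        return [sum(w * doc.get(fid, 0.0) for fid, w in q.items()) for doc in self.docs]


index = CosineIndex([[(0, 1.0), (1, 1.0)], [(0, -1.0)], [(1, 2.0)]])
print(index[[(0, 1.0)]])
```

Here the first document lies at 45 degrees to the query, the second points the opposite way, and the third is orthogonal, so the scores come out near 0.707, -1 and 0. Normalizing once at indexing time means each query costs only one dot product per document.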