Blei’s LDA-C format.
Corpus in Blei’s LDA-C format.
The corpus is represented as two files: one describing the documents, and another describing the mapping between words and their ids.
Each document is one line:
N fieldId1:fieldValue1 fieldId2:fieldValue2 ... fieldIdN:fieldValueN
The vocabulary is a file with words, one word per line; word at line K has an implicit id=K.
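The two layouts described above can be illustrated with a minimal sketch in plain Python. The helper names `parse_ldac_line` and `vocab_from_text` are hypothetical and are not part of the documented class. The sketch also assumes that lines are counted from zero (so the first vocabulary line gets id 0) and that field values are parsed as floats.

```python
def parse_ldac_line(line):
    """Parse 'N id1:val1 ... idN:valN' into a list of (id, value) pairs."""
    tokens = line.split()
    declared = int(tokens[0])  # leading N: number of fields that follow
    pairs = []
    for token in tokens[1:]:
        field_id, value = token.split(":")
        pairs.append((int(field_id), float(value)))
    if len(pairs) != declared:
        raise ValueError(f"expected {declared} fields, found {len(pairs)}")
    return pairs


def vocab_from_text(text):
    """Map each word to its implicit id (assumed zero-based line index)."""
    return dict(enumerate(text.splitlines()))


doc = parse_ldac_line("3 0:2 4:1 7:5")
id2word = vocab_from_text("apple\nbanana\ncherry\n")
```

Here `doc` holds three (id, value) pairs and `id2word` maps 0, 1 and 2 to the three words in file order.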
Initialize the corpus from a file.
fnameVocab is the file with vocabulary; if not specified, it defaults to fname.vocab.
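The default described above can be sketched as follows. `load_vocab` is a hypothetical stand-alone helper, not the class initializer itself. It only shows the fallback to fname.vocab stated in the text, and it assumes zero-based ids and UTF-8 files.

```python
import os
import tempfile


def load_vocab(fname, fname_vocab=None):
    """Return {id: word}, reading fname + '.vocab' unless a path is given."""
    if fname_vocab is None:
        fname_vocab = fname + ".vocab"  # default described in the text
    with open(fname_vocab, encoding="utf-8") as f:
        # The id of a word is its (assumed zero-based) line index.
        return dict(enumerate(line.rstrip("\n") for line in f))


# Demo with throwaway files; the file names are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, "corpus.lda-c")
    with open(fname + ".vocab", "w", encoding="utf-8") as f:
        f.write("apple\nbanana\n")
    other = os.path.join(tmp, "custom_vocab.txt")
    with open(other, "w", encoding="utf-8") as f:
        f.write("cherry\n")

    default_vocab = load_vocab(fname)          # reads corpus.lda-c.vocab
    explicit_vocab = load_vocab(fname, other)  # reads custom_vocab.txt
```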
Load a previously saved object from file (also see save).
Save the object to file via pickling (also see load).
Save a corpus in the LDA-C format.
There are actually two files saved: fname and fname.vocab, where fname.vocab is the vocabulary file.
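A minimal sketch of such a two-file save, written in plain Python rather than with the library's own method. `save_ldac` is a hypothetical name, the corpus is assumed to be a list of documents holding (id, value) pairs, and using "---" as a placeholder for ids missing from the mapping is a choice made for this illustration.

```python
import os
import tempfile


def save_ldac(fname, corpus, id2word):
    """Write documents to fname and the vocabulary to fname + '.vocab'."""
    with open(fname, "w", encoding="utf-8") as f:
        for doc in corpus:
            # One line per document: field count, then id:value pairs.
            fields = [f"{word_id}:{value}" for word_id, value in doc]
            f.write(" ".join([str(len(doc))] + fields) + "\n")

    num_terms = max(id2word) + 1 if id2word else 0
    with open(fname + ".vocab", "w", encoding="utf-8") as f:
        # Line K holds the word with id K, so gaps need a placeholder.
        for word_id in range(num_terms):
            f.write(id2word.get(word_id, "---") + "\n")


corpus = [[(0, 1), (2, 3)], [(1, 2)]]
id2word = {0: "apple", 1: "banana", 2: "cherry"}

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, "corpus.lda-c")
    save_ldac(fname, corpus, id2word)
    with open(fname, encoding="utf-8") as f:
        corpus_text = f.read()
    with open(fname + ".vocab", encoding="utf-8") as f:
        vocab_text = f.read()
```

After the call, `corpus_text` contains one line per document and `vocab_text` contains one word per line, in id order.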