LingPy

This documentation is for version 2.0.dev, which is not released yet.

lingpy.compare.lexstat.LexStat

class lingpy.compare.lexstat.LexStat(filename, **keywords)

Basic class for automatic cognate detection.

Parameters

filename : str

The name of the file to load.

model : Model

The sound-class model used for the analysis. Defaults to the SCA sound-class model.
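A sound-class model maps individual sound segments onto a smaller set of classes, so that similar sounds are treated alike during comparison. The minimal sketch below illustrates the idea with a tiny, hypothetical mapping. It is not the class inventory of the SCA model, and `TOY_MODEL` and `to_sound_classes()` are names invented for this example.

```python
# TOY_MODEL is a hypothetical, heavily reduced sound-class mapping used only
# for illustration; it is NOT the inventory of the SCA model.
TOY_MODEL = {
    "p": "P", "b": "P", "f": "P",   # labials
    "t": "T", "d": "T",             # dental stops
    "k": "K", "g": "K",             # velar stops
    "m": "M", "n": "N",             # nasals
    "a": "V", "e": "V", "i": "V", "o": "V", "u": "V",  # vowels collapsed
}


def to_sound_classes(segments: list[str], model: dict[str, str]) -> list[str]:
    """Replace each segment by its class; unknown segments map to '0' (an arbitrary choice here)."""
    return [model.get(segment, "0") for segment in segments]


print(to_sound_classes(list("bati"), TOY_MODEL))  # ['P', 'V', 'T', 'V']
print(to_sound_classes(list("pede"), TOY_MODEL))  # ['P', 'V', 'T', 'V']
```

Under this toy mapping, `bati` and `pede` receive the same class string, which is the kind of abstraction that lets related but phonetically different words be compared.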

merge_vowels : bool (default=True)

Whether consecutive vowels are merged into a single token or kept as separate tokens.
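The effect of merging vowels can be sketched as follows. This is a simplified illustration, assuming one character per segment and a hypothetical five-vowel inventory; real tokenization also has to handle multi-character IPA segments and diacritics. `tokenize_segments()` is a hypothetical helper, not part of LingPy.

```python
# Hypothetical, simplified vowel inventory for illustration only.
VOWELS = set("aeiou")


def tokenize_segments(word: str, merge_vowels: bool = True) -> list[str]:
    """Split a word into one-character segments, optionally merging vowel runs."""
    tokens: list[str] = []
    for char in word:
        # True if the last token so far consists only of vowels.
        previous_is_vowel_run = bool(tokens) and all(c in VOWELS for c in tokens[-1])
        if merge_vowels and char in VOWELS and previous_is_vowel_run:
            tokens[-1] += char  # extend the current vowel run
        else:
            tokens.append(char)  # start a new token
    return tokens


print(tokenize_segments("tauke"))                      # ['t', 'au', 'k', 'e']
print(tokenize_segments("tauke", merge_vowels=False))  # ['t', 'a', 'u', 'k', 'e']
```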

transform : dict

A dictionary that specifies how prosodic strings are simplified (or, more generally, transformed). Each key is an original prosodic context, and its value is the context it is mapped to. Prosodic strings (see prosodic_string()) currently distinguish 11 prosodic contexts. Since not all of them are helpful in preliminary analyses for cognate detection, it is useful to merge some of them. The default settings distinguish only 5 of the 11 available contexts, namely:

  • C for all consonants in prosodically ascending position,
  • c for all consonants in prosodically descending position,
  • V for all vowels,
  • T for all tones, and
  • _ for word-breaks.
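The sketch below shows how such a key-value mapping collapses a prosodic string onto the five default contexts. The 11 source labels used here (A, B, C, L, M, N, X, Y, Z, T, _) are assumed stand-ins for the contexts; check the symbols that prosodic_string() actually returns in your version before reusing the mapping. `TRANSFORM` and `simplify_prosody()` are hypothetical names.

```python
# Assumed context labels: A/B/C stand for ascending consonant contexts,
# L/M/N for descending consonant contexts and X/Y/Z for vowel contexts.
# Verify them against prosodic_string() before relying on this mapping.
TRANSFORM = {
    "A": "C", "B": "C", "C": "C",   # ascending consonants -> C
    "L": "c", "M": "c", "N": "c",   # descending consonants -> c
    "X": "V", "Y": "V", "Z": "V",   # vowels -> V
    "T": "T",                       # tones
    "_": "_",                       # word breaks
}


def simplify_prosody(prosodic_string: str, transform: dict[str, str]) -> str:
    """Map every prosodic context symbol to its simplified value."""
    return "".join(transform[symbol] for symbol in prosodic_string)


print(simplify_prosody("AXBZN", TRANSFORM))  # CVCVc
```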

check : bool (default=False)

If set to True, the input file is first checked for errors before the calculation is carried out. Errors are written to the file errors.log.
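The check-then-log pattern can be sketched as below. This page does not say which errors are detected, so the two checks used here (wrong column count, empty cells) are assumptions chosen for illustration, and `check_wordlist()` as well as the column names are hypothetical.

```python
import tempfile
from pathlib import Path


def check_wordlist(rows: list[list[str]], log_path: Path) -> bool:
    """Log problems in a tabular word list; return True if none were found.

    The checks below (column count, empty cells) are illustrative
    assumptions, not the checks LexStat actually performs.
    """
    header, *body = rows
    errors: list[str] = []
    for line_no, row in enumerate(body, start=2):  # line 1 is the header
        if len(row) != len(header):
            errors.append(f"line {line_no}: expected {len(header)} columns, got {len(row)}")
        elif any(not cell.strip() for cell in row):
            errors.append(f"line {line_no}: empty cell")
    # One message per line; the file is written even if it stays empty.
    log_path.write_text("".join(f"{msg}\n" for msg in errors), encoding="utf-8")
    return not errors


# Hypothetical column names and data for the demonstration.
rows = [
    ["ID", "DOCULECT", "CONCEPT", "IPA"],
    ["1", "German", "hand", "hant"],
    ["2", "English", "hand"],  # missing IPA column
]
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "errors.log"
    ok = check_wordlist(rows, log)
    report = log.read_text(encoding="utf-8")

print(ok)      # False
print(report)  # line 3: expected 4 columns, got 3
```

Checking before the calculation means that malformed rows are reported in one place instead of surfacing later as errors in the middle of the analysis.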

Notes

Instantiating this class requires only a few parameters. However, users can modify its behaviour by providing additional attributes in the input file.

Methods

add_entries(entry, source, function[, override]) Add new entry types to the word list by modifying existing ones.
align_pairs(idxA, idxB[, method, mode, gop, ...]) Align all or some words of a given pair of languages.
calculate(data[, taxa, concepts, cognates, ...]) Calculate specific data.
cluster([method, cluster_method, threshold, ...]) Cluster words into cognate sets (flat clustering).
get_dict([col, row, entry]) Return dictionaries of the cells matched by the indices.
get_entries(entry) Return all entries matching the given entry type as a two-dimensional list.
get_etymdict([ref, entry, loans]) Return an etymological dictionary representation of the word list.
get_list([row, col, entry, flat]) Return lists of rows and columns specified by their name.
get_paps([ref, entry, missing]) Return a list of presence-absence patterns for the word list.
get_random_distances([method, runs, mode, ...]) Calculate random scores for unrelated words in the dataset.
get_scorer([method, ratio, vscale, runs, ...]) Create a scoring function based on sound correspondences.
output(fileformat, **keywords) Write the word list to a file.
pickle() Store a dump of the data in a binary file.
tokenize([ortho_profile, source, target]) Tokenize the data with the help of orthography profiles.
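cluster() groups words into cognate sets by flat clustering, that is, by cutting a clustering at a threshold. The sketch below shows one way to do this: average-linkage (UPGMA-style) agglomeration that stops once the closest pair of clusters is farther apart than the threshold. It assumes a precomputed, symmetric matrix of pairwise word distances for a single concept; `flat_upgma()` is a hypothetical helper and not necessarily the method or defaults that LexStat uses.

```python
from itertools import combinations


def flat_upgma(distances: list[list[float]], threshold: float) -> list[int]:
    """Average-linkage agglomerative clustering cut at `threshold`.

    Returns one cluster label per item. Clusters are merged while the
    smallest average pairwise distance between two clusters is <= threshold.
    """
    clusters: list[list[int]] = [[i] for i in range(len(distances))]

    def average_distance(a: list[int], b: list[int]) -> float:
        return sum(distances[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        (x, y), best = min(
            ((pair, average_distance(clusters[pair[0]], clusters[pair[1]]))
             for pair in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        clusters[x].extend(clusters[y])
        del clusters[y]  # y > x, so index x stays valid

    labels = [0] * len(distances)
    for label, members in enumerate(clusters):
        for item in members:
            labels[item] = label
    return labels


# Hypothetical distances between four words for one concept.
matrix = [
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.3],
    [0.8, 0.9, 0.3, 0.0],
]
print(flat_upgma(matrix, threshold=0.6))  # [0, 0, 1, 1]
```

Raising the threshold merges more words into the same set (at 0.9 all four words end up together), so the threshold directly controls how liberal the cognate judgments are.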
