LingPy

This documentation is for version 2.0.dev, which is not released yet.

lingpy.compare.lexstat.LexStat.cluster

LexStat.cluster(method='sca', cluster_method='upgma', threshold=0.55, scale=0.5, factor=0.3, restricted_chars='_T', mode='overlap', verbose=False, gop=-2, restriction='', **keywords)

Function for flat clustering of words into cognate sets.

Parameters :

method : {‘sca’, ‘lexstat’, ‘edit-dist’, ‘turchin’} (default=‘sca’)

Select the method that shall be used for the calculation.

cluster_method : {‘upgma’, ‘single’, ‘complete’} (default=‘upgma’)

Select the cluster method. ‘upgma’ (Sokal1958) refers to average linkage clustering.

threshold : float (default=0.55)

Select the threshold for the cluster approach. If set to False, an automatic threshold is derived from the average distance of unrelated sequences (use with care).
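
To make the interplay of cluster_method and threshold concrete, here is a minimal pure-Python sketch of flat agglomerative clustering. It is not LingPy's implementation: the function name flat_cluster, the sample distance matrix, and the choice to stop merging once the closest pair is at or above the threshold are all assumptions made for illustration.

```python
from itertools import combinations


def flat_cluster(matrix, threshold, linkage="upgma"):
    """Merge the closest pair of clusters until no pair of clusters
    is closer than `threshold` (illustrative sketch, not LingPy code)."""
    clusters = [[i] for i in range(len(matrix))]

    def dist(a, b):
        pairs = [matrix[i][j] for i in a for j in b]
        if linkage == "single":
            return min(pairs)           # nearest members decide
        if linkage == "complete":
            return max(pairs)           # farthest members decide
        return sum(pairs) / len(pairs)  # 'upgma': average linkage

    while len(clusters) > 1:
        # find the pair of clusters with the smallest distance
        d, x, y = min(
            (dist(clusters[x], clusters[y]), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        if d >= threshold:  # assumed stopping rule
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]     # x < y, so index x stays valid
    return sorted(sorted(c) for c in clusters)


# Hypothetical pairwise distances between four words
matrix = [
    [0.0, 0.2, 0.5, 0.9],
    [0.2, 0.0, 0.7, 0.9],
    [0.5, 0.7, 0.0, 0.9],
    [0.9, 0.9, 0.9, 0.0],
]
result_upgma = flat_cluster(matrix, 0.55, "upgma")
result_single = flat_cluster(matrix, 0.55, "single")
print(result_upgma)   # [[0, 1], [2], [3]]
print(result_single)  # [[0, 1, 2], [3]]
```

With the same threshold, single linkage joins word 2 to the first cluster because one close neighbour suffices, whereas average linkage keeps it apart.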

scale : float (default=0.5)

Select the scale for the gap extension penalty.

factor : float (default=0.3)

Select the factor for extra scores for identical prosodic segments.

restricted_chars : str (default=‘_T’)

Select the restricted chars (boundary markers) in the prosodic strings in order to enable secondary alignment.

mode : {‘global’, ‘local’, ‘overlap’, ‘dialign’} (default=‘overlap’)

Select the mode for the alignment analysis.

verbose : bool (default=False)

Define whether verbose output should be used or not.

gop : int (default=-2)

If ‘sca’ is selected as a method, define the gap opening penalty.
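
The following sketch shows one common way gop and scale can combine into the cost of a gap. It assumes that opening a gap costs gop and that every further gap position costs gop * scale; this scheme and the helper name gap_cost are assumptions for illustration, not a statement of how LingPy computes its scores.

```python
def gap_cost(length, gop=-2, scale=0.5):
    """Score of a gap spanning `length` positions: the full penalty for
    opening it, plus a scaled penalty per extension (assumed scheme)."""
    if length <= 0:
        return 0.0
    return gop + (length - 1) * gop * scale


costs = [gap_cost(n) for n in (1, 2, 3)]
print(costs)  # [-2.0, -3.0, -4.0]
```

Under this assumption, a smaller scale makes long gaps cheaper relative to several short ones.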

restriction : {‘cv’} (default=‘’)

Specify the restriction for calculations using the edit distance. Currently, only ‘cv’ is supported. If ‘edit-dist’ is selected as method and restriction is set to ‘cv’, consonant-vowel matches are prohibited in the calculations, and the edit distance is normalized by the length of the alignment rather than by the length of the longest sequence, as described in Heeringa2006.
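
The effect of the ‘cv’ restriction can be sketched in plain Python as follows. This is an illustrative reimplementation, not LingPy code: the helper names, the simplified vowel set, and the tie-breaking rule (among equally cheap alignments, take the shortest) are assumptions.

```python
VOWELS = set("aeiou")  # simplified vowel inventory (assumption)


def edit_dist(seq_a, seq_b, restriction=""):
    """Return (distance, alignment length) of the cheapest alignment."""
    n, m = len(seq_a), len(seq_b)
    # each cell stores (cost, alignment length) of the best path so far
    table = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0:  # gap in seq_b
                cost, length = table[i - 1][j]
                options.append((cost + 1, length + 1))
            if j > 0:  # gap in seq_a
                cost, length = table[i][j - 1]
                options.append((cost + 1, length + 1))
            if i > 0 and j > 0:  # match or substitution
                a, b = seq_a[i - 1], seq_b[j - 1]
                cv_clash = (a in VOWELS) != (b in VOWELS)
                if not (restriction == "cv" and cv_clash):
                    cost, length = table[i - 1][j - 1]
                    options.append((cost + (a != b), length + 1))
            # lowest cost wins; ties go to the shorter alignment
            table[i][j] = min(options)
    return table[n][m]


def normalized(seq_a, seq_b, restriction=""):
    """Normalize by alignment length under 'cv', else by the longest sequence."""
    if not seq_a and not seq_b:
        return 0.0
    dist, ali_len = edit_dist(seq_a, seq_b, restriction)
    if restriction == "cv":
        return dist / ali_len
    return dist / max(len(seq_a), len(seq_b))


plain = normalized("ta", "at")
restricted = normalized("ta", "at", "cv")
print(plain)  # 1.0
```

For "ta" and "at", the unrestricted distance substitutes both segments (2 edits over a longest length of 2). With ‘cv’, pairing t with a is forbidden, so the alignment becomes t a - against - a t: 2 edits over an alignment of length 3.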
