LingPy

This documentation is for version 2.0.dev, which is not released yet.

lingpy.sequence.orthography.OrthographyParser

class lingpy.sequence.orthography.OrthographyParser(orthography_profile)

Class for orthographic parsing using orthography profiles as designed for the QLC project.

Parameters :

orthography_profile : file

An orthography profile specific to a document source.

Notes

The OrthographyParser reads in an orthography profile and calls a helper class to build a trie data structure, which stores the possible Unicode character combinations that are specified in the orthography profile and appear in the data source.
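
As a rough illustration of the trie mentioned above, the sketch below stores a handful of graphemes in a nested dictionary and looks up the longest grapheme that starts at a given position. It is not LingPy's helper class: the names build_trie and longest_prefix, the nested-dict layout, and the None end marker are all assumptions made for this example.

```python
def build_trie(graphemes):
    """Build a nested-dict trie; the key None marks the end of a grapheme.

    Hypothetical helper for illustration, not part of LingPy.
    """
    root = {}
    for grapheme in graphemes:
        node = root
        for char in grapheme:
            # Descend into the child for this character, creating it if needed.
            node = node.setdefault(char, {})
        node[None] = True  # sentinel: a complete grapheme ends at this node
    return root


def longest_prefix(trie, text, start=0):
    """Return the longest stored grapheme beginning at text[start], or ''."""
    node = trie
    best = ""
    for i in range(start, len(text)):
        node = node.get(text[i])
        if node is None:
            break  # no stored grapheme continues with this character
        if None in node:
            best = text[start:i + 1]  # remember the longest complete match
    return best


# A toy profile in which both <u> and <uu> are graphemes.
trie = build_trie(["uu", "u", "b", "o"])
first = longest_prefix(trie, "uubo")      # longest match at position 0
second = longest_prefix(trie, "uubo", 2)  # longest match at position 2
```

Because the lookup keeps walking while characters match, the longer grapheme <uu> is preferred over <u> when both are present in the profile.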

For example, an orthography profile might specify that in source X, <uu> is a single grapheme (in Unicode parlance, a tailored grapheme) and should therefore be chunked as one unit. Given an orthography profile and some data to parse, the process would look like this:

input string example: uubo

output string example: # uu b o #

where the output is given in QLC string format.
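
The chunking shown above can be approximated with a greedy longest-match scan. The sketch below is a simplified stand-in, not the parser's actual code: it uses a plain set instead of a trie, the function name segment is hypothetical, and raising ValueError for characters missing from the profile is an assumption (the source does not say how the current methods report failures).

```python
def segment(text, graphemes):
    """Split text into profile graphemes and return a QLC-style string.

    Hypothetical sketch: greedy longest match over a set of graphemes.
    """
    graphemes = set(graphemes)
    max_len = max(map(len, graphemes), default=0)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in graphemes:
                tokens.append(candidate)
                i += size
                break
        else:
            # Assumed behaviour: fail loudly on material not in the profile.
            raise ValueError(f"no grapheme in profile matches at index {i}: {text[i]!r}")
    # Word boundaries are marked with '#', graphemes are space-separated.
    return "# " + " ".join(tokens) + " #"


parsed = segment("uubo", ["uu", "u", "b", "o"])
print(parsed)  # prints "# uu b o #"
```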

Additionally, if an orthography profile specifies a second column (the first lists the graphemes in a given source), this class assumes that the second column gives the IPA translation of those graphemes. A dictionary is created that maps source-specific graphemes to their IPA counterparts.
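
A minimal sketch of such a grapheme-to-IPA dictionary follows. The profile file format is not described here, so the tab delimiter is an assumption, as are the function name read_profile and the IPA values in the toy profile, which are made up purely for illustration.

```python
def read_profile(lines, delimiter="\t"):
    """Read profile rows into a grapheme list and a grapheme-to-IPA dict.

    Hypothetical helper; assumes one grapheme per row, with an optional
    second column holding its IPA counterpart.
    """
    graphemes = []
    to_ipa = {}
    for line in lines:
        columns = line.rstrip("\r\n").split(delimiter)
        if not columns[0]:
            continue  # skip blank rows
        graphemes.append(columns[0])
        if len(columns) > 1:
            to_ipa[columns[0]] = columns[1]
    return graphemes, to_ipa


# Toy two-column profile; the IPA values are illustrative only.
profile_rows = ["uu\tuː", "b\tb", "o\tɔ"]
graphemes, to_ipa = read_profile(profile_rows)

# Translate an already segmented string grapheme by grapheme.
ipa_tokens = [to_ipa[g] for g in ["uu", "b", "o"]]
```

With a one-column profile the dictionary stays empty, which is one simple way to detect that no IPA translation is available.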

Deprecated methods in this class return a tuple of (True or False, parsed-string). The first element in the tuple indicates whether the string was parsed successfully. The second element is the parsed string.

Methods

exists_multiple_columns() Returns boolean of whether multiple columns exist in the orthography profile, e.g.
graphemes_to_ipa(string) Returns the parsed and formatted string given the orthography profile.
parse_formatted_string_to_ipa_string(string) Deprecated function to parse formatted string into graphemes.
parse_graphemes(string) Parses the graphemes specified in the orthography profile, given a string.
parse_string_to_graphemes(string) Deprecated function that parses a string and returns a tuple of graphemes.
parse_string_to_graphemes_string(string) Deprecated method that returns the parsed str in a tuple.
parse_string_to_graphemes_string_DEPRECATED(string) Deprecated function to parse str into a tuple of (success, parsed str).
parse_string_to_ipa_phonemes(string) Deprecated function that parses a string and returns a tuple of success and phonemes.
parse_string_to_ipa_string(string) Deprecated function to parse str into a tuple of success and phonemes.
