API documentation: exported types and functions

Types

Functions

Public functions implemented for all subtypes of OrthographicSystem.

Orthography.codepoints (Function)

Delegate to a specific method based on the orthography trait value of the argument's type.

codepoints(x)

It is an error to invoke the codepoints function on anything but an orthographic system.

codepoints(_, x)

Orthographic systems must implement codepoints.

codepoints(_, ortho)

Implement codepoints function for SimpleAscii.

codepoints(ortho)

Implement codepoints function for SimpleAscii.

codepoints(ortho)
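The delegation described above follows a trait-dispatch pattern, which can be sketched as below. This is a minimal illustration under stated assumptions, not the package's actual source: the trait types `IsOrthography` and `NotOrthography`, the function `orthographytrait`, and the types `ToyAscii` and `Unfinished` are hypothetical names, and `OrthographicSystem` is redefined locally so the example is self-contained.

```julia
# Hypothetical trait types; illustrative names, not the package's internals.
abstract type OrthographyTrait end
struct IsOrthography <: OrthographyTrait end
struct NotOrthography <: OrthographyTrait end

# Local stand-in for the package's abstract type.
abstract type OrthographicSystem end

# Nothing is an orthography by default; subtypes of OrthographicSystem are.
orthographytrait(::Type) = NotOrthography()
orthographytrait(::Type{<:OrthographicSystem}) = IsOrthography()

# Public entry point: delegate on the trait value of the argument's type.
codepoints(x::T) where {T} = codepoints(orthographytrait(T), x)

# Anything that is not an orthographic system is an error.
codepoints(::NotOrthography, x) =
    throw(DomainError(x, "codepoints requires an orthographic system"))

# Orthographic systems that supply no method of their own end up here.
codepoints(::IsOrthography, ortho) =
    throw(DomainError(ortho, "$(typeof(ortho)) must implement codepoints"))

# A toy orthography that satisfies the contract.
struct ToyAscii <: OrthographicSystem
    chars::String
end
codepoints(ortho::ToyAscii) = ortho.chars

# A toy orthography that forgets to implement codepoints.
struct Unfinished <: OrthographicSystem end
```

With these definitions, `codepoints(ToyAscii("abc"))` dispatches to the concrete method, while `codepoints(42)` and `codepoints(Unfinished())` each raise a `DomainError` from the corresponding fallback.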

Orthography.tokentypes (Function)

Delegate to a specific method based on the orthography trait value of the argument's type.

tokentypes(x)

It is an error to invoke the tokentypes function on anything but an orthographic system.

tokentypes(_, x)

Orthographic systems must implement tokentypes.

tokentypes(_, ortho, s)

Implement tokentypes function for SimpleAscii.

tokentypes(ortho)

Implement tokentypes function for WSTokenizer.

tokentypes(ortho)

Orthography.validcp (Function)

True if ch appears in the list of all valid characters (codepoints) for this orthography.

validcp(ch, ortho)

ch is a string possibly including more than one Julia Char but representing a single character in the orthographic system ortho.
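To illustrate why ch is a string rather than a Char, here is a minimal sketch of a membership check of this kind. The type `ToyOrthography`, its `validchars` field, and the function `toy_validcp` are hypothetical names chosen for the example; how the package stores its valid characters is not stated here.

```julia
# Hypothetical orthography whose valid characters are stored as strings,
# so that one "character" may span several Julia Chars.
struct ToyOrthography
    validchars::Vector{String}
end

# Sketch of a validcp-style check: plain membership in the valid list.
toy_validcp(ch::AbstractString, ortho::ToyOrthography) = ch in ortho.validchars

# "e\u0301" is the letter e followed by a combining acute accent:
# one character for the reader, two Chars for Julia.
ortho = ToyOrthography(["a", "e", "e\u0301"])

length("e\u0301")              # 2
toy_validcp("e\u0301", ortho)  # true
toy_validcp("z", ortho)        # false
```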

Orthography.tokenize (Function)

Delegate to a specific method based on the orthography trait value of the argument's type.

tokenize(s, x)

It is an error to invoke the tokenize function on anything but an orthographic system.

tokenize(_, s, x)

Orthographic systems must implement tokenize.

tokenize(_, s, ortho)

Tokenize citable node psg using the tokenizer of the given orthographic system.

tokenize(psg, ortho; edition, exemplar)

The return value is a list of pairings of a CitablePassage and a token category. Each resulting passage is citable at the level of the token.
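The shape of this return value can be sketched as follows. This is a rough illustration only: `ToyPassage`, `toy_tokenize`, the plain-string references, the whitespace splitting, and the single `:lexical` category are all assumptions made for the example and do not reflect the package's own types or its citation scheme.

```julia
# Hypothetical stand-in for a citable passage: a reference plus its text.
struct ToyPassage
    ref::String
    text::String
end

# Split on whitespace and give every token its own reference by appending
# a running index to the reference of the passage it came from.
# Every token is labelled :lexical here; a real tokenizer would classify.
function toy_tokenize(psg::ToyPassage)
    [(ToyPassage("$(psg.ref).$(i)", tkn), :lexical)
     for (i, tkn) in enumerate(split(psg.text))]
end

tokens = toy_tokenize(ToyPassage("1.1", "arma virumque cano"))
tokens[2][1].ref   # "1.1.2"
tokens[2][1].text  # "virumque"
tokens[2][2]       # :lexical
```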

Tokenize corpus c using the tokenizer of the given orthographic system.

tokenize(c, ortho; edition, exemplar)

The return value is a list of pairings of a CitablePassage and a token category. Each resulting passage is citable at the level of the token.

Tokenize document doc using the tokenizer of the given orthographic system.

tokenize(doc, ortho; edition, exemplar)

The return value is a list of pairings of a CitablePassage and a token category. Each resulting passage is citable at the level of the token.

Implement tokenize function for SimpleAscii orthography.

tokenize(s, o)

Implement tokenize function for WSTokenizer orthography.

tokenize(s, o)

Working with text corpora:

Orthography.corpus_histo (Function)

Create an ordered dictionary counting the occurrences of each token text value in corpus c. Optional parameters let you filter the results to include only tokens of a specified type and normalize the text value of tokens before counting.

corpus_histo(c, ortho; filterby, normalizer)
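The counting step, with its optional filter and normalizer, might look roughly like the sketch below. It is not the package's implementation: `toy_histo` is a hypothetical name, tokens are assumed to be plain `(text, category)` tuples, and the result is a sorted vector of pairs rather than an ordered dictionary so that the example needs nothing beyond Base.

```julia
# Count token text values, optionally keeping a single category and
# normalizing the text before counting.
function toy_histo(tokens; filterby = nothing, normalizer = identity)
    counts = Dict{String, Int}()
    for (text, category) in tokens
        # Skip tokens of other categories when a filter is given.
        filterby === nothing || category == filterby || continue
        key = normalizer(text)
        counts[key] = get(counts, key, 0) + 1
    end
    # Sort by descending count, breaking ties alphabetically.
    sort(collect(counts); by = p -> (-p.second, p.first))
end

tokens = [("The", :lexical), ("the", :lexical),
          (",", :punctuation), ("cat", :lexical)]

toy_histo(tokens; filterby = :lexical, normalizer = lowercase)
# ["the" => 2, "cat" => 1]
```

Without the keyword arguments, "The" and "the" are counted separately and the comma is included.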

Other utilities

Example implementation