Concise summary of typical XML usage
Our XML markup falls into four tiers:
- transcription level: markup we use to indicate the editorial status of our reading
- tokenization level: words, numbers, abbreviations, scribal multiforms, or other tokens that our parser should recognize as a single unit
- editorial disambiguation level: named entities
- discourse disambiguation level: quotations and citable references. (NB: we don’t expect to find any examples in our work on the Aratus Latinus.)
At each level, the following TEI elements are allowed:
Transcription level
Element | Meaning | Example | Comments |
---|---|---|---|
unclear | There are traces of a letter or letters, but not enough to be certain how to read them. | <unclear>et</unclear> | Some traces visible, you think the reading is et |
gap | Letters are missing or completely illegible due to damage | <gap/> | Note that gap is an empty element with no content. |
del | Text deleted by scribe (e.g., with underdots) | et <del>et</del> | The scribe deleted the second et |
add | Text added by the scribe (e.g., above the line) | <add>et</add> | et was not part of the original text but was added by the scribe |
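
To make the effect of these four elements concrete, here is a minimal Python sketch of how a reader's text could be derived from a transcription. It is an illustration only, not our project's parser. The function name `reading_text`, the `[...]` gap marker, and the wrapping `ab` element are all assumptions made for the example, and the fragment is assumed to carry no TEI namespace.

```python
import xml.etree.ElementTree as ET


def reading_text(fragment: str, gap_marker: str = "[...]") -> str:
    """Return the text as the scribe left it (hypothetical helper).

    - del: content is omitted (the scribe deleted it)
    - add, unclear: content is kept
    - gap: replaced by gap_marker (an assumed convention)
    """
    # Wrap the fragment so it parses as a single XML element.
    root = ET.fromstring(f"<ab>{fragment}</ab>")

    def walk(el: ET.Element) -> str:
        if el.tag == "del":
            return ""
        if el.tag == "gap":
            return gap_marker
        out = el.text or ""
        for child in el:
            # The tail is the text that follows the child's closing tag.
            out += walk(child) + (child.tail or "")
        return out

    # Collapse the whitespace left behind by removed elements.
    return " ".join(walk(root).split())


print(reading_text("et <del>et</del> stellas"))  # et stellas
```

Note that the sketch keeps the content of unclear without flagging it; a real tool would probably want to record that uncertainty rather than discard it.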
Tokenization level
Element | Meaning | Example | Comments |
---|---|---|---|
num | One or more numeric characters. | <num value="7">VII</num> | Put the numeric value in the value attribute. Note that you should not tag number words like septem! |
w | Wrap any word that is broken up by markup. | <w>stell<unclear>as</unclear></w> | The letters as are unclear, but we want our parser to recognize stellas as a single word. |
choice containing abbr and expan | Text includes an abbreviation; you are including an expansion for it | <choice><abbr>dr</abbr><expan>dicitur</expan></choice> | The scribe writes dr (perhaps including a distinct mark signaling an abbreviation); you interpret it to mean dicitur |
choice containing sic and corr | Scribe deletes original reading, inserts a correction | <choice><sic>medios</sic><corr>medias</corr></choice> | Scribe originally wrote medios but changed the text to medias |
choice containing orig and reg | Scribe offers an alternate second reading without deleting the original reading. | <choice><orig>eam</orig><reg>Helicem</reg></choice> | Scribe wrote eam in the main text, then added Helicem as an alternative above it, but did not delete eam. |
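
The following Python sketch shows why the w wrapper matters to a tokenizer. It is a simplified illustration under stated assumptions, not our actual parser: the function name `tokens`, the `(kind, value)` tuple format, and the wrapping `ab` element are invented for the example, and only the elements from the table above are handled.

```python
import xml.etree.ElementTree as ET


def tokens(fragment: str) -> list:
    """Split a fragment into (kind, value) tokens (hypothetical format)."""
    root = ET.fromstring(f"<ab>{fragment}</ab>")
    out = []

    def add_words(text) -> None:
        # Plain text is split on whitespace, one token per chunk.
        out.extend(("word", t) for t in (text or "").split())

    add_words(root.text)
    for child in root:
        if child.tag == "w":
            # Everything inside w is one word, whatever markup it contains.
            out.append(("word", "".join(child.itertext())))
        elif child.tag == "num":
            # The numeric value comes from the value attribute.
            out.append(("num", int(child.get("value"))))
        elif child.tag == "choice":
            # Keep both alternatives together as a single token.
            pair = tuple("".join(c.itertext()) for c in child)
            out.append(("choice", pair))
        else:
            add_words("".join(child.itertext()))
        add_words(child.tail)
    return out


# Without w, the markup splits the word in two:
print(tokens("stell<unclear>as</unclear>"))
# [('word', 'stell'), ('word', 'as')]

# With w, the parser sees a single word:
print(tokens("<w>stell<unclear>as</unclear></w>"))
# [('word', 'stellas')]
```

Keeping both members of a choice in one token leaves the decision about which reading to display to a later step.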
Editorial disambiguation level
Named entities with an @n attribute whose value is a URN:
Element | Meaning | Example | Comments |
---|---|---|---|
persName | Proper name of a person | <persName n="urn:cite2:hmt:persname.v1:pers1">Achilles</persName> | Include a URN value in the n attribute |
placeName | Proper name of a place (real or imaginary) | <placeName n="urn:cite2:hmt:place.v1:place1">Athenas</placeName> | Include a URN value in the n attribute |
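
A quick consistency check for named entities might look like the Python sketch below. It is only an illustration: the function name `named_entities` and the wrapping `ab` element are assumptions, and the test that every URN begins with `urn:cite2:` is inferred from the two examples in the table rather than from a stated project rule.

```python
import xml.etree.ElementTree as ET

ENTITY_TAGS = ("persName", "placeName")


def named_entities(fragment: str) -> list:
    """List (element, urn, text) for each named entity (hypothetical helper).

    Raises ValueError if an entity lacks an n attribute or if its value
    does not look like a CITE2 URN (an assumed check).
    """
    root = ET.fromstring(f"<ab>{fragment}</ab>")
    found = []
    for el in root.iter():
        if el.tag not in ENTITY_TAGS:
            continue
        urn = el.get("n")
        text = "".join(el.itertext())
        if not urn or not urn.startswith("urn:cite2:"):
            raise ValueError(f"{el.tag} '{text}' needs a URN in its n attribute")
        found.append((el.tag, urn, text))
    return found


sample = '<persName n="urn:cite2:hmt:persname.v1:pers1">Achilles</persName>'
print(named_entities(sample))
# [('persName', 'urn:cite2:hmt:persname.v1:pers1', 'Achilles')]
```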
Discourse level
While we’re unlikely to find quoted text in the Aratus Latinus, we include for reference these notes on how the Homer Multitext project encodes quoted text.
- q alone:
  - quotation, work not extant
  - quoted example of language. Test: you would not translate this when reading the text, e.g. explaining the declension of a noun by using another common word as an example
- cit containing q and ref: quotation of extant work
- title with either CTS URN (extant work) or CITE2 URN
- rs with type="waw": quoted expression not parseable as a Greek word, e.g. “the letter σ”