Help (Lexical)

Unlike English, Greek is a highly inflected language. This means that a single word (lexeme) can have many associated word forms, not all of which resemble one another. As of this writing, the TLG corpus is not lemmatized; that is, the underlying lexeme has not been identified for each word form appearing in the corpus. As a result, typing in a word form does not guarantee that you will retrieve every instance of the word you are looking for. This limitation has always applied to the TLG, and affects searches done with the TLG CD ROM as well as the web site.

The usual workaround is to type in the stem of the word being searched for, as a prefix to which the various inflectional affixes are appended. For example, a search for ANQRWP will retrieve the various inflected forms of A)/NQRWPOS. However, the following provisos (which should be familiar to students of Greek) should be borne in mind:

  1. Greek has not only inflectional affixes, but also derivational affixes (forming new words), and word compounding. As a result, searching for ANQRWP will return not only inflections of A)/NQRWPOS (e.g. A)NQRW=POU, A)NQRW/POIS), but also words derived from A)/NQRWPOS (e.g. A)NQRWPAI=ON, A)NQRWPIKO/S), and compounds involving A)/NQRWPOS (e.g. A)NQRWPARE/SKEIA, A)NQRWPOBO/ROS).

    To eliminate these, you might try using wildcard search to set the maximum length of the word (e.g. ^ANQRWP..?.?$ requests ANQRWP- followed by one to three letters, though this will not eliminate such words as A)NQRW/PINH), or to specify the allowed suffixes (see examples 1 and 2 in the wildcard search help page).

  2. Crasis is treated by our search engine as giving rise to a single word form. As a result, a prefix search for ANQRWP will not retrieve such instances of crasis as TA)NQRW/POU or KA)NQRW/POIS. Furthermore, an infix search for A)/NQRWP would still not retrieve such instances as W)/NQRWPE or A(/NQRWPOS, where the crasis affects the initial vowel.

    The bad news is that, even though relatively few words trigger crasis, a wildcard specification of all the alternatives would be prolix, and it might be more expedient simply to specify an infix search (e.g. NQRWP). The good news is that crasis is relatively rare, particularly in prose, and is restricted to nominals beginning with vowels.

  3. Verbs have inflectional prefixes (augment, reduplication) as well as inflectional suffixes, and often modify the end of their stem to give other tense forms. Thus, to retrieve all forms of GRA/FW, it is not enough to search for GRAF-; you will also have to search for EGRAF (Imperfect), GRAY (Future), EGRAY (Aorist), GEGRAMM GEGRAY GEGRAPT GEGRAFQ (Perfect), EGEGRAMM EGEGRAY EGEGRAPT EGEGRAFQ (Pluperfect), and GRAF GRAFQ (Aorist Passive). For EU)/XOMAI, you will need EUX (Present), HUX (Imperfect), HUC (Aorist), HUGM EUKT (Pluperfect), EUC (Future).

    Here too, you can either specify all possible variants of the stem and augment (e.g. E?GRA(F|Y|PT|FQ|MM), [EH]U(X|C|GM|KT)), or cut down the stem in an infix search (-GRA-).

  4. This is not even counting irregular (suppletive) verbs. In such verbs, the different stems have nothing in common. For example, to retrieve all forms of OI)=DA, you will also need to search for EID, EIS, ISM, IST, H|D, HSM, HST, OISQ, and ISASI.

    This is the kind of case in which lemmatization is most crucial; there is really no good workaround.

  5. Derivational prefixes will also be treated as distinct word forms, so a search for KAQARMA will not retrieve instances of PERIKAQARMA.

    Normally, an infix search will take care of these (this requires a wildcard in the index search, and is the default in textual search).

  6. The TLG has a small number of diplomatic editions of texts, in which the orthography has not been normalized. This concerns papyrological editions in particular, but also some apocrypha (e.g. Apocalypsis apocrypha Joannis (Versio altera)), and many astrological works (where the vernacular spellings also predate modern standardization).

    Users can specify every possible itacism and misspelling (e.g. S(H|I|U|OI|EI|UI)+N+(AI|E)R+G+(H|I|U|OI|EI|UI)+ will retrieve the second apocryphal Johannine Apocalypse's misspelling SHNE)RGH/ for SUNERGEI=); but this approach is subject to diminishing returns, and some misspellings (SAMBA/TOU for SABBA/TOU) will be impossible to guess.

  7. The TLG encompasses many varieties of Greek, including Epic, Aeolic, Doric, Attic, Koine, Byzantine Atticist (including hypercorrections), and the mediaeval vernacular. All of these can lead to different inflections.

    Again, there is no general solution, though a stem search (possibly as infix) will take care of most cases.
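
The wildcard patterns suggested in the provisos above can be tried out offline against a handful of word forms. The following Python sketch is illustrative only. It assumes that TLG wildcard syntax behaves like Python regular expressions, that breathings and accents are ignored in matching, and that a pattern without ^ or $ may match anywhere in the word; none of these assumptions is confirmed here. The helper names (strip_diacritics, search) are hypothetical and are not part of the TLG search engine. The noun forms are those cited above; the forms of GRA/FW and LE/GW are added for illustration.

```python
import re

# Assumption: only breathings and accents are ignored when matching.
DIACRITICS = re.compile(r"[()/\\=]")


def strip_diacritics(form):
    """Hypothetical normalization: drop Beta Code breathings and accents."""
    return DIACRITICS.sub("", form)


def search(pattern, forms):
    """Return the forms whose stripped spelling contains a match for pattern."""
    rx = re.compile(pattern)
    return [f for f in forms if rx.search(strip_diacritics(f))]


NOUN_FORMS = [
    "A)NQRW=POU", "A)NQRW/POIS",                 # inflections
    "A)NQRWPAI=ON", "A)NQRWPIKO/S",              # derived words
    "A)NQRWPARE/SKEIA", "A)NQRWPOBO/ROS",        # compounds
    "A)NQRW/PINH",                               # short derived word
    "TA)NQRW/POU", "KA)NQRW/POIS", "W)/NQRWPE",  # crasis
]
VERB_FORMS = [
    "GRA/FW", "E)/GRAFON", "GRA/YW", "E)/GRAYA",
    "GE/GRAMMAI", "GE/GRAPTAI",
    "LE/GW",                                     # unrelated verb
]

# Proviso 1: a bare prefix search also returns derivations and compounds.
prefix_hits = search(r"^ANQRWP", NOUN_FORMS)
# Proviso 1: limiting the ending to one to three letters trims most of them.
short_hits = search(r"^ANQRWP..?.?$", NOUN_FORMS)
# Proviso 2: an infix search also picks up the crasis forms.
infix_hits = search(r"NQRWP", NOUN_FORMS)
# Proviso 3: stem and augment variants of GRA/FW in one pattern.
verb_hits = search(r"E?GRA(F|Y|PT|FQ|MM)", VERB_FORMS)
# Proviso 6: spelling variants of SUNERGEI=.
ITACISM = r"S(H|I|U|OI|EI|UI)+N+(AI|E)R+G+(H|I|U|OI|EI|UI)+"
spelling_hits = search(ITACISM, ["SHNE)RGH/", "SUNERGEI=", "SAMBA/TOU"])

for label, hits in [
    ("prefix", prefix_hits),
    ("length-limited", short_hits),
    ("infix", infix_hits),
    ("verb stems", verb_hits),
    ("spelling variants", spelling_hits),
]:
    print(label, hits)
```

On this sample, the length-limited pattern keeps A)NQRW=POU and A)NQRW/POIS but also A)NQRW/PINH, as noted under proviso 1, while only the infix search reaches the three crasis forms. The sketch says nothing about how the TLG engine itself indexes or normalizes word forms.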

Created: 2001-2-27
Last Modified: 2001-2-27
Authored by: Nick Nicholas
Maintained by
TLG® is a registered trademark of The Regents of the University of California.