Lemmatized searches (BETA version)
Last updated on June 3, 2010
- About the TLG Lemmatization project
Work on lemmatization began in 2003 and benefited from
access to software known as Morpheus developed by the Perseus Project.
Morpheus was designed to deal effectively with a relatively narrow,
well-documented cross section of the Greek language, i.e. the classical
canon, meaning Epic and Attic Greek with some Doric, Ionic, and Koine
forms. The TLG corpus encompasses the totality of Greek literature,
including Early Modern Greek and Byzantine texts. As a result,
lemmatization of the TLG corpus required a different philosophy and a
significantly more complex architecture, which combines lexical and
morphological databases with extensive programming in order to increase the
number of parses and achieve broader and more accurate form recognition. This project
was executed largely thanks to the efforts of Nick Nicholas and Nishad Prakash. Cindy Moore and Zeya Myint have
contributed to the implementation of the system. The current version of the
TLG lemmatizer recognizes approximately 96.35% of the unique wordforms in the
The lemmatizer makes use of the following sources:
- H.G. Liddell, R. Scott, H.S. Jones & R. McKenzie. A Greek-English
Lexicon. Oxford: Clarendon Press. 1940.
- Danker, Frederick W. A Greek-English Lexicon of the New Testament and
Other Early Christian Literature. 3d ed. Based on Walter Bauer's
Griechisch-deutsches Wörterbuch zu den Schriften des Neuen Testaments und
der frühchristlichen Literatur, 6th ed. Chicago: University of Chicago
- G.W.H. Lampe. A Patristic Greek Lexicon. Oxford: Clarendon Press. 1968.
- E. Trapp. Lexikon zur byzantinischen Gräzität. Vienna: Österreichische
Akademie der Wissenschaften. 1994-2006.
- W. Smith. Dictionary of Greek and Roman Geography. London: Walton &
- CATSS (Computer Assisted Tools for Septuagint Studies), R. Kraft
(director). Morphologically Analyzed Septuagint. 1988. Center for Computer
Analysis of Texts, University of Pennsylvania. http://
- Helbing, R. 1979. Grammatik der Septuaginta: Laut- und
Wortlehre. 2nd ed. Göttingen: Vandenhoeck & Ruprecht.
- Blass, F. & Debrunner, A. 1961. A Greek grammar of the New
Testament and other early Christian Literature. Ed. Funk, R.W.
Chicago: University of Chicago Press.
- Gignac, F. T. 1981. A Grammar of the Greek papyri of the Roman
and Byzantine Periods. Vol 2: Morphology. Milan: Istituto Editoriale
Cisalpino, La Goliardica.
- Psaltes, S. 1913. Grammatik der byzantinischen
Chroniken. Göttingen: Vandenhoeck & Ruprecht.
- Reinhold, H. 1898. De graecitate patrum apostolicorum
librorumque apocryphorum Novi Testamenti quaestiones grammaticae.
Halle an der Saale: Max Niemeyer.
- Smyth, H. W. 1983. Greek Grammar. Cambridge (MA): Harvard
- R. Kühner and F. Blass, Ausführliche Grammatik der griechischen Sprache,
1.Teil: Elementar- und Formenlehre, 2 vols. (3rd edn., Hannover 1890-92)
- Glossary of terms and brief guide to using Lemmatized Searches
- Lemmatization is the process of organizing all wordforms present in a text so that all inflected and variant forms of the same word are grouped together under one lemma or headword.
- A Lemmatized Search allows the user to enter a lemma, i.e. the dictionary form of a word, and retrieve all inflected forms that are linked to this lemma and present in the TLG corpus. For instance, a search for the verb FE/RW will produce all the forms of FE/RW as well as OI)/SW and H)/NEGKON.
The advantage of using lemmatized searches is that the user can retrieve in one search a large number of forms that might otherwise require multiple string searches. One disadvantage may be that the current TLG lemmatizer recognizes approximately 92% of the wordforms; some forms may therefore not be recognized, or may be recognized incorrectly. Due to morphological ambiguity, the engine may also retrieve forms that could be part of the requested lemma but actually belong to a similar lemma.
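As a rough illustration of the idea, and not a description of how the TLG lemmatizer is actually implemented, a lemmatized search can be pictured as a lookup from a lemma to the set of wordforms grouped under it. The sketch below uses a tiny hand-made table in Beta Code; the table contents and the names LEMMA_INDEX and lemmatized_search are hypothetical.

```python
# Hypothetical toy index: lemma -> wordforms grouped under it (Beta Code).
# The real TLG databases are far larger and are not reproduced here.
LEMMA_INDEX = {
    "FE/RW": ["FE/RW", "FE/REIS", "OI)/SW", "H)/NEGKON"],
    "LU/W": ["LU/W", "LU/EI", "E)/LUSA"],
}


def lemmatized_search(lemma, text_tokens):
    """Return the tokens of a text that are listed under the given lemma."""
    forms = set(LEMMA_INDEX.get(lemma, []))
    return [tok for tok in text_tokens if tok in forms]


tokens = ["OI)/SW", "KAI/", "H)/NEGKON", "LU/W"]
print(lemmatized_search("FE/RW", tokens))
```

A plain string search for FE/RW would miss OI)/SW and H)/NEGKON, since neither contains the stem of the headword; the lemma lookup finds both in one pass.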
- Simple vs. Advanced Lemmatized Searches
These two pages are designed to mirror the Simple and Advanced pages in the online TLG. Instead of entering a string of characters, the Simple Lemmatized Search page allows you to enter a lemma. The Advanced page allows you to search for up to three lemmata in context.
- Search for a Lemma Substring
If you don't know the exact lemma you are looking for, the search engine will try to match your string to its database of available lemmata. You will be presented with a list of possible lemmata to choose from (up to three lemmata may be selected). This type of search is recommended when searching for compound verbs.
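The matching step can be sketched as follows. This is only an assumption-laden illustration: the list LEMMATA, the helper names, and the choice to ignore accents and breathings when matching are all hypothetical, and the real engine may behave differently.

```python
# Hypothetical list of lemmata in Beta Code; the real database is much larger.
LEMMATA = ["FE/RW", "A)NAFE/RW", "PROSFE/RW", "SUMFE/RW", "LU/W"]


def strip_diacritics(beta):
    """Drop Beta Code accent and breathing marks, keeping only the letters."""
    return "".join(ch for ch in beta if ch.isalpha())


def match_lemmata(query, lemmata=LEMMATA):
    """Return every lemma whose letters contain the letters of the query."""
    needle = strip_diacritics(query).upper()
    return [lem for lem in lemmata if needle in strip_diacritics(lem)]


candidates = match_lemmata("FER")
print(candidates)            # the list the user would choose from
selection = candidates[:3]   # at most three lemmata may be selected
```

Searching on the substring FER surfaces the simple verb together with its compounds, which is why this kind of search suits compound verbs.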
- Grammatical Categories
This feature allows you to specify the grammatical category you wish to search for and, therefore, limit your search results to one or more grammatical categories.
- Lower Confidence Forms are wordforms recognized with a lower degree of certainty. In most cases, they are guesses and may or may not belong to the requested lemma. The user has the option to include or exclude them from the search.
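The two options above, restricting by grammatical category and including or excluding lower confidence forms, can be thought of as filters applied to the parses returned for a lemma. The sketch below is a minimal illustration under stated assumptions: the Parse record, the category labels, and the confidence flags are invented for the example and do not reflect actual TLG data.

```python
from dataclasses import dataclass


@dataclass
class Parse:
    form: str             # wordform in Beta Code
    lemma: str            # headword the form is assigned to
    category: str         # hypothetical grammatical label
    low_confidence: bool  # hypothetical flag for a guessed analysis


# Invented sample data; the flags do not describe real TLG analyses.
PARSES = [
    Parse("OI)/SW", "FE/RW", "future", False),
    Parse("H)/NEGKON", "FE/RW", "aorist", False),
    Parse("FE/REIS", "FE/RW", "present", True),
]


def filter_parses(parses, lemma, categories=None, include_low_confidence=True):
    """Return forms of a lemma, optionally limited by category and confidence."""
    hits = [p for p in parses if p.lemma == lemma]
    if categories is not None:
        hits = [p for p in hits if p.category in categories]
    if not include_low_confidence:
        hits = [p for p in hits if not p.low_confidence]
    return [p.form for p in hits]


print(filter_parses(PARSES, "FE/RW", categories={"future", "aorist"}))
print(filter_parses(PARSES, "FE/RW", include_low_confidence=False))
```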
- TLG Links vs Perseus Links
The sidebar offers a number of choices, including links to further lexicographical and morphological resources. TLG Links is the default setting for Lemmatized Searches; no links is the default setting for the Simple and Advanced pages. When TLG Links is selected, all wordforms in the TLG corpus appear as hyperlinks. Once a form is clicked, the user is presented with the TLG Dictionary page, which provides information about the lemma, the morphological analysis of the selected wordform and, when available, links to dictionary meanings from several lexica. The user may choose Perseus Links instead. In this case, words will be linked to the Perseus morphological and lexicographical resources.
TLG® is a registered trademark of The Regents of
the University of California.