Lemmatized searches (BETA version)

Last updated on June 3, 2010

  1. About the TLG Lemmatization project

    Work on lemmatization began in 2003 and benefited from access to Morpheus, a morphological analysis program developed by the Perseus Project.

    Morpheus was designed to deal effectively with a relatively narrow, well-documented cross section of the Greek language, i.e., the classical canon: Epic and Attic Greek, with some Doric, Ionic, and Koine forms. The TLG corpus, by contrast, encompasses the totality of Greek literature, including Byzantine and Early Modern Greek texts. As a result, lemmatizing the TLG corpus required a different philosophy and a significantly more complex architecture. This architecture combines lexical and morphological databases with extensive programming in order to increase the number of successful parses and to achieve broader and more accurate form recognition. The project was carried out largely thanks to the efforts of Nick Nicholas and Nishad Prakash; Cindy Moore and Zeya Myint contributed to the implementation of the system. The current version of the TLG lemmatizer recognizes approximately 96.35% of the unique wordforms in the TLG corpus.

  2. Resources

    The lemmatizer makes use of the following sources:

  3. Glossary of terms and brief guide to using Lemmatized Searches

TLG® is a registered trademark of The Regents of the University of California.