Advanced Search

Greek examples are given in Beta Code, to avoid forcing a particular encoding on different browsers.

Index versus Non-Index (=Textual) Searches:

Both the Simple and the Advanced Search forms offer two types of search. The first is the Word Index Search, which uses a precompiled index of all full and partial Greek words in the corpus. The second is a full text (textual) search, in which the individual texts are searched, letter by letter, to find an instance of the word or string.

In terms of performance, an index-based search is almost always preferable.

  1. It is consistently faster because it allows instantaneous access to just those texts (and those passages within the texts) where the sought word is contained.
  2. It is more accurate. The program used to generate the word index is not subject to the time constraints of an on-line search; therefore some pitfalls in identifying words and word fragments can be avoided. There are some cases (admittedly extremely infrequent) where the on-line full text search may return false results; these are discussed below. More conspicuously, the word index has a sophisticated means of detecting fragmentary words, which is necessarily absent from the on-line full text search. Thus, the word index knows to distinguish between !AN, AN! and !AN! in most cases; the full text search cannot discern a fragmentary word terminus other than a missing letter, and at best will search for all instances of AN anywhere in a word.
  3. It allows you to know ahead of time the number of possible search results. A textual search does not know how many matches it will find until it has completed, so it cannot display the heading giving the expected total number of matches, as index searches do.

There are some instances in which a Textual search is preferable.

  1. For a small enough search corpus --- including any individual works or authors --- a full text search will be complete within a few seconds, and avoids the overhead of having to select individual word forms out of the word index. For instance, a full text search can find all 67 instances of FER- in Hesiod in a few seconds. An index search cannot in fact perform the search accurately, because there are no fewer than 758 corresponding word forms, and the search will stop after the first 500. Even if the search were accepted, the user would still likely end up having to select word forms redundantly out of the list.
  2. Textual searches allow phrasal searches. These are not supported in the index search, which only searches for individual words --- although many such searches can be emulated with proximity searches.
  3. Textual searches allow sophisticated wildcard searches --- though these are substantially slower than normal full text searches, and should be conducted on small corpora.
  4. Textual searches allow searches for material other than Greek text --- including Roman script text and Beta escapes.


Search for:

For index searches, the specification of searches has already been discussed. For full-text searches, the user specifies not a complete word or prefix, but an arbitrary string to be matched in the text. This may be a word prefix, a suffix, a complete word, or a series of words. The search engine does not query which complete words in the index match the specification, but searches directly for the search string in the texts.

The word delimiter in full text searches is space. Unlike Index searches, full text searches require an initial space if the beginning of a word is indicated. Thus,

In a normal search, beta escapes are ignored (but see With non-alphabetic characters.) Thus, a search for "BW" will return such instances as LA/B[W], LA/B<6W>6 (= LA/BW), LA/B%17W (=LA/B||W) and LA/B?W. However, the missing letter dot is not ignored: LA/B!W is not considered to contain an instance of BW.
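
As a rough illustration of this behaviour, the following Python sketch strips only the kinds of beta escape shown in the examples above (square brackets, angle brackets with a number, % escapes with a number, and ?), together with diacritics, and leaves the missing letter dot ! in place. The function names and regular expressions are assumptions made for illustration; they are not the search engine's actual code, and real Beta Code has many more escapes than are handled here.

```python
import re

# Assumption: only the escape types shown in the examples above are handled.
IGNORED_ESCAPES = re.compile(r"[\[\]<>%]\d*|\?")
# Beta Code diacritics: breathings, accents, diaeresis, iota subscript.
DIACRITICS = re.compile(r"[()/\\=+|]")


def normalize_for_search(text):
    """Drop the ignored escapes and diacritics; keep '!' (missing letter)."""
    return DIACRITICS.sub("", IGNORED_ESCAPES.sub("", text))


def normal_search_matches(text, search_string):
    """Return True if search_string occurs in the normalized text."""
    return search_string in normalize_for_search(text)


for sample in ["LA/B[W]", "LA/B<6W>6", "LA/B%17W", "LA/B?W", "LA/B!W"]:
    print(sample, normal_search_matches(sample, "BW"))
```

Under these assumptions, the first four samples match BW and LA/B!W does not, because the ! is kept between B and W.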

Hyphens (and any line content after the hyphen) are also ignored. Thus, a search for OI)/KWN will return the following, where the word has been interrupted by a marginal note of an antistrophe:

TA\N D' E)XQRA\N STA/SIN EI)=RG' A)P' OI)/- {2A)NT.}2
KWN TA\N MAINOME/NAN T' E)/RIN
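
The effect of ignoring the hyphen and the rest of its line can be sketched in Python as follows. This is a simplified illustration only: the function name is hypothetical, and it assumes that every hyphen in the input marks a word broken across a line end.

```python
def join_hyphenated(lines):
    """Join lines, dropping each hyphen and anything after it on that line."""
    pieces = []
    for line in lines:
        head, hyphen, _rest = line.partition("-")
        if hyphen:
            # Word continues on the next line: keep only the text before "-".
            pieces.append(head)
        else:
            pieces.append(line + " ")
    return "".join(pieces).strip()


# Raw strings are used because Beta Code contains backslashes.
example = [
    r"TA\N D' E)XQRA\N STA/SIN EI)=RG' A)P' OI)/- {2A)NT.}2",
    r"KWN TA\N MAINOME/NAN T' E)/RIN",
]
joined = join_hyphenated(example)
print("OI)/KWN" in joined)
```

Once the marginal note and the hyphen are dropped, OI)/ and KWN become adjacent, so the word is found.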

The search algorithm used for non-wildcard searches scans text in a single pass. This does not affect normal text input, as the search string is preprocessed. Thus, the search engine will have no difficulty recognizing PER in PEPERASME/NOS, despite the 'false start' in the initial PE: it realizes that the second P could still be the start of the search string. The search engine thus does not have to backtrack, i.e. restart each search from the next letter onwards (e.g. check PEPERASME/NOS, then EPERASME/NOS, then PERASME/NOS...)
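
This help page does not name the algorithm used. One well-known technique with the properties described here (a preprocessed search string, and a single pass over the text with no backtracking) is Knuth-Morris-Pratt matching. The Python sketch below assumes that technique purely for illustration; it should not be read as the search engine's actual implementation.

```python
def failure_table(pattern):
    """For each prefix of pattern, the length of its longest proper
    prefix that is also a suffix (the preprocessing step)."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def find_single_pass(text, pattern):
    """Return the index of the first match, or -1. Each text character
    is read exactly once; the text position never moves backwards."""
    if not pattern:
        return 0
    table = failure_table(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]  # fall back within the pattern, not the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


print(find_single_pass("PEPERASME/NOS", "PER"))  # prints 2
```

When the second P fails to match R, the sketch resets its position in the search string and immediately treats that same P as a possible new start, which is the 'false start' recovery described above.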

There are infrequent instances where the failure to backtrack can interfere with the search engine retrieving a valid result. All these instances involve pathological inputs virtually impossible in real text.

For example, the preprocessing to eliminate backtracking would impose an inordinate performance penalty if applied to any text interrupting a hyphenated word, and has not been attempted. Thus, the search engine will not recognize an instance of ANT in the following:

KAI\ A)N- {2A)NT.}2
U/POPTOS W)=N...

This is because the search engine, in resolving A)N-U/POPTOS, suspects by the time the hyphen is reached that A)N- could be followed on the next line by T (e.g. A)N-TI/QETOS); so it completely skips over the text after the hyphen, and proceeds directly to the upsilon in the next line. So if text after a hyphen shares a common prefix with text just before a hyphen, the instance after the hyphen won't be found. The problem would not occur for, say:

KAI\ FIL- {2A)NT.}2
U/POPTOS W)=N...

Still, in practice, if you are searching for text likely to interrupt a hyphenated word (overwhelmingly, this only occurs with STR. and A)NT.), you should instead use either an index search, or the (slower) wildcard search option, which does multiple passes over the same text, and would not miss the instance after the hyphen.


Diacritics-sensitive Textual Searches:

If this option is specified, the search is sensitive to Greek diacritics, which are otherwise ignored in the search string. In contrast to word index searches, where the accentuation of the word forms is already normalized, full text searches need such normalization to take place on the fly. The following conventions are applied:

Diacritic-sensitive text searches disallow a search ending in a non-diacritic from matching a text where the next character is a diacritic. For example, ARA is a substring of PARA/GW, but a normal diacritic-sensitive search detects the acute following the second alpha, and rules this out as a successful result. This qualification does not apply if the search includes non-alphabetic characters or wildcard expressions.
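
The trailing-diacritic rule can be sketched in Python as below. The function name is hypothetical, and the sketch makes one assumption beyond the text: where the search string itself ends in a diacritic, no further check is applied, since the rule as stated covers only strings ending in a non-diacritic.

```python
# Beta Code diacritics: breathings, accents, diaeresis, iota subscript.
DIACRITICS = set("()/\\=+|")


def diacritic_sensitive_hits(text, search_string):
    """Return start offsets of matches, rejecting any match whose last
    character is a letter that carries a diacritic in the text."""
    hits = []
    if not search_string:
        return hits
    start = text.find(search_string)
    while start != -1:
        end = start + len(search_string)
        next_is_diacritic = end < len(text) and text[end] in DIACRITICS
        if search_string[-1] in DIACRITICS or not next_is_diacritic:
            hits.append(start)
        start = text.find(search_string, start + 1)
    return hits


print(diacritic_sensitive_hits("PARA/GW", "ARA"))   # rejected: acute follows
print(diacritic_sensitive_hits("PARA/GW", "ARA/"))  # accepted
```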


Case sensitive:

If this option is specified, the search is sensitive to case in Greek. Since case is not stored in the word index, this option applies only to full text searches. For example, a case-sensitive search for W(/S will not return instances of *(/WS.

Remember that, when entering text in Beta Code, case is indicated by a preceding asterisk, and not by actual case: both WS and ws are lowercase, but *W*S and *w*s are uppercase.
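
A minimal Python sketch of this convention follows; the function name is hypothetical. It reads a Beta Code string into (letter, is_capital) pairs, so that the case of the typed characters is irrelevant and only a preceding asterisk marks a capital. Diacritics standing between the asterisk and its letter are skipped over.

```python
def beta_letters(beta):
    """Return (letter, is_capital) pairs for a Beta Code string."""
    letters = []
    capital_pending = False
    for ch in beta:
        if ch == "*":
            capital_pending = True
        elif ch.isalpha():
            letters.append((ch.upper(), capital_pending))
            capital_pending = False
        # Anything else (e.g. diacritics) leaves a pending asterisk in
        # force, so *(/W is read as a capital omega.
    return letters


print(beta_letters("WS") == beta_letters("ws"))        # same lowercase word
print(beta_letters("*W*S"))                            # both capitals
print(beta_letters("W(/S") == beta_letters("*(/WS"))   # differ in case
```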

If your search is both case- and diacritics-sensitive, bear in mind that in Beta Code capital letters bearing breathing marks have their diacritics before the letter (e.g. *)/|A: Capital alpha with smooth breathing, acute and iota subscript/adscript), while capital letters bearing accent, diaeresis and/or iota subscript/adscript but no breathing mark have their diacritics after the letter: *A/ (capital alpha with acute), *A| (capital alpha with iota adscript/subscript), *I+ (capital iota with diaeresis). The latter types of capitals do not represent conventional Classical orthography, but the last two are normal in Modern Greek, and all three are frequent in nineteenth-century editions. Instances of such strings in the corpus include the following:

A search for such strings sensitive to diacritics but not case will find these instances normally.

In infrequent (and pathological) cases, backtracking will fail for case- and diacritic-sensitive searches. This will occur if a search string contains both a capital and a lowercase version of the same letter with diacritics (e.g. *(/EN E(/N). If a text occurs where the prefix before the lowercase version occurs twice (e.g. *(/EN . *(/EN E(/N), the search engine will reject the second *(/EN as a possible instance of the lowercase E(/N, but fail to realize that it can itself begin a valid instance of the search string. Needless to say, such a configuration should not occur in any natural texts, and can always be avoided with a wildcard search (which enforces backtracking).


Treat adscript as subscript:

The TLG corpus has not in general resolved adscript iota into subscript iota. Adscript iota turns up routinely in the following contexts:

Depending on the edition, the Greek for 'Hades' may thus appear as *)/A|DHS, *)/AIDHS, or *)/A*IDHS in title case, and as *A|*D*H*S, *AI*D*H*S, or *A*I*D*H*S in upper case.

If the adscript option is specified, the search treats unambiguous instances of iota adscript identically to iota subscripts. What counts as an unambiguous adscript depends on the search modes; this option makes the following interpretations:

Search string   Treated as equivalent to:
                ignore       accents   iota subsc.   accents &     breathing &   w/ non-
                diacritics   only      only          iota subsc.   iota subsc.   alphabetic
HI              H            H         H|            H|            H|            HI
H/I             H            H/        H|            H/|           H|            H/I
WI              W            W         W|            W|            W|            WI
W/I             W            W/        W|            W/|           W|            W/I
AI              AI           AI        AI            AI            AI            AI
A/I             AI           A/I       AI            A/|           AI            A/I
A)I             AI           AI        AI            AI            A)|           A)I

In other words, all instances of I after H and W are treated as adscripts equivalent to iota subscripts; all instances of I after A are treated as adscripts only if the alpha has a diacritic, and the search is sensitive to that diacritic. (So A)I is equivalent to A)|, but AI) is a short diphthong.) When the search is for non-alphabetic characters, and the Beta Code is treated literally, this interpretation is ignored.

To illustrate, contrast the following retrievals from the word index for adscript mode on and off: (_ stands for space)

Search expression: ADH_
Diacritic sensitivity: Ignore diacritics
Ignoring adscripts:
  • ADH
  • A)/DH
  • A)/|DH
  • A)/|DH|
  • A)DH=
  • A(/DH
  • A(/DH|
  • A(/|DH
  • A(/|DH|
  • A(DH/
With adscripts:
  • ADH
  • A)/DH
  • A)/|DH
  • A)/|DH|
  • A)DH=
  • A(/DH
  • A(/DH|
  • A(/|DH
  • A(/|DH|
  • A(DH/
  • A(/DHI

Search expression: A/DH_
Diacritic sensitivity: Accents only
Ignoring adscripts:
  • A)/DH
  • A)/|DH
  • A)/|DH|
  • A(/DH
  • A(/DH|
  • A(/|DH
  • A(/|DH|
With adscripts:
  • A)/DH
  • A)/|DH
  • A)/|DH|
  • A(/DH
  • A(/DH|
  • A(/|DH
  • A(/|DH|
  • A(/DHI
  • A(/IDH
  • A(/IDH|
  • A(/IDHI

Search expression: A|DH|_
Diacritic sensitivity: Iota subsc. only
Ignoring adscripts:
  • A)/|DH|
  • A(/|DH|
With adscripts:
  • A)/|DH|
  • A(/|DH|

Search expression: A/|DH|_
Diacritic sensitivity: Accents & iota subsc.
Ignoring adscripts:
  • A)/|DH|
  • A(/|DH|
With adscripts:
  • A)/|DH|
  • A(/|DH|
  • A(/IDH|
  • A(/IDHI

The adscript option will produce some false matches, mostly in WI (it will treat TRWIKO/S as TRW|KO/S), and it will miss unaccented instances of the long diphthong AI (e.g. the capitalized *Q*R*AI*K*H in the Periplus Scylacis). Search results should be treated with appropriate caution.


Citations Only Results:

The normal maximum number of search results displayed per page is 100. However, by specifying the number of lines of context per search as 0 --- namely, requesting only the citations of the search results, and not the text itself --- users can request up to 1000 results per page. Such citations are displayed as follows:

1. {0019.001}. Aristophanes Comic., Acharnenses. Line 42.

2. {0019.001}. Aristophanes Comic., Acharnenses. Line 367.

3. {0019.001}. Aristophanes Comic., Acharnenses. Line 373.
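
If citation lists of this kind are saved for further processing, the lines can be split into fields with a regular expression. The Python sketch below assumes that every citation follows exactly the pattern of the three examples above (a running number, a four-digit author number and three-digit work number in braces, then the description); the function and field names are hypothetical.

```python
import re

# Assumption: all citations look like the three examples shown above.
CITATION = re.compile(r"^(\d+)\. \{(\d{4})\.(\d{3})\}\. (.*)$")


def parse_citation(line):
    """Split one citation line into its fields, or return None."""
    match = CITATION.match(line.strip())
    if match is None:
        return None
    number, author, work, description = match.groups()
    return {
        "result": int(number),
        "author_number": author,
        "work_number": work,
        "description": description,
    }


parsed = parse_citation("1. {0019.001}. Aristophanes Comic., Acharnenses. Line 42.")
print(parsed)
```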


With non-alphabetic characters:

If this option is specified, the text is searched not for Greek words, but for any sequence of characters matching the search string precisely. Rather than ignoring beta escapes, this option allows the user to search for beta escapes, whether by themselves or in conjunction with Greek word searches. For example, the user can search for all beginnings of orations (~y):

1. Michael Psellus Epist., Phil., Polyhist., Rhet. et Scr. Rerum Nat., Orationes panegyricae. {2702.006}. Oration 2 line 1t.

RION, KA)\N TAU/THN KATIOU=SAN XWRH/SH|S EI)S DU/NAMIN, AU)TO/S TE
QEO\S GE/NOIO KAI\ H(MA=S E)CERGA/SAIO.
 (385)
(2.) 

*LO/GOS EI)S TO\N BASILE/A TO\N *MONOMA/XON. (1t)

2. Michael Psellus Epist., Phil., Polyhist., Rhet. et Scr. Rerum Nat., Orationes panegyricae. {2702.006}. Oration 3 line 1t.

KAI\ PROSQEI/H TOI=S E)/TESI, KAI\ TH\N E)KEI=QEN BASILEI/AN XARI/-
SAITO TH\N O)/NTWS U(YHLH/N TE KAI\ A)KATA/LUTON.
@1
(3.) 

*TW=| AU)TW=| BASILEI= (1t)

3. Michael Psellus Epist., Phil., Polyhist., Rhet. et Scr. Rerum Nat., Orationes panegyricae. {2702.006}. Oration 4 line 1t.

E)PAGALLO/MENOS KAI\ TW=| A)KHRA/TW| STE/FEI TH=S FILOSOFI/AS
KATASTEFO/MENOS.
@1
(4.) 

*(/ETEROS LO/GOS PRO\S TO\N  (1t)

all instances of asteriskos (#13):

1. Scholia in Aristophanem, Scholia in nubes (scholia recentiora Eustathii, Thomae Magistri et Triclinii). {5014.005}. Argumentum-dramatis personae-scholion sch th-tr nub verse 263 line 1.

sch th-tr nub.
(263e.) {2&Th2Tr1/2$}2 TH=S EU)XH=S] H(\N AU)TO\S EU)/COMAI.
(263.) {2&Tr2}2 #13 Vat solus; add.$ E(TERO/STROFA.
(264a.) {2&Th2Tr1/2$}2 A)ME/TRHT'] A)/PEIRE.

2. Scholia in Aristophanem, Scholia in nubes (scholia recentiora Eustathii, Thomae Magistri et Triclinii). {5014.005}. Argumentum-dramatis personae-scholion sch th-tr nub verse 276 line 1.

sch th-tr nub.
(276b.) {2&Th2Tr1/2$}2 FANERAI/] POTAPAI/.
(276.) {2&Tr2}2 #13 Vat solus (post$ A)RQW=MEN FANERAI\, &sc. primi vs. alt. colon).
(277a.) {2&Th2Tr1/2$}2 DROSERA\N] U(DATW/DH.

3. Scholia in Aristophanem, Scholia in nubes (scholia recentiora Eustathii, Thomae Magistri et Triclinii). {5014.005}. Argumentum-dramatis personae-scholion sch th-tr nub verse 299bis line 1.

sch th-tr nub.
(299.) {2&Th2Tr1/2$}2 LIPARA\N [6EI)S TH\N &Tr2$ EU)/GAION.
(299bis.) {2&Tr2}2 #13 Vat solus (post primi vs. alt. col.$ E)/LQWMEN LIPARA\N).
(300a.) {2&Th2Tr1/2$}2 [6E)S &Th2$ XQO/NA] TH\N *)ATTIKH/N.

or all marginalia ({2):

1. Scholia in Theocritum, Scholia in Theocritum. {5038.001}. Prolegomenon-anecdote-poem proleg section-verse A line t.

&Prolegomena$
(A.) 
*GE/NOS *QEOKRI/TOU {2
&KEbAPT$}2  (t)
(A a.) 
*QEO/KRITOS O( TW=N BOUKOLIKW=N POIHTH\S *SURAKOU/SIOS H)=N

2. Scholia in Theocritum, Scholia in Theocritum. {5038.001}. Prolegomenon-anecdote-poem proleg section-verse A b line 1.

MA/SQH. @1
(A b.) {2&GEbPT$}2 I)STE/ON, O(/TI O( *QEO/KRITOS E)GE/NETO I)SO/XRONOS TOU= TE *)ARA/TOU
KAI\ TOU= *KALLIMA/XOU KAI\ TOU= *NIKA/NDROU: E)GE/NETO DE\ E)PI\ TW=N

3. Scholia in Theocritum, Scholia in Theocritum. {5038.001}. Prolegomenon-anecdote-poem proleg section-verse B line t.

XRO/NWN *PTOLEMAI/OU TOU= *FILADE/LFOU.
(B.) {2&KGEbAT$}2

*EU(/RESIS TW=N BOUKOLIKW=N  (t)
(B a.) *TA\ BOUKOLIKA/ FASIN E ) N   * L A K E D A I M O N I / A |   EU(REQH=NAI

Similarly, the user can search for runs of Roman script in text. The following is a search for Hesychian citations (Hesych) in Mette's Fragmenta of Aeschylus:

1. Aeschylus Trag. Atheniensis, Fragmenta (Mette). {0085.008}. Tetralogy 1 play D fragment 7 line 1.

    {<*P*R*W*T.?>} 'A)MH/XANON TEU/XHMA KAI\ DUSE/KLUTON'.
(7.)   &Hesych. Lex. $*A& 1357 L. (aus Diogenian.): $'A)/E{L}PTOI': DEINOI/, KAI\
'A)/APTOI'. * A I ) S X U / L O S   * P R W T E I = . 

2. Aeschylus Trag. Atheniensis, Fragmenta (Mette). {0085.008}. Tetralogy 1 play D fragment 8a line 1.

'A)/APTOI'. * A I ) S X U / L O S   * P R W T E I = . 
(8a.)   &Hesych. Lex. $*A& 3404 L. (aus Diogenian.) %106 Etym. Genuin. p. 26,
&11 Mi. [Etym. Magn. p. 75, 22 Gaisf.]: $'A)MA/DA': TH\N NAU=N, A)PO\ TOU= 'A)MA=N'

3. Aeschylus Trag. Atheniensis, Fragmenta (Mette). {0085.008}. Tetralogy 1 play D fragment 9 line 1.

TH\N QA/LASSAN. H( LE/CIS P A R '   * A I ) S X U / L W I .   
(9.)   &Hesych. Lex. II 137, 21 Schm. (aus Diogenian.): $'E)PA/SW': E)KTH/SW.
* A I ) S X U / L O S   * P R W T E I =   S A T U R I K W = I .   
@1

Because this mode of searching is literal, you will need to search for any Greek text in capitals, rather than lowercase: ARGO rather than argo. Furthermore, you will need to include any diacritics or capital asterisks in your search: *)/ARGO rather than ARGO.


Display only raw beta escapes:

If this option is specified, no beta escape is resolved into HTML-based formatting: all beta escapes (including citations) are displayed as raw beta codes. Compare the following with the instances given in Suppress raw beta escapes:

&nehmer an den von Sisyphos aus Anlab der Anschwemmung des toten
&Melikertes zu Ehren des Poseidon gestifteten Isthmischen Spielen]1. @1
~yz"1n"
@&Pap. Ox. 2250
~y"16a"
&10[1oberer Rand]1$8
~z1
{4%43?}4 A)/]GE DH/, BASILEU=, #74 [%40 %40 %41 %40 %40 %41
KAI\ CU/MPASAN #74 M[%40 %40 %41 %40 %40 %41
TOU= BAQUPLOU/TO[U #74 %40 %40 %41 E)/CW
P
?ENI/AS NAI/WN #74 K[AI\ %41 %40 %40 %41
PAL?]I?KH\N SKH/PTR?[WI #74 %40 %40 %41 %40 %40 %41,
DE/C
]A?[I?] ME?[!!!] #74 F?I?[LI/AS XW/RAS

This option is incompatible with Suppress raw beta escapes.


Order by date:

In either an Index or a Textual search, results are normally ordered by TLG author number; for example, results from Euripides (0006) will precede results from Homer (0012). Specifying chronological ordering forces the results to be displayed instead in ascending chronological order, so that results from the earliest authors (following the TLG sorting order for dates) are displayed first. This allows the earliest instance of a word to be identified. Since all results have to be retrieved and sorted before they can be displayed, the search will be considerably slower. Textual searches in particular may be too slow.
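
The difference between the two orderings can be sketched in Python as follows. The author numbers are those given above; the date_key values are hypothetical sort keys invented for this illustration (negative numbers standing for centuries BC) and do not reproduce the TLG's own sorting order for dates.

```python
from dataclasses import dataclass


@dataclass
class Result:
    author_number: str  # TLG author number
    author: str
    date_key: int       # hypothetical: negative = century BC


results = [
    Result("0012", "Homer", -8),
    Result("0006", "Euripides", -5),
]

# Default ordering: by TLG author number.
by_author_number = sorted(results, key=lambda r: r.author_number)

# Chronological ordering: every result must be collected before sorting,
# which is why nothing can be displayed until the search has finished.
by_date = sorted(results, key=lambda r: (r.date_key, r.author_number))

print([r.author for r in by_author_number])  # Euripides, then Homer
print([r.author for r in by_date])           # Homer, then Euripides
```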

Created: Feb. 14, 2000
Last Modified: March 12, 2009
Maintained by tlg-support@uci.edu
TLG® is a registered trademark of The Regents of the University of California.