Both the Simple and the Advanced Search forms offer two types of search. The first is the Word Index Search, which uses a precompiled index of all full and partial Greek words in the corpus. The second is a full text (textual) search, in which the individual texts are searched, letter by letter, to find an instance of the word or string.
In terms of performance, an index-based search is almost always preferable.
An index search is also more precise with fragmentary words: it can distinguish !AN, AN! and !AN! in most cases, whereas the full text search cannot discern a fragmentary word terminus other than a missing letter, and at best will search for all instances of AN anywhere in a word.
There are some cases in which a textual search is preferable.
For example, a textual search can find all words beginning with FER- in Hesiod in a few seconds. An index search cannot in fact perform this search accurately, because there are no fewer than 758 matching word forms, and the search will stop after the first 500. Even if the limit did not apply, the user would still likely end up having to select word forms redundantly from the list.
For index searches, the specification of searches has already been discussed. For full-text searches, the user specifies not a complete word or prefix, but an arbitrary string to be matched in the text. This may be a word prefix, a suffix, a complete word, or a series of words. The search engine does not query which complete words in the index match the specification, but searches directly for the search string in the texts.
The word delimiter in full text searches is the space. Unlike index searches, full text searches require an initial space to mark the beginning of a word. Thus,
"BW" will return any word containing BW, such as BWMO/S, LA/BW, and LA/BWSIN.
" BW" will return any word starting with BW, such as BWMO/S.
"BW " will return any word ending with BW, such as LA/BW.
" PRO/S " will return any instances of the complete word PRO/S;
it will not return words that merely contain it, such as PRO/SWPON or E)MPRO/S.
" PRO\S TA\S " will return any instances of the phrase PRO/S TA/S, with each word complete; this includes instances like:
E)XW/REI PRO\S TA\S *)E)PIPOLA/S
E)XW/REI PRO\S_TA\S *)E)PIPOLA/S (=E)XW/REI PRO\S --- TA\S *)E)PIPOLA/S)
E)XW/REI PRO\S. TA\S *)E)PIPOLA/S.
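The delimiter behaviour described above can be approximated by reducing each line to space-separated words and then running a plain substring search. This is a minimal sketch, not the engine's implementation: it assumes single-line input in upper-case Beta Code, treats every character other than a letter, `*`, a diacritic or an apostrophe as a word boundary, and does not handle hyphenation or beta escapes. `normalize_text` and `text_search` are hypothetical names.

```python
import re

# Hypothetical rule: anything that is not a Beta Code letter, capital marker,
# diacritic or apostrophe counts as a word boundary (space, '_', '.', ...).
NON_WORD = re.compile(r"[^A-Z*)(/\\=+|']+")


def normalize_text(line: str) -> str:
    # Collapse each run of boundary characters into one space, then pad both
    # ends so that the first and last words also have a delimiter.
    return " " + NON_WORD.sub(" ", line).strip() + " "


def text_search(query: str, line: str) -> bool:
    # A space in the query stands for a word boundary in the text.
    return query in normalize_text(line)
```

With this normalization, `" PRO\S TA\S "` matches both `PRO\S_TA\S` and `PRO\S. TA\S`, because the underscore and the full stop each collapse to a single space.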
In a normal search, beta escapes are ignored (but see With non-alphabetic characters.) Thus, a search for "BW" will return such instances as LA/B[W], LA/B<6W>6 (= LA/BW), LA/B%17W (=LA/B||W) and LA/B?W. However, the missing letter dot is not ignored: LA/B!W is not considered to contain an instance of BW.
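The following minimal sketch shows how escapes can be ignored while the missing-letter dot is kept. It assumes a small, illustrative subset of Beta Code escapes: a bracket or markup character optionally followed by digits, plus the `?` marker. The engine's actual list of escapes is not given here, and the helper names are hypothetical.

```python
import re

# Illustrative subset of beta escapes: [ ] < > { } % @ # optionally followed
# by a number (e.g. <6, >6, %17, {2, @1, #13), plus the '?' marker.
ESCAPE = re.compile(r"[\[\]<>{}%@#]\d*|\?")


def strip_escapes(text: str) -> str:
    # '!' (the missing-letter dot) is deliberately left in place,
    # so LA/B!W still does not contain BW.
    return ESCAPE.sub("", text)


def contains_ignoring_escapes(query: str, text: str) -> bool:
    return query in strip_escapes(text)
```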
Hyphens (and any line content after the hyphen) are also ignored. Thus, a search for OI)/KWN will return the following, where the word has been interrupted by a marginal note of an antistrophe:
TA\N D' E)XQRA\N STA/SIN EI)=RG' A)P' OI)/- {2A)NT.}2
KWN TA\N MAINOME/NAN T' E)/RIN
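The hyphen rule can be sketched as a preprocessing step that joins a broken word across lines and discards the rest of the line after the hyphen, such as the marginal note {2A)NT.}2 above. This is a hypothetical illustration only. It assumes that a word break is a `-` followed by whitespace or the end of the line, and it ignores other uses of the hyphen.

```python
import re

# Assumed form of a word break: '-' followed by whitespace or end of line;
# everything after it on that line (e.g. a marginal note) is dropped.
HYPHEN_BREAK = re.compile(r"-(?:\s.*)?$")


def join_hyphenated(lines: list[str]) -> str:
    joined = ""
    for line in lines:
        stripped = HYPHEN_BREAK.sub("", line)
        if stripped != line:
            joined += stripped       # the word continues on the next line
        else:
            joined += line + " "     # ordinary line end: a word boundary
    return joined.strip()
```

Applied to the two lines above, this yields a single line containing OI)/KWN, so a search for OI)/KWN succeeds.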
The search algorithm used for non-wildcard searches scans the text in a single pass. This poses no problem for normal text input, because the search string is preprocessed. Thus, the search engine has no difficulty recognizing PER in PEPERASME/NOS, despite the 'false start' in the initial PE: it realizes that the second P could still be the start of the search string. The search engine therefore does not have to backtrack, i.e. restart each search from the next letter onwards (e.g. check PEPERASME/NOS, then EPERASME/NOS, then PERASME/NOS...).
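The text does not name the algorithm the engine uses. The Knuth-Morris-Pratt (KMP) algorithm is one standard technique with exactly the property described: the search string is preprocessed into a "failure" table, so that after a false start the matcher falls back within the search string and never re-reads the text. The sketch below illustrates that technique and should not be taken as the engine's own code.

```python
def failure_table(pattern: str) -> list[int]:
    # fail[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find(pattern: str, text: str) -> int:
    """Index of the first match, or -1; each text character is read once."""
    if not pattern:
        return 0
    fail = failure_table(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # fall back within the pattern, not the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1
```

On PEPERASME/NOS, the mismatch at the second P drops `k` from 2 to 0. That same P then restarts the match, so PER is found at index 2 without stepping back in the text.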
There are infrequent instances where the failure to backtrack can interfere with the search engine retrieving a valid result. All these instances involve pathological inputs virtually impossible in real text.
For example, the preprocessing to eliminate backtracking would impose an inordinate performance penalty if applied to any text interrupting a hyphenated word, and has not been attempted. Thus, the search engine will not recognize an instance of ANT in the following:
KAI\ A)N- {2A)NT.}2
U/POPTOS W)=N...
This is because the search engine, in resolving A)N-U/POPTOS, suspects by the time the hyphen is reached that A)N- could be followed on the next line by T (e.g. A)N-TI/QETOS); so it completely skips over the text after the hyphen, and proceeds directly to the upsilon in the next line. So if text after a hyphen shares a common prefix with text just before a hyphen, the instance after the hyphen won't be found. The problem would not occur for, say:
KAI\ FIL- {2A)NT.}2
U/POPTOS W)=N...
Still, in practice, if you are searching for text likely to interrupt a hyphenated word (overwhelmingly, this only occurs with STR. and A)NT.), you should instead use either an index search, or the (slower) wildcard search option, which does multiple passes over the same text, and would not miss the instance after the hyphen.
If this option is specified, the search is sensitive to Greek diacritics, which are otherwise ignored in the search string. In contrast to word index searches, where the accentuation of the word forms is already normalized, full text searches need such normalization to take place on the fly. The following conventions are applied:
KALO/S returns instances of both KALO/S and KALO\S.
KALO\S returns only instances of KALO\S, and not KALO/S.
A)/NQRWPOS returns instances of both A)/NQRWPOS and A)/NQRWPO/S.
A)/RRWST returns instances of both A)/RRWST and A)/R)R(WST.
XW)S returns instances of both XW)S and XW(S.
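Two of these conventions can be sketched as a character comparison. First, an acute in the search string also accepts a grave in the text, but not the reverse. Second, breathings written on rho in the text are dropped. This is a hypothetical sketch: the other conventions (the second, enclitic-induced accent and the breathing case) are not modelled, and it uses a naive scan for brevity rather than the engine's single-pass matcher.

```python
import re

# Only two conventions are modelled here (an assumption-laden sketch):
# 1. '/' in the query matches '/' or '\' in the text; '\' matches only '\'.
# 2. breathings written on rho in the text are ignored.
RHO_BREATHING = re.compile(r"(?<=R)[()]")


def char_matches(query_ch: str, text_ch: str) -> bool:
    if query_ch == "/":
        return text_ch in ("/", "\\")
    return query_ch == text_ch


def diacritic_search(query: str, text: str) -> bool:
    text = RHO_BREATHING.sub("", text)  # A)/R)R(WST -> A)/RRWST
    for start in range(len(text) - len(query) + 1):
        if all(char_matches(q, text[start + j]) for j, q in enumerate(query)):
            return True
    return False
```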
Diacritic-sensitive text searches do not allow a search string ending in a non-diacritic to match text where the next character is a diacritic. For example, ARA is a substring of PARA/GW, but a normal diacritic-sensitive search detects the acute following the second alpha and rules this out as a successful result. This qualification does not apply if the search includes non-alphabetic characters or wildcard expressions.
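The end-of-match check can be sketched on top of Python's `str.find` instead of the engine's single-pass matcher. The diacritic set and the function name are assumptions for illustration.

```python
# Assumed set of Beta Code diacritics: breathings, accents, diaeresis, iota subscript.
DIACRITICS = set(")(/\\=+|")


def find_diacritic_sensitive(query: str, text: str) -> int:
    """First match not immediately followed by a diacritic (unless the
    query itself ends in one); -1 if there is none."""
    if not query:
        return 0
    ends_in_letter = query[-1] not in DIACRITICS
    start = text.find(query)
    while start != -1:
        end = start + len(query)
        next_is_diacritic = end < len(text) and text[end] in DIACRITICS
        if not (ends_in_letter and next_is_diacritic):
            return start
        start = text.find(query, start + 1)  # try the next occurrence
    return -1
```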
If this option is specified, the search is sensitive to case in Greek. Since case is not stored in the word index, this option applies only to full text searches. For example, a case-sensitive search for W(/S will not return instances of *(/WS.
Remember that, when entering text in Beta Code, case is indicated by a preceding asterisk, and not by actual case: both WS and ws are lowercase, but *W*S and *w*s are uppercase.
If your search is both case- and diacritics-sensitive, bear in mind that in Beta Code capital letters bearing breathing marks have their diacritics before the letter (e.g. *)/|A: Capital alpha with smooth breathing, acute and iota subscript/adscript), while capital letters bearing accent, diaeresis and/or iota subscript/adscript but no breathing mark have their diacritics after the letter: *A/ (capital alpha with acute), *A| (capital alpha with iota adscript/subscript), *I+ (capital iota with diaeresis). The latter types of capitals do not represent conventional Classical orthography, but the last two are normal in Modern Greek, and all three are frequent in nineteenth-century editions. Instances of such strings in the corpus include the following:
1. Mithridatis Epistula, Epistula. {0039.001}. Section t line 1.
(t.) *M*I*Q*R*I*D*A*T*H*S *B*A*S*I*L*E*U*S *M*I*Q*R*I*D*A*T*H| *T*W|
*A*N*E*Y*I*W| *X*A*I*R*E*I*N.
(1.) *TA\S *BROU/TOU E)QAU/MASA POLLA/KIS E)PISTOLA/S, OU) MO/-
1. Anonymi Grammatici Gramm., Supplementa artis Dionysianae vetusta. {0072.001}. Part 1 volume 1 page 121 line 10.
* ( H R A K L E I / D H S3.
@6 (9)
*P*E*R*I\ *T*O*U= *(H*R*W*I+*K*O*U= *M*E/*T*R*O*U (10)
*TO\ H(RWI+KO\N ME/TRON E(CA/METRO/N E)S3TIN: E(\C GA\R XW/RAS3 E)/XEI. TA\S3 ME\N OU)=N PE/NTE @1
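The asterisk conventions described above can be captured by a single pattern. An asterisk may be followed by diacritics (for capitals with a breathing) and then by the letter. Diacritics that follow the letter (as in *A/ or *I+) are simply left behind. The following is a hypothetical helper for checking which letters a Beta Code string marks as capitals, not part of the search engine.

```python
import re

# '*' marks a capital. Capitals with a breathing carry their diacritics
# between '*' and the letter (*)/|A); others carry them after it (*A/, *I+).
CAPITAL = re.compile(r"\*[()/\\=+|]*([A-Z])")


def capitals(beta: str) -> list[str]:
    # Letter case is not significant in Beta Code, so upper-case first.
    return CAPITAL.findall(beta.upper())
```

For example, `capitals("ws")` is empty while `capitals("*w*s")` returns both letters, matching the rule that only the asterisk marks case.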
In infrequent (and pathological) cases, the absence of backtracking will also cause missed matches in case- and diacritic-sensitive searches. This will occur if a search string contains both a capital and a lowercase version of the same letter with diacritics (e.g. *(/EN E(/N). If the text contains the part of the search string preceding the lowercase version twice (e.g. *(/EN . *(/EN E(/N), the search engine will reject the second *(/EN as a possible instance of the lowercase E(/N, but fail to realize that it can itself begin a valid instance of the search string. Needless to say, such a configuration should not arise in natural texts, and can always be avoided with a wildcard search (which enforces backtracking).
The TLG corpus has not in general resolved adscript iota into subscript iota. Adscript iota turns up routinely in the following contexts:
In accented capital letters in the dominant Western typographical tradition. However, a subscript iota under the capital can appear in older Western editions, and in many texts published in Greece (particularly of an ecclesiastical nature).
In diplomatic editions of pre-Byzantine texts, namely epigraphical and papyrological texts.
For example, the name Hades may appear as *)/A|DHS, *)/AIDHS, or *)/A*IDHS in title case, and as *A|*D*H*S, *AI*D*H*S, or *A*I*D*H*S in upper case.
If the adscript option is specified, the search treats unambiguous instances of iota adscript identically to iota subscripts. What counts as an unambiguous adscript depends on the search mode; for each mode, the option treats the text sequences below as equivalent to the forms shown:
| Sequence in text | Ignore diacritics | Accents only | Iota subsc. only | Accents & iota subsc. | Breathing & iota subsc. | With non-alphabetic characters |
|---|---|---|---|---|---|---|
| HI | H | H | H\| | H\| | H\| | HI |
| H/I | H | H/ | H\| | H/\| | H\| | H/I |
| WI | W | W | W\| | W\| | W\| | WI |
| W/I | W | W/ | W\| | W/\| | W\| | W/I |
| AI | AI | AI | AI | AI | AI | AI |
| A/I | AI | A/I | AI | A/\| | AI | A/I |
| A)I | AI | AI | AI | AI | A)\| | A)I |
In other words, all instances of I after H and W are treated as adscripts equivalent to iota subscripts; all instances of I after A are treated as adscripts only if the alpha has a diacritic, and the search is sensitive to that diacritic. (So A)I is equivalent to A)|, but AI) is a short diphthong.) When the search is for non-alphabetic characters, and the Beta Code is treated literally, this interpretation is ignored.
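The interpretation table can be restated as a rewrite rule over text sequences. The following is a minimal sketch, assuming that the search mode is expressed as hypothetical flags: `accents`, `breathings`, `subscripts` (all off corresponds to "ignore diacritics"), plus `literal` for searches with non-alphabetic characters. It rewrites only the vowel-plus-iota sequences covered by the table, dropping unsearched diacritics within them. It does not model diacritic stripping elsewhere in the text.

```python
import re

ACCENTS = "/\\="     # acute, grave, circumflex
BREATHINGS = "()"    # rough, smooth
# Eta, omega or alpha, any accents/breathings on it, then an iota.
ADSCRIPT = re.compile(r"([HWA])([()/\\=]*)I")


def interpret_adscripts(seq, accents=False, breathings=False,
                        subscripts=False, literal=False):
    """Rewrite vowel+iota sequences as in the table of interpretations."""
    if literal:
        return seq  # Beta Code taken literally: no adscript interpretation

    def rewrite(m):
        vowel, marks = m.group(1), m.group(2)
        # Keep only the diacritics this search mode is sensitive to.
        kept = "".join(c for c in marks
                       if (accents and c in ACCENTS)
                       or (breathings and c in BREATHINGS))
        if vowel in "HW":
            # Every iota after eta or omega counts as an adscript.
            return vowel + kept + ("|" if subscripts else "")
        # After alpha, only when a searched diacritic sits on the alpha.
        return vowel + kept + ("|" if subscripts and kept else "I")

    return ADSCRIPT.sub(rewrite, seq)
```

The same rule also shows where false matches come from: `interpret_adscripts("TRWIKO/S", subscripts=True)` yields `TRW|KO/S`.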
To illustrate, contrast the following retrievals from the word index for adscript mode on and off: (_ stands for space)
| Search expression | Diacritic sensitivity | Ignoring adscripts | With adscripts |
|---|---|---|---|
| ADH_ | Ignore diacritics | | |
| A/DH_ | Accents only | | |
| A\|DH\|_ | Iota subsc. only | | |
| A/\|DH\|_ | Accents & iota subsc. | | |
Adscript mode will produce some false matches, mostly involving WI (it will treat TRWIKO/S as TRW|KO/S), and it will miss unaccented instances of the long diphthong AI (e.g. the capitalized *Q*R*AI*K*H in the Periplus Scylacis). Treat its results with appropriate caution.
The normal maximum number of search results displayed per page is 100. However, by specifying 0 lines of context per result (that is, requesting only the citations of the search results, and not the text itself), users can request up to 1000 results per page. Such citations are displayed as follows:
1. {0019.001}. Aristophanes Comic., Acharnenses. Line 42.
2. {0019.001}. Aristophanes Comic., Acharnenses. Line 367.
3. {0019.001}. Aristophanes Comic., Acharnenses. Line 373.
If this option is specified, the text is searched not for Greek words, but for any sequence of characters matching the search string precisely. Rather than ignoring beta escapes, this option allows the user to search for beta escapes, whether by themselves or in conjunction with Greek word searches. For example, the user can search for all beginnings of orations (~y):
1. Michael Psellus Epist., Phil., Polyhist., Rhet. et Scr. Rerum Nat., Orationes panegyricae. {2702.006}. Oration 2 line 1t.
RION, KA)\N TAU/THN KATIOU=SAN XWRH/SH|S EI)S DU/NAMIN, AU)TO/S TE
QEO\S GE/NOIO KAI\ H(MA=S E)CERGA/SAIO. (385)
(2.)*LO/GOS EI)S TO\N BASILE/A TO\N *MONOMA/XON. (1t)
2. Michael Psellus Epist., Phil., Polyhist., Rhet. et Scr. Rerum Nat., Orationes panegyricae. {2702.006}. Oration 3 line 1t.
KAI\ PROSQEI/H TOI=S E)/TESI, KAI\ TH\N E)KEI=QEN BASILEI/AN XARI/-
SAITO TH\N O)/NTWS U(YHLH/N TE KAI\ A)KATA/LUTON. @1
(3.)*TW=| AU)TW=| BASILEI= (1t)
3. Michael Psellus Epist., Phil., Polyhist., Rhet. et Scr. Rerum Nat., Orationes panegyricae. {2702.006}. Oration 4 line 1t.
E)PAGALLO/MENOS KAI\ TW=| A)KHRA/TW| STE/FEI TH=S FILOSOFI/AS
KATASTEFO/MENOS. @1
(4.)*(/ETEROS LO/GOS PRO\S TO\N (1t)
all instances of asteriskos (#13):
1. Scholia in Aristophanem, Scholia in nubes (scholia recentiora Eustathii, Thomae Magistri et Triclinii). {5014.005}. Argumentum-dramatis personae-scholion sch th-tr nub verse 263 line 1.
sch th-tr nub.
(263e.) {2&Th2Tr1/2$}2 TH=S EU)XH=S] H(\N AU)TO\S EU)/COMAI.
(263.) {2&Tr2}2 #13 Vat solus; add.$ E(TERO/STROFA.
(264a.) {2&Th2Tr1/2$}2 A)ME/TRHT'] A)/PEIRE.
2. Scholia in Aristophanem, Scholia in nubes (scholia recentiora Eustathii, Thomae Magistri et Triclinii). {5014.005}. Argumentum-dramatis personae-scholion sch th-tr nub verse 276 line 1.
sch th-tr nub.
(276b.) {2&Th2Tr1/2$}2 FANERAI/] POTAPAI/.
(276.) {2&Tr2}2 #13 Vat solus (post$ A)RQW=MEN FANERAI\, &sc. primi vs. alt. colon).
(277a.) {2&Th2Tr1/2$}2 DROSERA\N] U(DATW/DH.
3. Scholia in Aristophanem, Scholia in nubes (scholia recentiora Eustathii, Thomae Magistri et Triclinii). {5014.005}. Argumentum-dramatis personae-scholion sch th-tr nub verse 299bis line 1.
sch th-tr nub.
(299.) {2&Th2Tr1/2$}2 LIPARA\N [6EI)S TH\N &Tr2$ EU)/GAION.
(299bis.) {2&Tr2}2 #13 Vat solus (post primi vs. alt. col.$ E)/LQWMEN LIPARA\N).
(300a.) {2&Th2Tr1/2$}2 [6E)S &Th2$ XQO/NA] TH\N *)ATTIKH/N.
or all marginalia ({2):
1. Scholia in Theocritum, Scholia in Theocritum. {5038.001}. Prolegomenon-anecdote-poem proleg section-verse A line t.
&Prolegomena$(A.)
*GE/NOS *QEOKRI/TOU {2&KEbAPT$}2 (t)
(A a.) *QEO/KRITOS O( TW=N BOUKOLIKW=N POIHTH\S *SURAKOU/SIOS H)=N
2. Scholia in Theocritum, Scholia in Theocritum. {5038.001}. Prolegomenon-anecdote-poem proleg section-verse A b line 1.
MA/SQH. @1
(A b.) {2&GEbPT$}2 I)STE/ON, O(/TI O( *QEO/KRITOS E)GE/NETO I)SO/XRONOS TOU= TE *)ARA/TOU
KAI\ TOU= *KALLIMA/XOU KAI\ TOU= *NIKA/NDROU: E)GE/NETO DE\ E)PI\ TW=N
3. Scholia in Theocritum, Scholia in Theocritum. {5038.001}. Prolegomenon-anecdote-poem proleg section-verse B line t.
XRO/NWN *PTOLEMAI/OU TOU= *FILADE/LFOU.
(B.) {2&KGEbAT$}2*EU(/RESIS TW=N BOUKOLIKW=N (t)
(B a.) *TA\ BOUKOLIKA/ FASIN E ) N * L A K E D A I M O N I / A | EU(REQH=NAI
Similarly, the user can search for runs of Roman script in text. The following is a search for Hesychian citations (Hesych) in Mette's Fragmenta of Aeschylus:
1. Aeschylus Trag. Atheniensis, Fragmenta (Mette). {0085.008}. Tetralogy 1 play D fragment 7 line 1.
{<*P*R*W*T.?>} 'A)MH/XANON TEU/XHMA KAI\ DUSE/KLUTON'.
(7.) &Hesych. Lex. $*A& 1357 L. (aus Diogenian.): $'A)/E{L}PTOI': DEINOI/, KAI\
'A)/APTOI'. * A I ) S X U / L O S * P R W T E I = .
2. Aeschylus Trag. Atheniensis, Fragmenta (Mette). {0085.008}. Tetralogy 1 play D fragment 8a line 1.
'A)/APTOI'. * A I ) S X U / L O S * P R W T E I = .
(8a.) &Hesych. Lex. $*A& 3404 L. (aus Diogenian.) %106 Etym. Genuin. p. 26,
&11 Mi. [Etym. Magn. p. 75, 22 Gaisf.]: $'A)MA/DA': TH\N NAU=N, A)PO\ TOU= 'A)MA=N'
3. Aeschylus Trag. Atheniensis, Fragmenta (Mette). {0085.008}. Tetralogy 1 play D fragment 9 line 1.
TH\N QA/LASSAN. H( LE/CIS P A R ' * A I ) S X U / L W I .
(9.) &Hesych. Lex. II 137, 21 Schm. (aus Diogenian.): $'E)PA/SW': E)KTH/SW.
* A I ) S X U / L O S * P R W T E I = S A T U R I K W = I . @1
Because this mode of searching is literal, you will need to search for any Greek text in capitals rather than lowercase: ARGO rather than argo. Furthermore, you will need to include any diacritics or capital asterisks in your search: *)/ARGO rather than ARGO.
If this option is specified, no beta escape is resolved into HTML-based formatting: all beta escapes (including citations) are displayed as raw beta codes. Compare the following with the instances given in Suppress raw beta escapes:
&nehmer an den von Sisyphos aus Anlab der Anschwemmung des toten
&Melikertes zu Ehren des Poseidon gestifteten Isthmischen Spielen]1. @1
~yz"1n"
@&Pap. Ox. 2250
~y"16a"
&10[1oberer Rand]1$8
~z1
{4%43?}4 A)/]GE DH/, BASILEU=, #74 [%40 %40 %41 %40 %40 %41
KAI\ CU/MPASAN #74 M[%40 %40 %41 %40 %40 %41
TOU= BAQUPLOU/TO[U #74 %40 %40 %41 E)/CW
P?ENI/AS NAI/WN #74 K[AI\ %41 %40 %40 %41
PAL?]I?KH\N SKH/PTR?[WI #74 %40 %40 %41 %40 %40 %41,
DE/C]A?[I?] ME?[!!!] #74 F?I?[LI/AS XW/RAS
This option is incompatible with Suppress raw beta escapes.
In either an Index or a Textual search, results are normally ordered by TLG author number; for example, results from Euripides (0006) will precede results from Homer (0012). Specifying chronological ordering forces the results to be displayed instead in ascending chronological order, so that results from the earliest authors (following the TLG sorting order for dates) are displayed first. This allows the earliest instance of a word to be identified. Since all results have to be retrieved and sorted before they can be displayed, the search will be considerably slower. Textual searches in particular may be too slow.
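The cost difference between the two orderings can be sketched as follows. This assumes, hypothetically, that the default scan already visits texts in TLG author-number order, so each hit can be shown as soon as it is found, whereas chronological ordering must collect every hit before the earliest is known. `Hit` and its `date_key` field are illustrative and do not reflect the TLG's actual date encoding.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Hit:
    author_number: str  # TLG author number, e.g. "0006"
    date_key: float     # hypothetical sortable date; smaller means earlier
    citation: str


def default_order(scan: Iterable[Hit]) -> Iterator[Hit]:
    # Assumes the scan already proceeds in author-number order,
    # so hits can be streamed to the page as they are found.
    yield from scan


def chronological_order(scan: Iterable[Hit]) -> list[Hit]:
    # Every hit must be retrieved before sorting can begin;
    # ties on date fall back to author number.
    return sorted(scan, key=lambda h: (h.date_key, h.author_number))
```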
tlg@ptolemy.tlg.uci.edu