Wildcard Searches

The Online TLG® includes a powerful Wildcard facility, allowing for indeterminate searches of various kinds --- on both the word index and in full text searches. These searches are specified using regular expressions, a notation used to describe different string patterns. Each element is introduced below with examples.

Search for ANA as part of a word: ANA
Search for ANA as a prefix (word index)/at the start of a line (full text): ^ANA
Search for ANA as a suffix (word index)/at the end of a line (full text): ANA$
Search for the word ANA (word index only): ^ANA$
Search for ANADU or ANEDU: AN[AE]DU
Search for ANADU or ANEDU: ANADU|ANEDU
Search for ANADU or ANEDU: (ANA|ANE)DU
Search for A)NADU or A)NEDU: (A\)NA|A\)NE)DU
Search for ANAD followed by anything but U: ANAD[^U]
Search for a word consisting of ANAD, then any two letters, then U: ^ANAD..U
Search for EDU or ANEDU: (AN)?EDU
Search for a numeric digit: [0-9]
Search for an Arabic numeral (a sequence of one or more digits): [0-9]+
Search for a quotation mark Beta escape (" followed by zero or more digits): "[0-9]*
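The notation in this summary is ordinary regular expression syntax, so it can be tried out on plain strings with any regular expression library. The following is a minimal sketch using Python's re module on a made-up word list. It illustrates the notation only; it is not the TLG search engine, which additionally handles Beta escapes, diacritics and the search modes described below.

```python
import re

# Made-up transliterated strings, for illustration only.
words = ["ANA", "ANADU", "ANEDU", "EDU", "ANADEW", "KANA", "ANADXXU"]

def matching(pattern):
    """Return the words in which the pattern matches somewhere."""
    return [w for w in words if re.search(pattern, w)]

print(matching(r"^ANA"))         # ANA as a prefix
print(matching(r"ANA$"))         # ANA as a suffix
print(matching(r"^ANA$"))        # the word ANA on its own
print(matching(r"AN[AE]DU"))     # ANADU or ANEDU, via a character class
print(matching(r"ANADU|ANEDU"))  # the same, via alternatives
print(matching(r"(ANA|ANE)DU"))  # the same, via grouping
print(matching(r"ANAD[^U]"))     # ANAD followed by anything but U
print(matching(r"^ANAD..U"))     # ANAD, any two characters, then U
print(matching(r"(AN)?EDU"))     # EDU or ANEDU
```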

Note that regular expression notation exists in several versions, which add features to the basic standard. The version outlined here is not necessarily identical to the versions documented elsewhere, but they correspond in their essentials.

While this kind of search is much more powerful than the ordinary searches offered by the search engine, it is also significantly slower (at the most a quarter of the speed of ordinary searches). This is because, to guarantee correct results, regular expression searches need to scan the text repeatedly: the optimization whereby normal searches can be performed in a single pass through the text does not apply here.

The search may be run in one of three modes. If Including Beta escapes is specified, all characters in the Beta Code text are considered; this is a full Beta search. If Including Beta escapes is not specified, Beta escapes, citations, and hyphens are ignored. If the search is then both diacritics-sensitive and case-sensitive (full-letteral search), all Beta Code Greek characters are included in the search, including * (capital letter) and the various diacritics. If the search is neither diacritics-sensitive nor case-sensitive (letter-only search), then * and the diacritics are ignored. These features are illustrated below.

Normal diacritic-sensitive searches disallow a search ending in a non-diacritic character from matching a text where the next character is a diacritic. For example, ARA is a substring of PARA/GW, and as a bare substring will give a successful match. But a normal diacritic-sensitive search detects the acute following the second alpha, and rules this out as an instance of (unaccented) ARA. This qualification applies to regular expression searches only in full-letteral search, and not in full Beta search.
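This rule can be pictured as a lookahead: after the last letter of the search string, the next character must not be a diacritic. The following minimal Python sketch shows the idea. The set of diacritic characters and the helper name diacritic_sensitive are assumptions made for illustration; how the search engine itself implements the rule is not documented here.

```python
import re

# Assumed set of Beta Code diacritics: breathings, accents, diaeresis, iota subscript.
DIACRITICS = r")(/\\=+|"

def diacritic_sensitive(search_string):
    """Compile a literal search that refuses to end just before a diacritic."""
    return re.compile(re.escape(search_string) + "(?![" + DIACRITICS + "])")

plain = re.compile("ARA")             # bare substring search
strict = diacritic_sensitive("ARA")   # hypothetical diacritic-sensitive variant

print(plain.search("PARA/GW") is not None)    # substring is present
print(strict.search("PARA/GW") is not None)   # ruled out: an acute follows the second alpha
print(strict.search("PARABOLH") is not None)  # accepted: a letter follows
```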

Similar features of diacritic-sensitive behaviour in normal searches apply to full-letteral searches, but not full Beta searches: treat more than one word delimiter in a row as a single space; match graves to acutes; ignore second accent; ignore internal breathings; conflate rough and smooth coronides.


Character classes: [ ]

The regular expression feature most likely to be used is the character class. It lets users specify that the next character to search for may be one of several alternatives. Character classes are enclosed in square brackets.

For example, the search string [AB] matches any instances of either A or B. The search string GU[YP] matches instances of either GUP or GUY; this allows a user to retrieve both nominative and oblique instances of GU/Y 'vulture'. The search string AN[AE]PTUC matches either ANAPTUC or ANEPTUC; this allows a user to retrieve aorist instances of A)NAPTU/SSW 'unfold' both with and without internal augment.

The following are results from a search for X[IO]S in the Lobel & Page Fragmenta of Sappho. The search is done in each of the three modes: letter-only, full-letteral, and full Beta.


Letter-only (Consider only the letters in the search expression)

1. Sappho Lyr., Fragmenta (Lobel & Page). Fragment 1 line 28.

<QU=MOS I)ME/RREI, TE/LESON, SU\ D' AU)/TA>
{2#310}2[#6]<SU/MMAXOS E)/SSO.> @1
(2.) !RANOQEN KATIOU[S- (1a)

2. Sappho Lyr., Fragmenta (Lobel & Page). Fragment 3 line 11.

[ ]MH?D?[ ]!AZE, [
[
]X?IS, SUNI/HM[
[ ]
!HS KAKO/TATO[S

3. Sappho Lyr., Fragmenta (Lobel & Page). Fragment 23 line 13.

[ ]TAIN
[ PAN]NUXI/S[D]HN
. . .
@1
(24a.) . . .
[ ]ANA/GA?[ (1)

The search identifies instances of either XOS or XIS, ignoring intervening accents and escape codes (such as the littera dubia question mark.)


Full-letteral (Consider all valid characters of Greek in making a match)

1. Sappho Lyr., Fragmenta (Lobel & Page). Fragment 1 line 28.

<QU=MOS I)ME/RREI, TE/LESON, SU\ D' AU)/TA>
{2#310}2[#6]<SU/MMAXOS E)/SSO.> @1
(2.) !RANOQEN KATIOU[S- (1a)

2. Sappho Lyr., Fragmenta (Lobel & Page). Fragment 3 line 11.

[ ]MH?D?[ ]!AZE, [
[
]X?IS, SUNI/HM[
[ ]
!HS KAKO/TATO[S

3. Sappho Lyr., Fragmenta (Lobel & Page). Fragment 27 line 9.

S]TEI/XOMEN GA\R E)S GA/MON: EU)= DE[
KA]I\ SU\ TOU=T', A)LL' O)/TTI TA/XISTA[
PA]R?[Q]E/NOIS A)/P[P]EMPE, QE/OI[

The search is sensitive to diacritics, and thus rejects any instances of XOS or XIS containing diacritics (as in §23.13).


Full Beta (Consider all characters of Beta code in making a match)

1. Sappho Lyr., Fragmenta (Lobel & Page). Fragment 1 line 28.

<QU=MOS I)ME/RREI, TE/LESON, SU\ D' AU)/TA>
{2#310}2[#6]<SU/MMAXOS E)/SSO.> @1
(2.) !
RANOQEN KATIOU[S- (1a)

2. Sappho Lyr., Fragmenta (Lobel & Page). Fragment 27 line 9.

S]TEI/XOMEN GA\R E)S GA/MON: EU)= DE[
KA]I\ SU\ TOU=T', A)LL' O)/TTI TA/XISTA[
PA]R?[Q]E/NOIS A)/P[P]EMPE, QE/OI[

3. Sappho Lyr., Fragmenta (Lobel & Page). Fragment 30 line 3.

PA/RQENOI D[
PANNUXISDO![!]A?![
SA\N A)EI/DOI!N F[ILO/TATA KAI\ NU/M-

The search now rejects instances in which beta escapes (such as ? in §3.11) intervene between the letters searched; it only accepts literal instances of XOS or XIS.
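The difference between the three modes can be approximated by deciding which characters of the Beta Code text the search gets to see. The sketch below, in Python, is a simplification under stated assumptions: the function view and its character sets are hypothetical, and the real engine skips ignored characters during matching rather than deleting them beforehand. It nonetheless reproduces which of the Sappho lines above are accepted in each mode.

```python
import re

def view(text, mode):
    """Reduce a Beta Code line to the characters a given mode considers (assumed sets)."""
    if mode == "letter-only":
        return re.sub(r"[^A-Z ]", "", text)            # letters and spaces only
    if mode == "full-letteral":
        return re.sub(r"[^A-Z )(/\\=+|*]", "", text)   # plus diacritics and *
    return text                                        # full Beta: every character

# Lines taken from the result listings above, keyed by fragment and line.
samples = {
    "1.28": "{2#310}2[#6]<SU/MMAXOS E)/SSO.> @1",
    "3.11": "]X?IS, SUNI/HM[",
    "23.13": "[ PAN]NUXI/S[D]HN",
    "27.9": r"KA]I\ SU\ TOU=T', A)LL' O)/TTI TA/XISTA[",
}

def hits(pattern, mode):
    """List the sample lines in which the pattern matches under the given mode."""
    return [ref for ref, line in samples.items()
            if re.search(pattern, view(line, mode))]

for mode in ("letter-only", "full-letteral", "full Beta"):
    print(mode, hits(r"X[IO]S", mode))
```

In this approximation the accent in 23.13 blocks the full-letteral match, and the question mark in 3.11 blocks the full Beta match, as in the listings.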

Space can be one of the characters specified in a character set. If the search is letter-only or full-letteral, space is treated as a word delimiter, and will match carriage return and dash as well. Thus, the following is a letter-only search for X[IO]S[ T] in Sappho; it will match XOS or XIS, followed by a word delimiter (= at the end of a word) or T:

1. Sappho Lyr., Fragmenta (Lobel & Page). Fragment 1 line 28.

<QU=MOS I)ME/RREI, TE/LESON, SU\ D' AU)/TA>
{2#310}2[#6]<SU/MMAXOS E)/SSO.> @1
(2.) !
RANOQEN KATIOU[S- (1a)

2. Sappho Lyr., Fragmenta (Lobel & Page). Fragment 3 line 11.

[ ]MH?D?[ ]!AZE, [
[ ]
X?IS, SUNI/HM[
[ ]
!HS KAKO/TATO[S

3. Sappho Lyr., Fragmenta (Lobel & Page). Fragment 27 line 9.

S]TEI/XOMEN GA\R E)S GA/MON: EU)= DE[
KA]I\ SU\ TOU=T', A)LL' O)/TTI TA/XISTA[
PA]R?[Q]E/NOIS A)/P[P]EMPE, QE/OI[

Note the highlighting: the comma in §3.11 is now included in the search match, because it is skipped over between the sigma and the space. Similarly, the tau in §27.9 is now highlighted.


Character exclusion classes: [^ ]

A character class specifies what may match the next character in the text. A character exclusion class specifies what may not match the next character in the text. Such a class is specified by inserting a caret (^) after the opening square bracket.

For example, the search string [^AB] matches any character other than A or B. Depending on the mode the search is run in, this could include other letters, word delimiters, diacritics, or Beta escapes, as the examples below illustrate.

A letter-only search for DAFN[^I] would return words beginning in DAFN and not continuing with an iota; this would help separate out instances of DA/FNIS and DA/FNH. A full Beta search for #2[^0123456789] would find any instances of the Beta escape #2 (stigma) not followed by a digit; this would prevent matches being reported for such escapes as #20 (Angular half symbol), #200 (Jupiter), and #246 (Heaven symbol, Doctrina Patrum).
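Both examples can be checked on plain strings. The following Python sketch uses made-up input; note that an exclusion class still has to match some character, so a string that simply ends after DAFN or #2 does not match.

```python
import re

# Letter-only forms (accents already left out), made up for illustration.
forms = ["DAFNIS", "DAFNH", "DAFNHN", "DAFNIDOS"]
not_iota = [w for w in forms if re.search(r"DAFN[^I]", w)]
print(not_iota)

# Beta escapes beginning with #2: only the bare #2 should be reported.
text = "#2 A #20 B #200 C #246 D #2."
stigma = re.findall(r"#2[^0123456789]", text)
print(stigma)

# An exclusion class consumes a character: nothing follows #2 here.
at_end = re.search(r"#2[^0123456789]", "A #2")
print(at_end)
```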

The following is a search for TA[^N] in the Mani-Codex:


Letter-only:

1. Anonymus Manichaeus Biogr., $*PERI\ TH=S GE/NNHS TOU= SW/MATOS AUTOU=&. Codex page 1 line 6.

MOI E(AUT!!!! [— — —].
(2.) "— — —
[
!!! KAT]A\? BRAXU\ B?[RA]X?U\
!!!![!!]O?N. AS?E?B?[!!!!]

2. Anonymus Manichaeus Biogr., $*PERI\ TH=S GE/NNHS TOU= SW/MATOS AUTOU=&. Codex page 2 line 4.

SOI E)/DEIC[A!!!!! !!!]
A)PO\ POLL[W=N. E)CE/]S?TA[I]
DE/ SOI MEGA[LOP]REPW=S

3. Anonymus Manichaeus Biogr., $*PERI\ TH=S GE/NNHS TOU= SW/MATOS AUTOU=&. Codex page 2 line 8.

KAI\ O)FQALMOFANE/STA-
T
A QEWRH=SAI TO\ MUSTH/RI-
ON E)KEI=NO.
" KAI\ TO/TE O(

This search skips accents and beta escapes. Thus, §2.4 skips [, matching [^N] with I; similarly, §2.7 skips the hyphen, and matches [^N] with T on the next line. Note in §1.6 that the match for [^N] is not the grave or the question mark (which are highlighted as part of the search result), but the following space, which still counts as a "character other than N".


Full-letteral:

1. Anonymus Manichaeus Biogr., $*PERI\ TH=S GE/NNHS TOU= SW/MATOS AUTOU=&. Codex page 1 line 6.

MOI E(AUT!!!! [— — —].
(2.) "— — —
[
!!! KAT]A\? BRAXU\ B?[RA]X?U\
!!!![!!]O?N. AS?E?B?[!!!!]

2. Anonymus Manichaeus Biogr., $*PERI\ TH=S GE/NNHS TOU= SW/MATOS AUTOU=&. Codex page 2 line 4.

SOI E)/DEIC[A!!!!! !!!]
A)PO\ POLL[W=N. E)CE/]S?TA[I]
DE/ SOI MEGA[LOP]REPW=S

3. Anonymus Manichaeus Biogr., $*PERI\ TH=S GE/NNHS TOU= SW/MATOS AUTOU=&. Codex page 2 line 8.

KAI\ O)FQALMOFANE/STA-
T
A QEWRH=SAI TO\ MUSTH/RI-
ON E)KEI=NO.
" KAI\ TO/TE O(

The results are the same; a match would only be rejected if it contained an accent within the search span, e.g. if §2.7 had O)FQALMOFANESTA/TWS rather than O)FQALMOFANE/STATA. However, note the highlighting in §1.5: in a full-letteral search, grave counts as a "character other than N", so TA[^N] matches TA\, not TA\? . As a result, the question mark and space are excluded from the match.


Full Beta:

1. Anonymus Manichaeus Biogr., $*PERI\ TH=S GE/NNHS TOU= SW/MATOS AUTOU=&. Codex page 1 line 6.

SOI E)/DEIC[A!!!!! !!!]
A)PO\ POLL[W=N. E)CE/]S?TA[I]
DE/ SOI MEGA[LOP]REPW=S

2. Anonymus Manichaeus Biogr., $*PERI\ TH=S GE/NNHS TOU= SW/MATOS AUTOU=&. Codex page 2 line 7.

DE/ SOI MEGA[LOP]REPW=S
KAI\ O)FQALMOFANE/S
TA-
TA QEWRH=SAI TO\ MUSTH/RI-

3. Anonymus Manichaeus Biogr., $*PERI\ TH=S GE/NNHS TOU= SW/MATOS AUTOU=&. Codex page 2 line 8.

KAI\ O)FQALMOFANE/STA-
TA QEWRH=SAI TO\ MUSTH/RI-
ON E)KEI=NO.
" KAI\ TO/TE O(

Now [^N] can be matched by any character, and no character may intervene between T and A; as a result, T]A\ in §1.5 is rejected. The hyphen is considered a valid character to match in §2.7, so it matches TA- to TA[^N].

Space can be specified as one of the characters to be excluded from the match; this means that the string to be matched should not occur at the beginning or end of a word. For example, the following is a search for [ ]EIS[^ ] in the Mani-Codex --- i.e. EI)S- as a prefix to a word (= not followed by a word delimiter), and not the preposition EI)S on its own:

1. Anonymus Manichaeus Biogr., $*PERI\ TH=S GE/NNHS TOU= SW/MATOS AUTOU=&. Codex page 4 line 7.

E(STW/SHS.
PLEI=STAI DE/
EI)SIN O)PTASI/-
AI KAI\ TA\ QEA/MATA ME/GI-

2. Anonymus Manichaeus Biogr., $*PERI\ TH=S GE/NNHS TOU= SW/MATOS AUTOU=&. Codex page 11 line 3.

[XRI] T?ETA/RTOU E)/TOUS.
[TO/T]E EI)SH/LASA EI)S TO\ DO/-
G
?M?A TW=N BAPTISTW=N

3. Anonymus Manichaeus Biogr., $*PERI\ TH=S GE/NNHS TOU= SW/MATOS AUTOU=&. Codex page 53 line 10.

DES MOU E)PI\ TOU\S A)STRA-
GA/LOUS OU)X
EI(STH/KEISAN.
A)PH=LQON DE\ EI)S SUXNA\S

Observe that the preposition EI)S in §11.3 is skipped over in the search.
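The effect on §11.3 can be reproduced by hand. In the Python sketch below, the letter-only mode is approximated by deleting everything except letters and spaces before matching; this preprocessing step is an assumption made for illustration, not a description of the engine.

```python
import re

line = r"[TO/T]E EI)SH/LASA EI)S TO\ DO/-"

# Approximate letter-only mode: keep letters and spaces only.
letters = re.sub(r"[^A-Z ]", "", line)
print(letters)

# EIS preceded by a word delimiter and followed by anything but a delimiter.
prefixed = re.findall(r"[ ]EIS[^ ]", letters)
print(prefixed)

# Without the exclusion class, the preposition is picked up as well.
any_eis = re.findall(r"[ ]EIS", letters)
print(len(any_eis))
```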


Character range classes: [ - ]

Character classes can be abbreviated by using a hyphen to indicate a continuous range of characters in ASCII. Thus, [A-Z] means any capital letter of the alphabet, and [^0-9] means any character other than a digit. Since ASCII order is used, [A-C] means the characters ABC (i.e. alpha, beta, xi), and not the Greek letters from alpha through to xi (alpha, beta, gamma, and so on).
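The point about ASCII order can be verified directly. This short Python sketch tests five Beta Code letters, listed in Greek alphabetical order, against the range [A-C]; the small lookup table of letter names is supplied here for illustration.

```python
import re

# Beta Code letter -> Greek letter name (first five letters of the Greek alphabet, plus xi).
BETA_NAMES = {"A": "alpha", "B": "beta", "G": "gamma", "D": "delta", "C": "xi"}

# Test the letters in Greek alphabetical order, with xi appended.
in_range = [c for c in "ABGDC" if re.fullmatch(r"[A-C]", c)]
print([(c, BETA_NAMES[c]) for c in in_range])
```

Gamma (G) and delta (D) fall outside the range because they come after C in ASCII, even though they precede xi in the Greek alphabet.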

The main use of ranges is to specify digits in beta escapes. Thus, the search for #2 mentioned in Character Exclusion classes could be rephrased as #2[^0-9]. The following is a search in Phlegon's Fragmenta for %[^0-9] (dagger only among the punctuation escapes --- i.e. exclude %1 'question mark', %2 'asterisk' etc.), within 0 lines of %[0-9] (any punctuation other than dagger):

1. Publius Aelius Phlegon Paradox., Fragmenta. Volume-Jacoby#-F 2b,257,F fragment 23 line 2.

  &ET. M. p. 18, 54: $*)ADRI/AS: TO\ PE/LAGOS. *DIONU/SIOS *SIKELI/AS TU/-
RANNOS,
O(\S PRO/TERON E)PI\ TH=I * O)LUMPIA/DI PO/LIN E)/KTISEN *)ADRI/AN E)N TW=I *)IONI-
KW=I KO/LPWI, A)F' H(=S KAI\ TO\ PE/LAGOS *)ADRI/AS KALEI=TAI. *EU)/DOCOS DE\ E)N TW=I Q TW=N

2. Publius Aelius Phlegon Paradox., Fragmenta. Volume-Jacoby#-F 3b,257,F fragment 25bis line 2.

  &SCHOL. LIV. 8, 15, 7 a. 336 a. Chr. (L. Voit &Philol.
&91, 1936, p. 310): &Minutia virgo Ves<talis> †minutioris primo suspecta
&cultus moxq(ue) flagitii servo accusante convicta ad porta(m) Collina(m)

3. Publius Aelius Phlegon Paradox., Fragmenta. Volume-Jacoby#-F 2b,257,F fragment 37 line 57.

  &(II) $*OI( A)PO\ E(KATO\N E(NO\S E)TW=N ME/XRI E(KATO\N DE/KA A)POGRAYA/MENOI.
&69 $*GA/IOS *LHLH/DIOS *PRI=MOS, *, PO/LEWS *BONWNI/AS, E)/TH E(KATO\N E(/N. &70 $*KLWDI/A
*POTE/STA, *GAI/OU A)PELEUQE/RA, PO/LEWS *BONWNI/AS, E)/TH RA.
&71 $*KOUSINI/A

The search identifies instances of dagger, ignoring the adjacent instances of asterisk (%2, §23.2, §37.57) and dicolon (%10, §25bis.2).


Any character: .

It is useful in searches to leave one character completely indeterminate; this is what is usually meant by wildcard. The wildcard is represented here by dot. Its interpretation varies according to search mode, as the examples below show.

The following, by way of illustration, is a search for TO.. in the anonymous grammatical fragment PSI 7.849:


Letter-only:

1. Anonymi Grammatici Gramm., Fragmentum grammaticum (PSI 7.849). Line 5.

[           ]N EI)/H TO\ A EI)/TE ME
[           ]!TO(UN): GRAPT(E/ON?) D(E\) S(U\N) TW=| I?  
[           ]
F?W?NH/(ENTOS?) K(ATA\) T(H\N) A)RX(H/N): TAU/TH?N?

2. Anonymi Grammatici Gramm., Fragmentum grammaticum (PSI 7.849). Line 12.

[                   ]RO(U) FWNH/(ENTOS?) !MO()!!!`O´!!`S´  
[                   ]
T?OSET?I?TO?!! K(A)T(A\) FW-
[NH\N? TA\ A)PO\ TO(U=) E A)RX(O/MENA) OU(/T]W EU(RI/SKOMEN DI-  

3. Anonymi Grammatici Gramm., Fragmentum grammaticum (PSI 7.849). Line 12.

[                   ]RO(U) FWNH/(ENTOS?) !MO()!!!`O´!!`S´  
[                   ]
T?OSET?I?TO?!! K(A)T(A\) FW-
[NH\N? TA\ A)PO\ TO(U=) E A)RX(O/MENA) OU(/T]W EU(RI/SKOMEN DI-  

The search retrieves words containing TO and at least two letters after it. The missing letter in T?OSET?I?TO?!! counts as a letter.


Full-letteral:

1. Anonymi Grammatici Gramm., Fragmentum grammaticum (PSI 7.849). Line 5.

[           ]N EI)/H TO\ A EI)/TE ME
[           ]!TO(UN): GRAPT(E/ON?) D(E\) S(U\N) TW=| I?  
[           ]
F?W?NH/(ENTOS?) K(ATA\) T(H\N) A)RX(H/N): TAU/TH?N?

2. Anonymi Grammatici Gramm., Fragmentum grammaticum (PSI 7.849). Line 7.

[           ]T?A G(A\R) O(LOKLHRO/TERA
[       XAR]AKTH=RI: TA\ A)(PO\) TO(U=) I H)\
[U A)RX(O/MENA), I(PPEU/W, I(/PPE]UON, U(MNW=, U(/MNO(UN)

3. Anonymi Grammatici Gramm., Fragmentum grammaticum (PSI 7.849). Line 9.

[U A)RX(O/MENA), I(PPEU/W, I(/PPE]UON, U(MNW=, U(/MNO(UN)
[                   ]
!`O´ TA\ A)(PO\) TO(U=) W H)\ H A)RX(O/MENA)
[                   ]
RO(U) FWNH/(ENTOS?) !MO()!!!`O´!!`S´  

Now the requirement is only that two letters or diacritics follow TO; this is satisfied by TOU= (§7, §9).


Full Beta:

1. Anonymi Grammatici Gramm., Fragmentum grammaticum (PSI 7.849). Line 3.

[PARHGGELLO/M]HN A)L(LA\) E)KAQEZO/MH(N)
[           ]
N EI)/H TO\ A EI)/TE ME
[           ]!TO(UN): GRAPT(E/ON?) D(E\) S(U\N) TW=| I?  

2. Anonymi Grammatici Gramm., Fragmentum grammaticum (PSI 7.849). Line 5.

[           ]N EI)/H TO\ A EI)/TE ME
[           ]!TO(UN): GRAPT(E/ON?) D(E\) S(U\N) TW=| I?  
[           ]
F?W?NH/(ENTOS?) K(ATA\) T(H\N) A)RX(H/N): TAU/TH?N?

3. Anonymi Grammatici Gramm., Fragmentum grammaticum (PSI 7.849). Line 6.

[           ]!TO(UN): GRAPT(E/ON?) D(E\) S(U\N) TW=| I?  
[           ]
F?W?NH/(ENTOS?) K(ATA\) T(H\N) A)RX(H/N): TAU/TH?N?
[           ]T?A G(A\R) O(LOKLHRO/TERA

The constraint is further relaxed: a match occurs if any two characters follow TO in Beta Code (whether they are word delimiters, diacritics, beta escapes or letters). For example, in §3 the two wildcard characters are matched by grave and space; in §5, they are matched by the Beta escape [1 (open parenthesis). However, now no characters may intervene between T and O.


Alternatives: |

Character classes give a simple means of specifying alternative sequences to match; however, they are limited to alternatives between single characters. A more comprehensive means of specifying alternatives is given by the disjunction bar. Used in a search expression, it requests that either the sequence preceding it or the sequence following it be matched. For example, a search for [ ]ERMH|[ ]ERMOU (where [ ] can be used to denote blank space) will return instances of either [ ]ERMH or [ ]ERMOU, providing the likely forms of 'Hermes' (Hermê-, Hermou-). Note that a search for [ ]ERM[HO] would also return any instances of ERMO-, e.g. *(ERMOGE/NHS.
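The difference between the two formulations can be seen on a few letter-only forms. The Python sketch below uses made-up input, each form preceded by a space to stand for the word delimiter.

```python
import re

# Letter-only forms, each preceded by a word delimiter.
forms = [" ERMHS", " ERMOU", " ERMOGENHS", " ERMA"]

with_bar = [w for w in forms if re.search(r"[ ]ERMH|[ ]ERMOU", w)]
with_class = [w for w in forms if re.search(r"[ ]ERM[HO]", w)]

print(with_bar)    # only the forms of 'Hermes'
print(with_class)  # also picks up ERMO- compounds
```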

Multiple alternatives can be stacked in an expression. For example, the following search will return all the possible forms of PATH/R:

[ ]PATHR[ ]|[ ]PATER[ EOWA]|[ ]PATROS[ ]|[ ]PATRI[ ]|[ ]PATRASI[ N]

This can be broken down as follows:
  1. [ ]PATHR[ ]: The nom.sg. PATH/R, as a self-standing word (preceded and followed by space);
  2. [ ]PATER[ EOWA]: The stem PATER-, followed by:
    • nothing (voc.sg. PA/TER)
    • E (nom.du. PATE/RE, nom.pl. PATE/RES)
    • O (gen.du. PATE/ROIN)
    • W (gen.pl. PATE/RWN)
    • A (acc.pl. PATE/RAS)
  3. [ ]PATROS[ ] (gen.sg. PATRO/S)
  4. [ ]PATRI[ ] (dat.sg. PATRI/)
  5. [ ]PATRASI[ N] (dat.pl. PATRA/SI(N): PATRA/SI followed by nothing or nu)

The following are the results of this query applied to the Book of Joshua (Vatican and Alexandrian codices) in the Septuagint:

1. Septuaginta 006:  1.6.3. W)/MOSA TOI=S PATRA/SIN U(MW=N DOU=NAI AU)TOI=S.

2. Septuaginta 006:  1.11.4. TH\N GH=N, H(\N KU/RIOS O( QEO\S TW=N PATE/RWN U(MW=N DI/DWSIN U(MI=N.

3. Septuaginta 006:  2.12.3. KAI\ U(MEI=S E)/LEOS E)N TW=| OI)/KW| TOU= PATRO/S MOU

4. Septuaginta 006:  2.13.2. OI)=KON TOU= PATRO/S MOU KAI\ TH\N MHTE/RA MOU KAI\ TOU\S A)DELFOU/S

5. Septuaginta 006:  2.18.3. EI)S TH\N QURI/DA, DI' H(=S KATEBI/BASAS H(MA=S DI' AU)TH=S, TO\N DE\ PATE/RA

6. Septuaginta 006:  2.18.5. OI)=KON TOU= PATRO/S SOU SUNA/CEIS PRO\S SEAUTH\N EI)S TH\N OI)KI/AN SOU.

7. Septuaginta 006:  5.6.5. SEN MH\ I)DEI=N AU)TOU\S TH\N GH=N, H(\N W)/MOSEN KU/RIOS TOI=S PATRA/SIN

8. Septuaginta 006:  6.23.3. SAN *RAAB TH\N PO/RNHN KAI\ TO\N PATE/RA AU)TH=S KAI\ TH\N MHTE/RA AU)-

9. Septuaginta 006:  15.18.2. SUNEBOULEU/SATO AU)TW=| LE/GOUSA *AI)TH/SOMAI TO\N PATE/RA MOU A)GRO/N:

10. Septuaginta 006:  17.1.2. TOKOS TW=| *IWSHF: TW=| *MAXIR PRWTOTO/KW| *MANASSH PATRI\ *GALAAD

11. Septuaginta 006:  17.4.5. A)DELFOI=S TOU= PATRO\S AU)TW=N.

12. Septuaginta 006:  21.43.2. DOU=NAI TOI=S PATRA/SIN AU)TW=N, KAI\ KATEKLHRONO/MHSAN AU)TH\N KAI\

13. Septuaginta 006:  21.44.2. O/TI W)/MOSEN TOI=S PATRA/SIN AU)TW=N: OU)K A)NE/STH OU)QEI\S KATENW/PION

14. Septuaginta 006:  22.28.4. E)POI/HSAN OI( PATE/RES H(MW=N OU)X E(/NEKEN KARPWMA/TWN OU)DE\ E(/NEKEN

15. Septuaginta 006:  24.2.2. *ISRAHL *PE/RAN TOU= POTAMOU= KATW/|KHSAN OI( PATE/RES U(MW=N TO\ A)P'

16. Septuaginta 006:  24.2.3. A)RXH=S, *QARA O( PATH\R *ABRAAM KAI\ O( PATH\R *NAXWR, KAI\ E)LA/TREU-

17. Septuaginta 006:  24.2.3. A)RXH=S, *QARA O( PATH\R *ABRAAM KAI\ O( PATH\R *NAXWR, KAI\ E)LA/TREU-

18. Septuaginta 006:  24.3.1. @8 KAI\ E)/LABON TO\N PATE/RA U(MW=N TO\N *ABRAAM E)K

19. Septuaginta 006:  24.6.2. KAI\ KATEDI/WCAN OI( *AI)GU/PTIOI O)PI/SW TW=N PATE/RWN U(MW=N E)N A(/RMA-

20. Septuaginta 006:  24.14.3. PERIE/LESQE TOU\S QEOU\S TOU\S A)LLOTRI/OUS, OI(=S E)LA/TREUSAN OI( PATE/RES


Grouping: ( )

The foregoing specification of the forms of PATH/R is somewhat prolix; after all, the prefix [ ]PAT is common to all the forms. In general, one will want to keep most of the search string constant, and only have alternatives for part of it. This can be accomplished by using parentheses, to group together parts of the search string as a separate regular expression. For example, the expression (HR|ROS) alternates between HR and ROS. As a result, the expression PAT(HR|ROS) alternates between PATHR and PATROS. The search expression for the forms of PATH/R can now be written as

This corresponds to
pat- -êr, -er{Ø, e, o, ô, a}, -r{os, i, asi(n)}
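The equivalence of a grouped expression to the flat list of alternatives can be checked mechanically. In the Python sketch below, the flat expression is assembled from the five alternatives enumerated in the breakdown of the forms of PATH/R, and the grouped expression is one possible factoring that follows the pat- / -êr / -er{...} / -r{...} scheme; it is an assumption for illustration, and other equivalent groupings are possible. The test forms are letter-only and padded with spaces as word delimiters.

```python
import re

# The five alternatives from the breakdown, joined with the disjunction bar.
flat = r"[ ]PATHR[ ]|[ ]PATER[ EOWA]|[ ]PATROS[ ]|[ ]PATRI[ ]|[ ]PATRASI[ N]"

# One possible grouped form (assumed): shared prefix, then stem-specific endings.
grouped = r"[ ]PAT(HR[ ]|ER[ EOWA]|R(OS[ ]|I[ ]|ASI[ N]))"

forms = [" PATHR ", " PATER ", " PATERES ", " PATEROIN ", " PATERWN ",
         " PATERA ", " PATROS ", " PATRI ", " PATRASI ", " PATRASIN "]
others = [" PATRIS ", " MHTHR ", " PATRIDA "]

def span(pattern, text):
    """Return the matched text, or None if the pattern does not match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

for w in forms + others:
    print(repr(w), span(flat, w), span(grouped, w))
```

Both expressions accept every listed form of PATH/R, highlight the same span, and reject the unrelated words.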

Grouping is a powerful tool. For example, the search expression in the following (involving Basil of Cappadocia's letters) is [ ]O (MEN|DE)[ ](KURI|UI|QE)OS[ ]; i.e. O(, followed by either ME/N or DE/, followed by the nominative of KU/RIOS, U(IO/S, or QEO/S.

1. Basilius Theol. 004:  8.2.22. FU/SEI E)STI/N: O( DE\ *QEO\S A(PLOU=S KAI\ A)SU/NQETOS PARA\ PA=SIN

2. Basilius Theol. 004:  8.2.50. E)STI KAKI/AS. *(O DE\ *UI(O\S KAI\ TO\ *PNEU=MA TO\ *(/AGION PHGH/ E)STIN

3. Basilius Theol. 004:  8.3.12. XA/RIN O)NOMA/ZONTAI, OI(\ DE\ KATA\ YEU=DOS *(O DE\ *QEO\S MO/NOS KAT'

4. Basilius Theol. 004:  8.4.11. AU)TOQERMO/THS EI)=NAI: O( DE\ *KU/RIOS H(MW=N EI)/RHKEN: *)EGW/ EI)MI

5. Basilius Theol. 004:  8.9.7. KAI\ KREI=TTON R(OPH/N, O( DE\ *UI(O\S OU) DU/NATAI/ TI POIEI=N A)F'

6. Basilius Theol. 004:  8.9.10. DU/NATAI. *(O DE\ *UI(O\S E)N TW=| OU)RANW=| KAI\ E)PI\ TH=S GH=S PA/NTA

7. Basilius Theol. 004:  8.9.13. H)\ TW=N E)NANTI/WN E)STI\ DEKTIKA/. *(O DE\ *UI(O\S AU)TODIKAIOSU/NH

8. Basilius Theol. 004:  38.4.29. KAI\ TO\ E)K TOU= *PATRO\S U(FESTA/NAI. *(O DE\ *UI(O\S O( TO\ E)K TOU=

9. Basilius Theol. 004:  210.6.32. A)CIOPISTI/AN PROSDEOME/NOIS. *EI) DE\ O( ME\N *KU/RIOS TH\N

10. Basilius Theol. 004:  290.1.39. OI)KONOMOU=NTI TO\ SUMFE/RON *QEW=|. *(O DE\ *QEO\S O( A(/GIOS A)PAGA/-


Escapes: \

The astute reader will have noted a dearth of diacritics in the foregoing search expressions. In addition, the parentheses used for alternates are identical to the Beta code breathing marks; and all the symbols used to delimit regular expressions also double as Beta code escapes or diacritics. How can these be entered into searches?

The answer is that, when wildcard searches are being performed, the literal equivalents of symbols used to build up regular expressions (such as parentheses, square brackets, and asterisks) can only be accessed by prefixing them with backslash (\). This includes backslash itself, so that grave has to be written as \\. For example, [AB] denotes a search for either A or B; \[AB\] denotes a search for the literal character sequence [AB], including the brackets. [\[{] is a search for either [ or {. A version of the search for the forms of PATH/R with diacritics is:
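The examples in this paragraph behave the same way in Python's re module, which makes them easy to try out. Two cautions apply to the sketch below: Python adds its own layer of string escaping, so the patterns are written as raw strings; and re.escape is a Python convenience for escaping a literal string automatically, not a feature of the TLG search form.

```python
import re

either = re.findall(r"[AB]", "[AB]")        # character class: A or B
literal = re.findall(r"\[AB\]", "[AB]")     # escaped brackets: the sequence [AB]
brackets = re.findall(r"[\[{]", "[A{B")     # either [ or {

# Smooth breathing entered as \) so that it is not read as a closing parenthesis.
breathing = re.search(r"(A\)NA|A\)NE)DU", "A)NEDU")

# Grave is a backslash in Beta Code, so it is written as \\ in the expression.
grave = re.search(r"TO\\", r"EI)/H TO\ A")

# Escaping a whole Beta Code word automatically (Python-specific helper).
word = "A)/NQRWPOS"
escaped = re.escape(word)

print(either, literal, brackets)
print(breathing.group(0), grave.group(0), escaped)
```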

And a version of the search for all the forms of A)/NQRWPOS (including the results of crasis with O( and case sensitivity, but excluding crasis with oblique articles, e.g. TA)NQRW/PW|) is:

This is a formidable expression, and being able to construct it is by no means a prerequisite to using the online TLG! However, it is worth going through as an illustration of what regular expressions can do. To do so, we will use a tree notation to represent alternative pathways:

[Tree diagram of ANQRWPOS Regexp]
Note that the second accent of A)/NQRWPOS under enclisis need not be included in the search expression, as it is automatically incorporated into the search.


Optional match: ?

It is also useful to specify that a segment of the search expression is optional; the match will succeed whether or not the segment is present. This is done by use of the question mark, which makes optional the letter preceding it, or the expression contained in parentheses preceding it. Thus, OTT?I will search for instances of either OTI or OTTI; the second tau is marked as optional. Similarly, O (MEN )?ANHR will search for instances of either O ANHR or O MEN ANHR. And A\)/R\)?R\(?WSTOS is a search for A)/RRWSTOS, with the rhos optionally followed by internal breathing marks (A)/R)R(WSTOS). Finally, some of the repetitions in the A)/NQRWPOS search above can be eliminated with optional matches; a (somewhat) simpler version, now including crasis with oblique articles, is:
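The three optional-match examples can be tried out on plain strings. The Python sketch below uses made-up input; the optional group is written so that its trailing space belongs to the group, which keeps exactly one space between the words in either case.

```python
import re

def spans(pattern, text):
    """Return the text of every match, ignoring capture groups."""
    return [m.group(0) for m in re.finditer(pattern, text)]

single_or_double = spans(r"OTT?I", "OTI OTTI OTTTI")
with_or_without = spans(r"O (MEN )?ANHR", "O ANHR KAI O MEN ANHR")

# Rho with optional internal breathings: both spellings are accepted.
rho = re.compile(r"A\)/R\)?R\(?WSTOS")
plain_rho = rho.fullmatch("A)/RRWSTOS")
marked_rho = rho.fullmatch("A)/R)R(WSTOS")

print(single_or_double)
print(with_or_without)
print(plain_rho is not None, marked_rho is not None)
```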

The following is a search for [ ]EIPEN? TOU[ ] in the Chapbook of Alexander the Great; while this is a seventeenth-century work, the treatment of nu movable applies to any period of Greek:

1. Historia Alexandri Magni, Recensio $F&. Gerardi page 4 line 8.

TH\N KUBE/RNHSIN DIA\ NA\ MH\N MA=S A)FANI/SOUN. *KAI\ W(S H)/KOUSEN TOU\S
LO/GOUS E)TOU/TOUS O( *)EKTENABO/S,
EI)=PEN TOU= *BERBE/RH GELW/NTAS: *SU/RE
A)NAPAU/SOU, KAI\ E)GW\ ME\ E(/NAN LO/GON QE/LW TOU\S KA/MEI O(/LOUS NA\ E)PI-

2. Historia Alexandri Magni, Recensio $F&. Gerardi page 32 line 11.

E)MA/ZWCE TA\ FOUSA/TA TOU, XILIA/DES DW/DEKA, KAI\ E)PH=GEN EI)S TO\ KA/-
STRON TOU= *FILI/PPOU. *KAI\ EI)SE/BH ME/SA KAI\
EI)=PE TOU= *FILI/PPOU: *EI)S
BOH/QEIA/N SOU H)=LQA, BASILEU=. *KAI\ AU)TO\S E)KOI/TAZE NA\ EU(/RH A)/DEIAN

3. Historia Alexandri Magni, Recensio $F&. Gerardi page 34 line 6.

PIA/NEI TON ZWNTANO\N KAI\ H)/FERE/ TON EI)S TO\N *FI/LIPPON ME\ O)LI/GHN YU-
XH/N. *KAI\
EI)=PE TOU= PATRO/S TOU: *SHKW/SOU, PA/THSE TO\N E)XQRO/N SOU.
*KAI\ A)NE/STH O( *FI/LIPPOS ME\ O)LI/GHN YUXH\N KAI\ EI)=PE PRO\S TO\N *)ANA/CAR-


Line boundaries: ^$

In full text searches, the elements ^ and $ are used to match line boundaries; ^ matches the beginning of a line, and $ the end. The line in this case corresponds to a printed line of the source edition of the text. In the case of prose this has no intrinsic meaning, but is merely a typographical convenience; so while in many contexts such searches are significant, they are rarely meaningful in textual searches through the kind of texts encompassed by the TLG™. (In such searches, though, trailing spaces at the end of the line are ignored.)

In Word Index searches, however, these signs are needed to delimit the beginning and end of the word being sought. This means that the word index wildcard search is by default a substring search, whereas in the case of literal searches it is treated as a prefix search. Since searches through a sorted word list expect that the beginning of the word is known, wildcard searches not prefixed by ^ will be quite slow.

In Word Index searches, spaces are treated as word delimiters identical to ^ and $.
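Why the leading ^ matters for speed can be seen on a toy word index. In the Python sketch below, a search whose beginning is known is answered by binary search on the sorted list, while an unanchored pattern has to be tested against every entry. The word list and both helper functions are assumptions made for illustration; the actual index and lookup strategy of the search engine are not described here.

```python
import re
from bisect import bisect_left

# A tiny sorted "word index" of letter-only forms (made up).
index = sorted(["ANA", "ANABAINW", "ANAGKH", "BANAUSOS", "GUNAIKA", "KANA", "MANA"])

def prefix_lookup(prefix):
    """Known beginning: binary search jumps straight to the matching block."""
    start = bisect_left(index, prefix)
    end = bisect_left(index, prefix + "\uffff")  # just past the last word with this prefix
    return index[start:end]

def scan(pattern):
    """Unknown beginning: every entry in the index has to be tested."""
    return [w for w in index if re.search(pattern, w)]

print(prefix_lookup("ANA"))  # same words as scan(r"^ANA"), without a full scan
print(scan(r"ANA"))          # substring search
print(scan(r"ANA$"))         # suffix search
print(scan(r"^ANA$"))        # whole word
```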


Zero or more instances: *

The last two elements of regular expressions are essential, but occur only rarely in natural texts. An asterisk ("star") indicates zero or more instances of the preceding letter or bracketed expression. For instance, KA*R would match KR, KAR, KAAR, KAAAR, and so on. [ ](AMHN[ ])*LEGW UMIN will match LEGW UMIN, AMHN LEGW UMIN, AMHN AMHN LEGW UMIN, AMHN AMHN AMHN LEGW UMIN, and so on.
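Both examples can be confirmed on plain strings; the following short Python sketch uses letter-only input made up for the purpose.

```python
import re

# Zero or more alphas between K and R.
star_words = [w for w in ["KR", "KAR", "KAAR", "KAAAR", "KABR"]
              if re.fullmatch(r"KA*R", w)]
print(star_words)

# Zero or more repetitions of a bracketed expression.
amen = re.compile(r"[ ](AMHN[ ])*LEGW UMIN")
none = amen.search("KAI LEGW UMIN")
twice = amen.search("KAI AMHN AMHN LEGW UMIN")
print(repr(none.group(0)), repr(twice.group(0)))
```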

The main use of star for the TLG™ corpus is in specifying Beta escapes. For example, the following is a search in Diophantus' Arithmetica for #[0-9]* on the same line as GINETAI, except for #1513[^0-9]. The search will retrieve any Beta code text symbol, defined as a hash/pound sign followed by zero or more digits. It will thus retrieve instances of # 'keraia (numeric signifier)' (§48.27), #20 'angular half symbol' (§48.27), and #166 'negative sign' (§24.12). However, it will reject a match with #1513 (Diophantus' variable x).

1. Diophantus Math. 001:  24.12. DE\ TA\ E)LA/SSONA GI/NETAI #1513 G #166 *MO T. TAU=TA I)/SA #1513 A #166 *MO K.

2. Diophantus Math. 001:  24.12. DE\ TA\ E)LA/SSONA GI/NETAI #1513 G #166 *MO T. TAU=TA I)/SA #1513 A #166 *MO K.

3. Diophantus Math. 001:  28.15. #1513 A *MO K: E)A\N DE\ TOU= R A)FAIREQH=|, GI/NETAI *MO R #166 #1513 A.

4. Diophantus Math. 001:  28.18. SONA GI/NETAI *MO U #166 #1513 D: TAU=TA I)/SA #1513 A *MO K.

5. Diophantus Math. 001:  34.17. KAI\ GI/NETAI O( #1513 *MO L#2.

6. Diophantus Math. 001:  44.8. EI)SIN #1513 D #166 *MO O: TAU=TA I)/SA #1513 B: KAI\ GI/NETAI O( #1513 *MO LE.

7. Diophantus Math. 001:  46.4. #1513 G #166 *MO IE, KAI\ GI/NETAI O( #1513 *MO KE.

8. Diophantus Math. 001:  48.27. *MO I HU(RE/QH: KAI\ GI/NETAI O( #1513 *MO IB #20#.

9. Diophantus Math. 001:  48.27. *MO I HU(RE/QH: KAI\ GI/NETAI O( #1513 *MO IB #20#.

10. Diophantus Math. 001:  50.18. PA/NTA Q$KIS$. #1513 A)/RA H I)/SOI *MO R. KAI\ GI/NETAI O( #1513 *MO IB #20#.
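The logic of this search can be sketched in Python as three steps: require GINETAI on the line (compared letter-only), collect every #[0-9]* symbol, and discard #1513. The function name symbols and the use of a post-filter in place of the engine's "except for" option are assumptions for illustration; the first two sample lines are taken from the listing above, and the third is made up to show a line without GINETAI.

```python
import re

lines = [
    r"DE\ TA\ E)LA/SSONA GI/NETAI #1513 G #166 *MO T.",
    r"*MO I HU(RE/QH: KAI\ GI/NETAI O( #1513 *MO IB #20#.",
    r"#1513 A *MO K: TAU=TA I)/SA #1513 B.",  # made-up line without GINETAI
]

def symbols(line):
    """Return the # escapes on a line containing GINETAI, leaving out #1513."""
    letters = re.sub(r"[^A-Z ]", "", line)   # letter-only view for the word test
    if "GINETAI" not in letters:
        return []
    return [s for s in re.findall(r"#[0-9]*", line) if s != "#1513"]

for line in lines:
    print(symbols(line))
```

Because [0-9]* takes as many digits as are present, comparing the whole symbol with #1513 has the same effect as excluding #1513[^0-9]: a longer escape that merely begins with #1513 would still be reported.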


One or more instances: +

Finally, the plus sign indicates one or more instances of the preceding element, and can be used wherever the asterisk can. The following is a search for two or more consecutive alphas in the Magical papyri; the search is expressed as AA+ (alpha, followed by one or more alphas), and could be equivalently expressed as AAA* (two alphas, followed by zero or more alphas.)

1. Magica 001:  1.138. AUWI PTAUXARHBI AWUOSWBIAU PTABAI+N AAAAAAA

2. Magica 001:  1.218. E)PI\ TH\N GH=N: $AQHZOFWIM ZADHAGHWBHFIAQEAA *)AM-

3. Magica 001:  1.219. BRAMI *)ABRAAM QALXILQOE ELKWQWWHH AXQWNWN

4. Magica 001:  1.227. O)/NOMA $BORKH FOIOUR I+W ZIZIA? A?PA?RCEOUX QUQ?H LAILAM
AAAAAA [II]III WWWW I+EW I+EW I+EW I+EW I+EW I+EW I+EW

5. Magica 001:  1.238. *XNOUFI BRINTATHNWFRIBRISKULMAAROUAZARBAMESEN

6. Magica 001:  1.242. II AA OO UU HH EE WW.'$ TAU=TA POIH/SAS A)PO/KLUSON KAI\

7. Magica 001:  2.96. A)MFI\ TE/NONTA DEDOUPO/TA R(OI=ZON I(MA/SQLHS, $AAAAAAA: EEEEEEE: HHH-

8. Magica 001:  2.130. HE: EEH: HEE: AAW: WEA: EAW: WI: WE: HW: EH: EAE:

9. Magica 001:  2.138. EOUW?: AA[:] AHW: EE: EHU: HH: EHA: XABRAX FLIES

10. Magica 001:  2.158. $AA EE$ *MIXAH/L: $HIA: EUW: UAE: EUW: IAE:$
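That AA+ and AAA* describe the same strings can be checked on a short sample. The Python sketch below uses a made-up string modelled on the vowel sequences in the listing.

```python
import re

text = "AAW II AA OO $AAAAAAA: ABRAAM A"

with_plus = re.findall(r"AA+", text)
with_star = re.findall(r"AAA*", text)

print(with_plus)
print(with_plus == with_star)
```

A single alpha, as at the start of ABRAAM or at the end of the sample, is not reported by either expression.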


Window

The search engine processes a window of two text lines at any one time. This is done in order to prevent the search result taking up an excessively long range of text; in principle, there is nothing to prevent a search for .* returning an entire text. The usual window for a wildcard search engine is only one text line; we have expanded this to two to allow for searches to pick up hyphenated words, and phrases ranging across two lines.
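The idea of a two-line window can be sketched as follows in Python. Each line is paired with its successor, a trailing hyphen on the first line is dropped so that a divided word is rejoined, and a match is reported only if it begins in the first line of the pair, so that nothing is reported twice. All of these details, including the function names, are assumptions made for illustration rather than a description of the engine.

```python
import re

def two_line_windows(lines):
    """Yield (line number, first line, first line joined with the next)."""
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if line.endswith("-"):
            first, joiner = line[:-1], ""    # rejoin a hyphenated word
        else:
            first, joiner = line, " "
        yield i + 1, first, first + joiner + nxt

def search(lines, pattern):
    """Return (line number, matched text) for matches starting in each window's first line."""
    regex = re.compile(pattern)
    found = []
    for number, first, window in two_line_windows(lines):
        for m in regex.finditer(window):
            if m.start() < len(first):
                found.append((number, m.group(0)))
    return found

lines = [r"KAI\ O)FQALMOFANE/STA-", r"TA QEWRH=SAI TO\ MUSTH/RI-", "ON E)KEI=NO."]

print(search(lines, r"STATA"))              # word divided across lines 1 and 2
print(search(lines, r"MUSTH/RION"))         # word divided across lines 2 and 3
print(search(lines, r"STATA.*E\)KEI=NO"))   # would need three lines: not found
```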


Mode restrictions

Technical issues make it impossible to offer wildcard searches which are sensitive to case but not diacritics, or diacritics but not case.

Searches sensitive to diacritics but not case would result in automaton specifications too complex to be implemented practically. Such searches can be simulated with appropriate regular expressions, offering the capital and lower-case instances of the accented letter as alternates. For example, a search for (\*\)/A|A\)/)NQRWPOS will return instances of both *)/ANQRWPOS and A)/NQRWPOS. Refer to the expression for searching all variants of A)/NQRWPOS above for further illustration.

Searches sensitive to case but not diacritics run afoul of the backtracking nature of wildcard searches. A single-pass automaton (as is used in normal searches) can ignore *(/WS as an instance of WS: on seeing the capital asterisk, it swallows up any diacritics, and then rejects any following character as a possible match (so the sequence *(/W is passed over as a match for unaccented omega.) But once backtracking is admitted, the search restarted three characters later will see only WS, unaware of what has gone before it, and will match it accordingly.
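Both points can be illustrated with ordinary regular expressions in Python. The first part shows the suggested simulation of a search that is sensitive to diacritics but not case. The second shows why a matcher that is free to restart in the middle of a capitalised, accented letter finds WS inside *(/WS: it has no memory of the asterisk three characters earlier. Python's re module stands in here for "a wildcard matcher" in general; it is not the TLG engine.

```python
import re

# Capital and lower-case spellings offered as alternates.
either_case = re.compile(r"(\*\)/A|A\)/)NQRWPOS")
capital = either_case.search("*)/ANQRWPOS")
lower = either_case.search("A)/NQRWPOS")
unaccented = either_case.search("A)NQRWPOS")
print(capital is not None, lower is not None, unaccented is not None)

# A restartable search for unaccented, lower-case WS still matches inside *(/WS.
naive = re.search(r"WS", "*(/WS")
print(naive.start())
```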

Created: Feb. 25, 2000
Last Modified: May 12, 2002
Authored by: Nick Nicholas
Maintained by tlg-support@uci.edu
TLG® is a registered trademark of The Regents of the University of California.