Outside the 24 letters that form the conventional Greek alphabet, Unicode includes four more letters. Of these, the first three are archaic characters, though the extent to which they represented distinct phonemes, and thus have separate characterhood in linguistic terms, varies. The fourth is a character added on when Greek was used for a separate language; it's surprising that such additions have not happened more frequently in the history of Greek script. (If you exclude the creation of Latin, Gothic, and Cyrillic, of course.)

Further potential candidates as extra letters of Greek are discussed separately.
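
For readers who want to inspect these characters programmatically, here is a minimal sketch using Python's standard `unicodedata` module. It looks up the eight code points listed in the sections below and checks that each small letter maps to its capital. The dictionary name `EXTRA_LETTERS` and the helper `describe` are illustrative labels only, and the names reported depend on the Unicode version bundled with your Python.

```python
import unicodedata

# Code points as listed in this article: (capital, small) for each letter.
EXTRA_LETTERS = {
    "digamma": ("\u03dc", "\u03dd"),
    "archaic koppa": ("\u03d8", "\u03d9"),
    "san": ("\u03fa", "\u03fb"),
    "sho": ("\u03f7", "\u03f8"),
}

def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for letter, (capital, small) in EXTRA_LETTERS.items():
    print(letter)
    print("  ", describe(capital))
    print("  ", describe(small))
    # str.upper() follows the Unicode case mappings, so this checks the pairing.
    print("   small.upper() == capital:", small.upper() == capital)
```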

1. Digamma

U+03DC Greek Letter Digamma [Ϝ]; U+03DD Greek Small Letter Digamma [ϝ]

The letter wau or vau (ϝαῦ) was originally the sixth letter of the Greek alphabet, and stood for /w/. If you've never heard of a Greek word with a /w/ in it, that's because the standard dialect, Attic, had dropped its /w/, so the letter never made it to canonical flavours of Greek. The phoneme did stick around in the other dialects, however, and persisted even after the dialects switched from their epichoric alphabets to the Milesian standard; it was used as late as the 2nd century B.C.

The closest /w/ got to literature, though, was in lyric poetry, for example in Sappho—and that, only because what little Sappho we have was in papyri. The mainstream texts that survived into manuscripts were in Attic or Ionic, and so didn't have /w/; the original text of Homer certainly did (we can tell from his meter), but the form we have was standardised in Athens, and part of the standardisation, much like the White House in January 2001, was lopping the dubyas off.

If you've never heard of the letter wau, that's because the letter itself became so unfamiliar once the ancient dialects passed from the scene (and were not perpetuated by the mediaeval scribes), that its name became a description: digamma, "double gamma". Its unfamiliarity is also why, in its cursive form, it became associated with the stigma ligature instead of /w/.

But though you may not have heard of the name or the sound of wau, you've certainly seen its form, which was passed on to the Latin alphabet as F. How do you get from /w/ to /f/? Via /wh/ (𐌅𐌇), which is the closest approximation to /f/ you can manage if your script has no letter for /f/. (And, as with Maori, even if it does.)

Although it did not make it to standard forms of the language, /w/ was definitely a distinct phoneme wherever it was used; so both published texts and grammatical discussion of dialects use digamma.

Very rarely, one will see digamma used in Modern transcriptions of [w] into the Greek alphabet: I have seen it so used for Albanian and Propontis Tsakonian. This is not mainstream practice in Greek linguistics, although it is arguably more sensible than what is mainstream practice, ου with a tie.

The shape of the letter has little variability: in lowercase it typically has a descender, a curved back, or both, while in uppercase it is either an enlarged version of the lowercase glyph, or a capital Latin F (or something very close to it). SIL Galatia Extras has two bizarre variants of the lowercase digamma [glyph images not reproduced]; I do not know where they originate, but presumably they have some usage in Biblical Greek, where digamma would have only turned up as a numeral; the left glyph is a stylised version of the uncial digamma turning into a stigma, and in the accompanying documentation is used as a numerical glyph.

2. Archaic Koppa

U+03D8 Greek Letter Archaic Koppa [Ϙ]; U+03D9 Greek Small Letter Archaic Koppa [ϙ]

The Phoenician alphabet was adapted for Greek more stupidly than we might think. The greater dumbness occurred with san; but koppa also owes its short tenure to archaic Greeks being slow on the uptake.

Phoenician had a velar plosive, kap̱ (Hebrew kaf, כ), and a uvular plosive, qôp̱ (Hebrew qof, ק). When the Greeks adopted the Phoenician alphabet, they took both letters on. Greek does not have a uvular; but the /k/ before back vowels was pronounced slightly retracted, as one would expect: [ḵ]. So the Greeks spent a couple of centuries writing /ko/ and /ku/ as ϙο, ϙυ; this happened throughout Greece (Jeffery 1990:33). Gradually, though, Greeks realised that [ḵ] and [k] are the same phoneme, and should be written as the same letter; while some Doric regions held on to koppa into the fifth century, it did not survive the switch to the Milesian alphabet.
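
The spelling habit described above can be sketched as a toy rule: write koppa instead of kappa whenever the next letter is ο or υ. This is only an illustration of the allophonic conditioning, not a model of any real inscription's orthography; the function name `archaic_koppa_spelling` and the helper `base_letter` are hypothetical, and the rule deliberately covers only the two vowels named in the text.

```python
import unicodedata

# The two back vowels mentioned in the text (/ko/ and /ku/).
BACK_VOWELS = {"ο", "υ"}

def base_letter(ch):
    """Strip accents and breathings, and lowercase, so that ό counts as ο."""
    return unicodedata.normalize("NFD", ch)[0].lower()

def archaic_koppa_spelling(word):
    """Replace kappa with archaic koppa before ο or υ; leave the rest alone."""
    out = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in "κΚ" and nxt and base_letter(nxt) in BACK_VOWELS:
            # U+03D8 is the capital, U+03D9 the small archaic koppa.
            out.append("\u03d8" if ch == "Κ" else "\u03d9")
        else:
            out.append(ch)
    return "".join(out)

print(archaic_koppa_spelling("Κόρινθος"))  # koppa before ό
print(archaic_koppa_spelling("καλος"))     # kappa kept before α
```

Because the check looks only at the base letter of the following character, accented vowels in precomposed form are handled; anything subtler (diphthongs, dialect differences, dates) is outside this sketch.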

Since koppa does not represent a real phonological distinction, it is only used in transcription of inscriptions, not in linguistic discussion of dialects. It does not appear in lexica, for example.

The capital form used for koppa is uniformly a larger version of the lowercase, with its tail usually on the line rather than descending. Haralambous mentions the proposed form for archaic koppa capital as a backwards P, which Michael Everson rejects as inauthentic in his Archaic Koppa proposal. The issue of course is not the authenticity (all Greek lowercase is a mediaeval invention, after all) but whether there is any usage of such a glyph by the Classicists who would primarily use it; I have not seen it anywhere else, although it inevitably turns up in that glyph graveyard :-), SIL Galatia Extras [glyph image not reproduced].

The letter was proposed by Michael Everson in December 1998 (in order to be differentiated from the numeric koppa), and adopted in Unicode 3.2.

3. San

U+03FA Greek Capital Letter San [Ϻ]; U+03FB Greek Small Letter San [ϻ]

With Phoenician sibilants, Greeks reached the apogee of confusion. As pieced together by Jeffery (1990:25-28), the development went something like this:

Phoenician letter | Hebrew glyph | Phoenician pronunciation | Greek letter | Greek glyph | Greek pronunciation
zayin | ז | /z/ | san | Ϻ | */z/ > /s/
sāmeḵ | ס | /s/ | sigma | Σ | /s/
ṣāḏê | צ | /ts/ | zeta | Ζ | /dz/ > /zd/
šîn | ש | /ʃ/ | xi | Ξ | */ʃ/ > */kʃ/ > /ks/

Faced with the four sibilants of Phoenician, which they memorised without quite distinguishing correctly (especially when they didn't have half the sibilants in their own language), the Greeks jumbled the forms and the names of the letters, so that they no longer correspond to the Phoenician originals. I've colour-coded the correspondences in form and alphabetical order; the names and sounds correspond as rows in the table. The ordering continued to be as for the Phoenician letters, by glyph rather than by name.

The jumbling is the first problem. The second problem is that not all the sibilants were present in all the dialects. Most Greek scripts initially avoided xi, and wrote /ks/ as ΧΣ; Jeffery (1990:32) suspects the Ionians held on to it because /ks/ in Ionic could be realised as [kʃ] (which is speculative), and under the influence of neighbouring non-Hellenic languages like Carian which did have /ʃ/. (Circumstantial evidence for this lies in the separate Ionic invention of sampi as yet another sibilant, after they'd skipped san.) Once the Milesian alphabet was adopted by Athens, xi was reintroduced to the rest of Greece as /ks/.

San was a bigger problem. In Attic, [z] did not have phonemic status; it was an allophone of /s/ before voiced phonemes (Bubenik 1983:80-81). So Greek does not seem to have had a phonemic distinction between /s/ and /z/, to represent with a san as distinct from sigma. What ended up happening is that abecedaria (inscriptions of ABC's) preserved both san and sigma into the sixth century BC, with Greeks slavishly repeating what they were taught by the Phoenicians; but actual inscriptions used only one letter, depending on the region: sigma in most of Greece, san in Crete and Corinth. (Jeffery 1990:33 speculates that in those regions /s/ was realised as [z]; Buck 1955:18 admits that the phonetic value of the early Cretan glyph for xi, I, is uncertain.) By the fifth century Crete was the only region that held on to san, and sigma is unattested there (Jeffery 1990:308), until the epichoric Cretan alphabet yielded to the Milesian.

After that, san survived more as a remembrance than anything else. The Dorians and poets kept calling sigma san, and Dorians (the city of Sicyon? Ϻικύων) branded their horses with san, just as the city of Corinth (Ϙόρινθος) branded their horses with koppa:

καὶ τόδε ἄλλο σφι ὧδε συμπέπτωκε γίνεσθαι, τὸ Πέρσας μὲν αὐτοὺς λέληθε, ἡμέας μέντοι οὔ: τὰ οὐνόματά σφι ἐόντα ὅμοια τοῖσι σώμασι καὶ τῇ μεγαλοπρεπείῃ τελευτῶσι πάντα ἐς τὠυτὸ γράμμα, τὸ Δωριέες μὲν σὰν καλέουσι, Ἴωνες δὲ σίγμα: ἐς τοῦτο διζήμενος εὑρήσεις τελευτῶντα τῶν Περσέων τὰ οὐνόματα, οὐ τὰ μὲν τὰ δ' οὔ, ἀλλὰ πάντα ὁμοίως. (Herodotus 1.139)

There is another thing that always happens among them; we have noted it although the Persians have not: their names, which agree with the nature of their persons and their nobility, all end in the same letter, that which the Dorians call san, and the Ionians sigma; you will find, if you search, that not some but all Persian names alike end in this letter.

Νεοπτόλεμος δὲ ὁ Παριανὸς ἐν τῷ περὶ ἐπιγραμμάτων ἐν Χαλκηδόνι φησὶν ἐπὶ τοῦ Θρασυμάχου τοῦ σοφιστοῦ μνήματος ἐπιγεγράφθαι τόδε τὸ ἐπίγραμμα·
      τοὔνομα θῆτα ῥῶ ἄλφα σὰν ὖ μῦ ἄλφα χεῖ οὖ σάν,
      πατρὶς Χαλκηδών· ἡ δὲ τέχνη σοφίη. (Athenaeus, Deipnosophists 10.81)

Neoptolemus of Paros, in his book on epigrams, says that in Chalcedon, on Thrasymachus the Sophist's tomb, this epigram is written:
      Name: Theta rho alpha san u mu alpha chi ou san.
      Country: Chalcedon; Profession: Wisdom.
(Thrasymachus of Chalcedon is mentioned in Aristophanes' Banqueteers and Plato's Republic, so by the time he died only sigma was in use, and his epitaph writer was being consciously archaic. Note also the pre-Byzantine names of upsilon and omicron.)

εἶτα τὰς κώπας λαβόντες ὥσπερ ἡμεῖς οἱ βροτοὶ
ἐμβαλόντες ἀνεβρύαξαν, ‘ἱππαπαῖ, τίς ἐμβαλεῖ;
ληπτέον μᾶλλον. τί δρῶμεν; οὐκ ἐλᾷς ὦ σαμφόρα;’ (Aristophanes Knights 601-603)

Despite this, they [the horses] nevertheless seized the sweeps just like men, curved their backs over the thwarts and shouted, “Hippapai! Give way! Come, all pull together! Come, come! How! Samphoras! [= San-bearer] Are you not rowing?”

So the status of san as a letter is much weaker than for koppa.

Outside Greek, san has more independent status: Greek didn't need the four different sibilants it inherited from Phoenician, but other languages found them handy.

Given its scarcity in Greek typography, the shape of san has not been normalised. The Ancient glyph is a modern M with straight legs; towards the end of its use its legs were slanted, as is shown in the reference glyph for U+03FA Greek Capital Letter San. Commentators on Aristophanes (Athenaeus, Deipnosophists 11.30, citing Aristoxenus of Tarentum) describe the brand on his samphoras horses as C, which is where the old TLG glyph for san (Beta code escape #711) comes from. But that may well be lunate sigma creeping into the text tradition: some scribe along the way, more familiar with Herodotus' throwaway lines than 600 BC Corinthian inscriptions, must have jumped to the conclusion that san and sigma were the same thing. The Unicode reference glyph for U+03FB Greek Small Letter San looks like a reverse archaic mu; I don't know if that design has any authority, but at least it cannot be confused with mu. New Athena Unicode takes this further by slanting the left leg and eliminating the right.

David Perry has a common-sense proposal of writing lowercase san as a baseline M with a descending left leg; this ties in with digamma, and seems a useful way of distinguishing san from mu. For the capital, he proposes having the middle v-shaped section not come down as far as for M; this is done for the Unicode reference glyph. Gerry Leonidas has also commented on the design of san; he basically concurs with Perry.

The letter was proposed by the TLG in November 2002, and adopted in Unicode 4.0.

4. Sho

U+03F7 Greek Capital Letter Sho [Ϸ]; U+03F8 Greek Small Letter Sho [ϸ]

Now, I wouldn't know Bactrian from Pashto. Fortunately, Nicholas Sims-Williams does, and his proposal for this character, submitted with Michael Everson, is available to fill you in. Basically, this is a character a little like a thorn, made up in Northern Afghanistan after Alexander the Great brought Greek script to the region, to write the sound [ʃ].

The proposal includes discussion of what the character should look like, and that should always set off warning bells (as it does for san): if there isn't an established typographical tradition for what the character should look like, then why are we encoding the character at all? In this case, it's because the character isn't anywhere near θ, and because of the general Unicode tendency to avoid script mixing.

One might well picture the different dialogues that might have gone on in 1999 and 1917:


—Michael, I need two new codepoints for a Bactrian character.

—Buíochas le Apple! How about ten different glyphs in three fonts? Whee!


—So, vot you are sayink, Sehr geehrter Herr Professor Doktor Kirste, is zat you vant a character zat looks ein bisschen like a thorn.

—Ja, for mein article on Baktrian.

—Vell, you are in luck. I happen to haff a character in shtock zat looks just like a thorn.

—And zat vould be...

—A thorn:


Different times, different results...

The ordering of sho is not settled. Everson mentions that sho could be conflated with san, and assumes san is the same as sampi, in which case sho would go between pi and koppa. Sims-Williams orders it after omega, and other scholars after sigma. Everson accepts that it should go after omega. However, his numerical argument is flawed—namely that san as the numeral 900 follows omega as the numeral 800. Sampi is not the same as san, and his argument actually conflates sampi and sho, not san and sho.
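
The numerical point is easier to see with the values laid out. The sketch below builds the 27-sign alphabetic (Milesian) numeral table, in which sampi (900) directly follows omega (800), while san has no numeric value at all. The names `UNITS`, `TENS`, `HUNDREDS`, `VALUE` and `numeral_value` are illustrative, and using digamma rather than stigma for 6 is a simplification made here.

```python
# Signs in numeric order. The three non-alphabetic signs are written as escapes:
# U+03DD digamma (6), U+03DF numeric koppa (90), U+03E1 sampi (900).
UNITS = "αβγδε\u03ddζηθ"
TENS = "ικλμνξοπ\u03df"
HUNDREDS = "ρστυφχψω\u03e1"

VALUE = {}
for scale, signs in ((1, UNITS), (10, TENS), (100, HUNDREDS)):
    for position, sign in enumerate(signs, start=1):
        VALUE[sign] = position * scale

def numeral_value(numeral):
    """Sum the values of the signs in an additive numeral (lowercase, no final sigma)."""
    return sum(VALUE[sign] for sign in numeral)

print(VALUE["ω"], VALUE["\u03e1"])   # omega, then sampi
print("\u03fb" in VALUE)             # small san (U+03FB) is not a numeral sign
```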

Gerry Leonidas has also commented on the design of sho.

This letter was proposed by Michael Everson and Nicholas Sims-Williams in January 2002, and adopted in Unicode 4.0.

