By Mark Sundaram
Welcome to the Endless Knot! Today, with help from my friend Jade & some math, I’m going to spell some things out for you!
When I was a kid, I really struggled with spelling. Other kids seemed to pick it up so easily, and I was told to just memorize lists of words, but no one would ever tell me why words were spelled the way they were. It was only when I learned some history of the language in university that it finally started to make sense. At first glance, English seems to have a downright chaotic spelling system, causing difficulties for young native speakers and adult second language speakers alike. Why is it ‘circus’ not ‘serkis’? Why are we so confused about whether it’s Gif or Jif? And why can a rough, dough-faced ploughman stride, coughing thoughtfully, through the streets of Scarborough?! Can’t we just simplify English spelling? Well, as we’ll see, English may not be quite as irregular as it seems, and there may actually be some benefits to those peculiarities; and maybe the problem isn’t so much the spelling as the way it’s taught, unconnected to the fascinating story of its development. Now, that’s a fairly complicated story, so I’m going to pick a few key examples, and I’ll also be filling in a lot of details later with some other videos about specific letters and sound changes. But for now, let me try to help make things make sense for you, as they finally do for me!
What is spelling anyway? Well, it’s putting the letters of words in the so-called right order. But what does that mean? You might be surprised to know that the word spelling didn’t have that meaning until the early modern period, which is when spelling first really started to be standardized in English; before that you just wrote words the way you said them depending on your own particular dialect or accent. The Old English verb spellian, from the Proto-Indo-European root *spel- “say aloud or recite”, meant “to tell or speak” and the noun spell meant “narrative or story” as well as “message or news”. That sense is clear in the second element of the word gospel which literally means “good news”. Spell could also refer to a magical incantation, a sense we still have today. But the Germanic root that lies behind the word spell also made it into French via the Franks, and there it took on a new meaning. The Anglo-Norman and Old French forms of the word espeler or espelir meant “to read out loud” as well as “read out letter by letter”. After the Norman conquest of England, the French and English words merged, and it’s from the French senses that we get the modern sense of spelling. But spell isn’t the only language word that has magical connections. The word grammar comes from the Proto-Indo-European root *gerbh- meaning “to scratch”, and in fact also gives us the word carve as well as graph, the idea being that writing was originally carved into wood or stone. From the word grammar we also get the word glamour, first appearing in Scots English, which originally implied magic, meaning “enchantment” or “spell”, from the notion of arcane learning. Glamour then gains its modern sense from the idea that someone who is glamorous kind of casts a spell on people. So I suppose it’s not surprising that I found the English spelling system mystifying!
So one big problem is that there isn’t a consistent letter-to-sound, one-to-one correspondence in the English writing system. Some sounds require multiple letters, like the /θ/ in thin, or the /oʊ/ in oak. And some letters or letter combinations can make multiple sounds as in the words streak and steak, now and know, here and there. This makes English spelling harder to learn, so why haven’t we got rid of them to make things easier? Part of the answer, surprisingly, has to do with the mathematics of information! But you’ll have to head over to my friend Jade’s channel, Up and Atom, to get the full story on that and why the redundancies are really useful information! In the meantime, in order to see how those redundancies and complexities of spellings came about in the first place, we need to look at the history of the alphabet.
So an alphabet is a writing system in which individual characters, at least theoretically, represent individual distinct sounds. By the way, that word character ultimately comes from another Proto-Indo-European root that implies the original carving of writing, *gher- meaning “scrape, scratch”, which came into Greek as kharassein “to make sharp” and kharakter which after passing through Latin and French give us not only the word character, but also gash. The word letter, on the other hand, is a bit of a mystery. It comes through French from Latin littera “letter”, but before that it’s uncertain. One suggestion is that it came through Etruscan (and we’ll be talking about that language in a minute), from Greek diphthera “writing tablet” originally “prepared hide, piece of leather”, which I suppose might suggest another medium of writing with ink on animal skin. Interestingly, this Greek word makes it into French and English again, as a more direct borrowing from Greek, when physician Pierre Bretonneau named the disease diphtheria on account of the leathery false membrane which forms in the throat of someone who has the disease.
But as I was saying, an alphabetic writing system theoretically can have a one-to-one sound-to-letter correspondence, but obviously that isn’t the case in English, and to understand why we have to take a look at the journey the alphabet took to get to English. And when I say the alphabet, I really do mean THE alphabet. With only a few exceptions, such as the Hangul script of Korea which was developed independently, all the alphabets used today descend from one original alphabet. The story starts in ancient Egypt with their famous hieroglyphics. This was a logographic system in which characters represented words. However, sometimes the hieroglyphs could be used phonetically to represent the consonants of the word the picture depicted, and this could be particularly useful for writing things like foreign names. Around 2000 BCE a Semitic group in Egypt borrowed from the Egyptians the idea of using pictures to represent individual consonant sounds. They borrowed the pictures from the hieroglyphics, such as a hand, but ignored the Egyptian word they represented, substituting their own Semitic word for hand, in this case kaph, and used that character to represent the consonant at the beginning of that word, in this case the /k/ sound. And that hand character eventually became our letter k. Now at this point there were only letters for the consonants, which is why that Semitic alphabet is sometimes referred to as an abjad, an acronym made from the names of the first four letters of the Arabic alphabet, rather than a full alphabet with consonants and vowels. This was fine for the Semitic languages, which tended to have relatively more consonants than vowels, so writing down the consonants is generally enough to tell you the word, and this is basically still how the writing systems work in modern Semitic languages like Hebrew and Arabic.
And this was the beginning of the alphabet’s journey to English, because another closely related Semitic group known as the Phoenicians picked it up. Not that they called themselves the Phoenicians—that’s the Greek word for them, literally meaning “purple people”, because they were the source of a prized purple dye extracted from sea mollusks, which they sailed around the Mediterranean selling, and also, it seems, spreading their alphabet.
And that’s how the Greeks picked it up. Now Greek was a very different language from Phoenician, not a Semitic language, but from the completely unrelated Indo-European language family. It had many more vowels, and fewer consonants. So what the Greeks did was use some of the letters that represented consonants they didn’t use for their vowel sounds. Like the first letter in the alphabet. The Phoenicians called it aleph, which meant “ox”, and the letter form was meant to represent the head of an ox with its two horns. It stood for a consonant sound that wasn’t used in Greek, but they did need to represent the vowel /ɑ/, so that character became Greek alpha, and eventually English’s letter <a>. To round things off, the next letter in the Phoenician alphabet, bayt meaning “house” and representing /b/, became Greek beta and English <b>, and together those first two letters, alpha and beta, give us the word alphabet, appropriate since the Greek alphabet is the first full alphabet including vowels as well as consonants.
The next stop for the alphabet was the Etruscans, a group of people who lived in the part of Italy known today as Tuscany. The Etruscan language is not Indo-European, and in fact is not related to any other known language, what linguists call a language isolate. So again, this language had a rather different sound system compared to Greek, and so some adaptations had to be made to fit the letters to the language. And from there the alphabet rolled down into Rome, where it became the basis of the Latin alphabet, which in turn spread around Europe and ended up as what we write English with today, with a few extra letters added in and some tweaks to the sounds some of the letters make; and that’s why the English alphabet is often called the Roman alphabet.
Now why is it so important to know all of this to understand English spelling? Well, each time the alphabet moved from one language to another, it produced redundancies and quirks in the letter-to-sound correspondences. For example, the /k/ sound. As we saw before, this was represented in the original Semitic alphabet as kaph. But the Semitic languages had more varieties of consonants produced at the back of the throat than Greek did, so the Greek alphabet didn’t need all those distinct characters. Kaph it kept, which became kappa, and later English <k>. The Greeks also initially kept the letter qoph, forerunner of our letter <q>, although it was redundant for them, and they later dropped it. The Phoenicians also had a /ɡ/ letter, called gimmel, which became Greek gamma. /ɡ/ and /k/ are similar sounds, but it’s an important distinction in Greek (as it is in English). But in Etruscan it wasn’t, although that language had a number of other varieties of back of the throat sounds. So they didn’t need that Greek gamma, and assigned another type of K sound to that letter, in addition to keeping both <k> and the <q> from early Greek. And notice that the gamma looks a lot like the letter <c>? Well that’s how we got the letter <c>, making a /k/ sound, not the hard /ɡ/ sound of Greek gamma. And then when the Romans got their hands on the alphabet, there was no longer a letter to represent the /ɡ/ sound, which Latin DID have, so initially they used the letter <c> to represent both /k/ and /ɡ/. They eventually invented the letter <g> by putting an extra stroke onto a <c>, but that was only later. That’s why the common Roman name Gaius was abbreviated with the letter <c>. For whatever reason, the Romans didn’t use the letter <k> very much, though it hung around as a quaint redundancy. As for the letter <q>, for the Romans it also represented a /k/ sound, but was restricted to the letter combination <qu> followed by a vowel sound, which was common in Latin.
And that’s why English has the redundant letters <k>, <c>, and <q>, often the target of those who complain about the English spelling system. We’ll come back to the letter <c> and the multiple sounds it can represent in Modern English later.
Now this problem of new languages using this old system came up again when Old English speakers started to use the Latin alphabet to write down their Germanic language which has sounds not present or distinguished in Latin. The Anglo-Saxon scribes coped by adding in some letters from their own earlier runic writing system or modifying existing letters in the Latin alphabet. Later on, after Viking invaders conquered and settled in large parts of the country, there was an influx of Norse loanwords. At least Old Norse and Old English were related languages, but there are some significant differences, which led to further adaptations of the spelling system. But the biggest shake up came after the French-speaking Normans conquered the country. In addition to a vast amount of French vocabulary with its own sounds and spellings that came into the language, the Norman scribes didn’t like the barbaric Old English spelling conventions and began spelling the Germanic-derived English words in new ways. So it’s this mashup of different spelling conventions, and a bunch of snooty scribes, that made my life so hard as a kid!
For example, /dʒ/, a sound not in Latin, had been spelled in Old English as <cg> as in the word ecg, but under the Normans was now spelled <dge> as in the modern spelling, and that convention was eventually carried over to some words of French origin as well such as judge. But what about the /dʒ/ sound at the beginning of that word? What about the letter <j>? Well it hadn’t really been invented yet. In fact it’s the most recent addition to the English alphabet. In Latin the letter <i> did double duty representing both the vowel /i/ sound and the closely related consonant /j/. But as the various local dialects began transforming into what would become the Romance languages, that /j/ sound began to shift to a /dʒ/ sound in early French. But it was still spelled with the letter <i>. So Latin Iupiter became Jupiter, though still spelled with an <i>. The <j> letter form did grow out of the letter <i>, but it wasn’t at first used to differentiate between the two sounds, it was really just a fancy way of writing the same letter. It wasn’t until 16th century French that the letter <j> started to be used systematically, and not until the 17th century did it arrive in English. In fact as late as the 18th century, when Samuel Johnson wrote his famous Dictionary, though he did use the letter <j>, he interfiled all the <i> and <j> words together. It wasn’t until later lexicographers such as Noah Webster that the letter <j> got its own section in dictionaries. So that explains the two /dʒ/ sounds in judge which came from Latin iudex. If only they’d taught me etymologies in school I’d have won all the spelling bees. Not that I’m judging.
But you can also spell /dʒ/ with a <g>, so what’s up with that? Well in Latin the letter <g> always made the so-called hard /ɡ/ sound. But again as French developed out of Latin, the letter <g>, when it came before a front vowel (that is, a vowel produced towards the front of the mouth, such as /i/ and /e/), came to be pronounced /dʒ/. A similar sound change had already happened in Old English with /ɡ/ in some contexts becoming /j/ which Norman scribes started to spell with the letter <y> as in yard. Confused yet? Don’t worry, it gets worse. So we see French loanwords in English like gentle, following our hard-G soft-G rule that we’re taught in elementary school. But there are exceptions, I hear you say. What about words like get and give? Well here’s where we see the influence of Old Norse. Get was a loan word from Old Norse, where /ɡ/ hadn’t changed at all. And though give did exist in Old English with that /j/ sound as giefan and should have become *yive, the word also existed in a related Old Norse form in the north of England with a hard-G and therefore give has the pronunciation it does today. So neither word is subject to the hard-G soft-G rule derived from French, and you can generally identify a word as coming from or influenced by Old Norse if it breaks that rule. So the important question is: gif or jif? Norse or French? Well as far as I’m concerned it’s an English word so it should be yif!
Now Old English did of course also have a hard /g/ sound so that mapped easily onto the Roman letter <g>. But it also had a couple of guttural sounds that didn’t exist in Latin, which the English scribes spelled with either <h> or <g>, in addition to still using those letters for their previous Latin sounds. But again the Norman scribes turned their noses up at that double use of letters, and instead often used the combination <gh> to represent those guttural sounds. But why, then, is <gh> pronounced in so many different ways in Modern English?
Well, first of all, there were actually three slightly different guttural sounds in Old English and the sounds diverged in different ways, and some scribes changed the spellings to reflect that and some didn’t. In some contexts, the guttural sound became a /w/ sound and came to be spelled <w> in Modern English, as in the Old English word boga becoming Modern English bow. But notice that Old English plog, sometimes spelled with a <g> and sometimes spelled with an <h>, is spelled in Modern English as either plow or plough. Similarly we have Modern English words with a <gh> spelling like dough and bough, which were spelled with a <g> in Old English, and through and though, which were spelled with an <h> in Old English. In some cases, such as when following a front vowel, the guttural sound of <gh> just disappeared, as in high and night. And in one surprising sound change the guttural sound became /f/ as in rough, particularly in northern dialects of English. This one’s so weird I’ll have to cover it in a separate video! As for the different vowel sounds of the various words spelled <ough>, they often represented quite different vowels in Old English which all got lumped together under the one spelling and therefore developed in very different ways.
So to summarize, this train wreck is the result of the shifting spelling conventions in Middle English and subsequent sound changes that happened. Unfortunately the <gh> spellings became standard even though we no longer pronounce those guttural sounds.
Now let’s return to the letter <c> again and consider another sound it makes. Why do we have soft <c> and hard <c>? Well, this is a sound shift that happened as Latin became French. In Latin, <c> always indicated /k/. But as the various Romance languages developed out of Latin, as with the letter <g>, when /k/ came before a front vowel it changed, eventually becoming /s/, and the French-speaking Normans brought that with them to England, so we now have the hard-C/soft-C rule.
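For anyone who likes to see a rule spelled out, here is a minimal Python sketch of the hard/soft rule for <c> and <g> as described above. It is only an illustration, not a real pronunciation tool: the function name and the choice of the letters e, i and y as stand-ins for "front vowels" are my own assumptions, and the rule works on spelling alone, which is exactly why it mispredicts Norse-influenced words like get and give.

```python
# Letters treated as spelling "front vowels" for this sketch (an assumption:
# <y> is included here so that words like "cycle" and "gym" come out soft).
FRONT_VOWEL_LETTERS = set("eiy")

# Sounds predicted by the French-derived rule for each letter.
RULE_SOUNDS = {
    "c": {"soft": "/s/", "hard": "/k/"},
    "g": {"soft": "/dʒ/", "hard": "/ɡ/"},
}


def is_soft(word: str, index: int) -> bool:
    """Return True if the letter at word[index] is followed by e, i or y."""
    next_letter = word[index + 1].lower() if index + 1 < len(word) else ""
    return next_letter in FRONT_VOWEL_LETTERS


def predicted_sound(word: str, index: int) -> str:
    """Predict the sound of <c> or <g> at word[index] using only the rule."""
    letter = word[index].lower()
    if letter not in RULE_SOUNDS:
        raise ValueError("the rule only covers the letters c and g")
    return RULE_SOUNDS[letter]["soft" if is_soft(word, index) else "hard"]


if __name__ == "__main__":
    # Both <c>s in "circus", then the initial letter of a few other words.
    for word, index in [("circus", 0), ("circus", 3), ("gentle", 0), ("get", 0)]:
        print(word, index, predicted_sound(word, index))
```

Note that the sketch predicts a soft sound for get, which is wrong for the actual word; that mismatch is the Old Norse fingerprint mentioned earlier.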
And these are just some of the different spelling conventions that influenced English spelling. In addition to the various French conventions, English has also grappled with spellings from Greek, filtered through the Latin system of transliterating Greek words, as well as loanwords from languages from around the world, such as Dutch, Hindi, and Arabic. But that’s a journey for another video—for now, let’s look at another source of my scholastic struggles, namely sound changes in English itself.
Sound changes are of course a natural part of all languages over time, so this is always a potential problem for phonetic writing systems. If you have a one for one letter-for-sound correspondence, then over time you either have to change the way you spell things or live with the fact that the letters stop matching the sounds. We’ve talked about a number of changes that happened to consonants so far, and there have been A LOT of changes to vowels too. But I’m going to focus on the most important one in terms of its effect on spelling, which has to do with the short and long vowels. Originally short and long vowels in Old English, as in Latin, were just that, short and long in terms of duration, with the quality of the vowel sound more or less the same, and I’m simplifying slightly here to make this a little easier. The letter <a> represented /ɑ/ and was pronounced quickly /ɑ/ or held longer /ɑ:/. So it wasn’t too much of a problem representing both the long and short versions of a vowel with the same letter. And if you speak other continental European languages like French or Italian, you know that’s still roughly true in them. But something weird happened in English, right around the time that Middle English was becoming Early Modern English, gradually changing the sounds of those long vowels over a few hundred years. But it didn’t affect the short vowels, so we ended up with the vowel letters representing quite different sounds. (Again, I'm simplifying a bit here as there were some more minor sound changes that did affect the short vowels in Middle English.) So the short /ɑ/ in swan remains basically the same from Old English to Modern English, but the long /ɑ:/ in Middle English name became name in Modern English. This change is called the Great Vowel Shift because it affected the whole system of long vowels, with each vowel in turn moving in its position in the mouth. So /ɑ:/ became /e:/, /e:/ became /i:/, /i:/ eventually became /aɪ/ and so forth.
And again, I swear I’m simplifying here! But that’s why today we often say to children learning to spell that the long vowels say their name, A, E, I, O, U. This is also why it’s become more important in Modern English to indicate long and short vowels in the spelling system. There actually had been earlier attempts at that, well before the Great Vowel Shift. In the 12th century a little while after the Norman Invasion, a monk named Orm, who is now only remembered for his spellings not the literary quality of his work (yes it’s that boring), was unhappy with the way people were pronouncing English, and developed his own system of spelling. This included using a doubled consonant to indicate that the preceding vowel was pronounced short. We do that today as in the words write and written, but we don’t do it because of Orm. No one actually picked up on Orm’s spelling reforms, but the same idea was reinvented by later scribes. Poor Orm. Also, in the Middle English period, many of the Old English inflectional endings, basically word endings that indicated the grammatical functions of words, began to become reduced or disappear altogether, with different vowel sounds becoming an indistinct /ə/ or schwa sound spelled simply with the letter <e>, and over time those <e>s stopped being pronounced altogether. But they stuck around as the so-called silent E, useful for marking the preceding vowel sound as long.
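To make the chain-shift idea behind the Great Vowel Shift concrete, here is a tiny Python sketch covering only the three steps named earlier (/ɑ:/ to /e:/, /e:/ to /i:/, /i:/ to /aɪ/). The names are my own, and the table is deliberately incomplete and simplified, just like the description above. The point it illustrates is that each long vowel moves exactly one step, while short vowels are left alone.

```python
# Simplified, incomplete table of the shift: only the three steps mentioned
# in the text. Length is written with a plain colon, as in the text.
GREAT_VOWEL_SHIFT = {
    "ɑ:": "e:",
    "e:": "i:",
    "i:": "aɪ",
}


def shift_vowel(vowel: str) -> str:
    """Apply one step of the shift; vowels not in the table are unchanged."""
    return GREAT_VOWEL_SHIFT.get(vowel, vowel)


if __name__ == "__main__":
    # Long vowels each move one step; the short vowel stays where it was.
    for vowel in ["ɑ:", "e:", "i:", "ɑ"]:
        print(vowel, "->", shift_vowel(vowel))
```

Because every vowel is looked up once in the original table, the old /e:/ ends up as /i:/ rather than sliding all the way to /aɪ/, which is what makes this a chain shift rather than a merger.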
But what’s really crucial here is the timing of the Great Vowel Shift, along with the other sound shifts that were taking place at the end of the Middle English period, since this was right around when standard spellings started to be fixed. Since the pronunciation of English at that time was so radically in flux, the spellings that became fixed reflected sometimes older and sometimes newer forms, leaving us with the mixed bag of spellings we have today. There had been earlier attempts at standardized spellings, but in the 15th century, there were two factors that fundamentally influenced the standard spellings that we have today.
The first is the development of the so-called Chancery Standard, which was used in official government writings in the first half of the 15th century. It actually started with King Henry V, who in August of 1417 decided to communicate with his officials in English rather than French. The Signet Office, which was in charge of his personal communications, developed standard spellings based on the Central East Midland and London dialects. From there it spread to the other government offices, and as official documents were sent around the country other professional scribes began to adopt this standard.
The other major factor is the arrival of the printing press. William Caxton, born in Kent, relocated to Bruges (in what is now Belgium), working in the textile industry. He wrote an English translation of a French account of the Trojan War, and, after he picked up the technique of printing during a trip to Cologne, printed the first book in English, his own translation, in 1475. Then in 1476 he moved back to England and set up his printing press in Westminster, near all those government offices, and began his printing business. Caxton was well aware of the problems posed by the variety of dialects around England. For his books to sell, they had to be widely understandable. In the prologue to one of his books he tells a story which really shows the scope of the problem. A certain merchant from the north of England, visiting London, tries to buy eggs from a local southern woman. He asks for egges and the woman replies that she can’t understand him because she doesn’t speak French. The merchant gets upset, his egg craving being unsatisfied, since he also could speak no French, until a bystander steps in to translate telling the woman that he wanted eyren. This slapstick comedy story of a food order gone wrong is based on the fact that the northern form egges, which comes from Old Norse, and the southern form eyren, which comes from Old English, are so different. And if you can’t do something as simple as order some eggs, how are you going to publish books understandable by all? Caxton’s solution was to publish in the London standard, rather than his own native Kentish dialect, which he considered crude, and other printers soon merged this with Chancery English and spread those spellings even further. Of course it wasn’t all smooth sailing. 
Early printed books were often inconsistent in their spellings, such as the silent <e> being dropped or added to equalize line lengths, and odd things sometimes crept in like the <h> in the spelling of ghost from the influence of Flemish printers (possibly introduced by Caxton himself). But in the end Chancery English and the printing press gave us the modern English spelling system we’re stuck with today.
There have been many attempts and proposals over the years at reforming the English spelling system, in fact almost since standard spellings arose. An early one worth noting is that of Sir Thomas Smith, who in 1568 proposed a system involving a 34-character alphabet which for instance reassigned the redundant <c> to the /tʃ/ sound, added characters, and used diacritics or accent marks to show short and long vowels. Others were more conservative such as William Bullokar’s 1580 proposal which stuck to only the already existing characters plus diacritics. He also wanted to drop unnecessary double consonants and silent <e>s, and objected to the so-called etymologically based spelling. This is when, for instance, the silent letter <b> is added to words like debt and doubt because it shows they came from the Latin words debitum and dubitare, even though they were never pronounced that way in English. In another example, the <s> was added to island because of the mistaken belief that it was connected to the Latin derived word isle (from Latin insula) when in fact island came from the unrelated Old English iegland and never had an <s> in there to begin with. I’ll admit that if only this one suggestion had been taken up, my life would have been much easier! But spelling reformers over the years more or less split into either conservatives or radicals, either tidying up the worst inconsistencies or reforming the whole system. What the more conservative reformers realised was that radical proposals were unlikely to be accepted and would create the difficulty of learning a whole new system. But that didn’t stop the proposals.
The two individuals most influential on English spelling standards were the dictionary writers Samuel Johnson and Noah Webster. Dr Johnson started out initially as a language reformer, but soon realised this was impractical, and his ultimately conservative spellings used in his great Dictionary served to further entrench existing standards. The American Noah Webster, on the other hand, ended up being the only successful reformer of the English spelling system. In the various editions of his Dictionary of American English and spelling books, he started out rather conservative in his reforms, then later radicalized, and then gradually became more and more conservative again. But he is why the American spelling system to this day differs from the British system, which has in fact made things harder for all of us!
Now I know I said I wished some of these reforms had happened, but really what I wish is that I’d been taught some of this history way back in school. Because I think there are some real benefits to the spelling system as it now stands. First of all it tells us so much about the history of the language. And there are some advantages to having a spelling system that doesn’t have a simple one-to-one letter-to-sound correspondence. It helps us distinguish between “the rights of the Church” and “the rites of the Church”, or more recently between “fishing” and “phishing”. And how would a strictly phonetic writing system work with the many different accents around the English-speaking world? If you based your system on only one of those accents it would be a highly political decision, favouring some and disadvantaging others. And it would obscure the relationship between many words such as nature and natural which currently use the letter <a> to represent quite different sounds. And finally a somewhat illogical spelling system gives so much scope for creativity from brand names like Flickr to text speak like gr8 to the unpronounceable pwn.
Leave a comment or use the community tab to tell me about your most hated English spellings, and maybe I’ll try to explain them in a follow-up video. I’ll also be doing some videos exploring the detailed linguistics and phonology of some of the letters and sound changes I covered here, as well as some others I didn’t have time to include, probably in the summer. For now, please head over to the Up and Atom channel to learn more about the fascinating mathematical concept of entropy and how it’s connected to spelling and writing.
Thanks for watching! If you’ve enjoyed these etymological explorations and cultural connections, please subscribe, & click the little bell to be notified of every new episode. And check out our Patreon, where you can make a contribution to help me make more videos. I’m @Alliterative on Twitter, and you can visit our website alliterative.net for more language and connections in our podcast, blog, and more!