Mini-Lecture on Linguistics

Indo-European Languages

The Indo-European language super-family is sometimes called Aryan. The Persians’ name for themselves was the Arya; the term is probably related to the Greek arth-, aristo- root, i.e., fitted together, best. Aryan and Iran, the name of their country, are genitive plurals, “of the Aryas”. The historical Persian Empire called itself “Iran and Non-Iran”, making a sharp distinction between the rulers and their captured territories. As a linguistic term, Aryan has given way to “Indo-European” mainly because of the evil connotations the Nazis managed to give to the perfectly good word “Aryan”. It’s ironic that the Nazis most certainly did not consider Iranians as Aryan by their definition. Needless to say, the Germans call the language family Indo-Germanic. They were definitely stretching things when they claimed that Jesus was Aryan, but cf. the many conventional pictures of Jesus with blond hair and blue eyes from the Renaissance onward. Even so, I’m not quite sure how the Nazis justified the official decision during World War II that their Japanese allies were also Aryans.

To be fair to the Nazis, this racist sense of Aryans as blue-eyed blonds by no means originated with them; it goes back at least as far as a book entitled Essay on the Inequality of the Human Races written in French in 1855, which argued that Europeans must be the “true” Aryans, since it was ridiculous to think that the most perfect language (French) could have been derived from that of darker-skinned Asiatics. The Nazis picked up and used this bit of “science” (and their national symbol — swastika is Sanskrit svastika, good luck) just like they acquired the idea of the Übermensch (beyond-man, or superman) from the 19th-century philosopher Nietzsche, who had used the term to describe the species which he felt would someday replace humans through evolution.

Parenthetically, there is also a book “proving” that Jesus spoke Welsh. At least there actually were Keltic speakers within a couple hundred miles of Palestine (the Galatians), but the book also “proved” that American Indians spoke Welsh. By complete coincidence, the author was from Wales. There’s another book, by an Indian, of course, “proving” that England, Rome, Dutch, and Arab are all Hindi words. Before scientific linguistics established the major human language families during the 19th century, most Europeans assumed on Biblical grounds that all languages from Chinese to Navajo to English were derived from the Hebrew spoken by Adam and Eve, and from the time of the first Spanish and Portuguese explorers, many people theorized that the indigenous Americans were descendants of the “lost tribes” of Israel.

Some curious linguist once tried a scientific experiment: a baby was raised without allowing it to hear any language at all, to see if it would start babbling in Hebrew. The test unfortunately proved inconclusive because the toddler never said anything at all. If this abuse had continued for any length of time, the child probably would never have spoken, even when returned to “normal” surroundings. Studies of feral children (those raised by animals, or in some other way prevented from interacting with other humans) show that these unfortunates cannot develop language at all if it hasn’t been started by perhaps age three. It doesn’t have to be spoken; if mommy and daddy make a conscientious effort to sign around a congenitally deaf child, it will pick up (and babble in) sign language. (Although Helen Keller was six when she first “got” the idea of sign language, as depicted in The Miracle Worker, she had normal sight and hearing until she was about 19 months old, so she had been exposed to language and had a typical toddler vocabulary of several dozen words before her illness.)

Archaeologists and linguists are still uncertain about the original homeland of the proto-Indo-Europeans, although the best current guess is the so-called Kurgan culture that lived in the south Russian steppes above the Black Sea and the Caspian around 4,000 bce. Cf. Caucasian for another (but now discredited) guess on the IE homeland. In fact, there is little solid ground between the Atlantic and China that hasn’t been “proven” to be the original location by some scholar or another.

We can learn a surprising amount about the Indo-Europeans just from their ancient vocabulary. There are common words for “snow”, “beech”, and “birch”, but not for “palm tree” or “sea”. Common IE plant (grain, cherry, apple) and animal (bear, beaver, fox) names indicate they lived in a temperate inland climate. Terms for family relationships show their society was patrilineal (children belonged to the father) and patrilocal (a wife moved into her husband’s family). Indo-European has common words for horse, cattle, pig, goat, sheep, and dog, showing that all these had been domesticated. IE terms for weaving (wool was the main fabric), wheel, mill, wagon, carpenter, plough, and copper or bronze demonstrate their level of technology. As one might expect, the words for mouse, louse, snake, fly, wasp, and bedbug are equally ancient.

The earliest Indo-European we can reconstruct, six thousand years old, was already a highly evolved language, far more complex than modern English. Everything was declined, nouns and adjectives had three genders, verb conjugations were complicated, and in addition to singular and plural, IE had a “dual” ending for referring to exactly two of something. Obviously there must have been a long line of prior languages that eventually became Indo-European. Professors with too much time on their hands try to demonstrate connections between pre-Indo-European, pre-Altaic, pre-Uralic, and pre-Semitic, but the general consensus is that until somebody invents a time machine, this is just too far back into the fog of pre-history to ever prove anything.

The main living branches of Indo-European are Germanic (German, English, Dutch, and the Scandinavian languages, mainly), Italic (Latin and its descendants — French, Italian, Spanish, etc.), Persian (Avestan and its descendants — Farsi, Pashto (Afghan), Kurdish), Indic (Sanskrit and its derivatives — Hindi, Urdu, Bengali, Romany (Gypsy), etc.), Slavonic (Russian, Ukrainian, Polish, Serbo-Croatian, Czech, etc.), Hellenic (Greek), Keltic (Irish and Scots Gaelic, Breton, Welsh), and a couple of orphans, Armenian and Albanian, linguistically isolated when the Turks took over Anatolia and the Slavs overran the Balkans, respectively. Some scholars split off Baltic (Lithuanian, Latvian, Old Prussian) as a separate family from Slavic, while on the other hand many books merge Persian and Indic into Indo-Iranian on grounds that Avestan (Old Persian) and Sanskrit were quite similar.

Quite a few dead languages like Cornish, Manx, Gothic, Thracian, Scythian, Old Norse, Umbrian, Old Church Slavonic (Glagolitic), Lydian, Mycenaean, and Hittite were Indo-European, as were two now-extinct branches, Illyric and Anatolian. Latin and Sanskrit are “dead” in the sense that nobody uses them as a mother tongue, but both are alive and well as liturgical languages. Hittite (the major language of the Anatolian branch) is actually the oldest IE language to leave a written record; the Hittites obliged the archaeologists by using a cuneiform script on indestructible clay tablets like their Semitic neighbors the Assyrians and Babylonians. The earliest known tablets date from about 1800 bce. The Rig-Veda (in Sanskrit) and the sayings of Zoroaster (in Avestan) are also ancient (perhaps 1500 bce), but they weren’t written down for a thousand years after their composition. The oldest written records in a European language are the so-called Linear B tablets found on Crete and dated to about 1400 bce. They are in Mycenaean, an ancient form of Greek.

Bronze Age civilization in the Middle East and the eastern Mediterranean totally collapsed between 1200 and 1150 bce; archaeologists show that almost every palace and village in Greece, Crete, Cyprus, the Levant, and Anatolia was burned down at that time. The Mycenaean and Hittite empires disappeared forever, and Egypt was invaded by barbarians but survived. There was a “Dark Ages” until about 800 bce, with nothing but tiny disconnected villages. In that 400-year interval, there is no evidence that the Greeks were literate, and when they started writing again (in the so-called “Iron Age”), Linear B had been forgotten, and they used a modified Phoenician alphabet. Linear B wasn’t deciphered until the 1950’s. (Linear B had about 80 symbols. Since that’s too many for an alphabet and far too few for an ideographic system like Chinese, the scholars deduced that it must be syllabic, and so it proved.)
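The symbol-count argument for Linear B can be sketched as a crude classifier. This is only an illustration of the reasoning: the function name and the two cut-off values (40 and 200) are assumptions chosen for the example, not figures used by the decipherers, and real scripts often mix sign types.

```python
def guess_script_type(symbol_count: int,
                      alphabet_max: int = 40,
                      syllabary_max: int = 200) -> str:
    """Guess the kind of writing system from the number of distinct signs.

    The thresholds are illustrative assumptions: alphabets get by with a few
    dozen letters, syllabaries need roughly one sign per syllable, and
    ideographic systems run to thousands of signs.
    """
    if symbol_count <= 0:
        raise ValueError("symbol_count must be positive")
    if symbol_count <= alphabet_max:
        return "alphabet"
    if symbol_count <= syllabary_max:
        return "syllabary"
    return "ideographic"


if __name__ == "__main__":
    # 22 letters for Phoenician; "about 80" for Linear B comes from the text;
    # 3000 is a rough stand-in for an ideographic system like Chinese.
    for name, count in [("Phoenician", 22), ("Linear B", 80), ("Chinese", 3000)]:
        print(f"{name}: {count} signs -> {guess_script_type(count)}")
```

With these assumed thresholds, a count of about 80 lands squarely in the middle band, which is the same inference the scholars drew.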

After the collapse of the Hittites, there was also a 400-year “dark age” in Anatolia and Mesopotamia until the rise of the Iron-Age Assyrians around 800 bce.

The easternmost IE language is called Tocharian, now extinct, once spoken by a people in what is now western China. Tocharian shows some similarities to Greek, which tended to drive early scholars crazy, since there seemed to be no possible point of contact. For example, they are the only two known languages to have the pan- root meaning “all”, as in Pan-American, Pandora, panorama, pandemic, and panties, q.v. It is now recognized that Greek and Sanskrit were once almost the same language (see below), so if the Greeks originated on the steppes it makes the connection a little more probable.

Oops — I have to qualify my list of extinct languages, since there is a tiny pocket in the Caucasus still speaking Ossetic, which is a descendant of Scythian.

In general, for a word to be called “Indo-European”, it has to be found in at least one European and one Asian branch, or at minimum, in one existing language plus Hittite. Many common roots, though, left descendants in almost all branches. For example, the IE kerd- root meaning “heart” is in eleven groups:

   Germanic: heart (English)
   Italic: cor (Latin)
   Hellenic: ker/kardian (Greek)
   Keltic: cride (Old Irish)
   Baltic: sirdis (Lithuanian)
   Slavic: swerdce (Russian)
   Persian: zered (Avestan)
   Indic: hard/hardaya (Sanskrit)
   Armenian: sirt
   Anatolian: kir (Hittite)
   Tocharian: karyan
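The attestation rule stated before the list can be sketched as a small check. The split of branches into “European” and “Asian”, the set of living branches, and the function name are assumptions made for this illustration; the text itself only gives the rule in words.

```python
# Assumed geographic split of the branches named in this lecture.
EUROPEAN = {"Germanic", "Italic", "Hellenic", "Keltic", "Baltic", "Slavic", "Albanian"}
ASIAN = {"Persian", "Indic", "Armenian", "Anatolian", "Tocharian"}
# Anatolian (Hittite) and Tocharian are extinct; the rest still have speakers.
LIVING = (EUROPEAN | ASIAN) - {"Anatolian", "Tocharian"}


def counts_as_indo_european(branches) -> bool:
    """Apply the rule of thumb: one European plus one Asian branch,
    or at minimum one living branch plus Hittite (Anatolian)."""
    found = set(branches)
    europe_and_asia = bool(found & EUROPEAN) and bool(found & ASIAN)
    hittite_plus_living = "Anatolian" in found and bool(found & LIVING)
    return europe_and_asia or hittite_plus_living


if __name__ == "__main__":
    print(counts_as_indo_european({"Germanic", "Indic"}))   # European + Asian
    print(counts_as_indo_european({"Germanic", "Italic"}))  # Europe only
    print(counts_as_indo_european({"Indic", "Anatolian"}))  # living + Hittite
```

A root found in all eleven groups, like the one for “heart”, passes trivially; the rule only matters for words with thin attestation.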

As described below, Indo-European /K/ changes to /S/ in Balto-Slavic and Indo-Iranian and to /H/ in Germanic, so the resemblances are closer than they look at first glance. Another example is IE melh-, to grind. This is also in the same eleven families and is even more recognizable, because all IE languages have retained the prehistoric /M/ sound:

   Germanic: mill (English)
   Italic: molo (Latin)
   Hellenic: mule (Greek)
   Keltic: mhuilinn (Scots Gaelic)
   Baltic: malu (Lithuanian)
   Slavic: maljo (Old Church Slavonic)
   Persian: marnati (Avestan)
   Indic: marnati (Sanskrit)
   Armenian: malem
   Anatolian: mall (Hittite)
   Tocharian: mely

In both these lists, the “odd man out” is Albanian, the only branch where the words have not yet been found. As mentioned elsewhere, very few genuine Albanian words have been preserved in either of the two modern dialects of the language — Gheg (influenced by Turkish and Slavic) in the north and Kosovo and Tosk (influenced by Greek), the “standard” Albanian, in the south and the cities.
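To see the /K/, /S/, /H/ pattern in the “heart” list at a glance, one can group the forms by their first letter. This is a rough sketch working on the spellings exactly as given above, not a model of the actual sound laws: treating written c as /K/ and z as an /S/-type sound is an assumption, as are the names used in the code.

```python
from collections import defaultdict

# Forms copied from the "heart" list above (first form where two are given).
HEART = {
    "Germanic": "heart",
    "Italic": "cor",
    "Hellenic": "ker",
    "Keltic": "cride",
    "Baltic": "sirdis",
    "Slavic": "swerdce",
    "Persian": "zered",
    "Indic": "hard",
    "Armenian": "sirt",
    "Anatolian": "kir",
    "Tocharian": "karyan",
}

# Assumed mapping from initial letter to a broad sound class.
SOUND_CLASS = {"k": "K", "c": "K", "s": "S", "z": "S", "h": "H"}


def group_by_initial(words):
    """Group branch names by the sound class of each word's first letter."""
    groups = defaultdict(list)
    for branch, word in words.items():
        groups[SOUND_CLASS.get(word[0].lower(), "other")].append(branch)
    return {sound: sorted(names) for sound, names in groups.items()}


if __name__ == "__main__":
    for sound, names in sorted(group_by_initial(HEART).items()):
        print(sound, names)
```

On the forms as listed, the K group is Italic, Hellenic, Keltic, Anatolian, and Tocharian; the S group is Baltic, Slavic, Persian, and Armenian; and the Sanskrit form, which begins with h, falls in with Germanic.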

The world record for the same word in different languages is held by ma and its variations. Everyone assumes it is imitative, from the “ma-ma-ma” sound of a baby at the breast. English has mother, mama, maternal, mammal, and so on. All the Indo-European languages have it: French mère, Spanish madre, German Mutter, Latin mater and matrix (“womb”), Russian mat, and Hindi mata, for example. Outside of IE, the Chinese word for mother is ma, too. In Turkish it is mut. In Arabic it is ommah, although these days it is usually reduced to umm. “Mother of battles” is translated from Arabic umm al-ma’arik. (In this sense, “mother” means “chief” or “most important”; cf. “mother lode” and “motherboard”.) Many other languages have similar words. (Pa- words for father (pater, père, padre, father itself…) are also quite common and are also presumably derived from a baby’s babbling.)

A baby’s ma-ma-ma… mouthing could just as easily be interpreted as am-am-am…, and so many languages also have an am- root that means “love”. From Latin amor we have amorous, enamor, paramour, and amateur. This got softened into amicus, friend, producing amicable, amiable, amity, and the Spanish amigo. Inimical is “not friendly”, and French massaged that into enemy and enmity. Yet another member of the family is aunt, Latin amita.

At this time, approximately 3 billion people have an Indo-European mother tongue — almost half the world’s population. The largest are Hindi/Urdu, English, and Spanish, each with 350-400 million native speakers, followed by Russian, Portuguese and Bengali with about 200M. Other language super-families with a large number of present-day speakers include Sinic (Chinese — over a billion simply for Mandarin), Austronesian (Malay, Indonesian, Tagalog — 250 million, or possibly 300 million if one includes Thai), Semitic (Arabic, Ethiopian, Hebrew — 250M speakers, mostly of Arabic), Altaic (Japanese, Korean, Mongolian — 200M), Niger-Kordofanian (many western and southern African languages, the largest of which are Mandinko, Zulu, Swahili, and Yoruba — almost 200M), Dravidian (Tamil, Malayalam, and the other languages of southern India — the original languages of the subcontinent with 150M speakers), and Ugric (Turkish, Finnish, Estonian, and Magyar (Hungarian) — about 100M).

About a billion people understand English either as their primary or secondary language, and you are currently consulting one of the reasons. The explosion of the Internet has meant an explosion in the use of English, because the vast majority of the world’s web sites are in English, to the extent that France had to pass a law making it a felony to have web sites in anything but French. Given the world-wide nature of the Web, such laws don’t have much chance of success. Somewhere there is a jurisdiction where the people don’t wear clothes, the age of sexual consent is six, gambling, drugs, piracy, pornography, and libel are legal, financial secrecy is absolute, patents, copyrights, marriage, and organized religion are unknown, and there are no extradition treaties. Trust me, that place is going to be chock full of web servers, not to mention hackers of all the sites that aren’t. Think of Delaware (and now Bermuda) corporations, Liberian or Panamanian oil tankers, Irish authors and artists, and Swiss, Caymans, or Liechtenstein banks as examples of migrating to where the laws are friendly.

There has already been a case of an Internet server farm taking over a World War II anti-aircraft platform in international waters in the North Sea which had previously declared itself the independent country of Sealand, complete with its own laws, currency, passports, etc. Needless to say, the Principality of Sealand insists that no other courts have jurisdiction in its territory.

Keltic was once much more widespread than at present. A couple of thousand years ago, Keltic speakers ranged all the way across north-central Europe from central Turkey to the Atlantic. The famous statue of the Dying Gaul is from Pergamon, in Turkey, and those Anatolian Gauls were the Galatians who received a letter from St. Paul. Going westward, Vienna, Paris, and London are all Keltic place names. Gradually, the encroaching Germans and Slavs pushed the Kelts right to the brink of the Atlantic Ocean, where they clung to the edge of the world in Ireland, Scotland, Wales, Cornwall, and Brittany. (Even within Ireland, Galway (land of the Gaels) is one of the westernmost counties, bordering the Atlantic.) Keltic is the only living branch of Indo-European which is an endangered species — many scholars think there will be nobody speaking Keltic as a mother tongue within fifty years.

Although the world currently boasts almost 7,000 languages, many are now spoken by only a few people, and those speakers tend to be elderly, as the younger generation learns a “standardized” language in school, from TV and movies, etc. Sometimes this process is helped along by government prodding: China’s efforts to standardize on Mandarin are a good example, as is the Soviet Union’s promotion of Russian. According to a recent report, the 3,500 smallest languages account for only 0.2% of the world’s population, and the average speaker is in his or her 60’s. At the moment a language goes extinct (the last speaker dies) every two weeks, so most of these “tiny” languages will die out in the next century. It is unlikely that new ones will arise to take their place. For a new language to split off from an existing one, it’s necessary for a population to be isolated, and that’s just about impossible in our modern Internet/satellite TV/cellphone world. (When the New World was first populated by Europeans, many people thought that the settlers would develop their own languages, but even the Atlantic didn’t provide enough of a barrier to allow English, Spanish, French, and Portuguese to wander very far from European norms, courtesy of books, newspapers, and a constant flow of new immigrants. There’s even less reason to believe the colonies on the Moon, Mars, and Alpha Centauri will linguistically stray, unless the human race falls back to a state of pre-technology and pre-literacy.)

The pre-Columbian natives of the New World had hundreds of languages, and very few still exist today. Preservation of endangered species has been a priority with linguists just as much as with biologists, and so when sound recording was invented in the late 19th century, anthropologists dashed around the continent sticking microphones under the noses of hundred-year-old natives, trying to capture languages before the last speakers died. In other fields, preservationists worked to save folk stories (cf. the Grimms or Harris, mentioned below) and folk music (Child, Lomax) for the same reasons.

Trying to classify the languages of the New World into larger families is an excellent way to start a fist-fight at a convention of linguists. Everybody agrees on the Aleut (aka “Eskimo”) family and the Na-Dené family, comprising Athabaskan, Navajo/Apache, and some languages in eastern Siberia, but from there everything goes to hell. One prominent school claims that all other North and South American languages have a common ancestor, dubbed Amerind. (At root this is an archeological and anthropological issue, the real question being how many different groups settled the New World at how many different times, so the Amerind supporters claim there was only one other wave of settlement besides Aleut and Na-Dené.) At the other extreme, at least one prominent school maintains that Basque is related to Navajo, and that the largest member of this alleged super-family is Chinese!

If that citation of 7,000 languages seems high, here is a more or less complete list of existing Romance languages, going more or less west to east across Europe: Portuguese, Galician, Eonaviegan, Asturian, Mirandese, Extremaduran, Spanish, Ladino, Aragonese, Catalan, Ribagorcan, Roussillonese, Valencian, Balearic, Alguerese, Occitan (langue d’oc), Gascon, Aranese, Provencal, Francoprovencal, French, Picard, Walloon, Norman, Jerriais, Dgernesiais, Gallo, Franc-Comtois, Champenois, Poitevin, Bourguignon, Lorrain, Friulian, Ladin, Romansh, Piemontese, Ligurian, Lombard, Emilio-Romagnolo, Italian, Venetian, Sicilian, Sardinian, Campidanese, Logudorese, Gallurese, Sassarese, Corsican, Istriot, Rumanian/Moldovan, Istro-Rumanian, Megleno-Rumanian, and Macedo-Rumanian.

The island of New Guinea, split into Papua New Guinea and the Papua province of Indonesia, has about 1,200 documented indigenous languages. Since the island’s population is less than nine million, that averages out to 7,000 speakers per language, occupying regions not much over ten miles on a side. Indonesia (740) and Papua New Guinea (820) between them have 1,560 living languages, so those two countries alone account for almost a quarter of the world’s total! If the United States had the demographics of New Guinea, it would have 42,000 languages, over 1,100 just in New York City.
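The arithmetic behind these figures is easy to check. The sketch below uses the numbers quoted in the paragraph, plus two round population figures that are assumptions for the example (about 300 million for the United States and about 8 million for New York City); the constant and function names are likewise illustrative.

```python
# Figures quoted in the text.
NEW_GUINEA_POPULATION = 9_000_000   # "less than nine million", so an upper bound
NEW_GUINEA_LANGUAGES = 1_200
WORLD_LANGUAGES = 7_000
INDONESIA_LANGUAGES = 740
PNG_LANGUAGES = 820
SPEAKERS_PER_LANGUAGE = 7_000       # the rounded average used in the text

# Assumed round populations, not stated in the text.
US_POPULATION = 300_000_000
NYC_POPULATION = 8_000_000


def languages_at_density(population: int,
                         speakers_per_language: int = SPEAKERS_PER_LANGUAGE) -> int:
    """Number of languages a population would have at New Guinea's density."""
    return population // speakers_per_language


if __name__ == "__main__":
    # Upper bound on the average, since the population is under nine million.
    print(NEW_GUINEA_POPULATION // NEW_GUINEA_LANGUAGES)
    share = (INDONESIA_LANGUAGES + PNG_LANGUAGES) / WORLD_LANGUAGES
    print(f"{share:.1%} of the world's languages")
    print(languages_at_density(US_POPULATION))
    print(languages_at_density(NYC_POPULATION))
```

Under these assumptions the results agree with the rounded figures in the paragraph: somewhat under 7,500 speakers per language, a bit under a quarter of the world total, roughly 42,000 languages for the United States, and over 1,100 for New York City.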

Some of these are called “languages” and some are called “dialects”, but in practice, a language is a dialect that has acquired an army and navy. (Similarly, a “cult” becomes a “sect” when it has achieved some degree of respectability, but a “religion” only when it gets a critical mass of political power.) It has been frequently noted that Austria’s mobilization order in 1914 had to be issued in 15 languages, which makes the debates about the status of French in Canada or Spanish in the United States seem pretty trivial. In turn, the Austrian language problem was undoubtedly regarded as pretty trivial by the Soviet Union. Each of the fifteen republics of the USSR had its own official language (Russian, Ukrainian, Estonian, Armenian, Georgian, Azerbaijani, Moldovan, Tajik, …) and there were at least another one hundred indigenous languages spoken somewhere or other in the country, despite strenuous efforts to get everyone to learn Russian, at least as a second language.

Some languages are historical constructs, where people speaking different languages wound up in close contact and established either a blend of the two or a simplified version of the dominant group’s language, usually for commercial purposes. Linguists call these a creole if the language eventually became a mother tongue, or a pidgin if not. A good example of a creole is Swahili, which is a blend of Bantu and Arabic. It’s a mother tongue only in a small portion of East Africa, but much of southern Africa knows it as a second language for communication between different groups. The original Pidgin English or Tok Pisin was a pidgin developed by Pacific Islanders imported into Australia as plantation workers. It’s based primarily on Australian English with native Austronesian additions, but it also has become an official language (and therefore a creole) in Papua New Guinea and the Solomons.

In the United States, Gullah, still spoken on the sea islands of South Carolina and Georgia, is a creole of English and West African. Linguists think there probably were quite a few “plantation creoles” among West African slaves transplanted to America, but Gullah is the only survivor, presumably because of its isolation. Today, not many people know that Joel Chandler Harris, a white journalist who was determined to accurately preserve the Black speech patterns and folklore of his native Georgia, wrote a series of Daddy Jack books using pure Gullah in addition to his more famous Uncle Remus “plantation dialect” children’s stories. It’s obvious from the latter books that his Uncle Remus character — an elderly ex-slave — speaks Gullah as his mother tongue and is semi-translating into English for his young white listeners. A few Gullah words have made it into standard English — the foodstuffs goober, gumbo, and yam, for instance. The “juke” of jukebox is another Gullah word; it meant wicked, and the musical apparatus was so-named for residing in a “juke joint”, a disorderly roadhouse.

Actually, a remnant of another American English/West African creole still exists in a small area of the Dominican Republic where some US slaves were imported in the 1820’s.

Note that an argument could be made that modern English is a creole of Old English and Norman French. The counter-argument is that whereas creoles usually have a simplified grammar that is a blend of the parents, modern English still has a purely Germanic grammar; as mentioned elsewhere, about the only trace of French grammar is in noun phrases where the adjective comes after the noun, “attorney general” for instance. Another Frenchification of English was the use of /-S/ to form plurals, instead of the native English /-EN/. The only remnants of this in standard English are the fossils children, oxen, brethren, and kine (plural of “cow”), but to this day, several British dialects still make use of -en plurals.

Another situation is when an existing language spreads over a much wider area. The canonical example is the use of Greek in the Roman empire. Nobody could be “educated” without knowing Greek, even if their mother tongue was Latin, Keltic, Germanic, Egyptian, Slavic, or whatever. This is called a koine, Greek for “common”. Another example is Latin in western Europe during the Middle Ages, and the Internet might be turning English into a world-wide koine today. Note that a koine has to spread due to convenience, so a language imposed politically (Mandarin in China, Russian in the Soviet Union) doesn’t count.

Existing “big” languages will continue to gradually drift in vocabulary, syntax, and pronunciation, just as King Alfred’s English became Chaucer’s, which became Shakespeare’s, which became ours, but it is a safe prediction that the English of 3000 ce will still be recognizable to Chaucer, let alone to us.

Barbarians at the Gates

It’s easy to lump all the invaders of the Roman Empire as “barbarians”, but there was a major difference between the Goths, Franks, Lombards, and Vandals, who were Germanic-speaking Indo-Europeans, and the Huns, who were Ugric speakers from the Central Asian steppes. The Vandals migrated all the way through the Roman Empire and eventually settled down in southern Spain (bumping out yet more Kelts) and North Africa. (In fact, the tribal name is related to wander.) Southern Spain is still called Andalusia; it was originally “Vandalusia”, but the Moors held that territory for hundreds of years, and Arabic does not have a /V/ sound. The current bad sense of vandalism is because the Vandals, once they got comfortable in their new home, rather spectacularly sacked the city of Rome in 455 ce.

Meanwhile, the Goths split up and settled two different areas of Europe, thus becoming known to history as the Visigoths and Ostrogoths, the West and East Goths, in Portugal/Spain and Italy/Dalmatia, respectively. In 500 ce, between the Visigoths in Castile, the Vandals in Andalusia, and the Franks spilling over the Pyrenees, Spain was pretty well settled with blue-eyed blond Germans rather than the darker Mediterranean stock one associates with Iberia.

One could plausibly blame this collapse of the western Roman Empire on a single man, a brilliant Han Chinese general named Ban Chao (or in the old style, Pan Ch’ao), who between 80 and 100 ce destroyed the power of the Huns (Hsiung-Nu in Chinese records), then occupying territory just outside the Great Wall and making a living either by raiding across it or being paid not to. Ban Chao’s army was ruthless even by the standards of the Hsiung-Nu, and so the intimidated Huns, looking for easier pickings, adopted a “Let’s Get Away from That Bastard” policy and started migrating westward, causing a domino effect all the way across Asia and Europe by pushing other tribes before them. Four hundred years after Ban Chao, Attila and his Huns were in Italy, there were a million people in Chang’an, the capital of China, and there were wolves in the streets of Rome.

Ban Chao drove his army all the way to the east side of the Caspian. Since at that time the Roman Empire extended to the west side, one can imagine a Legionary patrol and some of Ban’s troops, both thousands of miles from home, staring in amazement at each other across the Volga or Ural River.

In reality, Ban Chao’s troops probably had already met Romans face to face. The Chinese captured and occupied the territory where the Parthians had resettled their Roman captives after the slaughter at the battle of Carrhae 140 years before. The Roman general Crassus had decided to march his 35,000 troops (seven legions in full armor) across the Iraqi desert in 53 bce. The Parthians killed 20,000 (including Crassus) and captured 10,000 — only 5,000 got back. The Parthians actually captured Crassus alive, but they were so aggravated they executed him anyway, even though he was the richest man in the world and presumably could have brought a huge ransom. (Some experts list him as the richest man of all time, taking inflation and money supply into account.) Other historians think the Parthians were smarter than that, and that Crassus was assassinated by his own troops for stupidly leading them into the catastrophe.

Crassus held the dubious honor of the most disastrous general in Roman history for sixty years, until the battle of the Teutoburg Forest in 9 ce, where the German tribes ambushed Varus and three legions (over 20,000 soldiers) and exterminated them to the last man. From the Roman point of view those legions just disappeared; the massacre was so thorough that the exact site of the battle wasn’t found until about 1990, about 30 miles from where the German monument to the battle stands. That battle might be why I speak English instead of a Romance language, because the Romans never again made a serious attempt to push their boundary past the Rhine into central Germany. The effect on the Roman psyche was so profound that they never re-used the numbers of those three legions (Legio XVII, XVIII, and XIX), the only time in Rome’s history that defeated legions weren’t promptly reconstituted. (Until that battle, it looked like the Romans had the upper hand; for example, Julius Caesar had thoroughly awed the German tribes by building a timber bridge across the Rhine near Coblenz in only ten days and raiding the German side. When the legions went back across again, they destroyed the bridge behind them. Two years later, they did it again, taking even less time to build a bridge, and proving the point that the Romans could invade whenever they wanted, and that the Rhine was no protection at all. Cf. Trajan’s famous stone bridge across the Danube at the Iron Gates, the longest arch bridge in the world for the next 1,500 years, built in two years. It was torn down when the Romans abandoned Dacia so the barbarians couldn’t use it.)

Ban Chao must have felt out of place slaughtering Huns in the steppes, because he was from a family of famous scholars. He lived his adult life in a tent, but the rest of the Ban family lived in the imperial palace. His father Ban Biao and twin brother Ban Gu have remained the most noted historians of ancient China. They wrote the official history of the previous dynasty (the so-called Former Han), and they were responsible for the tradition that each dynasty would subsidize an unprejudiced history of the old regime while keeping archives for the historians who would some day follow, a practice that lasted for almost two thousand years. (Several emperors were talked out of a bad idea by advisors mentioning that, even though quite brilliant, it wouldn’t look good in the history book someday.) Meanwhile Ban Chao’s younger sister Ban Zhao became the Chinese model for a female intellectual. She was tutor to the Empress and the other aristocratic women of the palace, wrote well-received books, assisted her father and brother in writing the history, and then completed it after their deaths.

Saturation

As I was saying before I was rudely interrupted, the only non-IE languages to be found in modern Europe are Turkish and its distant relatives Finnish, Hungarian, and Estonian (all originally from Central Asia), plus Basque, which seems to be the only remainder of the aboriginal languages of Europe. (Actually, there’s another minor non-IE language more or less in Europe: Maltese is derived from Arabic.)

There are all sorts of words in European languages from aboriginal sources; often place names or terms for plants and animals that the Indo-Europeans hadn’t seen before. English has some Pictish place names, for instance, and a couple of nouns like woad. Both the chamois and the ibex are from some prehistoric Alpine language. I’ve already mentioned wine, olive, etc. from a Mediterranean source, and assorted nautical terms from some unknown Baltic people. I’ve also noted the don- in East European river names. The Romans picked up some vocabulary from Etruscan, including the word Roma itself. The scholars are still debating whether that non-IE language and its relatives are aboriginal or came from Central Asia. In any case, Etruscan died out in the First Century ce. The emperor Claudius spearheaded a project to create a dictionary of Etruscan from interviews with the last surviving speakers, but unfortunately for linguists the dictionary has not survived either.

Someone humorously described Basque as “Neandertal as pronounced by the Spanish”. On the other hand, someone else said that Spanish is “Italian as spoken by Arabs”. (Remember, the Moors were already in Spain when the rest of the peninsula was still mostly Germanic.) There are a whole lot of one-liners like this here. For example, English is variously described as “Norse as spoken by French thugs”, “Bad Dutch with horribly pronounced French and Latin vocabulary”, and more picturesquely, “…what you get from Norman soldiers trying to pick up Saxon girls”. Also, “Germänn ist eßëntiälly Ënglisch mit ein few Tschängen und das käpitäal Lëtteren und Lötten von Dötten,” and French is “what happened when Germans tried to learn Latin and said, ‘screw it’.”

Some of these languages are assigned to a particular Indo-European branch by their structure more than their vocabulary or appearance. English, for example, is Germanic because of its grammar even though the vast majority of its words are now from French or Latin, which at least are also IE languages. Modern Persian (Farsi) is IE, a close cousin to Sanskrit, Hindi, and the other languages of northern India, but its vocabulary is now dominated by Arabic, a Semitic language which is completely unrelated. The exact same situation occurs with Yiddish, which is (IE) German saturated with (Semitic) Hebrew vocabulary. Ladino is similarly a Spanish/Hebrew blend used by the Sephardim. (That site mentioned above describes Ladino as “…Spanish vowels with Portuguese consonants written in Hebrew by people living in the Netherlands and Turkey”.) Going the other way, Maltese (mentioned previously) is grammatically Arabic with a strong mixture of Italian words. Another IE language with a large non-IE vocabulary is Bulgarian/Macedonian, a Slavic language with a major Turkish component. Rumanian is a descendant of Latin, as the name implies, but because of its geographic position most of its vocabulary has been borrowed from Slavic, Greek, Hungarian, and Turkish. One scholar claims that modern Albanian only contains a couple of hundred words that are “truly” Albanian, with everything else coming from its neighbors.

The Gypsies call themselves the Roma, which has no relation to Rumania or the Romans; they started out in northern India. (The name is the plural of rom, a man. A non-Gypsy is called a gorgio.) Their language Romany is grammatically similar to Sanskrit and Hindi, but it has picked up vocabulary from practically every country the Gypsies have been through in their wanderings. (The English word Gypsy itself comes from the mistaken belief that they originated in Egypt, while the Spanish term is either Flamenco (Flemish) or Gitano (Egyptian), and the French call them Bohemians!) In the USA, “gypsy” has become a lower-case adjective with the meaning of “wandering”, as in New York’s gypsy cabs and that curse of the forests, the gypsy moth.

A non-Indo-European example of saturation is Japanese. Structurally it isn’t related to Chinese (it’s actually in the same family as Korean and languages of the Asian steppe), but both its old writing system (Kanji) and much of its original vocabulary are derived from China. (Kan-ji literally means “Chinese character” in Japanese.) In recent times, Japanese has enthusiastically borrowed thousands of IE words, often from English. These can be almost unrecognizable in transliteration until one realizes the rules:

  1. Japanese is constructed with consonant-vowel syllable pairs and can’t have two consonants in a row, so vowels get inserted to break up foreign consonant clusters. To Western eyes, it looks as if “Honda” and “Shinto”, for example, break this rule, but in reality Japanese has two sets of vowels, straight and nasalized, so “in”, “on”, etc. are vowels. To the Japanese, “Nippon” and “Nikon” end with a vowel. Words are allowed to start with a bare vowel though, as in “Asahi” and “Osaka”.
  2. Also because of the consonant-vowel pattern, they either tack a vowel on the end of foreign words ending in a consonant or simply drop the troublesome terminal letter entirely. The most common terminal addition is an unstressed /U/.
  3. They regularly replace Western sounds that don’t exist in Japanese, particularly using /R/ for /L/ and /B/ for /V/.
  4. They like to abbreviate, even when it isn’t necessary. A prime example is anime, even though English “animation” is perfectly pronounceable and spellable in Japanese. (Manga is a real Japanese word, though. It means “extemporaneous drawing”, more or less. It dates back to the 1820’s as a Japanese art style, and a comic book style since the 1920’s.)

All this leads to creations like burusu (blues [music]), toraburu (trouble), resutoran (restaurant), takushii (taxi), konkuriito (concrete), and so on. Anke’eto is French enquête, questionnaire, just to prove the borrowing isn’t always from English. With these hints, it is easy to see that erekutoronikkusu is “electronics” and Rinukkusu is “Linux”, while a worst-case example is rabu, which is English “love”. Some are quite funny: the aisle down which a bride proceeds in a western-style wedding is a bajinrodo, “virgin road”. A Phillips screwdriver is a purasudoraiba while a standard screwdriver is a mainasudoraiba, “plus-driver” and “minus-driver” respectively. American culture has invaded the country to the extent that some Japanese eat lunch at Makudonarudo. There are also lots of blends; a good example is karaoke, from Japanese kara (empty) plus okesutora, orchestra. (“Kara” is in another well-known Japanese word: karate is “empty hand”.) Pokemon is an all-English blend: Pocket Monster.
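For the programmers in the audience, the first three rules can be roughed out in a few lines of Python. This is only a toy sketch under loud assumptions: the function name japanize is made up, the input has to be a single word already spelled roughly the way it sounds (so “love” goes in as “lav” and “plus” as “plas”), and the choice of o rather than u after t and d is inferred from examples like toraburu and -doraiba rather than stated in the rules. It ignores rule 4 (abbreviation), long vowels, and doubled consonants entirely.

```python
VOWELS = set("aeiou")
SUBSTITUTIONS = {"l": "r", "v": "b"}  # rule 3: swap in sounds Japanese has


def japanize(phonemes: str) -> str:
    """Toy transliteration of one roughly phonemic, lowercase word."""
    sounds = [SUBSTITUTIONS.get(ch, ch) for ch in phonemes.lower()]
    out = []
    for i, ch in enumerate(sounds):
        out.append(ch)
        if ch in VOWELS:
            continue
        next_is_vowel = i + 1 < len(sounds) and sounds[i + 1] in VOWELS
        # A consonant already followed by a vowel is fine as it stands.
        # "n" is also left alone, standing in for the nasalized vowels
        # of rule 1 (Honda, Nippon).
        if next_is_vowel or ch == "n":
            continue
        # Rules 1 and 2: break up clusters and pad final consonants.
        # Assumption: "o" after t/d, "u" everywhere else.
        out.append("o" if ch in "td" else "u")
    return "".join(out)


print(japanize("lav"))     # rabu
print(japanize("plas"))    # purasu
print(japanize("draiba"))  # doraiba
```

Real borrowings are messier than this, of course, which is why the sketch will not reproduce every spelling quoted above.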

Given these rules, Godzilla cannot be a Japanese word, and it isn’t. The Japanese name of the monster is Gojira, which is a blend of western gorira (i.e., gorilla) and the native kujira, whale. Allegedly the creators of the original 1954 movie picked the name because it was the nickname of a very large man who worked for the studio. “Godzilla” was invented when the movie was dubbed into English.

Speaking of movies and Japanese titles, there is a US thriller film called Ronin where the protagonists are former special forces and intelligence agents, currently unemployed. That is the Japanese term for a masterless wandering samurai, the Japanese equivalent of the Old West gunfighter or the medieval knight errant. Japanese films about ronin translate perfectly into US westerns — The Seven Samurai and Yojimbo into The Magnificent Seven and A Fistful of Dollars, for example. Clint Eastwood specialized in playing ronin. Note the literal definition of a “free lance”, a knight not sworn to any particular lord.

Another non-Indo-European language with lots of borrowed vocabulary is Turkish. As far back as 1610, someone noted that the Turks, being originally nomads, “borrowed their terms of state and office from the Persians, of religion from the Arabians, as they did of maritime names from the Greeks and Italians.” As I’ve mentioned before, borrowing the indigenous population’s words for unfamiliar items when you move into a new area is pretty universal. See the discussion of “wine”, borrowed by both Indo-European and Semitic from some prehistoric Mediterranean people, and cf. all the aboriginal names of brand-new flora and fauna (kangaroo, koala, and so on) in Australian English. American English picked up everything from a tomahawk to a woodchuck to a kayak from American Indian languages.