History Files




Central Asia

Indo-European Daughter Languages: Tocharian

by Edward Dawson & Peter Kessler, 15 September 2017. Updated 15 March 2020

The Tocharians are perhaps the most mysterious of all of the Indo-European branches. Thankfully, recent DNA evidence has provided a vital ingredient when it comes to telling their story but, despite this, it remains a somewhat complicated one.

The core Indo-Europeans (IEs) began to separate into definite proto-languages around 3000 BC, during an expansion phase which is known as the Yamnaya horizon. These proto-languages soon became unintelligible to each other, although this fragmenting process excludes the Anatolian branch of IEs, who had already headed southwards from the Pontic-Caspian steppe (see the feature, A History of Indo-Europeans, Migrations and Language, for more detail).

The western or centum language section of Indo-Europeans (IEs) would evolve into Celtic, Italic, Venetic, Illyrian, Ligurian, Vindelician/Liburnian and Raetic branches. This group appears to be associated with a specific Y-DNA haplogroup called R1b. A related Y-DNA haplogroup - R1a - is associated with eastern or satem IE languages. It's the Indo-Iranian/Indo-Aryan, Baltic, and Slavic groups which fall into this latter grouping.

Map of Indo-Europeans c.3000 BC
Map 3 from the earlier feature on Indo-European (IE) language and migration shows IE migration out of the Pontic-Caspian steppe by around 3000 BC, with the centum-speaking Tocharians apparently being edged ever eastwards by satem speakers who were also expanding into the east



Two groups, however, do not fit perfectly into that tidy pair of east and west IE boxes. One of these involves the Germanic language speakers, who appear to have been founded by R1a/satem people but with a very mixed subsequent heritage. The other anomaly, one which appears early in the Yamnaya horizon, involves a western group which apparently decided to be different from all the others and head eastwards. It is this group which evolved into the Tocharian branch of Indo-Europeans.


This eastwards migration by the Tocharians has in the past been referred to as their u-turn migration.

A favourite current theory is that the satem (eastern) languages evolved in the core Indo-Europeans on the Caspian steppe after the departure both of the West IEs and the Tocharians. Both of these latter divisions would have been left with an older, centum version of the language which did not receive the same later influences which the satem version received.

If this is correct then a u-turn theory, in which the Tocharians initially headed west and then changed direction to head east, would be a very realistic one, because the Anatolians were the first to detach themselves from the Indo-European core, and they also spoke a centum language which did not show those later influences. In fact, they left early enough to miss even some of the later centum influences.

Another theory, based on the DNA evidence, points toward IEs around 3000 BC being divided into two main groups. These would have been steppe dwellers who were speaking centum dialects and who bore the R1b Y-chromosome and, to the north of them in the forests and forest-steppe, satem dialect speakers who bore the R1a Y-chromosome. The problem for their centum neighbours is that, in this theory, the satem group moved south once they had the benefit of horse riding, and they proceeded to occupy swathes of the former group's territory. It seems very unlikely that this process occurred peacefully!

This, though, is the perfect way of explaining how the centum-speaking Tocharians were separated from the main mass of centum speakers and were forced to head east instead of west.

West IEs in the east?

Where the Tocharians specifically are concerned, it was the increasing realisation that they appeared to have a very odd history which confirmed their West Indo-European origins, despite their being the most easterly of all IEs. As theorised above, it has become likely that they were amongst those centum speakers who were forced out by satem speakers appropriating their territory.

However, where the Tocharians are concerned, it's never quite that simple.

Their language showed both eastern satem/R1a and western centum/R1b influences. Working out how this may have happened is the tricky part of any examination of the Tocharians, but an intriguing possibility is that they ended up being a hybrid people who were made up of elements of multiple Indo-European groups, scooping up more followers as they passed through West IE, South IE, and East IE groups.

Tocharian is, at its core, a centum language - just like Indo-European languages in the west - despite its Far Eastern setting. The most likely explanation for the hybridisation process is that a specific group took over other groups, and they all adopted the most dominant language variant whilst also picking up influences from the later arrivals. The key to understanding who conquered whom lies in the male lineage and therefore in the Y-DNA.

A vital tool in helping to solve the Tocharian mystery was the discovery of the 'Tarim mummies', a series of mummified bodies discovered in the Tarim Basin which includes the Takla Makan Desert (Taklamakan) in its territory. A DNA analysis of twelve of the earliest mummies has shown that eleven of them were Caucasoid men who possessed Y-DNA belonging to the R1a group, making them eastern, satem speakers. For this region and time such a finding would be very normal.

From this fact it can be postulated that a group of nomadic satem/R1a types, most likely a group of IEs who were closely related to the later Indo-Iranians, conquered other groups as they progressed eastwards. They may have overcome many small groups, including a more sizable population of centum/R1b types (the original Tocharians), as they also headed east. Therefore the original centum-speaking Tocharians would seem to have fallen under the control of a more dominant group of satem speakers - easy enough with the Tocharians passing through the eastern steppe which may already have been full of satem speakers. The alternative is that the later expansion of satem speakers effectively followed and overtook the Tocharians.

The predominance of R1a (eleven out of twelve mummies) in the limited sample points to R1a satem males being responsible for mating with centum-speaking women. That finding makes it likely that the women were either brides from centum groups (perhaps now dominated by satem speakers), or that they had been captured in raids or warfare.

Tocharians in relation to archaeological cultures

Most studies of IE sequencing put the separation of Tocharian after that of Anatolian and before any other branch. The rather notable migration from around 3500 BC which created the Afanasevo culture meets that expectation, with a section of the Volga-Ural steppe population making its way eastwards across Kazakhstan, covering a distance of more than two thousand kilometres to reach the Altai Mountains.

This, then, was the Tocharian migration in its original form. Whether its people were satem-speaking men who had already collected a population of centum-speaking wives, either as prizes or through trade and intermarriage, or centum speakers who were later dominated by satem speakers, is unclear (although the latter is favoured here). What would have happened, though, is that these wives would have raised any children they had, teaching them their own language alongside whatever basic satem influences they may have needed. These early Tocharians were already centum-speaking hybrids.

Although that is a theory, it's the most likely theory. What is certain is that, alongside the hybridisation process, Tocharians also borrowed heavily from other languages, probably during their subsequent migrations. We find Sanskrit words which they adopted due to their later adherence to Buddhism, such words coming from Indo-Aryans who were themselves an offshoot of the Indo-Iranians - both satem speakers. Could Tocharian be heavily hybridised in the manner of modern English, with its large French vocabulary and its Latin vocabulary adopted through religion? It certainly seems possible.

Khakassia standing stone
Burial mounds in the modern Russian region of Khakassia can be marked with small standing stones as shown here, with this area being a core part of the territory of the Afanasevo culture

The United Sites of Indo-Europeans website rounds off much of the discussion with the following (with additions in italics for text which was not written by a native English-speaker):

This group is perhaps the least studied in all of the Indo-European macro-family. It consists of two dead languages, Tocharian A (or Agnean) and Tocharian B (or Kuchanian). These were spoken in the first millennium AD in East Turkestan, in several oases in which inscriptions and texts written in these languages were found.

The routes and methods used in Tocharic migrations from the Near East to East Asia are still unknown. The languages show many borrowings from early Iranian languages, archaic Finno-Ugric (of the Uralic family), and even Tibetan-like forms, but the structure itself shows much similarity with Germanic languages primarily, and also with Balto-Slavic languages.

Linguists think that Tocharians moved through Central Asia from west to east and, on their way, had a large number of linguistic contacts which were reflected in their tongue. Before these migrations, it being a dialect in the proto-Indo-European community, Tocharians must have communicated closely with future Anatolians and Italo-Celts.

Tocharian hybridisation

In truth, the Y-DNA results from the Tarim mummies were quite a surprise. Whilst the general expectation was that they would be R1b types (centum speakers), they were anything but that, being R1a satem types. As discussed above, this means that the Tocharian males were descended from the satem-speaking forest and forest-steppe IEs, not the steppe-dwelling, centum-speaking IEs as was generally expected.

The sense of surprise at the result was despite the fact that Central Asia was dominated by satem-speaking Indo-Iranians, while the only centum speakers were the Tocharians themselves. Primarily the expectation existed because Indo-Iranians don't seem to have reached as far east as the Tarim Basin (not entirely accurate when one looks at the Greater Yuezhi horsemen of the first two centuries BC). Simply put, no one expected the Tocharians themselves to have satem-speaking influences.

However, in language terms, there doesn't appear to be any evidence in Tocharian A of those words which are used in Asha (Arte/Rte). This is possibly because the Tocharians separated from other Indo-Europeans prior to the formulation of Asha or, alternatively, because they never had it or were a military elite which did not include priests.

Asha is the modern term for the philosophical practice of adherence to the truth of what is, what exists. The word 'Asha' comes from Zoroastrianism. Its ancient names were Rte among Indians (Indo-Aryan Hindus), and Arte among Iranians. There are also linguistic pointers toward the philosophy existing amongst early Germans under the name of Istwae. All of these names are the verb 'to be', used as nouns.

In addition, the language of Tocharian A seems to have more in common with Celto-Italic languages than it does with the Avestan/Vedic of Indo-Iranian and Indo-Aryan satem languages. Many familiar words are contractions, with sounds having been dropped - a common enough Celto-Italic practice. These contracted words can come about as a result of a population using a hybrid language, or they can result from sheer laziness. The latter, if true, would be another pointer towards a lack of the Asha philosophy, as Asha is extremely precisionist in character. [1]

With that examination of Tocharian A in mind, the theory which sees a satem military elite taking over another, centum-speaking tribe (or at least its women) seems to be the only rational explanation for the creation of the hybrid Tocharians of recorded history. And the take-over happened early enough that Asha did not yet exist. That date of approximately 3000 BC - or perhaps a bit later - still looks reasonable for the separation of Tocharians from other Indo-Europeans, with their dominance by satem-speaking Indo-Iranian East IEs following on relatively soon after that.

Tocharian tongues survived for a good three or four thousand years. By AD 500 they could still be found in Xinjiang (early home of the Göktürks of this same period), and in the caravan cities of the Silk Road. By this time they had divided into two or three quite distinctive languages, all of which exhibited archaic Indo-European traits. Despite their long journey to the Altai Mountains, along the Chinese border, and then towards Central Asia, they were able to maintain a strong identity... and a strong language.

Tocharian Tarim Basin mummy
The oldest of the Caucasoid (Indo-European) Tarim Basin mummies to be preserved by the increasing desert conditions there date from about 2000 BC, having been found on the eastern edge of the basin, an area of now-lost river systems

[1] Here's a perfect example of why Tocharian is so odd: 'wäl, walo', meaning a prince (IE *wal-, meaning 'strong, powerful'); 'wäl', meaning 'to die'. The words 'wal, walo', meaning 'strong', can be extended to mean a prince or king, and this is the Celtic form. The Germanic word for Celts probably derives from it. The word 'wal', meaning 'to die', is the Germanic usage, cognate with English and German 'fall', and the Norse 'valr', seen in 'valkyrie'. All of this shows that Tocharian simply must be a hybrid language.

Addendum: a common naming source

Religions and philosophies are well known to jump cultural and linguistic barriers, spreading far from their points of origin. This may have happened with the Rte philosophy of the Indo-Iranian nomads, the main body of the eastern Indo-Europeans.

Although the Tocharians have been labelled in various ways by others, such as Lesser Yuezhi, their name for themselves has been recorded in several different forms, one of which is the very suggestive 'Arsi'. This appears to be yet another example of the verb 'to be', used as a noun. This contention is supported by the existence of IEs in Anatolia who also used a form of the same name, from which we get the name used for the entire continent: Asia (see related links).


Main Sources

Yardley, John & Heckel, Waldemar - Epitome of the Philippic History of Pompeius Trogus: Books 11-12, Volume 1, Marcus Junianus Justinus

Anthony, David W - The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World

Pokorny, J - Indo-European Etymological Dictionary, online database which updates Pokorny's Indogermanisches Etymologisches Wörterbuch

Online Sources

Ancient History Encyclopaedia

Geochronology - Indo-European Chronology - Countries and Peoples

Indo-European Chronology - Countries and Peoples

Indo-European Etymological Dictionary (J Pokorny)

Linguistics Research Center, University of Texas at Austin

Peering at the Tocharians through Language

Studies in the History and Language of the Sarmatians

United Sites of Indo-Europeans



Maps and text copyright © Edward Dawson & P L Kessler. An original feature for the History Files.