Keith Briggs

This page was last modified 2013-05-12  




A critique of P Forster et al, Evolution of English basic vocabulary ...

P Forster, T Polzin & A Röhl Evolution of English basic vocabulary within the network of Germanic languages, in P Forster & C Renfrew Phylogenetic methods and the prehistory of languages (McDonald Institute 2006).

NB: Peter Forster has presponded to some of my criticisms here in the introduction to the volume (pp. 3-6).

my summary

The authors take a Swadesh-type list of 100 concepts for 21 Germanic languages and classify the words for each concept into 4 categories of cognates. They then compute a network, claimed to be a minimum-distance fit, using a DNA-type heuristic. From this network they draw conclusions such as that Modern English is isolated and that Common Germanic existed at most 5600 years ago.

my opinion

I find this paper no more satisfactory than the earlier PNAS paper of Forster & Toth [1] (in other words, very unsatisfactory). It is easy to find many linguistic objections. As for the mathematics, I have not yet formed an opinion. The algorithms used are claimed to be described in references given in the free software available at [5].

linguistic objections

The data is presented in a potentially exciting centrefold, which disappoints upon closer inspection. For example:

  1. What is a language? Why is there only one English, but several varieties of Frisian? Mutual intelligibility is mentioned, but that is not at all a clear-cut criterion.
  2. Everything is deduced from vocabulary, and in fact from 56 words, nearly all nouns and verbs. No use is made of grammatical structure, inflectional suffixes, word order, etc.
  3. There is semantic fuzziness everywhere. When does a neck stop being a neck and become a throat, a nape (Nacken), or a column (hals)? Is a mountain (munt) a hill (berg)? Is walking so different from going (gehen), running (rinne), or loping or leaping (lopen)? In fact Danish (at least) has the word "walk", but it means "to full (cloth)" - I suppose this was the original meaning in English too.
  4. The concept "small" is translated by cognates of "little" in all languages except English. Why? This leaves English apparently isolated in this concept, for no good reason. The same criticism applies to "mountain", where the inclusion of a Romance word obscures the picture unnecessarily.
  5. Icelandic has apparently innovated in replacing the ON words for moon (by tungl - a common Germanic word lost in the other modern languages) and for eat (by borða). But in fact Old Icelandic already had these "new" words as well as the more familiar ones. Borða just means "to take one's place at the table", so is just a polite way of saying "eat". Why should we attach any fundamental significance to this minor change of vocabulary?
  6. Vowel length is ignored, although it is known to be highly significant in language evolution. There is no indication that the authors know that Gothic fon 'fire' has long o. In fact this word is an oddity. Why was the more common and regular funa not used?

procedural objections

  1. As in the previous paper, identification of cognates is done by visual inspection, with recourse to reference books in a few cases of doubt. But in fact these cases of doubt will be cases where the experts are not in agreement, so it might have been better to exclude these words from the analysis.
  2. Distances are assumed equal. That is, it is as hard to go from A to B as B to C, etc. But this cannot be right linguistically. It is hard to replace the word for "five", but easy to replace e.g. the word for "woman", where issues of social relationships come into play.
  3. In DNA analysis, mutations at separated sites are assumed statistically independent. This is not correct for most types of language evolution, where changes are influenced by context.
  4. We have no guarantee that the network found really minimises the total distances. Anyway, why is this what we want to minimise?
  5. The authors admit to arbitrarily omitting four words simply because they cause "chaotic reticulation" in the network. But the truth may be chaotic.
  6. It is not at all clear that a computer program is needed here. I think one could build an equally "good" network by hand.
  7. Where is the robustness check? How sensitive are the conclusions to the data?
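
The robustness question in point 7, and the equal-distance assumption in point 2, can both be probed with very little code. The following is only a minimal sketch on invented data: the three "languages" T, A and B and their cognate-class codes are hypothetical, the function names are illustrative, and a simple nearest-neighbour comparison stands in for the authors' network construction, which it does not reproduce.

```python
def distance(a, b, weights):
    """Weighted count of concepts whose cognate classes differ."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def nearest(data, target, keep=None, weights=None):
    """Names of all languages at minimum distance from target (ties are kept)."""
    n = len(data[target])
    keep = list(range(n)) if keep is None else list(keep)
    w = [1.0] * n if weights is None else list(weights)

    def d(name):
        return distance([data[target][i] for i in keep],
                        [data[name][i] for i in keep],
                        [w[i] for i in keep])

    others = [name for name in data if name != target]
    best = min(d(name) for name in others)
    return sorted(name for name in others if d(name) == best)


def leave_one_out(data, target):
    """Nearest neighbours of target after deleting each concept in turn."""
    n = len(data[target])
    return [nearest(data, target, keep=[i for i in range(n) if i != drop])
            for drop in range(n)]


# Hypothetical toy data: one cognate-class code per concept (6 concepts).
toy = {
    "T": [1, 1, 1, 1, 1, 1],
    "A": [2, 2, 1, 1, 1, 1],
    "B": [1, 1, 2, 2, 2, 1],
}

equal = nearest(toy, "T")                                # all concepts weigh the same
weighted = nearest(toy, "T", weights=[2, 2, 1, 1, 1, 1])  # first two concepts count double
jackknife = leave_one_out(toy, "T")
print(equal, weighted, jackknife)
```

On this toy list, deleting any one of three concepts is enough to make the nearest neighbour of T ambiguous, and giving double weight to two hard-to-replace concepts changes it outright. A real check would rerun the actual network program on resampled word lists.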

objections to the conclusion

The authors place Common Germanic (without questioning whether such ever existed) between 3600 BC and 350 AD. Common sense would give a very much smaller interval, so what has this analysis achieved? In fact, looking at the reasons for this conclusion, we see that it is those two Icelandic words, arbitrarily chosen, that have forced it. Icelandic has apparently replaced two words out of 56 in 1000 years. This gives our mutation rate, the yardstick now to be applied to Gothic, which has replaced eight words and so must be four times older. Never mind that Icelandic is by far the most conservative of all the languages studied. This is just the crudest possible glottochronology.
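
The calibration just described is simple proportionality, and can be written out as follows. This is a sketch of the arithmetic as summarised in this critique, assuming a constant replacement rate; it is not the authors' actual computation, and the function names are illustrative.

```python
def replacement_rate(replaced, list_size, years):
    """Fraction of the word list replaced per year, assuming a constant rate."""
    return replaced / list_size / years


def implied_age(replaced, list_size, rate):
    """Years needed to replace this many words at the given constant rate."""
    return replaced / list_size / rate


# Figures quoted in the text: Icelandic, 2 of 56 words in 1000 years.
icelandic_rate = replacement_rate(2, 56, 1000)

# Gothic, 8 of 56 words: four times the Icelandic count, hence four times the age.
gothic_age = implied_age(8, 56, icelandic_rate)
print(round(gothic_age))  # about 4000 years
```

The whole estimate rests on a single count of two: had Icelandic replaced one word or three instead, the same arithmetic would give Gothic an age of about 8000 or about 2700 years.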

Basic statistical sampling theory would show that no useful conclusion at all can be drawn in this way. A small change in the choice of words would have led to a very different conclusion. See Kessler [2] for a discussion of statistical significance issues in word list analyses.
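
One way to make the sampling argument concrete is to treat the Icelandic count as a binomial observation (2 replacements in 56 trials) and compute an exact (Clopper-Pearson) 95% confidence interval for the underlying replacement probability. This is a sketch under a strong assumption, namely that the 56 words are independent and equally likely to be replaced, which the procedural objections above already call into question. The function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def solve_decreasing(f, target):
    """Bisection on [0, 1] for f(p) = target, where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion."""
    lower = 0.0 if k == 0 else solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Icelandic: 2 of 56 words replaced in 1000 years (figures from the text).
p_low, p_high = clopper_pearson(2, 56)

# Implied age of Gothic (8 of 56 replaced) at each end of the rate interval.
age_min = (8 / 56) * 1000 / p_high
age_max = (8 / 56) * 1000 / p_low
print(p_low, p_high, age_min, age_max)
```

The upper limit on the rate comes out more than ten times the lower limit, so the implied age of Gothic is uncertain by the same factor, even before the uncertainty in its own count of eight is considered.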

See also Trask [4].

In my own work I have made mathematical models of the phonetic evolution of the number words one, two, three, ..., ten. Although this is unpublished, I can claim to get plausible trees for IE languages from just two or three of these words alone. From this I know that 56 words are not needed to reproduce the known relationships. But I emphasize that this work was about phonetics, not cognation. I suggest that such a phonological approach would be a more fruitful direction for future work than vocabulary studies.

[1] P Forster & A Toth, Toward a phylogenetic chronology of ancient Gaulish, Celtic, and Indo-European. Proc Natl Acad Sci USA 100:9079-9084 (2003).

[2] B Kessler The significance of word lists, CSLI publications 2001.

[3] S Oppenheimer Myths of British ancestry, October 2006.

[4] L Trask Linguist List Message 14-1876 - critique of [1].

