A critique of P Forster et al, Evolution of English basic vocabulary ...
P Forster, T Polzin & A Röhl, Evolution of English basic
vocabulary within the network of Germanic languages, in P Forster & C Renfrew (eds),
Phylogenetic methods and the prehistory of languages (McDonald Institute Monographs, 2006).
NB: Peter Forster has already responded to some of my criticisms in the
introduction to the volume (pp. 3-6).
The authors take a Swadesh-type list of 100 concepts for 21 Germanic languages
and classify the words for each concept into 4 cognate categories.
They then compute a network, claimed to be a minimum-distance fit, using a
heuristic borrowed from DNA analysis. From this network they claim to draw
conclusions such as that Modern English is isolated and that Common Germanic
existed at most 5600 years ago.
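As a rough illustration of this kind of setup, the sketch below codes each language as one cognate-class label per concept and counts the concepts on which two languages differ (a Hamming distance, every concept weighted equally). The language codes are invented for illustration and are not the paper's data. The minimum spanning tree (Prim's algorithm) is only a simple stand-in for a "minimum-distance" structure; the authors' DNA-derived network method is not reproduced here.

```python
from itertools import combinations

# Invented cognate-class codes (A, B, C, ...), one per concept. NOT the paper's data.
CONCEPTS = ["small", "moon", "eat", "mountain", "neck"]
DATA = {
    "English":   ["B", "A", "A", "C", "A"],
    "German":    ["A", "A", "A", "A", "A"],
    "Dutch":     ["A", "A", "A", "A", "A"],
    "Icelandic": ["A", "B", "B", "A", "A"],
}


def hamming(u, v):
    """Count concepts whose cognate classes differ; every concept counts as 1."""
    return sum(a != b for a, b in zip(u, v))


def distance_matrix(data):
    """Pairwise distances keyed by alphabetically ordered language pairs."""
    return {(x, y): hamming(data[x], data[y]) for x, y in combinations(sorted(data), 2)}


def minimum_spanning_tree(data):
    """Prim's algorithm: connect all languages with the least total distance."""
    names = sorted(data)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge from the tree to a language not yet in it (ties broken by name).
        d, u, v = min(
            (hamming(data[a], data[b]), a, b)
            for a in in_tree
            for b in names
            if b not in in_tree
        )
        edges.append((u, v, d))
        in_tree.add(v)
    return edges


if __name__ == "__main__":
    for pair, d in distance_matrix(DATA).items():
        print(pair, d)
    print(minimum_spanning_tree(DATA))
```

Note that every objection below about equal weights, independence and word choice applies already at the level of this distance, before any network is drawn.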
I find this paper no more satisfactory than the earlier PNAS paper by Forster
& Toth (2003); in other words, very unsatisfactory. It is easy to find many
linguistic objections. As for the mathematics, I have not yet formed an
opinion. The algorithms used are said to be described in the references given
with the free software, which is available online.
The data are set out in a potentially exciting centrefold, which disappoints
on closer inspection. For example:
- What is a language? Why is there only one English, but several varieties
of Frisian? Mutual intelligibility is mentioned, but that is not at all a clear-cut criterion.
- Everything is deduced from vocabulary, and in fact from 56 words, nearly
all nouns and verbs. No use is made of grammatical structure, inflectional
suffixes, word order, etc.
- There is semantic fuzziness everywhere. When does a neck stop being a
neck and become a throat, a nape (Nacken), or a column (hals)? Is a mountain (munt) a hill
(berg)? Is walking so different from going (gehen), running (rinne), or loping or leaping
(lopen)? In fact Danish (at least) has the word "walk", but it means "to full
(cloth)" - I suppose this was the original meaning in English too.
- The concept "small" is translated by cognates of "little" in all languages
except English. Why? This leaves English apparently isolated in this
concept, for no good reason. The same criticism applies to "mountain", where
the inclusion of a Romance word obscures the picture unnecessarily.
- Icelandic has apparently innovated by replacing the ON words for moon (with
tungl, a common Germanic word lost in the other modern languages) and
for eat (with borða). But in fact Old Icelandic already had these
"new" words as well as the more familiar ones. Borða simply means
"to take one's place at the table", so it is just a polite way of saying "eat".
Why should we attach any fundamental significance to such a minor shift in usage?
- Vowel length is ignored, despite it being known to be highly significant
in language evolution. There is no indication that the authors know that
Gothic fon 'fire' has long o. In fact this word is an oddity. Why was
the more common and regular funa not used?
- As in the previous paper, identification of cognates is done by visual
inspection, with recourse to reference books in a few cases of doubt. But
these doubtful cases are likely to be precisely those on which the experts disagree,
so it might have been better to exclude such words from the analysis.
- Distances are assumed equal: every change of cognate class counts as one step,
so it is as hard to go from A to B as from B to C. But this cannot be right linguistically. It is hard to
replace the word for "five", but easy to replace e.g. the word for "woman",
where issues of social relationships come into play.
- In DNA analysis, mutations at separated sites are assumed statistically
independent. This is not correct for most types of language evolution, where
changes are influenced by context.
- We have no guarantee that the network found really minimises the total
distances. Anyway, why is this what we want to minimise?
- The authors admit to arbitrarily omitting four words simply because they
cause "chaotic reticulation" in the network. But the truth may be chaotic.
- It is not at all clear that a computer program is needed here. I think
one could build an equally "good" network by hand.
- Where is the robustness check? How sensitive are the conclusions to the choice of words?
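The objection about equal distances can be made concrete. The sketch below compares the equal-weight count with a weighted version in which each concept carries a stability weight, so that replacing "five" costs more than replacing "woman". The concepts, weights, language names P, Q, R and their cognate codes are all hypothetical, chosen only to show that the choice of weights can change which language counts as nearest.

```python
# Hypothetical concepts, stability weights and cognate-class codes, for illustration only.
CONCEPTS = ["five", "woman", "dog"]
WEIGHTS = {"five": 3.0, "woman": 0.5, "dog": 0.5}  # larger = harder to replace
LANGS = {
    "P": ["A", "A", "A"],
    "Q": ["B", "A", "A"],  # differs from P only in "five"
    "R": ["A", "B", "B"],  # differs from P in "woman" and "dog"
}


def distance(u, v, weights=None):
    """Sum over differing concepts: 1 each, or the concept's weight if weights are given."""
    total = 0.0
    for concept, a, b in zip(CONCEPTS, u, v):
        if a != b:
            total += 1.0 if weights is None else weights[concept]
    return total


def nearest_to(name, weights=None):
    """Language closest to `name` under the chosen distance."""
    others = [n for n in LANGS if n != name]
    return min(others, key=lambda n: distance(LANGS[name], LANGS[n], weights))


if __name__ == "__main__":
    print(nearest_to("P"))           # equal weights
    print(nearest_to("P", WEIGHTS))  # stability-weighted
```

With equal weights P is closest to Q (distance 1 against 2); with these invented weights it is closest to R (1 against 3). Nothing in the data itself tells us which weighting is right.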
Objections to the conclusion
The authors place Common Germanic (without questioning whether such a language
ever existed) between 3600 BC and 350 AD. Common sense would give a much
narrower interval, so what has this analysis achieved? In fact, looking at
the reasons for this conclusion, we see that it is those two Icelandic words,
arbitrarily chosen, that have forced it. Icelandic has
apparently replaced two words out of 56 in 1000 years. This gives our
mutation rate, the yardstick now to be applied to Gothic, which has replaced
eight words and so must be four times older. Never mind that Icelandic is by far
the most conservative of all the languages studied. This is just the crudest
form of glottochronology.
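The arithmetic of this argument fits in a few lines. The sketch assumes a constant replacement clock calibrated on Icelandic (2 replacements in about 1000 years) and applies it to Gothic's 8 replacements. The function name is hypothetical, and counting back from Gothic's attestation around 350 AD is an assumption here about how the lower bound is reached. The loop also shows what happens if Icelandic had replaced one word more or one fewer.

```python
def implied_age(replaced, ref_replaced, ref_years):
    """Years implied by a constant replacement clock calibrated on a reference language.

    Assumes replacements accumulate at a fixed rate, which is the assumption criticised.
    """
    return replaced * ref_years / ref_replaced


GOTHIC_REPLACED = 8
ICELANDIC_YEARS = 1000

if __name__ == "__main__":
    for icelandic_replaced in (1, 2, 3):
        age = implied_age(GOTHIC_REPLACED, icelandic_replaced, ICELANDIC_YEARS)
        # Counting back from Gothic's attestation around 350 AD (assumed anchor).
        print(f"Icelandic replaced {icelandic_replaced}: "
              f"~{age:.0f} years before 350 AD, i.e. about {age - 350:.0f} BC")
```

With 2 Icelandic replacements this gives 4000 years, about 3650 BC, close to the quoted 3600 BC. With 1 or 3 it gives 8000 or about 2667 years: one word either way moves the date by millennia.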
Basic statistical sampling theory would show that
no useful conclusion at all can be drawn in this way. A small change in the
choice of words would have led to a very different conclusion.
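A standard way to quantify this is an exact (Clopper-Pearson) binomial confidence interval. The sketch assumes that each of the 56 words is replaced independently with the same probability (itself doubtful, as noted above), computes a 95% interval for Icelandic's replacement proportion from its 2 replacements, and propagates it to the implied Gothic age while treating Gothic's 8 replacements as fixed. It uses only the Python standard library; the function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_decreasing(f, lo=0.0, hi=1.0, iters=60):
    """Root of a decreasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a binomial proportion k/n."""
    # Lower bound solves P(X >= k) = alpha/2; upper bound solves P(X <= k) = alpha/2.
    lower = 0.0 if k == 0 else bisect_decreasing(
        lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p)))
    upper = 1.0 if k == n else bisect_decreasing(
        lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper


N_WORDS, ICELANDIC, GOTHIC, YEARS = 56, 2, 8, 1000

ice_lo, ice_hi = clopper_pearson(ICELANDIC, N_WORDS)
# Implied Gothic age = YEARS * GOTHIC / (expected Icelandic replacements in YEARS).
age_lo = YEARS * GOTHIC / (ice_hi * N_WORDS)
age_hi = YEARS * GOTHIC / (ice_lo * N_WORDS)

if __name__ == "__main__":
    print(f"Icelandic proportion {ICELANDIC / N_WORDS:.3f}, 95% CI {ice_lo:.4f} to {ice_hi:.4f}")
    print(f"Implied Gothic age: {age_lo:.0f} to {age_hi:.0f} years")
```

The implied Gothic age runs from a little over a thousand years to more than thirty thousand, even before Gothic's own sampling error is taken into account.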
See Kessler for a discussion of statistical significance issues in word-list
comparison. See also Trask.
In my own work I have made mathematical models of the phonetic evolution of
the number words one, two, three, ..., ten. Although this work is unpublished, I
can report plausible trees for IE languages from just two or three of these
words. From this I conclude that 56 words are not needed to reproduce the
known relationships. But I emphasize that this work was about phonetics, not
cognation. I suggest that such a phonological approach would be a more fruitful direction for future work than vocabulary studies.
P Forster & A Toth, Toward a phylogenetic chronology of ancient Gaulish, Celtic, and Indo-European. Proc Natl Acad Sci USA 100:9079-9084 (2003).
B Kessler, The significance of word lists. CSLI Publications (2001).
S Oppenheimer, Myths of British ancestry (October 2006).
L Trask, Linguist List message 14-1876: critique of Forster & Toth (2003).