Tuesday, June 9, 2009

Theories about the language

Many theories have been advanced as to the nature of the Voynich manuscript "language". Here is a partial list:

Letter-based cipher

According to this theory, the Voynich manuscript contains a meaningful text in some European language, that was intentionally rendered obscure by mapping it to the Voynich manuscript "alphabet" through a cipher of some sort—an algorithm that operated on individual letters.

This has been the working hypothesis for most decipherment attempts in the 20th century, including that of an informal team of NSA cryptographers led by William F. Friedman in the early 1950s. Simple substitution ciphers can be excluded, because they are very easy to crack; so decipherment efforts have generally focused on polyalphabetic ciphers, invented by Alberti in the 1460s. This class includes the popular Vigenère cipher, which could have been strengthened by the use of nulls and/or equivalent symbols, letter rearrangement, false word breaks, etc. Some people assumed that vowels had been deleted before encryption. There have been several claims of decipherment along these lines, but none has been widely accepted, chiefly because the proposed decipherment algorithms depended on so many guesses by the user that they could extract a meaningful text from any random string of symbols.
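To make the idea of a polyalphabetic cipher concrete, here is a minimal sketch of the textbook Vigenère cipher over the 26-letter Latin alphabet. It is only an illustration of the cipher class discussed above; nothing here is claimed to be the method used in the manuscript, and the function name and sample key are assumptions chosen for the example.

```python
from itertools import cycle
import string

ALPHABET = string.ascii_uppercase


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Shift each letter by the matching key letter; other characters pass through."""
    sign = -1 if decrypt else 1
    # The key is repeated for as long as needed and advances only on letters.
    key_shifts = cycle([ALPHABET.index(k) for k in key.upper()])
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            shift = sign * next(key_shifts)
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)


ciphertext = vigenere("ATTACKATDAWN", "LEMON")
recovered = vigenere(ciphertext, "LEMON", decrypt=True)
```

Because the shift changes from letter to letter, the same plaintext letter can map to several different ciphertext letters, which is what defeats simple frequency counting.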

The main argument for this theory is that the use of a weird alphabet by a European author can hardly be explained except as an attempt to hide information. Indeed, Roger Bacon knew about ciphers, and the estimated date for the manuscript roughly coincides with the birth of cryptography as a systematic discipline. Against this theory is the observation that a polyalphabetic cipher would normally destroy the "natural" statistical features that are seen in the Voynich manuscript, such as Zipf's law. Also, although polyalphabetic ciphers were invented about 1467, variants only became popular in the 16th century, somewhat too late for the estimated date of the Voynich manuscript.
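The statistical objection can be illustrated on a toy scale. The sketch below encrypts a short English sentence with a Vigenère-style shift and compares word counts before and after. The sentence and key are invented for the example, and a sample this small cannot demonstrate Zipf's law itself; it only shows how a repeated plaintext word stops repeating once the key position varies.

```python
from collections import Counter
from itertools import cycle


def poly_encrypt(text: str, key: str) -> str:
    """Vigenère-style shift; the key advances only on letters, spaces are kept."""
    shifts = cycle([ord(k) - ord("A") for k in key.upper()])
    out = []
    for ch in text.upper():
        if "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + next(shifts)) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


plain = "the cat and the dog and the bird"
cipher = poly_encrypt(plain, "LEMON")

# Word frequencies before and after encryption.
plain_counts = Counter(plain.upper().split())
cipher_counts = Counter(cipher.split())
```

In the plaintext, "THE" occurs three times and "AND" twice; in the ciphertext each occurrence falls under a different part of the key, so every word form appears only once.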

Codebook cipher

According to this theory, the Voynich manuscript "words" would actually be codes to be looked up in a dictionary or codebook. The main evidence for this theory is that the internal structure and length distribution of those words are similar to those of Roman numerals, which, at the time, would be a natural choice for the codes. However, book-based ciphers are viable only for short messages, because they are very cumbersome to write and to read.
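A codebook scheme of this kind can be sketched in a few lines. The example below numbers a small vocabulary and uses the Roman numeral of each number as the code word. The vocabulary, the alphabetical numbering, and the use of modern subtractive notation (IV, IX) are all assumptions made for illustration; the theory itself does not specify any of them.

```python
def to_roman(n: int) -> str:
    """Convert an integer in 1..3999 to a Roman numeral (modern subtractive form)."""
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    out = []
    for value, symbol in numerals:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def build_codebook(vocabulary):
    """Hypothetical scheme: number the sorted vocabulary, code = Roman numeral."""
    return {word: to_roman(i) for i, word in enumerate(sorted(vocabulary), start=1)}


def encode(message: str, codebook: dict) -> str:
    return " ".join(codebook[word] for word in message.split())


def decode(ciphertext: str, codebook: dict) -> str:
    reverse = {code: word for word, code in codebook.items()}
    return " ".join(reverse[code] for code in ciphertext.split())


codebook = build_codebook(["herb", "root", "water", "boil"])
coded = encode("boil root water", codebook)
```

Every word, on writing and on reading, requires a lookup in the codebook, which is the cumbersomeness the objection above refers to.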


Steganography

This theory holds that the text of the Voynich manuscript is mostly meaningless, but contains meaningful information hidden in inconspicuous details, e.g. the second letter of every word, or the number of letters in each line. This technique, called steganography, is very old, and was described e.g. by Johannes Trithemius in 1499. Some people suggested that the plain text was to be extracted by a Cardan grille of some sort. This theory is hard to prove or disprove, since stegotexts can be arbitrarily hard to crack. An argument against it is that using a cipher-looking cover text defeats the main purpose of steganography, which is to hide the very existence of the secret message.
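The "second letter of every word" example can be written down directly. This is a minimal sketch of that one extraction rule, assuming whitespace-separated words; the cover text is invented for the example and the function name is hypothetical.

```python
def second_letters(cover_text: str) -> str:
    """Collect the second letter of every word that has at least two letters."""
    return "".join(word[1] for word in cover_text.split() if len(word) > 1)


hidden = second_letters("ahoy bear also up")
```

Here the four cover words yield the letters h, e, l, p. The difficulty noted above is that an analyst does not know which rule, out of arbitrarily many, was used.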

Some people have suggested that the meaningful text could be encoded in the length or shape of certain pen strokes. There are indeed examples of steganography from about that time that use letter shape (italic vs. upright) to hide information. However, when examined at high magnification, the Voynich manuscript pen strokes seem quite natural, and substantially affected by the uneven surface of the vellum.

Exotic natural language

The linguist Jacques Guy once suggested that the Voynich manuscript text could be some exotic natural language, written in the plain with an invented alphabet. The word structure is indeed similar to that of many language families of East and Central Asia, mainly Sino-Tibetan (Chinese, Tibetan, and Burmese), Austroasiatic (Vietnamese, Khmer, etc.) and possibly Tai (Thai, Lao, etc.). In many of these languages, the "words" (smallest language units with definite meaning) have only one syllable; and syllables have a rather rich structure, including tonal patterns.

This theory has some historical plausibility. While those languages generally had native scripts, these were notoriously difficult for Western visitors, which motivated the invention of several phonetic scripts, mostly with Latin letters but sometimes with invented alphabets. Although the known examples are much later than the Voynich manuscript, history records hundreds of explorers and missionaries who could have devised such a script, even before Marco Polo's 13th-century voyage, but especially after Vasco da Gama reached India by the sea route in 1498. The Voynich manuscript author could also be a native of East Asia living in Europe, or educated at a European mission.

The main argument for this theory is that it is consistent with all statistical properties of the Voynich manuscript text which have been tested so far, including doubled and tripled words (which have been found to occur in Chinese and Vietnamese texts at roughly the same frequency as in the Voynich manuscript). It also explains the apparent lack of numerals and Western syntactic features (such as articles and copulas), and the general weirdness of the illustrations. Another possible hint is the pair of large red symbols on the first page, which have been compared to a Chinese-style book title, upside down and badly copied. Also, the apparent division of the year into 360 degrees (rather than 365 days), in groups of 15 and starting with Pisces, matches features of the Chinese agricultural calendar (jie qi). The main argument against the theory is the fact that no one (including scholars at the Academy of Sciences in Beijing) could find any clear examples of Asian symbolism or Asian science in the illustrations.
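One of the statistics mentioned above, the frequency of doubled and tripled words, is easy to measure on any tokenized text. The following is a minimal sketch, assuming whitespace-separated tokens; the sample tokens are invented placeholders, not a Voynich transcription or a real Chinese or Vietnamese text.

```python
from collections import Counter
from itertools import groupby


def repeated_runs(words):
    """Map run length (2 = doubled, 3 = tripled, ...) to the number of such runs."""
    runs = Counter()
    for _, group in groupby(words):
        length = sum(1 for _ in group)
        if length >= 2:
            runs[length] += 1
    return runs


sample = "a b b c c c d b b".split()
run_counts = repeated_runs(sample)
```

Dividing these counts by the total number of tokens gives a rate that can be compared across texts of different lengths.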

In late 2003, Zbigniew Banasik from Poland proposed that the manuscript is plaintext written in the Manchu language and gave an incomplete translation of the first page of the manuscript: http://www.dcc.unicamp.br/~stolfi/voynich/04-05-20-manchu-theo/

Polyglot tongue

In his book Solution of the Voynich Manuscript: A Liturgical Manual for the Endura Rite of the Cathari Heresy, the Cult of Isis (1987), Leo Levitov declared the manuscript a plaintext transcription of a "polyglot oral tongue". This he defined as 'a literary language which would be understandable to people who did not understand Latin and to whom this language could be read.' He proposed a partial decipherment into a mixture of medieval Flemish with many borrowed Old French and Old High German words.

According to Levitov, the rite of Endura was none other than the assisted suicide ritual famously associated with the Cathar faith (although the reality of this ritual is also in question). He explains that the chimerical plants are not meant to represent any species of flora, but are secret symbols of the faith. The women in the basins with elaborate plumbing represent the suicide ritual itself, which he believed involved venesection: the cutting of a vein to allow the blood to drain into a warm bath. The constellations with no celestial analogue are representative of the stars in Isis' mantle.

This theory is questioned on several grounds. One incongruity is that the Cathar faith is widely understood to have been a Christian gnosticism, and not in any way associated with Isis. Another is that this theory places the book's origins in the twelfth or thirteenth century, which is considerably older than even the adherents to the Roger Bacon theory believe. Levitov offered no evidence beyond his translation for this assertion.

Constructed language

The peculiar internal structure of Voynich manuscript "words" led William F. Friedman and John Tiltman to arrive independently at the conjecture that the text could be a constructed language in the plain, specifically a philosophical one. In languages of this class, the vocabulary is organized according to a category system, so that the general meaning of a word can be deduced from its sequence of letters. For example, in the modern constructed language Ro, bofo- is the category of colors, and any word beginning with those letters would name a color: so red is bofoc, and yellow is bofof. (This is an extreme version of the book classification scheme used by many libraries, in which, say, P stands for language and literature, PA for Greek and Latin, PC for Romance languages, etc.)
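The category principle amounts to a longest-prefix lookup. The sketch below uses only the Ro prefix and the library classes quoted above; the function name and the longest-match rule are assumptions made for illustration, not a description of how Ro or any library catalogue is formally defined.

```python
RO_CATEGORIES = {"bofo": "color"}

LIBRARY_CLASSES = {
    "P": "language and literature",
    "PA": "Greek and Latin",
    "PC": "Romance languages",
}


def category_of(word: str, categories: dict):
    """Return the category of the longest prefix of `word` found in `categories`."""
    for length in range(len(word), 0, -1):
        prefix = word[:length]
        if prefix in categories:
            return categories[prefix]
    return None  # no known category prefix


red_category = category_of("bofoc", RO_CATEGORIES)
yellow_category = category_of("bofof", RO_CATEGORIES)
```

A reader who knows the category table can place an unfamiliar word in its general class without ever having seen that exact word.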

This concept is quite old, as attested by John Wilkins's Philosophical Language (1668). In most known examples, categories are subdivided by adding suffixes; as a consequence, a text on a particular subject would have many words with similar prefixes: for example, all plant names would begin with similar letters, and likewise for all diseases, etc. This feature could then explain the repetitious nature of the Voynich text. However, no one has been able to assign a plausible meaning to any prefix or suffix in the Voynich manuscript; and, moreover, known examples of philosophical languages are rather late (17th century).


Hoax

The bizarre features of the Voynich manuscript text (such as the doubled and tripled words) and the suspicious contents of its illustrations (such as the chimeric plants) have led many people to conclude that the manuscript may in fact be a hoax.

In 2003, computer scientist Gordon Rugg showed that text with characteristics similar to the Voynich manuscript could have been produced using a table of word prefixes, stems, and suffixes, which would have been selected and combined by means of a perforated paper overlay. The latter device, known as a Cardan grille, was invented around 1550 as an encryption tool. However, the pseudo-texts generated in Rugg's experiments do not have the same words and frequencies as the Voynich manuscript; their resemblance to "Voynichese" is only visual, not quantitative. Since one can produce random gibberish that resembles English (or any other language) to a similar extent, these experiments are not yet convincing.
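The table-and-grille idea can be approximated in code. The sketch below is a heavily simplified stand-in, not Rugg's actual tables or procedure: the "grille" is modelled as three fixed offsets into the prefix, stem, and suffix columns, and sliding it one step at a time yields one word per position. The syllables are invented for the example and are not taken from any transcription.

```python
def grille_words(table, offsets, count):
    """Slide a three-hole 'grille' over the columns and read off one word per step."""
    prefixes, stems, suffixes = table
    prefix_offset, stem_offset, suffix_offset = offsets
    words = []
    for step in range(count):
        words.append(
            prefixes[(step + prefix_offset) % len(prefixes)]
            + stems[(step + stem_offset) % len(stems)]
            + suffixes[(step + suffix_offset) % len(suffixes)]
        )
    return words


# Hypothetical table: an empty string stands for a blank cell.
table = (
    ["qo", "o", "", "ch"],      # prefixes
    ["ke", "te", "ka", "da"],   # stems
    ["dy", "y", "in", "ol"],    # suffixes
)
pseudo_text = grille_words(table, offsets=(0, 1, 2), count=4)
```

The output has a regular prefix-stem-suffix look with no underlying meaning, which is the point of the hoax hypothesis; whether such output matches the manuscript's word frequencies is exactly what the criticism above disputes.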
