The Voynich Manuscript

Author: Mary E. D’Imperio

Date Updated: February 15, 2017

The Voynich manuscript is an illustrated codex hand-written in an unknown writing system. The vellum of its pages has been carbon-dated to the early 15th century (1404–1438), and the manuscript may have been composed in Northern Italy during the Italian Renaissance. It is named after Wilfrid Voynich, a Polish book dealer who purchased it in 1912.

The pages of the codex are vellum. Some of the pages are missing, but about 240 remain. The text is written from left to right, and most of the pages have illustrations or diagrams.

The Voynich manuscript has been studied by many professional and amateur cryptographers, including American and British codebreakers from both World War I and World War II. No one has yet succeeded in deciphering the text, and it has become a famous case in the history of cryptography. The mystery of the meaning and origin of the manuscript has excited the popular imagination, making the manuscript the subject of novels and speculation. None of the many hypotheses proposed over the last hundred years has yet been independently verified. Many people have speculated that the writing might be nonsense, or proto-asemic[1] writing.

The Voynich manuscript was donated by Hans P. Kraus to Yale University’s Beinecke Rare Book and Manuscript Library in 1969, where it is catalogued under call number MS 408[2]. A high-resolution digitized copy is freely accessible on the library’s website[3].

The text was clearly written from left to right, with a slightly ragged right margin. Longer sections are broken into paragraphs, sometimes with star- or flower-like “bullets” in the left margin. There is no obvious punctuation, and there is no indication of errors or corrections anywhere in the document. The ductus[4] flows smoothly, giving the impression that the symbols were not enciphered: there is no sign of the delay between characters that would normally be expected in hand-written encoded text.

The text consists of over 170,000 glyphs, usually separated from each other by narrow gaps. Most of the glyphs are written with one or two simple pen strokes. While there is some dispute as to whether certain glyphs are distinct or not, an alphabet of 20–30 glyphs would account for virtually all of the text; the exceptions are a few dozen rarer characters that occur only once or twice each. Various transcription alphabets, such as the European Voynich Alphabet, have been created to equate the Voynich glyphs with Latin characters and so help with cryptanalysis. The first major one was created in the 1940s by cryptographer William F. Friedman; each line of the manuscript was transcribed onto an IBM punch card to make it machine-readable.

Wider gaps divide the text into about 35,000 “words” of varying length. These seem to follow phonological or orthographic laws of some sort, e.g., certain characters must appear in each word (like English vowels), some characters never follow others, some may be doubled or tripled but others may not, etc.

Statistical analysis of the text reveals patterns similar to those of natural languages. For instance, the word entropy (about 10 bits per word) is similar to that of English or Latin texts. Some words occur only in certain sections, or in only a few pages; others occur throughout the manuscript. There are very few repetitions among the thousand or so “labels” attached to the illustrations.
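The word-entropy figure quoted above can be illustrated with a short calculation. The following is only a minimal sketch, assuming the text has already been transcribed into Latin characters with spaces standing for the wider gaps between “words”. The function name word_entropy and the sample string are hypothetical; the sample is a made-up placeholder in the style of a transcription, not a quotation from the manuscript.

```python
import math
from collections import Counter


def word_entropy(text: str) -> float:
    """Shannon entropy, in bits per word, of the word-frequency distribution.

    Assumes `text` is a transcription in which whitespace separates words.
    """
    words = text.split()
    if not words:
        return 0.0
    counts = Counter(words)
    total = len(words)
    # H = -sum(p * log2(p)) over the observed relative frequencies p.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return abs(entropy)  # abs() only normalizes -0.0 to 0.0


# Placeholder input (an assumption for illustration, not real manuscript text).
sample = "daiin chedy daiin qokeedy shedy chedy daiin ol"
print(f"{word_entropy(sample):.3f} bits per word")
```

A sample this small says nothing about the manuscript itself; an estimate of this kind depends on the length of the text and on the transcription used, so figures are only comparable between texts of similar size.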

On the other hand, the Voynich manuscript’s “language” is quite unlike European languages in several respects. There are practically no words with fewer than two letters or more than ten. The distribution of letters within words is also rather peculiar: some characters occur only at the beginning of a word, some only at the end, and some always in the middle section. While Semitic alphabets have many letters that are written differently depending on whether they occur at the beginning, in the middle or at the end of a word, letters of the Latin, Cyrillic, and Greek alphabets are generally written the same way regardless of their position within a word (with the Greek letter sigma and the obsolete long s being notable exceptions).
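The positional behaviour of characters described above is the kind of property that can be checked mechanically on a transcription. The sketch below is one possible way to do it, not a method taken from the book: it assumes each transcription character stands for one glyph, and the names positional_counts and position_locked, as well as the tiny word list, are hypothetical.

```python
from collections import defaultdict

POSITIONS = ("initial", "medial", "final")


def positional_counts(words):
    """Count how often each character occurs word-initially, medially, and finally.

    Single-character words are skipped because their position is ambiguous.
    """
    counts = defaultdict(lambda: dict.fromkeys(POSITIONS, 0))
    for word in words:
        if len(word) < 2:
            continue
        last = len(word) - 1
        for i, ch in enumerate(word):
            if i == 0:
                slot = "initial"
            elif i == last:
                slot = "final"
            else:
                slot = "medial"
            counts[ch][slot] += 1
    return dict(counts)


def position_locked(counts):
    """Return the characters that were observed in exactly one position class."""
    locked = {}
    for ch, slots in counts.items():
        seen = [p for p in POSITIONS if slots[p] > 0]
        if len(seen) == 1:
            locked[ch] = seen[0]
    return locked


# Placeholder word list (an assumption for illustration only).
words = ["abc", "abd", "ebc", "ba"]
print(position_locked(positional_counts(words)))
```

On a real transcription the result would depend on how the transcription alphabet splits or merges glyphs, which is itself disputed.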

The text seems to be more repetitive than typical European languages; there are instances where the same common word appears up to three times in a row. Words that differ by only one letter also repeat with unusual frequency, causing single-substitution alphabet decipherings to yield babble-like text. Elizebeth Friedman in 1962 described such attempts as “doomed to utter frustration”.
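The two kinds of repetition mentioned above, identical words in a row and neighbouring words that differ by one letter, can be counted with a few lines of code. This is a rough sketch under stated assumptions: “differ by one letter” is taken here to mean a single substituted character in words of equal length (insertions and deletions are not counted), and the function names and the sample word list are hypothetical.

```python
def differs_by_one(a: str, b: str) -> bool:
    """True if a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def repetition_stats(words):
    """Return (exact, near): counts of adjacent word pairs that are identical,
    and of adjacent pairs that differ by a single substituted character."""
    pairs = list(zip(words, words[1:]))
    exact = sum(1 for a, b in pairs if a == b)
    near = sum(1 for a, b in pairs if differs_by_one(a, b))
    return exact, near


def longest_run(words):
    """Length of the longest run of the same word repeated consecutively."""
    best = run = 0
    prev = None
    for w in words:
        run = run + 1 if w == prev else 1
        best = max(best, run)
        prev = w
    return best


# Placeholder word list (an assumption for illustration only).
words = "ol ol ol or ar chedy shedy".split()
print(repetition_stats(words), longest_run(words))
```

Comparing such counts against the same statistics for a text in a known language of similar length would be the natural next step; the sketch does not attempt that.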

There are only a few words in the manuscript written in a seemingly Latin script. On the last page, there are four lines of writing written in rather distorted Latin letters, except for two words in the main script. The lettering resembles European alphabets of the late 14th and 15th centuries, but the words do not seem to make sense in any language. Also, a series of diagrams in the “astronomical” section has the names of ten of the months (from March to December) written in Latin script, with spelling suggestive of the medieval languages of France, northwest Italy or the Iberian Peninsula. However, it is not known whether these bits of Latin script were part of the original text or were added later.

Physicist Marcelo Montemurro has tackled the Voynich manuscript with statistical methods[5], though he insists he is not obsessed with it. The UK-based physicist has never even seen the actual calfskin pages; he has accessed a digital form of the manuscript[6]. With a colleague, Montemurro published a paper in 2013 that compares Voynich statistics to those of English, Chinese, Latin, and even Fortran. The manuscript’s statistics resemble those of real languages too closely to have been faked, they wrote[7].

[1] Asemic writing is a wordless open semantic form of writing. The word asemic means “having no specific semantic content”. With the nonspecificity of asemic writing there comes a vacuum of meaning which is left for the reader to fill in and interpret. All of this is similar to the way one would deduce meaning from an abstract work of art. The open nature of asemic works allows for meaning to occur trans-linguistically; an asemic text may be “read” in a similar fashion regardless of the reader’s natural language. Multiple meanings for the same symbolism are another possibility for an asemic work.

[2] The book being reviewed is, however, catalogued under cryptography, Z105.5.V65 D55.

[3] It is also available at the NSA website at

[4] In linguistics, ductus refers to qualities and characteristics of writing or speaking instantiated in the act of speaking or the flow of writing the text. For instance, in writing, ductus includes the direction, sequencing, and speed with which the strokes making up a character are drawn.

[5] Sophia Chen, “Profiles in Versatility: Physicist Tackles Voynich Manuscript with Statistical Methods,” APS News 26(2), February 2017, pp. 4-5.

[6] A digital copy, in several versions, is available for free download from The Voynich Manuscript.

[7] Montemurro, Marcelo A. and Damián H. Zanette, “Keywords and Co-Occurrence Patterns in the Voynich Manuscript: An Information-Theoretic Analysis,” PLoS ONE 8(6): e66344. doi:10.1371/journal.pone.0066344


