The Size of Simenon's Vocabulary

"Simenon limited himself to a vocabulary of 2,000 words, acting on the advice of Colette, who warned him against writing 'beautiful sentences'."

—Paul Bailey, introduction to the 2003 Penguin edition of Inspector Cadaver

"[Simenon] had employed a vocabulary of 2,000 words, while admitting that he knew more for his personal use."

—Patrick Marnham, in The Man Who Wasn't Maigret (Farrar, Straus, Giroux, New York, 1993, p. 2)

"I asked Simenon: 'Is it true that you have not used more than 2,000 words?' 'That's too many,' he replied. 'I did not reach that figure. Besides, Racine only used 800.'"

—Giulio Nascimbeni, 2003 Lunario article.

"I have since the age of eighteen tried to have a style as simple as possible," he told a French interviewer [Jean-Louise Egine] in February 1978 on the eve of his seventy-fifth birthday. "And that was for a reason: I once read a statistic that revealed that over half the people in France used no more than a total of 600 words. So what was the good of my using abstract words?"

—Fenton Bresler, in The Mystery of Georges Simenon (Beaufort Books, New York, 1983, p. 2)

(If you know of any similar comments, please e-mail me the reference.)

Did Simenon use only 2,000 unique words in his novels?

Technical issues

The two main problems in analyzing word frequency in Simenon's works are the definition of a "word" and the method of counting. If the texts are in electronic form, counting software is available. Hal Brown's Wurdz, available as downloadable freeware on the Net, does word frequency analysis. A similar program that can be used online is Word Frequency Indexer, by Catherine N. Ball of Georgetown Linguistics. Both produce a list of the "unique" words in a text with the number of occurrences of each.
In Wurdz, a "word" is any run of the 26 letters of the English alphabet; every other character acts as a separator. In French text, therefore, all accented characters are treated as word separators and removed, so any word containing one is split apart. To get around this, a distinctive sequence of unaccented letters can be substituted for each accented character before the analysis and converted back afterwards. In Ball's program, a word is whatever occurs between spaces or normal punctuation, which means that apostrophes and hyphens remain within words.
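The effect of the two rules, and the substitution workaround, can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the actual code of Wurdz. The sample sentence is invented, and the placeholder codes (`qqea` and so on) are hypothetical choices; any letter sequence that cannot occur in the text would do.

```python
import re

# Invented sample sentence, not taken from any Simenon text.
SAMPLE = "Il a trouvé l'hôtel près de la gare."

def ascii_only_tokens(text):
    """Split on every character outside a-z, as described for Wurdz."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]

# Hypothetical placeholder codes for the accented letters in the sample.
PLACEHOLDERS = {"é": "qqea", "è": "qqeg", "ô": "qqoc"}

def encode(text):
    """Replace each accented letter with its unaccented placeholder."""
    for ch, code in PLACEHOLDERS.items():
        text = text.replace(ch, code)
    return text

def decode(token):
    """Convert placeholders back to accented letters after counting."""
    for ch, code in PLACEHOLDERS.items():
        token = token.replace(code, ch)
    return token

# Without the workaround, accented words fall apart.
naive = ascii_only_tokens(SAMPLE)

# With the workaround: encode, tokenize, then decode each token.
repaired = [decode(t) for t in ascii_only_tokens(encode(SAMPLE.lower()))]

print(naive)
print(repaired)
```

Note that the apostrophe still splits "l'hôtel" into two tokens under this rule, whereas a tokenizer that splits only on spaces and normal punctuation would keep it as one.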
Neither program deals directly with the issue of "uniqueness": grammatical variants of the "same word" are treated as separate words. For example, trouva, trouvais, trouvait, trouve, trouver, trouverait, trouveras, trouvé and trouvée, all grammatical forms of the verb trouver, 'to find', are counted as nine distinct words. In French, the problem extends to singular and plural forms of nouns and adjectives, as well as masculine and feminine forms of adjectives. All of these must be "collapsed" manually, with like entries grouped together, to get a true estimate of the number of unique words used in a text.
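The grouping step can be pictured as a hand-built lookup from each surface form to its headword. The sketch below is a minimal illustration under that assumption: the occurrence counts are made up, the `petit` entries are added only to show an adjective, and the table `HEADWORD` is hypothetical rather than anything produced by the programs mentioned above.

```python
from collections import Counter

# Raw frequency list as a counting tool might report it (counts are invented).
raw_counts = Counter({
    "trouva": 3, "trouvais": 1, "trouvait": 4, "trouve": 6, "trouver": 5,
    "trouverait": 1, "trouveras": 1, "trouvé": 7, "trouvée": 2,
    "petit": 4, "petite": 3, "petits": 1,
})

# Hypothetical hand-built table: surface form -> headword.
HEADWORD = {form: "trouver" for form in (
    "trouva", "trouvais", "trouvait", "trouve", "trouver",
    "trouverait", "trouveras", "trouvé", "trouvée",
)}
HEADWORD.update({"petit": "petit", "petite": "petit", "petits": "petit"})

# Collapse like entries; forms missing from the table stand for themselves.
grouped = Counter()
for form, n in raw_counts.items():
    grouped[HEADWORD.get(form, form)] += n

print(len(raw_counts), "raw entries ->", len(grouped), "grouped entries")
print(grouped)
```

Here twelve raw entries collapse to two, which is the same kind of reduction the manual grouping applies across a whole text.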


So far I have performed the manual grouping of like forms on only a single text, the longest Maigret short story, "Un Noël de Maigret". In that case the "raw" count of unique words, the result of the initial Wurdz analysis, was 2,925. The count after grouping was approximately 1,950, which is 33% smaller. Assuming that this 33% reduction holds for the other French texts, the grouped count for each has been estimated in the last column of the table below.


Title                         Total words  Unique words (raw)  Unique as % of total  Unique words (grouped, est.)
Maigret et son mort                47,235               5,723                 11.13                         3,815
La Maison de l'inquiétude          30,838               4,681                 15.18                         3,120
Un Noël de Maigret                 18,827               2,925                 15.54                         1,950
L'improbable Monsieur Owen          9,799               2,296                 23.43                         1,530
Ceux du Grand Café                  9,732               2,174                 22.34                         1,449
Menaces de mort                     9,464               2,366                 25.00                         1,577
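The estimates in the last column follow from simple arithmetic on the one text that was grouped by hand. The sketch below reproduces it from the raw counts in the table. Rounding down is an assumption on my part; it happens to match the figures shown.

```python
# Figures from the hand-grouped text, "Un Noël de Maigret".
RAW_BASELINE = 2925      # unique words before grouping
GROUPED_BASELINE = 1950  # unique words after grouping (approximate)

# Reduction achieved by grouping, as a percentage of the raw count.
reduction_pct = (RAW_BASELINE - GROUPED_BASELINE) / RAW_BASELINE * 100

# Raw unique-word counts taken from the table.
RAW_UNIQUE = {
    "Maigret et son mort": 5723,
    "La Maison de l'inquiétude": 4681,
    "Un Noël de Maigret": 2925,
    "L'improbable Monsieur Owen": 2296,
    "Ceux du Grand Café": 2174,
    "Menaces de mort": 2366,
}

def estimate_grouped(raw):
    """Scale a raw count by the baseline ratio, rounding down (assumed)."""
    return raw * GROUPED_BASELINE // RAW_BASELINE

estimates = {title: estimate_grouped(raw) for title, raw in RAW_UNIQUE.items()}

print(f"reduction: {reduction_pct:.1f}%")
for title, est in estimates.items():
    print(f"{title}: {est:,}")
```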

I'll continue this with additional texts to validate these results, but at first glance it seems that the vocabulary Simenon used in his works may not be significantly smaller than that of many other authors, and that he probably never wrote a novel using as few as 2,000 unique words. Judging by the estimates above, the Maigret novels more likely fall between 3,000 and 4,000. As a general rule, the shorter the piece, the greater the percentage of unique words. (In other words, the longer the piece of writing, the more likely you are to use the same words over again.)
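The length effect is easy to see on even a tiny sample. The sketch below measures the share of unique words in growing prefixes of an invented passage (not a quotation from Simenon), with no grouping of like forms; it is a toy illustration of the rule, not evidence for it.

```python
# Invented 18-word passage in which words start to repeat.
TEXT = ("maigret alluma sa pipe et regarda la pluie puis "
        "maigret ralluma sa pipe et regarda encore la pluie")
words = TEXT.split()

def unique_share(tokens):
    """Unique words as a percentage of the total number of words."""
    return len(set(tokens)) / len(tokens) * 100

# Measure the first 6 words, the first 12, and the whole passage.
shares = []
for n in (6, 12, len(words)):
    prefix = words[:n]
    shares.append((n, len(set(prefix)), unique_share(prefix)))

for total, unique, pct in shares:
    print(f"{total:2d} words, {unique:2d} unique, {pct:.1f}% unique")
```

As the passage grows from 6 to 18 words, the count of unique words keeps rising but its share of the total falls, which is the same pattern the table shows between the short stories and the novels.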

Is this another of the notorious Simenon myths he "complains" about in "When I Was Old"?

ST - July 1, 2003


