This collection of proverbs was assembled by the language reformer Vuk Stefanovic Karadzic. Its first edition dates from 1849. All later editions, including the one published jointly by NOLIT and PROSVETA in 1987, follow the first edition in many respects, including some obscure ones: they retain elements of the old orthography (for instance, ne ću instead of neću) as well as characters that do not exist in the modern Serbian Cyrillic alphabet (for instance, the hard sign). The inventory of proverbs has not changed significantly since the first edition, although it contains many links to nonexistent proverbs as well as a number of proverbs that differ only in word order (for instance, proverb number 4286, Po dvaput se u vodenici govori, and proverb number 5748, U vodenici se po dvaput govori).

Since a collection of proverbs is essentially a reference book that is rarely read from cover to cover, presenting the proverbs only in lexicographical order is often unsuitable for a language with free word order: to find a proverb, the reader must guess which word it begins with. Because this text is an indispensable part of any corpus of the contemporary language and has been one of the main sources of examples for traditional dictionaries, a new, better-equipped edition has long been awaited by linguists and lexicographers alike.
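Because the proverbs are listed only in lexicographical order, pairs such as 4286 and 5748 end up far apart. A minimal Python sketch, assuming that two proverbs count as word-order variants when they contain exactly the same word forms (ignoring case and punctuation), groups such candidates under an order-insensitive key. The function names are hypothetical and are not part of the edition's tooling.

```python
import re
from collections import defaultdict


def word_bag(proverb: str) -> tuple[str, ...]:
    """Order-insensitive key: the proverb's lowercased word forms, sorted.

    Repeated words are kept, so the key behaves like a multiset.
    """
    return tuple(sorted(re.findall(r"\w+", proverb.lower())))


def word_order_variants(proverbs: dict[int, str]) -> list[list[int]]:
    """Return groups of proverb numbers whose texts differ only in word order."""
    groups: dict[tuple[str, ...], list[int]] = defaultdict(list)
    for number, text in proverbs.items():
        groups[word_bag(text)].append(number)
    return [sorted(numbers) for numbers in groups.values() if len(numbers) > 1]


# Proverbs 4286 and 5748 of the collection, keyed by their numbers.
proverbs = {
    4286: "Po dvaput se u vodenici govori.",
    5748: "U vodenici se po dvaput govori.",
}
print(word_order_variants(proverbs))  # [[4286, 5748]]
```

Since the key compares word forms rather than lemmas, it would miss variants that also differ in inflection; catching those requires a lemmatized text.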

New edition of Serbian Proverbs

In 1989 the distinguished Belgrade publisher NOLIT started a project to modernize this collection, which ended in 1996 with the publication of the new edition. Apart from the adjustment to modern orthography and the correction of typographical errors, the original text was not altered: like the previous editions, it contains 6919 proverbs, divided for historical reasons into four parts. The text is accompanied by a comprehensive index of all the lexical words (nouns, verbs, adjectives and numerals) that occur in the proverbs. The first step in compiling the index was the production of classical concordances, which served as the basis for lemmatization. The lemmatization was performed twice, by two experienced traditional lexicographers. Their solutions, however, differed in many cases because of the numerous graphic variants that are not properly covered in existing dictionaries. Many of these variant forms are still in use today and can be both heard and found in written texts. As a result, all the variants in the source text were reproduced in the printed index as links of the type see also. A number of dummy keywords were also introduced to make the index more convenient for the modern user; these are the sources of the links of the type see. The presence of all these variants significantly encumbered the index.
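A classical concordance lists every occurrence of every word form together with its immediate context. The Python sketch below assumes a simple keyword-in-context (KWIC) layout with a window of three words on each side; it groups occurrences by surface form, so that, for instance, vodenici is listed as found and still has to be assigned to the lemma vodenica by a lexicographer. Sorting uses Python's default code-point order, not Serbian collation, and the function name and window size are illustrative choices.

```python
import re

Row = tuple[str, int, str, str, str]


def kwic(proverbs: dict[int, str], width: int = 3) -> list[Row]:
    """Build a keyword-in-context concordance of surface word forms.

    Each row is (sort key, proverb number, left context, form, right context).
    Rows are sorted by the lowercased form, then by proverb number.
    """
    rows: list[Row] = []
    for number, text in proverbs.items():
        tokens = re.findall(r"\w+", text)
        for i, token in enumerate(tokens):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((token.lower(), number, left, token, right))
    rows.sort()
    return rows


proverbs = {
    4286: "Po dvaput se u vodenici govori.",
    5748: "U vodenici se po dvaput govori.",
}
for key, number, left, form, right in kwic(proverbs):
    print(f"{number:>5}  {left:>20}  [{form}]  {right}")
```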

Electronic edition of Serbian Proverbs

In parallel with the preparation of the new printed edition, work began on an electronic edition as a purely scientific, non-commercial project. Like the new printed edition, this electronic version is based on the 1987 edition.

TEI encoding of the original

The electronic edition is based on the encoding scheme proposed by the Text Encoding Initiative (TEI). As a first step, a document type definition (DTD) was developed as an extension of the TEI DTD: it uses the basic TEI tag set for prose, with the addition of the tag sets for names and dates and for links. The new DTD enables the coding of:

In the electronic edition, all the elements of the source edition are encoded so that both its original and its modernized version can be reproduced.
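One way to keep both versions in a single encoding is to record an original spelling and its modernized form side by side and to choose between them when the text is rendered. The sketch below uses the TEI P5 elements choice, orig and reg for this; that is an assumption made for illustration, since the project's DTD, built on an earlier version of TEI, may encode such pairs differently. Namespaces are omitted to keep the sample short, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def render(elem: ET.Element, version: str) -> str:
    """Flatten an element to plain text.

    Inside each <choice>, keep <orig> for the original version or <reg>
    for the modernized one; all other elements contribute their text.
    """
    if version not in ("original", "modernized"):
        raise ValueError(f"unknown version: {version}")
    keep = "orig" if version == "original" else "reg"
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "choice":
            chosen = child.find(keep)
            if chosen is not None:
                parts.append(render(chosen, version))
        else:
            parts.append(render(child, version))
        parts.append(child.tail or "")  # text that follows the child element
    return "".join(parts)


# An old-orthography form and its modern spelling, encoded together.
sample = ET.fromstring("<seg><choice><orig>ne ću</orig><reg>neću</reg></choice></seg>")
print(render(sample, "original"))    # ne ću
print(render(sample, "modernized"))  # neću
```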

For the encoding of the Serbian alphabets, both Latin and Cyrillic, a special set of entities is used. The same set was used for encoding the Serbian versions of Plato's Republic and Orwell's 1984.
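Because the Serbian Latin and Cyrillic alphabets correspond letter for letter, a single entity-encoded text can be rendered in either script. The Python sketch below assumes a scheme in which plain ASCII letters stand for themselves and every letter without an ASCII equivalent, including the digraphs lj, nj and dž, is written as an entity. Only &cx; (ć) is attested in this text; all other entity names are invented for illustration, and the sketch handles lowercase text only.

```python
import re

# Each entity maps to its (Latin, Cyrillic) rendering. Only &cx; is
# attested in the source; every other entity name here is hypothetical.
ENTITIES = {
    "cx": ("ć", "ћ"),
    "cy": ("č", "ч"),    # hypothetical name
    "sy": ("š", "ш"),    # hypothetical name
    "zy": ("ž", "ж"),    # hypothetical name
    "dx": ("đ", "ђ"),    # hypothetical name
    "lj": ("lj", "љ"),   # hypothetical name: digraph kept as one letter
    "nj": ("nj", "њ"),   # hypothetical name: digraph kept as one letter
    "dz": ("dž", "џ"),   # hypothetical name: digraph kept as one letter
}

# Plain ASCII letters with a one-to-one Cyrillic counterpart.
LATIN_TO_CYRILLIC = dict(zip("abcdefghijklmnoprstuvz", "абцдефгхијклмнопрстувз"))

ENTITY_RE = re.compile(r"&([a-z]+);")


def render(text: str, script: str) -> str:
    """Expand entity-encoded lowercase text into Latin or Cyrillic script."""
    if script not in ("latin", "cyrillic"):
        raise ValueError(f"unknown script: {script}")
    column = 0 if script == "latin" else 1
    out = []
    # With one capture group, re.split alternates plain text and entity names.
    for i, part in enumerate(ENTITY_RE.split(text)):
        if i % 2 == 1:
            out.append(ENTITIES[part][column])
        elif column == 0:
            out.append(part)
        else:
            out.append("".join(LATIN_TO_CYRILLIC.get(ch, ch) for ch in part))
    return "".join(out)


print(render("ne &cx;u", "latin"))     # ne ću
print(render("ne &cx;u", "cyrillic"))  # не ћу
```

Encoding the digraphs as entities sidesteps a known ambiguity of Latin-to-Cyrillic conversion: a plain Latin sequence is not always a digraph (the dž in nadživeti corresponds to дж, not џ), so only an explicit encoding makes the conversion reliable.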

Construction of the electronic dictionary

The excerpts from the constructed dictionary can be seen here.

Annotating the text with the dictionary

Some examples of the electronic text of the proverbs annotated with the electronic dictionary can be seen here.
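Annotating a text with an electronic dictionary amounts to looking up each word form and attaching every analysis the dictionary offers for it. The sketch below is a minimal Python version with a toy dictionary invented for this example; its tag notation is hypothetical and is not the format of the project's dictionary. Ambiguity is kept rather than resolved: vodenici receives both its dative and its locative reading.

```python
import re

Analysis = tuple[str, str]  # (lemma, grammatical tag)

# Toy dictionary of inflected forms; entries and tags are illustrative only.
DICTIONARY: dict[str, list[Analysis]] = {
    "po": [("po", "PREP")],
    "dvaput": [("dvaput", "ADV")],
    "se": [("se", "PART")],
    "u": [("u", "PREP")],
    "vodenici": [("vodenica", "N:sg:dat"), ("vodenica", "N:sg:loc")],
    "govori": [("govoriti", "V:pres:3sg"), ("govoriti", "V:imp:2sg")],
}


def annotate(text: str) -> list[tuple[str, list[Analysis]]]:
    """Pair each word form with all its dictionary analyses ([] if unknown)."""
    return [(form, DICTIONARY.get(form.lower(), []))
            for form in re.findall(r"\w+", text)]


def lemmas(text: str) -> set[str]:
    """Collect every lemma proposed for the text's word forms."""
    return {lemma for _, analyses in annotate(text) for lemma, _ in analyses}


for form, analyses in annotate("U vodenici se po dvaput govori."):
    print(form, analyses)

# Both word-order variants share one set of lemmas.
print(lemmas("Po dvaput se u vodenici govori.") == lemmas("U vodenici se po dvaput govori."))  # True
```

Keying a concordance or index on such lemmas rather than on surface forms would bring vodenici together with the other forms of vodenica, whatever position the word takes in the proverb.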

An example of the new concordances can be seen here.

