𝓔𝓿𝓸Sem manual

𝓔𝓿𝓸Sem is a scientific project that explores the “Evolving Semantics” at play in the world's languages. It brings together in one place the vast knowledge acquired by generations of scholars in the domain of etymology, for a variety of language families.

Our purpose is to observe empirically how languages have built semantic connections between concepts through the historical evolution of their lexicons. For instance, in many parts of the world, a word that meant ‘sky’ at some early stage was later used to mean ‘day’. When a given language uses a single word to encode two such concepts, we speak of colexification (François 2008): e.g. Mandarin colexifies ‘sky’ and ‘day’ using the word 天 (tiān). But 𝓔𝓿𝓸Sem goes beyond colexification, which is a synchronic property of individual languages: it observes how words extend their meaning not just within one language, but from language to language within the same family.
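To make the notion of colexification concrete, here is a minimal sketch in Python. The toy lexicon and the function name colexified_pairs are assumptions made for this illustration only; they do not reflect how the 𝓔𝓿𝓸Sem database is actually stored or queried.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy lexicon: (language, word form, meaning) triples.
# The entries reuse examples from this manual.
LEXICON = [
    ("Mandarin", "天 (tiān)", "sky"),
    ("Mandarin", "天 (tiān)", "day"),
    ("French", "voir", "see"),
]


def colexified_pairs(lexicon):
    """Return (language, meaning_a, meaning_b) for every pair of meanings
    encoded by one and the same word form within a single language."""
    senses = defaultdict(set)
    for language, form, meaning in lexicon:
        senses[(language, form)].add(meaning)

    pairs = set()
    for (language, _form), meanings in senses.items():
        # Sort so that each unordered pair is reported only once.
        for a, b in combinations(sorted(meanings), 2):
            pairs.add((language, a, b))
    return pairs
```

With this toy data, the only pair reported is ‘day’ and ‘sky’ in Mandarin; French voir carries a single meaning here, so it yields no pair.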

𝓔𝓿𝓸Sem builds on the new concept of dialexification – the diachronic counterpart of colexification. The units of observation are not individual words, but word families or cognate sets – that is, all the words that are descended from a single etymon.

Across different languages, two words are cognate if they descend from the same etymon: e.g. French voir and Italian vedere ‘see’ both descend from Latin vidēre through regular sound change; ultimately, they descend from a PIE etymon *wéyd-e-ti. In fact, Italian vedere ‘see’ and German wissen ‘know’ are also cognate, because they both descend from that same PIE etymon *wéyd-e-ti.

Two meanings are dialexified iff they are attached to words that are cognate, i.e. that descend historically from the same etymon. In our example, the concepts see and know are dialexified in Indo-European, because they are attached to cognate words, both of which descend from the etymon *wéyd-e-ti. We could also say that the PIE etymon *wéyd-e-ti dialexifies the two meanings see and know.
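The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the COGNATE_SETS structure and the function dialexified_pairs are hypothetical names invented here, not the format or API of the 𝓔𝓿𝓸Sem database.

```python
from itertools import combinations

# Hypothetical toy data: each etymon maps to its reflexes,
# given as (language, word form, meaning) triples taken from this manual.
COGNATE_SETS = {
    "*wéyd-e-ti": [
        ("French", "voir", "see"),
        ("Italian", "vedere", "see"),
        ("German", "wissen", "know"),
    ],
}


def dialexified_pairs(cognate_sets):
    """For each etymon, list the pairs of distinct meanings attached
    to its reflexes, i.e. the meanings that the etymon dialexifies."""
    result = {}
    for etymon, reflexes in cognate_sets.items():
        # Collect the distinct meanings found across the whole cognate set.
        meanings = sorted({meaning for _lang, _form, meaning in reflexes})
        result[etymon] = list(combinations(meanings, 2))
    return result
```

Unlike colexification, the unit of observation here is the whole cognate set: the meanings know and see are paired even though no single word in the toy data carries both.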


If you wish to know more about 𝓔𝓿𝓸Sem — why and how it was created, or how to read its graphs and tables — you can read our paper:

Mathieu Dehouck, Alexandre François, Siva Kalyan, Martial Pastor & David Kletz. (2023) 𝓔𝓿𝓸Sem: A database of polysemous cognate sets. In Nina Tahmasebi et al. (conv.), Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change, 66–75. Singapore: Association for Computational Linguistics.

Here is how you can cite the 𝓔𝓿𝓸Sem database:

Alexandre François, Siva Kalyan, Mathieu Dehouck, Martial Pastor & David Kletz. () 𝓔𝓿𝓸Sem: A database of dialexification across language families. Online database. CNRS—LaTTiCe, Paris. https://tiny.cc/EvoSem [access date: ]