Mixing Semantic Networks and Conceptual Vectors: the Case of Hyperonymy

Violaine Prince
LIRMM-CNRS and University Montpellier 2
161 Ada Street, 34392 Montpellier cedex 5, France

[email protected] Mathieu Lafourcade LIRMM-CNRS and University Montpellier 2 161 Ada Street, 34392 Montpellier cedex 5 France

[email protected] Abstract In this paper, we focus on lexical semantics, a key issue in Natural Language Processing (NLP) that tends to con- verge with conceptual Knowledge Representation (KR) and ontologies. When ontological representation is needed, hy- peronymy, the closest approximation to the is-a relation, is at stake. In this paper we describe the principles of our vec- tor model (CVM: Conceptual Vector Model), and show how to account for hyperonymy within the vector-based frame for semantics. We show how hyperonymy diverges from is-a and what measures are more accurate for hyperonymy rep- resentation. Ourdemonstrationresultsininitiatinga’coop- eration’processbetweensemanticnetworksandconceptual vectors. Text automatic rewriting or enhancing, ontology mapping with natural language expressions, are examples of applications that can be derived from the functions we defi ne in this paper. Keywords: knowledge representation, cognitive linguistics, natural language processing. 1Introduction Lexical semantics are a key issue in Natural Language Processing (NLP) since they represent the point of conver- gencewithconceptualKnowledgeRepresentation(KR)and ontologies. They also browse the area of lexical resources processing, so that many works in both NLP and AI have been devoted to lexical semantic functions, as a way to tackle the problem of word sense representation and dis- crimination. Among the well established trends in lexical semantics representations, two trends seem to be confl ict- ual: the WordNet approach [11], [4], born from semantic networks, and KR-oriented, and the Vector approach, origi- nated from the Saltonian representation in Information Re- trieval [17], which has found a set of applications in NLP. The fi rst is based on logic and the second on vector-space algebra. 
The first is very efficient for is-a relationships (considered as the conceptual relation often embedded in hyperonymy) but is silent, or almost so, about several other interesting lexical functions such as antonymy and thematic association. Synonymy has been tackled [19], [13], but discrimination between synonymy and hyperonymy has often led researchers to look for a more flexible notion such as semantic similarity [14]. The vector approach is completely the opposite. Offering thematic association very easily, it allows many fine-grained synonymy [7] and antonymy [20] functions to be defined and implemented, but is unable to differentiate or to validate the existence of hyperonymous relations.

In this paper we show how to account for hyperonymy within the vector-based frame for semantics, relying on a cooperation between semantic networks and conceptual vectors, and how this can be applied to new functions, such as word substitution and semantic approximation, that belong to the field of semantic similarity. We use a semantic network to enhance vector learning, and symmetrically we build customized semantic networks out of hyperonymous relations between vectors.

2 Hyperonymy and is-a Relations

2.1 Defining Hyperonymy

Hyperonymy is a lexical function that, given a term t, associates to t one or many other terms that are more general, such as those used to define t in genus and differentiae (Aristotelian definition). Its symmetrical function is called hyponymy. Hyperonymy, in almost all KR papers, is assimilated to the general argument of the is-a relationship (fundamentals are given in [1]). Let us recall that the is-a relationship is such that if X is a class of objects and X' a subclass of X, then is-a(X', X) is true. The rightmost argument X is called the general argument, whereas X' is said to be the specific argument. The problem is that linguistic hyperonymy is not a "pure" is-a relation.
When the word horse is defined, we find: "a herbivorous animal, with four legs, etc.". A good hyperonym for this definition of horse is herbivorous mammal. Animal is another hyperonym, since 'herbivorous mammal is-a mammal and mammal is-a animal' is true. However, thematically, a horse is very close to a herbivore, whereas herbivores do not constitute a class but a set of individuals that may belong to different lines of the taxonomy (birds, insects, and reptiles can be herbivorous, but so, metaphorically, can many other things). Thus, in language, one may well want to write that a horse is a herbivore, even though horse is-a herbivore is false.

2.2 WordNet and Hyperonymy

WordNet is a built taxonomy of words, and as such, only captures is-a relations. A hyperonym is a linguistic superordinate, generally used in definitions, that also captures particular properties that cannot act as classes by themselves. Polysemous words have many definitions, and thus many hyperonyms: a horse is also a vehicle, that is, a means of transportation. This implies many is-a relations, which explains why WordNet is a network and not a tree. The only constraint in language is that a hyperonym needs to be more general (and thus herbivore could act as a hyperonym for horse), whereas in a semantic network, every step of the chain of classes and subclasses must verify the order relation.

2.3 Hyperonymy and Word Definition

As shown before, hyperonyms can be extracted, when they are not known, from most dictionary-like definitions. Only general concepts, which tend to play the role of hyperonyms (and is-a superclasses) of many others, are not defined through an Aristotelian definition, but are explained by their hyponyms. This is why, in our Conceptual Vector Model (CVM), presented in the next section, we consider the existence of a "hyperonymy horizon" beyond which definitions become inverted: hyperonyms are more difficult to find and less explicative than hyponyms.
The word action is almost at the top of the WordNet taxonomy, and dictionary definitions tend to explain it with more specific words.

3 The Conceptual Vector Model (CVM)

Vectors have been used in Information Retrieval for a long time [18], and for meaning representation by the LSI model [3] derived from latent semantic analysis (LSA) studies in psycholinguistics. In NLP, [2] proposes a formalism for the projection of the linguistic notion of semantic field onto a vector space, from which our model is inspired.

From a set of elementary notions, concepts, it is possible to build vectors (conceptual vectors) and to associate them with lexical items.¹ The hypothesis that considers a set of concepts as a generator of language has long been described in [16] (thesaurus hypothesis) and has been used by researchers in NLP (e.g. [21]). Polysemous words combine different vectors corresponding to different meanings. This vector approach is based on well-known mathematical properties: it is thus possible to undertake formal manipulations attached to reasonable linguistic interpretations. Concepts are defined from a thesaurus (in our prototype, applied to French, we have chosen [8], where 873 concepts are identified, to be compared with the thousand defined in [16]). To be consistent with the thesaurus hypothesis, we consider that this set constitutes a generator space for words and their meanings. This space is probably not free (no proper vector basis) and, as such, any word projects its meaning onto this space according to the following principle.

3.1 Principle

Let C be a finite set of n concepts; a conceptual vector V is a linear combination of elements c_i of C. For a meaning A, a vector V(A) is the description (in extension) of the activations of all concepts of C.
For example, the different meanings of 'door' could be projected onto the following concepts (the set of pairs CONCEPT[intensity] is ordered by increasing values):

V('door') = (OPENING[0.3], BARRIER[0.31], LIMIT[0.32], PROXIMITY[0.33], EXTERIOR[0.35], INTERIOR[0.37], ...)

In practice, the larger C is, the finer the meaning descriptions are. In return, computer manipulation is less easy. As most vectors are dense (very few null coordinates), the enumeration of activated concepts is long and difficult to evaluate. We generally prefer to select the thematically closest terms, i.e., the neighbourhood. For instance, the closest terms, ordered by increasing distance from 'door', are:

V('door') = 'portal', 'portiere', 'opening', 'gate', 'barrier', ...

To handle semantics within this vector frame, we use the common operations on vectors. An interesting measure is the angular distance, which acts as a similarity measure. We present hereafter the vector sum, the term-to-term product, and the angular distance.

3.1.1 Vector Sum

Let X and Y be two vectors; we define V as their normed sum:

V = X ⊕ Y  with  v_i = (x_i + y_i) / ||V||    (1)

Intuitively, the vector sum of X and Y corresponds to the union of the semantic properties of X and Y. This operator is idempotent, as we have X ⊕ X = X. The null vector 0 is a neutral element of the vector sum and, by definition, we have 0 ⊕ 0 = 0.

3.1.2 Vector Product

The vector product is a normed term-to-term product. Let X and Y be two vectors; we define V as their normed term-to-term product:

V = X ⊗ Y  with  v_i = √(x_i · y_i)    (2)

This operator is idempotent and 0 is absorbent:

X ⊗ X = X  and  X ⊗ 0 = 0    (3)

Intuitively, the vector product of X and Y corresponds to the intersection of the semantic properties of X and Y.

¹ Lexical items are words or expressions which constitute lexical entries. For instance, 'car' or 'white ant' are lexical items. In the following, we will sometimes use word or term to speak of a lexical item.
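The two operators above are straightforward to sketch in code. The following is a minimal illustration of equations (1) and (2); the 4-dimensional vectors are toy values for illustration, not taken from the paper's 873-concept thesaurus:

```python
import math

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

def vsum(x, y):
    """Normed sum, eq. (1): v_i = (x_i + y_i) / ||X + Y||."""
    s = [a + b for a, b in zip(x, y)]
    n = norm(s)
    return [a / n for a in s] if n else s

def vprod(x, y):
    """Term-to-term product, eq. (2): v_i = sqrt(x_i * y_i)."""
    return [math.sqrt(a * b) for a, b in zip(x, y)]

# Toy 4-concept vectors with positive components (illustrative values only).
door = [0.30, 0.31, 0.32, 0.33]
zero = [0.0, 0.0, 0.0, 0.0]

# Idempotency of the sum holds for normed vectors: X ⊕ X = X.
unit_door = vsum(door, zero)  # summing with the null vector just normalizes
assert all(math.isclose(a, b)
           for a, b in zip(vsum(unit_door, unit_door), unit_door))
# Idempotency of the product: X ⊗ X = X; the null vector is absorbent.
assert all(math.isclose(a, b) for a, b in zip(vprod(door, door), door))
assert vprod(door, zero) == zero
```

Note that idempotency of ⊕ is a property of the normed vectors: summing a non-normed vector with itself first normalizes it, as in the sketch above.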
This is a crucial feature for hyperonymy, since a hyperonym and its hyponym can be seen as one containing the properties of the other. It is also important for synonymy, and may give hints about the polysemous properties of some conceptual vectors (intersections with many different vectors). A better function for emphasizing intersection is given in the paragraph about contextualisation.

3.1.3 Angular Distance

Let us define Sim(A, B) as one of the similarity measures between two vectors A and B often used in Information Retrieval:

Sim(A, B) = cos(∠(A, B)) = (A · B) / (||A|| × ||B||)

with "·" as the scalar product. We suppose here that vector components are positive or null. Then, we define the angular distance D_A between two vectors A and B as follows:

D_A(A, B) = arccos(Sim(A, B))    (4)

Intuitively, this function constitutes an evaluation of thematic proximity and is the measure of the angle between the two vectors. We would generally consider that, for a distance D_A(A, B) ≤ π/4 (i.e., less than 45 degrees), A and B are thematically close and share many concepts. For D_A(A, B) ≥ π/4, the thematic proximity between A and B would be considered loose. Around π/2, they have no relation. D_A is a true distance function: it verifies the properties of reflexivity, symmetry, and triangular inequality. In the following, we will speak of a distance only when these properties are verified; otherwise we will speak of a measure.

3.1.4 Contextualisation

When two terms are in the presence of each other, some of the meanings of each are selected by the presence of the other, which acts as a context. This phenomenon is called contextualisation. It consists in emphasizing the common features of each meaning. Let X and Y be two vectors; we define γ(X, Y), the contextualisation of X by Y, as:

γ(X, Y) = X ⊕ (X ⊗ Y)    (5)

This function is not symmetrical: in general, γ(X, Y) ≠ γ(Y, X).
The operator γ is idempotent (γ(X, X) = X) and the null vector is its neutral element (γ(X, 0) = X ⊕ 0 = X). We note, without demonstration, the following closeness and farness properties:

D_A(γ(X,Y), γ(Y,X)) ≤ {D_A(X, γ(Y,X)), D_A(γ(X,Y), Y)} ≤ D_A(X, Y)    (6)

The function γ(X, Y) brings the vector X closer to Y in proportion to their intersection. Contextualisation is a low-cost means of amplifying properties that are salient in a given context. For a polysemous word vector, if the context vector is relevant, one of the possible meanings is activated through contextualisation. For example, bank by itself is ambiguous and its vector points somewhere between those of river bank and money institution. If the vector of bank is contextualised by river, then the concepts related to finance dim considerably.

3.2 Implemented Lexical Functions: Synonymy and Antonymy

3.2.1 Synonymy

Two lexical items are in a synonymy relation if there is a semantic equivalence between them. Synonymy is a pivot relation in NLP, but remains problematic, since semantic equivalence is not translatable into an equivalence relationship: it does not necessarily verify transitivity [10], and it can be, at least partially, confused with hyperonymy when equivalence is reduced to semantic similarity [14]. A possible solution in a vector framework is to define a contextual synonymy (also proposed in [5]), represented by a three-argument relation, which then supports the properties of an equivalence relationship. The suggested solution is called relative synonymy [7]. The functional representation is the following: we define the relative synonymy function SynR between three vectors A, B, and C, the latter playing the role of a pivot, as:

SynR(A, B, C) = D_A(γ(A,C), γ(B,C)) = D_A(A ⊕ (A ⊗ C), B ⊕ (B ⊗ C))    (7)

The interpretation corresponds to testing the thematic closeness of two meanings (A and B), each one enhanced with what it has in common with a third (C). The advantage of such a solution is that it circumvents the effects of polysemy by cutting transitivity and symmetry. However, it does not provide a real distinction between a hyperonym of a given meaning of a word and a true synonym of that word. This problem is discussed in the next section, when introducing more flexible notions such as word substitution.

3.2.2 Antonymy

Two lexical items are in an antonymy relation if there is a symmetry between their semantic components relative to an axis. Three types of symmetry have been defined, inspired by linguistic research [12]. As an example, we expose only the 'complementary' antonymy proposed by [20]; the same method is used for the other types. Complementary antonyms are couples like even/uneven or presence/absence. Complementary antonymy presents two kinds of symmetry: (i) a value symmetry in a boolean system, as in the examples above, and (ii) a symmetry about the application of a property (black is the absence of color, so it is "opposed" to all other colors or color combinations). The functional representation is the following: the function AntiLexS returns the n closest antonyms of A in the context defined by C in reference to R. The pa