WG8: Semantic Information Retrieval: A Natural Language Processing Task

In many approaches to information retrieval, users’ requests are expressed as keywords, and documents are represented by the words they contain. As a result, such systems cannot retrieve documents that use different terms whose meaning is similar to that of the query terms. Likewise, keyword-based search engines do not consider the semantic relations between documents. To give inexperienced users flexible access to information, it is crucial to also take into account the concepts expressed by the documents, and therefore to add word semantics to classic word-based access.
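The limitation described above can be illustrated with a minimal sketch. The documents and the `CONCEPT_OF` word-to-concept map below are made up for illustration and stand in for a real lexical resource; the function names are likewise hypothetical.

```python
# Toy illustration only: documents and concept map are invented for this sketch.
DOCS = {
    "d1": "the car was parked outside the station",
    "d2": "an automobile dealer opened downtown",
    "d3": "the bank raised interest rates",
}

# Hypothetical word-to-concept map standing in for a real lexical resource.
CONCEPT_OF = {"car": "MOTOR_VEHICLE", "automobile": "MOTOR_VEHICLE"}


def keyword_search(query, docs):
    """Return ids of documents that literally contain every query word."""
    terms = set(query.lower().split())
    return sorted(d for d, text in docs.items() if terms <= set(text.lower().split()))


def to_concepts(text):
    """Map each word to its concept, falling back to the word itself."""
    return {CONCEPT_OF.get(w, w) for w in text.lower().split()}


def concept_search(query, docs):
    """Return ids of documents that share every concept of the query."""
    concepts = to_concepts(query)
    return sorted(d for d, text in docs.items() if concepts <= to_concepts(text))


print(keyword_search("car", DOCS))   # literal match only
print(concept_search("car", DOCS))   # also finds the "automobile" document
```

In this toy setting the keyword search misses the document that says "automobile", while the concept-based search retrieves it. A fixed map like this assumes each word has a single sense, which is exactly the assumption that ambiguous words break.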

Description of work

The activity of this workgroup aims to enable conceptual search of information on the Web. Word Sense Disambiguation (WSD), a task in the field of Natural Language Processing, consists of assigning the correct sense (semantics) to a word on the basis of the context in which it occurs. In the usual WSD setting, labeling each word with the correct sense is a non-trivial task, since many words are polysemous (i.e., have more than one sense). A typical solution is to examine the portion of text in which the word is embedded (i.e., the context of the word) and to assign a sense to the word accordingly. Most WSD approaches are corpus-based, that is, they apply machine-learning algorithms to learn from training corpora. Knowledge-based approaches, by contrast, rely only on an external knowledge source to perform disambiguation in a fully automated way. These approaches may exploit the structure of an ontology to quantify the correlation between each sense of a given word and its context by evaluating the conceptual density between them. The final aim of this activity is to apply WSD techniques to Indo-European languages in order to enable multilingual, conceptual Information Retrieval.
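As a rough sketch of the knowledge-based idea, the following example picks the sense of a word that lies closest to the senses of its context words in a small taxonomy. Everything here is an assumption made for illustration: the `PARENT` taxonomy and `SENSES` inventory are invented, and the inverse path-length score is a simplified proximity measure, not the conceptual density formula itself.

```python
# Hypothetical toy taxonomy (child concept -> parent concept).
PARENT = {
    "finance_concept": "entity",
    "natural_object": "entity",
    "financial_institution": "finance_concept",
    "money": "finance_concept",
    "loan": "finance_concept",
    "river_bank": "natural_object",
    "river": "natural_object",
    "water": "natural_object",
}

# Hypothetical sense inventory: word -> candidate concepts.
SENSES = {
    "bank": ["financial_institution", "river_bank"],
    "money": ["money"],
    "loan": ["loan"],
    "river": ["river"],
    "water": ["water"],
}


def path_to_root(concept):
    """List the concept followed by its ancestors up to the root."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def distance(a, b):
    """Number of edges between two concepts via their lowest common ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    depth_in_b = {c: i for i, c in enumerate(pb)}
    for i, c in enumerate(pa):
        if c in depth_in_b:
            return i + depth_in_b[c]
    return len(pa) + len(pb)  # no common ancestor in the taxonomy


def disambiguate(word, context):
    """Pick the sense of `word` that is closest, overall, to the context words."""
    def score(sense):
        total = 0.0
        for w in context:
            candidates = SENSES.get(w, [])  # words unknown to the inventory are skipped
            if candidates:
                total += 1.0 / (1 + min(distance(sense, c) for c in candidates))
        return total
    return max(SENSES[word], key=score)


print(disambiguate("bank", ["money", "loan"]))
print(disambiguate("bank", ["river", "water"]))
```

With a financial context the financial sense of "bank" scores higher, and with a river context the geographic sense does. No training corpus is involved: the only resource is the taxonomy, which is what distinguishes this family of methods from corpus-based ones.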

