WG4: Clustering techniques for document organisation and retrieval

Start date	Year 1, Month 7
Duration	Eleven months
Partner responsible	Genoa
Other partners	Valencia, Hyderabad
Exchange	1 post-graduate (10 days), 1 expert (10 days)
Workshop	Genoa – Year 2, Month 6

Objectives

When searching a document collection, it is often useful (for speed, efficiency, or understandability) to give the collection a structure. Libraries do something similar: books are arranged by category, and then by year or alphabetical order, to make access easier and faster.

In an electronic document collection, such a structure should be generated automatically and may be based on several similarity criteria: the terms a document contains, its structure, its category, or the meaning of its content. A popular way to provide this structure is grouping, or clustering. A collection is easier to search, and search results are easier to scan, when documents are organised into clusters of items that are in some way similar to each other. Clustering can therefore be applied beforehand, to guide the automatic search process, or a posteriori, to organise the results presented to the user.
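As a rough illustration of a posteriori clustering by contained terms, the sketch below groups a handful of search results using cosine similarity between bag-of-words vectors. It is a generic "leader" clustering scheme chosen for brevity, not one of the workgroup's algorithms; the function names, the whitespace tokenisation, the sample documents, and the threshold of 0.3 are all assumptions made for this example.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for one document (lowercased, whitespace split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def leader_clustering(docs, threshold=0.3):
    """Put each document in the first cluster whose leader is similar enough.

    Returns a list of clusters, each a list of document indices.
    The threshold is an arbitrary illustrative value (an assumption).
    """
    vectors = [term_vector(d) for d in docs]
    clusters = []  # each entry: (leader_index, [member indices])
    for i, vec in enumerate(vectors):
        for leader, members in clusters:
            if cosine(vec, vectors[leader]) >= threshold:
                members.append(i)
                break
        else:
            # No existing leader is close enough: start a new cluster.
            clusters.append((i, [i]))
    return [members for _, members in clusters]


# Hypothetical search results, invented for this example.
results = [
    "fuzzy clustering of text documents",
    "museum heritage virtual tour",
    "clustering documents for retrieval",
    "virtual museum heritage exhibition",
]
groups = leader_clustering(results)
print(groups)
```

With these four sample strings, the two clustering-related results end up in one group and the two museum-related results in another. A real system would likely use better tokenisation and term weighting; the point here is only how a term-based similarity criterion can turn a flat result list into groups.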

Description of work

This workgroup focuses on research into novel clustering techniques and their application to the information retrieval task. The participants have proposed some original clustering algorithms, informally termed “Partially Possibilistic c-Means” and “Farthest Neighbour Divisive Clustering”. The first method is especially suitable for organising a document collection, whereas the second is useful for presenting search results sorted into a hierarchy of categories.
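This description does not define either algorithm. Purely to illustrate how a divisive (top-down) procedure can yield a hierarchy of categories, the sketch below repeatedly splits a cluster in two, seeding each split with the two members that lie farthest apart. This is a generic scheme assumed for illustration and should not be read as the participants' "Farthest Neighbour Divisive Clustering"; the function names, the use of Euclidean distance on small numeric vectors standing in for document features, and the `min_size` stopping rule are all assumptions.

```python
import math
from itertools import combinations


def split_by_farthest_pair(points, indices):
    """Split one cluster in two, seeded by its two mutually farthest members."""
    seed_a, seed_b = max(
        combinations(indices, 2),
        key=lambda pair: math.dist(points[pair[0]], points[pair[1]]),
    )
    left, right = [], []
    for i in indices:
        d_a = math.dist(points[i], points[seed_a])
        d_b = math.dist(points[i], points[seed_b])
        # Each member joins the nearer seed; ties go to the first seed.
        (left if d_a <= d_b else right).append(i)
    return left, right


def divisive_hierarchy(points, indices=None, min_size=2):
    """Recursively split clusters until each has at most min_size members.

    Leaves are lists of point indices; internal nodes are (left, right) tuples.
    """
    if indices is None:
        indices = list(range(len(points)))
    if len(indices) <= min_size:
        return indices
    left, right = split_by_farthest_pair(points, indices)
    if not left or not right:
        # All members coincide, so no meaningful split exists.
        return indices
    return (
        divisive_hierarchy(points, left, min_size),
        divisive_hierarchy(points, right, min_size),
    )


# Hypothetical 2-D feature vectors, invented for this example.
points = [(0, 0), (1, 0), (9, 9), (10, 10), (0, 2)]
tree = divisive_hierarchy(points)
print(tree)
```

The nested result mirrors a category tree: the first split separates the two distant points from the three near the origin, and the second split refines the larger group. Finding the farthest pair by exhaustive comparison is quadratic in the cluster size, which is acceptable for a short list of search results but not for a large collection.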

The performance of these techniques is promising, so their application to the information retrieval task will be investigated through a set of thorough experimental tests and an analysis of their theoretical properties.

Experts involved in this workgroup: 5

Deliverables

The deliverables of the project will be scientific publications and demonstrations of prototypes.