
WG4: Clustering techniques for document organisation and retrieval

Start date: Year 1, Month 7
Duration: Eleven months
Partner responsible: Genoa
Other partners: Valencia, Hyderabad
Exchange: 1 post-graduate (10 days), 1 expert (10 days)
Workshop: Genoa – Year 2, Month 6



When searching a document collection, it is often useful (for speed, efficiency, or understandability) to provide it with a structure. This is similar to what is commonly done in libraries, where books are sorted by categories and by year or alphabetical order to make access easier and faster.

In an electronic document collection, such a structure should be provided automatically, and may be based on several similarity criteria: contained terms, document structure, document category, or meaning of content. A popular structure is provided by grouping, or clustering. It is easier to search a collection, or to scan the results of a search, if documents are organised in clusters of items that are in some way similar to each other. Clustering can therefore be used either beforehand, to guide the automatic search process, or a posteriori, to organise the results presented to the user.
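As an illustration of grouping by contained terms, the following is a minimal sketch only. It assumes a bag-of-words representation, cosine similarity, and a simple single-pass "leader" scheme; the function names, the threshold value, and the sample documents are all hypothetical, and this is not one of the techniques studied by the workgroup.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for one document (lowercased word tokens)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def leader_clustering(docs, threshold=0.3):
    """Put each document in the first cluster whose leader is similar enough.

    Returns a list of clusters, each a list of document indices; the first
    index in each cluster is its leader. The threshold is an assumed value.
    """
    vectors = [term_vector(d) for d in docs]
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            # No existing leader is close enough: start a new cluster.
            clusters.append([i])
    return clusters


# Hypothetical sample collection: two topics, two documents each.
docs = [
    "fuzzy clustering of text documents",
    "clustering documents with fuzzy methods",
    "football match results and league tables",
    "league tables after the football match",
]
clusters = leader_clustering(docs)
print(clusters)
```

The same routine could be applied either to a whole collection or to the list of documents returned by a query, which corresponds to the two uses of clustering described above.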

Description of work

This workgroup focuses on research into novel clustering techniques and their application to the information retrieval task. The participants have proposed some original clustering algorithms, informally termed “Partially Possibilistic c-Means” and “Farthest Neighbour Divisive Clustering”. The first method is especially suitable for organising a document collection, whereas the second is useful for presenting search results sorted into a hierarchy of categories.
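This page does not define the algorithms named above, so the following is only a generic sketch of top-down (divisive) clustering that produces a hierarchy. It assumes that each split is seeded by the two mutually farthest items and that every other item joins the nearer seed; this is an illustrative guess at the general idea, not the participants' actual method. The function names, the `min_size` parameter, and the sample points are hypothetical.

```python
import math


def farthest_pair(items, dist):
    """Return the two items with the largest pairwise distance."""
    best, best_d = (items[0], items[1]), -1.0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            d = dist(items[i], items[j])
            if d > best_d:
                best_d, best = d, (items[i], items[j])
    return best


def divisive(items, dist, min_size=2):
    """Recursively split items into a nested binary hierarchy.

    Leaves are lists of items; internal nodes are (left, right) tuples.
    Assumption: each split is seeded by the two farthest items.
    """
    if len(items) <= min_size:
        return list(items)
    a, b = farthest_pair(items, dist)
    left = [x for x in items if dist(x, a) <= dist(x, b)]
    right = [x for x in items if dist(x, a) > dist(x, b)]
    if not right:
        # All items coincide, so no meaningful split exists.
        return list(items)
    return (divisive(left, dist, min_size), divisive(right, dist, min_size))


# Hypothetical 2-D points standing in for document feature vectors.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (11, 0)]
tree = divisive(points, math.dist)
print(tree)
```

The nested result can be read as a category tree: the top-level split separates the two distant groups, and further splits refine each group until it is small enough to display.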

The performance of these techniques is promising, so their application to the information retrieval task will be investigated. This will be done by means of a set of thorough experimental tests and by analysis of their theoretical properties.

Experts involved in this workgroup: 5


The deliverables of the project will be scientific publications and demonstrations of prototypes.