Cluster-Based Document Retrieval with Multiple Queries
The merits of using multiple queries representing the same information need to improve retrieval effectiveness have recently been demonstrated in several studies. In this paper we present the first study of utilizing multiple queries in cluster-based document retrieval; that is, using information induced from clusters of similar documents to rank documents. Specifically, we propose a conceptual framework of retrieval templates that can adapt cluster-based document retrieval methods, originally devised for a single query, to leverage multiple queries. The adaptations operate at the query, document list and similarity-estimate levels. Retrieval methods are instantiated from the templates by selecting, for example, the clustering algorithm and the cluster-based retrieval method. Empirical evaluation attests to the merits of the retrieval templates with respect to very strong baselines: state-of-the-art cluster-based retrieval with a single query and highly effective fusion of document lists retrieved for multiple queries. In addition, we present findings about the impact of the effectiveness of queries used to represent an information need on (i) cluster hypothesis test results, (ii) percentage of relevant documents in clusters of similar documents, and (iii) effectiveness of state-of-the-art cluster-based retrieval methods.
Joint work with Dr. Fiana Raiber (Yahoo Research) and Prof. J. Shane Culpepper (RMIT)