In this work we study how entities, which are semantically meaningful units that carry rich associated information, can be utilized in information retrieval (IR). We address two tasks: entity retrieval and ad-hoc document retrieval.
Entity retrieval is the task of ranking entities with respect to a user query. We propose a novel clustering-based method for entity ranking and show that it is highly effective. In addition, we explore the query performance prediction task for entity retrieval; that is, estimating retrieval effectiveness without relevance judgements.
Ad-hoc document retrieval is the classic IR task of ranking documents with respect to a user query. Traditionally, this task is addressed by comparing term-based representations of the query and the documents. We address the challenge of creating more semantically meaningful representations by proposing two novel language models that consider both the terms and the entities marked in the text. These models are used for retrieval within the language modeling framework. We show that using them significantly improves retrieval effectiveness compared with using terms (or entities) alone, as well as compared with a state-of-the-art term proximity retrieval method.
Finally, we propose two query expansion methods that utilize inter-entity similarities for ad-hoc document retrieval. We evaluate the retrieval effectiveness of these methods and demonstrate their considerable potential.