Comparison of Search-based and Topic-based relevance ranking for online search results

Peter Noerr

Notification

We are sorry to announce that this presentation was cancelled.

Conventionally, search results are ranked by an algorithm that measures the “goodness of fit” between the user’s search and each document. Typically this aggregates the number of terms that match between the search and the result document, with more matches taken to indicate higher relevance.
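
As a rough illustration of this kind of match counting, the following Python sketch scores each document by how often the distinct search terms occur in it. The function names and the simple tokenizer are assumptions made for the example; they are not taken from any particular search engine.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_score(query, document):
    """Sum, over the distinct query terms, how often each occurs in the document."""
    counts = Counter(tokenize(document))
    return sum(counts[term] for term in set(tokenize(query)))


def rank(query, documents):
    """Return the documents ordered from highest to lowest match score."""
    return sorted(documents, key=lambda d: match_score(query, d), reverse=True)
```

With a two-term search, most short records can only score a handful of distinct values, so many results tie.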

Various alternative schemes are in use, most notably those of the web search engines, which rank mainly on the “popularity” of the result. However, these methods cannot be applied to corpora of scientific literature, because there is little linking between the documents.

Existing search-based relevance ranking methods suffer from a paucity of data from which to compute relevance scores: the search supplies only a few terms to compare against the documents.

Topic-based ranking uses a Topic of Interest, built up about the user, to augment the sparse terms from the search. It also enriches the result documents with their abstracts or similar text. Together these provide a much larger set of terms to which the scoring algorithm can be applied, giving a more graduated and sensible ranking.
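
The idea can be sketched in Python as follows. This is a minimal illustration, not the Muse implementation: the representation of a Topic as a dictionary of weighted terms, the weight values, and the name `topic_score` are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def topic_score(query, topic, title, abstract=""):
    """Score a document, enriched with its abstract, against query plus Topic terms.

    `topic` maps each Topic term to a weight (an assumed representation).
    """
    doc_counts = Counter(tokenize(title + " " + abstract))
    weights = dict(topic)
    for term in tokenize(query):
        # Assumption: terms typed by the user count at full weight.
        weights[term] = max(weights.get(term, 0.0), 1.0)
    return sum(weight * doc_counts[term] for term, weight in weights.items())
```

Two records with the same title tie on the search terms alone, but once the abstract and the Topic terms are included they can receive different scores.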

The mechanisms built into the Muse federated search engine let the user score individual documents and have that action re-rank the rest of the results, so that documents like the good one float up and documents like the bad one sink. Thresholds can be used to remove the worst-matching documents automatically, and tools let the user maintain multiple Topics and apply them as needed. Topics are created automatically but can be edited by users.
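
One way such feedback-driven re-ranking could work is sketched below in Python. It assumes, purely for illustration, that document similarity is the cosine of raw term-count vectors and that a user rating of +1 or -1 shifts every score in proportion to that similarity; the actual Muse mechanism is not described here and may differ.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Build a term-count vector from the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rerank(scores, texts, rated_id, rating, threshold=None):
    """Shift each score by rating * similarity to the rated document.

    scores: dict of document id -> current score
    texts: dict of document id -> document text
    rating: +1 for a good document, -1 for a bad one (assumed scale)
    threshold: if given, documents scoring below it are dropped
    """
    rated = term_vector(texts[rated_id])
    adjusted = {
        doc_id: score + rating * cosine(rated, term_vector(texts[doc_id]))
        for doc_id, score in scores.items()
    }
    if threshold is not None:
        adjusted = {d: s for d, s in adjusted.items() if s >= threshold}
    return sorted(adjusted, key=adjusted.get, reverse=True)
```

Rating a document as bad lowers it and its near neighbours together, and a threshold then removes them from the list without further action by the user.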

This paper describes the basics of Topic-based ranking and gives results for searches conducted with and without it, showing the improvement obtained at or near the top of the results list when the method is used. A demonstration of the method will be given and made available during the conference for delegates to try.

Editor: Milan Janíček
Last modified: 28.4. 2011 14:04  
Contact: +420 232 002 515, milan.janicek@techlib.cz