An Effective, Efficient, and Stable Framework for Query Clustering

March 6, 2024
Abstract

Yahoo! Trending Now lists the ten most trending user queries from Yahoo! Search. To generate the top trending queries, query clustering is an essential step: it aggregates similar queries into clusters, each representing an event or a topic. The existing framework is built on a heuristic clustering approach; it produces suboptimal results and lacks the ability to: 1) fully exploit semantic information in the news articles associated with user queries, and 2) account for changes in queries and news articles over consecutive timestamps. In this paper, we first introduce a two-stage query clustering approach that combines match-based grouping with distance-based clustering. This novel and effective solution significantly outperforms the existing production method. Furthermore, to address the high time complexity and the potential cluster fluctuations caused by temporal factors, we optimize the proposed approach by 1) using a caching mechanism that stores historical query features to improve computational efficiency, and 2) applying voting and rolling-average strategies at the time-window level, which yields smoother feature representations and more robust clustering outcomes. In offline evaluation, our integrated method is 20 times faster than the baseline and reduces cluster fluctuations by a factor of 15. These improvements considerably enhance the efficiency and stability of query clustering for Yahoo! Trending Now.
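
The caching, rolling-average, and voting ideas mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `QueryFeatureWindow`, its methods, the window size, and the assumption that cluster labels are comparable across timestamps are all hypothetical choices made for illustration.

```python
from collections import Counter, deque


class QueryFeatureWindow:
    """Hypothetical helper: caches the last `window` feature vectors and
    cluster labels per query, then smooths them over that window."""

    def __init__(self, window=3):
        self.window = window
        self._features = {}  # query -> deque of cached feature vectors
        self._labels = {}    # query -> deque of recent cluster labels

    def update(self, query, feature, label):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._features.setdefault(query, deque(maxlen=self.window)).append(list(feature))
        self._labels.setdefault(query, deque(maxlen=self.window)).append(label)

    def smoothed_feature(self, query):
        # Rolling average: element-wise mean of the cached vectors.
        vectors = self._features[query]
        n = len(vectors)
        return [sum(column) / n for column in zip(*vectors)]

    def voted_label(self, query):
        # Majority vote over the labels seen inside the window
        # (assumes labels are comparable across timestamps).
        return Counter(self._labels[query]).most_common(1)[0][0]


w = QueryFeatureWindow(window=2)
w.update("q", [1.0, 3.0], "a")
w.update("q", [3.0, 5.0], "b")
w.update("q", [5.0, 7.0], "b")
print(w.smoothed_feature("q"), w.voted_label("q"))
```

With a window of two, only the two most recent observations of a query contribute, so a single noisy timestamp has limited influence on the feature vector or the label that is reported.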

Publication Type
Paper
Conference / Journal Name
ICDE 2024

BibTeX


@inproceedings{
    author = {},
    title = {An Effective, Efficient, and Stable Framework for Query Clustering},
    booktitle = {Proceedings of ICDE 2024},
    year = {2024}
}