A short paper on “Calibration: A Simple Way to Improve Click Models” by Alexey Borisov, Julia Kiseleva, Ilya Markov, and Maarten de Rijke will be presented at the ACM International Conference on Information and Knowledge Management (CIKM 2018) in Turin, Italy, and will be published in the conference proceedings.
Click models are important and widely used tools for interpreting user behavior in Web search. As for many machine learning algorithms, their prediction performance strongly depends on the hyperparameters used for training. We show that click models trained with suboptimal hyperparameters are not well calibrated. This means that their predicted click probabilities do not agree with the observed proportions of clicks in the held-out data. We adapt a non-parametric calibration method called isotonic regression to repair the discrepancy between the click probabilities predicted by a model and the proportion of clicks in the held-out data. We show that isotonic regression significantly improves click models trained with suboptimal hyperparameters in terms of perplexity, and that calibrated click models are less sensitive to the choice of hyperparameters than the original (non-calibrated) ones. Interestingly, the relative ranking of existing click models in terms of their predictive performance changes depending on whether or not their predictions are calibrated. We therefore advocate that calibration becomes a mandatory part of the click model evaluation protocol.
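The calibration step described above can be illustrated with a minimal sketch. It assumes a list of predicted click probabilities and the matching observed clicks, and fits a non-decreasing step function with the pool-adjacent-violators algorithm, a standard way to compute isotonic regression. The function names, the toy data, and the single-number perplexity (computed over all observations rather than per rank) are illustrative assumptions, not the paper's implementation.

```python
import math
from bisect import bisect_left
from collections import defaultdict


def fit_isotonic(predicted, clicks):
    """Fit a non-decreasing step function from predicted click probability
    to observed click rate with the pool-adjacent-violators algorithm."""
    # Aggregate observations that share the same predicted probability.
    agg = defaultdict(lambda: [0.0, 0])
    for p, c in zip(predicted, clicks):
        agg[p][0] += c
        agg[p][1] += 1
    blocks = []  # each block: [click_sum, count, largest_prediction]
    for p in sorted(agg):
        click_sum, count = agg[p]
        blocks.append([click_sum, count, p])
        # Merge neighbouring blocks while their means violate monotonicity.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, n, upper = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
            blocks[-1][2] = upper
    uppers = [b[2] for b in blocks]
    means = [b[0] / b[1] for b in blocks]
    return uppers, means


def calibrate(p, model):
    """Map a raw prediction to the click rate of the block that covers it."""
    uppers, means = model
    i = min(bisect_left(uppers, p), len(means) - 1)
    return means[i]


def perplexity(probs, clicks, eps=1e-6):
    """Perplexity over all observations; lower is better, 1.0 is perfect."""
    total = 0.0
    for p, c in zip(probs, clicks):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += math.log2(p if c else 1.0 - p)
    return 2.0 ** (-total / len(clicks))


# Toy data (hypothetical): the model is overconfident at 0.8.
predicted = [0.1, 0.1, 0.4, 0.4, 0.8, 0.8, 0.8, 0.8]
clicks = [0, 0, 0, 1, 0, 1, 0, 1]

model = fit_isotonic(predicted, clicks)
calibrated = [calibrate(p, model) for p in predicted]
before = perplexity(predicted, clicks)
after = perplexity(calibrated, clicks)
```

For brevity the sketch fits and evaluates on the same toy data. In practice the calibration map would be fitted on held-out data that is kept separate from the test set.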
A full paper on “Constructing an interaction behavior model for web image search” by Xiaohui Xie, Jiaxin Mao, Maarten de Rijke, Ruizhe Zhang, Min Zhang, and Shaoping Ma was presented at the International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR 2018) in Ann Arbor, MI, USA, and was published in the conference proceedings.
User interaction behavior is a valuable source of implicit relevance feedback. In Web image search a different type of search result presentation is used than in general Web search, which leads to different interaction mechanisms and user behavior. For example, image search results are self-contained, so that users do not need to click the results to view the landing page as in general Web search, which generates sparse click data. Also, two-dimensional result placement instead of a linear result list makes browsing behaviors more complex. Thus, it is hard to apply standard user behavior models (e.g., click models) developed for general Web search to Web image search.
In this paper, we conduct a comprehensive image search user behavior analysis using data from a lab-based user study as well as data from a commercial search log. We then propose a novel interaction behavior model, called grid-based user browsing model (GUBM), whose design is motivated by observations from our data analysis. GUBM can both capture users’ interaction behavior, including cursor hovering, and alleviate position bias. The advantages of GUBM are two-fold: (1) It is based on an unsupervised learning method and does not need manually annotated data for training. (2) It is based on user interaction features on search engine result pages (SERPs) and is easily transferable to other scenarios that have a grid-based interface such as video search engines. We conduct extensive experiments to test the performance of our model using a large-scale commercial image search log. Experimental results show that in terms of behavior prediction (perplexity), and topical relevance and image quality (normalized discounted cumulative gain (NDCG)), GUBM outperforms state-of-the-art baseline models as well as the original ranking. We make the implementation of GUBM and related datasets publicly available for future studies.
A full paper on “A Click Sequence Model for Web Search” by Alexey Borisov, Martijn Wardenaar, Ilya Markov, and Maarten de Rijke was presented at the International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR 2018) in Ann Arbor, MI, USA, and was published in the conference proceedings.
Getting a better understanding of user behavior is important for advancing information retrieval systems. Existing work focuses on modeling and predicting single interaction events, such as clicks. In this paper, we focus for the first time on modeling and predicting sequences of interaction events and, in particular, sequences of clicks.
We formulate the problem of click sequence prediction and propose a click sequence model (CSM) that aims to predict the order in which a user will interact with search engine results. CSM is based on a neural network that follows the encoder-decoder architecture. The encoder computes contextual embeddings of the results. The decoder predicts the sequence of positions of the clicked results. It uses an attention mechanism to extract necessary information about the results at each timestep. We optimize the parameters of CSM by maximizing the likelihood of observed click sequences.
We test the effectiveness of CSM on three new tasks: (i) predicting click sequences, (ii) predicting the number of clicks, and (iii) predicting whether or not a user will interact with the results in the order these results are presented on a search engine result page (SERP). Also, we show that CSM achieves state-of-the-art results on a standard click prediction task, where the goal is to predict an unordered set of results a user will click on.
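The relation between these tasks can be sketched as follows: given a click sequence represented as the list of clicked positions in the order the clicks occurred, the targets of tasks (ii) and (iii) and of the standard click prediction task can all be derived from it. The function name and the convention that repeated or equal positions still count as "in order" are assumptions made for illustration.

```python
def describe_click_sequence(positions):
    """Derive task targets from a click sequence.

    `positions` lists the ranks of the clicked results in click order,
    e.g. [1, 3, 2] means rank 1 was clicked first, then rank 3, then rank 2.
    """
    return {
        # Task (ii): the number of clicks in the session.
        "num_clicks": len(positions),
        # Task (iii): were results clicked in the order they are presented?
        # Assumption: a non-decreasing sequence of ranks counts as in order.
        "in_serp_order": all(a <= b for a, b in zip(positions, positions[1:])),
        # Standard click prediction: the unordered set of clicked results.
        "clicked_set": set(positions),
    }


out_of_order = describe_click_sequence([1, 3, 2])
in_order = describe_click_sequence([2, 5])
```

Here `out_of_order["in_serp_order"]` is `False`, while both sessions reduce to plain sets of clicked ranks for the unordered task, which is the information a sequence model adds on top of a standard click model.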
A short paper on “Online Expectation-Maximization for Click Models” by Ilya Markov, Alexey Borisov, and Maarten de Rijke was presented at the ACM Conference on Information and Knowledge Management (CIKM 2017) in Singapore and was published in the conference proceedings.
Click models allow us to interpret user click behavior in search interactions and to remove various types of bias from user clicks. Existing studies on click models consider a static scenario where user click behavior does not change over time. We show empirically that click models deteriorate over time if retraining is avoided. We then adapt online expectation-maximization (EM) techniques to efficiently incorporate new click/skip observations into a trained click model. Our instantiation of Online EM for click models is orders of magnitude more efficient than retraining the model from scratch using standard EM, while losing little in quality. To deal with outdated click information, we propose a variant of online EM called EM with Forgetting, which surpasses the performance of complete retraining while being as efficient as Online EM.
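To make the idea of incremental updates concrete, here is a minimal sketch of a stepwise online EM update for a position-based click model, in which the click probability is the product of a document's attractiveness and the examination probability of its rank. The choice of click model, the class name `OnlinePBM`, the initial values, and the use of a constant step size to mimic forgetting are all assumptions for illustration and may differ from the paper's actual instantiation.

```python
from collections import defaultdict


class OnlinePBM:
    """Position-based click model updated one session at a time.

    step=None uses a decaying rate 1/(n+2), i.e. a running average over all
    observations. A constant step (e.g. 0.05) weights recent sessions more,
    which is one simple, assumed way to forget outdated clicks.
    """

    def __init__(self, max_rank=10, step=None):
        self.attr = defaultdict(lambda: 0.5)   # attractiveness per (query, doc)
        self.attr_n = defaultdict(int)         # observations per (query, doc)
        self.exam = [0.5] * max_rank           # examination per rank
        self.exam_n = [0] * max_rank           # observations per rank
        self.step = step

    def _rate(self, n):
        return self.step if self.step is not None else 1.0 / (n + 2)

    def click_prob(self, query, doc, rank):
        return self.attr[(query, doc)] * self.exam[rank]

    def update(self, query, docs, clicks):
        """Incorporate one session: ranked docs and their 0/1 clicks."""
        for rank, (doc, clicked) in enumerate(zip(docs, clicks)):
            key = (query, doc)
            a, g = self.attr[key], self.exam[rank]
            if clicked:
                # A click implies the result was both examined and attractive.
                post_a = post_e = 1.0
            else:
                # E-step: posteriors of the hidden variables given a skip.
                denom = max(1.0 - a * g, 1e-12)
                post_a = a * (1.0 - g) / denom
                post_e = g * (1.0 - a) / denom
            # Online M-step: move each parameter toward its posterior.
            self.attr[key] = a + self._rate(self.attr_n[key]) * (post_a - a)
            self.exam[rank] = g + self._rate(self.exam_n[rank]) * (post_e - g)
            self.attr_n[key] += 1
            self.exam_n[rank] += 1


model = OnlinePBM(max_rank=2)
model.update("q", ["d0", "d1"], [1, 0])
```

Each call to `update` touches only the parameters that appear in the new session, which is why this kind of update is much cheaper than rerunning batch EM over the full log.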
A short paper on “Evaluating and Analyzing Click Simulation in Web Search” by Stepan Malkevich, Ilya Markov, Elena Michailova, and Maarten de Rijke was presented at the ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR 2017) in Amsterdam and was published in the conference proceedings.
We evaluate and analyze the quality of click models with respect to their ability to simulate users’ click behavior. To this end, we propose distribution-based metrics for measuring the quality of click simulation in addition to metrics that directly compare simulated and real clicks. We perform a comparison of widely used click models in terms of the quality of click simulation and analyze this quality for queries with different frequencies. We find that click models fail to accurately simulate user clicks, especially when simulating sessions with no clicks and sessions with a click on the first position. We also find that click models with higher click prediction performance simulate clicks better than other models.
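As an illustration of what a distribution-based metric might look like, the sketch below compares real and simulated sessions through the distribution of the rank of the first click (with a separate outcome for sessions without clicks) and measures the gap with the Kullback-Leibler divergence. The specific statistic, the smoothing constant, the function names, and the toy sessions are assumptions chosen for illustration, not necessarily the metrics used in the paper.

```python
import math
from collections import Counter


def first_click_rank(session):
    """Rank of the first click in a 0/1 click vector, or None if no click."""
    for rank, clicked in enumerate(session):
        if clicked:
            return rank
    return None


def distribution(sessions, statistic, support, eps=1e-6):
    """Smoothed empirical distribution of a session statistic."""
    counts = Counter(statistic(s) for s in sessions)
    total = len(sessions) + eps * len(support)
    return {v: (counts[v] + eps) / total for v in support}


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same support."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p)


# Toy sessions (hypothetical): each list holds clicks on ranks 0..2.
real = [[0, 0, 0], [1, 0, 0], [1, 0, 0], [0, 1, 0]]
simulated = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]

support = [None, 0, 1, 2]
p_real = distribution(real, first_click_rank, support)
p_sim = distribution(simulated, first_click_rank, support)
gap = kl_divergence(p_real, p_sim)
```

A value of zero means the simulated sessions reproduce the real distribution of this statistic exactly. In the toy data the simulator never produces a session without clicks, so that outcome contributes heavily to the divergence.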