Talk: Removing bias from user interaction data at SEA meetup

Ilya Markov will give a talk on "Removing bias from user interaction data" at the Search Engines Amsterdam (SEA) meetup. The meetup will also feature talks by Diane Kelly (University of North Carolina at Chapel Hill) and Dolf Trieschnigg (MyDataFactory) and will host around 50 participants from academia and industry.

Abstract:
User interaction with search engines is affected by various biases. For example, users tend to click on top-ranked search results (position bias) and can be attracted by visually salient content such as images (attention bias). At the same time, user interaction data contains invaluable information about users and their interests and preferences in search, and is therefore heavily used by search engines to improve their quality. However, to use this interaction data reliably and to uncover actual user preferences, these biases must first be removed. This talk discusses the problem of bias in user interaction data and approaches to removing it. After a general discussion, the talk focuses on a particular type of user interaction, namely the time between user actions in search (e.g., time between clicks, time to first click, time between queries). We show that such times are context-biased, i.e., they are affected by the context in which they are observed (e.g., ranks of clicked documents, user search history). To remove the context bias, we model the time between user actions as a probability distribution whose parameters are composed of two components: context-dependent and context-independent. After learning these components using neural networks, we show that the context-aware model approximates the time between user actions significantly better than models that do not consider context. Moreover, by splitting the model into context-dependent and context-independent parts, we remove the context bias from the latter. As a result, we show that the context-independent component can be used to improve the quality of search.
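
To make the idea of splitting a time distribution into context-dependent and context-independent parts concrete, here is a minimal sketch. It is not the model from the talk: the abstract does not name the distribution and learns the components with neural networks, whereas this sketch assumes an exponential distribution whose mean is the product of a per-item scale and a per-context scale, and estimates both in closed form. The function name `fit_components`, the item and context labels, and the requirement that every item is observed in every context are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_components(observations):
    """Split observed times into item (context-independent) and context scales.

    observations: iterable of (item, context, time_in_seconds).
    Assumed model (hypothetical, for illustration only): times are
    exponentially distributed with mean item_scale[item] * context_scale[context].
    Assumes every item is observed in every context.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for item, context, seconds in observations:
        sums[(item, context)] += seconds
        counts[(item, context)] += 1

    # The sample mean is the maximum-likelihood estimate of an exponential mean.
    log_mean = {key: math.log(sums[key] / counts[key]) for key in sums}
    items = sorted({item for item, _ in log_mean})
    contexts = sorted({context for _, context in log_mean})

    # Additive decomposition in log space: log mean = item effect + context effect.
    grand = sum(log_mean.values()) / len(log_mean)
    item_effect = {
        i: sum(log_mean[(i, c)] for c in contexts) / len(contexts) for i in items
    }
    # Context effects are centred, so context scales have geometric mean 1.
    context_effect = {
        c: sum(log_mean[(i, c)] for i in items) / len(items) - grand
        for c in contexts
    }

    item_scale = {i: math.exp(v) for i, v in item_effect.items()}
    context_scale = {c: math.exp(v) for c, v in context_effect.items()}
    return item_scale, context_scale


# Toy data: two documents, each observed at two result ranks.
observations = [
    ("doc_a", "rank_1", 4.0), ("doc_a", "rank_1", 6.0),
    ("doc_a", "rank_2", 20.0),
    ("doc_b", "rank_1", 20.0),
    ("doc_b", "rank_2", 70.0), ("doc_b", "rank_2", 90.0),
]
item_scale, context_scale = fit_components(observations)
print(item_scale)     # context-independent part, comparable across ranks
print(context_scale)  # how much each rank stretches or shrinks the times
```

In this toy setting the item scale plays the role of the context-independent component: it can be compared across documents even though the raw times were observed at different ranks.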