Ilya Markov, Aleksandr Chuklin, Maarten de Rijke and Alexey Borisov will give a course on Click Models for Web Search at the Russian Summer School in Information Retrieval (RuSSIR 2016). This course is an extended version of the tutorial given at SIGIR 2015, AINL-ISMW 2015 and WSDM 2016 and is based on the book Click Models for Web Search.
Click models, probabilistic models of the interaction behavior of search engine users, have been studied extensively by the information retrieval community in recent years. We now have a handful of click models, parameter estimation methods, evaluation principles and applications of click models, which form the building blocks of ongoing research efforts in the area. The time is right to present this material to a broad audience of information retrieval researchers and practitioners.
The course covers a wide range of topics from basic to advanced click models and from click model estimation and evaluation techniques to applications of click models. Most topics are augmented with live demos, where the participants can try the presented material in practice. Also, the course features two practical sessions, where the participants have a chance to implement a basic and an advanced click model using open-source tools and publicly available datasets with click logs. The participants of the course are provided with the authors’ version of the book on click models and the code and data samples for following the demo and practical sessions.
The material of the course is organized as follows. We start with the definition of a click model and an overview of click model applications. Then we give a unified view of basic click models for web search using a common notation and theoretical background. We discuss the main estimation methods for learning click model parameters, present a set of evaluation techniques for measuring the quality of click models, discuss available datasets and tools for working with click models and present a comprehensive experimental comparison of basic click models for web search.
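To make parameter estimation concrete, here is a minimal Python sketch of maximum likelihood estimation for one basic click model, the cascade model, in which a user scans results top-down, clicks the first attractive result and stops. The session format, function names and toy log are hypothetical and not taken from the course materials; the open-source tools used in the practical sessions may expose a different interface.

```python
from collections import defaultdict


def estimate_cascade_model(sessions):
    """Maximum likelihood attractiveness estimates under the cascade model.

    `sessions` is a list of (query, docs, clicks) tuples: `docs` lists document
    ids in rank order and `clicks` holds a 0/1 flag per rank (a hypothetical
    log format used only for this sketch).
    """
    clicked = defaultdict(int)
    examined = defaultdict(int)
    for query, docs, clicks in sessions:
        # The cascade model assumes the user examines every rank up to and
        # including the first click (all ranks if nothing was clicked).
        # Clicks after the first one cannot occur under the model and are ignored.
        first = clicks.index(1) if 1 in clicks else None
        last_examined = first if first is not None else len(docs) - 1
        for rank in range(last_examined + 1):
            examined[(query, docs[rank])] += 1
        if first is not None:
            clicked[(query, docs[first])] += 1
    # Attractiveness = clicks / examinations for each query-document pair.
    return {pair: clicked[pair] / examined[pair] for pair in examined}


def cascade_click_probabilities(attractiveness, query, docs):
    """P(click at rank r) = alpha_r * prod_{i<r} (1 - alpha_i)."""
    probs, no_click_yet = [], 1.0
    for doc in docs:
        # Unseen pairs default to 0.0 here; this is an assumption of the sketch.
        alpha = attractiveness.get((query, doc), 0.0)
        probs.append(no_click_yet * alpha)
        no_click_yet *= 1.0 - alpha
    return probs


sessions = [
    ("q", ["a", "b", "c"], [0, 1, 0]),
    ("q", ["a", "b", "c"], [0, 0, 0]),
    ("q", ["b", "a", "c"], [1, 0, 0]),
]
params = estimate_cascade_model(sessions)
print(params)  # a: 0.0, b: 2/3, c: 0.0
print(cascade_click_probabilities(params, "q", ["b", "a", "c"]))
```

Document b was examined three times and clicked twice, so its estimated attractiveness is 2/3; the closed-form count ratio is what makes the cascade model a convenient first example before moving to models that need EM.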
We then focus on the main applications of click models, such as ranking, user simulation, model-based metrics, etc. We also describe the landscape of advanced click models, dealing with complex SERPs, diverse users, non-linear examination patterns, etc. Finally, we present the current trends in click modeling research, namely using neural networks to model clicks and other types of user interactions. We conclude the course by discussing future research directions in the area of click models.
Ilya Markov will give a talk on Removing bias from user interaction data at the Search Engines Amsterdam meetup (SEA). The meetup will also feature talks by Diane Kelly (University of North Carolina at Chapel Hill) and Dolf Trieschnigg (MyDataFactory) and will host around 50 participants from academia and industry.
User interaction with search engines is affected by various biases. For example, users tend to click on top search results (position bias), they can be attracted by visually salient content such as images (attention bias), etc. At the same time, user interaction data contains invaluable information about users, their interests and preferences in search and, thus, is heavily used by search engines to improve their quality. However, to reliably use this interaction data and to uncover actual user preferences, various biases must be removed first. This talk discusses the problem of bias in user interaction data and approaches to removing this bias. After a general discussion, the talk focuses on a particular type of user interaction, namely time between user actions in search (e.g., time between clicks, time to first click, time between queries, etc.). We show that such times are context-biased, i.e., they are affected by the context in which they are observed (e.g., ranks of clicked documents, user search history, etc.). To remove the context bias, we model the time between user actions as a probability distribution. The parameters of this distribution are composed of two components: context-dependent and context-independent. After learning these components using neural networks, we show that the context-aware model approximates the time between user actions significantly better than models that do not consider context. Moreover, by splitting the model into context-dependent and context-independent parts, we remove the context bias from the latter. As a result, we show that the context-independent component can be used to improve the quality of search.
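The split into context-dependent and context-independent components can be illustrated with a deliberately simple model. The sketch below is not the neural model from the talk: it assumes an exponential distribution for the time between actions with rate exp(a_d + b_c), where a_d is a context-independent parameter per document and b_c a context-dependent parameter per context (fixed to 0 for a chosen reference context), and fits both by coordinate ascent on the log-likelihood. The function name, context labels and dwell times are hypothetical toy values.

```python
def fit_context_aware_exponential(observations, reference_context, n_iter=200):
    """Fit rate(d, c) = doc_rate[d] * ctx_factor[c] to (doc, context, time) triples.

    doc_rate[d] = exp(a_d) is the context-independent part; ctx_factor[c] = exp(b_c)
    is the context-dependent part, with ctx_factor[reference_context] fixed to 1.
    """
    docs = sorted({d for d, _, _ in observations})
    contexts = sorted({c for _, c, _ in observations})
    doc_rate = {d: 1.0 for d in docs}
    ctx_factor = {c: 1.0 for c in contexts}
    for _ in range(n_iter):
        # Setting the derivative of sum(log rate - rate * t) w.r.t. a_d to zero
        # gives exp(a_d) = n_d / sum(exp(b_c) * t) over the observations of d.
        for d in docs:
            rows = [(c, t) for od, c, t in observations if od == d]
            doc_rate[d] = len(rows) / sum(ctx_factor[c] * t for c, t in rows)
        # The same closed-form update for every non-reference context.
        for c in contexts:
            if c == reference_context:
                continue
            rows = [(d, t) for d, oc, t in observations if oc == c]
            ctx_factor[c] = len(rows) / sum(doc_rate[d] * t for d, t in rows)
    return doc_rate, ctx_factor


# Hypothetical dwell times (seconds) for two results, observed when the result
# was the first one clicked ("first") or clicked later in the session ("later").
observations = [
    ("A", "first", 10.0), ("A", "first", 10.0), ("A", "later", 20.0),
    ("B", "first", 30.0), ("B", "later", 60.0), ("B", "later", 60.0),
]
doc_rate, ctx_factor = fit_context_aware_exponential(observations, "first")
for d in sorted(doc_rate):
    print(d, "context-independent mean time:", round(1 / doc_rate[d], 3))
print("'later' multiplies the mean time by", round(1 / ctx_factor["later"], 3))
```

Predicting in the reference context (b_c = 0) gives context-independent mean times of 10 seconds for A and 30 for B on this toy data, whereas the raw means (about 13.3 and 50) are inflated by different shares of the slower "later" context. The neural approach described in the talk learns far richer context representations; the sketch only shows why separating the two components removes the bias.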
A full paper on A Context-aware Time Model for Web Search by Alexey Borisov, Ilya Markov, Maarten de Rijke, and Pavel Serdyukov will be presented at the ACM International Conference on Research and Development in Information Retrieval (SIGIR 2016) in Pisa, Italy. The paper models the time between user actions in web search. It reveals that such times are affected by the context in which they are observed. The paper uses neural networks to automatically detect and remove various sources of context bias.
In web search, information about times between user actions has been shown to be a good indicator of users’ satisfaction with the search results. Existing work uses the mean values of the observed times, or fits probability distributions to the observed times. This implies a context-independence assumption that the time elapsed between a pair of user actions does not depend on the context in which the first action takes place. We test this assumption using logs of a commercial web search engine and discover that it does not always hold. For 37% to 80% of query-result pairs, depending on the number of observations, the distributions of click dwell times have statistically significant differences in query sessions for which a given result (i) is the first item to be clicked and (ii) is not the first. To account for this context bias effect, we propose a context-aware time model (CATM). The CATM allows us (i) to predict times between user actions in contexts in which these actions were not observed, and (ii) to compute context-independent estimates of the times by predicting them in predefined contexts. Our experimental results show that the CATM provides better means than existing methods to predict and interpret times between user actions.
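The abstract does not name the test used to compare dwell-time distributions in the two contexts. Purely as an assumption for illustration, a two-sample Kolmogorov-Smirnov test is a standard way to compare two empirical distributions; the sketch below computes its statistic and the asymptotic p-value in plain Python. In practice one would rather call a library implementation such as SciPy's `scipy.stats.ks_2samp`. The function names and dwell times are hypothetical.

```python
import math


def ks_statistic(x, y):
    """Largest vertical distance between the empirical CDFs of samples x and y."""
    x, y = sorted(x), sorted(y)
    i = j = 0
    d = 0.0
    while i < len(x) and j < len(y):
        v = min(x[i], y[j])
        # Advance past all values <= v in both samples (this handles ties).
        while i < len(x) and x[i] <= v:
            i += 1
        while j < len(y) and y[j] <= v:
            j += 1
        d = max(d, abs(i / len(x) - j / len(y)))
    return d


def ks_pvalue(d, n1, n2, terms=100):
    """Asymptotic p-value Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    lam = math.sqrt(n1 * n2 / (n1 + n2)) * d
    if lam < 0.2:
        # Q(lam) is indistinguishable from 1 here and the series converges slowly.
        return 1.0
    q = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, terms + 1))
    return min(max(q, 0.0), 1.0)


# Hypothetical dwell times (seconds) of one query-result pair in two contexts.
first_click = [8.0, 12.0, 15.0, 20.0, 31.0]
not_first = [25.0, 40.0, 44.0, 60.0, 75.0]
d = ks_statistic(first_click, not_first)
print(d, ks_pvalue(d, len(first_click), len(not_first)))
```

With only five observations per context the asymptotic p-value is a rough approximation, which is one reason why the share of significant pairs in the abstract depends on the number of observations.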
Aleksandr Chuklin, Ilya Markov, and Maarten de Rijke presented the tutorial on Click Models for Web Search and their Applications to IR at the ACM International Conference on Web Search and Data Mining (WSDM 2016) in San Francisco, CA, USA. The tutorial discusses the methods for modeling user clicks in web search and their applications in the area of information retrieval. The tutorial is based on the book Click Models for Web Search by the same authors.
A full paper on A neural click model for web search by Alexey Borisov, Ilya Markov, Maarten de Rijke, and Pavel Serdyukov was presented at the World Wide Web Conference (WWW 2016) in Montreal, Canada, and was published in the conference proceedings. The paper proposes a new user modeling paradigm, which uses neural networks to learn patterns of user click behavior automatically (instead of designing them manually as done in all previous studies).
Understanding user browsing behavior in web search is key to improving web search effectiveness. Many click models have been proposed to explain or predict user clicks on search engine results. They are based on the probabilistic graphical model (PGM) framework, in which user behavior is represented as a sequence of observable and hidden events. The PGM framework provides a mathematically solid way to reason about one set of events given some information about other events. But the structure of the dependencies between the events has to be set manually. Different click models use different hand-crafted sets of dependencies.
We propose an alternative based on the idea of distributed representations: to represent the user’s information need and the information available to the user with a vector state. The components of the vector state are learned to represent concepts that are useful for modeling user behavior. And user behavior is modeled as a sequence of vector states associated with one query session: the vector state is initialized with a query, and then iteratively updated based on information about interactions with the search engine results. This approach allows us to directly understand user browsing behavior from click-through data, i.e., without the need for a predefined set of rules as is customary for PGM-based click models.
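The vector-state idea can be sketched as a small recurrent network. The code below is a hypothetical, untrained toy in plain Python, not the architecture from the paper: the state is initialized from a query vector, updated at each rank from the current document vector and the click observed at the previous rank, and mapped to a click probability with a sigmoid. Class name, dimensions, weights and input vectors are arbitrary assumptions; in a real model the weights would be learned from click logs.

```python
import math
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ToyNeuralClickModel:
    """Untrained sketch of a vector-state click model (random weights)."""

    def __init__(self, query_dim, doc_dim, state_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_query = mat(state_dim, query_dim)   # query -> initial state
        self.w_state = mat(state_dim, state_dim)   # previous state -> new state
        self.w_doc = mat(state_dim, doc_dim)       # current document -> new state
        self.w_click = [rng.uniform(-0.5, 0.5) for _ in range(state_dim)]  # previous click
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(state_dim)]    # state -> logit

    def click_probabilities(self, query, docs, clicks):
        # Initialize the vector state from the query representation.
        state = [math.tanh(x) for x in matvec(self.w_query, query)]
        prev_click = 0
        probs = []
        for doc, click in zip(docs, clicks):
            # Update the state with the current document and the previous click.
            pre = [s + d + c * prev_click for s, d, c in
                   zip(matvec(self.w_state, state), matvec(self.w_doc, doc), self.w_click)]
            state = [math.tanh(x) for x in pre]
            probs.append(sigmoid(sum(w * s for w, s in zip(self.w_out, state))))
            prev_click = click
        return probs


model = ToyNeuralClickModel(query_dim=3, doc_dim=4, state_dim=5)
query = [0.1, 0.7, 0.2]
docs = [[1, 0, 0, 1], [0, 1, 1, 0], [0.5, 0.5, 0, 0]]
print(model.click_probabilities(query, docs, clicks=[1, 0, 0]))
```

Because the observed click at rank r only enters the state at rank r + 1, the model predicts each click from the query, the documents seen so far and the earlier clicks, without any hand-crafted dependency structure.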
We illustrate our approach using a set of neural click models. Our experimental results show that the neural click model that uses the same training data as traditional PGM-based click models has better performance on the click prediction task (i.e., predicting user clicks on search engine results) and the relevance prediction task (i.e., ranking documents by their relevance to a query). An analysis of the best performing neural click model shows that it learns concepts similar to those used in traditional click models, and that it also learns other concepts that cannot be designed manually.
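Click prediction is commonly evaluated in the click model literature with perplexity; the abstract does not name its metric, so this is an assumption for illustration. For rank r, perplexity is 2 raised to minus the average log2-likelihood of the observed clicks at that rank: lower is better, and a model that always predicts 0.5 scores exactly 2. The function names and inputs below are hypothetical.

```python
import math


def perplexity_by_rank(predicted, observed, eps=1e-10):
    """predicted[s][r]: click probability for session s at rank r; observed[s][r]: 0/1 click."""
    n_ranks = len(predicted[0])
    result = []
    for r in range(n_ranks):
        total = 0.0
        for probs, clicks in zip(predicted, observed):
            # Clamp to avoid log(0) for overconfident predictions.
            p = min(max(probs[r], eps), 1.0 - eps)
            total += math.log2(p) if clicks[r] else math.log2(1.0 - p)
        result.append(2 ** (-total / len(predicted)))
    return result


def overall_perplexity(predicted, observed):
    """Average of the per-rank perplexities."""
    per_rank = perplexity_by_rank(predicted, observed)
    return sum(per_rank) / len(per_rank)


observed = [[1, 0], [0, 1]]
print(overall_perplexity([[0.5, 0.5], [0.5, 0.5]], observed))  # 2.0
print(overall_perplexity([[0.9, 0.1], [0.1, 0.9]], observed))  # below 2.0
```

Reporting perplexity per rank as well as on average makes it visible whether a model's gains come from top positions, where most clicks happen, or from lower ones.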
The first stage of the project is well underway, and so are the first insights! Sabrina Sauer will present insights about the creative retrieval practices of media professionals at the International Association for Media and Communication Research conference in Leicester this July.
Media professionals such as news editors, image researchers, and documentary filmmakers increasingly rely on online access to digital content within audiovisual archives to create stories (Huurnink, Hollink, and De Rijke). Seeking and finding audiovisual sources therefore requires an in-depth knowledge of how to find sources digitally. This paper presents qualitative research insights into how media professionals search and use digital archives to create (trans)medial narratives. In these storytelling practices, production cultures, search technologies and user ideas intertwine. The paper proposes to unravel the dynamics of story production, using the notion of creative retrieval. The term combines ideas from media studies about the effects of media convergence on media content (Erdal, “Researching Media Convergence and Crossmedia News Production – Mapping the Field”), theories about serendipitous information retrieval (Toms), and anthropological studies of creativity (Hallam and Ingold). The paper furthermore exemplifies an ongoing research project in which, to support creative retrieval by media professionals, a user-centered design approach guides the development of new search technologies: open source self-learning search algorithms.
This paper specifically highlights the role of user-technology interactions within the media production process. Research outcomes are theoretically and methodologically based on the recognition that a focus on media users is key to understanding how media technologies gain shape and meaning. This view, developed by Science and Technology Studies (Oudshoorn and Pinch; Silverstone and Haddon), also forms the basis of the research’s qualitative user-centered design approach; media professionals are involved in co-design workshops and semi-structured interviews to better understand their search culture and to iteratively build new search algorithms that accommodate audiovisual storytelling needs.
Sabrina Sauer presented research insights into how media professionals use audiovisual archives to create audiovisual narratives during the CHIIR 2016 (Chapel Hill, North Carolina, 13-17 March) workshop “The Serendipity Factor: Evaluating the Affordances of Digital Environments”. The goal of the workshop was to bring together ideas and methods used to understand and facilitate serendipitous search within digital environments. For media professionals, part of creative storytelling depends on serendipitous findings within archives. For more information about the workshop and this topic, please visit the workshop’s website.
Ilya Markov was invited to give a tutorial on user behavior in web search at the AINL-ISMW FRUCT 2015 conference for young scientists, held in St. Petersburg, Russia, on November 9-14. The tutorial discussed how user clicks in web search can be analyzed, interpreted and modeled and how the resulting click models help to improve web search. The information and materials on click models can be found at the dedicated web site.
“Learning to Explain Entity Relationships in Knowledge Graphs”, an ACL 2015 paper authored by Nikos Voskarides, Edgar Meij, Manos Tsagkias, Maarten de Rijke and Wouter Weerkamp, will be presented at the Dutch-Belgian Information Retrieval (DIR) workshop on November 27th 2015.
We study the problem of explaining relationships between pairs of knowledge graph entities with human-readable descriptions. Our method extracts and enriches sentences that refer to an entity pair from a corpus and ranks the sentences according to how well they describe the relationship between the entities. We model this task as a learning to rank problem for sentences and employ a rich set of features. When evaluated on a large set of manually annotated sentences, we find that our method significantly improves over state-of-the-art baseline models.
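The abstract frames sentence selection as a learning to rank problem. As a generic illustration only (the paper's actual features and learning algorithm are not described here), the sketch below trains a linear scoring function with a pairwise perceptron: whenever a sentence with a higher relevance label does not score above one with a lower label, the weights move toward the difference of their feature vectors. The two features, the labels and the function names are hypothetical.

```python
def train_pairwise_perceptron(features, labels, epochs=100):
    """Learn weights w so that w . x_i > w . x_j whenever labels[i] > labels[j]."""
    dim = len(features[0])
    w = [0.0] * dim
    pairs = [(i, j) for i in range(len(labels)) for j in range(len(labels))
             if labels[i] > labels[j]]
    for _ in range(epochs):
        mistakes = 0
        for i, j in pairs:
            diff = [a - b for a, b in zip(features[i], features[j])]
            if sum(wk * dk for wk, dk in zip(w, diff)) <= 0:
                w = [wk + dk for wk, dk in zip(w, diff)]
                mistakes += 1
        if mistakes == 0:  # every preference pair is ordered correctly
            break
    return w


def rank(features, w):
    """Indices of candidate sentences sorted by descending score."""
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in features]
    return sorted(range(len(features)), key=lambda k: -scores[k])


# Hypothetical features per candidate sentence:
# [mentions both entities (0/1), overlap with relationship keywords (0-1)]
features = [[1.0, 0.2], [0.0, 0.9], [1.0, 0.8], [0.0, 0.1]]
labels = [2, 1, 3, 0]  # hypothetical relevance grades from annotators
w = train_pairwise_perceptron(features, labels)
print(w, rank(features, w))
```

A pairwise objective only needs relative judgments between sentences for the same entity pair, which matches how manually annotated sentences are typically graded; a production system would use a stronger learner and a much richer feature set, as the abstract indicates.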
The MediaNow project has officially kicked off. Today, the team and partners met to discuss the project’s plans and share ideas about MediaNow. After all, how do media professionals, retrieval specialists, R&D professionals and technical experts see the future of creative retrieval and self-learning search algorithms?
After a round of introductions and presentations by the research team, the group got to work and collectively discussed the current needs of media professionals when it comes to audio-visual retrieval. Ideas for immediate feature requests surfaced, as did more meta and multi-perspectival “the sky is the limit” conceptualizations of what an ideal retrieval experience would look like. And questions surfaced: is it possible to build an algorithm that makes creative associations and storylines between entities?
The morning ended with short presentations by the partners about who they felt “the media professional” is, and what the current and future needs and characteristics of this professional are. The kick-off has provided a lot of food for thought for the team!