A full paper on Generating descriptions of entity relationships by Nikos Voskarides, Edgar Meij and Maarten de Rijke was accepted for publication at the 39th European Conference on Information Retrieval. The paper will be presented in April in Aberdeen, Scotland.
Abstract: Large-scale knowledge graphs (KGs) store relationships between entities that are increasingly being used to improve the user experience in search applications. The structured nature of the data in KGs is typically not suitable to show to an end user and applications that utilize KGs therefore benefit from human-readable textual descriptions of KG relationships. We present a method that automatically generates textual descriptions of entity relationships by combining textual and KG information. Our method creates sentence templates for a particular relationship and then generates a textual description of a relationship instance by selecting the best template and filling it with appropriate entities. Experimental results show that a supervised variation of our method outperforms other variations as it best captures the semantic similarity between a relationship instance and a template, whilst providing more contextual information.
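The select-and-fill step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the template strings, the entity names, the context sentence and the token-overlap scorer are all hypothetical stand-ins, and the paper's supervised variation would replace the toy scorer with a learned model.

```python
def token_overlap(template, context):
    """Toy stand-in for a learned ranker: count shared lowercase tokens."""
    template_tokens = set(template.lower().replace(".", "").split())
    context_tokens = set(context.lower().replace(".", "").split())
    return len(template_tokens & context_tokens)


def describe(relationship, subject, obj, context, templates, score=token_overlap):
    """Select the best-scoring template for a relationship and fill its slots."""
    candidates = templates.get(relationship, [])
    if not candidates:
        return None  # no template exists for this relationship
    best = max(candidates, key=lambda tpl: score(tpl, context))
    return best.format(subject=subject, object=obj)


# Hypothetical templates, keyed by KG relationship.
TEMPLATES = {
    "spouse": [
        "{subject} is married to {object}.",
        "{subject} and {object} wed in a private ceremony.",
    ],
}

description = describe(
    "spouse", "Alice", "Bob",
    context="the couple wed in a private ceremony",
    templates=TEMPLATES,
)
print(description)  # Alice and Bob wed in a private ceremony.
```

Passing the scorer as a parameter keeps template selection separate from template filling, so a different similarity function can be swapped in without touching the rest of the code.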
Sabrina Sauer will present MediaNow at the ICT.OPEN conference on the 21st-22nd of March 2017. The ICT.OPEN event is organised annually by the Netherlands Organisation for Scientific Research (NWO) under the auspices of ICT research Platform Netherlands (IPN). The title of the presentation is MediaNow – using a living lab method to understand media professionals’ exploratory search.
On April 20-21, Sabrina Sauer will present a full paper at the conference Researching Media Companies Producing Audiovisual Content, hosted by Lillehammer University College, Department of Film and Television Studies. This event is sponsored by the International Association for Media and Communication Research’s Media Production Analysis working group.
Television broadcasters increasingly rely on digitized audiovisual material for reuse in the production of new audiovisual content. Access to existing material, for instance in online archives, and an expert understanding of how to find it quickly have changed the working practices of professionals creating online as well as broadcast television content. This paper focuses on how Dutch private and public media companies that produce cross-media content search for and reuse digitized archival material, and draws conclusions about how digital search technologies influence work routines and creative production processes. What do professionals perceive as affordances and constraints in their search process, and how does navigating these affordances and constraints shape their work practice?
To answer these questions, the paper presents qualitative research insights collected during focus group sessions and 20 semi-structured interviews with professionals who work for public and private companies in news, entertainment and documentary television production. The collected data focuses on work routines, specifically search practices, interactions with (online) archives and how these routines shape audiovisual content.
The analysis particularly reflects on professionals’ descriptions of how affordances and constraints such as time pressures, genre conventions, audience profiles, and technological and budgetary pressures shape the creative search and production process. Apart from clarifying how digital search technologies shape work routines and audiovisual content, the analysis suggests ways to further grasp the complex relationship between work routines, technological innovation and creativity.
The paper concludes with an overview of professionals’ search strategies to create audiovisual content, and how professionals see future developments in this area. Their future visions form the starting point – in an overarching research project – for the development of new search algorithms for a large Dutch audiovisual archive. This prompts a methodological discussion: how can academic analyses of creative media production strategies help shape new search technologies?
Sabrina Sauer gave a lunch talk at UvA’s ILPS group on the 7th of October about how innovators can channel serendipity and unforeseen user ideas into ICT innovations using the living lab approach.
Living labs are public-private-civic partnerships that facilitate user-centered ICT development in daily life environments. In living labs, prospective technology users are invited to join R&D processes. As experts of their daily life settings, they are believed to bring new, serendipitous and unforeseen ideas to the table. Yet user inclusion in innovation is regarded with some ambivalence exactly because of the uncertain outcomes. Based on research into user involvement in living labs, this talk offers six suggestions on how to successfully embrace this uncertainty and discusses opportunities of living labs and serendipity in IR.
A short paper by Sabrina Sauer and Maarten de Rijke on the role of serendipity in media professionals’ search practices was presented at the ACM International Conference on Research and Development in Information Retrieval (SIGIR 2016) in Pisa, Italy.
This paper presents a method to map user needs and integrate serendipitous search behaviors in search algorithm development: the living lab approach. This user-centered design approach involves technology users during technology development to catch unexpected insights and successfully innovate. This paper focuses on the preliminary findings of a living lab case study to answer the question of how this methodology reveals fine-grained information about users’ serendipitous search behaviors. The case study involves a specific user group, media professionals who work in broadcast television and use audiovisual archives to create audiovisual content, during the development of new search algorithms for a large audiovisual archive. Research insights are based on data gathered during one co-design workshop and ten in-depth semi-structured interviews with media professionals.
Findings suggest that these users balance socio-technical constraints and affordances during creative retrieval to (1) find exactly what is sought; and (2) increase the possibility of serendipitous, unforeseen search results. We conclude that modeling these search processes in terms of improvising with constraints and affordances enables an effective articulation and channeling of user-technology interaction insights into new technology development. The paper suggests next steps in the living lab approach to further understand serendipitous search and creative retrieval processes.
Ilya Markov, Aleksandr Chuklin, Maarten de Rijke and Alexey Borisov will give a course on Click Models for Web Search at the Russian Summer School in Information Retrieval (RuSSIR 2016). This course is an extended version of the tutorial given at SIGIR 2015, AINL-ISMW 2015 and WSDM 2016 and is based on the book on Click Models for Web Search.
Click models, probabilistic models of the interaction behavior of search engine users, have been studied extensively by the information retrieval community in recent years. We now have a handful of click models, parameter estimation methods, evaluation principles and applications of click models, that form the building blocks of ongoing research efforts in the area. The time is right to present this material to a broad audience of information retrieval researchers and practitioners.
The course covers a wide range of topics from basic to advanced click models and from click model estimation and evaluation techniques to applications of click models. Most topics are augmented with live demos, where the participants can try the presented material in practice. Also, the course features two practical sessions, where the participants have a chance to implement a basic and an advanced click model using open-source tools and publicly available datasets with click logs. The participants of the course are provided with the authors’ version of the book on click models and the code and data samples for following the demo and practical sessions.
The material of the course is organized as follows. We start with the definition of a click model and an overview of click model applications. Then we give a unified view on basic click models for web search using a common notation and theoretical background. We discuss the main estimation methods for learning click model parameters, present a set of evaluation techniques for measuring the quality of click models, discuss available datasets and tools for working with click models and present a comprehensive experimental comparison of basic click models for web search.
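As an illustration of what a basic click model, its parameter estimation and its evaluation can look like, here is a minimal sketch of a rank-based CTR model. The choice of model is ours and is not necessarily one used in the course; the click log is made up for the example. The model has one parameter per rank (the probability of a click at that rank), estimated by counting and evaluated with the average log-likelihood.

```python
import math
from collections import defaultdict


def fit_rank_ctr(sessions):
    """Maximum-likelihood estimate of the click probability at each rank.

    `sessions` is a list of click vectors, one 0/1 entry per rank.
    """
    clicks = defaultdict(int)
    shown = defaultdict(int)
    for session in sessions:
        for rank, clicked in enumerate(session, start=1):
            shown[rank] += 1
            clicks[rank] += clicked
    return {rank: clicks[rank] / shown[rank] for rank in shown}


def log_likelihood(model, sessions, eps=1e-6):
    """Average per-result log-likelihood of the observed clicks (higher is better)."""
    total, count = 0.0, 0
    for session in sessions:
        for rank, clicked in enumerate(session, start=1):
            # Clip probabilities so that log(0) cannot occur.
            p = min(max(model[rank], eps), 1.0 - eps)
            total += math.log(p if clicked else 1.0 - p)
            count += 1
    return total / count


# Made-up click log: four query sessions with three results each.
sessions = [[1, 0, 0], [1, 1, 0], [0, 0, 0], [1, 0, 1]]
model = fit_rank_ctr(sessions)
print(model)  # {1: 0.75, 2: 0.25, 3: 0.25}
print(log_likelihood(model, sessions))
```

More expressive click models add parameters that depend on the query and the document, which is why they generally need iterative estimation rather than simple counting.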
We then focus on the main applications of click models, such as ranking, user simulation, model-based metrics, etc. We also describe the landscape of advanced click models, dealing with complex SERPs, diverse users, non-linear examination patterns, etc. Finally, we present the current trends in click modeling research, namely using neural networks to model clicks and other types of user interactions. We conclude the course by discussing future research directions in the area of click models.
Ilya Markov will give a talk on Removing bias from user interaction data at the Search Engines Amsterdam meetup (SEA). The meetup will also feature talks by Diane Kelly (University of North Carolina at Chapel Hill) and Dolf Trieschnigg (MyDataFactory) and will host around 50 participants from academia and industry.
User interaction with search engines is affected by various biases. For example, users tend to click on top search results (position bias), they can be attracted by visually salient content such as images (attention bias), etc. At the same time, user interaction data contains invaluable information about users, their interests and preferences in search and, thus, is heavily used by search engines to improve their quality. However, to reliably use this interaction data and to uncover actual user preferences, various biases must be removed first. This talk discusses the problem of bias in user interaction data and approaches to removing this bias. After a general discussion, the talk focuses on a particular type of user interaction, namely time between user actions in search (e.g., time between clicks, time to first click, time between queries, etc.). We show that such times are context-biased, i.e., they are affected by the context in which they are observed (e.g., ranks of clicked documents, user search history, etc.). To remove the context bias, we model the time between user actions as a probability distribution. The parameters of this distribution are composed of two components: context-dependent and context-independent. After learning these components using neural networks, we show that the context-aware model approximates the time between user actions significantly better than models that do not consider context. Moreover, by splitting the model into context-dependent and context-independent parts we remove the context bias from the latter. As a result, we show that the context-independent component can be used to improve the quality of search.
A full paper on A Context-aware Time Model for Web Search by Alexey Borisov, Ilya Markov, Maarten de Rijke, and Pavel Serdyukov will be presented at the ACM International Conference on Research and Development in Information Retrieval (SIGIR 2016) in Pisa, Italy. The paper models the time between user actions in web search. It reveals that such times are affected by the context in which they are observed. The paper uses neural networks to automatically detect and remove various sources of context bias.
In web search, information about times between user actions has been shown to be a good indicator of users’ satisfaction with the search results. Existing work uses the mean values of the observed times, or fits probability distributions to the observed times. This implies a context-independence assumption that the time elapsed between a pair of user actions does not depend on the context in which the first action takes place. We validate this assumption using logs of a commercial web search engine and discover that it does not always hold. For between 37% and 80% of query-result pairs, depending on the number of observations, the distributions of click dwell times have statistically significant differences in query sessions for which a given result (i) is the first item to be clicked and (ii) is not the first. To account for this context bias effect, we propose a context-aware time model (CATM). The CATM allows us (i) to predict times between user actions in contexts in which these actions were not observed, and (ii) to compute context-independent estimates of the times by predicting them in predefined contexts. Our experimental results show that the CATM provides better means than existing methods to predict and interpret times between user actions.
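The kind of check behind the context-independence finding can be sketched as follows. The abstract does not name the statistical test that was used, so the permutation test on the difference of means below is our assumption, and the dwell times are invented for illustration. The sketch compares dwell times for one query-result pair in two contexts: sessions where the result was the first item clicked and sessions where it was not.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference of means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Invented dwell times in seconds for a single query-result pair.
first_click = [12, 15, 11, 14, 13, 16, 12, 15]      # result was clicked first
not_first_click = [30, 28, 35, 32, 29, 31, 34, 33]  # result was clicked later

p_value = permutation_p_value(first_click, not_first_click)
print(p_value < 0.05)  # True: the two contexts differ for this toy data
```

A small p-value for a pair means the two contexts produce different dwell-time distributions for that pair, which is the situation in which a context-independent estimate would be misleading.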
Aleksandr Chuklin, Ilya Markov, and Maarten de Rijke presented the tutorial on Click Models for Web Search and their Applications to IR at the ACM International Conference on Web Search and Data Mining (WSDM 2016) in San Francisco, CA, USA. The tutorial discusses the methods for modeling user clicks in web search and their applications in the area of information retrieval. The tutorial is based on the book Click Models for Web Search by the same authors.
A full paper on A neural click model for web search by Alexey Borisov, Ilya Markov, Maarten de Rijke, and Pavel Serdyukov was presented at the World Wide Web Conference (WWW 2016) in Montreal, Canada, and was published in the conference proceedings. The paper proposes a new user modeling paradigm, which uses neural networks to learn patterns of user click behavior automatically (instead of designing them manually as done in all previous studies).
Understanding user browsing behavior in web search is key to improving web search effectiveness. Many click models have been proposed to explain or predict user clicks on search engine results. They are based on the probabilistic graphical model (PGM) framework, in which user behavior is represented as a sequence of observable and hidden events. The PGM framework provides a mathematically solid way to reason about one set of events given some information about other events. But the structure of the dependencies between the events has to be set manually. Different click models use different hand-crafted sets of dependencies.
We propose an alternative based on the idea of distributed representations: to represent the user’s information need and the information available to the user with a vector state. The components of the vector state are learned to represent concepts that are useful for modeling user behavior. And user behavior is modeled as a sequence of vector states associated with one query session: the vector state is initialized with a query, and then iteratively updated based on information about interactions with the search engine results. This approach allows us to directly understand user browsing behavior from click-through data, i.e., without the need for a predefined set of rules as is customary for PGM-based click models.
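The vector-state idea can be made concrete with a small, untrained sketch. Everything below is an illustrative assumption rather than the architecture from the paper: the state size, the feature vectors, the random weights and the simple tanh update are chosen only to show a state that is initialized from the query and then updated result by result using the document and the previous interaction.

```python
import math
import random

DIM = 4         # size of the vector state (arbitrary for this sketch)
N_FEATURES = 3  # size of the hypothetical query and document feature vectors
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Random, untrained weights; a real model would learn these from click logs.
W_query = rand_matrix(DIM, N_FEATURES)      # query -> initial state
W_state = rand_matrix(DIM, DIM)             # previous state -> next state
W_input = rand_matrix(DIM, N_FEATURES + 1)  # document features + previous click
w_click = [rng.uniform(-0.5, 0.5) for _ in range(DIM)]


def predict_clicks(query, documents, clicks):
    """Return one click probability per result in a query session."""
    # The vector state is initialized with the query...
    state = [math.tanh(x) for x in matvec(W_query, query)]
    prev_click = 0.0
    probabilities = []
    for doc, click in zip(documents, clicks):
        # ...and updated with the next document and the previous interaction.
        inputs = doc + [prev_click]
        state = [
            math.tanh(s + i)
            for s, i in zip(matvec(W_state, state), matvec(W_input, inputs))
        ]
        logit = sum(w * s for w, s in zip(w_click, state))
        probabilities.append(1.0 / (1.0 + math.exp(-logit)))
        prev_click = float(click)
    return probabilities


query = [1.0, 0.0, 0.5]
documents = [[0.2, 0.1, 0.9], [0.7, 0.3, 0.0], [0.1, 0.8, 0.4]]
probs = predict_clicks(query, documents, clicks=[0, 1, 0])
print(probs)
```

Because the weights are random, the printed probabilities carry no meaning; the point is only the data flow, in which no dependencies between events are specified by hand.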
We illustrate our approach using a set of neural click models. Our experimental results show that the neural click model that uses the same training data as traditional PGM-based click models has better performance on the click prediction task (i.e., predicting user clicks on search engine results) and the relevance prediction task (i.e., ranking documents by their relevance to a query). An analysis of the best performing neural click model shows that it learns similar concepts to those used in traditional click models, and that it also learns other concepts that cannot be designed manually.