A short paper on “Calibration: A Simple Way to Improve Click Models” by Alexey Borisov, Julia Kiseleva, Ilya Markov, and Maarten de Rijke will be presented at the ACM International Conference on Information and Knowledge Management (CIKM 2018) in Turin, Italy, and will be published in the conference proceedings.
Click models are important and widely used tools for interpreting user behavior in Web search. As with many machine learning algorithms, their prediction performance strongly depends on the hyperparameters used for training. We show that click models trained with suboptimal hyperparameters are not well calibrated. This means that their predicted click probabilities do not agree with the observed proportions of clicks in the held-out data. We adapt a non-parametric calibration method called isotonic regression to repair the discrepancy between the click probabilities predicted by a model and the proportion of clicks in the held-out data. We show that isotonic regression significantly improves click models trained with suboptimal hyperparameters in terms of perplexity, and that calibrated click models are less sensitive to the choice of hyperparameters than the original (non-calibrated) ones. Interestingly, the relative ranking of existing click models in terms of their predictive performance changes depending on whether or not their predictions are calibrated. We therefore advocate that calibration become a mandatory part of the click model evaluation protocol.
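To make the idea of calibration concrete, here is a minimal sketch of isotonic regression (via the pool-adjacent-violators algorithm) together with a perplexity computation. It is an illustration under simplifying assumptions, not the implementation from the paper: the function names are hypothetical, perplexity is computed over a single pool of observations rather than per rank, and the toy data is calibrated and evaluated on the same sample, whereas in practice the mapping would be fitted on separate held-out data.

```python
import math

def fit_isotonic(preds, clicks):
    """Fit a non-decreasing step function from predicted click
    probabilities to observed click rates (pool adjacent violators)."""
    # Group observations that share the same predicted probability.
    totals = {}
    for p, c in zip(preds, clicks):
        s, n = totals.get(p, (0.0, 0))
        totals[p] = (s + c, n + 1)
    # Each block holds [sum of clicks, count, largest prediction in block].
    blocks = []
    for p in sorted(totals):
        s, n = totals[p]
        blocks.append([s, n, p])
        # Merge backwards while the click rate would decrease.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s2, n2, hi = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += n2
            blocks[-1][2] = hi
    return [(hi, s / n) for s, n, hi in blocks]

def calibrate(model, p):
    """Map a raw predicted probability to its calibrated value."""
    for hi, rate in model:
        if p <= hi:
            return rate
    return model[-1][1]  # above the largest prediction seen in fitting

def perplexity(probs, clicks, eps=1e-6):
    """2 ** (negative mean log2-likelihood); lower is better, 1 is perfect."""
    ll = 0.0
    for q, c in zip(probs, clicks):
        q = min(max(q, eps), 1 - eps)  # avoid log(0)
        ll += math.log2(q) if c else math.log2(1 - q)
    return 2 ** (-ll / len(clicks))

# Toy data: the model predicts 0.8 where only half the results are clicked.
preds = [0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8]
clicks = [0, 0, 0, 0, 1, 0, 0, 1]
model = fit_isotonic(preds, clicks)
calibrated = [calibrate(model, p) for p in preds]
print(perplexity(preds, clicks), perplexity(calibrated, clicks))
```

On this toy sample the calibrated predictions have lower perplexity than the raw ones, which is the kind of effect the abstract describes for models trained with suboptimal hyperparameters.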
A full paper on “Constructing an interaction behavior model for web image search” by Xiaohui Xie, Jiaxin Mao, Maarten de Rijke, Ruizhe Zhang, Min Zhang, and Shaoping Ma was presented at the International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR 2018) in Ann Arbor, MI, USA, and was published in the conference proceedings.
User interaction behavior is a valuable source of implicit relevance feedback. In Web image search a different type of search result presentation is used than in general Web search, which leads to different interaction mechanisms and user behavior. For example, image search results are self-contained, so users do not need to click a result to view its landing page as they do in general Web search; as a result, click data is sparse. Also, the two-dimensional placement of results, instead of a linear result list, makes browsing behavior more complex. Thus, it is hard to apply standard user behavior models (e.g., click models) developed for general Web search to Web image search.
In this paper, we conduct a comprehensive image search user behavior analysis using data from a lab-based user study as well as data from a commercial search log. We then propose a novel interaction behavior model, called grid-based user browsing model (GUBM), whose design is motivated by observations from our data analysis. GUBM can both capture users’ interaction behavior, including cursor hovering, and alleviate position bias. The advantages of GUBM are two-fold: (1) It is based on an unsupervised learning method and does not need manually annotated data for training. (2) It is based on user interaction features on search engine result pages (SERPs) and is easily transferable to other scenarios that have a grid-based interface such as video search engines. We conduct extensive experiments to test the performance of our model using a large-scale commercial image search log. Experimental results show that in terms of behavior prediction (perplexity), and topical relevance and image quality (normalized discounted cumulative gain (NDCG)), GUBM outperforms state-of-the-art baseline models as well as the original ranking. We make the implementation of GUBM and related datasets publicly available for future studies.
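For readers unfamiliar with NDCG, the following is a minimal sketch of one common variant of the metric (exponential gain, logarithmic discount). Which variant the paper uses, and how the grid layout is linearized into a ranking, is not stated in the abstract, so both are assumptions here; the function names are illustrative.

```python
import math

def dcg(relevances, k=None):
    """Discounted cumulative gain of a ranked list of graded relevance labels."""
    ranked = relevances[:k] if k is not None else relevances
    # Position i (0-based) is discounted by log2(i + 2).
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(ranked))

def ndcg(relevances, k=None):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

print(ndcg([3, 2, 1]))  # already ideally ordered
print(ndcg([1, 2, 3]))  # best result ranked last
```

A ranking that places the most relevant results first scores 1.0, and any other ordering of the same results scores lower.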
A full paper on “A Click Sequence Model for Web Search” by Alexey Borisov, Martijn Wardenaar, Ilya Markov, and Maarten de Rijke was presented at the International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR 2018) in Ann Arbor, MI, USA, and was published in the conference proceedings.
Getting a better understanding of user behavior is important for advancing information retrieval systems. Existing work focuses on modeling and predicting single interaction events, such as clicks. In this paper, we focus for the first time on modeling and predicting sequences of interaction events, in particular sequences of clicks.
We formulate the problem of click sequence prediction and propose a click sequence model (CSM) that aims to predict the order in which a user will interact with search engine results. CSM is based on a neural network that follows the encoder-decoder architecture. The encoder computes contextual embeddings of the results. The decoder predicts the sequence of positions of the clicked results. It uses an attention mechanism to extract necessary information about the results at each timestep. We optimize the parameters of CSM by maximizing the likelihood of observed click sequences.
We test the effectiveness of CSM on three new tasks: (i) predicting click sequences, (ii) predicting the number of clicks, and (iii) predicting whether or not a user will interact with the results in the order these results are presented on a search engine result page (SERP). Also, we show that CSM achieves state-of-the-art results on a standard click prediction task, where the goal is to predict an unordered set of results a user will click on.
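The relation between the tasks listed above can be illustrated with a small sketch that derives each prediction target from a click sequence. This is not CSM itself; it only shows what is being predicted. A click sequence is assumed to be a list of 1-based result positions in the order they were clicked, and "in the order presented" is assumed to mean non-decreasing positions; both are assumptions for illustration.

```python
def num_clicks(click_sequence):
    """Target for task (ii): how many clicks the sequence contains."""
    return len(click_sequence)

def is_in_presented_order(click_sequence):
    """Target for task (iii): were results clicked top to bottom?
    Assumption: a repeated click on the same result does not break the order."""
    return all(a <= b for a, b in zip(click_sequence, click_sequence[1:]))

def clicked_set(click_sequence):
    """Target for the standard click prediction task: the unordered clicks."""
    return set(click_sequence)

sequence = [3, 1, 4]  # user clicked result 3, then 1, then 4
print(num_clicks(sequence), is_in_presented_order(sequence), clicked_set(sequence))
```

The sequence itself is the target for task (i); the other targets discard part of its information, which is why a model that predicts sequences can also be evaluated on them.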
A short paper on “Online Expectation-Maximization for Click Models” by Ilya Markov, Alexey Borisov, and Maarten de Rijke was presented at the ACM Conference on Information and Knowledge Management (CIKM 2017) in Singapore and was published in the conference proceedings.
Click models allow us to interpret user click behavior in search interactions and to remove various types of bias from user clicks. Existing studies on click models consider a static scenario where user click behavior does not change over time. We show empirically that click models deteriorate over time if retraining is avoided. We then adapt online expectation-maximization (EM) techniques to efficiently incorporate new click/skip observations into a trained click model. Our instantiation of Online EM for click models is orders of magnitude more efficient than retraining the model from scratch using standard EM, while losing little in quality. To deal with outdated click information, we propose a variant of online EM called EM with Forgetting, which surpasses the performance of complete retraining while being as efficient as Online EM.
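The general idea of incremental EM updates with forgetting can be sketched as follows for a simple position-based click model, where the click probability is the product of a document's attractiveness and a rank's examination probability. This is a sketch under assumptions, not the update rule from the paper: the class name, the pseudo-count initialization, and the use of an exponential decay factor to down-weight old observations are all illustrative choices.

```python
from collections import defaultdict

class OnlinePBM:
    """Position-based model with per-observation EM updates.
    decay=1.0 keeps all history; decay<1.0 gradually forgets old clicks."""

    def __init__(self, decay=1.0, prior=0.5):
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        # Sufficient statistics [expected positives, observation count],
        # started with one pseudo-observation at the prior value.
        self.attr = defaultdict(lambda: [prior, 1.0])  # per document
        self.exam = defaultdict(lambda: [prior, 1.0])  # per rank

    def click_prob(self, doc, rank):
        a = self.attr[doc][0] / self.attr[doc][1]
        g = self.exam[rank][0] / self.exam[rank][1]
        return a * g

    def update(self, doc, rank, clicked):
        a = self.attr[doc][0] / self.attr[doc][1]
        g = self.exam[rank][0] / self.exam[rank][1]
        if clicked:
            # A click implies the result was both examined and attractive.
            post_a = post_g = 1.0
        else:
            # E-step: posteriors of the hidden variables given a skip.
            denom = 1.0 - a * g
            post_a = a * (1.0 - g) / denom
            post_g = g * (1.0 - a) / denom
        # M-step on decayed statistics: old evidence shrinks by `decay`.
        for stats, post in ((self.attr[doc], post_a), (self.exam[rank], post_g)):
            stats[0] = self.decay * stats[0] + post
            stats[1] = self.decay * stats[1] + 1.0

# User behavior changes: a result is clicked 50 times, then skipped 50 times.
static, forgetting = OnlinePBM(decay=1.0), OnlinePBM(decay=0.9)
for clicked in [True] * 50 + [False] * 50:
    static.update("doc", 1, clicked)
    forgetting.update("doc", 1, clicked)
print(static.click_prob("doc", 1), forgetting.click_prob("doc", 1))
```

Each update touches only the statistics of the observed document and rank, which is what makes this style of update cheap compared to retraining on the full log; the decayed variant adapts faster once behavior changes.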
A short paper on “Evaluating and Analyzing Click Simulation in Web Search” by Stepan Malkevich, Ilya Markov, Elena Michailova, and Maarten de Rijke was presented at the ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR 2017) in Amsterdam and was published in the conference proceedings.
We evaluate and analyze the quality of click models with respect to their ability to simulate users’ click behavior. To this end, we propose distribution-based metrics for measuring the quality of click simulation in addition to metrics that directly compare simulated and real clicks. We perform a comparison of widely-used click models in terms of the quality of click simulation and analyze this quality for queries with different frequencies. We find that click models fail to accurately simulate user clicks, especially when simulating sessions with no clicks and sessions with a click on the first position. We also find that click models with higher click prediction performance simulate clicks better than other models.
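As an illustration of what a distribution-based comparison can look like, the sketch below compares the distribution of first-click positions in real and simulated sessions using the Kullback-Leibler divergence. The specific metrics used in the paper are not given in the abstract, so the choice of statistic, the divergence, the smoothing constant, and all names are assumptions.

```python
import math
from collections import Counter

def first_click_rank(session):
    """Rank (1-based) of the first click in a session, or 0 if no click.
    A session is a list of 0/1 click indicators, one per result."""
    for rank, clicked in enumerate(session, start=1):
        if clicked:
            return rank
    return 0

def distribution(values, support, smoothing=1e-3):
    """Smoothed empirical distribution of `values` over `support`.
    Assumes every value lies in `support`."""
    counts = Counter(values)
    total = len(values) + smoothing * len(support)
    return {v: (counts[v] + smoothing) / total for v in support}

def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same support."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p)

real = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
simulated = [[0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
support = range(0, 4)  # 0 = no click, 1..3 = rank of first click

p = distribution([first_click_rank(s) for s in real], support)
q = distribution([first_click_rank(s) for s in simulated], support)
print(kl_divergence(p, q))
```

In this toy example the simulated sessions never produce a click-free session or a click on the first position, the two cases the abstract highlights as hard to simulate, and the divergence is large as a result.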
Sabrina Sauer’s article titled ‘Audiovisual Narrative Creation and Creative Retrieval: How Searching for a Story Shapes the Story’ is published in the Journal of Science and Technology of the Arts.
Media professionals – such as news editors, image researchers, and documentary filmmakers – increasingly rely on online access to digital content within audiovisual archives to create narratives. Retrieving audiovisual sources therefore requires an in-depth knowledge of how to find sources digitally. These storytelling practices intertwine search technologies with the user’s ideas and production cultures. This paper presents qualitative research insights into how media professionals search in digital archives to create (trans)medial narratives, and uses the notion of creative retrieval to unravel the dynamics of audiovisual narrative production. Creative retrieval combines ideas about the effects of media convergence on media content, theories about serendipitous information retrieval, and studies of creativity to argue that retrieval practices of media professionals who create audiovisual narratives are governed by organizational, technological and content affordances and constraints. The paper furthermore exemplifies the first stage of an ongoing research project in which a user-centered design approach guides open source self-learning search algorithm development to support creative retrieval.
Please read the article here: https://dx.doi.org/10.7559/citarj.v9i2.241
Sabrina Sauer will be one of five researchers presenting work on serendipity at the Serendipity Society Symposium, hosted at the World Humanities Conference in Liège on the 9th of August 2017.
Symposium description: Serendipity in research is associated with the now classic experience of looking for a book on a library shelf, and finding another, even more valuable, book. However, our world is increasingly technical – traditional approaches to humanities research must integrate with the tools of scientific methodology and information technology. Nonetheless, there is a continuing need for insight and creativity, even in a world of controlled experiments and algorithms, and people continue to make unexpected and unpredictable discoveries. How does serendipity, a concept born in the humanities, clarify and contextualize the current push toward ‘innovation’? How do humanities researchers today experience serendipity? What is the nature of the ‘unsought finding’? How can we ensure future opportunities for serendipity? What technologies enable valuable, yet unpredictable, discoveries? This Symposium presses upon the boundaries between disciplines to illustrate how research in all fields will—and should—continue to be a human experience above all.
Sabrina’s paper titled “Serendipitous search practices of media researchers: Developing techniques to elicit ‘the unforeseen’” deals with the relation between serendipity, creativity and search. It presents insights into the role of serendipitous search for media researchers’ unearthing of research ideas and insights. Media researchers increasingly rely on digital access to audiovisual archived material to collect data. This implies that, to retrieve relevant material, researchers require an in-depth understanding of digital search. Using qualitative (focus group and interview) data, this paper draws conclusions about how media researchers experience and elicit serendipitous search, as a form of craft. These conclusions aid the development of new search algorithms that embrace serendipitous search as a source of innovation.
Sabrina Sauer co-presented a paper with Berber Hagedoorn at the DHBenelux conference hosted by Utrecht University on the 4th of July. The paper, titled “Getting the Bigger Picture: An Evaluation of Media Exploratory Search and Narrative Creation”, is partially informed by insights gained during MediaNow’s second user panel meeting (24th of May 2017).
Abstract: Digital Humanities centres on questions that are raised by and answered with digital tools in the Humanities. At the same time, it interrogates the value and limitations of digital methods in Humanities disciplines. While it is important to understand how digital technologies can offer new avenues for Humanities research, it is equally essential to understand – and therefore be able to interpret – ‘the user side’ of Digital Humanities: specifically, how Humanities researchers appropriate and domesticate search tools to ask and answer new questions and to apply digital methods. Previous user research in Digital Humanities concentrates on assessing, for example, how and why Digital Humanities benefits from studies into user needs and behaviour (Warwick, 2012), user requirement research, as well as participatory design research (Kemman & Kleppe, 2014).
Exploratory search is crucial for Humanities researchers who draw upon media materials in their research. Audio-visual, online and digital sources are in abundance, scattered across different platforms, and changing daily in our contemporary landscape. Supporting researchers’ explorations becomes even more important when scholars study media events. A ‘media event’ is an event with a specific narrative that gives the event its meaning, and is in contemporary societies increasingly recognized as non-planned or disruptive. Disruptive media events, such as the ‘sudden’ rise of populist politicians, terrorist attacks or environmental disasters, are shocking and unexpected, making them difficult to interpret. This leads to problems for media researchers who analyse how narratives construct different political, economic or cultural meanings around such events. Previous research argues that media events should always be viewed in relation to their wider political and sociocultural contexts. Events, as they unfold in the media, may correspond to long-term social phenomena, and the way in which such events are ‘constructed’ has particular connotations (Jiménez-Martínez, 2016). Specific actors (newscasters, governments, institutions) use media events to build narratives in line with their own political, economic or cultural purposes. Media researchers also build narratives around events; prior research underlines the importance of visualizing, constructing and storing of narratives during the information navigation to contextualize material (Akker et al., 2011; Kruijt, 2016; De Leeuw, 2012). Offering media researchers the ability to explore and create lucid narratives about media events therefore greatly supports their interpretative work.
This paper aims to add to this body of research by presenting the insights of a cross-disciplinary user study that involves, broadly speaking, researchers studying audio-visual materials in a co-creative design process, set up to fine-tune and further develop a digital tool that supports Humanities research through exploratory search. This paper focuses on how researchers – in both academic and professional settings – use digital search technologies in their daily work practices to discover and explore digital audio-visual archival material. We focus specifically on three user groups, namely (1) Media Studies researchers, (2) Humanities researchers who use audio-visual materials as a source, and (3) media professionals. These user groups are the foreseen end users of the tool, because they create audiovisual narratives for their respective work purposes. We set up co-creative design sessions with 74 participants (group 1: 24; group 2: 40; group 3: 10) to observe and reflect on the practices of media researchers in terms of how they interact with search tools to explore, access and retrieve digitized audio-visual material, in order to interpret, and in some cases re-use, this material in new audio-visual productions.
Methodology: In our user study, we employ a user-centred design methodology to evaluate and fine-tune the exploratory search tool DIVE+ media browser. It offers event-driven exploration of digital heritage material, where events are prominent building blocks in the creation of narrative backbones (De Boer et al., 2015), and it links a variety of different media sources and collections. DIVE+ offers intuitive exploration of media events at different levels of detail. It connects media objects, subjects (“concepts”), events, and persons to aid in the formulation of research questions, and to contextualize the former into overarching narratives and timelines. Our main research question throughout the case study is: how does exploratory search support media researchers in their study of how media events are constructed across different media and instilled with specific cultural or political meanings? To be able to answer this question, we study how media researchers construct navigation paths via exploratory search and – by means of user studies – evaluate the role of narratives in (1) learning and (2) research. In this process, we compare DIVE+ to other online search tools.
The user study observes media researchers as they use DIVE+ to explore media events, across three stages: (1) research question formulation; (2) DIVE+ use; and (3) comparative user evaluations of the DIVE+ browser against other online search tools. The collected data, consisting of qualitative (observational and focus group) data as well as logging data gathered during user testing, provide insights into how media researchers search and explore digital audio-visual archives. We utilize a case study approach, which combines grounded theory (that fosters an understanding of how researchers interpret and create narratives) with usability methodologies, such as work task evaluations. First, this allows us to draw conclusions about how search tools and digital technologies co-construct the researcher’s professional practice. Second, the data help us probe the question of how the ‘digitality’ of search and retrieval shapes the practice of media research and, by extension, creative processes.
The research presented in this paper takes an interdisciplinary approach: it combines insights from Media Studies, as well as from Information Studies and Science and Technology Studies, and integrates ideas about narrative creation, search practices, and overarching notions about how users and technologies co-construct meaning. Therefore, the presented research does not focus on how Digital Humanities tools have an impact on researchers’ practices, but rather analyses how researchers make use of search tools. We subsequently (1) draw conclusions about scholarly practice and the role of search technologies for digitized audio-visual materials therein; and (2) present lessons learned on how to optimize the search tool that is used, in order to improve its performance.
MediaNow’s second user panel meeting took place at the Netherlands Institute for Sound and Vision on the 24th of May 2017. Eighteen people gathered for the meeting to discuss the status of and insights produced by the project so far. After a lively round of project update presentations by the involved researchers, members of the user panel actively participated in the morning’s MAKE, TRY and ASK sessions.
The MAKE session focused on mapping the search processes of media professionals researching specific topics for audiovisual items, projects or programs. The media professionals collaborated in small groups to draw flow charts or search scenarios, and shared their ideas with the rest of the group afterwards.
During the TRY session, the exploratory search browser DIVE+ was tested. Guided by a number of search tasks, users tinkered with and tested the browser. Feedback was collected by means of an online questionnaire, as well as shared afterwards during the ASK session.
The researchers thank the user panel for their very active participation! Insights provided by the created search scenarios will help validate the team’s earlier findings about the context and practices of professional audiovisual search, while the DIVE+ testing will lead to recommendations for the further development of the search browser.
On the 29th of June 2017, Sabrina Sauer will present a paper at the European Network for Cinema and Media Studies conference, NECS Paris. The paper is titled ‘Habit, craft and creativity: How digital search habits shape the craft of professional audiovisual storytelling’.
The increased digitalization of audiovisual materials allows media professionals, such as news documentation specialists and documentary filmmakers, to increasingly make use of online access to digital archives to find sources for their audiovisual stories. This paper presents empirical insights into the digital search habits of media professionals, and questions how habitual user-technology interactions configure the craft and creative practice of digital storytelling.
Conceptually, the paper frames creative practices of media professionals by focusing on the tension between perspectives on habitual work routines and craft; between routines that are afforded by socio-technical context and ideas about individual creative agency. Here, the tension between habitual work routines and creative agency is studied empirically. By analyzing how media professionals make use of digital search technologies to retrieve archived audio-visual material for re-use in new digital stories, it becomes possible to form an empirically grounded understanding of how habits relate to craft; how habitual search strategies relate to socio-technical constraints and affordances and how, together, these shape creative processes and products.
This paper is presented in the context of an overarching research project that takes a user-centered design approach to co-create new open source search algorithms together with foreseen end users. The qualitative methods that are part of this approach, such as focus groups and semi-structured interviews, allow an in-depth understanding of the digital search habits of the included media professionals. Apart from drawing conclusions about the relationship between habits, craft and creativity, the paper thus also draws methodological conclusions about how user-centered design methods can channel observations about mundane, tacit and habitual user-technology interactions into new media innovations.