Tell me Stories

Category: Best news application

Country/area: Portugal

Organisation: INESC TEC, Ci2 – Smart Cities Research Center – Polytechnic Institute of Tomar, University of Porto, University of Kyoto

Organisation size: Big

Publication date: 14/04/2019

Credit: Ricardo Campos, Arian Pasquali, Vítor Mangaravite, Alípio Jorge, Adam Jatowt

Project description:

Ever wondered if you could revisit the history of the war in Syria with a single click? Recall the details of Donald Trump's election as President of the United States of America? Learn more about climate change? Or see what was said and written about the Iran nuclear deal? Tell me Stories [tellmestories.pt] is a website that automatically creates temporal summaries of a given topic. Through a generated timeline, users can sift through a collection of news articles to discover the most relevant information and related stories.

Impact reached:

Over the last decade, we have witnessed an ever-growing volume of online content, which poses new challenges for those who aim to understand a given event. This growth in the volume of data, together with the phenomena of media bias, fake news and filter bubbles, has created new challenges in information access and transparency. For instance, following the media coverage of long-lasting events like wars, migration or economic crises can often be confusing and demanding for users and journalists. Media outlets often use temporal summaries as a solution. However, manually building such timelines is laborious and time-consuming. One possible way to overcome this problem is to automatically summarize a large amount of news into consistent narratives through timelines. Such tools may play an important role for a broad spectrum of users looking for the most valuable and useful stories within large amounts of information. This may be the case for journalists, policymakers, students or casual readers who need context about a given story or want to check a fact. Imagine how useful it would be to quickly obtain a timeline of news about a candidate for an important public office, or background information to answer questions about an unexpected disaster. Tell me Stories offers users the opportunity to quickly access the story of an event over time, providing a contextualized overview of it. This tool is the result of our participation in the 41st European Conference on Information Retrieval (ECIR 2019), where we won the Best Demo Presentation award. A related preliminary version of Tell me Stories first appeared in 2018 (though in that version it was adapted to the Portuguese web-archive collection).

Techniques/technologies used:

Tell me Stories is a user-friendly interface for running queries on news sources and exploring the results in a summarized, chronologically organized manner with the help of an interactive timeline. Given a user query, the system automatically identifies relevant dates and the most important headlines to illustrate the story. To this end, we rely on a 4-step pipeline: (1) News Retrieval; (2) Identifying Relevant Time Intervals; (3) Computing Headline Scores; and (4) Deduplication. The first step in the pipeline is to run the query against any data source of interest and fetch the matching documents. Tell me Stories is built on top of the Signal Media Dataset [https://research.signal-ai.com/newsir16/signal-dataset.html], a collection of one million news articles (mainly English, but also non-English and multi-lingual articles) that were originally collected from a variety of news sources (such as Reuters) over a period of 1 month (1 to 30 September 2015) and indexed with Elasticsearch. However, our solution can be easily adapted to other scenarios, including different kinds of data sources (e.g. social media posts) and languages, since it is mostly language-independent. This may be an important contribution for anyone interested in a summarized temporal view of their data. Next, we select relevant time periods by applying a strategy that forces the system to choose intervals with at least one peak of occurrence. We then rely on YAKE! [http://yake.inesctec.pt], a statistical keyword extraction method developed by our team [best short paper of ECIR 2018], to select the most important headlines from a large number of documents. Finally, to reduce the amount of duplicated content, we make use of deduplication algorithms.
The source code for our temporal summarization framework, along with examples of how to adapt it to different data sources, is available online [https://github.com/LIAAD/TemporalSummarizationFramework].
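
To make steps (2) and (4) of the pipeline more concrete, here is a minimal Python sketch. It is not the code from our repository: the function names (find_peaks, split_intervals, deduplicate), the local-maximum definition of a peak, the choice of cutting intervals at the quietest day between two peaks, and the Jaccard similarity threshold of 0.8 are all assumptions made for illustration only. The input is assumed to be a list with the number of matching articles per day.

```python
import re


def find_peaks(series):
    """Return indexes of local maxima in a list of per-day article counts."""
    peaks = []
    for i, value in enumerate(series):
        left = series[i - 1] if i > 0 else -1
        right = series[i + 1] if i < len(series) - 1 else -1
        # Strictly above the left neighbour and at least the right one,
        # so a flat plateau is reported only once (at its first day).
        if value > 0 and value > left and value >= right:
            peaks.append(i)
    return peaks


def split_intervals(series):
    """Split the days into half-open (start, end) index ranges,
    each containing exactly one peak (illustrative strategy)."""
    peaks = find_peaks(series)
    if not peaks:
        return []
    cuts = []
    for p, q in zip(peaks, peaks[1:]):
        between = series[p + 1:q]  # days strictly between two peaks
        # Cut at the quietest day between the two peaks.
        cuts.append(p + 1 + between.index(min(between)))
    bounds = [0] + cuts + [len(series)]
    return list(zip(bounds, bounds[1:]))


def _tokens(title):
    """Lower-cased set of word tokens of a headline."""
    return set(re.findall(r"\w+", title.lower()))


def _jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(titles, threshold=0.8):
    """Keep a headline only if it is not too similar to one already kept.
    The threshold is a hypothetical value, not one taken from the project."""
    kept = []
    for title in titles:
        candidate = _tokens(title)
        if all(_jaccard(candidate, _tokens(k)) < threshold for k in kept):
            kept.append(title)
    return kept
```

For example, calling split_intervals on the counts of a query over a month returns the day ranges that a timeline could display, and deduplicate can then be applied to the headlines selected inside each range.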

What was the hardest part of this project?

Tell me Stories sits on top of a complex structure that required developing and assembling several tools. While manually constructing stories from different and disparate sources is possible, it becomes infeasible and time-consuming in the long run. Tell me Stories tries to fill this gap by offering an easy-to-use tool that automatically creates narratives over time. To make this happen, we had to build a search engine infrastructure that makes available a collection of one million documents, which we first had to index in a database. Documents are searchable through a typical query interface. However, unlike conventional search engines (such as Google), which focus on retrieving recent individual web pages, we aim to offer users a comprehensible story of an event over time, so that they do not have to go through the entire web. Knowing that dozens of web pages can be related to the queried event raises several concerns about information overload that we had to deal with. The difficulty is to select the most relevant parts of the story without burdening users with too much information. To tackle this problem, we first select the most relevant time periods of the story and then apply a keyword extraction algorithm that we devised and tuned to select the most important headlines. Users are then offered not only a timeline to navigate between the different time periods, but also the most relevant news for each time frame. Our tool may be of great value to journalists seeking highly relevant data to write an article or prepare an interview. In an era where Artificial Intelligence raises so many questions, helping journalists through this kind of tool may be the answer.

What can others learn from this project?

In this project, users can submit a query to Tell me Stories, either by selecting one of the predefined topics shown on the first page or by entering their own query (naturally subject to the temporal window defined by the dataset, in this case September 2015). Once a query is issued, the user is shown a timeline summary of the topic. This interactive visualization lets the user navigate back and forth through time periods, supporting the understanding of long-lasting events such as wars or international and financial crises. For each selected time frame, users are offered the 20 most relevant titles from that period, along with a word cloud that summarizes the most relevant keywords appearing in the set of documents. For instance, issuing the query “iran nuclear deal”, a hot topic nowadays, shows some of the efforts made by President Barack Obama to reach a deal at that time. The advanced search feature (visible once a query is issued) additionally lets users explore other datasets. As an example, we provide access to the Portuguese web-archive collection, which offers access to millions of documents spanning more than 10 years. Issuing the query “acordo nuclear irão” yields a more comprehensive story from 2010 onwards. A user interested in learning more about the topic can then click on the corresponding headline to access the preserved web page (which no longer exists on the conventional web). Can you imagine how great it would be for journalists to query their own datasets and gain insights they were not aware of? Or simply recall some forgotten details about a given topic?
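
For readers who want to try something similar on their own dataset, the per-time-frame view described above (the top titles plus the word cloud) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names top_titles and cloud_weights, the tiny stopword list, and the convention that a higher score means a more relevant headline are ours for this example and do not describe the actual implementation. If your scoring method ranks lower scores as better, flip the sort order.

```python
import re
from collections import Counter, defaultdict

# A tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "and", "for", "with"}


def top_titles(scored, k=20):
    """Group (interval, title, score) triples by interval and keep the
    k highest-scoring titles of each (assumes higher score = more relevant)."""
    by_interval = defaultdict(list)
    for interval, title, score in scored:
        by_interval[interval].append((score, title))
    result = {}
    for interval, items in by_interval.items():
        ranked = sorted(items, key=lambda item: item[0], reverse=True)
        result[interval] = [title for _, title in ranked[:k]]
    return result


def cloud_weights(titles, n=10):
    """Count the n most frequent non-stopword terms in a set of titles,
    which could serve as the weights of a word cloud."""
    words = Counter()
    for title in titles:
        for word in re.findall(r"[a-z]+", title.lower()):
            if word not in STOPWORDS:
                words[word] += 1
    return words.most_common(n)


# Made-up headlines and scores, used purely to show the call pattern.
example = [
    ("first half", "Iran deal vote in Senate", 0.9),
    ("first half", "Obama defends Iran deal", 0.7),
    ("first half", "Markets calm", 0.1),
    ("second half", "Iran deal survives", 0.8),
]
selected = top_titles(example, k=2)
weights = cloud_weights(selected["first half"], n=2)
```

Here selected maps each time frame to its best headlines, and weights holds the terms that would appear largest in that time frame's word cloud.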

Project links: