The Covid-19 pandemic tossed 2020 into disarray. But while cities and countries around the globe were shutting down, scientists and researchers from nearly every country embarked on an unprecedented effort to study, understand, and contain a virus that no one had ever seen before. By exploring the 90,000+ Covid-related research articles that emerged in 2020, this project celebrated the achievements and effort of the global science and research community, while also providing the public a behind-the-scenes glimpse of the scientific process more generally, and an appreciation for the resources that it requires.
The goal of the project was twofold: 1) to call attention to the scale of the collaborative, worldwide research effort to combat the Covid-19 pandemic; and 2) to show the public the normal workings of the scientific process and impart an appreciation for the work that is required and the inevitable uncertainty that ensues along the way. Unlike a project advocating for a particular policy change, for example, it is hard to know whether a project like this has achieved its intended impact, particularly with regard to the second goal of promoting greater public awareness of the scientific process. However, the extent to which this story was picked up and highlighted by science, research, and journalism outlets perhaps offers some indication of its wider impact. Throughout 2021, this project was featured by, among others, The World Federation of Science Journalists, The University of Wisconsin’s Scout Report on STEM and humanities, and DataJournalism.com’s year-end “12 brilliant data journalism projects of 2021”, and it was selected for presentation during the Science Communication panel at the 2021 Information+ Conference.
The primary dataset of Covid-related research articles was obtained from PubMed, the online repository for biomedical and life sciences research articles maintained by the National Institutes of Health in the US. I used Python and Jupyter Notebooks with PubMed’s public-facing API to download metadata on the 90,000+ Covid-related research articles that came out in 2020. The libpostal library, in combination with the Google Geolocation API, was used to parse the author information from each article and geolocate each author to a particular city. In order to map the citation network across key research articles, additional PubMed APIs were used to scrape 2-way citation information (articles cited in the current article; subsequent articles citing the current article) for each article.
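As a rough illustration of the download step, the sketch below builds request URLs for NCBI's E-utilities, the public API in front of PubMed, and reads the ID list out of an ESearch JSON response. It is a minimal sketch under stated assumptions, not the project's actual notebook code: the search term, date range, page size, and the helper names (`esearch_url`, `elink_url`, `parse_id_list`) are all hypothetical. The `pubmed_pubmed_refs` and `pubmed_pubmed_citedin` link names are the ELink options for "articles this one cites" and "articles citing this one".

```python
import json
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, mindate, maxdate, retstart=0, retmax=100):
    """Build an ESearch URL for one page of PubMed IDs matching `term`."""
    params = {
        "db": "pubmed",
        "term": term,          # hypothetical query; the original query is not given
        "datetype": "pdat",    # filter on publication date
        "mindate": mindate,
        "maxdate": maxdate,
        "retstart": retstart,  # offset of the first record on this page
        "retmax": retmax,      # number of records per page
        "retmode": "json",
    }
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def elink_url(pmid, linkname):
    """Build an ELink URL for one direction of the citation network."""
    params = {
        "dbfrom": "pubmed",
        "db": "pubmed",
        "id": pmid,
        "linkname": linkname,  # "pubmed_pubmed_refs" or "pubmed_pubmed_citedin"
        "retmode": "json",
    }
    return f"{EUTILS}/elink.fcgi?{urlencode(params)}"


def parse_id_list(payload):
    """Return (total match count, PubMed IDs on this page) from ESearch JSON."""
    result = json.loads(payload)["esearchresult"]
    return int(result["count"]), result["idlist"]
```

In a real run, each URL would be fetched (for example with `urllib.request.urlopen`) in a paging loop that advances `retstart`, with a pause between requests to stay within NCBI's published rate limits.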
The website was built using React. The visualizations within the site were built using combinations of Three.js, D3.js, p5.js, deck.gl, and the GreenSock animation library.
What was the hardest part of this project?
The biggest challenge in this project was in figuring out how to take this massive dataset of over 90,000 research articles and present it in a way that would be quickly digestible without abstracting away the individual contributions of all of the participating researchers and scientists. Focusing on the collaborations among researchers emerged as a way to do that. Moreover, featuring these collaborations on a map of the world, and allowing viewers to watch as these collaborations unfold over the course of the year, offered a chance to emphasize the truly global nature of the research community’s response to the pandemic (particularly at a time when most of the rest of the news was focusing on how countries, cities, and businesses were all shutting down).
However, getting the location data behind these collaborations presented the biggest technical challenge of the project. Each of the 90,000+ articles included metadata on the author names and affiliations (e.g. a particular department at a particular university). The names and affiliations were returned as unstructured strings, which meant that many hours were spent fine-tuning algorithms to parse the relevant information from each string. The parsed string was then used to geolocate each author to a specific city around the world. In the end, this analysis allowed for a novel and compelling look at the global network of scientific collaborators working collectively on the Covid-19 pandemic.
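To give a sense of why the affiliation strings were hard to work with, here is a deliberately naive sketch of the parse-then-link idea. It is not the project's method: the project used libpostal and a geocoding service, whereas this stand-in simply strips e-mail addresses, splits on commas, and assumes the last two fields are city and country. The sample strings and function names are hypothetical. Affiliations with postal codes, state abbreviations, or missing countries would defeat this heuristic, which is the kind of case a trained address parser is meant to handle.

```python
import re
from collections import Counter
from itertools import combinations

EMAIL = re.compile(r"\S+@\S+")


def clean_affiliation(raw):
    """Drop e-mail addresses and stray punctuation, then split on commas."""
    text = EMAIL.sub("", raw).replace("Electronic address:", "")
    parts = [part.strip(" .;") for part in text.split(",")]
    return [part for part in parts if part]


def guess_city_country(raw):
    """Naive guess: country is the last field, city is the one before it."""
    parts = clean_affiliation(raw)
    if len(parts) < 2:
        return None  # not enough structure to guess a location
    return parts[-2], parts[-1]


def collaboration_edges(articles):
    """Count how often each pair of locations appears on the same article.

    `articles` is a list of articles, each given as a list of raw
    affiliation strings (one per author).
    """
    edges = Counter()
    for affiliations in articles:
        places = {loc for loc in map(guess_city_country, affiliations) if loc}
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(places), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical sample data for illustration only.
wuhan = "Department of Virology, University of Example, Wuhan, China. Electronic address: someone@example.org"
boston = "Division of Infectious Disease, Example Hospital, Boston, USA."
edges = collaboration_edges([[wuhan, boston], [wuhan, "Example Institute, Boston, USA"]])
```

Each key in `edges` is a pair of (city, country) tuples and each value is the number of articles shared between those two places, which is the shape of data a map of collaborations over time would draw from.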
What can others learn from this project?
It is my hope that other journalists, particularly science reporters, will see this project and be inspired to include greater emphasis on the scientific process, rather than focusing exclusively on the results, in their own reporting. By definition, science deals with the unknown, and researchers are accustomed to dealing with uncertainty. In contrast, the pandemic has underscored just how poorly the public deals with that same uncertainty, particularly around evolving public safety guidelines as new information about the virus is learned. Part of that, I suspect, has to do with how science is communicated, often presenting results as complete and definitive as opposed to placing the results in the context of “given what we know now…”. Recent polls from Pew Research Center show that individuals (Americans, at least) are more likely to trust scientific results when they perceive the process as open and transparent (“Trust and Mistrust in Americans’ Views of Scientific Experts.” Pew Research Center, Jan 2019). To improve trust, therefore, we can do a better job of inviting the public into the scientific process through our reporting.