Nature’s co-citation network

Category: Open data

Country/area: United Kingdom

Organisation: Nature

Organisation size: Big

Publication date: 11 Jun 2019

Credit: Kelly Krause, Wesley Fernandes, Noah Baker, Alice Grishchenko, Mauro Martino, Albert-László Barabási, Alexander Gates, Qing Ke, Onur Varol

Project description:

As part of its 150th anniversary, Nature, the world’s leading science journal, collaborated with network scientists led by Albert-László Barabási at Northeastern University in Boston. The scientists carried out a novel analysis using data on tens of millions of scientific articles. The resulting network offered a powerful representation of the history of science and revealed how disciplines have arisen and become more connected over time. Nature published this analysis as a free package of content: the cover of our 150th anniversary issue, a video, an opinion article explaining the analysis in depth, and a data interactive. The package inspired media stories worldwide.

Impact reached:

Nature issued a press release for the package as part of its 150th anniversary. It was covered in 25 news stories around the world, including in Times Higher Education and La Republica, and in another 200 stories in China, including by Xinhua and the People’s Daily (a list of key outlets is below).

Key news stories

The package of stories also received excellent traffic. For comparison, an average Nature news story might get 10,000 to 15,000 page views.

Video: A network of science: 150 years of Nature papers (89,000 unique page views)

Interactive: On the shoulders of giants (43,000 unique page views)

Comment: Nature’s reach: narrow work has broad impact (13,500 unique page views)

We received excellent feedback and engagement on social media.

Techniques/technologies used:

This visualization project had five main components, each requiring a different set of tools and techniques. In total, we used four visualization tools combined with six pieces of custom software.

1)    Data Processing: we wrote custom Python code to combine and process publication data from two different sources (Web of Science and Nature’s own database) and to build the labeled co-citation network.

2)    Network Layout: we used the open-source network software Gephi to generate a 2D network layout using a force-directed layout algorithm with manually controlled annealing. We then wrote custom Python code to perform edge bundling in 3D.

3)    3D Interactive Website: we wrote custom JavaScript code based on the three.js library to draw the network’s nodes and edges as a large interactive particle system. We also created custom device-orientation camera controls (VR mode) that allow navigation via the touch screen and the device’s sensors simultaneously, by taking separate positional and rotational information from two mock cameras to control the view.

4)    3D Rendered Static Images: we wrote custom Python code to generate the initial 3D geometry and then modified the geometry procedurally in Maya to produce the final renderings. To speed up rendering, we used cloud-based rendering through Zync on Google Cloud.

5)    3D Animated Video: we wrote custom Java code using the OpenGL library to render the 3D network and its temporal animation, and to provide detailed camera controls.

Although each of these five components required its own toolset, the creative process at every stage benefited greatly from the others. For example, the 3D edge bundling required setting several parameters, which we selected based on how they looked in the 3D interactive website. Similarly, the 3D interactive website allowed us to create the storyboard for the animated video and to select interesting camera positions for the rendered static images.
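The data-processing step (component 1) produces a co-citation network, in which two papers are linked when a later paper cites both of them. The submission does not describe this code, so the following is only a minimal Python sketch under stated assumptions: the merged citation data is a dictionary mapping each citing paper to the papers it references, and an edge's weight is simply the number of papers that co-cite the pair. The function name and paper IDs are hypothetical, and the discipline labels are not modelled.

```python
from collections import Counter
from itertools import combinations


def cocitation_network(references):
    """Build a weighted co-citation network (illustrative sketch).

    references: dict mapping each citing paper ID to the IDs it cites.
    Returns a Counter keyed by (paper_a, paper_b) with paper_a < paper_b;
    each value counts how many citing papers reference both papers.
    """
    weights = Counter()
    for cited in references.values():
        # Deduplicate and sort so every unordered pair has one canonical key.
        for pair in combinations(sorted(set(cited)), 2):
            weights[pair] += 1
    return weights


# Toy data with hypothetical IDs.
refs = {
    "p1": ["A", "B", "C"],
    "p2": ["A", "B"],
    "p3": ["B", "C", "B"],  # a duplicate reference is counted once
}
net = cocitation_network(refs)
print(net[("A", "B")], net[("A", "C")], net[("B", "C")])  # 2 1 2
```

Counting pairs per citing paper keeps the weighting transparent; a real pipeline over tens of millions of articles would also need to stream the data rather than hold it in one dictionary.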
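The 3D edge bundling in component 2 depended on several tuning parameters. The team’s bundling code is not described, so the sketch below is a simplified, generic force-directed edge bundling in pure Python, offered as an assumption about the general technique rather than the project’s method: each edge is split into control points that are pulled toward the matching points of nearby edges, while a spring term keeps each curve smooth. The parameter names (k, iterations, spring, attraction, radius) are hypothetical, and the edge-compatibility measures of force-directed edge bundling as published by Holten and van Wijk are omitted.

```python
import math


def subdivide(p, q, k):
    """Return k + 2 evenly spaced points from p to q, endpoints included."""
    return [[p[d] + (q[d] - p[d]) * i / (k + 1) for d in range(3)]
            for i in range(k + 2)]


def bundle_edges(edges, k=8, iterations=50, spring=0.1,
                 attraction=0.05, radius=1.0):
    """Simplified force-directed edge bundling in 3D (illustrative only).

    edges: list of (source_xyz, target_xyz) pairs.
    Each edge gets k interior control points. In every iteration an interior
    point is pulled toward its neighbours on the same edge (spring, keeps the
    curve smooth) and toward the point with the same index on every other
    edge closer than `radius` (attraction, forms bundles).
    Endpoints never move. Cost is O(iterations * k * len(edges) ** 2).
    """
    paths = [subdivide(p, q, k) for p, q in edges]
    for _ in range(iterations):
        new_paths = []
        for e, path in enumerate(paths):
            new_path = [path[0]]
            for i in range(1, k + 1):
                pt = path[i]
                force = [spring * (path[i - 1][d] + path[i + 1][d] - 2 * pt[d])
                         for d in range(3)]
                for f, other in enumerate(paths):
                    if f == e:
                        continue
                    dist = math.dist(pt, other[i])
                    if 0 < dist < radius:
                        for d in range(3):
                            force[d] += attraction * (other[i][d] - pt[d])
                new_path.append([pt[d] + force[d] for d in range(3)])
            new_path.append(path[-1])
            new_paths.append(new_path)
        # Synchronous update: all forces were computed from the old positions.
        paths = new_paths
    return paths


# Two parallel edges 0.5 apart; their interior points are drawn together.
bundled = bundle_edges([((0, 0, 0), (10, 0, 0)), ((0, 0.5, 0), (10, 0.5, 0))])
print(bundled[0][4][1] > 0)  # True: the lower edge bows toward the upper one
```

Raising `attraction` or `radius` tightens the bundles, while raising `spring` keeps curves closer to straight lines, which is the kind of trade-off that benefits from visual inspection. The pairwise loop is quadratic in the number of edges, so at the project’s scale of 239,000 edges a practical version would need a spatial index or GPU acceleration.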
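Drawing the network as a large particle system (component 3) generally means packing every node and edge into flat numeric buffers, so the browser can upload them to the GPU in bulk instead of managing tens of thousands of separate objects. The team’s JavaScript is not shown; the following Python sketch only illustrates that data layout, assuming each node carries an (x, y, z) position and an RGB colour and each edge is drawn as a straight segment. In three.js, buffers like these would typically back a BufferGeometry rendered with Points and LineSegments, but the function below and its names are hypothetical.

```python
from array import array


def pack_particle_buffers(positions, colors, edges):
    """Flatten node and edge data into contiguous 32-bit float buffers.

    positions, colors: one (x, y, z) / (r, g, b) tuple per node.
    edges: (i, j) pairs of node indices, drawn as straight segments.
    Returns (node_positions, node_colors, edge_segments): 3 floats per node
    for the first two and 6 floats per edge (both endpoints) for the last.
    """
    if len(positions) != len(colors):
        raise ValueError("need exactly one colour per node")
    node_positions = array("f")
    node_colors = array("f")
    for xyz, rgb in zip(positions, colors):
        node_positions.extend(xyz)
        node_colors.extend(rgb)
    edge_segments = array("f")
    for i, j in edges:
        edge_segments.extend(positions[i])
        edge_segments.extend(positions[j])
    return node_positions, node_colors, edge_segments


pos = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]
col = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
node_pos, node_col, edge_seg = pack_particle_buffers(pos, col, [(0, 1)])
print(len(node_pos), len(edge_seg), len(edge_seg.tobytes()))  # 6 6 24
```

At 4 bytes per float, 88,000 node positions take 1,056,000 bytes and 239,000 straight edges take 5,736,000 bytes; bundled curves with several segments per edge multiply the edge figure accordingly.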
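Component 4 generated the initial 3D geometry in Python before procedural refinement in Maya. The submission does not say how the geometry was handed over, so as one possible interchange format this sketch serialises nodes and bundled edge curves as Wavefront OBJ text, with the nodes in a point element ("p") and each edge as a polyline element ("l"). The function name is hypothetical, and whether a given importer preserves point elements should be checked before relying on it.

```python
def build_obj(node_positions, edge_paths):
    """Serialise nodes and edge curves as Wavefront OBJ text (sketch).

    node_positions: list of (x, y, z).
    edge_paths: list of polylines, each a list of (x, y, z) control points.
    Nodes become vertices grouped in one point element ("p"); each edge
    becomes a polyline element ("l"). OBJ vertex indices are 1-based and
    count every "v" line in the file in order of appearance.
    """
    lines = [f"v {x} {y} {z}" for x, y, z in node_positions]
    if node_positions:
        node_ids = range(1, len(node_positions) + 1)
        lines.append("p " + " ".join(str(n) for n in node_ids))
    next_index = len(node_positions) + 1
    for polyline in edge_paths:
        indices = []
        for x, y, z in polyline:
            lines.append(f"v {x} {y} {z}")
            indices.append(str(next_index))
            next_index += 1
        lines.append("l " + " ".join(indices))
    return "\n".join(lines) + "\n"


nodes = [(0, 0, 0), (1, 1, 1)]
curves = [[(0, 0, 0), (0.5, 0.6, 0.5), (1, 1, 1)]]
text = build_obj(nodes, curves)
print(text.splitlines()[-1])  # l 3 4 5
```

Writing `text` to a file (for example a hypothetical network.obj) would give a 3D package a plain-text starting point for procedural modifications.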
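The animated video (component 5) shows the network growing over time. Assuming each node carries its publication year, which the submission does not spell out, the per-frame logic can be sketched as follows: a node appears in its publication year and fades in over a few years, and an edge is drawn only once both of its endpoints are visible. The actual rendering was done in Java with OpenGL; this Python function and its fade_years parameter are hypothetical and only compute what a frame would show.

```python
def frame_state(node_years, edges, current_year, fade_years=5.0):
    """Compute what one animation frame shows (illustrative only).

    node_years: dict of node ID -> publication year.
    edges: list of (u, v) node-ID pairs.
    current_year: the year the frame depicts (may be fractional).
    A node appears in its publication year at opacity 1 / fade_years and
    reaches full opacity fade_years - 1 years later; an edge is shown once
    both endpoints are visible, at the opacity of its dimmer endpoint.
    """
    opacity = {}
    for node, year in node_years.items():
        if year <= current_year:
            opacity[node] = min(1.0, (current_year - year + 1) / fade_years)
    visible_edges = [
        (u, v, min(opacity[u], opacity[v]))
        for u, v in edges
        if u in opacity and v in opacity
    ]
    return opacity, visible_edges


years = {"a": 1900, "b": 1950, "c": 2000}  # hypothetical nodes
opacity, shown = frame_state(years, [("a", "b"), ("b", "c")], 1952)
print(opacity, shown)  # {'a': 1.0, 'b': 0.6} [('a', 'b', 0.6)]
```

Calling this with fractional years between frames would give smooth fades; the opacities could then drive per-vertex alpha in whatever renderer draws the frame.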

What was the hardest part of this project?

In answering the question “What does 150 years of scientific publications look like?”, this project faced two main challenges. First, the data covered 150 years of technical innovation and research, representing many more years of scientists’ lives spent on the work, and we wanted to present it without diminishing the magnitude of those achievements or of the record. This meant our visualization needed to capture the scale and complexity of Nature’s publication landscape while also offering poetic visual storytelling capable of communicating simplified messages to the viewer. Given Nature’s prominent role as a venue for scientific publication, the task was further complicated by the need to maintain the scientific validity of the project. Ultimately, the design emerged from many iterations of prototyping and input from a highly interdisciplinary team of designers, scientists, and editors.

Second, the size of the data presented its own computational challenges: the weighted network was built by processing citation data from over 20 million publications, and we built custom visualization solutions to efficiently draw the entire network of 88,000 nodes and 239,000 edges as an interactive object. We also took extra steps to ensure compatibility across multiple devices and browsers, and implemented additional accessibility features. The computational demands were pushed to the extreme when creating the high-resolution video, with all the lighting and material effects applied to such a large number of components; this required highly parallelized code running on multiple remote servers with multiple GPUs.
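Rendering a high-resolution animation is commonly parallelized by frame, since each frame can be rendered independently. The submission does not describe how the work was split across the remote GPU servers, so the following is only a sketch of one common approach, assigning each server a contiguous, near-equal block of frames; the function name is hypothetical.

```python
def partition_frames(first_frame, last_frame, num_workers):
    """Split an inclusive frame range into contiguous, near-equal chunks.

    Returns a list of inclusive (start, end) ranges, one per worker that
    receives work; the first `remainder` workers get one extra frame.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    total = last_frame - first_frame + 1
    base, remainder = divmod(total, num_workers)
    chunks = []
    start = first_frame
    for worker in range(num_workers):
        size = base + (1 if worker < remainder else 0)
        if size <= 0:
            break  # more workers than frames, or an empty range
        chunks.append((start, start + size - 1))
        start += size
    return chunks


print(partition_frames(1, 10, 3))  # [(1, 4), (5, 7), (8, 10)]
```

Contiguous blocks let each server load the scene once and render consecutive frames; finer-grained job queues balance uneven frame costs better at the price of more scheduling overhead.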

What can others learn from this project?

The project was exemplary in showing how collaboration between academic data scientists and a team of journalists, editors, video editors and art editors can generate compelling content and images, and can create a resource that scientists and the public can use from now on to learn about science and its history.

The freely available interactive allows anyone to choose any paper from Nature’s archive – including seminal papers such as Watson and Crick’s discovery of the structure of DNA – and view its colourful ‘reference tree’: all the papers that it referenced (a representation of the academic work on which it was built) and all the papers that went on to cite it (a representation of the academic work it inspired). This reveals, in a compelling and original way, how every discovery is built ‘on the shoulders of giants’.
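The ‘reference tree’ combines two lookups on the citation graph: the papers a chosen paper cites, and the papers that cite it. As a minimal sketch, assuming citations are available as (citing, cited) ID pairs, both directions can be indexed once and then queried per paper; the function names and paper IDs below are hypothetical.

```python
from collections import defaultdict


def build_citation_index(citations):
    """Index (citing_id, cited_id) pairs in both directions."""
    cites, cited_by = defaultdict(set), defaultdict(set)
    for citing, cited in citations:
        cites[citing].add(cited)
        cited_by[cited].add(citing)
    return cites, cited_by


def reference_tree(index, paper):
    """Return (references, citing_papers) for one paper, each sorted.

    references: papers that `paper` cites (the work it built on).
    citing_papers: papers that cite `paper` (the work it inspired).
    """
    cites, cited_by = index
    # .get avoids inserting empty entries into the defaultdicts.
    return sorted(cites.get(paper, ())), sorted(cited_by.get(paper, ()))


# Hypothetical IDs: X builds on B and C and is later cited by D and E.
index = build_citation_index(
    [("X", "B"), ("X", "C"), ("D", "X"), ("E", "X"), ("E", "B")]
)
print(reference_tree(index, "X"))  # (['B', 'C'], ['D', 'E'])
```

Building the index once makes each lookup a dictionary access, which suits an interactive where readers jump between papers.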

As one researcher wrote to us on Twitter: “This video is beautiful and gives one of the best introductions to how academic science works that I have seen in some time – ‘the shoulder’s of giants’ story revealed as an interconnected maelstrom of ideas – thank you @nature for telling this story.”

Project links: