Using epidemiological survey data collected over the past year, we review the spread and development of the 2020 coronavirus pandemic in China through a 3D visualization.
The first chapter is a starry sky symbolizing every individual involved in the prevention and containment of the virus. The second is a forest illustrating the complexity of human-to-human virus transmission. Readers are free to explore the details of each patient in the transmission chain. The third chapter regroups the dataset into a bar chart showing the variety of disease symptoms at the time of diagnosis.
The data for the 11,699 surveyed cases were made public for further research and use. Readers can apply for the data at the end of the interactive project. Within one week of publication, nearly one thousand readers had applied for the data. They come from many fields, including universities, media, pharmaceutical companies, consulting companies, and freelance work. We encourage them to discover more stories, findings, and patterns in the dataset. After all, humans forget easily. Data shall keep the memory of the special year of 2020 for us.
Python: Clean and analyze the dataset. Convert the data into the format needed for the visualization. Merge the text for the interactive display.
Three.js: Create and render the visualizations. Build the animations and transitions among the scenes, such as the starry sky, the forest, the bar chart, and more.
Shader: Bind data to the background colors and icons of the circles, and group and regroup the 3D circles.
What was the hardest part of this project?
Collecting and cleaning the data for the first version of our dataset was very complicated and time-consuming because the data from local health commissions is spread across multiple websites and platforms, and each source uses a different format. We therefore had to clean the data both by hand and with Python scripts.
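To give a sense of what the scripted part of such cleaning can look like, here is a minimal sketch that normalizes one field, the date, into a single format. The three date spellings, the `normalize_date` helper, and the `default_year` parameter are assumptions made for illustration; they are not taken from the project's actual scripts or source notices.

```python
import re
from datetime import date

# Hypothetical date spellings, assumed for illustration only.
# The real notices may use other formats.
PATTERNS = [
    re.compile(r"(?P<y>\d{4})年(?P<m>\d{1,2})月(?P<d>\d{1,2})日"),
    re.compile(r"(?P<y>\d{4})[-/.](?P<m>\d{1,2})[-/.](?P<d>\d{1,2})"),
    re.compile(r"(?P<m>\d{1,2})月(?P<d>\d{1,2})日"),  # year omitted
]


def normalize_date(text, default_year=2020):
    """Return the first date found in `text` as YYYY-MM-DD, or None."""
    for pattern in PATTERNS:
        match = pattern.search(text)
        if match is None:
            continue
        parts = match.groupdict()
        # Fall back to the assumed year when the notice omits it.
        year = int(parts.get("y") or default_year)
        try:
            return date(year, int(parts["m"]), int(parts["d"])).isoformat()
        except ValueError:
            # Impossible dates (for example February 30) are left for manual review.
            return None
    return None
```

Rows for which the function returns `None` would still have to be checked by hand, which matches the mix of manual and scripted cleaning described above.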
It was also very challenging to use three.js to load a dataset of over ten thousand rows, convert every row into a 3D bubble, transform the bubbles into different shapes and scenes, and group and regroup them smoothly.
Calculating and visualizing the transmission chain was difficult too. The dataset for the network contains only two columns, “from” and “to,” so we had to write code in three.js to find the lines that connect to one another and form a chain.
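The idea of assembling chains from a two-column edge list can be sketched as follows. This is an illustration only: the project did this step in three.js, while the sketch uses Python, and the `find_chains` function and the sample case IDs are hypothetical. It treats each "from"/"to" pair as an undirected link and collects every case reachable from a starting case into one chain.

```python
from collections import defaultdict


def find_chains(edges):
    """Group (source, target) pairs into connected transmission chains."""
    neighbors = defaultdict(set)
    for source, target in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)

    seen = set()
    chains = []
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first walk collecting every case reachable from `start`.
        stack = [start]
        seen.add(start)
        members = set()
        while stack:
            case = stack.pop()
            members.add(case)
            for other in neighbors[case]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        # Keep the original direction of each link that belongs to this chain.
        chain_edges = [(s, t) for s, t in edges if s in members]
        chains.append({"cases": sorted(members), "edges": chain_edges})
    return chains


# Hypothetical case IDs, invented for illustration only.
sample_edges = [("A", "B"), ("B", "C"), ("D", "E"), ("A", "F")]
chains = find_chains(sample_edges)
```

Each resulting chain lists its cases and the links between them, which is the kind of grouping a renderer needs in order to draw one connected tree per chain.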
What can others learn from this project?
Being a data journalist in China is hard because well-formatted public datasets are rare, especially during the pandemic, when public departments were swamped with virus control work. For this project, we tried to overcome this obstacle by cleaning and organizing the data ourselves and by cooperating with university researchers. Special thanks to Liu Xiaofan, Xu Xiaoke and Wu Ye for sharing the full dataset with us. Unlike us, university researchers have more time and resources to conduct long-term data cleaning with machine learning and natural language processing. Cooperation between media and university researchers can therefore produce interesting news stories while guaranteeing the quality of large-scale, manually collected data. This project is a promising trial of that kind of cooperation. I think that in the future, data journalists in China will reach out to universities for meaningful story ideas and research datasets.