The project is an open data website comprising a daily Covid-19 dashboard, demographic case information, daily reports from provincial health departments, a timeline and first-person narratives. The website not only shows the Covid-19 landscape in China but also offers the data as downloadable Excel files. The dashboard was updated in real time before March and daily after that. China’s local health departments also publish the epidemiological features of individual cases. Every day, province by province, we broke their reports down into structured fields: location, gender, age, symptoms, date of hospital admission, date of discharge and contact history. More than 10,000 cases have been released.
As a tracker, the dashboard alone has drawn traffic in the millions. Since we made the data public, we have received requests from universities, newsrooms and many other research organizations. They are particularly interested in the epidemiological and demographic information, and they reuse the structured data we provide for their own analysis.
We collected and structured the data manually. We built a database management system and a front end so that, once the data is entered, the front-end pages update automatically.
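As a rough illustration of what one structured case record and its export might look like, here is a minimal sketch. It is not our production system: the class name CaseRecord, the function cases_to_csv, the field names and the sample values are all hypothetical, and CSV stands in for the Excel files offered on the site.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from typing import List, Optional


@dataclass
class CaseRecord:
    """One manually entered case (hypothetical schema based on the fields listed above)."""
    province: str
    location: str
    gender: Optional[str] = None
    age: Optional[int] = None
    symptoms: Optional[str] = None
    admission_date: Optional[str] = None   # ISO date string, e.g. "2020-02-01"
    discharge_date: Optional[str] = None
    contact_history: Optional[str] = None


def cases_to_csv(cases: List[CaseRecord]) -> str:
    """Serialize records to CSV text; fields left as None become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(CaseRecord)])
    writer.writeheader()
    for case in cases:
        writer.writerow(asdict(case))
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample row, not real case data.
    sample = [CaseRecord(province="Example Province", location="Example City",
                         gender="F", age=45)]
    print(cases_to_csv(sample))
```

Keeping every field optional except the location reflects how unevenly the source reports disclosed details: a record can be saved with whatever was published and completed later.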
What was the hardest part of this project?
The hardest part was collecting and sorting as much information as possible about the patient cases, including age, gender, activity trajectory, symptoms and past medical history.
Because there was no unified standard for releasing information in the early stage of the epidemic, the degree to which non-private case information was disclosed varied, so we had to browse multiple web pages over and over to collect it. The formats of the published information also varied, which made it difficult to organize the information with a universal set of crawlers and code.
To solve these problems, we divided the work among ourselves. Each of us was responsible for several provinces and spent nearly six hours a day collecting, sorting, labelling and proofreading the case information. So far, we have kept the data free and public.
What can others learn from this project?
Data recycling and reuse.