What did people around the world discuss when they talked about COVID-19? As the epidemic raged across the globe, we analyzed more than 5 million tweets to find out what people in different countries experienced and cared about during the epidemic. Although national conditions differed, we found some interesting points in common. For example, people felt that their government’s epidemic prevention needed to be strengthened, and in the early stage people also ignored the epidemic.
We collaborated with academic institutions to collect more than 5 million tweets about COVID-19 posted from the end of January to the end of March 2020. This was the first report in the world to extensively analyze what people in different countries experienced and cared about regarding COVID-19. We found that people did neglect the epidemic in its early days. Even across different cultural backgrounds, criticism of the government and racial discrimination became mainstream topics regardless of nationality.
We wrote a Python script to fetch the Twitter data by calling the Twitter API, then segmented the text with Jieba, a Python library for Chinese word segmentation. After that, we ran a term-frequency analysis with TF-IDF to identify the keywords of each article. We also used R to analyze the data. Once the keywords were extracted, we grouped them manually, which let us identify the most popular topics across all of the groups.
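The keyword and grouping steps described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual script: the regex tokenizer stands in for Jieba, the TF-IDF weighting uses the plain tf × log(N/df) form, and the function names, sample texts, and topic groups are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Stand-in for a real segmenter such as Jieba; a plain regex word split.
    return re.findall(r"[a-z0-9']+", text.lower())


def top_keywords(docs, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results


def count_topics(keyword_lists, topic_of):
    """Count how many documents touch each manually defined topic group."""
    counts = Counter()
    for keywords in keyword_lists:
        # Use a set so one document counts at most once per topic.
        counts.update({topic_of[k] for k in keywords if k in topic_of})
    return counts


if __name__ == "__main__":
    # Hypothetical sample texts and hand-made keyword groups.
    sample = ["mask mask shortage", "lockdown rules", "lockdown extended"]
    groups = {"mask": "supplies", "rules": "policy", "extended": "policy"}
    keywords = top_keywords(sample, k=1)
    print(keywords)
    print(count_topics(keywords, groups).most_common())
```

Note that "lockdown" appears in two of the three sample texts, so its TF-IDF score is lower than that of the rarer terms. This is how the weighting surfaces words that are distinctive to a single text rather than common across the whole collection.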
What was the hardest part of this project?
One of the difficulties was analyzing more than 5 million tweets to find topics that could be reported. In addition, language and culture differed from country to country. We had to understand the local culture and the development of the epidemic in each place to fully grasp the meaning of these popular tweets, and to distinguish which tweets were ironic rather than literal.
What can others learn from this project?
Data from social networks can be used to better understand people's real concerns and the hot topics they discuss.