We analyzed 3,522 edit revisions made over the past decade on Zhihu, a Quora-style Chinese site. They show wave after wave of netizens discussing, arguing about, and explaining topics related to feminism, including academic theories, news issues, international movements, and even anti-feminist trends.
According to our analysis, before 2018 most related topics dealt directly with gender and women’s issues. After that year, as feminism became more visible, newly added topics expanded to notable events and controversial public figures, which netizens on both sides of an increasingly polarized debate repeatedly added and deleted.
Many see 2021 as the year when feminism finally joined the mainstream and won greater attention from the public. But for some people, the fight for women’s rights and reputation started years ago. How long have they been advocating for the issue? What obstacles did they meet? What kind of message were they trying to convey? It is a pity that we have no domestic Wikipedia to record this history.
Fortunately, Zhihu’s editing records can shed light on how this long-running fight unfolded. The story has helped the public understand the long history of China’s online fight over feminism, and it breaks down the key items for readers who are interested in researching the field further. The piece was translated into English by our sister publication, Sixth Tone, and received positive comments on Twitter and other international social media sites.
Python: used to crawl, clean, and analyze the 3,522 records. Working in Jupyter Notebook with natural language processing techniques, we extracted words, categorized them, and counted word frequencies.
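The word extraction and counting step can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used for the story: the vocabulary, the sample texts, and the function names `segment` and `word_frequency` are all hypothetical, and forward maximum matching stands in for whatever Chinese segmentation method was actually used.

```python
from collections import Counter

# Hypothetical vocabulary for illustration only; a real project would
# rely on a full segmentation dictionary or a dedicated segmenter.
VOCAB = {"女权", "女权主义", "性别", "平等"}
MAX_LEN = max(len(word) for word in VOCAB)


def segment(text):
    """Forward maximum matching: at each position, take the longest
    vocabulary word that starts there; otherwise skip one character."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in VOCAB:
                words.append(piece)
                i += size
                break
        else:
            # No vocabulary word starts at this position.
            i += 1
    return words


def word_frequency(records):
    """Count how often each vocabulary word appears across all records."""
    counts = Counter()
    for text in records:
        counts.update(segment(text))
    return counts


# Made-up sample records, not taken from the Zhihu data.
sample = ["女权主义与性别平等", "性别平等的讨论"]
freq = word_frequency(sample)
print(freq.most_common())
```

Because Chinese text has no spaces between words, the segmentation step has to come before any counting, which is part of why the choice of dictionary shapes the resulting frequencies.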
What was the hardest part of this project?
When netizens chat over the internet, the last thing they care about is typos or clean formatting, so data cleaning was a challenging task for us, especially because the content is in Chinese and natural language processing technology for Chinese is not as developed as it is for English. We needed not only to clean the data but also to put all the inconsistent wording into a structured format for efficient and sound analysis.
Originally, there were thousands of records spread across the timeline, some of which were merely hateful or resentful speech and some of which were repetitive. As storytellers and journalists, we had to decide what to keep and what to merge in order to make our story meaningful and help readers grasp the key points without distorting the facts.
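The mechanical part of that cleaning, putting inconsistent wording into one form and dropping repeats, might look something like the sketch below. It is an assumption-laden illustration rather than the project's actual pipeline: the helper names `normalize` and `deduplicate` and the sample entries are hypothetical, and the editorial decisions about what to keep or merge cannot be reduced to code like this.

```python
import re
import unicodedata


def normalize(text):
    """Map full-width characters to their standard forms, collapse
    whitespace, and lowercase Latin letters."""
    text = unicodedata.normalize("NFKC", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


def deduplicate(entries):
    """Keep the first occurrence of each entry after normalization,
    skipping entries that are empty once cleaned."""
    seen = set()
    kept = []
    for entry in entries:
        key = normalize(entry)
        if key and key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


# Made-up sample entries, not taken from the Zhihu data.
raw = ["ＭｅＴｏｏ 运动", "metoo  运动", "  ", "女权主义"]
cleaned = deduplicate(raw)
print(cleaned)
```

Normalizing before comparing means that entries differing only in character width, spacing, or letter case are treated as the same item.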
What can others learn from this project?
As the saying goes, news is a brief version of history. We journalists should be the ones who take good notes on what is going on in our society and what led up to it. Interviewing is one way to do that, but in the internet era, a more effective and fascinating approach may be to collect everyone’s digital traces, which together draw a broad picture of how a topic, an event, or a discussion develops over time. The way we conceived and created this data-driven story could serve as an example for other journalists of how to think outside the box about reporting in the digital world.