Voters, look no further! Here are your city councilors’ performances based on their bill proposals
Entry type: Single project
Publishing organisation: Central News Agency
Organisation size: Big
Publication date: 2022-11-08
Authors: Chien Yi-hui, Chen Wen-shian, Liu Zi-jia
The Central News Agency launched its Media Lab in July 2018 to meet the new challenges of the digital era. Since then, the lab has released special reports that are often experimental in nature and show how innovative technologies can be applied in the media industry.
Taiwan held local elections in November 2022. Although county and city councilors are important local representatives of public opinion, their performance is often not well known. This project crawls the bills proposed by each councilor in Taiwan’s six major municipalities. Using the number, categories, and keywords of those bills, it gives voters in each area a clear and complete analysis of councilors’ bills.
Voters in different regions and with different identities receive customized analysis reports on local bills, allowing them to better understand the performance of local candidates before voting and to see which candidate’s bills align most closely with their own concerns. We are the first Taiwanese news outlet to present councilors’ bill-proposal performance quantitatively. After its release, the project was shared by political figures and candidates from various parties. According to reader feedback, it helps readers assess candidates rationally and avoid voting blindly.
We used Python to scrape the websites of the six major municipalities’ councils. The file formats included PDF, DOC, and JSON. We removed text irrelevant to the bills and converted the remaining data into structured data suitable for analysis. To process the bill content, we segmented words with CKIP Tagger, an open-source Chinese language processor developed by Academia Sinica, Taiwan’s top research institution. We then used the part-of-speech tags to keep only nouns (excluding postpositions and quantifiers), verbs, and adjectives for subsequent frequency statistics and visualization.
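The part-of-speech filtering and frequency count described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes each bill has already been segmented and tagged into (word, tag) pairs, and the tag names (`Na`, `VC`, `VH`, `Ng`, `Nf`, `Neqa`, `Neqb`, `DE`) are assumed to follow the CKIP-style tag set. The function names and the sample bills are hypothetical.

```python
from collections import Counter

# Assumption: CKIP-style tags for postpositions (Ng), measure words (Nf)
# and quantitative determinatives (Neqa, Neqb), which are dropped.
EXCLUDED_TAGS = {"Ng", "Nf", "Neqa", "Neqb"}


def is_content_word(pos: str) -> bool:
    """Keep nouns, verbs and adjectives, minus the excluded noun subtypes."""
    if pos in EXCLUDED_TAGS:
        return False
    return pos.startswith("N") or pos.startswith("V") or pos == "A"


def keyword_counts(tagged_bills):
    """Count how often each kept word appears across all tagged bills."""
    counts = Counter()
    for bill in tagged_bills:
        counts.update(word for word, pos in bill if is_content_word(pos))
    return counts


# Hypothetical, already-tagged bills: each bill is a list of (word, tag) pairs.
sample_bills = [
    [("道路", "Na"), ("改善", "VC"), ("之後", "Ng"), ("的", "DE")],
    [("道路", "Na"), ("安全", "VH"), ("一些", "Neqa"), ("條", "Nf")],
]

counts = keyword_counts(sample_bills)
print(counts.most_common(3))
```

The resulting counter can be fed directly into a frequency table or a word-cloud renderer; keeping the tag filter in one small function makes it easy to adjust which word classes count as keywords.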
Next, we cross-referenced data from the Central Election Commission to find each candidate’s current position and the number of terms they had served, so that we could analyze their experience alongside their proposal record. Finally, we used Python to produce word clouds of the bills and the accompanying data analysis for each district and candidate.
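As a rough illustration of this cross-referencing step, the sketch below joins candidate records with bill records and groups the result by district. All field names (`name`, `district`, `terms_served`, `proposer`, `category`) and the sample records are hypothetical; the real Central Election Commission and council data will be structured differently.

```python
from collections import Counter, defaultdict


def build_district_reports(candidates, bills):
    """Group candidates by district and attach a summary of their bills."""
    # Tally bill categories per proposer; names are stripped of stray spaces.
    bills_by_proposer = defaultdict(Counter)
    for bill in bills:
        bills_by_proposer[bill["proposer"].strip()][bill["category"]] += 1

    reports = defaultdict(list)
    for cand in candidates:
        name = cand["name"].strip()
        categories = bills_by_proposer.get(name, Counter())
        reports[cand["district"]].append({
            "name": name,
            "terms_served": cand["terms_served"],
            "total_bills": sum(categories.values()),
            # First-time candidates have no bills, so no top category.
            "top_category": (
                categories.most_common(1)[0][0] if categories else None
            ),
        })
    return dict(reports)


# Hypothetical sample data.
candidates = [
    {"name": "Candidate A", "district": "District 1", "terms_served": 2},
    {"name": "Candidate B", "district": "District 1", "terms_served": 0},
]
bills = [
    {"proposer": "Candidate A ", "category": "transport"},
    {"proposer": "Candidate A", "category": "transport"},
    {"proposer": "Candidate A", "category": "education"},
]

reports = build_district_reports(candidates, bills)
for entry in reports["District 1"]:
    print(entry)
```

Matching on names alone is fragile when the source data contains misspelled names, so a real pipeline would likely need a manually verified correction table before this join.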
Context about the project:
Some cities’ data are hard to access. Besides file formats that are difficult to organize, there are problems such as irregular Chinese characters, errors in councilors’ names, and missing information. As a result, we spent a lot of time cleaning and verifying the data in the early stages. We also made the project available to some communities before completion so that people could find and correct errors. Among them were current councilors, who then urged their councils to fill in the missing data, which in turn improved the quality of public data.
What can other journalists learn from this project?
In recent years, Taiwan’s Academia Sinica has been steadily improving its Chinese word segmentation system. This project uses word segmentation and part-of-speech tagging to analyze text content without relying on experts to provide data. Additionally, legislators have numerous duties, so we selected items that can be quantitatively analyzed and examined by the public. At the same time, we emphasized that proposal performance cannot be measured by the number of bills alone. Bill topics must be presented without overburdening readers so that the information can be readily absorbed and used.