China is building its own international payment system, CIPS (Cross-Border Interbank Payment System), to compete with the global standard, SWIFT. Participation by global banks is growing, but the exact volume of transactions was not well known. We collected PDF files published by China’s central bank, the People’s Bank of China, and extracted the data, which was written in Chinese. We found that transaction volume has increased dramatically in recent years and pointed out that this would transform the global payment system landscape.
Engagement from the financial sector was exceptionally high. Japanese banks are expanding their business in China, and that movement has accelerated since the article was published. We think the article came out at the best possible time.
We used the Python package ‘pdftotext’ and regular expressions to extract the figures from the PDFs, which were written in Chinese.
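The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the code we ran: the pattern, the function names and the sample sentence are hypothetical, and the numbers in the sample are invented, not real CIPS figures. It assumes each report states an amount as a number followed by the unit 万亿元 (trillion yuan).

```python
import re

# Hypothetical pattern: the word 金额 (amount), a number, then the unit 万亿元.
# The wording of the real reports may differ, so the pattern would need adapting.
AMOUNT_RE = re.compile(r"金额\s*([\d,]+(?:\.\d+)?)\s*万亿元")


def pdf_to_text(path):
    """Return the text of every page in a PDF, joined into one string."""
    import pdftotext  # third-party package; imported here so the rest runs without it

    with open(path, "rb") as f:
        pages = pdftotext.PDF(f)
        return "\n\n".join(pages)


def extract_amounts(text):
    """Return every amount matched by AMOUNT_RE as a float."""
    return [float(m.replace(",", "")) for m in AMOUNT_RE.findall(text)]


# Invented sample sentence, used only to show the pattern at work.
sample = "人民币跨境支付系统处理业务 100 万笔，金额 12.5 万亿元。"
print(extract_amounts(sample))
```

For a manually downloaded file, the two steps would be chained as extract_amounts(pdf_to_text("report.pdf")), where "report.pdf" is a placeholder file name.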
What was the hardest part of this project?
Because scraping data from the Chinese government’s official site is prohibited, we had to download a large number of PDF files manually. Also, regular expressions did not work for some special characters.
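One common reason a pattern misses characters in Chinese text is that digits and punctuation appear in full-width forms. We are not claiming this was the cause in our case; the sketch below only shows, under that assumption, how Unicode NFKC normalization from Python's standard library converts such characters before matching. The sample string is invented.

```python
import re
import unicodedata

# ASCII-only number pattern: it does not match full-width digits such as １２.
NUMBER_RE = re.compile(r"[0-9]+(?:\.[0-9]+)?")


def normalize(text):
    """Fold full-width digits and punctuation into their ASCII equivalents."""
    return unicodedata.normalize("NFKC", text)


# Invented sample containing full-width digits and a full-width full stop.
raw = "金额１２．５万亿元"

print(NUMBER_RE.findall(raw))             # matching on the raw text
print(NUMBER_RE.findall(normalize(raw)))  # matching after normalization
```

Normalizing once, right after text extraction, keeps the patterns themselves simple.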
What can others learn from this project?
We learned that regular expressions work for Chinese-language text.