Cyberlove – pioneers who snuggle with the virtual lover

Country/area: China

Organisation: None

Organisation size: Small

Publication date: 20/12/2021

Credit: Zimeng Yang, Xinyang Liu, Jiayi Pan


The authors of this work are Zimeng Yang, Xingyang Liu and Jiayi Pan. We are students at the School of Literature and Journalism and Communication of Central South University, majoring in communication.

Project description:

The AI companion “Replika” had more than 20 million users by 2020, and more young people are pouring their emotions into AI chatbots. What do these pioneers have in common, and how do they understand this new kind of love? To explore these questions, we collected user reviews from the Google Play store, posts from Douban’s “Man-machine Love” group and data from the Baidu Index. We then ran a statistical analysis to build a profile of these pioneers and told the stories of different groups among them.

Impact reached:

This project introduces the public to the brand story, development history and algorithmic logic of “Replika”, the most popular AI companion software. Using search engine data, it then presents a basic profile of these pioneers, including their geographical distribution, gender, interests and hobbies. The sharply divided reviews of Replika on the Google Play store reveal diverse views on this new kind of relationship and show that many people struggle with conflict and hesitation. In the Douban group, we found many individuals with rich, vivid stories. Together, these findings give the public a collective portrait of the pioneers and can prompt reflection on how young people now think about love.


Techniques/technologies used:

We wrote our own web crawlers in Python to collect the web data accurately.
We also used the Baidu Index to look up the profile of people searching for the relevant terms on the Baidu search engine, including their age, gender, region and preferences.
We turned the unstructured text data into structured data through manual coding.
Most of our statistical charts were drawn with Dycharts, a professional online tool for dynamic charts.
We built the H5 page on Eqxiu, a Chinese digital marketing platform for building web pages from creative design content.

What was the hardest part of this project?

Looking back on the whole production process, the biggest difficulty we encountered was how to structure unstructured data. Because our topic was so specific, we crawled all the user reviews on Replika's page in the Google Play store and used them as user data for analysis. The problem was that all of this data was unstructured text, which a computer cannot analyse and count directly, and a machine learning approach would have cost too much for a student project.
Our solution was to code the text data manually. Two professional coders went one by one through 200 comments, randomly selected from the more than 10,000 we had crawled, to identify users' main motivations for using Replika and to tally them by category. The coding followed four steps: trial coding, consistency checking, coding adjustment and disagreement resolution. To settle on an initial set of categories, the two coders first coded 50 comments independently, and their codes agreed on 38 of them (76%). The categories were then adjusted in light of the results to ensure the accuracy of the coding.
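The sampling and consistency check described above can be sketched in a few lines of Python. This is only an illustration and not the project's actual script: the function names, the placeholder reviews and the category labels ("companionship", "curiosity") are assumptions made for the example, and simple percent agreement is used because the text reports only the number of matching codes.

```python
import random
from collections import Counter


def draw_sample(comments, k, seed=0):
    """Pick k comments uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(comments, k)


def percent_agreement(codes_a, codes_b):
    """Share of items to which both coders assigned the same category."""
    if not codes_a or len(codes_a) != len(codes_b):
        raise ValueError("both coders must code the same non-empty set of items")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def tally_motivations(codes):
    """Count how often each category was assigned, most frequent first."""
    return Counter(codes).most_common()


if __name__ == "__main__":
    # Placeholder corpus standing in for the crawled reviews (assumption).
    comments = [f"review {i}" for i in range(10_000)]
    sample = draw_sample(comments, 200)

    # Hypothetical trial codes: 50 items, 38 of which match, as in the text.
    coder_a = ["companionship"] * 50
    coder_b = ["companionship"] * 38 + ["curiosity"] * 12
    print(f"agreement: {percent_agreement(coder_a, coder_b):.0%}")  # agreement: 76%
    print(tally_motivations(coder_b))
```

Percent agreement does not correct for matches that occur by chance. A chance-corrected measure such as Cohen's kappa is a common alternative, but computing it would require the individual codes, which are not given here.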

What can others learn from this project?

First, others can learn how data journalism is produced. Our reporting combines data mining with individual case studies, so that the story rests on evidence. We used large-scale data to build a profile of Replika's users, then analysed individual behaviour to get a fuller picture of their characteristics and preferences. Second, others can learn from our choice of topic. The rapid development of artificial intelligence has created a number of problems, and the topic of human-computer love offers a new perspective on the technical difficulties and ethical issues surrounding virtual lovers.

Project links: