mind Health (Groupe mind)

Country/area: France

Organisation: mind Health (Groupe mind)

Organisation size: Small

Publication date: 27/09/2021

Credit: Sara Chaouki, Aymeric Marolleau, Sandrine Cochard, Aurélie Dureuil, Camille Boivigny, Anne-Laure Mercier


Sara Chaouki has been a data journalist at Groupe mind since September 2020. She holds a degree in science journalism from the Université de Paris and a degree in applied mathematics from ENSEEIHT.

Aymeric Marolleau has headed the data journalism team at Groupe mind since early 2019. From 2015 to 2018, he was a journalist specialising in media and adtech at mind Media. He has been a business journalist for about ten years.

Together, they help the five editorial teams of Groupe mind (mind Fintech, mind Health, mind Media, mind Retail and Planet Labor) produce data-driven stories.

Sandrine Cochard is editor-in-chief of mind Health. A journalist for 17 years, she has worked for general-interest and B2B media (BFMTV, Europe1, 20minutes, L’ADN), covering innovation, tech and cybersecurity. Since 2019, she has been a visiting lecturer at the Sciences Po Paris School of Journalism.

After studying science, Aurélie Dureuil specialised in journalism. She has been a journalist in the scientific and health press for 15 years. Since June 2021, she has been the editorial director of Le Généraliste. Previously, she was notably editor-in-chief of mind Health, which she helped launch. Her background gives her expertise deeply rooted in the health ecosystem and its innovations. She identified the ClinicalTrials.gov database and the potential for this study, and actively helped write it.

Camille Boivigny studied pharmacy and has worked as a HealthTech journalist for the past six years (APM, Pharmaceutiques, among others). She is passionate about medical innovation (high tech and deeptech).

Project description:

The U.S. health authority, the National Institutes of Health (NIH), archives clinical trials conducted in the United States and in 220 countries around the world on the ClinicalTrials.gov website. This represents approximately 370,000 studies, each with a wealth of information. ClinicalTrials.gov allows anyone to download this gold mine as thousands of XML (Extensible Markup Language) files. We did so last March, then searched each trial for keywords that could indicate the presence of digital technologies: machine learning, real-world data, blockchain, clinical trial management system, e-CRF…

Impact reached:

Pharmaceutical laboratories and healthtech start-ups are among the subscribers to our business-oriented publication, mind Health. COVID-19 recently put them under exceptional pressure, particularly because of the need for quicker, safer and more efficient clinical trials to test new drugs and vaccines.

Our first aim was to objectively measure the growing influence of digital technology (AI, wearables, blockchain…) in clinical trials and to attach tangible data to this phenomenon. By doing so, we helped the health industry understand this trend better and adapt to it. It also allowed us to:

  • move away from mere PR messaging, which is far too common in the field of innovation, and report on the real transformation of the pharmaceutical industry based on tangible data

  • establish a solid database on the use of digital technology in clinical trials in order to design indicators that can be updated and compared over the coming years

This three-part story, illustrated with a dozen charts, was mind Health’s most-read content in 2021. Each of the three parts ranked among the 20 most-read pieces.

Techniques/technologies used:

Since February 2000, the U.S. health authority, the National Institutes of Health (NIH), and the U.S. National Library of Medicine (NLM) have been archiving clinical trials conducted in the United States and in 220 countries around the world on the ClinicalTrials.gov website. This represents approximately 370,000 studies, the oldest of which date back to 1931. For each study, there is a wealth of information: the subject, the sponsors, the countries where the study was conducted, the therapeutic areas concerned, etc. The information is provided and updated throughout the study by its sponsor or principal investigator. 

ClinicalTrials.gov allows anyone to download this gold mine as thousands of XML (Extensible Markup Language) files. We started by selecting only interventional clinical trials (observational studies were excluded) and then searched all the information associated with each trial (study title, summary, etc.) for predefined keywords. For example, for trial management technologies: CTMS (clinical trial management system), e-CRF, eCOA and econsent. For telemedicine technologies: telehealth, teleconsultation, telecare, homecare and remote site monitoring. These keywords may not be as comprehensive as we would like, and some terms may also appear in clinical trials that involve no digital technology.
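A minimal sketch of the keyword search, assuming each trial has already been reduced to a Python dict with a `study_type` value and free-text fields. The keywords are those listed above; the category labels, the field names in `TEXT_FIELDS` and the helper names are hypothetical. Whole-word, case-insensitive matching is one way to avoid hits such as "CTMS" inside a longer token, though it cannot remove the ambiguity of shared terms mentioned above.

```python
import re

# Keywords from the text above; the category labels are illustrative.
KEYWORDS = {
    "trial_management": ["CTMS", "clinical trial management system", "e-CRF", "eCOA", "econsent"],
    "telemedicine": ["telehealth", "teleconsultation", "telecare", "homecare", "remote site monitoring"],
}

# Hypothetical names for the free-text fields searched in each trial.
TEXT_FIELDS = ("brief_title", "official_title", "brief_summary", "detailed_description")


def compile_patterns(keywords):
    """Build one case-insensitive, whole-word regex per category."""
    patterns = {}
    for category, terms in keywords.items():
        alternatives = "|".join(re.escape(term) for term in terms)
        # (?<!\w) and (?!\w) act as word boundaries that also work for hyphenated terms.
        patterns[category] = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)
    return patterns


PATTERNS = compile_patterns(KEYWORDS)


def tag_trial(trial):
    """Return the technology categories whose keywords appear in an interventional trial."""
    if trial.get("study_type") != "Interventional":
        return set()  # observational studies are left out, as in the project
    text = " ".join(trial.get(field) or "" for field in TEXT_FIELDS)
    return {category for category, pattern in PATTERNS.items() if pattern.search(text)}
```

For instance, an interventional trial titled "Teleconsultation follow-up after surgery" whose summary mentions an e-CRF is tagged with both categories, while the same text in an observational study returns an empty set.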

We used Python to analyse the data, and Datawrapper and Flourish to visualise it.
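As a hedged illustration of the comparable indicators mentioned in the Impact section, the sketch below computes, for each start year, the percentage of trials tagged with each technology category, then writes the result as CSV, a format both Datawrapper and Flourish can import. The input shape (a list of `(start_year, categories)` pairs) and the function names are assumptions, not the team's actual pipeline.

```python
import csv
import io
from collections import Counter, defaultdict


def yearly_shares(trials, categories):
    """Percentage of trials per start year tagged with each category.

    `trials` is an iterable of (start_year, set_of_categories) pairs; this
    input shape is an assumption made for the sketch.
    """
    totals = Counter()
    hits = defaultdict(Counter)
    for year, tags in trials:
        if year is None:
            continue  # trials without a usable start date are skipped
        totals[year] += 1
        for category in tags:
            hits[year][category] += 1
    rows = []
    for year in sorted(totals):
        row = {"year": year}
        for category in categories:
            row[category] = round(100 * hits[year][category] / totals[year], 1)
        rows.append(row)
    return rows


def to_csv(rows, categories):
    """Serialise the rows as CSV text that a charting tool can import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["year", *categories])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Recomputing the same table on a fresh download each year is what would make such an indicator comparable over time.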

What was the hardest part of this project?

When we decided to launch the project, the first step was to select a list of keywords that would indicate the presence of digital technologies in each of the 360,000 clinical trials we downloaded.

Next, the challenge was to find a way to transform thousands of XML files into a single database. In Python, we wrote a function that converts each XML file into a pandas row; the rows are then assembled into a single DataFrame. From there, it was easier to select the clinical trials that matched our criteria and the fields we wanted to analyse.
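A minimal sketch of that XML-to-DataFrame step. It assumes the legacy per-study XML layout of ClinicalTrials.gov (a `<clinical_study>` root with tags such as `id_info/nct_id`, `study_type` and `brief_summary/textblock`); the tag paths, the selected fields and the function names are illustrative assumptions, not the team's actual code.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

import pandas as pd


def text_of(root, path):
    """Return the text at a path, whitespace-normalised, or None if it is missing."""
    element = root.find(path)
    if element is None or element.text is None:
        return None
    return " ".join(element.text.split())  # collapse whitespace runs, e.g. line breaks in <textblock>


def trial_to_row(source):
    """Turn one XML record (file path or file object) into a flat dict, i.e. one future row.

    The tag paths below assume the legacy per-study XML layout.
    """
    root = ET.parse(source).getroot()
    return {
        "nct_id": text_of(root, "id_info/nct_id"),
        "study_type": text_of(root, "study_type"),
        "brief_title": text_of(root, "brief_title"),
        "brief_summary": text_of(root, "brief_summary/textblock"),
        "start_date": text_of(root, "start_date"),
        "lead_sponsor": text_of(root, "sponsors/lead_sponsor/agency"),
        # A trial can list several countries, so they are joined into one cell.
        "countries": "; ".join(c.text for c in root.findall("location_countries/country") if c.text),
    }


def build_dataframe(folder):
    """Parse every .xml file under `folder` and assemble the rows into one DataFrame."""
    rows = [trial_to_row(path) for path in sorted(Path(folder).rglob("*.xml"))]
    return pd.DataFrame(rows)
```

Building plain dicts first and calling `pd.DataFrame` once at the end avoids growing a DataFrame row by row, which is slow for hundreds of thousands of records.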

Because clinical trial data is entered by each sponsor, it was sometimes incomplete or filled in inconsistently. We had to select, clean and standardise it to get the best insights from the analysis.
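Cleaning of this kind usually comes down to small normalisation helpers. The sketch below is hypothetical: it assumes start dates arrive as free text in a handful of formats (for example "March 2015" or "March 15, 2015") and that sponsor names vary in spelling. The format list and the alias table are placeholders to adapt, not the project's actual rules.

```python
from datetime import datetime

# Date formats to try, in order; this list is an assumption to extend as needed.
DATE_FORMATS = ("%B %d, %Y", "%B %Y", "%Y-%m-%d", "%Y-%m")

# Hypothetical alias table mapping spelling variants (casefolded) to one canonical name.
SPONSOR_ALIASES = {
    "ap-hp": "Assistance Publique - Hôpitaux de Paris",
    "assistance publique hopitaux de paris": "Assistance Publique - Hôpitaux de Paris",
}


def start_year(raw):
    """Return the year of a start date written in any known format, else None."""
    if not raw:
        return None
    cleaned = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            # %B expects English month names under the default C locale.
            return datetime.strptime(cleaned, fmt).year
        except ValueError:
            continue
    return None


def canonical_sponsor(raw):
    """Collapse whitespace, then map known variants to a single spelling."""
    if not raw:
        return None
    name = " ".join(raw.split())
    return SPONSOR_ALIASES.get(name.casefold(), name)
```

Returning `None` for unparseable values, rather than guessing, keeps incomplete records visible so they can be counted or checked by hand.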

Finally, we had to identify which clinical trials were worth highlighting for each digital technology. Part of the project was devoted to background research in order to select the most innovative and impactful clinical trials of the last few years.

What can others learn from this project?

Stories about clinical trial results can be exciting for journalists and audiences alike. But looking at the big picture also allows us to track the development and evolution of the pharmaceutical industry.

The ClinicalTrials.gov database contains huge amounts of data on all areas of clinical research that are worth exploring. The fact that they are stored in nearly 400,000 XML files should not be a barrier, since tools and methods exist to analyse them at a meta level. Other journalists could draw on this project to learn how to analyse a trove of XML files with Python.
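One practical tip for a trove this size, sketched under assumptions: if the records are downloaded as a single zip archive, Python's standard `zipfile` module can read each XML member directly, without extracting hundreds of thousands of files to disk. The function name is hypothetical, and the `study_type` tag assumes the legacy per-study XML layout.

```python
import xml.etree.ElementTree as ET
import zipfile


def iter_interventional(zip_path):
    """Yield (member name, parsed root) for each interventional trial in a zip of XML files.

    Members are decompressed and parsed one at a time, so the archive never has
    to be extracted to disk.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".xml"):
                continue  # skip any non-XML member
            with archive.open(name) as handle:
                root = ET.parse(handle).getroot()
            if root.findtext("study_type") == "Interventional":
                yield name, root
```

Because it is a generator, the filtering step can feed a per-record parser lazily; memory only grows with whatever the caller chooses to keep.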

Project links: