It is an investigative article written by Data Critica and edited by Quinto Elemento Lab that revealed how Mexican authorities underrepresented indigenous people in the public Covid-19 data. Between March and September of 2020, the government published age, sex, medical conditions, whether the patient speaks an indigenous language, state of residence, and other demographic data for people registered after taking a Covid test, but removed the ethnicity records, making it effectively impossible to know the impact of the pandemic on these populations for most of the pandemic. Even after the data was released, statistical estimates based on census data show the numbers are grossly
“It is true: even in death we are invisible,” a Mayan teacher in southeast Yucatán told me after the piece was published. He then took part in writing a manifesto together with other indigenous activists. They have not finished it, but organizations of this kind often need hard data as evidence of structural racism.
The piece, written by me for Data Critica and edited by Quinto Elemento Lab, was selected as part of the best of Data Journalism by the Global Investigative Journalism Network in January. During that month in Mexico it was picked up by the national broadcasters W Radio, the Mexican Institute of Radio and Radio Educación, and on national television by La Octava. It was later also published by Telemundo, NBCUniversal's Spanish-language arm, and by the Spanish outlet ElDiario.es, in addition to the 8 Mexican investigative and independent news outlets that originally published it in Mexico.
I automated the extraction of data from the federal open data Covid-19 repository, which was updated daily, with scripts written in R. Also in R, I compared week by week how several proportion statistics, detailed below, evolved. I then joined two databases on municipality, taking the location of each municipality from a separate census dataset because it was not provided in the original Covid-19 data. Finally, with the guidance of a statistician, I estimated the percentage of indigenous deaths by municipality using the indigenous population of each of the more than 2,400 municipalities in Mexico.
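The join on municipality can be sketched as follows. The original work was scripted in R; this is only an illustrative Python sketch, and every field name, municipality code and value in it is hypothetical rather than taken from the real datasets.

```python
# Hypothetical Covid-19 records; field names are illustrative, not the real columns.
covid_records = [
    {"case_id": 1, "municipality_code": "31050"},
    {"case_id": 2, "municipality_code": "31050"},
    {"case_id": 3, "municipality_code": "09015"},
    {"case_id": 4, "municipality_code": "99999"},  # no match in the census table
]

# Hypothetical census table keyed by municipality code.
census_by_municipality = {
    "31050": {"name": "Municipality A", "indigenous_share": 0.45},
    "09015": {"name": "Municipality B", "indigenous_share": 0.02},
}


def join_on_municipality(records, census):
    """Left join: keep every Covid record, add census fields when the code matches."""
    joined = []
    for record in records:
        extra = census.get(record["municipality_code"])
        row = dict(record)  # copy so the input records are not modified
        row["municipality_name"] = extra["name"] if extra else None
        row["indigenous_share"] = extra["indigenous_share"] if extra else None
        joined.append(row)
    return joined


joined_rows = join_on_municipality(covid_records, census_by_municipality)

# Records whose municipality code was not found in the census table.
unmatched = [row["case_id"] for row in joined_rows if row["indigenous_share"] is None]
```

Keeping the unmatched records visible, instead of dropping them silently, makes it easier to check how much of the Covid-19 data could actually be linked to census information.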
For the death rate of the two groups (indigenous vs. the rest of the population), I identified confirmed Covid-19 deaths by crossing four different variables that indicated positive Covid-19 cases with the column that recorded a death date. Then I used the only indicator of ethnicity available at the beginning, whether an indigenous language is spoken, to calculate comparable death rates for the two groups (speaks an indigenous language vs. doesn't). From the beginning the death rate among speakers of indigenous languages was higher, but there was not enough data to support an inference until September.
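A minimal sketch of that comparison, in Python rather than the R used for the project, could look like this. It assumes a simplified record with a single hypothetical `confirmed` flag (in the real data this came from crossing several variables), a `death_date` field that is empty for survivors, and a `speaks_indigenous_language` flag. The toy numbers are invented and are not the article's findings.

```python
def death_rate_by_group(cases):
    """Return deaths / confirmed cases for each language group."""
    totals = {}  # group -> (confirmed cases, deaths)
    for case in cases:
        if not case["confirmed"]:
            continue  # only confirmed Covid-19 cases enter the rate
        group = case["speaks_indigenous_language"]
        confirmed, deaths = totals.get(group, (0, 0))
        died = case["death_date"] is not None
        totals[group] = (confirmed + 1, deaths + (1 if died else 0))
    return {group: deaths / confirmed for group, (confirmed, deaths) in totals.items()}


def make_case(speaks, confirmed, death_date=None):
    """Build one hypothetical record."""
    return {
        "speaks_indigenous_language": speaks,
        "confirmed": confirmed,
        "death_date": death_date,
    }


# Invented toy data: 4 confirmed speakers (1 death), 5 confirmed non-speakers (1 death),
# plus one unconfirmed record that must be ignored.
toy_cases = (
    [make_case(True, True, "2020-05-01")]
    + [make_case(True, True) for _ in range(3)]
    + [make_case(False, True, "2020-05-02")]
    + [make_case(False, True) for _ in range(4)]
    + [make_case(True, False, "2020-06-01")]
)

rates = death_rate_by_group(toy_cases)
```

With groups this small the difference between the two rates means very little, which is the same limitation described above: a higher rate is not yet an inference.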
Around that time a Mayan nurse told me about the case of his friend, whom I found in the database using the description he gave me (there are, of course, no names in the database). I found that she was not registered as a speaker of an indigenous language (she could not speak because of a disability). Some months later a new variable “appeared”: the one that officially records whether a person identifies as indigenous, which in Mexico is legally enough to be considered so.
The final data visualizations also involved data wrangling and cleansing, and interactive visualizations built with Shiny.
What was the hardest part of this project?
The hardest part was giving up the initial finding about the higher death rate among speakers of indigenous languages, because there was just not enough data, and then going beyond the database to look at how it was built. I interviewed the government official in charge of this information, who finally admitted to having removed ethnicity data, arguing that it was personal data (although, like the other published demographics such as sex, age, medical conditions and whether the person speaks an indigenous language, it was not).
But again the hardest part was going beyond the data: speaking to a statistician, who suggested using the official percentage of indigenous population per municipality as a proxy for the proportion of indigenous people who could have died in Mexico, and comparing that estimate with the unbelievably low official records.
Then explaining all of this, with its limits of uncertainty, and threading it together with the personal story of Luis Cauich, who saw his indigenous friend made invisible in the database for 6 months after her death, was also hard.
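The proxy idea can be illustrated with a short Python sketch (the project itself used R). It assumes, as the proxy does, that deaths within each municipality are spread in proportion to the indigenous share of its population. The function name, field names and all figures are hypothetical and only show the arithmetic.

```python
def expected_indigenous_deaths(municipalities):
    """Sum, over municipalities, Covid-19 deaths times the indigenous population share."""
    return sum(m["covid_deaths"] * m["indigenous_share"] for m in municipalities)


# Invented toy figures, not real data.
toy_municipalities = [
    {"code": "A", "covid_deaths": 100, "indigenous_share": 0.5},
    {"code": "B", "covid_deaths": 200, "indigenous_share": 0.25},
]
officially_recorded_indigenous_deaths = 10  # hypothetical official count

total_deaths = sum(m["covid_deaths"] for m in toy_municipalities)
expected = expected_indigenous_deaths(toy_municipalities)

# Share of all deaths the proxy attributes to indigenous people,
# and how many times larger that estimate is than the official record.
expected_share = expected / total_deaths
underreporting_ratio = expected / officially_recorded_indigenous_deaths
```

An estimate like this is only as good as the proportionality assumption behind it, so it is better read as an order of magnitude than as a count.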
What can others learn from this project?
In investigative journalism it is of the utmost importance to refrain from publishing the first finding that comes straight out of the data if we want to do justice to the real picture beyond the data. I would consider this the most important lesson. There are also more technical skills to be learnt: maps of Mexico are not readily available in freemium software such as Flourish, so coding geographic visualizations in R becomes a great solution. I also want to teach fellow journalists in Latin America how to use Shiny R Markdown documents.