At America’s largest and most powerful companies, 1 in every 97 white workers is an executive, but only 1 in every 443 Black or Hispanic workers can say the same. USA TODAY reached these findings after hounding dozens of firms in the Standard & Poor’s 100 for previously undisclosed hiring records following George Floyd’s murder. Employers send numbers on race, gender and ethnicity to the government annually; regulators keep the records hidden. With its unique database, USA TODAY revealed stark disparities in top hires, even as blue-collar ranks diversified. And we made all the data searchable for readers.
More and more companies have agreed to share their hiring records with us since publication. Some of them said they did not wish to appear on our list of S&P 100 firms that refused to disclose data. From an initial 54 companies, we now have a database of 83. And the numbers are growing nearly every week. Existing participants have also contributed additional years of data, enabling us to begin analyzing trends over time.
Our analysis consisted of two major data components. First, we ingested, cleaned and tabulated race, ethnicity and gender data by job category. Second, for comparison, we compiled a matching set of data for all employers in each filer’s industry using data from the Census Bureau.
- Federal hiring data: Firms emailed us their federal EEO-1 forms, which they send to the Equal Employment Opportunity Commission annually. All raw data arrived as PDFs or other image files of scanned paper forms. We used ABBYY FineReader for OCR and Tabula for parsing. We used Pandas to clean and test the data for inconsistencies caused by bad OCR.
- Census data: We used the Census Bureau’s special Equal Employment Opportunity Tabulation, which uses responses from the five-year American Community Survey of individuals to tabulate occupation, race, gender and ethnicity data by industry. The Census Bureau applies the same job and demographic classifications used by the EEOC.
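The OCR testing mentioned in the first item above can be illustrated with a short Python sketch. This is not the newsroom's actual code. It assumes each parsed row carries a reported total alongside the per-group counts, so a misread digit shows up as a row that no longer adds up. The column names, the `find_total_mismatches` helper and all numbers are hypothetical.

```python
# Hypothetical parsed rows from one scanned form; every name and number is invented.
rows = [
    {"job_category": "Executive/Senior Officials",
     "white": 80, "black": 5, "hispanic": 6, "asian": 9, "reported_total": 100},
    {"job_category": "Professionals",
     "white": 410, "black": 60, "hispanic": 70, "asian": 90, "reported_total": 680},
]

# Demographic columns that should sum to the reported total (an assumption).
GROUPS = ("white", "black", "hispanic", "asian")


def find_total_mismatches(rows):
    """Return (job_category, computed_sum, reported_total) for rows that do not add up."""
    mismatches = []
    for row in rows:
        computed = sum(row[group] for group in GROUPS)
        if computed != row["reported_total"]:
            mismatches.append((row["job_category"], computed, row["reported_total"]))
    return mismatches


# Rows flagged here would go back to a person for checking against the scan.
flagged = find_total_mismatches(rows)
```

A flagged row does not say which cell is wrong, only that the scan needs a second look. The same check translates directly to a Pandas column comparison.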
Using R, we married the Census Bureau data to the Form EEO-1s by company. We then examined, also in R, how closely each company’s employees, especially those in top jobs, matched the demographics of all U.S. workers and of workers in that company’s industry.
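The join described here was done in R. As a rough illustration only, the sketch below shows the same idea in Python: attach an industry benchmark to each company record and compute the gap. The company names, industry labels, field names, the `join_to_benchmark` helper and all figures are hypothetical.

```python
# Hypothetical company-level shares of executives by group; all values are invented.
company_exec_share = [
    {"company": "ExampleCo", "industry": "tech", "group": "black", "exec_share": 0.03},
    {"company": "ExampleCo", "industry": "tech", "group": "hispanic", "exec_share": 0.04},
    {"company": "SampleBank", "industry": "finance", "group": "black", "exec_share": 0.05},
]

# Hypothetical industry benchmarks keyed by (industry, group), standing in for Census data.
industry_benchmark = {
    ("tech", "black"): 0.07,
    ("tech", "hispanic"): 0.08,
    ("finance", "black"): 0.10,
}


def join_to_benchmark(records, benchmark):
    """Attach the industry share to each record and compute company minus industry."""
    joined = []
    for rec in records:
        bench = benchmark.get((rec["industry"], rec["group"]))
        # Leave the gap empty when no benchmark exists rather than guessing one.
        gap = None if bench is None else round(rec["exec_share"] - bench, 4)
        joined.append({**rec, "industry_share": bench, "gap": gap})
    return joined


result = join_to_benchmark(company_exec_share, industry_benchmark)
```

A negative gap means the group holds a smaller share of executive jobs at the company than it does across the industry.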
Lastly, we published the data in searchable form using a simple WordPress content management system.
What was the hardest part of this project?
The hardest part was working the phones and email boxes to cajole corporate PR people and their bosses to get us documents. Senior tech and economic opportunity reporter Jessica Guynn took on the complex task of tracking contacts with 100 companies.
When dozens of Form EEO-1s arrived, we needed to apply OCR and parsing software to extract structured tables from scanned versions of paper documents. Often the scan was of very poor quality, requiring extra verification after parsing.
Once the data was in shape, we took the unusual approach of comparing the share of Black and Latino workers who held executive positions with the share of white workers who did. Many of us assume that people of color are under-represented at companies generally. We wanted to see, regardless of what a company’s workforce looked like, the odds that a person of color had made it to the very top.
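The "1 in every N" framing comes down to dividing a group's total workers by that group's executives. The Python sketch below shows the arithmetic under that assumption. The counts are invented for illustration and are not figures from the database, and `one_in_n` is a hypothetical helper name.

```python
def one_in_n(workers, executives):
    """Return N such that 1 in every N workers in the group is an executive."""
    if executives <= 0:
        raise ValueError("group has no executives; the ratio is undefined")
    return workers / executives


# Hypothetical counts for a single employer; not real data.
counts = {
    "white": {"workers": 5000, "executives": 50},
    "black_or_hispanic": {"workers": 6000, "executives": 15},
}

# Round to whole numbers for the "1 in N" phrasing.
ratios = {
    group: round(one_in_n(c["workers"], c["executives"]))
    for group, c in counts.items()
}
```

Because each ratio is computed within a group, it is not affected by how large that group is in the overall workforce, which is the point of the comparison.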
But reporters didn’t stop there. To get behind the numbers, they interviewed employees at these top companies in each of the major sectors for which we had a significant number of submissions. They drilled deep on specific barriers and opportunities facing individual industries. They interviewed current and former executives who were successful at recruiting and retaining people of color to key positions, pointing the way to possible solutions.
We published the resulting package of stories, visualizations and searchable data over the course of a week, offering entry points for readers in every business sector, including the sizable chunk of the American workforce that the S&P 100 employs. The series constituted a unique public resource. Academics who study the EEOC and its history told us that because of the agency’s stubborn secrecy, they had never seen so many companies’ Form EEO-1s until our searchable database went online.
What can others learn from this project?
Some data projects provide a public service not because of their sophisticated techniques, but because they present previously hidden data for readers to use and understand.
One important reminder for us in this project was that even if a public record is treated as exempt from disclosure under FOIA, you should never assume it’s impossible to get. Always consider the possibility that a company or individual (or many of them) can be persuaded to divulge their data voluntarily. The more high-profile players you get on board, the more others will want to join in to avoid looking unfavorable next to their peers. The effort becomes a rolling snowball if you invest some hard work in pitching people.