2022 Shortlist

Pandora Papers

Country/area: United States

Organisation: International Consortium of Investigative Journalists, The Washington Post, SVT, Miami Herald and 147 media partners around the world

Organisation size: Big

Publication date: 03/10/2021

Credit: 600 journalists in 117 countries and territories | About: https://www.icij.org/investigations/pandora-papers/about-pandora-papers-investigation/ | Partners: https://www.icij.org/investigations/pandora-papers/pandora-papers-journalists-and-media-partners/


The Pandora Papers is an investigation by a global team of more than 600 journalists in 117 countries and territories. The team included data journalists, reporters, editors, researchers, fact-checkers and developers who together mined more than 11.9 million records of confidential financial information from 14 offshore service providers.

Project description:

The Pandora Papers investigation lays bare the global entanglement of political power and secretive offshore finance.

Based on more than 11.9 million records, containing 2.94 terabytes of confidential information from 14 offshore service providers, the investigation reveals the secret deals and hidden assets of more than 330 politicians and high-level public officials in more than 90 countries and territories, including 35 country leaders.

The files were obtained by the International Consortium of Investigative Journalists and shared with 150 media partners around the world. They also reveal secret holdings of more than 130 billionaires from 45 countries including 46 Russian oligarchs.

Impact reached:

The publication of the Pandora Papers has generated reactions around the world, among them:

  • Within hours of publication, authorities around the world vowed investigations. Officials in Pakistan, Mexico, Spain, Brazil, Sri Lanka, Australia and Panama, among other countries, quickly promised inquiries, while global watchdog groups demanded action in the wake of stories revealing how billionaires, politicians and criminals exploit a shadow financial system that covers up tax dodging and money laundering.
  • Parliaments, including the European Parliament and those of Malaysia, Colombia, Ecuador and Brazil, among others, opened discussions about the Pandora Papers.
  • US lawmakers proposed legislation that experts say represents the most significant reform of anti-money laundering rules since 9/11.
  • The chairman of a Czech Senate commission called for investigations into Czech Prime Minister Andrej Babis’s offshore deals, which were exposed in Pandora Papers reporting. Days after the revelations, the prime minister’s party narrowly lost re-election in a surprise outcome.
  • A Denver museum promised to return looted relics to Cambodia after US moves to seize them. The repatriation of the ancient statues came weeks after Pandora Papers reporting identified dozens of Khmer antiquities linked to an accused trafficker in the collections of major art institutions.
  • Chilean legislators voted to impeach President Sebastián Piñera after Pandora Papers revelations. The proceedings advanced to the Senate, which voted 24-18 in favor of removing him from office, short of the threshold required for removal.
  • Ecuador’s President Guillermo Lasso survived removal efforts after a majority of the country’s legislature voted against a recommendation to dismiss him following Pandora Papers revelations.
  • In Sri Lanka, President Gotabaya Rajapaksa ordered an investigation into the Pandora Papers findings, including those showing members of his family used shell companies to buy luxury property and artwork.

Techniques/technologies used:

  • The 11.9 million records were OCRed, indexed and shared using Datashare, a secure, open-source research and analysis tool developed by ICIJ’s technical team.
  • To explore and analyze the information, ICIJ identified files that contained beneficial ownership information by company and jurisdiction, structured it and generated lists by country. In cases where information came in spreadsheet form, ICIJ removed duplicates and combined it into a master spreadsheet. For PDF or document files, ICIJ used programming languages such as Python to automate data extraction and structuring as much as possible. ICIJ used machine learning and other tools, including Fonduer and Scikit-learn, to identify and separate specific forms from longer documents. Some provider forms were handwritten, requiring ICIJ to extract information manually.
  • SVT extracted data from passports.
  • After structuring the data, ICIJ used graph platforms (Neo4j and Linkurious) to generate visualizations and make them searchable. Graph databases were also used in the Offshore Leaks database: https://offshoreleaks.icij.org/
  • Machine learning (Universal Sentence Encoder) was used to cluster due diligence files that didn’t show offshore links and tag them in Datashare, enabling reporters to exclude them from their searches.
  • ICIJ also used Python, Elasticsearch, Google Sheets, Microsoft Excel and Datashare-Tarentula to analyze: the use of offshore entities by politicians (published in our Power Players feature); the use of U.S. trusts; the use of offshore entities by Forbes billionaires; suspicious activity reports; lawyers connected to Baker McKenzie who previously held government posts; the use of offshore jurisdictions by clients from different countries and its distribution by provider; the role offshore finance plays in hiding looted art and ancient relics; and Mossack Fonseca clients in the Pandora Papers (with the Miami Herald).
  • ICIJ validated data using public records. The data and analysis were fact-checked through several rounds using spreadsheets and code. ICIJ used its in-house fact-checking tool “Prophecies” to
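The spreadsheet step described above (removing duplicates, combining provider spreadsheets into a master spreadsheet and generating grouped lists) can be sketched in Python. This is a minimal illustration under stated assumptions, not ICIJ's pipeline: the column names `company`, `jurisdiction` and `owner`, the provider labels and the matching rule (collapse whitespace, ignore case) are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical columns used to decide whether two rows describe the same record.
KEY_COLUMNS = ("company", "jurisdiction", "owner")


def normalize(value: str) -> str:
    """Collapse internal whitespace and ignore case so trivially different copies match."""
    return " ".join(value.split()).casefold()


def build_master(sheets: dict[str, str]) -> list[dict[str, str]]:
    """Merge CSV text from several providers into one deduplicated list of rows.

    `sheets` maps a hypothetical provider label to that provider's CSV export.
    The first occurrence of each record wins; later duplicates are dropped.
    """
    master: list[dict[str, str]] = []
    seen: set[tuple[str, ...]] = set()
    for provider, text in sheets.items():
        for row in csv.DictReader(io.StringIO(text)):
            key = tuple(normalize(row[col]) for col in KEY_COLUMNS)
            if key in seen:
                continue
            seen.add(key)
            # Record which provider the surviving row came from.
            master.append({**row, "provider": provider})
    return master


def lists_by(master: list[dict[str, str]], column: str) -> dict[str, list[str]]:
    """Group company names by the value of `column` (e.g. jurisdiction)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for row in master:
        groups[row[column].strip()].append(row["company"].strip())
    return dict(groups)


sheets = {
    "provider_a": "company,jurisdiction,owner\nAlpha Ltd,BVI,Jane Doe\nAlpha Ltd,BVI,Jane Doe\n",
    "provider_b": "company,jurisdiction,owner\nalpha ltd, BVI ,jane  doe\nBeta SA,Panama,John Roe\n",
}
master = build_master(sheets)
print(len(master))                       # 2
print(lists_by(master, "jurisdiction"))  # {'BVI': ['Alpha Ltd'], 'Panama': ['Beta SA']}
```

Keeping the first occurrence and tagging it with its provider means every row in the master list can be traced back to its source file, which helps later fact-checking.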

What was the hardest part of this project?

The Pandora Papers’ 11.9 million records arrived from 14 different offshore service providers in a jumble of files and formats, presenting a massive data-management challenge.

The Pandora Papers brought a new challenge because the 14 providers presented and organized information in different ways. Some organized documents by client, some by office, and others had no apparent system at all. A single document sometimes contained years’ worth of emails and attachments. Some providers digitized their records and structured them in spreadsheets; others kept paper files that were scanned. Some PDFs ran to 10,000 pages and held information in forms that had to be structured. The documents arrived in English, Spanish, Russian, French, Arabic, Korean and other languages.

The complexity of the data, and the fact that only 4% of the records were in spreadsheet format, required a major effort to validate and structure information about companies in secrecy jurisdictions and their owners. The methods used differed by provider, depending on the quality of the data and the format of the files. The scale of the leak also required significant computing power to process the information and extract structured data for later analysis.
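One way to picture an approach that differs by provider, format and quality is a routing step that sends each file down a processing path. The sketch below is hypothetical: the extension sets, the `has_text_layer` and `handwritten` flags and the route names are illustrative assumptions, not a description of ICIJ's tooling.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical format groups; a real pipeline would inspect file contents, not just names.
SPREADSHEET_EXTS = {".csv", ".xls", ".xlsx"}
DOCUMENT_EXTS = {".pdf", ".doc", ".docx", ".txt", ".eml"}


def route(path: str, has_text_layer: bool, handwritten: bool = False) -> str:
    """Pick a processing path for one file based on its format and quality."""
    ext = PurePath(path).suffix.lower()
    if ext in SPREADSHEET_EXTS:
        return "load_and_deduplicate"  # already structured
    if handwritten:
        return "manual_extraction"     # forms that code cannot read reliably
    if ext in DOCUMENT_EXTS:
        # Scanned pages without a text layer need OCR before automated extraction.
        return "automated_extraction" if has_text_layer else "ocr_then_extraction"
    return "manual_review"             # unknown formats go to a person


def plan_by_provider(files):
    """Count how many files from each provider take each route.

    `files` is an iterable of (provider, path, has_text_layer, handwritten) tuples.
    """
    plan: dict[str, Counter] = {}
    for provider, path, has_text_layer, handwritten in files:
        plan.setdefault(provider, Counter())[route(path, has_text_layer, handwritten)] += 1
    return plan


files = [
    ("provider_a", "clients.xlsx", False, False),
    ("provider_a", "letter.PDF", True, False),
    ("provider_b", "scan_0001.pdf", False, False),
    ("provider_b", "form_0002.pdf", False, True),
]
for provider, counts in plan_by_provider(files).items():
    print(provider, dict(counts))
```

Checking `handwritten` before the generic document case mirrors the point that some provider forms were handwritten and had to be extracted manually whatever their file type.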

The reporting effort of more than 600 journalists in 117 countries and territories was central to the project.

Because of the sensitivity of the project and the difficult press-freedom conditions in which many partners worked, ICIJ took security into account, for example by using encryption for communication.

Because the investigation took place in the middle of the pandemic, it was not possible to organize an in-person meeting with all reporters. Instead, the team worked to stay connected virtually, and online training sessions helped overcome some of the challenges.

What can others learn from this project?

Dealing with a large number of records in different formats requires a combination of approaches. A tool such as Datashare, which facilitates indexing, OCRing and sharing the data securely, is central to a global collaboration. A secure place to coordinate efforts and communicate is also key. For the Pandora Papers, ICIJ used the Global I-Hub, a communication platform built on the Discourse software and adapted to the project’s needs. Establishing security protocols is also important in global projects; ICIJ and its media partners used encryption during the project.

When working with diverse records from different sources and formats, it is important to identify the key file types that can be used to explore the main topics in the data, and to structure the information in them into datasets for analysis. Structuring information from different files may require a combination of approaches, including code for automated data extraction, machine learning for more complex problems, and manual work. Reporting beyond the data is key to connecting the dots and finding the stories: more than 600 journalists worked for nearly two years on the Pandora Papers.
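The due diligence clustering mentioned in the techniques list is one example of machine learning applied to a harder structuring problem. The sketch below shows the general idea under stated assumptions: in practice each document would be embedded with a model such as the Universal Sentence Encoder, but here short hand-made vectors stand in for embeddings, and the leader-clustering algorithm, the 0.9 cosine threshold, the document IDs and the `no-offshore-link` tag are illustrative choices, not ICIJ's documented method.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def leader_cluster(vectors: list[list[float]], threshold: float = 0.9) -> list[int]:
    """Single-pass leader clustering.

    Each vector joins the first cluster whose leader (its first member) is at
    least `threshold` similar; otherwise it starts a new cluster.
    """
    leaders: list[int] = []
    labels: list[int] = []
    for i, vec in enumerate(vectors):
        for cluster_id, leader in enumerate(leaders):
            if cosine(vec, vectors[leader]) >= threshold:
                labels.append(cluster_id)
                break
        else:
            leaders.append(i)
            labels.append(len(leaders) - 1)
    return labels


def tag_clusters(doc_ids, labels, reviewed_clusters, tag="no-offshore-link"):
    """Tag every document that falls in a cluster a reviewer judged irrelevant."""
    return {doc: tag for doc, label in zip(doc_ids, labels) if label in reviewed_clusters}


# Toy stand-ins for sentence embeddings of four hypothetical documents.
doc_ids = ["doc-1", "doc-2", "doc-3", "doc-4"]
vectors = [[1.0, 0.0, 0.0], [0.95, 0.05, 0.0], [0.0, 1.0, 0.0], [0.0, 0.98, 0.1]]
labels = leader_cluster(vectors)
print(labels)                              # [0, 0, 1, 1]
print(tag_clusters(doc_ids, labels, {1}))  # {'doc-3': 'no-offshore-link', 'doc-4': 'no-offshore-link'}
```

In the Pandora Papers workflow, tags of this kind were applied in Datashare so reporters could exclude those files from their searches; choosing which clusters to tag (`reviewed_clusters`) is assumed here to be a manual review step.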

Visualizations, such as those built on graph databases, can support the reporting process and help find connections in the data. In global collaborations, it is important to make data accessible to all team members and to facilitate its exploration so that reporters with or without coding skills can navigate it equally well. Training sessions can help make data and technologies accessible to everyone.
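Graph platforms such as Neo4j answer questions like "how is this person connected to that company?" by traversing relationships. The plain-Python sketch below illustrates the idea with a breadth-first search over an ownership graph; the entity names and links are invented for illustration, and treating every link as undirected is a simplifying assumption.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (entity, entity) relationship pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def shortest_path(graph, start, goal):
    """Return the shortest chain of entities linking start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in sorted(graph.get(node, ())):  # sorted for repeatable results
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == goal:
                # Walk back through predecessors to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical officer, shareholder and beneficiary links.
edges = [
    ("Person A", "Alpha Ltd"),
    ("Alpha Ltd", "Trust X"),
    ("Trust X", "Person B"),
    ("Person C", "Beta SA"),
]
graph = build_graph(edges)
print(shortest_path(graph, "Person A", "Person B"))  # ['Person A', 'Alpha Ltd', 'Trust X', 'Person B']
print(shortest_path(graph, "Person A", "Person C"))  # None
```

In a graph database this traversal is a built-in query rather than hand-written code, and visualization tools such as Linkurious let reporters explore the same connections without writing code.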

It is also key to set aside time for data validation and for fact-checking the data analysis. Public records and requests for comment can help with validation.
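Part of validating against public records can be automated by comparing extracted fields with a registry lookup and flagging disagreements for a fact-checker. The sketch below assumes a hypothetical registry extract keyed by company name and jurisdiction and a hypothetical `incorporated` date field; it only sorts records into buckets and does not replace the manual checks and requests for comment described above.

```python
def norm(value: str) -> str:
    """Normalize case and whitespace so names match across sources."""
    return " ".join(value.split()).casefold()


def validate(extracted: list[dict], registry: list[dict]) -> dict[str, list[str]]:
    """Sort extracted company records into confirmed / mismatch / not_found buckets."""
    lookup = {(norm(r["company"]), norm(r["jurisdiction"])): r for r in registry}
    report: dict[str, list[str]] = {"confirmed": [], "mismatch": [], "not_found": []}
    for rec in extracted:
        public = lookup.get((norm(rec["company"]), norm(rec["jurisdiction"])))
        if public is None:
            report["not_found"].append(rec["company"])  # needs other sources
        elif public["incorporated"] == rec["incorporated"]:
            report["confirmed"].append(rec["company"])
        else:
            report["mismatch"].append(rec["company"])   # flag for a fact-checker
    return report


# Hypothetical extracted records and registry entries.
extracted = [
    {"company": "Alpha Ltd", "jurisdiction": "BVI", "incorporated": "2015-03-01"},
    {"company": "Beta SA", "jurisdiction": "Panama", "incorporated": "2010-01-01"},
    {"company": "Gamma LLC", "jurisdiction": "Delaware", "incorporated": "2019-07-15"},
]
registry = [
    {"company": "ALPHA LTD", "jurisdiction": "bvi", "incorporated": "2015-03-01"},
    {"company": "Beta SA", "jurisdiction": "Panama", "incorporated": "2011-01-01"},
]
print(validate(extracted, registry))
```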

Project links: