Shortly after the attempted insurrection on Jan. 6, we were the first news organization to publish the majority of these videos, which had been uploaded to the then-defunct social media service Parler. Although that system had been taken down by its hosts, we received and combed through a trove of thousands of video files collected by an online group that had archived them. The result of our work was a harrowing interactive, modeled on a social media feed, that let users experience the riot as though they were in the midst of its participants.
Readers responded strongly, making it one of the most-viewed features on the site in 2021. The Department of Justice cited the videos we published dozens of times in documents charging insurrectionists with crimes committed that day, and the videos were played countless times in Congress during former President Trump’s impeachment trial.
Perhaps most crucially, because these videos were made inaccessible when Parler’s web host took it off the Internet, if it weren’t for our project, all of this documentary evidence might have been lost.
This project was a huge technical undertaking. The initial cache of videos was over 30 terabytes, a truly enormous amount of data. We had to use metadata and write code to narrow down the videos to a reasonable number to review.
We put out a call to the rest of the newsroom and asked for volunteers to review the videos so we could surface the germane and newsworthy videos from the day. ProPublica journalists watched and tagged hundreds of videos in a spreadsheet.
We also needed to think through the experience we wanted readers to have. We wanted it to be easy to navigate and to tell a gripping, unfolding story, but also to let readers choose which parts they wanted to see. We color-coded the videos and organized them by time, creating a timeline scrubber that is its own data visualization: because the timeline is color-coded, you can see at a glance how the videos move over time from outside the Capitol complex to inside the building itself.
Further, video is not easy or cheap to deal with. We had to transcode all of the videos to create versions of the files that we could serve to users, including those on mobile data connections. What’s more, some browsers crash when you load too many videos at once, so we had to create technical workarounds to make it possible for browsers to handle that many videos.
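To give a sense of what the transcoding step involves, here is a minimal sketch that builds a command for the ffmpeg command-line tool to produce a smaller, mobile-friendly copy of one video. The choice of ffmpeg, the function name `build_transcode_command`, and the specific settings (480-pixel height, H.264 video, AAC audio) are all assumptions made for illustration; they are not a record of the tools or settings we used.

```python
from pathlib import Path
from typing import List


def build_transcode_command(src: Path, dst: Path,
                            height: int = 480, crf: int = 28) -> List[str]:
    """Build an ffmpeg command for a smaller, web-friendly MP4.

    Hypothetical helper: the defaults are illustrative, not the
    project's actual encoding settings.
    """
    return [
        "ffmpeg", "-y",                  # overwrite the output file if it exists
        "-i", str(src),                  # input video
        "-vf", f"scale=-2:{height}",     # resize to `height`, keep aspect ratio (even width)
        "-c:v", "libx264",               # H.264 video, widely supported in browsers
        "-crf", str(crf),                # quality level: higher means smaller files
        "-preset", "veryfast",           # favor encoding speed over compression
        "-c:a", "aac", "-b:a", "96k",    # AAC audio at a modest bitrate
        "-movflags", "+faststart",       # put the index up front so playback starts sooner
        str(dst),
    ]


if __name__ == "__main__":
    cmd = build_transcode_command(Path("original.mp4"), Path("mobile.mp4"))
    print(" ".join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would run the conversion on a machine with ffmpeg installed. Looping over thousands of files this way is one reason the processing takes real time and money.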
What was the hardest part of this project?
The sheer size of the original dataset (30 terabytes comprising many hours of footage) made it a complex project from the start. Because the data we got from our sources included the full EXIF metadata, we were able to narrow the trove down using timestamps (starting from Trump’s speech through the end of the day) and geographic coordinates (in or near the Capitol). However, the EXIF data was messy and inconsistent across different devices, so we needed to be careful to avoid missing pieces of video evidence.
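The time-and-place filter described above can be sketched roughly as follows. This is an illustrative sketch, not our production code: the `VideoMeta` record, the `classify` function, the Capitol coordinates, the 1.5 km radius and the exact time window are all assumptions chosen for the example. To reflect how messy the metadata was, videos with missing fields are set aside for human review rather than dropped.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from math import asin, cos, radians, sin, sqrt
from typing import Optional

# Assumed values for illustration only.
CAPITOL_LAT, CAPITOL_LON = 38.8899, -77.0091
RADIUS_KM = 1.5
# Assumed window: noon Eastern on Jan. 6 through midnight Eastern, in UTC.
WINDOW_START = datetime(2021, 1, 6, 17, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2021, 1, 7, 5, 0, tzinfo=timezone.utc)


@dataclass
class VideoMeta:
    """Hypothetical record holding the fields pulled from a video's metadata."""
    video_id: str
    created_at: Optional[datetime]  # must be timezone-aware when present
    lat: Optional[float]
    lon: Optional[float]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometers."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def classify(video: VideoMeta) -> str:
    """Return 'keep', 'drop', or 'review' for one video."""
    # Incomplete metadata: do not discard, flag for a person to look at.
    if video.created_at is None or video.lat is None or video.lon is None:
        return "review"
    in_window = WINDOW_START <= video.created_at <= WINDOW_END
    near_capitol = haversine_km(video.lat, video.lon, CAPITOL_LAT, CAPITOL_LON) <= RADIUS_KM
    return "keep" if in_window and near_capitol else "drop"


if __name__ == "__main__":
    sample = VideoMeta("example", datetime(2021, 1, 6, 19, 0, tzinfo=timezone.utc),
                       38.8899, -77.0091)
    print(classify(sample))
```

Routing incomplete records to a "review" pile instead of discarding them is the design choice that matters here: a strict filter is fast, but it silently loses any video whose device wrote a bad timestamp or no location at all.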
More than 35 ProPublicans contributed to this project. They watched and tagged videos to add what the metadata could not tell us about each one, which helped us narrow the set down to just the videos we wanted to publish. Corralling dozens of colleagues into a Google Sheet together on such a tight timeline was hard work.
Really, the main challenge here, given the scope of information we were working with, was speed. We’re not a breaking news organization, but we sprinted and worked together to make sure we got these videos to the American people as soon as we could. We consider it a public service.
What can others learn from this project?
Sometimes an event comes around that is so momentous you need to drop everything to cover it. Most journalists know this, but knowing how you can make an impact on a national, fast-breaking story is hard.
We stuck to our strengths: Computational journalist Jeff Kao found a source with a huge cache of data and, collaborating with our news apps editors, created a way for the entire organization to pitch in and help. The news apps team then got to work immediately sketching out how we could present videos to people in a compelling and meaningful way.
While other news organizations went with curated walk-throughs of the day, we realized there was power in a minimally filtered and immersive piece. The “Parler-eye view” gave people the ability to experience the day through the eyes of those who posted videos from it. That way of looking at it turned out to be powerful.