Category: Best data-driven reporting (small and large newsrooms)
Organisation: mind Media (Groupe mind – Frontline Media)
Organisation size: Small
Publication date: 29/11/2019
Credit: Aymeric Marolleau
mind Media is a French business outlet dedicated to analysing online advertising and the companies that specialise in advertising technologies. We have developed a tool that tells us, for each website that sells online advertising, some of the service providers to which it is connected. It allows us to follow the impact of the main adtech trends and the relationships between adtech players. We have drawn many articles from it since 2017, and we have developed an interface that lets our subscribers query our database directly for their own analysis.
Context: online advertising is now powered by tracking technologies and open auctions, and managed by trading desks. Transparency is a legal requirement, and the ability to identify and monitor technology providers gives an overview of a publisher’s strategy.
The analysis of these files allows us to shed light on several aspects of the market: what technological architectures have the publishers implemented to sell their inventories (video, banners, native advertising) on desktop and mobile web? How many resellers do they work with? With which SSPs have they signed a commercial contract? Our study showed that a large number of SSPs are competing for the inventory of French publishers, despite a concentration around the main sellers in the market. We also showed that the sub-agencies generally work with the same SSPs as the publishers themselves.
We also use it as a thermometer to track certain trends in online advertising. Are French publishers increasingly adopting header bidding? The number of their sales partners continues to grow, with an increasing use of resellers.
This tool also allowed us to reveal the names of the first publishers to connect to the ad exchange of Eyeo, the German company that, ironically, is behind the world’s most used adblocker, Adblock Plus. We also discovered the signature of some contracts between adtech companies and ad agencies, such as Deezer with the Triton Digital audio ad exchange, or the presence of the DMP Weborama at a Dutch publisher, Telegraaf Media Groep, for “an R&D project”.
In 2017, adtech was not known for its transparency, and it was urgent to reassure advertisers, who were struggling to understand the workings of programmatic advertising (the automation of transactions) and saw a significant portion of their investments captured by fraudsters. A practice in vogue at the time, known as “domain spoofing”, consisted of usurping publishers’ domain names in order to pose as those publishers to brands and thus take a share of their advertising revenue.
To remedy this, the Interactive Advertising Bureau (IAB), the professional association in charge of bringing order to online advertising, created the Ads.txt system in May 2017: each publisher is required to list, in a text file at the root of its site (for example at the address lemonde.fr/ads.txt), the service providers that it authorises to sell its advertising space. These providers are SSPs (supply-side platforms) and ad exchanges (marketplaces for online advertising). Buyers thus know that if a service provider who is not on the list tries to sell them that space, they are probably dealing with a fraudster.
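To make the file format concrete, here is a minimal sketch of how one ads.txt file could be read. It assumes the record layout of the IAB specification: comma-separated fields giving the ad system domain, the publisher's account ID, the relationship (DIRECT or RESELLER) and an optional certification authority ID, with "#" starting a comment. The function and class names are our own illustration, not the code of the tool described here, and the parsing is deliberately simplified (lines such as "contact=..." are skipped).

```python
from typing import List, NamedTuple, Optional


class AdsTxtRecord(NamedTuple):
    ad_system_domain: str              # domain of the SSP or ad exchange
    seller_account_id: str             # publisher's account ID in that system
    relationship: str                  # "DIRECT" or "RESELLER"
    cert_authority_id: Optional[str]   # optional fourth field


def parse_ads_txt(text: str) -> List[AdsTxtRecord]:
    """Return the seller records found in the text of an ads.txt file."""
    records = []
    for raw_line in text.splitlines():
        # Drop comments: everything after '#' is ignored.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(",")]
        # Seller records need at least three fields; anything shorter
        # (for example "contact=..." lines) is skipped in this sketch.
        if len(fields) < 3:
            continue
        relationship = fields[2].upper()
        if relationship not in ("DIRECT", "RESELLER"):
            continue
        cert_id = fields[3] if len(fields) > 3 and fields[3] else None
        records.append(
            AdsTxtRecord(fields[0].lower(), fields[1], relationship, cert_id)
        )
    return records
```

A line such as `example-ssp.com, 12345, DIRECT` (a hypothetical provider) would yield one record whose relationship is DIRECT, which is how a buyer can check whether a seller is authorised by the publisher.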
At the beginning of 2018, to make it easier for us to keep track of the subject, we developed a crawler that visits more than 800 websites every night, which we have associated with nearly 400 publishers in the United States and Europe. To obtain usable statistics, we built several long correspondence tables ourselves: they associate several advertising domains with a single adtech company, and several websites with a single publisher.
We export the results in .csv format in order to analyse them with Google Sheets and Excel.
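As an illustration of that export step, the sketch below writes a few rows to CSV text with Python's standard csv module. The column names and the sample row are hypothetical, chosen only for the example; the source does not describe the actual layout of the exported files.

```python
import csv
import io
from typing import Dict, Iterable

# Hypothetical column layout for the exported file.
FIELDNAMES = ["publisher", "site", "ad_system", "relationship"]


def rows_to_csv(rows: Iterable[Dict[str, str]]) -> str:
    """Serialise rows (one dict per publisher/provider link) as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv([
        {"publisher": "Example Media Group", "site": "news.example",
         "ad_system": "Xandr", "relationship": "DIRECT"},
    ]))
```

The resulting text can be saved with a .csv extension and opened in a spreadsheet; values containing commas are quoted automatically by the csv module.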
What was the hardest part of this project?
A crawler automatically imports into a database, every night, the ads.txt file of 827 websites (the list of URLs can be found here), operated by 368 content publishers from nine of the world’s largest advertising markets: the United States, the United Kingdom, France, Germany, Italy, Spain, Portugal, Belgium and the Netherlands.
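The nightly collection can be sketched roughly as follows. This is an assumption-laden illustration, not the crawler we run: the function names, the user-agent string and the choice to record unreachable sites as None are all hypothetical, and storage in a database is left out. The download function is passed in as a parameter so that it can be replaced in tests.

```python
import urllib.request
from typing import Callable, Dict, Iterable, Optional


def ads_txt_url(site: str) -> str:
    """Build the ads.txt address at the root of a site."""
    host = site.strip().lower()
    for prefix in ("https://", "http://"):
        if host.startswith(prefix):
            host = host[len(prefix):]
    host = host.split("/", 1)[0]  # keep only the host, drop any path
    return f"https://{host}/ads.txt"


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download a URL and return its body as text."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "ads-txt-sketch/0.1"}  # hypothetical UA
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(
    sites: Iterable[str],
    fetch: Callable[[str], str] = fetch_text,
) -> Dict[str, Optional[str]]:
    """Fetch the ads.txt file of every site; None marks a failed download."""
    results: Dict[str, Optional[str]] = {}
    for site in sites:
        try:
            results[site] = fetch(ads_txt_url(site))
        except (OSError, ValueError):
            # Network errors and HTTP errors are subclasses of OSError.
            results[site] = None
    return results
```

Calling `crawl(["lemonde.fr"])` would request https://lemonde.fr/ads.txt; scheduling the run every night (for instance with a cron job) is outside the scope of this sketch.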
In each of these countries, we have relied on audience sources (SimilarWeb) and selected the most important publishers, often members of a professional organization, such as Digital Content Next in the United States, the Association of Online Publishers in the United Kingdom, or the Asociación de Medios de Información in Spain. In France, we have also included several e-commerce sites, those with the largest audiences, whose revenues are increasingly derived from advertising.
We currently crawl the sites of 124 publishers in France, 76 in the United States, 37 in the United Kingdom, 23 in Germany, 44 in Italy, 22 in Spain, 12 in Portugal, 15 in Belgium and 13 in the Netherlands.
We made sure to link each URL to a publisher, and each publisher to a category (media, e-commerce…) and to a country (France, USA…). In the case of France, we even went so far as to specify the ad sales house (régie) associated with a publisher.
Similarly, to make it easier for Ads.txt Scan users to read the relationships between publishers and service providers, we have associated each advertising system domain name with the company that owns it (271 at the end of August 2018). For example, if appnexus.com is displayed in an ads.txt file, the user interface of our monitoring tool will display Xandr. As a result of mergers and acquisitions in adtech, a provider may be attached to several advertising domain names, not always very explicitly.
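The correspondence tables described above amount to two lookups: one from advertising domain to owning company, and one from website to publisher. The sketch below shows the idea under stated assumptions. Only the appnexus.com to Xandr pair comes from the text; every other domain, site and company name is hypothetical, as is the fallback of keeping the raw value when no mapping exists.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple

# Advertising domain -> owning company (only the first pair is from the text).
DOMAIN_TO_COMPANY = {
    "appnexus.com": "Xandr",
    "example-ssp.com": "Example SSP",   # hypothetical
    "example-ssp.net": "Example SSP",   # hypothetical second domain, same owner
}

# Website -> publisher (hypothetical entries).
SITE_TO_PUBLISHER = {
    "news.example": "Example Media Group",
    "sport.example": "Example Media Group",
}


def companies_by_publisher(
    rows: Iterable[Tuple[str, str]],
    domain_to_company: Mapping[str, str] = DOMAIN_TO_COMPANY,
    site_to_publisher: Mapping[str, str] = SITE_TO_PUBLISHER,
) -> Dict[str, List[str]]:
    """Group (site, ad domain) pairs into publisher -> sorted company names."""
    grouped = defaultdict(set)
    for site, ad_domain in rows:
        # Fall back to the raw value when a table has no entry.
        publisher = site_to_publisher.get(site, site)
        company = domain_to_company.get(ad_domain.lower(), ad_domain.lower())
        grouped[publisher].add(company)
    return {pub: sorted(companies) for pub, companies in grouped.items()}
```

Because two domains can map to the same company and two sites to the same publisher, counting distinct companies per publisher after this step avoids double counting, which is what makes the statistics usable.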
What can others learn from this project?
Market intelligence, and the path to exclusive content for B2B media, are directly linked to our ability to monitor online data and traces. As an organisational strategy, integrating data analysts into the newsroom should be mandatory. In some ways, a new tradition of modern muckrakers is born.
This project, as well as several others initiated by mind Media, illustrates how journalists can use the digital traces left by companies that experiment with open data, somewhat in spite of themselves, to get around the communication wall. This allows us to better understand their activity and to shed light on certain practices that are rarely known to the uninitiated, by finding answers to questions that companies never address in their press releases and press conferences, and about which they deliberately remain vague in interviews.
Our newsroom has a dozen journalists, each specialized in covering relatively opaque sectors (media and online advertising, finance, health, etc.). The data-journalism unit has trained them and raised their awareness so that they know how to identify data sources that are not very visible to the uninformed eye: APIs, lists, Excel and PDF files hosted on the websites of professional unions, ministries, certifying bodies… We also sometimes scrape the HTML code of the websites of the companies we study, or of their accounts on social networks.
In addition to the Ads.txt files, we have for instance identified the types of data collected by online advertising players thanks to another IAB initiative, the Global Vendor List of the Transparency & Consent Framework. We also showed which mobile advertising trackers, especially geolocation trackers, are installed in French media applications, thanks to the open platform Exodus Privacy.