We used a neural network and a machine learning algorithm to measure the visual similarity between 6,367 paintings at the Prado Museum, Spain's largest collection of visual arts. In plain terms, we offered a data-driven overview of the museum that goes beyond the classic paintings by Goya and Caravaggio. We discovered unexpected curiosities, such as an impressive collection of still lifes and the work of Carlos de Haes, a master of realistic landscape painting who spent his life perfecting techniques to paint mountains, fields and forests faithfully.
The article had repercussions in Brazil and abroad. Among the newspaper's readers, it was well received and had a low rejection rate.
This material was produced using two artificial intelligence techniques. First, a model capable of describing the content of an image was used to extract the features of each of the 6,367 works. In plainer language, the computer produced a mathematical description of each painting.
These numerical values could be used by the program to, for example, detect that Carlos de Haes' La Canal de Mancorbo en los Picos de Europa shows a mountainous landscape. This last step, the prediction, was not performed for the story. Only the numerical values were used, and that is where the second artificial intelligence technique comes in.
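To make the idea of a "mathematical description" concrete: once each painting is a vector of numbers, two paintings can be compared by how close their vectors are. The sketch below is only an illustration under assumptions. The feature vectors are toy values, not real model output, and cosine similarity is our choice of metric, since the story does not say which measure was used. The helper names `cosine_similarity_matrix` and `most_similar` are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(features: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of an (n_paintings, n_features) matrix."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    # Avoid dividing by zero for an all-zero feature vector.
    unit = features / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


def most_similar(features: np.ndarray, index: int, k: int = 5) -> list[int]:
    """Indices of the k paintings whose vectors point in the most similar direction."""
    sims = cosine_similarity_matrix(features)[index]
    order = np.argsort(-sims)  # highest similarity first
    return [int(i) for i in order if i != index][:k]


# Toy 3-dimensional "feature vectors" for four hypothetical paintings.
features = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
print(most_similar(features, 0, k=1))  # [1]
```

Real image features typically have hundreds or thousands of dimensions, but the comparison works the same way.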
Using an algorithm called t-SNE, we compared the numerical descriptions of the works. From these values, the computer calculated the position each image should occupy on a plane, so that images with similar characteristics would end up close to each other.
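The placement step can be sketched with a minimal, NumPy-only version of t-SNE. This is a teaching sketch, not the team's code. It converts distances between feature vectors into neighbour probabilities, then moves 2-D points until their neighbourhoods match. Parameter values (perplexity, iterations, early exaggeration, momentum) and the learning-rate heuristic are our assumptions, loosely following common defaults.

```python
import numpy as np


def pairwise_sq_dists(X: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances between all rows of X."""
    sq = np.sum(X ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)


def joint_probabilities(X, perplexity=30.0, tol=1e-5, max_iter=50):
    """Symmetric affinities P; each row's Gaussian width is tuned to hit the perplexity."""
    n = X.shape[0]
    D = pairwise_sq_dists(X)
    P = np.zeros((n, n))
    target_entropy = np.log(perplexity)
    for i in range(n):
        d = np.delete(D[i], i)
        beta, lo, hi = 1.0, 0.0, np.inf  # beta = 1 / (2 * sigma^2)
        for _ in range(max_iter):
            p = np.exp(-(d - d.min()) * beta)  # shifting by min() keeps exp() stable
            p /= p.sum()
            entropy = -np.sum(p * np.log(p + 1e-12))
            if abs(entropy - target_entropy) < tol:
                break
            if entropy > target_entropy:  # too flat: narrow the Gaussian
                lo = beta
                beta = beta * 2.0 if hi == np.inf else (beta + hi) / 2.0
            else:  # too peaked: widen it
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p
    return np.maximum((P + P.T) / (2.0 * n), 1e-12)


def tsne(X, n_components=2, perplexity=30.0, n_iter=500, seed=0):
    """Embed rows of X in n_components dimensions by gradient descent on KL(P || Q)."""
    n = X.shape[0]
    P = joint_probabilities(X, perplexity)
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-4, size=(n, n_components))
    velocity = np.zeros_like(Y)
    for it in range(n_iter):
        exaggeration = 4.0 if it < 100 else 1.0  # early exaggeration phase
        momentum = 0.5 if it < 250 else 0.8
        learning_rate = n / (4.0 * exaggeration)  # assumed heuristic
        num = 1.0 / (1.0 + pairwise_sq_dists(Y))  # Student-t kernel in the plane
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        W = (exaggeration * P - Q) * num
        grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y
        velocity = momentum * velocity - learning_rate * grad
        Y = Y + velocity
    return Y
```

This dense version stores n-by-n matrices, so for thousands of paintings a library implementation such as scikit-learn's `sklearn.manifold.TSNE` is the practical choice. The sketch only shows what such a tool computes.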
What was the hardest part of this project?
As in other stories by the Data Viz do Estadão team, there was no ready-made, pre-cataloged database. First, we created a scraper to download all the photos from the museum's website. At the same time, we asked the administration of the Museo del Prado for permission to reproduce the works on the newspaper's site. Once we had approval, we built a machine learning application with t-SNE. After a few dozen rounds of training and testing, it was time to process the whole dataset and then produce the story.
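The scraping step could look roughly like the sketch below, which uses only the Python standard library. It assumes that artwork images appear as ordinary `<img src=...>` tags on the collection pages. That page structure, the example URL and the helper names (`ImageLinkParser`, `extract_image_urls`, `download_all`) are hypothetical, not a description of the museum's actual site or the team's scraper.

```python
import time
import urllib.request
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collects absolute image URLs from <img src=...> tags on one page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.image_urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page's own URL.
            self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html: str, base_url: str) -> list[str]:
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    parser.close()
    # Drop duplicates while keeping the order they appear on the page.
    return list(dict.fromkeys(parser.image_urls))


def download_all(urls: list[str], out_dir: str = "images", delay: float = 1.0) -> None:
    """Save each image as a numbered file, pausing between requests."""
    Path(out_dir).mkdir(exist_ok=True)
    for i, url in enumerate(urls):
        urllib.request.urlretrieve(url, Path(out_dir) / f"{i:05d}.jpg")
        time.sleep(delay)  # avoid overloading the server
```

In practice, the page HTML would be fetched first (for example with `urllib.request.urlopen`), and the resulting list of URLs passed to `download_all`.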
What can others learn from this project?
In short, the greatest lesson would be to think outside the box about solutions and story ideas, without being held back by limitations, whether technical or informational, such as the lack of datasets. In an unpretentious way, we created an explorer of works that guides the reader through one of Europe's main museums.