We’ve seen countless stories about what millennials have supposedly killed. From napkins to marriage to Applebee’s, the headlines alone would suggest that the millennial generation has been on a rampage for the past decade. But we wanted to dig deeper: how does popular media report on millennials more broadly? We combed through 12,500 articles to find out.
We’d seen countless listicles of what millennials had killed, but no definitive, overall picture of how the media covers millennials. We wanted to create something that both confirmed and challenged the stereotypes the media constructs about millennials, and to put that information directly in the hands of millennials themselves.
We used the Event Registry API to collect news articles about Millennials. The query matched news articles published between June 15, 2015 and June 15, 2019 whose headline contained the word “Millennials”, “millennials”, “Millennial”, or “millennial”. This query yielded nearly 38,000 articles. From the query we obtained article metadata, including the URL, title, body, and publication date. Sometimes, multiple news outlets in the same media family publish the same article; removing these duplicates left a total of 26,565 articles.
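A minimal sketch of the headline filter and de-duplication, not the project's actual code, might look like the following. The Event Registry calls are not shown. The record layout (dicts with `title`, `body`, `date`, and `url` keys), the inclusive date bounds, the case-insensitive match, and the rule that two articles are duplicates when their normalized title and body are identical are all assumptions made for illustration.

```python
import re
from datetime import date

# Matches "millennial" or "millennials" as a whole word, ignoring case.
# (Assumption: the original query listed four explicit spellings.)
MILLENNIAL_WORD = re.compile(r"\bmillennials?\b", re.IGNORECASE)

START = date(2015, 6, 15)
END = date(2019, 6, 15)


def matches_query(article):
    """Keep articles with the keyword in the headline and a date in range."""
    in_headline = MILLENNIAL_WORD.search(article["title"]) is not None
    in_range = START <= article["date"] <= END  # assumption: bounds inclusive
    return in_headline and in_range


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def dedupe(articles):
    """Keep the first article seen for each (title, body) pair."""
    seen = set()
    unique = []
    for article in articles:
        key = (normalize(article["title"]), normalize(article["body"]))
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique
```

Calling `dedupe([a for a in articles if matches_query(a)])` would then return one record per syndicated story, keeping whichever copy appears first.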
We used the spaCy Python package to part-of-speech tag the headline text. Part-of-speech tagging identifies each word’s part of speech in the sentence (e.g., a noun versus a verb versus an adverb). We filtered on article headlines in which Millennials perform an action (“Millennials are killing the napkin industry”, for instance). Narrowing our focus made it easier to identify the target of their love and/or destruction. Using the newly tagged headlines, we subsetted the main dataset on headlines where “millennials” is the subject noun of the sentence, yielding 12,500 articles. From these, we also removed articles with fewer than five sentences in the body.
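The subject filter could be sketched roughly as below. This is an illustration, not the project's code: the model name `en_core_web_sm`, the helper names, and the extra requirement that the subject attach to a full verb are assumptions. The `nsubj` label is the one spaCy's English pipelines use for active-voice subjects.

```python
SUBJECT_FORMS = {"millennial", "millennials"}


def parse_headlines(headlines, model="en_core_web_sm"):
    """Run spaCy's tagger and dependency parser over each headline.

    The model name is an assumption; the source does not say which was used.
    """
    import spacy  # imported here so the filter below works on any spaCy-like tokens

    nlp = spacy.load(model)
    return list(nlp.pipe(headlines))


def millennials_is_subject(doc):
    """True if a form of "millennial" is the active subject of a verb."""
    return any(
        token.dep_ == "nsubj"              # nominal subject (active voice)
        and token.lower_ in SUBJECT_FORMS
        and token.head.pos_ == "VERB"      # assumption: subject attaches to a full verb
        for token in doc
    )
```

With a spaCy pipeline installed, `[d for d in parse_headlines(titles) if millennials_is_subject(d)]` would keep only the headlines in which millennials are doing something.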
The objects you can explore are the noun chunks spaCy identified as the first direct object in the headline. We opted to look at noun chunks instead of just nouns to get a complete picture of the items Millennials are interacting with. Noun chunks include adjectives plus nouns, such as “second home” instead of “home”. This method left us with about 4,000 unique nouns and 2,000 unique verbs.
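As a hedged sketch of the object extraction, the helper below takes a headline already parsed by a spaCy pipeline with a dependency parser and returns the first noun chunk whose root is a direct object. The function names and the lowercasing are assumptions; `dobj` is the direct-object label in spaCy's English pipelines.

```python
from collections import Counter


def first_direct_object_chunk(doc):
    """Return the first noun chunk acting as a direct object, or None."""
    for chunk in doc.noun_chunks:          # chunks come in document order
        if chunk.root.dep_ == "dobj":      # the chunk's head noun is a direct object
            return chunk.text.lower()
    return None


def count_objects(docs):
    """Tally how often each direct-object chunk appears across headlines."""
    counts = Counter()
    for doc in docs:
        obj = first_direct_object_chunk(doc)
        if obj is not None:
            counts[obj] += 1
    return counts
```

Note that spaCy's English noun chunks usually keep a leading determiner ("a second home"), so some further cleanup would presumably be needed to arrive at forms like “second home”.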
What was the hardest part of this project?
Often when people see our final projects, they assume that the development is the most time-consuming and most difficult part, but it’s almost always the behind-the-scenes data work you don’t see that’s the most challenging. Please see our answer to the tools and techniques question for a step-by-step rundown of how we wrangled all the headline text, which was by far the hardest part of this project.
What can others learn from this project?
Millennials are media darlings. It’s easy to fall into the tropes of millennials killing retail and loving avocados, but there’s a tendency to filter out the headlines that don’t confirm our existing biases.