The Rise of Hyphenated Last Names in Pro Sports

Category: Best data-driven reporting (small and large newsrooms)

Country/area: United States

Organisation: The Pudding

Organisation size: Small

Publication date: 13/05/2019

Credit: Jan Diehm

Project description:

Last winter I was watching an NFL game featuring one of the league’s most memorably named players: Ha Ha Clinton-Dix. But it wasn’t his first name that caught my attention—it was his last.

The list of players whose names arch over the numbers on the back of their jerseys goes on and on: Clinton-Dix, Shai Gilgeous-Alexander, Sean Reid-Foley, JuJu Smith-Schuster, Kaleena Mosqueda-Lewis. So I wanted to investigate: Are double-barrelled last names getting more common in professional sports? And what about overall?

Impact reached:

Hyphenated names are hard to study. Although athletes proudly wear their last names on their jerseys, most names are personal. The US Census collects last names, but to preserve the anonymity of individuals, only names appearing 100 or more times are released. So, you get names like Smith and Johnson, but never names like Smith-Johnson. Laurie Scheuble, a Penn State professor who researches names, told me that this was a first-of-its-kind analysis and that the work should be published in an academic journal.

Techniques/technologies used:

Player names were scraped from Baseball Reference, Basketball Reference, Football Reference, Hockey Reference, MLS, and NWSL using Node.js. Names that included “-” were tagged and manually vetted. Korean names, where the last name appears before the first name, were not tagged as hyphenated names. Players were grouped into decades by the season in which they played their first professional game. When a season spanned multiple years (e.g. 1979-1980), the last year was used to assign the decade. The front end was built using JavaScript and D3.js.
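
The tagging and decade-grouping rules described above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the project's actual code: the function names (isHyphenCandidate, decadeOfSeason, countByDecade), the player object shape, and the sample records are all assumptions made for the example, and seasons are assumed to be written with four-digit years. The manual vetting step is not automated here; the sketch only flags candidates.

```javascript
// Hypothetical helpers illustrating the rules in the text above.
// All names and data shapes are assumptions, not the project's real code.

// Flag a last name as a hyphenation candidate if it contains "-".
// Candidates would still need manual vetting, as the project did.
function isHyphenCandidate(lastName) {
  return lastName.includes("-");
}

// Map a debut season to a decade. For seasons that span two years
// (e.g. "1979-1980"), the last year decides the decade.
// Assumes every year in the label is written with four digits.
function decadeOfSeason(season) {
  const years = String(season).match(/\d{4}/g);
  if (!years) {
    throw new Error("No four-digit year found in season: " + season);
  }
  const lastYear = Number(years[years.length - 1]);
  return Math.floor(lastYear / 10) * 10;
}

// Count total players and hyphen candidates per debut decade.
function countByDecade(players) {
  const counts = {};
  for (const { lastName, debutSeason } of players) {
    const decade = decadeOfSeason(debutSeason);
    if (!counts[decade]) {
      counts[decade] = { total: 0, hyphenated: 0 };
    }
    counts[decade].total += 1;
    if (isHyphenCandidate(lastName)) {
      counts[decade].hyphenated += 1;
    }
  }
  return counts;
}

// Made-up sample records, for illustration only.
const samplePlayers = [
  { lastName: "Smith-Johnson", debutSeason: "1979-1980" },
  { lastName: "Smith", debutSeason: "1980" },
  { lastName: "Johnson", debutSeason: "2014" },
];

console.log(countByDecade(samplePlayers));
```

Under these assumptions, a 1979-1980 debut lands in the 1980s bucket, matching the rule that the last year of a multi-year season is used.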

What was the hardest part of this project?

The hardest part of the project was figuring out how to display the names in the data viz. I knew that the names themselves needed to be the centerpiece; after all, seeing the names was what attracted me to this project in the first place. I went through several iterations with the names as data, including a histogram and a network diagram, but finally settled on the animated wall of words you see in the final piece.

What can others learn from this project?

  1. Something as simple as a passing thought while watching TV can turn into a data project that rivals academic research.
  2. Sports stories don’t have to be game stories or numbers-based.
  3. Charts don’t have to look like charts at all.
  4. Important cultural trends can be hidden in everyday life.

Project links: