Lessons from Visualized: Shy Data needs Sneaky Data Scientists

Dan Barrett
Written 2 years ago
in Data


(Photo: M. Migurski @ Visualized Conference, 11/08/2012; source)

We live in an unprecedented time of abundant, easily accessible data, yet some of the most interesting data might not be recorded at all, let alone made accessible to the public.

As Michal Migurski of Stamen demonstrated at the Visualized conference on November 9, the best data is often “shy” and hiding in plain sight. Teasing it out requires a new breed of data detectives: people who aren’t afraid to get their hands dirty and collect hard data themselves.

One morning this summer, Migurski and the rest of the Stamen team were sitting at Dolores Park Cafe, near their office in San Francisco, when they started counting shuttle buses. A common sight in the Mission District, these buses belong to a fleet of private vehicles that Silicon Valley tech giants such as Apple and Google use to carry employees between San Francisco and their campuses in the valley. Being the data fanatics they are, the team set out to find the exact routes and ridership numbers of this shadow transit system. When their inquiries turned up few answers, a casual question over breakfast became a full-blown mystery, one they would have to solve by collecting the data by hand.

So how was the data collected?

Using Foursquare check-ins as a rough guide, they recruited a network of bike messengers and volunteers to follow the buses and record routes, stops, and ridership numbers along the way. These “human sensors” returned all of their data as handwritten notes on Field Papers, a printable map format created by Stamen that people mark up by hand and scan back in.

Once the data started coming back, it quickly became apparent that the private shuttle system was much larger than anticipated. Stamen estimates that the shuttles carry up to a third as many riders as Caltrain, the commuter rail line linking San Francisco and San Jose.

You can see the surprising results of the project at The City from the Valley.

More on this project:
Mapping Silicon Valley’s Own Private “iWay” – allthingsd.com
Visualising the hidden networks of Silicon Valley – New Scientist

Dan Barrett is a senior software engineer at Contently, with degrees in GIS and Urban Planning. When not writing about data science & visualization, he is busy building statistical analysis tools for evaluating authors and their written work. Follow him on Twitter.