Can Big Data Make New York Buses On Time? by Jay Cassano

When you hear about big data, you might think of nefarious data brokers selling your browsing history or governments demanding logs of your phone's GPS coordinates. But the data that overwhelms our modern world is just as often being used for good and can improve our lives in completely banal ways we don't even notice—like making the buses run on time.

At least that's what New York City Council Member Ben Kallos is hoping open data from the Metropolitan Transportation Authority (MTA) could do.

Kallos represents Manhattan's Upper East Side, and his constituents, like most New Yorkers, complain that MTA buses are frequently late (Fast Company has previously profiled Kallos). But when Kallos forwarded complaints to the MTA, the agency would respond that the problems didn't exist and that a particularly vocal subset of his constituents must be exaggerating.

So Kallos called on the MTA to release its archival data showing when each bus arrived at each stop on its route. Because the MTA makes real-time data for all buses publicly available, Kallos assumed that the agency must have archived that data. With it, programmers could help him analyze which buses were more than five minutes late to stops based on their advertised schedules.
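To make that test concrete, here is a minimal sketch of the lateness check in Python. It is not the BetaNYC code. The five-minute threshold comes from Kallos's description, while the record layout (dictionaries with `route`, `stop`, `scheduled`, and `actual` keys) and the function names are assumptions made up for illustration.

```python
from datetime import datetime, timedelta

# Threshold described in the article: more than five minutes behind schedule.
LATE_THRESHOLD = timedelta(minutes=5)


def is_late(scheduled: datetime, actual: datetime,
            threshold: timedelta = LATE_THRESHOLD) -> bool:
    """Return True when a bus reached a stop later than the threshold allows."""
    return actual - scheduled > threshold


def late_arrivals(records):
    """Keep only the stop records that count as late.

    Assumption: each record is a dict with 'route', 'stop', 'scheduled',
    and 'actual' keys. The real archive may be laid out differently.
    """
    return [r for r in records if is_late(r["scheduled"], r["actual"])]


def late_share(records):
    """Fraction of stop records that were late, or 0.0 for an empty list."""
    if not records:
        return 0.0
    return len(late_arrivals(records)) / len(records)
```

Feeding a month of archived stop records through `late_share` would produce the kind of citywide percentage a council member could take back to the agency, and grouping the records by route first would show where the delays concentrate.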

Unfortunately, it turned out the MTA doesn't have that archival data in a format that the public could easily use. But in response to Kallos's request, the agency released three months of data on a pilot basis, and a coder from BetaNYC worked on the project, posting his work on the open-source coding site GitHub.

"Now with this data we have the ability to hold the MTA accountable," says Kallos. For example, the analysis found that, in October 2014, of a total of 33.5 million bus departures throughout all of New York City, only 58% ran on time, 14% were early, and 28% were late. The GitHub site breaks down the data much further.

After getting these results, Kallos isn't satisfied with just the small trickle of data from the MTA. He wants the agency to regularly release its data in a raw format that civic technologists can hack on.

"Government has to make as much information available through open data as possible," he says. "The importance of having open data is that you're not limited to just the story that the agency might want to tell or that government wants to tell. You're able to kick the tires, look under the hood, and ask different questions than they might otherwise want to present."