In Switzerland, people are surprised by a bus that's two minutes late. In Sydney, a bus is only noteworthy if it's more than 30 minutes late, and punctuality varies greatly between routes and providers. So how do Sydney buses (and third-party bus providers) stack up against each other and against the rest of the world? To answer these questions we need data… lots of data.
Hooray for open government data! Transport for NSW publishes real-time information on the location and lateness of all public transport. Unfortunately, that information is ephemeral: there is no public log of historical lateness for us to analyse. To gather the data I needed, I had to fetch, log and aggregate real-time data that was never intended to be used this way. The result has random gaps, plus spontaneous route or timetable changes for special events, roadworks and holidays. Even so, over months of noisy data, patterns start to emerge.
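Purely as an illustration of the "log, then aggregate" step, here is a minimal sketch. It is not the author's pipeline. It assumes each poll of the real-time feed has already been logged as a hypothetical `(poll_time, route_id, trip_id, delay_s)` tuple, treats the most recent delay seen for a trip as that trip's lateness, and counts a trip as "late" above an arbitrary 5-minute threshold. The function names and sample routes are made up for the example.

```python
from collections import defaultdict
from statistics import median


def latest_delay_per_trip(observations):
    """Keep only the most recent delay logged for each (route_id, trip_id).

    Assumption: repeated polls see the same trip many times, and the last
    snapshot is the best estimate of how late that trip actually ran.
    """
    latest = {}
    for poll_time, route_id, trip_id, delay_s in observations:
        key = (route_id, trip_id)
        if key not in latest or poll_time >= latest[key][0]:
            latest[key] = (poll_time, delay_s)
    return {key: delay_s for key, (_, delay_s) in latest.items()}


def summarise_by_route(observations, late_threshold_s=300):
    """Per route: number of trips, median delay, and share of trips over the threshold."""
    delays_by_route = defaultdict(list)
    for (route_id, _trip_id), delay_s in latest_delay_per_trip(observations).items():
        delays_by_route[route_id].append(delay_s)
    return {
        route_id: {
            "trips": len(delays),
            "median_delay_s": median(delays),
            "share_late": sum(d > late_threshold_s for d in delays) / len(delays),
        }
        for route_id, delays in delays_by_route.items()
    }


# Hypothetical log: trip t1 was polled twice and got later between polls.
log = [
    (100, "R1", "t1", 60),
    (160, "R1", "t1", 420),
    (100, "R1", "t2", 0),
    (100, "R2", "t3", 1800),
]
summary = summarise_by_route(log)
```

Keeping only the latest snapshot per trip stops a single late bus that was polled many times from dominating a route's statistics; gaps in the log simply mean fewer trips counted for that period.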
Public transport networks around the world publish timetable and real-time data in a (reasonably) consistent format, so the same process can be applied across cities and countries. Let's see how Sydney compares with other cities, and how Australia compares with the rest of the world! Perhaps 40-minute-late buses are not an inevitable fact of life.
Katie is a software engineer who gets bored easily and likes learning new skills. She is currently learning about the foreign lands of Windows and .NET at Campaign Monitor and helping them explore the other side of the software world. In her spare time she stays sane by working on weird side projects with Python and Linux. Previously, she worked at Grok Learning as a full-stack dev and at Google in Sydney and Switzerland as a Site Reliability Engineer.