
Back to the Future
By Rosie Beeston
We went back to the future, but not like in the movie. There was no DeLorean and no lightning strike on a clock tower. And this graph may not have the same dramatic climax as when Michael J. Fox reached 88 mph, but we’re quite excited about it nevertheless.
Machine learning in action
We are creating machine learning models that predict how many people will be on a train. But, because small data sets are statistically unreliable and because passenger numbers are so low at the moment, we went back to 2019.
We trained our prediction model by showing it the number of smartcard taps per day from the first two thirds of the year.
As far as the model was concerned the remaining third of the year was ‘in the future’. But because 2019 is in the past and we already know what happened, we could compare the model’s predicted values with the actual values.
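For the curious, here is a minimal sketch of that kind of split in Python. It is not our production code: the tap counts are synthetic, and the "model" is a deliberately simple stand-in that repeats the last week of training data. The names make_synthetic_taps, chronological_split and mean_absolute_error are ours, invented for illustration.

```python
import math


def make_synthetic_taps(n_days=365):
    """Made-up daily tap counts with a weekly pattern (NOT real data)."""
    taps = []
    for day in range(n_days):
        weekday_boost = 300 if day % 7 < 5 else 0  # busier on five days of each week
        taps.append(1000 + weekday_boost + 50 * math.sin(day / 20))
    return taps


def chronological_split(series, train_fraction=2 / 3):
    """Split in time order: the model never sees the later part during training."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def mean_absolute_error(actual, predicted):
    """Average size of the gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


taps = make_synthetic_taps()
train, holdout = chronological_split(taps)

# Stand-in "model" (an assumption, not the real one): repeat the last
# training week across the whole held-out period.
last_week = train[-7:]
predicted = [last_week[i % 7] for i in range(len(holdout))]

mae = mean_absolute_error(holdout, predicted)
print(f"trained on {len(train)} days, tested on {len(holdout)} days, MAE = {mae:.1f} taps")
```

The important detail is that the split is by date rather than at random, so the held-out third really does play the role of "the future".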
All this time travel is disorientating so let me show you what we mean.
What going back to the future means in practice
Suppose we travel back to the 30th of September 2019 (with our trained-up model in tow!) and want to generate predictions for the 1st of October. To do this, we feed our model data from September.
Using what it learned during training about how the number of taps changes from one week to the next, our model generates a prediction.
With this starting point it then predicts further ahead, without knowledge of what is coming.
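Predicting further ahead from a starting point can be sketched as a recursive forecast, where each prediction is fed back in as if it were an observation. The sketch below assumes a very simple stand-in rule (yesterday's model is not this simple): each day is predicted as the value seven days earlier multiplied by an average week-on-week ratio learned from training data. The function names are hypothetical.

```python
def learn_weekly_ratio(train):
    """Average ratio between a day's taps and the taps seven days earlier."""
    ratios = [
        train[i] / train[i - 7]
        for i in range(7, len(train))
        if train[i - 7] > 0
    ]
    return sum(ratios) / len(ratios)


def recursive_forecast(history, horizon, weekly_ratio):
    """Predict `horizon` days ahead, feeding predictions back in as inputs."""
    extended = list(history)
    for _ in range(horizon):
        # Beyond the first week, extended[-7] is itself a prediction.
        extended.append(extended[-7] * weekly_ratio)
    return extended[len(history):]


# Two identical made-up weeks: five busy days, then two quiet ones.
history = [100, 100, 100, 100, 100, 40, 30] * 2
ratio = learn_weekly_ratio(history)
forecast = recursive_forecast(history, horizon=10, weekly_ratio=ratio)
print(forecast)
```

In this sketch the first seven predictions lean on real observations, while days eight to ten lean only on earlier predictions, which is why errors tend to grow the further ahead you look.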
Now suppose that our 30th September selves want to predict all the way up to 28th October 2019.
The model extrapolates from the available data up to 30th September 2019, 28 days into the future.
7 days later, 21 days ahead of our target date, we can update this prediction using the ‘extra’ data that we’ve collected during the first week of October.
Here the model extrapolates from data up to 7th October 2019, 21 days into the future, and so on for each following week.
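The weekly updates described above follow a simple schedule, sketched here in Python. The helper forecast_schedule is a hypothetical name for illustration, and the weekly step is taken from the example dates rather than from any fixed rule in our system.

```python
from datetime import date, timedelta


def forecast_schedule(first_origin, target, step_days=7):
    """List (origin, days_ahead) pairs, re-forecasting every `step_days` days."""
    schedule = []
    origin = first_origin
    while origin < target:
        schedule.append((origin, (target - origin).days))
        origin += timedelta(days=step_days)
    return schedule


schedule = forecast_schedule(date(2019, 9, 30), date(2019, 10, 28))
for origin, days_ahead in schedule:
    print(f"data up to {origin:%d %b %Y}: predicting {days_ahead} days ahead")
```

Each row is a fresh prediction for the same target date, made with one more week of data and a shorter distance to extrapolate.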
The results
There are some anomalies. For example, there is a spike in actual taps around day 150. What happened there?
But this model uses a single data source (just smartcard tap data) for learning.
Incorporating context from other data sources (e.g., weather, events) comes next and will add further levels of intelligence.
Overall, it’s quietly pleasing that our predictions have a good level of accuracy as far as 4 weeks in advance of trains departing.
Building passenger confidence with accurate crowding estimates
In customers’ hands, this information will give them confidence about the level of crowding on services when planning ahead.
And this insight will enable operators to make informed operational decisions about variable pricing and resource planning.
To find out more about CrowdAI, sign up for our upcoming event, Putting Passengers First, in collaboration with Liverpool John Moores University and Cogitare.