Back to the Future

Posted on March 3, 2021 by Liz Davidson

By Rosie Beeston

We went back to the future, but not like in the movie. There was no DeLorean and no lightning strike on a clock tower. And this graph may not have the same dramatic climax as the moment Michael J. Fox hit 88 mph, but we’re quite excited about it nevertheless.

Predictions made 28, 21, 14, 7 and 1 days in advance, compared with the actual values.

Machine learning in action

We are creating machine learning models that predict how many people will be on a train. But passenger numbers are very low at the moment, and small data sets are statistically unreliable, so we went back to 2019.

We trained our prediction model by showing it the number of smartcard taps per day from the first two thirds of the year.
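The key point is that the split is chronological rather than random: everything the model trains on comes before everything it is tested on. A minimal Python sketch of such a split is below. The daily counts are placeholder numbers, not real smartcard data, and `chronological_split` is a hypothetical helper rather than part of our pipeline.

```python
from datetime import date, timedelta

def chronological_split(daily_taps, train_fraction=2 / 3):
    """Split a date-ordered list of (day, taps) pairs into train and test.

    Unlike a random split, the test set is strictly later than the training
    set, so the model never sees the 'future' it is evaluated on.
    """
    cutoff = int(len(daily_taps) * train_fraction)
    return daily_taps[:cutoff], daily_taps[cutoff:]

# Hypothetical placeholder counts for every day of 2019 (not real data).
start = date(2019, 1, 1)
daily_taps = [(start + timedelta(days=i), 1000 + (i % 7) * 50) for i in range(365)]

train, test = chronological_split(daily_taps)
print(len(train), len(test))     # 243 122
print(train[-1][0], test[0][0])  # 2019-08-31 2019-09-01
```

With a 365-day year, the training period runs to 31st August and the "future" starts on 1st September.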

As far as the model was concerned, the remaining third of the year was ‘in the future’. But because 2019 is in the past and we already know what happened, we could compare the model’s predicted values with the actual values.
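The post doesn’t name the accuracy measure behind the chart. Purely as an illustration, two common ways to compare predicted with actual values are the mean absolute error (MAE), measured in taps per day, and the mean absolute percentage error (MAPE). The numbers below are made up.

```python
def mean_absolute_error(actual, predicted):
    """Average size of the miss, in taps per day."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_absolute_percentage_error(actual, predicted):
    """Average miss as a percentage of the actual value (actuals must be non-zero)."""
    return 100 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical held-out actuals and model predictions (not real results).
actual = [1200, 1500, 900, 1100]
predicted = [1100, 1650, 900, 1210]

print(mean_absolute_error(actual, predicted))  # 90.0
print(mean_absolute_percentage_error(actual, predicted))
```

MAPE is convenient when comparing busy and quiet services on one scale, but it breaks down on days with zero actual taps, where MAE is the safer choice.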

All this time travel is disorientating so let me show you what we mean.

What going back to the future means in practice

Suppose we travel back to 30th September 2019 (with our trained-up model in tow!) and want to generate a prediction for 1st October. We feed our model data from September.

Using what it learned during training about how the number of taps changes from one week to the next, our model generates a prediction.

With this as its starting point, it then predicts further ahead, without any knowledge of what is coming.
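To make "predicting further ahead" concrete, here is a deliberately simple stand-in, not our actual model. It learns an average week-on-week ratio from the history it is given, then forecasts one day at a time, feeding each prediction back in as input once it runs out of actual data. The function names and the September counts are hypothetical.

```python
def fit_weekly_ratio(history):
    """Learn the average week-on-week change: taps[t] / taps[t - 7].

    Assumes no zero counts in the history (otherwise the ratio is undefined).
    """
    ratios = [history[t] / history[t - 7] for t in range(7, len(history))]
    return sum(ratios) / len(ratios)

def forecast_recursive(history, ratio, days_ahead):
    """Predict days_ahead days, feeding each prediction back in as input."""
    series = list(history)
    for _ in range(days_ahead):
        # After the first week, the value one week earlier is itself a prediction.
        series.append(series[-7] * ratio)
    return series[len(history):]

# Hypothetical September tap counts: one placeholder weekly pattern, repeated.
week = [1000, 1100, 1100, 1100, 1200, 600, 400]
september = week * 4 + week[:2]  # 30 days

ratio = fit_weekly_ratio(september)  # 1.0 for this perfectly repeating pattern
forecast = forecast_recursive(september, ratio, days_ahead=10)
print(forecast)
# [1100.0, 1100.0, 1200.0, 600.0, 400.0, 1000.0, 1100.0, 1100.0, 1100.0, 1200.0]
```

For the first seven days each forecast rests on real September data; from day eight onwards the model is building on its own predictions, which is one reason errors tend to grow with the forecast horizon.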

Now suppose that our 30th September selves want to predict all the way up to 28th October 2019.

The model extrapolates from the available data up to 30th September 2019, 28 days into the future.

Seven days later, 21 days ahead of our target date, we can update this prediction using the ‘extra’ data that we’ve collected during the first week of October.

This time we extrapolate from data up to 7th October 2019, 21 days into the future, and so on.
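The schedule of updates follows directly from the target date: each forecast horizon fixes the last day of actual data the model is allowed to see. A small sketch using only Python’s standard library, assuming the five horizons shown in the chart (`forecast_origins` is a hypothetical helper):

```python
from datetime import date, timedelta

# The five horizons in the chart: 4, 3, 2 and 1 weeks ahead, plus 1 day ahead.
HORIZONS_DAYS = [28, 21, 14, 7, 1]

def forecast_origins(target, horizons=HORIZONS_DAYS):
    """For each horizon, the last date of actual data the model may use."""
    return {h: target - timedelta(days=h) for h in horizons}

target = date(2019, 10, 28)
for horizon, origin in forecast_origins(target).items():
    print(f"{horizon:>2}-day horizon: data up to {origin}")
# 28-day horizon: data up to 2019-09-30
# 21-day horizon: data up to 2019-10-07
# 14-day horizon: data up to 2019-10-14
#  7-day horizon: data up to 2019-10-21
#  1-day horizon: data up to 2019-10-27
```

Repeating this for every target date in the held-out third of the year gives one line per horizon on the chart.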

The results

There are some anomalies. For example, there is a spike in actual taps around day 150. What happened there?

Bear in mind, though, that this model learns from a single data source: smartcard tap data.

Incorporating context from other data sources (e.g., weather, events) comes next and will add further levels of intelligence.

Overall, it’s quietly pleasing that our predictions have a good level of accuracy as far as 4 weeks in advance of trains departing.

Building passenger confidence with accurate crowding estimates

In customers’ hands, this information will give them confidence about the level of crowding on services when planning ahead.

And this insight will enable operators to make informed operational decisions about variable pricing and resource planning.

To find out more about CrowdAI, sign up for our upcoming event, Putting Passengers First, in collaboration with Liverpool John Moores University and Cogitare.
