Or how a citizen data scientist can use open data

People living in Ghent know that mobility is one of the city's major challenges. For example, finding a parking space in the city center is not an easy task. The city of Ghent knows this and has already taken some steps to improve the situation. There is the so-called ‘guidance system’, which steers drivers towards the nearest parking with billboards showing the nearest public parkings and their free spaces.

The data that this system uses is also available as Open Data. There is one catch: only the realtime data is open, not the historical data. In other words, I can ask how much free space there is now, but I cannot see whether the parking is filling up or not.

So that’s when the stalking began.

Since the beginning of last year I have been stalking the open data platform, asking every 10 minutes again ‘How much free space is there now? And now? And now?’. Of course I automated my online stalking by firing up a virtual machine on a cloud provider to do the stalking for me.
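The polling step could look roughly like the sketch below. It is an illustration and not the script I actually ran: the feed URL is a placeholder, and the field names `name` and `free_spaces` are assumptions about the JSON layout, so adapt them to whatever the real feed returns.

```python
import csv
import json
import urllib.request
from datetime import datetime, timezone

# Hypothetical endpoint: replace with the realtime feed of your open data portal.
FEED_URL = "https://example.org/parkings/realtime.json"


def fetch_snapshot(url=FEED_URL):
    """Download the current state of all parkings as a list of dicts."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def append_snapshot(records, out, now=None):
    """Append one timestamped row per parking to an open CSV file object."""
    now = now or datetime.now(timezone.utc)
    writer = csv.writer(out)
    for record in records:
        # "name" and "free_spaces" are assumed field names, not the real schema.
        writer.writerow([now.isoformat(), record["name"], record["free_spaces"]])
    return len(records)


def main(path="parking_history.csv"):
    """Take one snapshot and append it to the history file."""
    with open(path, "a", newline="") as history:
        append_snapshot(fetch_snapshot(), history)
```

Calling `main()` from a script that a scheduler runs every 10 minutes (a crontab schedule of `*/10 * * * *`, for instance) slowly builds up the history that the open data platform itself does not keep.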

So now (after a year of patient waiting) I had my data to analyse.

First I decided to visualise what a typical week in the public parkings looks like. The ‘heatmap’ of this typical week already gave me some insights.
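The numbers behind such a heatmap are simply averages per weekday and hour. My own exploration was done in R, so the Python below is only a sketch of the idea; the function name `typical_week` and the input shape (pairs of timestamp and free spaces for one parking) are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def typical_week(observations):
    """Average free spaces per (weekday, hour) cell; weekday 0 is Monday."""
    cells = defaultdict(list)
    for timestamp, free_spaces in observations:
        cells[(timestamp.weekday(), timestamp.hour)].append(free_spaces)
    return {cell: mean(values) for cell, values in cells.items()}


# Two made-up observations on consecutive Fridays around 21hrs.
sample = [
    (datetime(2016, 1, 1, 21, 0), 100),
    (datetime(2016, 1, 8, 21, 10), 50),
]
week = typical_week(sample)
```

Each key of the result is one cell of the heatmap (day of the week by hour of the day), and the value is the colour to plot.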

I shared this heatmap on Twitter and with my colleagues, but the one person who challenged my results was my girlfriend. The Saturday after I showed her my heatmap, she asked me the following:

“Hey parking specialist, a friend is picking me up for drinks in the center at 21hrs. Will we still have a parking space at the ‘Vrijdagsmarkt’?”

Hmm, predictive analytics, I liked it. I quickly plotted the data for that particular public parking: a typical Friday evening, the previous Friday, and how the number of free spaces had evolved today up to 19hrs.

“OK, so right now there are still 114 free spaces, which is a lot more than last week at this hour. In two hours there should still be 50 or so spaces according to my ARIMA model.”

“Wow wow, hold on, don’t get too technical. All I need to know is whether we should risk it.”

They risked it, and sure enough, there was ample parking available.

If you’re interested in starting your own project, these were the building blocks:

  • A virtual machine on Microsoft Azure
  • A cron job to stalk the open data
  • Python scripts to aggregate the data
  • Basic knowledge of R for the data exploration
  • The ‘grammar of graphics’ in ggplot2 for the visualisations

I’m considering writing up a (technical) walkthrough of my process and putting the data online (in the spirit of open data). Let me know if you’re interested!

Thomas Michem
