OCM Data Quality

I have said a few things recently about the quality of the data on Open Charge Map and I wanted to explain just why I said those things and why I no longer have the confidence to use OCM data.

The concept of OCM is admirable: to have a single database of all charge point locations worldwide. The problems though are huge. How do we achieve that objective?

The current OCM model combines data provided by the community with data imported from other sources and databases, such as the National Chargepoint Registry, eCars etc.

That is all very well… data provided by the community is entered by an EV driver, validated by an OCM editor as correct (as best they can ascertain) and the location is published. From then on, any changes to the details of that location are the responsibility of the community… perhaps the person that originally entered it or someone else. Either way, the OCM location will only get updated when someone in the community updates it.

However, there are no guidelines as to what people should put in each field, so editors apply their own standards, which often conflict. There is no consistency.

But what about the data that is imported?

Firstly, a location imported from somewhere else may already have a community-sourced entry and so a duplicate is created. There are a lot of duplicates on OCM as a result of this.
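As a rough illustration of how such duplicates could be spotted, here is a minimal sketch that pairs up locations whose positions are within a few tens of metres of each other. The record layout (`id`, `lat`, `lon`), the 50 m threshold and the sample data are all assumptions made for the example; this is not how OCM itself handles imports.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points, in metres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def likely_duplicates(locations, threshold_m=50):
    """Return pairs of ids whose positions lie within threshold_m of each other."""
    pairs = []
    for a, b in combinations(locations, 2):
        if distance_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= threshold_m:
            pairs.append((a["id"], b["id"]))
    return pairs


# Made-up sample records (hypothetical field names), not real OCM data.
sample = [
    {"id": 1, "lat": 51.5000, "lon": -0.1000},
    {"id": 2, "lat": 51.5001, "lon": -0.1001},  # roughly 13 m from id 1
    {"id": 3, "lat": 52.0000, "lon": -1.0000},
]
print(likely_duplicates(sample))
```

Comparing every pair is fine for a couple of thousand UK locations. Pairs flagged this way would still need a human to decide whether they really are the same charge point.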

Then there is the issue of data accuracy. For example… many (in fact almost all!) of the locations sourced from the National Chargepoint Registry have a ridiculous Title, often nothing more than the road name, town name or even the first sentence of the comments! None of the locations imported from eCars have a postcode, some of their positions are wrong, some of their addresses are wrong or the equipment is incomplete or just incorrect. If we want to have confidence in the data on OCM then these issues must be resolved.
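The title problems described above are the kind of thing that could be flagged automatically at import time. The sketch below is only an assumption about how that might look: the field names (`title`, `street`, `town`, `comments`) are hypothetical and are not taken from the OCM or National Chargepoint Registry schemas.

```python
def title_problem(location):
    """Return a short reason if the title looks uninformative, else None."""
    title = (location.get("title") or "").strip().lower()
    if not title:
        return "empty title"
    # A title that is just the road name or town name tells the driver nothing.
    for field in ("street", "town"):
        value = (location.get(field) or "").strip().lower()
        if value and title == value:
            return f"title repeats {field}"
    # A title lifted from the start of the comments field.
    comments = (location.get("comments") or "").strip().lower()
    if comments and comments.startswith(title):
        return "title copied from comments"
    return None


# Made-up example record, not real imported data.
print(title_problem({"title": "High Street", "street": "High Street", "town": "Exeter"}))
```

A check like this only catches the obvious cases; it cannot tell whether a title that passes is actually a good one.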

I had the idea of going through all of the UK locations on OCM and reviewing them for correctness, completeness and data quality. Yes, all 2400+ of them! I started but after updating about 200 it was pretty clear that every location had some issue… sometimes fairly trivial, sometimes significant… but whichever way you looked at it, the data quality was unreliable. I stopped updating.

Before continuing I wanted some assurances from Chris Cook (the OCM project manager) that these kinds of issues would be addressed. After all… if I were to spend the best part of a couple of months, full-time, updating OCM I wanted some assurance that my updates would remain and that this tidy-up would be a one-off never to be repeated task. After several discussions, both private and in public, it became clear that my concerns were not being heeded.

So, with the data in a pretty poor state by my standards, and no way ahead I could see that would assure me that data quality could be improved and maintained, I have reluctantly come to the conclusion that I do not feel that OCM is a data source of adequate quality for me to recommend to others and for me to use in my EV Route Planner.

I have done my best to be part of the solution and to bring my experience as a database administrator to the project, but as my ideas are generally fought against by the OCM team I have no choice but to withdraw from the project completely.

I have included a report on the UK OCM locations. It just lists Title and postcode. Notice the rubbish titles and the often missing or incomplete or incorrectly formatted postcodes. I haven’t printed the equipment but often there are no equipment entries at all or they are incomplete or incorrect.
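For anyone wanting to run a similar check over the postcodes themselves, here is a minimal sketch. The regular expression is a simplified pattern for the general shape of a UK postcode (outward code, a single space, inward code); it does not verify that a postcode actually exists, and the function name and status labels are just ones chosen for this example.

```python
import re

# Simplified shape of a UK postcode, e.g. "SW1A 1AA" or "M1 1AE".
UK_POSTCODE = re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][A-Z]{2}")


def postcode_status(postcode):
    """Classify a postcode as missing, ok, badly formatted, or incomplete/invalid."""
    if postcode is None or not postcode.strip():
        return "missing"
    if UK_POSTCODE.fullmatch(postcode):
        return "ok"
    # Right characters but wrong case or spacing, e.g. "sw1a1aa".
    compact = re.sub(r"\s+", "", postcode).upper()
    if len(compact) >= 5 and UK_POSTCODE.fullmatch(compact[:-3] + " " + compact[-3:]):
        return "badly formatted"
    return "incomplete or invalid"


for value in ["SW1A 1AA", "sw1a1aa", "SW1A", ""]:
    print(repr(value), postcode_status(value))
```

The "badly formatted" cases could be corrected automatically; the missing and incomplete ones need someone to look up the real postcode.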

This is a report of all UK OCM locations by Data Provider as of today:

OCM Location Report by Provider

Clearly we can never get 100% accuracy but for me there must be a sufficient degree of accuracy for us to have confidence in the data as a whole, and for me OCM does not meet that level of confidence. My estimate is that most entries have some degree of inaccuracy and that about 20% have issues serious enough to cause drivers problems if the data is relied upon.

OCM is not perfect and never will be. I accept that my ideas are not the only, or even necessarily the best, way to do things. I accept that there is little else that comes close at the moment to any kind of inclusive database of charge locations. Finally, I accept that Chris is clearly a talented software developer and has done a pretty good job with developing OCM to date. However, it could be a lot better and I have the skills and enthusiasm to make it much better but I am just ignored and pushed aside.

So, until the data quality significantly improves, and that will probably require a change of approach and possibly a change of project leadership, I am not using OCM data and I am no longer offering my services as editor.
