Showing correlation in Open Data

During de course INFLAB Maarten van den Hoek and I were challenged to explore potential applications of open data made available by the city of Rotterdam. After some experimentation we decided to utilise Python, MongoDB and the lessons we had learned so far in the Data Mining course to develop web-application that calculates the correlation between different types of objects and their attributes in a given area. For instance: what is the correlation between small trees and red playground equipment? Given the right data, this could be expanded with locations of events. For example: what is the correlation between the presence (or absence) of traffic signs and accidents. In some cases, the tool can even be used to check the quality of the different datasets. If, for instance, the type of soil in which a tree is planted is labeled differently from the traffic sign that stands right next to it, at least one of the data points is probably wrong.

Then we took it a step further: we divided the area in smaller sections and calculated correlations for each subsection. Then the deviations from the global correlation were projected on a map. This way both a pattern and the exceptions to that pattern can be examined. You could discover that one particular subsection of an area has a significantly more desirable correlation between two variables, for instance playgrounds and traffic accidents, than it’s surrounding area. This discovery could then lead to an investigation of the subsection, to see what can be learned from it to improve other areas.

The web-application is available for download at

Comments are closed.