Global Witness is an independent, not-for-profit organisation that campaigns to end environmental and human rights abuses driven by corruption and the exploitation of natural resources.
OpenCorporates is the largest open database of companies in the world. Their aim is quite simple (though mammoth in scale) - to create a URL for every single company in the world.
Together, both organisations have fought hard for the UK government to release information about UK-based companies. Finally, in June 2016, Companies House, the UK registrar of companies, started publishing the world’s first open data register of “beneficial owners” or “people with significant control” of companies registered in the UK. These are the real people who own and control companies. This project was our first coordinated attempt to explore this data and find out what it reveals about potential cases of tax evasion and corruption.
Led by Data Ambassadors Gail Dawes, Nick Jewell and Sarah Constant, over 30 DataKind UK volunteer data scientists had the chance to work with Global Witness over a weekend DataDive to explore the data and uncover findings. They were also joined by an incredible group of partners - OpenCorporates, Organised Crime and Corruption Reporting Project and Spend Network - who all brought interesting data along to the event and participated all weekend, helping to give the volunteers context.
Using data from Companies House, volunteers split into three teams to gain insight into UK companies, see whether there was cause for further investigation, and uncover any flaws in the data itself so Companies House could improve its next data release. The teams dove into the data, first preparing it for analysis through data wrangling and munging, then applying analytical techniques such as fuzzy matching and network analysis to understand what the data might show.
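To give a flavour of what fuzzy matching can look like in practice, here is a minimal sketch in Python. It is not the code the volunteers wrote: the field names (`name`, `birth_year`, `birth_month`), the `watchlist` input and the 0.85 threshold are all assumptions made for illustration. It pairs records whose names are nearly identical and whose birth month and year agree, which is the most the register allows, since it does not publish the day of birth.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity in [0, 1] between two names, ignoring case and extra spaces."""
    def clean(s):
        return " ".join(s.lower().split())
    return SequenceMatcher(None, clean(a), clean(b)).ratio()


def candidate_matches(owners, watchlist, threshold=0.85):
    """Pair records with similar names and the same birth month and year.

    Both inputs are lists of dicts with hypothetical keys
    'name', 'birth_year' and 'birth_month'.
    """
    pairs = []
    for o in owners:
        for w in watchlist:
            same_birth = (o["birth_year"], o["birth_month"]) == (
                w["birth_year"],
                w["birth_month"],
            )
            if same_birth and name_similarity(o["name"], w["name"]) >= threshold:
                pairs.append((o["name"], w["name"]))
    return pairs
```

Anything a function like this returns is only a lead. A similar name plus a shared birth month and year is not proof that two records describe the same person.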
There were a number of interesting findings including:
*Without further investigation we don’t know if this match points to the same people, as the match is just birth month and year. Still, it’s a powerful example of what we can do with this newly released data.
The teams also found errors in some of the text fields, due either to how the data was manipulated or to the simple fact that humans often don’t fit into neat boxes. For example, over 500 terms were used to describe a person being “British,” including ten people who identified themselves as Cornish. (Cornwall is a county at the southwestern tip of the UK.)
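One way to tame a free-text field like this is to map each entry onto a short list of canonical labels, first by exact lookup and then by approximate string matching. The sketch below uses Python's standard `difflib` module. The variant table and the 0.8 cutoff are illustrative assumptions, not the real list of 500 terms, and grouping Cornish under “British” simply mirrors the example above.

```python
from difflib import get_close_matches

# Hypothetical variant table; the real register contains hundreds of spellings.
CANONICAL = {
    "british": "British",
    "english": "British",
    "scottish": "British",
    "welsh": "British",
    "cornish": "British",
    "united kingdom": "British",
    "uk": "British",
}


def normalise_nationality(raw, cutoff=0.8):
    """Map a free-text nationality to a canonical label, or None if unsure."""
    # Lower-case, drop full stops and collapse whitespace before comparing.
    key = " ".join(raw.lower().replace(".", "").split())
    if key in CANONICAL:
        return CANONICAL[key]
    # Fall back to the closest known variant, if it is close enough.
    close = get_close_matches(key, list(CANONICAL), n=1, cutoff=cutoff)
    return CANONICAL[close[0]] if close else None
```

Entries that come back as `None` would still need a human to look at them, which is part of why a drop-down menu at the point of collection is the more reliable fix.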
Some of the volunteers’ findings will be fed directly back to Companies House by Global Witness for them to investigate further. These include all the companies that appear to be owned by other companies with registered addresses in tax havens - a potential breach of the rules.
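A check of this kind can be sketched as a simple filter. The record layout below (`company_number`, `kind`, `country`) and the caller-supplied list of jurisdictions are assumptions made for illustration. They are not the actual Companies House schema or any official list of tax havens.

```python
def flag_offshore_corporate_owners(owners, watch_jurisdictions):
    """Return company numbers whose listed owner is itself a company
    registered in one of the watched jurisdictions.

    `owners` is a list of dicts with hypothetical keys
    'company_number', 'kind' and 'country'.
    """
    watched = {j.strip().lower() for j in watch_jurisdictions}
    flagged = []
    for rec in owners:
        is_company = rec.get("kind") == "corporate"
        country = (rec.get("country") or "").strip().lower()
        if is_company and country in watched:
            flagged.append(rec["company_number"])
    # De-duplicate, since a company can list several owners.
    return sorted(set(flagged))
```

The output is a list of companies to look at more closely, not a finding of wrongdoing. Whether a given entry breaches the rules still has to be checked case by case.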
We also found ways that Companies House could improve its data collection, including replacing some open text fields with drop down menus.
Although we didn’t expect to find any slam dunk cases in just a weekend of work, Global Witness did find some interesting leads that will require more traditional investigative research to pursue further.
The volunteers’ work did, though, demonstrate how this new beneficial ownership data might be a powerful tool for investigative journalists. For example, Reuters recently discovered a small town in northern England with unusually high numbers of directors for companies that are linked to pornography and gambling.
While there are 3.5 million companies in the UK, only 1.3 million have filled in their beneficial ownership information (as of 17 November 2016). We look forward to having another chance to go through the dataset once all companies have completed this information by July 2017.
As of January 2017, Companies House has agreed to address a number of the problems we identified with the dataset and is pursuing companies and individuals we identified as non-compliant.
In addition, the findings have been widely publicised amongst governments and the private sector and held up as an example of data science being used for public good. At the Open Government Partnership Summit 2016, Sir Eric Pickles, the UK Government’s anti-corruption champion, cited this project as an example of how useful this data is and why it's important for other governments to open up similar datasets.
A number of governments from countries around the world have reached out to Global Witness following the DataDive to solicit our advice on how to build effective beneficial ownership registries.
You can read more about this event on the Global Witness blog.