CONNECTING ALL THE DOTS
The latest price per square metre (PPSM) release again highlights the capability of Data Fabric by connecting data from four Open Government datasets:
- HM Land Registry Price Paid (price and transaction date)
- MHCLG Non-Domestic EPC (floor area)
- MHCLG Domestic EPC (floor area)
- VOA Rating List (floor area)
This results in circa 16.6 million usable PPSM records from circa 25.5 million Land Registry sales connected by Data Fabric, up to the end of January 2021, covering residential and commercial real estate in England and Wales.
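The PPSM figure itself is simple arithmetic: a Price Paid amount divided by one floor area drawn from the EPC or VOA sources. The sketch below is a minimal illustration that assumes a hypothetical selection rule: prefer domestic EPC, then non-domestic EPC, then VOA, and break ties by taking the record lodged closest to the sale date. The actual business logic used to select one floor area per price is not described here, and the names `Sale`, `FloorArea`, `SOURCE_PRIORITY` and `price_per_square_metre` are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Sale:
    price: float      # Price Paid amount (GBP)
    sale_date: date   # Price Paid transaction date


@dataclass
class FloorArea:
    source: str       # hypothetical labels: "domestic_epc", "non_domestic_epc", "voa"
    area_m2: float    # floor area in square metres
    recorded: date    # date the floor area was lodged


# Hypothetical preference order (lower is preferred); NOT the published Data Fabric logic.
SOURCE_PRIORITY = {"domestic_epc": 0, "non_domestic_epc": 1, "voa": 2}


def select_floor_area(sale: Sale, areas: List[FloorArea]) -> Optional[FloorArea]:
    """Pick one floor area per sale: best-ranked source first, then the record lodged closest to the sale."""
    usable = [a for a in areas if a.area_m2 > 0]  # drop zero or negative areas
    if not usable:
        return None
    return min(
        usable,
        key=lambda a: (SOURCE_PRIORITY.get(a.source, len(SOURCE_PRIORITY)),
                       abs((a.recorded - sale.sale_date).days)),
    )


def price_per_square_metre(sale: Sale, areas: List[FloorArea]) -> Optional[float]:
    """Return price / floor area, or None when no usable floor area is connected to the sale."""
    area = select_floor_area(sale, areas)
    return None if area is None else sale.price / area.area_m2


if __name__ == "__main__":
    sale = Sale(price=250_000, sale_date=date(2020, 6, 1))
    areas = [
        FloorArea("voa", 90.0, date(2019, 1, 1)),
        FloorArea("domestic_epc", 100.0, date(2015, 3, 1)),
        FloorArea("domestic_epc", 80.0, date(2020, 5, 1)),
    ]
    print(price_per_square_metre(sale, areas))  # 250000 / 80.0 = 3125.0
```

In this sketch a sale with no connected floor area simply yields `None` and produces no PPSM record, which mirrors the gap between usable PPSM records and total sales.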
These connections form a small part of the wider Data Fabric system which is discussed below.
WHAT IS Data Fabric?
A system that enables full data usage by connecting identifiers. An identifier might relate to a position, such as an Ordnance Survey-supplied UPRN, or to a description of that position, such as a Land Registry-supplied ‘GUID’ or an MHCLG-supplied ‘LmkKey’ for EPC records. The connections are made through a proprietary grid of positions, called HPIDs, against which the data is placed.
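One way to picture connecting identifiers through a shared grid is as a graph. Each identifier is a node, and each placement (a record or position identifier pinned to an HPID) is an edge. Identifiers in the same connected component are then connected to one another. The sketch below models this with a union-find structure. This is an assumption for illustration only: the HPID grid is proprietary, its internals are not described here, and all identifier values are hypothetical.

```python
from typing import Dict, List, Tuple

# An identifier is a (scheme, value) pair, e.g. ("UPRN", "U-1") or ("LmkKey", "epc-42").
Identifier = Tuple[str, str]


class IdentifierGraph:
    """Union-find over identifiers: each link() merges two identifiers into one connected group."""

    def __init__(self) -> None:
        self._parent: Dict[Identifier, Identifier] = {}

    def _find(self, x: Identifier) -> Identifier:
        self._parent.setdefault(x, x)
        root = x
        while self._parent[root] != root:
            root = self._parent[root]
        while self._parent[x] != root:  # path compression
            nxt = self._parent[x]
            self._parent[x] = root
            x = nxt
        return root

    def link(self, a: Identifier, b: Identifier) -> None:
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self._parent[rb] = ra

    def connected(self, a: Identifier, b: Identifier) -> bool:
        return self._find(a) == self._find(b)

    def group_of(self, x: Identifier) -> List[Identifier]:
        """All known identifiers connected to x, sorted."""
        root = self._find(x)
        return sorted(i for i in list(self._parent) if self._find(i) == root)


if __name__ == "__main__":
    g = IdentifierGraph()
    # Hypothetical placements against hypothetical HPIDs "H-1" and "H-2".
    g.link(("HPID", "H-1"), ("GUID", "pp-001"))    # Price Paid record
    g.link(("HPID", "H-1"), ("LmkKey", "epc-42"))  # EPC record
    g.link(("HPID", "H-1"), ("UPRN", "U-1"))       # position identifier
    g.link(("HPID", "H-2"), ("GUID", "pp-002"))    # a sale with no UPRN still gets a position
    print(g.connected(("GUID", "pp-001"), ("LmkKey", "epc-42")))  # True
    print(g.connected(("GUID", "pp-001"), ("GUID", "pp-002")))    # False
```

Here the Price Paid record and the EPC record are connected through the shared HPID, without either needing to carry a UPRN itself.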
WHAT PROBLEM IS Data Fabric SOLVING?
Missing insight. Data Fabric gives the most upstream view of the data possible, or maximum data utilisation. That is, almost all the data is surfaced as a connected whole, because the grid of HPIDs is not limited by existing perspectives. This inverts the usual paradigm: rather than first limiting the data by matching each feed to a ‘universal’ identifier and then connecting via that key, the datasets are connected with virtually no prior conditions applied. This does not mean that the data is clean; it is, however, pure and complete.

This unlimited view permits any downstream data journey to be taken, depending on the use case. Typical steps in this journey might include connecting disparate feeds, data cleansing, application of business logic, data analysis, modelling and interpreted insight. For example, the HARNESS PPSM data is an output of Data Fabric, created by connecting four datasets and applying business logic to select one floor area per price. Last month’s release was used to produce PPSM relatives, and hence market insight, after applying simple cleansing and modelling techniques. That analysis was intended to illustrate the ‘missing insight’ problem: the insight lost when existing, more restrictive approaches are used to connect data.
HOW MANY CONNECTIONS DOES Data Fabric HAVE?
There are currently over half a billion connections for position identifiers (e.g. UPRN, TOID, Land Registry Title, and proprietary Canonical Building and Estate identifiers) for circa 62 million description identifiers, or record-level IDs. This means that the data can be viewed, grouped and analysed from different perspectives, for example to understand a freehold / leasehold hierarchy where multiple units form one bounded asset. Most of the connected data is sourced under the Open Government Licence v3.0, though any address-centric feed can be added because the system is completely modular.
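Viewing the same records from different perspectives amounts to choosing which position identifier to group by. In the minimal sketch below, the record layout and identifier values are hypothetical, and the `building` field merely stands in for a building-level identifier such as the proprietary Canonical Building identifier. Grouping leases by UPRN keeps each unit separate, while grouping by building collects the units that form one bounded asset.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical records: each carries the position identifiers it is connected to.
RECORDS: List[Dict[str, Optional[str]]] = [
    {"record_id": "lease-1", "uprn": "U-1", "building": "B-1"},
    {"record_id": "lease-2", "uprn": "U-2", "building": "B-1"},
    {"record_id": "lease-3", "uprn": None, "building": "B-1"},
    {"record_id": "sale-9", "uprn": "U-9", "building": "B-2"},
]


def group_by(records: List[Dict[str, Optional[str]]], scheme: str) -> Dict[str, List[str]]:
    """Group record IDs by the chosen position identifier, skipping records that lack one."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        key = record.get(scheme)
        if key is not None:
            groups[key].append(str(record["record_id"]))
    return dict(groups)


if __name__ == "__main__":
    print(group_by(RECORDS, "uprn"))      # one group per unit; lease-3 has no UPRN and is skipped
    print(group_by(RECORDS, "building"))  # {'B-1': ['lease-1', 'lease-2', 'lease-3'], 'B-2': ['sale-9']}
```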
WHY SHOULD I USE Data Fabric INSTEAD OF EXISTING METHODS?
Because of the double effect on data utilisation rates. That is:
- there are more positions (enabled by HPIDs) to which data can be pinned, and
- the understanding of connections surfaces more existing identifiers to match against.
This is demonstrated when benchmarking against the HMLR Registered Leases dataset, which already includes UPRNs. The share of records matched is:
- 76.1% with UPRN (as supplied)
- 92.0% with UPRN (Data Fabric – more identifiers surfaced)
- 94.8% with HPID (Data Fabric – more positions created)
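Each benchmark figure above is a match rate: the number of records carrying a given identifier, divided by the total number of records, expressed as a percentage. The sketch below reproduces that arithmetic on four made-up records. The field names `uprn_supplied`, `uprn_surfaced` and `hpid` are hypothetical, and the 25%, 50% and 75% it prints are illustrative only, not the HMLR Registered Leases results.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]


def match_rate(records: List[Record], is_matched: Callable[[Record], bool]) -> float:
    """Percentage of records for which is_matched(record) is True."""
    if not records:
        return 0.0
    return 100.0 * sum(1 for r in records if is_matched(r)) / len(records)


# Made-up records for illustration only.
SAMPLE: List[Record] = [
    {"uprn_supplied": "U-1", "uprn_surfaced": "U-1", "hpid": "H-1"},
    {"uprn_supplied": None, "uprn_surfaced": "U-2", "hpid": "H-2"},
    {"uprn_supplied": None, "uprn_surfaced": None, "hpid": "H-3"},
    {"uprn_supplied": None, "uprn_surfaced": None, "hpid": None},
]

# One matching rule per benchmark line: supplied UPRN, supplied-or-surfaced UPRN, HPID.
STRATEGIES: Dict[str, Callable[[Record], bool]] = {
    "UPRN (as supplied)": lambda r: r["uprn_supplied"] is not None,
    "UPRN (supplied or surfaced)": lambda r: (r["uprn_supplied"] or r["uprn_surfaced"]) is not None,
    "HPID": lambda r: r["hpid"] is not None,
}

if __name__ == "__main__":
    for name, rule in STRATEGIES.items():
        print(f"{name}: {match_rate(SAMPLE, rule):.1f}%")
```

Because each rule is a superset of the one before it in this toy data, the rates can only rise, which is the "double effect" described above.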
WHAT WOULD MY DATA LOOK LIKE USING THE HPIDS OF Data Fabric?
Contact us to find out more about how Data Fabric can empower your data.
Contains OS data © Crown copyright and database right 2021
Contains HM Land Registry data © Crown copyright and database right 2021. This data is licensed under the Open Government Licence v3.0.
Contains public sector information licensed under the Open Government Licence v3.0.
GOV.UK terms and conditions apply