Data Quality
Data quality is the degree to which a data set, and the process that
produces it, are fit for purpose. Accuracy is one significant component of
data quality: it reflects how closely the data conforms to the correct or
standard value.
It is essential to keep in mind that data quality is highly application and use
case specific. A data set collected as part of one process may or may not
reflect the appropriate data quality and accuracy for another use case.
Organizations can start by understanding the benefits of data quality and
considering their unique needs.
Understanding what the data is and how it might evolve is key to monitoring and
benefiting from data quality. Requirements also depend on how the data is used.
Is directionally accurate data good enough, or is more precise data required? Are
there fault tolerances to consider? In an IoT setting, for example, consider
whether bad data could cause actual valves to blow or portions of the grid to
drop.
"Businesses should apply a pragmatic approach to data quality that will align with
their core business goals," Vernocchi said.
Organizations must evaluate their industry and data needs to determine the
requirements for the six dimensions of data quality and how best to apply each
dimension to each unique use case.
Each organization must also determine whether its data is fit for the purposes and
the context in which it is used. The best team to judge a data element's quality is
the one most familiar with the context in which the data was collected, how it was
collected and what the collection was trying to accomplish.
"Data users will always be the best judges of data accuracy," said JP Romero,
technical manager of the enterprise information management practice at Kalypso, a
digital technology and consulting company.
"Prioritize where data has to be pristine and where some noise is acceptable, and be
upfront about the levels of risk you're willing to accept," said Alicia Frame,
director of graph data science at Neo4j, a graph database company.
It is essential to know and document when pristine data is needed, and be clear
when sorting through an ocean of data to find the desired information. For
example, if critical investment decisions are based on 10 data points, those data
points must be correct. If an organization is trying to draw conclusions from 10
billion data points, it is OK for some of them to be noise.
Be honest about the uncertainty and gaps in the data, Frame said. It is better to
report a margin of error rather than overcommit to a single number.
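Frame's advice to report a margin of error rather than a single number can be sketched in a few lines. This is a generic illustration using a standard normal-approximation confidence interval; the sample readings and the function name are hypothetical, not drawn from the article.

```python
import math
import statistics

def mean_with_margin(samples, z=1.96):
    """Return the sample mean and its margin of error at ~95% confidence.

    Reporting mean +/- margin, rather than a single point estimate,
    makes the uncertainty in noisy data explicit.
    """
    n = len(samples)
    mean = statistics.fmean(samples)
    stdev = statistics.stdev(samples)      # sample standard deviation
    margin = z * stdev / math.sqrt(n)      # normal approximation
    return mean, margin

# Hypothetical noisy sensor readings.
readings = [10.1, 9.8, 10.4, 10.0, 9.7, 10.2, 10.3, 9.9]
mean, margin = mean_with_margin(readings)
print(f"{mean:.2f} +/- {margin:.2f}")
```

Publishing the pair (mean, margin) lets downstream consumers decide whether the uncertainty is acceptable for their use case, rather than treating the point estimate as exact.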
One helpful strategy is to align data quality standards and approaches with the
business value and goals for any given business process, said Satya Sachdeva, vice
president of insights and data at Sogeti, an IT consulting company and part of
Capgemini. Organizations need to set distinct data quality goals for each data
category and focus their efforts accordingly.
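Sachdeva's approach of setting distinct quality goals per data category can be made concrete as a simple threshold check. The categories, metrics and target values below are hypothetical examples for illustration, not recommendations from the article.

```python
# Hypothetical per-category data quality goals: pristine targets for
# high-stakes data, looser targets where some noise is acceptable.
QUALITY_GOALS = {
    "financial": {"completeness": 1.00, "accuracy": 0.999},  # must be pristine
    "sensor":    {"completeness": 0.95, "accuracy": 0.90},   # noise tolerated
    "marketing": {"completeness": 0.80, "accuracy": 0.85},
}

def meets_goals(category, measured):
    """Check measured quality metrics against the category's goals."""
    goals = QUALITY_GOALS[category]
    return all(measured.get(metric, 0.0) >= target
               for metric, target in goals.items())

print(meets_goals("sensor", {"completeness": 0.97, "accuracy": 0.92}))
print(meets_goals("financial", {"completeness": 0.99, "accuracy": 0.999}))
```

Encoding the goals as data rather than hard-coded rules makes it easy to tighten or relax thresholds per category as business value and risk tolerance change.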