6a - Data Quality and Data Cleaning
Based on:
• Recent book: Exploratory Data Mining and Data Quality, Dasu and Johnson (Wiley, 2004)
Tutorial Focus
• Overview
– Data quality process
• Where do problems come from
• How can they be resolved
– Disciplines
• Management
• Statistics
• Database
• Metadata
Overview
• The meaning of data quality (1)
• The data quality continuum
• The meaning of data quality (2)
• Data quality metrics
• Technical tools
– Management
– Statistical
– Database
– Metadata
The Meaning of Data Quality (1)
Solutions
• Data exploration
– Determine which models and techniques are appropriate, find data bugs, develop domain expertise.
• Continuous analysis
– Are the results stable? How do they change?
• Accountability
– Make the analysis part of the feedback loop.
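The exploration and continuous-analysis steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from the tutorial: the helper names `profile` and `unstable_metrics`, the `minutes` field, the sample rows, and the 10% tolerance are all hypothetical. The idea is to summarize a field on each data delivery and flag any summary metric that shifts by more than a chosen relative tolerance.

```python
from statistics import mean


def profile(rows, field):
    """Summarize one field: missing rate, distinct count, mean of numeric values."""
    values = [row.get(field) for row in rows]
    # Treat None and the empty string as missing (an assumption for this sketch).
    present = [v for v in values if v not in (None, "")]
    numeric = [v for v in present
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return {
        "missing_rate": 1 - len(present) / len(values) if values else 0.0,
        "distinct": len(set(present)),
        "mean": mean(numeric) if numeric else None,
    }


def unstable_metrics(previous, current, tolerance=0.1):
    """Return the names of metrics whose relative change exceeds the tolerance."""
    flagged = []
    for name, old in previous.items():
        new = current.get(name)
        if old is None or new is None:
            # A metric that appears or disappears counts as a change.
            if old is not new:
                flagged.append(name)
            continue
        scale = max(abs(old), abs(new), 1e-12)  # avoid division by zero
        if abs(new - old) / scale > tolerance:
            flagged.append(name)
    return flagged


# Hypothetical sample data: two weekly deliveries of the same feed.
last_week = [{"minutes": 10}, {"minutes": 12}, {"minutes": None}, {"minutes": 11}]
this_week = [{"minutes": 10}, {"minutes": None}, {"minutes": None}, {"minutes": 500}]

before = profile(last_week, "minutes")
after = profile(this_week, "minutes")
print(unstable_metrics(before, after))
```

Feeding the flagged metric names back to whoever produces the data is one way to make the analysis part of the feedback loop. The tolerance would need tuning per field in practice.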