Data Quality and Database Design¹

MARIO PIATTINI, ISMAEL CABALLERO, MARCELA GENERO, CORAL CALERO
Grupo Alarcos, Escuela Superior de Informática
Universidad de Castilla-La Mancha
Ronda de Calatrava 7, 13071 Ciudad Real
SPAIN

Abstract: - Both products and services must satisfy customers’ requirements. Information Systems and their
Databases are the main support for organizations to collect, store, and retrieve the data on these requirements. If any of
these operations is badly executed or is not performed on the right data, it will not produce useful results, and our aim
will not be achieved. That is why we are interested in data quality. This paper deals with what
data quality is, which the most important dimensions of data quality are, and how we can design quality databases.
Key-Words: - data quality, quality dimensions.

1 Introduction

Nowadays, most companies are facing a severe problem of data pollution, i.e. they have too much data at their disposal, mainly for three reasons:

• Data can be captured easily and inexpensively, owing to recent improvements in and the diffusion of data entry technology (barcodes, OCR (Optical Character Recognisers), customer cards, credit cards...). Besides, large amounts of data can be obtained directly from the Internet.
• Uncontrolled data redundancy: in the course of their daily functioning, information systems grow in a disorderly and unplanned way, and in many cases companies do not have an information architecture.
• Existence of large quantities of historical ‘expired’ data, which no longer serve to carry out any kind of process or to obtain any relevant information.

As [6] points out, this can be compared to a biological process, in that data that are not used tend to become atrophied. This pollution can have serious consequences; for example, [1] affirms that up to half of the total cost of a data warehouse implementation may be caused by poor data quality. The Gartner Group has warned that poor data quality has been one of the most frequent causes of failure in reengineering projects. Therefore, it is necessary for the correct operation of the company Information System (IS) to address the problem of data quality, so that data becomes actual information and knowledge. Companies must manage information as an important product, capitalising on knowledge as a main asset in order to survive and prosper in the age of the digital economy ([3]). By improving information quality, we will improve both client and personnel satisfaction, which will further contribute to the improvement of the whole company.

Unfortunately, research on quality has focused until very recently on software quality, neglecting data quality ([9]). Even in traditional database design, quality has not been explicitly incorporated ([14]). Although databases have not traditionally focused on questions of quality, many of the tools and techniques developed (integrity constraints, normalisation theory, transaction management, etc.) have influenced quality. We think it is time to consider information quality as a main objective, rather than as a by-product of the process of database creation and development.

Broadly speaking, two different aspects should be kept in mind regarding information quality: quality of the data base as a whole and quality of data presentation. In fact, it is very important that the data reflect the real world correctly, that is, that they are precise; moreover, they must be easy to understand. Regarding data base quality as a whole, it depends on three ‘qualities’: DBMS (Data Base Management System)

¹ This research has been done within the framework of the CALIDAT Project, developed by Cronos Ibérica, S.A. in
collaboration with the Universidad de Castilla-La Mancha, supported by the Consejería de Educación y Cultura de la
Comunidad de Madrid (Ref: 09/0013/1999).
quality, data model (both conceptual and logical) quality and data quality. In this paper we will focus on the most prominent features of the data; regarding data model quality, the interested reader can consult [7].

2 Dimensions of information quality

As we know, quality is a relative concept, insofar as it is in the eye of the beholder; for this reason, we can consider quality as a multidimensional concept, subject to restrictions and limitations ([4]). In recent years, various authors have proposed different dimensions for data quality:

• [8] groups quality dimensions into three self-explanatory categories:
  - Quality Dimensions of a Conceptual View
    - Content: Relevance of the data, obtainability of values, clarity of definition.
    - Scope: Comprehensiveness and essentialness.
    - Level of Detail: Granularity of attributes and precision of domains.
    - Composition: Naturalness, identifiability, homogeneity, minimal unnecessary redundancy.
    - View Consistency: Structural and semantic consistencies.
    - Reaction to Change: Flexibility and robustness.
  - Quality Dimensions of Data Values
    - Accuracy
    - Completeness
    - Currency
    - Value Consistency
  - Quality Dimensions of Data Representation
    - Appropriateness
    - Interpretability
    - Portability
    - Format Precision
    - Format Flexibility
    - Ability to represent null values
    - Efficient usage of Recording Media
    - Representation Consistency
• [2] emphasises two topics related to data quality:
  - Inherent quality, that is, data accuracy, the degree to which the data reflect the objects from the real world they represent; this includes: conformity with the definition, completeness of values, validity or conformity with the company rules, accuracy of sources, accuracy of reality, lack of duplication, accessibility.
  - Pragmatic quality, i.e. the degree to which the data allow the knowledge workers to satisfy the company objectives in an efficient and accurate way.
• [13] analyse some of the causes of poor data quality due to design deficiencies from an ontological perspective, identifying four quality dimensions:
  - Data Quality
  - Nature of deficiency
  - Completion
  - Improper representation

As these authors indicate, the aim is that each state in the real world will unequivocally correspond to one system state. If this unequivocal relationship does not hold, or the expected results are not obtained when operating with the data, a data deficiency is produced.

• [11] identify several dimensions grouped into four categories:
  - Intrinsic: precision, objectivity, credibility, reputation
  - Accessibility: accessibility, access security
  - Contextual: relevance, added value, timeliness, completeness, data quantity
  - Representational: interpretability, ease of understanding, concise representation, consistent representation.

3 Data base design and data quality

[5] suggest three different strategies in order to improve the intrinsic quality of data bases:
• Building richer semantic models that reflect reality more accurately.
• Reinforcing databases by introducing a higher number of constraints, in order to identify and discriminate problematic data and link them to the appropriate applications.
• Restricting the use of data to predefined processes, preventing them from being modified by other processes so that they cannot be accidentally deleted.

Although these strategies allow a higher degree of data quality, they are not enough by themselves, since an adequate base is needed in order to manage quality dimensions ([16]). Unfortunately, there are very few proposals that consider data quality as a fundamental factor in the design process. In this sense, [14, 15] are an exception to this rule. They propose a method that is intended to complement the traditional methodologies of database design; see figure 1 at the end of the paper.

In the first step (see figure 1), apart from creating a conceptual schema, e.g. using an entity/relationship model, quality requirements and candidate attributes should be identified, determining thereafter the ‘quality parameter view’, so that each element within the conceptual schema can be linked to a quality parameter. E.g., in an ‘academic’ data base, the attribute ‘exam mark’ can be linked to precision and timeliness. Later on, subjective parameters are made objective through the addition of labels to the attributes in the conceptual schema (source, in order to know the degree of accuracy, and date, in order to know the timeliness, of exam marks).

Moreover, we can also propose an extension of relational databases with indicators that allow the assignment of these objective and subjective parameters to the quality of the values within the data base ([15]).

4 Conclusions and future research

We can affirm that, if product and service quality has become a decisive factor of business success in recent years, information quality will play a preferential role in the next decade.

If we actually consider that information is the most important business asset, one of the first aims of IT professionals should be to ensure its quality. We have presented some recent proposals regarding information quality, but further research is needed on the degree of quality attached to other processes linked to information: modelling, data gathering and loading, and data presentation.

On the one hand, companies will have to define a quality policy (see, for example, [8]) that defines the obligations of each function in order to ensure data quality in all its dimensions; on the other hand, they will have to implement a process in order to evaluate the quality of the information at their disposal. There are several proposals regarding information quality evaluation; English’s TQdM (Total Quality data Management) can be highlighted.

A decisive aspect regarding evaluation has to do with the definition of relevant metrics that will allow an actual analysis and improvement of quality. In [3], three types of metrics are proposed: subjective metrics (based on the data users’ judgement), objective metrics that are independent of the application (such as correctness) and objective metrics that depend on the application (i.e., that are specific to a given domain). Besides, the actual value of information (either produced by operational systems or used to assist decision making) should be measured.
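
As an illustration only, the idea of attaching quality indicators (source and date) to stored values, and of computing simple objective metrics over them, might be sketched as follows. The class `TaggedValue`, the functions `completeness`, `timeliness` and `from_trusted_sources`, the sample exam marks and the 30-day threshold are all assumptions made for this sketch; they are not taken from [3] or [15].

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Sequence, Set


@dataclass(frozen=True)
class TaggedValue:
    """A stored value plus two quality indicators (hypothetical names)."""
    value: Optional[float]  # None models a missing exam mark
    source: str             # who supplied the value (indicator for accuracy)
    recorded_on: date       # when it was recorded (indicator for timeliness)


def completeness(cells: Sequence[TaggedValue]) -> float:
    """Share of cells whose value is present (1.0 for an empty input, by convention)."""
    if not cells:
        return 1.0
    return sum(c.value is not None for c in cells) / len(cells)


def timeliness(cells: Sequence[TaggedValue], today: date, max_age_days: int) -> float:
    """Share of cells recorded at most max_age_days before today."""
    if not cells:
        return 1.0
    fresh = sum((today - c.recorded_on).days <= max_age_days for c in cells)
    return fresh / len(cells)


def from_trusted_sources(cells: Sequence[TaggedValue], trusted: Set[str]) -> List[TaggedValue]:
    """Keep only the cells whose source tag is in the trusted set."""
    return [c for c in cells if c.source in trusted]


# Invented sample data for an 'academic' data base.
exam_marks = [
    TaggedValue(7.5, "lecturer", date(2000, 6, 20)),
    TaggedValue(None, "lecturer", date(2000, 6, 21)),
    TaggedValue(9.0, "secretary", date(1999, 6, 15)),
    TaggedValue(6.0, "lecturer", date(2000, 6, 25)),
]

today = date(2000, 6, 30)
print(completeness(exam_marks))                    # 0.75 (one of four marks is missing)
print(timeliness(exam_marks, today, 30))           # 0.75 (one mark is over a year old)
print(len(from_trusted_sources(exam_marks, {"lecturer"})))  # 3
```

Both metrics here are objective and application-independent in the sense used above; which sources count as trusted, and how old a mark may be, are application-specific choices that would have to come from the quality requirements.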

References

[1] Celko, J. Don't Warehouse Dirty Data. Datamation, 15 October 1995, pp. 42-52.
[2] English, L. Improving Data Warehouse and Business Information Quality. John Wiley & Sons, Inc., 1999.
[3] Huang, K-T., Lee, Y.W. and Wang, R.Y. Quality Information and Knowledge. Prentice Hall, Upper Saddle River, 1999.
[4] Jones, C. Software Quality: Analysis and Guidelines for Success. International Thomson Computer Press, London, 1997.
[5] Orman, L., Storey, V. and Wang, R. Systems Approaches to Improving Data Quality. TDQM-94-05, August 1994. Available at http://web.mit.edu/tdqm/www/papers/94/94-05.html
[6] Orr, K. Data Quality and System Theory. Communications of the ACM, Vol. 41, No. 2, 1998, pp. 66-71.
[7] Piattini, M., Genero, M., Calero, C., Ruiz, F. and Polo, M. Database quality. In: Advanced Databases: Technology and Design. Piattini, M. and Diaz, O. (eds.). Artech House, London, 2000.
[8] Redman, T.C. Data Quality for the Information Age. Artech House, Boston, 1996.
[9] Sneed, H.M. and Foshag, O. Measuring Legacy Database Structures. Proc. of the European Software Measurement Conference FESMA’98, Coombes, Hooft and Peeters (eds.), 1998, pp. 199-210.
[10] Storey, V.C. and Wang, R. Modeling Quality Requirements in Conceptual Database Design. TDQM-94-02, May 1994.
[11] Strong, D.M., Lee, Y.W. and Wang, R.Y. Data Quality in Context. Communications of the ACM, Vol. 40, No. 5, 1997, pp. 103-110.
[12] Strong, D.M., Lee, Y.W. and Wang, R.Y. 10 Potholes in the Road to Information Quality. IEEE Computer, 1997, pp. 38-46.
[13] Wand, Y. and Wang, R.Y. Anchoring Data Quality Dimensions in Ontological Foundations. Communications of the ACM, Vol. 39, No. 11, 1996, pp. 86-95.
[14] Wang, R.Y., Kon, H.B. and Madnick, S.E. Data Quality Requirements Analysis and Modeling. Proc. of the 9th International Conference on Data Engineering, IEEE Computer Society, 1993, pp. 670-677.
[15] Wang, R.Y., Reddy, M.P. and Kon, H.B. Toward quality data: An attribute-based approach. Decision Support Systems, Vol. 13, 1995, pp. 349-372.

[Figure: flowchart of four steps]
Step 1: Determine the application view of data (input: application requirements; output: application view).
Step 2: Determine (subjective) quality parameters for the application (inputs: application view, application quality requirements, candidate quality attributes; output: parameter view).
Step 3: Determine (objective) quality indicators for the application (output: quality views).
Step 4: Quality view integration (output: quality schema).

Fig. 1. Quality in database design ([14])
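
The steps in figure 1 can be read as successive annotations of a conceptual schema. The following is a minimal sketch under stated assumptions, not the method of [14] itself: the dictionaries `application_view` and `parameter_view`, the mapping `INDICATOR_FOR`, and the functions `quality_view` and `integrate` are hypothetical names, and the `Exam` entity simply follows the 'exam mark' example given in the text.

```python
from typing import Dict, List, Tuple

Attribute = Tuple[str, str]  # (entity, attribute), e.g. ("Exam", "mark")

# Step 1 (assumed result): the application view, reduced to entity -> attributes.
application_view: Dict[str, List[str]] = {"Exam": ["student_id", "subject", "mark"]}

# Step 2: subjective quality parameters attached to attributes (parameter view).
parameter_view: Dict[Attribute, List[str]] = {("Exam", "mark"): ["accuracy", "timeliness"]}

# Step 3: each subjective parameter is made objective through an indicator.
# This mapping is an assumption that mirrors the 'source' and 'date' labels in the text.
INDICATOR_FOR: Dict[str, str] = {"accuracy": "source", "timeliness": "date"}


def quality_view(params: Dict[Attribute, List[str]]) -> Dict[Attribute, List[str]]:
    """Replace every quality parameter by the indicator that measures it."""
    for entity, attribute in params:
        if attribute not in application_view.get(entity, []):
            raise KeyError(f"{entity}.{attribute} is not in the application view")
    return {attr: [INDICATOR_FOR[p] for p in plist] for attr, plist in params.items()}


def integrate(*views: Dict[Attribute, List[str]]) -> Dict[Attribute, List[str]]:
    """Step 4: merge quality views into one quality schema (ordered union per attribute)."""
    schema: Dict[Attribute, List[str]] = {}
    for view in views:
        for attr, indicators in view.items():
            merged = schema.setdefault(attr, [])
            for indicator in indicators:
                if indicator not in merged:
                    merged.append(indicator)
    return schema


view_a = quality_view(parameter_view)
# A second, invented quality view, as another group of users might define it.
view_b = {("Exam", "mark"): ["date", "collection_method"]}
quality_schema = integrate(view_a, view_b)
print(quality_schema)  # {('Exam', 'mark'): ['source', 'date', 'collection_method']}
```

Keeping the parameter view and the indicator mapping separate reflects the distinction the figure draws between subjective parameters and objective indicators; a real design would of course carry far more detail than these dictionaries.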
