Topic 12 Manipulating Data: Prepared By: Mohammad Nabeel Arshad
Manipulating data
Prepared by: Mohammad Nabeel Arshad
Our world runs on data, but that data must be relevant, accurate, and valid. It
must be stored and manipulated in ways that allow it to be of value to
individuals and organizations. Collecting, storing and manipulating data is the
focus of many IT systems.
When the integrity of data is secure, the information stored in a database will
remain complete, accurate, and reliable no matter how long it’s stored or how often
it’s accessed. Data integrity also helps keep your data safe from outside threats,
such as unauthorized changes.
Why is it Important to Maintain Data Integrity?
Imagine making an extremely important business decision that hinges on data
that is entirely, or even partially, inaccurate. Organizations routinely make
data-driven business decisions, and if that data lacks integrity, those decisions
can have a dramatic effect on the company’s bottom-line goals.
The data dictionary acts as the core, or hub, of the database management system
and is its third component. The data dictionary provides the following information:
1. The name of the data item.
2. The description of the data item.
3. The sources of the data.
4. The impact analysis.
5. Keywords used to categorize and search the data item descriptions.
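As a rough illustration of the kinds of information listed above, the sketch below models one data dictionary entry per data item and uses the keywords to search the dictionary. It is a minimal Python sketch, not the structure of any particular DBMS; the `DataItem` class, the `search` function and the sample entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """One hypothetical data dictionary entry (fields mirror the list above)."""
    name: str
    description: str
    sources: list[str] = field(default_factory=list)
    impact: str = ""  # impact analysis notes: what depends on this item
    keywords: set[str] = field(default_factory=set)


def search(dictionary: list[DataItem], keyword: str) -> list[str]:
    """Return the names of items tagged with the keyword (case-insensitive)."""
    wanted = keyword.lower()
    return [item.name for item in dictionary
            if wanted in {k.lower() for k in item.keywords}]


# Hypothetical sample entries for a school database.
data_dictionary = [
    DataItem("student_id", "Unique number assigned to each student",
             sources=["admissions form"],
             impact="Used as the key in enrolment and results tables",
             keywords={"student", "identifier"}),
    DataItem("emergency_phone", "Phone number to call in an emergency",
             sources=["admissions form"], keywords={"student", "contact"}),
    DataItem("course_code", "Code of the course a student is enrolled on",
             sources=["timetable system"], keywords={"course"}),
]

print(search(data_dictionary, "student"))  # ['student_id', 'emergency_phone']
print(search(data_dictionary, "Contact"))  # ['emergency_phone']
```

Keeping the keywords as a set makes each lookup cheap and stops the same tag being stored twice for one item.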
Functions of the Data Dictionary
1. Defines the data elements.
2. Helps in scheduling.
3. Helps in control.
4. Lets users know which data is available and how it can be obtained.
5. Helps identify irregularities in organizational data.
6. Acts as an essential data management tool.
7. Provides a good standardization mechanism.
8. Acts as the corporate glossary of the ever-growing information resource.
9. Provides report, control and excerpt facilities.
12.1.3 Construct a Data Dictionary
For each variable, the dictionary should contain most of the following
information (sometimes referred to as metadata, which means “data about
data”):
- A short variable name (usually no more than eight or ten characters) that’s used when
  telling the software which variables you want it to use in an analysis
- If numeric: information about how that number is displayed (how many digits are
  before and after the decimal point)
- If date/time: how it’s formatted (for example, 12/25/13 10:50pm or 25Dec2013 22:50)
- How missing values are represented in the database (99, 999, “NA,” and so on)
Many statistical packages allow (or require) you to specify this information when you’re
creating the file anyway, so they can generate the data dictionary for you
automatically.
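A minimal sketch of how a program might use these metadata fields, assuming Python and a hypothetical `Variable` class. The field names, the `%d%b%Y %H:%M` date pattern and the missing-value codes are illustrative choices, not taken from any statistical package.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Variable:
    """Hypothetical data dictionary entry for one variable."""
    short_name: str            # short name used when telling the software which variable to analyze
    kind: str                  # "numeric" or "datetime"
    decimals: int = 0          # numeric only: digits shown after the decimal point
    date_format: str = ""      # datetime only: strftime pattern used for display
    missing_codes: tuple = ()  # stored values that mean "missing", e.g. 999 or "NA"

    def is_missing(self, raw) -> bool:
        return raw in self.missing_codes

    def display(self, raw) -> str:
        """Format a stored value according to this variable's metadata."""
        if self.is_missing(raw):
            return "(missing)"
        if self.kind == "numeric":
            return f"{float(raw):.{self.decimals}f}"
        if self.kind == "datetime":
            return raw.strftime(self.date_format)
        return str(raw)


height = Variable("HEIGHT", "numeric", decimals=1, missing_codes=(999, "NA"))
visit = Variable("VISITDT", "datetime", date_format="%d%b%Y %H:%M", missing_codes=("NA",))

print(height.display(172.456))                        # 172.5
print(height.display(999))                            # (missing)
print(visit.display(datetime(2013, 12, 25, 22, 50)))  # 25Dec2013 22:50
```

One drawback of sentinel codes such as 999 is visible here: a genuine value of 999 would also be reported as missing, which is why the dictionary must record the codes explicitly.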
Communication is also improved through the understanding created by the
definitions in the data dictionary. Here are a few guidelines to help in creating a
data dictionary:
- Gather information
- Make it flexible
12.1.4 Concept and need for data validation
Data validation primarily helps in ensuring that the data sent to connected
applications is complete, accurate, secure and consistent.
This is achieved through validation checks and rules that routinely test data
for validity. These rules are generally defined in a data dictionary or
implemented through data validation software. Common validation checks include:
a. presence
b. range
c. lookup
d. list
e. length
f. format
g. check digit.
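To make the idea of rules defined in a data dictionary concrete, here is a minimal Python sketch in which each field name maps to a list of checks and every record is tested against all of them. The field names, limits and the `RULES`/`validate` names are hypothetical; real validation software stores and reports its rules in its own way.

```python
import re


def present(value) -> bool:
    """Presence: the value exists and is not just blank text."""
    return value is not None and str(value).strip() != ""


# Hypothetical rules keyed by field name, similar to rules kept in a data dictionary.
# Each rule is a (description, check function) pair.
RULES = {
    "surname": [("presence", present)],
    "age": [
        ("presence", present),
        ("range 11-18", lambda v: isinstance(v, int) and 11 <= v <= 18),
    ],
    "phone": [
        ("length 11", lambda v: isinstance(v, str) and len(v) == 11),
        ("format (digits only)",
         lambda v: isinstance(v, str) and re.fullmatch(r"[0-9]+", v) is not None),
    ],
}


def validate(record: dict) -> list[str]:
    """Run every rule against its field and collect a message for each failure."""
    errors = []
    for field_name, rules in RULES.items():
        value = record.get(field_name)
        for description, check in rules:
            if not check(value):
                errors.append(f"{field_name}: failed {description} check")
    return errors


print(validate({"surname": "Khan", "age": 14, "phone": "01632960123"}))  # []
print(validate({"surname": "", "age": 25, "phone": "0163296012"}))
```

The second record fails three checks (blank surname, age out of range, phone one digit short); its phone still passes the format check because it contains only digits.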
a. Presence check
There might be an important piece of data that you want to make sure is
always stored.
For example, a school will always want to know an emergency contact
number, a video rental store might always want to know a customer's
address, and a wedding dress shop might always want a record of the bride's
wedding date.
A presence check makes sure that a critical field cannot be left blank; it must
be filled in. If someone tries to leave the field blank, an error message
appears and they cannot move on to another record or save any of the
other data they have entered.
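A presence check can be sketched in a few lines of Python. The field names below (a school record that needs an emergency contact, as in the example above) are hypothetical, and treating whitespace-only text as blank is an assumption.

```python
def presence_check(record: dict, required: list[str]) -> list[str]:
    """Return the names of required fields that are missing or blank."""
    missing = []
    for name in required:
        value = record.get(name)
        if value is None or (isinstance(value, str) and value.strip() == ""):
            missing.append(name)
    return missing


REQUIRED = ["name", "emergency_contact"]  # hypothetical school record fields

record = {"name": "Aisha", "emergency_contact": "   "}
blank = presence_check(record, REQUIRED)
if blank:
    # Refuse to save, as described above, until every required field is filled in.
    print("Cannot save: please fill in", ", ".join(blank))
```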
b. Range check
A range check makes sure that a value lies between a set lower limit and upper
limit. It is commonly used when you are working with data that consists of
numbers, currency or dates/times.
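A range check simply compares a value with a lower and an upper limit. The Python sketch below uses one hypothetical `range_check` function for numbers, currency (held as `Decimal` to avoid floating-point rounding) and dates; all the limits shown are made up for illustration.

```python
from datetime import date
from decimal import Decimal


def range_check(value, low, high) -> bool:
    """True if low <= value <= high; works for any values that can be compared."""
    return low <= value <= high


# Hypothetical limits, for illustration only.
print(range_check(16, 11, 18))                                                # True: age within 11-18
print(range_check(Decimal("12.50"), Decimal("0.00"), Decimal("10.00")))      # False: price above the limit
print(range_check(date(2013, 12, 25), date(2013, 1, 1), date(2013, 12, 31)))  # True: date within 2013
```

Because Python compares dates and `Decimal` values with the same operators as integers, one function covers all three kinds of data the text mentions.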
c. Lookup check
A lookup lets the user pick a value from a set of allowed options. It can:
- speed up data entry, because it is usually much faster to pick from a list than
  to type each individual entry
- improve accuracy, because it reduces the risk of spelling mistakes
- limit the options to choose from by presenting only the required ones
However, a lookup does not prevent someone from choosing the wrong (but valid)
entry, so mistakes can still be made.
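In code, a lookup check amounts to accepting a value only if it is found in a lookup table. This Python sketch uses a hypothetical table of department codes; note that a valid but wrong choice (for example `HR` when `IT` was meant) still passes, which is exactly the weakness described above.

```python
# Hypothetical lookup table: the only codes the field may hold.
DEPARTMENTS = {
    "ACC": "Accounts",
    "HR": "Human Resources",
    "IT": "Information Technology",
}


def lookup_check(code: str):
    """Return the department name for a code, or None if the code is not in the table."""
    return DEPARTMENTS.get(code.strip().upper())


print(lookup_check("it"))   # Information Technology
print(lookup_check("XYZ"))  # None: rejected, not in the table
print(lookup_check("HR"))   # Human Resources: accepted even if the user meant IT
```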
d. List check
Users select the required option from a list instead of typing it in. This saves
time and reduces the chance of errors, and there are no spelling mistakes because
the data is selected rather than written.
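A list check can be sketched as a numbered menu: the user picks a number and the program stores the matching option, so the stored text is always spelled exactly as listed. The option names and the `show_menu`/`choose` helpers are hypothetical.

```python
OPTIONS = ["Small", "Medium", "Large"]  # hypothetical drop-down options


def show_menu(options: list[str]) -> str:
    """Build a numbered menu, one option per line."""
    return "\n".join(f"{i}. {option}" for i, option in enumerate(options, start=1))


def choose(options: list[str], selection: int) -> str:
    """Return the option for a 1-based selection; reject numbers outside the list."""
    if not 1 <= selection <= len(options):
        raise ValueError(f"Please choose a number from 1 to {len(options)}")
    return options[selection - 1]


print(show_menu(OPTIONS))
print(choose(OPTIONS, 2))  # Medium
```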
e. Length check
Sometimes you may have data that always has the same number of characters.
For example, a UK landline telephone number has 11 digits.
[Figure: fixed-length input field]
A length check could be set up to ensure that exactly 11 digits are entered into the
field. This type of validation cannot check that the 11 digits are correct, but it can
ensure that 10 or 12 digits aren't entered.
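A length check only counts characters. In the Python sketch below, the hypothetical `length_check` accepts either an exact length or a minimum and maximum; as noted above, it cannot tell whether the 11 digits are the right ones.

```python
from typing import Optional


def length_check(value: str, min_len: int, max_len: Optional[int] = None) -> bool:
    """Exact length if max_len is omitted, otherwise min_len <= len(value) <= max_len."""
    if max_len is None:
        max_len = min_len
    return min_len <= len(value) <= max_len


print(length_check("01632960123", 11))   # True: 11 digits
print(length_check("0163296012", 11))    # False: only 10 digits
print(length_check("016329601234", 11))  # False: 12 digits
print(length_check("01234567890", 11))   # True, even if this is not a real number
print(length_check("ABC123", 6, 8))      # True: within a hypothetical 6-8 character range
```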
A length check can also be set up to accept entries whose length falls within a certain range (a minimum and a maximum number of characters).
For example, postcodes can be in the form of:
The check digit is the final number in the sequence, so in this example it is the
final ‘2’.
The computer performs a calculation on the other digits and then compares the
result with the check digit. If they match, the data was most likely entered
correctly; if they do not, an error such as a mistyped or swapped digit has been detected.
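The text does not say which calculation is used, and many check-digit schemes exist. As one well-known example, the sketch below uses the ISBN-13 scheme: the first 12 digits are weighted alternately 1 and 3, and the check digit is chosen so that the weighted total of all 13 digits is a multiple of 10. The function names are hypothetical.

```python
def isbn13_check_digit(first12: str) -> int:
    """Compute the ISBN-13 check digit from the first 12 digits (weights 1, 3, 1, 3, ...)."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Recalculate the check digit and compare it with the last digit entered."""
    digits = isbn.replace("-", "")
    if len(digits) != 13 or not all(c in "0123456789" for c in digits):
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


print(is_valid_isbn13("978-0-306-40615-7"))  # True
print(is_valid_isbn13("978-0-306-40651-7"))  # False: two digits swapped
print(is_valid_isbn13("978-0-306-40615-8"))  # False: wrong last digit
```

Weighting alternate digits by 3 is what lets the scheme catch most swaps of neighbouring digits, which a plain digit sum would miss.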