Topic 12 Manipulating Data: Prepared By: Mohammad Nabeel Arshad

The document discusses data manipulation and data integrity. It defines data manipulation as changing data to make it easier to organize and access. Data integrity refers to data accuracy, completeness, and consistency throughout its lifecycle. The document then discusses the importance of maintaining data integrity for making accurate business decisions. It provides methods for maintaining data integrity, including input validation, data validation, removing duplicate data, backups, access controls, and audit trails. Finally, it discusses data dictionaries and validation rules.


Topic 12

Manipulating data
Prepared by: Mohammad Nabeel Arshad
 Our world runs on data, but that data must be relevant, accurate, and valid. It
must be stored and manipulated in ways that allow it to be of value to
individuals and organizations. Collecting, storing and manipulating data is the
focus of many IT systems.

 Data manipulation is the process of changing data to make it easier to read
or better organized. For example, a log of data could be sorted into
alphabetical order, making individual entries easier to locate.
12.1 Data integrity
 Data integrity is the overall accuracy, completeness, and consistency of data.
Data integrity also refers to the safety of data with regard to regulatory
compliance.

 The term data integrity refers to the accuracy and consistency of data.
Maintaining data integrity means making sure the data remains intact and
unchanged throughout its entire life cycle. This includes the capture of the data,
storage, updates, transfers, backups, etc.

 When the integrity of data is secure, the information stored in a database will
remain complete, accurate, and reliable no matter how long it’s stored or how often
it’s accessed. Data integrity also ensures that your data is safe from any outside
forces.
Why is it Important to Maintain Data
Integrity?
 Imagine making an extremely important business decision that hinges on data
that is entirely, or even partially, inaccurate. Organizations routinely make
data-driven business decisions, and without data integrity, those decisions
can have a dramatic effect on the company’s bottom line.

 A new international report reveals that a large majority of senior executives
don’t have a high level of trust in the way their organization uses data,
analytics, or AI.
Methods to maintain data integrity
Threats to data integrity also highlight aspects of data security that can help preserve
it. Use the following checklist to preserve data integrity and minimize risk for your
organization:
1.Validate Input: When your data set is supplied by a known or unknown source (an end-user, another
application, a malicious user, or any number of other sources), you should require input validation. That
data should be verified and validated to ensure that the input is accurate.
2.Validate Data: It’s critical to certify that your data processes haven’t been corrupted. Identify
specifications and key attributes that are important to your organization before you validate the data.
3.Remove Duplicate Data: Sensitive data from a secure database can easily find a home on a
document, spreadsheet, email, or in shared folders where employees without proper access can see it.
It’s prudent to clean up stray data and remove duplicates.
4.Back up Data: In addition to removing duplicates to ensure data security, data backups are a critical
part of the process. Backing up is necessary and goes a long way to prevent permanent data loss. How
often should you be backing up? As often as possible. Keep in mind that backups are critical when
organizations get hit with ransomware attacks. Just make sure that your backups aren’t also encrypted!
5.Access Controls: Input validation, data validation, de-duplication, and backups all help preserve
data integrity, but access must also be managed: grant users the least privilege they need, so that
only authorized people can view or change sensitive data.
6.Always Keep an Audit Trail: Whenever there is a breach, it’s critical to data integrity to be able to
track down the source. Often referred to as an audit trail, this provides an organization the breadcrumbs
to accurately pinpoint the source of the problem.
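Step 3 above (removing duplicate data) can be sketched in a few lines; the customer records and the choice of `email` as the identifying field are hypothetical:

```python
def remove_duplicates(records, key_fields):
    """Keep only the first occurrence of each record, where two records
    count as duplicates when they agree on all of the key fields."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical customer records: the second entry duplicates the first.
customers = [
    {"email": "a@example.com", "name": "Ana"},
    {"email": "a@example.com", "name": "Ana"},
    {"email": "b@example.com", "name": "Ben"},
]
deduped = remove_duplicates(customers, ["email"])
```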
12.1.2 Data Dictionary
 A data dictionary contains metadata, i.e. data about the database. The data
dictionary is very important as it contains information such as what is in the
database, who is allowed to access it, where the database is physically stored,
etc. The users of the database normally don't interact with the data
dictionary; it is only handled by the database administrators.

The data dictionary in general contains information about the following:
 Names of all the database tables and their schemas.
 Details about all the tables in the database, such as their owners, their
security constraints, when they were created etc.
 Physical information about the tables such as where they are stored and how.
 Table constraints such as primary key attributes, foreign key information etc.
 Information about the database views that are visible.
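Most database engines expose their data dictionary through system tables. A minimal sketch using SQLite, whose catalogue table `sqlite_master` and `PRAGMA table_info` play this role (the `students` table is invented for illustration):

```python
import sqlite3

# In-memory database with one illustrative table (all names are made up).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        dob        TEXT
    )
""")

# sqlite_master is SQLite's data dictionary: it lists every table
# together with the SQL that defines its schema.
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info exposes per-column metadata:
# (cid, name, type, notnull, default_value, pk)
columns = conn.execute("PRAGMA table_info(students)").fetchall()
```

In a production database the same queries would be run by the administrator against the engine's real catalogue tables.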
 A data dictionary can be an automated or a manual, active or a passive file that stores
the definitions of the data elements and their characteristics. The data dictionary is the
repository of information about the data: it defines each data element and gives it a
name for easy access.

 The data dictionary acts as the core or hub of the database management system and is
its third component. The data dictionary provides the following information:

1. The name of the data item.
2. The description of the data item.
3. The sources of the data.
4. The impact analysis.
5. Keywords used for categorizing and searching the data item descriptions.
 Functions of the Data Dictionary
1. Defines the data elements.
2. Helps in scheduling.
3. Helps in control.
4. Lets users know which data is available and how it can be obtained.
5. Helps in identifying organizational data irregularities.
6. Acts as an essential data management tool.
7. Provides a standardization mechanism.
8. Acts as the corporate glossary of an ever-growing information resource.
9. Provides report, control, and excerpt facilities.
12.1.3 Construct a Data Dictionary

 Every research database, large or small, simple or complicated, should be
accompanied by a data dictionary that describes the variables contained in
the database. It will be invaluable if the person who created the database is
no longer around. A data dictionary is, itself, a data file, containing one
record for every variable in the database.

 For each variable, the dictionary should contain most of the following
information (sometimes referred to as metadata, which means “data about
data”):
 A short variable name (usually no more than eight or ten characters) that’s used when
telling the software what variables you want it to use in an analysis

 A longer verbal description of the variable (up to 50 or 100 characters)

 The type of data (text, categorical, numerical, date/time, and so on)

 If numeric: Information about how that number is displayed (how many digits are
before and after the decimal point)

 If date/time: How it’s formatted (for example, 12/25/13 10:50pm or 25Dec2013 22:50)

 If categorical: What the permissible categories are

 How missing values are represented in the database (99, 999, “NA,” and so on)

 Many statistical packages allow (or require) you to specify this information when you’re
creating the file anyway, so they can generate the data dictionary for you
automatically.
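As a concrete illustration, a small data dictionary with one record per variable might look like the following; the three variables and their metadata fields are hypothetical, modelled on the list above:

```python
# A data dictionary is itself a data file: one record per variable.
# The variables and field names below are hypothetical examples.
data_dictionary = [
    {"name": "age",   "description": "Age at enrolment, in years",
     "type": "numeric", "format": "3 digits, 0 decimals", "missing": "999"},
    {"name": "sex",   "description": "Sex of participant",
     "type": "categorical", "categories": ["M", "F"], "missing": "NA"},
    {"name": "visit", "description": "Date of clinic visit",
     "type": "date/time", "format": "25Dec2013 22:50", "missing": ""},
]

def describe(variable_name):
    """Look up one variable's metadata by its short name."""
    for entry in data_dictionary:
        if entry["name"] == variable_name:
            return entry
    raise KeyError(variable_name)
```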
 Communication is also improved through the understanding created by the
definitions in the data dictionary. Here are a few guidelines to help in creating a
data dictionary:

 Gather Information

 Decide the Format

 Make it Flexible
12.1.4 Concept and need for data
validation
 Data validation primarily helps in ensuring that the data sent to connected
applications is complete, accurate, secure and consistent.
 This is achieved through data validation's checks and rules that routinely
check for the validity of data. These rules are generally defined in a data
dictionary or are implemented through data validation software.

 Different types of validation can be performed depending on destination
constraints or objectives. Data validation is a form of data cleansing.
Why perform data validation?
 When moving and merging data it’s important to make sure data from different
sources and repositories will conform to business rules and not become corrupted
due to inconsistencies in type or context. The goal is to create data that is
consistent, accurate and complete so as to prevent data loss and errors during a
move.

When is data validation performed?
 In data warehousing, data validation is often performed prior to the ETL
(Extract, Transform, Load) process. A data validation test is performed so that
analysts can get insight into the scope or nature of data conflicts. Data validation
is a general term, however, and can be performed on any type of data, including
data within a single application (such as Microsoft Excel) or when merging simple
data within a single data store.
12.1.5 Interpret and design validation
rules
 Validation rules verify that the data a user enters in a record meets the
standards you specify before the user can save the record.
 A validation rule can contain a formula or expression that evaluates the data
in one or more fields and returns a value of “True” or “False”.
 Validation rules also include an error message to display to the user when the
rule returns a value of “True” due to an invalid value.
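A minimal sketch of this idea, where the formula returns True for an invalid value and the rule then yields its error message (the 40% discount limit is a made-up example):

```python
# A validation rule pairs a formula with an error message.  Following the
# convention above, the formula returns True when the value is invalid.
def make_rule(formula, error_message):
    def check(record):
        return error_message if formula(record) else None
    return check

# Hypothetical rule: a discount must not exceed 40%.
discount_rule = make_rule(
    lambda record: record["discount"] > 0.40,
    "Discount cannot exceed 40%",
)
```

A record for which the rule returns `None` passes validation and may be saved; any other return value is the error message to display to the user.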
Types of Validation rules

 a. presence
 b. range
 c. lookup
 d. list
 e. length
 f. format
 g. check digit.
a. Presence check

 There might be an important piece of data that you want to make sure is
always stored.
 For example, a school will always want to know an emergency contact
number, a video rental store might always want to know a customer's
address, and a wedding dress shop might always want a record of the bride's
wedding date.
 A presence check makes sure that a critical field cannot be left blank; it must
be filled in. If someone tries to leave the field blank then an error message
will appear and you won't be able to progress to another record or save any
other data which you have entered.
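A presence check can be sketched as a function that reports every required field left blank (the school record and field names are hypothetical):

```python
def presence_check(record, required_fields):
    """Return an error message for every critical field left blank."""
    errors = []
    for field in required_fields:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            errors.append(f"{field} must not be left blank")
    return errors

# Hypothetical school record with the emergency contact left blank.
errors = presence_check({"pupil": "Ana", "emergency_contact": ""},
                        ["emergency_contact"])
```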
b. Range check
 A range check is commonly used when you are working with data which
consists of numbers, currency or dates/times.

 A range check allows you to set suitable boundaries: for example, a
percentage mark must lie between 0 and 100.
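A range check reduces to a pair of inclusive boundary comparisons; the 0–100 mark boundaries below are an assumed example:

```python
def range_check(value, low, high):
    """Accept the value only if it lies within the inclusive boundaries."""
    return low <= value <= high

# Assumed rule: an exam mark must be between 0 and 100.
mark_ok = range_check(85, 0, 100)    # True
mark_bad = range_check(120, 0, 100)  # False
```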
c. lookup
 When a field contains a limited list of items then a lookup list can help reduce errors.
For example:
 - a shop might put the dress sizes into a lookup list
 - a car showroom might put the car models into a lookup list
 - a vet might list the most popular types of animals that they deal with.
 For example, a database storing film information wants to record the type of film it is,
so for convenience a 'lookup' drop-down list is provided on the data entry form.

The benefits of a lookup list are that they:

- speed up data entry, because it is usually much faster to pick from a list than
to type each individual entry
- improve accuracy, because they reduce the risk of spelling mistakes
- limit the options to choose from by only presenting the required options

However, using a lookup validation technique does not prevent someone from
selecting the wrong option, and so mistakes can still be made.
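A lookup check simply tests membership of the allowed list; the film genres below are invented for illustration:

```python
# A hypothetical lookup list for the film-genre field on a data entry form.
FILM_TYPES = ["Action", "Comedy", "Drama", "Horror", "Sci-Fi"]

def lookup_check(value, allowed):
    """Accept the entry only if it is one of the listed options."""
    return value in allowed

genre_ok = lookup_check("Comedy", FILM_TYPES)    # True
genre_bad = lookup_check("Comedie", FILM_TYPES)  # False: misspelling rejected
```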
d. list

 A list is created containing the limited options that may be entered.
 Users can simply select the required option from the list; there is no need to type it.
 This saves time and reduces the chance of errors. There will also be no spelling
mistakes, as the data is selected rather than typed.
e. length
Sometimes you may have data which always has the same number of characters.
For example, a UK landline telephone number has 11 digits.

A length check could be set up to ensure that exactly 11 digits are entered into the
field. This type of validation cannot check that the 11 digits are correct, but it can
ensure that 10 or 12 digits aren't entered.

A length check can also be set up to allow characters to be entered within a certain range.
For example, postcodes can be in the form of:

CV45 2RE (7 characters without a space or 8 with a space) or
B9 3TF (5 without a space or 6 with a space).

An input field expecting a postcode entry could have a rule that it must be between 5
and 8 characters.
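Both the fixed-length and the ranged variants can be expressed with one small function; the phone number and postcode values are examples from the text:

```python
def length_check(value, min_len, max_len=None):
    """Accept the value if its character count lies within the bounds;
    with only min_len given, the field is fixed-length."""
    if max_len is None:
        max_len = min_len
    return min_len <= len(value) <= max_len

phone_ok = length_check("01234567890", 11)   # True: exactly 11 characters
phone_bad = length_check("0123456789", 11)   # False: only 10 characters
postcode_ok = length_check("CV45 2RE", 5, 8) # True: 8 characters
```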
f. format
 You may see this validation technique referred to as either a picture or a format check;
they are the same thing, so there is no need to learn two different definitions.
Some types of data will always consist of the same pattern.
 Example 1
 Think about a postcode. The majority of postcodes look something like this:
 CV36 7TP
 WR14 5WB
 Replace either of those examples with L for any letter which appears and N for any
number that appears and you will end up with:
 LLNN NLL
 This means that you can set up a picture/format check for something like a postcode field
to ensure that a letter isn't entered where a number should be or a number in place of a
letter.
 A few postcodes break this rule, e.g. B9 7NT. You can still set up a picture/format check
to include this variation.
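The LLNN NLL picture (and the shorter B9 7NT variant) can be expressed as a regular expression; note this is a deliberate simplification of the full UK postcode rules:

```python
import re

# Picture LLNN NLL plus the shorter LN NLL variant (e.g. B9 7NT), where
# L = letter and N = number.  Only the two shapes discussed above are
# covered, so real postcodes of other shapes would need extra patterns.
POSTCODE = re.compile(r"^(?:[A-Z]{2}\d{2}|[A-Z]\d) \d[A-Z]{2}$")

def format_check(value):
    """Accept the value only if it matches the postcode picture."""
    return bool(POSTCODE.match(value))
```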
g. check digit
 This is used when you want to be sure that a range of numbers has been entered
correctly for example a barcode or an ISBN number:

 ISBN 1 84146 201 2

 The check digit is the final number in the sequence, so in this example it is the
final ‘2’.

 On a barcode, the check digit is likewise the last digit printed on the right.

 The computer will perform a complex calculation on all of the numbers and then
compare the answer to the check digit. If both match, it means the data was
entered correctly.
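The calculation for an ISBN-10 can be sketched as follows: each of the first nine digits is weighted from 10 down to 2, and the check digit brings the total to a multiple of 11. It reproduces the final '2' of the ISBN in the example above:

```python
def isbn10_check_digit(first_nine):
    """Compute the ISBN-10 check digit: digits are weighted 10 down to 2
    and the check digit brings the total to a multiple of 11
    ('X' stands in for a value of 10)."""
    total = sum(weight * int(digit)
                for weight, digit in zip(range(10, 1, -1), first_nine))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)

# The ISBN from the example above, 1 84146 201 2: the computed check
# digit matches the final '2', so the number was entered correctly.
computed = isbn10_check_digit("184146201")
```

Barcodes such as EAN-13 use a different weighting, but the principle is the same: recompute the digit and compare it with the one entered.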
