
Paper 2 - Option A - Databases

A.3 - Further aspects of database management

What are the functions of the Database Administrator?

Before trying to understand the functions of the database administrator, it is


necessary to first learn the three different functional levels needed to maintain a
database. These levels are the data administration (DA), the database
administration (DBA), and database steward.

What is a data administrator?

A data administrator (also known as a data administration manager, data architect, or information center manager) fills a high-level function responsible for the overall management of data resources in an organization. In order to perform these duties, the DA must know a good deal of systems analysis and programming.
These are the functions of a data administrator (not to be confused with database administrator functions):
1. Data policies, procedures, and standards
2. Planning - development of the organization's IT strategy, enterprise model, cost/benefit model, design of the database environment, and administration plan
3. Data conflict (ownership) resolution
4. Data analysis - define and model data requirements, business rules, and operational requirements, and maintain the corporate data dictionary
5. Internal marketing of DA concepts
6. Managing the data repository
What is a database administrator?

Database administration is a more operational or technical-level function, responsible for physical database design, security enforcement, and database performance. Tasks include maintaining the data dictionary, monitoring performance, and enforcing organizational standards and security.
What are the functions of a database administrator?
1. Selection of hardware and software

 Keep up with current technological trends
 Predict future changes
 Emphasis on established off-the-shelf products

2. Managing data security and privacy

 Protection of data against accidental or intentional loss, destruction, or misuse
 Firewalls
 Establishment of user privileges (a minimal GRANT/REVOKE sketch follows this list)
 Complicated by the use of distributed systems such as internet access and client/server technology
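As a minimal illustration of establishing user privileges, the sketch below uses standard SQL GRANT/REVOKE statements. The role and table names (clerk_role, Customer) are hypothetical, and the syntax for creating roles varies slightly between DBMSs.

-- Create a role and give it limited rights on one table (names are hypothetical).
CREATE ROLE clerk_role;

-- The role may read and insert customer rows, but not update or delete them.
GRANT SELECT, INSERT ON Customer TO clerk_role;

-- Privileges that are no longer needed should be withdrawn.
REVOKE INSERT ON Customer FROM clerk_role;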

How many major threats to database security can you think of?
1. Accidental loss due to human error or software/ hardware error.
2. Theft and fraud that could come from hackers or disgruntled employees.
3. Improper data access to personal or confidential data.
4. Loss of data integrity.
5. Loss of data availability through sabotage, a virus, or a worm.

3. Managing Data Integrity

 Integrity controls protect data from unauthorized use
 Data consistency
 Maintaining data relationships
 Domains - sets of allowable values
 Assertions - enforce database conditions (a CHECK-constraint sketch follows this list)
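As a sketch of how such integrity controls can be declared, the SQL below restricts domains with CHECK constraints; the table and column names are hypothetical. (Standard SQL also defines CREATE ASSERTION for conditions spanning tables, but few DBMSs implement it, so column-level constraints are shown instead.)

-- Hypothetical Account table with declarative integrity controls.
CREATE TABLE Account (
    ano      CHAR(10) PRIMARY KEY,
    balance  DECIMAL(12,2) NOT NULL CHECK (balance >= 0),  -- domain: no negative balances
    acc_type VARCHAR(10) NOT NULL
        CHECK (acc_type IN ('SAVINGS', 'CURRENT'))          -- set of allowable values
);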

4. Data backup

 We must assume that a database will eventually fail


 Establish procedures:
o how often should the data be backed up?
o what data should be backed up more frequently?
o who is responsible for the backups?
 Backup facilities (a hedged backup-script sketch follows this list)
o automatic dump - facility that produces a backup copy of the entire database
o periodic backup - done on a periodic basis, such as nightly or weekly
o cold backup - the database is shut down during backup
o hot backup - a selected portion of the database is shut down and backed up at a given time
o backups stored in a secure, off-site location
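To make the backup types concrete, here is a minimal sketch using SQL Server's T-SQL BACKUP syntax; the database name and file paths are hypothetical, and other DBMSs use different tools (PostgreSQL's pg_dump, for example).

-- Automatic dump: a full backup copy of the entire database.
BACKUP DATABASE SalesDB
    TO DISK = 'E:\backups\SalesDB_full.bak';

-- A later, smaller backup of only the pages changed since the full backup.
BACKUP DATABASE SalesDB
    TO DISK = 'E:\backups\SalesDB_diff.bak'
    WITH DIFFERENTIAL;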

5. Database recovery

 Application of proven strategies for reinstating the database after a crash
 Recovery facilities include backup, journalizing, checkpoint, and recovery manager facilities

If there are backup facilities, are there also journalizing, checkpoint, and recovery facilities?
Yes.

 Journalizing facilities include:
o an audit trail of transactions and database updates
o a transaction log, which records essential data for each transaction processed against the database
o a database change log, which shows images of updated data: the log stores a copy of the image both before and after modification
 Checkpoint facilities:
o when the DBMS refuses to accept new transactions, the system is in a quiet state
o the database and transactions are synchronized
o this allows the recovery manager to resume processing from a recent checkpoint instead of repeating the entire day
 Recovery and restart procedures (a transaction-integrity sketch follows this list):
o switch - mirrored databases
o restore/rerun - reprocess transactions against the backup
o transaction integrity - commit or abort all of a transaction's changes
o backward recovery (rollback) - apply before-images
o forward recovery (rollforward) - start with an earlier copy of the database and apply after-images
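Transaction integrity (commit or abort all of a transaction's changes) can be sketched in standard SQL as follows; the table and account numbers are hypothetical.

-- Transfer 100 between two accounts: either both updates persist, or neither does.
BEGIN TRANSACTION;

UPDATE Account SET balance = balance - 100 WHERE ano = 'A-101';
UPDATE Account SET balance = balance + 100 WHERE ano = 'A-202';

-- If both updates succeeded, make the changes permanent:
COMMIT;

-- On any error, the application would instead issue:
-- ROLLBACK;  -- backward recovery: before-images restore the old balances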

These are shared administration functions:

1. Database design

 DA is responsible for the logical design
 DBA is responsible for the external model design (subschemas), the physical design (construction), and for designing integrity controls

2. Database implementation

 DBA
o establish security controls
o supervise database loading
o specify test procedures
o develop programming standards
o establish back up/ recovery procedures
 Both
o specify access policies
o user training

3. Operations and maintenance

 DBA
o monitor database performance
o tune and reorganize databases as needed
o enforce standards and procedures
 Both
o support users

4. Growth and change

 Both
o implement change-control procedures
o plan for growth and change
o evaluate new technologies

New functions
1. Data warehouse administration

 New function due to the increased use of data warehousing

o (massively) integrated decision support databases from various
sources
 Emphasis on integration and coordination of data and metadata from
multiple databases
 Specific functions
o support decision-oriented applications
o manage data warehouse (exponential) growth
o establish service level agreements

Review Questions: Multiple Choice


1. A person who takes overall responsibility for data, metadata, and the policies
about data use is the _______.
A. Data administrator
B. Database administrator
C. Database steward
D. Both A and B.
2. The _________ has a more hands-on, physical involvement with the database
than the ____________.

A. Data administrator; Database administrator


B. Database administrator; Data administrator
C. Database steward; Database administrator
D. None of the above
3. Before- and after-images of records that have been modified by transactions
are in a ____.

A. database change log


B. transaction log
C. checkpoint
D. journalizing facility
4. Which is NOT one of the basic facilities for backup and recovery of a database?

A. Checkpoint facility
B. Recovery Manager
C. Biometric Device
D. Journalizing facilities.
5. Which of the following is the goal of database security?
A. To protect primarily against accidental or intentional loss of data
B. To protect against misuse of data
C. To protect against destruction of data
D. All of the above
Answers:
1. A
2. B
3. A
4. C
5. D

Questions:

1. List the functions of a database administrator.

1. Selection of hardware and software


2. Managing data security and privacy
3. Managing data integrity
4. Data back up
5. Database recovery
6. Tuning database performance
7. Improving query processing performance

2. What impact has the internet had on the management of data security?

As a result of the internet, managing data security effectively has become more
difficult because access to data has become open through the internet and
corporate intranets.

3. What are five major threats to data security?

1. Accidental loss due to human error or software/ hardware error.


2. Theft and fraud that could come from hackers or disgruntled employees.
3. Improper data access to personal or confidential data.
4. Loss of data integrity.

5. Loss of data availability through sabotage, a virus, or a worm.

4. Explain the function of a recovery manager.

The recovery manager is a module of the DBMS which restores the database to
a correct condition when a failure occurs and which resumes processing user
requests.

5. What is the difference between backward (rollback) and forward (roll forward)
recovery?

Rollback is the backing out, or undoing, of unwanted changes to the database. Before-images of the records that have been changed are applied to the database, and the database is returned to an earlier state. It is used to reverse the changes made by transactions that have been aborted or terminated abnormally.

Roll forward is the technique that starts with an earlier copy of the database.
After-images (the results of good transactions) are applied to the database, and
the database is quickly moved forward to a later state.

How can an end user interact with a database?

The data a user enters into a form is saved in the database. That stored data can then be used by functionality that is performed at another place and time. For example, if a user submits two numbers in a form, both numbers are stored in the database; a button somewhere else on the page can later read them back and calculate the sum of the two numbers.
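A minimal SQL sketch of that scenario (the table and column names are hypothetical): the INSERT runs when the form is submitted, and the SELECT can run at any later place and time.

-- Created once, when the application is set up.
CREATE TABLE form_numbers (
    entry_id INTEGER PRIMARY KEY,
    num1     INTEGER NOT NULL,
    num2     INTEGER NOT NULL
);

-- Executed when the user submits the form.
INSERT INTO form_numbers (entry_id, num1, num2) VALUES (1, 7, 35);

-- Executed later, when the button is clicked: the sum is computed from stored data.
SELECT num1 + num2 AS total FROM form_numbers WHERE entry_id = 1;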

Recovery Techniques for Database Systems

 Definitions:
o Failure: An event at which the system does not perform according to
specifications. There are three kinds of failures:
1. failure of a program or transaction
2. failure of the total system
3. hardware failure
o Recovery Data: Data required by the recovery system for the recovery of the primary data. In very high reliability systems, this data might itself need to be covered by a recovery mechanism. Recovery data is divided into two categories: 1) data required to keep current values, and 2) data required to make the restoration of previous values possible.
o Transaction: The base unit of locking and recovery (for undo, redo, or completion); it appears atomic to the user.
o Database: A collection of related storage objects together with
controlled redundancy that serves one or more applications. Data is
stored in a way that is independent of programs using it, with a single
approach used to add, modify, or retrieve data.
o Correct State: Information in the database consists of the most
recent copies of data put in the database by users and contains no
data deleted by users.
o Valid State: The database contains part of the information of the
correct state. There is no spurious data, although pieces may be
missing.
o Consistent State: In a valid state, with the information contained
satisfying user consistency constraints. Varies depending on the
database and users.
o Crash: A failure of a system that is covered by a recovery technique.
o Catastrophe: A failure of a system not covered by a recovery
technique.

 Possible Levels of Recovery:

1. Recovery to the correct state.
2. Recovery to a checkpointed (past) correct state.
3. Recovery to a possible previous state.
4. Recovery to a valid state.
5. Recovery to a consistent state.
6. Crash resistance (prevention).

The bigger the damage, the cruder the recovery technique used.

 Recovery Techniques:
1. Salvation program: Run after a crash to attempt to restore the
system to a valid state. No recovery data used. Used when all other
techniques fail or were not used. Good for cases where buffers were
lost in a crash and one wants to reconstruct what was lost...(4,5)
2. Incremental dumping: Modified files copied to archive after job
completed or at intervals. (3,4)
3. Audit trail: Sequences of actions on files are recorded. Optimal for
"backing out" of transactions. (Ideal if trail is written out before
changes). (1,2,3)
4. Differential files: Separate file is maintained to keep track of
changes, periodically merged with the main file. (2,3)
5. Backup/current version: Present files form the current version of the
database. Files containing previous values form a consistent backup
version. (2,3)
6. Multiple copies: Multiple active copies of each file are maintained
during normal operation of the database. In cases of failure,
comparison between the versions can be used to find a consistent
version. (6)
7. Careful replacement: Nothing is updated in place; the original is deleted only after the operation is complete. (2,6)

(The numbers in parentheses indicate which recovery levels from the list above are supported by each technique.)
Combinations of two techniques can be used to offer similar protection
against different kinds of failures. The techniques above, when
implemented, force changes to:

o The way data is structured (4,5,6).

o The way data is updated and manipulated (7).
o nothing (available as utilities) (1,2,3).
 Examples and bits of wisdom:
o Original Multics system: all disk files updated or created by the user are copied when the user signs off. All newly created or modified files not previously dumped are copied to tape once per hour. High reliability, but very high overhead. It was changed to a system using a mix of incremental dumping, full checkpointing, and salvage programs.
o Several other systems maintain backup copies of data through the
paging system (keep backups in the swap space).
o Use of buffers is dangerous for consistency.
o Intention lists: specify audit trail before it actually occurs.
o Recovery among interacting processes is hard. You can either
prevent the interaction or synchronize with respect to recovery.
o Error detection is difficult, and can be costly.

Relevance
Recovery from failure is a critical factor in databases. In case of disaster, it is very important that as much as possible (if not everything) is recovered. This section surveys the methods that were in use at the time for data recovery.

Implications of the Database Approach

Potential for Enforcing Standards


This is very crucial for the success of database applications in large organizations. Standards refer to data item names, display formats, screens, report structures, meta-data (descriptions of data), etc.
Reduced Application Development Time
Incremental time to add each new application is reduced
Flexibility to change data-structures
Database structure may evolve as new requirements are defined
Availability of Up-to-Date Information
Very important for on-line transaction systems such as airline, hotel, and car reservation systems

Economies of Scale
By consolidating data and applications across departments, wasteful overlap of resources and personnel can be avoided

Data Matching:

The Data Quality Services (DQS) data matching process enables you to reduce
data duplication and improve data accuracy in a data source. Matching analyzes
the degree of duplication in all records of a single data source, returning weighted
probabilities of a match between each set of records compared. You can then
decide which records are matches and take the appropriate action on the source
data.
The DQS matching process has the following benefits:

 Matching enables you to eliminate differences between data values that should be equal, determining the correct value and reducing the errors that data differences can cause. For example, names and addresses are often the identifying data for a data source, particularly customer data, but the data can become dirty and deteriorate over time. Performing matching to identify and correct these errors can make data use and maintenance much easier.
 Matching enables you to ensure that values that are equivalent, but were
entered in a different format or style, are rendered uniform.
 Matching identifies exact and approximate matches, enabling you to
remove duplicate data as you define it. You define the point at which an
approximate match is in fact a match. You define which fields are assessed
for matching, and which are not.
 DQS enables you to create a matching policy using a computer-assisted
process, modify it interactively based upon matching results, and add it to a
knowledge base that is reusable.
 You can re-index data copied from the source to the staging table, or not
re-index, depending on the state of the matching policy and the source
data. Not re-indexing can improve performance.

You can perform the matching process in conjunction with other data cleansing
processes to improve overall data quality. You can also perform data de-duplication using DQS functionality built into Master Data Services. For more information, see Master Data Services Overview.

How to Perform Data Matching

As with other data quality processes in DQS, you perform matching by building a knowledge base and executing a matching activity in a data quality project, in the following steps:

1. Create a matching policy in the knowledge base.
2. Perform a de-duplication process in a matching activity that is part of a data quality project.
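DQS itself is driven through its client tooling rather than hand-written queries, but the core idea of an exact-match pass can be sketched in plain SQL. The customers table and its columns here are hypothetical; real approximate matching adds fuzzy comparison logic on top of this.

-- Find groups of records that agree exactly on the identifying fields.
SELECT name, address, COUNT(*) AS copies
FROM customers
GROUP BY name, address
HAVING COUNT(*) > 1;   -- any group with more than one row is a duplicate cluster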

Data Mining
Data mining (sometimes called data or knowledge discovery) is the process of
analyzing data from different perspectives and summarizing it into useful
information - information that can be used to increase revenue, cut costs, or
both. Data mining software is one of a number of analytical tools for analyzing
data. It allows users to analyze data from many different dimensions or angles,
categorize it, and summarize the relationships identified. Technically, data mining
is the process of finding correlations or patterns among dozens of fields in large
relational databases.

Data, Information, and Knowledge

Data
Data are any facts, numbers, or text that can be processed by a computer. Today,
organizations are accumulating vast and growing amounts of data in different
formats and different databases. This includes:

 operational or transactional data such as sales, cost, inventory, payroll, and
accounting

 nonoperational data, such as industry sales, forecast data, and macroeconomic data

 meta data - data about the data itself, such as logical database design or
data dictionary definitions

Information
The patterns, associations, or relationships among all this data can provide
information. For example, analysis of retail point of sale transaction data can yield
information on which products are selling and when.

Knowledge
Information can be converted into knowledge about historical patterns and future
trends. For example, summary information on retail supermarket sales can be
analyzed in light of promotional efforts to provide knowledge of consumer buying
behavior. Thus, a manufacturer or retailer could determine which items are most
susceptible to promotional efforts.
Data Warehouses
Dramatic advances in data capture, processing power, data transmission, and
storage capabilities are enabling organizations to integrate their various
databases into data warehouses. Data warehousing is defined as a process of
centralized data management and retrieval. Data warehousing, like data mining,
is a relatively new term although the concept itself has been around for years.
Data warehousing represents an ideal vision of maintaining a central repository of
all organizational data. Centralization of data is needed to maximize user access
and analysis. Dramatic technological advances are making this vision a reality for
many companies. And, equally dramatic advances in data analysis software are
allowing users to access this data freely. The data analysis software is what
supports data mining.

What can data mining do?


Data mining is primarily used today by companies with a strong consumer focus -
retail, financial, communication, and marketing organizations. It enables these
companies to determine relationships among "internal" factors such as price,
product positioning, or staff skills, and "external" factors such as economic
indicators, competition, and customer demographics. And, it enables them to
determine the impact on sales, customer satisfaction, and corporate profits.
Finally, it enables them to "drill down" into summary information to view detail
transactional data.
With data mining, a retailer could use point-of-sale records of customer
purchases to send targeted promotions based on an individual's purchase history.
By mining demographic data from comment or warranty cards, the retailer could
develop products and promotions to appeal to specific customer segments.

For example, Blockbuster Entertainment mines its video rental history database
to recommend rentals to individual customers. American Express can suggest
products to its cardholders based on analysis of their monthly expenditures.
WalMart is pioneering massive data mining to transform its supplier relationships.
WalMart captures point-of-sale transactions from over 2,900 stores in 6 countries
and continuously transmits this data to its massive 7.5 terabyte Teradata data
warehouse. WalMart allows more than 3,500 suppliers to access data on their
products and perform data analyses. These suppliers use this data to identify
customer buying patterns at the store display level. They use this information to
manage local store inventory and identify new merchandising opportunities. In
1995, WalMart computers processed over 1 million complex data queries.
The National Basketball Association (NBA) is exploring a data mining application
that can be used in conjunction with image recordings of basketball games. The
Advanced Scout software analyzes the movements of players to help coaches
orchestrate plays and strategies. For example, an analysis of the play-by-play
sheet of the game played between the New York Knicks and the Cleveland
Cavaliers on January 6, 1995 reveals that when Mark Price played the Guard
position, John Williams attempted four jump shots and made each one! Advanced
Scout not only finds this pattern, but explains that it is interesting because it
differs considerably from the average shooting percentage of 49.30% for the
Cavaliers during that game.
By using the NBA universal clock, a coach can automatically bring up the video
clips showing each of the jump shots attempted by Williams with Price on the
floor, without needing to comb through hours of video footage. Those clips show
a very successful pick-and-roll play in which Price draws the Knick's defense and
then finds Williams for an open jump shot.
How does data mining work?
While large-scale information technology has been evolving separate transaction
and analytical systems, data mining provides the link between the two. Data
mining software analyzes relationships and patterns in stored transaction data
based on open-ended user queries. Several types of analytical software are
available: statistical, machine learning, and neural networks. Generally, any of
four types of relationships are sought:

 Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.

 Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.

 Associations: Data can be mined to identify associations. The beer-diaper example is a classic example of associative mining (see the SQL sketch after this list).

 Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.
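As a small illustration of association mining in SQL, assuming a hypothetical purchases table with one row per transaction-item pair, co-occurrence counts of the beer-diaper kind can be computed with a self-join:

-- Count how often each pair of items appears in the same transaction.
SELECT a.item AS item1, b.item AS item2, COUNT(*) AS together
FROM purchases a
JOIN purchases b
  ON a.tid = b.tid        -- same transaction
 AND a.item < b.item      -- count each unordered pair once
GROUP BY a.item, b.item
ORDER BY together DESC;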

Data mining consists of five major elements:

 Extract, transform, and load transaction data onto the data warehouse
system.

 Store and manage the data in a multidimensional database system.

 Provide data access to business analysts and information technology professionals.

 Analyze the data by application software.

 Present the data in a useful format, such as a graph or table.

Different levels of analysis are available:

 Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.

 Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.

 Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID). CART and CHAID are decision tree techniques used for classification of a dataset. They provide a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. CART segments a dataset by creating 2-way splits while CHAID segments using chi square tests to create multi-way splits. CART typically requires less data preparation than CHAID.

 Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k >= 1). Sometimes called the k-nearest neighbor technique.

 Rule induction: The extraction of useful if-then rules from data based on
statistical significance.

 Data visualization: The visual interpretation of complex relationships in


multidimensional data. Graphics tools are used to illustrate data
relationships.

What technological infrastructure is required?


Today, data mining applications are available on all size systems for mainframe,
client/server, and PC platforms. System prices range from several thousand
dollars for the smallest applications up to $1 million a terabyte for the largest.
Enterprise-wide applications generally range in size from 10 gigabytes to over 11
terabytes. NCR has the capacity to deliver applications exceeding 100 terabytes.
There are two critical technological drivers:

 Size of the database: the more data being processed and maintained, the
more powerful the system required.

 Query complexity: the more complex the queries and the greater the
number of queries being processed, the more powerful the system
required.

Relational database storage and management technology is adequate for many
data mining applications less than 50 gigabytes. However, this infrastructure
needs to be significantly enhanced to support larger applications. Some vendors
have added extensive indexing capabilities to improve query performance. Others
use new hardware architectures such as Massively Parallel Processors (MPP) to
achieve order-of-magnitude improvements in query time. For example, MPP
systems from NCR link hundreds of high-speed Pentium processors to achieve
performance levels exceeding those of the largest supercomputers.

Multiple choice questions for a quick revision


MUST GO THROUGH BY ALL HL & SL STUDENTS

DATABASE APPLICATIONS AND PRIVACY IMPLICATIONS

Multiple Choice:
1. Database programs can do all of the following EXCEPT:

A. store and organize data.

B. create graphics.

C. communicate data.

D. manage information.

Answer: B

2. A (n) ____________ is a good comparison to a database.

A. computerized file cabinet

B. computerized typewriter
C. office desktop

D. computerized calculator

Answer: A

3. Database software is an example of a(n):

A. DBA.

B. application.

C. desktop publishing program.

D. operating system.

Answer: B

4. Advantages of databases include all of the following EXCEPT:

A. easy to reorganize data.

B. easy to retrieve information.

C. easy to store large amounts of data.

D. easy to secure because information from the database cannot be


printed

Answer: D

5. Software for organizing storage and retrieval of information is a(n):

A. database.

B. database program.

C. operating system.

D. data warehouse.

Answer: B

6. A collection of information stored in an organized form in a computer is a(n):

A. database.

B. DBMS.

C. operating system.

D. utility.

Answer: A

7. A relational database is composed of one or more:


A. directories.

B. tables.

C. folders.

D. DBMS.

Answer: B

8. In a database table, a ____________ is a collection of data fields.

A. vector

B. query

C. descriptor

D. record

Answer: D

9. In a customer database table, all of the information for one customer is kept in
a:

A. field type.

B. field.

C. record.

D. column.

Answer: C

10. In a customer database, a customer’s surname would be keyed into a:

A. row.

B. text field.

C. record.

D. computed field.

Answer: B

11. In a database, a ____________ field shows results of calculations performed
on data in other numeric fields.

A. configured

B. concatenated

C. key

D. computed

Answer: D

12. The number of newspapers sold on May 30 would be kept in a ____________
field.

A. date

B. numeric

C. text

D. key

Answer: B

13. Bringing data from a word processing program into a database program is
known as:

A. exporting.

B. batch processing.

C. importing.

D. mining.

Answer: C

14. ____________ is perusing data in a database as if looking through pages in a notebook.

A. Browsing

B. Mining

C. Scrubbing

D. Cleansing

Answer: A

15. When looking for a specific patient in a hospital’s database, ___________ is more efficient than browsing.

A. surfing

B. keying

C. scrubbing

D. querying

Answer: D

16. Arranging all customer records in customer number order is an example of:

A. querying.

B. sorting.

C. inquiring.

D. filtering.

Answer: B

17. An ordered list of specific records and specific fields printed in an easy-to-read
format is known as a(n):

A. query.

B. sort.

C. inquiry.

D. report.

Answer: D

18. The process of ____________ would be used when sending data from a
database to a word processor so that mailing labels could be produced.

A. exporting

B. sorting

C. mining

D. querying

Answer: A

19. Database queries must be:

A. contiguous.

B. unambiguous.

C. contoured.

D. batched.

Answer: B

20. The following is an example of:

SELECT Student_ID FROM Students WHERE Major = 'Business' AND Credits >= 46

A. query language.

B. BASIC language.

C. HTML language.

D. a spreadsheet formula.

Answer: A

21. PIM stands for:

A. personal information manager.

B. personal inquiry manager.

C. personalized information management.

D. program information management.

Answer: A

22. A(n) ____________ combines data tables with demographic information.

A. PIM

B. intranet

C. SPSS

D. GIS

Answer: D

23. A ____________ manipulates data in a large collection of files and cross-references those files.

A. DBA

B. GIS

C. PIM

D. DBMS

Answer: D

24. A large corporation would use a ____________ to keep records for many
employees and customers along with all of its inventory data.

A. GIS

B. spreadsheet program

C. PIM

D. database management system

Answer: D

25. For a customer database, a good choice of key field would be:

A. address.

B. customer ID.

C. phone number.

D. last name.

Answer: B

26. A key field must:

A. uniquely identify a record.

B. be used to connect two tables in the database.

C. be located in a minimum of three tables.

D. be common and used in many records.

Answer: A
27. In a(n) ____________, data from more than one table can be combined.

A. key field

B. relational database

C. file manager

D. XML

Answer: B

28. ____________ processing is used when a large mail-order company
accumulates orders and processes them together in one large set.

A. Interactive

B. Group

C. Real-time

D. Batch

Answer: D

29. When making an airline reservation through the Internet, you use
____________ processing.

A. interactive

B. group

C. digitization

D. batch

Answer: A

30. Producing invoices once a month is an example of ____________ processing.

A. interactive

B. digitization

C. real-time

D. batch

Answer: D

31. In a typical client/server environment, the client can be any of the following
EXCEPT a:

A. desktop computer.

B. mainframe.

C. PDA.

D. notebook.

Answer: B

32. In a client/server environment, the server:

A. processes a query from a client and then sends the answer back to
the client.

B. cannot be used to access a corporate data warehouse.

C. must be a CRM system.

D. must be within 100 meters of all client computers in the network.

Answer: A

33. ____________ is connectivity software that hides the complex interaction between client and server computers and creates a three-tier design separating actual data from the programming logic used to access it.

A. CRM

B. XML

C. Middleware

D. Firmware

Answer: C

34. Data mining is:

A. batch processing using files stored on a mainframe computer.

B. locating trends and patterns in information kept in large databases.

C. querying databases used by the mining industry.

D. creating a database warehouse from many smaller databases.

Answer: B

35. ___________ is a new, powerful data description language used to construct web pages as well as access and query databases using the Internet.

A. SQL

B. CRM

C. PIM

D. XML

Answer: D

36. A CRM system organizes and tracks information on:

A. consulates.

B. computer registers.

C. customers.

D. privacy violations.

Answer: C

37. In an object-oriented database, every object is an instance of a:

A. table.

B. field.

C. class.

D. record.

Answer: C

38. When a person uses language like ordinary English to query a database, it is
known as a(n) ____________ language query.

A. HTML

B. object-oriented

C. natural

D. XML

Answer: C

39. The act of accessing data about other people through credit card information,
credit bureau data, and public records and then using that data without
permission is known as:

A. identity theft.

B. personal theft.

C. data mining.

D. Big Brother crime.

Answer: A

40. An aspect of the USA Patriot Act is the requirement that when presented with
appropriate warrants:

A. citizens must submit to lie detector tests upon request.

B. companies must turn over their employees’ military records.

C. libraries must turn over their patrons’ records.

D. foreigners must be fingerprinted when entering the US.

Answer: C

41. One disadvantage of data mining is that it:

A. accumulates so much data that it is difficult to use efficiently.

B. bypasses virus checking.

C. generates few results.

D. produces graphs and reports, no straight-forward data.

Answer: A

Fill in the Blank:


42. A(n) ____________ is a collection of information stored electronically.

Answer: database

43. A(n) ____________ field shows results of a calculation done using values in
other numeric fields.

Answer: computed

44. A(n) ____________ is a collection of related information stored in a database
program.

Answer: table

45. In a university database table, all of the information for one student (e.g.
student ID, name, address) would be stored in one ____________.

Answer: record

46. In a university database, a course name would be stored as a(n)


____________ type field.

Answer: text

47. In a university database, a student’s birth date would be stored in a(n)
____________ type field.

Answer: date

48. In ____________ view, the database program shows the data one record at a
time.

Answer: form

49. Bringing a list of names and addresses from a Word document into a database
program is called ____________ data.

Answer: importing

50. A request for information from a database that can be saved and reused later
is known as a(n) ____________.

Answer: stored query

51. To arrange students in a university database table in alphabetical order, the user must perform a(n) ____________ on the database.

Answer: sort

52. A typical SQL statement filters the ____________ of a database, thereby presenting only those that meet the criteria given.

Answer: records

53. A specialized database program that can store addresses and phone numbers,
keep a calendar, and set alarms is known as a(n) ____________.

Answer: PIM

54. DBMS stands for ____________.

Answer: database-management system

55. A(n) ____________ combines tables of data with demographic information.

Answer: GIS

56. GIS stands for ____________.

Answer: geographical information system

57. PIM stands for ____________.

Answer: personal information manager

58. Because it is a unique identifier for a book, an ISBN number would be an example of a(n) ____________ field used in a library database table.

Answer: key

59. In a(n) ____________ database, changes in one file are automatically reflected in other files.

Answer: relational

60. Timesheet transactions collected and used to update payroll files once a
week, is an example of ____________ processing.

Answer: batch

61. In ____________ computing, users can view and change data online.

Answer: real-time or interactive

62. ____________ databases spread data across networks on several different computers.

Answer: Distributed

63. In client/server computing, connectivity software can also be called ____________.

Answer: middleware

64. In a client/server environment, a desktop computer is known as the
____________.

Answer: client

65. Some large companies keep all of their corporate data in an integrated data
repository called a data ____________.

Answer: warehouse

66. Data ____________ is used to find hidden predictive information in


databases.

Answer: mining

67. Data ____________ uses artificial intelligence and statistical methods to find
trends and patterns in data.

Answer: mining

68. A CRM system tracks information on ____________.

Answer: customers

69. A company’s self-contained network that uses a search engine and Web
browser is called a(n) ____________.

Answer: intranet

70. ____________ databases store objects instead of records.

Answer: Object-oriented

71. Using software to search for and replace data that contains errors is called
____________.

Answer: data scrubbing or cleansing

72. Since the ____________ Act was passed, libraries and bookstores can be
required to turn over their customer records to the FBI.

Answer: USA Patriot

73. The Children’s Online Privacy Protection Act requires Internet-based businesses to obtain parental consent before collecting information from children under ____________ years of age.

Answer: thirteen

74. Match the following Federal Acts to their meanings:

I. Family Education Rights and Privacy Act
II. USA PATRIOT Act
III. Privacy Act of 1974
IV. Fair Credit Reporting Act of 1970
V. Freedom of Information Act of 1966
VI. Video Privacy Protection Act
VII. Children’s Online Privacy Protection Act

A. easier for FBI to collect information about individuals
B. “I was denied for credit! I demand to see the report.”
C. parents must give consent if an Internet-based business wishes to collect data from children under 13 years of age
D. students can access and correct educational records
E. video stores cannot give out customer rental records
F. federal agencies must provide your information to you
G. “I’d like to look at the court records of Joe Smith.”

Answers: D, A, F, B, G, E, C

75. Match the following five terms to their meanings:

I. data mining A. integrated corporate data kept in a central repository

II. data warehouse B. language used to program complex database queries

III. XML C. process that locates hidden predictive information in large databases

IV. SQL D. software used to manipulate a large collection of data

V. DBMS E. data description language designed for database access on the Web

Answers: C, A, E, B, D

Normal Forms
Forms of normalization are given below:

1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. Boyce-Codd Normal Form (BCNF)
5. Fourth Normal Form (4NF)
6. Fifth Normal Form (5NF)

FIRST NORMAL FORM (1NF):

"A relation schema R is in 1NF if it does not have any composite attributes, multivalued attributes, or a combination of the two."
The objective of first normal form is that the table should contain no repeating groups of data. Data is divided into logical units called entities or tables.
All attributes (columns) in the entity (table) must be single-valued.
Repeating or multi-valued attributes are moved into a separate entity (table), and a relationship is established between the two tables or entities.

Example of the first normal form:

Consider the following Customer table.
Customer:

cid | name | address (society, city)   | contact_no
C01 | aaa  | Amul avas, Anand          | {1234567988}
C02 | bbb  | near parimal garden, abad | {123, 333, 4445}
C03 | ccc  | sardar colony, surat      |

Here, address is a composite attribute, which is further subdivided into two columns: society and city. And the attribute contact_no is a multivalued attribute.
Problems with this relation are:

 It is not possible to store multiple values in a single field of a relation, so if any customer has more than one contact number, it is not possible to store those numbers.
 Another problem is related to information retrieval. Suppose there is a need to find all customers belonging to some particular city; this is very difficult, because the city name for each customer is combined with the society name and stored as one whole address.

Solution for a composite attribute

Insert separate attributes in the relation for each sub-attribute of the composite attribute.
In our example, insert two separate attributes for society and city in place of the single composite attribute address. Now insert the data values separately for society and city in all tuples.
Customer:

cid | name | society             | city  | contact_no
C01 | aaa  | Amul avas           | Anand | {1234567988}
C02 | bbb  | near parimal garden | abad  | {123, 333, 4445}
C03 | ccc  | sardar colony       | surat |

Solution for a multi-valued attribute

Two approaches are available to solve the problem of a multi-valued attribute.

1. First Approach:
In the first approach, determine the maximum allowable number of values for the multi-valued attribute. In our case, if a maximum of three numbers may be stored, insert three separate attributes to store contact numbers, as shown.
Customer:

cid | name | society             | city  | contact_no1 | contact_no2 | contact_no3
C01 | aaa  | Amul avas           | Anand | 1234567988  |             |
C02 | bbb  | near parimal garden | abad  | 123         | 333         | 4445
C03 | ccc  | sardar colony       | surat |             |             |

Now, if a customer has only one contact number, or no contact number, keep the related fields empty in that customer's tuple. If a customer has three contact numbers, store all three in the related fields. If a customer has more than three contact numbers, store three and ignore all other numbers.

2. Second Approach:
In the second approach, remove the multi-valued attribute that violates 1NF and place it in a separate relation, along with the primary key of the original relation. The primary key of the new relation is the combination of the multi-valued attribute and the primary key of the old relation. For example, in our case, remove the contact_no attribute and place it with cid in a separate relation Customer_contact. The primary key for relation Customer_contact will be the combination of cid and contact_no.
Customer:

cid | name | society             | city
C01 | aaa  | Amul avas           | Anand
C02 | bbb  | near parimal garden | abad
C03 | ccc  | sardar colony       | surat

Customer_contact:

cid | contact_no
C01 | 1234567988
C02 | 123
C02 | 333
C02 | 4445

The first approach is simple, but it is not always possible to put a restriction on the maximum allowable number of values, and it introduces null values for many fields.
The second approach is superior, as it does not suffer from the drawbacks of the first approach, but it is somewhat more complicated: for example, to display all information about any or all customers, two relations - Customer and Customer_contact - need to be accessed.
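The second-approach design can also be written as SQL DDL; this is a sketch, with data types chosen only for illustration:

CREATE TABLE customer (
    cid     CHAR(3)     PRIMARY KEY,
    name    VARCHAR(50) NOT NULL,
    society VARCHAR(50),
    city    VARCHAR(30)
);

-- One row per (customer, contact number): the multi-valued attribute moved out.
CREATE TABLE customer_contact (
    cid        CHAR(3)     NOT NULL REFERENCES customer(cid),
    contact_no VARCHAR(15) NOT NULL,
    PRIMARY KEY (cid, contact_no)   -- composite key, as described above
);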

SECOND NORMAL FORM (2NF):

"A relation schema R is in 2NF if it is in first normal form and every non-prime attribute of the relation is fully functionally dependent on the primary key."
A relation can violate 2NF only when it has more than one attribute in combination as its primary key. If a relation has only a single attribute as its primary key, then the relation will definitely be in 2NF.

Example:
Consider the following relation Depositor_Account, with primary key {cid, ano}:

Depositor_Account (cid, ano, access_date, balance, bname)

This relation contains the following functional dependencies:

FD1: {cid, ano} -> {access_date, balance, bname}
FD2: ano -> {balance, bname}

In this relation schema, access_date, balance, and bname are non-prime attributes. Among these three attributes, access_date is fully dependent on the primary key (cid and ano), but balance and bname are not fully dependent on the primary key; they depend on ano only.
So this relation is not in second normal form. Such partial dependencies result in data redundancy.

Solution:
Decompose the relation such that the resultant relations do not have any partial functional dependency. For this purpose, remove the partially dependent non-prime attributes that violate 2NF from the relation. Place them in a separate new relation along with the prime attribute on which they fully depend.
In our example, balance and bname are partially dependent on the primary key, so remove them and place them in a separate relation called Account, along with the prime attribute ano. For relation Account, ano will be the primary key.
The Depositor_Account relation is thus decomposed into two separate relations, Account and Account_Holder:

Account (ano, balance, bname)

Account_Holder (cid, ano, access_date)
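As a sketch in SQL DDL (data types are illustrative), the 2NF decomposition looks like this:

CREATE TABLE account (
    ano     CHAR(10) PRIMARY KEY,   -- balance and bname depend fully on ano
    balance DECIMAL(12,2),
    bname   VARCHAR(30)
);

CREATE TABLE account_holder (
    cid         CHAR(3)  NOT NULL,
    ano         CHAR(10) NOT NULL REFERENCES account(ano),
    access_date DATE,
    PRIMARY KEY (cid, ano)          -- access_date depends fully on the whole key
);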

THIRD NORMAL FORM (3NF):

"A relation schema R is in 3NF if it is in second normal form and no non-prime attribute of the relation is transitively dependent on the primary key."
Third normal form ensures that the relation does not have any non-prime attribute transitively dependent on the primary key. In other words, it ensures that all the non-prime attributes of the relation depend directly on the primary key.

Example:
Consider the following relation schema Account_Branch, with primary key ano:

Account_Branch (ano, balance, bname, baddress)

This relation contains the following functional dependencies:
FD1: ano -> {balance, bname, baddress}
FD2: bname -> baddress

In this relation schema there is a functional dependency ano -> bname (part of FD1), and another functional dependency bname -> baddress (FD2). Moreover, bname is a non-prime attribute. So there is a transitive dependency from ano to baddress, denoted ano -> baddress.
Such transitive dependencies result in data redundancy. In this relation, the branch address will be stored repeatedly for each account of the same branch, occupying more memory.

Solution:
Decompose the relation in such a way that the resultant relations do not have any non-prime attribute transitively dependent on the primary key. For this purpose, remove the transitively dependent non-prime attributes that violate 3NF from the relation. Place them in a separate new relation along with the non-prime attribute due to which the transitive dependency occurred. The primary key of the new relation will be that non-prime attribute.
In our example, baddress is transitively dependent on ano via the non-prime attribute bname. So remove baddress and place it in a separate relation called Branch, along with the non-prime attribute bname. For relation Branch, bname will be the primary key.
The Account_Branch relation is thus decomposed into two separate relations, Account and Branch:

Account (ano, balance, bname)

Branch (bname, baddress)
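A corresponding SQL DDL sketch (data types are illustrative), with the transitive dependency removed:

CREATE TABLE branch (
    bname    VARCHAR(30) PRIMARY KEY,  -- baddress now depends only on bname
    baddress VARCHAR(80)
);

CREATE TABLE account (
    ano     CHAR(10) PRIMARY KEY,
    balance DECIMAL(12,2),
    bname   VARCHAR(30) REFERENCES branch(bname)  -- branch address stored once
);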
