Computers and Data Resource Management
That's why organizations and their managers need to practice data resource management, a managerial activity
that applies information systems technologies like database management, data warehousing, and other data
management tools to the task of managing an organization's data resources to meet the information needs of
their business stakeholders. This chapter will show you the managerial implications of using data resource
management technologies and methods to manage an organization's data assets to meet the information
requirements of E-business companies.
Character
The most basic logical data element is the character, which consists of a single alphabetic, numeric, or other
symbol. One might argue that the bit or byte is a more elementary data element, but remember that those terms
refer to the physical storage elements provided by the computer hardware, discussed in Chapter 3. From a user's
point of view (that is, from a logical as opposed to a physical or hardware view of data), a character is the most
basic element of data that can be observed and manipulated.
Field
The next higher level of data is the field, or data item. A field consists of a grouping of characters. For example,
the grouping of alphabetic characters in a person's name forms a name field, and the grouping of numbers in a
sales amount forms a sales amount field. Specifically, a data field represents an attribute (a characteristic or
quality) of some entity (object, person, place, or event). For example, an employee's salary is an attribute that is a
typical data field used to describe an entity who is an employee of a business.
Record
Related fields of data are grouped to form a record. Thus, a record represents a collection of attributes that
describe an entity. An example is the payroll record for a person, which consists of data fields describing
attributes such as the person's name, Social Security number, and rate of pay. Fixed-length records contain a fixed
number of fixed-length data fields. Variable-length records contain a variable number of fields and field lengths.
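As a minimal sketch of the fixed-length idea, the payroll record above can be laid out as a byte string in which every field occupies a fixed number of bytes. The field widths, the layout string, and the helper names below are assumptions chosen for illustration, not a standard payroll format.

```python
import struct

# Hypothetical fixed-length layout: 20-byte name, 11-byte Social Security
# number, 8-byte floating-point rate of pay ("<" means no padding bytes).
RECORD_FORMAT = "<20s11sd"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def pack_record(name, ssn, rate):
    """Encode one record; short text fields are padded with null bytes."""
    return struct.pack(RECORD_FORMAT, name.encode("ascii"), ssn.encode("ascii"), rate)


def unpack_record(raw):
    """Decode one record back into its three fields."""
    name, ssn, rate = struct.unpack(RECORD_FORMAT, raw)
    return name.rstrip(b"\x00").decode("ascii"), ssn.decode("ascii"), rate


record = pack_record("Ann Lee", "123-45-6789", 18.5)
```

Because every record has the same size, the nth record in a file of such records starts at byte n * RECORD_SIZE, which is what makes fixed-length records easy to locate.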
File
A group of related records is a data file, or table. Thus, an employee file would contain the records of the
employees of a firm. Files are frequently classified by the application for which they are primarily used, such as a
payroll file or an inventory file, or the type of data they contain, such as a document file or a graphical image file.
Files are also classified by their permanence, for example, a payroll master file versus a payroll weekly transaction
D a t a R e s o u r c e M a n a g e m e n t
file. A transaction file, therefore, would contain records of all transactions occurring during a period and might
be used periodically to update the permanent records contained in a master file. A history file is an obsolete
transaction or master file retained for backup purposes or for long-term historical storage called archival storage.
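The periodic update of a master file from a transaction file can be sketched as follows. The employee identifiers, the amounts, and the function name are hypothetical; real master and transaction files would live on disk rather than in memory.

```python
# Hypothetical master file: employee id -> year-to-date pay.
master = {"E01": 1000.0, "E02": 2500.0}

# Hypothetical weekly transaction file: (employee id, pay for the week).
transactions = [("E01", 200.0), ("E02", 300.0), ("E01", 50.0)]


def apply_transactions(master_file, transaction_file):
    """Return an updated copy of the master file; the original is untouched."""
    updated = dict(master_file)
    for employee_id, amount in transaction_file:
        updated[employee_id] = updated.get(employee_id, 0.0) + amount
    return updated


new_master = apply_transactions(master, transactions)
```

Keeping the old master unchanged mirrors the idea of a history file: the superseded version can be retained for backup.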
Database
A database is an integrated collection of logically related records or objects. An object consists of data values
describing the attributes of an entity, plus the operations that can be performed upon the data.
A database consolidates records previously stored in separate files into a common pool of data records that
provides data for many applications. The data stored in a database are independent of the application programs
using them and of the type of secondary storage devices on which they are stored. For example, a personnel
database consolidates data formerly segregated in separate files such as payroll files, personnel action files, and
employee skills files.
For example, customer records and other common types of data are needed for several different applications in
banking, such as check processing, automated teller systems, bank credit cards, savings accounts, and instalment
loan accounting. These data can be consolidated into a common customer database, rather than being kept in
separate files for each of those applications.
Database Development
Database management packages like Microsoft Access or Lotus Approach allow end users to easily
develop the databases they need. However, large organizations with client/server or mainframe-based
systems usually place control of enterprisewide database development in the hands of database
administrators (DBAs) and other database specialists. This improves the integrity and security of
organizational databases. Database developers use the data definition language (DDL) in database
management systems like Oracle 8 or IBM's DB2 to develop and specify the data contents,
relationships, and structure of each database, and to modify these database specifications when
necessary. Such information is catalogued and stored in a database of data definitions and specifications
called a data dictionary, which is maintained by the DBA.
Data dictionaries can be queried by the database administrator to report the status of any aspect of a
firm's metadata. The administrator can then make changes to the definitions of selected data elements.
Some active (versus passive) data dictionaries automatically enforce standard data element definitions
whenever end users and application programs use a DBMS to access an organization's databases. For
example, an active data dictionary would not allow a data entry program to use a nonstandard definition
of a customer record, nor would it allow an employee to enter a name of a customer that
exceeded the defined size of that data element.
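The enforcement behaviour of an active data dictionary might look roughly like the sketch below. The dictionary contents, the field names, and the validate function are assumptions for illustration; a real DBMS enforces such definitions internally rather than through application code like this.

```python
# Hypothetical data dictionary: data element name -> (type, maximum length).
DATA_DICTIONARY = {
    "customer_name": (str, 30),
    "customer_id": (str, 8),
}


def validate(record):
    """Return a list of violations of the dictionary's definitions."""
    errors = []
    for field, value in record.items():
        if field not in DATA_DICTIONARY:
            errors.append(f"{field}: not a standard data element")
            continue
        expected_type, max_length = DATA_DICTIONARY[field]
        if not isinstance(value, expected_type):
            errors.append(f"{field}: wrong type")
        elif len(value) > max_length:
            errors.append(f"{field}: exceeds {max_length} characters")
    return errors
```

A data entry program would call validate before storing a record and reject any record for which the returned list is not empty.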
Database Interrogation
The database interrogation capability is a major benefit of a database management system. End users can use a
DBMS by asking for information from a database using a query language or a report generator. They can receive
an immediate response in the form of video displays or printed reports. No difficult programming is required.
The query language feature lets you easily obtain immediate responses to ad hoc data requests: you merely key
in a few short inquiries. The report generator feature allows you to quickly specify a report format for
information you want presented as a report.
SQL Queries
SQL, or Structured Query Language, is a query language found in many database management packages. The
basic form of an SQL query is:
SELECT ... FROM ... WHERE ...
After SELECT you list the data fields you want retrieved. After FROM you list the files or tables from which
the data must be retrieved. After WHERE you specify conditions that limit the search to only those data records
in which you are interested.
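To make the three clauses concrete, here is a small sketch that runs a query against an in-memory SQLite database from Python. The employee table, its columns, and its rows are invented for the example; any relational DBMS with an SQL interface would accept a query of the same shape.

```python
import sqlite3

# Build a tiny, hypothetical employee table in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ann", "Sales", 52000), ("Bob", "Sales", 48000), ("Cy", "IT", 61000)],
)

# SELECT lists the fields, FROM names the table, WHERE limits the records.
rows = conn.execute(
    "SELECT name, salary FROM employee WHERE department = ? ORDER BY name",
    ("Sales",),
).fetchall()
```

Only the two Sales employees are returned, and only the name and salary fields of each.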
Database Maintenance
The databases of an organization need to be updated continually to reflect new business transactions and other
events. Other miscellaneous changes must also be made to ensure accuracy of the data in the databases. This
database maintenance process is accomplished by transaction processing programs and other end user
application packages, with the support of the DBMS. End users and information specialists can also employ
various utilities provided by a DBMS for database maintenance.
Application Development
DBMS packages play a major role in application development. End users, systems analysts, and other
application developers can use the internal 4GL programming language and built-in software development tools
provided by many DBMS packages to develop custom application programs. For example, you can use a
DBMS to easily develop the data entry screens, forms, reports, or web pages of a business application. A DBMS
also makes the job of application programmers easier, since they do not have to develop detailed data-handling
procedures using a conventional programming language every time they write a program. Instead, they can
include data manipulation language (DML) statements in their programs that call on the DBMS to perform
necessary data-handling activities.
Types of Databases
Continuing developments in information technology and its business applications have resulted in the evolution
of several major types of database.
Operational Databases
These databases store detailed data needed to support the business processes and operations of the E-business
enterprise. They are also called subject area databases (SADB), transaction databases, and production databases.
Examples are a customer database, human resource database, inventory database, and other databases
containing data generated by business operations. This includes databases of Internet and electronic commerce
activity, such as clickstream data describing the online behaviour of customers or visitors to a company's website.
Distributed Databases
Many organizations replicate and distribute copies or parts of databases to network servers at a variety of sites.
These distributed databases can reside on network servers on the World Wide Web, on corporate intranets or
extranets, or on other company networks. Distributed databases may be copies of operational or analytical
databases, hypermedia or discussion databases, or any other type of database. Replication and distribution of
databases is done to improve database performance and security. Ensuring that all of the data in an
organization's distributed databases are consistently and concurrently updated is a major challenge of distributed
database management.
External Databases
Access to Data Warehouses and Data Mining
A data warehouse stores data that have been extracted from the various operational, external, and other databases
of an organization. It is a central source of data that have been cleaned, transformed, and catalogued so they can
be used by managers and other business professionals for data mining, online analytical processing, and other
forms of business analysis, market research, and decision support. Data warehouses may be subdivided into data
marts, which hold subsets of data from the warehouse that focus on specific aspects of a company, such as a
department or a business process.
Data from various operational and external databases can be captured, cleaned and transformed into data that
can be better used for analysis. This acquisition process might include activities like consolidating data from
several sources, filtering out unwanted data, correcting incorrect data, converting data to new data elements, and
aggregating data into new data subsets.
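The acquisition activities listed above can be sketched in a few lines. The two source extracts, the rule that non-positive amounts are unwanted, and the function name are all assumptions made for the example; real warehouse loading tools are far more elaborate.

```python
from collections import defaultdict

# Hypothetical extracts from two operational sources with inconsistent codes.
store_sales = [{"region": "east", "amount": 100.0}, {"region": "West ", "amount": 50.0}]
web_sales = [{"region": "EAST", "amount": 25.0}, {"region": "west", "amount": -1.0}]


def load_warehouse(*sources):
    """Consolidate, filter, clean, and aggregate rows into regional totals."""
    totals = defaultdict(float)
    for source in sources:                          # consolidate several sources
        for row in source:
            if row["amount"] <= 0:                  # filter out unwanted data
                continue
            region = row["region"].strip().lower()  # correct inconsistent codes
            totals[region] += row["amount"]         # aggregate into a new subset
    return dict(totals)


warehouse = load_warehouse(store_sales, web_sales)
```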
These data are then stored in the enterprise data warehouse, from where they can be moved into data marts or to an
analytical data store that holds data in a more useful form for certain types of analysis. Metadata that defines the
data in the data warehouse is stored in a metadata repository and catalogued by a metadata directory. Finally, a
variety of analytical software tools can be provided to query, report, mine, and analyze the data for delivery to
business end users via Internet and intranet web systems or other networks.
Data Mining
Data mining is a major use of data warehouse databases. In data mining, the data in a data warehouse are
analyzed to reveal hidden patterns and trends in historical business activity. This can be used to help managers
make decisions about strategic changes in business operations to gain competitive advantages in the marketplace.
Data mining can discover new correlations, patterns, and trends in vast amounts of business data (frequently
several terabytes of data), stored in data warehouses. Data mining software uses advanced pattern recognition
algorithms, as well as a variety of mathematical and statistical techniques to sift through mountains of data to
extract previously unknown strategic business information.
Hypermedia Databases
The rapid growth of websites on the Internet and corporate intranets and extranets has dramatically increased the
use of databases of hypertext and hypermedia documents. A website stores such information in a hypermedia
database consisting of hyperlinked pages of multimedia (text, graphic and photographic images, video clips,
audio segments, and so on). That is, from a database management point of view, the set of interconnected
multimedia pages at a website is a database of interrelated hypermedia pages, rather than interrelated data
records.
You might use a web browser on your client PC to connect with a web network server. This server runs web
server software to access and transfer the web pages you request. Hypermedia databases consist of HTML
(Hypertext Markup Language) pages, image files, video files, and audio. The web server software acts as a
database management system to manage the use of the interrelated hypermedia pages of the website.
Database administration is an important data resource management function responsible for the proper use of
database management technology. Database administration includes responsibility for developing and
maintaining the organization's data dictionary, designing and monitoring the performance of databases, and
enforcing standards for database use and security. Database administrators and analysts work with systems
developers and end users to provide their expertise to major systems development projects.
Data planning is a corporate planning and analysis function that focuses on data resource management. It
includes the responsibility for developing an overall data architecture for the firm's data resources that ties in with
the firm's strategic mission and plans, and the objectives and processes of its business units. Data planning is
done by organizations that have made a formal commitment to long-range planning for the strategic use and
management of their data resources.
Data administration is another vital data resource management function. It involves administering the collection,
storage, and dissemination of all types of data in such a way that data become a standardized resource available
to all end users in the organization. The focus of data administration is the support of an organization's business
processes and strategic business objectives. Data administration may also include responsibility for developing
policies and setting standards for corporate database design, processing, and security arrangements.
Like many companies, the Massachusetts Housing Finance Agency found itself routinely storing vital business
information in many different types of databases. The MHFA decided not to install and train executive end users
in a new software system, or reengineer business processes so they could all share a common database. Instead,
they selected the DQpowersuite of reporting and query software tools, which lets end users make queries and
produce reports that access data in different databases as if they were part of one common database. Though the
response times are significantly slower with DQpowersuite, its main advantage is its ability to easily give
business users the query and reporting tools they need to access the diverse data resources of a company.
Database structures
The relationships among the many individual records stored in databases are based on one of several logical
data structures, or models. Database management system packages are designed to use a specific data structure to
provide end users with quick, easy access to information stored in databases. Five fundamental database
structures are the hierarchical, network, relational, object-oriented, and multidimensional models.
Hierarchical Structure
Early mainframe DBMS packages used the hierarchical structure, in which the relationships between records
form a hierarchy or treelike structure. In the traditional hierarchical model, all records are dependent and
arranged in multilevel structures, consisting of one root record and any number of subordinate levels. Thus, all
of the relationships among records are one-to-many, since each data element is related to only one element
above it. The data element or record at the highest level of the hierarchy (the department data element in this
illustration) is called the root element. Any data element can be accessed by moving progressively downward
from a root and along the branches of the tree until the desired record (for example, the employee data element)
is located.
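The downward access path described above can be sketched with a small tree. The record names, the three levels, and the find_path function are hypothetical; the point is only that every record has exactly one parent and is reached by starting at the root.

```python
# Hypothetical hierarchy: each record has exactly one parent record.
tree = {
    "name": "Department A",
    "children": [
        {"name": "Project 1", "children": [
            {"name": "Employee X", "children": []},
        ]},
        {"name": "Project 2", "children": [
            {"name": "Employee Y", "children": []},
        ]},
    ],
}


def find_path(node, target, path=()):
    """Search downward from the root; return the list of names leading to target."""
    path = path + (node["name"],)
    if node["name"] == target:
        return list(path)
    for child in node["children"]:
        found = find_path(child, target, path)
        if found:
            return found
    return None
```

For example, find_path(tree, "Employee Y") walks from the root through Project 2 to reach the employee record.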
Network Structure
The network structure can represent more complex logical relationships, and is still used by some mainframe
DBMS packages. It allows many-to-many relationships among records; that is, the network model can access a
data element by following one of several paths, because any data element or record can be related to any
number of other data elements. For example, departmental records can be related to more than one employee
record, and employee records can be related to more than one project record. Thus, one could locate all
employee records for a particular department, or all project records related to a particular employee.
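A rough sketch of these many-to-many relationships follows. It models only the logical links, not the physical pointers a network DBMS would use, and the employee, project, and department names are invented for the example.

```python
# Hypothetical many-to-many links between employees and projects.
assignments = [("Ann", "P1"), ("Ann", "P2"), ("Bob", "P2")]

# Hypothetical links from employees to their departments.
department_of = {"Ann": "Sales", "Bob": "IT"}


def projects_for(employee):
    """All project records related to one employee."""
    return sorted(project for emp, project in assignments if emp == employee)


def employees_on(project):
    """All employee records related to one project."""
    return sorted(emp for emp, project_id in assignments if project_id == project)


def employees_in(department):
    """All employee records for one department."""
    return sorted(emp for emp, dept in department_of.items() if dept == department)
```

The same employee record can be reached from a department or from a project, which is the "several paths" property of the network model.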
Relational Structure
The relational model has become the most popular of the three database structures. It is used by most
microcomputer DBMS packages, as well as by many midrange and mainframe systems. In the relational model,
all data elements within the database are viewed as being stored in the form of simple tables. For example, a relational
database may include two tables representing some of the relationships among departmental and employee records.
Other tables, or relations, for this organization's database might represent the data element relationships among
projects, divisions, product lines, and so on. Database management system packages based on the relational
model can link data elements from various tables to provide information to users. For example, a DBMS
package could retrieve and display an employee's name and salary from the employee table and the name of the
employee's department from the department table, by using their common department number field to link or
join the two tables.
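The join just described can be sketched with SQLite from Python. The table names, column names, and rows are assumptions for illustration; the department number column plays the role of the common field that links the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE employee (empno INTEGER PRIMARY KEY, ename TEXT, salary REAL,
                           deptno INTEGER REFERENCES department(deptno));
    INSERT INTO department VALUES (10, 'Sales');
    INSERT INTO department VALUES (20, 'IT');
    INSERT INTO employee VALUES (1, 'Ann', 52000, 10);
    INSERT INTO employee VALUES (2, 'Cy', 61000, 20);
""")

# Link the two tables through their common department number field.
joined = conn.execute("""
    SELECT e.ename, e.salary, d.dname
    FROM employee AS e JOIN department AS d ON e.deptno = d.deptno
    ORDER BY e.empno
""").fetchall()
```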
Multidimensional Structure
The multidimensional database structure is a variation of the relational model that uses multidimensional
structures to organize data and express the relationships between data. You can visualize multidimensional
structures as cubes of data and cubes within cubes of data. Each side of the cube is considered a dimension of
the data. Each dimension can represent a different category, such as product type, region, sales channel, and
time.
Each cell within a multidimensional structure contains aggregated data related to elements along each of its
dimensions. For example, a single cell may contain the total sales for a product in a region for a specific sales
channel in a single month. A major benefit of multidimensional databases is that they are a compact and
easy-to-understand way to visualize and manipulate data elements that have many interrelationships. So multidimensional
databases have become the most popular database structure for the analytical databases that support online
analytical processing (OLAP) applications, in which fast answers to complex business queries are expected.
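One simple way to picture a data cube is as a mapping from a tuple of dimension values to an aggregated cell. The sketch below assumes four dimensions (product, region, sales channel, month) and invented sales figures; real multidimensional DBMS products use specialized storage rather than a dictionary.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, channel, month, amount).
facts = [
    ("widget", "east", "web", "2024-01", 100.0),
    ("widget", "east", "web", "2024-01", 40.0),
    ("widget", "west", "store", "2024-01", 70.0),
    ("gadget", "east", "web", "2024-02", 30.0),
]

# Each cell holds the total sales for one combination of dimension values.
cube = defaultdict(float)
for product, region, channel, month, amount in facts:
    cube[(product, region, channel, month)] += amount


def roll_up(cells, axis):
    """Total the cells along every dimension except the one at position axis."""
    totals = defaultdict(float)
    for key, amount in cells.items():
        totals[key[axis]] += amount
    return dict(totals)
```

Calling roll_up(cube, 1) gives total sales by region, and roll_up(cube, 0) gives total sales by product, which is the kind of question OLAP tools answer.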
Object-Oriented Structure
The object-oriented database model is considered to be one of the key technologies of a new generation of
multimedia web-based applications. An object consists of data values describing the attributes of an entity, plus
the operations that can be performed upon the data. This encapsulation capability allows the object-oriented
model to better handle more complex types of data (graphics, pictures, voice, text) than other database
structures.
The object-oriented model also supports inheritance; that is, new objects can be automatically created by
replicating some or all of the characteristics of one or more parent objects. Thus, the checking and savings
account objects can both inherit the common attributes and operations of the parent bank account object. Such
capabilities have made object-oriented database management systems (OODBMS) popular in computer-aided design
(CAD) and in a growing number of applications. For example, object technology allows designers to develop
product designs, store them as objects in an object-oriented database, and replicate and modify them to create
new product designs. In addition, multimedia web-based applications for the Internet and corporate intranets
and extranets have become a major application area for object technology.
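Encapsulation and inheritance for the bank account example can be sketched as below. The class names follow the text, but the specific operations, the per-check fee, and the interest rule are hypothetical and are written in Python rather than in any particular OODBMS.

```python
class BankAccount:
    """Parent object: data values plus the operations allowed on them."""

    def __init__(self, owner, balance=0.0):
        self.owner = owner
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class CheckingAccount(BankAccount):
    """Inherits owner, balance, deposit, and withdraw; adds check writing."""

    def write_check(self, amount, fee=1.0):  # the fee is a made-up rule
        self.withdraw(amount + fee)


class SavingsAccount(BankAccount):
    """Inherits the same attributes and operations; adds interest."""

    def add_interest(self, rate):
        self.deposit(self.balance * rate)
```

Neither subclass restates the common attributes or operations; both obtain them from the parent bank account object.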
The hierarchical data structure was a natural model for the databases used for the structured, routine types of
transaction processing that was a characteristic of many business operations. Data for these operations can easily
be represented by groups of records in a hierarchical relationship. However, there are many cases where
information is needed about records that do not have hierarchical relationships. For example, it is obvious that,
in some organizations, employees from more than one department can work on more than one project. A
network data structure could easily handle this many-to-many relationship. It is thus more flexible than the
hierarchical structure in support of databases for many types of business operations. However, like the
hierarchical structure, because its relationships must be specified in advance, the network model cannot easily
handle ad hoc requests for information.
Relational databases, on the other hand, allow an end user to easily receive information in response to ad hoc
requests. That's because not all of the relationships between the data elements in a relationally organized database
need to be specified when the database is created. Database management software (such as Oracle 8, DB2, Access,
and Approach) creates new tables of data relationships using parts of the data from several tables. Thus,
relational databases are easier for programmers to work with and easier to maintain than the hierarchical and
network models.
The major limitation of the relational model is that relational database management systems cannot process large
amounts of business transactions as quickly and efficiently as those based on the hierarchical and network
models, or complex, high volume applications as well as the object-oriented model. This performance gap has
narrowed with the development of advanced relational DBMS software with object-oriented extensions. The use
of database management software based on the object-oriented and multidimensional models is growing steadily,
as these technologies are playing a greater role for OLAP and web-based applications.
Object-oriented database software is finding increasing use in managing the hypermedia databases and Java
applets on the World Wide Web and corporate intranets and extranets. Industry proponents predict that object-
oriented database management systems will become the key software component that manages the hyperlinked
multimedia pages and other types of data that support corporate websites. That's because an OODBMS can
easily manage the access and storage of objects such as document and graphic images, video clips, audio
segments, and other subsets of web pages.
Object technology proponents argue that an object-oriented DBMS can work with such complex data types and
the Java applets that use them much more efficiently than relational database management systems. However,
major relational DBMS vendors have countered by adding object-oriented modules to their relational software.
Examples include multimedia object extensions to IBM's DB2, Informix's DataBlades for their Universal Server,
and Oracle's object-based "cartridges" for their Universal Server and Oracle 8i.
Accessing Databases
Efficient access to data is important. In database maintenance, records or objects have to be continually added,
deleted, or updated to reflect business transactions. Data must also be accessed rapidly so information can be
produced in response to end user requests.
Key Fields
That's why all data records usually contain one or more identification fields, or keys, that identify the record so it
can be located. For example, the Social Security number of a person is often used as a primary key field that
uniquely identifies the data records of individuals in student, employee, and customer files and databases. Other
methods also identify and link data records stored in several different database files. For example, hierarchical
and network databases may use pointer fields. These are fields within a record that indicate (point to) the
location of another record that is related to it in the same file, or in another file. Hierarchical and network
database management systems use this method to link records so they can retrieve information from several
different database files.
Relational database management packages use primary keys to link records. Each table (file) in a relational
database must contain a primary key. This field (or fields) uniquely identifies each record in a file and must also
be found in other related files. For example, department number can be the primary key in the Department table
and is also a field in the Employee table. As we mentioned earlier, a relational database management package
could easily provide you with information from both tables by joining the tables and retrieving the information
you want.
Sequential Access
One of the original and basic ways to access data is sequential access. This method uses a sequential organization,
in which records are physically stored in a specified order according to a key field in each record. For example,
payroll records could be placed in a payroll file in a numerical order based on employee Social Security
numbers. Sequential access is fast and efficient when dealing with large volumes of data that need to be
processed periodically. However, it requires that all new transactions be sorted into the proper sequence for
sequential access processing. Also, most of the database or file may have to be searched to locate, store, or
modify even a small number of data records. Thus, this method is too slow to handle applications requiring
immediate updating or responses.
Direct Access
When using direct access methods, records do not have to be arranged in any particular sequence on storage
media. However, the computer must keep track of the storage location of each record using a variety of direct
organization methods so that data can be retrieved when needed. New transaction data do not have to be sorted,
and processing that requires immediate responses or updating is easily handled. There are a number of ways to
directly access records in the direct organization method. Let's take a brief look at three widely used methods to
accomplish such direct access processing.
One common technique of direct access is key transformation. This method performs an arithmetic
computation on a key field of a record (such as a product number or Social Security number) and uses the
number that results from that calculation as an address to store and access that record. Thus, the process is called
key transformation because an arithmetic operation is applied to a key field to transform it into the storage
location address of a record. Another direct access method used to store and locate records involves the use of
an index of record keys and related storage addresses. A new data record is stored at the next available location,
and its key and address are placed in an index. The computer uses this index whenever it must access a record.
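A minimal sketch of key transformation is shown below, using the division-remainder computation as the arithmetic operation. The number of storage slots, the product numbers, and the use of a small list per slot to hold keys that transform to the same address are assumptions added for the example; the text does not specify how such collisions are handled.

```python
NUM_SLOTS = 7  # hypothetical number of storage locations


def key_to_address(key):
    """Division-remainder transformation: numeric key -> storage address."""
    return key % NUM_SLOTS


# Each address holds a list so that colliding keys can share it (an assumption).
storage = [[] for _ in range(NUM_SLOTS)]


def store(key, record):
    storage[key_to_address(key)].append((key, record))


def fetch(key):
    """Go directly to the computed address and look for the key there."""
    for stored_key, record in storage[key_to_address(key)]:
        if stored_key == key:
            return record
    return None


store(1001, "widget")
store(1008, "gadget")  # 1001 and 1008 transform to the same address
```

No sorting or searching of the whole file is needed: the address is computed from the key each time a record is stored or retrieved.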
In the indexed sequential access method (ISAM), records are stored in a sequential order on a magnetic disk
or other direct access storage device based on the key field of each record. In addition, each database contains an
index that references one or more key fields of each data record to its storage location address. Thus, an individual
record can be directly located by using its key fields to search and locate its address in the database
index, just as you can locate key topics in this book by looking them up in its index. As a result, if a few records
must be processed quickly, the index is used to directly access the record needed. However, when large numbers
of records must be processed periodically, the sequential organization provided by this method is used. For
example, processing the weekly payroll for employees or producing monthly statements for customers could be
done using sequential access processing of the records in the database.
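The two ways of using an indexed sequential file can be sketched as follows. The records, the keys, and the function names are invented, and a sorted in-memory list with a binary search stands in for the on-disk index; actual ISAM implementations use multilevel indexes on disk.

```python
import bisect

# Records kept in key order, as in a sequentially organized file.
records = [(101, "Ann"), (205, "Bob"), (310, "Cy"), (412, "Dee")]

# The index: the sorted keys, whose positions match the records' positions.
index_keys = [key for key, _ in records]


def direct_lookup(key):
    """Search the index to go straight to one record."""
    position = bisect.bisect_left(index_keys, key)
    if position < len(index_keys) and index_keys[position] == key:
        return records[position][1]
    return None


def sequential_scan():
    """Process every record in key order, as in a periodic batch run."""
    return [name for _, name in records]
```

A single inquiry would use direct_lookup, while a weekly payroll run would use sequential_scan over the same stored records.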
Database Development
Developing small, personal databases is relatively easy using microcomputer database management packages.
However, developing a large database of complex data types can be a complex task. In many companies,
developing and managing large corporate databases are the primary responsibility of the database administrator
and database design analysts. They work with end users and systems analysts to model business processes and
the data they require. Then they determine (1) what data definitions should be included in the database and (2)
what structure or relationships should exist among the data elements.
Database development may start with a top-down data planning process. Database administrators and designers
work with corporate and end user management to develop an enterprise model that defines the basic business
process of the enterprise. Then they define the information needs of end users in a business process, such as the
purchasing/receiving process that all businesses have.
Next, end users must identify the key data elements that are needed to perform their specific business activities.
This frequently involves developing entity relationship diagrams (ERDs) that model the relationships among the
many entities involved in business processes. End users and database designers could use ERD models to
identify what supplier and product data are necessary or may be required if their company has installed
enterprise resource planning (ERP) software to automate their business processes.
Such user views are a major part of a data modelling process where the relationships between data elements are
identified. Each data model defines the logical relationships among the data elements needed to support a basic
business process. For example, can a supplier provide more than one type of product to us? Can a customer
have more than one type of account with us? Can an employee have several pay rates or be assigned to several
project workgroups?
Answering such questions will identify data relationships that have to be represented in a data model that
supports a business process. These data models then serve as logical frameworks (called schemas and subschemas)
on which to base the physical design of databases and the development of application programs to support the
business processes of the organization. A schema is an overall logical view of the relationships among the data
elements in a database, while the subschema is a logical view of the data relationships needed to support specific
end user application programs that will access that database.
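In a relational DBMS, one common way to approximate a subschema is an SQL view that exposes only the part of the schema an application needs. The sketch below assumes a hypothetical employee table and a payroll application that should see names and salaries but not skills; the table, view, and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Schema: the overall logical view of the employee data.
    CREATE TABLE employee (empno INTEGER PRIMARY KEY, ename TEXT,
                           salary REAL, skills TEXT);
    INSERT INTO employee VALUES (1, 'Ann', 52000, 'sales');

    -- Subschema: the narrower view used by a payroll application.
    CREATE VIEW payroll_view AS SELECT empno, ename, salary FROM employee;
""")

cursor = conn.execute("SELECT * FROM payroll_view")
columns = [description[0] for description in cursor.description]
payroll_rows = cursor.fetchall()
```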
Remember that data models represent logical views of the data and relationships of the database. Physical
database design takes a physical view of the data (also called the internal view) that describes how data are to be
physically stored and accessed on the storage devices of a computer system. For example, checking, savings, and
instalment lending are business processes whose data models are part of a banking services data model that
serves as a logical data framework for all bank services.
SUMMARY
Data Resource Management. Data resource management is a managerial activity that applies information
systems technology and management tools to the task of managing an organization's data resources. It includes
the database administration function that focuses on developing and maintaining standards and controls for an
organization's databases. Data administration, however, focuses on the planning and control of data to support
business functions and strategic organizational objectives. This includes a data planning effort that focuses on
developing an overall data architecture for a firm's data resources.
Database Management. The database management approach affects the storage and processing of data. The
data needed by different applications are consolidated and integrated into several common databases, instead of
being stored in many independent data files. Also, the database management approach emphasizes updating and
maintaining common databases, having users' application programs share the data in the database, and providing
a reporting and an inquiry/response capability so end users can easily receive reports and quick responses to
requests for information.
Database Software. Database management systems are software packages that simplify the creation, use, and
maintenance of databases. They provide software tools so end users, programmers, and database administrators
can create and modify databases, interrogate a database, generate reports, do application development, and
perform database maintenance.
Types of Databases. Several types of databases are used by business organizations, including operational,
distributed, and external databases. Data warehouses are a central source of data from other databases that have
been cleaned, transformed and catalogued for business analysis and decision support applications. That includes
data mining, which attempts to find hidden patterns and trends in the warehouse data. Hypermedia databases on
the World Wide Web and corporate intranets and extranets store hyperlinked multimedia pages at a website.
Web server software can manage such databases for quick access and maintenance of the web database.
Database Development. The development of databases can be easily accomplished using microcomputer
database management packages for small end user applications. However, the development of large corporate
databases requires a top-down data planning effort. This may involve developing enterprise and entity
relationship models, subject area databases, and data models that reflect the logical data elements and
relationships needed to support the operation and management of the basic business processes of the
organization.
Data Access. Data must be organized in some logical manner on physical storage devices so that they can be
efficiently processed. For this reason, data are commonly organized into logical data elements such as characters,
fields, records, files, and databases. Database structures, such as the hierarchical, network, relational, and object-oriented
models, are used to organize the relationships among the data records stored in databases. Databases
and files can be organized in either a sequential or direct manner and can be accessed and maintained by either
sequential access or direct access processing methods.