CHAPTER THREE
1|Page
o Characters can be uppercase letters (A-Z), lowercase letters (a-z), numbers (0-9),
or special symbols (e.g., ., !, +, -).
3.1.2.1 From Characters to Records
• Characters combine to form a field, which represents a single piece of data, such as a
name, number, or attribute related to a business object (e.g., employee, location) or activity
(e.g., sale). Fields may also be calculated, such as totals, averages, or other computed
values.
• A collection of related fields makes up a record, providing a complete description of a
single entity or event. For example, an employee record includes fields for the employee’s
name, address, phone number, pay rate, and earnings to date.
Attributes are chosen to capture the relevant details about an entity. The specific value of an
attribute (called a data item) is stored in the fields of the corresponding record.
A data key is a field within a record used to identify that record.
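The hierarchy from fields to records, and the role of a data key, can be sketched in Python. This is a minimal illustration, not a database implementation. `EmployeeRecord` is a hypothetical type whose fields follow the employee example above, and `employee_id` is an assumed data key:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmployeeRecord:
    """One record: a collection of related fields describing one employee (hypothetical)."""
    employee_id: str         # data key (assumed): uniquely identifies the record
    name: str                # each attribute below is stored in its own field
    address: str
    phone: str
    pay_rate: float
    earnings_to_date: float


# Hypothetical records, indexed by their data key.
employees = {
    "E1001": EmployeeRecord("E1001", "Maria Lopez", "12 Oak St.", "555-0100", 22.50, 18400.00),
    "E1002": EmployeeRecord("E1002", "Sam Chen", "8 Elm St.", "555-0101", 19.75, 15210.50),
}


def find_employee(key: str) -> Optional[EmployeeRecord]:
    """Use the data key to retrieve one record, or None if no record has that key."""
    return employees.get(key)


print(find_employee("E1002").name)  # Sam Chen
```

Because each key value appears only once, looking up "E1002" can return at most one record.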
In the past, information systems relied on individual files for storing data specific to each system.
For instance, a payroll system would maintain its own payroll file, and every operational system
managed separate data files for its purposes.
Today, most organizations adopt the database approach to data management, where multiple
information systems share a centralized pool of related data.
3.3.3 Organizing a Database
When designing a database, several critical factors must be addressed:
• Data Collection: Identifying the data to collect and its sources.
• Access Control: Determining who will have access to the data.
• Usage Goals: Understanding how the data will be used.
• Performance Monitoring: Ensuring the database meets standards for response time,
availability, and other key performance metrics.
For example, i-nexus, a cloud-based business execution solution, helps clients define actions and
plans to achieve their goals. The service runs on 30 Java virtual machines and eight database
servers, which are continuously monitored with AppDynamics database performance tools. This
approach has improved system responsiveness and reduced the time needed to resolve issues.
When this kind of planning is carried out for the organization as a whole, it is known as
enterprise data modeling. This approach starts with
strategic-level analysis of general data needs and progresses to more detailed requirements for
departments and functional areas. The result is a roadmap for developing databases and
information systems, including standardized data definitions and formats that ensure compatibility
and integration across systems.
ER diagrams are essential for ensuring that the relationships among database entities align with
business operations and user needs. They also serve as reference tools, aiding in database updates
and redesigns.
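Such relationships are represented visually with symbols such as the "crow’s foot", which indicates
multiplicity: how many instances of one entity can be associated with an instance of another (for
example, one department can have many employees). ER diagrams are therefore invaluable for
maintaining consistency and supporting efficient database design and usage.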
Data Table 2: department table
Each attribute can be restricted to a set of permissible values, known as its domain. The domain
defines the acceptable values that can be entered in each column of a relational table. For example,
the domain for an attribute like "employee type" might only allow values such as "H" (hourly) or
"S" (salary). If someone attempted to input a "1" in the employee type field, it would be rejected.
Similarly, the domain for "pay rate" would exclude negative values. By establishing domains, data
accuracy can be enhanced.
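The employee type and pay rate domains described above can be expressed as `CHECK` constraints in SQL. The following sketch uses Python's built-in `sqlite3` module with an in-memory database; the `employee` table, its column names, and the `try_insert` helper are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        emp_type TEXT NOT NULL CHECK (emp_type IN ('H', 'S')),  -- domain: hourly or salary
        pay_rate REAL NOT NULL CHECK (pay_rate >= 0)            -- domain: no negative pay
    )
""")


def try_insert(emp_id, emp_type, pay_rate):
    """Return True if the row fits every domain, False if the DBMS rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (emp_id, emp_type, pay_rate))
        return True
    except sqlite3.IntegrityError:  # raised when a CHECK constraint fails
        return False


print(try_insert(1, "H", 18.50))  # True
print(try_insert(2, "1", 18.50))  # False: "1" is outside the employee type domain
print(try_insert(3, "S", -5.0))   # False: negative pay rate
```

Because the DBMS enforces the domain itself, every application that writes to the table gets the same protection.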
Selecting involves filtering rows based on specific criteria. For example, if a department manager
needs to find the department number for project 226, which is a sales manual project, they can use
selection to display only the row for project 226. This will show that the department number for
the sales manual project is 598.
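In SQL, selection corresponds to a `WHERE` clause. Here is a sketch run through Python's `sqlite3`. Only project 226 and department 598 come from the text; the other rows are hypothetical sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (project_no INTEGER PRIMARY KEY, description TEXT, dept_no INTEGER)")
conn.executemany(
    "INSERT INTO project VALUES (?, ?, ?)",
    [(155, "Payroll", 257), (226, "Sales manual", 598), (343, "Training", 378)],  # sample rows
)

# Selection: keep only the rows that meet the criterion; every column is retained.
rows = conn.execute("SELECT * FROM project WHERE project_no = ?", (226,)).fetchall()
print(rows)  # [(226, 'Sales manual', 598)]
```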
Projecting entails removing certain columns from a table. For instance, a department table might
include the department number, department name, and the manager's social security number
(SSN). If the sales manager wants a table with just the department number and SSN for the sales
manual project, they can project the data by eliminating the department name column, resulting in
a new table with only the relevant data.
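In SQL, projection corresponds to listing only the wanted columns after `SELECT`. This sketch uses hypothetical department rows and placeholder SSNs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT, manager_ssn TEXT)")
conn.executemany(
    "INSERT INTO department VALUES (?, ?, ?)",
    [(257, "Accounting", "111-11-1111"), (598, "Marketing", "222-22-2222")],  # hypothetical rows
)

# Projection: keep only the named columns; dept_name is eliminated.
projected = conn.execute("SELECT dept_no, manager_ssn FROM department ORDER BY dept_no").fetchall()
print(projected)  # [(257, '111-11-1111'), (598, '222-22-2222')]
```

Adding `WHERE dept_no = 598` would combine projection with selection and return only the row for the department responsible for the sales manual project.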
Joining combines two or more tables. For example, you can join the project table with the
department table to create a new table that includes the project number, project description,
department number, department name, and the manager's SSN.
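A join of the project and department tables on their shared department number might look like this in SQL. The example runs through Python's `sqlite3`, and all rows other than project 226 and department 598 are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT, manager_ssn TEXT);
    CREATE TABLE project (project_no INTEGER PRIMARY KEY, description TEXT,
                          dept_no INTEGER REFERENCES department(dept_no));
    INSERT INTO department VALUES (598, 'Marketing', '222-22-2222'), (257, 'Accounting', '111-11-1111');
    INSERT INTO project VALUES (226, 'Sales manual', 598), (155, 'Payroll', 257);
""")

# The join matches rows on the shared dept_no column and combines their columns.
joined = conn.execute("""
    SELECT p.project_no, p.description, d.dept_no, d.dept_name, d.manager_ssn
    FROM project AS p
    JOIN department AS d ON p.dept_no = d.dept_no
    ORDER BY p.project_no
""").fetchall()
for row in joined:
    print(row)
```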
Tables in a relational database can be linked through shared data attributes, allowing for more
comprehensive data analysis and report generation. Linking—combining tables based on common
attributes—enhances the flexibility and power of relational databases. For instance, the president
of a company might want to know the name of the manager for the sales manual project and how
long they have been with the company. If the company has manager, department, and project
tables, these can be linked as shown in the provided figures to generate the needed information.
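The president's question can be answered by linking three tables. The sketch below is a hedged illustration: the manager's name, hire date, and department name are hypothetical, and the SSN is stored only once, in the manager table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE manager (ssn TEXT PRIMARY KEY, last_name TEXT, hire_date TEXT);
    CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT,
                             manager_ssn TEXT REFERENCES manager(ssn));
    CREATE TABLE project (project_no INTEGER PRIMARY KEY, description TEXT,
                          dept_no INTEGER REFERENCES department(dept_no));
    INSERT INTO manager VALUES ('222-22-2222', 'Fiske', '2009-04-01');
    INSERT INTO department VALUES (598, 'Marketing', '222-22-2222');
    INSERT INTO project VALUES (226, 'Sales manual', 598);
""")

# Follow the links project -> department -> manager through the shared attributes.
name, hire_date = conn.execute("""
    SELECT m.last_name, m.hire_date
    FROM project AS p
    JOIN department AS d ON p.dept_no = d.dept_no
    JOIN manager AS m ON d.manager_ssn = m.ssn
    WHERE p.description = 'Sales manual'
""").fetchone()
print(name, hire_date)  # Fiske 2009-04-01
```

To find how long the manager has been with the company, an application would then subtract `hire_date` from the current date.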
One of the key benefits of a relational database is its ability to link tables, as demonstrated in figure
3.9. This linking minimizes data redundancy and enables data to be organized more logically. For
instance, linking the manager's social security number, stored once in the manager table, avoids
the need to store it repeatedly in the project table.
The relational database model is widely adopted because it is simpler, more flexible, and more
intuitive than other models. Because it organizes data in tables, the data is easy to understand and
manage. As shown in figure 3.10, a relational database management system (RDBMS) such as
Microsoft Access stores data in rows and columns and provides tools on the ribbon or toolbar for
creating, editing, and manipulating the database. The ability to link tables also allows users to
relate data in new ways without having to redefine complex relationships. Because of these
advantages, many organizations use the relational model for large corporate databases, including
those for marketing and accounting.
Relational database management systems such as Oracle, IBM DB2, Microsoft SQL Server,
Microsoft Access, MySQL, Sybase, and others are based on the relational model. This model has been highly
successful and remains the dominant choice in the commercial sector. However, many
organizations are now exploring new nonrelational models to address certain business
requirements.
Data cleansing (also known as data cleaning or data scrubbing) is the process of identifying and
correcting or removing incomplete, incorrect, inaccurate, or irrelevant records within a database.
The aim of data cleansing is to enhance the quality of data used for decision-making. "Bad data"
may result from user entry errors or corruption during data transmission or storage. Data cleansing
differs from data validation, which involves identifying and rejecting "bad data" during the data
entry process.
One method of data cleansing involves cross-checking the data against a validated dataset to
identify and correct errors. For example, entries for street number, street name, city, state, and Zip
code in a database could be verified by comparing them with a trusted Zip code database.
Additionally, data cleansing may involve standardizing information, such as converting various
abbreviations (St., St, st., st) into a single standard term (street).
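The two cleansing steps described above are cross-checking against a trusted reference and standardizing abbreviations. Both can be sketched in Python. The small `ZIP_REFERENCE` table stands in for a trusted Zip code database and is hypothetical, as are the function names:

```python
import re

# Hypothetical trusted reference: Zip code -> (city, state).
ZIP_REFERENCE = {"45202": ("Cincinnati", "OH"), "10001": ("New York", "NY")}

# Matches St, St., st, or st. as a whole word, ignoring case.
STREET_ABBREV = re.compile(r"\bst\b\.?", re.IGNORECASE)


def standardize_street(address: str) -> str:
    """Replace every variant of the abbreviation with the single standard term."""
    return STREET_ABBREV.sub("Street", address)


def cleanse(record: dict) -> dict:
    """Return a cleaned copy: standardized street, city/state corrected from the reference."""
    cleaned = dict(record)
    cleaned["street"] = standardize_street(record["street"])
    if record["zip"] in ZIP_REFERENCE:  # cross-check against validated data
        cleaned["city"], cleaned["state"] = ZIP_REFERENCE[record["zip"]]
    return cleaned


print(cleanse({"street": "100 Main st.", "city": "Cincinatti", "state": "OH", "zip": "45202"}))
# {'street': '100 Main Street', 'city': 'Cincinnati', 'state': 'OH', 'zip': '45202'}
```

A production rule would need more care than this regular expression. For instance, it would also rewrite "St." meaning "Saint", as in "St. Louis".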
Data enhancement involves enriching existing data by adding related information, such as
appending Zip code details, country codes, or census tract codes to records.
However, the cost of data cleansing can be substantial. Achieving 100% accuracy in a database by
eliminating all "bad data" is often too expensive to be practical.
in a broad range of types and functionalities, from affordable software packages to advanced
systems costing hundreds of thousands of dollars.
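SQL databases adhere to the ACID properties (Atomicity, Consistency, Isolation, Durability),
introduced by Jim Gray after Codd’s research. These properties ensure that database transactions
are executed reliably and maintain data integrity. Atomicity means that a transaction is all or
nothing: either every change it makes is applied, or none is. Consistency means that each
transaction moves the database from one valid state to another. Isolation means that a
transaction's intermediate results are hidden from other transactions until it is completed.
Durability means that once a transaction is committed, its changes are not lost, even if the system
fails. (Transaction atomicity is distinct from the design principle of storing data as atomic values,
that is, independent, indivisible elements such as employee ID, last name, and city.)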
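Atomicity can be observed with Python's `sqlite3` module. In this module, changes made inside a `with conn:` block belong to one transaction, which is committed on success and rolled back on error. The `account` table and the `transfer` function are hypothetical illustrations:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()


def transfer(src, dst, amount):
    """Credit dst and debit src as one atomic transaction: both updates commit, or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # the debit broke the balance >= 0 constraint
        return False


print(transfer(1, 2, 30.0))   # True
print(transfer(1, 2, 500.0))  # False: the debit would make account 1 negative
print(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())
# [(1, 70.0), (2, 80.0)]: the failed transfer's credit to account 2 was rolled back too
```

The credit to account 2 had already been applied when the debit failed, and the rollback undid it. The transaction therefore never leaves money half-transferred.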
SQL databases implement concurrency control by locking records to prevent modifications from
other transactions until the current one is completed. While ACID-compliant SQL databases
ensure high data integrity, they may experience slower performance due to these safeguards.
SQL's popularity stems from its simplicity and flexibility. Programmers can use the same query
language across different systems, from PCs to large mainframes. SQL can also be embedded into
many programming languages, such as C++ and Java, making it a powerful tool for developers.
Its standardization and ease of use have made SQL a widely adopted language in the programming
community.
Figure 3.12 Structured Query Language (SQL)
A DBMS is used to provide a user view of the database, to add and modify data, to store and
retrieve data, and to manipulate the data and generate reports. Each of these activities is discussed
in greater detail in the following sections.
When installing and using a large relational database, one of the initial tasks is to inform the
Database Management System (DBMS) about the logical and physical structure of the data, as
well as the relationships among the data for each user. This is done through a description called a
schema, similar to a schematic diagram. In a relational database, the schema outlines the tables,
the fields within each table, and the relationships between those fields and tables. For instance,
large database systems like Oracle use schemas to define the tables and other features associated
with a specific user or person. The DBMS then uses the schema to determine where to locate the
requested data in relation to other pieces of data.
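A schema is written in a data definition language. The sketch below uses SQLite's dialect of SQL through Python's `sqlite3`, with hypothetical `department` and `project` tables. `sqlite_master` and the `PRAGMA` statements are SQLite-specific ways of reading the stored schema back; other DBMSs expose it differently, for example through `INFORMATION_SCHEMA` views:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
conn.executescript("""
    CREATE TABLE department (
        dept_no   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE project (
        project_no  INTEGER PRIMARY KEY,
        description TEXT,
        dept_no     INTEGER NOT NULL REFERENCES department(dept_no)  -- relationship between tables
    );
""")

# The DBMS keeps the schema in its catalog and can report tables, fields, and relationships.
tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
columns = [r[1] for r in conn.execute("PRAGMA table_info(project)")]
links = [(r[2], r[3], r[4]) for r in conn.execute("PRAGMA foreign_key_list(project)")]
print(tables)   # ['department', 'project']
print(columns)  # ['project_no', 'description', 'dept_no']
print(links)    # [('department', 'dept_no', 'dept_no')]
```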
Figure 3.13 data definition language
Additionally, a data dictionary may describe data flows, how records are organized, and the
processing requirements for the data. Figure 3.14 presents an example of a typical data dictionary
entry.
Figure 3.14 data dictionary entry
Figure 3.14 shows the kinds of details that the data dictionary entry for the part number of an
inventory item may contain.
A data dictionary is an essential tool for maintaining an efficient database with reliable, non-
redundant information, and it simplifies modifications when needed. It also assists programmers
by providing a detailed description of the data elements in the database, enabling them to write the
necessary code to access the data.
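Programs can use a data dictionary entry directly. The sketch below shows one way to do this in Python. The entry's field names and the part-number format are assumptions for illustration, not the contents of figure 3.14:

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataDictionaryEntry:
    """Hypothetical data dictionary entry; the field names are illustrative only."""
    name: str
    description: str
    data_type: str
    pattern: str                       # regular expression for the allowed format
    owner: str
    update_authority: Tuple[str, ...]  # who may change values of this element

    def is_valid(self, value: str) -> bool:
        """Return True if the value matches the documented format."""
        return re.fullmatch(self.pattern, value) is not None


PART_NUMBER = DataDictionaryEntry(
    name="PART_NUMBER",
    description="Identifies an inventory item",
    data_type="character",
    pattern=r"[A-Z]{2}\d{4}",          # assumed format: two capital letters, four digits
    owner="Inventory control",
    update_authority=("Inventory control",),
)

# Application code consults the entry instead of hard-coding the rule.
print(PART_NUMBER.is_valid("AB1234"))  # True
print(PART_NUMBER.is_valid("1234AB"))  # False
```

If the documented format changes, only the dictionary entry needs to be edited; code that calls `is_valid` picks up the new rule automatically.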
Following the standards outlined in the data dictionary also facilitates data sharing across
organizations. For example, the U.S. Department of Energy (DOE) developed a data dictionary of
terms to standardize the evaluation of energy data. The Building Energy Data Exchange
Specification (BEDES) offers a common language for key data elements, including data formats,
valid ranges, and definitions. This standardization improves communication among contractors,
software vendors, finance companies, utilities, and public utility commissions. Adhering to these
data standards ensures that information can be easily shared and aggregated without extensive data
cleansing and conversion, allowing stakeholders to answer important questions about energy
savings and usage.
A DBMS serves as an intermediary between an application program and the database. When an
application needs data, it makes a request through the DBMS. For example, if a pricing program
needs data on the engine option of a new car (such as a six-cylinder engine instead of the standard
four-cylinder engine), the application sends the request to the DBMS. The request follows a logical
access path (LAP). The DBMS then, in collaboration with various system programs, accesses the
storage device (such as a disk drive or solid-state drive) where the data is stored. The DBMS
follows a physical access path (PAP) to locate and retrieve the data. In this case, the DBMS might
retrieve the price data for the six-cylinder engine from a disk drive.
Problems can arise if two or more people or programs try to access the same record simultaneously.
For instance, an inventory control program may reduce the inventory count by 10 units because 10
units were shipped to a customer, while at the same time, a purchasing program may increase the
inventory by 200 units due to a recent delivery. Without proper controls, one update can overwrite
the other, so either the shipment or the delivery is lost from the count and the recorded inventory
level is wrong. To prevent this, concurrency control can be implemented, such as locking the
record to prevent other programs from accessing it while it is being updated by one program.
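The lost-update problem and its locking remedy can be sketched in Python. As an assumption, a `threading.Lock` stands in for the record lock a DBMS would hold. The starting count of 1,000 units and the function names are hypothetical:

```python
import threading

inventory = {"count": 1000}      # hypothetical stock level held in one inventory record
record_lock = threading.Lock()   # stands in for the DBMS's lock on that record


def unsafe_interleaving(start: int) -> int:
    """Replay the bad interleaving: both programs read before either one writes."""
    read_by_inventory_program = start
    read_by_purchasing_program = start
    count = read_by_inventory_program - 10     # inventory program writes its result
    count = read_by_purchasing_program + 200   # purchasing program overwrites it
    return count


def update_inventory(delta: int) -> None:
    """Read, modify, and write the record while holding its lock."""
    with record_lock:
        current = inventory["count"]
        inventory["count"] = current + delta


print(unsafe_interleaving(1000))  # 1200: the 10 shipped units have been lost

shipping = threading.Thread(target=update_inventory, args=(-10,))
purchasing = threading.Thread(target=update_inventory, args=(200,))
shipping.start()
purchasing.start()
shipping.join()
purchasing.join()
print(inventory["count"])  # 1190: both updates applied
```

With the lock, whichever program runs second must wait until the first has written its result. It then reads the updated count, so neither change is lost.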
Once a DBMS is set up, authorized users, including employees and managers, can access it to
review reports and retrieve essential information. A DBMS allows a company to efficiently
manage this process. Some databases utilize Query By Example (QBE), a visual method for
creating database queries or requests. With QBE, users can perform queries and other tasks by
interacting with windows and clicking on the data or features they need, much like using other
GUI (graphical user interface) operating systems and applications.
3.9.3 Database Administration
Database administrators (DBAs) are trained and experienced IS professionals who collaborate with
business users to define their data requirements, use database programming languages to create
databases that meet those needs, test and assess databases, implement improvements for
performance, and ensure data security from unauthorized access. A DBA must have a solid
understanding of the organization’s core business, proficiency with selected database management
systems, and awareness of emerging technologies and design trends. The DBA's role encompasses
planning, designing, creating, operating, securing, monitoring, and maintaining databases.
Typically, DBAs hold a degree in computer science or management information systems, along
with some job-specific training or extensive experience with various database products.
The DBA works closely with users to define the database's content, determining which entities are
important and what attributes should be recorded for them. This highlights the importance of DBAs
not only understanding the organization's business but also ensuring that non-IS personnel
recognize the value of their role. DBAs play a crucial part in developing effective information
systems that benefit the organization, employees, and managers.
DBAs also collaborate with programmers to ensure that applications adhere to database
management system standards and conventions. After the database is operational, the DBA
monitors security logs for violations and tracks performance to meet user needs, ensuring system
efficiency. If issues arise, the DBA addresses them proactively to prevent escalation.
A key responsibility of the DBA is safeguarding the database from attacks or failures. They use
security software, preventive measures, and redundancy to protect data and ensure its accessibility.
Despite these efforts, database security breaches still occur. For instance, between June and August
2014, more than 83 million customer records were stolen from JPMorgan Chase, marking the
largest theft of consumer data from a U.S. financial institution.
Some organizations have introduced the role of data administrator, responsible for establishing
consistent principles for data management across the organization, such as defining data standards
and ensuring uniformity in data definitions across databases. For example, a data administrator
ensures that terms like "customer" are consistently defined and handled across all corporate
databases. They also work with business managers to determine who should have read or update
access to certain databases and attributes, passing this information to the DBA for implementation.
The data administrator is often a high-level position reporting to senior management.
The DBMS market includes software that caters to a wide range of users, from non-technical
individuals to highly skilled professional programmers, and operates on various types of
computers, from tablets to supercomputers. This market generates billions of dollars annually for
companies like IBM, Oracle, and Microsoft.
Choosing a DBMS starts with evaluating the organization’s information needs. Key factors to
consider include the database size, the number of concurrent users, performance requirements,
the DBMS’s ability to integrate with other systems, its features, vendor options, and the cost of
the system.
With Database as a Service (DaaS), the database is hosted on a service provider's servers and
accessed by users over the Internet, with the service provider handling the database
administration. Numerous companies offer DaaS services, including Amazon, Google,
Microsoft, Oracle, IBM, and others. For instance, Amazon Relational Database Service (Amazon
RDS) allows organizations to set up and manage MySQL, Microsoft SQL Server, Oracle, or
PostgreSQL databases in the cloud. The service automatically backs up the database and retains
those backups based on a user-defined retention schedule.
3.11 Big Data
Big data refers to datasets that are exceptionally large (terabytes or more) and complex,
encompassing everything from sensor data to social media content, which makes them difficult for
traditional data management software, hardware, and analysis methods to handle effectively.
3.11.1 Characteristics of Big Data
Computer technology analyst Doug Laney associated the three characteristics of volume, velocity,
and variety with big data.
• Volume. A 2014 estimate projected that the volume of data in the digital universe would
grow to 44 zettabytes by 2020.
• Velocity. Data was arriving at a rate exceeding 5 trillion bits per second, and this rate was
accelerating rapidly; the volume of digital data was expected to double every two years
between 2014 and 2020.
• Variety. Data today comes in a variety of formats. Some of it is what computer
scientists call structured data: its format is known in advance, and it fits nicely into
traditional databases. For example, the data generated by the well-defined business
transactions that are used to update many corporate databases containing customer,
product, inventory, financial, and employee data is generally structured data. However,
most of the data that an organization must deal with is unstructured data, meaning that it
is not organized in any predefined manner. Unstructured data comes from sources such as
word-processing documents, social media, email, photos, surveillance video, and phone
messages.
3.11.2 Big Data Uses
Here are several examples of how organizations are leveraging big data to enhance their daily
operations, planning, and decision-making:
• Retail companies track social networks like Facebook, Google, LinkedIn, Twitter, and
Yahoo to engage brand supporters, identify potential brand detractors (and attempt to
change their views), and allow passionate customers to promote their products.
• Advertising and marketing agencies monitor social media comments to gauge consumer
responses to advertisements and promotions.
• Hospitals analyze medical data and patient records to identify individuals at risk of needing
readmission within a few months of discharge, with the goal of proactively engaging these
patients to prevent costly rehospitalizations.
• Consumer product companies monitor social media to understand customer behavior,
preferences, and product perceptions, using this insight to refine their products, services,
and marketing strategies.
• Financial services firms utilize customer interaction data to identify individuals likely to
respond to more targeted and personalized offers.
• Manufacturers analyze subtle vibration data from their machinery, which shifts slightly as
it wears, to predict the ideal time for maintenance or equipment replacement, thereby
preventing costly repairs or catastrophic failures.
3.11.3 Challenges of Big Data
Individuals, organizations, and society at large must find ways to manage the ever-expanding
flood of data to avoid the risks of information overload. The challenge is complex, with
numerous questions that need addressing, such as how to select which data to store, where and
how to store it, how to extract relevant data for decision-making, how to derive value from this
data, and how to protect sensitive information from unauthorized access. With so much data
available, business users often struggle to find the information they need for decisions and may
lack confidence in the accuracy of the data they can access.
Handling data from diverse sources, especially external ones, also increases the risk of non-
compliance with government regulations or internal controls. If compliance measures are not
clearly defined and followed, it can lead to violations, which may result in government
investigations and significant financial consequences. For instance, when Marvell Technologies
discovered issues with its revenue booking practices, it caused a 16% drop in its stock price in a
single day.
Optimists believe these challenges can be overcome, and that the increasing availability of data
will lead to more accurate analysis, improved decision-making, and better outcomes. However,
not everyone is on board with big data applications. Concerns about privacy arise, as
corporations collect vast amounts of personal data that may be shared with other organizations.
This enables the creation of detailed profiles of individuals without their knowledge or consent.
Furthermore, big data introduces security risks—can organizations safeguard this data from
competitors and malicious hackers? Some experts believe that companies collecting and storing
big data could face liability from individuals or organizations. Despite these potential drawbacks,
many businesses are diving into big data, attracted by the potential for valuable insights and new
applications.
promotes the understanding, development, and practice of managing data as a crucial enterprise
asset. DAMA has outlined 10 key functions of data management.
At the heart of data management is Data Governance, which defines the roles, responsibilities, and
processes necessary to ensure that data is reliable and usable across the organization. It involves
designating individuals responsible for resolving and preventing data issues.
The need for data management arises from several factors, including compliance with external
regulations aimed at managing risks like financial misstatement, safeguarding sensitive data, and
ensuring high-quality data is available for critical decision-making. Without structured business
processes and controls, these requirements cannot be met. Formal management processes are
essential to govern data.
Effective data governance requires leadership from the business side and active involvement
across departments, not just from the information systems team. A cross-functional approach is
advised because various departments utilize data and information systems, and no single individual
can fully grasp the organization’s data needs.
A cross-functional team is especially crucial for meeting compliance requirements. The data
governance team should include executives, project managers, business managers, and data
stewards. A data steward is responsible for managing critical data elements, such as sourcing new
data, defining and maintaining consistent reference and master data, and addressing data quality
issues. Data users consult data stewards for guidance on which data to use for business decisions
and to ensure data accuracy and completeness.
The data governance team establishes the ownership of data assets and creates policies outlining
accountability for various aspects of the data, such as its accuracy, accessibility, consistency,
completeness, and security. The team also defines processes for storing, archiving, backing up,
and protecting data from cyberattacks, accidental destruction or disclosure, and theft. Furthermore,
it develops standards and procedures for authorizing who can access, update, and use the data,
along with implementing controls and audit processes to ensure ongoing compliance with
organizational policies and regulations.