
CHAPTER THREE

Uploaded by Hayelom
Copyright © All Rights Reserved

3.1 DATABASE SYSTEM AND BIG DATA


A database is a structured, well-organized, and carefully maintained collection of data designed
to support an organization in achieving its objectives. As a key component of an information
system, a database enables success by delivering timely, accurate, and relevant information to
managers and decision-makers. It also aids companies in analyzing data to cut costs, boost profits,
attract new customers, monitor past operations, and explore new market opportunities.
A Database Management System (DBMS) is a set of programs that facilitate accessing and
managing a database while serving as a bridge between the database, its users, and other
applications. A DBMS provides centralized control over data resources, which is essential for
ensuring data integrity and security. Together, the database, the DBMS, and the application
programs that utilize the data form a comprehensive database environment.
With the rapid growth of information, databases and DBMSs have become even more crucial for
organizations. Many organizations rely on multiple databases, but without effective data
management, it becomes challenging to locate and connect relevant information, making accurate
and critical decision-making nearly impossible.

3.1.1 Data Fundamentals


Data is essential for an organization to perform its business operations effectively. Without data
and the capability to process it, key activities such as paying employees, issuing invoices, ordering
inventory, or generating information for managerial decision-making would not be possible.
Data refers to raw facts, such as employee IDs or sales numbers. To convert this raw data into
meaningful information, it must be systematically organized and structured.

3.1.2 Hierarchy of Data


Data is organized into a structured hierarchy, starting from the smallest unit (a bit) and building
up to a database.

• A bit is a binary digit (0 or 1) representing the on/off state of a circuit.


• Bits are grouped into bytes, with a byte typically consisting of eight bits. Each byte can
represent a character, the basic building block of most types of data.

o Characters can be uppercase letters (A-Z), lowercase letters (a-z), numbers (0-9),
or special symbols (e.g., ., !, +, -).
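
To make the bit, byte, and character relationship concrete, the short Python sketch below prints the numeric byte value and the eight bits behind a few characters. It assumes ASCII text, in which every character fits in exactly one byte; other encodings such as UTF-8 may use several bytes for a single character.

```python
def char_to_bits(ch: str) -> str:
    """Return the 8-bit pattern of one ASCII character (assumes ASCII input)."""
    if len(ch) != 1 or ord(ch) > 127:
        raise ValueError("expected a single ASCII character")
    return format(ord(ch), "08b")  # binary value, zero-padded to eight bits (one byte)

for ch in ["A", "a", "7", "!"]:
    print(ch, ord(ch), char_to_bits(ch))
# A 65 01000001
# a 97 01100001
# 7 55 00110111
# ! 33 00100001
```
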
3.1.2.1 From Characters to Records
• Characters combine to form a field, which represents a single piece of data, such as a
name, number, or attribute related to a business object (e.g., employee, location) or activity
(e.g., sale). Fields may also be calculated, such as totals, averages, or other computed
values.
• A collection of related fields makes up a record, providing a complete description of a
single entity or event. For example, an employee record includes fields for the employee’s
name, address, phone number, pay rate, and earnings to date.

3.1.2.2 From Records to Databases


• A set of related records forms a file, such as an employee file containing records for all
employees or an inventory file containing all inventory items.
• At the highest level, a database is an integrated collection of related files. It stores not only
the data at all levels—bits, characters, fields, records, and files—but also the relationships
among them.
This hierarchy of data illustrates how raw data is systematically organized, enabling meaningful
storage, retrieval, and analysis.
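
The same hierarchy can be mirrored with ordinary Python structures, as a rough sketch: attributes of a class act as fields, one object is a record, a list of records plays the role of a file, and a dictionary of files stands in for the database. The class names and sample values are hypothetical, and unlike a real database, this dictionary does not store the relationships among its files.

```python
from dataclasses import dataclass

@dataclass
class EmployeeRecord:
    # Each attribute below is a field; one EmployeeRecord instance is a record.
    employee_id: int
    name: str
    pay_rate: float
    earnings_to_date: float

@dataclass
class InventoryRecord:
    inventory_id: str
    description: str
    quantity_in_stock: int

# A file is a set of related records of the same kind (sample values are hypothetical).
employee_file = [
    EmployeeRecord(1001, "Alice Brown", 25.0, 12000.0),
    EmployeeRecord(1002, "Bob Green", 30.0, 15500.0),
]
inventory_file = [InventoryRecord("P-01", "Laptop", 12)]

# A database integrates related files; a dict keyed by file name stands in for it here.
database = {"employees": employee_file, "inventory": inventory_file}

print(len(database["employees"]))     # 2 records in the employee file
print(database["employees"][0].name)  # one field of one record: Alice Brown
```
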

3.1.3 Data Entities, Attributes, and Keys


Entities, attributes, and keys are foundational concepts in database design:
• An entity refers to a person, place, object, or concept for which data is collected, stored,
and maintained. Examples include employees, products, or services. Organizations
typically structure their data around entities.
• An attribute represents a characteristic or property of an entity. For instance:
o For an employee, attributes might include employee ID, last name, first name, hire
date, and department number.
o For inventory items, attributes might include inventory ID, description, quantity in
stock, and storage location.
o For customers, attributes may include customer ID, name, address, phone number,
credit rating, and contact person.

Attributes are chosen to capture the relevant details about an entity. The specific value of an
attribute (called a data item) is stored in the fields of the corresponding record.
A data key is a field within a record used to identify that record.

3.2 Database Applications in Real-World Scenarios


Organizations often create databases of attributes, populated with data items, to support daily
operations. Some examples include:
• Offshore Leaks Database: This database contains information about 100,000 offshore
companies and trusts. While creating offshore accounts is legal, such accounts are often
used to evade taxes. Law enforcement and tax officials use this database to identify
potential tax evaders.
• Stolen-Phone Database: U.S. wireless providers have developed a database to track stolen
3G and 4G/LTE phones. If a device is reported stolen, it is denied access to carrier
networks. When the device is returned to its owner, it can be reactivated. The database may
eventually integrate with foreign carriers to limit the export of stolen devices.
• Global Terrorism Database (GTD): This database includes details on over 140,000
terrorist incidents from 1970 to 2014, with updates added annually. It records data such as
event date, location, weapons used, targets, casualties, and responsible parties.

3.2.1 Primary and Secondary Keys


• A primary key is a field or set of fields that uniquely identifies a record in a database. For
example, an employee ID or an item number on eBay ensures that each record is distinct
and can be easily accessed, organized, and managed.
• Secondary keys are alternate fields used to locate records when the primary key is
unavailable. For example, a mail-order company might use a customer’s last name as a
secondary key if the customer ID is unknown. If multiple records share the same secondary
key, additional criteria (such as address or phone number) can be used to identify the
correct record.
By combining these concepts, databases ensure efficient data management, allowing organizations
to retrieve, organize, and manipulate data effectively.
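
As an illustration of both kinds of key, the following sketch uses Python's built-in sqlite3 module with a hypothetical customer table. The primary key rejects a duplicate value, while a lookup on the last name (used as a secondary key) can return several records and needs an extra criterion, here the phone number, to single one out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,  -- primary key: unique for every record
        last_name   TEXT NOT NULL,        -- used here as a secondary key
        phone       TEXT
    )
""")
# An index on the secondary key lets the DBMS find records by last name quickly.
conn.execute("CREATE INDEX idx_customer_last_name ON customer(last_name)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Smith", "555-0101"), (2, "Smith", "555-0199"), (3, "Jones", "555-0123")],
)

# A second record with an existing primary key value is rejected.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Lee', '555-0000')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)

# Searching on the secondary key can return several records...
smiths = conn.execute(
    "SELECT customer_id FROM customer WHERE last_name = ? ORDER BY customer_id",
    ("Smith",),
).fetchall()
print(smiths)  # [(1,), (2,)]

# ...so an extra criterion, such as the phone number, identifies the right one.
match = conn.execute(
    "SELECT customer_id FROM customer WHERE last_name = ? AND phone = ?",
    ("Smith", "555-0199"),
).fetchone()
print(match)  # (2,)
```
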

3.2.2 The Database Approach

In the past, information systems relied on individual files for storing data specific to each system.
For instance, a payroll system would maintain its own payroll file, and every operational system
managed separate data files for its purposes.
Today, most organizations adopt the database approach to data management, where multiple
information systems share a centralized pool of related data.

3.3 Advantages of the Database Approach


• Data Sharing: Databases enable the sharing of data and information across systems. For
example, federal databases may store DNA test results as attributes for convicted criminals,
allowing law enforcement nationwide to access this information.
• Enterprise-Wide Integration: Separate but related databases can be interconnected to
form a comprehensive enterprise-wide database. For instance, Walgreens uses an
electronic health records database to store patient information across all its stores. This
database integrates customer interactions with both in-store medical clinics and
pharmacies, providing a holistic view of patient records.

3.3.1 Role of a Database Management System (DBMS)


To implement the database approach, organizations require additional software called a Database
Management System (DBMS).
• A DBMS serves as an intermediary between the database and its users or application
programs.
• It acts as a buffer, facilitating smooth interaction between the application programs and the
underlying database, ensuring efficient data access and management.
This modern approach enhances efficiency, data accessibility, and resource sharing across an
organization.
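
A minimal sketch of the database approach, with SQLite (through Python's sqlite3 module) standing in for the DBMS: two hypothetical application programs, an HR program and a payroll program, share one employee table instead of keeping separate files, and every read or update goes through the DBMS rather than touching stored data directly.

```python
import sqlite3

# One shared pool of data, managed by the DBMS (SQLite, in memory for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, pay_rate REAL)")

def hr_hire(db, emp_id, name, pay_rate):
    """Hypothetical HR program: records a new employee through the DBMS."""
    with db:  # commit the change if no error occurs
        db.execute("INSERT INTO employee VALUES (?, ?, ?)", (emp_id, name, pay_rate))

def hr_change_rate(db, emp_id, new_rate):
    """Hypothetical HR program: updates the shared pay rate."""
    with db:
        db.execute("UPDATE employee SET pay_rate = ? WHERE emp_id = ?", (new_rate, emp_id))

def payroll_gross_pay(db, emp_id, hours):
    """Hypothetical payroll program: reads the same shared employee data."""
    (rate,) = db.execute(
        "SELECT pay_rate FROM employee WHERE emp_id = ?", (emp_id,)
    ).fetchone()
    return rate * hours

hr_hire(conn, 7, "Alice Brown", 20.0)
print(payroll_gross_pay(conn, 7, 40))  # 800.0
hr_change_rate(conn, 7, 22.5)
print(payroll_gross_pay(conn, 7, 40))  # 900.0: payroll sees HR's update at once
```

Because both programs use the same table, there is no separate payroll file to keep synchronized with HR data.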

Figure 3.5 Data modeling and database characteristics (diagram labels: database, interface, application programs, users, reports)

3.3.2 Importance of Organized Databases


In today’s data-driven business environment, organizations must effectively manage and analyze
large volumes of information. To achieve this, data must be well-organized, ensuring it can be
efficiently accessed and utilized. A well-designed database should:
• Store all data relevant to the organization.
• Allow for quick access and easy updates.
• Align with the organization’s business processes.
When creating a database, organizations should address the following key considerations:
1. Content: What data needs to be collected, and what is the cost of acquiring it?
2. Access: Which users should have access to specific data, and at what times?
3. Logical Structure: How should data be organized to be meaningful and useful for
specific users?
4. Physical Organization: Where should the data be physically stored?
5. Archiving: How long should the data be retained?
6. Security: What measures should be taken to prevent unauthorized access to the data?
These considerations ensure that the database not only supports the organization’s needs but also
safeguards its data resources.

3.3.3 Organizing a Database
When designing a database, several critical factors must be addressed:
• Data Collection: Identifying the data to collect and its sources.
• Access Control: Determining who will have access to the data.
• Usage Goals: Understanding how the data will be used.
• Performance Monitoring: Ensuring the database meets standards for response time,
availability, and other key performance metrics.
For example, i-nexus offers a cloud-based business execution solution that helps clients define
actions and plans to achieve their goals. The service runs on 30 Java virtual machines and eight
database servers, which are continuously monitored with AppDynamics database performance
tools. This approach has improved system responsiveness and reduced the time needed to resolve
issues.

3.4 Data Modeling


Database designers use data models to represent the logical relationships among data. A data
model is a diagram that depicts entities and their relationships. Data modeling begins with
identifying a business problem and analyzing the necessary data to address it.

At the organizational level, this is known as enterprise data modeling. This approach starts with
strategic-level analysis of general data needs and progresses to more detailed requirements for
departments and functional areas. The result is a roadmap for developing databases and
information systems, including standardized data definitions and formats that ensure compatibility
and integration across systems.

3.4.1 Entity-Relationship Diagrams (ER Diagrams)


An ER diagram is a commonly used data model that employs graphical symbols to illustrate how
data is organized and related.

• Boxes represent entities (each of which is typically stored as a table).


• Lines depict relationships between these entities.

ER diagrams are essential for ensuring that the relationships among database entities align with
business operations and user needs. They also serve as reference tools, aiding in database updates
and redesigns.

For example, an ER diagram for an order database might show:

• A one-to-many relationship where one salesperson serves multiple customers.


• A one-to-many relationship where each customer can place multiple orders.
• One order can include multiple line items, and many line items can refer to the same
product.

Such relationships are visually represented using symbols like the "crow’s foot" to indicate
multiplicity. ER diagrams are invaluable for maintaining consistency and supporting efficient
database design and usage.

Figure 3.6 Entity-relationship (ER) diagram for a customer order database
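
One common way to turn an ER diagram like figure 3.6 into a relational design is to make each entity a table and implement each one-to-many relationship as a foreign key on the "many" side. The SQLite sketch below follows that convention; the table and column names are hypothetical, since the figure's exact labels are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE salesperson (
    salesperson_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL
);
-- one salesperson serves many customers
CREATE TABLE customer (
    customer_id    INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    salesperson_id INTEGER NOT NULL REFERENCES salesperson(salesperson_id)
);
-- one customer places many orders ("order" is a reserved word, hence "orders")
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
-- one order has many line items; many line items can refer to the same product
CREATE TABLE line_item (
    order_id   INTEGER NOT NULL REFERENCES orders(order_id),
    line_no    INTEGER NOT NULL,
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, line_no)
);
""")

conn.execute("INSERT INTO salesperson VALUES (1, 'R. Diaz')")
conn.executemany("INSERT INTO customer VALUES (?, ?, 1)", [(10, "Acme"), (11, "Bolt")])
served = conn.execute("SELECT COUNT(*) FROM customer WHERE salesperson_id = 1").fetchone()[0]
print(served)  # 2: one salesperson, many customers

# A line item pointing at an order and product that do not exist breaks the relationships.
try:
    conn.execute("INSERT INTO line_item VALUES (1, 1, 99, 5)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```
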

3.5 Relational Database Model


The relational database model is an effective and straightforward approach for organizing data into
two-dimensional tables, known as relations. Each table row represents an entity, while each
column corresponds to an attribute of that entity. See figure 3.7.
Data Table 1: Project table

Data Table 2: Department table

Data Table 3: Manager table

Each attribute can be restricted to a set of permissible values, known as its domain. The domain
defines the acceptable values that can be entered in each column of a relational table. For example,
the domain for an attribute like "employee type" might only allow values such as "H" (hourly) or
"S" (salary). If someone attempted to input a "1" in the employee type field, it would be rejected.
Similarly, the domain for "pay rate" would exclude negative values. By establishing domains, data
accuracy can be enhanced.
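
Some DBMSs support a CREATE DOMAIN statement, but SQLite does not, so this sketch approximates the domains with CHECK constraints. The table is hypothetical; the "H"/"S" employee types and the rule against negative pay rates come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CHECK constraints restrict each column to its permitted values (its domain).
conn.execute("""
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        emp_type TEXT NOT NULL CHECK (emp_type IN ('H', 'S')),  -- hourly or salary
        pay_rate REAL NOT NULL CHECK (pay_rate >= 0)            -- no negative pay
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'H', 18.50)")  # within the domain

def try_insert(row):
    """Return True if the row is accepted, False if a domain rule rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert((2, "1", 20.0)))  # False: '1' is not a valid employee type
print(try_insert((3, "S", -5.0)))  # False: negative pay rate
print(try_insert((4, "S", 42.0)))  # True
```
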

3.5.1 Manipulating Data


Once data is entered into a relational database, users can query and analyze it through various
manipulations, such as selecting, projecting, and joining.

Selecting involves filtering rows based on specific criteria. For example, if a department manager
needs to find the department number for project 226, which is a sales manual project, they can use
selection to display only the row for project 226. This will show that the department number for
the sales manual project is 598.

Projecting entails removing certain columns from a table. For instance, a department table might
include the department number, department name, and the manager's social security number
(SSN). If the sales manager wants a table with just the department number and SSN for the sales
manual project, they can project the data by eliminating the department name column, resulting in
a new table with only the relevant data.
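
Selection and projection map directly onto SQL: a WHERE clause selects rows, and the column list after SELECT projects columns. In this SQLite sketch, project 226 (sales manual) and department 598 come from the text; the other rows, the department names, and the placeholder SSNs are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Project 226 / department 598 are from the text; every other value is hypothetical.
conn.executescript("""
CREATE TABLE project    (project_no INTEGER PRIMARY KEY, description TEXT, dept_no INTEGER);
CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT, manager_ssn TEXT);
INSERT INTO project    VALUES (155, 'Payroll', 257), (226, 'Sales manual', 598);
INSERT INTO department VALUES (257, 'Accounting', '000-00-0001'), (598, 'Sales', '000-00-0002');
""")

# Selecting: WHERE keeps only the rows that meet the condition.
selected = conn.execute("SELECT * FROM project WHERE project_no = ?", (226,)).fetchall()
print(selected)   # [(226, 'Sales manual', 598)]

# Projecting: the column list keeps dept_no and manager_ssn and drops dept_name.
projected = conn.execute(
    "SELECT dept_no, manager_ssn FROM department ORDER BY dept_no"
).fetchall()
print(projected)  # [(257, '000-00-0001'), (598, '000-00-0002')]
```

Adding WHERE dept_no = 598 to the second query would combine selection and projection, giving only the row the sales manager asked for. Note also that SQL keeps duplicate rows in a projection unless SELECT DISTINCT is used, whereas projection in the formal relational model removes them.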

Joining combines two or more tables. For example, you can merge the project table with others to
create a new table that includes project number, description, department number, department
name, and the manager's SSN.

Tables in a relational database can be linked through shared data attributes, allowing for more
comprehensive data analysis and report generation. Linking—combining tables based on common
attributes—enhances the flexibility and power of relational databases. For instance, the president
of a company might want to know the name of the manager for the sales manual project and how
long they have been with the company. If the company has manager, department, and project
tables, these can be linked as shown in the provided figures to generate the needed information.
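
A sketch of how the president's question could be answered by joining the three tables on their shared attributes (project to department on the department number, department to manager on the SSN). Only project 226 and department 598 come from the text; the manager table's columns (last name, years with the company) and all other values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Project 226 and department 598 come from the text; all other values are hypothetical.
conn.executescript("""
CREATE TABLE project    (project_no INTEGER PRIMARY KEY, description TEXT, dept_no INTEGER);
CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT, manager_ssn TEXT);
CREATE TABLE manager    (manager_ssn TEXT PRIMARY KEY, last_name TEXT, years_with_company INTEGER);
INSERT INTO project    VALUES (155, 'Payroll', 257), (226, 'Sales manual', 598);
INSERT INTO department VALUES (257, 'Accounting', '000-00-0001'), (598, 'Sales', '000-00-0002');
INSERT INTO manager    VALUES ('000-00-0001', 'Jones', 12), ('000-00-0002', 'Ahmed', 5);
""")

# Link the tables through shared attributes:
# project.dept_no = department.dept_no, department.manager_ssn = manager.manager_ssn
row = conn.execute("""
    SELECT p.description, m.last_name, m.years_with_company
    FROM project AS p
    JOIN department AS d ON d.dept_no = p.dept_no
    JOIN manager    AS m ON m.manager_ssn = d.manager_ssn
    WHERE p.project_no = 226
""").fetchone()
print(row)  # ('Sales manual', 'Ahmed', 5)
```

The manager's SSN is stored once in the manager table and referenced from the department table, so the join reconstructs the combined answer without duplicating manager data.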

One of the key benefits of a relational database is its ability to link tables, as demonstrated in figure
3.9. This linking minimizes data redundancy and enables data to be organized more logically. For
instance, linking the manager's social security number, stored once in the manager table, avoids
the need to store it repeatedly in the project table.

The relational database model is widely adopted due to its simplicity, flexibility, and intuitive
nature compared to other models. It organizes data in tables, making it easier to control. As shown
in figure 3.10, a relational database management system (RDBMS) like Microsoft Access can be
used to store data in rows and columns, with hyperlink tools available on the ribbon or toolbar for
creating, editing, and manipulating the database. The ability to link tables also allows users to
relate data in new ways without having to redefine complex relationships. Because of these
advantages, many organizations use the relational model for large corporate databases, including
those for marketing and accounting.

Relational database management systems such as Oracle, IBM DB2, Microsoft SQL Server, Microsoft
Access, MySQL, Sybase, and others are based on the relational model. This model has been highly
successful and remains the dominant choice in the commercial sector. However, many
organizations are now exploring new nonrelational models to address certain business
requirements.

3.6 Data Cleaning


Data used for decision-making must be accurate, complete, economical, flexible, reliable, relevant,
simple, timely, verifiable, accessible, and secure.

Data cleansing (also known as data cleaning or data scrubbing) is the process of identifying and
correcting or removing incomplete, incorrect, inaccurate, or irrelevant records within a database.
The aim of data cleansing is to enhance the quality of data used for decision-making. "Bad data"
may result from user entry errors or corruption during data transmission or storage. Data cleansing
differs from data validation, which involves identifying and rejecting "bad data" during the data
entry process.

One method of data cleansing involves cross-checking the data against a validated dataset to
identify and correct errors. For example, entries for street number, street name, city, state, and Zip
code in a database could be verified by comparing them with a trusted Zip code database.
Additionally, data cleansing may involve standardizing information, such as converting various
abbreviations (St., St, st., st) into a single standard term (street).
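
The two cleansing techniques above, checking against a validated dataset and standardizing abbreviations, can be sketched in a few lines of Python. TRUSTED_ZIPS is a hypothetical two-entry stand-in for a real, trusted Zip code database, and the rule only standardizes a street-type abbreviation at the end of the street field.

```python
import re

# Hypothetical trusted reference data: Zip code -> (city, state).
TRUSTED_ZIPS = {"10001": ("New York", "NY"), "60601": ("Chicago", "IL")}

# Matches a trailing "St", "St.", "st", or "st." (case-insensitive) as a whole word.
STREET_ABBREV = re.compile(r"\bst\.?$", re.IGNORECASE)

def standardize_street(street: str) -> str:
    """Replace a trailing street abbreviation with the standard term 'Street'."""
    return STREET_ABBREV.sub("Street", street.strip())

def cleanse(record: dict) -> dict:
    """Return a cleansed copy of an address record."""
    cleaned = dict(record)
    cleaned["street"] = standardize_street(record["street"])
    expected = TRUSTED_ZIPS.get(record["zip"])
    if expected is not None:
        cleaned["city"], cleaned["state"] = expected  # correct from the validated data
    else:
        cleaned["needs_review"] = True  # Zip code not found in the reference set
    return cleaned

print(cleanse({"street": "12 Main st.", "city": "Chicgo", "state": "IL", "zip": "60601"}))
# {'street': '12 Main Street', 'city': 'Chicago', 'state': 'IL', 'zip': '60601'}
```
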

Data enhancement involves enriching the existing data by adding related information, such as
appending zip code details, country codes, or census tract codes to records.
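
Data enhancement can be sketched the same way: look up related reference data and append it to each record. The lookup tables below are hypothetical placeholders; the country codes follow ISO 3166-1 alpha-2, but the census tract value is invented.

```python
# Hypothetical reference tables; a real system would use maintained reference data.
COUNTRY_CODES = {"United States": "US", "Ethiopia": "ET"}  # ISO 3166-1 alpha-2 codes
CENSUS_TRACTS = {"60601": "TRACT-0001"}                    # invented placeholder value

def enhance(record: dict) -> dict:
    """Return a copy of the record with related reference data appended."""
    enriched = dict(record)
    enriched["country_code"] = COUNTRY_CODES.get(record.get("country"))  # None if unknown
    enriched["census_tract"] = CENSUS_TRACTS.get(record.get("zip"))
    return enriched

print(enhance({"name": "A. Customer", "zip": "60601", "country": "United States"}))
# {'name': 'A. Customer', 'zip': '60601', 'country': 'United States',
#  'country_code': 'US', 'census_tract': 'TRACT-0001'}
```
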

However, the cost of data cleansing can be substantial. Achieving 100% accuracy in a database by
eliminating all "bad data" is often too expensive to be practical.

3.7 Database Management Systems (DBMSs)


Developing and implementing the appropriate database system ensures that it will effectively
support business operations and objectives. But how do we go about creating, implementing,
utilizing, and maintaining a database? The solution lies in the database management system
(DBMS). As mentioned earlier, a DBMS consists of a set of programs that act as an interface
between a database and application programs, or between the database and its users. DBMSs come
in a broad range of types and functionalities, from affordable software packages to advanced
systems costing hundreds of thousands of dollars.

3.8 SQL Databases


SQL (Structured Query Language) is a specialized programming language used to access and
manipulate data stored in relational databases. It was initially developed by Donald D. Chamberlin
and Raymond Boyce at the IBM Research Center and introduced in their 1974 paper titled
"SEQUEL: A Structured English Query Language." Their work was based on the relational
database model proposed by Edgar F. Codd in his influential 1970 paper "A Relational Model of
Data for Large Shared Data Banks."

SQL databases adhere to the ACID properties (Atomicity, Consistency, Isolation, Durability),
whose core ideas were defined by Jim Gray after Codd's research. These properties ensure that
database transactions are executed reliably and maintain data integrity. Atomicity means that a
transaction is all or nothing: either every change it makes is applied, or none is. Consistency
means that a transaction moves the database from one valid state to another, so rules such as
domains and keys are never left violated. Isolation means that a transaction's intermediate changes
are hidden from other transactions until it is completed, and durability means that once a
transaction is committed, its changes are not lost, even if the system later fails.
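
Atomicity is easiest to see in a transfer between two accounts: either both updates happen or neither does. In this SQLite sketch (hypothetical account table), Python's sqlite3 connection used as a context manager commits the transaction on success and rolls it back on an exception, and a CHECK constraint acts as a consistency rule that forbids negative balances.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()

def transfer(db, src, dst, amount):
    """Move money between accounts as one atomic transaction."""
    try:
        with db:  # commits if the block succeeds, rolls back if it raises
            # Credit first, so a failing debit shows the credit being undone too.
            db.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            db.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK failed and the whole transaction was rolled back

print(transfer(conn, 1, 2, 30.0))   # True
print(transfer(conn, 1, 2, 500.0))  # False: would overdraw account 1
print(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())
# [(1, 70.0), (2, 80.0)]
```
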

SQL databases commonly implement concurrency control by locking records to prevent modifications
by other transactions until the current one is completed. While ACID-compliant SQL databases
ensure high data integrity, they may experience slower performance due to these safeguards.

SQL's popularity stems from its simplicity and flexibility. Programmers can use the same query
language across different systems, from PCs to large mainframes. SQL can also be embedded into
many programming languages, such as C++ and Java, making it a powerful tool for developers.
Its standardization and ease of use have made SQL a widely adopted language in the programming
community.

Figure 3.12 Structured Query Language (SQL)

3.8.1 Data Activities

Databases are used to provide a user view of the database, to add and modify data, to store and
retrieve data, and to manipulate data and generate reports. Each of these activities is discussed in
greater detail in the following sections.

3.8.2 Providing a User View

When installing and using a large relational database, one of the initial tasks is to inform the
Database Management System (DBMS) about the logical and physical structure of the data, as
well as the relationships among the data for each user. This is done through a description called a
schema, similar to a schematic diagram. In a relational database, the schema outlines the tables,
the fields within each table, and the relationships between those fields and tables. For instance,
large database systems like Oracle use schemas to define the tables and other features associated
with a specific user or person. The DBMS then uses the schema to determine where to locate the
requested data in relation to other pieces of data.

3.9 Creating and Modifying the Database


Schemas are typically entered into the DBMS by database personnel through a Data Definition
Language (DDL). A DDL is a set of commands and instructions used to define and describe the
relationships between data in a specific database. It allows the creator of the database to outline
how the data will be structured and related within the schema. Generally, a DDL specifies logical
access paths and records within the database. For example, figure 3.13 illustrates a simplified
version of a DDL used to create a general schema, with the letter "X" indicating where specific
information should be inserted. The DDL in this example defines terms such as file description,
area description, record description, and set description, although other terms and commands may
also be used, depending on the DBMS in use.

Figure 3.13 Data definition language
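
In SQL, the DDL consists of statements such as CREATE TABLE and ALTER TABLE. The SQLite sketch below defines a small hypothetical schema and then reads it back from the DBMS's own catalog (the sqlite_master table) and with PRAGMA table_info, showing that the schema is stored in the database alongside the data. Many other DBMSs expose the same information through their own catalogs, such as the standard INFORMATION_SCHEMA views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL statements define structure (the schema), not the data itself.
conn.executescript("""
CREATE TABLE department (
    dept_no   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id    INTEGER PRIMARY KEY,
    last_name TEXT NOT NULL,
    dept_no   INTEGER REFERENCES department(dept_no)
);
ALTER TABLE employee ADD COLUMN hire_date TEXT;  -- modifying the schema
""")

# SQLite records each table's definition in its catalog table, sqlite_master.
for name, sql in conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
):
    print(name)
    print(sql)

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")]
print(columns)
```
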

An essential step in database creation is developing a data dictionary, which provides a
comprehensive description of all the data used within the database. The data dictionary includes
the following details for each data item:

• The name of the data item


• Any aliases or alternative names that might be used
• The range of acceptable values
• The type of data (e.g., alphanumeric or numeric)
• The storage space required for the item
• The individual responsible for updating it and the various users who access it
• The permissions for accessing it
• A list of reports that utilize the data item

Additionally, a data dictionary may describe data flows, how records are organized, and the
processing requirements for the data. Figure 3.14 presents an example of a typical data dictionary
entry.

Figure 3.14 Data dictionary entry

In the example shown in figure 3.14, the data dictionary entry for the part number of an inventory
item may contain the following details:

• Name of the person who created the entry (D. Bordwell)


• Date the entry was made (August 4, 2016)
• Name of the person who approved the entry (J. Edwards)
• Approval date (October 13, 2016)
• Version number (3.1)
• Number of pages used for the entry (1)
• Element name (PARTNO, for part number)
• A description of the element
• Other possible names (PTNO)
• Value range (part numbers range from 0001 to 9999)
• Data type (numeric)
• Storage required (four positions for the part number)
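
A data dictionary entry can also be kept in machine-readable form so that programs apply the same rules the dictionary documents. This hypothetical Python structure records several of the PARTNO details listed above (alias PTNO, numeric type, four positions, range 0001 to 9999) and checks candidate values against them; real DBMSs keep such metadata in their own system catalogs, in their own formats.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataDictionaryEntry:
    """A minimal, hypothetical structure for one data dictionary entry."""
    name: str
    aliases: tuple
    data_type: str
    storage_positions: int
    min_value: int
    max_value: int

PARTNO = DataDictionaryEntry(
    name="PARTNO", aliases=("PTNO",), data_type="numeric",
    storage_positions=4, min_value=1, max_value=9999,
)

def is_valid(entry: DataDictionaryEntry, value: str) -> bool:
    """Check a value against the entry's type, storage length, and range."""
    if entry.data_type != "numeric":
        raise NotImplementedError("this sketch only checks numeric data items")
    if not (value.isascii() and value.isdigit()) or len(value) != entry.storage_positions:
        return False
    return entry.min_value <= int(value) <= entry.max_value

print(is_valid(PARTNO, "0042"))  # True
print(is_valid(PARTNO, "0000"))  # False: below 0001
print(is_valid(PARTNO, "12A4"))  # False: not numeric
```
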

A data dictionary is an essential tool for maintaining an efficient database with reliable, non-
redundant information, and it simplifies modifications when needed. It also assists programmers
by providing a detailed description of the data elements in the database, enabling them to write the
necessary code to access the data.

Following the standards outlined in the data dictionary also facilitates data sharing across
organizations. For example, the U.S. Department of Energy (DOE) developed a data dictionary of
terms to standardize the evaluation of energy data. The Building Energy Data Exchange
Specification (BEDES) offers a common language for key data elements, including data formats,
valid ranges, and definitions. This standardization improves communication among contractors,
software vendors, finance companies, utilities, and public utility commissions. Adhering to these
data standards ensures that information can be easily shared and aggregated without extensive data
cleansing and conversion, allowing stakeholders to answer important questions about energy
savings and usage.

3.9.1 Storing and Retrieving Data

A DBMS serves as an intermediary between an application program and the database. When an
application needs data, it makes a request through the DBMS. For example, if a pricing program
needs data on the engine option of a new car (such as a six-cylinder engine instead of the standard
four-cylinder engine), the application sends the request to the DBMS. The request follows a logical
access path (LAP). The DBMS then, in collaboration with various system programs, accesses the
storage device (such as a disk drive or solid-state drive) where the data is stored. The DBMS
follows a physical access path (PAP) to locate and retrieve the data. In this case, the DBMS might
retrieve the price data for the six-cylinder engine from a disk drive.

Problems can arise if two or more people or programs try to access the same record simultaneously.
For instance, an inventory control program may reduce the inventory count by 10 units because 10
units were shipped to a customer, while at the same time, a purchasing program may increase the
inventory by 200 units due to a recent delivery. Without proper controls, one update could
overwrite the other, leaving an inaccurate inventory level (for example, one that reflects the
delivery but not the shipment). To prevent this, concurrency control can be implemented, such as
locking the record so that other programs cannot update it while one program is updating it.
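
The inventory problem above is known as a lost update. The sketch below simulates the two programs with a single SQLite connection so the interleaving is deterministic: each program reads the count, then writes back its own result, and the later write silently overwrites the earlier one. The starting count of 500 is hypothetical. The fix lets the DBMS perform the read and the write in one UPDATE statement; SQLite allows only one writer at a time and locks the whole database rather than individual records, which is coarser than record locking but prevents the same problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item_id INTEGER PRIMARY KEY, qty INTEGER)")
conn.execute("INSERT INTO inventory VALUES (1, 500)")  # hypothetical starting count
conn.commit()

def read_qty(db):
    """Return the current stored quantity for item 1."""
    return db.execute("SELECT qty FROM inventory WHERE item_id = 1").fetchone()[0]

# Without concurrency control: both programs read, then each writes its own result.
shipping_view = read_qty(conn)    # inventory control program reads 500
purchasing_view = read_qty(conn)  # purchasing program also reads 500
conn.execute("UPDATE inventory SET qty = ? WHERE item_id = 1", (shipping_view - 10,))
conn.execute("UPDATE inventory SET qty = ? WHERE item_id = 1", (purchasing_view + 200,))
conn.commit()
print(read_qty(conn))  # 700: the 10-unit shipment has been lost

# With control: reset, then let the DBMS read and write in one statement per update.
with conn:
    conn.execute("UPDATE inventory SET qty = 500 WHERE item_id = 1")
for delta in (-10, +200):
    with conn:  # each update is its own transaction, run under SQLite's write lock
        conn.execute("UPDATE inventory SET qty = qty + ? WHERE item_id = 1", (delta,))
print(read_qty(conn))  # 690: both changes are reflected
```
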

3.9.2 Manipulating Data and Generating Reports

Once a DBMS is set up, authorized users, including employees and managers, can access it to
review reports and retrieve essential information. A DBMS allows a company to efficiently
manage this process. Some databases utilize Query By Example (QBE), a visual method for
creating database queries or requests. With QBE, users can perform queries and other tasks by
interacting with windows and clicking on the data or features they need, much like using other
GUI (graphical user interface) operating systems and applications.

3.9.3 Database Administration

Database administrators (DBAs) are trained and experienced IS professionals who collaborate with
business users to define their data requirements, use database programming languages to create
databases that meet those needs, test and assess databases, implement improvements for
performance, and ensure data security from unauthorized access. A DBA must have a solid
understanding of the organization’s core business, proficiency with selected database management
systems, and awareness of emerging technologies and design trends. The DBA's role encompasses
planning, designing, creating, operating, securing, monitoring, and maintaining databases.
Typically, DBAs hold a degree in computer science or management information systems, along
with some job-specific training or extensive experience with various database products.

The DBA works closely with users to define the database's content, determining which entities are
important and what attributes should be recorded for them. This highlights the importance of DBAs
not only understanding the organization's business but also ensuring that non-IS personnel
recognize the value of their role. DBAs play a crucial part in developing effective information
systems that benefit the organization, employees, and managers.

DBAs also collaborate with programmers to ensure that applications adhere to database
management system standards and conventions. After the database is operational, the DBA
monitors security logs for violations and tracks performance to meet user needs, ensuring system
efficiency. If issues arise, the DBA addresses them proactively to prevent escalation.
A key responsibility of the DBA is safeguarding the database from attacks or failures. They use
security software, preventive measures, and redundancy to protect data and ensure its accessibility.
Despite these efforts, database security breaches still occur. For instance, between June and August
2014, more than 83 million customer records were stolen from JPMorgan Chase, marking the
largest theft of consumer data from a U.S. financial institution.
Some organizations have introduced the role of data administrator, responsible for establishing
consistent principles for data management across the organization, such as defining data standards
and ensuring uniformity in data definitions across databases. For example, a data administrator
ensures that terms like "customer" are consistently defined and handled across all corporate
databases. They also work with business managers to determine who should have read or update
access to certain databases and attributes, passing this information to the DBA for implementation.
The data administrator is often a high-level position reporting to senior management.

3.10 Popular Database Management Systems


Many popular database management systems address a wide range of individual, workgroup, and
enterprise needs as shown in Table 3.2.
Table 3.2 Popular Database Management Systems

The DBMS market includes software that caters to a wide range of users, from non-technical
individuals to highly skilled professional programmers, and operates on various types of
computers, from tablets to supercomputers. This market generates billions of dollars annually for
companies like IBM, Oracle, and Microsoft.
Choosing a DBMS starts with evaluating the organization’s information needs. Key factors to
consider include the database size, the number of concurrent users, performance requirements,
the DBMS’s ability to integrate with other systems, its features, vendor options, and the cost of
the system.
With Database as a Service (DaaS), the database is hosted on a service provider's servers and
accessed by users over the Internet, with the service provider handling the database
administration. Numerous companies offer DaaS services, including Amazon, Google,
Microsoft, Oracle, IBM, and others. For instance, Amazon Relational Database Service (Amazon
RDS) allows organizations to set up and manage MySQL, Microsoft SQL Server, Oracle, or
PostgreSQL databases in the cloud. The service automatically backs up the database and retains
those backups based on a user-defined retention schedule.

3.11 Big Data
Big data refers to datasets that are exceptionally large (terabytes or more) and complex,
encompassing everything from sensor data to social media content, which makes them difficult for
traditional data management software, hardware, and analysis methods to handle effectively.
3.11.1 Characteristics of Big Data
Computer technology analyst Doug Laney associated the three characteristics of volume, velocity,
and variety with big data.
• Volume. In 2014, it was estimated that the volume of data in the digital universe would grow
to 44 zettabytes by 2020.
• Velocity. Data is currently arriving at a rate that exceeds 5 trillion bits per second. This rate
is accelerating rapidly, and the volume of digital data is expected to double every two years
between now and 2020.
• Variety. Data today comes in a variety of formats. Some of the data is what computer
scientists call structured data its format is known in advance, and it fits nicely into
traditional databases. For example, the data generated by the well-defined business
transactions that are used to update many corporate databases containing customer,
product, inventory, financial, and employee data is generally structured data. However,
most of the data that an organization must deal with its unstructured data, meaning that it
is not organized in any predefined manner. Unstructured data comes from sources such as
word-processing documents, social media, email, photos, surveillance video, and phone
messages.
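The volume and velocity figures above can be made concrete with a little arithmetic. The Python sketch below converts a rate of 5 trillion bits per second into bytes per day and computes the growth multiplier implied by doubling every two years. The function names are illustrative, and decimal units are assumed (8 bits per byte, 1 petabyte = 10^15 bytes).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bits_per_second_to_bytes_per_day(bits_per_second):
    """Convert a data rate in bits per second to bytes per day (8 bits per byte)."""
    return bits_per_second / 8 * SECONDS_PER_DAY


def growth_factor(years, doubling_period_years=2):
    """Multiplier after `years` if volume doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    rate = 5e12  # 5 trillion bits per second, the figure quoted above
    per_day = bits_per_second_to_bytes_per_day(rate)
    print(f"{per_day / 1e15:.0f} petabytes per day")  # decimal petabytes
    print(f"growth 2014-2020: {growth_factor(2020 - 2014):.0f}x")
```

Run as written, it reports about 54 petabytes per day and an eightfold increase over the six years from 2014 to 2020.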

3.11.2 Sources of Big Data


Organizations gather and utilize data from numerous sources, including business applications,
social media, sensors and controllers within the manufacturing process, systems managing
physical environments in factories and offices, media sources (such as audio and video broadcasts),
machine logs that track events, customer call data, public sources (like government websites), and
archives of historical transactions and communications. A significant portion of this data is
unstructured and does not easily fit into conventional relational database management systems.
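The difference between structured and unstructured data can be seen in a small Python sketch using the standard-library `sqlite3` module. The table, column names, and sample values are hypothetical. Structured records fit a schema defined in advance and can be queried field by field; a free-text message has no such fields, so without further processing a program can do little more than search it for keywords.

```python
import sqlite3


def structured_demo():
    """Structured data: columns and their types are defined before any data arrives."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT, balance REAL)"
    )
    conn.executemany(
        "INSERT INTO customer (name, city, balance) VALUES (?, ?, ?)",
        [("Customer A", "City X", 1200.0), ("Customer B", "City Y", 450.5)],
    )
    # Because the format is known in advance, we can filter on a specific field.
    rows = conn.execute("SELECT name FROM customer WHERE balance > 1000").fetchall()
    conn.close()
    return rows


def unstructured_demo(text, keyword):
    """Unstructured data: free text has no predefined fields, so without further
    processing the best we can do is a crude keyword search."""
    return keyword.lower() in text.lower()


if __name__ == "__main__":
    print(structured_demo())  # [('Customer A',)]
    email = "Hi, I moved last month and my order has still not arrived."
    print(unstructured_demo(email, "order"))  # True
```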
Table 3.3 Portals that provide access to free sources of useful Big Data

Big Data Uses
Here are several examples of how organizations are leveraging big data to enhance their daily
operations, planning, and decision-making:

• Retail companies track social networks like Facebook, Google, LinkedIn, Twitter, and
Yahoo to engage brand supporters, identify potential brand detractors (and attempt to
change their views), and enable passionate customers to promote the retailers' products.
• Advertising and marketing agencies monitor social media comments to gauge consumer
responses to advertisements and promotions.
• Hospitals analyze medical data and patient records to identify individuals at risk of needing
readmission within a few months of discharge, with the goal of proactively engaging these
patients to prevent costly rehospitalizations.
• Consumer product companies monitor social media to understand customer behavior,
preferences, and product perceptions, and use these insights to refine their products,
services, and marketing strategies.
• Financial services firms utilize customer interaction data to identify individuals likely to
respond to more targeted and personalized offers.
• Manufacturers analyze subtle vibration data from their machinery, which shifts slightly as
it wears, to predict the ideal time for maintenance or equipment replacement, thereby
preventing costly repairs or catastrophic failures.
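The manufacturing example can be illustrated with a deliberately simple Python sketch. It assumes, hypothetically, that machine wear shows up as rising vibration amplitude, measures amplitude as the root-mean-square (RMS) of a sliding window of sensor readings, and raises an alert when that value exceeds a known healthy baseline by a chosen tolerance. Real predictive-maintenance systems use far richer models; the baseline, window size, and 20% tolerance here are illustrative parameters only.

```python
import math


def rms(window):
    """Root-mean-square amplitude of a sequence of vibration readings."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def first_alert(readings, window_size, baseline_rms, tolerance=0.2):
    """Return the index of the reading at which the rolling RMS first exceeds
    baseline_rms * (1 + tolerance), or None if it never does.

    All parameters are hypothetical tuning values, not industry standards.
    """
    threshold = baseline_rms * (1 + tolerance)
    for end in range(window_size, len(readings) + 1):
        if rms(readings[end - window_size:end]) > threshold:
            return end - 1  # index of the newest reading in the window
    return None


if __name__ == "__main__":
    healthy = [1.0, -1.0] * 5  # steady vibration with RMS 1.0
    worn = [2.0, -2.0] * 5     # amplitude doubles as the part wears
    print(first_alert(healthy + worn, window_size=4, baseline_rms=1.0))  # 10
```

The alert fires at index 10, the first reading taken after the amplitude change, which is when maintenance would be scheduled under these assumptions.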

3.11.3 Challenges of Big Data
Individuals, organizations, and society at large must find ways to manage the ever-expanding
flood of data to avoid the risks of information overload. The challenge is complex, with
numerous questions that need addressing, such as how to select which data to store, where and
how to store it, how to extract relevant data for decision-making, how to derive value from this
data, and how to protect sensitive information from unauthorized access. With so much data
available, business users often struggle to find the information they need for decisions and may
lack confidence in the accuracy of the data they can access.
Handling data from diverse sources, especially external ones, also increases the risk of
noncompliance with government regulations or internal controls. If compliance measures are not
clearly defined and followed, violations can occur, which may result in government
investigations and significant financial consequences. For instance, when Marvell Technology
discovered problems with its revenue booking practices, its stock price dropped 16% in a single
day.
Optimists believe these challenges can be overcome, and that the increasing availability of data
will lead to more accurate analysis, improved decision-making, and better outcomes. However,
not everyone is on board with big data applications. Concerns about privacy arise, as
corporations collect vast amounts of personal data that may be shared with other organizations.
This enables the creation of detailed profiles of individuals without their knowledge or consent.
Big data also introduces security risks: can organizations safeguard this data from competitors
and malicious hackers? Some experts believe that companies that collect and store big data could
face liability claims from individuals or organizations. Despite these potential drawbacks,
many businesses are diving into big data, attracted by the potential for valuable insights and new
applications.

3.11.4 Data Management


Data management is a comprehensive set of functions that governs how data is collected,
validated for its intended use, stored, secured, and processed, so that its accessibility,
reliability, and timeliness meet the needs of the data users within an organization. The Data
Management Association (DAMA) International, a nonprofit, vendor-independent organization,
promotes the understanding, development, and practice of managing data as a crucial enterprise
asset. DAMA has outlined 10 key functions of data management.
At the heart of data management is Data Governance, which defines the roles, responsibilities, and
processes necessary to ensure that data is reliable and usable across the organization. It involves
designating individuals responsible for resolving and preventing data issues.
The need for data management arises from several factors, including compliance with external
regulations aimed at managing risks like financial misstatement, safeguarding sensitive data, and
ensuring high-quality data is available for critical decision-making. Without structured business
processes and controls, these requirements cannot be met. Formal management processes are
essential to govern data.
Effective data governance requires leadership from the business side and active involvement
across departments, not just from the information systems team. A cross-functional approach is
advised because various departments utilize data and information systems, and no single individual
can fully grasp the organization’s data needs.
A cross-functional team is especially crucial for meeting compliance requirements. The data
governance team should include executives, project managers, business managers, and data
stewards. A data steward is responsible for managing critical data elements, such as sourcing new
data, defining and maintaining consistent reference and master data, and addressing data quality
issues. Data users consult data stewards for guidance on which data to use for business decisions
and to ensure data accuracy and completeness.
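One routine task of a data steward, addressing data quality issues, can be sketched in Python. The example is hypothetical: it checks a batch of customer records for completeness (required fields present and non-empty) and for consistency with a steward-maintained reference list of country codes. The field names, codes, and the `quality_report` function are assumptions for illustration.

```python
REQUIRED_FIELDS = ("customer_id", "name", "country")


def quality_report(records, valid_countries):
    """Count records that are incomplete or inconsistent with reference data.

    A record is incomplete if any required field is missing or empty; a
    complete record is inconsistent if its country code is not in the
    reference set. (Incomplete records are not checked further.)
    """
    incomplete = 0
    inconsistent = 0
    for rec in records:
        if any(rec.get(f) in (None, "") for f in REQUIRED_FIELDS):
            incomplete += 1
        elif rec["country"] not in valid_countries:
            inconsistent += 1
    return {"total": len(records), "incomplete": incomplete, "inconsistent": inconsistent}


if __name__ == "__main__":
    reference_countries = {"ET", "KE", "US"}  # hypothetical reference data
    batch = [
        {"customer_id": 1, "name": "Customer A", "country": "ET"},
        {"customer_id": 2, "name": "", "country": "KE"},            # missing name
        {"customer_id": 3, "name": "Customer C", "country": "XX"},  # unknown code
    ]
    print(quality_report(batch, reference_countries))
    # {'total': 3, 'incomplete': 1, 'inconsistent': 1}
```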
The data governance team establishes the ownership of data assets and creates policies outlining
accountability for various aspects of the data, such as its accuracy, accessibility, consistency,
completeness, and security. The team also defines processes for storing, archiving, backing up,
and protecting data from cyberattacks, accidental destruction or disclosure, and theft. Furthermore,
it develops standards and procedures for authorizing who can access, update, and use the data,
along with implementing controls and audit processes to ensure ongoing compliance with
organizational policies and regulations.
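The standards for authorizing who can access, update, and use data, together with audit processes, can be pictured with a small Python sketch of a simple role-based access check that records every decision in an audit trail. The policy structure, roles, asset names, and the `AccessController` class are hypothetical; in practice these controls are typically configured in the DBMS and related security tools rather than hand-written.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessController:
    """Hypothetical sketch: checks requests against a governance policy
    and records every decision in an audit trail."""

    # policy[asset][action] -> set of roles allowed to perform that action
    policy: dict
    audit_log: list = field(default_factory=list)

    def authorize(self, user: str, role: str, action: str, asset: str) -> bool:
        allowed = role in self.policy.get(asset, {}).get(action, set())
        # Log denials as well as approvals so auditors can review both.
        self.audit_log.append(
            {
                "time": datetime.now(timezone.utc).isoformat(),
                "user": user,
                "role": role,
                "action": action,
                "asset": asset,
                "allowed": allowed,
            }
        )
        return allowed


if __name__ == "__main__":
    policy = {
        "customer_master": {
            "read": {"analyst", "data_steward"},
            "update": {"data_steward"},
        }
    }
    ac = AccessController(policy)
    print(ac.authorize("alice", "analyst", "read", "customer_master"))    # True
    print(ac.authorize("alice", "analyst", "update", "customer_master"))  # False
    print(len(ac.audit_log))  # 2
```

Keeping the policy as data, separate from the checking code, mirrors the division of labor described above: the governance team defines who may do what, and the controls enforce and log it.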
