
PART 1 UNIT 6

1F. Technology and Analytics

Modules

1 F.1. Information Systems
2 F.2. Data Governance
3 F.3. Technology-Enabled Finance Transformation
4 F.4. Data Analytics: Part 1
5 F.4. Data Analytics: Part 2




MODULE 1
F.1. Information Systems
Part 1, Unit 6

This module covers the following content from the IMA Learning Outcome Statements.

CMA LOS Reference: Part 1—Section F.1. Information Systems

The candidate should be able to:


a. identify the role of the accounting information system (AIS) in the value chain
b. demonstrate an understanding of the accounting information system cycles, including
revenue to cash, expenditures, production, human resources and payroll, financing, and
property, plant, and equipment, as well as the general ledger (GL) and reporting system
c. identify and explain the challenges of having separate financial and nonfinancial systems
d. define enterprise resource planning (ERP) and identify and explain the advantages and
disadvantages of ERP
e. explain how ERP helps overcome the challenges of separate financial and nonfinancial
systems, integrating all aspects of an organization's activities
f. define relational database and demonstrate an understanding of a database
management system
g. define data warehouse and data mart
h. define enterprise performance management (EPM) [also known as corporate
performance management (CPM) or business performance management (BPM)]
i. discuss how EPM can facilitate business planning and performance management

1 Management Information Systems

Management information systems (MIS) enable companies to use data as part of their strategic
planning process as well as the tactical execution of that strategy. Management information
systems often have subsystems called decision support systems (DSS) and executive information
systems (EIS).
A management information system provides users predefined reports that support effective
business decisions. MIS reports may provide feedback on daily operations, financial and
nonfinancial information to support decision making across functions, and both internal and
external information.


1.1 Decision Support Systems (DSS)


A decision support system is an extension of an MIS that provides interactive tools to support day-
to-day decision making. A DSS may provide information, facilitate the preparation of forecasts, or
allow modeling of various aspects of a decision. It is sometimes called an expert system.

Illustration 1 Decision Support System

Examples of decision support systems include production planning, inventory control, bid
preparation, revenue optimization, traffic planning, and capital investment planning systems.

1.2 Executive Information Systems (EIS)


Executive information systems provide senior executives with immediate and easy access to
internal and external information to assist in strategic decision making. An EIS consolidates
information internal and external to the enterprise and reports it in a format and level of detail
appropriate to senior executives.

Illustration 2 Executive Information System

Examples of executive information systems include systems that produce sales forecasts,
profit plans, key performance indicators, macro-economic data, and financial reports.

1.3 Accounting Information Systems (AIS) LOS 1F1a


The accounting information system (AIS) is a type of management information system with a
high degree of precision and reliability needed for accounting purposes. An AIS may also be a
transaction processing system and a knowledge system (i.e., an input to DSS or EIS) providing
both internal and external reporting features.

1.3.1 AIS Subsystems


An AIS is typically made up of three main subsystems (often referred to as modules):
Transaction Processing System (TPS): A TPS converts economic events into financial
transactions (i.e., journal entries) and distributes the information to support daily operations
functions. TPS typically covers three main transaction cycles: sales cycle, conversion cycle,
and expenditure cycle.
Financial Reporting System (FRS) or General Ledger System (GLS): The FRS/GLS aggregates daily financial information from the TPS, along with information from other sources about infrequent events such as mergers, lawsuit settlements, or natural disasters, to enable timely regulatory and financial reporting (i.e., publicly issued financial statements, tax returns, or compliance reports).
Management Reporting System (MRS): An MRS provides internal financial information to
solve day-to-day business problems, such as budgeting, variance analysis, or cost-volume-
profit analysis.


1.3.2 Objectives of an AIS


The three subsystems of an AIS collectively achieve the following five objectives:
1. Record valid transactions;
2. Properly classify those transactions;
3. Record the transactions at their correct value;
4. Record the transactions in the correct accounting period; and
5. Properly present the transactions and related information in the financial statements of
the organization.

1.3.3 Sequence of Events in an AIS


An AIS processes transactions in the following order:
1. Transaction data from source documents is entered into the AIS by an end user.
Alternatively, an order may be entered through the Internet by a customer.
2. Original source documents, if they exist, are filed.
3. Transactions are recorded in the appropriate journal.
4. Transactions are posted to the general and subsidiary ledgers.
5. Trial balances are prepared.
6. Adjustments, accruals, and corrections are entered. Financial reports are generated.
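
To make the flow concrete, the following short Python sketch models steps 3 through 5 under simplified assumptions; the entries, account names, and structure are hypothetical and not drawn from any actual AIS.

from collections import defaultdict

# Hypothetical journal entries captured from source documents (step 3).
journal = [
    {"doc": "INV-1001", "lines": [("Accounts Receivable", 500, 0), ("Sales Revenue", 0, 500)]},
    {"doc": "CHK-2001", "lines": [("Rent Expense", 200, 0), ("Cash", 0, 200)]},
]

# Post each journal line to the general ledger (step 4).
ledger = defaultdict(lambda: {"debit": 0, "credit": 0})
for entry in journal:
    for account, debit, credit in entry["lines"]:
        ledger[account]["debit"] += debit
        ledger[account]["credit"] += credit

# Prepare a trial balance and confirm total debits equal total credits (step 5).
total_debits = sum(a["debit"] for a in ledger.values())
total_credits = sum(a["credit"] for a in ledger.values())
for account, amounts in ledger.items():
    print(f"{account:22s} Dr {amounts['debit']:>6}  Cr {amounts['credit']:>6}")
assert total_debits == total_credits
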

Pass Key

An AIS differs from a DSS and an EIS due to the high degree of precision and detail required
for accounting purposes (i.e., transaction processing). Data in an AIS is often processed
and aggregated to become inputs to a DSS and an EIS to enable management to make
data-driven decisions.

1.3.4 Financial Reporting and Management Reporting


In a fully integrated AIS, transaction processing systems capture, record, and post to subsidiary and general ledgers throughout the year. The financial reporting group accesses the general ledger and prepares trial balances, adjusting entries, adjusted trial balances, financial statements, closing entries, and closing trial balances.
Management reports provide information to solve business problems. These reports need
to be relevant, accurate, complete, timely, and concise. In addition, these reports need to be
exception-oriented to allow users to focus on the problem. For example, a missed time sheet
report can help management identify missing time sheets from employees or contractors so
corrective action may be taken.


Generally, management reports can be categorized into the following:


Scheduled Reports: Scheduled reports are produced based on established time frames and
are recurring in nature. Examples include daily listing of sales, daily inventory status report,
weekly payroll reports, and quarterly financial statements.
On-Demand Reports: On-demand reports are triggered by a user’s need, not based on
established time frames. For instance, an inventory reorder report is generated when
inventory falls below the safety margin.

1.3.5 AIS Audit Trail


A well-designed AIS creates an audit trail for accounting transactions. The audit trail allows a
user to trace a transaction from source documents to the ledger and to trace from the ledger
back to source documents. The ability to trace in both directions is important in auditing.
An example of a basic accounting audit trail follows. Source documents are often stored as
electronic documents, thus alleviating the need to file paper documents. Sophisticated scanning
systems can turn paper documents into electronic documents before they are processed.

(Flowchart: a source document, such as an invoice or time card, enters as input and flows through the journal, the ledger, and the trial balance to the output of financial statements and reports; the original source document is filed and stored.)

1.4 Transaction Cycles LOS 1F1b


The transaction cycles are the core functions within an accounting department, such as the
revenue cycle, expenditure and cash disbursement cycle, and other processes that involve the
recognition and/or facilitation of transactions.

1.4.1 Revenue and Cash Receipt Cycles


The key AIS functions of the revenue cycle include the following:
AIS allows real-time access to the inventory subsidiary ledger to check availability upon
receiving a customer order.
AIS automatically approves or denies credit based on the customer's credit record and
payment history.
AIS concurrently records sales invoices in the database, digitally transmits inventory release
orders to the warehouse, and digitally sends packing slips to the shipping department.
AIS has a terminal for the shipping department to digitally input shipping notices upon
shipment. The input triggers the system to update the customer's credit record, reduce
inventory subsidiary ledger records, insert the shipping date in sales invoice records, update
general ledger accounts, and distribute management reports (e.g., inventory summary,
sales summary).


AIS has a terminal for the cash receipts clerk to access the cash receipt system and record
the remittance.
AIS closes the sales invoice, posts to the general ledger accounts, updates the customer's payment record, and distributes management reports (e.g., transaction listings, discrepancy reports, and general ledger change reports).

1.4.2 Expenditure and Cash Disbursement Cycles


The key AIS functions of the expenditures and cash disbursements cycle include the following:
AIS reads the requested purchase to verify that it is on the approved list, and then displays
a list of approved vendors with vendor contact information as an input to the competitive
bidding process or selection of vendors.
AIS digitally prepares the purchase order (PO) and delivers the PO to the vendor.
AIS has a terminal for the receiving department to enter the PO number and input the quantities received. AIS concurrently updates the receiving report file, reconciles the
quantity received against open PO records, closes the PO if no exceptions are identified,
updates the inventory subsidiary ledger, and updates the general ledger accounts.
AIS has a terminal for the accounts payable clerk to enter invoices from suppliers into the
system. The system automatically links the invoices to the PO records and receiving report
records, and creates a digital accounts payable voucher stored in a centralized repository.
AIS automatically approves payment of invoices and sets the payment date according to the
terms on the invoice.
AIS prints and distributes the signed checks to the mail room for mailing. The system
records payments in the check register file, closes the vendor invoice, updates the
associated general ledger accounts, and distributes transaction reports to users.

1.4.3 Payroll Cycle


AIS is integrated with the human resource management system (HRMS) to enable real-time
changes to employment data, such as benefits, pay rates, deductions, employment status, new hires, and terminations.
AIS, in connection with operational systems, allows employees to enter timekeeping data in
real time to produce time and attendance files and the labor usage file.
AIS allocates labor costs to job costs, accumulates direct and indirect labor expenses at the end
of a work period (daily or weekly) on a batch basis, calculates payroll, updates employee records,
and produces payroll registers for accounts payable and cash disbursement departments.
AIS creates digital journal entries, attaches the original documents to the entries, and
automatically updates the general ledger.

1.4.4 Fixed Asset Cycle


AIS has a terminal for fixed asset groups to create a record in the asset subsidiary ledger that includes each asset's useful life, salvage value, depreciation methodology, and location.
AIS automatically updates the general ledger, prepares journal entries, and creates a
depreciation schedule.
AIS automatically calculates depreciation, accumulated depreciation, and book value at
the end of the period. The system then creates a journal entry file and updates the general
ledger accounts accordingly.
When an asset is disposed of, the clerk records the disposal, prompting the system to calculate gains or losses on the disposal, prepare journal entries, and post adjusting entries
to the general ledger.


2 Enterprise Resource Planning Systems (ERP) LOS 1F1c, 1F1d, 1F1e

Traditionally, most large and midsized organizations have relied on separate information systems, inclusive of nonfinancial systems (e.g., logistics, manufacturing, point of sale, human resources, and project management) and financial systems, to carry out designated tasks. However, this decentralized model hinders the free flow of information that is required to make timely business decisions in the digital era.
An enterprise resource planning system (ERP) is a cross-functional enterprise system that
integrates and automates the many business processes and systems that must work together in
the manufacturing, logistics, distribution, accounting, project management, finance, and human
resource functions of a business.
ERP software comprises a number of modules that can function independently or as an
integrated system to allow data and information to be shared among all of the different
departments and divisions of large businesses.

2.1 Advantages and Disadvantages of ERP LOS 1F1d


ERP software manages the various functions within a business related to manufacturing, from entering sales orders to coordinating shipping and after-sales customer service. In spite of the name, ERP normally does not offer many planning features; the enterprise part, however, is accurate. ERP is often considered a back-office system, supporting processes from the customer order through fulfillment of that order. Below is a summary of the advantages and disadvantages of ERP.

2.1.1 Advantages of ERP


Single Database: ERP systems store information in a central repository so that data may be
entered once and then accessed and used by various departments.
Integrated System: ERP systems act as the framework for integrating and improving an
organization's ability to monitor and track sales, expenses, customer service, distribution,
and many other business functions.
Faster Business Processes: ERP systems can provide vital cross-functional and
standardized information quickly to managers across the organization in order to assist
them in the decision-making process.

2.1.2 Disadvantages of ERP


Costly Implementation and Ongoing Maintenance: ERP consultants are expensive, and consulting fees can exceed 50 percent of total implementation costs. In addition, the success of the ERP
is heavily dependent on training employees to be familiar with complex ERP systems. Lastly,
experts are often required to maintain and troubleshoot the ERP system.
Complexity and Flexibility: ERP is a complex system and the purchased ERP modules may
be costly to configure and customize to fit business needs.


3 Enterprise Performance Management (EPM) LOS 1F1h, 1F1i

Enterprise performance management (EPM) systems, also known as business performance
management (BPM) or corporate performance management (CPM) systems, are software
solutions designed to help executives make strategic decisions. From a finance perspective, an
EPM enables leaders to plan, budget, and forecast business performance and to consolidate
financial results. EPMs are useful in breaking down high-level business strategies and translating
them into actionable plans or strategic objectives. Key performance indicators (KPI) are then
assigned to enable the enterprise to monitor progress toward achieving these objectives.

3.1 BPM Framework


The BPM framework is a five-component model that includes environment, systems and
technology, processes, organizational culture, and people.
Environment: This refers to the arena in which the company operates, which is defined by
its industry, competitors, regulators, consumers, suppliers, and economic circumstances.
Systems and Technology: This includes the infrastructure, supplies, tools, and other
resources employed to ensure that the business is able to meet its objective.
Processes: This focuses on the manner in which an organization uses its systems,
technology, and people to accomplish tasks that help it reach its business objective
(e.g., accounting procedures, logistics configuration, or information management).
Organizational Culture: This refers to the social environment within an organization that is
driven by the mission, vision, and values set by senior management.
People: These are the primary drivers of activity in an organization and the most valuable
resource. They support and cultivate each of the other components within this framework.
In order to effectively deploy human capital, a human resource management (HRM) strategy
is recommended to address talent attraction, retention, recruiting, work-life balance, and
employee development.
The following graphic depicts the BPM framework:

(Graphic: the BPM framework, showing People, Processes, Organizational Culture, Systems and Technology, and the Environment as its five components.)


3.2 The BPM Spiral


Each of the five components of the BPM framework can be broken down, studied, and refined by
engaging in four steps explained in the BPM spiral:
1. Strategize
2. Plan
3. Monitor and analyze
4. Act and adjust
The execution is not performed in a closed loop but rather as a spiral, as shown below.

(Graphic: the BPM spiral, in which the four steps of strategize, plan, monitor and analyze, and act and adjust repeat in successive cycles rather than closing into a single loop.)

Strategizing involves management socializing tacit (existing) knowledge of products, services, processes, and other business topics analyzed within a BPM system. Management should
discuss and document shared views, experiences, and insight (step one). This step generates
strategic thinking, which leads to the planning step where tacit knowledge is converted into
new knowledge (step two). Plans are then formed to carry out the strategies and those must
be monitored (step three). This monitoring allows organizational learning, which converts new
knowledge into tacit knowledge, resulting in action and adjustments (step four). This process
then repeats.


Illustration 3 Starting the BPM Spiral

Company A is a software development company that wants to improve the smartphone games it develops. The company has a vision of becoming the most popular mobile sports
game programmer in the United States and Canada. The company should first assess
the environment component of the BPM framework, going through each of the four BPM
spiral steps.
1. Strategize: There are two other competitors within the mobile sports gaming space.
Consumers view Company A as superior to Companies B and C based on recent
surveys, but gamers still rank Company A low on multiplayer game play.
Management determines the best way to gain more market share is by improving
multiplayer mode on all of its new sports games in the following year.
2. Plan: To improve this feature, leadership invests in additional programmers and game
play testers to improve ratings. Programmers must have advanced skills in newer
gaming technology that makes game play more realistic. Hiring testers to simulate the games prior to launch will help ensure that ratings and revenue increase.
3. Monitor: To track success, management monitors active users. An active user is
someone who plays the game for at least one hour out of every 24-hour period. It
intends to grow active users by 15 percent. It also wants to grow the number of games
downloaded by 20 percent in the coming year and achieve an average review rating of
4.5 stars out of 5 in an acclaimed consumer gaming magazine.
4. Act: Six months after releasing its newly revised games, there was an increase in active
users of 10 percent, an increase in downloads by 15 percent, and a ratings increase
from 4.1 stars to 4.4 stars. Management decides to invest more in technology training
for new programmers and to continue to roll out the remaining revised sports games,
then evaluate.
The company would then move on to the systems and technology component.

4 Database Management Systems (DBMS)

A database management system is a software package designed to simplify the creation, access, manipulation, and management of data stored in a database. It allows users to turn
raw data into usable information through organization, storage, and identification of the data's
relationships to other data.

4.1 Raw Data vs. Usable Information


Raw data consists of inputs that are unprocessed. In its raw state, data has little meaning.
Usable information, on the other hand, is the result of processing raw data to unlock its value by
organizing it in a way that reveals patterns, enables forecasts, and uses statistical modeling to
draw inferences.


Illustration 4 Raw Data vs. Information

In preparation for a marathon, a runner begins training by running 3 miles 3 times a week
and 10 miles at the end of the week. However, she is concerned that she is not reaching the
full distance to meet her goal. By wearing a smart watch, she captures the unrecorded raw
data she is already generating—the distance of each run. The watch's software application
captures the raw data and converts it into usable information that she can actually
measure and track over time.

4.2 Database
A database is a shared, integrated computer structure that has three technical components and
two human components:
Metadata: Data about data including its characteristics (text, numeric, date, etc.), and
relationships with other data points.
Data Repository: The structure in which data points are actually stored, which is governed
by metadata.
DBMS: A collection of programs to manage the data structure by controlling access,
executing commands, and otherwise manipulating data stored in the repository.
Database Administrators: The managers of the database and DBMS, orchestrating the
design, access, restrictions, and overall use of the database.
End Users: Those accessing the data repository for business use. Depending on the design
of the overall database structure, users may be allowed to change data points within the
repository.
Database structure overview:

(Diagram: end users and the database administrator interact with the DBMS, which uses metadata to manage the data repository.)


Pass Key

Think about a database as a well-organized electronic filing cabinet. The DBMS is powerful
software that helps manage the content like a librarian (database administrator) for patrons
of the library (end users). The repository is the set of documents stored in the filing cabinet, and metadata is the rulebook for how the data should be managed and integrated.

4.3 Relational Database LOS 1F1f

A relational database is a form of database that structures data in a way that allows sections of data to be stored separately from other sections, typically referred to as tables or views, while remaining relationally intact with each other. This relational aspect maintains the data's integrity, permitting attributes to be changed in one table while preserving its relationships with other tables. Each table has a primary key, a field that uniquely identifies each record and can be tied to another table. Attributes are other fields within a table that can be unique or duplicates, but typically describe characteristics of the primary key (e.g., a table with SKU, or stock keeping unit, as the primary key may have a product category of frozen foods as an attribute). Tables can also have a foreign key, which refers to a primary key in another table. Foreign keys can have duplicates, but primary keys do not.
Earlier forms of databases mostly consisted of groups of files called flat files that were stored
as plain text. Each line in the text file held one record with delimiters (i.e., commas or tabs) to
separate attributes associated with the record. The relational database model was created to
mitigate data redundancy problems commonly seen in flat file databases by logically organizing
data into two-dimensional tables (rows and columns). Each row in the table is considered a
"record." The columns carry attributes of the record and/or links to records in other tables.
The following chart illustrates a basic relational database structure.

Orders table: OrderID (Primary Key), OrderDate, SKU (Foreign Key), PaymentForm, CityID, WarehouseID (Foreign Key), AccountNumb (Foreign Key)

Products table: SKU (Primary Key), ProductCategory, WarehouseID (Foreign Key), Sale

Customer table: AccountNumb (Primary Key), Address, DateofBirth, PhoneNumber, CreditLine

Shipments table: WarehouseID (Primary Key), StateID, ShipmentDate, AddressShipping, AccountNumb (Foreign Key)
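
As a brief, hypothetical sketch of these key relationships, the following Python example uses the standard-library sqlite3 module; only the Orders and Products tables are shown, and the sample values are illustrative rather than taken from the chart.

import sqlite3

conn = sqlite3.connect(":memory:")          # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")    # enforce the key relationship

# Products: SKU is the primary key (unique per product).
conn.execute("CREATE TABLE Products (SKU TEXT PRIMARY KEY, ProductCategory TEXT)")

# Orders: OrderID is the primary key; SKU is a foreign key into Products.
conn.execute("""CREATE TABLE Orders (
    OrderID   INTEGER PRIMARY KEY,
    OrderDate TEXT,
    SKU       TEXT REFERENCES Products(SKU))""")

conn.execute("INSERT INTO Products VALUES ('FRZ-001', 'Frozen Foods')")
conn.executemany("INSERT INTO Orders (OrderDate, SKU) VALUES (?, ?)",
                 [("2024-01-05", "FRZ-001"), ("2024-01-06", "FRZ-001")])  # FK values may repeat

# Join the tables through the SKU relationship.
print(conn.execute("""SELECT o.OrderID, o.OrderDate, p.ProductCategory
                      FROM Orders o JOIN Products p ON o.SKU = p.SKU""").fetchall())

The foreign key lets many order records reference a single product record without repeating the product's attributes, which is the redundancy reduction the relational model provides.
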


Illustration 5 Basic Relational Database

An academic office collects the class grades of all sophomores using a flat file. The flat file
has attributes such as a student's first name, last name, registered classes, and grades
for each class, as illustrated in the following table. Kevin Jones signed up for five classes—
Accounting 101, Economics 101, Ethics 101, Finance 101, and Marketing 101. The flat file
would replicate Kevin Jones' first and last name five times, once for each of Kevin Jones'
registered classes.

First Last Class Name Credit Hours Grade


Jane Pearson Accounting 101 4 A
Zac Smith Accounting 101 4 A
Kevin Jones Accounting 101 4 B
Randall Brown Accounting 101 4 A
Zac Smith Economics 101 3 A
Kevin Jones Economics 101 3 C
Jane Pearson Ethics 101 2 B
Liz Williams Ethics 101 2 A
Kevin Jones Ethics 101 2 A
Randall Brown Ethics 101 2 A
Kevin Jones Finance 101 3 B
Jane Pearson Marketing 101 3 B
Zac Smith Marketing 101 3 A
Kevin Jones Marketing 101 3 B

If a relational database is used, the data redundancy in the flat file scenario is eliminated. In
the relational data model, information is broken down into three tables. The top left table
contains the roster of the students in their sophomore year. The top right table contains
the list of classes available to the sophomores. Each table has its own primary key (i.e.,
student ID and class ID). The bottom table is a summary of grades based on the enrollment
record. This table contains both a primary key (Enrollment) and two foreign keys—Student
ID and Class ID. The stored attributes of students, such as first and last names, are not
repeated, reducing the redundancies, data inconsistencies, and data input errors.


Student table (Primary Key: Student ID; Attributes: First, Last)
A1 Jane Pearson
A2 Zac Smith
A3 Liz Williams
A4 Kevin Jones
A5 Randall Brown

Class table (Primary Key: Class ID; Attributes: Class Name, Credit Hours)
1001 Economics 101 3
1002 Accounting 101 4
1003 Marketing 101 3
1004 Finance 101 3
1005 Ethics 101 2

Enrollment table (Primary Key: Enrollment; Foreign Keys: Student ID, Class ID; Attribute: Grade)

Enrollment Student ID Class ID Grade
1 A1 1002 A
2 A1 1003 B
3 A1 1005 B
4 A2 1001 A
5 A2 1002 A
6 A2 1003 A
7 A3 1005 A
8 A4 1001 C
9 A4 1002 B
10 A4 1003 B
11 A4 1004 B
12 A4 1005 A
13 A5 1002 A
14 A5 1005 A

5 Data Warehouse and Data Mart LOS 1F1g

A data warehouse stores and aggregates data in a format designed to provide subject-oriented
or business-unit-focused information for decision support. Data stored in the warehouse is often
uploaded from operational systems (e.g., ERP systems), keeping data analysis separate from
the transaction system that runs the business. When a data warehouse is further organized
by departments or functions, each department or function is often referred to as a data mart.
Alternatively, separate smaller sections of the data warehouse can be extracted to form data
marts that can be used for varying needs by different parts of the organization.


Data warehousing is a process that involves several steps:


1. Extract Data: From operational systems or ERP systems, from applications on smartphones
and laptops accessing the Web, as well as from hard copy (paper) data. It is important to
ensure that the data extracted from these systems are complete.
2. Cleanse Extracted Data: Remove redundancy (e.g., deduplication) and unidentified items in
the data and ensure data format consistency.
3. Normalize Data for the Data Warehouse: Further examine for redundancy issues and
ensure imported data characteristics are aligned with existing data items.
4. Transform Data Into the Warehouse Model: Establish key relationships across tables and
reconcile data names and values between ERP and the data warehouse.
5. Load Data Into the Data Warehouse Database: Ensure data is fully loaded into the data
warehouse and address any errors or exceptions from loading.
6. Test Updated Data Mart: Perform checks on the newly refreshed data mart against the
source systems (ERP or operational system) to ensure accuracy.
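
The following minimal Python sketch walks through the same steps on made-up data; the records, field names, and checks are hypothetical and only illustrate the sequence.

# 1. Extract: rows pulled from an operational or ERP source (hypothetical data).
raw_sales = [
    {"order_id": "1001", "amount": "250.00", "region": "east "},
    {"order_id": "1001", "amount": "250.00", "region": "east "},   # duplicate row
    {"order_id": "1002", "amount": "N/A",    "region": "West"},    # unidentified amount
    {"order_id": "1003", "amount": "75.50",  "region": "west"},
]

# 2-4. Cleanse, normalize, and transform into a simple warehouse model.
seen, warehouse = set(), {}
for row in raw_sales:
    if row["order_id"] in seen or row["amount"] in ("", "N/A"):
        continue                                  # deduplicate and drop bad values
    seen.add(row["order_id"])
    region = row["region"].strip().title()        # consistent format
    warehouse.setdefault(region, []).append(
        {"order_id": row["order_id"], "amount": float(row["amount"])}
    )

# 5-6. "Load" is the dictionary update above; test the refreshed data mart.
loaded = [r for rows in warehouse.values() for r in rows]
assert len(loaded) == len({r["order_id"] for r in loaded})          # no duplicates
assert all(isinstance(r["amount"], float) for r in loaded)          # consistent types
print(warehouse)
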
Below is a diagram that shows how raw data is aggregated and stored in the warehouse and
further organized into a data mart for use. Users consume the data by mining, analyzing, and
reporting information for decision support.

(Diagram: data flows from the data source through extraction, cleansing and normalization, and transform-and-load steps into the warehouse, which is organized into marketing, purchasing, and sales data marts that are tested and then used by end users.)


Question 1 MCQ-12692

The following are all functions that an accounting information system (AIS) performs, except:
a. Collection and storage of transaction data.
b. Aggregate data for financial managers to plan and take actions.
c. Reporting for regulatory bodies and external entities.
d. Facilitation of senior executive's decision-making and information needs.

Question 2 MCQ-12693

Wilson and Co. is evaluating enterprise resource planning (ERP) systems. Which of the
following is not a benefit of an ERP system that Wilson should consider when making its
selection?
a. ERPs combine financial data with operational data to provide timely and actionable
information.
b. ERPs use multiple databases to reduce reliance on a single database architecture.
c. ERPs allow cross-functional information sharing.
d. ERPs improve the ability to track and measure sales, costs, delivery times,
customer service performance, and other corporate activities.

Question 3 MCQ-12695

Which of the following statements most accurately describes an enterprise performance management (EPM) system?
a. It has an emphasis on daily operations.
b. It facilitates managers focused on short-term improvements.
c. It is an accounting system used to record transactions, monitor performance, and
create financial reports.
d. It is a system for managing all of a company's operations by combining operational
and financial data to form strategies, execute plans, and report results.

MODULE 2
F.2. Data Governance
Part 1, Unit 6

This module covers the following content from the IMA Learning Outcome Statements.

CMA LOS Reference: Part 1—Section F.2. Data Governance

The candidate should be able to:


a. define data governance; i.e., managing the availability, usability, integrity, and security
of data
b. demonstrate a general understanding of data governance frameworks, COSO's Internal
Control framework and ISACA's COBIT (Control Objectives for Information and Related
Technologies)
c. identify the stages of the data life cycle; i.e., data capture, data maintenance, data
synthesis, data usage, data analytics, data publication, data archival, and data purging
d. demonstrate an understanding of data preprocessing and the steps to convert data
for further analysis, including data consolidation, data cleaning (cleansing), data
transformation, and data reduction
e. discuss the importance of having a documented record retention (or records
management) policy
f. identify and explain controls and tools to detect and thwart cyberattacks, such as
penetration and vulnerability testing, biometrics, advanced firewalls, and access controls

1 Data Governance LOS 1F2a

Data governance focuses on the effective management of data availability, integrity, usability,
and security through the synchronization of resources, such as people and technology, with the
policies and processes necessary to achieve data governance goals. Although no standard data
governance model applies to all organizations, multiple data governance frameworks exist to
help organizations create tailored models using standards as a guide. In general, a strong data
governance model will have practices and policies with the following components:
Availability: Information is of little benefit to an organization if it is not available to the right
employees at the right time. While security may be a high priority, information must not be
secured in a way that creates unnecessary hurdles for those who need it.
Architecture: Job roles and IT applications should be designed to enable the fulfillment of
governance objectives.
Metadata: Data describing other data, known as metadata or data dictionaries, must be
robust in terms of its breadth and specificity. Vague or incomplete metadata may result
in misuse.


Policy: Data governance policies help companies translate management and governance
objectives into practice.
Quality: Data integrity and quality are crucial and include ensuring that basic standards are
met so that there are no anomalies, such as missing values, duplicate values, transposed
values (phone numbers in the address field), or mismatched records (e.g., John Doe's
address is listed as John Smith's).
Regulatory Compliance and Privacy: Information collected, used, and stored by an organization that is considered personally identifiable information (PII) or personal health information (PHI), or that is otherwise subject to regulatory constraint, should be subject to policies designed to ensure that the use of the data does not violate company policies or privacy laws (e.g., the California Consumer Privacy Act, CCPA; the General Data Protection Regulation, GDPR; or the Health Insurance Portability and Accountability Act, HIPAA).
Security: Data governance strategy should include the secure preservation, storage, and
transmission of data.

2 Data Governance Frameworks: COSO Internal Control—Integrated Framework LOS 1F2b

The Committee of Sponsoring Organizations (COSO) has developed guidance and frameworks
covering the areas of internal control, risk management, and fraud deterrence. Within its five-component Internal Control—Integrated Framework (the framework), there are two categories with principles that pertain specifically to internal control over information technology.

2.1 Control Activities


Principle 11 of the framework states that there should be general controls over technology
in order to achieve organizational objectives. To establish these controls, the company
must understand the dependency between general controls over technology and the use of
technology in business processes. It must also establish controls over relevant technology infrastructure, security management, and technology acquisition and maintenance processes.

2.2 Information and Communication


Principle 13 of the framework states that organizations should acquire, create, and use quality
information in order to support internal controls. This includes identifying the company's
information needs, capturing both external and internal sources of data, processing relevant
data into useful information, and maintaining quality when processing that data. The cost of
performing these tasks should be compared with their benefits.
Principle 14 states that effective communication of information is necessary to support internal
controls. This means communicating internal information to the proper stakeholders, including
the board of directors; providing communication lines that are separate from those directly to
management; and selecting relevant methods of communication.

Material from Internal Control—Integrated Framework, © 2013 Committee of Sponsoring Organizations of the Treadway
Commission (COSO). Used with permission.


Illustration 1 COSO Principles

Spinal Surgery Clinic (SSC) P.A., a large group of physicians focusing on spinal surgery,
recently had an outside firm perform an IT audit as recommended by SSC's board of
directors. The findings resulted in recommendations that followed the COSO Internal
Control—Integrated Framework principles 11, 13, and 14. As such, SSC invested in new
technology that required user identities to be verified by multiple points of validation
other than just a password in order to access patient accounts (in line with principle 11).
Additionally, SSC adopted a state-of-the-art data cleansing system in an effort to acquire
and use error-free data to enhance patient outcomes, which aligned with principle 13.
Lastly, to address principle 14, SSC began performing regular reviews of key IT functions
and started issuing monthly reports of internal control to the board of directors.

3 Data Governance Frameworks: ISACA's Control Objectives for Information and Related Technology (COBIT) Framework

The Information Systems Audit and Control Association (ISACA) is a not-for-profit organization
that formed to help companies and technology professionals manage, optimize, and protect
information technology (IT) assets. To accomplish this, ISACA created the Control Objectives
for Information and Related Technology (COBIT) framework, which provides a roadmap that
organizations can use to implement best practices for IT governance and management.

3.1 Governance Stakeholders


COBIT distinguishes between governance and management, recognizing them as two unique
disciplines that each exist for different reasons and require different sets of organizational
resources. Organizational governance is typically the responsibility of a company's board
of directors, consisting of a chairperson and focused organizational structures (e.g., audit
committee, executive committee, marketing committee). Management is responsible for
the daily planning and administration of company operations, generally consisting of a chief
executive officer (CEO), chief financial officer (CFO), chief operations officer (COO), and other
executive leaders. Management is selected and guided by the board of directors.
Stakeholders can be either internal or external, with the board of directors and management
considered internal. Other internal stakeholders include business managers, IT managers,
assurance providers, and risk managers. External stakeholders include regulators, investors,
business partners, and IT vendors, parties who are entitled to some information about compliance
and risk mitigation but are not entitled to the same information given to internal stakeholders.

3.2 COBIT Overview


ISACA created COBIT as more than just a set of principles. This system is a collection of
frameworks, standards, regulations, and publications. ISACA refers to the most recent iteration
of this collection as COBIT® 2019 (prior to COBIT® 2019, COBIT® 5 was the predecessor). The
underlying body of knowledge and framework of COBIT® 2019 is documented in four core
publications that cover introductory concepts, governance and management objectives, and
how to design and implement a governance solution.

Material from COBIT®, © 2019 ISACA. All rights reserved. Used with permission.


3.2.1 Governance System and Governance Framework


As part of its foundation, COBIT® 2019 was developed using two sets of principles, one for
describing an IT governance system and another for explaining an IT governance framework.
There are six principles outlined for a governance system, as shown below.
Provide Stakeholder Value: Governance systems should create value for the company's
stakeholders by balancing benefits, risks, and resources. This should be accomplished
through a well-designed governance system with an actionable strategy.
Holistic Approach: Governance systems for IT can comprise diverse components,
collectively providing a holistic model.
Dynamic Governance System: When a change in one governance system occurs, the
impact on all others should be considered so that the system continues to meet the
demands of the organization. This means having a system that is dynamic enough that it
can continue to be relevant while adjusting as new challenges arise.
Governance Distinct From Management: Management activities and governance systems
should be clearly distinguished from each other because they have different functions.
Tailored to Enterprise Needs: Governance models should be customized to each individual
company, using design factors to prioritize and tailor the system.
End-to-End Governance System: More than just the IT function should be considered in a
governance system. All processes in the organization involving information and technology
should be factored into an end-to-end approach.
In addition to the governance system principles, there are three principles for the governance
framework, as follows.
Based on Conceptual Model: Governance frameworks should identify key components
as well as the relationships between those components in order to provide for greater
automation and to maximize consistency.
Open and Flexible: Frameworks should have the ability to change, adding relevant content
and removing irrelevant content, while keeping consistency and integrity.
Aligned to Major Standards: Frameworks should align with regulations, frameworks, and
standards.

3.2.2 COBIT Core


The COBIT Core is the primary model used to achieve a company's IT management and governance
objectives. Within this model, there are 40 governance and management objectives, each associated
with processes and components to help achieve those objectives. These 40 objectives are grouped
into five domains. Governance objectives are grouped in one domain and management objectives
are grouped in the remaining four domains. Each objective relates to at least one process as well as
other components that enable the company to achieve the objective.
Governance objectives are grouped into one domain:
Evaluate, Direct, Monitor (EDM): Those charged with governance (board of directors)
evaluate strategic objectives, direct management to achieve those objectives, and monitor
whether objectives are being met. There are five objectives within this domain: ensuring benefits delivery, governance framework setting, risk optimization, resource optimization, and stakeholder engagement.


Management objectives have four domains:


Align, Plan, Organize (APO): Focuses on information technology's overall strategy, organization,
and supporting activities. There are 14 objectives within this domain. Managed data is one of the most significant, as it ensures that critical data assets are utilized to achieve company goals. The other objectives provide guidance on IT infrastructure and architecture, innovation,
budgeting, human resources, vendors, quality, security, and managing risk.
Build, Acquire, Implement (BAI): Implementation of information technology's solutions in
the organization's business processes. This domain has 11 objectives, offering guidance on
requirements definitions, identifying solutions, managing capacity, dealing with organizational
and IT change, managing knowledge, administration of assets, and managing configuration.
Deliver, Service, Support (DSS): The security, delivery, and support of IT services. The six
objectives in this domain cover managed operations, service requests, managed problems,
continuity, security services, and business process controls.
Monitor, Evaluate, Assess (MEA): Addresses information technology's conformance to the
company's performance targets and control objectives along with external requirements.
There are four objectives in this domain covering managed performance and conformance
monitoring, managed system of internal control, compliance with external requirements,
and managed assurance.

3.2.3 Design Factors


COBIT design factors influence the design of a company's IT governance system, with a total of
11 factors that should be considered:
1. Enterprise Strategy: IT governance strategies generally include a primary strategy
and a secondary strategy. Examples include growth/acquisition strategies, innovation/
differentiation strategies, cost leadership strategies, and client service/stability strategies.
2. Enterprise Goals: Goals support the strategy and are structured based on the balanced
scorecard dimensions, which are financial, customer, internal, and growth.
3. Risk Profile: The risk profile addresses current risk exposure for the organization and maps
out which risks exceed the organization's risk appetite. These risks include IT operational
incidents, software adoption and usage problems, noncompliance, technology-based
innovation, and geopolitical issues.
4. Information and Technology (I&T or IT) Issues: Common issues include regular IT audit
findings of poor IT quality or control, insufficient IT resources, frustration between IT and
different departments, hidden IT spending, problems with data quality, and noncompliance
with applicable regulations.
5. Threat Landscape: The threat landscape is the environment in which the company
operates. The threat landscape may be classified as normal or high as a result of geopolitical
threats or issues, the industry sector, or economic issues.
6. Compliance Requirements: Compliance demands on the company can be classified as
low, normal, and high. The classifications are intuitive, with low requirements implying
minimal compliance demands, normal compliance indicating that the organization is typical
of its industry, and high requirements meaning that the company is subject to higher-than-
average compliance requirements.
7. Role of IT: IT can be categorized as:
Support—an IT system that is not critical for operating a business or maintaining continuity.
Factory—an IT system that will have an immediate impact on business operations and continuity if it fails.


Turnaround—an IT system that drives innovation for the business but is not required for critical business operations.
Strategic—an IT system that is crucial for both innovation and business operations.
8. Sourcing Model for IT: Sourcing is the type of IT procurement model the company adopts,
ranging from outsourcing, to cloud-based (Web-based), built in-house, or a hybrid of any of
these sources.
9. IT Implementation Methods: The methods that can be used to implement new IT projects
include the Agile development method, the DevOps method, the traditional (waterfall)
method, or a hybrid of these.
10. Technology Adoption Strategy: IT adoption falls into three categories:
First mover strategy—emerging technologies adopted as soon as possible to gain an edge.
Follower strategy—emerging technologies are adopted after they are proven.
Slow adopter—very late to adopt new technologies.
11. Enterprise Size: Two enterprise sizes are defined—large companies with a total full-time
employee count of more than 250 (default), and small and medium companies with 50 to
250 full-time employees.

3.2.4 Focus Areas


Focus areas are different types of governance issues, domains, or topics that can be solved
by a combination of management and governance objectives, along with their underlying
components. The COBIT materials provide some examples, such as cybersecurity, cloud
computing, and digital transformation, but state that the number is unlimited.

4 Data Life Cycle LOS 1F2c, 1F2d

There are eight distinct steps in the life cycle of every data point. All policies regarding data
quality should address data across all stages within the life cycle.
Organizations should map the flow of all data from capture to purging for critical business
processes, taking inventory of information assets, their value, and their availability to employees.

Data Life Cycle: (1) capture, (2) maintenance, (3) synthesis, (4) usage, (5) analytics, (6) publication, (7) archiving, (8) purging.


4.1 Data Capture


Data is extracted, created, received, or procured. Data may be passively collected (no user
consent) or actively collected (user consent), captured by a human, or captured by technology.
Actively Captured: Data capture with presumed consent includes application information sent by
job candidates, interviews or mortgage applications for lending, survey information, or information
gathered through some form in which a person expressly consents for it to be captured.
Passively Captured: May include a browser collecting Internet cookies while a user is
visiting a website; a smartphone collecting location data while using an application; or a
person at a public event taking attendance.
Technologically Captured: This is information captured by machines such as keystrokes,
number of clicks on advertisements on a website, number of visits to a website, the types
of shows watched on television, or the number of vehicles passing through a toll with an
electronic counter.
Personally Captured: Information collected by another person in real time through the use
of smartphones, video chat, e-mail, or live discussion.

4.2 Data Maintenance


Data maintenance is the initial processing, moving, cleaning, and preparation of data for synthesis or further processing. This is where the typical extract-transform-load (ETL) process would take place: capturing data from its source and transforming it into a form that can be loaded into a company's receiving system.
Data maintenance must occur before any value can be extracted from the data and involves
using tools and resources that support the long-term stability of data assets. Periodic audits
should be performed to ensure that the data maintains its integrity throughout the data life
cycle. Frequent backups should occur so the last copy backed up is recent enough that newly
corrupted data will not jeopardize the company's objectives. Maintenance may require a period
of downtime or inaccessibility for periodic servicing, which should be communicated to users
before the downtime occurs.

4.2.1 Data Cleansing


Data cleansing falls within the maintenance phase of the data life cycle. Data cleansing involves
deduplication (removing duplicate entries) and the removal of inaccurate data, outliers, and
missing data fields, making the data more uniform in content. Cleansing also involves validating
data with its original point of capture to verify that no corruption occurred in the transfer.
Cleansing structurally aligns all data points, converting nonconforming pieces of data into a
like form (e.g., removing additional spaces within a field or converting them to a format that is
machine readable by a specific program).
Cleansing impacts all stakeholders in a business, including vendors, suppliers, customers, and
employees. These processes safeguard information assets and help those using the data make
decisions with the most accurate information possible.
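
As a small, hypothetical sketch of these cleansing tasks using the pandas library (the records and field names are illustrative):

import pandas as pd

# Hypothetical raw capture with a duplicate, a missing value, and stray spaces.
raw = pd.DataFrame({
    "customer": ["John Doe", "John Doe", " Jane Roe ", None],
    "phone":    ["555-0100", "555-0100", "555-0101", "555-0102"],
})

cleansed = (
    raw.drop_duplicates()                  # deduplication
       .dropna(subset=["customer"])        # drop records missing a key field
       .assign(customer=lambda df: df["customer"].str.strip())  # remove extra spaces
)
print(cleansed)
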

4.3 Data Synthesis


This refers to the modification and enhancement of data through preprocessing, consolidation,
and transformation so that it has more suitability for planned analytics. This step enables
cleansed data to serve as compatible inputs for analytics applications, the output of which can
have broad uses spanning multiple industries. Synthesized data can be used to:
Predict consumer buying patterns
Estimate sales figures for the coming quarter
Forecast consumer saving patterns


Understand relationships between products


Segment groups of customers into categories for targeted advertising
Evaluate fluctuations in costs and the variables driving costs
Data preprocessing falls within the synthesis phase of the data life cycle. It involves converting
data from its point of capture into a more usable state through consolidation, transformation,
and data reduction.

4.3.1 Consolidation
Once similar data points are captured, they must be aggregated into a single collection or file for
further processing. Many organizations have different silos (e.g., marketing, finance, sales) that
collect information in a separate system. Consolidation is the process of combining these separate
data points to obtain an aggregate view, such as a single view of all data related to a customer.
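A minimal sketch of consolidation, assuming Python and pandas, appears below. The departmental extracts and column names are hypothetical; the point is simply that separate silo data is merged into a single view of the customer.

import pandas as pd

# Hypothetical extracts from separate departmental silos.
sales     = pd.DataFrame({"customer_id": [1, 2], "last_order_total": [250.00, 90.50]})
marketing = pd.DataFrame({"customer_id": [1, 2], "email_opt_in": [True, False]})
finance   = pd.DataFrame({"customer_id": [1, 2], "outstanding_balance": [0.00, 45.00]})

# Consolidation: combine the separate data points into one aggregate customer view.
customer_view = (
    sales
    .merge(marketing, on="customer_id", how="outer")
    .merge(finance, on="customer_id", how="outer")
)
print(customer_view)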

4.3.2 Transformation
Transforming data involves taking data in its raw but clean form and converting it into
information that gives more insight and meaning. Transformation can be achieved through
appending more data, applying mathematical or statistical models, stripping the data to simplify
it, or through data visualization. Third-party data purchased or collected from free sources is a
common way to enhance or transform data.

Illustration 2 Transforming Data for Use

A large data-marketing firm collects information from paid surveys of consumers and from
free public sources of information, such as phone books and certain real estate registries.
In total, the company collects six points of personally identifiable information (PII) and over
100 data points on purchasing behavior. After capture and consolidation, the company
strips out Social Security numbers and home addresses (PII that should not be widely
shared) and then provides the data to the marketing department to use for targeted e-mail
campaigns based on consumer profiles.

Data visualization is a type of transformation that involves depicting data in a graphical,
pictorial, or artistic form. Simplistic bar charts and line graphs are common, but more advanced
techniques can be applied, such as the use of gradient color, a chart with a double axis, or
superimposing multiple graphics.
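As a simple example of the double-axis technique mentioned above, the following Python sketch (using matplotlib) plots two differently scaled series on one chart. The monthly figures are made-up values used only for illustration.

import matplotlib.pyplot as plt

# Hypothetical monthly figures for a double-axis visualization.
months   = ["Jan", "Feb", "Mar", "Apr"]
units    = [120, 135, 150, 170]          # units sold
ad_spend = [10.0, 11.5, 12.0, 14.5]      # advertising dollars (thousands)

fig, ax_left = plt.subplots()
ax_left.bar(months, units, color="steelblue")
ax_left.set_ylabel("Units sold")

ax_right = ax_left.twinx()               # second axis for the differently scaled series
ax_right.plot(months, ad_spend, color="darkorange", marker="o")
ax_right.set_ylabel("Ad spend ($000)")

plt.title("Units sold vs. advertising spend")
plt.show()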

4.3.3 Data Reduction


Data reduction, sometimes referred to as compression or dimension reduction, is the process of
eliminating variables based on an assessment of each variable's individual value to the analysis.
Reduction can also be performed on data models, such as predictive models used for forecasting,
that contain multiple variables. The following tests can be used to help determine which variables
should be reduced or eliminated:
Variables captured with more precision than is relevant for the analysis may be recoded,
either by directly reducing numerical precision or recasting the variable as an ordinal scale
(e.g., low, middle, high).
If a repetitive data capture (hourly, daily, weekly, etc.) contains 80 percent identical data for
each repetition, then the identical data can be removed from each capture and referenced,
while the 20 percent changing data is retained in place.


Variables missing from too many records should be excluded, as should records with
too many missing variables. Incomplete variable sets can affect the accuracy of a model,
potentially making it seem as if certain variables are predictive in nature when they are not.
Backward variable elimination starts by including all variables in an equation, separately
testing the impact of removing each variable and deleting the variables that do not
contribute to the fit of the regression equation. This process is repeated until the model's
minimum acceptable performance is reached.
Forward variable selection adds one variable at a time, choosing the variable with the next
highest incremental change in performance. Forward selection may be more efficient in
terms of time and required computing power as compared to backward variable elimination
because only new variables with positive incremental value are included in the model.
Backward variable elimination takes longer because the program must process large quantities of
data at the beginning of the process. (A minimal sketch of forward selection follows this list.)
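The sketch below illustrates forward variable selection in Python (using scikit-learn's LinearRegression for each fit). The data is randomly generated, the 0.01 stopping threshold is an arbitrary assumption, and the in-sample R-squared is used only to keep the example short; it is a sketch of the idea, not a production feature-selection routine.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Hypothetical data: five candidate variables, only the first two actually drive y.
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

selected, remaining, best_score = [], list(range(X.shape[1])), 0.0

while remaining:
    # Fit a model with each remaining variable added and keep the best performer.
    scores = {j: LinearRegression().fit(X[:, selected + [j]], y)
                                   .score(X[:, selected + [j]], y)
              for j in remaining}
    j_best = max(scores, key=scores.get)
    if scores[j_best] - best_score < 0.01:     # stop when the incremental value is negligible
        break
    selected.append(j_best)
    remaining.remove(j_best)
    best_score = scores[j_best]

print("Variables kept:", selected)             # expected: the two informative variables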

4.4 Data Analytics


Once the data has been validated, cleansed, and synthesized, it is ready to be used to answer
questions. Examples of questions that can be answered using data analytics include finding
the relationship between dollars invested in advertisement and dollars generated in sales,
determining whether customers are able to be grouped together to better target advertising
efforts, or estimating projections of future sales and cost of sales. Each question may use a
different data analytics technique, suited to specific business applications, as well as its own list
of precautions and limitations.
The data analytics steps may be repeated several times, with the output of one analytics process
being used as the input to another, sometimes with an additional synthesis step in between. Data
analytics are covered in more detail later in the course.

4.5 Data Usage


Personnel or technology activate the data by taking the output from the analytics phase and
employing it as needed. Data used in this phase benefits the organization in
some way, by either providing insights to support a business decision or selling the insight itself
to a customer. Reselling data may have legal and privacy implications; regulations prohibit the
sharing or selling of personal data in some circumstances.

4.6 Data Publication


Data is made known directly and indirectly in some form to internal and/or external users.
Common examples include manufacturers receiving sales data from wholesalers, financial
advisors sending newsletters to clients, or hospitalizations being reported in a health care
setting. Understanding the context of the data publication is important, as is having a plan
to respond if the data is later found to be incorrect. Sending data with errors in any of the
examples above could have a catastrophic effect on a company or its customers.


4.7 Data Archival


The storage method and availability of data is important in the archival phase of the cycle.
Reductions in the speed of access, or access latency (delay), can create bottlenecks in receiving data.
In large organizations it may take an extended time period to gain access to archived or restricted
data. Self-service applications are becoming more common, enabling quicker access to accurate
electronic information. A balance between time to access and data accuracy should be considered.

                        ACCURACY
TIME TO ACCESS     Low          Moderate      High
Instant            Dangerous    Suboptimal    Optimal
Days               Dangerous    Suboptimal    Suboptimal
Weeks              Dangerous    Suboptimal    Suboptimal

4.8 Data Purging


The last phase of the data life cycle, purging data, is removing data that no longer has any
organizational value or legal use. Sensitive data should be properly destroyed so that it cannot
be exploited. Even data that relates to past employees, former customers, or defunct accounts
may still put those people or companies at risk if accessed or utilized incorrectly or illegally.

Illustration 3 Importance of Data Purges

Welch and Co. is a behavioral health provider that has an IT department that builds
computers to save money. IT stores and reuses old computer shells and hard drives to
repurpose them after employees are terminated. Welch has no policy on purging data. In
a recent hacking attempt, the company's data center was breached, and the dormant hard
drive data was stolen. The personal information of terminated employees was extracted
from hard drives and found on the dark web for sale. Those users' accounts were then
used to attempt to execute a social engineering campaign, soliciting current employees to
divert funds to the hackers' private accounts.

LOS 1F2e 5 Archiving and Record Retention

Data is archived for organizational access and for regulatory requirements. Companies should
have a record retention and management policy that details the length of time each type of
document should be retained for internal, legal, or regulatory needs. Records should be kept of
the data's current form, its expected deletion date, and requirements for an audit trail.


5.1 Organizational Retention Needs


A company's retention requirements for its documents generally focus on records
that track performance, such as financial statements, performance reports, or forecasts. These
records help monitor and improve operational performance, and so are retained in accessible
states. Other records are kept to provide the company with supporting material for its defense
in regulatory actions or legal proceedings.

5.2 Regulatory Considerations


Regulatory agencies with record retention requirements include the Internal Revenue Service,
the U.S. Department of Labor, the U.S. Securities and Exchange Commission (for public
companies), state health departments, and local governmental entities. These entities may
request records related to financial reporting, payroll, timekeeping, disciplinary documentation,
occupational safety and health logs, property deeds, permits, licenses, and insurance policies.
Retention requirements vary by state and industry.

5.3 Archival Procedures


Data retention policies should include appropriate data backups. A periodic audit of archival
procedures and data integrity is common, with typical metrics of success as follows:
Number of failures of data backups or archive transmissions
Number of successful backups
Percent of successful fulfillments of legal and regulatory requests
Number of violations by regulatory agency
Cost of penalties and fines

6 Information Technology Protection and Security LOS 1F2f

Cybersecurity policies are a part of data governance models that focus on the protection of
information technology assets from outside threats or attacks.

6.1 National Institute of Standards and Technology (NIST)


Cybersecurity Framework
The National Institute of Standards and Technology (NIST) Cybersecurity Framework was
introduced by the U.S. Department of Commerce as a voluntary set of guidelines for businesses to
adopt in order to protect an organization's data, networks, and information technology assets.
The NIST framework consists of five areas of focus, or functions, to equip modern companies
with the tools needed for better cybersecurity protection. These components are not ordered
steps, but rather functions that should be performed concurrently.
These functions define the high-level framework and are further subdivided into categories,
subcategories, and informative references.
Categories: Tie outcomes to specific activities and company needs.
Subcategories: Divide categories into management and technical activities that help
achieve the category outcomes.
Informative References: Provide a method to achieve the subcategory outcomes through
citation of specific standards, practices or guidelines.


Cybersecurity Framework (figure): Identify, Protect, Detect, Respond, Recover

6.1.1 Identify
Stakeholders develop an understanding of the organization in order to manage all of its
personnel, systems, processes, software, hardware, and other data assets or devices that store,
transmit, and manipulate data.
Data and systems needing to be protected are specifically identified. Employee roles and
responsibilities relating to the systems handling that data are clearly identified. Vendors,
customers, and others with access to sensitive company data should also be identified and
carefully considered to understand where information is captured and exchanged. That
knowledge allows organizations to understand where vulnerabilities may exist in order to focus
efforts to mitigate risk.

6.1.2 Protect
Safeguards and access controls for networks, applications, and other devices should be deployed,
along with regular updates to security software; encryption for sensitive information;
data backups; plans for disposing of files or unused devices; and training for all employees who use
company computers with access to the network.
Cybersecurity policies should be developed, outlining the roles and responsibilities for all parties,
including employees, suppliers, distributors, business partners, and anyone else who may have
access to data that is sensitive. Policies should establish safeguards and steps to take to prevent
and respond to an attack. The goal for the aftermath of an attack is to minimize the damage.

6.1.3 Detect
Tools and resources are developed to detect active cybersecurity attacks, including monitoring
network access points, user devices, unauthorized personnel access, and high-risk employee
behavior or the use of high-risk devices.


Illustration 4 Cybersecurity Tools

Falcon CPAs and Associates is a large accounting and IT auditing firm that has several
clients for which it provides bookkeeping, tax work, and IT audit services. Falcon decided
to run a scan using its new NIST-based security software for a client. The application scans
various applications and devices, generating a report with findings.
The report came back with high-risk employee behavior and the use of high-risk devices
as potential red flags. The employee behavior included access to records on the weekends
and after business hours. The use of high-risk devices included excessive use of USB drives
that were being plugged into the network to transfer data. Both were related to a single
individual who was later determined to be stealing employee banking information from the
payroll department outside of normal working business hours.

Detection includes continuous monitoring, scanning for anomalies or predefined events, and
investigating atypical activity by computer programs or employees. Detection measures are
put in place to immediately flag suspicious activity. Detection is one of the strongest pillars of
protection against cybersecurity threats because the existence of strong detection measures
often acts as a deterrent. Detection measures are considered adequate if the response time of
the detection measures is less than the time it takes a skilled hacker to completely penetrate the
protective security measures.

Illustration 5 Detective Measures as a Deterrent

A locked door on an empty house is a preventive measure; if the criminal knows no one is
home, the criminal is free to break the lock and steal from the house. If the same locked
door has a webcam pointed at it to identify the thief and simultaneously alert the police,
the presence of the camera may actually deter the thief from breaking into the house. The
camera is not a preventive measure; it does nothing to stop a break-in. But because the
thief can see the detective measure put in place, the camera may deter the crime.

6.1.4 Respond
The ability to contain a cybersecurity event depends on immediately reacting to the attack
using planned responses and continuously improving responses as risks and threats evolve. In
addition to taking action to mitigate losses, all potentially affected parties, such as employees,
suppliers, and customers, should be notified of an attack.
Business continuity plans should be in place to distinguish malicious attacks from other events
such as hazardous weather. Operating with only core operations or remotely may be the best
alternative to shutting down completely. Plans should be tested regularly and evolve as the
cybersecurity landscape evolves.

6.1.5 Recover
The last function of the NIST framework focuses on supporting the restoration of a company's
network to normal operations through repairing equipment, restoring backed up files or
environments, and positioning employees to rebound with the right response. Recovery
activities include communication and continuous improvement. Stakeholders affected by
cybersecurity events should be kept informed regarding recovery efforts. A review of lessons
learned should be incorporated into this stage of the plan.


6.2 Security Controls


Because of the amount of information sharing and storage required in an organization, there
are many different points at which an organization is vulnerable. Employing access controls with
biometric verification, multifactor authentication, role-based control, or other types of enterprise
rule-based controls can help thwart cybersecurity attacks. Adding firewalls, coupled with more
advanced methods of preventing or detecting cybersecurity threats, like penetration and
vulnerability testing, provides a multitiered and more robust approach to security.

6.2.1 Vulnerability Scanning


Vulnerability scans are proactive measures that focus on offensive tactics rather than
passive, preventative approaches. Vulnerability scans monitor corporate networks for known
vulnerabilities and changes that are abnormal or indicative of potentially malicious activity.
Statistics collected may include a change in password reset attempts, changes in open ports
added, changes in frequency of IT tickets opened, or fluctuations in calls to the help desk.
Vulnerability scans can focus on different core pieces of an organization's IT infrastructure,
including the following:
Application-based scans involve scanning for known bugs, misconfigurations in applications
(such as those released by the manufacturer), or updated versions of applications that may
be reaching end-of-life support. Scans may be performed manually or be automated, and
the scans place a heavy emphasis on surveying all Web-facing programs because the Web is
a main point of entry to corporate systems.
Port-based scans monitor activity on communication endpoints, or ports, which includes
scanning the Internet for listings of the organization's open ports, analyzing any abnormal
series of failed attempts to log in to ports accessible from the Internet, or any other activity
that maps the perimeter of a company as defined by its exposed ports. Ports are docking
points that allow information to flow from one point to another, specifically from a
computer to the Internet. Ports that do not restrict public access are problematic because
they leave a company's network exposed to potential attackers. (A minimal port-check sketch follows this list.)
Network-based scans are scans of network activity occurring on an organization's servers,
operating systems, routers, switches, and other network environments or devices. Scans
should look for changes in the number of patches deployed and successfully completed, or
significant changes in traffic and usage.
Device-based scans evaluate activity at the device level (laptops, desktops, smartphones,
and tablets). Examples of meaningful scans include the number of active log-ins per device
per user, number of new devices added per user, and the number of failed attempts per
user, especially on mobile devices due to their ease of access.
Data storage and repository scans focus on data repositories and files that are stored
on hosted servers and accessible by multiple users, which could potentially be accessed by
nonemployees or by users with inappropriate access rights.
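The following Python sketch shows the basic idea behind the port-based check referenced above, probing a handful of well-known ports on a host the organization owns (here, the local machine). The host address and port list are illustrative assumptions; real vulnerability scanners are far more sophisticated.

import socket

HOST = "127.0.0.1"                    # scan only systems you own or are authorized to test
PORTS_TO_CHECK = [22, 80, 443, 3389]

for port in PORTS_TO_CHECK:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        is_open = s.connect_ex((HOST, port)) == 0   # 0 means the TCP connection succeeded
        print(f"Port {port}: {'open' if is_open else 'closed or filtered'}")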

6.2.2 Penetration Testing


Penetration testing, or "pen" testing, is another proactive approach. A pen test is an active
attempt by an organization or contracted IT provider to test the company's network, find its
weaknesses, and essentially hack into its own systems. Pen tests can be focused on specific
pieces of IT architecture, like ports and application programming interfaces (APIs), or more focused on
communication methods, including video calls, e-mail, text, or chat platforms.


Pen testing generally begins with a planning phase during which the organization and the role-
playing hacker (tester) establish goals and data to capture. This is followed by a phase when the
tester obtains an understanding of how an organization may respond to different attempts to
gain access. The tester may analyze code in a live state or in a test environment, then move to
the next phase, which involves gaining access by finding holes in the infrastructure assessed.
Once access is gained, the tester attempts to see how deeply the organization's systems can be
infiltrated, accessing data, user accounts, potentially financial accounts, and other sensitive data.
One outcome of breaches and identified weaknesses from pen testing is training for all
employees. Training is also often followed up with targeted coaching for individuals who failed
the pen testing or caused the mock breach.

6.2.3 Biometrics
The use of biometrics is another way to manage access controls and provide automated
recognition. Biometric technology uses an individual's personal physical attributes to securely
access protected information. Common applications include fingerprint scans, retinal scans,
and facial recognition software. Attributes are first loaded and stored in a database. Once the
biological input is captured it is then cross-checked against the reference database. If there is a
match, then access levels aligning with the matched record are granted.
Biometric applications are used in law enforcement and forensics, specifically in federal
government entities such as the Department of Defense, the Department of Justice, and the
Department of Homeland Security. Biometric applications provide identity assurance, help link
criminals to crimes, help track individuals across the globe, and can expedite identity verification.
The use of this technology continues to grow, as do regulations governing the technology. The
capture and use of biometric data raises privacy concerns among individuals and advocacy
groups across the globe.

6.2.4 Multifactor Authentication


A basic but proven way of enhancing access controls is through multifactor authentication (MFA).
MFA involves using a secondary device or application to validate one's identity in order to log in
to another device or account. Common methods of secondary identity authentication are short
message service (SMS) text messages, e-mails, and phone calls. Typically, in the initial account
creation, a user provides a secondary means of communication such as a cell phone, landline,
or e-mail account to use in conjunction with the primary means of access. An example would be
a company requiring employees to use an authenticator application on their smartphones that
would provide temporary codes that expire after 30 seconds. As part of the process to access
their domain, employees are required to enter their user password and type in the code on their
smartphone.
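The expiring smartphone codes described above are typically time-based one-time passwords (TOTP). The sketch below is a minimal Python implementation of the general TOTP idea (an HMAC of the current 30-second time step, truncated to six digits); the shared secret shown is a made-up example value, and production systems rely on vetted authenticator libraries rather than hand-rolled code.

import base64
import hashlib
import hmac
import struct
import time

def totp(base32_secret: str, interval: int = 30, digits: int = 6) -> str:
    """Compute a time-based one-time password from a shared secret."""
    key = base64.b32decode(base32_secret, casefold=True)
    counter = int(time.time()) // interval                  # current 30-second time step
    msg = struct.pack(">Q", counter)                        # 8-byte big-endian counter
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                              # dynamic truncation
    code = (int.from_bytes(digest[offset:offset + 4], "big") & 0x7FFFFFFF) % (10 ** digits)
    return str(code).zfill(digits)

# Hypothetical secret provisioned at enrollment (normally delivered via QR code).
print(totp("JBSWY3DPEHPK3PXP"))

The server computes the same code from its own copy of the secret; access is granted only when the user's password and the current code both match.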

6.2.5 Firewalls
Firewalls are software applications or hardware devices that protect a person or company's
network traffic by filtering it through security protocols with predefined rules. For companies,
these rules may be aligned with company policies and access guidelines. Firewalls are
intended to prevent unauthorized access into the organization and to prevent employees from
downloading malicious programs or accessing restricted sites.
Basic packet-filtering firewalls work by analyzing network traffic that is transmitted in packets
(units of communicated data) and determining whether the firewall is configured to accept that
data. If not, the firewall blocks the packet. Firewalls can be set to allow only trusted sources (IP
addresses) to transmit across the network; a minimal sketch of this filtering logic appears after the
list below. Other types of firewalls include:
Circuit-Level Gateways: Control traffic by verifying that the source of a packet meets rules and
policies set by the security team.


Stateful Multilayer Inspection Firewalls: Combine packet-filtering and network address
translation to control traffic.
Network Address Translation Firewalls: Assign an internal network address to specific,
approved external sources so that those sources are approved to be inside the firewall.
Next-Generation Firewalls: Can assign different firewall rules to different applications as
well as users. In this way, a low-threat application has more permissive rules assigned to it
while a high-security application may have a highly restrictive rules set assigned.
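Below is the minimal packet-filtering sketch referenced above, written in Python. The trusted addresses, blocked ports, and packet records are illustrative assumptions standing in for a real firewall's rule set.

# Predefined rules, analogous to a firewall configuration.
TRUSTED_SOURCES = {"203.0.113.10", "198.51.100.25"}   # only these IP addresses may transmit
BLOCKED_PORTS = {23, 3389}                            # restricted services (e.g., Telnet, remote desktop)

def filter_packet(packet: dict) -> str:
    """Return 'ACCEPT' or 'DROP' for a packet based on the configured rules."""
    if packet["src_ip"] not in TRUSTED_SOURCES:
        return "DROP"                                 # untrusted source
    if packet["dst_port"] in BLOCKED_PORTS:
        return "DROP"                                 # restricted destination port
    return "ACCEPT"

print(filter_packet({"src_ip": "203.0.113.10", "dst_port": 443}))   # ACCEPT
print(filter_packet({"src_ip": "192.0.2.99", "dst_port": 443}))     # DROP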

6.2.6 Access Controls


An access control is a form of physical and/or virtual security tool used to restrict access to
company resources, company information housed within an organization, and the network
on which it resides. Physical restriction of access can involve the use of badges or key fobs
preventing unauthorized entrance into facilities that contain sensitive records or systems with
access to private data. The following summarizes four types of access controls.
1. Role-based access controls give employees access according to their job role (see the sketch following this list).
2. Attribute-based access controls, also referred to as policy-based controls, grant access to
data and systems based on a policy or set of conditions, such as the time of day.
3. Discretionary access controls give access at the discretion of a system owner or
administrator based on a company's needs.
4. Mandatory access controls grant access through a clearance process managed by a central
authority. Mandatory access controls are common in government, the military, or health
care organizations.
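A minimal sketch of role-based access control in Python follows; the role names and permissions are hypothetical examples, not a specific product's configuration.

ROLE_PERMISSIONS = {
    "payroll_clerk": {"view_timesheets", "run_payroll"},
    "hr_manager":    {"view_timesheets", "edit_employee_records"},
    "auditor":       {"view_timesheets", "view_payroll_reports"},
}

def is_authorized(role: str, action: str) -> bool:
    """Grant access only if the employee's job role includes the requested action."""
    return action in ROLE_PERMISSIONS.get(role, set())

print(is_authorized("payroll_clerk", "run_payroll"))       # True
print(is_authorized("auditor", "edit_employee_records"))   # False

Attribute-based controls would extend this check with conditions such as the time of day, and mandatory controls would replace the lookup with a clearance level managed by a central authority.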

Question 1 MCQ-12712

A large company recently decided to implement new security practices in response to
recent findings from a cybersecurity firm it hired to determine any security weaknesses.
One of the recommendations was to start using software that tests the system, searching
for known weaknesses in security such as application-based or network-based scans. What
type of cybersecurity protection is this?
a. Penetration test
b. Vulnerability scan
c. Biometric scan
d. Access control


Question 2 MCQ-12713

Optimum Financial Planners publishes investment research for several different industries
and has a team of financial planners that advise hundreds of clients. It administers
quarterly surveys to determine investor expectations and trends that it then uses to give
to its planners so they can give investment advice. Optimum recently found an error in the
survey collection. In which phase of the data life cycle will this be addressed since the data
has already been released?
a. Data capture
b. Data synthesis
c. Data publication
d. Data archival

Question 3 MCQ-12714

Data preprocessing falls in which stage of the data life cycle?


a. Capture
b. Maintenance
c. Synthesis and analytics
d. Purging

Question 4 MCQ-12715

Which of the following is not a component within the COBIT® 2019 framework?
a. Publications
b. Design factors
c. Community input
d. Stakeholder validations

3
MODULE
PART 1 UNIT 6

F.3. Technology-Enabled
Finance Transformation
Part 1
Unit 6

This module covers the following content from the IMA Learning Outcome Statements.

CMA LOS Reference: Part 1—Section F.3. Technology-Enabled Finance


Transformation

The candidate should be able to:


a. define the systems development life cycle (SDLC), including systems analysis, conceptual
design, physical design, implementation and conversion, and operations and maintenance
b. explain the role of business process analysis in improving system performance
c. define robotic process automation (RPA) and its benefits
d. evaluate where technologies can improve efficiency and effectiveness of processing
accounting data and information (e.g., artificial intelligence [AI])
e. define cloud computing and describe how it can improve efficiency
f. define software as a service (SaaS) and explain its advantages and disadvantages
g. recognize potential applications of blockchain, distributed ledger, and smart contracts

1 System Development Life Cycle and Process Automation LOS 1F3a

The systems development life cycle (SDLC) is a framework that organizes tasks at each phase of
development and use of a business process.
The task of building automated business processes that include computer software, data
architecture, and computer hardware can be a tremendous undertaking. Major overhauls of
organizational systems as well as the creation of new systems for large enterprises can be very
complex, because there is often overlap and interaction among the web of company practices.
Each of these unique but intertwined systems may have budgets in the tens of millions of dollars
and encompass the work of thousands of people over the course of several years, adding
complexity to the design, maintenance, and improvement of all corporate systems.
There are two strategies for managing the SDLC in general use today. The first strategy is called
the traditional method or the waterfall model. The second method, called agile development,
evolved from the waterfall model.

1.1 The Waterfall Model


The waterfall model has distinct steps involving separate groups of employees, each performing
a functional specialization. Agreements are created to describe the planning and execution of
each step and when to pass the project to the next step. The waterfall model includes a large
amount of documentation for requirements, deadlines, and adjustments.


The waterfall model is characterized by different teams of employees performing separate tasks
in sequence, with each team beginning work from the pre-written authoritative agreement
of the preceding team and then ending work when the business requirements for the team
have been met. The project then passes to the next team. The following are some challenges
associated with the waterfall model:
Requires a great deal of time to complete.
Benefits of the new system are not realized until complete.
There is no customer input; change is difficult to manage.
Some employees may be idle before beginning or after completing their SDLC step.

Waterfall Model (figure): 1. Plan → 2. Analyze → 3. Design → 4. Develop → 5. Test → 6. Deploy → 7. Maintain

The number and names of phases in the waterfall model differ between companies. However,
they all contain the same general process:
1. Plan

Management evaluates the business needs of the system and determines whether it should
accept the project. Managers assess resources needed (i.e., personnel, finances, timeline)
to develop the system and compare it to the projected gains from the system. Management
then decides whether to begin the project.
2. Analyze
Management defines key business problems and company goals in the analyze phase,
and then identifies the steps and systems needed to achieve those goals. More specific
business requirements and sequential procedures are outlined in this step. Business
analysts determine problems that the system may face, gather requirements to solve those
problems, and develop business rules to arrive at a solution. The business requirements
may be formally documented in a business requirements document (BRD).

Pass Key

In some models, planning and analysis may be combined and called the requirements
phase. Less frequently, "development" is used for "plan and analyze" and "production" is
used for "develop." Regardless of the words used, planning what to build comes before
building it.


Illustration 1 Planning and Analysis

A company is considering developing an app to sell tickets to the upcoming Olympic games.
During the planning phase, management assesses the company's potential profit from the
app. If deemed profitable, management will submit the bid to the host country. Submission
of the bid ends the planning stage.
During the analysis phase, the business requirements document (BRD) is developed
and becomes the foundation for the development of the project. The BRD contains the
following specifications:
"Customers must be able to access the marketplace from their computer or mobile device
to see a real-time view of available offers and prices."
"Customers must be able to pay for tickets using local currency and reserve their selections
while payment is verified."
"The host country wants customers to be able to alter the price of unsold tickets daily
between the go-live date and two weeks prior to the ticketed event. Customers must also
be able to alter the price of unsold tickets hourly within two weeks of the event."

Beginning with this stage and for each subsequent stage, feasibility studies are conducted
to determine whether the project is adhering to the original plan. Feasibility studies may be
conducted as a single, focused study, or they may incorporate multiple elements, including:
Economic Feasibility: Are the benefits still greater than the costs?
Technical Feasibility: Are all required technology and expertise available?
Operational Feasibility: Will all internal and external customers accept the system?
Scheduling Feasibility: Will all project resources be available when needed?
Legal Feasibility: Can all tasks be performed without violating laws?

Illustration 2 Feasibility Studies

After the planning and analysis phases are complete and the host country finalizes the
requirements through execution of the contract, the company may reassess resource
needs to more accurately define costs for the solution. These first feasibility studies
become the baseline to compare against later progress.

3. Design
Creation of the technical implementation plan occurs as business requirements are
translated into technical design documents. Individual technologies are evaluated
and selected, including logical data organization, physical data storage architecture,
programming languages, integration with third-party services, and/or deployed hardware.
The design phase can be subdivided into three parts:
Conceptual Design: Broad translation of business requirements into technical requirements


Logical Design: Hardware and software specification
Physical Design: More granular platform and product specification

Illustration 3 Design Stage

During the design phase, each business requirement is further developed and expanded.
For example, the requirement that "Customers must be able to pay using local currency
and reserve their selections while payment is verified" is expanded to specify credit
cards accepted for payment, fees charged, and the timeline to complete the conceptual
design. Specification of data file formats for transmission to credit card vendors and data
warehousing systems are developed in the logical design phase. Physical design would
include any specialized hardware to comply with payment card industry standards, server
hardware, cloud-based hardware, and workstation software and hardware for developers
and programmers.

4. Develop
The technical implementation plan is executed in the develop step. Buildings and rooms
are prepared, hardware is purchased and delivered, and programmers create proprietary
software to run the company's new product. The new system is completely built at this
stage and most of the project budget is spent, having committed dollars to employ experts
and purchase assets. Changes to the plan become more expensive in this stage because
each step builds on the prior steps. For example, changes in the develop stage may not be
supported by the original architecture in the design stage or achieve feasibility as outlined in
the plan and analysis phases.
5. Test
The system is checked for adherence to the business requirements in this step. The
new product must function as planned in the analysis and design stages. In addition to
backward-looking testing, which tests against the initial requirements, forward-looking
testing is conducted to see how well employees and customers can perform tasks (called
user-acceptance testing).
6. Deploy
The new system is delivered to end users. There are several methods available for
deployment that depend on available time, cost, and the cost of failure to the business:
Plunge or Big Bang: The entire new system is immediately delivered to all customers
and clients (lowest cost, highest risk).
Ramped (Rolling, Phased) Conversion: Portions of the new system replace corresponding
parts of the old system, one piece at a time (above-average cost, below-average risk).
A/B Testing (Pilot, Canary): A subset of users receives the new system while the old
system remains in use for all other current and new customers. After successful
deployment to the subset of users, the new system is deployed to everyone (average
cost, average risk).
Blue/Green (or Other Pair of Colors), or Shadow: The new system is fully deployed in
parallel with the old system; a routing layer directs progressively more duplicated traffic
to the new system. Once the new system is handling all the traffic, the old system is
deactivated (highest cost, lowest risk).


Pass Key

Both the development and deployment phases may be called "implementation." If either
phase is named implementation, the key to identifying which phase is being discussed is to
determine when the testing phase occurs. An implementation phase that comes before
testing refers to development; an implementation phase that comes after testing refers
to deployment.

7. Maintain
Ongoing adjustments and improvements occur in the maintain stage, which begins as soon
as deployment is complete. Adaptations are made to the product to keep it operating at
an optimal level. Over time, the system becomes less well-suited to current conditions and
needs to be evaluated for either modification or replacement. When it is time to replace the
system, the SDLC repeats.

1.2 The Agile Framework


The Agile framework was created to address issues with the waterfall model. Agile is
characterized by cross-functional teams, each dedicated to particular functions or improvements
of a system drawn from a prioritized list of the customer's remaining needs for the system.
Frequent, short meetings are required, and features are kept small enough to be accomplished
by teams during each sprint (usually two weeks) before the team moves on to the next feature.
Communication between teams, within teams, and with customers is crucial in an Agile
environment as the priority list and project backlog constantly change.
The Agile process can be characterized using the following steps:

Dev
elo
pm
e
ng cept
Con design
Impleme
nta
tio
nt

nd
ni

g a n
lin
an

Te
du st
he i
Pl

Sc

ng
on

Do
ati

cu
ritiz

me
Prio

ntati
B acklo g

nts
ireme
on

qu Agile Software
E s tim a tio n

Re Development Cycle
a tio n
n s tr
mo
Rec

De
B u in g
fi x
g

w
ord

c k al
vie

ba v
Ad e d pro
an

e
re

ju s F ap
tme or
nc
d

n ts
er

or om Re
i

po t le a
rat C us se
e ch
anges


1.2.1 Core Values


Although the items on the left are valuable, Agile promotes the items on the right.

Processes and tools             are less valuable than    Individuals and interactions
Comprehensive documentation     are less valuable than    Working software
Contract negotiation            are less valuable than    Customer collaboration
Following a plan                are less valuable than    Responding to change

1.2.2 Agile Principles


1. Satisfy the Customer With Early and Continuous Delivery of the Highest-Priority
Features.
Through early prototyping, Agile shortens the time between discovering project
requirements and showing the customer an example of what will be received. The customer
is then able to correct any misinterpretations early in the process.
2. Welcome Change: A Change Request Is an Opportunity to Be Closer to the
Customer Needs.
The remaining project work is broken down into individual features (components) that
can be completed in a short amount of time, called a sprint. The time varies, but many
companies use a sprint cadence of two weeks. Receiving customer feedback every two
weeks, including any changes to the original plan, allows the development team to stop
losses from continued work toward undesired goals.
3. Deliver Working Software Frequently; Working Software Is the Primary Measure
of Progress.
The catalog of remaining work is called the product backlog, organized in the customer's
priority order. The highest priority features that can be accomplished in one sprint are
grouped into the sprint backlog. One sprint lasts an average of two weeks, so that the work
is delivered frequently for customer comment.
4. Complete Only the Work Requested by the Customer.
When a sprint ends, the new completed features are updated and shown to the customer
to receive feedback. If customer feedback results in product changes, the changes are
added to the backlog in priority order. Only work requested by the customer is approved,
performed, and documented.
5. Conduct Short, Frequent, and Regular Meetings to Maintain Focus and Make
Adjustments.
The entire SDLC sprint task list must be accomplished during a sprint by smaller, cross-
functional teams who meet regularly to plan, analyze, design, develop, test, and deploy
the feature(s) they are working on during each sprint. This fundamental change to systems
development work is the primary difficulty in adopting the Agile method.


2 Artificial Intelligence LOS 1F3d

Artificial intelligence (AI), machine learning, and deep learning are three similar terms, often
used interchangeably, for computer programs and algorithms built to simulate characteristics
of human logic and intelligence. AI applications focus on processing large quantities of data,
learning from trends in that data, and using that insight to automate decision-making processes.
Common AI applications relevant to accounting include inference engines, business process
automation, robotic process automation, natural language processing (NLP, also referred to as
speech recognition) software, and neural networks, among various others.

2.1 Inference Engines


An inference engine is a program that is built to automate a decision-making process. The
program is given data about an existing situation, and the answer for what decision the engine
should make in each case. The new decisions are made using fuzzy logic, the term for a decision
made without specific instructions given to make the decision. The program classifies the data
based on results of earlier decisions through comparison to similar situations. Additional data
sets with correct answers (called training data) are used to check the proportion of decisions the
program would have gotten right. The program includes instructions to revise its decision-making
process to score better against the training data. This repeated revision and improvement is called
machine learning.

Illustration 4 Inference Engines

A computer cannot "recognize" a stop sign. It would be difficult to program every detail
describing a stop sign and how the sign is different from its environment, along with all
environments a stop sign may be in. However, if a computer program is fed numerous
photographs with corresponding data on whether there is a stop sign, and if each picture
is stored in an accessible location, then it is easier to program a computer application
to recognize a stop sign in a new environment. Each new photograph is compared to
the catalog of existing photographs to determine if the new photograph is more like the
pictures with a stop sign or more like the pictures without a stop sign. In this way, the
inference engine "learns" what a stop sign looks like and can recognize stop signs with
greater accuracy as more pictures and data are reviewed by the engine.

Credit card companies and banks invest heavily in inference engines and computing resources
to run processes to detect fraud. Relying on AI rather than human analysts works better because
once fraudsters are aware of the clues that fraud analysts look for to detect fraud, the fraudsters
subtly alter their behavior. AI reacts more quickly to subtle changes, incorporating the new
behavior into its inference engine. This lets companies improve the accuracy and efficiency
of processing any data where difficult judgment calls are part of the process and adds a data-
driven insight to solve complex pattern-recognition problems.
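The sketch below shows the "learn from labeled history, then score new cases" pattern behind such engines, using Python and scikit-learn. The transaction features, labels, and model choice (logistic regression) are illustrative assumptions; production fraud systems use far richer data and models.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Historical transactions: [amount, hours since last purchase, miles from home]
X_train = np.array([
    [90.00,    5.0,   2],
    [120.23,  30.0,   1],
    [20.20,   12.0,   3],
    [2265.00,  1.0, 800],   # labeled fraudulent
    [45.10,   48.0,   4],
    [1890.00,  0.5, 650],   # labeled fraudulent
])
y_train = np.array([0, 0, 0, 1, 0, 1])   # 1 = fraudulent, 0 = valid

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Score a new transaction before it is approved.
new_txn = np.array([[2100.00, 0.2, 700]])
print(model.predict_proba(new_txn)[0, 1])   # estimated probability of fraud

As more labeled transactions are fed back into the training step, the model's accuracy improves, which is the machine learning loop described earlier.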


Illustration 5 Inference Engine Application

Credit card companies spend millions of dollars on fraudulent charges. It would be difficult
to program every known characteristic that could possibly be used to identify fraudulent
charges, such as changes in activity levels, repeated transactions just below thresholds, or
in-person transactions that occur at two distant locations close to the same time. However,
using many historic transactions in conjunction with data identifying whether each
transaction was fraudulent trains the computer program to compare new transactions to
the catalog of existing transactions that have been identified as valid or fraudulent. The
program examines common factors among all transactions and determines if the new
transaction is more like the group of fraudulent transactions or more like the valid ones.

CREDIT CARD STATEMENT
Name: Sally Johnson          Account: 0061531          Bill Date: 6/01/20

Reference No.   Posted Date   Activity since last statement   Amount
00125465        5/20/20       Gasoline                        $90.00
00121321        5/23/20       Groceries                       $120.23
00121242        5/23/20       Book Purchase                   $20.20
00321566        5/23/20       KDNFLSOOL                       $2,265.00

Amount Due: $2,495.53
Minimum Payment Due: $47.53
Payment Due Date: 6/14/20

In this example, the fourth transaction would be flagged by an inference engine because
of the difference in the name of the activity and the dollar amount of the transaction,
compared to the other three transactions on the statement.

LOS 1F3b 2.2 Business Process Analysis


Business process automation is a general term for the automation of business processes using
computer programs designed to perform repetitive tasks. Automation allows the company to
deploy human resources to tasks better suited to human skills.
Business process analysis examines an existing business process to describe the steps taken,
the exchange of information, the governance of policies for each transaction, and the knowledge
needed to complete each task. The goal may be to improve the efficiency and effectiveness of
operations, or the goal may be to understand the process thoroughly enough to replace it with
business process automation.


2.2.1 Robotic Process Automation (RPA) LOS 1F3c


Robotic process automation is a specific form of business process automation that refers to
programs capable of extracting information from a specific user interface (such as a customer
complaint form or a goods order form) that can then initiate further processes based on the
data extracted.
An RPA is a refinement of general web scraping tools that scour the Web looking for instances
of specified text to collect all material surrounding it. This content may be forms entered by
users or the content may be existing web page content. General web scraping tools often
collect ancillary data unintentionally, which introduces noise into the analysis. Robotic process
automation refines this approach to yield a more highly curated collection of data and, in turn,
a more reliable analysis. Robotic process automation can mimic human interaction with
many kinds of computer systems.
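A minimal Python sketch of the form-extraction idea follows. The invoice text, field names, and regular-expression patterns are hypothetical; a commercial RPA tool would provide this capability through a configurable interface rather than hand-written code.

import re

# Hypothetical invoice text as it might arrive from a scanned or e-mailed form.
invoice_text = """
Invoice No: 10453
Vendor: Acme Supply
Amount Due: $1,250.00
Due Date: 2024-06-14
"""

# Field patterns the "bot" is configured to recognize on this particular form.
patterns = {
    "invoice_no": r"Invoice No:\s*(\S+)",
    "vendor":     r"Vendor:\s*(.+)",
    "amount_due": r"Amount Due:\s*\$([\d,]+\.\d{2})",
    "due_date":   r"Due Date:\s*(\d{4}-\d{2}-\d{2})",
}

record = {field: re.search(pattern, invoice_text).group(1).strip()
          for field, pattern in patterns.items()}

print(record)   # structured data ready to be loaded into the ERP application database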

Illustration 6 Robotic Process Automation in Practice

A company is interested in formulating an RPA to collect the data from various inventory
invoices, payment orders, and reconciling documents used in its logistics system (part
of the enterprise resource planning system [ERP]), because it would be more efficient to
have software collect and format this existing information than to train every employee on
using existing or new systems. Because there are a limited number of forms, the company
constructs a library of forms for reference by the RPA. Once the RPA has been trained to
recognize each form, the company directs all forms to the RPA, which scans each form,
applies the parameters set in the program, and sends the appropriate data into the ERP
application database.

An RPA tool may be programmed to carry out any repetitive task involving a set of options.
This is problematic for employees who carry out repetitive tasks involving computers, such as
transferring data from one system to another or performing recalculations on the transferred data,
because it potentially places their jobs in jeopardy. While this is a common fear of many in the
workforce, RPAs can actually be very beneficial to some employees because it makes data more
accessible to analyze trends, determine market penetration, or evaluate customer responses.
The work for employees then shifts from lower-skilled, repetitive functions to more refined skills
focused on analysis and strategy.

2.2.2 Natural Language Processing (NLP) Software


Natural language processing involves the technology developed and used to encode, decode,
and interpret human languages so that the technology can perform tasks, interact with other
humans, or carry out commands on other technology devices. This requires the mapping of
things like pragmatics, syntax, phonetics, pitch, and tone, as well as ensuring the NLP program
determines the proper response or chain of action. This is the technology needed to build
a network embedded on household devices or other Internet of Things (IoT) devices so that
the household has a virtual assistant to turn on the television by voice command. Accounting
applications include parsing text documents or speeches made by executives to extract and
catalog any financially relevant data.
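As a very small illustration of extracting financially relevant data from text, the Python sketch below pulls dollar amounts and period references out of a hypothetical executive statement using pattern matching. Real NLP systems go much further, modeling syntax, semantics, and context rather than simple patterns.

import re

# Hypothetical sentence from an earnings-call transcript.
statement = ("We expect revenue of $4.2 million in the third quarter, "
             "up from $3.8 million in the second quarter.")

amounts = re.findall(r"\$\d+(?:\.\d+)?\s*(?:million|billion)?", statement)
periods = re.findall(r"(?:first|second|third|fourth) quarter", statement)

print(amounts)   # ['$4.2 million', '$3.8 million']
print(periods)   # ['third quarter', 'second quarter']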


2.2.3 Neural Networks


An artificial neural network is a form of technology that is modeled after neurons that facilitate
the function of human or animal memory. The basic pieces of a neural network involve an
input layer, hidden layer, and an output (results) layer. Within the input layer are the different
variables that feed into the hidden layer. In the hidden layer, there are a series of weights
applied based on the inputs selected, which then direct the algorithm toward a given output.
There are typically multiple layers embedded within the hidden layer acting as neurons that
guide the input through to its output (result) based on the cumulative value of all weighted
points. Weights may initially be predetermined, but the intent of the neural network is to learn
from prior trends in the results produced by given inputs. Just as a human's response changes
with experience, the network's reaction is refined in the hidden layer through changes to the
weights, which yields a different outcome. This is the computing architecture needed to make
fuzzy logic decisions for inference engines, commonly used in fraud detection.
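A minimal numeric sketch of the input-hidden-output structure follows, in Python with NumPy. The input values, layer sizes, and random weights are arbitrary assumptions; training (adjusting the weights from labeled examples) is noted only in a comment.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Input layer: three illustrative variables for one observation.
x = np.array([0.8, 0.2, 0.5])

# Hidden layer: four "neurons," each a weighted combination of the inputs.
W_hidden = rng.normal(size=(4, 3))
hidden = sigmoid(W_hidden @ x)

# Output layer: a single result (e.g., an estimated probability of fraud).
W_out = rng.normal(size=(1, 4))
output = sigmoid(W_out @ hidden)
print(output)

# Training would repeatedly compare the output with known results and adjust the
# weights (e.g., via gradient descent) so that future outputs become more accurate.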

LOS 1F3e 3 Cloud Computing

Cloud computing is renting storage space, processing power, proprietary software, or all three,
on remote servers from another company rather than buying or building those components.
When a company acquires its own infrastructure as opposed to renting it, the company must
purchase enough to cover its peak usage so the business can accommodate high-volume
periods. During low-volume periods, this costly infrastructure is idle. For the customers of cloud
computing, the service offers infrastructure elasticity; renting only as much as needed on a
minute-to-minute basis. Processing and storage are rented in increments of computing power
used per units of time, so that customers pay smaller amounts during low-volume periods
and larger amounts during high-volume periods. Customers benefit because the cloud service
provider performs all maintenance and tech support on this hardware.
Cloud computing services are offered by some companies with large computing infrastructures
to either lease excess capacity during off-peak times or use purpose-built infrastructure to
support their customers. Cloud computing takes advantage of these companies' superior skills
and experience managing such infrastructure.
Additional efficiencies exist when a company's data is in one virtual location even if company
operations are in many locations. Data processing can be performed more efficiently from
that single location, and IT hardware support may be reduced throughout the company.
Because the companies providing cloud services provide distributed redundancy among
many data centers, having cloud data storage reduces the likelihood data is lost in an attack
or disaster.

LOS 1F3f 3.1 Software as a Service (SaaS)


Software-as-a-Service is a business model in which a company delivers subscription-based
software services to customers through licensing or service delivery. It follows the idea that
software is no longer a static product to be purchased, delivered, used, and replaced, but rather
a service requiring ongoing support and innovation in response to changes in the business
environment. As such, companies began offering access to software platforms via
the Internet, taking on these recurring upgrades, security enhancements, and other support
functions. Development teams using Agile for systems development constantly maintain
and improve the software. Even in non-Agile software development environments, patches
(upgrades to fix bugs) are required during the maintenance phase to keep up with changing
business environments.


When software is developed internally, companies incur continued development costs in the form
of IT employees. When software is purchased from an outside source, costs include the upfront cost
of the software as well as the costs to maintain the software, update it, and troubleshoot problems,
adding to the cost of owning the software. SaaS formalizes the ongoing costs of software maintenance
to users by changing the price of the software from a one-time outlay into an ongoing subscription.

3.1.1 Advantages of SaaS


The user company enjoys continuously supported and updated software as long as fees are
paid to the provider.
Off-site hosting reduces the need for equipment, such as servers and other networking
devices that would otherwise be needed to run the software. Those servers would also
require maintenance and upgrades, which means there would be an added cost of
employees to take care of these tasks.
Space needed to house hardware in an internally supported data center is reduced.
Customers typically have access to a support team to help troubleshoot any issues with the
software.
Access to SaaS products is usually greater than those purchased or internally supported
because they only require an Internet connection. Internally supported software may be
limited by the need to access a network directly on company grounds or remotely through a
virtual private network (VPN).

3.1.2 Disadvantages of SaaS


Companies no longer have the flexibility of managing the software as they see fit, making
upgrades or changes to interactions with other systems as desired.
Labor needed to modify other applications might increase because any systems the
company builds that interact with SaaS software must be monitored for compatibility.
IT costs can potentially be higher. Companies are no longer able to stretch licenses beyond
their recommended life (e.g., replacing software with a 10-year life after 20 years of use).
SaaS models require companies to pay to continue using software, even if there are not
many upgrades or changes.
There could be a lack of customization. While many SaaS providers allow companies to tailor
their software somewhat, it may not be to the degree that a customer needs.

3.2 Blockchain LOS 1F3g

Blockchain is a control system originally designed to govern the creation and distribution of
Bitcoin. Bitcoin is a currency that exists only in electronic form, called a cryptocurrency. Bitcoin
must be "mined" in order to confirm transactions. Mining cryptocurrencies involves a person
or group of people performing cryptography, which is the solving of complex mathematical
equations. Through cryptography, blocks of a certain number of transactions are confirmed at
a time. The reward for solving (validating) the equation is both the receipt of Bitcoin and the
validation of a new block of transactions.
Because electronic data can be easily copied and altered, the accounting system governing it must
prevent the copying or alteration of the cryptocurrency; otherwise, the currency may become
instantly worthless through counterfeiting. Blockchain technology was developed to prevent
Bitcoin from being replicated and to limit its initial creation so that there is only a finite number of
Bitcoins. The value of blockchain is its resistance to alteration, multiparty transaction validation,
and decentralized nature. Alteration is difficult because each block adds to all prior blocks,
enabling everyone to view all blocks in the chain to the beginning of the entire chain. This serves as
a form of audit trail. The decentralization of Bitcoin makes it detached from government control.


3.2.1 Blockchain Terminology


Peer Network: All computers participating in blockchain.
Distributed Ledger: Many computers among the peer network each have copies of
portions of the ledger, so there is wide duplication of records.
User: Anyone who uses blockchain, who may or may not also participate in the Blockchain
Peer Network.
Hash Code: Part of a block; the encoded record of the date and time of the transaction and
the public keys of the participants.
Block: An encoded record of a group of validated transactions.
Blockchain: A string of blocks showing all transactions for the service using blockchain,
usually Bitcoin.
Public Key: The publicly shared identifier (address) of a person, company, or other entity that
intends to spend or receive cryptocurrency or other data on a blockchain network.
Private Key: A password to access a cryptocurrency or other blockchain account.
Fork: A divergence of the blockchain, with two groups of validators disagreeing on a particular
instance in a block, causing the future trajectory of the blockchain to take two different paths.
Smart Contracts: Similar to cryptocurrency and also based on blockchain technology.
Smart contracts are agreements or declarations, as opposed to currency transactions, that
are validated by a blockchain.
Mining: Verifying cryptocurrency transactions by solving mathematical equations in order to
prove the likelihood of occurrence. When blocks are mined successfully, it confirms a set of
transactions and rewards the miner with cryptocurrency.

3.2.2 Blockchain Process


A chain of data blocks (a blockchain) is attached to every Bitcoin. The blockchain describes
every transaction that Bitcoin has ever participated in. Each block contains a hash code, a
digital fingerprint of its transaction, as well as the hash codes of the preceding and following
transactions. Altering a transaction alters its hash code, putting that hash code out of sync
with the other blocks in the chain. This makes any attempt to change a transaction readily
apparent to the rest of the peer network.
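To make the hash-chaining idea concrete, the following is a minimal Python sketch (using the standard hashlib library, and greatly simplified relative to Bitcoin's actual protocol) showing how each block's hash code incorporates the hash of the block before it, so that altering an earlier transaction throws every later hash out of sync:

import hashlib

def block_hash(transaction: str, previous_hash: str) -> str:
    # Each block's fingerprint covers its own transaction data
    # plus the hash of the block that came before it.
    return hashlib.sha256((previous_hash + transaction).encode()).hexdigest()

transactions = ["A pays B 2 BTC", "B pays C 1 BTC", "C pays D 5 BTC"]

# Build the chain: every hash depends on all prior blocks.
chain = []
prev = "0" * 64  # placeholder hash for the first (genesis) block
for tx in transactions:
    prev = block_hash(tx, prev)
    chain.append(prev)

# Tamper with the first transaction and rebuild the chain.
tampered = ["A pays B 200 BTC", "B pays C 1 BTC", "C pays D 5 BTC"]
prev = "0" * 64
tampered_chain = []
for tx in tampered:
    prev = block_hash(tx, prev)
    tampered_chain.append(prev)

# Every hash after the altered block no longer matches, so the rest
# of the peer network can detect the change.
print(chain == tampered_chain)  # False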
Blockchain uses a distributed ledger, rather than a traditional accounting ledger where the
ledger is located within a single company. Blockchain records are not kept in a single location.
All blockchain participants maintain a copy of a portion of the entire history of all blockchain
transactions. This results in a high degree of duplication among the ledger participants. When
a person holding Bitcoin (a user) wishes to pay for a transaction with Bitcoin, a new transaction
is requested. The user's computer stores the blockchain for the involved Bitcoin(s), which is
then compared against all other copies of the blockchain for the Bitcoin in the peer network.
Any discrepancies in hash codes between the user's blockchain and the other computers in the
distributed ledger constitutes evidence of tampering. If tampering is discovered, the transaction
is refused. The user's version of the blockchain is then resynchronized to match the peer
network's version, undoing the false alteration.


Illustration 7 Allowable Blockchain Transactions

An American of Scottish descent wants to buy haggis, a food delicacy from his home
country. Haggis is illegal to import into the United States, so the American attempts to buy
haggis on the dark web and pay with Bitcoin to hide this illegal activity. The American enters
his private key to authorize the transaction to transfer an amount of Bitcoin to the public
key of the seller. The transaction takes about five minutes while the blockchain for those
Bitcoin is checked against the other copies in the distributed ledger. The hash codes in the
blockchain residing on the American's computer match those stored on other computers
in the peer network, so the network agrees that the American's Bitcoin is authentic, and
allows the transfer, while writing a new block to record the transaction.

Illustration 8 Unallowable Blockchain Transactions

A person wants to buy a rare collectible from someone in another country. Instead of
using national currencies and paying exchange rate fees, the buyer agrees on a price in
Bitcoin. The buyer plugs in a flash drive containing the stored record of her Bitcoin and
authorizes the transfer. The transaction takes about five minutes while the blockchain
for the transferred Bitcoin is checked against the other copies in the distributed ledger.
This validation process returns a discrepancy. Every other version of the record for that
blockchain includes a transaction where this Bitcoin was already spent by the buyer,
except for the version which resides on the buyer's flash drive. The blockchain control
system decides that the many records are correct, and the single discrepancy is wrong and
corrects the buyer's data by making it align with the distributed ledger. It is presumed that
the buyer altered or sheltered this record and no longer owns the Bitcoin in question, and
the blockchain control system has reimposed that reality. The transferred Bitcoins are not
available for the buyer to use to pay the seller.

3.2.3 Blockchain Applications


Blockchain is a powerful but complex technology that may create problems with existing
applications and security practices, so its limitations should be considered when adapting it to
new applications. The complicated math processing mechanism for creating new blocks includes
the public key of the receiver and the private key of the giver. This means that blockchain still
depends on a password to authenticate. Human beings are bad at password security, and many
companies are bad at cybersecurity.

Illustration 9 Blockchain Security

If people use companies to facilitate Bitcoin exchanges, those companies have databases
that match customers with their keys. If the exchange company suffers a data breach,
the hackers have all the keys needed to write new blocks (authorize exchanges), and the
distributed ledger will view these transactions as valid. Once this happens, blockchain's
resistance to tampering now makes it harder to restore the Bitcoin to the original owners.


If these concerns can be managed, there are beneficial applications to the wide adoption of
cryptocurrencies in general and specifically for blockchain technology.
Algorithms like those used in blockchain applications can be used to make transactions
more transparent and secure, even between parties without compatible banking or even
legal systems. That can take the politics out of economics by allowing buyers and sellers
anywhere in the world to do business.
This potential can help developing nations, which do not have the same financial
infrastructure as developed nations, join the marketplace. Unfortunately, blockchain
technology also aids criminals and terrorists to circumvent international laws and sanctions.
Blockchain can also be used to create smart contracts. Smart contracts are those where
the terms can be agreed on, executed, and verified automatically. Part of the function of
notaries, lawyers, and the courts in contract law is to record that a service is complete,
when payment is due, and when the payment has been received. If both the service and
the payment can be observed by the blockchain peer network, then the payment can be
directed by an automated process without the need to pay for the intermediary to officiate.
Note that the use of smart contracts does not replace the function of an attorney or other
legal practitioner, but rather it augments, and in some cases expedites, the processing of
legal documents.

Illustration 10 Blockchain Application

A person wishes to buy an item and have it delivered to her home. A smart contract can
be set up such that the buyer authorizes a payment (via Paypal, Bitcoin, or electronic bank
transfer) to anyone who delivers the item for the requested price. A fast delivery person
arrives at the destination where a camera connected to the Internet has object-recognition
software and can verify that the item has been delivered. The delivery fulfillment is recorded
in the smart contract's blockchain peer network. Then funds are automatically transferred to
the delivery person's account because the terms of the contract have been fulfilled.

The distributed ledger of blockchain is similar in architecture to nonrelational databases, a
newer technology that enables faster access to a database by multiple, widely dispersed
users. Blockchain could serve as backup data stored outside the database (therefore
not slowing it down). It could also represent a means to increase the speed of nonrelational
databases by improving the ability to roll back invalid changes after they have begun. This
would also promote the integrity of nonrelational databases.


Question 1 MCQ-12696

Which of the following statements concerning the systems development life cycle (SDLC)
are correct?
I. The SDLC describes the time that the system is being developed and contains a list of
steps to be executed once.
II. Under the waterfall method, phases do not overlap, and each team is dedicated to
one phase.
III. Under the agile method, all phases may occur within a sprint, and teams are dedicated
to one project.
IV. Waterfall and agile methods should be executed simultaneously.
a. I and IV only.
b. I and II only.
c. I, II, III, and IV.
d. II and III only.

Question 2 MCQ-12697

A buyer offers to purchase a company using Bitcoin. After the transaction is completed, the
buyer reloads onto her computer a copy of the Bitcoin data file that was made before the
company was purchased, so the associated blockchain shows she is the current owner of
the Bitcoin used to purchase the company. She then tries to buy another company with the
same Bitcoin, but the transaction fails. Which parts of the blockchain/Bitcoin infrastructure
have prevented this attempted fraud?
I. Smart contracts
II. Distributed ledger
III. Hash codes
IV. Two-factor authentication
a. I and IV only.
b. I and II only.
c. I, II, III, and IV.
d. II and III only.


Question 3 MCQ-12698

A company writes a new artificial intelligence program to detect fraudulent credit card
transactions. The team of analysts builds a training data set with various types of known
fraudulent activity as well as an equal quantity of legitimate transactions so that the AI
program can learn to distinguish between the two. After training the AI with this data set,
the company tests the AI program by deploying it to detect fraud in credit card transactions
within the retail sector. The company sees astonishingly poor results from the program.
Which of the following describes the likely cause of this failure?
I. The training data did not resemble the situation the AI program would encounter after
deployment. It should have used the same types of transactions.
II. The ratio of fraud to non-fraud should be the same in the training data as is expected
after deployment.
III. After training the AI, a separate data set should be used for testing before deployment
in order to determine the effectiveness of the application.
a. I, II, and III.
b. I and II only.
c. II and III only.
d. I and III only.

MODULE 4

F.4. Data Analytics: Part 1
Part 1, Unit 6

This module covers the following content from the IMA Learning Outcome Statements.

CMA LOS Reference: Part 1—Section F.4. Data Analytics: Part 1

The candidate should be able to:


a. define Big Data; explain the four Vs: volume, velocity, variety, and veracity; and
describe the opportunities and challenges of leveraging insight from this data
b. explain how structured, semi-structured, and unstructured data is used by a business
enterprise
c. describe the progression of data, from data to information to knowledge to insight to
action
d. describe the opportunities and challenges of managing data analytics
e. explain why data and data science capability are strategic assets
f. define business intelligence (BI); i.e., the collection of applications, tools, and best
practices that transform data into actionable information in order to make better
decisions and optimize performance
g. define data mining
h. describe the challenges of data mining
i. explain why data mining is an iterative process and both an art and a science
j. explain how query tools (e.g., Structured Query Language [SQL]) are used to retrieve
information
k. describe how an analyst would mine large data sets to reveal patterns and provide
insights
m. define the different types of data analytics, including descriptive, diagnostic, predictive,
and prescriptive
v. describe exploratory data analysis and how it is used to reveal patterns and discover
insights

1 Business Intelligence LOS 1F4f

Business intelligence is the aggregation and transformation of data into visualizations


and presentations of information in order to make sound business decisions. Business
intelligence analysts are the stewards of this data, enabling companies to make more informed
business decisions.
Modern businesses use transactional databases, which are specialized databases designed
to rapidly intake small quantities of information and retain them in a searchable format.


Transactional databases are fed into data warehouses, which are optimized for searching
rather than maintaining transactions. Business analysts use the data warehouse to aggregate
transactions together to determine patterns, regional trends, or other insights among individual
products, lines of business, or customers.
This business analyst toolbox starts with basic database tools such as structured query language
(SQL) applications to extract the desired data from the data warehouse. Then Microsoft Excel®
or other similar analytic applications are used to transform the data. Finally, the transformed
data is fed into a presentation and/or visualization software, such as Power BI® or Tableau®, so
that business insights can be quickly and effectively communicated to management for more
informed decision making.

LOS 1F4a 1.1 Big Data


Big Data is a term coined in the mid-2000s when technological advancements allowed
companies to begin collecting and analyzing massive amounts of data. Electronic storage
capacity expanded, making it possible to inexpensively store the ever-growing amount of
data captured. Applications to manipulate data sets continued to evolve, making it easier
and faster to run advanced calculations on millions of records. The proliferation of Big Data
and its applications in companies large and small erupted in the second decade of the 2000s,
empowering businesses with information that could be used to predict trends or solve complex
business problems.

Illustration 1 Surveys vs. Surveillance

Before the era of Big Data, if a government or company was interested in what was
happening in the world, the best technique was to use a survey asking people what they
were doing and extrapolate the results to the whole population. Although there are best
practices at every step of the survey and interpretation process, practitioners know that the
science is imperfect. No sample group is perfectly representative of the whole population.
Additionally, people may provide bad data on surveys: They misremember what they did,
or report what they would like to have done, or intentionally report something false.
Big Data has changed the entire methodology of discovering the behavior of large
populations. It is no longer necessary to deal with the uncertainty of asking a few people
how often they study for classes. Textbooks and homework are online and every work
session records the start time, the end time, the content accessed, and the student's
identity, which is linked to the student's demographic data. Athletes now use cardio
equipment that records vitals such as heartrate or transmits data on weights lifted
using RFID technology, all while the equipment is being used. Those same RFID tags are
inside credit cards so their location can be tracked to see if they were present at store
transactions, or to locate the cards after they are stolen.
In short, we no longer ask people what they did. We ask the items what the people did with
them, because the items provide more accurate and reliable data.


1.2 The Four Vs


Big Data is characterized by the four Vs: volume, velocity, variety, and veracity. These four
qualities sharply distinguish Big Data from the scale of data analysis that came before it.

1.2.1 Volume
The volume of data is measured in bytes, each of which contains enough binary space to store one letter
of text. Prior to the era of Big Data, the quantity of data being analyzed was typically measured
in the millions of bytes (megabytes). Significant changes in the collective ability to create,
transmit, store, and compute data have led to the regular usage of petabytes (1,000,000,000,000,000
bytes, or 10^15 bytes) and the occasional discussion of zettabytes (10^21 bytes). Internet traffic each
year is measured in zettabytes, and work is underway to get agreement on what to call further
orders of magnitude when needed. This is how drastically the volume of data has changed.

Illustration 2 Data Storage Magnitude

The following list includes most of the terms associated with the amount of data that may
be stored within computer systems.
Byte: one encoded character (a single letter, number, or symbol)
Kilobyte: ~1 thousand bytes (~a five-page paper)
Megabyte: ~1 million bytes (~1 minute of MP3-format music)
Gigabyte: ~1 billion bytes (~1 hour of Netflix video)
Terabyte: ~1 trillion bytes (~all words spoken by the average 25-year-old over his or her
lifetime stored as text)
Petabyte: ~1 million gigabytes, or 10^15 bytes (~3.5 years of Netflix video play time)
Exabyte: ~1 billion gigabytes, or 10^18 bytes (monthly Web traffic in 2004; in 2020,
monthly Web traffic is 250 exabytes)
Zettabyte: ~1 trillion gigabytes, or 10^21 bytes (in 2020, annual global Internet traffic is estimated to be
about 2.95 zettabytes, or 2,951,479,051,793,530,000,000 bytes)

1.2.2 Velocity
Velocity means two things in the context of Big Data: the speed of data transmissions and how
quickly the data can be processed. With file sizes becoming larger (due to volume) over time, the
speed of sending data across networks must also increase. Also, as new information becomes
available, data must be processed quickly, or the answers may be irrelevant. These factors cause
velocity to increase.


Illustration 3 Car Navigation Applications

Consider the navigation app on a smartphone. In order for it to provide a time estimate
for the driver's best route, every smartphone in the area continuously sends data about
its location. Smartphones whose locations change at high speeds are considered to be
in cars. This data is selected, organized, and processed to see where the traffic is moving
quickly or slowly on several routes between the driver and the destination so that the app
can suggest the best two or three routes. Processing must occur fast enough so that the
driver is informed before traffic conditions change and the answer just provided becomes
outdated. As the app collects and sends new data based on changing conditions, it provides
updated estimates of the time remaining until the destination is reached and suggests faster
routes as they are identified.

1.2.3 Variety
Information can now be stored in a variety of formats. Prior to the era of Big Data, the most
common formats were organized text files (documents) and number-based files (spreadsheets).
Advances in technology have increased the capacity to process large files quickly, allowing a
variety of other data formats to be analyzed more easily. For instance, images were historically
very difficult to manage because applications must process an array of pixels, each with a
data value for a unique color. Now images can be algorithmically analyzed because enough
computing power exists to navigate that array rather than just store it. Similar processes have
evolved to analyze videos, which are essentially two-dimensional arrays of images with a time
index. Movies once had to be viewed in order to be rated and catalogued for genre, but now that
task can be performed by advanced computer software. Because of these advances, analysts
working with data must now expect their input to be in a variety of formats, all of which need to
be mined for information.

1.2.4 Veracity
Veracity refers to the accuracy and reliability of data. As companies accumulate an ever-
increasing volume of data, the information captured is only good if it is high quality. Ensuring
high quality requires cleansing and maintenance, which means gathering data can be slow
and expensive. In addition to companies gathering their own data, it has become a common
business practice to purchase large streams of data from multiple third-party sources. This must
also go through a rigorous evaluation process because third-party data is often gathered for
different purposes and possibly with different quality standards. Therefore, the veracity—the
accuracy and reliability of the data—cannot be presumed. Data analysis must begin with steps to
detect and exclude unreliable data without introducing bias into the results.
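As a rough sketch of what those first steps might look like in practice, the following Python fragment uses the pandas library with hypothetical column names and thresholds; the exclusion rules are assumptions that an analyst would need to document and justify so that they do not bias the results:

import pandas as pd

# Hypothetical transaction feed purchased from a third-party source.
raw = pd.DataFrame({
    "transaction_id": [1, 1, 2, 3, 4],
    "amount": [25.00, 25.00, -9999.0, 42.50, None],
    "store": ["North", "North", "East", "East", "West"],
})

cleaned = (
    raw
    .drop_duplicates(subset="transaction_id")   # remove duplicate records
    .dropna(subset=["amount"])                  # remove records with missing amounts
)
# Exclude implausible values; these bounds are assumptions that the
# analyst must set, document, and justify to avoid introducing bias.
cleaned = cleaned[cleaned["amount"].between(0, 10_000)]

print(cleaned)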

1.3 Opportunities and Challenges With Big Data


A world supported by Big Data can increase the efficiency of business and society as a whole.
With enough information, a company can identify common traits among its customers and more
accurately target prospects with relevant advertisements. More information about what, where,
when, and how enables supply chains to be better managed. This makes information more
valuable, causing businesses to design operations around creating, protecting, supplying,
and monetizing information.


These benefits do not come without challenges. As more information is collected, companies
know more about consumers. This knowledge has caused global concern among privacy
advocates, which has affected the way companies collect, retain, and use data. It means
channels must be created within an organization to protect the information from theft and
misuse, such that only a limited number of employees have access. Information must also be
restructured so that the origin of sensitive material is masked and cannot lead back to
specific individuals. Even when protected like this, data should be used only as laws or consumer
consent permit. These safeguards have continued to increase as scrutiny of the large firms holding this
information grows.

Illustration 4 Evolution of Information

Changes that come with Big Data create challenges for society and security. If every
company with an electronic device collecting information sells that data to anyone willing
to pay, the concept of privacy fundamentally changes. The capability of technology has
outpaced the public's understanding of technology as well as outpaced legislation to
control it. The large tech companies that hold the overwhelming portion of this data have
only recently begun to be challenged by lawmakers over their stewardship of the power
that this mass amount of data affords. Many of these companies operate with the belief
that data collected about a person is not the property of that person, but rather the
property of the company that collected it. One of the only exceptions to this is when a
company suffers a high-profile or high-liability data breach (due to hacking or negligence)
and the company is legally required to notify and compensate the people whose data was
exposed. The company's duty to maintain control of its data means that data security is
more important than ever.

1.4 Data Structure LOS 1F4b

Increased computing power means patterns and trends can be identified in sounds, images,
videos, and postings on social media platforms. Big Data has added a third classification,
semi-structured data, to the traditional structured and unstructured classifications.

1.4.1 Structured Data


Structured data is the oldest classification of data. Structured data has an explicit organizational
pattern. There may be segments of fixed length where each segment contains a piece of data,
or there may be segments of variable lengths where each segment contains a piece of data and
a header at the beginning identifying the lengths and types of data contained in each segment.
Both of these data structures are types of files, where one record follows the next record in a
defined order. A relational database is also structured data, even though there is no sequential
order to the data. Each database table has a header that identifies the structure of rows that
belong to it, wherever those rows may be stored in memory.

1.4.2 Unstructured Data


Unstructured data lacks this kind of organizational structure. A written novel would be an
example of unstructured data. While a novel has chapters and pages, these do not define
lengths or contents, so they are not organizational structures. Similarly, images (including .pdf
files), audio files, and social media text such as reviews and posts are all unstructured data.
Analyzing this data requires techniques that can account for probability, as well as a large
pool of data on which to perform the analysis.


1.4.3 Semi-Structured Data


Semi-structured data refers to data that does not have an organizational structure that defines
lengths and content but that contains elements that can provide some degree of identification.
Comma-delimited files are one of the oldest types of semi-structured data. This kind of file has
different data types in sequence, separated by a comma. The length of each field is not known,
but the order of fields is. The programming language XML is also semi-structured because it
uses tags to identify different types of commands and labels. The overall composition of the
file is not knowable by a file header, but as the file is read, these structures can be identified.
Similarly, social media posts may contain hashtags, which can be used to create links and
relationships between posts.
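To make the distinction concrete, here is a brief, hypothetical Python sketch using the standard csv and xml libraries: the comma-delimited record reveals its field order but not its field lengths, and the XML fragment identifies its contents through tags rather than a fixed layout:

import csv
import io
import xml.etree.ElementTree as ET

# Comma-delimited (semi-structured): field order is known, field lengths are not.
line = "1001,Smith,Jane,Premium"
record = next(csv.reader(io.StringIO(line)))
print(record)  # ['1001', 'Smith', 'Jane', 'Premium']

# XML (semi-structured): tags identify the data as the file is read.
xml_fragment = "<customer><id>1001</id><last>Smith</last><first>Jane</first></customer>"
root = ET.fromstring(xml_fragment)
print(root.find("last").text)  # Smith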

LOS 1F4c 1.5 Data Progression


The value of information can be charted on a continuum, starting with raw data that is an
unconnected fact. Data becomes information when it conveys meaning and purpose, usually
requiring connected data to provide context. In the context of a business, an individual sales
transaction is valuable data. More value is added when the sale is connected to other pieces of
data, such as the date of the sale, the customer's information, the salesperson's information,
the method of sale (Internet order, in-store sale, etc.), and so on. Once connected, these pieces
of data that started as separate facts become contextualized as information endowed with
meaning and purpose.

Illustration 5 Data vs. Knowledge

Many transaction records would be necessary before having knowledge of the day's
sales, and likely many days would be necessary before one had knowledge of this store's
short-term financial health. After the store's first day of sales, without any comparative
information, it would be difficult to know if the sales were high, low, or in-between that
day. Information with context and synthesis of information over time is knowledge. Once
time passes and context grows, there is more insight into the relative strength or weakness
of sales. After the quantity of organized, contextualized information accumulates into a
comfortable amount of insight, then one can use knowledge to take action to improve sales
with timely interventions.

LOS 1F4d 1.6 Managing Data Analytics


LOS 1F4e Leveraging the power of data science and analytics for business presents the opportunity to
operate every part of the company with increased efficiency and effectiveness. Incorporating
data science into management and decision making increases the amount, speed, and precision
of knowledge about the operations of a business. Executives, operations managers, and
front-line employees can make better decisions about their work if they have timely access to
accurate, pertinent information. This makes a company's ability to harness data a tactical asset:
It can improve the efficiency of the tasks already being performed. Data science capability is also
a strategic asset because it can shape a company's strategic direction, informing strategic
decision making with information based on past results and forecasts of future performance.


There are also challenges in maintaining and utilizing data science capabilities. Data science
is expensive because a large amount of computing infrastructure and skilled employees are
required. Managers who are unfamiliar with the power of data science may be hesitant in
committing resources to improve data science capabilities because they may feel it would come
at the expense of operations. After investing in data science capabilities, success depends on the
quality of work and the judgement of those interpreting the data. If the data used for an analysis
is bad, or incorrect analysis techniques are used, or if the results are misinterpreted, then
decisions based on the data can be even worse than decisions made without data.

2 Data Mining LOS 1F4g

LOS 1F4i
Data mining is the process of investigating large data sets to discover previously unknown
patterns. Data mining combines the aggregation and analysis of data by trainable artificial
intelligence (AI) and machine-learning-enhanced decision support systems with various
statistical techniques. Data mining is both art and science. The decision support system and the
AI must be trained with data that is similar to the real data the AI is expected to investigate.
Training a model helps it to understand how to interpret certain types of information and yield
meaningful results. The parameters of the model have a large impact on what patterns are
judged to exist, and many data-mining techniques require an initial seed (or guess) value to
begin the analysis. For example, cluster analysis software often requires the analyst to define
the number of clusters before the data-mining process begins. With today's computing power,
analysts can run the procedure with several different starting seeds and accept the result that
the analyst judges to be most interesting and useful, combining data-mining science and art.

2.1 Exploratory Data Analysis LOS 1F4v

Exploratory data analysis is sometimes also called data mining, or atheoretical research. It is
used to investigate existing data sets for useful patterns. Unlike traditional research, the data
is not used to show support for a preexisting theory or desired analytical outcome. By using
this lower threshold, insights can be gained more quickly. A theory-based approach may be more
generalizable to a broader population, but atheoretical analysis is applicable when the target
population resembles the sample. As long as results are applied to populations similar to the
sample, the analysis should have high validity.
Cluster analysis and decision trees are examples of exploratory analysis. Another example
would be looking for a pattern between the number of days a customer takes to pay a bill
after it is presented and the long-term probability of that customer defaulting on a debt.
Cluster analysis does not explain why those groups form, the decision tree does not explain
why the population can be segmented that way, and the late-payment study does not explain
why a customer defaults; nevertheless, the company can still use this information to increase
the efficiency of its business processes.

2.2 Challenges of Data Mining LOS 1F4h

1. Quality and Quantity of Data Required


Producing valid results from the data that can reliably be used for decision making requires
a large quantity of data. Additionally, small quantities of bad data can dramatically skew the
results of data mining. There may be significant outliers, or some kind of bias in the data
collection process, but there is some bad data in every batch, so a defensible method for
dealing with bad data must be developed.


2. Timeliness
Larger quantities of higher-quality data will produce more valid and more reliable results,
but data gathering and data cleaning are expensive and time-consuming activities. A
company with a small or inexperienced data science team can spend too long gathering and
cleaning a data set for optimal decision making. Data science departments must have the
scale and skill to process data quickly.
3. Expensive Employees
Highly skilled professionals with advanced programming and technical knowledge are
required to work with the file sizes and software packages in use today. Data mining requires equal
expertise in statistics and data science. Although the software process can be run without
statistics expertise, only a person (or team) with expertise in both areas can be certain that
the results are valid or reliable.
4. Corporate Culture
Executives are often educated, trained, and acclimatized to trust their own judgement and
that of their peers. In some corporate cultures, it can be a challenge to devote enough
resources for a department to competently perform data mining. Even if data mining has
been done, it can be a challenge for executives in some corporate cultures to consider the
insights gained with the appropriate weight—not blind trust, but not simply dismissed when
a recommendation is at odds with the executive's instinct.

LOS 1F4j 2.3 Retrieving Information


In order to perform data mining or even more everyday business analytics tasks, analysts must
retrieve the data they require before they can analyze it and distill information and insights from
it. In nearly every information systems architecture, some part of that process will require the
use of SQL to extract data from a database.
Databases are used to organize large amounts of data by creating categories to which the data
can be connected, and subsequently defining the relationships between those categories. For
example, many businesses have a customer table in their database that contains records such
as the customer's name, address, phone number, and similar information. The products a
company sells would be another category of data, with a product table containing a product's
name, price, size, and other relevant attributes. These tables are separate and do not duplicate
information. Every row in a table needs to have a unique identifier called a primary key, so the
row can easily be found. When information from another table needs to be referenced, the
primary key of the row from the other table is included as a data item in the original table and is
called a foreign key.

Illustration 6 Relational Databases

When a customer buys a product, the purchase event would need to record which
customer purchased which product. The customer table has a unique identifier (called
a primary key) and the product table has another (the primary key for that table). The
purchases table would need a primary key to uniquely identify each purchase, and there
would be some data such as the date the purchase occurred. In addition, the purchase
record would also contain the unique identifier for the customer making the purchase (a
foreign key that is the primary key from the customer table for that customer) and also the
unique identifier for the product purchased (a foreign key that is the primary key from the
product table).
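A minimal sketch of this structure, written with Python's built-in sqlite3 module and hypothetical table and column names, might look like the following; the PRIMARY KEY and FOREIGN KEY clauses correspond to the unique identifiers described in the illustration:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,   -- primary key of the customer table
    first_name    TEXT,
    last_name     TEXT
);
CREATE TABLE product (
    product_id    INTEGER PRIMARY KEY,   -- primary key of the product table
    name          TEXT,
    price         REAL
);
CREATE TABLE purchase (
    purchase_id   INTEGER PRIMARY KEY,   -- primary key of the purchase table
    purchase_date TEXT,
    customer_id   INTEGER,               -- foreign key pointing to customer
    product_id    INTEGER,               -- foreign key pointing to product
    FOREIGN KEY (customer_id) REFERENCES customer (customer_id),
    FOREIGN KEY (product_id)  REFERENCES product (product_id)
);
""")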


2.3.1 Query Tools


Structured query language (SQL) is the programming language specifically designed to
selectively retrieve (and load) information that has been stored in relational databases. SQL was
developed in the early 1970s and was patterned after English sentences. One key part of speech
in the SQL language is the verb (referred to as a command or query). One of the most commonly used
commands is SELECT, which is used to retrieve data. To SELECT data, the column name would
be listed next to that command along with the table containing those columns after the FROM
command, and a semicolon to end the statement. While query tools and languages exist for
various types of information retrieval, most of those languages are rooted in SQL in some form,
adhering to a consistent code structure.

Illustration 7 Basic SQL

A simple SQL statement would be written as follows:


SELECT Customer_First_Name FROM Customer_Table;
This statement would produce a list of all the customer names that the company has
recorded in that table. Incidentally, if several customers have the first name of Betty, then
ordinarily "Betty" will be listed once for each Betty in the table (there are optional adverbs
to change this behavior). If the analyst did not want all of the customer names, but only a
subset, then the WHERE command could be used to identify that subset as follows:
SELECT Customer_Last_Name FROM Customer_Table WHERE Customer_First_Name = 'Betty';
This statement would produce a list of all the last names of customers whose first name
is Betty.

Simple subsets, as shown in the example above, are often not sufficient to answer a business
question, so more complicated SQL queries are often required.
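As one hypothetical example of a more complicated query, the following sketch (again using Python's sqlite3 module, with assumed tables and sample rows) joins the purchase table to the customer and product tables and totals each customer's spending; exact syntax varies somewhat by database product:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE purchase (purchase_id INTEGER PRIMARY KEY,
                       customer_id INTEGER, product_id INTEGER);
INSERT INTO customer VALUES (1, 'Betty'), (2, 'Ravi');
INSERT INTO product  VALUES (10, 'Widget', 4.00), (11, 'Gadget', 9.50);
INSERT INTO purchase VALUES (100, 1, 10), (101, 1, 11), (102, 2, 10);
""")

query = """
SELECT c.first_name, SUM(p.price) AS total_spent
FROM purchase AS pu
JOIN customer AS c ON c.customer_id = pu.customer_id
JOIN product  AS p ON p.product_id  = pu.product_id
GROUP BY c.first_name
ORDER BY total_spent DESC;
"""
for row in conn.execute(query):
    print(row)   # e.g., ('Betty', 13.5) then ('Ravi', 4.0)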

3 Introduction to Analytic Data Modeling LOS 1F4k

One of the simpler forms of analytic modeling in data mining is a decision tree. Based on a series
of decision points called nodes, decision trees are decision support tools that provide outcomes
based on probability or fact. These models begin with a single node, or question, with at least two
possibilities. Each possibility is assigned an outcome (e.g., yes or no; A, B, C) or probability that
breaks off further into additional outcomes with their own nodes and outcomes until the total of
all possible outcomes has been exhausted. The end result visually resembles a tree.
This process produces a model—a set of rules for making a reliable choice. As with any model,
decision trees are subject to the parameters created by the analyst who designed it. This
means the algorithm can be adjusted based on the unique characteristics of a given data set or
circumstances to improve the performance of the model.
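As an illustrative sketch only, the following Python fragment uses the scikit-learn library to fit a small decision tree to hypothetical age/income/purchase data; the max_depth setting is one example of the analyst-chosen parameters described above:

from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical training data: [age, income] and whether a purchase was made (1 = yes).
X = [[25, 70_000], [30, 45_000], [45, 52_000], [50, 38_000],
     [42, 65_000], [23, 30_000], [55, 90_000], [35, 80_000]]
y = [1, 0, 1, 0, 1, 0, 1, 1]

# max_depth is a parameter chosen by the analyst; it limits how far the tree grows.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# The fitted model is a set of rules that can be printed and reviewed.
print(export_text(tree, feature_names=["age", "income"]))

# Score a new prospect (age 48, income $47,000).
print(tree.predict([[48, 47_000]]))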


3.1 Data Mining Process


To begin, a substantial data set must be gathered that records the various qualities being
discussed. Often, data sets consist of many events, each of which has a binary choice such as
{fraud transaction vs. legitimate transaction}, or {customer gives favorable rating vs. customer
gives unfavorable rating}, or {customer purchases product vs. customer does not purchase}, and
several relevant pieces of information about each case. The purpose of data mining is to identify
patterns among the data that will explain the binary choice.

Illustration 8 Data Mining Using a Decision Tree: Initial Data State

In this data set, dots represent individuals who have received advertisements from a
company. The vertical axis measures income, so wealthier individuals are higher in the
plot. The horizontal axis measures age, so older individuals are to the right in the plot.
The orange dots represent individuals who purchased the product, while the blue dots
represent individuals who did not make a purchase. The company wants to know which
customers are more likely to buy their products so that the company can better target its
advertising and sales efforts.

[Scatter plot: income on the vertical axis, age on the horizontal axis; orange dots = made purchase, blue dots = did not make purchase.]

Visually, what the decision tree algorithm attempts to do is to cut this space into pieces
such that each piece has an overwhelming ratio of the same colored dots in it. Any piece
that has numerous orange dots is a defined group of customers who buy the company's
products. Logically, this means finding subsections of wealth and age where individuals
overwhelmingly choose the same way for whether to buy the product.


3.1.1 Making the First Decision


The algorithm will choose where to draw a vertical (or horizontal) line such that the ratio
between the choices on each side of the line is as high as possible.

Illustration 9 Data Mining Using a Decision Tree: Making the First Cut

[Scatter plot from Illustration 8, now with a vertical line dividing the plot by age; orange dots = made purchase, blue dots = did not make purchase.]

This line divides the plot so that on the right side of the line there are mostly orange dots
and on the left side there are mostly blue dots. This indicates that there is an age divide
(represented by the line) among those who buy the company's products and those who do
not. This is not a perfect division; there are still orange and blue dots on both sides of the
line.

The data collected may not appear to have a pattern, and the data mining process may not be
easy, but advanced algorithms and a large amount of computing power help draw insights
from an otherwise uneventful data set. This is why it is important to calculate all possible
ratios when completing each mining step.


3.1.2 Growing the Decision Tree


Once the first iteration is complete, the process is refined and repeated. Rules for subsequent
decisions are generated by the same ratio-driven algorithmic criteria used for the first decision.

Illustration 10 Creating the Next Iteration

[Scatter plot with a second, horizontal line added on the right side of the age divide; orange dots = made purchase, blue dots = did not make purchase.]

This second line generated by the algorithm separates a box in the upper right section of
the chart, which is dominated by orange, compared to the box in the lower right section,
which is mostly blue but still mixed.

[Scatter plot with a third line added on the left side of the age divide; orange dots = made purchase, blue dots = did not make purchase.]

This third line creates a fourth box in the lower left, which is overwhelmingly blue compared
to the box in the upper left, which is mostly orange but still mixed. Each of these lines was
placed by the algorithm, which calculated the ratios of blue and orange dots resulting in any
possible line that could be drawn. The results yielded lines with the optimal division and the
most favorable ratios, distinctly separating the colored dots most effectively.


3.1.3 Interpreting the Results

Illustration 11 Interpreting the Results

The following depicts a standard decision tree based on the previous illustration, showing
how it is designed and should be interpreted.

The tree creates an ordered list of the lines drawn through the data set. The first line is
vertical, so it divides based on age. Based on past purchases, customers over age 39 are more
likely to make a purchase, so a node with a binary (yes or no) outcome is created for that age.

Similarly, past purchasing trends indicate that customers older than 39 with an income
greater than $40,000 per year were even more likely to make a purchase, so a second node is
inserted on the next level of the tree.

Finally, it was observed that younger patrons with incomes greater than $60,000 were highly
likely to buy the product. This node is then inserted into the other outcome of the original
node. Note that some outcomes will stop early, with no succeeding nodes.

[Decision tree diagram: the root node splits on age (older or younger than 39); the branch for
older customers then splits on income at $40,000, and the branch for younger customers
splits on income at $60,000, with each branch ending in a yes/no outcome.]

The result of this tree is that the company will focus its advertisement and sales efforts
on customers whose income is greater than $40,000 if the customer is older than 39, and
customers whose income is greater than $60,000 if the customer is younger than 39. The
company will spend less effort on all other customers because this model concludes that
doing so is an inefficient effort.
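The finished tree in this illustration amounts to a short set of rules. A direct, purely illustrative translation into Python using the thresholds from the illustration could look like this:

def likely_to_purchase(age: int, income: float) -> bool:
    """Apply the decision rules from the fitted tree in Illustration 11."""
    if age > 39:
        return income > 40_000   # older customers: target if income exceeds $40,000
    return income > 60_000       # younger customers: target if income exceeds $60,000

# The company would focus advertising on customers for whom this returns True.
print(likely_to_purchase(45, 50_000))  # True
print(likely_to_purchase(30, 55_000))  # False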


3.1.4 Growth of Decision Trees in Data Analytics


Decision tree algorithms could continue to expand into perpetuity and further subdivide the
diagram to create smaller boxes that are more exclusively the same color and further build the
decision tree, but this practice reaches a point of diminishing (even negative) returns. The better
a model fits the data from which it was constructed, the less reliable it will be when applied to
other data, such as real customers in the real world. This is because the model will not be able to
accommodate or interpret variation from new observations. This can be mitigated by testing the
model against a separate data set (testing data) to determine how large the algorithm should
grow on the decision tree, and to choose between different candidate decision trees.
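One hedged sketch of this mitigation, using the scikit-learn library with synthetic stand-in data, is shown below: candidate trees of different depths are fit on training data and compared on held-out testing data, and the depth with the best test accuracy guides the choice of tree size:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for real customer data.
X, y = make_classification(n_samples=1_000, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Grow candidate trees of increasing depth and compare them on the testing data.
for depth in (2, 4, 8, None):   # None lets the tree grow until pure (likely overfit)
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    train_acc = accuracy_score(y_train, tree.predict(X_train))
    test_acc = accuracy_score(y_test, tree.predict(X_test))
    print(depth, round(train_acc, 3), round(test_acc, 3))
# Deeper trees fit the training data better, but test accuracy is what guides the choice.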

Illustration 12 Adding Complexity in Data Analytics

Only two factors were considered in the illustration above: age and income. Data could
be gathered on a third factor, such as education. This would result in a three-dimensional
scatter plot, where an algorithm would use a series of planes to divide the space to
separate colored dots:
[Three-dimensional scatter plot with axes for age, income, and education, divided by planes into regions.]

A multidimensional model uses the same logic as a two-dimensional model, although it would
be difficult to draw an example of a fourth-, fifth-, or twelfth-dimensional scatter plot. Similarly,
the algorithm does not have to use straight lines or planes to divide spaces. Complex curved
multivariate shapes can be used, provided the algorithm is sophisticated enough to employ
them and enough computing power is available to make the associated calculations.


4 Types of Data Analytics LOS 1F4m

The different types of data analytics include descriptive, diagnostic, predictive, and prescriptive
(or proscriptive) analytics. The difference between the types is the intention of the analysis,
rather than the techniques used. The person using the data may intend to use it to describe
what is happening or to find the immediate cause behind what is currently happening. The user
may also be more concerned with predicting future trends instead of describing past ones.

4.1 Descriptive
Descriptive data analytics answer questions to discover what has happened. Data assembly and
organization tasks such as cataloging daily, weekly, and monthly sales reports; usage rates and
expenditures for advertisement; and inventory accumulation or depletion are analytics tasks
that describe what is happening in an enterprise. These analytics tasks are often undertaken by
operations analysts.

4.2 Diagnostic
Diagnostic data analytics answer questions to discover why events are happening. For example,
diagnostic analytics may attempt to discover associations between marketing and sales, or
between inventory and sales, so that management may understand the relationships between
the occurring events. Business analysts often undertake these tasks.

4.3 Predictive
Predictive data analytics bring together the data from descriptive and diagnostic analytics to
predict expected future events, given what is known about conditions and presuming that the
relationships remain stable. Accounting analysts often undertake these tasks in order to inform
managers regarding anticipated gains and expenses in future periods.

4.4 Prescriptive
Prescriptive analytics focus on the business relationships discovered by diagnostic analytics to
recommend actions intended to influence future events. While predictive analytics describe what
will occur if everything continues as it is, prescriptive analytics focus on what should change to
bring about different results. Prescriptive analytics are often undertaken by experienced analysts
at the managerial or executive level.


Question 1 MCQ-12722

Consider the following representation of a data set prepared for data mining.

[Scatter plot: colored dots grouped into clusters, with several grey dots not yet assigned to a cluster.]

Which of the following statements are true?


I. The grouping together of colored dots was done with regression analysis.
II. In order to assign grey dots to an existing cluster group, classification analysis must
be used.
III. Colored dots were assigned using cluster analysis.
IV. In order to assign the grey dots to a group, the analyst must run a time-series analysis.
a. III only.
b. II and IV only.
c. II and III only.
d. I and II only.


Question 2 MCQ-12723

A decision tree analysis identifies subgroups of customers who are more likely to respond to
advertising so that a company can selectively advertise only to those customers. Which type
of analytic modeling does this describe?

a. Descriptive
b. Diagnostic
c. Predictive
d. Prescriptive

Question 3 MCQ-12594

Baiem, LLC is analyzing stock market trading data. It intends to cease predicting future
stock price movement and wants to quickly analyze what the stock market is doing every
moment. By reacting quickly, Baiem will catch the rises and falls in stock prices caused by
other market actors. Baiem believes it will make more money by catching a portion of every
stock movement, rather than catching all of the movement each time it correctly predicts a
trend. With which of the "Four Vs" of Big Data is Baiem most concerned?
a. Velocity
b. Volume
c. Variety
d. Vexation


MODULE 5

F.4. Data Analytics: Part 2
Part 1, Unit 6

This module covers the following content from the IMA Learning Outcome Statements.

CMA LOS Reference: Part 1—Section F.4. Data Analytics: Part 2

The candidate should be able to:


l. explain the challenge of fitting an analytic model to the data
n. define the following analytic models: clustering, classification, and regression;
determine when each would be the appropriate tool to use
o. identify the elements of both simple and multiple regression equations
p. calculate the result of regression equations as applied to a specific situation
q. demonstrate an understanding of the coefficient of determination (R squared) and the
correlation coefficient
r. demonstrate an understanding of time series analyses, including trend, cyclical,
seasonal, and irregular patterns
s. identify and explain the benefits and limitations of regression analysis and time series
analysis
t. define standard error of the estimate, goodness of fit, and confidence interval
u. explain how to use predictive analytic techniques to draw insights and make
recommendations
w. define sensitivity analysis and identify when it would be the appropriate tool to use
x. demonstrate an understanding of the uses of simulation models, including the Monte
Carlo technique
y. identify the benefits and limitations of sensitivity analysis and simulation models
z. demonstrate an understanding of what-if (or goal-seeking) analysis
aa. identify and explain the limitations of data analytics
bb. utilize table and graph design best practices to avoid distortion in the communication
of complex information
cc. evaluate data visualization options and select the best presentation approach (e.g.,
histograms, boxplots, scatter plots, dot plots, tables, dashboards, bar charts, pie charts,
line charts, bubble charts)
dd. understand the benefits and limitations of visualization techniques
ee. determine the most effective channel to communicate results
ff. communicate results, conclusions, and recommendations in an impactful manner
using effective visualization techniques


LOS 1F4n 1 Types of Analytic Models

There are various types of analytic models, each with strengths and limitations. Some models
are basic in structure and less complex in application. Others require a deep level of statistical
knowledge and a team of analytic consultants to apply the models to consumer data.

1.1 Clustering
Cluster analysis is a technique used to determine whether a large group contains two or more
cohesive subgroups, where each subgroup's members are more similar to members of their
own subgroup than they are to members of other subgroups. Cluster analysis is used to target
prospective consumers as well as existing customers, often with differentiated products or
advertisements targeted to the preferences of each subgroup.
Cluster analysis is also used to identify characteristics which unite and characteristics which
separate the customer base, so that products and advertisements can be effectively targeted
toward those subgroups. The benefits are manifold: companies save marketing dollars by not
directing campaigns at consumers with a low propensity to respond to an ad, and they gain a
deeper understanding of their customers, so more relevant products can be developed and
sold. Products and services that are differentiated from the competition will be
more likely to satisfy consumers within the targeted demographic group.
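To make the technique concrete, the following is a minimal sketch of the k-means clustering algorithm in Python (standard library only). The customer ages and monthly spending figures are hypothetical, and production work would normally rely on a statistical package rather than hand-rolled code.

```python
import random

# Hypothetical customer data: (age, monthly spend in $). Values are illustrative only.
customers = [
    (22, 40), (25, 35), (27, 50), (24, 45),      # younger, lower-spend customers
    (48, 210), (52, 190), (55, 230), (50, 205),  # older, higher-spend customers
]

def kmeans(points, k, iterations=20, seed=1):
    """Very small k-means: returns (centroids, cluster label for each point)."""
    random.seed(seed)
    centroids = random.sample(points, k)          # start from k random points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                             + (p[1] - centroids[c][1]) ** 2)
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == c]
            if members:
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, labels

centroids, labels = kmeans(customers, k=2)
print("Cluster centers (age, spend):", centroids)
print("Cluster assigned to each customer:", labels)
```

Running the sketch separates the younger, lower-spend customers from the older, higher-spend customers, which is exactly the kind of cohesive subgroup that cluster analysis is designed to reveal.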

1.2 Classification
Classification analysis attempts to accurately label data points as belonging to an established
group. Binary classification is used when there are only two possible labels and the analysis
chooses the more likely of the two. For example, in conversion prediction, data is used to predict
whether individual customers are likely to purchase a product. Data is gathered about known
cases where existing customers purchased products, and data is gathered about many other
people to analyze which people most resemble the existing customer base and may be more
willing to make a purchase.
There are other types of classification analyses in addition to the binary application. In cases
in which one option is rarer than the other, imbalanced classification techniques are used to
isolate anomalies. This is highly relevant for fraud detection techniques, because fraudulent
transactions are rarer than non-fraud transactions. Similar techniques are used to determine
outliers during the data-cleaning phase of other analysis tasks.
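As a simple sketch of binary classification, the following Python fragment labels a hypothetical prospect by finding the most similar known case (a one-nearest-neighbor rule). The data values are invented for illustration; real conversion-prediction models are considerably more sophisticated and would standardize the features first.

```python
# Minimal 1-nearest-neighbor binary classifier on hypothetical data.
# Each known case: (age, annual income in $000s, bought? 1 = yes, 0 = no).
known_cases = [
    (23, 40, 0), (31, 95, 1), (45, 120, 1), (29, 52, 0),
    (38, 110, 1), (50, 60, 0), (41, 130, 1), (27, 48, 0),
]

def classify(prospect, cases):
    """Label a new prospect with the label of the most similar known case."""
    age, income = prospect
    nearest = min(cases, key=lambda c: (c[0] - age) ** 2 + (c[1] - income) ** 2)
    return nearest[2]

prospect = (35, 105)   # hypothetical new prospect: age 35, $105k income
print("Predicted to buy?", "yes" if classify(prospect, known_cases) else "no")
```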

1.3 Regression
Regression is an advanced mathematical analysis that produces an equation to determine the
relationship between variables (independent variables and a dependent variable). This approach
is applied if there is a suspected underlying linear relationship between two or more variables,
but it can also be applied in the event no previous relationship is known. This is the kind of
relationship that diagnostic analytics is intended to discover. Once a testable relationship has
been established, such as "viewing a certain advertisement increases the quantity purchased,"
regression analysis can be used to test the strength of the relationship, quantify that
relationship, and apply it to other data sets to infer how many individuals are likely to respond in
a similar manner.


1.3.1 Simple Regression LOS 1F4o, LOS 1F4p

A simple regression equation is a mathematical representation of the relationship between
the observed (or independent) variable and the variable the model attempts to predict (the
dependent variable).
Simple regression has only one independent variable. The relationship between the dependent
and independent variable is typically theorized to be linear. To find the linear equation, the
statistics software performs calculations to find the squared distance between each observation
and each possible regression line, identifying the line on which the sum of those squared
distances is the smallest. This is referred to as the line of best fit. The resulting line quantifies the
relationship between the two variables, because it best fits the observed data.
The equation for the regression line is:

Y = β0 + β1X + ε

Where:
Y = dependent variable
β0 = intercept
β1 = slope
X = independent variable
ε = random error

Dependent Variable: The outcome estimated by the regression equation based on its
relationship to the independent variable.
Independent Variable: The variable theorized to cause or drive the value of the dependent
variable.
Intercept: The value that the dependent variable has when the independent variable is zero.
Slope: The quantified rate of change between the independent variable and the dependent
variable. It is usually read as "for each one-unit increase in X, there is a β1 increase in Y."
Random Error: Any remaining variation that cannot be accounted for in the model. If there
are other independent variables acting on the dependent variable which are not in the
model, then their effect is also contained in the error term. If all of the source data perfectly
fit onto a line, the error term would be zero.

Illustration 1 Simple Regression

A student is studying for an important test and would like to know the relationship between
the number of hours students study for the exam and the grade the students achieve on
the exam. The student polls other students who have taken the test in prior semesters and
gathers the following information on time studied and exam score:


The independent variable (X) is the hours studied, the variable that can be observed and/or
controlled. The dependent variable (Y) is the test grade, the variable the student is trying to
predict based on hours studied.

    Grade (Y)    Hours Studied (X)
        78             3.5
        80             5
        67             2
        95             7
        98             6
        76             6
        71             5.5
        50             2
        92             3
        85             8

Grade Achieved per Hours Studied

[Scatter plot of the data, which appears to be roughly linear.]

The student inputs this data into an application that can perform linear regression and gets
the following regression equation:

Y = 59.5 + 4.1X + ε

Grade Achieved per Hours Studied

[Scatter plot with the fitted line: this regression equation represents the formula for the
line of best fit for these data points.]

According to this formula, a student can expect a grade of 59.5 on the exam with zero
hours of study and an improvement of 4.1 points for every hour studied.
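The line of best fit in Illustration 1 can be reproduced with a few lines of Python (standard library only). The least squares formulas below give a slope of about 4.1 and an intercept of about 59.4, which matches the illustration's 59.5 once the slope is rounded to 4.1 before computing the intercept.

```python
# Hours studied (X) and grades (Y) from Illustration 1.
x = [3.5, 5, 2, 7, 6, 6, 5.5, 2, 3, 8]
y = [78, 80, 67, 95, 98, 76, 71, 50, 92, 85]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares: slope = S_xy / S_xx, intercept = y_bar - slope * x_bar.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

slope = s_xy / s_xx                 # about 4.1 grade points per hour studied
intercept = y_bar - slope * x_bar   # about 59.4 points with zero hours of study

print(f"Y = {intercept:.1f} + {slope:.1f}X")
print("Predicted grade after 6 hours of study:", round(intercept + slope * 6, 1))
```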


1.3.2 Correlation Coefficient LOS 1F4q, LOS 1F4t

The correlation coefficient (r) is the measure of how closely the regression equation fits the
data. This measure has a range of –1 to 1. A correlation coefficient of –1 means the dependent
and independent variables have a perfect negative linear relationship (an increase in X means
a proportional decrease in Y, with each data point in the data set on the regression line).
A correlation coefficient of 1 means the variables have a perfect positive linear relationship (an
increase in X means a proportional increase in Y, with each data point in the data set on the
regression line).

Illustration 2 Checking for Correlation

[Scatter plot: Grade Achieved per Hours Studied]

There is clearly a positive relationship between the number of hours studied and the exam
grade achieved, but it is not a perfect linear relationship.

Using the linear regression application, the student determined that the correlation
coefficient is 0.593, indicating a positive but not particularly strong relationship between
hours studied and exam score.

1.3.3 Coefficient of Determination (Goodness of Fit)


The coefficient of determination (also called R-squared) measures the ability of a model to
explain the outcome (the Y values or dependent variables) given the X values or independent
variables used in the regression model. It is a measure of the goodness of fit of the regression
line. The coefficient of determination is called R-squared because it is calculated by squaring the
coefficient of correlation (r). The coefficient of determination varies between zero and 1 and is
read as a percentage.

Illustration 3 Interpreting the Coefficient of Determination

A coefficient of correlation of 0.593 yields an R-squared of 0.35, which means that only
35 percent of the variation in test scores is explained by the variation in the hours studied.
There must be other factors that influence test scores, but these are not taken into account
in this equation.

1.3.4 Standard Error of the Estimate


The standard error of the estimate is a measure of the accuracy of the relationship predicted
by the regression analysis. It is the standard deviation of all errors between actual observations
and the predicted dependent value for those independent values. As the number of data points
increases, the standard error of the estimate becomes smaller.
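Continuing with the data from Illustration 1, the following sketch (standard library only) computes the correlation coefficient, the coefficient of determination, and the standard error of the estimate exactly as defined above; it reproduces the 0.593 and 0.35 figures used in Illustrations 2 and 3.

```python
import math

x = [3.5, 5, 2, 7, 6, 6, 5.5, 2, 3, 8]
y = [78, 80, 67, 95, 98, 76, 71, 50, 92, 85]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

r = s_xy / math.sqrt(s_xx * s_yy)   # correlation coefficient, about 0.593
r_squared = r ** 2                  # coefficient of determination, about 0.35

# Standard error of the estimate: the standard deviation of the residuals,
# using n - 2 degrees of freedom for a simple regression (about 12.3 grade points here).
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
see = math.sqrt(sse / (n - 2))

print(round(r, 3), round(r_squared, 2), round(see, 1))
```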


1.3.5 Confidence Interval


A confidence interval is a measure of the degree of certainty when applying statistical methods.
Simple linear regression yields two parameters: β0, the intercept or value of the dependent
variable when the independent variable is 0, and β1, the slope of the regression line. These
parameters in the regression equation are estimates and are not perfectly accurate unless the
correlation coefficient is either –1 or 1. A confidence interval for β0 or β1 is a range of values
that, at a stated level of confidence, contains the actual parameter. The most common confidence
intervals reflect confidence levels of 95 percent or 99 percent.

Illustration 4 Confidence Intervals

The intercept in this study is the grade expected with zero studying, or 59.5. Based on the
methods used by the student, the student is 95 percent confident that the real intercept
value is between 35.8 and 80.1. Similarly, the model shows that each hour of studying
results in 4.1 points of grade improvement. The student is 95 percent confident that the
actual slope is between (0.44) and 8.67. These wide ranges reflect the imprecision of this
linear regression model.
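The slope interval quoted in Illustration 4 can be approximated with the following sketch. The two-tailed 95 percent t value of 2.306 for n − 2 = 8 degrees of freedom is taken from a standard t table rather than computed here.

```python
import math

x = [3.5, 5, 2, 7, 6, 6, 5.5, 2, 3, 8]
y = [78, 80, 67, 95, 98, 76, 71, 50, 92, 85]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Standard error of the estimate (as above), then the standard error of the slope.
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
see = math.sqrt(sse / (n - 2))
se_slope = see / math.sqrt(s_xx)

t_critical = 2.306   # assumed two-tailed 95% t value for 8 degrees of freedom
low = slope - t_critical * se_slope
high = slope + t_critical * se_slope
print(f"95% CI for the slope: {low:.2f} to {high:.2f}")   # roughly -0.44 to 8.67
```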

1.3.6 Multiple Regression


Multiple regression is a form of regression in which the equation has multiple independent
variables. The key difference between simple regression and multiple regression is that multiple
regression assumes that more than one factor influences the dependent variable.
The multiple regression equation is:

Y = β0 + β1X1 + β2X2 + … + βnXn + ε

Where:
Y = dependent variable
β0 = intercept
βn = the slope for the independent variable Xn
Xn = the independent variables
ε = random error

Note that there is only one intercept and one error term, but multiple independent variables and
multiple slopes.


Illustration 5 Setting Up Multiple Regression

An analyst wants to determine whether the market price of a home can be calculated from
factors describing that home. Each factor is an independent variable and the sales price
of the home is the dependent variable. The data set contains listings for several hundred
homes for sale in two different suburbs of a major Midwestern city along with zip codes
identifying the suburb, the number of bedrooms and bathrooms, the size of the house and
of the yard, when the house was built, and a 0–30 score for the quality of the public school
district. A portion of the data set follows:

    Price     Address                     Zip     Beds  Baths  Sq. Ft.     Sq. Ft.    Built  School
                                                  (X1)  (X2)   House (X3)  Lot (X4)   (X5)   (X6)
    379,900   350 Carpenter Rd.           43230   4     2.5    2,268       20,908.8   1987   22
    285,900   380 Woodside Meadows        43230   4     2.5    2,002       7,405      1998   22
    300,000   6128 Renwell Ln. Unit 87    43230   3     2.5    2,385       10,890     2004   10
    399,900   471 Preservation Ln.        43230   4     2.5    2,796       10,890     2008   21
    320,000   908 Ludwig Dr.              43230   4     2.5    2,838       11,325.6   1990   22
    279,900   194 Winfall Dr.             43230   4     2.5    2,509       10,454.4   1994   21
    280,000   4167 Guston Pl.             43230   3     2.5    3,380       12,196.8   2002   26
    329,000   586 Pinegrove Pl.           43230   3     2.5    1,953       7,841      1998   22

The goal of the regression is to see if the data can be used to determine the market price
for a house in the area.
The initial run of the regression model against the data set produces the following
regression equation:
Y = $252,599 + 5,742X1 + 2,598X2 + 1,122X3 + 33X4 + 0.61X5 + 1,168X6
The coefficient of determination or R-squared is determined to be 24 percent, meaning the
model explains 24 percent of the variation in the sales price of the home. The analyst will
most likely revise this model to find one with a stronger goodness of fit.
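The mechanics of fitting a multiple regression can be sketched with NumPy's least squares routine, as below. Only the eight listed rows are used, so the fitted coefficients will not match the illustration's equation, which was estimated on the full data set of several hundred homes; the point is simply how the design matrix and intercept column are set up.

```python
import numpy as np

# The eight listed rows: beds, baths, house sq. ft., lot sq. ft., year built, school score.
X_raw = np.array([
    [4, 2.5, 2268, 20908.8, 1987, 22],
    [4, 2.5, 2002,  7405.0, 1998, 22],
    [3, 2.5, 2385, 10890.0, 2004, 10],
    [4, 2.5, 2796, 10890.0, 2008, 21],
    [4, 2.5, 2838, 11325.6, 1990, 22],
    [4, 2.5, 2509, 10454.4, 1994, 21],
    [3, 2.5, 3380, 12196.8, 2002, 26],
    [3, 2.5, 1953,  7841.0, 1998, 22],
])
y = np.array([379900, 285900, 300000, 399900, 320000, 279900, 280000, 329000])

# Prepend a column of ones so the first coefficient is the intercept (beta 0).
X = np.column_stack([np.ones(len(y)), X_raw])

# Ordinary least squares: find beta minimizing the squared error in y ~ X @ beta.
# With only eight rows (and a constant baths column) the estimates are unstable,
# which is why the text stresses adequate sample sizes.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("Intercept:", round(beta[0], 2))
print("Slopes (beds, baths, house sq ft, lot sq ft, year, school):", np.round(beta[1:], 2))
```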

1.3.7 Benefits of Regression Analysis LOS 1F4s


Companies that accurately apply regression can make more informed business decisions. By
using business conditions as independent variables in the equation, output can be reasonably
predicted with statistically significant results. This can improve the precision of business decision
making.

1.3.8 Limitations of Regression Analysis


The results of regression analyses can only be generalized to populations that resemble the
sample used to create the model.
To create well-constructed regression models, it is necessary to understand the data in order to
properly interpret and apply results. Making business decisions based on the incorrect use of
this tool may lead to poor business outcomes.
The accuracy of regression is limited if the sample size is small. The definition of a healthy
sample size depends on the variation within the sample, but samples of 30 or fewer are
generally unlikely to produce statistically significant results.


LOS 1F4r 1.4 Time Series Analysis


Time series data is data collected over a period of time. Time series data can be graphed in
order to show patterns over time, with time on the x-axis and the observed values on the y-axis.
Visualizing time data can be used to separate trend lines, seasonality, and cyclicality from
irregular time patterns.
A trend line is the line that best fits the data across the time series. This line is calculated
using regression analysis and the slope shows the average overall direction of the observed
values on the y-axis.

Illustration 6 Time Series Trend Line

A company's sales revenue figures over time are plotted on the chart below:

    Month    Sales Revenue
    Jan         $498,729
    Feb         $574,388
    Mar         $994,214
    Apr       $1,220,303
    May       $1,417,967
    Jun       $1,448,811
    Jul       $1,296,358
    Aug       $1,420,278
    Sep       $1,209,389
    Oct       $1,024,372
    Nov       $1,296,817
    Dec       $1,315,316

[Line chart of monthly sales revenue with a fitted trend line.]

The solid line on the graph is the actual sales and the dotted line is the trend line.
Generally, sales are increasing over time.
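The dotted trend line in Illustration 6 is itself a regression of revenue on time. A minimal sketch with NumPy, using the month numbers 1 through 12 as the independent variable, follows.

```python
import numpy as np

revenue = [498729, 574388, 994214, 1220303, 1417967, 1448811,
           1296358, 1420278, 1209389, 1024372, 1296817, 1315316]
months = np.arange(1, 13)          # Jan = 1, ..., Dec = 12

# A trend line is just a regression of the observed values on time.
slope, intercept = np.polyfit(months, revenue, deg=1)
trend = intercept + slope * months

print(f"Average change per month: about ${slope:,.0f}")
print(f"Trend value for December: ${trend[-1]:,.0f}")
```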

Seasonality occurs when time series data exhibits regular and predictable patterns over
time intervals that are less than a year.
Cyclicality occurs when time series data rises and falls over periods of time that are not fixed
or predictable and are generally over periods of more than one year.

Illustration 7 Cyclical Trend Analysis

Global temperatures are rising in a nonlinear way. There is irregular variation in daily
temperatures. There is seasonal variation with hotter temperatures in the summer than
in the winter. There is also cyclical variation in which temperatures rise and fall over time.
Some cycles are decades long and some may be centuries long. In order to accurately
forecast temperature over time, all of these cycles must be identified and accounted for in
the analysis.


2 Predictive Analytic Techniques LOS 1F4u

Predictive analytics generally fall into two types: forecasting analytics and referential analytics.

2.1 Forecasting Analytics


Forecasting uses modeling techniques to predict future events on the basis of past events, past
conditions, and expected future conditions, which could differ from past and current conditions.
Sophisticated forecasting models incorporate the effects of likely changes so contingency-based
recommendations can be provided.

Illustration 8 Forecasting Budgets

Forecasting budgets in business may begin with regression analysis to identify the business
components that have the strongest effect on costs and revenues, such as advertising,
materials purchases, labor used, hours of operation, etc.
Next, trend analysis can be performed to determine any seasonality, such as a seasonal
demand for outdoor grilling equipment or snow shovels, and any cyclical effects on the
business, such as macroeconomic business cycles.
By calculating the effects of all these influences, an analyst may predict next month's (or
quarter's or year's) revenues and expenses with greater precision.

2.2 Referential Analytics


Referential analytics use modeling techniques to predict what will happen to one group based
on known data about what happened to another group (the reference group). Generally,
referential analysis is used for near-term events, so it is less important to try to predict future
conditions.

Illustration 9 Using Referential Analysis

A company that wants to expand into a new geographic territory tasks its analysts with
studying the quantity and types of customers and competitors in the proposed new
territory. Analysts compare the new territory with the company's existing territories that
have similar characteristics. When a similar territory is identified, the company refers to
existing data on the effectiveness of its past efforts in that territory (advertising, sales price,
competition, etc.) and applies that data to the proposed new territory.
In this way, management can plan its strategy in the new territory based on the results
from the existing territory. It may be necessary to adjust the plan based on any remaining
differences between the two territories.


LOS 1F4l 2.3 Challenges of Fitting Analytic Models to the Data


The challenges in fitting the analytic model to the data may be broken down into two areas:
Although it is important to build a model using a large quantity of high-quality data, only
a fraction (from 10 percent to 30 percent) of the available data should be used to build
the model (called training data). The remaining data is reserved for testing, but it must be
similar to the training data because there should be no systemic differences between the
training and testing data. A model that perfectly fits the test data is likely to be overfitted,
meaning it explains the test data so well that it is less effective when used on other data.
It can be difficult to know when to stop building the model. Once several versions of the
model have been built and trained, those versions are all run against the test data set. The
performance of those different versions of the model are compared to find which most
accurately measures the data, taking into account the rates of false positives, true positives,
false negatives, true negatives, and the ratio of positives to negatives. The model that
performs best is selected, unless the model can be improved, in which case the training and
test data sets are often shuffled back together and a new random subset is taken for the
training data, with the process beginning again after modifications.

LOS 1F4aa 2.3.1 Limitations of Data Analytics


There are three categories of data analytics limitations:
1. Technical limitations imposed by the computing hardware and software: Data sets can be
extremely large, and the computing power and memory needed to run analytics software are
expensive. The ability of a company to invest in data analytics infrastructure may limit the
analytics that the company is able to produce.
2. Available human capital: Data scientists have a highly specialized skill set; the quality of data
analytics depends on their expert judgment. The quality of data analytics performed by a
company is directly affected by the quality of the data analysts hired by the company.
3. The data itself: Ongoing expense and effort must be spent to continuously gather large
quantities of timely, relevant, high-quality data on which to perform analytics operations in
order to produce high-quality results to inform business decisions.

3 Other Analytic Methods

Other analytic methods include sensitivity analysis (also known as what-if analysis) and
simulation analysis.

LOS 1F4w, 1F4y, 1F4z 3.1 Sensitivity Analysis

Sensitivity analysis is used to investigate how the results of an analysis could change if
adjustments are made to inputs and/or independent variables. Sensitivity analysis is most often
used when the relationships driving dependent variables are not explicitly or reliably identified,
but the effects of the variables are quantified.
Sensitivity analysis is also called "what-if" analysis because it seeks to answer the question:
"What would happen in this situation if this input factor changed?" The main benefit of sensitivity
analysis is that it helps management identify tipping points at which business conditions may
require management action or intervention. A limitation of sensitivity analysis is that because
underlying relationships are not defined, any change in the underlying relationships after the
sensitivity analysis model is developed may lead to a misidentification of the tipping points.


Illustration 10 Sensitivity Analysis in Practice

A company may not have statistically verified every relationship that leads from advertising
to revenue (such as specific receptiveness of every component in the message to each
cluster-defined subgroup), but it has enough data points (e.g., this much additional
advertising led to this much additional revenue) to create an exploratory data analysis
model. The company can then use sensitivity analysis to estimate what level of advertising
expenditure is likely to produce a desired rise in revenue. Alternatively, a company
can calculate how sensitive its investment portfolio is to changes in the stock market
by calculating the expected income from all its investments at various possible future
market conditions.
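A minimal what-if sketch in Python follows. The response rate and contribution margin per response are hypothetical inputs; the grid shows how projected profit shifts as each one is varied, which is how tipping points are located in practice.

```python
# Hypothetical relationship: profit = (ad spend * responses per dollar * margin) - ad spend.
def projected_profit(ad_spend, responses_per_dollar, margin_per_response):
    revenue_effect = ad_spend * responses_per_dollar * margin_per_response
    return revenue_effect - ad_spend

base_spend = 100_000
for responses in (0.8, 1.0, 1.2):            # responses generated per ad dollar (what-if input)
    for margin in (1.00, 1.25, 1.50):        # contribution margin per response (what-if input)
        profit = projected_profit(base_spend, responses, margin)
        print(f"responses={responses}, margin={margin}: profit = ${profit:,.0f}")
```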

3.1.1 Goal-Seeking Analysis


Goal-seeking analysis is a subcategory of sensitivity analysis in which the focus is on achieving
a particular set of outputs. If a company is interested in achieving a specified level of revenue,
then inputs can be adjusted in several ways, such as increasing sales incentives, changing
advertising dollars spent, and/or changing price levels. Each change will have possible secondary
effects. Goal-seeking analysis attempts to discover a formula in which the examined inputs can
be altered relative to one another and arrive at the intended goal.
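Goal-seeking can be sketched as a simple numerical search. In the fragment below, the revenue response curve is an assumed, hypothetical function; bisection then finds the advertising spend that reaches the revenue target.

```python
# Hypothetical response curve: revenue rises with ad spend but with diminishing returns.
def projected_revenue(ad_spend):
    return 500_000 + 900 * ad_spend ** 0.5

target = 800_000
low, high = 0.0, 1_000_000.0        # search range for the ad spend input

# Bisection: repeatedly halve the interval until the revenue at the midpoint brackets the target.
for _ in range(60):
    mid = (low + high) / 2
    if projected_revenue(mid) < target:
        low = mid
    else:
        high = mid

print(f"Ad spend needed for ${target:,} of revenue: about ${mid:,.0f}")
```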

3.2 Simulation Models LOS 1F4x, LOS 1F4y

Simulation modeling is a risk-analysis tool that simultaneously performs multiple sensitivity
analyses to find the business conditions resulting in the most extreme, yet acceptable, business
outcomes. If management later finds that the business environment has crossed any of the
identified critical thresholds, management will know which actions must be taken until the
environment changes. Managers can otherwise remain confident that as long as business
conditions do not exceed any of these boundaries, they can expect acceptable results.
Simulation models also have the same main benefit (identifying tipping points) and limitation
(potential for misidentification of tipping points) as sensitivity analysis.

3.2.1 Monte Carlo Technique


The Monte Carlo technique is a type of simulation model with both multiple variables and
multiple assumptions about the distribution of the data. The Monte Carlo technique runs many
analyses at once, and for each variable being studied, every possible type of distribution is
presumed. As the Monte Carlo simulation processes data, highly unlikely distribution types drop
from the analysis until only the most likely ones for the studied data set remain. This means that
the Monte Carlo simulation does not result in a single answer for the analytic question posed,
but gives a range of likely results, each with a calculated probability of occurring.
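A minimal Monte Carlo sketch using Python's standard library follows. The demand and unit-cost distributions are assumed for illustration only; consistent with the discussion above, the output is a range of likely profit outcomes with estimated probabilities rather than a single answer.

```python
import random

random.seed(7)
trials = 100_000
price = 25.0
profits = []

for _ in range(trials):
    demand = random.gauss(10_000, 1_500)        # assumed normally distributed demand
    unit_cost = random.uniform(12.0, 18.0)      # assumed uniformly distributed unit cost
    profits.append(demand * (price - unit_cost))

profits.sort()
print("Median profit:   ", round(profits[trials // 2]))
print("5th percentile:  ", round(profits[int(trials * 0.05)]))
print("95th percentile: ", round(profits[int(trials * 0.95)]))
print("Chance of a loss:", sum(p < 0 for p in profits) / trials)
```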

4 Data Visualization

Data visualization is the pictorial or graphical representation of information displayed to make
trends and key concepts visible. The concept of data visualization has evolved in the era of Big
Data. Traditional charts and graphs are now applied to information on a larger scale, and then
enhanced with more nuanced graphical adaptations.


LOS 1F4dd 4.1 Benefits and Limitations of Visualization Techniques


4.1.1 Benefits
Visualization helps those analyzing data communicate the data more effectively, which results in
the following benefits:
Reduces time needed to communicate key findings
Highlights otherwise unnoticeable trends or statistics
Reveals subtle trends in an obvious way
Simplifies complex data
Identifies patterns and relationships

4.1.2 Limitations
Although data visualization can be used to quickly communicate patterns in data, a visualization
is a summary that may lack the precision of the actual underlying data. Additionally, data
visualizations are limited by the viewer's ability to understand the message conveyed by the
image. Written descriptions or tables can be used to enhance the user's understanding of the
data visualization.

LOS 1F4ee 4.2 Determining the Best Channel to Communicate Results


Data visualizations are made more effective by isolating the key concept that needs to be
communicated and then determining the visualization that best conveys the concept. For
instance, in a grouping of light-blue dots, most humans can distinctly see a single red dot located
in the group. Likewise, in a series of similar bars, a bar that is double the length of all the
others stands out. Absent these physical differences, visuals can have numerical overlays
that supplement key findings with a percentage or an index.

LOS 1F4cc 4.3 Evaluating Data Visualization Options


The options available for data visualization are infinite, but several traditional types are
routinely used. One key to selecting the best visual is understanding the audience's background,
knowledge (or lack thereof), quantitative capability, and profession. Selecting charts for data
scientists and astrophysicists will be very different from selecting a chart for high school
students. This makes it important to define the target audience prior to visual selection. If a
message is for a large audience, the selection should be based on what could reasonably be
understood by the lowest cognitive level of the audience (e.g., if the message is for all drivers
then the visual must be comprehensible for 15- and 16-year-old potential drivers).
The use of certain visuals may be associated with different industries and can commonly be
found in literature or training materials for those industries. For example, line graphs are used
in the investment profession; anatomical or biological icons and visuals are used in health
care; and flowcharts are used to understand processes in business, engineering, or project
management.
Many data visualization methods obscure the precise values of individual observations, either by
aggregating them into a bar or a pie slice, or by rendering their value as a position against a less
precise scale, like a scatter plot or a line graph. How speedily the audience needs to grasp the
basic patterns present, and how much precision the audience requires, along with the nature of
the information being presented, will determine the types of visualization used.


4.3.1 Tables and Dashboards


A table is only a visualization method in that it presents a subset of data for easier viewing, such
as one or two key variables and a small number of observations or summaries of observations.
Tables are used when finding the precise values of a limited number of observations is crucial. The
cost of maintaining this precision is the inability to quickly communicate patterns through images.
A dashboard is a collection of data visualization techniques optimized to quickly communicate
key performance indicators to management and executives. A dashboard should be viewable on
one screen or page. Management may check key indicators every day, and only take action if an
unusual or interesting condition is manifested.

4.3.2 Bar Charts, Line Charts, and Stacked Charts


Trends generally have a linear component and involve time as a measure. Histograms, or bar
charts, use the length of a bar to represent the frequency or magnitude within a category. They
are used to visualize data when there are few important categories that encompass all the data,
such as defined time periods in a series.

Bar Chart

[Bar chart showing growth rate (y-axis) by quarter (x-axis).]

A line chart shows a frequency or a magnitude in one dimension against a data metric in the
other dimension, very similar to the way a histogram is set up. The difference is that instead
of multiple bars showing that frequency or magnitude, an indicator point is placed where the
top of the histogram bar would be, and then these points are connected with line segments or
curved lines. Reducing the data from a bar to a line allows multiple variables to be presented in
the same space, such as sales from two different departments.

Line Chart

[Line chart plotting indicator points connected by line segments, allowing multiple variables
to be shown in the same space.]


A stacked chart combines the qualities of a histogram and a line chart, giving a visual
representation at all points of the difference between the several variables being presented.
All variables extend from their tops to the X axis; the value is not just the visible portion of the
variable. This visualization is best used when the included variables do not cross each other. If
the data points cross, a line chart should be used, as shown above.

Area/Stacked Chart

[Stacked area chart of revenue (in thousands) by month, January through December.]

4.3.3 Dot Plots


A dot plot is a two-dimensional mapping of observations onto a coordinate plane, with one
dimension representing the frequency of observations of the other dimension.

Dot Plot

[Dot plot with price on the x-axis (0 to $500,000) and frequency on the y-axis.]

This visualization is used when there are discrete and repeated instances of the observations.
Dot plots are used when the analyst wants to present individual observations in the visualization.
This is the primary difference from histograms, in which the individual observations are hidden
in the bars.


4.3.4 Flowcharts
Processes that have a beginning, middle, and end can be mapped using flowcharts. These
visuals allow the viewer to see the potential options that may occur during a given process.

Flowchart

4.3.5 Pie Charts


Data points that sum to a whole can be illustrated by using pie charts. Pie charts show the
relative size of the component data sets.

Pie Chart

[Pie chart titled "Percentage of Annual Revenue" with sections of 46%, 23%, 19%, and 12%.]


Traditional pie charts are susceptible to distortion. Studies show that people often
underestimate the proportion of obtuse-angled sections of a pie chart and overestimate the
proportion of acute-angled sections. This angular distortion can be addressed using a doughnut
chart, which is a pie chart with the center removed, sometimes so that several pie charts can be
stacked to more easily compare proportions.

Doughnut Chart

[Stacked doughnut charts comparing category proportions for the years 2015 through 2019.]

Pie charts are often used in marketing when the goal is for the audience to interpret the data
in favor of the marketing message. Pie charts are less suited to presenting management or
executives with actionable summaries of business information.

4.3.6 Boxplots, Scatter Plots, and Bubble Charts


A boxplot displays the distribution of data, showing the minimum, quartiles, median, and
maximum. A boxplot can show outliers and can show whether data is symmetrical, widely
distributed, or skewed.

Boxplot

[Boxplot on a scale showing the lower extreme, lower quartile, median, upper quartile, upper
extreme, whiskers, and an outlier (single data point).]


Scatter plots illustrate correlation by mapping data observations in two dimensions.

Scatter Plot

[Scatter plot with dollars spent per visit on one axis and total dollars spent on the other.]

Bubble charts are a type of scatter plot that uses the size of the dots to help the user visualize
magnitude or other relational quality.

This bubble chart, known as the Hans Rosling plot, uses data from the OurWorldInData
organization. The relative size of each dot represents the relative size of the population of the
countries represented. Further, the colors of the dots indicate the continents on which the
countries are located.


4.3.7 Directional Chart


Highlighting key events or milestones over time can be depicted using directional charts, with
the earliest data and event beginning on the left and the ending event on the right.

Directional Charts

4.3.8 Pyramid Chart


Understanding underlying foundations or building blocks can be effectively portrayed using a
pyramid chart. These are most helpful when the bottom layer represents an action or a target
that must first be achieved before the next layer up can take place.

Pyramid

[Pyramid chart modeled on the food pyramid: the bread, cereal, rice, and pasta group at the
base; the vegetable group and fruit group on the next layer; the milk, yogurt, and cheese
group and the meat, poultry, fish, dry beans, eggs, and nut group above that; and fats, oils,
and sweets at the top.]


4.3.9 Waterfall Chart


Conveying the cumulative effect of each new or incremental piece of information can be
accomplished using a waterfall chart. As each data point is added, its effect is charted against
the total of all data points. In the following graphic, the total change in profit is shown first;
the subsequent bars show how each period's change in profit accumulates to make up that total
percentage change.
Waterfall Chart

[Waterfall chart of the profit change percentage over time.]

4.4 Communicating Using Visualization Techniques LOS 1F4ff

Communicating conclusions or recommendations requires more precision than just reporting all
results. This can be done by modifying traditional charts using figure overlaps, relative scaling,
geographical overlays, gradient colors, colors that correlate for the same data point, sorting
data, or by making a visual interactive.

Illustration 11 Visualization Enhancement

Enhancement          Description

Emphasis             An example of enhancing traditional charts is to enlarge a section for
                     emphasis in a pie chart or convert a traditional pyramid into puzzle-like
                     figures to show interconnectedness.

Combinations         Combining chart concepts can bring a layer of depth to a visual, or just
                     be more aesthetically appealing to the eye.

Interconnectedness   Overlapping shapes in visuals show relationships or interconnectedness
                     between data points.

Relative size        Heat maps and Mekko maps illustrate relative size. Heat maps are
                     graphical illustrations that use different colors to convey strength or
                     concentration and may be used to help visualize complex information by
                     utilizing height, width, and a light-to-dark color scale. Mekko maps are
                     common in comparing market share or the volume of stock trades (or
                     market cap) in a day.

Scaling/magnitude    Bubble charts incorporate scaling, which shows the relative magnitude
                     of data points.

Similarity           Using colors for grouping similar data points such as products,
                     geography, or time periods can be effective.

Geography            Geographical overlays can also be useful as they can illustrate not only
                     the region to which a data point is related, but also the magnitude of
                     that data.

Sequence             Simply sorting data in charts is effective in revealing sequences,
                     relative order, or volume patterns.

Gradient/filters     Applying gradient colors in charts can add depth to trends by indicating
                     volume, density, or concentration. Also, providing filters for users to
                     engage is effective, making the chart interactive so the user can tailor
                     the information to individual needs.


4.4.1 Best Practices LOS 1F4bb


In order to effectively communicate key findings and recommendations, there are several
considerations that can help to avoid confusion, prevent distortion, and minimize cognitive
overload.
Minimize Colors: Limit the number of different colors or shades to four to six to avoid
distraction.
Darken for Emphasis: Use light colors for insignificant details to place emphasis on the
key points.
Label Sparingly: Use labels when accuracy is necessary for otherwise indistinguishable
data points.
Minimize Depth: Three-dimensional charts displayed on two-dimensional pages/screens
require forced perspective. Unnecessary depth may create proportion or scale distortions or
draw attention to a meaningless difference, causing misinterpretation.
Avoid Over Slicing: Use pie charts with only four or five data points to avoid crowding and
confusion. If more must be used, enhance the pie chart by choosing bold colors for the one
or two key pieces and diminishing the other sections with a gray scale.
Minimize Visual Noise: Minimize the number of data points charted unless using a scatter
plot or similar chart to emphasize volume.
Minimize Legends: Avoid using legends with more than four or five colors. As the number
of colors increases, visually matching the legend to the chart becomes difficult, reducing the
effectiveness of the visualization.
Avoid Absolutes: Definitive language or superlatives (absolutes) should be avoided when
describing or labeling visuals, including "best," "worst," "always," or "never." Instead,
use "preferred," "unfavorable," "frequently," or "seldomly," which suggest a more likely
interpretation to the reader.
Maintain Appropriate Scale: Unless there is a compelling reason not to, scales should:
- Exist
- Be of uniform size at all points on the visualization
- Start at zero
- Not use excessively small or large units
- Be unbroken
- Not omit a range of values
Avoid Bias: Information stewards must present information in an unbiased way that does
not mislead users.
Use the Same Time Period: When making comparisons, use the same time periods. For
example, a company's year-over-year changes in net income should not be compared at
calendar year-end with the results of a company that uses a different fiscal year-end date.
Economic circumstances, such as changes in consumer demand or producer prices, could
be significantly different.


Illustration 12 Visual Distortion

Messages from data can be manipulated by reframing parameters. The figures below show
the exact same data; however, the Y axis in Figure 1 has a minimum value of 0 percent.
The Y axis in Figure 2 has a minimum value set at about 25 percent, emphasizing the
incremental difference between Company X and Company Y's growth in annual revenue.
Figure 2 distorts the difference, which may be misleading for users.

[Figure 1: bar chart of annual revenue growth (%) for Companies X and Y with the Y axis
starting at 0 percent. Figure 2: the same data with the Y axis starting at about 25 percent.]
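The distortion in Illustration 12 can be reproduced with a short matplotlib sketch; the two growth figures are hypothetical, and the only difference between the two panels is where the Y axis starts.

```python
import matplotlib.pyplot as plt

companies = ["Company X", "Company Y"]
growth = [32, 35]    # hypothetical annual revenue growth, in percent

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

ax1.bar(companies, growth)
ax1.set_ylim(0, 40)              # Figure 1: axis starts at zero, faithful comparison
ax1.set_title("Axis starts at 0%")

ax2.bar(companies, growth)
ax2.set_ylim(25, 40)             # Figure 2: truncated axis exaggerates the gap
ax2.set_title("Axis starts at 25%")

for ax in (ax1, ax2):
    ax.set_ylabel("Annual revenue growth (%)")

plt.tight_layout()
plt.show()
```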

Illustration 13 Importance of Scale

If pictures are used to represent data, be careful to scale them appropriately. In the
following images, the vertical axis is faithful, and Oscar has eaten twice as much pizza
as Shelly. However, the image of the pizza has been scaled in two dimensions, making it
appear that Oscar has eaten four times as much.

[Two bar charts titled "Slices of Pizza Eaten," each comparing Shelly and Oscar on a 0-5
scale. In the second chart, the pizza image is scaled in two dimensions, exaggerating the
difference.]


4.4.2 Faithful Representation


Data visualizations should not omit data. The following two images are constructed from the
same data, but a trend is apparent on the left that does not exist when all of the data is faithfully
presented on the right.

[Left: misleading bar graph, "Hurricanes Increasing in the 1990s," with years missing from
the horizontal axis (1992, 1994, 1995, 1997, 1999, 2000). Right: bar graph containing
complete data, "Hurricanes in the 1990s," with every year from 1992 through 2000 on the
horizontal axis.]

This issue becomes more complex when three-dimensional images are used to visualize data.
Because the typical data visualization has only two dimensions to work with (a flat page or a flat
screen), three-dimensional images often have to use a forced perspective to create the illusion
of depth. Forced perspective often means that the scale does not apply to all portions of the
image in the same way, which creates distortion.

4.4.3 Forecasting
When using visualization techniques in trend analysis, time series, or any other topic that mixes
actual data with projected, estimated, or predicted data, clearly indicate where the change is
from actual data to the projected, estimated, or predicted data.

Question 1 MCQ-12728

The OutdoorPeople Co. has identified several subgroups among the company's customer
base. These groups have particular combinations of age, wealth, geographic location, etc.
The company is about to release a new product and it wants to measure how much of
an effect the customer's wealth has on buying the product after viewing the advertising
message(s) for the product.
What kind of analysis will be most useful to answer OutdoorPeople's need for information?
a. Cluster analysis
b. Regression analysis
c. Fourier analysis
d. Classification analysis


Question 2 MCQ-12729

Jacks Capital Inc. is putting together financial results for its annual report. Gains were
reported for each month during the past year but those were completely offset by heavy
losses in the last two months. If Jacks wants to show the relative cumulative incremental
impact of each month's results, which of the following charts would best illustrate that?
a. Scatter plot
b. Flowchart
c. Pyramid
d. Waterfall chart

Question 3 MCQ-12730

The Happy Smiles Co. has a new advertisement and the company has collected data from
focus groups about how effectively the advertisement leads to purchases of the company's
products. The company has measured the demographic information of the focus group
and has prepared a regression analysis. The output of that regression analysis is that the
correlation coefficient is 0.22, the coefficient of determination is 0.10, the standard error is
8,435.20, and the regression equation is:
Sales = 10,743 + 1.7 (Household income) – 8.2 (Average child age)
Which of these business decisions is most appropriate given the data?
a. Focus marketing messages on wealthy customers with young children.
b. Focus marketing messages on poorer customers with older children.
c. Do not produce or market this product.
d. The data supports none of these recommendations.

Question 4 MCQ-12731

A local bank is looking for any patterns in its data for which customers pay back their loans
and which ones do not. The data the company has decided to use is the final disposition
of the loan (paid or defaulted), the customer's income, the amount of the loan, and the
proportion between those two values.
Which of the following data visualization techniques would be most suited to facilitate the
recognition of any patterns present?
a. Bubble chart
b. Pie chart
c. Line graph
d. Flowchart

Class Question Explanations: Part 1, Unit 6

Unit 6, Module 1

1. MCQ-12692
Choice "d" is correct. The AIS differs from the decision support system (DSS) and the executive
information system (EIS) due to the high degree of precision and detail required for accounting
purposes. Accounting systems must have accuracy for reconciliations, transaction processing,
and other processes requiring detailed information that a DSS or EIS do not necessarily need.
The AIS provides data as an input to the DSS or the EIS. The AIS by itself does not have controls
to ensure management makes decisions that are in the shareholders' best interest.
Choice "a" is incorrect. Transaction processing is a subsystem of AIS that can initiate, stop,
manipulate, or report on transactions between a company and its suppliers and/or customers.
Choice "b" is incorrect. This choice describes a feature of the management reporting system,
which is a subsystem of AIS that enables internal managers to make financial decisions.
Choice "c" is incorrect. This choice describes features of the financial reporting system, which is a
subsystem of AIS. It is used for reporting to regulatory entities like the Internal Revenue Service
or the Securities and Exchange Commission, as well as to the public in order to meet filing
requirements.

2. MCQ-12693
Choice "b" is correct. An ERP is a cross-functional enterprise system that integrates and
automates business processes and systems to work together, including manufacturing,
logistics, distribution, accounting, project management, finance, and human resource
functions of a business.
ERP uses a single database architecture that allows data to be stored in a centralized repository
for information sharing.
Choice "a" is incorrect. ERP integrates both financial and nonfinancial systems and enables
systems to work together. A benefit of ERP is that it removes barriers in organizations that are
used to working in silos and not communicating with each other.
Choice "c" is incorrect. ERP systems can provide vital cross-functional and standardized
information quickly to benefit managers across the organization in order to assist them in the
decision-making process.
Choice "d" is incorrect. ERP systems act as the framework of integrating systems and therefore
improves the organization's tracking ability for its business functions.


3. MCQ-12695
Choice "d" is correct. Enterprise performance management systems are software packages
designed to help a chief financial officer (CFO) conduct planning, create budgets, forecast
business performance, and consolidate financial results to align with the organization's vision
and strategy.
This choice correctly describes an EPM's key characteristic of combining operational and
financial data to drive the organization forward.
Choice "a" is incorrect. This choice describes a key characteristic of an enterprise resource
planning system.
Choice "b" is incorrect. An EPM system encourages a long-term focus by aligning strategic
objectives with actionable plans, tracking the progress with key performance indicators.
Choice "c" is incorrect. An AIS is a system used to process transactions and generate financial
statements.

Unit 6, Module 2

1. MCQ-12712
Choice "b" is correct. Vulnerability scans are proactive security measures that scan for known
weaknesses in hardware or software applications. They are active in nature, as opposed to
preventive, focusing on core pieces of a company's infrastructure including application-based
scans, network-based scans, port-based scans, device-based scans, and data storage and
repository scans.
This question states that the scan is for "known" weaknesses, which specifically refers to
vulnerability scans. It also lists two of the types of vulnerability scans, application-based and
network-based scans.
Choice "a" is incorrect. Penetration tests are not scans for known weaknesses, but rather they
are attempts by hired professionals to "hack" into a company's IT applications, systems, and
other network components. This intentional breach can be achieved by any means possible, as
opposed to a prescribed method of entry.
Choice "c" is incorrect. Biometric scans can be part of a vulnerability scan, but they generally
refer to access controls that utilize biometric characteristics such as an eye scan or fingerprint
scan in order to gain access.
Choice "d" is incorrect. Access controls are tools that prevent unauthorized access, not scan for
known weaknesses. These controls can strengthen weaknesses but do not identify them.


2. MCQ-12713
Choice "c" is correct. Data publication is the phase in which information is disseminated to other
individuals, both internally and externally. It is the fifth step in the cycle after usage and prior to
archiving and purging.
While this question does mention data capture through the administration of a survey, the
question specifically asks about information that has already been published to others.
Managing miscommunications of inaccurate data to employees and customers falls within the
publication phase.
Choice "a" is incorrect. Data capture does take place in this example, but the problem is asking
about information that has already gone through that phase and has been disseminated.
Choice "b" is incorrect. Data synthesis is the phase in which data has value added and is
transformed, not information that is already in its transformed state.
Choice "d" is incorrect. Data archival refers to data that has already been captured, synthesized,
and publicized.

3. MCQ-12714
Choice "c" is correct. Data preprocessing is converting information into a form that adds value
through consolidation, reduction, and transformation. When consolidating files, similar data
points are aggregated into a single file that can require a cleansing step (usually a maintenance
activity), removing things such as inaccurate data, incomplete fields, or duplication records. That
data is then transformed into its new enhanced state.
Because data preprocessing transforms data, synthesizing it, it fits in the synthesis and analytics
phase of the data life cycle.
Choice "a" is incorrect. Data capture involves the initial obtainment of information, not adding
value once it has been captured.
Choice "b" is incorrect. Data maintenance focuses on the extract, transfer, cleansing, and load
phase of the life cycle, not the value-added phase.
Choice "d" is incorrect. Data purging is the final phase that deals with the removal of data, not
transforming it in the synthesis phase.

4. MCQ-12715
Choice "d" is correct. The COBIT® 2019 framework has several components, including inputs,
COBIT Core, Design Factors, Focus Areas, and publications.
While stakeholders are identified and considered in the framework, stakeholder validations are
not a component of the COBIT® 2019 framework. Key stakeholders include management and the
board of directors. Stakeholders can also be separated into internal and external.
Choice "a" is incorrect. Publications are a key component of the COBIT® 2019 framework that are
documents with information on implementing a governance system.
Choice "b" is incorrect. Design factors are a component in the COBIT® 2019 framework that
influence the design of a governance system.
Choice "c" is incorrect. Community input is a component of the COBIT® 2019 framework that
connects users with the framework itself.


Unit 6, Module 3

1. MCQ-12696
Choice "d" is correct. Statements II and III are true. Waterfall and agile methods are two
strategies for implementing the systems development life cycle. Under the waterfall method,
phases do not overlap, and authoritative agreements pass a project from one phase to another.
Companies using the waterfall method create specialized teams for each phase of development.
Under the agile method, individual features of a project start and finish in each sprint; all phases
of the SDLC for that feature are executed in a single sprint. This requires cross-functional teams
that can perform all phases for that feature. Due to the complexity of building these teams, it is
a best practice to keep a team together and dedicated to one project.
Choices "a" and "c" are incorrect. The SDLC describes the steps (phases) by which a project is
initiated, developed, and used, and how the process is to begin again, making statement I false.
The SDLC is a circular process. This key difference in team structure between waterfall and
agile is reflected in every level of management and organizational structure in the company;
therefore, a company is generally organized around only one of these methods, making
statement IV false.
Choice "b" is incorrect. The SDLC describes the steps (phases) by which a project is initiated,
developed, and used, and how the process is to begin again, making statement I false. The SDLC
is a circular process.

2. MCQ-12697
Choice "d" is correct. Blockchain uses a distributed ledger so that many people have copies
of the history of use of each Bitcoin. When this person attempted to falsify a Bitcoin record,
the blockchain record on her computer disagreed with the many other copies. Because those
duplicate copies agreed with each other and they all disagreed with the one copy from this
person, the blockchain software concludes that the one different copy is fraudulent, so it denied
the transaction and changed the different copy to match the others.
Hash codes are the places in each "block" of the blockchain where transaction records are
kept, such as the date of the transaction and the public key of each party, after a great deal of
complicated math is performed on those records to encode them. In order to falsify a hash code,
one would need to know all the algorithms for encoding, as well as the dates and keys, and to
make a large fraction of the distributed ledger match the false values. This is currently believed
to be impossible.
Choices "a" and "c" are incorrect. Smart contracts are agreements that can result in blockchain/
Bitcoin transactions when compliance with the terms of the agreement can be observed online.
They do not prevent fraudulent transactions from occurring.
Two-factor authentication is the use of a second code or device after entering log-in credentials
to verify authorized access. This fraud is not based on gaining access but in falsifying records.
Choice "b" is incorrect. Smart contracts are agreements that can result in blockchain/Bitcoin
transactions when compliance with the terms of the agreement can be observed online. They do
not prevent fraudulent transactions from occurring.


3. MCQ-12698
Choice "a" is correct. All of these errors contributed to the failure of the AI program. Training this
type of program is a very complex and lengthy process. One informal yardstick for how much computing power is spent on training is the amount of processing a human brain is theoretically capable of in a year, and developers of large AI models measure training runs in hundreds of such units.
Training data must resemble the data that the AI program is expected to encounter once it is
deployed. As many different examples as possible, of the same type of transaction that it will be
asked to evaluate, should be within the training data set. Furthermore, the training data set as
a whole should contain approximately the same ratio of fraud to non-fraud as is expected after
deployment. If the AI program is trained to expect fraud half of the time, it will build a decision
algorithm that will expect, and therefore declare, fraud about half of the time. Because an AI
program can be trained into dysfunction, a testing phase should always be undertaken with
different data to evaluate the program's readiness for deployment.
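To illustrate the point about training data, here is a minimal sketch assuming Python with NumPy and scikit-learn and an entirely hypothetical transaction data set: a stratified split keeps the fraud-to-non-fraud ratio the same in the training and test sets, and the held-out test set is what would be used to evaluate readiness before deployment.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical transactions: 10,000 records with roughly 1 percent labeled as fraud,
# mirroring the fraud rate expected after deployment.
rng = np.random.default_rng(0)
features = rng.normal(size=(10_000, 5))
labels = (rng.random(10_000) < 0.01).astype(int)   # 1 = fraud, 0 = legitimate

# stratify=labels keeps the fraud-to-non-fraud ratio the same in both subsets;
# the held-out test set is used to evaluate readiness before deployment.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=0
)
print(y_train.mean(), y_test.mean())   # both close to 0.01
```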
Choice "b" is incorrect. These probably contributed to the AI program's ineffectiveness, but the
other options likely did as well.
Choice "c" is incorrect. These possibly led to the AI's ineffectiveness, but the other options
probably did as well.
Choice "d" is incorrect. The program's lack of performance could most likely be related to these
choices, but so could the others.

Unit 6, Module 4

1. MCQ-12722
Choice "c" is correct. Clustering analysis is a technique that is used to unite data points with
other data points that have similar characteristics, creating a "cluster" or profile of a subset of
the data set. Classification assigns data points with labels, classifying them into different groups
or assigning them into categories.
If there are groups within a data set that share common characteristics, cluster analysis is the
tool used to identify and define these groups. During later analysis, if the best fit of a data point
for a cluster is needed, then classification analysis would be applied.
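As a rough sketch of the distinction, assuming Python with scikit-learn and two hypothetical customer attributes (age and monthly spending): KMeans performs the cluster analysis, and the fitted model then assigns a new data point to its best-fit cluster, which is the classification step described above.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer data: [age, monthly spending].
customers = np.array([
    [22, 150], [25, 180], [24, 160],     # younger, lower-spending customers
    [55, 900], [60, 950], [58, 870],     # older, higher-spending customers
])

# Cluster analysis: discover groups that share common characteristics.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)                    # cluster assigned to each existing customer

# Classification step: assign a newly observed customer to the best-fit cluster.
new_customer = np.array([[23, 170]])
print(kmeans.predict(new_customer))      # label of the nearest cluster
```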
Choice "a" is incorrect. Grouping individuals together is correctly called cluster analysis, but this
choice does not include item II, which is also correct.
Choice "b" is incorrect. Assigning particular individuals to their best-fit cluster is called
classification analysis. However, the same task is not simultaneously called time-series analysis.
That analysis is regression where one independent variable is time.
Choice "d" is incorrect. Regression analysis is not the grouping together of members of a data
set that share common characteristics. Regression analysis is the validation and quantification of the relationship between a dependent variable and one or more independent variables.


2. MCQ-12723
Choice "d" is correct. This data analysis presumes that there is a relationship (or at least a
correlation) between the measured characteristics of a customer (such as age and income) and
whether or not they choose to purchase the company's product.
The analysis is performed so that the company may choose to take action prescribed
by the model to selectively advertise to those customers who are more receptive to the
company's products.
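A minimal sketch of this kind of model, assuming Python with scikit-learn and hypothetical age and income features, shows the prescriptive use: the tree is trained on past purchase outcomes and then used to decide which prospects to advertise to.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical history: [age, income] for past prospects and whether each purchased.
X = [[25, 40_000], [32, 55_000], [47, 90_000],
     [51, 120_000], [29, 45_000], [60, 150_000]]
y = [0, 0, 1, 1, 0, 1]                    # 1 = purchased, 0 = did not purchase

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Prescriptive use: score new prospects and advertise selectively to likely buyers.
prospects = [[45, 95_000], [27, 42_000]]
print(tree.predict(prospects))            # e.g., [1, 0] -> target the first prospect
```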
Choice "a" is incorrect. Decision tree analysis is not directed toward describing what has
occurred. The information present about who has purchased the company's products and
the associated data is not presented for its own value, but rather for what new information
can be discovered within it and, most important, what the company should do with that
new information.
Choice "b" is incorrect. Decision tree analysis is not used to diagnose the reasons why a
customer did or did not purchase the company's products; rather, it is atheoretical and cannot
validate or quantify any theoretical relationship between the characteristics present. Decision
tree analysis presumes that relationships exist and uses the correlational effects to prescribe
choices. If characteristics of a customer's choices have been incorrectly identified or labeled,
then the decision tree's recommendations will be flawed.
Choice "c" is incorrect. This is the second-best answer present, but it is still incorrect. The
information about who has purchased the company's products and the associated data is gathered not simply to predict future earnings, but rather to determine which company policies should change based on the new information discovered about which customers purchase the company's products.

3. MCQ-12594
Choice "a" is correct. The "Four Vs" of Big Data are velocity (the speed at which data is gathered
and processed); volume (the amount of storage required to retain the data gathered); variety
(the spread of data types across sensor values, numbers, text, pictures, etc.); and veracity
(the accuracy of the data and the presumption that data within the same data set may be of
different quality).
Baiem's proposed strategy relies on speed. Baiem must receive data, analyze it, produce a
recommendation, and act on that recommendation before every other stock trader. This
challenge is addressed by velocity.
Choice "b" is incorrect. Volume is the size of the data. Baiem is less concerned with the quantity
of data coming out of the stock market than the speed at which it can react to it.
Choice "c" is incorrect. Variety is different types of data being collected and analyzed. Variety
is not Baiem's main concern; the data being analyzed includes a stock name, a price, and a
time stamp.
Choice "d" is incorrect. Vexation is the state of being annoyed, frustrated, or worried. While
Baiem may feel vexed about its stock trades, vexation is not one of the "Four Vs" aspects of
Big Data.


Unit 6, Module 5

1. MCQ-12728
Choice "b" is correct. Regression analysis uses statistics software to discover and quantify
the relationship between a dependent variable and one or more independent variables. The
resulting coefficients can be used to predict values of the dependent variable from any values
the independent variable may have in the future.
OutdoorPeople wants to know if it can predict how likely a customer is to buy its product based
on the customer's wealth. After performing a successful regression analysis, OutdoorPeople will
have a regression equation that will contain this information.
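Because the outcome here is binary (buy or do not buy), one common regression approach for this question is logistic regression. A minimal sketch, assuming Python with scikit-learn and a hypothetical wealth measure:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical history: customer wealth (in thousands) and whether the customer bought.
wealth = np.array([[20], [35], [50], [80], [120], [150], [200], [250]])
bought = np.array([0, 0, 0, 1, 1, 1, 1, 1])

model = LogisticRegression().fit(wealth, bought)

# The fitted coefficient quantifies the wealth/purchase relationship, and the model
# predicts the probability of purchase for any future wealth value.
print(model.coef_, model.intercept_)
print(model.predict_proba([[100]])[:, 1])   # estimated probability a 100k customer buys
```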
Choice "a" is incorrect. Cluster analysis is used to identify subgroups within a larger group based
on shared characteristics. OutdoorPeople has already identified subgroups but is asking a
question across all its subgroups. Cluster analysis will not answer that question.
Choice "c" is incorrect. Fourrier analysis is used to represent a repeating waveform as a series of
trigonometric functions so that repeating oscillating phenomena (such as sound, light, heat, etc.)
can be mathematically reproduced and compared. Fourrier analysis is unlikely to be of any help
to OutdoorPeople.
Choice "d" is incorrect. Classification analysis is used to place newly encountered data into
subgroups already established by cluster analysis. OutdoorPeople already has clusters, but they
are not being used in this study, and no new customers are being classified into existing clusters.

2. MCQ-12729
Choice "d " is correct. The cumulative impact of data points over time can be shown by a
waterfall chart. Each point contributes to the total of all data points, with each incremental
contribution shown at a given point in time.
A waterfall chart is the best answer because it will show both the cumulative and incremental
impact of each month's financial results for Jacks Capital. This will allow investors to see that all
months, except for two, were consistent.
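A minimal sketch of such a chart, assuming Python with Matplotlib and hypothetical monthly results: each bar starts at the running total of the prior months, so both the incremental and the cumulative impact are visible.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical monthly results (positive = gain, negative = loss).
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
results = np.array([120, 90, -40, 110, 95, -30])

# Each bar starts where the running total of the previous months ends,
# so the chart shows both incremental and cumulative impact.
baseline = np.concatenate(([0], np.cumsum(results)[:-1]))
colors = ["green" if r >= 0 else "red" for r in results]

fig, ax = plt.subplots()
ax.bar(months, results, bottom=baseline, color=colors)
ax.set_ylabel("Cumulative result")
ax.set_title("Monthly results shown as a waterfall chart")
plt.show()
```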
Choice "a " is incorrect. Scatter plots are more for data sets that have a high volume and
they can have overlapping time periods. They also do not show the cumulative effect of all
data points.
Choice "b" is incorrect. Flowcharts are for processes. They show a path from beginning to end
with different options along the way. They do not show cumulative value.
Choice "c" is incorrect. Pyramids are for communicating foundational relationships. The data in
this example does not have this sort of relationship and does not report cumulative value.


3. MCQ-12730
Choice "d" is correct. Regression analysis is complex and does not always produce a positive
result. Models that are not statistically significant often have one or more of the following
warning signs: a small (near zero) correlation coefficient, a small (near zero) coefficient of
determination, or a large (in proportion to the dependent variable) standard error.
In this example, all three of these warning signs are present. This regression equation is unlikely
to be a true representation of the relationship between these demographic variables and sales,
if such a relationship even exists. There is no support for the regression.
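To show where these warning signs would appear in practice, here is a minimal sketch assuming Python with statsmodels and deliberately unrelated synthetic data:

```python
import numpy as np
import statsmodels.api as sm

# Deliberately unrelated synthetic data: sales versus wealth and age.
rng = np.random.default_rng(0)
wealth = rng.normal(50, 15, 200)
age = rng.normal(40, 12, 200)
sales = rng.normal(100, 30, 200)          # pure noise, so the fit should be poor

X = sm.add_constant(np.column_stack([wealth, age]))
model = sm.OLS(sales, X).fit()

print(model.rsquared)               # coefficient of determination: near zero is a warning sign
print(model.params)                 # intercept and beta coefficients
print(np.sqrt(model.mse_resid))     # standard error of the estimate: large relative to sales
```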
Choice "a" is incorrect. While the regression equation would indeed suggest that this is the way
to maximize sales given the signs on the slopes of the β terms for the two independent variables
(positive for wealth, negative for age), the regression model itself is of poor quality.
Choice "b" is incorrect. The regression equation would suggest the opposite strategy given
the signs on the slopes of the β terms for the two independent variables (positive for wealth,
negative for age).
Choice "c" is incorrect. The fact that this regression model is of such poor quality is not a
reflection on the product being studied. A poor regression model does not mean the product is
bad, merely that the regression model cannot provide reliable recommendations for how best to
market it under the studied conditions.

4. MCQ-12731
Choice "a" is correct. A bubble chart is a scatter plot (a mapping of data points onto a grid
according to two or more qualities of the data, one quality for each axis forming the grid (usually
two). The spatial distribution of the data points enables pattern recognition such as correlation
and the direction of any covariant relationship. Bubble charts are particularly useful because
they can display more than two types of data without resorting to a third or higher dimensional
graph through the use of symbols, color, and the size of the data points.
For this example, if the bank mapped its customers' income to one axis, then the bank could use
either of the other measures for the other axis, leaving the third quality to determine the size of
the bubble. Either way, the bank would have an image showing which loans left customers more
financially stretched relative to other customers. Coloring the dots differently to show defaults
versus paid loans would help the bank discover an association between loaning a customer a
higher proportion of the customer's income and the likelihood of default.
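One possible construction of such a chart, sketched with Matplotlib on hypothetical loan data: the axes carry income and loan amount, bubble size carries the monthly payment, and color separates defaulted from paid loans.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical loan data: income, loan amount, monthly payment, and default flag.
rng = np.random.default_rng(1)
income = rng.uniform(30_000, 150_000, 100)
loan_amount = rng.uniform(5_000, 60_000, 100)
monthly_payment = loan_amount / 48
defaulted = rng.random(100) < (loan_amount / income)   # more stretched -> more defaults

fig, ax = plt.subplots()
ax.scatter(income, loan_amount,
           s=monthly_payment / 5,                      # bubble size = monthly payment
           c=np.where(defaulted, "red", "blue"),       # color = default vs. paid
           alpha=0.6)
ax.set_xlabel("Customer income")
ax.set_ylabel("Loan amount")
ax.set_title("Loans sized by monthly payment, colored by default status")
plt.show()
```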
Choice "b" is incorrect. A pie chart is used to show what proportion of the whole comprises each
subgroup. A pie chart could be made to show the relative proportions of paid loans to defaulted
loans, and a separate pie chart could show the proportions among designated segments of
income, but this visualization technique would have no way to combine the two in a single image
to discover patterns.
Choice "c" is incorrect. A line chart is used to show a progression between observations and
the trend demonstrated. The bank could use a line chart to show the changing proportions of
default as income increased, but this visualization technique would have no way to represent
individual loans or the other two data types called for by management.
Choice "d" is incorrect. A flowchart is a diagram used to represent each step of a complex
process, such as the operation or building of a computer program. Each customer could use
a flowchart to decide how to allocate monthly income, including paying the loan, but the bank would neither have access to all of this information nor have any way to aggregate it using this visualization technique.
