1F. Technology and Analytics
Module F.1. Information Systems
Part 1
Unit 6
This module covers the following content from the IMA Learning Outcome Statements.
Management information systems (MIS) enable companies to use data as part of their strategic
planning process as well as the tactical execution of that strategy. Management information
systems often have subsystems called decision support systems (DSS) and executive information
systems (EIS).
A management information system provides users with predefined reports that support effective
business decisions. MIS reports may provide feedback on daily operations, financial and
nonfinancial information to support decision making across functions, and both internal and
external information.
Examples of decision support systems include production planning, inventory control, bid
preparation, revenue optimization, traffic planning, and capital investment planning systems.
Examples of executive information systems include systems that produce sales forecasts,
profit plans, key performance indicators, macro-economic data, and financial reports.
Pass Key
An AIS differs from a DSS and an EIS due to the high degree of precision and detail required
for accounting purposes (i.e., transaction processing). Data in an AIS is often processed
and aggregated to become inputs to a DSS and an EIS to enable management to make
data-driven decisions.
Figure: AIS information flow: source documents (e.g., invoices, time cards) are recorded in the journal, posted to the ledger, summarized in the trial balance, and reported in financial statements and reports; original source documents are filed and stored.
The AIS provides a terminal for the cash receipts clerk to access the cash receipts system and record
the remittance.
The AIS closes the sales invoice, posts to the general ledger accounts, updates the customer's
payment record, and distributes management reports (e.g., transaction listings,
discrepancy reports, and general ledger change reports).
LOS 1F1i
Enterprise performance management (EPM) systems, also known as business performance
management (BPM) or corporate performance management (CPM) systems, are software
solutions designed to help executives make strategic decisions. From a finance perspective, an
EPM enables leaders to plan, budget, and forecast business performance and to consolidate
financial results. EPMs are useful in breaking down high-level business strategies and translating
them into actionable plans or strategic objectives. Key performance indicators (KPI) are then
assigned to enable the enterprise to monitor progress toward achieving these objectives.
Figure: The EPM cycle of (1) strategize, (2) plan, (3) monitor and analyze, and (4) act and adjust operates within the organization's people, processes, systems, environment, and organizational culture.
In preparation for a marathon, a runner begins training by running 3 miles 3 times a week
and 10 miles at the end of the week. However, she is concerned that she is not reaching the
full distance to meet her goal. By wearing a smart watch, she captures the unrecorded raw
data she is already generating—the distance of each run. The watch's software application
captures the raw data and converts it into usable information that she can actually
measure and track over time.
4.2 Database
A database is a shared, integrated computer structure that has three technical components and
two human components:
Metadata: Data about data including its characteristics (text, numeric, date, etc.), and
relationships with other data points.
Data Repository: The structure in which data points are actually stored, which is governed
by metadata.
DBMS: A collection of programs to manage the data structure by controlling access,
executing commands, and otherwise manipulating data stored in the repository.
Database Administrators: The managers of the database and DBMS, orchestrating the
design, access, restrictions, and overall use of the database.
End Users: Those accessing the data repository for business use. Depending on the design
of the overall database structure, users may be allowed to change data points within the
repository.
Database structure overview:
Figure: End users access data through the DBMS, which relies on metadata to manage the data repository.
Pass Key
Think about a database as a well-organized electronic filing cabinet. The DBMS is powerful
software that helps manage the content like a librarian (database administrator) for patrons
of the library (end users). The repository is the documents stored in the filing cabinet, and
the metadata is the playbook for how the data should be managed and integrated.
A relational database is a form of database that structures data in a way that allows sections
of data to be stored separately from other sections, typically referred to as tables or views,
but remain relationally linked to each other. This relational aspect maintains the data's
integrity, permitting attributes to be changed in one table while preserving that table's
relationships with other tables. Each table has a primary key, a field that uniquely identifies
each record and can be tied to another table. Attributes are other fields within a table that can
be unique or duplicated, but typically describe characteristics of the primary key (e.g., a table
with SKU, or stock keeping unit, as the primary key may have a product category of frozen foods
as an attribute). Tables can also have a foreign key, which refers to a primary key in another
table. Foreign key values can be duplicated, but primary key values cannot.
Earlier forms of databases mostly consisted of groups of files called flat files that were stored
as plain text. Each line in the text file held one record, with delimiters (e.g., commas or tabs) to
separate attributes associated with the record. The relational database model was created to
mitigate data redundancy problems commonly seen in flat file databases by logically organizing
data into two-dimensional tables (rows and columns). Each row in the table is considered a
"record." The columns carry attributes of the record and/or links to records in other tables.
The following chart illustrates a basic relational database structure.
Figure: A basic relational database structure in which related tables (e.g., Orders, Shipments) are linked by keys, with attributes such as Address, DateOfBirth, CreditLine, ShippingAddress, and ShipmentDate.
An academic office collects the class grades of all sophomores using a flat file. The flat file
has attributes such as a student's first name, last name, registered classes, and grades
for each class, as illustrated in the following table. Kevin Jones signed up for five classes—
Accounting 101, Economics 101, Ethics 101, Finance 101, and Marketing 101. The flat file
would replicate Kevin Jones' first and last name five times, once for each of Kevin Jones'
registered classes.
If a relational database is used, the data redundancy in the flat file scenario is eliminated. In
the relational data model, information is broken down into three tables. The top left table
contains the roster of the students in their sophomore year. The top right table contains
the list of classes available to the sophomores. Each table has its own primary key (i.e.,
student ID and class ID). The bottom table is a summary of grades based on the enrollment
record. This table contains both a primary key (Enrollment) and two foreign keys—Student
ID and Class ID. The stored attributes of students, such as first and last names, are not
repeated, reducing the redundancies, data inconsistencies, and data input errors.
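The following is a minimal sketch of this three-table design using Python's built-in sqlite3 module; the table names, column names, and sample values are illustrative rather than taken from the exhibit.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,   -- primary key: unique per record
        first_name TEXT,
        last_name  TEXT
    );
    CREATE TABLE classes (
        class_id   INTEGER PRIMARY KEY,
        class_name TEXT
    );
    CREATE TABLE enrollments (
        enrollment_id INTEGER PRIMARY KEY,
        student_id INTEGER REFERENCES students(student_id),  -- foreign key
        class_id   INTEGER REFERENCES classes(class_id),     -- foreign key
        grade      TEXT
    );
""")
conn.execute("INSERT INTO students VALUES (1, 'Kevin', 'Jones')")
conn.executemany("INSERT INTO classes VALUES (?, ?)",
                 [(10, 'Accounting 101'), (20, 'Economics 101')])
conn.executemany("INSERT INTO enrollments VALUES (?, ?, ?, ?)",
                 [(100, 1, 10, 'A'), (101, 1, 20, 'B+')])

# The student's name is stored only once; a join reassembles the flat-file view.
for row in conn.execute("""
        SELECT s.first_name, s.last_name, c.class_name, e.grade
        FROM enrollments e
        JOIN students s ON s.student_id = e.student_id
        JOIN classes  c ON c.class_id  = e.class_id"""):
    print(row)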
A data warehouse stores and aggregates data in a format designed to provide subject-oriented
or business-unit-focused information for decision support. Data stored in the warehouse is often
uploaded from operational systems (e.g., ERP systems), keeping data analysis separate from
the transaction system that runs the business. When a data warehouse is further organized
by departments or functions, each department or function is often referred to as a data mart.
Alternatively, separate smaller sections of the data warehouse can be extracted to form data
marts that can be used for varying needs by different parts of the organization.
Figure: Data flows from source systems into the data warehouse, is organized into data marts (e.g., a marketing data mart), and is accessed by end users.
Question 1 MCQ-12692
The following are all functions that an accounting information system (AIS) performs, except:
a. Collection and storage of transaction data.
b. Aggregate data for financial managers to plan and take actions.
c. Reporting for regulatory bodies and external entities.
d. Facilitation of senior executives' decision-making and information needs.
Question 2 MCQ-12693
Wilson and Co. is evaluating enterprise resource planning (ERP) systems. Which of the
following is not a benefit of an ERP system that Wilson should factor when making its
selection?
a. ERPs combine financial data with operational data to provide timely and actionable
information.
b. ERPs use multiple databases to reduce reliance on a single database architecture.
c. ERPs allow cross-functional information sharing.
d. ERPs improve the ability to track and measure sales, costs, delivery times,
customer service performance, and other corporate activities.
Question 3 MCQ-12695
This module covers the following content from the IMA Learning Outcome Statements.
Data governance focuses on the effective management of data availability, integrity, usability,
and security through the synchronization of resources, such as people and technology, with the
policies and processes necessary to achieve data governance goals. Although no standard data
governance model applies to all organizations, multiple data governance frameworks exist to
help organizations create tailored models using standards as a guide. In general, a strong data
governance model will have practices and policies with the following components:
Availability: Information is of little benefit to an organization if it is not available to the right
employees at the right time. While security may be a high priority, information must not be
secured in a way that creates unnecessary hurdles for those who need it.
Architecture: Job roles and IT applications should be designed to enable the fulfillment of
governance objectives.
Metadata: Data describing other data, known as metadata or data dictionaries, must be
robust in terms of its breadth and specificity. Vague or incomplete metadata may result
in misuse.
Policy: Data governance policies help companies translate management and governance
objectives into practice.
Quality: Data integrity and quality are crucial and include ensuring that basic standards are
met so that there are no anomalies, such as missing values, duplicate values, transposed
values (phone numbers in the address field), or mismatched records (e.g., John Doe's
address is listed as John Smith's).
Regulatory Compliance and Privacy: Information collected, used, and stored by an
organization that is considered personally identifiable information (PII), personal health
information (PHI), or is otherwise subject to regulatory constraint should be subject to
policies designed to ensure that the use of the data does not violate company policies or
privacy laws (e.g., the California Consumer Privacy Act, CCPA; the General Data Protection
Regulation, GDPR; or the Health Insurance Portability and Accountability Act, HIPAA).
Security: Data governance strategy should include the secure preservation, storage, and
transmission of data.
The Committee of Sponsoring Organizations (COSO) has developed guidance and frameworks
covering the areas of internal control, risk management, and fraud deterrence. Within its
Internal Control—Integrated Framework (the framework), which is built on five components, there
are two categories with principles that pertain specifically to internal control over information technology.
Material from Internal Control—Integrated Framework, © 2013 Committee of Sponsoring Organizations of the Treadway
Commission (COSO). Used with permission.
Spinal Surgery Clinic (SSC) P.A., a large group of physicians focusing on spinal surgery,
recently had an outside firm perform an IT audit as recommended by SSC's board of
directors. The findings resulted in recommendations that followed the COSO Internal
Control—Integrated Framework principles 11, 13, and 14. As such, SSC invested in new
technology that required user identities to be verified by multiple points of validation
other than just a password in order to access patient accounts (in line with principle 11).
Additionally, SSC adopted a state-of-the-art data cleansing system in an effort to acquire
and use error-free data to enhance patient outcomes, which aligned with principle 13.
Lastly, to address principle 14, SSC began performing regular reviews of key IT functions
and started issuing monthly reports of internal control to the board of directors.
The Information Systems Audit and Control Association (ISACA) is a not-for-profit organization
that formed to help companies and technology professionals manage, optimize, and protect
information technology (IT) assets. To accomplish this, ISACA created the Control Objectives
for Information and Related Technology (COBIT) framework, which provides a roadmap that
organizations can use to implement best practices for IT governance and management.
Material from COBIT®, © 2019 ISACA. All rights reserved. Used with permission.
Turnaround—an IT system that drives innovation for the business but is not required
for critical business operations.
Strategic—an IT system that is crucial for both innovation and business operations.
8. Sourcing Model for IT: Sourcing is the type of IT procurement model the company adopts,
ranging from outsourced to cloud-based (Web-based) to built in-house, or a hybrid of any of
these sources.
9. IT Implementation Methods: The methods that can be used to implement new IT projects
include the Agile development method, the DevOps method, the traditional (waterfall)
method, or a hybrid of these.
10. Technology Adoption Strategy: IT adoption falls into three categories:
First mover strategy—emerging technologies are adopted as soon as possible to gain an edge.
Follower strategy—emerging technologies are adopted after they are proven.
Slow adopter—very late to adopt new technologies.
11. Enterprise Size: Two enterprise sizes are defined—large companies with a total full-time
employee count of more than 250 (default), and small and medium companies with 50 to
250 full-time employees.
4.3.1 Consolidation
Once similar data points are captured, they must be aggregated into a single collection or file for
further processing. Many organizations have different silos (e.g., marketing, finance, sales) that
collect information in separate systems. Consolidation is the process of combining these separate
data points to obtain an aggregate view, such as a single view of all data related to a customer.
4.3.2 Transformation
Transforming data involves taking data in its raw but clean form and converting it into
information that gives more insight and meaning. Transformation can be achieved through
appending more data, applying mathematical or statistical models, stripping the data to simplify
it, or through data visualization. Third-party data purchased or collected from free sources is a
common way to enhance or transform data.
A large data-marketing firm collects information from paid surveys of consumers and from
free public sources of information, such as phone books and certain real estate registries.
In total, the company collects six points of personally identifiable information (PII) and over
100 data points on purchasing behavior. After capture and consolidation, the company
strips out Social Security numbers and home addresses (PII that should not be widely
shared) and then provides the data to the marketing department to use for targeted e-mail
campaigns based on consumer profiles.
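A minimal sketch of this kind of transformation step, assuming pandas and using hypothetical file and column names, might look like the following.

import pandas as pd

# Consolidated survey and public-source data (hypothetical file and columns).
consolidated = pd.read_csv("consumer_survey_data.csv")

# Strip PII that should not be widely shared before handing data to marketing.
pii_columns = ["social_security_number", "home_address"]
marketing_view = consolidated.drop(columns=pii_columns, errors="ignore")

marketing_view.to_csv("marketing_profiles.csv", index=False)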
Variables missing from too many records should be excluded, as should records with
too many missing variables. Incomplete variable sets can affect the accuracy of a model,
potentially making it seem as if certain variables are predictive in nature when they are not.
Backward variable elimination starts by including all variables in an equation, separately
testing the impact of removing each variable and deleting the variables that do not
contribute to the fit of the regression equation. This process is repeated until the model's
minimum acceptable performance is reached.
Forward variable selection adds one variable at a time using the variable that has the next
highest incremental change in performance. Forward selection may be more efficient in
terms of time and required computing power as compared to backward variable elimination
because only new variables with positive incremental value are included in the model.
Backward variable elimination takes longer as the program must process large quantities of
data at the beginning of the process.
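As an illustration of forward selection, the following minimal sketch (assuming scikit-learn, with X as a NumPy feature matrix and y as the target) adds the variable with the largest incremental improvement in cross-validated R^2 and stops when no remaining variable adds enough value. Backward elimination reverses the direction, starting with all variables and testing removals.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, min_improvement=0.01):
    """Add one variable at a time, keeping the variable with the highest
    incremental improvement in cross-validated R^2."""
    remaining = list(range(X.shape[1]))
    selected, best_score = [], -np.inf
    while remaining:
        candidates = []
        for j in remaining:
            cols = selected + [j]
            score = cross_val_score(LinearRegression(), X[:, cols], y,
                                    scoring="r2", cv=5).mean()
            candidates.append((score, j))
        score, j = max(candidates)
        if score - best_score < min_improvement:   # no positive incremental value
            break
        selected.append(j)
        remaining.remove(j)
        best_score = score
    return selected, best_score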
Welch and Co. is a behavioral health provider that has an IT department that builds
computers to save money. IT stores and reuses old computer shells and hard drives to
repurpose them after employees are terminated. Welch has no policy on purging data. In
a recent hacking attempt, the company's data center was breached, and the dormant hard
drive data was stolen. The personal information of terminated employees was extracted
from hard drives and found on the dark web for sale. Those users' accounts were then
used to attempt to execute a social engineering campaign, soliciting current employees to
divert funds to the hackers' private accounts.
Data is archived for organizational access and for regulatory requirements. Companies should
have a record retention and management policy that details the length of time each type of
document should be retained for internal, legal, or regulatory needs. Records should be kept of
the data's current form, its expected deletion date, and requirements for an audit trail.
Cybersecurity policies are a part of data governance models that focus on the protection of
information technology assets from outside threats or attacks.
Figure: The five functions of the Cybersecurity Framework: Identify, Protect, Detect, Respond, and Recover.
6.1.1 Identify
Stakeholders develop an understanding of the organization in order to manage all of its
personnel, systems, processes, software, hardware, and other data assets or devices that store,
transmit, and manipulate data.
Data and systems needing to be protected are specifically identified. Employee roles and
responsibilities relating to the systems handling that data are clearly identified. Vendors,
customers, and others with access to sensitive company data should also be identified and
carefully considered to understand where information is captured and exchanged. That
knowledge allows organizations to understand where vulnerabilities may exist in order to focus
efforts to mitigate risk.
6.1.2 Protect
Safeguards and access controls to networks, applications, and other devices should be deployed
as well as regular updates to security software, including encryption for sensitive information,
data backups, plans for disposing of files or unused devices, and training for all employees using
company computers with access to the network.
Cybersecurity policies should be developed, outlining the roles and responsibilities for all parties,
including employees, suppliers, distributors, business partners, and anyone else who may have
access to data that is sensitive. Policies should establish safeguards and steps to take to prevent
and respond to an attack. The goal for the aftermath of an attack is to minimize the damage.
6.1.3 Detect
Tools and resources are developed to detect active cybersecurity attacks, including monitoring
network access points, user devices, unauthorized personnel access, and high-risk employee
behavior or the use of high-risk devices.
Falcon CPAs and Associates is a large accounting and IT auditing firm that has several
clients for which it provides bookkeeping, tax work, and IT audit services. Falcon decided
to run a scan using its new NIST-based security software for a client. The application scans
various applications and devices, generating a report with findings.
The report came back with high-risk employee behavior and the use of high-risk devices
as potential red flags. The employee behavior included access to records on the weekends
and after business hours. The use of high-risk devices included excessive use of USB drives
that were being plugged into the network to transfer data. Both were related to a single
individual who was later determined to be stealing employee banking information from the
payroll department outside of normal business hours.
Detection includes continuous monitoring, scanning for anomalies or predefined events, and
investigating atypical activity by computer programs or employees. Detection measures are
put in place to immediately flag suspicious activity. Detection is one of the strongest pillars of
protection against cybersecurity threats because the existence of strong detection measures
often acts as a deterrent. Detection measures are considered adequate if the response time of
the detection measures is less than the time it takes a skilled hacker to completely penetrate the
protective security measures.
A locked door on an empty house is a preventive measure; if the criminal knows no one is
home, the criminal is free to break the lock and steal from the house. If the same locked
door has a webcam pointed at it to identify the thief and simultaneously alert the police,
the presence of the camera may actually deter the thief from breaking into the house. The
camera is not a preventive measure; it does nothing to stop a break-in. But because the
thief can see the detective measure put in place, the camera may deter the crime.
6.1.4 Respond
The ability to contain a cybersecurity event depends on immediately reacting to the attack
using planned responses and continuously improving responses as risks and threats evolve. In
addition to taking action to mitigate losses, all potentially affected parties, such as employees,
suppliers, and customers, should be notified of an attack.
Business continuity plans should be in place to distinguish malicious attacks from other events
such as hazardous weather. Operating with only core operations or remotely may be the best
alternative to shutting down completely. Plans should be tested regularly and evolve as the
cybersecurity landscape evolves.
6.1.5 Recover
The last function of the NIST framework focuses on supporting the restoration of a company's
network to normal operations through repairing equipment, restoring backed up files or
environments, and positioning employees to rebound with the right response. Recovery
activities include communication and continuous improvement. Stakeholders affected by
cybersecurity events should be kept informed regarding recovery efforts. A review of lessons
learned should be incorporated into this stage of the plan.
Pen testing generally begins with a planning phase during which the organization and the role-
playing hacker (tester) establish goals and data to capture. This is followed by a phase when the
tester obtains an understanding of how an organization may respond to different attempts to
gain access. The tester may analyze code in a live state or in a test environment, then move to
the next phase, which involves gaining access by finding holes in the infrastructure assessed.
Once access is gained, the tester attempts to see how deeply the organization's systems can be
infiltrated, accessing data, user accounts, potentially financial accounts, and other sensitive data.
One outcome of breaches and identified weaknesses from pen testing is training for all
employees. Training is also often followed up with targeted coaching for individuals who failed
the pen testing or caused the mock breach.
6.2.3 Biometrics
The use of biometrics is another way to manage access controls and provide automated
recognition. Biometric technology uses an individual's personal physical attributes to securely
access protected information. Common applications include fingerprint scans, retinal scans,
and facial recognition software. Attributes are first loaded and stored in a database. Once the
biological input is captured it is then cross-checked against the reference database. If there is a
match, then access levels aligning with the matched record are granted.
Biometric applications are used in law enforcement and forensics, specifically in federal
government entities such as the Department of Defense, the Department of Justice, and the
Department of Homeland Security. Biometric applications provide identity assurance, help link
criminals to crimes, help track individuals across the globe, and can expedite identity verification.
The use of this technology continues to grow, as do regulations governing the technology. The
capture and use of biometric data raises privacy concerns among individuals and advocacy
groups across the globe.
6.2.5 Firewalls
Firewalls are software applications or hardware devices that protect a person or company's
network traffic by filtering it through security protocols with predefined rules. For companies,
these rules may be aligned with company policies and access guidelines. Firewalls are
intended to prevent unauthorized access into the organization and to prevent employees from
downloading malicious programs or accessing restricted sites.
Basic packet-filtering firewalls work by analyzing network traffic that is transmitted in packets
(data communicated) and determining whether the firewall software is configured to accept the
data. If not, the firewall blocks the packet. Firewalls can be set to allow only trusted sources (IP
addresses) to transmit across the network (a minimal packet-filter sketch follows the list below).
Other types of firewalls include:
Circuit-Level Gateways: Control traffic by verifying that the source of a packet meets rules and
policies set by the security team.
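A minimal, illustrative packet-filter rule check follows; it is not a production firewall, and the trusted networks, allowed ports, and packet fields shown are hypothetical.

from ipaddress import ip_address, ip_network

TRUSTED_SOURCES = [ip_network("10.0.0.0/8"), ip_network("203.0.113.0/24")]
ALLOWED_PORTS = {80, 443}

def accept_packet(src_ip: str, dst_port: int) -> bool:
    """Accept only packets from trusted source networks to allowed ports."""
    src = ip_address(src_ip)
    return any(src in net for net in TRUSTED_SOURCES) and dst_port in ALLOWED_PORTS

print(accept_packet("10.1.2.3", 443))     # True: trusted source, allowed port
print(accept_packet("198.51.100.7", 25))  # False: packet is blocked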
Question 1 MCQ-12712
Question 2 MCQ-12713
Optimum Financial Planners publishes investment research for several different industries
and has a team of financial planners that advise hundreds of clients. It administers
quarterly surveys to determine investor expectations and trends, which it then provides
to its planners so they can give investment advice. Optimum recently found an error in the
survey collection. In which phase of the data life cycle will this be addressed since the data
has already been released?
a. Data capture
b. Data synthesis
c. Data publication
d. Data archival
Question 3 MCQ-12714
Question 4 MCQ-12715
Which of the following is not a component within the COBIT® 2019 framework?
a. Publications
b. Design factors
c. Community input
d. Stakeholder validations
F.3. Technology-Enabled
Finance Transformation
Part 1
Unit 6
This module covers the following content from the IMA Learning Outcome Statements.
The systems development life cycle (SDLC) is a framework that organizes tasks at each phase of
development and use of a business process.
The task of building automated business processes that include computer software, data
architecture, and computer hardware can be a tremendous undertaking. Major overhauls of
organizational systems as well as the creation of new systems for large enterprises can be very
complex, because there is often overlap and interaction among the web of company practices.
Each of these unique but intertwined systems may have budgets in the tens of millions of dollars
and encompass the work of thousands of people over the course of several years, adding
complexity to the design, maintenance, and improvement of all corporate systems.
There are two strategies for managing the SDLC in general use today. The first strategy is called
the traditional method or the waterfall model. The second method, called agile development,
evolved from the waterfall model.
The waterfall model is characterized by different teams of employees performing separate tasks
in sequence, with each team beginning work from the pre-written authoritative agreement
of the preceding team and then ending work when the business requirements for the team
have been met. The project then passes to the next team. The following are some challenges
associated with the waterfall model:
Requires a great deal of time to complete.
Benefits of the new system are not realized until complete.
There is no customer input; change is difficult to manage.
Some employees may be idle before beginning or after completing their SDLC step.
Figure: The waterfall model phases: (1) plan, (2) analyze, (3) design, (4) develop, (5) test, (6) deploy, and (7) maintain.
The number and names of phases in the waterfall model differ between companies. However,
they all contain the same general process:
1. Plan
Management evaluates the business needs of the system and determines whether it should
accept the project. Managers assess resources needed (i.e., personnel, finances, timeline)
to develop the system and compare it to the projected gains from the system. Management
then decides whether to begin the project.
2. Analyze
Management defines key business problems and company goals in the analyze phase,
and then identifies the steps and systems needed to achieve those goals. More specific
business requirements and sequential procedures are outlined in this step. Business
analysts determine problems that the system may face, gather requirements to solve those
problems, and develop business rules to arrive at a solution. The business requirements
may be formally documented in a business requirements document (BRD).
Pass Key
In some models, planning and analysis may be combined and called the requirements
phase. Less frequently, "development" is used for "plan and analyze" and "production" is
used for "develop." Regardless of the words used, planning what to build comes before
building it.
A company is considering developing an app to sell tickets to the upcoming Olympic games.
During the planning phase, management assesses the company's potential profit from the
app. If deemed profitable, management will submit the bid to the host country. Submission
of the bid ends the planning stage.
During the analysis phase, the business requirements document (BRD) is developed
and becomes the foundation for the development of the project. The BRD contains the
following specifications:
"Customers must be able to access the marketplace from their computer or mobile device
to see a real-time view of available offers and prices."
"Customers must be able to pay for tickets using local currency and reserve their selections
while payment is verified."
"The host country wants customers to be able to alter the price of unsold tickets daily
between the go-live date and two weeks prior to the ticketed event. Customers must also
be able to alter the price of unsold tickets hourly within two weeks of the event."
Beginning with this stage and for each subsequent stage, feasibility studies are conducted
to determine whether the project is adhering to the original plan. Feasibility studies may be
conducted as a single, focused study, or they may incorporate multiple elements, including:
Economic Feasibility: Are the benefits still greater than the costs?
Technical Feasibility: Are all required technology and expertise available?
Operational Feasibility: Will all internal and external customers accept the system?
Scheduling Feasibility: Will all project resources be available when needed?
Legal Feasibility: Can all tasks be performed without violating laws?
After the planning and analysis phases are complete and the host country finalizes the
requirements through execution of the contract, the company may reassess resource
needs to more accurately define costs for the solution. These first feasibility studies
become the baseline to compare against later progress.
3. Design
Creation of the technical implementation plan occurs as business requirements are
translated into technical design documents. Individual technologies are evaluated
and selected, including logical data organization, physical data storage architecture,
programming languages, integration with third-party services, and/or deployed hardware.
The design phase can be subdivided into three parts:
Conceptual Design: Broad translation of business requirements into technical
requirements
During the design phase, each business requirement is further developed and expanded.
For example, the requirement that "Customers must be able to pay using local currency
and reserve their selections while payment is verified" is expanded to specify credit
cards accepted for payment, fees charged, and the timeline to complete the conceptual
design. Specification of data file formats for transmission to credit card vendors and data
warehousing systems are developed in the logical design phase. Physical design would
include any specialized hardware to comply with payment card industry standards, server
hardware, cloud-based hardware, and workstation software and hardware for developers
and programmers.
4. Develop
The technical implementation plan is executed in the develop step. Buildings and rooms
are prepared, hardware is purchased and delivered, and programmers create proprietary
software to run the company's new product. The new system is completely built at this
stage and most of the project budget is spent, having committed dollars to employ experts
and purchase assets. Changes to the plan become more expensive in this stage because
each step builds on the prior steps. For example, changes in the develop stage may not be
supported by the original architecture in the design stage or achieve feasibility as outlined in
the plan and analysis phases.
5. Test
The system is checked for adherence to the business requirements in this step. The
new product must function as planned in the analysis and design stages. In addition to
backward-looking testing, which tests against the initial requirements, forward-looking
testing is conducted to see how well employees and customers can perform tasks (called
user-acceptance testing).
6. Deploy
The new system is delivered to end users. There are several methods available for
deployment that depend on available time, cost, and the cost of failure to the business:
Plunge or Big Bang: The entire new system is immediately delivered to all customers
and clients (lowest cost, highest risk).
Ramped (Rolling, Phased) Conversion: Portions of the new system replace corresponding
parts of the old system, one piece at a time (above-average cost, below-average risk).
A/B Testing (Pilot, Canary): A subset of users gets the new system while the old
system is still in use and assigned to current and new customers. After successful
deployment to the subset of users, the new system is deployed to everyone (average
cost, average risk).
Blue/Green (or Other Pair of Colors), or Shadow: The new system is fully deployed in
parallel with the old system; a routing layer directs progressively more duplicated traffic
to the new system (a minimal routing sketch follows this list). Once the new system is
handling all the traffic, the old system is deactivated (highest cost, lowest risk).
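A minimal sketch of the routing layer used in ramped, canary, or blue/green rollouts follows; the traffic percentage and handler functions are hypothetical.

import random

def route_request(request, old_system, new_system, new_traffic_pct=10):
    """Send new_traffic_pct percent of requests to the new system; the rest
    continue to flow to the old system."""
    if random.uniform(0, 100) < new_traffic_pct:
        return new_system(request)
    return old_system(request)

# As confidence in the new system grows, new_traffic_pct is raised toward 100,
# at which point the old system can be deactivated.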
Pass Key
Both the development and deployment phases may be called "Implementation." If either
phase is named implementation, the key to determining which phase is being discussed is to
figure out whether it comes before or after the testing phase. An implementation phase that
occurs earlier than the testing phase must mean the development phase is being discussed. An
implementation phase that occurs after testing refers to deployment.
7. Maintain
Ongoing adjustments and improvements occur in the maintain stage, which begins as soon
as deployment is complete. Adaptations are made to the product to keep it operating at
an optimal level. Over time, the system becomes less well-suited to current conditions and
needs to be evaluated for either modification or replacement. When it is time to replace the
system, the SDLC repeats.
Figure: The agile software development cycle, including requirements, prioritization and backlog, estimation, planning and scheduling, concept and design, development, implementation, testing, bug fixing, documentation, demonstration, customer feedback and approval, review and adjustments, and release of changes.
Artificial intelligence (AI), machine learning, and deep learning are three similar terms, often
used interchangeably, for computer programs and algorithms built to simulate characteristics
of human logic and intelligence. AI applications focus on processing large quantities of data,
learning from trends in that data, and using that insight to automate decision-making processes.
Common AI applications relevant to accounting include inference engines, business process
automation, robotic process automation, natural language processing (NLP, also referred to as
speech recognition) software, and neural networks, among various others.
A computer cannot "recognize" a stop sign. It would be difficult to program every detail
describing a stop sign and how the sign is different from its environment, along with all
environments a stop sign may be in. However, if a computer program is fed numerous
photographs with corresponding data on whether there is a stop sign, and if each picture
is stored in an accessible location, then it is easier to program a computer application
to recognize a stop sign in a new environment. Each new photograph is compared to
the catalog of existing photographs to determine if the new photograph is more like the
pictures with a stop sign or more like the pictures without a stop sign. In this way, the
inference engine "learns" what a stop sign looks like and can recognize stop signs with
greater accuracy as more pictures and data are reviewed by the engine.
Credit card companies and banks invest heavily in inference engines and computing resources
to run processes to detect fraud. Relying on AI rather than human analysts works better because
once fraudsters are aware of the clues that fraud analysts look for to detect fraud, the fraudsters
subtly alter their behavior. AI reacts more quickly to subtle changes, incorporating the new
behavior into its inference engine. This lets companies improve the accuracy and efficiency
of processing any data where difficult judgment calls are part of the process and adds a data-
driven insight to solve complex pattern-recognition problems.
Credit card companies spend millions of dollars on fraudulent charges. It would be difficult
to program every known characteristic that could possibly be used to identify fraudulent
charges, such as changes in activity levels, repeated transactions just below thresholds, or
in-person transactions that occur at two distant locations close to the same time. However,
using many historic transactions in conjunction with data identifying whether each
transaction was fraudulent trains the computer program to compare new transactions to
the catalog of existing transactions that have been identified as valid or fraudulent. The
program examines common factors among all transactions and determines if the new
transaction is more like the group of fraudulent transactions or more like the valid ones.
In this example, the fourth transaction would be flagged by an inference engine because
of the difference in the name of the activity and the dollar amount of the transaction,
compared to the other three transactions on the statement.
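A minimal sketch of this "compare the new transaction to a catalog of labeled transactions" idea, assuming scikit-learn and using hypothetical transaction features, might look like the following.

from sklearn.neighbors import KNeighborsClassifier

# Historic transactions: [amount, miles_from_home], labeled 0 = valid, 1 = fraudulent.
# The features and values are hypothetical.
X_history = [[42.50, 2], [15.00, 1], [38.75, 3], [2500.00, 900]]
y_history = [0, 0, 0, 1]

# Classify a new transaction by its similarity to the labeled catalog.
model = KNeighborsClassifier(n_neighbors=1).fit(X_history, y_history)
print(model.predict([[2300.00, 850]]))  # [1]: most similar to the known fraudulent charge
print(model.predict([[40.00, 2]]))      # [0]: most similar to the ordinary charges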
A company is interested in formulating an RPA to collect the data from various inventory
invoices, payment orders, and reconciling documents used in its logistics system (part
of the enterprise resource planning system [ERP]), because it would be more efficient to
have software collect and format this existing information than to train every employee on
using existing or new systems. Because there are a limited number of forms, the company
constructs a library of forms for reference by the RPA. Once the RPA has been trained to
recognize each form, the company directs all forms to the RPA, which scans each form,
applies the parameters set in the program, and sends the appropriate data into the ERP
application database.
An RPA tool may be programmed to carry out any repetitive task involving a set of options.
This is problematic for employees who carry out repetitive tasks involving computers, such as
transferring data from one system to another or performing recalculations on the transferred data,
because it potentially places their jobs in jeopardy. While this is a common fear of many in the
workforce, RPAs can actually be very beneficial to some employees because they make data more
accessible for analyzing trends, determining market penetration, or evaluating customer responses.
The work for employees then shifts from lower-skilled, repetitive functions to more refined skills
focused on analysis and strategy.
Cloud computing is renting storage space, processing power, proprietary software, or all three,
on remote servers from another company rather than buying or building those components.
When a company acquires its own infrastructure as opposed to renting it, the company must
purchase enough to cover its peak usage so the business can accommodate high-volume
periods. During low-volume periods, this costly infrastructure is idle. For the customers of cloud
computing, the service offers infrastructure elasticity: renting only as much as needed on a
minute-to-minute basis. Processing and storage are rented in increments of computing power
used per units of time, so that customers pay smaller amounts during low-volume periods
and larger amounts during high-volume periods. Customers benefit because the cloud service
provider performs all maintenance and tech support on this hardware.
Cloud computing services are offered by some companies with large computing infrastructures
to either lease excess capacity during off-peak times or use purpose-built infrastructure to
support their customers. Cloud computing takes advantage of these companies' superior skills
and experience managing such infrastructure.
Additional efficiencies exist when a company's data is in one virtual location even if company
operations are in many locations. Data processing can be performed more efficiently from
that single location, and IT hardware support may be reduced throughout the company.
Because the companies providing cloud services provide distributed redundancy among
many data centers, having cloud data storage reduces the likelihood data is lost in an attack
or disaster.
When software is developed internally, companies incur continued development costs in the form
of IT employees. When software is purchased from an outside source, costs include the upfront cost
of the software as well as the costs to maintain the software, update it, and troubleshoot problems,
adding to the cost of owning the software. SaaS formalizes the ongoing costs of software maintenance
to users by changing the price of the software from a one-time outlay into an ongoing subscription.
Blockchain is a control system originally designed to govern the creation and distribution of
Bitcoin. Bitcoin is a currency that exists only in electronic form, called a cryptocurrency. Bitcoin
must be "mined" in order to confirm transactions. Mining cryptocurrencies involves a person
or group of people performing cryptography, which is the solving of complex mathematical
equations. Through cryptography, blocks of a certain number of transactions are confirmed at
a time. The reward for solving (validating) the equation is both the receipt of Bitcoin and the
validation of a new block of transactions.
Because electronic data can be easily copied and altered, the accounting system governing it must
prevent the copying or alteration of the cryptocurrency; otherwise, the currency may become
instantly worthless through counterfeiting. Blockchain technology was developed to prevent
Bitcoin from being replicated and to limit its initial creation so that there is only a finite number of
Bitcoins. The value of blockchain is its resistance to alteration, multiparty transaction validation,
and decentralized nature. Alteration is difficult because each block adds to all prior blocks,
enabling everyone to view all blocks in the chain to the beginning of the entire chain. This serves as
a form of audit trail. The decentralization of Bitcoin makes it detached from government control.
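A minimal sketch of why altered blocks are detectable, using Python's hashlib and hypothetical block contents, follows: each block's hash covers the previous block's hash, so changing any earlier block breaks every later link.

import hashlib, json

def block_hash(block: dict) -> str:
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

chain = []
prev_hash = "0" * 64                       # genesis placeholder
for txns in (["A pays B 1 BTC"], ["B pays C 0.5 BTC"]):
    block = {"transactions": txns, "prev_hash": prev_hash}
    prev_hash = block_hash(block)
    chain.append(block)

# Tampering with the first block changes its hash, so the prev_hash stored in
# the next block no longer matches and every later block is invalidated.
chain[0]["transactions"] = ["A pays B 100 BTC"]
print(block_hash(chain[0]) == chain[1]["prev_hash"])  # False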
An American of Scottish descent wants to buy haggis, a food delicacy from his ancestral
homeland. Haggis is illegal to import into the United States, so the American attempts to buy
haggis on the dark web and pay with Bitcoin to hide this illegal activity. The American enters
his private key to authorize the transaction to transfer an amount of Bitcoin to the public
key of the seller. The transaction takes about five minutes while the blockchain for those
Bitcoin is checked against the other copies in the distributed ledger. The hash codes in the
blockchain residing on the American's computer match those stored on other computers
in the peer network, so the network agrees that the American's Bitcoin is authentic, and
allows the transfer, while writing a new block to record the transaction.
A person wants to buy a rare collectible from someone in another country. Instead of
using national currencies and paying exchange rate fees, the buyer agrees on a price in
Bitcoin. The buyer plugs in a flash drive containing the stored record of her Bitcoin and
authorizes the transfer. The transaction takes about five minutes while the blockchain
for the transferred Bitcoin is checked against the other copies in the distributed ledger.
This validation process returns a discrepancy. Every other version of the record for that
blockchain includes a transaction where this Bitcoin was already spent by the buyer,
except for the version which resides on the buyer's flash drive. The blockchain control
system decides that the many records are correct, and the single discrepancy is wrong and
corrects the buyer's data by making it align with the distributed ledger. It is presumed that
the buyer altered or sheltered this record and no longer owns the Bitcoin in question, and
the blockchain control system has reimposed that reality. The transferred Bitcoins are not
available for the buyer to use to pay the seller.
If people use companies to facilitate Bitcoin exchanges, those companies have databases
that match customers with their keys. If the exchange company suffers a data breach,
the hackers have all the keys needed to write new blocks (authorize exchanges), and the
distributed ledger will view these transactions as valid. Once this happens, blockchain's
resistance to tampering now makes it harder to restore the Bitcoin to the original owners.
If these concerns can be managed, there are beneficial applications to the wide adoption of
cryptocurrencies in general and specifically for blockchain technology.
Algorithms like those used in blockchain applications can be used to make transactions
more transparent and secure, even between parties without compatible banking or even
legal systems. That can take the politics out of economics by allowing buyers and sellers
anywhere in the world to do business.
This potential can help developing nations, which do not have the same financial
infrastructure as developed nations, join the marketplace. Unfortunately, blockchain
technology also aids criminals and terrorists to circumvent international laws and sanctions.
Blockchain can also be used to create smart contracts. Smart contracts are those where
the terms can be agreed on, executed, and verified automatically. Part of the function of
notaries, lawyers, and the courts in contract law is to record that a service is complete,
when payment is due, and when the payment has been received. If both the service and
the payment can be observed by the blockchain peer network, then the payment can be
directed by an automated process without the need to pay for the intermediary to officiate.
Note that the use of smart contracts does not replace the function of an attorney or other
legal practitioner, but rather it augments, and in some cases expedites, the processing of
legal documents.
A person wishes to buy an item and have it delivered to her home. A smart contract can
be set up such that the buyer authorizes a payment (via Paypal, Bitcoin, or electronic bank
transfer) to anyone who delivers the item for the requested price. A fast delivery person
arrives at the destination where a camera connected to the Internet has object-recognition
software and can verify that the item has been delivered. The delivery fulfillment is recorded
in the smart contract's blockchain peer network. Then funds are automatically transferred to
the delivery person's account because the terms of the contract have been fulfilled.
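A minimal sketch of the automated release condition in such a contract, with hypothetical event names and amounts, might look like the following.

def settle_smart_contract(delivery_confirmed: bool, agreed_price: float,
                          buyer_balance: float) -> str:
    """Release payment automatically once the observed contract terms are met."""
    if delivery_confirmed and buyer_balance >= agreed_price:
        return f"Transfer {agreed_price} to the delivery person's account"
    return "Hold funds; contract terms not yet fulfilled"

print(settle_smart_contract(delivery_confirmed=True, agreed_price=25.0,
                            buyer_balance=100.0))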
Question 1 MCQ-12696
Which of the following statements concerning the systems development life cycle (SDLC)
are correct?
I. The SDLC describes the time that the system is being developed and contains a list of
steps to be executed once.
II. Under the waterfall method, phases do not overlap, and each team is dedicated to
one phase.
III. Under the agile method, all phases may occur within a sprint, and teams are dedicated
to one project.
IV. Waterfall and agile methods should be executed simultaneously.
a. I and IV only.
b. I and II only.
c. I, II, III, and IV.
d. II and III only.
Question 2 MCQ-12697
A buyer offers to purchase a company using Bitcoin. After the transaction is completed, the
buyer reloads onto her computer a copy of the Bitcoin data file that was made before the
company was purchased, so the associated blockchain shows she is the current owner of
the Bitcoin used to purchase the company. She then tries to buy another company with the
same Bitcoin, but the transaction fails. Which parts of the blockchain/Bitcoin infrastructure
have prevented this attempted fraud?
I. Smart contracts
II. Distributed ledger
III. Hash codes
IV. Two-factor authentication
a. I and IV only.
b. I and II only.
c. I, II, III, and IV.
d. II and III only.
Question 3 MCQ-12698
A company writes a new artificial intelligence program to detect fraudulent credit card
transactions. The team of analysts builds a training data set with various types of known
fraudulent activity as well as an equal quantity of legitimate transactions so that the AI
program can learn to distinguish between the two. After training the AI with this data set,
the company tests the AI program by deploying it to detect fraud in credit card transactions
within the retail sector. The company sees astonishingly poor results from the program.
Which of the following describes the likely cause of this failure?
I. The training data did not resemble the situation the AI program would encounter after
deployment. It should have used the same types of transactions.
II. The ratio of fraud to non-fraud should be the same in the training data as is expected
after deployment.
III. After training the AI, a separate data set should be used for testing before deployment
in order to determine the effectiveness of the application.
a. I, II, and III.
b. I and II only.
c. II and III only.
d. I and III only.
This module covers the following content from the IMA Learning Outcome Statements.
Transactional databases are fed into data warehouses, which are optimized for searching
rather than maintaining transactions. Business analysts use the data warehouse to aggregate
transactions together to determine patterns, regional trends, or other insights among individual
products, lines of business, or customers.
This business analyst toolbox starts with basic database tools such as structured query language
(SQL) applications to extract the desired data from the data warehouse. Then Microsoft Excel®
or other similar analytic applications are used to transform the data. Finally, the transformed
data is fed into a presentation and/or visualization software, such as Power BI® or Tableau®, so
that business insights can be quickly and effectively communicated to management for more
informed decision making.
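A minimal sketch of this toolbox, assuming a SQLite-based warehouse file and hypothetical table and column names, might look like the following.

import sqlite3
import pandas as pd

conn = sqlite3.connect("warehouse.db")

# Extract: pull aggregated transactions out of the warehouse with SQL.
df = pd.read_sql_query(
    """
    SELECT region, product_line, SUM(sales_amount) AS total_sales
    FROM sales_transactions
    GROUP BY region, product_line
    """,
    conn,
)

# Transform: add a derived measure before visualization.
df["share_of_region"] = df["total_sales"] / df.groupby("region")["total_sales"].transform("sum")

# Hand off: export for a visualization tool such as Power BI, Tableau, or Excel.
df.to_csv("regional_sales_summary.csv", index=False)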
Before the era of Big Data, if a government or company was interested in what was
happening in the world, the best technique was to use a survey asking people what they
were doing and extrapolate the results to the whole population. Although there are best
practices at every step of the survey and interpretation process, practitioners know that the
science is imperfect. No sample group is perfectly representative of the whole population.
Additionally, people may provide bad data on surveys: They misremember what they did,
or report what they would like to have done, or intentionally report something false.
Big Data has changed the entire methodology of discovering the behavior of large
populations. It is no longer necessary to deal with the uncertainty of asking a few people
how often they study for classes. Textbooks and homework are online and every work
session records the start time, the end time, the content accessed, and the student's
identity, which is linked to the student's demographic data. Athletes now use cardio
equipment that records vitals such as heartrate or transmits data on weights lifted
using RFID technology, all while the equipment is being used. Those same RFID tags are
inside credit cards so their location can be tracked to see if they were present at store
transactions, or to locate the cards after they are stolen.
In short, we no longer ask people what they did. We ask the items what the people did with
them, because the items provide more accurate and reliable data.
1.2.1 Volume
The volume of data is measured in bytes, which contain enough binary space to store one letter
of text. Prior to the era of Big Data, the quantity of data being analyzed was typically measured
in the millions of bytes (megabytes). Significant changes in the collective ability to create,
transmit, store, and compute data have led to the regular usage of petabytes (1,000,000,000,000,000
bytes, or 10^15 bytes) and the occasional discussion of zettabytes (10^21 bytes). Internet traffic each
year is measured in zettabytes, and work is underway to get agreement on what to call further
orders of magnitude when needed. This is how drastically the volume of data has changed.
The following list includes most of the terms associated with the amount of data that may
be stored within computer systems.
Byte: one encoded character (a single letter, number, or symbol)
1.2.2 Velocity
Velocity means two things in the context of Big Data: the speed of data transmissions and how
quickly the data can be processed. With file sizes becoming larger (due to volume) over time, the
speed of sending data across networks must also increase. Also, as new information becomes
available, data must be processed quickly, or the answers may be irrelevant. These factors cause
velocity to increase.
Consider the navigation app on a smartphone. In order for it to provide a time estimate
for the driver's best route, every smartphone in the area continuously sends data about
its location. Smartphones whose locations change at high speeds are considered to be
in cars. This data is selected, organized, and processed to see where the traffic is moving
quickly or slowly on several routes between the driver and the destination so that the app
can suggest the best two or three routes. Processing must occur fast enough so that the
driver is informed before traffic conditions change and the answer just provided becomes
outdated. As the app collects and sends new data based on new conditions, it sends
updated calculations of the time remaining until the destination is reached and suggests faster
routes as they are identified.
1.2.3 Variety
Information can now be stored in a variety of formats. Prior to the era of Big Data, the most
common formats were organized text files (documents) and number-based files (spreadsheets).
Advances in technology have increased the capacity to process large files quickly, allowing a
variety of other data formats to be analyzed more easily. For instance, images were historically
very difficult to manage because applications must process an array of pixels, each with a
data value for a unique color. Now images can be algorithmically analyzed because enough
computing power exists to navigate that array rather than just store it. Similar processes have
evolved to analyze videos, which are essentially two-dimensional arrays of images with a time
index. Movies once had to be viewed in order to be rated and catalogued for genre, but now that
task can be performed by advanced computer software. Because of these advances, analysts
working with data must now expect their input to be in a variety of formats, all of which need to
be mined for information.
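For example, a minimal Python sketch using NumPy (the array sizes and random pixel values are hypothetical stand-ins for a real image and video file) shows how these formats become arrays that can be computed over rather than merely stored:

```python
import numpy as np

# Hypothetical image: a small 240 x 320 color picture, with one red/green/blue
# intensity value (0-255) for every pixel.
image = np.random.randint(0, 256, size=(240, 320, 3), dtype=np.uint8)

# Hypothetical video: 30 frames of the same shape, adding a time index.
video = np.random.randint(0, 256, size=(30, 240, 320, 3), dtype=np.uint8)

# Because the data is an array, it can be analyzed rather than just stored:
# here, the average brightness of each frame over time.
brightness_per_frame = video.mean(axis=(1, 2, 3))
print(image.shape, video.shape, brightness_per_frame.shape)
```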
1.2.4 Veracity
Veracity refers to the accuracy and reliability of data. As companies accumulate an ever-
increasing volume of data, the information captured is only good if it is high quality. Ensuring
high quality requires cleansing and maintenance, which means gathering data can be slow
and expensive. In addition to companies gathering their own data, it has become a common
business practice to purchase large streams of data from multiple third-party sources. This must
also go through a rigorous evaluation process because third-party data is often gathered for
different purposes and possibly with different quality standards. Therefore, the veracity—the
accuracy and reliability of the data—cannot be presumed. Data analysis must begin with steps to
detect and exclude unreliable data without introducing bias into the results.
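As a rough sketch of that first step, the following uses the pandas library on a hypothetical third-party extract (every column name and value is invented for illustration) to remove duplicates and impossible values and to flag, rather than silently drop, a suspected outlier:

```python
import pandas as pd

# Hypothetical third-party data stream with quality problems.
raw = pd.DataFrame({
    "customer_id":   [101, 102, 102, 103, 104],
    "age":           [34, 29, 29, -5, 41],                   # -5 is an impossible value
    "monthly_spend": [250.0, 180.0, 180.0, None, 40000.0],   # a missing value and a suspected outlier
})

# Remove exact duplicates and records with missing or impossible values.
clean = raw.drop_duplicates().dropna()
clean = clean[clean["age"].between(0, 120)].copy()

# Flag (rather than silently drop) extreme values so a reviewer can judge whether
# excluding them would introduce bias into later analysis.
threshold = clean["monthly_spend"].median() * 10   # assumed, simplistic rule of thumb
clean["suspect_outlier"] = clean["monthly_spend"] > threshold

print(clean)
```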
These benefits do not come without challenges. As more information is collected, companies
know more about consumers. This knowledge has caused global concern among privacy
advocates and that has affected the way companies collect, retain, and use data. It means
channels must be created within an organization to protect the information from theft and
misuse such that only a limited number of employees have access. Information must also be
restructured such that the origin of sensitive materials is masked and cannot lead back to
specific individuals. Even when protected in this way, data should be used only as laws or consumer consent permit. These safeguards have continued to increase as scrutiny of the large firms holding this
information grows.
Changes that come with Big Data create challenges for society and security. If every
company with an electronic device collecting information sells that data to anyone willing
to pay, the concept of privacy fundamentally changes. The capability of technology has
outpaced the public's understanding of technology as well as outpaced legislation to
control it. The large tech companies that hold the overwhelming portion of this data have
only recently begun to be challenged by lawmakers over their stewardship of the power
that this mass amount of data affords. Many of these companies operate with the belief
that data collected about a person is not the property of that person, but rather the
property of the company that collected it. One of the only exceptions to this is when a
company suffers a high-profile or high-liability data breach (due to hacking or negligence)
and the company is legally required to notify and compensate the people whose data was
exposed. The company's duty to maintain control of its data means that data security is
more important than ever.
Increased computing power means patterns and trends can be identified in sounds, images,
videos, and postings on social media platforms. Big Data has resulted in a shift away from
traditional structured and unstructured data classifications into a new classification: semi-
structured data.
Many transaction records would be necessary before having knowledge of the day's
sales, and likely many days would be necessary before one had knowledge of this store's
short-term financial health. After the store's first day of sales, without any comparative information, it would be difficult to know whether that day's sales were high, low, or somewhere in between. Information given context, and information synthesized over time, becomes knowledge. As time passes and context grows, there is more insight into the relative strength or weakness of sales. Once enough organized, contextualized information accumulates into insight, one can use that knowledge to take action to improve sales with timely interventions.
There are also challenges in maintaining and utilizing data science capabilities. Data science
is expensive because a large amount of computing infrastructure and skilled employees are
required. Managers who are unfamiliar with the power of data science may be hesitant in
committing resources to improve data science capabilities because they may feel it would come
at the expense of operations. After investing in data science capabilities, success depends on the
quality of work and the judgement of those interpreting the data. If the data used for an analysis
is bad, or incorrect analysis techniques are used, or if the results are misinterpreted, then
decisions based on the data can be even worse than decisions made without data.
LOS 1F4i
Data mining is the process of investigating large data sets to discover previously unknown
patterns. Data mining combines the aggregation and analysis of data by trainable artificial
intelligence (AI) and machine-learning-enhanced decision support systems with various
statistical techniques. Data mining is both art and science. The decision support system and the
AI must be trained with data that is similar to the real data the AI is expected to investigate.
Training a model helps it to understand how to interpret certain types of information and yield
meaningful results. The parameters of the model have a large impact on what patterns are
judged to exist, and many data-mining techniques require an initial seed (or guess) value to
begin the analysis. For example, cluster analysis software often requires the analyst to define
the number of clusters before the data-mining process begins. With today's computing power,
analysts can run the procedure with several different starting seeds and accept the result that
the analyst judges to be most interesting and useful, combining data-mining science and art.
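For instance, with scikit-learn's KMeans (one common clustering tool; the data below is synthetically generated, not real), the analyst supplies the number of clusters and can rerun the procedure with different starting seeds before judging which result is most useful:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for customer data with two measured attributes.
X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

# The analyst chooses the number of clusters and can vary the starting seed;
# the run judged most interesting and useful is the one kept.
for seed in (1, 2, 3):
    model = KMeans(n_clusters=4, n_init=10, random_state=seed).fit(X)
    print(f"seed {seed}: total within-cluster distance = {model.inertia_:.1f}")
```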
Exploratory data analysis is sometimes also called data mining, or atheoretical research. It is
used to investigate existing data sets for useful patterns. Unlike traditional research, the data
is not used to show support for a preexisting theory or desired analytical outcome. Because of this lower threshold, insights can be gained more quickly. A theory-based approach may be more
generalizable to a broader population, but atheoretical analysis is applicable when the target
population resembles the sample. As long as results are applied to populations similar to the
sample, the analysis should have high validity.
Cluster analysis and decision trees are examples of exploratory analysis. Looking for a pattern between the number of days that elapse between presenting a bill to the customer and the customer paying that bill, and the long-term probability of that customer defaulting on a debt, would be another example of exploratory analysis. Cluster analysis does not explain why the groups form, the decision tree does not explain why the population can be segmented that way, and the late-pay study does not explain why a customer defaults. Even so, the company can still use this information to increase the efficiency of its business processes.
2. Timeliness
Larger quantities of higher-quality data will produce more valid and more reliable results,
but data gathering and data cleaning are expensive and time-consuming activities. A
company with a small or inexperienced data science team can spend too long gathering and
cleaning a data set for optimal decision making. Data science departments must have the
scale and skill to process data quickly.
3. Expensive Employees
Highly skilled professionals with advanced programming and technical knowledge are required to work with the file sizes and software packages in use today. Data mining requires equal
expertise in statistics and data science. Although the software process can be run without
statistics expertise, only a person (or team) with expertise in both areas can be certain that
the results are valid or reliable.
4. Corporate Culture
Executives are often educated, trained, and acclimatized to trust their own judgement and
that of their peers. In some corporate cultures, it can be a challenge to devote enough
resources for a department to competently perform data mining. Even if data mining has
been done, it can be a challenge for executives in some corporate cultures to consider the
insights gained with the appropriate weight—not blind trust, but not simply dismissed when
a recommendation is at odds with the executive's instinct.
When a customer buys a product, the purchase event would need to record which
customer purchased which product. The customer table has a unique identifier (called
a primary key) and the product table has another (the primary key for that table). The
purchases table would need a primary key to uniquely identify each purchase, and there
would be some data such as the date the purchase occurred. In addition, the purchase
record would also contain the unique identifier for the customer making the purchase (a
foreign key that is the primary key from the customer table for that customer) and also the
unique identifier for the product purchased (a foreign key that is the primary key from the
product table).
Simple subsets, as shown in the example above, are often not sufficient to answer a business question, so more complicated SQL queries, such as the join sketched below, are often required.
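One sketch of such a query, using Python's built-in sqlite3 module and hypothetical customer, product, and purchase tables shaped like the ones described above:

```python
import sqlite3

# Hypothetical tables illustrating primary and foreign keys.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE purchase (
        purchase_id   INTEGER PRIMARY KEY,
        purchase_date TEXT,
        customer_id   INTEGER REFERENCES customer(customer_id),  -- foreign key
        product_id    INTEGER REFERENCES product(product_id)     -- foreign key
    );
    INSERT INTO customer VALUES (1, 'A. Buyer');
    INSERT INTO product  VALUES (10, 'Widget');
    INSERT INTO purchase VALUES (100, '2024-01-15', 1, 10);
""")

-- = SQL comment; below, the three tables are joined so each purchase is listed
-- with the customer and product it refers to, rather than a simple subset.
rows = con.execute("""
    SELECT p.purchase_id, p.purchase_date, c.name, pr.description
    FROM purchase p
    JOIN customer c  ON c.customer_id = p.customer_id
    JOIN product  pr ON pr.product_id = p.product_id
""").fetchall()
print(rows)
```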
One of the simpler forms of analytic modeling in data mining is a decision tree. Based on a series
of decision points called nodes, decision trees are decision support tools that provide outcomes
based on probability or fact. These models begin with a single node, or question, with at least two
possibilities. Each possibility is assigned an outcome (e.g., yes or no; A, B, C) or probability that
breaks off further into additional outcomes with their own nodes and outcomes until the total of
all possible outcomes has been exhausted. The end result visually resembles a tree.
This process produces a model—a set of rules for making a reliable choice. As with any model,
decision trees are subject to the parameters created by the analyst who designed them. This
means the algorithm can be adjusted based on the unique characteristics of a given data set or
circumstances to improve the performance of the model.
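A minimal scikit-learn sketch (the ages, incomes, and the purchase pattern baked into the labels are all hypothetical) shows how an analyst-chosen parameter such as max_depth shapes the rules the algorithm produces:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical data: columns are [age, income]; labels are 1 = purchased, 0 = did not.
rng = np.random.default_rng(0)
X = np.column_stack([rng.integers(18, 70, 200), rng.integers(20_000, 120_000, 200)])
y = ((X[:, 0] > 39) & (X[:, 1] > 40_000)).astype(int)  # assumed pattern for illustration

# max_depth is one of the parameters the analyst sets; changing it alters
# which rules the algorithm judges to exist.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["age", "income"]))
```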
In this data set, dots represent individuals who have received advertisements from a
company. The vertical axis measures income, so wealthier individuals are higher in the
plot. The horizontal axis measures age, so older individuals are to the right in the plot.
The orange dots represent individuals who purchased the product, while the blue dots
represent individuals who did not make a purchase. The company wants to know which
customers are more likely to buy their products so that the company can better target its
advertising and sales efforts.
[Figure: scatter plot of income (vertical axis) vs. age (horizontal axis); orange = made purchase, blue = no purchase]
Visually, what the decision tree algorithm attempts to do is to cut this space into pieces
such that each piece has an overwhelming ratio of the same colored dots in it. Any piece
that has numerous orange dots is a defined group of customers who buy the company's
products. Logically, this means finding subsections of wealth and age where individuals
overwhelmingly choose the same way for whether to buy the product.
Illustration 9 Data Mining Using a Decision Tree: Making the First Cut
[Figure: the scatter plot with a first dividing line added; orange = made purchase]
This line divides the plot so that on the right side of the line there are mostly orange dots
and on the left side there are mostly blue dots. This indicates that there is an age divide
(represented by the line) among those who buy the company's products and those who do
not. This is not a perfect division; there are still orange and blue dots on both sides of the
line.
The data collected may not appear to have a pattern, and the data mining process may not be easy, but advanced algorithms and a large amount of computing power help draw insights from an otherwise unremarkable data set. This is why it is important to calculate all possible ratios at each mining step.
[Figure: the scatter plot with a second dividing line added; orange = made purchase]
This second line generated by the algorithm separates a box in the upper right section of
the chart, which is dominated by orange, compared to the box in the lower right section,
which is mostly blue but still mixed.
[Figure: the scatter plot with a third dividing line added; orange = made purchase]
This third line creates a fourth box in the lower left, which is overwhelmingly blue compared
to the box in the upper left, which is mostly orange but still mixed. Each of these lines was
placed by the algorithm, which calculated the ratios of blue and orange dots resulting from any possible line that could be drawn. The results yielded lines with the optimal division and the
most favorable ratios, distinctly separating the colored dots most effectively.
The following depicts a standard decision tree based on the previous illustration, showing
how it is designed and should be interpreted.
[Decision tree diagram with No/Yes branches at each node]
The result of this tree is that the company will focus its advertisement and sales efforts
on customers whose income is greater than $40,000 if the customer is older than 39, and
customers whose income is greater than $60,000 if the customer is younger than 39. The
company will spend less effort on all other customers because the model concludes that targeting them would be an inefficient use of resources.
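Written as code, the rule this tree produces might look like the following sketch, a direct translation of the thresholds above for illustration only:

```python
def target_customer(age: int, income: float) -> bool:
    """Encode the decision rules read from the tree above (illustrative only)."""
    if age > 39:
        return income > 40_000   # older customers: target above $40,000 income
    return income > 60_000       # younger customers: target above $60,000 income

# The company would focus advertising on customers for whom this returns True.
print(target_customer(age=45, income=50_000))  # True
print(target_customer(age=30, income=50_000))  # False
```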
Only two factors were considered in the illustration above: age and income. Data could
be gathered on a third factor, such as education. This would result in a three-dimensional
scatter plot, where an algorithm would use a series of planes to divide the space to
separate colored dots:
[Figure: three-dimensional scatter plot with axes for age, income, and education]
A multidimensional model uses the same logic as a two-dimensional model, although it would
be difficult to draw an example of a fourth-, fifth-, or twelfth-dimensional scatter plot. Similarly,
the algorithm does not have to use straight lines or planes to divide spaces. Complex curved
multivariate shapes can be used, provided the algorithm is sophisticated enough to employ
them and enough computing power is available to make the associated calculations.
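As one example of an algorithm that can draw a curved boundary (a tool choice assumed here for illustration, not the method used in the earlier figures), a support vector machine with an RBF kernel separates synthetic data that no straight line could divide:

```python
from sklearn.svm import SVC
from sklearn.datasets import make_circles

# Synthetic data: one group forms a ring around the other, so no straight
# line can separate them.
X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)

# An RBF-kernel support vector machine divides the space with a curved boundary.
clf = SVC(kernel="rbf", gamma=2.0).fit(X, y)
print(f"training accuracy: {clf.score(X, y):.2f}")
```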
The different types of data analytics include descriptive, diagnostic, predictive, and prescriptive
(or proscriptive) analytics. The difference between the types is the intention of the analysis,
rather than the techniques used. The person using the data may intend to use it to describe
what is happening or to find the immediate cause behind what is currently happening. The user
may also be more concerned with predicting future trends instead of describing past ones.
4.1 Descriptive
Descriptive data analytics answer questions to discover what has happened. Data assembly and
organization tasks such as cataloging daily, weekly, and monthly sales reports; usage rates and
expenditures for advertisement; and inventory accumulation or depletion are analytics tasks
that describe what is happening in an enterprise. These analytics tasks are often undertaken by
operations analysts.
4.2 Diagnostic
Diagnostic data analytics answer questions to discover why events are happening. For example,
diagnostic analytics may attempt to discover associations between marketing and sales, or
between inventory and sales, so that management may understand the relationships between
the occurring events. Business analysts often undertake these tasks.
4.3 Predictive
Predictive data analytics bring together the data from descriptive and diagnostic analytics to
predict expected future events, given what is known about conditions and presuming that the
relationships remain stable. Accounting analysts often undertake these tasks in order to inform
managers regarding anticipated gains and expenses in future periods.
4.4 Prescriptive
Prescriptive analytics focus on the business relationships discovered by diagnostic analytics to
recommend actions intended to influence future events. While predictive analytics describe what
will occur if everything continues as it is, prescriptive analytics focus on what should change to
bring about different results. Prescriptive analytics are often undertaken by experienced analysts at the managerial or executive level.
Question 1 MCQ-12722
Consider the following representation of a data set prepared for data mining.
[Scatter plot of the data set on axes from 0 to 10]
Question 2 MCQ-12723
a. Descriptive
b. Diagnostic
c. Predictive
d. Prescriptive
Question 3 MCQ-12594
Baiem, LLC is analyzing stock market trading data. It intends to cease predicting future
stock price movement and wants to quickly analyze what the stock market is doing every
moment. By reacting quickly, Baiem will catch the rises and falls in stock prices caused by
other market actors. Baiem believes it will make more money by catching a portion of every
stock movement, rather than catching all of the movement each time it correctly predicts a
trend. With which of the "Four Vs" of Big Data is Baiem most concerned?
a. Velocity
b. Volume
c. Variety
d. Vexation
This module covers the following content from the IMA Learning Outcome Statements.
There are various types of analytic models, each with strengths and limitations. Some models
are basic in structure and less complex in application. Others require a deep level of statistical
knowledge and a team of analytic consultants to apply the models to consumer data.
1.1 Clustering
Cluster analysis is a technique used to determine whether a large group contains two or more
cohesive subgroups, where each subgroup's members are more similar to members of their
own subgroup than they are to members of other subgroups. Cluster analysis is used to target
prospective consumers as well as existing customers, often with differentiated products or
advertisements targeted to the preferences of each subgroup.
Cluster analysis is also used to identify characteristics which unite and characteristics which
separate the customer base, so that products and advertisements can be effectively targeted
toward those subgroups. The benefit of this is multifold. Companies can save marketing dollars
by avoiding marketing efforts to consumers with a low propensity to respond to an ad. It also
helps companies better understand their customers so better, more relevant products can be
developed and sold. Products and services that are differentiated from the competition will be
more likely to satisfy consumers within the targeted demographic group.
1.2 Classification
Classification analysis attempts to accurately label data points as belonging to an established
group. Binary classification is when there are two options and analysis is used to choose the
more likely of those two options. For example, in conversion prediction, data is used to predict
whether individual customers are likely to purchase a product. Data is gathered about known
cases where existing customers purchased products, and data is gathered about many other
people to analyze which people most resemble the existing customer base and may be more
willing to make a purchase.
There are other types of classification analyses in addition to the binary application. In cases
in which one option is rarer than the other, imbalanced classification techniques are used to
isolate anomalies. This is highly relevant for fraud detection techniques, because fraudulent
transactions are rarer than non-fraud transactions. Similar techniques are used to determine
outliers during the data-cleaning phase of other analysis tasks.
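A minimal sketch of an imbalanced binary classification, using scikit-learn's logistic regression with balanced class weights (one simple way to keep a rare class from being ignored; the transaction data and fraud pattern are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical, highly imbalanced data: 1 = fraudulent transaction (rare), 0 = normal.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))                          # three transaction features
# Assume fraud is associated with unusually large values of the first feature.
y = ((X[:, 0] > 2.0) & (rng.random(5000) < 0.8)).astype(int)

# class_weight="balanced" is one simple way to keep the rare class from being ignored.
clf = LogisticRegression(class_weight="balanced").fit(X, y)
print("actual fraud rate :", y.mean())
print("flagged as fraud  :", clf.predict(X).mean())
```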
1.3 Regression
Regression is an advanced mathematical analysis that produces an equation to determine the
relationship between variables (independent variables and a dependent variable). This approach
is applied if there is a suspected underlying linear relationship between two or more variables,
but it can also be applied in the event no previous relationship is known. This is the kind of
relationship that diagnostic analytics is intended to discover. Once a testable relationship has
been established, such as "viewing a certain advertisement increases the quantity purchased,"
regression analysis can be used to test the strength of the relationship, quantify that
relationship, and apply it to other data sets to infer how many individuals are likely to respond in
a similar manner.
Y = β0 + β1X + ε
Where:
Y = dependent variable
β0 = intercept
β1 = slope
X = independent variable
ε = random error
Dependent Variable: The outcome estimated by the regression equation based on its
relationship to the independent variable.
Independent Variable: The variable theorized to cause or drive the value of the dependent
variable.
Intercept: The value that the dependent variable has when the independent variable is zero.
Slope: The quantified rate of change between the independent variable and the dependent
variable. It is usually read as "for each one-unit increase in X, there is a β1 increase in Y."
Random Error: Any remaining variation that cannot be accounted for in the model. If there
are other independent variables acting on the dependent variable which are not in the
model, then their effect is also contained in the error term. If all of the source data perfectly
fit onto a line, the error term would be zero.
A student is studying for an important test and would like to know the relationship between
the number of hours students study for the exam and the grade the students achieve on
the exam. The student polls other students who have taken the test in prior semesters and
gathers the following information on time studied and exam score:
[Scatter plot of the gathered data: exam scores vs. hours studied (0 to 10)]
The student inputs this data into an application that can perform linear regression and gets
the following regression equation:
Y = 59.5 + 4.1X + ε
[Scatter plot of the data with the fitted regression line]
According to this formula, a student can expect a grade of 59.5 on the exam with zero
hours of study and an improvement of 4.1 points for every hour studied.
A coefficient of correlation of 0.593 yields an R-squared of 0.35, which means that only
35 percent of the variation in test scores is explained by the variation in the hours studied.
There must be other factors that influence test scores, but these are not taken into account
in this equation.
The intercept in this study is the grade expected with zero studying, or 59.5. Based on the
methods used by the student, the student is 95 percent confident that the real intercept
value is between 35.8 and 80.1. Similarly, the model shows that each hour of studying
results in 4.1 points of grade improvement. The student is 95 percent confident that the
actual slope is between (0.44) and 8.67. These wide ranges reflect the imprecision of this
linear regression model.
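A minimal sketch of fitting such a model with SciPy (the hours-and-scores pairs below are hypothetical, so the fitted intercept, slope, and R-squared will differ from the 59.5, 4.1, and 0.35 reported above):

```python
from scipy.stats import linregress

# Hypothetical (hours studied, exam score) observations for illustration.
hours  = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [62, 66, 70, 75, 78, 85, 88, 90]

result = linregress(hours, scores)
print(f"intercept = {result.intercept:.1f}")    # expected score with zero hours
print(f"slope     = {result.slope:.1f}")        # points gained per hour studied
print(f"R-squared = {result.rvalue ** 2:.2f}")  # share of variation explained
```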
Multiple regression extends the simple model to more than one independent variable:
Y = β0 + β1X1 + β2X2 + ... + βnXn + ε
Where:
Y = dependent variable
β0 = intercept
βn = the slope for the independent variable Xn
Xn = the independent variables
ε = random error
Note that there is only one intercept and one error term, but multiple independent variables and
multiple slopes.
An analyst wants to determine whether the market price of a home can be calculated from
factors describing that home. Each factor is an independent variable and the sales price
of the home is the dependent variable. The data set contains listings for several hundred
homes for sale in two different suburbs of a major Midwestern city along with zip codes
identifying the suburb, the number of bedrooms and bathrooms, the size of the house and
of the yard, when the house was built, and a 0–30 score for the quality of the public school
district. A portion of the data set follows:
Price | Address | Zip | Beds (X1) | Baths (X2) | Sq. Ft. House (X3) | Sq. Ft. Lot (X4) | Built (X5) | School (X6)
379,900 350 Carpenter Rd. 43230 4 2.5 2,268 20,908.8 1987 22
285,900 380 Woodside Meadows 43230 4 2.5 2,002 7,405 1998 22
300,000 6128 Renwell Ln. Unit 87 43230 3 2.5 2,385 10,890 2004 10
399,900 471 Preservation Ln. 43230 4 2.5 2,796 10,890 2008 21
320,000 908 Ludwig Dr. 43230 4 2.5 2,838 11,325.6 1990 22
279,900 194 Winfall Dr. 43230 4 2.5 2,509 10,454.4 1994 21
280,000 4167 Guston Pl. 43230 3 2.5 3,380 12,196.8 2002 26
329,000 586 Pinegrove Pl. 43230 3 2.5 1,953 7,841 1998 22
The goal of the regression is to see if the data can be used to determine the market price
for a house in the area.
The initial run of the regression model against the data set produces the following
regression equation:
Y = $252,599 + 5,742X1 + 2,598X2 + 1,122X3 + 33X4 + 0.61X5 + 1,168X6
The coefficient of determination or R-squared is determined to be 24 percent, meaning the
model explains 24 percent of the variation in the sales price of the home. The analyst will
most likely revise this model to find one with a stronger goodness of fit.
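A rough sketch of how such a model could be fit in Python with scikit-learn, using only the eight listed rows (because the full data set of several hundred homes is not reproduced here, the fitted coefficients and R-squared will not match the 24 percent reported above):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# The eight listed rows: beds, baths, sq. ft. house, sq. ft. lot (rounded),
# year built, school score -> sales price.
X = np.array([
    [4, 2.5, 2268, 20909, 1987, 22],
    [4, 2.5, 2002,  7405, 1998, 22],
    [3, 2.5, 2385, 10890, 2004, 10],
    [4, 2.5, 2796, 10890, 2008, 21],
    [4, 2.5, 2838, 11326, 1990, 22],
    [4, 2.5, 2509, 10454, 1994, 21],
    [3, 2.5, 3380, 12197, 2002, 26],
    [3, 2.5, 1953,  7841, 1998, 22],
])
y = np.array([379900, 285900, 300000, 399900, 320000, 279900, 280000, 329000])

model = LinearRegression().fit(X, y)
print("intercept:", round(model.intercept_))
print("slopes   :", [round(b, 2) for b in model.coef_])
print("R-squared:", round(model.score(X, y), 2))  # goodness of fit on these rows only
```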
A company's sales revenue figures over time are plotted on the chart below:
[Line chart: monthly sales revenue beginning at $498,729 in January, plotted with a dotted trend line; vertical axis up to $1,600,000]
The solid line on the graph is the actual sales and the dotted line is the trend line.
Generally, sales are increasing over time.
Seasonality occurs when time series data exhibits regular and predictable patterns over
time intervals that are less than a year.
Cyclicality occurs when time series data rises and falls over periods of time that are not fixed
or predictable and are generally over periods of more than one year.
Global temperatures are rising in a nonlinear way. There is irregular variation in daily
temperatures. There is seasonal variation with hotter temperatures in the summer than
in the winter. There is also cyclical variation in which temperatures rise and fall over time.
Some cycles are decades long and some may be centuries long. In order to accurately
forecast temperature over time, all of these cycles must be identified and accounted for in
the analysis.
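One common way to separate these components in Python is statsmodels' seasonal_decompose (an assumed tool choice; the monthly series below is synthetic, combining a trend, a 12-month seasonal pattern, and random noise). Longer, irregular cycles would require more history and other methods:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic monthly series: upward trend + yearly seasonality + irregular noise.
rng = np.random.default_rng(0)
months = pd.date_range("2018-01-01", periods=60, freq="MS")
values = (
    np.linspace(100, 160, 60)                      # trend
    + 10 * np.sin(2 * np.pi * np.arange(60) / 12)  # seasonality (12-month period)
    + rng.normal(0, 3, 60)                         # irregular variation
)
series = pd.Series(values, index=months)

# Separate the series into trend, seasonal, and residual components.
parts = seasonal_decompose(series, model="additive", period=12)
print(parts.trend.dropna().head())
print(parts.seasonal.head(12))
```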
Predictive analytics generally fall into two types: forecasting analytics and referential analytics.
Forecasting budgets in business may begin with regression analysis to identify the business
components that have the strongest effect on costs and revenues, such as advertising,
materials purchases, labor used, hours of operation, etc.
Next, trend analysis can be performed to determine any seasonality, such as a seasonal
demand for outdoor grilling equipment or snow shovels, and any cyclical effects on the
business, such as macroeconomic business cycles.
By calculating the effects of all these influences, an analyst may predict next month's (or
quarter's or year's) revenues and expenses with greater precision.
A company that wants to expand into a new geographic territory tasks its analysts with
studying the quantity and types of customers and competitors in the proposed new
territory. Analysts compare the new territory with the company's existing territories that
have similar characteristics. When a similar territory is identified, the company refers to
existing data on the effectiveness of its past efforts in that territory (advertising, sales price,
competition, etc.) and applies that data to the proposed new territory.
In this way, management can plan its strategy in the new territory based on the results
from the existing territory. It may be necessary to adjust the plan based on any remaining
differences between the two territories.
Other analytic methods include sensitivity analysis (also known as what-if analysis), and
simulation analysis.
A company may not have statistically verified every relationship that leads from advertising
to revenue (such as specific receptiveness of every component in the message to each
cluster-defined subgroup), but it has enough data points (e.g., this much additional
advertising led to this much additional revenue) to create an exploratory data analysis
model. The company can then use sensitivity analysis to estimate what level of advertising
expenditure is likely to produce a desired rise in revenue. Alternatively, a company
can calculate how sensitive its investment portfolio is to changes in the stock market
by calculating the expected income from all its investments at various possible future
market conditions.
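A minimal what-if sketch (the revenue-per-advertising-dollar figure and the base revenue are hypothetical values standing in for whatever the exploratory model estimated):

```python
# Assume an exploratory model found that each additional dollar of advertising
# has been associated with roughly $4.20 of added revenue (hypothetical figure).
REVENUE_PER_AD_DOLLAR = 4.2

def projected_revenue(base_revenue: float, added_advertising: float) -> float:
    return base_revenue + REVENUE_PER_AD_DOLLAR * added_advertising

# Vary one input over a range of values and observe how the outcome responds.
for added_ad in (0, 50_000, 100_000, 150_000, 200_000):
    print(f"extra advertising ${added_ad:>9,}: "
          f"projected revenue ${projected_revenue(2_000_000, added_ad):,.0f}")
```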
LOS 1F4y
Simulation modeling is a risk-analysis tool that simultaneously performs multiple sensitivity analyses to find the business conditions resulting in the most extreme, yet acceptable, business
outcomes. If management later finds that the business environment has crossed any of the
identified critical thresholds, management will know which actions must be taken until the
environment changes. Managers can otherwise remain confident that as long as business
conditions do not exceed any of these boundaries, they can expect acceptable results.
Simulation models also have the same main benefit (identifying tipping points) and limitation
(potential for misidentification of tipping points) as sensitivity analysis.
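A Monte Carlo-style sketch of this idea (the distributions and thresholds are assumptions for illustration, not a prescribed method): draw uncertain inputs many times, compute the outcome for each draw, and see how often results stay within acceptable bounds.

```python
import numpy as np

# Run many sensitivity scenarios at once by drawing uncertain inputs from
# assumed distributions and checking how the outcome behaves.
rng = np.random.default_rng(42)
n_trials = 100_000

unit_sales  = rng.normal(50_000, 8_000, n_trials)   # assumed demand distribution
unit_price  = rng.normal(20.0, 1.5, n_trials)       # assumed price distribution
unit_cost   = rng.normal(14.0, 1.0, n_trials)       # assumed cost distribution
fixed_costs = 200_000

profit = unit_sales * (unit_price - unit_cost) - fixed_costs

print(f"probability of a loss: {np.mean(profit < 0):.1%}")
print(f"5th / 95th percentile profit: "
      f"{np.percentile(profit, 5):,.0f} / {np.percentile(profit, 95):,.0f}")
```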
4 Data Visualization
4.1.2 Limitations
Although data visualization can be used to quickly communicate patterns in data, a visualization
is a summary that may lack the precision of the actual underlying data. Additionally, data
visualizations are limited by the viewer's ability to understand the message conveyed by the
image. Written descriptions or tables can be used to enhance the user's understanding of the
data visualization.
Bar Chart
[Figure: bar chart of growth rate by quarter]
A line chart shows a frequency or a magnitude in one dimension against a data metric in the
other dimension, very similar to the way a histogram is set up. The difference is that instead
of multiple bars showing that frequency or magnitude, an indicator point is placed where the
top of the histogram bar would be, and then these points are connected with line segments or
curved lines. Reducing the data from a bar to a line allows multiple variables to be presented in
the same space, such as sales from two different departments.
Line Chart
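A minimal matplotlib sketch (the department names and monthly sales values are hypothetical) showing two variables plotted as lines in the same space:

```python
import matplotlib.pyplot as plt

# Hypothetical monthly sales for two departments, shown as two lines in one chart.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
dept_a = [120, 135, 150, 148, 160, 172]
dept_b = [90, 95, 110, 130, 128, 140]

plt.plot(months, dept_a, marker="o", label="Department A")
plt.plot(months, dept_b, marker="o", label="Department B")
plt.ylabel("Sales (in thousands)")
plt.legend()
plt.show()
```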
A stacked chart combines the qualities of a histogram and a line chart, giving a visual
representation at all points of the difference between the several variables being presented.
All variables extend from their tops to the X axis; the value is not just the visible portion of the
variable. This visualization is best used when the included variables do not cross each other. If
the data points cross, a line chart should be used, as shown above.
Area/Stacked Chart
[Figure: stacked area chart of revenue (in thousands) by month]
Dot Plot
[Figure: dot plot of frequency vs. price ($0 to $500,000)]
This visualization is used when there are discrete and repeated instances of the observations.
Dot plots are used when the analyst wants to present individual observations in the visualization.
This is the primary difference from histograms, in which the individual observations are hidden
in the bars.
4.3.4 Flowcharts
Processes that have a beginning, middle, and end can be mapped using flowcharts. These
visuals allow learners to see potential options that may take place during a given process.
Flowchart
Pie Chart
[Figure: pie chart with segments of 46%, 23%, 19%, and 12%]
Traditional pie charts are susceptible to distortion. Studies show that people often
underestimate the proportion of obtuse-angled sections of a pie chart and overestimate the
proportion of acute-angled sections. This angular distortion can be addressed using a doughnut
chart, which is a pie chart with the center removed, sometimes so that several pie charts can be
stacked to more easily compare proportions.
Doughnut Chart
[Figure: stacked doughnut charts comparing segment proportions]
Pie charts are often used in marketing when the goal is for the audience to interpret the data
in favor of the marketing message. Pie charts are less suited to presenting management or
executives with actionable summaries of business information.
Boxplot
[Figure: boxplot displayed against a value scale]
Scatter Plot
[Figure: scatter plot of dollars spent]
Bubble charts are a type of scatter plot that uses the size of the dots to help the user visualize
magnitude or other relational quality.
This bubble chart, known as the Hans Rosling plot, uses data from the OurWorldInData
organization. The relative size of each dot represents the relative size of the population of the
countries represented. Further, the colors of the dots indicate the continents on which the
countries are located.
Directional Charts
Pyramid
[Figure: food pyramid with fats, oils, and sweets at the top]
[Figure: percentage profit change over time]
Communicating conclusions or recommendations requires more precision than just reporting all
results. This can be done by modifying traditional charts using figure overlaps, relative scaling,
geographical overlays, gradient colors, colors that correlate for the same data point, sorting
data, or by making a visual interactive.
[Figures: a percentage-breakdown chart and a chart of values by store (Stores A, L, Q, R, and Z)]
Messages from data can be manipulated by reframing parameters. The charts below show the exact same data; however, the Y axis in Figure 1 has a minimum value of 0 percent, while the Y axis in Figure 2 has a minimum value set at about 25 percent, emphasizing the incremental difference between Company X's and Company Y's growth in annual revenue. Figure 2 distorts the difference, which may mislead users.
[Figure 1 and Figure 2: bar charts of annual revenue growth (%) for Companies X and Y, identical except for the Y-axis minimum]
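A minimal matplotlib sketch of the same manipulation (the growth figures are hypothetical): the only difference between the two panels is the Y-axis minimum.

```python
import matplotlib.pyplot as plt

# Hypothetical annual revenue growth for two companies.
companies = ["Company X", "Company Y"]
growth = [0.30, 0.33]  # 30% vs. 33%

fig, (ax1, ax2) = plt.subplots(1, 2)
for ax, ymin in ((ax1, 0.0), (ax2, 0.25)):
    ax.bar(companies, growth)
    ax.set_ylim(ymin, 0.40)   # changing only the axis minimum changes the emphasis
    ax.set_ylabel("Annual revenue growth")
ax1.set_title("Figure 1: axis starts at 0%")
ax2.set_title("Figure 2: axis starts at 25%")
plt.tight_layout()
plt.show()
```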
If pictures are used to represent data, be careful to scale them appropriately. In the
following images, the vertical axis is faithful, and Oscar has eaten twice as much pizza
as Shelly. However, the image of the pizza has been scaled in two dimensions, making it
appear that Oscar has eaten four times as much.
[Figures: bar charts of slices of pizza eaten by Shelly and Oscar (scale 0 to 5), drawn with pizza images of different sizes]
[Figures: number of hurricanes by year; one chart's horizontal axis shows only 1992, 1994, 1995, 1997, 1999, and 2000, while the other shows every year from 1992 through 2000]
This issue becomes more complex when three-dimensional images are used to visualize data.
Because the typical data visualization has only two dimensions to work with (a flat page or a flat
screen), three-dimensional images often have to use a forced perspective to create the illusion
of depth. Forced perspective often means that the scale does not apply to all portions of the
image in the same way, which creates distortion.
4.4.3 Forecasting
When using visualization techniques in trend analysis, time series, or any other topic that mixes
actual data with projected, estimated, or predicted data, clearly indicate where the change is
from actual data to the projected, estimated, or predicted data.
Question 1 MCQ-12728
The OutdoorPeople Co. has identified several subgroups among the company's customer
base. These groups have particular combinations of age, wealth, geographic location, etc.
The company is about to release a new product and it wants to measure how much of
an effect the customer's wealth has on buying the product after viewing the advertising
message(s) for the product.
What kind of analysis will be most useful to answer OutdoorPeople's need for information?
a. Cluster analysis
b. Regression analysis
c. Fourier analysis
d. Classification analysis
Question 2 MCQ-12729
Jacks Capital Inc. is putting together financial results for its annual report. Gains were
reported for each month during the past year but those were completely offset by heavy
losses in the last two months. If Jacks wants to show the relative cumulative incremental
impact of each month's results, which of the following charts would best illustrate that?
a. Scatter plot
b. Flowchart
c. Pyramid
d. Waterfall chart
Question 3 MCQ-12730
The Happy Smiles Co. has a new advertisement and the company has collected data from
focus groups about how effectively the advertisement leads to purchases of the company's
products. The company has measured the demographic information of the focus group
and has prepared a regression analysis. The output of that regression analysis is that the
correlation coefficient is 0.22, the coefficient of determination is 0.10, the standard error is
8,435.20, and the regression equation is:
Sales = 10,743 + 1.7 (Household income) – 8.2 (Average child age)
Which of these business decisions is most appropriate given the data?
a. Focus marketing messages on wealthy customers with young children.
b. Focus marketing messages on poorer customers with older children.
c. Do not produce or market this product.
d. The data supports none of these recommendations.
Question 4 MCQ-12731
A local bank is looking for any patterns in its data for which customers pay back their loans
and which ones do not. The data the company has decided to use is the final disposition
of the loan (paid or defaulted), the customer's income, the amount of the loan, and the
proportion between those two values.
Which of the following data visualization techniques would be most suited to facilitate the
recognition of any patterns present?
a. Bubble chart
b. Pie chart
c. Line graph
d. Flowchart
UNIT 6
Unit 6, Module 1
1. MCQ-12692
Choice "d" is correct. The AIS differs from the decision support system (DSS) and the executive
information system (EIS) due to the high degree of precision and detail required for accounting
purposes. Accounting systems must have accuracy for reconciliations, transaction processing,
and other processes requiring detailed information that a DSS or EIS do not necessarily need.
The AIS provides data as an input to the DSS or the EIS. The AIS by itself does not have controls to ensure management makes decisions that are in the shareholders' best interest.
Choice "a" is incorrect. Transaction processing is a subsystem of AIS that can initiate, stop,
manipulate, or report on transactions between a company and its suppliers and/or customers.
Choice "b" is incorrect. This choice describes a feature of the management reporting system,
which is a subsystem of AIS that enables internal managers to make financial decisions.
Choice "c" is incorrect. This choice describes features of the financial reporting system, which is a
subsystem of AIS. It is used for reporting to regulatory entities like the Internal Revenue Service
or the Securities and Exchange Commission, as well as to the public in order to meet filing
requirements.
2. MCQ-12693
Choice "b" is correct. An ERP is a cross-functional enterprise system that integrates and
automates business processes and systems to work together, including manufacturing,
logistics, distribution, accounting, project management, finance, and human resource
functions of a business.
ERP uses a single database architecture that allows data to be stored in a centralized repository
for information sharing.
Choice "a" is incorrect. ERP integrates both financial and nonfinancial systems and enables
systems to work together. A benefit of ERP is that it removes barriers in organizations that are
used to working in silos and not communicating with each other.
Choice "c" is incorrect. ERP systems can provide vital cross-functional and standardized
information quickly to benefit managers across the organization in order to assist them in the
decision-making process.
Choice "d" is incorrect. ERP systems act as the framework for integrating systems and therefore improve the organization's ability to track its business functions.
3. MCQ-12695
Choice "d" is correct. Enterprise performance management systems are software packages
designed to help a chief financial officer (CFO) conduct planning, create budgets, forecast
business performance, and consolidate financial results to align with the organization's vision
and strategy.
This choice correctly describes an EPM's key characteristic of combining operational and
financial data to drive the organization forward.
Choice "a" is incorrect. This choice describes a key characteristic of an enterprise resource
planning system.
Choice "b" is incorrect. An EPM system encourages a long-term focus by aligning strategic
objectives with actionable plans, tracking the progress with key performance indicators.
Choice "c" is incorrect. An AIS is a system used to process transactions and generate financial
statements.
Unit 6, Module 2
1. MCQ-12712
Choice "b" is correct. Vulnerability scans are proactive security measures that scan for known
weaknesses in hardware or software applications. They are active in nature, as opposed to
preventive, focusing on core pieces of a company's infrastructure including application-based
scans, network-based scans, port-based scans, device-based scans, and data storage and
repository scans.
This question states that the scan is for "known" weaknesses, which specifically refers to
vulnerability scans. It also lists two of the types of vulnerability scans, application-based and
network-based scans.
Choice "a" is incorrect. Penetration tests are not scans for known weaknesses, but rather they
are attempts by hired professionals to "hack" into a company's IT applications, systems, and
other network components. This intentional breach can be achieved by any means possible, as
opposed to a prescribed method of entry.
Choice "c" is incorrect. Biometric scans can be part of a vulnerability scan, but they generally
refer to access controls that utilize biometric characteristics such as an eye scan or fingerprint
scan in order to gain access.
Choice "d" is incorrect. Access controls are tools that prevent unauthorized access, not scan for
known weaknesses. These controls can strengthen weaknesses but do not identify them.
2. MCQ-12713
Choice "c" is correct. Data publication is the phase in which information is disseminated to other
individuals, both internally and externally. It is the fifth step in the cycle after usage and prior to
archiving and purging.
While this question does mention data capture through the administration of a survey, the
question specifically asks about information that has already been published to others.
Managing miscommunications of inaccurate data to employees and customers falls within the
publication phase.
Choice "a" is incorrect. Data capture does take place in this example, but the problem is asking
about information that has already gone through that phase and has been disseminated.
Choice "b" is incorrect. Data synthesis is the phase in which data has value added and is
transformed, not information that is already in its transformed state.
Choice "d" is incorrect. Data archival refers to data that has already been captured, synthesized,
and publicized.
3. MCQ-12714
Choice "c" is correct. Data preprocessing is converting information into a form that adds value
through consolidation, reduction, and transformation. When consolidating files, similar data
points are aggregated into a single file, which can require a cleansing step (usually a maintenance activity) that removes things such as inaccurate data, incomplete fields, or duplicate records. That
data is then transformed into its new enhanced state.
Because data preprocessing transforms data, synthesizing it, it fits in the synthesis and analytics
phase of the data life cycle.
Choice "a" is incorrect. Data capture involves the initial obtainment of information, not adding
value once it has been captured.
Choice "b" is incorrect. Data maintenance focuses on the extract, transfer, cleansing, and load
phase of the life cycle, not the value-added phase.
Choice "d" is incorrect. Data purging is the final phase that deals with the removal of data, not
transforming it in the synthesis phase.
4. MCQ-12715
Choice "d" is correct. The COBIT® 2019 framework has several components, including inputs,
COBIT Core, Design Factors, Focus Areas, and publications.
While stakeholders are identified and considered in the framework, stakeholder validations are
not a component of the COBIT® 2019 framework. Key stakeholders include management and the
board of directors. Stakeholders can also be separated into internal and external.
Choice "a" is incorrect. Publications are a key component of the COBIT® 2019 framework that are
documents with information on implementing a governance system.
Choice "b" is incorrect. Design factors are a component in the COBIT® 2019 framework that
influence the design of a governance system.
Choice "c" is incorrect. Community input is a component of the COBIT® 2019 framework that
connects users with the framework itself.
Unit 6, Module 3
1. MCQ-12696
Choice "d" is correct. Statements II and III are true. Waterfall and agile methods are two
strategies for implementing the systems development life cycle. Under the waterfall method,
phases do not overlap, and authoritative agreements pass a project from one phase to another.
Companies using the waterfall method create specialized teams for each phase of development.
Under the agile method, individual features of a project start and finish in each sprint; all phases
of the SDLC for that feature are executed in a single sprint. This requires cross-functional teams
that can perform all phases for that feature. Due to the complexity of building these teams, it is
a best practice to keep a team together and dedicated to one project.
Choices "a" and "c" are incorrect. The SDLC describes the steps (phases) by which a project is
initiated, developed, and used, and how the process is to begin again, making statement I false.
The SDLC is a circular process. This key difference in team structure between waterfall and
agile is reflected in every level of management and organizational structure in the company;
therefore, a company is generally organized around only one of these methods, making
statement IV false.
Choice "b" is incorrect. The SDLC describes the steps (phases) by which a project is initiated,
developed, and used, and how the process is to begin again, making statement I false. The SDLC
is a circular process.
2. MCQ-12697
Choice "d" is correct. Blockchain uses a distributed ledger so that many people have copies
of the history of use of each Bitcoin. When this person attempted to falsify a Bitcoin record,
the blockchain record on her computer disagreed with the many other copies. Because those
duplicate copies agreed with each other and they all disagreed with the one copy from this
person, the blockchain software concludes that the one different copy is fraudulent, so it denied
the transaction and changed the different copy to match the others.
Hash codes are the places in each "block" of the blockchain where transaction records are
kept, such as the date of the transaction and the public key of each party, after a great deal of
complicated math is performed on those records to encode them. In order to falsify a hash code,
one would need to know all the algorithms for encoding, as well as the dates and keys, and to
make a large fraction of the distributed ledger match the false values. This is currently believed
to be impossible.
Choices "a" and "c" are incorrect. Smart contracts are agreements that can result in blockchain/
Bitcoin transactions when compliance with the terms of the agreement can be observed online.
They do not prevent fraudulent transactions from occurring.
Two-factor authentication is the use of a second code or device after entering log-in credentials
to verify authorized access. This fraud is not based on gaining access but in falsifying records.
Choice "b" is incorrect. Smart contracts are agreements that can result in blockchain/Bitcoin
transactions when compliance with the terms of the agreement can be observed online. They do
not prevent fraudulent transactions from occurring.
3. MCQ-12698
Choice "a" is correct. All of these errors contributed to the failure of the AI program. Training this
type of program is a very complex and lengthy process. The unit of measurement for how much
computing power is spent training is the amount of processing the human brain is theoretically
capable of in a year, and AI developers count those by the hundreds.
Training data must resemble the data that the AI program is expected to encounter once it is
deployed. As many different examples as possible, of the same type of transaction that it will be
asked to evaluate, should be within the training data set. Furthermore, the training data set as
a whole should contain approximately the same ratio of fraud to non-fraud as is expected after
deployment. If the AI program is trained to expect fraud half of the time, it will build a decision
algorithm that will expect, and therefore declare, fraud about half of the time. Because an AI
program can be trained into dysfunction, a testing phase should always be undertaken with
different data to evaluate the program's readiness for deployment.
Choice "b" is incorrect. These probably contributed to the AI program's ineffectiveness, but the
other options likely did as well.
Choice "c" is incorrect. These possibly led to the AI's ineffectiveness, but the other options
probably did as well.
Choice "d" is incorrect. The program's lack of performance could most likely be related to these
choices, but so could the others.
Unit 6, Module 4
1. MCQ-12722
Choice "c" is correct. Clustering analysis is a technique that is used to unite data points with
other data points that have similar characteristics, creating a "cluster" or profile of a subset of
the data set. Classification analysis assigns labels to data points, placing them into different groups or categories.
If there are groups within a data set that share common characteristics, cluster analysis is the
tool used to identify and define these groups. During later analysis, if the best fit of a data point
for a cluster is needed, then classification analysis would be applied.
Choice "a" is incorrect. Grouping individuals together is correctly called cluster analysis, but this
choice does not include item II, which is also correct.
Choice "b" is incorrect. Assigning particular individuals to their best-fit cluster is called
classification analysis. However, the same task is not simultaneously called time-series analysis; time-series analysis is a regression in which one independent variable is time.
Choice "d" is incorrect. Regression analysis is not the grouping together of members of a data set that share common characteristics. Regression analysis is the validation and quantification of relationships between a dependent variable and one or more independent variables.
2. MCQ-12723
Choice "d" is correct. This data analysis presumes that there is a relationship (or at least a
correlation) between the measured characteristics of a customer (such as age and income) and
whether or not they choose to purchase the company's product.
The analysis is performed so that the company may choose to take action prescribed
by the model to selectively advertise to those customers who are more receptive to the
company's products.
Choice "a" is incorrect. Decision tree analysis is not directed toward describing what has
occurred. The information present about who has purchased the company's products and
the associated data is not presented for its own value, but rather for what new information
can be discovered within it and, most important, what the company should do with that
new information.
Choice "b" is incorrect. Decision tree analysis is not used to diagnose the reasons why a
customer did or did not purchase the company's products; rather, it is atheoretical and cannot
validate or quantify any theoretical relationship between the characteristics present. Decision
tree analysis presumes that relationships exist and uses the correlational effects to prescribe
choices. If characteristics of a customer's choices have been incorrectly identified or labeled,
then the decision tree's recommendations will be flawed.
Choice "c" is incorrect. This is the second-best answer present, but it is still incorrect. The
information about who has purchased the company's products and the associated data is
not simply to be able to predict future earnings, but rather what company policies should
change based on the new information discovered regarding which customers purchase the
company's products.
3. MCQ-12594
Choice "a" is correct. The "Four Vs" of Big Data are velocity (the speed at which data is gathered
and processed); volume (the amount of storage required to retain the data gathered); variety
(the spread of data types across sensor values, numbers, text, pictures, etc.); and veracity
(the accuracy of the data and the presumption that data within the same data set may be of
different quality).
Baiem's proposed strategy relies on speed. Baiem must receive data, analyze it, produce a
recommendation, and act on that recommendation before every other stock trader. This
challenge is addressed by velocity.
Choice "b" is incorrect. Volume is the size of the data. Baiem is less concerned with the quantity
of data coming out of the stock market than the speed at which it can react to it.
Choice "c" is incorrect. Variety is different types of data being collected and analyzed. Variety
is not Baiem's main concern; the data being analyzed includes a stock name, a price, and a
time stamp.
Choice "d" is incorrect. Vexation is the state of being annoyed, frustrated, or worried. While
Baiem may feel vexed about its stock trades, vexation is not one of the "Four Vs" aspects of
Big Data.
Unit 6, Module 5
1. MCQ-12728
Choice "b" is correct. Regression analysis uses statistics software to discover and quantify
the relationship between a dependent variable and one or more independent variables. The
resulting coefficients can be used to predict values of the dependent variable from any values
the independent variable may have in the future.
OutdoorPeople wants to know if it can predict how likely a customer is to buy its product based
on the customer's wealth. After performing a successful regression analysis, OutdoorPeople will
have a regression equation that will contain this information.
Choice "a" is incorrect. Cluster analysis is used to identify subgroups within a larger group based
on shared characteristics. OutdoorPeople has already identified subgroups but is asking a
question across all its subgroups. Cluster analysis will not answer that question.
Choice "c" is incorrect. Fourier analysis is used to represent a repeating waveform as a series of trigonometric functions so that repeating oscillating phenomena (such as sound, light, heat, etc.) can be mathematically reproduced and compared. Fourier analysis is unlikely to be of any help
to OutdoorPeople.
Choice "d" is incorrect. Classification analysis is used to place newly encountered data into
subgroups already established by cluster analysis. OutdoorPeople already has clusters, but they
are not being used in this study, and no new customers are being classified into existing clusters.
2. MCQ-12729
Choice "d" is correct. The cumulative impact of data points over time can be shown by a
waterfall chart. Each point contributes to the total of all data points, with each incremental
contribution shown at a given point in time.
A waterfall chart is the best answer because it will show both the cumulative and incremental
impact of each month's financial results for Jacks Capital. This will allow investors to see that all
months, except for two, were consistent.
Choice "a" is incorrect. Scatter plots are better suited to data sets with a high volume of points and can have overlapping time periods. They also do not show the cumulative effect of all data points.
Choice "b" is incorrect. Flowcharts are for processes. They show a path from beginning to end
with different options along the way. They do not show cumulative value.
Choice "c" is incorrect. Pyramids are for communicating foundational relationships. The data in
this example does not have this sort of relationship and does not report cumulative value.
3. MCQ-12730
Choice "d" is correct. Regression analysis is complex and does not always produce a positive
result. Models that are not statistically significant often have one or more of the following
warning signs: a small (near zero) correlation coefficient, a small (near zero) coefficient of
determination, or a large (in proportion to the dependent variable) standard error.
In this example, all three of these warning signs are present. This regression equation is unlikely
to be a true representation of the relationship between these demographic variables and sales,
if such a relationship even exists. There is no support for the regression.
Choice "a" is incorrect. While the regression equation would indeed suggest that this is the way
to maximize sales given the signs on the slopes of the β terms for the two independent variables
(positive for wealth, negative for age), the regression model itself is of poor quality.
Choice "b" is incorrect. The regression equation would suggest the opposite strategy given
the signs on the slopes of the β terms for the two independent variables (positive for wealth,
negative for age).
Choice "c" is incorrect. The fact that this regression model is of such poor quality is not a
reflection on the product being studied. A poor regression model does not mean the product is
bad, merely that the regression model cannot provide reliable recommendations for how best to
market it under the studied conditions.
4. MCQ-12731
Choice "a" is correct. A bubble chart is a type of scatter plot: a mapping of data points onto a grid according to two or more qualities of the data, with one quality assigned to each axis of the grid (usually two axes). The spatial distribution of the data points enables pattern recognition such as correlation
and the direction of any covariant relationship. Bubble charts are particularly useful because
they can display more than two types of data without resorting to a third or higher dimensional
graph through the use of symbols, color, and the size of the data points.
For this example, if the bank mapped its customers' income to one axis, then the bank could use
either of the other measures for the other axis, leaving the third quality to determine the size of
the bubble. Either way, the bank would have an image showing which loans left customers more
financially stretched relative to other customers. Coloring the dots differently to show defaults
versus paid loans would help the bank discover an association between loaning a customer a
higher proportion of the customer's income and the likelihood of default.
Choice "b" is incorrect. A pie chart is used to show what proportion of the whole comprises each
subgroup. A pie chart could be made to show the relative proportions of paid loans to defaulted
loans, and a separate pie chart could show the proportions among designated segments of
income, but this visualization technique would have no way to combine the two in a single image
to discover patterns.
Choice "c" is incorrect. A line chart is used to show a progression between observations and
the trend demonstrated. The bank could use a line chart to show the changing proportions of
default as income increased, but this visualization technique would have no way to represent
individual loans or the other two data types called for by management.
Choice "d" is incorrect. A flowchart is a diagram used to represent each step of a complex
process, such as the operation or building of a computer program. Each customer could use
a flowchart to decide how to allocate monthly income, including paying the loan, but the bank
neither would have access to all of this information nor any way to aggregate it using this
visualization technique.