
SECTION-B

Business Data Analytics


Module 8: Introduction to Data Science for Business Decision-Making

This Module Includes
8.1 Meaning, Nature, Properties, Scope of Data
8.2 Types of Data in Finance and Costing
8.3 Digitization of Data and Information
8.4 Transformation of Data to Decision Relevant Information
8.5 Communication of Information for Quality Decision-making
8.6 Professional Skepticism regarding Data
8.7 Ethical Use of Data and Information


Introduction to Data Science for Business Decision-Making
SLOB Mapped against the Module:
To develop a detailed understanding of the fundamental concepts of data science and its expected role in
business decisions.

Module Learning Objectives:


After studying this module, the students will be able to –
~~ Understand the basic meaning, nature, properties and scope of data
~~ Understand the use of data for business-relevant decision-making processes
~~ Understand the ethical use of data


8.1 Meaning, Nature, Properties, Scope of Data

There is a saying: ‘data is the new oil’. Over the last few years, with the advent of increasing computing power and availability of data, the importance and application of data science have grown exponentially. The field of finance and accounts has not remained untouched by this wave. In fact, to become an effective finance and accounts professional, it is very important to understand, analyse and evaluate data sets.

8.1.1 What is data and how is it linked to information and knowledge?


Data is a source of information, and information needs to be processed for gathering knowledge. Any ‘data’ on its own does not convey any meaning. The relationship between data, information, and knowledge is depicted in figure 8.1 below:

DATA → INFORMATION → KNOWLEDGE

Figure 8.1: Relationship between data, information & knowledge


The idea of data in the syllabus frequently refers to ‘raw’ data, which is a collection of meaningless text, numbers, and symbols. Examples of ‘raw data’ could be as below:

●● 2,4,6,8...........
●● Amul, Nestle, ITC..........
●● 36,37,38,35,36............

Figure 8.2: Raw data (Data, information and knowledge)


Figure 8.2 above shows a few data series. It is almost impossible to decipher what these data series are about, because we do not know their exact context. The first series may be the multiplication table of 2; alternatively, it may be the marks obtained by students in a class test with full marks of 20. The second series names a few Indian brands, but we don’t know why those names appear here at all. To cut a long story short, we must know the context to which the raw data relates. Any ‘data’ on its own cannot convey any information.

8.1.2 What is information?


As we discussed, data needs to be processed for gathering information. Most commonly, we take the help of computers and software packages for processing data. An exponential growth in the availability of computing power and software packages has led to the growth of data science in recent years.


If we say that the first series in figure 8.2 is really the first four numbers of the multiplication table of 2, and the third series is the highest temperature in Kolkata during the previous five days, we are actually discovering some information out of the raw data.
So, we may say now

Information = Data + Context

●● 2, 4, 6, 8……… (Multiplication table of 2)
●● Amul, Nestle, ITC…… (Three FMCG companies listed on NSE)
●● 36, 37, 38, 35, 36……… (Highest temperature in Kolkata for the last five days)

Figure 8.3: Information = Data + Context

8.1.3 What is knowledge?


When this information is used for solving a problem, we say it is the use of knowledge. Having information about the highest temperatures in Kolkata for a month, we may try to estimate the sale of air conditioners. If our intention is to analyse the profitability of listed FMCG companies in India, the first information we need is the names of the FMCG companies. So, we may say:

Knowledge = Information + Application of it

●● 2, 4, 6, 8……… (Multiplication table of 2)…. (The table of 3 should start from 3)
●● Amul, Nestle, ITC…… (Three FMCG companies listed on NSE)…. (These three companies’ financial performance should be analysed to understand the Indian FMCG sector)
●● 36, 37, 38, 35, 36……… (Highest temperature in Kolkata for the last five days)……. (The sale of ACs may be estimated using this information)

Figure 8.4: Knowledge = Information + Application of it
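The chain from data to information to knowledge can also be illustrated with a tiny Python sketch. The temperature values and the decision rule below are illustrative assumptions only:

raw_data = [36, 37, 38, 35, 36]                 # data: meaningless on its own

context = "Highest daily temperature in Kolkata (deg C), last five days"
information = {"context": context, "values": raw_data}   # information = data + context

# knowledge = information + application: use it to support a decision
average_temp = sum(information["values"]) / len(information["values"])
if average_temp > 35:
    decision = "Expect higher demand for air conditioners; increase stock."
else:
    decision = "Normal demand expected."

print(average_temp)   # 36.4
print(decision)

The same list of numbers becomes useful only once a context is attached and the result is applied to a concrete decision.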

8.1.4 Nature of Data


Over time, the magnitude and availability of data have grown exponentially. The data sets may be classified into different groups as below:
(i) Numerical data: Any data expressed as a number is numerical data. In finance, a prominent example is

stock price data. Figure 8.5 below is showing the daily stock prices of HUL stock. This is an example of
numerical data.

Figure 8.5: Stock price of HUL (Source: finance.yahoo.com)

(ii) Descriptive data: Sometimes information may be conveyed in a qualitative form. Look at the paragraph in figure 8.6, extracted from the annual report of HUL (2021-22). This is descriptive data provided by HUL in its annual report. The user may use this data to make a judicious investment decision.

Leading social and environment change


●● At Hindustan Unilever, we have always strived to grow our business while protecting the planet and
doing good for the people. We believe that to generate superior long-term value, we need to care
for all our stakeholders – our consumers, customers, employees, shareholders and above all, the
planet and society. We call it the multistakeholder model of sustainable growth. With more people
entering the consumption cycle and adding to the pressure on natural resources, it will become even
more important to decouple growth from environmental impact and drive positive social change.

Figure 8.6: Descriptive data extracted from HUL annual report (2021-22)

(iii) Graphic data: A picture or graphic may tell a thousand stories. Data may also be presented in the form of a
picture or graphics. For example, the stock price of HUL may be presented in the form of a picture or chart
(Figure 8.7)

Figure 8.7: Graphic representation of HUL stock prices (Source: google.com)


8.2 Types of Data in Finance and Costing

Data plays a very important role in the study of finance and cost accounting. From the inception of the study of finance, accounting and cost accounting, data has always played an important role. Be it in the form of financial statements or cost statements, etc., finance and accounting professionals have played a significant role in helping the management to make prudent decisions.
The kinds of data used in finance and costing may be quantitative as well as qualitative in nature.
~~ Quantitative financial data: By the term ‘quantitative data’, we mean the data expressed in numbers.
The availability of quantitative data in finance is significant. Stock price data, financial statements, etc. are examples of quantitative data, as most financial records are maintained in the form of organised numerical data.
~~ Qualitative financial data: However, some data in financial studies may appear in a qualitative format
e.g. text, videos, audio etc. These types of data may be very useful for financial analysis. For example, the
‘management discussion and analysis’ presented as part of annual report of a company is mostly presented
in the form of text. This information is useful for getting an insight into the performance of the business.
Similarly, key executives often appear for an interview in business channels. These interactions are often
goldmines for data and information.

Types of data
There is another way of classifying the types of data. The data may be classified also as:
(i) Nominal
(ii) Ordinal
(iii) Interval
(iv) Ratio
Each gives a distinct set of traits that influences the sort of analysis that may be conducted. The differentiation
between the four scale types is based on three basic characteristics:
(a) Whether the sequence of answers matters or not
(b) Whether the gap between observations is significant or interpretable, and
(c) The presence or absence of a genuine zero.
We will briefly discuss these four types below:
(i) Nominal Scale: A nominal scale is used for categorising data. Under this scale, observations are classified
based on certain characteristics. The category labels may contain numbers but have no numerical value.
Examples could be, classifying equities into small-cap, mid-cap, and large-cap categories or classifying
funds as equity funds, debt funds, and balanced funds etc.


(ii) Ordinal Scale: An ordinal scale is used for classifying and ordering data. The numbers just indicate an order. They do not specify how much better or worse a stock at a specific rank is compared to one at a lower rank. For example, the top 10 stocks ranked by P/E ratio.
(iii) Interval scale: An interval scale is used for categorising and ranking with equal intervals between values. Equal intervals separate neighbouring scale values. As a result of the scale’s arbitrary zero point, ratios cannot be
calculated. For example, temperature scales. The temperature of 40 degrees is 5 degrees higher than that of
35 degrees. The issue is that a temperature of 0 degrees Celsius does not indicate the absence of temperature.
A temperature of 20 degrees is thus not always twice as hot as a temperature of 10 degrees.
(iv) Ratio scale: The ratio scale possesses all characteristics of the nominal, ordinal, and interval scales. The
acquired data can not only be classified and rated on a ratio scale, but also have equal intervals. A ratio scale
has a true zero, meaning that zero has a significant value. The genuine zero value on a ratio scale allows for
the magnitude to be described. For example, length, time, mass, money, age, etc. are typical examples of
ratio scales. For data analysis, a ratio scale may be utilised to measure sales, pricing, market share, and client
count.
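A minimal Python sketch may help fix the four scales in mind. The data frame below is hypothetical, and the point is only which operations each scale meaningfully supports:

import pandas as pd

# Hypothetical data illustrating the four measurement scales.
stocks = pd.DataFrame({
    "fund_type":  ["equity", "debt", "balanced", "equity"],    # nominal: labels only
    "pe_rank":    [1, 2, 3, 4],                                 # ordinal: order matters, gaps do not
    "temp_deg_c": [36, 37, 38, 35],                             # interval: equal gaps, arbitrary zero
    "price_inr":  [2500.0, 110.5, 860.0, 42.0],                 # ratio: true zero, ratios meaningful
})

# Nominal data supports counting categories...
print(stocks["fund_type"].value_counts())

# ...ordinal data supports ordering,
print(stocks.sort_values("pe_rank"))

# ...and only ratio data supports statements like "twice as large".
print(stocks["price_inr"].max() / stocks["price_inr"].min())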


8.3 Digitization of Data and Information

In plain terms, digitization implies the process of converting data and information from analogue to digital format. The data in its original form may be stored as an object, a document or an image. The objective of digitization is to create a digital surrogate of the data and information in the form of binary numbers that facilitates processing using computers. There are primarily two basic objectives of digitization. The first is to provide widespread access to data and information for a very large group of users simultaneously. Secondly, digitization helps in the preservation of data for a longer period. One of the largest digitization projects taken up in India is the ‘Unique Identification Number’ (UID) or ‘Aadhar’ (figure 8.8).

Figure 8.8: UID – Aadhar – globally one of the largest projects of digitization (Source: https://uidai.gov.in/about-uidai/unique-identification-authority-of-india/vision-mission.html)
Digitization brings in some great advantages, which are mentioned below.

Why we digitize?
There are many arguments that favour digitization of records. Some of them are mentioned below:
●● Improves classification and indexing of documents, which helps in retrieval of the records.


●● Digitized records may be accessed by more than one person simultaneously.


●● It becomes easier to reuse data which are difficult to reuse in their present format, e.g. very large maps, data recorded on microfilm, etc.
●● Helps in work processing
●● Higher integration with business information systems
●● Easier to keep back-up files and retrieval during any unexpected disaster
●● Can be accessed from multiple locations through networked systems
●● Increased scope for rise in organizational productivity
●● Requires less physical storage space

How do we digitize?
Large institutions take up digitization projects with meticulous planning and execution. The entire process of
digitization may be segregated into six phases:

Phase 1: Justification of the proposed digitization project


At the very initiation of the digitization project, the expected benefits of the project need to be identified. The cost aspect of the project also needs to be computed, along with an assessment of the availability of resources. Risk assessment is an important part of project assessment. Resources that may be facing quick destruction may require early digitization.
Most importantly, the expected value generation through digitization should be expressed in clear terms.

Phase 2: Assessment
In any institution, all records are never digitized. The data that requires digitization is to be decided on the basis of content and context. Some data may be digitized in a consolidated format, and some in a detailed format. The files, tables, documents, expected future use, etc. are to be assessed and evaluated.
The hardware and software requirements for digitization are also assessed at this stage. The human resource requirement for executing the digitization project is also planned. The risk assessment at this stage, e.g. possibilities of natural disasters and/or cyber attacks, also needs to be completed.

Phase 3: Planning
Successful execution of a digitization project needs meticulous planning. There are several stages of planning, e.g. selection of digitization approach, project documentation, resource management, technical specifications, and risk management.
The institution may decide to complete the digitization in-house or, alternatively, through an outsourced agency. It may also be done on demand or in batches.

Phase 4: Digitization activities


Upon completion of the assessment and planning phases, the digitization activities start. The Wisconsin Historical Society developed a six-phase process, viz. planning, capture, primary quality control, editing, secondary quality control, and storage and management.
The planning schedule is prepared at the first stage; calibration of hardware/software, scanning, etc. are done next. A primary quality check is done on the output to check reliability. Cropping, colour correction, and assigning of
metadata, etc. are done at the editing stage. A final check of quality is done on randomly selected samples. And finally, user copies are created and uploaded to dedicated storage space, after doing file validation. The digitization process may be viewed in figure 8.9 below.

[Flowchart: Documents → Scanning (PDF, JPEG, GIF, TIFF, etc.) → Editing (cropping, erasing) → Analyzing (text, image, table) → OCR → Save (PDF, DOC, RTF, HTML, EXCEL, PPT, etc.) → IR/DL]

Figure 8.9: The complete digitization process. Source: Bandi, S., Angadi, M. and Shivarama, J. Best
practices in digitization: Planning and workflow processes. In Proceedings of the Emerging
Technologies and Future of Libraries: Issues and Challenges (Gulbarga University,
Karnataka, India, 30-31 January), 2015
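For illustration, the capture, OCR, quality-control and storage steps of figure 8.9 may be sketched in Python. The sketch assumes the Pillow and pytesseract libraries and a scanned image named "scanned_invoice.png"; all of these are illustrative assumptions rather than prescribed tools:

from PIL import Image
import pytesseract

scanned_page = Image.open("scanned_invoice.png")      # capture: load the scanned image
text = pytesseract.image_to_string(scanned_page)      # OCR: convert the image to machine-readable text

# primary quality control: flag pages where OCR produced almost no text
if len(text.strip()) < 20:
    print("Low OCR yield - route this page for manual review")

# storage: save the digitized text as a user copy
with open("scanned_invoice.txt", "w", encoding="utf-8") as f:
    f.write(text)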

Phase 5: Processes in the care of records


Once the digitization of records is complete, a few additional requirements arise which may be linked to the administration of records. Permission for accessing data, intellectual control (over data), classification (if necessary), and upkeep and maintenance of data are a few additional requirements for data management.

Phase 6: Evaluation
Once the digitization project is updated and implemented, the final phase should be a systematic determination of the project’s merit, worth and significance using objective criteria. The primary purpose is to enable reflection and assist in identifying changes that would improve future digitization processes.


8.4 Transformation of Data to Decision Relevant Information

The emergence of big data has changed the world of business like never before. The most important shift has happened in information generation and the decision-making process. There is a strong emergence of analytics that supports a more intensive data-centric and data-driven information generation and decision-making process. The data that encompasses the organization is being harnessed into information that informs and supports prudent decision-making in a judicious and repeatable manner.
The pertinent question here is: what does an enterprise need to do to transform data into relevant information?
As noted earlier, all types of data may not lead to relevant information for decision making. The biannual KPMG
global CFO report says, for today’s finance function leaders, “biggest challenges lie in creating the efficiencies
needed to gather and process basic financial data and continue to deliver traditional finance outputs while at the
same time redeploying their limited resources to enable higher-value business decision support activities.”
For understanding the finance functions within an enterprise, we may refer to figure 8.10 below:

[Figure 8.10: Understanding finance functions (Source: KPMG International). A pyramid with three levels: at the top, the CFO level covering decision support, financial performance analysis at the strategic level, and business performance and risk management; in the middle, the controllers' level covering financial statements and management information, financial reporting and control, and the financial planning, reporting and control cycle (past and future); at the base, the 'basics' (foundation) covering financial operations, transaction processing, standardized (ERP) systems, shared transactional services, bookkeeping, and master data and policy management (bookkeepers).]


At the ‘basics’ or foundation of the pyramid (figure 8.10), the data generation may be automated by using ERP and
other relevant software and hardware tools. The tools, techniques and processes that comprise the field of data &
analytics (D&A) play a significant role in improving the quality of standard daily data and transaction processing.


To turn data into user-friendly information, it should go through six core steps:
1. Collection of data: The collection of data may be done with standardized systems in place. Appropriate
software and hardware may be used for this purpose. Appointment of trained staff also plays an important
role in collecting accurate and relevant data.
2. Organising the data: The raw data needs to be organized in an appropriate manner to generate relevant information. The data may be grouped and arranged in a manner that creates useful information for the target user groups.
3. Data processing: At this step, data needs to be cleaned to remove unnecessary elements. If any data point is missing or not available, that also needs to be addressed. The presentation format for the data also needs to be decided.
4. Integration of data: Data integration is the process of combining data from various sources into a single, unified form. This step includes creation of data network sources, a master server and users accessing the data from the master server. Data integration eventually enables the analytics tools to produce effective, actionable business intelligence.
5. Data reporting: The data reporting stage involves translating the data into a consumable format to make it accessible to the users. For example, a business firm should be able to provide summarized financial information, e.g. revenue, net profit, etc. The objective is that a user who wants to understand the financial position of the company should get relevant and accurate information.
6. Data utilization: At this ultimate step, data is utilized to support corporate activities and enhance operational efficiency and productivity for the growth of the business. This makes corporate decision-making truly ‘data driven’.
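A minimal Python (pandas) sketch of these six steps is given below. The branch revenue and cost figures are invented purely for illustration:

import pandas as pd

# 1. Collection: in practice this would come from an ERP system or database.
branch_a = pd.DataFrame({"branch": "A", "month": ["Jan", "Feb"], "revenue": [120.0, 150.0]})
branch_b = pd.DataFrame({"branch": "B", "month": ["Jan", "Feb"], "revenue": [90.0, None]})

# 2-3. Organising and processing: combine records and handle the missing value.
sales = pd.concat([branch_a, branch_b], ignore_index=True)
sales["revenue"] = sales["revenue"].fillna(sales["revenue"].mean())

# 4. Integration: bring in cost data from another (hypothetical) source.
costs = pd.DataFrame({"branch": ["A", "B"], "cost": [80.0, 60.0]})
report = sales.merge(costs, on="branch")

# 5. Reporting: summarise into a consumable format.
summary = report.groupby("branch").agg(revenue=("revenue", "sum"), cost=("cost", "first"))
summary["net"] = summary["revenue"] - summary["cost"]
print(summary)

# 6. Utilization: the summary now supports a data-driven decision,
# e.g. which branch needs attention on profitability.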


8.5 Communication of Information for Quality Decision-making

Quality information should lead to quality decisions. With the help of well-curated and well-reported data, decision makers should be able to add higher-value business insights, leading to better strategic decision-making.
In a sense, a judicious use of data analytics is essential for implementation of ‘lean finance’, which implies
optimized finance processes with reduced cost and increased speed, flexibility and quality. By transforming the
information into a process for quality decision making, the firm should achieve the following abilities:
(i) Logical understanding of wide-ranging structured and unstructured data, and applying that information to corporate planning, budgeting, forecasting and decision support
(ii) Predicting outcomes more effectively compared to conventional forecasting techniques based on historical financial reports
(iii) Real-time spotting of emerging opportunities and also capability gaps
(iv) Making strategies for responding to uncertain events like market volatility and ‘black swan’ events through simulation
(v) Diagnosing, filtering and extracting value from financial and operational information for making better business decisions
(vi) Recognising viable advantages to serve customers in a better manner
(vii) Identifying possible frauds on the basis of data analytics
(viii) Building impressive and useful dashboards to measure and demonstrate success leading to effective
strategies.
The aim of a data-driven business organization is to develop a business intelligence (BI) system that is not only focused on efficient delivery of information but also provides accurate strategic insight into the operational and financial system. This impacts organizational capabilities in a positive manner, making the organization resilient to market pressures and creating competitive advantages by serving customers in a better way using data and predictive analytics.


8.6 Professional Scepticism Regarding Data

While data analytics is an important tool for decision making, managers should never take an important analysis at face value. The hidden insights that lie underneath the surface of the data set need to be explored, and what appears on the surface should be viewed with some scepticism.
The emergence of new data analytics tools and techniques in the financial environment allows accounting and finance professionals to gain unique insights into the data, but at the same time creates unique challenges in exercising scepticism. As far more data is available now, analysts and auditors are not only getting more information, but are also facing challenges in managing and investigating red flags.
One major concern about the use of data analytics is the likelihood of false positives, i.e. the analysis may identify potential anomalies that are later found to be reasonable and explainable variations in the data.
Studies show that the frequency of false positives increases proportionately with the size and complexity of data. A few studies also show that analysts face problems while determining outliers using data analytics tools.
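A small simulation makes the point concrete. Assuming (purely for illustration) a fixed false-positive rate of 0.1% per legitimate transaction, the absolute number of false alarms that must be investigated still grows with the size of the data set:

import numpy as np

rng = np.random.default_rng(0)

false_positive_rate = 0.001      # 0.1% of legitimate transactions get flagged (assumed)

for n_transactions in (10_000, 1_000_000, 100_000_000):
    # number of legitimate transactions wrongly flagged for investigation
    flags = rng.binomial(n_transactions, false_positive_rate)
    print(f"{n_transactions:>11,} transactions -> ~{flags:,} false alarms to review")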
Professional scepticism is an important focus area for practitioners, researchers, regulators and standard setters. At the same time, professional scepticism may result in additional costs, e.g. strained client relationships and budget overruns.
Under such circumstances, it is important to identify and understand the conditions in which finance and audit professionals should apply professional scepticism. A fine balance needs to be kept between costly scepticism and underutilizing data analytics in order to keep the cost under control.


8.7 Ethical Use of Data and Information

Data analytics can help in the decision-making process and make an impact. However, this empowerment for business also comes with challenges. The question is: how can business organizations ethically collect, store and use data, and what rights need to be upheld? Below we will discuss five guiding principles in this regard. Data ethics addresses the moral obligations of gathering, protecting and using personally identifiable information. These days, it is a major concern for analysts, managers and data professionals.
The five basic principles of data ethics that a business organization should follow are:
(i) Regarding ownership: The first principle is that ownership of any personal information belongs to the
person. It is unlawful and unethical to collect someone’s personal data without their consent. The consent
may be obtained through digital privacy policies or signed agreements or by asking the users to agree with
terms and conditions. It is always advisable to ask for permission beforehand to avoid future legal and
ethical complications. In case of financial data, some data may be sensitive in nature. Prior permission
must be obtained before using the financial data for further analysis.
(ii) Regarding transparency: Maintaining transparency is important while gathering data. The objective with
which the company is collecting the user’s data should be known to the user. For example, if the company is using cookies to track the online behaviour of the user, it should be mentioned to the user through a written policy that cookies would be used for tracking the user’s online behaviour and that the collected data will be stored in a secure database to train an algorithm to enhance user experience. After reading the policy, the user may decide to accept or not accept the policy. Similarly, while collecting financial data from clients, it should be clearly mentioned for which purpose the data will be used.
(iii) Regarding privacy: Even if the user allows the company to collect, store and analyze personally identifiable information (PII), that does not imply it should be made publicly available. For companies, it is mandatory to publish some financial information to the public, e.g. through annual reports. However, there may be much confidential information which, if it falls into the wrong hands, may create problems and financial loss. To protect the privacy of data, a data security process should be in place. This may include file encryption, dual-authentication passwords, etc. The possibility of a breach of data privacy may also be reduced by de-identifying a dataset.
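One simple way to de-identify a dataset, sketched below in Python, is to replace personally identifiable columns with one-way hashes before the data is shared for analysis. The column names and records are illustrative assumptions, and a production process would also add a secret salt and stronger controls:

import hashlib
import pandas as pd

clients = pd.DataFrame({
    "client_name": ["A. Sen", "R. Gupta"],          # invented records
    "pan":         ["ABCDE1234F", "XYZAB9876K"],
    "portfolio_value": [1_200_000, 850_000],
})

def pseudonymise(value: str) -> str:
    # SHA-256 of the value; in practice a secret salt should be added.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]

for col in ("client_name", "pan"):
    clients[col] = clients[col].map(pseudonymise)

print(clients)   # analysis can proceed without exposing identities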
(iv) Regarding intention: The intention of data analysis should never be to make profits out of others’ weaknesses or to hurt others. Collecting data which is unnecessary for the analysis is unethical and should be avoided.
(v) Regarding outcomes: In some cases, even if the intentions are good, the result of data analysis may
inadvertently hurt the clients and data providers. This is called disparate impact, which is unethical.


Solved Case 1
Mr. Arjun is working as a data analyst with Manoj Enterprises Limited. He was invited by an educational institute to deliver a lecture on data analysis. He was told that the participants would be fresh graduates who would like to get a glimpse of the emerging field of ‘data analysis’. He is planning the lecture and thinking of the concepts to be covered during it.
In your opinion, which are the fundamental concepts that Arjun should cover in his lecture?
Teaching note - outline for solution:
While addressing the fresh candidates, Arjun may focus on explaining the basic concepts of data analysis. He may initiate the discussion with a brief introduction to ‘data’. He may discuss, with examples, how mere data is not useful for decision making. Next, he may move to a discussion of the link among data, information and knowledge. The participants should get a clear idea about the formation of knowledge using ‘raw’ data as a resource.
Once the basic concepts about data, information and knowledge are clear in the minds of the participants, Arjun may describe the various types of data, e.g. numerical data, descriptive data and graphical data. He may explain the concepts with some real-life examples. Further, he may also discuss another way of looking at data, e.g. ordinal scale, ratio scale, etc.
How data analysis is particularly useful for finance and accounting functions may be discussed next. The difference between quantitative and qualitative data can then be discussed with the help of a few practical examples.
However, the key question is: how may raw data be transformed into useful information?
To explore the answer to this question, Arjun may discuss the six steps to be followed for transforming data into information.
The ultimate objective of taking so much pain is to generate quality decisions. This is a subjective area. Arjun may seek inputs from participants and discuss various ways of generating relevant and useful decisions by exploring raw data.
During this entire process of quality decision making, one should not forget the ethical aspects. Arjun should convey the importance of adopting ethical practices in data analysis.
At the end, Arjun may close the conversation with a note of thanks.


Exercise
A. Theoretical Questions:
~~ Multiple Choice Questions:

1. Numerical data may be expressed as


(a) In the form of text
(b) In the form of numbers
(c) In the form of images
(d) All of the above
2. The descriptive data may be deciphered as
(a) May be deciphered in the form of qualitative information
(b) May be deciphered in the form of quantitative information
(c) May be deciphered in the form of information from informal sources
(d) All of the above
3. Data represented in the form of picture is termed as
(a) Graphic data
(b) Qualitative data
(c) Quantitative data
(d) All of the above
4. Which of the following is/are reasons for digitization?
(a) Helps in work processing
(b) Requires less physical storage space
(c) Digitized records may be accessed by more than one person simultaneously
(d) All of the above
5. To turn data into user-friendly information, it should go through one or more of the following core steps
(a) Collection of data
(b) Organising the data
(c) Data processing
(d) All of the above
Answer:

1 b 2 a 3 a 4 d 5 d
~~ State True or False

1. Digitization improves classification and indexing of documents, which helps in retrieval of the records.
2. Data is not a source of information
3. One of largest digitization project taken up in India is ‘Unique Identification number’ (UID) or ‘Aadhar’


4. When information is used for solving a problem, we say it is the use of knowledge
5. Any data expressed as a number is a numerical data
Answer:

1 T 2 F 3 T 4 T 5 T

~~ Fill in the blanks

1. There are primarily _________ basic objectives of digitization.


2. By the term ________________, we mean the data expressed in numbers.
3. Daily stock price of Tata Steel Ltd is an example of __________ data.
4. Data is a ___________ of information.
5. When information is used for solving a problem, we say it is the use of __________.
Answer:
1 Two 2 Quantitative data
3 numerical 4 Source
5 knowledge

~~ Short essay type questions

1. Define the term ‘descriptive data’ with examples.


2. Discuss the difference between ordinal scale and ratio scale.
3. Discuss the relationship between data, information and knowledge
4. ‘One major concern about the use of data analytics is the likelihood of false positives’ – briefly discuss

~~ Essay type questions

1. Discuss the five basic principles of data ethics that a business organization should follow
2. ‘The quality information should lead to quality decisions’ – Discuss
3. Discuss the six core steps that may turn the data into user friendly information.
4. Discuss the six phases that comprise the entire process of digitization.
5. Why do we digitize data?

Unsolved Case
1. Ram Kumar is the head data scientist of Anjana Ltd. For the last few weeks, he has been working along with his team on extracting information from a huge pile of data collected over time. His team members are working day and night on collecting and cleaning the data. He has to make a presentation before the senior management of the company to explain the findings. Discuss the important steps he needs to take care of to transform raw data into useful knowledge.


References:
●● Data-driven business transformation. KPMG International
●● Davy Cielen, Arno D B Meysman, and Mohamed Ali. Introducing Data Science. Manning Publications Co
USA
●● www.finance.yahoo.com
●● www.google.com
●● Data, Information and Knowledge. Cambridge International
●● Data Analytics and Skeptical Actions: The Countervailing Effects of False Positives and Consistent Rewards for Skepticism. By Dereck Barr-Pulliam, Joseph Brazel, Jennifer McCallen and Kimberly Walker
●● Annual Report of Hindustan Unilever Limited. (2021-22)
●● www.uidai.gov.in
●● Bandi, S., Angadi, M. and Shivarama, J. Best practices in digitization: Planning and workflow processes.
In Proceedings of the Emerging Technologies and Future of Libraries: Issues and Challenges
●● Finance’s Key Role in Building the Data-Driven Enterprise. Harvard Business Review Analytic Services
●● How to Embrace Data Analytics to Be Successful. Institute of Management Accountants. USA
●● The Data Analytics Implementation Journey in Business and Finance. Institute of Management Accountants.
USA
●● Principles of data ethics in business. Business Insights. Harvard Business School.

Module 9: Data Processing, Organisation, Cleaning and Validation

This Module Includes
9.1 Development of Data Processing
9.2 Functions of Data Processing
9.3 Data Organisation and Distribution
9.4 Data Cleaning and Validation


Data Processing, Organisation, Cleaning and Validation
SLOB Mapped against the Module:
To equip oneself with application-oriented knowledge in data preparation, data presentation and finally data
analysis and modelling to facilitate quality business decisions.

Module Learning Objectives:


After studying this module, the students will be able to –
~~ Understand the basic concepts of developments of data processing
~~ Understand the basic concepts of functions of data processing
~~ Understand the basic concepts of data organisation and distribution
~~ Understand the basic concepts of data cleaning and validation


9.1 Development of Data Processing

Data processing (DP) is the process of organising, categorising, and manipulating data in order to extract information. Information in this context refers to valuable connections and trends that may be used to address pressing issues. In recent years, the capacity and effectiveness of DP have increased manifold with the development of technology.
Data processing that used to require a lot of human labour has progressively been superseded by modern tools and technology. The techniques and procedures used in DP, and the information extraction algorithms applied to data, have become well developed in recent years; for instance, classification is necessary for facial recognition, and time series analysis is necessary for processing stock market data.
The information extracted as a result of DP is also heavily reliant on the quality of the data. Data quality may get
affected due to several issues like missing data and duplications. There may be other fundamental problems, such
as incorrect equipment design and biased data collecting, which are more difficult to address.
The history of DP can be divided into three phases as a result of technological advancements (figure 9.1):

MANUAL DP

MECHANICAL DP

ELECTRONIC DP
Figure 9.1: History of data processing
(i) Manual DP: Manual DP involves processing data without much assistance from machines. Prior to the
phase of mechanical DP only small-scale data processing was possible using manual efforts. However, in
some special cases Manual DP is still in use today, and it is typically due to the data’s difficulty in digitization
or inability to be read by machines, like in the case of retrieving data from outdated texts or documents.


(ii) Mechanical DP: Mechanical DP processes data using mechanical (not modern computers) tools and
technologies. This phase began in 1890 (Bohme et al., 1991) when a system made up of intricate punch card
machines was installed by the US Bureau of the Census in order to assist in compiling the findings of a recent
national population census. Use of mechanical DP made it quicker and easier to search and compute the data
than manual process.
(iii) Electronic DP: And finally, electronic DP replaced the other two, resulting in a fall in mistakes and rising productivity. Data processing is now done electronically using computers and other cutting-edge electronics. It is widely used in industry, research institutions and academia.

How are data processing and data science relevant to finance?

The relevance of data processing and data science in the area of finance is increasing every day. The eleven significant areas where data science plays an important role are:

(i) Risk analytics: Business inevitably involves risk, particularly in the financial industry. It is crucial to determine the risk factor before making any decisions. For example, risk analytics, determined through data science, provides a better method for defending the business against potential cybersecurity risks. Given that a large portion of a company’s risk-related data is “unstructured”, its analysis without data science methods can be challenging and prone to human error. The magnitude of a loss and the regularity of its recurrence can aid in highlighting the precise regions that represent the maximum threat, allowing for the future avoidance of similar circumstances. Once a danger has been recognised, it may be prioritised and its recurrence closely watched.
Machine learning algorithms can look through historical transactions and general information to help banks analyse each customer’s reliability and trustworthiness and determine the relative risk of accepting or lending to them.
Similar to this, transaction data may be used to create a dynamic, real-time risk assessment model that responds immediately to any new transactions or modifications to client data.

[Figure 9.2: Data processing and data science in finance. The eleven application areas shown are: Risk Analytics, Real time Analytics, Customer Data Management, Consumer Analytics, Customer Segmentation, Personalised Services, Advanced Customer Service, Predictive Analytics, Fraud Detection, Anomaly Detection and Algorithmic Trading.]

(ii) Real time analytics: Prior to significant advances in Data Engineering (Airflow, Spark, and Cloud solutions), all data was historical in nature. Data engineers would
discover significance in numbers that were days, weeks, months, or even years old since that was the only
accessible information.
It was processed in batches, which meant that no analysis could be performed until a batch of data had
been gathered within a predetermined timescale. Consequently, any conclusions drawn from this data were
possibly invalid.
With technological advancement and improved hardware, real-time analytics are now available, as Data
Engineering, Data Science, Machine Learning, and Business Intelligence work together to provide the
optimal user experience. Thanks to dynamic data pipelines, data streams, and a speedier data transmission
between source and analyzer, businesses can now respond quickly to consumer interactions. With real-time
analysis, there are no delays in establishing a customer’s worth to an organisation, and credit ratings and
transactions are far more precise.
(iii) Customer data management: Data science enables effective management of client data. In recent years,
many financial institutions may have processed their data solely through the machine learning capabilities
of Business Intelligence (BI). However, the proliferation of big data and unstructured data has rendered
this method significantly less effective for predicting risk and future trends.
There are currently more transactions occurring every minute than ever before, thus there is better data
accessibility for analysis. Due to the arrival of social media and new Internet of Things (IoT) devices, a
significant portion of this data does not conform to the structure of organised data previously employed.
Using methods such as text analytics, data mining, and natural language processing, data science is well-
equipped to deal with massive volumes of unstructured new data. Consequently, despite the fact that data
availability has been enhanced, data science implies that a company’s analytical capabilities may also be
upgraded, leading to a greater understanding of market patterns and client behaviour.
(iv) Consumer Analytics: In a world where choice has never been more crucial, it has become evident that
each customer is unique; nonetheless, there have never been more consumers. This contradiction cannot
be sustained without the intelligence and automation of machine learning.
It is as important to ensure that each client receives a customised service as it is to process their data swiftly
and efficiently, without time-intensive individualised analysis.
As a consequence, insurance firms are using real-time analytics in conjunction with prior data patterns
and quick analysis of each customer’s transaction history to eliminate sub-zero consumers, enhance cross-
sales, and calculate a consumer’s lifetime worth. This allows each financial institution to keep their own
degree of security while still reviewing each application individually.
(v) Customer segmentation: Despite the fact that each consumer is unique, it is only possible to comprehend
their behaviour after they have been categorised or divided. Customers are frequently segmented based on
socioeconomic factors, such as geography, age, and buying patterns.
By examining these clusters collectively, organisations in the financial industry and beyond may assess a
customer’s current and long-term worth. With this information, organisations may eliminate clients who
provide little value and focus on those with promise.
To do this, data scientists can use automated machine learning algorithms to categorise their clients based
on specified attributes that have been assigned relative relevance scores. Comparing these groupings to
former customers reveals the expected value of time invested with each client.
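As an illustration of such automated categorisation, the following minimal Python sketch groups hypothetical customers into segments using k-means clustering (the attributes and values are assumptions, not real client data):

import numpy as np
from sklearn.cluster import KMeans

# Made-up customer attributes: [age, average monthly spend (thousand INR)].
customers = np.array([
    [22, 5], [25, 7], [47, 40], [52, 45], [35, 18], [38, 20],
])

# Group customers into three segments based on these attributes.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(model.labels_)           # segment assigned to each customer
print(model.cluster_centers_)  # profile of each segment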
(vi) Personalized services: The requirement to customise each customer’s experience extends beyond gauging
risk assessment. Even major organisations strive to provide customised service to their consumers as a
method of enhancing their reputation and increasing customer lifetime value. This is also true for businesses
in the finance sector.
From customer evaluations to telephone interactions, everything can be studied in a way that benefits both
the business and the consumer. By delivering the consumer a product that precisely meets their needs,
cross-selling may be facilitated by a thorough comprehension of these interactions.
Natural language processing (NLP) and voice recognition technologies dissect these encounters into a
series of important points that can identify chances to increase revenue, enhance the customer service
experience, and steer the company’s future. Due to the rapid progress of NLP research, the potential is yet
to be fully realised.
(vii) Advanced customer service: Data science’s capacity to give superior customer service goes hand in
hand with its ability to provide customised services. As client interactions may be evaluated in real-time,
more effective recommendations can be offered to the customer care agent managing the customer’s case
throughout the conversation.
Natural language processing can offer chances for practical financial advise based on what the consumer
is saying, even if the customer is unsure of the product they are seeking.
The customer support agent can then cross-sell or up-sell while efficiently addressing the client’s inquiry.
The knowledge from each encounter may then be utilised to inform subsequent interactions of a similar
nature, hence enhancing the system’s efficacy over time.
(viii) Predictive Analytics: Predictive analytics enables organisations in the financial sector to extrapolate from
existing data and anticipate what may occur in the future, including how patterns may evolve. When
prediction is necessary, machine learning is utilised. Using machine learning techniques, pre-processed
data may be input into the system in order for it to learn how to anticipate future occurrences accurately.
More information improves the prediction model. Typically, for an algorithm to function in shallow
learning, the data must be cleansed and altered. Deep learning, on the other hand, changes the data without
the need for human preparation to establish the initial rules, and so achieves superior performance.
In the case of stock market pricing, machine learning algorithms learn trends from past data in a certain
interval (may be a week, month, or quarter) and then forecast future stock market trends based on this
historical information. This allows data scientists to depict expected patterns for end-users in order to assist
them in making investment decisions and developing trading strategies.
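A very simple version of this idea can be sketched in Python: fit a model to past prices and extrapolate the trend forward. The prices below are made up, and a real model would use far richer features than a straight-line trend:

import numpy as np
from sklearn.linear_model import LinearRegression

days = np.arange(10).reshape(-1, 1)                    # time index of past observations
prices = np.array([100, 101, 103, 102, 105, 107, 108, 110, 111, 113.0])

model = LinearRegression().fit(days, prices)           # learn the historical trend
next_days = np.array([[10], [11], [12]])
print(model.predict(next_days))                        # forecast the next few days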
(ix) Fraud detection: With a rise in financial transactions, the risk for fraud also increases. Tracking incidents of
fraud, such as identity theft and credit card scams, and limiting the resulting harm is a primary responsibility
for financial institutions. As the technologies used to analyse big data become more sophisticated, so do
their capacity to detect fraud early on.
Artificial intelligence and machine learning algorithms can now detect credit card fraud significantly more
precisely, owing to the vast amount of data accessible from which to draw trends and the capacity to
respond in real time to suspect behaviour.
If a major purchase is made on a credit card belonging to a consumer who has traditionally been very
frugal, the card can be immediately terminated, and a notification sent to the card owner.
This protects not just the client, but also the bank and the client’s insurance carrier. When it comes to
trading, machine learning techniques discover irregularities and notify the relevant financial institution,
enabling speedy inquiry.
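As a toy illustration of flagging unusual transactions, the sketch below uses an isolation forest, a simpler stand-in for the production-grade models described above; the transaction amounts are invented:

import numpy as np
from sklearn.ensemble import IsolationForest

amounts = np.array([[120], [90], [110], [95], [105], [15000]])  # last amount is unusual

detector = IsolationForest(contamination=0.15, random_state=0).fit(amounts)
print(detector.predict(amounts))   # -1 marks the suspicious transaction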


(x) Anomaly detection: Financial services have long placed a premium on detecting abnormalities in a
customer’s bank account activities, partly because anomalies are only proved to be anomalous after the
event happens. Although data science can provide real-time insights, it cannot anticipate singular incidents
of credit card fraud or identity theft.
However, data analytics can discover instances of unlawful insider trading before they cause considerable
harm. The methods for anomaly identification consist of Recurrent Neural Networks and Long Short-Term
Memory models.
These algorithms can analyse the behaviour of traders before and after information about the stock market
becomes public in order to determine if they illegally monopolised stock market forecasts and took
advantage of investors. Transformers, which are next-generation designs for a variety of applications,
including Anomaly Detection, are the foundation of more modern solutions.
(xi) Algorithmic trading: Algorithmic trading is one of the key uses of data science in finance. Algorithmic trading happens when an unsupervised computer, utilising the intelligence supplied by an algorithm, trades on the stock market based on the algorithm’s suggestions. As a consequence, it eliminates the risk of loss caused by indecision and human error.
The trading algorithm used to be developed according to a set of stringent rules that decide whether it
will trade on a specific market at a specific moment (there is no restriction for which markets algorithmic
trading can work on).
This method is known as Reinforcement Learning, in which the model is taught using penalties and rewards
associated with the rules. Each time a transaction proves to be a poor option, a model of reinforcement
learning ensures that the algorithm learns and adapts its rules accordingly.
One of the primary advantages of algorithmic trading is the increased frequency of deals. Based on facts
and taught behaviour, the computer can operate in a fraction of a second without human indecision or
thought. Similarly, the machine will only trade when it perceives a profit opportunity according to its rule
set, regardless of how rare these chances may be.
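A minimal rule-based sketch of this idea is shown below: trade signals are generated when a short moving average crosses a long one. The prices and the rule itself are illustrative assumptions, not a recommended strategy:

import pandas as pd

prices = pd.Series([100, 102, 101, 105, 107, 106, 109, 108, 112, 115.0])  # made-up prices

short_ma = prices.rolling(3).mean()
long_ma = prices.rolling(5).mean()

# Buy when the short moving average is above the long one, sell when below.
signal = pd.Series("hold", index=prices.index)
signal[short_ma > long_ma] = "buy"
signal[short_ma < long_ma] = "sell"

print(pd.DataFrame({"price": prices, "short_ma": short_ma, "long_ma": long_ma, "signal": signal}))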


9.2 Functions of Data Processing


Data processing generally involves the following processes:
(i) Validation:
As per the UNECE glossary on statistical data editing (UNECE 2013), data validation may be defined as
‘An activity aimed at verifying whether the value of a data item comes from the given (finite or infinite) set of
acceptable values.’
Simon (2013) defined data validation as “Data validation could be operationally defined as a process which
ensures the correspondence of the final (published) data with a number of quality characteristics.”
Data validation is a decision-making process that leads to the acceptance or rejection of data. Data is subjected to rules. Data are deemed legitimate for the intended final use if they comply with the rules, i.e. the combinations stated by the rules are not violated.
The objective of data validation is to assure a particular degree of data quality.
In official statistics, however, quality has multiple dimensions: relevance, correctness, timeliness and punctuality,
accessibility and clarity, comparability, coherence, and comprehensiveness. Therefore, it is essential to determine
which components data validation addresses.
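In practice, validation rules of this kind can be written down explicitly and applied to each record. The following Python sketch uses invented invoice data and assumed acceptable values purely for illustration:

import pandas as pd

records = pd.DataFrame({
    "invoice_no": [1001, 1002, 1003],
    "amount":     [2500.0, -40.0, 1200.0],     # the negative amount breaks a rule
    "gst_rate":   [0.18, 0.18, 0.55],          # 55% is not in the assumed set of acceptable rates
})

# Each rule defines the set of acceptable values for a data item.
rules = {
    "amount must be non-negative": records["amount"] >= 0,
    "gst_rate must be an accepted rate": records["gst_rate"].isin([0.0, 0.05, 0.12, 0.18, 0.28]),
}

for rule, passed in rules.items():
    rejected = records.loc[~passed, "invoice_no"].tolist()
    print(rule, "-> rejected invoices:", rejected)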
(ii) Sorting:
Data sorting is any procedure that organises data into a meaningful order to make it simpler to comprehend, analyse,
and visualise. Sorting is a typical strategy for presenting research data in a manner that facilitates comprehension of
the story being told by the data. Sorting can be performed on raw data (across all records) or aggregated information (in
a table, chart, or some other aggregated or summarised output). Summarization (statistical or automatic) involves reducing detailed data to its main points.
Typically, data is sorted in ascending or decreasing order based on actual numbers, counts, or percentages, but
it may also be sorted based on variable value labels. Value labels are metadata present in certain applications that
let the researcher save labels for each value alternative in a categorical question. The vast majority of software programmes permit sorting by many factors. A data collection including region and country fields, for instance, can be sorted by region as the main sort and subsequently by country. Within each sorted region, the country sort will be implemented.
When working with any type of data, there are a number of typical sorting applications. One such use is data cleaning, which is the act of sorting data in order to identify anomalies in a data pattern. For instance, monthly sales data can be sorted by month to identify sales volume variations.
Sorting is also frequently used to rank or prioritise records. In this instance, data is sorted based on a rank, computed score, or other weighting factor (for example, highest volume accounts or heavy usage customers).


It is also vitally necessary to organise visualisations (tables, charts, etc.) correctly to facilitate accurate data
interpretation. In market research, for instance, it is typical to sort the findings of a single-response question
by column percentage, i.e. from most answered to least answered, as in a typical brand preference question.
Incorrect classification frequently results in misunderstanding. Always verify that the most logical sorts are applied to every visualisation.
Using sorting functions is an easy idea to comprehend, but there are a few technical considerations to keep
in mind. The arbitrary sorting of non-unique data is one such issue. Consider, for example, a data collection
comprising region and nation variables, as well as several records per area. If a region-based sort is implemented,
what is the default secondary sort? In other words, how will the data be sorted inside each region?
This depends on the application in question. Excel, for instance, will preserve the original sort as the default
sort order following the execution of the primary sort. SQL databases do not have a default sort order. This rather
depends on other factors, such as the database management system (DBMS) in use, indexes, and so on.
Other programmes may perform extra sorting by default based on the column order.
At nearly every level of data processing, the vast majority of analytical and statistical software programmes offer
a variety of sorting options.
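The region/country example above can be reproduced with a few lines of Python (pandas); the data values are assumptions:

import pandas as pd

sales = pd.DataFrame({
    "region":  ["Asia", "Europe", "Asia", "Europe"],
    "country": ["Japan", "France", "India", "Germany"],
    "revenue": [120, 90, 150, 110],
})

# Primary sort on region, secondary sort on country within each region.
print(sales.sort_values(["region", "country"]))

# Sorting by a computed score is also common, e.g. highest revenue first.
print(sales.sort_values("revenue", ascending=False))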
(iii) Aggregation:
Data aggregation refers to any process in which data is collected and summarised. When data is aggregated,
individual data rows, which are often compiled from several sources, are replaced with summaries or totals. Groups
of observations are replaced with statistical summaries based on these observations. A data warehouse often
contains aggregate data since it may offer answers to analytical inquiries and drastically cut the time required to
query massive data sets.
A common application of data aggregation is to offer statistical analysis for groups of individuals and to provide
relevant summary data for business analysis. Utilizing software tools known as data aggregators, large-scale data
aggregation is commonplace. Typically, data aggregators comprise functions for gathering, processing, and
displaying aggregated data.
Data aggregation enables analysts to access and analyse vast quantities of data in a reasonable amount of time. A single row of aggregate data may represent hundreds, thousands, or even millions of individual data entries. Because the data is already aggregated, it may be queried rapidly, instead of expending processing cycles to retrieve each individual data row and aggregate it in real time whenever it is requested or accessed.
As the amount of data kept by businesses continues to grow, aggregating the most significant and frequently requested data can facilitate efficient access to it.
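As a brief sketch (again assuming pandas, with hypothetical column names and values), transaction-level rows can be replaced by one summary row per group:

```python
import pandas as pd

# Hypothetical transaction-level data; values are illustrative only
transactions = pd.DataFrame({
    "month":  ["Jan", "Jan", "Feb", "Feb", "Feb"],
    "region": ["East", "West", "East", "East", "West"],
    "amount": [1000, 1500, 700, 1200, 900],
})

# Aggregate: one summary row per month and region
summary = (
    transactions
    .groupby(["month", "region"], as_index=False)
    .agg(total_amount=("amount", "sum"),
         average_amount=("amount", "mean"),
         transaction_count=("amount", "count"))
)
print(summary)
```

A query against the small summary table is far cheaper than recomputing totals from every individual transaction each time the figures are needed.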
(iv) Analysis:
Data analysis is described as the process of cleaning, converting, and modelling data to obtain actionable business
intelligence. The objective of data analysis is to extract relevant information from data and make decisions based
on this knowledge.
Every time we make a decision in our day-to-day lives, we consider what occurred previously or what would occur if we choose a specific option. This is a simple example of data analysis: studying the past or anticipating the future and basing judgments on that analysis, whether by recalling our history or by imagining our future. When an analyst performs the same task systematically for commercial purposes, it is known as data analysis.


Sometimes, analysis is all that is required to grow a business and its finances. If a firm is not expanding, it must acknowledge past errors and create a new plan to avoid making the same mistakes. And even if the firm is expanding, it must plan to expand even further. All that is required is an analysis of the business data and operations.

Figure 9.3: Some popular data analysis tools


(v) Reporting:
Data reporting is the act of gathering and structuring raw data and turning it into a consumable format in order to
evaluate the organisation’s continuous performance.
The data reports can provide answers to fundamental inquiries regarding the status of the firm. They can display
the status of certain data within an Excel document or a simple data visualisation tool. Static data reports often
employ the same structure throughout time and collect data from a single source.
A data report is nothing more than a set of documented facts and numbers. Consider the population census as an illustration. This is a technical document conveying basic facts on the population and demographics of a country. It may be presented in text or in a graphical manner, such as a graph or chart. Such static information may be utilised to evaluate present situations.
Financial data such as revenues, accounts receivable, and net profits are often summarised in a company’s data
reporting. This gives an up-to-date record of the company’s financial health or a portion of the finances, such as
sales. A sales director may report on KPIs based on location, funnel stage, and closing rate in order to present an
accurate view of the whole sales pipeline.
Data provides a method for measuring development in many aspects of our life. It influences both our professional
judgments and our day-to-day affairs. A data report would indicate where we should devote the most time and money, as well as what needs more organisation or attention.
In any industry, accurate data reporting plays a crucial role. Utilizing business information in healthcare enables
physicians to provide more effective and efficient patient care, hence saving lives. In education, data reports may
be utilised to study the relationship between attendance records and seasonal weather patterns, as well as the
intersection of acceptance rates and neighbourhood regions.
The most effective business analysts possess specific competencies. An outstanding business analyst must be able to prioritise the most pertinent data. There is no space for error in data reporting, which necessitates high thoroughness and attention to detail. The capacity to comprehend and organise enormous volumes of information is another valuable talent. Lastly, the ability to organise and present data in an easy-to-read fashion is essential for all data reporters.
Excellence in data reporting does not necessitate immersion in coding or proficiency in analytics. Other necessary
talents include the ability to extract vital information from data, to keep things simple, and to prevent data hoarding.
Although static reporting can be precise and helpful, it has limitations. One such limitation is the absence of real-time insights. If confronted with a vast volume of data to organise into a usable and actionable format, a report enables senior management or the sales team to provide guidance on future steps. However, if the layout, data, and formulae are not delivered in a timely way, they may already be out of date by the time they are used.
The reporting of data is vital to an organisation's business intelligence. The greater an organisation's access to data, the more agile it can be. This can help a firm to maintain its relevance in a market that is becoming increasingly competitive and dynamic. An efficient data reporting system will facilitate the formation of judicious judgments that might steer a business in new directions and provide additional income streams.
(vi) Classification:
Data classification is the process of organising data into important categories so that it may be utilised and safeguarded more effectively. At a fundamental level, the categorisation process makes data easier to identify and access. The classification of data is of special relevance to risk management, compliance, and data security.
Classifying data entails labelling it to make it searchable and trackable. Additionally, it avoids many duplications
of data, which can minimise storage and backup expenses and accelerate the search procedure. The categorization
process may sound very technical, yet it is a topic that your organisation’s leadership must comprehend.
The categorization of data has vastly improved over time. Today, the technology is employed for a number of
applications, most frequently to assist data security activities. However, data may be categorised for a variety
of purposes, including facilitating access, ensuring regulatory compliance, and achieving other commercial or
personal goals. In many instances, data classification is a statutory obligation, since data must be searchable and
retrievable within predetermined deadlines. For the purposes of data security, data classification is a useful strategy
that simplifies the application of appropriate security measures based on the kind of data being accessed, sent, or
duplicated.
Classification of data frequently entails an abundance of tags and labels that identify the data’s kind, secrecy, and
integrity. In data classification procedures, availability may also be taken into account. It is common practice to classify the sensitivity of data based on varying levels of relevance or secrecy, which corresponds to the security measures required to safeguard each classification level.
Three primary methods of data classification are recognised as industry standards:
●● Content-based classification examines and interprets files for sensitive data.
●● Context-based classification considers, among other characteristics, application, location, and creator as
indirect markers of sensitive information.
●● User-based classification relies on the human selection of each document by the end user. To indicate
sensitive documents, user-based classification depends on human expertise and judgement during document
creation, editing, review, or distribution.
In addition to the classification kinds, it is prudent for an organisation to identify the relative risk associated with the data types, how the data is handled, and where it is stored/sent (endpoints). It is standard practice to divide data and systems into three risk categories.
~~ Low risk: If data is accessible to the public and recovery is simple, then this data collection and the
mechanisms around it pose a smaller risk than others.
~~ Moderate risk: Essentially, this is non-public or internal (to a business or its partners) data. However, it is unlikely to be mission-critical or sensitive enough to be considered “high risk.” The intermediate category may include proprietary operating processes, cost of products, and certain corporate paperwork.
~~ High risk: Anything even vaguely sensitive or critical to operational security falls under the category of high risk, as does data that would be incredibly difficult to recover if lost. All secret, sensitive, and essential data falls under the category of high risk.
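The following is a highly simplified, hypothetical sketch of rule-based risk tagging in Python; a real classification policy would be far richer and organisation-specific:

```python
# Hypothetical mapping of data categories to risk levels (illustrative only)
RISK_RULES = {
    "public_marketing_material": "Low risk",
    "internal_operating_procedure": "Moderate risk",
    "cost_of_products": "Moderate risk",
    "customer_pan_numbers": "High risk",
    "payroll_records": "High risk",
}

def classify_risk(data_category: str) -> str:
    """Return the risk level for a category; unknown categories default to High risk."""
    # Treating unreviewed data as High risk is the safest assumption
    return RISK_RULES.get(data_category, "High risk")

for category in ["public_marketing_material", "payroll_records", "unreviewed_dataset"]:
    print(category, "->", classify_risk(category))
```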

Data classification matrix


Data creation and labelling may be simple for certain companies. If there are not a significant number of data
kinds or if your firm has fewer transactions, it will likely be easier to determine the risk of your data and systems.
However, many businesses working with large volumes or numerous types of data will certainly require a thorough
method for assessing their risk. Many utilise a “data categorization matrix” for this purpose.
Creating a matrix that rates data and/or systems based on how likely they are to be hacked and how sensitive the
data is enables you to rapidly identify how to classify and safeguard all sensitive information (figure 9.4).

Confidential Data (Risk: High)
General institution impact: The negative impact on the institution, should this data be incorrect, improperly disclosed, or not available when needed, is typically very high.
Description: Access to Confidential institutional data must be controlled from creation to destruction and will be granted only to those persons affiliated with the University who require such access in order to perform their job, or to those individuals permitted by law. Access to confidential data must be individually requested and then authorised by the Functional Security Module Representative who is responsible for the data. Confidential data is highly sensitive, may have personal privacy considerations, or may be restricted by federal or state law. It includes information which provides access to resources, physical or virtual.
Access: Only those individuals designated with approved access.

Sensitive Data (Risk: Medium)
General institution impact: The risk of negative impact on the institution, should this information not be available when needed, is typically moderate.
Description: Access to Sensitive institutional data must be requested from, and authorised by, the Functional Security Module Representative who is responsible for the data. Access to internal data may be authorised to groups of persons by their job classification or responsibilities (“role-based” access), and may also be limited by one's employing unit or affiliation. Non-public or internal data is moderately sensitive in nature. Often, sensitive data is used for making decisions, and therefore it is important that this information remain timely and accurate.
Access: EMU employees and non-employees who have a business need to know.

Public Data (Risk: Low)
General institution impact: The impact on the institution, should public data not be available, is typically low (inconvenient but not debilitating).
Description: Access to Public institutional data may be granted to any requester, or it is published with no restrictions. Public data is not considered sensitive. The integrity of public data should be protected, and the appropriate Functional Security Module Representative must authorise replication or copying of the data in order to ensure it remains accurate over time.
Access: EMU affiliates and the general public with a need to know.

Figure 9.4: Sample risk classification matrix

Example of data classification


Data may be classified as Restricted, Private, or Public by an entity. In this instance, public data are the least
sensitive and have the lowest security requirements, whereas restricted data are the most sensitive and have the
highest security rating. This form of data categorization is frequently the beginning point for many organisations,
followed by subsequent identification and tagging operations that label data based on its enterprise-relatedness,
quality, and other categories. The most effective data classification methods include follow-up processes and
frameworks to ensure that sensitive data remains in its proper location.

Data classification process


Classifying data may be a difficult and laborious procedure. Automated systems can assist in streamlining the process, but an organisation must determine the categories and criteria that will be used to classify data, understand and define its objectives, outline the roles and responsibilities of employees in maintaining proper data classification protocols, and implement security standards that correspond with data categories and tags. If carried out appropriately, this procedure will give an operational framework to workers and third parties engaged in the storage, transfer, or retrieval of data.
Policies and procedures should be well-defined, respectful of security needs and the confidentiality of data kinds, and simple enough for the staff expected to comply with them to comprehend. For example, each category should include
information about the types of data included in the categorization, security concerns including rules for accessing,
transferring, and keeping data, and the potential risks associated with a security policy breach.

Steps for effective data classification


~~ Understanding the current setup: Taking a comprehensive look at the location of the organisation's current data and any applicable legislation is likely the best starting point for successfully classifying data. Before one classifies data, one must know what data one has.
~~ Creation of a data classification policy: Without an adequate policy, maintaining compliance with data protection standards in an organisation is practically impossible. Priority number one should be the creation of a policy.
~~ Prioritise and organise data: Once a data classification policy is in place, it is time to categorise the data. Based on the sensitivity and privacy of the data, the optimal method for tagging it should be chosen.


9.3 Data Organisation and Distribution
~~ Data Organisation
Data organisation is the classification of unstructured data into distinct groups. This raw data comprises observations of variables. The arrangement of students' grades in different subjects is one example of data organisation.
As time passes and the data volume grows, the time required to look for any information from the data source
would rise if it has not previously been structured.
Data organisation is the process of arranging unstructured data in a meaningful manner. Classification,
frequency distribution tables, image representations, graphical representations, etc. are examples of data
organisation techniques.
Data organisation allows us to arrange data in a manner that is easy to understand and manipulate. It is
challenging to deal with or analyse raw data.
IT workers utilise the notion of data organisation in several ways. Many of these are included under the
umbrella term “data management.” For instance, data organisation includes reordering or assessing the
arrangement of data components in a physical record.
The analysis of somewhat organised and unstructured data is another crucial component of business data
organisation. Structured data consists of tabular information that may be readily imported into a database
and then utilised by analytics software or other applications. Unstructured data are raw and unformatted data,
such as a basic text document with names, dates, and other information spread among random paragraphs.
The integration of somewhat unstructured data into a holistic data environment has been facilitated by the
development of technical tools and resources.
In a world where data sets are among the most valuable assets possessed by firms across several industries,
businesses employ data organisation methods in order to make better use of their data assets. Executives
and other professionals may prioritise data organisation as part of a complete plan to expedite business
operations, boost business intelligence, and enhance the business model as a whole.


~~ Data distribution
Data distribution is a function that identifies and quantifies all potential values for a variable, as well as their relative frequency (the probability of how often they occur). Any population with dispersed data can be described by a distribution. It is necessary to establish the population's distribution type in order to analyse it using the appropriate statistical procedures.
Statistics makes extensive use of data distributions. If an analyst gathers 500 data points on the shop floor, they are of little use to management unless they are categorised or organised in a usable manner. The data distribution approach arranges the raw data into graphical representations (such as histograms, box plots, and pie charts) and gives relevant information.
The primary benefit of data distribution is the estimation of the probability of any certain observation within
a sample space. Probability distribution is a mathematical model that determines the probabilities of the
occurrence of certain test or experiment outcomes. These models are used to specify distinct sorts of random
variables (often discrete or continuous) in order to make a choice. One can employ mean, mode, range,
probability, and other statistical approaches based on the category of the random variable.

~~ Types of distribution
Distributions are basically classified based on the type of data:
(i) Discrete distributions: A discrete distribution results from countable data and has a finite (or countable) number of potential values. In addition, discrete distributions may be displayed in tables, and the values of the random variable can be counted. Example: rolling dice, obtaining a specific number of heads, etc.
Following are the discrete distributions of various types:
(a) Binomial distributions: The binomial distribution quantifies the chance of obtaining a specific number of successes or failures in a fixed number of independent trials.
Binomial distribution applies to attributes that are categorised into two mutually exclusive and
exhaustive classes, such as number of successes/failures and number of acceptances/rejections.
Example: When tossing a coin: The likelihood of a coin falling on its head is one-half and the
probability of a coin landing on its tail is one-half.
(b) Poisson distribution: The Poisson distribution is the discrete probability distribution that quantifies the chance of a certain number of events occurring in a given time period, where the events occur independently and at a known average rate.
Poisson distribution applies to attributes that can potentially take on huge values, but in practice take on small ones.
Example: Number of flaws, mistakes, accidents, absentees etc.
(c) Hypergeometric distribution: The hypergeometric distribution is a discrete distribution that
assesses the chance of a certain number of successes in (n) trials, without replacement, from a
sufficiently large population (N). Specifically, sampling without replacement.
The hypergeometric distribution is comparable to the binomial distribution; the primary distinction between the two is that the chance of success is the same for all trials in the binomial distribution, but not in the hypergeometric distribution, since sampling is done without replacement.
(d) Geometric distribution: The geometric distribution is a discrete distribution that assesses the
probability of the occurrence of the first success. A possible extension is the negative binomial
distribution.
Example: A marketing representative from an advertising firm chooses hockey players from several
institutions at random till he discovers an Olympic participant.
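The probabilities described above can be computed directly with statistical software. The sketch below assumes the scipy library is available; the parameter values are purely illustrative:

```python
from scipy import stats

# Binomial: probability of exactly 6 heads in 10 tosses of a fair coin
p_binomial = stats.binom.pmf(k=6, n=10, p=0.5)

# Poisson: probability of exactly 2 defects when the average is 3 defects per batch
p_poisson = stats.poisson.pmf(k=2, mu=3)

# Geometric: probability that the first success occurs on the 4th trial (p = 0.2)
p_geometric = stats.geom.pmf(k=4, p=0.2)

print(round(p_binomial, 4), round(p_poisson, 4), round(p_geometric, 4))
```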


(ii) Continuous distributions: A continuous distribution has an unlimited number of (variable) data points that may be represented on a continuous measuring scale. A continuous random variable is a random variable with an unlimited and uncountable set of potential values. It is more than a simple count and is often described using a probability density function (pdf). The probability density function describes the characteristics of the random variable and may be viewed as the distribution's “shape”; frequencies are typically seen to cluster around a central value.

Following are the continuous distributions of various types:


(i) Normal distribution: Gaussian distribution is another name for normal distribution. It is a bell-shaped
curve with a greater frequency (probability density) around the core point. As values go away from the
centre value on each side, the frequency drops dramatically.
In other words, features whose dimensions are expected to fall on either side of the target value with equal
likelihood adhere to normal distribution.
(ii) Lognormal distribution: A continuous random variable x follows a lognormal distribution if the distribution of its natural logarithm, ln(x), is normal.
As the sample size rises, the distribution of the sum of random variables approaches a normal distribution, independent of the distribution of the individuals; correspondingly, the product of many positive random variables tends towards a lognormal distribution, since the logarithm of a product is a sum of logarithms.
(iii) F distribution: The F distribution is often employed to examine the equality of variances between two
normal populations.
The F distribution is an asymmetric distribution with no maximum value and a minimum value of 0. The
curve approaches 0 but never reaches the horizontal axis.
(iv) Chi square distribution: When independent variables with a standard normal distribution are squared and added, the chi square distribution arises.
Example: y = Z₁² + Z₂² + Z₃² + Z₄² + .... + Zₙ², where each Zᵢ is a standard normal random variable.
The distribution of chi square values is skewed to the right and bounded below by zero, and it approaches the shape of the normal distribution as the number of degrees of freedom grows.
(v) Exponential distribution: The exponential distribution is one of the most often employed continuous probability distributions. It is used frequently to represent products with a constant failure rate.
The exponential distribution and the Poisson distribution are closely connected: if events occur according to a Poisson process, the time between successive events follows an exponential distribution.
It has a constant failure (hazard) rate, since its shape does not change over time.
(vi) Student's t distribution: The t distribution, or Student's t distribution, is a probability distribution with a bell shape that is symmetrical about its mean.
It is used frequently for testing hypotheses and building confidence intervals for means, and is substituted for the normal distribution when the population standard deviation is unknown.
When random variables are averaged, the distribution of the average tends towards the normal distribution, irrespective of the distribution of the individual observations.
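As a brief sketch (again assuming scipy, with purely illustrative figures), probabilities and critical values for continuous distributions can be obtained as follows:

```python
from scipy import stats

# Normal: probability that a value lies within one standard deviation of the mean
mean, sd = 100, 15
p_within_1sd = (stats.norm.cdf(mean + sd, loc=mean, scale=sd)
                - stats.norm.cdf(mean - sd, loc=mean, scale=sd))

# Student's t: critical value for a 95% two-sided confidence interval, 20 degrees of freedom
t_critical = stats.t.ppf(0.975, df=20)

# Exponential: probability a component survives beyond 500 hours when mean life is 1,000 hours
p_survival = stats.expon.sf(500, scale=1000)

print(round(p_within_1sd, 4), round(t_critical, 4), round(p_survival, 4))
```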


9.4 Data Cleaning and Validation


~~ Data Cleaning
Data cleaning is the process of correcting or deleting inaccurate, corrupted, improperly formatted, duplicate,
or insufficient data from a dataset. When several data sources are combined, there are numerous chances for
data duplication and mis-labelling. Incorrect data renders outcomes and algorithms untrustworthy, despite
their apparent accuracy. There is no one definitive method for prescribing the precise phases of the data
cleaning procedure, as the methods differ from dataset to dataset. However, it is essential to build a template
for your data cleaning process so that you can be certain you are always doing the steps correctly.
Data cleaning is different from data transformation. Data cleaning is the process of removing data that does not belong in a dataset. The process of changing data from one format or structure to another is known as data
transformation. Transformation procedures are sometimes known as data wrangling or data munging, since
they map and change “raw” data into another format for warehousing and analysis.
Steps for data cleaning:
(i) Step 1: Removal of duplicate and irrelevant information
Eliminate unnecessary observations from your dataset, such as duplicate or irrelevant observations.
Most duplicate observations will occur during data collection. When you merge data sets from numerous sources, scrape data, or receive data from customers or several departments, there are opportunities to produce duplicate data. De-duplication is one of the most important considerations for this procedure.
Observations are deemed irrelevant when they do not pertain to the specific topic you are attempting to
study. For instance, if you wish to study data pertaining to millennial clients but your dataset contains
observations pertaining to earlier generations, you might exclude these useless observations. This may
make analysis more effective and reduce distractions from your core objective, in addition to producing
a more manageable and effective dataset.
(ii) Step 2: Fix structural errors:
When measuring or transferring data, you may detect unusual naming conventions, typos, or wrong capitalisation. These inconsistencies may lead to mislabelled classes or groups. For instance, “N/A” and “Not Applicable” may both be present, but they should be analysed as a single category.
(iii) Step 3: Filter unwanted outliers:
Occasionally, you will encounter observations that, at first look, do not appear to fit inside the data you
are evaluating. If you have a valid cause to eliminate an outlier, such as erroneous data input, doing
so will improve the performance of the data you are analysing. Occasionally, though, the appearance of an outlier will support a theory you are working on. Remember that the existence of an outlier does not imply that it is erroneous; this step is needed to validate that. Consider deleting an outlier only if it proves to be unrelated to the analysis or the result of an error.


(iv) Step 4: Handle missing data


Many algorithms do not accept missing values, hence missing data cannot be ignored. There are two main approaches to handling missing data; although neither is ideal, both should be considered.
As a first alternative, the observations with missing values may be dropped, but doing so may result in
the loss of information. This should be kept in mind before doing so.
As a second alternative, the missing values may be imputed based on other observations. Again, there is a chance that the data's integrity may be compromised, as action may be based on assumptions rather than real observations.
(v) Step 5: Validation and QA
As part of basic validation, one should be able to answer the following questions at the conclusion of
the data cleaning process:
(a) Does the data make sense?
(b) Does the data adhere to the regulations applicable to its field?
(c) Does it verify or contradict your working hypothesis, or does it shed any light on it?
(d) Can data patterns assist you in formulating your next theory?
(e) If not, is this due to an issue with data quality?
False assumptions based on inaccurate or “dirty” data can lead to ineffective company strategies and
decisions. False conclusions might result in an uncomfortable moment at a reporting meeting when
it is shown that the data does not withstand inspection. Before reaching that point, it is essential to
establish a culture of data quality inside the firm. To do this, one should specify the methods that may
be employed to establish this culture and also the definition of data quality.
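A minimal sketch of these five steps using pandas (with hypothetical column names and values) might look as follows:

```python
import pandas as pd

# Hypothetical customer dataset containing typical quality problems
df = pd.DataFrame({
    "customer": ["Asha", "Asha", "Ravi", "Meena", None],
    "status":   ["N/A", "N/A", "Not Applicable", "Active", "Active"],
    "spend":    [1200, 1200, 900, 54000, 800],
})

# Step 1: remove duplicate observations
df = df.drop_duplicates()

# Step 2: fix structural errors, e.g. inconsistent labels for the same category
df["status"] = df["status"].replace({"Not Applicable": "N/A"})

# Step 3: flag possible outliers for review (here, spend far above the median)
outliers = df[df["spend"] > 10 * df["spend"].median()]

# Step 4: handle missing data, either by dropping rows or by imputing values
df = df.dropna(subset=["customer"])

# Step 5: basic validation - confirm the cleaned data still makes sense
assert df["spend"].ge(0).all(), "Spend values should not be negative"
print(df)
print(outliers)
```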

~~ Benefits of quality data


Determining the quality of data needs an analysis of its properties and a weighting of those attributes based
on what is most essential to the company and the application(s) for which the data will be utilised.
Main characteristics of quality data are:
(i) Validity
(ii) Accuracy
(iii) Completeness
(iv) Consistency

~~ Benefits of data cleaning


Ultimately, having clean data will boost overall productivity and provide the highest-quality information for decision-making. Benefits include:
(i) Error correction when numerous data sources are involved.
(ii) Fewer mistakes result in happier customers and less irritated workers.
(iii) Capability to map the many functions and planned uses of your data.
(iv) Monitoring mistakes and improving reporting to determine where errors are originating can make it
easier to repair inaccurate or damaged data in future applications.
(v) Using data cleaning technologies will result in more effective corporate procedures and speedier
decision-making.


~~ Data validation
Data validation is a crucial component of any data management process, whether it is about collecting
information in the field, evaluating data, or preparing to deliver data to stakeholders. If the initial data is not
valid, the outcomes will not be accurate either. It is therefore vital to check and validate data before using it.
Although data validation is an essential stage in every data pipeline, it is frequently ignored. It may appear
like data validation is an unnecessary step that slows down the work, but it is vital for producing the finest
possible outcomes. Today, data validation may be accomplished considerably more quickly than one may have imagined earlier. With data integration systems that can include and automate validation procedures, validation may be considered an integral part of the workflow, as opposed to an additional step.
Validating the precision, clarity, and specificity of data is essential for mitigating project failures. Without data validation, one may run the danger of basing judgments on faulty data that is not indicative of the current situation.
In addition to validating data inputs and values, it is vital to validate the data model itself. If the data model
is not appropriately constructed or developed, one may encounter problems while attempting to use data files
in various programmes and software.
The format and content of data files will determine what can be done with the data. Using validation criteria to purify data prior to usage mitigates “garbage in, garbage out” problems. Ensuring data integrity contributes to the validity of the conclusions.

Types of data validation


~~ Data type check: A data type check verifies that the entered data has the appropriate data type. For instance,
a field may only take numeric values. If this is the case, the system should reject any data containing other
characters, such as letters or special symbols.
~~ Code check: A code check verifies that a field’s value is picked from a legitimate set of options or that it
adheres to specific formatting requirements. For instance, it is easy to verify the validity of a postal code by
comparing it to a list of valid codes. The same principle may be extended to other things, including nation
codes and NIC industry codes.
~~ Range check: A range check determines whether or not input data falls inside a specified range. Latitude and
longitude, for instance, are frequently employed in geographic data. A latitude value must fall between -90
and 90 degrees, whereas a longitude value must fall between -180 and 180 degrees. Outside of this range,
values are invalid.
~~ Format check: Numerous data kinds adhere to a set format. Date columns that are kept in a fixed format,
such as “YYYY-MM-DD” or “DD-MM-YYYY,” are a popular use case. A data validation technique that
ensures dates are in the correct format contributes to data and temporal consistency.
~~ Consistency check: A consistency check is a form of logical check that verifies that the data has been input
in a consistent manner. Checking whether a package’s delivery date is later than its shipment date is one
example.
~~ Uniqueness check: Some data like PAN or e-mail ids are unique by nature. These fields should typically
contain unique items in a database. A uniqueness check guarantees that an item is not put into a database
numerous times.
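The sketch below illustrates a few of these checks in plain Python; the field names and rules are hypothetical:

```python
import re
from datetime import date

def validate_record(record: dict) -> list:
    """Return a list of validation errors for a single, hypothetical shipment record."""
    errors = []

    # Data type and range check: latitude must be numeric and between -90 and 90
    if not isinstance(record.get("latitude"), (int, float)) or not -90 <= record["latitude"] <= 90:
        errors.append("latitude is missing, non-numeric or out of range")

    # Format check: postal code expected as a 6-digit string (e.g. an Indian PIN code)
    if not re.fullmatch(r"\d{6}", str(record.get("postal_code", ""))):
        errors.append("postal code is not a valid 6-digit code")

    # Consistency check: delivery date must not precede shipment date
    if record.get("delivered_on") and record.get("shipped_on"):
        if record["delivered_on"] < record["shipped_on"]:
            errors.append("delivery date is earlier than shipment date")

    return errors

record = {"latitude": 22.57, "postal_code": "700016",
          "shipped_on": date(2023, 1, 5), "delivered_on": date(2023, 1, 3)}
print(validate_record(record))
```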
Consider the case of a business that collects data on its outlets but neglects to do an appropriate postal code
verification. The error might make it more challenging to utilise the data for information and business analytics.
Several issues may arise if the postal code is not supplied or is typed incorrectly.


In certain mapping tools, defining the location of the shop might be challenging. A store’s postal code will also
facilitate the generation of neighborhood-specific data. Without a postal code data verification, it is more probable
that data may lose its value. If the data needs to be recollected or the postal code needs to be manually input, further
expenses will also be incurred.
A straightforward solution to the issue would be to provide a check that guarantees a valid postal code is entered.
The solution may be a drop-down menu or an auto-complete form that enables the user to select a valid postal code
from a list. This kind of data validation is referred to as a code validation or code check.

Solved Case 1
Maitreyee is working as a data analyst with a financial organisation. She is supplied with a large amount of
data, and she plans to use statistical techniques for inferring some useful information and knowledge from it. But,
before starting the process of data analysis, she found that the provided data is not cleaned. She knows that before
applying the data analysis tools, cleaning the data is essential.
In your opinion, what steps should Maitreyee follow to clean the data, and what are the benefits of clean data?
Teaching note - outline for solution:
The instructor may initiate the discussion by explaining the concept of data cleaning and its importance.
The instructor may also elaborate on the consequences of using an uncleaned dataset on the final analysis, and may discuss the five steps of data cleaning in detail, such as:
(i) Removal of duplicate and irrelevant information
(ii) Fix structural errors
(iii) Filter unwanted outliers
(iv) Handle missing data
(v) Validation and QA
At the outset, Maitreyee should focus on answering the following questions:
(a) Does the data make sense?
(b) Does the data adhere to the regulations applicable to its field?
(c) Does it verify or contradict your working hypothesis, or does it shed any light on it?
(d) Can data patterns assist you in formulating your next theory?
(e) If not, is this due to an issue with data quality?
The instructor may close the discussion by explaining the benefits of using clean data, which exhibits characteristics such as:
(i) Validity
(ii) Accuracy
(iii) Completeness
(iv) Consistency


Exercise
A. Theoretical Questions:
~~ Multiple Choice Questions

1. Data science plays an important role in


(a) Risk analytics
(b) Customer data management
(c) Consumer analytics
(d) All of the above
2. The primary benefit of data distribution is
(a) the estimation of the probability of any certain observation within a sample space
(b) the estimation of the probability of any certain observation within a non-sample space
(c) the estimation of the probability of any certain observation within a population
(d) the estimation of the probability of any certain observation without a non-sample space
3. Binomial distribution applies to attributes
(a) that are categorised into two mutually exclusive and exhaustive classes
(b) that are categorised into three mutually exclusive and exhaustive classes
(c) that are categorised into less than two mutually exclusive and exhaustive classes
(d) that are categorised into four mutually exclusive and exhaustive classes
4. The geometric distribution is a discrete distribution that assesses
(a) the probability of the occurrence of the first success
(b) the probability of the occurrence of the second success
(c) the probability of the occurrence of the third success
(d) the probability of the occurrence of the less success
5. The probability density function describes
(a) the characteristics of a random variable
(b) the characteristics of a non-random variable
(c) the characteristics of a random constant
(d) the characteristics of a non-random constant
Answer:

1 d 2 a 3 a 4 a 5 a

~~ State True or False

1. Data validation could be operationally defined as a process which ensures the correspondence of the
final (published) data with a number of quality characteristics.
2. Data analysis is described as the process of cleaning, converting, and modelling data to obtain actionable
business intelligence.


3. Financial data such as revenues, accounts receivable, and net profits are often summarised in a company’s
data reporting.
4. Structured data consists of tabular information that may be readily imported into a database and then
utilised by analytics software or other applications.
5. Data distribution is a function that identifies and quantifies all potential values for a variable, as well as
their relative frequency (probability of how often they occur).
Answer:

1 T 2 T 3 T 4 T 5 T

~~ Fill in the blanks

1. Data may be classified as Restricted, ________ or Public by an entity.


2. Data organisation is the _____________ of unstructured data into distinct groups.
3. Classification, frequency distribution tables, _______________, graphical representations, etc. are
examples of data organisation techniques.
4. The t distribution or student's t distribution is a probability distribution with a bell shape that is symmetrical about its ______.
5. _________ is the process of correcting or deleting inaccurate, corrupted, improperly formatted, duplicate,
or insufficient data from a dataset.
Answer:
1 Private 2 Classification
3 Image representation 4 mean
5 Data cleaning

~~ Short essay type questions

1. Briefly discuss the role of data analysis in fraud detection


2. Discuss the difference between discrete distribution and continuous distribution.
3. Write a short note on binomial distribution
4. What is the significance of data cleaning?
5. Write a short note on ‘predictive analytics’.

~~ Essay type questions

1. Elaborately discuss the functions of data analysis.


2. Elaborately discuss the various steps involved in data cleaning.
3. Discuss the benefits of ‘data cleaning’.
4. How are data processing and data science relevant for finance?
5. Discuss the steps for effective data classification.


Unsolved Case(s)
1. Arjun is a data analyst working with Akansha Limited. The company deals in retailing of FMCG products. The company follows both the online and the offline mode for delivering its services. Over the years, the company has accumulated a huge amount of data.
The management is a little puzzled about the ways in which this data may be brought into a usable format. Arjun is entrusted with the responsibility of bringing the data into a usable format. Suggest how this is to be done.


References:
●● Davy Cielen, Arno D. B. Meysman, and Mohamed Ali. Introducing Data Science. Manning Publications Co., USA
●● Cathy O'Neil and Rachel Schutt. Doing Data Science. O'Reilly
●● Joel Grus. Data Science from Scratch. O'Reilly
●● www.tableau.com
●● www.corporatefinanceinstitute.com
●● Tyler McClain. Data Analysis and Reporting. OrgSync
●● Marco Di Zio, Nadežda Fursova, Tjalling Gelsema, Sarah Gießing, Ugo Guarnera, Jūratė Petrauskienė, Lucas Quensel-von Kalben, Mauro Scanu, K.O. ten Bosch, Mark van der Loo, Katrin Walsdorfer. Methodology of Data Validation
●● Barbara S. Hawkins and Stephen W. Singer. Design, development and implementation of data processing system for multiple control trials and epidemiologic studies. Controlled Clinical Trials (1986)


10
Data Presentation: Visualisation and Graphical Presentation
This Module Includes
10.1 Data Visualisation of Financial and Non-Financial Data
10.2 Objective and Function of Data Presentation
10.3 Data Presentation Architecture
10.4 Dashboard, Graphs, Diagrams, Tables, Report Design
10.5 Tools and Techniques of Visualisation and Graphical Presentation


Data Presentation: Visualisation and Graphical Presentation
SLOB Mapped against the Module:
To equip oneself with application-oriented knowledge in data preparation, data presentation and finally data
analysis and modelling to facilitate quality business decisions.

Module Learning Objectives:


After studying this module, the students will be able to –
~~ Understand the basic concepts of developments of data presentation
~~ Understand the basic objectives and functions of data Visualisation
~~ Understand the basic concepts of data presentation architecture (DPA)
~~ Understand the basic tools available for data Visualisation and presentation


10.1 Data Visualisation of Financial and Non-Financial Data
There is a saying 'A picture speaks a thousand words'. Numerous sources of in-depth data are now available
to management teams, allowing them to better track and anticipate organisational performance. However,
obtaining data and presenting it are two distinct and equally essential activities.
Data visualisation comes into play at this point. Recent studies reveal that top-performing finance directors are
more likely than their peers to emphasise data visualisation abilities.
The capacity to explain complicated ideas, identify informational linkages, and provide captivating narratives
resulting from data not only elevates finance’s position in strategic decision making, but also democratises data
throughout the business (Figure 10.1).

Figure 10.1: Data Visualisation in finance. (Source: www.sfmagazine.com)

Why is data Visualisation important?


Scott Berinato, senior editor and data visualisation specialist for Harvard Business Review, writes in a recent post
that data visualisation was once a talent largely reserved for design- and data-minded managers. Today, he deems
it indispensable for managers who wish to comprehend and communicate the significance of the data flood we are
all experiencing.


This is particularly true for finance, which is becoming the data hub of the majority of progressive enterprises.
David A.J. Axson of Accenture highlights in his paper “Finance 2020: Death by Digital” that finance is transitioning
from “an expenditure control, spreadsheet-driven accounting and reporting centre” to “a predictive analytics
powerhouse that generates business value.”
Finance is able to communicate these analytic findings to the entire business through the use of data visualisation.
Several studies indicate that sixty five percent of individuals are visual learners. Giving decision makers an
opportunity to have visual representations of facts improves comprehension and can eventually lead to better
judgments (Figure 10.2).
In addition, the technique of developing data visualisations may aid finance in identifying more patterns and
gaining deeper insights, particularly when many data sources or interactive elements are utilised. For example,
contemporary finance professionals frequently monitor both financial and non-financial KPIs. Data visualisation
may assist in correlating these variables, revealing relationships, and elucidating the actions required to enhance
performance.

Doing data Visualisation in the right way


Not all data visualisation is equally engaging. When properly executed, it simplifies difficult topics.
However, if data visualisations are executed improperly, they might mislead the audience or misrepresent the data.
Finance professionals who are investigating how data visualisation might help their analytics efforts and
communication should keep the following in mind:
●● Know the objective: Before developing great visuals, one must first grasp the objectives. HBR's Berinato suggests first establishing whether the information is conceptual or data-driven (i.e., does it rely on qualitative or quantitative data?). Then specify whether the objective is exploratory or declarative. For instance, if the objective is to display the income from the prior quarter, the goal is declarative. If, on the other hand, one is curious as to whether the income increase correlates with the social media spending, the objective is exploratory. According to Berinato, determining these answers will assist in determining the tools and formats required.
●● Always keep the audience in mind: Who views the data visualisations will determine the degree of
detail required. For instance, finance data presentations for the C-suite require high-level, highly relevant
information to aid in strategic decision-making. However, if one is delivering a presentation to ‘line of
business’ executives, delving into the deeper details might offer them with knowledge that influences their
daily operations.
●● Invest in the best technology: There are a multitude of technological tools that make it simple to produce
engaging visualisations in the current digital age. The firm should first implement an ERP that removes data
silos and develops a centralised information repository. Then, look for tools that allow users to instantly display data by dragging and dropping assets, charts, and graphs; offer search options and guided navigation to assist in answering queries; and enable any member of the finance team to generate graphics.
●● Improve the team’s ability to visualise data: Most of the agile finance directors rank their team’s data
visualisation abilities as good, compared to only twenty four percent of their counterparts, according to an
AICPA survey. While everyone on the finance team can understand the fundamentals of data visualisation,
training and a shift in hiring priorities may advance the team’s data visualisation skills. Find ways
to incorporate user training on data visualisation tools, so that the staff is aware of the options that the
technology affords. Additionally, when recruiting, look for individuals with proficiency in data analytics and extensive data visualisation experience.


The amount of data analysed by financial teams has grown dramatically. Data visualisations may help the team
convey its strategic findings more effectively throughout the enterprise.

Figure 10.2: Sample dashboard for financial results (Source: www.sfmagazine.com)


10.2 Objective and Function of Data Presentation

The absence of data visualisation would make it difficult for organisations to immediately recognise data patterns. The graphical depiction of data sets enables analysts to visualise new concepts and patterns. With the daily increase in data volume, it is hard to make sense of the quintillions of bytes of data generated without techniques such as data visualisation.
Every company may benefit from a better knowledge of their data, hence data visualisation is expanding into
all industries where data exists. Information is the most crucial asset for every organisation. Through the use of
visuals, one may effectively communicate their ideas and make use of the information.
Dashboards, graphs, infographics, maps, charts, videos, and slides may all be used to visualise and comprehend
data. Visualizing the data enables decision-makers to interrelate the data to gain better insights and capitalises on
the following objectives of data visualisation:
●● Making a better data analysis:
Analysing reports assists company stakeholders in focusing their attention on the areas that require it. The
visual mediums aid analysts in comprehending the essential business issues. Whether it is a sales report or a
marketing plan, a visual representation of data assists businesses in increasing their profits through improved
analysis and business choices.
●● Faster decision making:
Visuals are easier for humans to process than tiresome tabular forms or reports. If the data is effectively
communicated, decision-makers may act swiftly on the basis of fresh data insights, accelerating both decision-making and corporate growth.
●● Analysing complicated data:
Data visualisation enables business users to obtain comprehension of their large quantities of data. It is
advantageous for them to identify new data trends and faults. Understanding these patterns enables users to
focus on regions that suggest red flags or progress. In turn, this process propels the firm forward.
The objective of data visualisation is rather obvious. It is to interpret the data and apply the information for the
advantage of the organisation. Its value increases as it is displayed. Without visualisation, it is difficult to rapidly
explain data discoveries, recognise trends to extract insights, and engage with data fluidly.
Without visualisation, data scientists won’t be able to see trends or flaws. Nonetheless, it is essential to effectively
explain data discoveries and extract vital information from them. And interactive data visualisation tools make all
the difference in this regard.
The recent pandemic is a topical example: data visualisation assists specialists in remaining informed and composed despite the sheer volume of data.


(i) Data visualisation enhances the effect of communications for the audiences and delivers the most convincing
data analysis outcomes. It unites the organisation’s communications systems across all organisations and
fields.
(ii) Visualisation allows to interpret large volumes of data more quickly and effectively at a glance. It facilitates
a better understanding of the data for measuring its impact on the business and graphically communicates
the knowledge to internal and external audiences.
(iii) One cannot make decisions in a vacuum. Data and insights available to decision-makers facilitate decision
analysis. Unbiased data devoid of mistakes enables access to the appropriate information and visualisation
to convey and maintain the relevance of that information.
According to an article published by Harvard Business Review (HBR), the most common errors that analysts make, and which render a data visualisation unsuccessful, relate to the following areas:
●● Understanding the audience:
As mentioned earlier, before incorporating the data into visualisation, the objective should be fixed, which is
to present large volumes of information in a way that decision-makers can readily ingest. A great visualisation
relies on the designer comprehending the intended audience and executing on three essential points:
(i) Who will read and understand the material, and how will they do so? Can it be presumed that the audience understands the words and ideas employed, or is there a need to provide visual cues (e.g., a green arrow indicating that an upward movement is good)? A specialist audience will have different expectations than the broader public.
(ii) What are the expectations of the audience, and what information is most beneficial to them?
(iii) What is the functional role of the visualisation, and how may users take action based on it? A
visualisation that is exploratory should leave viewers with questions to investigate, but visualisations
that are instructional or confirmatory should not.
●● Setting up a clear framework
The designer must guarantee that all viewers have the same understanding of what the visualisation represents.
To do this, the designer must establish a framework consisting of the semantics and syntax within which the
data information is intended to be understood. The semantics pertain to the meaning of the words and images
employed, whereas the syntax is concerned with the form of the communication. For instance, when utilising
an icon, the element should resemble the object it symbolises, with size, colour, and placement all conveying
significance to the viewer.
Lines and bars are basic, schematic geometric forms that are important to several types of visualisations; lines
join, implying a relationship. On the other hand, bars confine and divide. In experiments, when participants
were asked to analyse an unlabeled line or bar graph, they viewed lines as trends and bars as discrete
relations, even when these interpretations were inconsistent with the nature of the underlying data.
There is one more component to the framework: Ensure that the data is clean and that the analyst understands
its peculiarities before doing anything else. Does the data set have outliers? How is it allocated? Where
does the data contain holes? Are there any assumptions regarding the data? Real-world data is frequently
complicated, of varied sorts and origins, and not necessarily dependable. Understanding the data can assist
the analyst in selecting and employing an effective framework.
●● Telling a story
In its instructional or positive role, visualisation is a dynamic type of persuasion. There are few kinds of
communication as convincing as a good story. To do this, the visualisation must give the viewer a story.
Stories bundle information into a framework that is readily recalled, which is crucial in many collaborative circumstances in which the analyst is not the same person as the decision-maker or just has to share knowledge
with peers. Data visualisation lends itself nicely to becoming a narrative medium, particularly when the tale
comprises a large amount of data.
Storytelling assists the audience in gaining understanding from facts. Information visualisation is a technique
that turns data and knowledge into a form that is perceivable by the human visual system. The objective is
to enable the audience to see, comprehend, and interpret the information. Design strategies that favour
specific interpretations in visuals that “tell a narrative” can have a substantial impact on the interpretation of
the end user.
In order to comprehend the data and connect with the Visualisation’s audience, creators of visualisations
must delve deeply into the information. Good designers understand not only how to select the appropriate
graph and data range, but also how to create an engaging story through the visualisation.


10.3 Data Presentation Architecture

Data presentation architecture (DPA) is a set of skills that aims to identify, find, modify, format, and
present data in a manner that ideally conveys meaning and provides insight. According to Kelly Lautt,
“data Presentation Architecture (DPA) is a rarely applied skill set critical for the success and value of
Business Intelligence. Data presentation architecture weds the science of numbers, data and statistics
in discovering valuable information from data and making it usable, relevant and actionable with the arts of data
Visualisation, communications, organisational psychology and change management in order to provide business
intelligence solutions with the data scope, delivery timing, format and Visualisations that will most effectively
support and drive operational, tactical and strategic behaviour toward understood business (or organisational)
goals. DPA is neither an IT nor a business skill set but exists as a separate field of expertise. Often confused with
data Visualisation, data presentation architecture is a much broader skill set that includes determining what data
on what schedule and in what exact format is to be presented, not just the best way to present data that has already
been chosen (which is data Visualisation). Data Visualisation skills are one element of DPA.”

Objectives
There are following objectives of DPA:
(i) Utilize data to impart information in the most efficient method feasible (provide pertinent, timely and
comprehensive data to each audience participant in a clear and reasonable manner that conveys important
meaning, is actionable and can affect understanding, behaviour and decisions).
(ii) To utilise data to deliver information as effectively as feasible (minimise noise, complexity, and unneeded
data or detail based on the demands and tasks of each audience).

Scope of DPA
In the light of abovementioned objectives, the scope of DPA may be defined as:
(i) Defining significant meaning (relevant information) required by each audience member in every scenario.
(ii) Obtaining the proper data (focus area, historic reach, extensiveness, level of detail, etc.)
(iii) Determining the needed frequency of data refreshes (the currency of the data)
(iv) Determining the optimal presentation moment (the frequency with which the user needs to view the data)
(v) Using suitable analysis, categorization, visualisation, and other display styles
(vi) Developing appropriate delivery techniques for each audience member based on their job, duties, locations,
and technological access


10.4 Dashboard, Graphs, Diagrams, Tables, Report Design

Data visualisation is the visual depiction of data and information. Through the use of visual elements
like dashboards, charts, graphs, and maps etc, data visualisation tools facilitate the identification and
comprehension of trends, outliers, and patterns in data.

10.4.1 Dashboard
A data visualisation dashboard (Figure 10.3) is an interactive dashboard that enables users to manage important metrics
across numerous financial channels, visualise the data points, and generate reports for customers that summarise
the results.
Creating reports for your audience is one of the most effective means of establishing a strong working relationship
with them. Using an interactive data dashboard, the audience would be able to view the performance of their
company at a glance.
In addition to having all the data in a single dashboard, a data visualisation dashboard helps to explain what the
company is doing and why, fosters client relationships, and provides a data set to guide decision-making.
There are numerous levels of dashboards, ranging from those that represent metrics vital to the firm as a whole
to those that measure values vital to teams inside an organisation. For a dashboard to be helpful, it must be
automatically or routinely updated to reflect the present condition of affairs.

Figure 10.3: A sample dashboard showing traveller spend analysis using Tableau (Source: https://www.tableau.com/)


10.4.2 Graph, Diagram and Charts


Henry D. Hubbard, Creator of the Periodic Table of Elements once said, “There is magic in graphs. The profile
of a curve reveals in a flash a whole situation — the life history of an epidemic, a panic, or an era of prosperity.
The curve informs the mind, awakens the imagination, convinces.” Few important and widely used graphs are
mentioned below:
(i) Bar Chart:
Bar graphs are one of the most used types of data visualisation. It may be used to easily compare data
across categories, highlight discrepancies, demonstrate trends and outliers, and illustrate historical highs
and lows. Bar graphs are very useful when the data can be divided into distinct categories. For instance,
the revenue earned in different years, the number of car models produced in a year by an automobile
company, change in economic value added over the years (Figure 10.4) etc.
To add a zing, the bars can be made colourful. Using stacked and side-by-side bar charts, one may further
dissect the data for a more in-depth examination.

Figure 10.4: Bar chart showing the change in EVA for Hindustan Unilever Ltd. (Source: HUL annual
report for the year 2021-22)
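
For readers who wish to experiment, the following is a minimal sketch of how such a bar chart could be produced in Python using the matplotlib library. The year labels and revenue figures are hypothetical and serve only to illustrate comparing values across discrete categories.

# A minimal sketch, assuming Python with matplotlib installed; the
# revenue figures (in ₹ crore) are hypothetical, purely for illustration.
import matplotlib.pyplot as plt

years = ["2019", "2020", "2021", "2022"]
revenue = [120, 135, 150, 180]            # hypothetical values

plt.bar(years, revenue, color="steelblue")   # one bar per category
plt.xlabel("Year")
plt.ylabel("Revenue (₹ crore)")
plt.title("Revenue earned in different years")
plt.show()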


(ii) Line chart:


The line chart or line graph joins various data points, displaying them as a continuous progression. Utilize
line charts to observe trends in data, often over time (such as stock price fluctuations over five years or
monthly website page visits). The outcome is a basic, simple method for representing changes in one value
relative to another.
For a better visual impact, the area under the line may be shaded. Also, if feasible, the line graph may be
presented in combination with a bar chart.

Figure 10.5: Line graph: The movement of HUL share price over time (Source: HUL annual report for the
year 2021-22)


(iii) Pie Chart


A pie chart (or circle chart) is a circular graphical representation of statistical data that is segmented to
demonstrate numerical proportion. In a pie chart, the arc length of each slice (and, by extension, its centre
angle and area) is proportionate to the value it depicts. Although it is named for its resemblance to a sliced pie,
it can be presented in a number of other ways. The corporate world and the mass media make extensive use
of pie charts. For better readability, the number of wedges in a pie chart should be kept within a limit. The
categories of HUL shareholders are shown in Figure 10.6 below.

Figure 10.6: Pie Chart - Categories of HUL shareholders as on 31st March 2022 (Source: HUL annual
report for the year 2021-22)
(iv) Map:
For displaying any type of location data, including postal codes, state abbreviations, country names, and
custom geocoding, maps are a natural choice. If the data is associated with geographic information, maps are a
simple and effective approach to illustrate the relationship.
There should be a correlation between location and the patterns in the data, such as insurance claims by
state, product export destinations by country, automobile accidents by postal code, and custom sales areas.


Figure 10.7: Map - Real GDP: Percent Change at Annual Rate (Source: U.S. Bureau of Economic Analysis)


(v) Density map:


Density maps indicate patterns or relative concentrations that might otherwise be obscured by overlapping
marks on a map, allowing viewers to identify areas with a larger or smaller number of data points. Density maps are
particularly useful when dealing with large data sets including several data points in a limited geographic
region. Figure 10.8 shows the Cyclone hazard prone districts of India through a density map.

Figure 10.8: Cyclone hazard prone districts of India considering all the parameters and wind based on
BMTPC Atlas (Source: www.ndma.gov.in)
(vi) Scatter plots
Scatter plots are a useful tool for examining the connection between many variables, revealing whether one
variable is a good predictor of another or whether they tend to vary independently. A scatter plot displays
several unique data points on a single graph.


Figure 10.9: Grouped Scatterplot - Number of holdings and Irrigated area in Andhra Pradesh. (Source:
indiadataportal.com)
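
As a simple illustration of the idea, the Python sketch below plots two hypothetical variables (advertising spend and sales) so the reader can visually judge whether one appears to predict the other; all names and numbers are assumptions made for the example.

# A minimal sketch, assuming Python with matplotlib; the two variables
# and their values are hypothetical.
import matplotlib.pyplot as plt

ad_spend = [2, 4, 5, 7, 9, 11, 12, 15]        # hypothetical ₹ lakh
sales = [20, 30, 32, 45, 55, 60, 66, 80]      # hypothetical ₹ lakh

plt.scatter(ad_spend, sales)                   # one point per observation
plt.xlabel("Advertising spend (₹ lakh)")
plt.ylabel("Sales (₹ lakh)")
plt.title("Scatter plot: advertising spend vs sales")
plt.show()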
(vii) Gantt Chart
Gantt charts represent a project’s timeline or activity changes across time. A Gantt chart depicts tasks that
must be accomplished before others may begin, as well as the allocation of resources. However, Gantt
charts are not restricted to projects. This graphic can depict any data connected to a time series. Figure
10.10 depicts the Gantt chart of a project.

Figure 10.10: Gantt Chart for a project (Source: Wikipedia)


(viii) Bubble chart


Although bubbles are not exactly their own sort of visualisation, utilising them as a method enhances
scatter plots and maps that illustrate the link between three or more variables. By varying the size and
colour of circles, bubble charts display large amounts of data in an aesthetically engaging manner. Figure
10.11 shows a bubble chart of the proportions of professions of people who created programming languages.

Figure 10.11: Bubble chart: the proportions of professions of people who generate programming languages
(Source: Wikipedia)


(ix) Histogram
Histograms illustrate the distribution of the data among various groups. Histograms divide data into
discrete categories (sometimes known as “bins”) and provide a bar proportionate to the number of entries
inside each category. This chart type might be used to show, for example, the number of people falling within
each age band. Figure 10.12 shows a sample histogram of frequency by age group.

Figure 10.12: Histogram showing frequency by age group (x-axis: Age, with bins 0, 5, 10, 15, 20, More; y-axis: Frequency)
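
A short Python sketch of such a histogram is given below; the ages are hypothetical observations and the bin edges roughly mirror those in Figure 10.12.

# A minimal sketch, assuming Python with matplotlib; the data are hypothetical.
import matplotlib.pyplot as plt

ages = [3, 4, 7, 8, 9, 11, 12, 12, 14, 16, 17, 18, 19, 22]   # hypothetical data
bins = [0, 5, 10, 15, 20, 25]        # bin edges: 0-5, 5-10, 10-15, 15-20, 20-25

plt.hist(ages, bins=bins, edgecolor="black")   # bar height = entries per bin
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.title("Histogram of frequency by age group")
plt.show()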

10.4.3 Tables
Tables, often known as “crosstabs” or “matrices,” emphasise individual values above aesthetic formatting. They
are one of the most prevalent methods for showing data and, thus, one of the most essential methods for analysing
data. While their focus is not intrinsically visual, as reading numbers is a linguistic exercise, visual features may be
added to tables to make them more effective and simpler to assimilate.
Tables are most frequently encountered on websites, as part of restaurant menus, and within Microsoft Excel.
It is crucial to know how to interpret tables and make the most of the information they provide since they are
ubiquitous. It is also crucial for analysts and knowledge workers to learn how to make information easier for their
audience to comprehend.

How to use a table?


Similar to most graphs, a table arranges data along two axes: the rows and the columns. Because tables are read, it is
customary to display categories along the rows, while the columns depict the values within each metric, with clearly
labelled column headers indicating their significance. In contrast to the majority of
charts, tables may arrange qualitative data and show their linkages.
Analysts typically utilise tables to view specific values. They facilitate the identification of measurements or
dimensions across a set of intervals (e.g., what was the company's profit in November 2018, or how many
sales did each person close in 2019). A summary table may also efficiently summarise a huge data collection by
providing subtotals and grand totals for each interval or dimension. The problem with tables is that they scale
poorly. More than ten to fifteen rows and five columns make the table difficult to read, comprehend, and get insight
from. This is because a table engages the brain’s linguistic systems whereas data visualisation excites the brain’s
visual systems.
Adding visual features to the table will allow users to obtain understanding from the data more quickly than
with a simple table. Gradients of colour and size aid in identifying trends and outliers. Icons assist the observer
in recognising a shift in proportions. Using different markings will highlight relationships more effectively than a
table of raw data.
Tables and crosstabs are handy for doing comparative analysis between certain data points. They are simple
to construct and may effectively convey a single essential message. Before including a crosstab into a data
visualisation, one should assess whether it serves the project’s aims. Figure 10.13 shows a sample Visualisation of
tabular data.

Figure 10.13: Visualisation of a tabular data (Source: https://vdl.sci.utah.edu)
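
To make the idea of a summary table or crosstab concrete, the sketch below uses Python's pandas library; the sales records and column names are hypothetical and exist only to show how a crosstab with subtotals can be built.

# A minimal sketch, assuming Python with pandas; the data are hypothetical.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "North"],
    "product": ["A", "B", "A", "B", "A"],
    "amount":  [100, 150, 200, 130, 90],
})

# Crosstab of total amount by region and product, with grand totals
summary = pd.pivot_table(sales, values="amount", index="region",
                         columns="product", aggfunc="sum",
                         margins=True, margins_name="Total")
print(summary)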

10.4.4 Report design using data Visualisation


After producing a report, the last thing one wants is for nobody to actually read it. Whether conveying
ideas or seeking help, the information must leave an impression. To do this, one must present the report in a style
that is both attractive and simple to comprehend. This is especially true if the report layout includes numbers.


How to use data Visualisation in report design?


There are a few strategic steps to include data Visualisation in report design, as mentioned below:
●● Find a story in the data
Data-driven storytelling is a powerful tool. Finding a story that connects with the reader can help to create
an effective report. It is also not as hard as it looks. In order to locate the story, one must arrange the data,
identify any missing numbers, and then check for outliers. One may then view the data and examine the link
between factors.
●● Create a narrative
When some individuals hear the term “data storytelling,” they believe that it consists of a few statistics and
that the task is complete. This is a frequent misconception. Strong data storytelling comprises an
engaging narrative that takes the audience through the facts and aids in their comprehension. Moreover, an
explanation of the significance of these ideas is essential. To compose an excellent story, one must:
(i) Engage the viewer with a catchy title and subheadings.
(ii) Incorporate context into the data.
(iii) Create a consistent and logical flow.
(iv) Highlight significant discoveries and insights from the data.

●● Choose the most suitable data Visualisation


Data Visualisation is not limited to the creation of charts and graphs. It involves presenting the facts in the
most comprehensible chart possible. Applying basic design principles and utilising features like as form,
size, colour, and labelling may have a significant impact on how people comprehend the data. For instance,
deciding the optimal number of slices for a pie chart or the space between bars in a bar graph. Knowing these
tips may greatly improve the data visualisations.
●● Follow the visual language
The report design may be for internal or external consumption. Despite this, one should develop material
consistent with the company’s style guide. It is essential to adhere to data visualisation principles in order to
achieve both uniformity and comprehension. A strategic methodology assists in implementation.
●● Publicize the report
Some reports are not intended for public consumption. However, since they include so much essential
information, they may contain knowledge that is of interest to individuals or media outside of the business.


10.5 Tools and Techniques of Visualisation and Graphical Presentation

We will now examine some of the most successful data visualisation tools for data scientists and how they
may boost their productivity. Here are four popular data visualisation tools that may assist data scientists
in making more compelling presentations.
(i) Tableau
Tableau is a data visualisation application for creating interactive graphs, charts, and maps. It enables one
to connect to many data sources and generate visualisations in minutes.
Tableau Desktop is the flagship product. It is designed to produce interactive visualisations, charts and maps that may be
published on one or more web pages.
Tableau Public is a free version of Tableau Desktop with some restrictions.
It takes time and effort to understand Tableau, but there are several resources available to assist in doing so. For a
data scientist, Tableau is one of the most important tools to understand and employ on a daily basis.
The application may be accessed through https://www.tableau.com/ (Figure 10.14)

Figure 10.14: Tableau website - https://www.tableau.com/


(ii) Microsoft Power BI


Microsoft Power BI is a data visualisation tool for business intelligence data. Reporting, self-service
analytics, and predictive analytics are supported. In addition, it provides a platform for end users to generate
reports and share insights with others inside their business. It serves as a centralized repository for all of
the business data, which all of the business users can access. Through such linkages, the prepared reports
may be shared inside the organisation, making it a crucial tool for businesses seeking a consolidated
data reporting system. The application may be accessed through https://powerbi.microsoft.com/en-au/
(Figure 10.15)

Figure 10.15: Microsoft Power BI website - https://powerbi.microsoft.com/en-au/
(iii) Microsoft Excel
Microsoft Excel is a data Visualisation tool with an intuitive interface, so it is not necessarily difficult to
use.
Excel provides several options for viewing data, such as scatter plots, bar charts, histograms, pie charts, line
charts, and treemaps. Using these techniques, one may illustrate the relationship between two or more
datasets that one wishes to compare. One may also examine the relationships between variables to discover
whether they are connected or not.
Numerous data analysts utilize techniques in MS Excel to examine statistical, scientific, medical, and
economic data for market research and financial planning, among other applications.
The application may be accessed through https://www.microsoft.com/en-in/microsoft-365/excel
(Figure 10.16)


Figure 10.16: Microsoft Excel website - https://www.microsoft.com/en-in/microsoft-365/excel

(iv) QlikView
QlikView is a data discovery platform that enables users to make quicker, more informed choices by
speeding analytics, uncovering new business insights, and enhancing the precision of outcomes.
It is an easy-to-use platform that has been utilised by enterprises worldwide for many years. It can combine
diverse data sources with colour-coded tables, bar graphs, line graphs, pie charts, and sliders.
It has been designed using a “drag and drop” Visualisation interface, allowing users to input data from a
variety of sources, including databases and spreadsheets, without having to write code. These properties
also make it a reasonably easy-to-learn and -understand instrument. The application may be accessed
through https://www.qlik.com/us/products/qlikview (Figure 10.17)

Figure 10.17: QlikView - https://www.qlik.com/us/products/qlikview


Solved Case 1
Sutapa is working as an analyst with SN Company Limited. She is entrusted with the responsibility of making a
presentation before the senior management. She knows that data Visualisation is an important tool for presentation,
and a good data Visualisation can make her presentation more effective. However, she is not very sure about the
data visualisation tools, that are available.
What are the important data Visualisation tools available that Sutapa may use for an effective and impressive
presentation.
Teaching note - outline for solution:
The instructor may initiate the discussions by explaining the importance of data Visualisation. She may also
discuss the objectives of data Visualisation:
(i) Making a better data analysis:
(ii) Faster decision making
(iii) Analysing complicated data
For an effective data Visualisation, the presenter should keep certain important issues in mind:
(i) Know the objective
(ii) Always keep the audience in mind
(iii) Invest in the best technology
(iv) Improve the team’s ability to visualise data
There are various tools available for data Visualisation. The instructor may extend the discussion by mentioning
the following tools, and should also explain the suitability of each tool for visualising and presenting the data:
(i) Dashboards
(ii) Bar charts
(iii) Histogram
(iv) Pie chart
(v) Line chart
(vi) Maps
(vii) Gantt chart
(viii) Bubble Chart etc.
One of the major comforting factors is the development of recent software that makes the process of data Visualisation
less painful. The instructor may conclude the discussions with a mention of a few popular software tools, viz:
(i) Microsoft Power Bi
(ii) Tableau
(iii) Microsoft Excel etc


Exercise
A. Theoretical Questions:
~~ Multiple Choice Questions

1. Following is a widely used graph for data Visualisation


(a) Bar chart
(b) Pie chart
(c) Histogram
(d) All of the above
2. Following are the objectives of data visualisation:
(a) Making a better data analysis
(b) Faster decision making
(c) Analysing complicated data
(d) All of the above
3. Following are the scope of DPA
(a) Defining significant meaning (relevant information) required by each audience member in every
scenario.
(b) Obtaining the proper data (focus area, historic reach, extensiveness, level of detail, etc.)
(c) Determining the needed frequency of data refreshes (the currency of the data)
(d) All of the above
4. Maps may be used for displaying
(a) Pincode
(b) Country name
(c) State abbreviation
(d) All of the above
5. A scatter plot displays several unique data points:
(a) on a single graph.
(b) On two different graphs
(c) On four different graphs
(d) None of the above
Answer:

1 d 2 d 3 d 4 d 5 a

~~ State True or False

1. Data visualisation enhances the effect of communications for the audiences and delivers the most
convincing data analysis outcomes.
2. Visualisation allows to interpret large volumes of data more quickly and effectively at a glance.


3. Data presentation architecture (DPA) is a set of skills that aims to identify, find, modify, format, and
present data in a manner that ideally conveys meaning and provides insight.
4. Scatter plots are a useful tool for examining the connection between many variables, revealing whether
one variable is a good predictor of another or whether they tend to vary independently.
5. Gantt charts represent a project’s timeline or activity changes across time.
Answer:

1 T 2 T 3 T 4 T 5 T

~~ Fill in the blanks

1. Data and insights available to decision-makers facilitate _________ analysis.


2. Often confused with data Visualisation, data presentation architecture is a much ______ skill set.
3. A ________________is a circular graphical representation of statistical data that is segmented to
demonstrate numerical proportion.
4. If the data is related with geographic information, _________ are a simple and effective approach to
illustrate the relationship.
5. ____________ indicate patterns or relative concentrations that might otherwise be obscured by
overlapping marks on a map, allowing to identify areas with a larger or lesser number of data points.
Answer:
1 decision 2 broader
3 pie chart (or circle chart) 4 maps
5 Density maps

~~ Short essay type questions

1. State the objectives of Data presentation architecture (DPA).


2. What is the scope of Data presentation architecture (DPA)?
3. Define the concept of data Visualisation dashboard.
4. Write a short note on bar chart
5. Write a short note on density map.

~~ Essay type questions

1. Discuss the ways in which the finance professionals may be helped by data Visualisation in analysing
and reporting information.
2. Discuss the objectives of data Visualisation.
3. How to use data Visualisation in report design?


4. Discuss the different tools for Visualisation and Graphical Presentation


5. Discuss the objectives and scope of data presentation architecture.

Unsolved Case(s)
1. Maitreyee works as a financial analyst with a bank. The departmental meeting with her managing director is
going to happen very soon. Maitreyee is entrusted with the task of preparing a dashboard that will cover the
performance of her department during the past quarter. She wants to prepare the dashboard in such a way
that it does not look cluttered, but at the same time covers all the available information in a visually
pleasing manner.
Discuss the different approaches Maitreyee may adopt to meet her objective.


References:
●● Davy Cielen, Arno D B Meysman, and Mohamed Ali. Introducing Data Science. Manning Publications Co
USA
●● Cathy O’Neil, Rachell Schutt. Doing data science. O’Reilley
●● Joel Grus. Data science from scratch. O’Reilley
●● https://go.oracle.com
●● https://sfmagazine.com
●● https://hbr.org/
●● https://www.tableau.com
●● http://country.eiu.com
●● https://en.wikipedia.org
●● https://vdl.sci.utah.edu
●● https://towardsdatascience.com


Data Analysis and Modelling 11

This Module Includes


11.1 Process, Benefits and Types of Data Analysis
11.2 Data Mining and Implementation of Data Mining
11.3 Analytics and Model Building (Descriptive, Diagnostic, Predictive, Prescriptive)
11.4 Standards for Data Tagging and Reporting (XML, XBRL)
11.5 Cloud Computing, Business Intelligence, Artificial Intelligence, Robotic Process Automation
and Machine Learning
11.6 Model vs. Data-driven Decision-making


Data Analysis and Modelling


SLOB Mapped against the Module:
To equip oneself with application-oriented knowledge in data preparation, data presentation and finally data
analysis and modelling to facilitate quality business decisions.

Module Learning Objectives:


After studying this module, the students will be able to –
~~ Understand the Process, Benefits and Types of Data Analysis
~~ Understand the concepts of Data Mining and Implementation of Data Mining
~~ Understand the concepts of Analytics and Model Building
~~ Understand the Standards for Data Tagging and Reporting
~~ Understand the concepts of Cloud Computing, Business Intelligence, Artificial Intelligence, Robotic
Process Automation and Machine Learning
~~ Understand the relationship between model and data driven decision making.


11.1 Process, Benefits and Types of Data Analysis

Data analytics is the science of evaluating unprocessed datasets to draw conclusions about the information they
contain. It helps us to identify patterns in the raw data and extract useful information from them.
Applications containing machine learning algorithms, simulation, and automated systems may be utilised by data
analytics procedures and methodologies. For human usage, the systems and algorithms process unstructured data.
These data are evaluated and used to assist firms in gaining a deeper understanding of their customers, analysing
their promotional activities, customising their content, developing content strategies, and creating new products.
Data analytics enables businesses to boost market efficiency and increase profits.

11.1.1 Process of data analytics


Following are the steps for data analytics:
~~ Step 1: Criteria for grouping data
Data may be segmented by a variety of parameters, including age, population, income, and sex. The data
values might be either numerical or categorical.
~~ Step 2: Collecting the data
Data may be gathered from several sources, including internet sources, computers, personnel, and community
sources.
~~ Step 3: Organizing the data
After collecting the data, it must be arranged so that it can be analysed. Statistical data can be organised on
a spreadsheet or other programme capable of handling statistical data.
~~ Step 4: Cleaning the data
The data is initially cleansed to verify that there are no duplicates or errors. The document is then examined
to ensure that it is comprehensive. Before data is sent to a data analyst for analysis, it is beneficial to rectify
or eliminate any errors by cleaning the data.
~~ Step 5: Adopt the right type of data analytics process:
There are four types of data analytics process:
(i) Descriptive analytics
(ii) Diagnostics analytics
(iii) Predictive analytics
(iv) Prescriptive analytics
We will discuss more on these types of analytics in section 11.3. A brief sketch of steps 3 to 5 is given below.
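
The following is a minimal sketch, in Python with pandas, of how the organising and cleaning steps above might look in practice. The file name "sales_raw.csv" and the column "amount" are hypothetical, used only to illustrate the flow from raw data to a simple descriptive summary.

# A minimal sketch, assuming Python with pandas; file and column names are hypothetical.
import pandas as pd

df = pd.read_csv("sales_raw.csv")                 # Steps 2-3: collect and organise the data

df = df.drop_duplicates()                         # Step 4: remove duplicate records
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")   # fix wrongly typed entries
df = df.dropna(subset=["amount"])                 # drop rows still missing a key value

print(df.describe())                              # Step 5: simple descriptive analytics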


11.1.2 Benefits of data analytics


Following are the benefits of data analytics:
(i) Improves decision making process
Companies can use the information gained from data analytics to base their decisions, resulting in enhanced
outcomes. Using data analytics significantly reduces the amount of guesswork involved in preparing
marketing plans, deciding what materials to produce, and more. Using advanced data analytics technologies,
firms can continuously collect and analyse new data to gain a deeper understanding of changing circumstances.
(ii) Increase in efficiency of operations
Data analytics assists firms in streamlining their processes, conserving resources, and increasing their
profitability. When firms have a better understanding of their audience’s demands, they spend less time
creating advertising that do not fulfil those needs.
(iii) Improved service to stakeholders
Data analytics provides organisations with a more in-depth understanding of their customers, employees and
other stakeholders. This enables the company to tailor stakeholders' experiences to their needs, provide
more personalization, and build stronger relationships with them.


11.2 Data Mining and Implementation of Data Mining

Data mining, also known as knowledge discovery in data (KDD), is the extraction of patterns and other
useful information from massive data sets. Given the advancement of data warehousing technologies
and the expansion of big data, the use of data mining techniques has advanced dramatically over the
past two decades, supporting businesses in translating their raw data into meaningful information.
Nevertheless, despite the fact that technology is always evolving to manage massive amounts of data, leaders
continue to struggle with scalability and automation.
Through smart data analytics, data mining has enhanced corporate decision making. The data mining techniques
behind these investigations may be categorised into two primary purposes: describing the target dataset or
predicting results using machine learning algorithms. These strategies are used to organise and filter data, bringing
to the surface the most relevant information, including fraud detection, user habits, bottlenecks, and even security
breaches.
When paired with data analytics and visualisation technologies such as Apache Spark, data mining has never been
more accessible and the extraction of valuable insights has never been quicker. Artificial intelligence advancements
continue to accelerate adoption across sectors.

11.2.1 Process of data mining


The process of data mining comprises a series of procedures, from data collecting through visualisation, in order
to extract useful information from massive data sets. As stated previously, data mining techniques are utilised to
develop descriptions and hypotheses on a specific data set. Through their observations of patterns, relationships,
and correlations, data scientists characterise data. In addition to classifying and clustering data using classification
and regression techniques, they discover outliers for use cases such as spam identification.
Data mining typically involves four steps: establishing objectives, acquiring and preparing data, implementing
data mining techniques, and assessing outcomes.
(i) Setting the business objective:
This might be the most difficult element in the data mining process, yet many organisations spend inadequate
effort on it. Together, data scientists and business stakeholders must identify the business challenge, which
informs the data queries and parameters for a specific project. Analysts may also need to conduct further
study to adequately comprehend the company environment.
(ii) Preparation of data:
Once the scale of the problem has been established, it is simpler for data scientists to determine which
collection of data will assist the company in answering crucial questions. Once the pertinent data has been
collected, it will be cleansed by eliminating any noise, such as repetitions, missing numbers, and outliers.


Based on the dataset, an extra step may be taken to reduce the number of dimensions, as an excessive
number of features might slow down any further calculation. Data scientists seek to retain the most
essential predictors to guarantee optimal model accuracy.
(iii) Model building and pattern mining:
Data scientists may study any intriguing relationship between the data, such as frequent patterns, clustering
algorithms, or correlations, depending on the sort of research. While high frequency patterns have larger
applicability, data variations can often be more fascinating, exposing possible fraud areas.
Depending on the available data, deep learning algorithms may also be utilised to categorise or cluster a
data collection. If the input data is marked (i.e. supervised learning), a classification model may be used to
categorise data, or a regression may be employed to forecast the probability of a specific assignment. If the
dataset is unlabeled (i.e. unsupervised learning), the particular data points in the training set are compared to
uncover underlying commonalities, then clustered based on those features.
(iv) Result evaluation and implementation of knowledge:
After aggregating the data, the findings must be analysed and understood. When completing results, they
must be valid, original, practical, and comprehensible. When this criterion is satisfied, companies can execute
new strategies based on this understanding, therefore attaining their intended goals.

11.2.2 Techniques of data mining


Using various methods and approaches, data mining transforms vast quantities of data into valuable information.
Here are a few of the most prevalent:
(i) Association rules:
An association rule is a rule-based technique for discovering associations between variables inside a given
dataset. These methodologies are commonly employed for market basket analysis, enabling businesses to
better comprehend the linkages between various items. Understanding client consumption patterns helps
organisations to create more effective cross-selling tactics and recommendation engines.
(ii) Neural Networks:
Primarily utilised for deep learning algorithms, neural networks replicate the interconnection of the human
brain through layers of nodes to process training data. Every node has inputs, weights, a bias (or threshold),
as well as an output. If the output value exceeds a predetermined threshold, the node “fires” and passes data
to the subsequent network layer. Neural networks acquire this mapping function by supervised learning and
gradient descent, changing based on the loss function. When the cost function is zero or close to it, we may
have confidence in the model’s ability to produce the correct answer.
(iii) Decision tree:
Using classification or regression algorithms, this data mining methodology classifies or predicts likely
outcomes based on a collection of decisions. As its name implies, it employs a tree-like representation to
depict the potential results of these actions.
(iv) K-nearest neighbour:
K-nearest neighbour, often known as the KNN algorithm, classifies data points depending on their closeness
to and correlation with other accessible data. This technique assumes that comparable data points exist in
close proximity to one another. Consequently, it attempts to measure the distance between data points, often
by Euclidean distance, and then assigns a label based on the most common category (for classification) or the
average value (for regression) among the nearest neighbours.
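
To make the distance-and-vote mechanics of KNN concrete, the short pure-Python sketch below classifies a new observation by Euclidean distance and majority vote among its k nearest neighbours. The feature values and risk labels are hypothetical.

# A minimal sketch of the k-nearest neighbour idea in plain Python;
# the training points and labels are hypothetical.
import math
from collections import Counter

train = [((2.0, 4.0), "low risk"), ((3.0, 5.0), "low risk"),
         ((8.0, 9.0), "high risk"), ((9.0, 7.0), "high risk")]

def knn_predict(new_point, training_data, k=3):
    # Euclidean distance from the new point to every training point
    distances = [(math.dist(new_point, point), label)
                 for point, label in training_data]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Assign the most common category among the k nearest neighbours
    return Counter(nearest_labels).most_common(1)[0][0]

print(knn_predict((7.5, 8.0), train))   # expected to print "high risk"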


11.2.3 Implementation of data mining in Finance and management


The widespread use of data mining techniques by business intelligence and data analytics teams enables them to
harvest insights for their organisations and industries.
Utilizing data mining techniques, hidden patterns and future trends and behaviours in financial markets may be
predicted. Typically, sophisticated statistical, mathematical, and artificial intelligence approaches are necessary for
data mining, particularly for high-frequency financial data. Among the data mining applications are:
(i) Detecting money laundering and other financial crimes:
Money laundering is the illegal conversion of black money to white money. In today’s society, data mining
techniques have advanced to the point where they are deemed suitable for detecting money laundering. Data
mining methodologies provide banks with a mechanism to flag suspicious customer transactions and to verify the
effectiveness of their anti-money laundering controls.
(ii) Prediction of loan repayment and customer credit policy analysis:
Loan distribution is a core business function of every bank. A loan prediction system automatically weighs the
characteristics it employs and examines the data pertaining to them. Consequently, data mining models aid in
the management of all critical data held in massive databases.
(iii) Target marketing:
Together, data mining and marketing work to target a certain market, and they also assist in making
market decisions. With data mining, it is possible to track earnings, margins, etc. and determine which
product is optimal for various types of customers.
(iv) Design and construction of data warehouses:
The business is able to retrieve or move the data into several huge data warehouses, allowing a vast volume of
data to be correctly and reliably evaluated with the aid of various data mining methodologies and techniques.
It also examines a vast number of transactions.


11.3 Analytics and Model Building (Descriptive, Diagnostic, Predictive, Prescriptive)

Businesses utilise analytics to study and evaluate their data, and then translate their discoveries into insights
that eventually aid executives, managers, and operational personnel in making more educated and
prudent business choices. Descriptive analytics, which examines what has occurred in a firm, diagnostic
analytics, which explores why it occurred, predictive analytics, which examines what could occur, and
prescriptive analytics, which examines what should occur, are the four most important forms of analytics used by
enterprises. While each of these approaches has its own distinct insights, benefits, and drawbacks in their use, when
combined, these analytics tools may be an exceptionally valuable asset for a corporation.
It is also essential to examine the privacy principles while utilising data. Public entities and the business
sector should consider individual privacy when using data analytics. As more and more firms seek to big data
(huge, complex data sets) to raise revenue and enhance corporate efficiency and effectiveness, regulations
are becoming increasingly required.

11.3.1 What are descriptive analytics?


Descriptive analytics is a frequently employed style of data analysis in which historical data is collected,
organised, and presented in a readily digestible format. Descriptive analytics focuses exclusively on what has
already occurred in an organisation and, unlike other types of analysis, does not utilise its results to draw inferences
or make forecasts. Rather, descriptive analytics serves as a basic starting point to inform or prepare data for
subsequent analysis.
In general, descriptive analytics is the simplest kind of data analytics, since it employs simple mathematical and
statistical methods, such as arithmetic, averages, and percentage changes, rather of the complicated computations
required for predictive and prescriptive analytics. With the use of visual tools such as line graphs, pie charts, and bar
charts to communicate data, descriptive analytics can and should be readily understood by a broad corporate audience.

11.3.2 How does descriptive analytics work?


To identify historical data, descriptive analytics employs two fundamental techniques: data aggregation and data
mining (also known as data discovery). The process of gathering and organising data into digestible data sets called
data aggregation. The extracted patterns, trends, and significance are then presented in an intelligible format.
According to Dan Vesset, the process of descriptive analytics may be broken into five broad steps:
Step 1: Decide the business metrics: First, measurements are developed to evaluate performance against
corporate objectives, such as increasing operational efficiency or revenue. According to Vesset, the effectiveness
of descriptive analytics is strongly dependent on KPI governance. ‘Without governance,’ he says, ‘there may be no
consensus on the meaning of the data, assuring analytics a minor role in decision-making.’
Step 2: Identification of data requirement: The data is gathered from sources such as reports and databases.


Vesset states that in order to correctly measure against KPIs, businesses must catalogue and arrange the appropriate
data sources in order to extract the necessary data and generate metrics depending on the present status of the
business.
Step 3: Preparation and collection of data: Data preparation, which includes de-duplication, transformation, and
cleaning, occurs prior to analysis and is a crucial step for ensuring correctness; it is also one of the most time-
consuming tasks for the analyst.
Step 4: Analysis of data: Utilizing summary statistics, clustering, pattern tracking, and regression analysis, we
discover data trends and evaluate performance.
Step 5: Presentation of data: Lastly, charts and graphs are utilised to portray findings in a manner that non-
experts in analytics may comprehend.
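
The Python sketch below illustrates the analysis step (Step 4) with simple summary statistics and aggregation by category; the sales figures and the "region" breakdown are hypothetical.

# A minimal sketch, assuming Python with pandas; the data are hypothetical.
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "North", "North"],
    "revenue": [120, 150, 200, 180, 90, 110],
})

print(sales["revenue"].describe())               # averages, spread, quartiles
print(sales.groupby("region")["revenue"].sum())  # revenue per region
print(sales["revenue"].pct_change())             # simple percentage changes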

11.3.3 Information revealed by descriptive analytics:


An organisation uses descriptive analytics regularly in its day-to-day operations. Examples of descriptive
analytics that give a historical overview of an organization’s activities include company reports on inventory,
workflow, sales, and revenue. These types of reports collect data that can be readily aggregated and utilised to
provide snapshots of an organization’s activities.
Social analytics are virtually always a type of descriptive analytics. The number of followers, likes, and posts
may be utilised to calculate, for example, the average number of replies per post, page visits, and response time.
Facebook and Instagram comments are additional instances of descriptive analytics that may be utilised to better
comprehend user sentiments.
However, descriptive analytics does not seek to go beyond the surface data and analysis; extra inquiry falls
outside the scope of descriptive analytics, and conclusions and predictions are not derived from descriptive
analysis. Nevertheless, this research can show patterns and significance by comparing historical data. An annual
income report, for instance, may look financially encouraging until it is compared against the same report from past
years, which reveals a declining trend.

11.3.4 Advantages and disadvantages of descriptive analytics


Due to the fact that descriptive analytics depends just on historical data and basic computations, this technique
is easily applicable to day-to-day operations and does not need an in-depth understanding of analytics. This
implies that firms may report on performance very quickly and simply and acquire insights that can be utilised to
make changes.

11.3.5 Examples of descriptive analytics


Descriptive analytics assists organisations in measuring performance to ensure that objectives and goals are
reached. And if they are not reached, descriptive analytics can indicate areas for improvement or change.
Several applications of descriptive analytics include the following:
●● Past events, such as sales and operational data or marketing campaigns, are summarised.
●● Social media usage and engagement data, such as Instagram or Facebook likes, are examples of such
information.
●● Reporting general trends
●● Compiling survey data


11.3.6 What is diagnostic analytics?


Diagnostic analytics highlights the tools that are employed to ask of the data, "Why did this occur?" It involves a
thorough examination of the data to discover important insights. Descriptive analytics, the first phase in the data
analysis process for the majority of businesses, is a straightforward method that records what has already occurred.
Diagnostic analytics goes a step further by revealing the rationale behind particular outcomes.
Typical strategies for diagnostic analytics include data discovery, drill-down, data mining, and correlations.
Analysts identify the data sources that assist them in interpreting the outcomes during the discovery phase. Drilling
down entails concentrating on a specific aspect of the data or widget. Data mining is the automated extraction of
information from vast quantities of unstructured data. And identifying consistent connections in the data might
assist to pinpoint the investigation’s parameters.
Analysts are responsible for identifying the data sources that would be utilised. Frequently, this requires them
to search for trends outside of the organization’s own databases. It may be necessary to include data from external
sources in order to find connections and establish causality.

11.3.7 Advantages of diagnostic analytics


Data plays an increasingly important role in every organisation. Using diagnostic tools helps to make the most
of the data by turning it into visuals and insights that can be utilised by everyone. Diagnostic analytics develops
solutions that may be used to discover answers to data-related problems and to communicate insights within the
organisation.
Diagnostic analytics enables organisations to derive value from the data by asking the relevant questions and doing in-
depth analyses of the responses. And this demands a platform for BI and analytics that is adaptable, nimble, and
configurable.

11.3.8 Examples of diagnostic analytics


Here are some steps that may be taken to run diagnostic analytics on internal data; it may also be necessary to
add external information in order to determine "why" something occurred. Set up the data study by determining
what questions are to be answered. This might be an inquiry into the cause of a problem, such as a decreased click-
through rate, or a positive development, such as a significant increase in sales during a specific period or season.
After identifying the problem, the analysis may be set up. You may be able to identify a single root cause, or
you may require numerous data sets to identify a pattern and establish a link. By fitting a collection of variables
to a linear equation, linear regression can assist in identifying relationships. Remember that the longer you let your data
model collect data, the more precise your results will be; a data model matures like a fine wine. Next,
apply a filter to your findings so that only the most significant factor or two potential factors are included in your
report. Finally, using the identified correlations, draw your conclusions and build a convincing
argument for them.
Consider an HR department that wishes to examine the performance of its employees based on quarterly
performance levels, absenteeism, and weekly overtime hours. You might establish your data models, utilise Python
or R for in-depth examination, and search for correlations in your data.
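
As a minimal sketch of that HR example in Python with pandas, the snippet below computes a correlation matrix across the three measures; the column names and values are hypothetical.

# A minimal sketch, assuming Python with pandas; the HR data are hypothetical.
import pandas as pd

hr = pd.DataFrame({
    "performance_score": [72, 85, 60, 90, 78, 55],
    "absent_days":       [6,  2,  9,  1,  4,  11],
    "overtime_hours":    [5,  8,  2,  10, 6,  1],
})

# Correlation matrix: values near -1 or +1 suggest strong relationships
print(hr.corr())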
Cybersecurity is another example of a problem that every organisation should devote resources to. The Cyber
Security Team may determine the relationship between the security rating and the number of incidents, as well as
assess other metrics, such as the team's reaction time versus the average time to resolution. The company might utilise
these data to design preventative measures for potentially vulnerable areas.


11.3.9 What is Predictive Analytics?


Predictive analytics, as implied by its name, focuses on forecasting and understanding what might occur in
the future, whereas descriptive analytics focuses on previous data. By analysing past patterns and trends in
historical data and customer insights, it is possible to predict what may occur in the future and, as a
result, many aspects of a business can be informed, such as setting realistic goals, executing effective planning,
managing performance goals, and avoiding risks.

11.3.10 How does Predictive Analytics work?


The foundation of predictive analytics is probability. Using techniques such as data mining, statistical modelling
(mathematical relationships between variables to predict outcomes), and machine learning algorithms (classification,
regression, and clustering techniques), predictive analytics attempts to predict possible future outcomes and the
probability of those events. To create predictions, machine learning algorithms, for instance, utilise current data
and make the best feasible assumptions to fill in missing data.
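
The sketch below shows the flavour of such statistical modelling using Python's scikit-learn library: a simple linear regression fitted to past periods and used to forecast the next two. The monthly sales figures are hypothetical and the model is deliberately simplistic.

# A minimal sketch, assuming Python with scikit-learn and NumPy installed.
import numpy as np
from sklearn.linear_model import LinearRegression

months = np.array([[1], [2], [3], [4], [5], [6]])     # past periods
sales = np.array([100, 110, 118, 131, 140, 152])      # observed sales (hypothetical)

model = LinearRegression().fit(months, sales)          # learn the historical trend
print(model.predict(np.array([[7], [8]])))             # forecast the next two periods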
Deep learning is a more recent subfield of machine learning that imitates the building of “human brain networks
as layers of nodes that understand a specific process area but are networked together to provide an overall forecast.”
Examples of deep learning include credit scoring that utilises social and environmental data, and the sorting of
digital medical pictures such as X-rays into automated predictions for doctors to use in diagnosing patients.
This methodology enables executives and managers to take a more proactive, data-driven approach to corporate
planning and decision-making, given that predictive analytics may provide insight into what may occur in the
future. Utilizing predictive analytics, businesses may foresee customer behaviour and purchase patterns, as well
as discover sales trends. Predictions can also assist in forecasting supply chain, operations, and inventory demand.

11.3.11 Advantages and disadvantages of Predictive Analytics


Given that predictive analysis is based on probabilities, it can never be absolutely precise, but it may serve as
a crucial tool for forecasting probable future occurrences and informing future corporate strategy. Additionally,
predictive analytics may enhance several corporate functions, including:
~~ Effectiveness, including inventory forecasting
~~ Customer service, which may aid a business in gaining a deeper knowledge of who its clients are and what
they want, so that it can personalise its suggestions.
~~ Detection and prevention of fraud, which can assist businesses in identifying trends and alterations.
~~ Risk mitigation, which in the financial sector might entail enhanced applicant screening
This kind of analysis requires the availability of historical data, typically in enormous quantities.

11.3.12 Example of Predictive Analytics


There are a multitude of ways predictive analytics may be used to foresee probable occurrences and trends across
sectors and enterprises. The healthcare business is a major benefactor of predictive analytics, for instance. RMIT
University partnered with the Digital Health Cooperative Research Centre in 2019 to develop clinical decision
support software for aged care that will reduce emergency hospitalizations and predict patient deterioration by
analysing historical data and developing new predictive analytics techniques. The purpose of predictive analytics
is to enable senior care professionals, residents, and their families to better prepare for death.


The following are some industries in which predictive analysis might be utilised:


~~ E-commerce – anticipating client preferences and proposing items based on previous purchases and search
histories
~~ Sales – estimating the possibility that a buyer will buy another item or depart the shop.
~~ Human resources – identifying employees who are contemplating resigning and urging them to remain.
~~ IT security – detecting potential security vulnerabilities requiring more investigation
~~ Healthcare – anticipating staffing and resource requirements

11.3.13 What is prescriptive analytics?


Descriptive analytics describes what has occurred, diagnostic analytics explores why it occurred, predictive
analytics describes what could occur, and prescriptive analytics describes what should be done. This approach
is the fourth, final, and most sophisticated step of the business analysis process, and it is the one that urges firms
to action by assisting executives, managers, and operational personnel in making the most informed decisions
possible based on the available data.

11.3.14 How does the prescriptive analytics work?


Prescriptive analytics goes one step further than descriptive and predictive analysis by advising the best potential
business actions. This is the most sophisticated step of the business analytics process, needing significantly more
specialised analytics expertise to execute; as a result, it is rarely utilised in daily company operations.
A multitude of approaches and tools – such as rules, statistics, and machine learning algorithms – may be used to
accessible data, including internal data (from within the business) and external data, in order to produce predictions
and recommendations (such as data derived from social media). The capabilities of machine learning dwarf those
of a human attempting to attain the same outcomes.
A widespread misconception is that predictive analytics and machine learning are the same. While predictive
analytics uses historical data and statistical techniques to make predictions about the future, machine learning,
a subset of artificial intelligence, refers to a computer system’s ability to understand large and often enormous
amounts of data without explicit instructions, and to adapt and become increasingly intelligent as a result.
Predictive analytics predicts what, when, and, most importantly, why something may occur. After analysing
the potential repercussions of each choice alternative, suggestions may be made regarding which options would
best capitalise on future opportunities or reduce future hazards. Prescriptive analytics predicts future outcomes
and, by doing so, enables decision-makers to assess the potential consequences for each future outcome before
making a choice.
Effectively conducted prescriptive analytics may have a significant impact on corporate strategy and decision
making to enhance production, customer experience, and business success.

11.3.15 Advantages and disadvantages of prescriptive analytics


When utilised correctly, prescriptive analytics gives important insights for making the most optimal data-driven decisions to optimise corporate performance. Nonetheless, similar to predictive analytics, this technique requires enormous volumes of data to deliver effective findings, and such data are not always available. In addition, the machine learning techniques frequently used in this analysis cannot consistently account for all external variables. On the other hand, machine learning significantly minimises the likelihood of human error.


11.3.16 Examples of prescriptive analytics


GPS technology is a frequent prescriptive analytics tool since it gives recommended routes to the user’s intended
destination based on factors such as travel time and road closures. In this scenario, prescriptive analysis “optimises
a goal that analyses the distances between your origin and destination and prescribes the ideal path with the least
distance.”
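The "least distance" idea can be sketched with Dijkstra's shortest-path algorithm on a tiny, invented road network; real navigation systems also weigh travel time, traffic, and closures, so this is only a toy illustration.

```python
# Toy prescriptive sketch: recommend the shortest route on a small,
# hypothetical road network using Dijkstra's algorithm.
import heapq

graph = {                                   # distances in km (invented)
    "Depot":    {"Junction": 4, "Bypass": 9},
    "Junction": {"Depot": 4, "Market": 5, "Bypass": 3},
    "Bypass":   {"Depot": 9, "Junction": 3, "Market": 6},
    "Market":   {"Junction": 5, "Bypass": 6},
}

def shortest_route(start, goal):
    queue = [(0, start, [start])]           # (distance so far, node, path)
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, d in graph[node].items():
            if neighbour not in visited:
                heapq.heappush(queue, (dist + d, neighbour, path + [neighbour]))
    return float("inf"), []

print(shortest_route("Depot", "Market"))    # (9, ['Depot', 'Junction', 'Market'])
```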
Further prescriptive analysis applications include the following:
~~ Oil and manufacturing – monitoring price fluctuations
~~ Manufacturing – enhancing equipment administration, maintenance, cost modelling, production, and storage
~~ Healthcare – enhancing patient care and healthcare administration by analysing readmission rates and the
cost-effectiveness of operations.
~~ Insurance – evaluating customer risk in terms of price and premium information
~~ Pharmaceutical research – determining the optimal testing methods and patient populations for clinical trials.


11.4 Standards for Data Tagging and Reporting (XML, XBRL)
11.4.1 Extensible Markup Language (XML)

XML is a file format and markup language for storing, transferring, and recreating arbitrary data. It specifies
a set of standards for encoding texts in a format that is understandable by both humans and machines. XML
is defined by the 1998 XML 1.0 Specification of the World Wide Web Consortium and numerous other
related specifications, which are all free open standards.
XML’s design objectives stress Internet usability, universality, and simplicity. It is a textual data format with
significant support for many human languages via Unicode. Although XML’s architecture is centred on texts, the
language is commonly used to express arbitrary data structures, such as those employed by web services.
Several schema systems exist to help in the design of XML-based languages, and numerous application
programming interfaces (APIs) have been developed by programmers to facilitate the processing of XML data.
Serialization, or storing, sending, and rebuilding arbitrary data, is the primary function of XML. In order for
two dissimilar systems to share data, they must agree on a file format. XML normalises this procedure. XML is
comparable to a universal language for describing information.
As a markup language, XML labels, categorises, and arranges information systematically.
The data structure is represented by XML tags, which also contain information. The information included within
the tags is encoded according to the XML standard. A supplementary XML schema (XSD) defines the required
metadata for reading and verifying XML. This is likewise known as the canonical schema. A “well-formed” XML
document complies with fundamental XML principles, whereas a “valid” document adheres to its schema.
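A short sketch can make tagging, parsing, and well-formedness concrete. The snippet below builds and reads a tiny, hypothetical invoice document with Python's standard xml.etree.ElementTree module; the element names are invented for the example and would ordinarily be governed by an agreed XSD schema.

```python
# Sketch: parsing a small, hypothetical XML document.
# Element and attribute names (invoice, customer, amount) are illustrative only.
import xml.etree.ElementTree as ET

xml_text = """<?xml version="1.0" encoding="UTF-8"?>
<invoice number="INV-001">
    <customer>Sample Traders Ltd.</customer>
    <amount currency="INR">125000.00</amount>
    <dueDate>2024-03-31</dueDate>
</invoice>"""

root = ET.fromstring(xml_text)                 # fails if the XML is not well-formed
print(root.tag, root.attrib["number"])         # invoice INV-001
print(root.find("customer").text)              # Sample Traders Ltd.
print(root.find("amount").attrib["currency"],
      root.find("amount").text)                # INR 125000.00
```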
IETF RFC 7303 (which supersedes the previous RFC 3023) specifies the criteria for constructing media types for
use in XML messages. It specifies the application/xml and text/xml media types. They are utilised for transferring
unmodified XML files without revealing their intrinsic meanings. RFC 7303 also suggests that media types for
XML-based languages end in +xml, such as image/svg+xml for SVG.
RFC 3470, commonly known as IETF BCP 70, provides further recommendations for the use of XML in a
networked setting. This document covers many elements of building and implementing an XML-based language.

11.4.2 Application of XML


XML is now widely utilised for the exchange of data via the Internet. There have been hundreds of document
formats created using XML syntax, including RSS, Atom, Office Open XML, OpenDocument, SVG, and XHTML.
XML is also the foundational language for communication protocols like SOAP and XMPP. It is the message
interchange format for the programming approach Asynchronous JavaScript and XML (AJAX).


Numerous industrial data standards, including Health Level 7, OpenTravel Alliance, FpML, MISMO, and
National Information Exchange Model, are founded on XML and the extensive capabilities of the XML schema
definition. Darwin Information Typing Architecture is an XML industry data standard in publishing. Numerous
publication formats rely heavily on XML as their basis.

11.4.3 Extensible Business Reporting Language (XBRL)


XBRL is a data description language that facilitates the interchange of standard, comprehensible corporate data.
It is based on XML and enables the automated interchange and trustworthy extraction of financial data across all
software types and advanced technologies, including the Internet.

Figure 11.1: XBRL working model – XBRL supports G/L journal entry reporting, financial statements, regulatory filings, business event reporting, audit schedules, and tax filings, linking internal financial reporting, external financial reporting, business operations, and investment and lending analysis across participants such as companies, management, accountants, auditors, trading partners, regulators, publishers and financial data aggregators, investors, and software vendors. (Source: http://www.xbrl.org/business/general/softwareag-caseforxbrl.pdf)

XBRL allows organisations to arrange data using tags. When a piece of data is labelled as “revenue,” for instance,
XBRL enabled applications know that it pertains to revenue. It conforms to a fixed definition of income and may
appropriately utilise it. The integrity of the data is safeguarded by norms that have already been accepted. In addition, XBRL offers expanded contextual information on the precise data content of financial documents. For example, when a monetary amount is stated, XBRL tags may designate the data as “currency” or “accounts” within a report.
With XBRL, a business, a person, or another software programme may quickly produce a variety of output
formats and reports based on a financial statement.
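The idea of a tagged fact can be sketched as follows: one revenue figure carries a context reference and a unit. The namespace and element names below are simplified inventions rather than an actual XBRL taxonomy, and real filings are normally produced and validated with dedicated XBRL software.

```python
# Sketch of reading one XBRL-style tagged fact. The namespace, tag name,
# context, and value are simplified illustrations, not a real taxonomy.
import xml.etree.ElementTree as ET

instance = """<xbrl xmlns:demo="http://example.com/demo-taxonomy">
    <demo:Revenue contextRef="FY2023" unitRef="INR" decimals="0">5000000</demo:Revenue>
</xbrl>"""

root = ET.fromstring(instance)
ns = {"demo": "http://example.com/demo-taxonomy"}
fact = root.find("demo:Revenue", ns)

print("Concept:", fact.tag)                    # namespace-qualified concept name
print("Context:", fact.attrib["contextRef"])   # FY2023
print("Unit   :", fact.attrib["unitRef"])      # INR
print("Value  :", fact.text)                   # 5000000
```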


11.4.4 Benefits of XBRL


~~ All reports are automatically created from a single source of information, which reduces the chance of
erroneous data entry and hence increases data reliability.
~~ Reduces expenses by simplifying and automating the preparation and production of reports for various
clients.
~~ Accelerates the decision-making of financial entities such as banks and rating services.
~~ Facilitates the publication of analyst and investor reports
~~ Access, comparison, and analytic capabilities for information are unparalleled.


11.5 Cloud Computing, Business Intelligence, Artificial Intelligence, Robotic Process Automation and Machine Learning
11.5.1 Cloud computing
Simply described, cloud computing is the delivery of a variety of services through the Internet, or “the cloud.”
It involves storing and accessing data via distant servers as opposed to local hard drives and private datacenters.
Before the advent of cloud computing, businesses had to acquire and operate their own servers to suit their demands. This necessitated the purchase of sufficient server capacity to minimise the risk of downtime and disruptions and to meet peak traffic volumes. Consequently, significant quantities of server space sat unused most of the time. Today’s cloud service providers enable businesses to lessen their reliance on costly on-site servers, maintenance staff, and other IT resources.

Types of cloud computing


There are three deployment options for cloud computing: private cloud, public cloud, and hybrid cloud.
(i) Private cloud:
Private cloud offers a cloud environment that is exclusive to a single corporate organisation, with physical
components housed on-premises or in a vendor’s datacenter. This solution gives a high level of control due
to the fact that the private cloud is available to just one enterprise. In a virtualized environment, the benefits
include a customizable architecture, enhanced security procedures, and the capacity to expand computer
resources as needed. In many instances, a business maintains a private cloud infrastructure on-premises and
provides cloud computing services to internal users over the intranet. In other cases, the company engages
with a third-party cloud service provider to host and operate its servers off-site.
(ii) Public cloud:
The public cloud stores and manages access to data and applications through the internet. It is fully virtualized,
enabling an environment in which shared resources may be utilised as necessary. Because these resources
are offered through the web, the public cloud deployment model enables enterprises to grow with more
ease; the option to pay for cloud services on an as-needed basis is a significant benefit over local servers.
Additionally, public cloud service providers use rigorous security measures to prevent unauthorised access
to user data by other tenants.
(iii) Hybrid cloud:
Hybrid cloud blends private and public cloud models, enabling enterprises to exploit the benefits of shared
resources while leveraging their existing IT infrastructure for mission-critical security needs. The hybrid
cloud architecture enables businesses to store sensitive data on-premises and access it through apps hosted
in the public cloud. In order to comply with privacy rules, an organisation may, for instance, keep sensitive
user data in a private cloud and execute resource-intensive computations in a public cloud.


11.5.2 Business Intelligence:


Business intelligence includes business analytics, data mining, data visualisation, data tools and infrastructure,
and best practises to assist businesses in making choices that are more data-driven. When you have a complete
picture of your organization’s data and utilise it to drive change, remove inefficiencies, and swiftly adjust to market
or supply changes, you have contemporary business intelligence. Modern BI systems promote adaptable self-
service analysis, controlled data on dependable platforms, empowered business users, and rapid insight delivery.
Traditional Business Intelligence, complete with capitalization, originated in the 1960s as a method for disseminating information across enterprises. The phrase “Business Intelligence” was coined in 1989, alongside computer models for decision making. These programmes evolved further, turning data into insights, before becoming a distinct offering from BI teams with IT-dependent service solutions.

BI Methods:
Business intelligence is a broad term that encompasses the procedures and methods of gathering, storing, and evaluating data from business operations or activities in order to maximise performance. All of these factors combine to provide a full perspective of a firm, enabling individuals to make better, proactive decisions. In recent years, business intelligence has expanded to incorporate more procedures and activities designed to enhance performance. These procedures consist of the following (a brief illustrative sketch follows the list):
(i) Data mining: Large datasets may be mined for patterns using databases, analytics, and machine learning
(ML).
(ii) Reporting: The dissemination of data analysis to stakeholders in order for them to form conclusions and
make decisions.
(iii) Performance metrics and benchmarking: Comparing current performance data to previous performance
data in order to measure performance versus objectives, generally utilising customised dashboards.
(iv) Descriptive analytics: Utilizing basic data analysis to determine what transpired
(v) Querying: BI extracts responses from data sets in response to data-specific queries.
(vi) Statistical analysis: Taking the results of descriptive analytics and use statistics to further explore the data,
such as how and why this pattern occurred.
(vii) Data Visualization: Data consumption is facilitated by transforming data analysis into visual representations
such as charts, graphs, and histograms.
(viii) Visual Analysis: Exploring data using visual storytelling to share findings in real-time and maintain the
flow of analysis.
(ix) Data Preparation: Multiple data source compilation, dimension and measurement identification, and data
analysis preparation.
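As a rough illustration of the reporting and descriptive-analytics steps above, the snippet below aggregates a few invented sales records with pandas; the column names and figures are assumptions made purely for the example.

```python
# Sketch: simple descriptive reporting on invented sales records.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "North"],
    "product": ["A", "B", "A", "B", "A"],
    "revenue": [120000, 95000, 143000, 88000, 101000],
})

# "What happened?" – revenue by region, a typical dashboard metric
report = sales.groupby("region")["revenue"].sum().sort_values(ascending=False)
print(report)

# Simple benchmark: each region's share of total revenue (in %)
print((report / report.sum() * 100).round(1))
```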

11.5.3 Artificial Intelligence (AI)


John McCarthy of Stanford University defined artificial intelligence as, “It is the science and engineering of
making intelligent machines, especially intelligent computer programs. It is related to the similar task of using
computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically
observable.”
However, decades prior to this description, Alan Turing’s landmark paper “Computing Machinery and
Intelligence” marked the genesis of the artificial intelligence discourse. Turing, commonly referred to as the “father
of computer science,” poses the question “Can machines think?” in this article. From there, he proposes the now-famous “Turing Test,” in which a human interrogator attempts to differentiate between a machine and a human
written answer. Although this test has been subjected to considerable examination since its publication, it remains
an essential aspect of the history of artificial intelligence and a continuing philosophical thought that employs
principles from linguistics.
Stuart Russell and Peter Norvig then published ‘Artificial Intelligence: A Modern Approach’, which has since
become one of the most influential AI textbooks. In it, they discuss four alternative aims or definitions of artificial
intelligence, which distinguish computer systems based on reasoning and thinking vs. acting:
~~ Human approach:
●● Systems that think like humans
●● Systems that act like humans
~~ Ideal approach:
●● Systems that think rationally
●● Systems that act rationally
Artificial intelligence is, in its simplest form, a field that combines computer science and substantial datasets to enable problem-solving. In addition, it includes the subfields of machine learning and deep learning, which are commonly associated with artificial intelligence. These fields consist of AI algorithms that aim to develop expert systems that make predictions or classifications based on input data.
As expected with any new developing technology on the market, AI development is still surrounded by a great
deal of hype. According to Gartner’s hype cycle, self-driving vehicles and personal assistants follow “a normal
evolution of innovation, from overenthusiasm through disillusionment to an ultimate grasp of the innovation’s
importance and position in a market or area.” According to Lex Fridman’s 2019 MIT lecture, we are at the top of
inflated expectations and nearing the trough of disillusionment.
AI has several applications in the area of financial services (fig 11.2).

Fig 11.2: Artificial intelligence in finance – Investment services (algorithmic trading, robo advisory, insurance claim processing, pricing of insurance products); Lending (retail and commercial lending operations, retail and commercial lending scores, detecting possibility of default); Audit and compliance (fraud detection, regulatory compliance, travel and expense management); Customer service (detecting new sell opportunities, know your customer (KYC), prediction of customer churning).


Types of Artificial Intelligence – Weak AI vs. Strong AI


Weak AI, also known as Narrow AI or Artificial Narrow Intelligence (ANI), is AI that has been trained and
honed to do particular tasks. Most of the AI that surrounds us today is powered by weak AI. This form of artificial
intelligence is anything but feeble; it allows sophisticated applications such as Apple’s Siri, Amazon’s Alexa, IBM
Watson, and driverless cars, among others.
Strong AI comprises Artificial General Intelligence (AGI) and Artificial Super Intelligence (ASI). Artificial general intelligence (AGI), sometimes known as general AI, is a hypothetical kind of artificial intelligence in which a machine possesses human-level intellect, a self-aware consciousness, and the ability to solve problems, learn, and plan for the future. Artificial Super Intelligence (ASI), also known as superintelligence, would transcend the intelligence and capabilities of the human brain. Despite the fact that strong AI is still entirely
theoretical and has no practical applications, this does not preclude AI researchers from studying its development.
In the meanwhile, the finest instances of ASI may come from science fiction, such as HAL from 2001: A Space
Odyssey, a superhuman, rogue computer aide.

Deep Learning vs. Machine Learning


Given that deep learning and machine learning are frequently used interchangeably, it is important to note
the distinctions between the two. As stated previously, both deep learning and machine learning are subfields of
artificial intelligence; nonetheless, deep learning is a subfield of machine learning.

Fig 11.3: Artificial intelligence, machine learning, and deep learning – deep learning is a subset of machine learning, which is itself a subset of artificial intelligence.


Deep learning is essentially built on neural networks. “Deep” in deep learning refers to a neural network with more than three layers – inclusive of the input and output layers – which may be termed a deep learning algorithm. Typically, this is depicted by the following diagram (fig 11.4):


Fig 11.4: Deep neural network


Deep learning and machine learning differ in how their respective algorithms learn. Deep learning automates a
significant portion of the feature extraction step, reducing the need for manual human involvement and enabling the
usage of bigger data sets. Deep learning may be thought of as “scalable machine learning,” as Lex Fridman stated in
the aforementioned MIT presentation. Classical or “non-deep” machine learning requires more human interaction
to learn. Human specialists develop the hierarchy of characteristics in order to comprehend the distinctions between
data inputs, which often requires more structured data to learn.
Deep machine learning can utilise labelled datasets, also known as supervised learning, to educate its algorithm,
although a labelled dataset is not required. It is capable of ingesting unstructured data in its raw form (e.g., text
and photos) and can automatically establish the hierarchy of characteristics that differentiate certain data categories
from one another. It does not require human interaction to interpret data, unlike machine learning, allowing us to
scale machine learning in more exciting ways.
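One compact way to see the "more than three layers" idea in code is to fit a small multi-layer network on synthetic data, as sketched below with scikit-learn's MLPClassifier; practical deep learning work typically relies on dedicated frameworks, far larger datasets, and specialised hardware, so this is only a toy illustration.

```python
# Sketch: a small neural network with several hidden layers on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Three hidden layers (32, 16, 8 nodes) between the input and output layers
net = MLPClassifier(hidden_layer_sizes=(32, 16, 8), max_iter=500, random_state=0)
net.fit(X_train, y_train)
print("Test accuracy:", round(net.score(X_test, y_test), 3))
```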

11.5.4 Robotic Process Automation:


With RPA, software users develop software robots or “bots” that are capable of learning, simulating, and executing
rules-based business processes. By studying human digital behaviours, RPA automation enables users to construct
bots. Give your bots instructions, then let them complete the task. Robotic Process Automation software bots can communicate with any application or system in the same manner that humans can, with the exception that RPA bots
can function continuously, around-the-clock, and with 100 percent accuracy and dependability.
Robotic Process Automation bots possess a digital skill set that exceeds that of humans. Consider RPA bots to
be a Digital Workforce capable of interacting with any system or application. Bots may copy-paste, scrape site
data, do computations, access and transfer files, analyse emails, log into programmes, connect to APIs, and extract
unstructured data, among other tasks. Due to the adaptability of bots to any interface or workflow, there is no need
to modify existing corporate systems, apps, or processes in order to automate.
RPA bots are simple to configure, utilise, and distribute. You will be able to configure RPA bots if you know how
to record video on a mobile device. Moving files around at work is as simple as pressing record, play, and stop
buttons and utilising drag-and-drop. RPA bots may be scheduled, copied, altered, and shared to conduct enterprise-
wide business operations.
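RPA is normally delivered through dedicated vendor platforms rather than hand-written scripts, but the flavour of a rules-based bot can be sketched in a few lines of Python; the folder names and the filing rule below are invented for illustration only.

```python
# Toy sketch of a rules-based "bot": file incoming documents into folders
# according to a simple naming rule. Folders and the rule are illustrative.
from pathlib import Path
import shutil

inbox = Path("inbox")
rules = {"invoice": Path("processed/invoices"),
         "receipt": Path("processed/receipts")}

for pdf in inbox.glob("*.pdf"):
    for keyword, target in rules.items():
        if keyword in pdf.name.lower():        # rule: route by filename keyword
            target.mkdir(parents=True, exist_ok=True)
            shutil.move(str(pdf), str(target / pdf.name))
            print(f"Filed {pdf.name} -> {target}")
            break
```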

Benefits of RPA
(i) Higher productivity
(ii) Higher accuracy
(iii) Saving of cost
(iv) Integration across platforms
(v) Better customer experience
(vi) Harnessing AI
(vii) Scalability

11.5.5 Machine learning


Machine learning (ML) is a branch of study devoted to understanding and developing systems that “learn,” that is, methods that use data to improve performance on a set of tasks. It is considered a component of artificial intelligence. In order to generate predictions or conclusions without being explicitly taught to do so, machine learning algorithms construct a model based on training data and sample data. In applications such as medicine, email filtering, speech recognition, and computer vision, where it is difficult or impractical to create traditional algorithms to perform the required tasks, machine learning techniques are utilised.
The premise underlying learning algorithms is that tactics, algorithms, and conclusions that performed well in
the past are likely to continue to perform well in the future. These deductions may be clear, such as “because the
sun has risen every morning for the past 10,000 days, it will likely rise again tomorrow.” They can be nuanced, as
in “X% of families include geographically distinct species with colour variations; thus, there is a Y% possibility
that unknown black swans exist.”
Programs that are capable of machine learning can complete tasks without being expressly designed to do so. It
includes computers learning from available data in order to do certain jobs. For basic jobs handed to computers, it
is feasible to build algorithms that instruct the machine on how to perform all steps necessary to solve the problem
at hand; no learning is required on the side of the computer. For complex jobs, it might be difficult for a person to
manually build the necessary algorithms. In reality, it may be more efficient to assist the computer in developing
its own algorithm as opposed to having human programmers describe each step.
The field of machine learning involves a variety of methods to educate computers to perform jobs for which
there is no optimal solution. In situations where there are a large number of viable replies, one strategy is to classify some of the correct answers as legitimate. This information may subsequently be utilised to train the computer’s
algorithm(s) for determining accurate replies.

Approaches towards machine learning


On the basis of the type of “signal” or “feedback” provided to the learning system, machine learning systems are
generally categorised into five major categories:
(i) Supervised learning
Supervised learning algorithms construct a mathematical model of a data set that includes both the inputs
and expected outcomes. The data consists of a collection of training examples and is known as training
data. Each training example consists of one or more inputs and the expected output, sometimes referred
to as a supervisory signal. Each training example in the mathematical model is represented by an array or
vector, sometimes known as a feature vector, and the training data is represented by a matrix. By optimising
an objective function iteratively, supervised learning algorithms discover a function that may be used to
predict the output associated with fresh inputs. A function that is optimum will enable the algorithm to find
the proper output for inputs that were not included in the training data. It is claimed that an algorithm has
“learned” to do a task if it improves its outputs or predictions over time. Active learning, classification, and
regression are examples of supervised-learning algorithms.
Classification algorithms are used when the outputs are limited to a certain set of values, whereas regression
techniques are used when the outputs may take on any value within a given range. For a classification
algorithm that filters incoming emails, for instance, the input would be an incoming email and the output
would be the folder name in which to file the email.
Similarity learning is a subfield of supervised machine learning that is closely connected to regression and
classification, but its objective is to learn from examples by employing a similarity function that quantifies
how similar or related two items are. It has uses in ranking, recommendation systems, monitoring visual
identities, face verification, and speaker verification.
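The email-filtering example can be sketched with a tiny labelled dataset; the messages and labels below are invented, and a production spam filter would use far richer features and vastly more training data.

```python
# Sketch: supervised classification of invented short messages as spam or not.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = ["win a free prize now", "meeting agenda attached",
            "free lottery win claim now", "please review the draft report"]
labels = ["spam", "not spam", "spam", "not spam"]    # the supervisory signal

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)               # feature vectors from text

clf = MultinomialNB()
clf.fit(X, labels)                                   # learn the input-output mapping

test = vectorizer.transform(["claim your free prize"])
print(clf.predict(test))                             # expected output: ['spam']
```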
(ii) Unsupervised learning
Unsupervised learning approaches utilise a dataset comprising just inputs to identify data structure,
such as grouping and clustering. Therefore, the algorithms are taught using unlabeled, unclassified, and
uncategorized test data. Unsupervised learning algorithms identify similarities in the data and respond based
on the presence or absence of such similarities in each new data set. In statistics, density estimation, such as
calculating the probability density function, is a fundamental application of unsupervised learning, although unsupervised learning also encompasses other tasks, such as summarising and explaining data features.
Cluster analysis is the process of assigning a set of data to subsets (called clusters) so that observations
within the same cluster are similar based on one or more preset criteria, while observations obtained from
other clusters are different. Different clustering approaches necessitate varying assumptions regarding the
structure of the data, which is frequently characterised by a similarity metric and evaluated, for example, by
internal compactness, or the similarity between members of the same cluster, and separation, the difference
between clusters. Other methods rely on estimated graph density and connectivity.
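Clustering can be sketched as follows; the two-dimensional points are synthetic and the choice of three clusters is an assumption made only for the example.

```python
# Sketch: unsupervised clustering of synthetic, unlabeled points with k-means.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)   # unlabeled inputs

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
clusters = kmeans.fit_predict(X)            # assign each observation to a cluster

print("Cluster sizes:", [int((clusters == k).sum()) for k in range(3)])
print("Cluster centres:\n", kmeans.cluster_centers_.round(2))
```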
(iii) Semi supervised learning
Semi-supervised learning is intermediate between unsupervised learning (without labelled training data)
and supervised learning (with completely labelled training data). Many machine-learning researchers have
discovered that when unlabeled data is combined with a tiny quantity of labelled data, there is a significant
gain in learning accuracy.


In weakly supervised learning, the training labels are noisy, limited, or imprecise; however, these labels are frequently less expensive to acquire, resulting in larger effective training sets.
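A minimal sketch of the idea: most labels are hidden (marked -1) and a label-spreading model propagates the few known labels to the unlabeled points; the dataset is synthetic and the fraction of hidden labels is an arbitrary assumption.

```python
# Sketch: semi-supervised learning where only ~10% of the labels are known.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import LabelSpreading

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

rng = np.random.default_rng(0)
y_partial = y.copy()
mask = rng.random(len(y)) < 0.9        # hide roughly 90% of the labels
y_partial[mask] = -1                   # -1 marks an unlabeled example

model = LabelSpreading()
model.fit(X, y_partial)                # learns from labelled and unlabeled points

print("Accuracy on the originally hidden labels:",
      round(model.score(X[mask], y[mask]), 3))
```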
(iv) Reinforcement learning
Reinforcement learning is a subfield of machine learning concerned with determining how software agents
should operate in a given environment so as to maximise a certain concept of cumulative reward. Due to
the field’s generic nature, it is explored in several different fields, including game theory, control theory,
operations research, information theory, simulation-based optimization, multi-agent systems, swarm
intelligence, statistics, and genetic algorithms. In machine learning, the environment is generally represented as a Markov decision process (MDP). Many methods for reinforcement learning employ dynamic
programming techniques. Reinforcement learning techniques do not need prior knowledge of an accurate
mathematical model of the MDP and are employed when exact models are not practicable. Autonomous cars
and learning to play a game against a human opponent both employ reinforcement learning algorithms.
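The reward-maximisation idea can be sketched with tabular Q-learning on a toy corridor where the agent learns to walk right towards a goal; the environment, rewards, and learning constants are all invented for the illustration.

```python
# Sketch: tabular Q-learning on a toy corridor of five states; the agent
# learns to move right from state 0 towards a reward in state 4.
import random

n_states, actions = 5, [-1, +1]            # actions: move left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2      # learning rate, discount, exploration

for _ in range(500):                       # training episodes
    s = 0
    while s != n_states - 1:               # an episode ends at the goal state
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), n_states - 1)
        reward = 1.0 if s_next == n_states - 1 else 0.0
        best_next = max(Q[(s_next, act)] for act in actions)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s_next

print("Learned policy:", ["right" if Q[(s, 1)] >= Q[(s, -1)] else "left"
                          for s in range(n_states - 1)])
```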
(v) Dimensionality reduction
Dimensionality reduction is the process of acquiring a set of major variables in order to reduce the number
of random variables under consideration. In other words, it is the process of lowering the size of the feature
set, which is also referred to as the “number of features.” The majority of dimensionality reduction strategies
may be categorised as either feature selection (deletion) or feature extraction. Principal component analysis (PCA) is a well-known technique for dimensionality reduction. PCA involves transforming data with more dimensions (e.g., 3D) into a smaller space (e.g., 2D). This results in a reduced data dimension (2D as opposed to 3D) while retaining as much of the variation in the original data as possible. Numerous dimensionality
reduction strategies assume that high-dimensional data sets reside along low-dimensional manifolds, leading
to the fields of manifold learning and manifold regularisation.
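A brief sketch of PCA: the snippet reduces the four measurement columns of the well-known Iris dataset to two principal components and reports how much of the original variation those two components retain.

```python
# Sketch: reducing a 4-dimensional dataset to 2 principal components with PCA.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                        # 150 observations, 4 features
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)                 # project onto 2 principal components

print("Reduced shape:", X_2d.shape)                                  # (150, 2)
print("Variance retained:", round(float(pca.explained_variance_ratio_.sum()), 3))
```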


11.6 Model vs. Data-driven Decision-making

In artificial intelligence, there are two schools of thought: data-driven and model-driven. The data-driven
strategy focuses on enhancing data quality and data governance in order to enhance the performance of a
particular problem statement. In contrast, the model-driven method attempts to increase performance by
developing new models and algorithmic manipulations (or upgrades). In a perfect world, these should go
hand in hand, but in fact, model-driven techniques have advanced far more than data-driven ones. In terms of
data governance, data management, data quality handling, and general awareness, there is still much room for
improvement.
Recent work on Covid-19 serves as an illustration in this respect. While the globe was struggling with the pandemic, several AI-related projects emerged – recognising Covid-19 from a CT scan, X-ray, or other medical imaging, estimating the course of the disease, or even projecting the overall number of fatalities. On the one hand, this extensive effort around the globe has increased our understanding of the illness and, in certain locations, assisted clinical personnel in their work with vast populations. However, only a small part of this vast quantity of work was judged suitable for actual implementation, such as in the healthcare industry. Data quality difficulties are primarily responsible for this deficiency in practicality. Numerous projects and studies utilised duplicate images from different sources. Moreover, training data are notably lacking in external validation and demographic information. The majority of these studies would fail a systematic review and fail to reveal their biases. Consequently, the quoted performance cannot be applied to real-world scenarios.
A crucial feature of Data science to keep in mind is that poor data will never result in superior performance,
regardless of how strong your model is. Real-world applications require an understanding of systematic data
collection, management, and consumption for a Data Science project. Only then can society reap the rewards of
the ‘wonderful AI’.


Solved Case 1
Arjun joined as an instructor in a higher learning institution. His responsibility is to teach data analysis
to students. He is particularly interested in teaching analytics and model building. Arjun was preparing a teaching
plan for the new upcoming batch.
What elements do you think, he should incorporate into the plan.
Teaching note - outline for solution:
The instructor may explain first the utility of data analytics from the perspective of business organizations.
He may explain how data analysts translate their discoveries into insights that eventually aid executives, managers, and operational personnel in making more educated and prudent business choices.
He may further explain the four forms of data analytics:
(i) Descriptive analytics
(ii) Diagnostic analytics
(iii) Predictive analytics
(iv) Prescriptive analytics
The instructor should explain each of the terms along with their appropriateness in using under real-life problem
situations.
The advantages and disadvantages of using each of the methods should also be discussed thoroughly.


Exercise
A. Theoretical Questions:
~~ Multiple Choice Questions

1. Following are the benefits of data analytics


(a) Improves decision making process
(b) Increase in efficiency of operations
(c) Improved service to stakeholders
(d) All of the above
2. Following are the techniques of data mining
(a) Association rules
(b) Neural network
(c) Decision tree
(d) All of the above
3. XML is the abbreviated form of
(a) Extensible mark-up language
(b) Extended mark-up language
(c) Extendable mark-up language
(d) Extensive mark-up language
4. XBRL is the abbreviated form of
(a) eXtensible Business Reporting Language
(b) eXtensive Business Reporting Language
(c) eXtended Business Reporting Language
(d) eXtendable Business Reporting Language
5. Following are the types of cloud computing
(a) Private cloud
(b) Public cloud
(c) Hybrid cloud
(d) All of the above
Answer:

1 d 2 d 3 a 4 a 5 d

~~ State True or False


1. Decision tree classifies or predicts likely outcomes based on a collection of decisions.
2. K-nearest neighbour, often known as the KNN algorithm, classifies data points depending on their
closeness to and correlation with other accessible data.


3. Utilizing data mining techniques, hidden patterns and future trends and behaviours in financial
markets may be predicted.
4. Social analytics are virtually always a type of descriptive analytics.
5. In diagnostic analytics, tools are employed to ask of the data, “Why did this occur?”
Answer:

1 T 2 T 3 T 4 T 5 T

~~ Fill in the blanks

1. Data analytics helps us to identify patterns in the raw ________ and extract useful information from
them.
2. Through smart _________ analytics, data mining has enhanced corporate decision making.
3. Data __________ techniques are utilised to develop descriptions and hypotheses on a specific data set.
4. Data mining typically involves _________ steps.
5. Primarily utilised for deep learning algorithms, ___________ replicate the interconnection of the human
brain through layers of nodes to process training data.
Answer:
1 data 2 Data
3 mining 4 Four
5 neural network

~~ Short essay type questions

1. What are descriptive analytics?


2. Define diagnostic analytics.
3. What is the difference between descriptive analytics and prescriptive analytics?
4. Discuss the advantages and disadvantages of prescriptive analytics.
5. How does the prescriptive analytics work?

~~ Essay type questions

1. Discuss the different steps in the process of data analytics.


2. Discuss the benefits of data analytics
3. Define data mining. Discuss the various steps in data mining.
4. Discuss the various techniques of data mining.
5. Discuss various applications of data mining techniques in finance and accounting.
