There is a saying: 'data is the new oil'. Over the last few years, with the advent of increasing computing power and the availability of data, the importance and application of data science has grown exponentially. The field of finance and accounts has not remained untouched by this wave. In fact, to become an effective finance and accounts professional, it is very important to understand, analyse and evaluate data sets.
●● 2,4,6,8...........
●● Amul, Nestle, ITC..........
●● 36,37,38,35,36............
If we say that the first series in Figure 8.2 is really the first four numbers of the multiplication table of 2, and the third series shows the highest temperature in Kolkata during the previous four days, we are actually discovering some information out of the raw data.
So, we may now say
stock price data. Figure 8.5 below shows the daily stock prices of the HUL stock. This is an example of numerical data.
(ii) Descriptive data: Sometimes information may be deciphered in the form of qualitative information. Look at the paragraph in Figure 8.6, extracted from the annual report of HUL (2021-22). This is descriptive data provided by HUL in its annual report (2021-22). The user may use this data to make a judicious investment decision.
Figure 8.6: Descriptive data extracted from HUL annual report (2021-22)
(iii) Graphic data: A picture or graphic may tell a thousand stories. Data may also be presented in the form of a picture or graphics. For example, the stock price of HUL may be presented in the form of a picture or chart (Figure 8.7).
Data plays a very important role in the study of finance and cost accounting. From the inception of the study of finance, accounting and cost accounting, data has always played an important role. Be it in the form of financial statements, cost statements etc., finance and accounting professionals have played a significant role in helping the management to make prudent decisions.
The kinds of data used in finance and costing may be quantitative as well as qualitative in nature.
~~ Quantitative financial data: By the term 'quantitative data', we mean data expressed in numbers. The availability of quantitative data in finance is significant. Stock price data, financial statements etc. are examples of quantitative data, as most financial records are maintained in the form of organised numerical data.
~~ Qualitative financial data: However, some data in financial studies may appear in a qualitative format, e.g. text, videos, audio etc. These types of data may be very useful for financial analysis. For example, the 'management discussion and analysis' presented as part of the annual report of a company is mostly in the form of text. This information is useful for getting an insight into the performance of the business. Similarly, key executives often appear for interviews on business channels. These interactions are often goldmines of data and information.
Types of data
There is another way of classifying the types of data. The data may be classified also as:
(i) Nominal
(ii) Ordinal
(iii) Interval
(iv) Ratio
Each gives a distinct set of traits that influences the sort of analysis that may be conducted. The differentiation
between the four scale types is based on three basic characteristics:
(a) Whether the sequence of answers matters or not
(b) Whether the gap between observations is significant or interpretable, and
(c) The existence or presence of a genuine zero.
We will briefly discuss these four types below; a short illustrative sketch follows the list:
(i) Nominal Scale: The nominal scale is used for categorising data. Under this scale, observations are classified based on certain characteristics. The category labels may contain numbers but have no numerical value. Examples could be classifying equities into small-cap, mid-cap and large-cap categories, or classifying funds as equity funds, debt funds and balanced funds.
(ii) Ordinal Scale: The ordinal scale is used for classifying data and putting it in order. The numbers just indicate an order; they do not specify how much better or worse one observation is than the next. For example, the top 10 stocks ranked by P/E ratio.
(iii) Interval scale: The interval scale is used for categorising and ranking using an equal interval scale; equal intervals separate neighbouring scale values. Because of the scale's arbitrary zero point, ratios cannot be calculated. Temperature scales are an example: a temperature of 40 degrees is 5 degrees higher than one of 35 degrees, but 0 degrees Celsius does not indicate the absence of temperature. A temperature of 20 degrees is therefore not twice as hot as a temperature of 10 degrees.
(iv) Ratio scale: The ratio scale possesses all characteristics of the nominal, ordinal and interval scales. Data measured on a ratio scale can not only be classified and ranked but also has equal intervals. A ratio scale has a true zero, meaning that zero has a significant value; this genuine zero allows magnitudes to be described. Length, time, mass, money, age etc. are typical examples of ratio scales. For data analysis, a ratio scale may be utilised to measure sales, pricing, market share and client count.
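To make the four scales concrete, the short Python sketch below (purely illustrative; the column values are hypothetical) shows how each type of data might be represented with pandas and which operations are meaningful for each.

import pandas as pd

# Nominal: category labels with no order (e.g. fund type)
fund_type = pd.Series(["equity", "debt", "balanced", "equity"], dtype="category")
print(fund_type.value_counts())            # counting categories is meaningful; averaging is not

# Ordinal: ordered categories (e.g. market-cap bucket); order matters, gaps do not
cap_bucket = pd.Categorical(["small", "large", "mid", "small"],
                            categories=["small", "mid", "large"], ordered=True)
print(cap_bucket.min(), cap_bucket.max())  # ranking is meaningful

# Interval: equal gaps but arbitrary zero (e.g. temperature in degrees Celsius)
temp_c = pd.Series([35, 36, 37, 38])
print(temp_c.diff())                       # differences are meaningful; ratios are not

# Ratio: true zero, so ratios are meaningful (e.g. share price in rupees)
price = pd.Series([2500.0, 2550.0, 2475.0])
print(price / price.iloc[0])               # 'x% higher than day 1' is a valid statement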
In plain terms, digitization implies the process of converting data and information from analogue to digital format. The data in its original form may be stored as an object, a document or an image. The objective of digitization is to create a digital surrogate of the data and information in the form of binary numbers that facilitates processing using computers. There are primarily two basic objectives of digitization. The first is to provide widespread access to data and information to a very large group of users simultaneously. Secondly, digitization helps in the preservation of data for a longer period. One of the largest digitization projects taken up in India is the 'Unique Identification Number' (UID) or 'Aadhar' (Figure 8.8).
Figure 8.8: UID – Aadhar – globally one of the largest digitization projects (Source: https://uidai.gov.in/about-uidai/unique-identification-authority-of-india/vision-mission.html)
Digitization brings in some great advantages, which are mentioned below.
Why do we digitize?
There are many arguments that favour digitization of records. Some of them are mentioned below:
●● Improves classification and indexing of documents, which helps in the retrieval of records.
How do we digitize?
Large institutions take up digitization projects with meticulous planning and execution. The entire process of digitization may be segregated into six phases:
Phase 2: Assessment
In any institution, not all records are digitized. The data that requires digitization is to be decided on the basis of content and context. Some data may be digitized in a consolidated format, and some in a detailed format. The files, tables, documents, expected future use etc. are to be assessed and evaluated at this stage.
The hardware and software requirements for digitization are also assessed at this stage, and the human resource requirement for executing the digitization project is planned. The risk assessment at this level, e.g. the possibility of natural disasters and/or cyber attacks, also needs to be completed.
Phase 3: Planning
Successful execution of a digitization project needs meticulous planning. There are several stages of planning, e.g. selection of the digitization approach, project documentation, resources management, technical specifications, and risk management.
The institution may decide to complete the digitization in-house or, alternatively, through an outsourced agency. It may also be done on demand or in batches.
Metadata creation etc. is done at the editing stage. A final check of quality is done on randomly selected samples. Finally, user copies are created and uploaded to dedicated storage space after file validation. The digitization process may be viewed in Figure 8.9 below.
[Figure: scanned images (TIFF, JPEG, GIF etc.) pass through editing steps such as cropping and erasing of text, images and tables, and are saved in output formats such as PDF, DOC, RTF, HTML, Excel and PPT in an institutional repository / digital library.]
Figure 8.9: The complete digitization process. (Source: Bandi, S., Angadi, M. and Shivarama, J., Best practices in digitization: Planning and workflow processes. In Proceedings of the Emerging Technologies and Future of Libraries: Issues and Challenges, Gulbarga University, Karnataka, India, 30-31 January 2015)
Phase 6: Evaluation
Once the digitization project is updated and implemented, the final phase should be a systematic determination of the project's merit, worth and significance using objective criteria. The primary purpose is to enable reflection and assist in identifying changes that would improve future digitization processes.
8.4 Transformation of Data to Decision-Relevant Information
The emergence of big data has changed the world of business like never before. The most important shift has happened in information generation and the decision-making process. There is a strong emergence of analytics that supports a more intensive data-centric and data-driven information generation and decision-making process. The data that surrounds the organization is being harnessed into information that informs and supports prudent decision making in a judicious and repeatable manner.
The pertinent question here is: what does an enterprise need to do to transform data into relevant information? As noted earlier, not all types of data lead to relevant information for decision making. The biannual KPMG global CFO report says that for today's finance function leaders, the "biggest challenges lie in creating the efficiencies needed to gather and process basic financial data and continue to deliver traditional finance outputs while at the same time redeploying their limited resources to enable higher-value business decision support activities."
For understanding the finance functions within an enterprise, we may refer to Figure 8.10 below:
To turn raw data into user-friendly information, it should go through six core steps (a minimal sketch of such a pipeline follows the list):
1. Collection of data: The collection of data may be done with standardized systems in place. Appropriate
software and hardware may be used for this purpose. Appointment of trained staff also plays an important
role in collecting accurate and relevant data.
2. Organising the data: The raw data needs to be organized in an appropriate manner to generate relevant information. The data may be grouped and arranged in a manner that creates useful information for the target user groups.
3. Data processing: At this step, data needs to be cleaned to remove unnecessary elements. If any data point is missing or not available, that also needs to be addressed. The presentation format for the data also needs to be decided.
4. Integration of data: Data integration is the process of combining data from various sources into a single, unified form. This step includes the creation of data network sources, a master server and users accessing the data from the master server. Data integration eventually enables the analytics tools to produce effective, actionable business intelligence.
5. Data reporting: The data reporting stage involves translating the data into a consumable format to make it accessible to the users. For example, a business firm should be able to provide summarized financial information, e.g. revenue, net profit etc. The objective is that a user who wants to understand the financial position of the company should get relevant and accurate information.
6. Data utilization: At this ultimate step, data is utilized to support corporate activities and enhance operational efficiency and productivity for the growth of the business. This makes corporate decision making truly 'data driven'.
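A minimal sketch of these six steps in Python with pandas is given below; the file names, column names and the reporting measure are illustrative assumptions, not a prescribed implementation.

import pandas as pd

# 1. Collection: read raw transaction data captured by a standardised system (hypothetical file)
raw = pd.read_csv("sales_transactions.csv")      # assumed columns: date, region, revenue, cost

# 2. Organising: arrange the raw records in a consistent order
raw = raw.sort_values(["region", "date"])

# 3. Processing: clean unnecessary or missing elements
clean = raw.drop_duplicates().dropna(subset=["revenue", "cost"])

# 4. Integration: combine with data from another source into one unified view (hypothetical file)
regions = pd.read_csv("region_master.csv")       # assumed columns: region, manager
unified = clean.merge(regions, on="region", how="left")

# 5. Reporting: summarise into a consumable format, e.g. net profit by region
report = (unified.assign(profit=unified["revenue"] - unified["cost"])
                 .groupby("region", as_index=False)["profit"].sum())

# 6. Utilisation: the summary now backs a concrete decision, e.g. flagging loss-making regions
print(report[report["profit"] < 0])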
8.5 Communication of Information for Quality Decision-Making
Quality information should lead to quality decisions. With the help of well curated and reported data, decision makers should be able to add higher-value business insights, leading to better strategic decision making.
In a sense, a judicious use of data analytics is essential for implementation of ‘lean finance’, which implies
optimized finance processes with reduced cost and increased speed, flexibility and quality. By transforming the
information into a process for quality decision making, the firm should achieve the following abilities:
(i) Logical understanding of wide-ranging structured and unstructured data, and applying that information to corporate planning, budgeting, forecasting and decision support
(ii) Predicting outcomes more effectively compared to conventional forecasting techniques based on historical financial reports
(iii) Real-time spotting of emerging opportunities as well as capability gaps
(iv) Making strategies for responding to uncertain events like market volatility and 'black swan' events through simulation
(v) Diagnosing, filtering and extracting value from financial and operational information for making better business decisions
(vi) Recognizing viable advantages to serve customers in a better manner
(vii) Identifying possible frauds on the basis of data analytics
(viii) Building impressive and useful dashboards to measure and demonstrate success, leading to effective strategies.
The aim of a data-driven business organization is to develop a business intelligence (BI) system that is not only focused on efficient delivery of information but also provides accurate strategic insight into the operational and financial systems. This impacts organizational capabilities in a positive manner and makes the organization resilient to market pressures, creating competitive advantages by serving customers in a better way using data and predictive analytics.
8.6 Professional Scepticism Regarding Data
While data analytics is an important tool for decision making, managers should never take an important analysis at face value. The hidden insights that lie underneath the surface of the data set need to be explored, and what appears on the surface should be viewed with some scepticism.
The emergence of new data analytics tools and techniques in the financial environment allows accounting and finance professionals to gain unique insights into the data, but at the same time creates unique challenges in exercising scepticism. As the availability of data grows, analysts and auditors are not only getting more information but are also facing challenges in managing and investigating red flags.
One major concern about the use of data analytics is the likelihood of false positives, i.e. the analysis may identify a few potential anomalies that are later found to be reasonable and explainable variations in the data. Studies show that the frequency of false positives increases proportionately with the size and complexity of data. A few studies also show that analysts face problems while determining outliers using data analytics tools.
Professional scepticism is an important focus area for practitioners, researchers, regulators and standard setters. At the same time, professional scepticism may result in additional costs, e.g. strained client relationships and budget overruns.
Under such circumstances, it is important to identify and understand the conditions in which finance and audit professionals should apply professional scepticism. A fine balance needs to be kept between costly scepticism and underutilizing data analytics in order to keep costs under control.
Data analytics can help in the decision-making process and make an impact. However, this empowerment for business also comes with challenges. The questions are: how can business organizations ethically collect, store and use data, and what rights need to be upheld? Data ethics addresses the moral obligations of gathering, protecting and using personally identifiable information. These days, it is a major concern for analysts, managers and data professionals. Below we discuss five guiding principles in this regard.
The five basic principles of data ethics that a business organization should follow are:
(i) Regarding ownership: The first principle is that ownership of any personal information belongs to the
person. It is unlawful and unethical to collect someone’s personal data without their consent. The consent
may be obtained through digital privacy policies or signed agreements or by asking the users to agree with
terms and conditions. It is always advisable to ask for permission beforehand to avoid future legal and
ethical complications. In case of financial data, some data may be sensitive in nature. Prior permission
must be obtained before using the financial data for further analysis.
(ii) Regarding transparency: Maintaining transparency is important while gathering data. The objective with which the company is collecting a user's data should be known to the user. For example, if the company is using cookies to track the online behaviour of the user, it should be mentioned to the user through a written policy that cookies will be used for tracking the user's online behaviour and that the collected data will be stored in a secure database to train an algorithm to enhance user experience. After reading the policy, the user may decide to accept or not accept it. Similarly, while collecting financial data from clients, the purpose for which the data will be used should be clearly mentioned.
(iii) Regarding privacy: Even if the user allows a company to collect, store and analyze personally identifiable information (PII), that does not imply it should be made publicly available. For companies, it is mandatory to publish some financial information to the public, e.g. through annual reports. However, there may be much confidential information which, if it falls into the wrong hands, may create problems and financial loss. To protect the privacy of data, a data security process should be in place. This may include file encryption, dual-authentication passwords etc. The possibility of a breach of data privacy may also be reduced by de-identifying a dataset.
(iv) Regarding intention: The intention of data analysis should never be to make profits out of others' weaknesses or to hurt others. Collecting data which is unnecessary for the analysis is unethical and should be avoided.
(v) Regarding outcomes: In some cases, even if the intentions are good, the result of data analysis may inadvertently hurt clients and data providers. This is called disparate impact, and it is unethical.
Solved Case 1
Mr. Arjun is working as a data analyst with Manoj Enterprises Limited. He was invited by an educational institute to deliver a lecture on data analysis. He was told that the participants would be fresh graduates who would like to get a glimpse of the emerging field of 'data analysis'. While planning the lecture, he is thinking of the concepts to be covered.
In your opinion, which are the fundamental concepts that Arjun should cover in his lecture?
Teaching note - outline for solution:
While addressing the fresh candidates, Arjun may focus on explaining the basic concepts of data analysis. He may initiate the discussion with a brief introduction to 'data'. He may discuss, with examples, how mere data is not useful for decision making. Next, he may move to a discussion of the link among data, information and knowledge. The participants should get a clear idea about the formation of knowledge using 'raw' data as a resource.
Once the basic concepts about data, information and knowledge are clear in the minds of the participants, Arjun may describe the various types of data, e.g. numerical data, descriptive data and graphical data. He may explain the concepts with some real-life examples. Further, he may also discuss another way of looking at data, e.g. the ordinal scale, ratio scale etc.
How data analysis is particularly useful for finance and accounting functions may be discussed next. The difference between quantitative and qualitative data can then be discussed with the help of a few practical examples.
However, the key question is how the raw data may be transformed into useful information. To explore the answer to this question, Arjun may discuss the six steps to be followed for transforming data into information.
The ultimate objective of taking so much pain is to generate quality decisions. This is a subjective area. Arjun may seek inputs from the participants and discuss various ways of generating relevant and useful decisions by exploring raw data.
During this entire process of quality decision making, one should not forget the ethical aspects. Arjun should convey the importance of adopting ethical practices in data analysis.
At the end, Arjun may close the conversation with a note of thanks.
Exercise
A. Theoretical Questions:
~~ Multiple Choice Questions:
1 b 2 a 3 a 4 b 5 d
~~ State True or False
1. Digitization improves classification and indexing of documents, which helps in the retrieval of records.
2. Data is not a source of information.
3. One of the largest digitization projects taken up in India is the 'Unique Identification Number' (UID) or 'Aadhar'.
4. When this 'information' is used for solving a problem, we may say it is the use of knowledge.
5. Any data expressed as a number is numerical data.
Answer:
1 T 2 F 3 T 4 T 5 T
1. Discuss the five basic principles of data ethics that a business organization should follow.
2. 'Quality information should lead to quality decisions' – Discuss.
3. Discuss the six core steps that may turn data into user-friendly information.
4. Discuss the six phases that comprise the entire process of digitization.
5. Why do we digitize data?
Unsolved Case
1. Ram Kumar is the head data scientist of Anjana Ltd. For the last few weeks, he has been working along with his team on extracting information from a huge pile of data collected over time. His team members are working day and night to collect and clean the data. He has to make a presentation before the senior management of the company to explain the findings. Discuss the important steps he needs to take care of to transform raw data into useful knowledge.
References:
●● Data-driven business transformation. KPMG International
●● Davy Cielen, Arno D B Meysman, and Mohamed Ali. Introducing Data Science. Manning Publications Co
USA
●● www.finance.yahoo.com
●● www.google.com
●● Data, Information and Knowledge. Cambridge International
●● Barr-Pulliam, D., Brazel, J., McCallen, J. and Walker, K. Data Analytics and Skeptical Actions: The Countervailing Effects of False Positives and Consistent Rewards for Skepticism
●● Annual Report of Hindustan Unilever Limited. (2021-22)
●● www.uidai.gov.in
●● Bandi, S., Angadi, M. and Shivarama, J. Best practices in digitization: Planning and workflow processes. In Proceedings of the Emerging Technologies and Future of Libraries: Issues and Challenges
●● Finance’s Key Role in Building the Data-Driven Enterprise. Harvard Business Review Analytic Services
●● How to Embrace Data Analytics to Be Successful. Institute of Management Accountants. USA
●● The Data Analytics Implementation Journey in Business and Finance. Institute of Management Accountants.
USA
●● Principles of data ethics in business. Business Insights. Harvard Business School.
9.1 Development of Data Processing
Data processing (DP) is the process of organising, categorising, and manipulating data in order to extract information. Information in this context refers to valuable connections and trends that may be used to address pressing issues. In recent years, the capacity and effectiveness of DP have increased manifold with the development of technology.
Data processing that used to require a lot of human labour has progressively been superseded by modern tools and technology. The techniques and procedures used in DP, including information extraction algorithms, have become well developed in recent years; for instance, classification is necessary for facial recognition, and time series analysis is necessary for processing stock market data.
The information extracted as a result of DP is also heavily reliant on the quality of the data. Data quality may get affected due to several issues like missing data and duplication. There may be other fundamental problems, such as incorrect equipment design and biased data collection, which are more difficult to address.
The history of DP can be divided into three phases as a result of technological advancements (figure 9.1):
Manual DP → Mechanical DP → Electronic DP
Figure 9.1: History of data processing
(i) Manual DP: Manual DP involves processing data without much assistance from machines. Prior to the
phase of mechanical DP only small-scale data processing was possible using manual efforts. However, in
some special cases Manual DP is still in use today, and it is typically due to the data’s difficulty in digitization
or inability to be read by machines, like in the case of retrieving data from outdated texts or documents.
(ii) Mechanical DP: Mechanical DP processes data using mechanical (not modern computers) tools and
technologies. This phase began in 1890 (Bohme et al., 1991) when a system made up of intricate punch card
machines was installed by the US Bureau of the Census in order to assist in compiling the findings of a recent
national population census. The use of mechanical DP made it quicker and easier to search and compute the data than the manual process.
(iii) Electronic DP: Finally, electronic DP replaced the other two, resulting in a fall in mistakes and rising productivity. Data processing is now done electronically using computers and other cutting-edge electronics, and it is widely used in industry, research institutions and academia.
discover significance in numbers that were days, weeks, months, or even years old since that was the only
accessible information.
It was processed in batches, which meant that no analysis could be performed until a batch of data had
been gathered within a predetermined timescale. Consequently, any conclusions drawn from this data were
possibly invalid.
With technological advancement and improved hardware, real-time analytics are now available, as Data
Engineering, Data Science, Machine Learning, and Business Intelligence work together to provide the
optimal user experience. Thanks to dynamic data pipelines, data streams, and a speedier data transmission
between source and analyzer, businesses can now respond quickly to consumer interactions. With real-time
analysis, there are no delays in establishing a customer’s worth to an organisation, and credit ratings and
transactions are far more precise.
(iii) Customer data management: Data science enables effective management of client data. In recent years,
many financial institutions may have processed their data solely through the machine learning capabilities
of Business Intelligence (BI). However, the proliferation of big data and unstructured data has rendered
this method significantly less effective for predicting risk and future trends.
There are currently more transactions occurring every minute than ever before, thus there is better data
accessibility for analysis. Due to the arrival of social media and new Internet of Things (IoT) devices, a
significant portion of this data does not conform to the structure of organised data previously employed.
Using methods such as text analytics, data mining, and natural language processing, data science is well-
equipped to deal with massive volumes of unstructured new data. Consequently, despite the fact that data
availability has been enhanced, data science implies that a company’s analytical capabilities may also be
upgraded, leading to a greater understanding of market patterns and client behaviour.
(iv) Consumer Analytics: In a world where choice has never been more crucial, it has become evident that
each customer is unique; nonetheless, there have never been more consumers. This contradiction cannot
be sustained without the intelligence and automation of machine learning.
It is as important to ensure that each client receives a customised service as it is to process their data swiftly
and efficiently, without time-intensive individualised analysis.
As a consequence, insurance firms are using real-time analytics in conjunction with prior data patterns
and quick analysis of each customer’s transaction history to eliminate sub-zero consumers, enhance cross-
sales, and calculate a consumer’s lifetime worth. This allows each financial institution to keep their own
degree of security while still reviewing each application individually.
(v) Customer segmentation: Despite the fact that each consumer is unique, it is only possible to comprehend
their behaviour after they have been categorised or divided. Customers are frequently segmented based on
socioeconomic factors, such as geography, age, and buying patterns.
By examining these clusters collectively, organisations in the financial industry and beyond may assess a
customer’s current and long-term worth. With this information, organisations may eliminate clients who
provide little value and focus on those with promise.
To do this, data scientists can use automated machine learning algorithms to categorise their clients based
on specified attributes that have been assigned relative relevance scores. Comparing these groupings to
former customers reveals the expected value of time invested with each client.
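As an illustration of this idea, the sketch below segments a handful of invented customer records with a standard k-means algorithm from scikit-learn; the features, values and number of clusters are assumptions made for the example.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customer attributes: [age, annual spend, number of transactions]
customers = np.array([
    [25, 20000, 15],
    [32, 55000, 40],
    [47, 150000, 12],
    [51, 160000, 10],
    [29, 52000, 38],
    [23, 18000, 12],
])

# Scale the features so that no single attribute dominates the distance measure
scaled = StandardScaler().fit_transform(customers)

# Group the customers into three segments (the number of segments is a modelling choice)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaled)
print(kmeans.labels_)                      # segment label assigned to each customer

# Average spend per segment helps judge each segment's current and long-term worth
for seg in range(3):
    print(seg, customers[kmeans.labels_ == seg, 1].mean())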
(vi) Personalized services: The requirement to customise each customer’s experience extends beyond gauging
risk assessment. Even major organisations strive to provide customised service to their consumers as a
method of enhancing their reputation and increasing customer lifetime value. This is also true for businesses
in the finance sector.
From customer evaluations to telephone interactions, everything can be studied in a way that benefits both
the business and the consumer. By delivering the consumer a product that precisely meets their needs,
cross-selling may be facilitated by a thorough comprehension of these interactions.
Natural language processing (NLP) and voice recognition technologies dissect these encounters into a
series of important points that can identify chances to increase revenue, enhance the customer service
experience, and steer the company’s future. Due to the rapid progress of NLP research, the potential is yet
to be fully realised.
(vii) Advanced customer service: Data science’s capacity to give superior customer service goes hand in
hand with its ability to provide customised services. As client interactions may be evaluated in real-time,
more effective recommendations can be offered to the customer care agent managing the customer’s case
throughout the conversation.
Natural language processing can offer chances for practical financial advice based on what the consumer is saying, even if the customer is unsure of the product they are seeking.
The customer support agent can then cross-sell or up-sell while efficiently addressing the client’s inquiry.
The knowledge from each encounter may then be utilised to inform subsequent interactions of a similar
nature, hence enhancing the system’s efficacy over time.
(viii) Predictive Analytics: Predictive analytics enables organisations in the financial sector to extrapolate from
existing data and anticipate what may occur in the future, including how patterns may evolve. When
prediction is necessary, machine learning is utilised. Using machine learning techniques, pre-processed
data may be input into the system in order for it to learn how to anticipate future occurrences accurately.
More information improves the prediction model. Typically, for an algorithm to function in shallow
learning, the data must be cleansed and altered. Deep learning, on the other hand, changes the data without
the need for human preparation to establish the initial rules, and so achieves superior performance.
In the case of stock market pricing, machine learning algorithms learn trends from past data in a certain
interval (may be a week, month, or quarter) and then forecast future stock market trends based on this
historical information. This allows data scientists to depict expected patterns for end-users in order to assist
them in making investment decisions and developing trading strategies.
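A highly simplified sketch of this idea is shown below: a linear model is fitted on lagged prices to extrapolate the next value. The price series is invented for illustration, and a real forecasting model would be considerably more elaborate.

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical daily closing prices over two trading weeks
prices = np.array([100, 101, 103, 102, 105, 107, 106, 109, 111, 110], dtype=float)

# Build a supervised data set: predict today's price from the previous three days
window = 3
X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
y = prices[window:]

model = LinearRegression().fit(X, y)

# Forecast the next day's price from the latest three observations
next_day = model.predict(prices[-window:].reshape(1, -1))
print(round(float(next_day[0]), 2))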
(ix) Fraud detection: With a rise in financial transactions, the risk of fraud also increases. Tracking incidents of fraud, such as identity theft and credit card scams, and limiting the resulting harm is a primary responsibility for financial institutions. As the technologies used to analyse big data become more sophisticated, so does their capacity to detect fraud early on.
Artificial intelligence and machine learning algorithms can now detect credit card fraud significantly more
precisely, owing to the vast amount of data accessible from which to draw trends and the capacity to
respond in real time to suspect behaviour.
If a major purchase is made on a credit card belonging to a consumer who has traditionally been very
frugal, the card can be immediately terminated, and a notification sent to the card owner.
This protects not just the client, but also the bank and the client’s insurance carrier. When it comes to
trading, machine learning techniques discover irregularities and notify the relevant financial institution,
enabling speedy inquiry.
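The frugal-customer example above can be caricatured in a few lines of code: flag any transaction that lies far above the customer's historical spending pattern. The threshold and the transaction history are illustrative assumptions; production systems rely on far richer machine-learning models.

import statistics

# Hypothetical historical transaction amounts for one card holder (in rupees)
history = [250, 400, 310, 275, 500, 350, 420, 390]

mean = statistics.mean(history)
stdev = statistics.stdev(history)

def is_suspicious(amount, threshold=4.0):
    # Flag a new transaction if it lies more than `threshold` standard deviations above the mean
    return amount > mean + threshold * stdev

print(is_suspicious(450))      # False - consistent with past behaviour
print(is_suspicious(75000))    # True  - a very large purchase for a traditionally frugal customer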
(x) Anomaly detection: Financial services have long placed a premium on detecting abnormalities in a
customer’s bank account activities, partly because anomalies are only proved to be anomalous after the
event happens. Although data science can provide real-time insights, it cannot anticipate singular incidents
of credit card fraud or identity theft.
However, data analytics can discover instances of unlawful insider trading before they cause considerable
harm. The methods for anomaly identification consist of Recurrent Neural Networks and Long Short-Term
Memory models.
These algorithms can analyse the behaviour of traders before and after information about the stock market
becomes public in order to determine if they illegally monopolised stock market forecasts and took
advantage of investors. Transformers, which are next-generation designs for a variety of applications,
including Anomaly Detection, are the foundation of more modern solutions.
(xi) Algorithmic trading: Algorithmic trading is one of the key uses of data science in finance. Algorithmic trading happens when an unsupervised computer, utilising the intelligence supplied by an algorithm, acts on trade suggestions in the stock market. As a consequence, it eliminates the risk of loss caused by indecision and human error.
The trading algorithm used to be developed according to a set of stringent rules that decide whether it will trade on a specific market at a specific moment (there is no restriction on which markets algorithmic trading can work on).
This method is known as Reinforcement Learning, in which the model is taught using penalties and rewards
associated with the rules. Each time a transaction proves to be a poor option, a model of reinforcement
learning ensures that the algorithm learns and adapts its rules accordingly.
One of the primary advantages of algorithmic trading is the increased frequency of deals. Based on facts
and taught behaviour, the computer can operate in a fraction of a second without human indecision or
thought. Similarly, the machine will only trade when it perceives a profit opportunity according to its rule
set, regardless of how rare these chances may be.
It is also vitally necessary to organise visualisations (tables, charts, etc.) correctly to facilitate accurate data interpretation. In market research, for instance, it is typical to sort the findings of a single-response question by column percentage, i.e. from most answered to least answered, as indicated by the following brand preference question.
Incorrect classification frequently results in misunderstanding. Always verify that the most logical sorts are applied to every visualisation.
Using sorting functions is an easy idea to comprehend, but there are a few technical considerations to keep
in mind. The arbitrary sorting of non-unique data is one such issue. Consider, for example, a data collection
comprising region and nation variables, as well as several records per area. If a region-based sort is implemented,
what is the default secondary sort? In other words, how will the data be sorted inside each region?
This depends on the application in question. Excel, for instance, will preserve the original order as the default secondary sort following the execution of the primary sort. SQL databases do not have a default sort order; it rather depends on other factors, such as the database management system (DBMS) in use, indexes, and so on. Other programmes may perform extra sorting by default based on the column order.
In nearly every level of data processing, the vast majority of analytical and statistical software programmes offer
a variety of sorting options.
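Because of these differences, it is safer to state the secondary sort explicitly rather than rely on an application's default. The sketch below (hypothetical table and column names) shows the same idea in pandas and, as a comment, in SQL.

import pandas as pd

sales = pd.DataFrame({
    "region":  ["Asia", "Europe", "Asia", "Europe", "Asia"],
    "country": ["India", "France", "Japan", "Germany", "China"],
    "revenue": [120, 90, 150, 110, 130],
})

# Explicit primary and secondary sort keys: region first, then country within each region
print(sales.sort_values(["region", "country"]))

# Equivalent SQL, with the secondary key stated rather than left to the DBMS default:
#   SELECT region, country, revenue
#   FROM sales
#   ORDER BY region ASC, country ASC;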
(iii) Aggregation:
Data aggregation refers to any process in which data is collected and summarised. When data is aggregated,
individual data rows, which are often compiled from several sources, are replaced with summaries or totals. Groups
of observed aggregates are replaced with statistical summaries based on these observations. A data warehouse often
contains aggregate data since it may offer answers to analytical inquiries and drastically cut the time required to
query massive data sets.
A common application of data aggregation is to offer statistical analysis for groups of individuals and to provide relevant summary data for business analysis. Large-scale data aggregation using software tools known as data aggregators is commonplace. Typically, data aggregators comprise functions for gathering, processing, and displaying aggregated data.
Data aggregation enables analysts to access and analyse vast quantities of data in a reasonable amount of time.
A single row of aggregate data may represent hundreds, thousands, or even millions of individual data entries. As
data is aggregated, it may be queried rapidly as opposed to taking all processing cycles to acquire each individual
data row and aggregate it in real time when it is requested or accessed.
As the amount of data kept by businesses continues to grow, aggregating the most significant and frequently requested data can facilitate its efficient access.
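The sketch below illustrates the basic mechanics with pandas: individual transaction rows (invented for the example) are replaced by per-group totals that can be queried far more cheaply than the raw detail.

import pandas as pd

# Hypothetical individual transaction rows compiled from several sources
transactions = pd.DataFrame({
    "branch":  ["Kolkata", "Kolkata", "Mumbai", "Mumbai", "Delhi"],
    "product": ["Soap", "Tea", "Soap", "Tea", "Soap"],
    "amount":  [1200, 800, 1500, 950, 700],
})

# Aggregate: one summary row per branch replaces many detail rows
summary = (transactions
           .groupby("branch")
           .agg(total_sales=("amount", "sum"), transaction_count=("amount", "count"))
           .reset_index())
print(summary)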
(iv) Analysis:
Data analysis is described as the process of cleaning, converting, and modelling data to obtain actionable business
intelligence. The objective of data analysis is to extract relevant information from data and make decisions based
on this knowledge.
Every time we make a decision in our day-to-day lives, we consider what occurred previously or what would occur if we chose a specific option. This is a simple example of data analysis: studying the past or the future and basing judgments on that analysis, by recalling our history or by imagining our future. The same task that an analyst performs for commercial goals is known as data analysis.
Analysis is sometimes all that is required to expand a business and its finances. If a firm is not expanding, it must admit past errors and create a new plan to avoid making the same mistakes. And even if the firm is expanding, it must plan to expand even more. All that is required is an analysis of the business data and operations.
able to prioritise the most pertinent data. There is no space for error in data reporting, which necessitates high
thoroughness and attention to detail. The capacity to comprehend and organise enormous volumes of information
is another valuable talent. Lastly, the ability to organise and present data in an easy-to-read fashion is essential for
all data reporters.
Excellence in data reporting does not necessitate immersion in coding or proficiency in analytics. Other necessary
talents include the ability to extract vital information from data, to keep things simple, and to prevent data hoarding.
Although static reporting can be precise and helpful, it has limitations. One such instance is the absence of real-time insights. If confronted with a vast volume of data to organise into a usable and actionable format, a report enables senior management or the sales team to provide guidance on future steps. However, if the layout, data, and formulae are not delivered in a timely way, they may no longer be current.
The reporting of data is vital to an organisation's business intelligence. The greater an organisation's access to data, the more agile it can be. This can help a firm maintain its relevance in a market that is becoming increasingly competitive and dynamic. An efficient data reporting system will facilitate the formation of judicious judgments that might steer a business in new directions and provide additional income streams.
(vi) Classification:
Data classification is the process of classifying data according to important categories so that it may be utilised and
safeguarded more effectively. The categorization process makes data easier to identify and access on a fundamental
level. Regarding risk management, compliance, and data security, the classification of data is of special relevance.
Classifying data entails labelling it to make it searchable and trackable. Additionally, it avoids many duplications
of data, which can minimise storage and backup expenses and accelerate the search procedure. The categorization
process may sound very technical, yet it is a topic that your organisation’s leadership must comprehend.
The categorization of data has vastly improved over time. Today, the technology is employed for a number of
applications, most frequently to assist data security activities. However, data may be categorised for a variety
of purposes, including facilitating access, ensuring regulatory compliance, and achieving other commercial or
personal goals. In many instances, data classification is a statutory obligation, since data must be searchable and
retrievable within predetermined deadlines. For the purposes of data security, data classification is a useful strategy
that simplifies the application of appropriate security measures based on the kind of data being accessed, sent, or
duplicated.
Classification of data frequently entails an abundance of tags and labels that identify the data's kind, secrecy, and integrity. In data classification procedures, availability may also be taken into account. It is common practice to classify the sensitivity of data based on varying levels of relevance or secrecy, which correspond to the security measures required to safeguard each classification level.
Three primary methods of data classification are recognised as industry standards (a toy sketch of the first method follows the list):
●● Content-based classification examines and interprets files for sensitive data.
●● Context-based classification considers, among other characteristics, application, location, and creator as indirect markers of sensitive information.
●● User-based classification relies on the manual selection of each document by the end user. To indicate sensitive documents, user-based classification depends on human expertise and judgement during document creation, editing, review, or distribution.
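As a toy illustration of content-based classification, the sketch below scans a piece of text for patterns that resemble sensitive identifiers; the patterns and labels are simplified assumptions, not a compliance-grade rule set.

import re

# Simplified patterns that suggest sensitive content (illustrative only)
PATTERNS = {
    "PAN-like identifier":  r"\b[A-Z]{5}[0-9]{4}[A-Z]\b",
    "16-digit card number": r"\b(?:\d{4}[- ]?){3}\d{4}\b",
    "e-mail address":       r"\b[\w.+-]+@[\w-]+\.[\w.]+\b",
}

def classify(text):
    # Return 'Confidential' if any sensitive pattern is found, else 'Public'
    hits = [name for name, pattern in PATTERNS.items() if re.search(pattern, text)]
    return ("Confidential", hits) if hits else ("Public", hits)

print(classify("Quarterly revenue grew by 11%."))
print(classify("Customer PAN ABCDE1234F, card 4111-1111-1111-1111."))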
In addition to the classification types, it is prudent for an organisation to identify the relative risk associated with the data types, how the data is handled, and where it is stored/sent (endpoints). It is standard practice to divide data and systems into three risk categories.
~~ Low risk: If data is accessible to the public and recovery is simple, then this data collection and the
mechanisms around it pose a smaller risk than others.
~~ Moderate risk: Essentially, this is non-public or internal (to a business or its partners) data. However, it is unlikely to be too mission-critical or sensitive to be considered "high risk". The intermediate category may include proprietary operating processes, cost of products, and certain corporate paperwork.
~~ High risk: Anything even vaguely sensitive or critical to operational security falls under the category of high risk, as does data that is incredibly difficult to retrieve (if lost). All secret, sensitive, and essential data falls under the category of high risk.
Figure 9.4: Sample risk classification matrix (for example, the 'Access' row ranges from 'only those individuals designated with approved access' to 'EMU employees and non-employees who have a business need to know' to 'EMU affiliates and the general public with a need to know')
~~ Data distribution
Data distribution is a function that identifies and quantifies all potential values for a variable, as well as their
relative frequency (probability of how often they occur). Any population with dispersed data is categorised
as a distribution. It is necessary to establish the population’s distribution type in order to analyse it using the
appropriate statistical procedures.
Statistics makes extensive use of data distributions. If an analyst gathers 500 data points on the shop floor, they are of little use to management unless they are categorised or organised in a usable manner. The data distribution approach arranges the raw data into graphical representations (such as histograms, box plots, pie charts, etc.) and provides relevant information.
The primary benefit of data distribution is the estimation of the probability of any certain observation within
a sample space. Probability distribution is a mathematical model that determines the probabilities of the
occurrence of certain test or experiment outcomes. These models are used to specify distinct sorts of random
variables (often discrete or continuous) in order to make a choice. One can employ mean, mode, range,
probability, and other statistical approaches based on the category of the random variable.
~~ Types of distribution
Distributions are basically classified based on the type of data (a short numerical sketch follows the list):
(i) Discrete distributions: A discrete distribution results from countable data and has a finite number of potential values. In addition, discrete distributions may be displayed in tables, and the values of the random variable can be counted. Example: rolling dice, obtaining a specific number of heads, etc.
Following are the discrete distributions of various types:
(a) Binomial distributions: The binomial distribution quantifies the chance of obtaining a specific number of successes or failures in a fixed number of trials.
Binomial distribution applies to attributes that are categorised into two mutually exclusive and exhaustive classes, such as number of successes/failures and number of acceptances/rejections.
Example: When tossing a coin, the likelihood of the coin landing on its head is one-half and the probability of it landing on its tail is one-half.
(b) Poisson distribution: The Poisson distribution is the discrete probability distribution that quantifies the chance of a certain number of events occurring in a given time period, where the events occur at a known average rate and independently of one another.
Poisson distribution applies to attributes that can potentially take on huge values but in practice take on small ones.
Example: Number of flaws, mistakes, accidents, absentees etc.
(c) Hypergeometric distribution: The hypergeometric distribution is a discrete distribution that
assesses the chance of a certain number of successes in (n) trials, without replacement, from a
sufficiently large population (N). Specifically, sampling without replacement.
The hypergeometric distribution is comparable to the binomial distribution; the primary distinction between the two is that the chance of success is the same for all trials in the binomial distribution but not in the hypergeometric distribution.
(d) Geometric distribution: The geometric distribution is a discrete distribution that assesses the
probability of the occurrence of the first success. A possible extension is the negative binomial
distribution.
Example: A marketing representative from an advertising firm chooses hockey players from several
institutions at random till he discovers an Olympic participant.
(ii) Continuous distributions: A distribution with an unlimited number of (variable) data points that may be represented on a continuous measuring scale. A continuous random variable is a random variable with an unlimited and uncountable set of potential values. It is more than a simple count and is often described using a probability density function (pdf). The probability density function describes the characteristics of a random variable; frequencies often cluster around a central value, and the probability density function can be viewed as the distribution's "shape".
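To tie the examples above to actual numbers, the short sketch below uses scipy.stats (with invented parameter values) to evaluate a binomial, a Poisson and a normal probability.

from scipy import stats

# Binomial: probability of exactly 6 heads in 10 fair coin tosses
print(stats.binom.pmf(k=6, n=10, p=0.5))        # about 0.205

# Poisson: probability of at most 2 processing errors in a day,
# if errors historically average 1.5 per day
print(stats.poisson.cdf(k=2, mu=1.5))           # about 0.809

# Continuous (normal): probability that a value falls within one standard deviation of the mean
print(stats.norm.cdf(1) - stats.norm.cdf(-1))   # about 0.683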
~~ Data validation
Data validation is a crucial component of any data management process, whether it is about collecting
information in the field, evaluating data, or preparing to deliver data to stakeholders. If the initial data is not
valid, the outcomes will not be accurate either. It is therefore vital to check and validate data before using it.
Although data validation is an essential stage in every data pipeline, it is frequently ignored. It may appear that data validation is an unnecessary step that slows down the work, but it is vital for producing the finest possible outcomes. Today, data validation may be accomplished considerably more quickly than one may have imagined earlier. With data integration systems that can include and automate validation procedures, validation may be considered an integral part of the workflow, as opposed to an additional step.
Validating the precision, clarity, and specificity of data is essential for mitigating project failures. Without data validation, one may run the danger of basing judgments on faulty data that is not indicative of the current situation.
In addition to validating data inputs and values, it is vital to validate the data model itself. If the data model
is not appropriately constructed or developed, one may encounter problems while attempting to use data files
in various programmes and software.
The format and content of data files will determine what can be done with the data. Using validation criteria to purify data prior to usage mitigates "garbage in, garbage out" problems. Ensuring data integrity contributes to the validity of the conclusions.
In certain mapping tools, defining the location of a shop might be challenging. A store's postal code will also facilitate the generation of neighbourhood-specific data. Without postal code validation, it is more probable that the data will lose its value. If the data needs to be recollected or the postal code needs to be manually input, further expenses will also be incurred.
A straightforward solution to the issue would be to provide a check that guarantees a valid postal code is entered.
The solution may be a drop-down menu or an auto-complete form that enables the user to select a valid postal code
from a list. This kind of data validation is referred to as a code validation or code check.
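A minimal version of such a code check is sketched below; the six-digit PIN format and the sample master list are assumptions made for the illustration.

import re

# Hypothetical master list of postal codes served by the business
VALID_PIN_CODES = {"700001", "700019", "400001", "110001"}

def validate_pin(pin):
    # Return True only for a well-formed postal code that exists in the master list
    if not re.fullmatch(r"\d{6}", pin):        # format check: exactly six digits
        return False
    return pin in VALID_PIN_CODES              # code check against the reference list

print(validate_pin("700001"))   # True  - valid and on the list
print(validate_pin("70001"))    # False - wrong format
print(validate_pin("999999"))   # False - well-formed but not a recognised code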
Solved Case 1
Maitreyee is working as a data analyst with a financial organisation. She has been supplied with a large amount of data, and she plans to use statistical techniques for inferring some useful information and knowledge from it. But before starting the process of data analysis, she found that the provided data has not been cleaned. She knows that cleaning the data is essential before applying the data analysis tools.
In your opinion, what steps should Maitreyee follow to clean the data, and what are the benefits of clean data?
Teaching note - outline for solution:
The instructor may initiate the discussion by explaining the concept of data cleaning and its importance.
The instructor may also elaborate on the consequences of using an uncleaned dataset for the final analysis. She may discuss the five steps of data cleaning in detail (a minimal sketch follows the list), such as:
(i) Removal of duplicate and irrelevant information
(ii) Fix structural errors
(iii) Filter unwanted outliers
(iv) Handle missing data
(v) Validation and QA
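A minimal pandas illustration of these five steps is given below; the file name, the columns and the chosen outlier rule are assumptions made for the example.

import pandas as pd

df = pd.read_csv("raw_financial_data.csv")          # hypothetical input file

# (i) Remove duplicate and irrelevant information
df = df.drop_duplicates().drop(columns=["remarks"], errors="ignore")

# (ii) Fix structural errors, e.g. inconsistent labels
df["segment"] = df["segment"].str.strip().str.lower()

# (iii) Filter unwanted outliers (here: revenue beyond three standard deviations)
z = (df["revenue"] - df["revenue"].mean()) / df["revenue"].std()
df = df[z.abs() <= 3]

# (iv) Handle missing data: drop rows missing key fields, fill the rest
df = df.dropna(subset=["revenue"]).fillna({"cost": 0})

# (v) Validation and QA: basic sanity checks before analysis
assert (df["revenue"] >= 0).all(), "Negative revenue found"
print(df.describe())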
At the outset, Maitreyee should focus on answering the following questions:
(a) Does the data make sense?
(b) Does the data adhere to the regulations applicable to its field?
(c) Does it verify or contradict your working hypothesis, or does it shed any light on it?
(d) Can data patterns assist you in formulating your next theory?
(e) If not, is this due to an issue with data quality?
The instructor may close the discussion by explaining the benefits of using clean data, such as:
(i) Validity
(ii) Accuracy
(iii) Completeness
(iv) Consistency
Exercise
A. Theoretical Questions:
~~ Multiple Choice Questions
1 d 2 a 3 a 4 a 5 a
1. Data validation could be operationally defined as a process which ensures the correspondence of the
final (published) data with a number of quality characteristics.
2. Data analysis is described as the process of cleaning, converting, and modelling data to obtain actionable
business intelligence.
3. Financial data such as revenues, accounts receivable, and net profits are often summarised in a company’s
data reporting.
4. Structured data consists of tabular information that may be readily imported into a database and then
utilised by analytics software or other applications.
5. Data distribution is a function that identifies and quantifies all potential values for a variable, as well as
their relative frequency (probability of how often they occur).
Answer:
1 T 2 T 3 T 4 T 5 T
Unsolved Case(s)
1. Arjun is a data analyst working with Akansha Limited. The company deals in the retailing of FMCG products and follows both online and offline modes for delivering its services. Over the years, the company has accumulated a huge amount of data.
The management is a little puzzled about the ways in which this data may be brought into a usable format. Arjun is entrusted with the responsibility of bringing the data into a usable format. Make your suggestions on how this is to be done.
References:
●● Davy Cielen, Arno D B Meysman, and Mohamed Ali. Introducing Data Science. Manning Publications Co
USA
●● Cathy O’Neil, Rachell Schutt. Doing data science. O’Reilley
●● Joel Grus. Data science from scratch. O’Reilley
●● www.tableau.com
●● www.corporatefinanceinstitute.com
●● Tyler McClain. Data analysis and reporting. Orgsync
●● Marco Di Zio, Nadežda Fursova, Tjalling Gelsema, Sarah Gießing, Ugo Guarnera, Jūratė Petrauskienė, Lucas
Quensel-von Kalben, Mauro Scanu, K.O. ten Bosch, Mark van der Loo, Katrin Walsdorfer. Methodology of
data validation
●● Barbara S Hawkins, and Stephen W Singer. Design, development and implementation of data processing
system for multiple control trials and epidemiologic studies. Controlled clinical trials (1986)
This is particularly true for finance, which is becoming the data hub of the majority of progressive enterprises.
David A.J. Axson of Accenture highlights in his paper “Finance 2020: Death by Digital” that finance is transitioning
from “an expenditure control, spreadsheet-driven accounting and reporting centre” to “a predictive analytics
powerhouse that generates business value.”
Finance is able to communicate these analytic findings to the entire business through the use of data visualisation.
Several studies indicate that sixty-five percent of individuals are visual learners. Giving decision makers visual representations of facts improves comprehension and can eventually lead to better judgments (Figure 10.2).
In addition, the technique of developing data visualisations may aid finance in identifying more patterns and
gaining deeper insights, particularly when many data sources or interactive elements are utilised. For example,
contemporary finance professionals frequently monitor both financial and non-financial KPIs. Data visualisation
may assist in correlating these variables, revealing relationships, and elucidating the actions required to enhance
performance.
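As a small illustration of relating a financial and a non-financial KPI, the sketch below plots invented quarterly figures with matplotlib; the KPI names and values are assumptions made for the example.

import matplotlib.pyplot as plt

quarters = ["Q1", "Q2", "Q3", "Q4"]
revenue_cr = [120, 135, 150, 170]          # financial KPI: revenue in INR crore (hypothetical)
csat_score = [72, 75, 80, 84]              # non-financial KPI: customer satisfaction (hypothetical)

fig, ax1 = plt.subplots()
ax1.plot(quarters, revenue_cr, marker="o", color="tab:blue", label="Revenue (INR crore)")
ax1.set_ylabel("Revenue (INR crore)")

ax2 = ax1.twinx()                          # second axis overlays the non-financial KPI
ax2.plot(quarters, csat_score, marker="s", color="tab:green", label="Customer satisfaction")
ax2.set_ylabel("Customer satisfaction score")

ax1.set_title("Financial vs non-financial KPI by quarter")
fig.tight_layout()
plt.show()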
The amount of data analysed by financial teams has grown dramatically. Data visualisations may help the team
convey its strategic findings more effectively throughout the enterprise.
The absence of data visualisation would make it difficult for organisations to immediately recognise data patterns. The graphical depiction of data sets enables analysts to visualise new concepts and patterns. With the daily increase in data volume, it is hard to make sense of the quintillions of bytes of data without aids such as data visualisation.
Every company may benefit from a better knowledge of their data, hence data visualisation is expanding into
all industries where data exists. Information is the most crucial asset for every organisation. Through the use of
visuals, one may effectively communicate their ideas and make use of the information.
Dashboards, graphs, infographics, maps, charts, videos, and slides may all be used to visualise and comprehend
data. Visualising the data enables decision-makers to interrelate the data, gain better insights, and capitalise on
the following objectives of data visualisation:
●● Making a better data analysis:
Analysing reports assists company stakeholders in focusing their attention on the areas that require it. The
visual mediums aid analysts in comprehending the essential business issues. Whether it is a sales report or a
marketing plan, a visual representation of data assists businesses in increasing their profits through improved
analysis and business choices.
●● Faster decision making:
Visuals are easier for humans to process than tiresome tabular forms or reports. If the data is effectively
communicated, decision-makers may move swiftly on the basis of fresh data insights, increasing both
decision-making and corporate growth.
●● Analysing complicated data:
Data visualisation enables business users to obtain comprehension of their large quantities of data. It is
advantageous for them to identify new data trends and faults. Understanding these patterns enables users to
focus on regions that suggest red flags or progress. In turn, this process propels the firm forward.
The objective of data visualisation is rather obvious. It is to interpret the data and apply the information for the
advantage of the organisation. Its value increases as it is displayed. Without visualisation, it is difficult to rapidly
explain data discoveries, recognise trends to extract insights, and engage with data fluidly.
Without visualisation, data scientists won’t be able to see trends or flaws. Nonetheless, it is essential to effectively
explain data discoveries and extract vital information from them. And interactive data visualisation tools make all
the difference in this regard.
The continuing pandemic is a topical example. Despite the sheer volume of data involved, data visualisation
assists specialists in remaining informed and composed.
(i) Data visualisation enhances the effect of communications for the audiences and delivers the most convincing
data analysis outcomes. It unites the organisation’s communications systems across all organisations and
fields.
(ii) Visualisation allows us to interpret large volumes of data more quickly and effectively at a glance. It facilitates
a better understanding of the data for measuring its impact on the business and graphically communicates
the knowledge to internal and external audiences.
(iii) One cannot make decisions in a vacuum. Data and insights available to decision-makers facilitate decision
analysis. Unbiased data devoid of mistakes enables access to the appropriate information and visualisation
to convey and maintain the relevance of that information.
According to an article published by Harvard Business Review (HBR), the most common errors made by analysts
that make a data visualisation unsuccessful relate to the following three elements:
●● Understanding the audience:
As mentioned earlier, before incorporating the data into visualisation, the objective should be fixed, which is
to present large volumes of information in a way that decision-makers can readily ingest. A great visualisation
relies on the designer comprehending the intended audience and executing on three essential points:
(i) Who will read and understand the material, and how will they do so? Can it be presumed that the audience
understands the words and ideas employed, or is there a need to provide visual cues (e.g., a green arrow
indicating that an upward movement is good)? A specialist audience will have different expectations
than the broader public.
(ii) What are the expectations of the audience, and what information is most beneficial to them?
(iii) What is the functional role of the visualisation, and how may users take action based on it? A
visualisation that is exploratory should leave viewers with questions to investigate, but visualisations
that are instructional or confirmatory should not.
●● Setting up a clear framework
The designer must guarantee that all viewers have the same understanding of what the visualisation represents.
To do this, the designer must establish a framework consisting of the semantics and syntax within which the
data information is intended to be understood. The semantics pertain to the meaning of the words and images
employed, whereas the syntax is concerned with the form of the communication. For instance, when utilising
an icon, the element should resemble the object it symbolises, with size, colour, and placement all conveying
significance to the viewer.
Lines and bars are basic, schematic geometric forms that are important to several types of visualisations; lines
join, implying a relationship. On the other hand, bars confine and divide. In experiments, when participants
were asked to analyse an unlabeled line or bar graph, they viewed lines as trends and bars as discrete
relations, even when these interpretations were inconsistent with the nature of the underlying data.
There is one more component to the framework: Ensure that the data is clean and that the analyst understands
its peculiarities before doing anything else. Does the data set have outliers? How is it allocated? Where
does the data contain holes? Are there any assumptions regarding the data? Real-world data is frequently
complicated, of varied sorts and origins, and not necessarily dependable. Understanding the data can assist
the analyst in selecting and employing an effective framework.
●● Telling a story
In its instructional or positive role, visualisation is a dynamic type of persuasion. There are few kinds of
communication as convincing as a good story. To do this, the visualisation must give the viewer a story.
Stories bundle information into a framework that is readily recalled, which is crucial in many collaborative
circumstances in which the analyst is not the same person as the decision-maker or just has to share knowledge
with peers. Data visualisation lends itself nicely to becoming a narrative medium, particularly when the tale
comprises a large amount of data.
Storytelling assists the audience in gaining understanding from facts. Information visualisation is a technique
that turns data and knowledge into a form that is perceivable by the human visual system. The objective is
to enable the audience to see, comprehend, and interpret the information. Design strategies that favour
specific interpretations in visuals that “tell a narrative” can have a substantial impact on the interpretation of
the end user.
In order to comprehend the data and connect with the Visualisation’s audience, creators of visualisations
must delve deeply into the information. Good designers understand not only how to select the appropriate
graph and data range, but also how to create an engaging story through the visualisation.
10.3 Data Presentation Architecture
Data presentation architecture (DPA) is a set of skills that aims to identify, find, modify, format, and
present data in a manner that ideally conveys meaning and provides insight. According to Kelly Lautt,
present data in a manner that ideally conveys meaning and provides insight. According to Kelly Lautt,
“data Presentation Architecture (DPA) is a rarely applied skill set critical for the success and value of
Business Intelligence. Data presentation architecture weds the science of numbers, data and statistics
in discovering valuable information from data and making it usable, relevant and actionable with the arts of data
Visualisation, communications, organisational psychology and change management in order to provide business
intelligence solutions with the data scope, delivery timing, format and Visualisations that will most effectively
support and drive operational, tactical and strategic behaviour toward understood business (or organisational)
goals. DPA is neither an IT nor a business skill set but exists as a separate field of expertise. Often confused with
data Visualisation, data presentation architecture is a much broader skill set that includes determining what data
on what schedule and in what exact format is to be presented, not just the best way to present data that has already
been chosen (which is data Visualisation). Data Visualisation skills are one element of DPA.”
Objectives
The objectives of DPA are as follows:
(i) To utilise data to impart information as effectively as feasible (provide pertinent, timely and
comprehensive data to each audience participant in a clear and reasonable manner that conveys important
meaning, is actionable and can affect understanding, behaviour and decisions).
(ii) To utilise data to deliver information as efficiently as feasible (minimise noise, complexity, and unneeded
data or detail based on the demands and tasks of each audience).
Scope of DPA
In light of the abovementioned objectives, the scope of DPA may be defined as:
(i) Defining significant meaning (relevant information) required by each audience member in every scenario.
(ii) Obtaining the proper data (focus area, historic reach, extensiveness, level of detail, etc.)
(iii) Determining the needed frequency of data refreshes (the currency of the data)
(iv) Determining the optimal presentation moment (how frequently the user needs to view the data)
(v) Using suitable analysis, categorization, visualisation, and other display styles
(vi) Developing appropriate delivery techniques for each audience member based on their job, duties, locations,
and technological access
10.4.1 Dashboard
A data visualisation dashboard (Figure 10.3) is an interactive dashboard that enables one to manage important metrics
across numerous financial channels, visualise the data points, and generate reports for customers that summarise
the results.
Creating reports for your audience is one of the most effective means of establishing a strong working relationship
with them. Using an interactive data dashboard, the audience would be able to view the performance of their
company at a glance.
In addition to having all the data in a single dashboard, a data visualisation dashboard helps to explain what the
company is doing and why, fosters client relationships, and provides a data set to guide decision-making.
There are numerous levels of dashboards, ranging from those that represent metrics vital to the firm as a whole
to those that measure values vital to teams inside an organisation. For a dashboard to be helpful, it must be
automatically or routinely updated to reflect the present condition of affairs.
Figure 10.3: A sample dashboard showing traveller spend analysis using Tableau (Source: https://www.tableau.com/)
Figure 10.4: Bar chart showing the change in EVA for Hindustan Unilever Ltd. (Source: HUL annual
report for the year 2021-22)
Figure 10.5: Line graph: The movement of HUL share price over time (Source: HUL annual report for the
year 2021-22)
Figure 10.6: Pie Chart - Categories of HUL shareholders as on 31st March 2022 (Source: HUL annual
report for the year 2021-22)
(iv) Map:
For displaying any type of location data, including postal codes, state abbreviations, country names, and
custom geocoding, maps are an obvious choice. If the data is related to geographic information, maps are a
simple and effective approach to illustrate the relationship.
There should be a correlation between location and the patterns in the data, such as insurance claims by
state, product export destinations by country, automobile accidents by postal code, or custom sales
areas.
Figure 10.8: Cyclone hazard prone districts of India considering all the parameters and wind based on
BMTPC Atlas (Source: www.ndma.gov.in)
(vi) Scatter plots
Scatter plots are a useful tool for examining the connection between many variables, revealing whether one
variable is a good predictor of another or whether they tend to vary independently. A scatter plot displays
several unique data points on a single graph.
Figure 10.9: Grouped Scatterplot - Number of holdings and Irrigated area in Andhra Pradesh. (Source:
indiadataportal.com)
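As a rough illustration of the idea, the following is a minimal Python sketch using matplotlib (assumed to be installed); the two series and their values are entirely hypothetical and only serve to show how a scatter plot exposes whether one variable tracks another.

# Illustrative sketch with hypothetical data: plotting advertising spend against sales
# to inspect visually whether one variable appears to predict the other.
import matplotlib.pyplot as plt

ad_spend = [10, 15, 22, 28, 35, 41, 50]          # hypothetical monthly advertising spend
sales = [120, 135, 160, 170, 198, 210, 240]      # hypothetical monthly sales

plt.scatter(ad_spend, sales)                     # one point per observation
plt.xlabel("Advertising spend")
plt.ylabel("Sales")
plt.title("Scatter plot: advertising spend vs. sales")
plt.show()

If the points fall roughly along a line, the two measures move together; a shapeless cloud suggests they vary independently.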
(vii) Gantt Chart
Gantt charts represent a project’s timeline or activity changes across time. A Gantt chart depicts tasks that
must be accomplished before others may begin, as well as the allocation of resources. However, Gantt
charts are not restricted to projects. This graphic can depict any data connected to a time series. Figure
10.10 depicts the Gantt chart of a project.
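A minimal Python sketch of a Gantt-style chart follows, again using matplotlib; the task names, start days, and durations are made up for illustration. Each horizontal bar starts at the task's start offset and extends for its duration, which is the essence of a Gantt chart.

# Hedged sketch of a Gantt-style chart built from horizontal bars (hypothetical tasks).
import matplotlib.pyplot as plt

tasks = ["Requirements", "Design", "Build", "Test"]
start_day = [0, 5, 12, 25]       # hypothetical start offsets in days
duration = [5, 7, 13, 6]         # hypothetical durations in days

plt.barh(tasks, duration, left=start_day)   # bar begins at start_day, length = duration
plt.xlabel("Project day")
plt.title("Gantt chart: project schedule")
plt.gca().invert_yaxis()                    # show the first task at the top
plt.show()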
Figure 10.11: Bubble chart: the proportions of professions of people who generate programming languages
(Source: Wikipedia)
(ix) Histogram
Histograms illustrate the distribution of the data among various groups. Histograms divide data into
discrete categories (sometimes known as “bins”) and provide a bar proportionate to the number of entries
inside each category. This chart type might be used to show data such as number of items. Figure 10.12 is
showing the sample histogram chart showing the frequency of something in terms of age.
Figure 10.12: Sample histogram showing frequency by age group
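To make the binning step concrete, here is a minimal Python sketch using numpy (assumed available); the ages and the bin edges are hypothetical. The counts it prints are exactly what the bars of a histogram such as Figure 10.12 represent.

# Minimal sketch of how a histogram bins data, using numpy on hypothetical ages.
import numpy as np

ages = [3, 7, 9, 11, 12, 12, 14, 16, 17, 18, 19, 21]        # hypothetical observations
counts, bin_edges = np.histogram(ages, bins=[0, 5, 10, 15, 20, 25])

for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:>2}-{right:<2}: {count} observations")
# Drawing a bar proportional to each count gives the histogram.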
10.4.3 Tables
Tables, often known as “crosstabs” or “matrices,” emphasise individual values above aesthetic formatting. They
are one of the most prevalent methods for showing data and, thus, one of the most essential methods for analysing
data. While their focus is not intrinsically visual, as reading numbers is a linguistic exercise, visual features may be
added to tables to make them more effective and simpler to assimilate.
Tables are most frequently encountered on websites, as part of restaurant menus, and within Microsoft Excel.
It is crucial to know how to interpret tables and make the most of the information they provide since they are
ubiquitous. It is also crucial for analysts and knowledge workers to learn how to make information easier for their
audience to comprehend.
Tables organise data into rows and columns, displaying the values within each metric, with clearly labelled
columns indicating their significance. In contrast to the majority of
charts, tables may arrange qualitative data and show their linkages.
Analysts typically utilise tables to view specific values. They facilitate the identification of measurements or
dimensions across a set of intervals (e.g., what was the company’s profit in November 2018, or how many
sales did each person close in 2019?). A summary table may also efficiently summarise a huge data collection by
providing subtotals and grand totals for each interval or dimension. The problem with tables is that they scale
poorly. More than ten to fifteen rows and five columns make the table difficult to read, comprehend, and get insight
from. This is because a table engages the brain’s linguistic systems whereas data visualisation excites the brain’s
visual systems.
Adding visual features to the table will allow users to obtain understanding from the data more quickly than
with a simple table. Gradients of colour and size aid in identifying trends and outliers. Icons assist the observer
in recognising a shift in proportions. Using different markings will highlight relationships more effectively than a
table of raw data.
Tables and crosstabs are handy for doing comparative analysis between certain data points. They are simple
to construct and may effectively convey a single essential message. Before including a crosstab into a data
visualisation, one should assess whether it serves the project’s aims. Figure 10.13 shows a sample Visualisation of
tabular data.
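As a small, hedged illustration of adding visual features to a table, the following Python sketch uses pandas' built-in table styling (a reasonably recent pandas plus matplotlib is assumed, since the colour gradient relies on matplotlib colormaps); the region names and figures are invented.

# Hedged sketch: adding a colour gradient to a small table with pandas styling.
import pandas as pd

sales = pd.DataFrame(
    {"Region": ["East", "West", "North", "South"],
     "Q1": [120, 95, 140, 110],
     "Q2": [150, 90, 160, 105]}
).set_index("Region")

# Plain table: print(sales)
# Table with a per-column colour gradient, rendered to HTML for viewing in a browser
# or embedding in a report (also renders directly in a notebook).
html = sales.style.background_gradient(cmap="Greens").to_html()
with open("sales_table.html", "w") as f:
    f.write(html)

The gradient makes the largest and smallest values stand out at a glance, which is the point of adding visual cues to an otherwise linguistic artefact.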
We will now examine some of the most successful data visualisation tools for data scientists and how they
may boost their productivity. Here are four popular data visualisation tools that may assist data scientists
in making more compelling presentations.
(i) Tableau
Tableau is a data visualisation application for creating interactive graphs, charts, and maps. It enables one
to connect to many data sources and generate visualisations in minutes.
Tableau Desktop is the first product of its kind. It is designed to produce static visualisations that may be
published on one or more web pages, but it is incapable of producing interactive maps.
Tableau Public is a free version of Tableau Desktop with some restrictions.
It takes time and effort to learn Tableau, but several resources are available to assist in doing so. As a
data scientist, Tableau is one of the most important tools one should understand and employ on a daily basis.
The application may be accessed through https://www.tableau.com/ (Figure 10.14)
(iv) QlikView
QlikView is a data discovery platform that enables users to make quicker, more informed choices by
speeding analytics, uncovering new business insights, and enhancing the precision of outcomes.
It offers an easy-to-use development environment that has been utilised by enterprises worldwide for many
years. It can combine diverse data sources with colour-coded tables, bar graphs, line graphs, pie charts, and sliders.
It has been designed with a “drag and drop” visualisation interface, allowing users to input data from a
variety of sources, including databases and spreadsheets, without having to write code. These properties
also make it a reasonably easy tool to learn and understand. The application may be accessed
through https://www.qlik.com/us/products/qlikview (Figure 10.17)
Solved Case 1
Sutapa is working as an analyst with SN Company Limited. She is entrusted with the responsibility of making a
presentation before the senior management. She knows that data visualisation is an important tool for presentation,
and a good data visualisation can make her presentation more effective. However, she is not very sure about the
data visualisation tools that are available.
What are the important data visualisation tools that Sutapa may use for an effective and impressive
presentation?
Teaching note - outline for solution:
The instructor may initiate the discussion by explaining the importance of data visualisation. She may also
discuss the objectives of data visualisation:
(i) Making a better data analysis:
(ii) Faster decision making
(iii) Analysing complicated data
For an effective data visualisation, the presenter should keep certain important issues in mind:
(i) Know the objective
(ii) Always keep the audience in mind
(iii) Invest in the best technology
(iv) Improve the team’s ability to visualise data
There are various tools available for data visualisation. The instructor may extend the discussion by mentioning
the following tools and should also explain the suitability of each for visualising and presenting the data:
(i) Dashboards
(ii) Bar charts
(iii) Histogram
(iv) Pie chart
(v) Line chart
(vi) Maps
(vii) Gantt chart
(viii) Bubble Chart etc.
One major comforting factor is the development of recent software that makes the process of data visualisation
less painful. The instructor may conclude the discussion by mentioning a few popular software packages, viz.:
(i) Microsoft Power BI
(ii) Tableau
(iii) Microsoft Excel etc.
Exercise
A. Theoretical Questions:
~~ Multiple Choice Questions
1 d 2 d 3 d 4 d 5 a
1. Data visualisation enhances the effect of communications for the audiences and delivers the most
convincing data analysis outcomes.
2. Visualisation allows us to interpret large volumes of data more quickly and effectively at a glance.
3. Data presentation architecture (DPA) is a set of skills that aims to identify, find, modify, format, and
present data in a manner that ideally conveys meaning and provides insight.
4. Scatter plots are a useful tool for examining the connection between many variables, revealing whether
one variable is a good predictor of another or whether they tend to vary independently.
5. Gantt charts represent a project’s timeline or activity changes across time.
Answer:
1 T 2 T 3 T 4 T 5 T
1. Discuss the ways in which finance professionals may be helped by data visualisation in analysing
and reporting information.
2. Discuss the objectives of data visualisation.
3. How can data visualisation be used in report design?
Unsolved Case(s)
1. Maitreyee works as a financial analyst with a bank. The departmental meeting with her managing director is
going to happen very soon. Maitreyee is entrusted with the task of preparing a dashboard that will cover the
performance of her department during the past quarter. She wants to prepare the dashboard in such a way,
that it should not look cluttered, but at the same time, it covers all the available information in a visually
pleasing manner.
Discuss the different approaches Maitreyee may adopt to meet her objective.
References:
●● Davy Cielen, Arno D. B. Meysman, and Mohamed Ali. Introducing Data Science. Manning Publications Co., USA
●● Cathy O'Neil, Rachel Schutt. Doing Data Science. O'Reilly
●● Joel Grus. Data Science from Scratch. O'Reilly
●● https://go.oracle.com
●● https://sfmagazine.com
●● https://hbr.org/
●● https://www.tableau.com
●● http://country.eiu.com
●● https://en.wikipedia.org
●● https://vdl.sci.utah.edu
●● https://towardsdatascience.com
Data mining, also known as knowledge discovery in data (KDD), is the extraction of patterns and other
useful information from massive data sets. Given the advancement of data warehousing technologies
and the expansion of big data, the use of data mining techniques has advanced dramatically over the
past two decades, supporting businesses in translating their raw data into meaningful information.
Nevertheless, despite the fact that technology is always evolving to manage massive amounts of data, leaders
continue to struggle with scalability and automation.
Through smart data analytics, data mining has enhanced corporate decision making. The data mining techniques
behind these investigations may be categorised into two primary purposes: describing the target dataset or
predicting results using machine learning algorithms. These strategies are used to organise and filter data, bringing
to the surface the most relevant information, including fraud detection, user habits, bottlenecks, and even security
breaches.
When paired with data analytics and visualisation technologies such as Apache Spark, data mining has never been
more accessible, and the extraction of valuable insights has never been quicker. Artificial intelligence advancements
continue to accelerate adoption across sectors.
Based on the dataset, an extra step may be taken to reduce the number of dimensions, as an excessive
number of features might slow down any further computation. Data scientists seek to retain the most
essential predictors to guarantee optimal model accuracy.
(iii) Model building and pattern mining:
Depending on the type of analysis, data scientists may investigate any interesting relationships in the data,
such as frequent patterns, clusters, or correlations. While high-frequency patterns have wider applicability,
deviations in the data can often be more interesting, exposing potential areas of fraud.
Depending on the available data, deep learning algorithms may also be utilised to categorise or cluster a
data collection. If the input data is labelled (i.e. supervised learning), a classification model may be used to
categorise data, or a regression may be employed to forecast the probability of a specific assignment. If the
dataset is unlabeled (i.e. unsupervised learning), the particular data points in the training set are compared to
uncover underlying commonalities, then clustered based on those features.
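To make the supervised versus unsupervised distinction concrete, here is a minimal Python sketch using scikit-learn on synthetic data; the generator, model choices, and parameters are illustrative assumptions rather than a prescribed workflow.

# Hedged sketch: a classifier when labels exist (supervised) and clustering when
# they do not (unsupervised), on synthetic data from scikit-learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Supervised: labels are available, so fit a classification model and predict classes.
clf = LogisticRegression().fit(X, y)
print("Predicted classes:", clf.predict(X[:5]))

# Unsupervised: ignore the labels and group similar observations into clusters.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster labels:   ", km.labels_[:5])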
(iv) Result evaluation and implementation of knowledge:
After aggregating the data, the findings must be analysed and interpreted. To be useful, the findings must
be valid, novel, practical, and comprehensible. When this criterion is satisfied, companies can execute
new strategies based on this understanding, thereby attaining their intended goals.
Businesses utilise analytics to study and evaluate their data, and then translate their discoveries into insights
that eventually aid executives, managers, and operational personnel in making more educated and
prudent business choices. The four most important forms of analytics used by enterprises are descriptive
analytics, which examines what has occurred in a firm; diagnostic analytics, which explores why it occurred;
predictive analytics, which examines what could occur; and prescriptive analytics, which examines what should
occur. While each of these approaches has its own distinct insights, benefits, and drawbacks, when
combined, these analytics tools may be an exceptionally valuable asset for a corporation.
It is also essential to examine the privacy principles while utilising data. Public entities and the business
sector should consider individual privacy when using data analytics. As more and more firms turn to big data
(huge, complex data sets) to raise revenue and enhance corporate efficiency and effectiveness, regulations
are becoming increasingly required.
Vesset states that in order to correctly measure against KPIs, businesses must catalogue and arrange the appropriate
data sources in order to extract the necessary data and generate metrics depending on the present status of the
business.
Step 3: Preparation and collection of data: Data preparation, which includes publication, transformation, and
cleaning, occurs prior to analysis and is a crucial step for ensuring correctness; it is also one of the most time-
consuming tasks for the analyst.
Step 4: Analysis of data: Data trends are discovered and performance is evaluated using summary statistics,
clustering, pattern tracking, and regression analysis (a brief illustrative sketch follows Step 5).
Step 5: Presentation of data: Lastly, charts and graphs are utilised to portray findings in a manner that non-
experts in analytics may comprehend.
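The following Python sketch, referenced in Step 4 above, walks through Steps 4 and 5 on hypothetical monthly sales figures using pandas, numpy, and matplotlib (all assumed to be installed); the data, the choice of a simple linear trend, and the chart layout are illustrative assumptions.

# Hedged sketch of Steps 4 and 5: summary statistics, a simple trend line, and a chart.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "month": range(1, 13),
    "sales": [100, 104, 98, 110, 115, 120, 118, 125, 130, 128, 135, 142],  # hypothetical
})

print(df["sales"].describe())                                  # Step 4: summary statistics
slope, intercept = np.polyfit(df["month"], df["sales"], deg=1)  # simple regression trend

plt.plot(df["month"], df["sales"], marker="o", label="Actual sales")
plt.plot(df["month"], slope * df["month"] + intercept, label="Trend")
plt.xlabel("Month")
plt.ylabel("Sales")
plt.legend()
plt.show()                                                     # Step 5: present the findings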
XML is a file format and markup language for storing, transferring, and recreating arbitrary data. It specifies
a set of standards for encoding texts in a format that is understandable by both humans and machines. XML
is defined by the 1998 XML 1.0 Specification of the World Wide Web Consortium and numerous other
related specifications, which are all free open standards.
XML’s design objectives stress Internet usability, universality, and simplicity. It is a textual data format with
significant support for many human languages via Unicode. Although XML’s architecture is centred on texts, the
language is commonly used to express arbitrary data structures, such as those employed by web services.
Several schema systems exist to help in the design of XML-based languages, and numerous application
programming interfaces (APIs) have been developed by programmers to facilitate the processing of XML data.
Serialization, or storing, sending, and rebuilding arbitrary data, is the primary function of XML. In order for
two dissimilar systems to share data, they must agree on a file format. XML normalises this procedure. XML is
comparable to a universal language for describing information.
As a markup language, XML labels, categorises, and arranges information systematically.
The data structure is represented by XML tags, which also contain information. The information included within
the tags is encoded according to the XML standard. A supplementary XML schema (XSD) defines the required
metadata for reading and verifying XML. This is likewise known as the canonical schema. A “well-formed” XML
document complies with fundamental XML principles, whereas a “valid” document adheres to its schema.
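As a rough illustration of well-formedness and tag structure, the Python sketch below parses a small, hypothetical XML fragment with the standard library's xml.etree.ElementTree; the element and attribute names are invented. Parsing fails if the document is not well-formed, while checking validity against an XSD schema typically requires a third-party library such as lxml.

# Minimal sketch: parsing a hypothetical XML fragment with the standard library.
import xml.etree.ElementTree as ET

document = """
<invoice currency="INR">
    <customer>Akansha Limited</customer>
    <amount>125000</amount>
</invoice>
"""

root = ET.fromstring(document)              # raises an error if not well-formed
print(root.tag, root.attrib["currency"])    # invoice INR
print(root.find("customer").text, root.find("amount").text)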
IETF RFC 7303 (which supersedes the previous RFC 3023) specifies the criteria for constructing media types for
use in XML messages. It specifies the application/xml and text/xml media types. They are utilised for transferring
unmodified XML files without revealing their intrinsic meanings. RFC 7303 also suggests that media types for
XML-based languages end in +xml, such as image/svg+xml for SVG.
RFC 3470, commonly known as IETF BCP 70, provides further recommendations for the use of XML in a
networked setting. This document covers many elements of building and implementing an XML-based language.
Numerous industrial data standards, including Health Level 7, OpenTravel Alliance, FpML, MISMO, and
National Information Exchange Model, are founded on XML and the extensive capabilities of the XML schema
definition. Darwin Information Typing Architecture is an XML industry data standard in publishing. Numerous
publication formats rely heavily on XML as their basis.
(Figure: XBRL across the financial information supply chain, covering XBRL for G/L journal entry reporting,
financial statements and regulatory filings, and connecting participants such as companies, accountants,
auditors, managements, trading partners, regulators, software vendors, financial publishers and data
aggregators, and investors.)
XBRL allows organisations to arrange data using tags. When a piece of data is labelled as “revenue,” for instance,
XBRL enabled applications know that it pertains to revenue. It conforms to a fixed definition of income and may
appropriately utilise it. The integrity of the data is safeguarded by norms that have already been accepted. In
addition, XBRL offers expanded contextual information on the precise data content of financial documents. For
example, when a monetary amount is stated, XBRL tags may designate the data as “currency” or “accounts”
within a report.
With XBRL, a business, a person, or another software programme may quickly produce a variety of output
formats and reports based on a financial statement.
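The deliberately simplified Python sketch below conveys only the idea of tagging a value with machine-readable context; the element name, attributes, and values are invented for illustration and do not follow any real XBRL taxonomy or namespace.

# Highly simplified, hypothetical illustration of tagging a financial fact.
import xml.etree.ElementTree as ET

fact = ET.Element("Revenue", attrib={
    "contextRef": "FY2021-22",   # hypothetical reporting period context
    "unitRef": "INR",            # hypothetical currency unit
    "decimals": "0",
})
fact.text = "521930000000"       # hypothetical figure

print(ET.tostring(fact, encoding="unicode"))
# An XBRL-enabled application reads the tag name and attributes to know that this
# figure is revenue, for which period, and in which currency.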
BI Methods:
Business intelligence is a broad term that encompasses the procedures and methods of gathering, storing,
and evaluating data from business operations or activities in order to maximise performance. All of these factors
combine to provide a full perspective of a firm, enabling individuals to make better, proactive decisions. In recent
years, business intelligence has expanded to incorporate more procedures and activities designed to enhance
performance. These procedures consist of:
(i) Data mining: Large datasets may be mined for patterns using databases, analytics, and machine learning
(ML).
(ii) Reporting: The dissemination of data analysis to stakeholders in order for them to form conclusions and
make decisions.
(iii) Performance metrics and benchmarking: Comparing current performance data to previous performance
data in order to measure performance versus objectives, generally utilising customised dashboards.
(iv) Descriptive analytics: Utilizing basic data analysis to determine what transpired
(v) Querying: BI extracts responses from data sets in response to data-specific queries.
(vi) Statistical analysis: Taking the results of descriptive analytics and using statistics to further explore the data,
such as how and why a pattern occurred.
(vii) Data Visualization: Data consumption is facilitated by transforming data analysis into visual representations
such as charts, graphs, and histograms.
(viii) Visual Analysis: Exploring data using visual storytelling to share findings in real-time and maintain the
flow of analysis.
(ix) Data Preparation: Multiple data source compilation, dimension and measurement identification, and data
analysis preparation.
famous “Turing Test,” in which a human interrogator attempts to differentiate between a machine and a human
written answer. Although this test has been subjected to considerable examination since its publication, it remains
an essential aspect of the history of artificial intelligence and a continuing philosophical thought that employs
principles from linguistics.
Stuart Russell and Peter Norvig then published ‘Artificial Intelligence: A Modern Approach’, which has since
become one of the most influential AI textbooks. In it, they discuss four alternative aims or definitions of artificial
intelligence, which distinguish computer systems based on reasoning and thinking vs. acting:
~~ Human approach:
●● Systems that think like humans
●● Systems that act like humans
~~ Ideal approach:
●● Systems that think rationally
●● Systems that act rationally
Artificial intelligence is, in its simplest form, a topic that combines computer science and substantial datasets
to allow problem-solving. In addition, it includes the subfields of machine learning and deep learning, which are
commonly associated with artificial intelligence. These fields consist of AI algorithms that aim to develop expert
systems that make predictions or classifications based on input data.
As expected with any new developing technology on the market, AI development is still surrounded by a great
deal of hype. According to Gartner’s hype cycle, self-driving vehicles and personal assistants follow “a normal
evolution of innovation, from overenthusiasm through disillusionment to an ultimate grasp of the innovation’s
importance and position in a market or area.” According to Lex Fridman’s 2019 MIT lecture, we are at the top of
inflated expectations and nearing the trough of disillusionment.
AI has several applications in the area of financial services (fig 11.2).
Figure 11.2: Applications of artificial intelligence in finance, including retail and commercial lending scores
and insurance claim processing. (A related diagram depicts deep learning as a subset of machine learning,
which is itself a subset of artificial intelligence.)
communicate with any application or system in the same manner that humans can, with the exception that RPA bots
can function continuously, around-the-clock, and with 100 percent accuracy and dependability.
Robotic Process Automation bots possess a digital skill set that exceeds that of humans. Consider RPA bots to
be a Digital Workforce capable of interacting with any system or application. Bots may copy-paste, scrape site
data, do computations, access and transfer files, analyse emails, log into programmes, connect to APIs, and extract
unstructured data, among other tasks. Due to the adaptability of bots to any interface or workflow, there is no need
to modify existing corporate systems, apps, or processes in order to automate.
RPA bots are simple to configure, utilise, and distribute. You will be able to configure RPA bots if you know how
to record video on a mobile device. Moving files around at work is as simple as pressing record, play, and stop
buttons and utilising drag-and-drop. RPA bots may be scheduled, copied, altered, and shared to conduct enterprise-
wide business operations.
Benefits of RPA
(i) Higher productivity
(ii) Higher accuracy
(iii) Saving of cost
(iv) Integration across platforms
(v) Better customer experience
(vi) Harnessing AI
(vii) Scalability
some of the correct answers as legitimate. This information may subsequently be utilised to train the computer’s
algorithm(s) for determining accurate replies.
In weakly supervised learning, the training labels are noisy, limited, or imprecise; however, these labels are
frequently cheaper to acquire, resulting in larger effective training sets.
(iv) Reinforcement learning
Reinforcement learning is a subfield of machine learning concerned with determining how software agents
should operate in a given environment so as to maximise a certain concept of cumulative reward. Due to
the field’s generic nature, it is explored in several different fields, including game theory, control theory,
operations research, information theory, simulation-based optimization, multi-agent systems, swarm
intelligence, statistics, and genetic algorithms. In machine learning, the environment is generally represented
as a Markov decision process (MDP). Many methods for reinforcement learning employ dynamic
programming techniques. Reinforcement learning techniques do not need prior knowledge of an accurate
mathematical model of the MDP and are employed when exact models are not practicable. Autonomous cars
and learning to play a game against a human opponent both employ reinforcement learning algorithms.
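A tiny, self-contained Python sketch of tabular Q-learning follows; the toy environment (a five-state corridor with a reward only at the right end), the learning rate, discount factor, and exploration rate are all invented purely to show the shape of the update rule.

# Minimal, illustrative tabular Q-learning on a hypothetical 5-state corridor:
# the agent starts in the middle and earns a reward of 1 only at the right end.
import random

n_states, actions = 5, [-1, +1]                  # actions: move left or right
alpha, gamma, epsilon, episodes = 0.1, 0.9, 0.2, 500
Q = [[0.0, 0.0] for _ in range(n_states)]        # Q-value per (state, action)

for _ in range(episodes):
    s = 2                                        # start in the middle state
    while s not in (0, n_states - 1):            # both ends are terminal
        # epsilon-greedy action selection: explore sometimes, otherwise exploit
        a = random.randrange(2) if random.random() < epsilon else Q[s].index(max(Q[s]))
        s_next = s + actions[a]
        reward = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: move Q(s, a) towards reward + gamma * max over Q(s_next, .)
        Q[s][a] += alpha * (reward + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

print("Learned Q-values:", [[round(q, 2) for q in row] for row in Q])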
(v) Dimensionality reduction
Dimensionality reduction is the process of obtaining a set of principal variables in order to reduce the number
of random variables under consideration. In other words, it is the process of lowering the size of the feature
set, which is also referred to as the “number of features.” The majority of dimensionality reduction strategies
may be categorised as either feature elimination or feature extraction. Principal component analysis (PCA)
is a well-known technique for dimensionality reduction. PCA involves transforming data with more dimensions
(e.g., 3D) into a smaller space (e.g., 2D). This results in a reduced data dimension (2D as opposed to 3D),
while retaining as much of the original information (variance) as possible. Numerous dimensionality
reduction strategies assume that high-dimensional data sets reside along low-dimensional manifolds, leading
to the fields of manifold learning and manifold regularisation.
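To illustrate, here is a short Python sketch that applies scikit-learn's PCA to randomly generated three-dimensional data; the data, the random seed, and the choice of two components are assumptions made only for demonstration.

# Hedged sketch: reducing synthetic 3-dimensional data to 2 principal components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 observations, 3 features (3D)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)              # project onto the top 2 components (2D)

print("Reduced shape:", X_2d.shape)
print("Variance explained by each component:", pca.explained_variance_ratio_)

The explained variance ratio indicates how much of the original information each retained component preserves, which is the practical criterion for deciding how far to reduce.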
In artificial intelligence, there are two schools of thought: data-driven and model-driven. The data-driven
strategy focuses on enhancing data quality and data governance in order to enhance the performance of a
particular problem statement. In contrast, the model-driven method attempts to increase performance by
developing new models and algorithmic manipulations (or upgrades). In a perfect world, these should go
hand in hand, but in fact, model-driven techniques have advanced far more than data-driven ones. In terms of
data governance, data management, data quality handling, and general awareness, there is still much room for
improvement.
Recent work on Covid-19 serves as an illustration in this perspective. While the globe was struggling with the
pandemic, several AI-related projects emerged. Whether it is recognising Covid-19 from a CT scan, X-ray, or other
medical imaging, estimating the course of the disease, or projecting the overall number of fatalities, artificial
intelligence has been applied extensively. On the one hand, this extensive effort around the globe has increased
our understanding of the illness and, in certain locations, assisted clinical personnel in their work with vast
populations. However, only a small fraction of this vast quantity of work was judged suitable for any actual
implementation, such as in the healthcare industry. Data quality difficulties are primarily responsible for this
lack of practicality. Numerous projects and studies utilised duplicate images from different sources. Moreover,
the training data often lacked external validation and demographic information. The majority of these studies
would fail a systematic review and do not disclose their biases. Consequently, the quoted performance cannot
be applied to real-world scenarios.
A crucial feature of data science to keep in mind is that poor data will never result in superior performance,
regardless of how strong the model is. Real-world applications require an understanding of systematic data
collection, management, and consumption for a data science project. Only then can society reap the rewards of
the ‘wonderful AI’.
Solved Case 1
Arjun joined as an instructor in a higher learning institution. His responsibility is to teach data analysis
to students. He is particularly interested in teaching analytics and model building. Arjun was preparing a teaching
plan for the upcoming batch.
What elements do you think he should incorporate into the plan?
Teaching note - outline for solution:
The instructor may explain first the utility of data analytics from the perspective of business organizations.
He may explain how data analytics may translate their discoveries into insights that eventually aid executives,
managers, and operational personnel in making more educated and prudent business choices.
He may further explain the four forms of data analytics:
(i) Descriptive analytics
(ii) Diagnostic analytics
(iii) Predictive analytics
(iv) Prescriptive analytics
The instructor should explain each of the terms along with their appropriateness for use in real-life problem
situations.
The advantages and disadvantages of using each of the methods should also be discussed thoroughly.
Exercise
A. Theoretical Questions:
~~ Multiple Choice Questions
1 d 2 d 3 a 4 a 5 d
3. Utilizing data mining techniques, hidden patterns and future trends and behaviours in financial
markets may be predicted.
4. Social analytics are virtually always a type of descriptive analytics.
5. Diagnostic analytics employs tools that question the data: “Why did this occur?”
Answer:
1 T 2 T 3 T 4 T 5 T
1. Data analytics helps us to identify patterns in the raw ________ and extract useful information from
them.
2. Through smart _________ analytics, data mining has enhanced corporate decision making.
3. Data __________ techniques are utilised to develop descriptions and hypotheses on a specific data set.
4. Data mining typically involves _________ steps.
5. Primarily utilised for deep learning algorithms, ___________ replicate the interconnection of the human
brain through layers of nodes to process training data.
Answer:
1 data 2 data 3 mining 4 four 5 neural networks