Project
Process:
Step 1. Define the business needs
The first stage in the business analytics process involves understanding what the business
would like to improve or the problem it wants solved. Sometimes, the goal is broken down into
smaller goals. The business stakeholders, business users with domain knowledge and the
business analyst decide which data are needed to address these goals. At this stage, key
questions such as “what data is available?”, “how can we use it?” and “do we have sufficient
data?” must be answered.
Step 2. Explore the data
This stage involves cleaning the data, imputing missing values, removing outliers, and
transforming combinations of variables to form new variables. Time series graphs are plotted
because they can reveal patterns or outliers. Removing outliers is a very important task, as
outliers often reduce the accuracy of the model if they are allowed to remain in the data set. As
the saying goes: Garbage in, garbage out (GIGO)!
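The cleaning steps described above (filling in missing values and removing outliers) can be sketched in Python as follows. This is only an illustration under stated assumptions: the weekly_sales figures are made up, the function names are hypothetical, and median imputation with the 1.5 x IQR rule is one common choice rather than the method the process prescribes.

```python
import statistics

def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

def remove_outliers_iqr(values, k=1.5):
    """Keep only values within k * IQR of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Made-up data: two missing readings and one suspicious spike (900).
weekly_sales = [120, 125, None, 130, 128, 900, 122, None, 127]

filled = impute_missing(weekly_sales)
cleaned = remove_outliers_iqr(filled)
print(cleaned)
```

Whether an extreme value is an error or a genuine event is a business question, so in practice the analyst would inspect flagged values before dropping them.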
Once the data has been cleaned, the analyst will try to make better sense of it. The analyst
will plot the data using scatter plots (to identify possible correlation or non-linearity), visually
check all possible slices of the data, and summarise it using appropriate visualisations and
descriptive statistics (such as mean, standard deviation, range, mode and median) that provide
a basic understanding of the data. At this stage, the analyst is already looking for general
patterns and actionable insights that can be derived to achieve the business goal.
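The descriptive statistics named above can be computed with Python's standard statistics module. The sketch below is illustrative only; the order_values list is made-up data and the summary dictionary is a hypothetical way of collecting the results.

```python
import statistics

# Made-up order values for illustration.
order_values = [20, 22, 22, 25, 31, 40]

summary = {
    "mean": statistics.mean(order_values),
    "median": statistics.median(order_values),
    "mode": statistics.mode(order_values),
    "std_dev": statistics.stdev(order_values),  # sample standard deviation
    "range": max(order_values) - min(order_values),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Here the mean is pulled above the median by the largest order, which is the kind of skew these summaries are meant to expose.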
Step 3. Analyse the data
At this stage, using statistical analysis methods such as correlation analysis and
hypothesis testing, the analyst will find all factors that are related to the target variable.
The analyst will also perform simple regression analysis to see whether simple
predictions can be made. In addition, different groups are compared using different
assumptions and these are tested using hypothesis testing. Often, it is at this stage that the
data is cut, sliced and diced and different comparisons are made while trying to derive
actionable insights from the data.
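As a rough illustration of the correlation analysis and simple regression mentioned above, the following Python sketch computes a Pearson correlation coefficient and a least-squares line by hand. The ad_spend and sales figures are invented and the function names are hypothetical; a real analysis would normally use a statistics package and check the model's assumptions.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up data: advertising spend (factor) and sales (target variable).
ad_spend = [1, 2, 3, 4, 5]
sales = [12, 15, 15, 19, 20]

r = pearson_r(ad_spend, sales)
slope, intercept = fit_line(ad_spend, sales)
print(f"r = {r:.3f}, sales = {intercept:.1f} + {slope:.1f} * ad_spend")
```

A strong correlation such as this one indicates a relationship worth testing further; it does not by itself show that the factor causes the change in the target.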
Step 4. Predict what is likely to happen
Business analytics is about being proactive in decision making. At this stage, the analyst
will model the data using predictive techniques that include decision trees, neural
networks and logistic regression. These techniques will uncover insights and patterns that
highlight relationships and ‘hidden evidence’ of the most influential variables. The
analyst will then compare the predicted values with the actual values and compute the
prediction errors. Usually, several predictive models are run and the best-performing
model is selected based on model accuracy and outcomes.
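The comparison of predicted and actual values described above can be sketched as follows. The example assumes root mean squared error (RMSE) as the error measure, which is one common choice and not something the text specifies; the model names and all numbers are made up.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))

# Made-up hold-out data and predictions from two hypothetical models.
actual = [100, 110, 120, 130]
candidates = {
    "model_a": [98, 112, 119, 133],
    "model_b": [90, 105, 130, 140],
}

errors = {name: rmse(actual, preds) for name, preds in candidates.items()}
best_model = min(errors, key=errors.get)  # lowest error wins

for name, err in errors.items():
    print(f"{name}: RMSE = {err:.2f}")
print("selected:", best_model)
```

The errors should be computed on data the models were not trained on; otherwise the comparison favours models that merely memorise the training set.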
Step 5. Optimise (find the best solution)
At this stage, the analyst will apply the predictive model coefficients and outcomes to run
‘what-if’ scenarios, using targets set by managers to determine the best solution within the
given constraints and limitations. The analyst will select the optimal solution and model
based on the lowest error, management targets and an intuitive recognition of the model
coefficients that are most aligned to the organisation’s strategic goal.
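A ‘what-if’ analysis of this kind can be sketched as below. Everything in the example is hypothetical: the coefficients stand in for a previously fitted linear model, and the budget limit and sales target stand in for the constraints and targets a manager might set.

```python
# Hypothetical coefficients from a previously fitted linear model:
# predicted_sales = INTERCEPT + SLOPE * ad_spend
INTERCEPT = 10.2
SLOPE = 2.0

BUDGET_LIMIT = 6     # hypothetical constraint on ad spend
SALES_TARGET = 20.0  # hypothetical target set by management

def predict_sales(ad_spend):
    """Apply the model coefficients to one scenario."""
    return INTERCEPT + SLOPE * ad_spend

scenarios = [2, 4, 6, 8]  # candidate ad-spend levels to test

# Keep only scenarios that respect the constraint, then those that hit the target.
feasible = [s for s in scenarios if s <= BUDGET_LIMIT]
meets_target = [s for s in feasible if predict_sales(s) >= SALES_TARGET]

# Among acceptable scenarios, prefer the cheapest one.
cheapest_option = min(meets_target) if meets_target else None
print("feasible:", feasible, "chosen:", cheapest_option)
```

Choosing the cheapest scenario that meets the target is only one possible rule; the text leaves the final choice to the analyst's judgement and the organisation's strategic goal.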
Step 6. Make a decision and measure the outcome
The analyst will then make decisions and take action based on the insights derived from
the model and the organisational goals. After an appropriate period of time, the outcome of
the action is measured.
Step 7. Update the system with the results of the decision
Finally, the results of the decision and action, and the new insights derived from the model,
are recorded and updated into the database. Answers to questions such as ‘was the decision
and action effective?’, ‘how did the treatment group compare with the control group?’ and
‘what was the return on investment?’ are uploaded into the database. The result is an
evolving database that is continuously updated as soon as new insights and knowledge
are derived.
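The questions recorded at this stage can be turned into simple figures. The sketch below assumes a conversion-rate comparison between treatment and control groups and the usual definition of return on investment, (gain - cost) / cost; all counts and amounts are invented and the record layout is hypothetical.

```python
def conversion_rate(conversions, customers):
    """Share of customers who converted."""
    return conversions / customers

def roi(gain, cost):
    """Return on investment as a fraction of the cost."""
    return (gain - cost) / cost

# Made-up outcome figures for a campaign.
treatment_rate = conversion_rate(60, 1000)  # customers who received the action
control_rate = conversion_rate(45, 1000)    # customers who did not

# Relative improvement of the treatment group over the control group.
lift = (treatment_rate - control_rate) / control_rate
campaign_roi = roi(gain=15000, cost=10000)

# Hypothetical record to store alongside earlier results.
outcome_record = {
    "treatment_rate": treatment_rate,
    "control_rate": control_rate,
    "lift": lift,
    "roi": campaign_roi,
    "effective": lift > 0 and campaign_roi > 0,
}
print(outcome_record)
```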
Evolution of Business Analytics
Analytics and visualizations were used throughout history long before computers and software
supported them. Graphs were plotted by hand using statistical methods, and data was recorded
manually. This was quite different from the business analytics we know today. A more modern
form of analytics emerged in the 20th century, when it was used to identify trends during the
Second World War. Identifying trends helped code-breakers use data from encrypted messages,
such as the destination (recipients), the origin, and the time and date of each message, to work
out what information the messages contained. However, business analytics also appears earlier
in history. Sir Henry Furnese, a banker of the late 17th and early 18th centuries, used data
extensively to stay ahead of his competition; his methods were documented in the 1860s.
The recent evolution that business analytics has experienced can be fundamentally traced back to
the introduction of automation in analytics and the concept of big data. The advent of big data
meant that analytics along with various data sources should become more scalable and more
powerful. This helped in introducing more advanced tools and systems that are compatible with
large volumes of data. The emergence of cloud technologies also meant that data did not need to
be on-site. There was also a huge demand for automating analytical tools by this time due to the
massive amount of data that needed to be worked upon. All of this motivated companies to
upgrade their existing software into more capable applications that can process massive datasets
rapidly and from multiple sources such as from the cloud and distributed file systems rather than
just the traditional RDBMS. With the help of modern business analytics, business analysts were
now armed with predictive and forecasting abilities that were more accurate than ever. This is
where businesses truly understood the importance of data analytics. Much of this technology
already existed, but the industry’s growing requirements encouraged businesses of all sizes to
start incorporating data analytics into daily operations.
Business analytics has evolved greatly in four main spheres:
Artificial Intelligence and Automated Analytics
Predictive Analytics
Real-time Analytics
Big Data
Data analysis is the science of examining data to draw conclusions that support decisions or
expand knowledge on various subjects. It consists of subjecting data to operations in order to
obtain precise conclusions that help us achieve our goals. These operations cannot always be
defined in advance, since data collection may reveal specific difficulties.
If your business is not growing, you have to look back, acknowledge your mistakes, and make a
new plan that does not repeat them. And even if your business is growing, you have to look
forward to make it grow further. In both cases, all you need to do is analyze your business data
and business processes.
Data analysis tools make it easier for users to process and manipulate data, analyze the
relationships and correlations between data sets, and identify patterns and trends for
interpretation. The main types of data analysis are listed below.
Text Analysis
Statistical Analysis
Diagnostic Analysis
Predictive Analysis
Prescriptive Analysis
Text Analysis
Text Analysis is also referred to as Data Mining. It is a method of data analysis that discovers
patterns in large data sets using databases or data mining tools. It is used to transform raw
data into business information. Business Intelligence tools available in the market are used to
make strategic business decisions. Overall, it offers a way to extract and examine data, derive
patterns and, finally, interpret the data.
Statistical Analysis
Statistical Analysis shows “What happened?” by using past data in the form of dashboards.
Statistical Analysis includes the collection, analysis, interpretation, presentation, and modeling
of data. It analyses a set of data or a sample of data. There are two categories of this type of
analysis: Descriptive Analysis and Inferential Analysis.
Descriptive Analysis
Descriptive Analysis analyses complete data or a sample of summarized numerical data. It shows
the mean and deviation for continuous data, and the percentage and frequency for categorical data.
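For categorical data, the frequency and percentage summary mentioned above can be sketched in Python as follows. The segments list is made-up data and the variable names are hypothetical; the continuous case would use the mean and standard deviation instead.

```python
from collections import Counter

# Made-up categorical data: the sales channel of each order.
segments = ["retail", "online", "retail", "wholesale", "retail", "online"]

frequency = Counter(segments)  # count of each category
percentage = {
    category: 100 * count / len(segments)
    for category, count in frequency.items()
}

for category, count in frequency.most_common():
    print(f"{category}: {count} ({percentage[category]:.1f}%)")
```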
Inferential Analysis
Inferential Analysis analyses a sample drawn from the complete data. In this type of analysis,
you can reach different conclusions from the same data by selecting different samples.
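The point that different samples can lead to different conclusions can be illustrated with a short Python sketch. The population below is synthetic, the sample size of 10 and the seeds are arbitrary assumptions, and sample_mean is a hypothetical helper.

```python
import random
import statistics

# Synthetic population: 100 made-up customer spend values.
population = [20 + (i * 7) % 50 for i in range(100)]

def sample_mean(data, size, seed):
    """Draw a random sample without replacement and return it with its mean."""
    rng = random.Random(seed)  # seeded so the draw is repeatable
    sample = rng.sample(data, size)
    return sample, statistics.mean(sample)

sample_a, mean_a = sample_mean(population, 10, seed=1)
sample_b, mean_b = sample_mean(population, 10, seed=2)
population_mean = statistics.mean(population)

print("population mean:", population_mean)
print("sample A mean:", mean_a)
print("sample B mean:", mean_b)
```

Each sample mean is only an estimate of the population mean, which is why inferential methods report uncertainty (for example, confidence intervals) alongside the estimate.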
Diagnostic Analysis
Diagnostic Analysis shows “Why did it happen?” by finding the cause from the insight found in
Statistical Analysis. This analysis is useful for identifying behavior patterns in data. If a new
problem arises in your business process, you can look into this analysis to find similar
patterns for that problem, and you may be able to apply similar prescriptions to the new
problem.
Predictive Analysis
Predictive Analysis shows “what is likely to happen” by using previous data. The simplest
example: if last year I bought two dresses out of my savings, and this year my salary doubles,
then I can buy four dresses. Of course it is not that simple, because you have to consider other
circumstances: clothes prices may rise this year, or instead of dresses you may want to buy a
new bike, or you may need to buy a house!
So this analysis makes predictions about future outcomes based on current or past data.
A forecast is just an estimate. Its accuracy depends on how much detailed information you have
and how deeply you dig into it.
Prescriptive Analysis
Prescriptive Analysis combines the insights from all the previous analyses to determine which
action to take for a current problem or decision. Most data-driven companies use Prescriptive
Analysis because predictive and descriptive analysis alone are not enough to improve data
performance. Based on current situations and problems, they analyze the data and make
decisions.
Data Collection
After requirement gathering, you will have a clear idea of what you need to measure and what
your findings should be. Now it’s time to collect your data based on those requirements. Once
you collect your data, remember that it must be processed or organized for analysis. Because
you collect data from various sources, you must keep a log with the collection date and source
of the data.
Data Cleaning
Whatever data is collected may not all be useful or relevant to the aim of your analysis, so it
should be cleaned. The collected data may contain duplicate records, white spaces or errors,
and it should be cleaned until it is error free. This phase must be done before analysis because
the better the data cleaning, the closer the output of your analysis will be to your expected outcome.
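Removing white space and duplicate records, as described above, can be sketched in a few lines of Python. The raw_names list is made up, clean_records is a hypothetical helper, and treating names that differ only in letter case as duplicates is an assumption that will not suit every data set.

```python
def clean_records(records):
    """Strip white space, drop empty entries, and remove duplicates.

    Duplicates are detected case-insensitively; the first spelling seen is kept.
    """
    seen = set()
    cleaned = []
    for record in records:
        name = record.strip()
        if not name:
            continue  # skip entries that were empty or only white space
        key = name.lower()
        if key in seen:
            continue  # skip duplicates
        seen.add(key)
        cleaned.append(name)
    return cleaned

# Made-up raw input with stray spaces, a blank entry, and repeats.
raw_names = ["  Alice ", "Bob", "alice", "Bob", "", "  Carol"]
customer_names = clean_records(raw_names)
print(customer_names)
```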
Data Analysis
Once the data is collected, cleaned, and processed, it is ready for Analysis. As you manipulate
data, you may find you have the exact information you need, or you might need to collect more
data. During this phase, you can use data analysis tools and software which will help you to
understand, interpret, and derive conclusions based on the requirements.
Data Interpretation
After analyzing your data, it’s finally time to interpret your results. You can choose how to
express or communicate your data analysis: simply in words, or with a table or chart. Then use
the results of your data analysis process to decide your best course of action.
Data Visualization
Data visualization is very common in day-to-day life; it often appears in the form of charts
and graphs. In other words, data is shown graphically so that it is easier for the human brain
to understand and process. Data visualization is often used to discover unknown facts and
trends. By observing relationships and comparing datasets, you can uncover meaningful
information.
Summary:
Data analysis means a process of cleaning, transforming and modeling data to discover
useful information for business decision-making
Types of Data Analysis are Text, Statistical, Diagnostic, Predictive, Prescriptive Analysis
Data Analysis consists of Data Requirement Gathering, Data Collection, Data Cleaning,
Data Analysis, Data Interpretation, Data Visualization
Currently, many industries use data to draw conclusions and decide on actions to implement. It is
worth mentioning that science also uses data analysis to test or discard existing theories or models.
There’s more than one advantage to data analysis done right. Here are some examples:
These questions are examples of different types of data analysis. You can include them in your
post-event surveys aimed at your customers:
Example of qualitative data research analysis: Panels where a discussion is held, and
consumers are interviewed about what they like or dislike about the place.
Quantitative research analysis focuses on complex data and information that can be
counted.
Data is collected by asking questions like: How many? Who? How often? Where?
Data analysis is used in many industries, regardless of the branch. It gives us the basis to make
decisions or confirm whether a hypothesis is true.
Marketing: Researchers mainly perform data analysis to predict consumer behavior and
help companies place their products and services in the market accordingly. For instance,
sales data analysis can help you identify a product range that is not so popular in a specific
demographic group. It can give you insights into tweaking your current marketing
campaign to better connect with the target audience and address their needs.
Human Resources: Organizations can use data analysis to offer a great experience to
their employees and ensure an excellent work environment. They can also utilize the data
to find out the best resources whose skill set matches the organizational goals.
Academics: Universities and academic institutions can perform the analysis to measure
student performance and gather insights on how certain behaviors can further improve
education.
A data analyst does not directly participate in the decision-making process; rather, the analyst
helps indirectly by providing static insights about company performance. A data engineer is
not responsible for decision making. A data scientist participates in the
active decision-making process that affects the course of the company.
A data analyst uses static modeling techniques that summarize the data through descriptive
analysis. On the other hand, a data engineer is responsible for the development and
maintenance of data pipelines. A data scientist uses dynamic techniques like Machine
Learning to gain insights about the future.
Knowledge of machine learning is not important for data analysts. However, it is
mandatory for data scientists. A data engineer does not need knowledge of machine
learning but is required to know core computing concepts like
programming and algorithms in order to build robust data systems.
A data analyst only has to deal with structured data. However, both data scientists and
data engineers deal with unstructured data as well.
A data analyst and data scientist are both required to be proficient in data visualization.
However, this is not required in the case of a data engineer.
Neither data scientists nor data analysts need knowledge of application development or the
workings of APIs. However, this is the most essential requirement for a data engineer.
To become a Data Scientist, you must have the following key skills:
Should be proficient with Math and Statistics.
Should be able to handle structured & unstructured information.
In-depth knowledge of tools like R, Python and SAS.
Well versed in various machine learning algorithms.
Have knowledge of SQL and NoSQL.
Must be familiar with Big Data tools.
Data Analyst
Most entry-level professionals interested in getting into a data-related job start off as Data
analysts. Qualifying for this role is as simple as it gets. All you need is a bachelor’s degree and
good statistical knowledge. Strong technical skills would be a plus and can give you an edge
over most other applicants. Other than this, companies expect you to understand data handling,
modeling and reporting techniques along with a strong understanding of the business.
Data Engineer
A Data Engineer either acquires a master’s degree in a data-related field or gathers a good
amount of experience as a Data Analyst. A Data Engineer needs to have a strong technical
background with the ability to create and integrate APIs. They also need to understand data
pipelining and performance optimization.
Data Scientist
A Data Scientist is someone who analyses and interprets complex digital data. While there are
several ways to get into a data scientist’s role, the most seamless one is by acquiring enough
experience and learning the various data scientist skills. These skills include advanced
statistical analysis, a complete understanding of machine learning, data conditioning, etc.
Skill-Sets
Data Analyst vs Data Engineer vs Data Scientist Skill Sets
Data Analyst | Data Engineer | Data Scientist
Data Warehousing | Data Warehousing & ETL | Statistical & Analytical skills
As mentioned above, a data analyst’s primary skill set revolves around data acquisition,
handling, and processing. A data engineer, on the other hand, requires an intermediate-level
understanding of programming to build thorough algorithms, along with a mastery of statistics
and math. Finally, a data scientist needs to be a master of both worlds: data, stats, and math,
along with in-depth programming knowledge for Machine Learning and Deep Learning.
The roles and responsibilities of a data analyst, data engineer and data scientist are quite similar,
as you can see from their skill sets. Refer to the table below for more detail:
Data Analyst | Data Engineer | Data Scientist
Pre-processing and data gathering | Develop, test & maintain architectures | Responsible for developing Operational Models
Emphasis on representing data via reporting and visualization | Understand programming and its complexity | Carry out data analytics and optimization using machine learning & deep learning
Responsible for statistical analysis & data interpretation | Deploy ML & statistical models | Involved in strategic planning for data analytics
Ensures data acquisition & maintenance | Building pipelines for various ETL operations | Integrate data & perform ad-hoc analysis
Optimize Statistical Efficiency & Quality | Ensures data accuracy and flexibility | Fill in the gap between the stakeholders and customer
S.No | Data Engineer | Data Scientist
2 | Extracts, collects and integrates data | Analyses the data provided by the data engineer
9 | Tools used to process data are MySQL, Hive, Oracle, Cassandra, Redis, Riak, PostgreSQL, MongoDB, and Sqoop | Programming languages used are Python, R, SAS, SPSS, Julia along with various visualization techniques
Career in Business Analytics
Business Analyst/ Data Analyst
Any BA job profile focuses on versatility since a BA can switch industries to find their comfort
or their preferred niche in the industry. This job requires you to have a wide spectrum of skills
that can be used to transfer across industries. The transition becomes smoother once you acquire
the requisite skill sets. A Business Analyst can modify and recalibrate diverse activities in an
organization at different levels. This is a suitable job profile for those who are in pursuit of
solving complex business issues.
Analytics Manager
This is for those who have the tenacity to manage teams and people. An analytics manager
manages an assigned team through which the analytical decisions are passed. This role relies
on team-based management and suits those who want to climb the managerial hierarchy. The
job profile is suited for those who want a diverse role in managing employee workflow,
resource planning and other associated management activities.
Project Manager
This job profile requires a diverse set of skills and is traditionally a rewarding position. The
effort required is a bit different from the ones you will see with business analysis. Project
Managers function on their versatile skill set, which also helps aspiring Business Analysts to
change tracks. Although traditional, this role is frequented by most Business Analysts.
Relationship Manager
This suits those who have oratorical skills, can listen to people, and can influence them. A
relationship manager typically builds strong connections with stakeholders, answers their
queries, and makes constructive assessments to maintain a healthy client-company relationship.
This job profile is suited for those who are aiming towards a specialized career path and have
already reached the level of competency to be a relationship manager.
Account Manager
This role focuses on working closely with a company but suits those who qualify as senior
analysts. The experience of a senior analyst is required, for they have a better understanding of
the industry and have the lens to observe an organization without any biases. This role is suited
for those who want to be involved with an enterprise intrinsically.
Subject Matter Expert
This role is specific, as the name itself suggests. Expertise in a particular industry niche or
subject matter is what the job profile is after. The job profile is suited for those who are after a
specific field that invigorates their interests.
Market Analyst
The job profile requires the BA to collect performance-based data that is to be used in marketing
analytics. This can help enterprises recognize consumer models and help in locating purchase
trends in the future.
INTRODUCTION TO R
According to R-Project.org, R is “… a language and environment for statistical computing and
graphics.” It’s an open-source programming language often used as a data analysis and statistical
software tool.
The R environment consists of an integrated suite of software facilities designed for data
manipulation, calculation, and graphical display. The environment features:
A vast, easily understandable, integrated assortment of intermediate tools dedicated to data analysis
Graphical facilities for data analysis and display that work either for on-screen or hardcopy
The well-developed, simple and effective programming language, featuring user-defined recursive
functions, loops, conditionals, and input and output facilities.
Keywords, reserved words that have a special meaning for the interpreter
R was developed in 1993 by Ross Ihaka and Robert Gentleman and includes linear regression,
machine learning algorithms, statistical inference, time series, and more.
R is a universal programming language compatible with the Windows, Macintosh, UNIX, and
Linux platforms. It is often referred to as a different implementation of the S language and
environment and is considered highly extensible.
The R programming language has a lot going for it. Here is a list of some of its major strong
points:
It’s open-source. No fees or licenses are needed, so it’s a low-risk venture if you’re
developing a new program.
It’s platform-independent. R runs on all operating systems, so developers only need to create
one program that can work on competing systems. This independence is yet another reason
why R is cost-effective!
It has lots of packages. For example, the R language has more than 10,000 packages stored in
the CRAN repository, and the number is continuously increasing.
It’s great for statistics. Statistics are a big thing today, and R shines in this regard. As a result,
programmers prefer it over other languages for statistical tool development.
It’s well suited for Machine Learning. R is ideal for machine learning operations such as
regression and classification. It even offers many features and packages for artificial neural
network development.
R lets you perform data wrangling. R offers a host of packages that help data analysts turn
unstructured, messy data into a structured format.
R is still growing. R keeps evolving and growing, constantly updating and upgrading, thanks
to a solid supportive community.
Like every language, R has drawbacks. When answering the question “What is R?” we should also
look at some of R’s not-so-great aspects:
It’s a complicated language. R has a steep learning curve. It’s a language best suited for
people who have previous programming experience.
It’s not as secure. R doesn’t have basic security measures. Consequently, it’s not a good
choice for making web-safe applications. Also, R can’t be embedded in web browsers.
It’s slow. R is slower than other programming languages like Python or MATLAB.
It takes up a lot of memory. Memory management isn’t one of R’s strong points. R’s data
must be stored in physical memory. However, the increasing use of cloud-based memory may
eventually make this drawback moot.
It doesn’t have consistent documentation/package quality. Docs and packages can be patchy
and inconsistent, or incomplete. That’s the price you pay for a language that doesn’t have
official, dedicated support and instead is maintained and added to by the community.
Data analysis
Statistical inference
R offers a wide variety of statistics-related libraries and provides a favorable environment for
statistical computing and design. In addition, many quantitative analysts use the R programming
language as a programming tool, since it's useful for data importing and cleaning.
As of August 2021, R is one of the top five programming languages of the year, so it’s a favorite
among data analysts and research programmers. It’s also used as a fundamental tool for finance,
which relies heavily on statistical data.
Academic Research
Retail
Social Media
Data Journalism
Manufacturing
Healthcare
This graph, provided by Stack Overflow, gives you a better idea of R programming language
usage in recent history. Given its strength in statistics, it's hardly surprising that R enjoys heavy
use in the world of academia, as illustrated on the chart.
If you’re looking for specifics, here are some significant companies or organizations that use R,
presented in no particular order.
Airbnb
Microsoft
Uber
Ford
IBM
American Express
HP
RStudio interface
RStudio is split into 4 quadrants:
Script (top left): where commands are written, executed, and saved
Environment (top right): lists the data, variables, and functions that are currently in the
workspace
Console (bottom left): for quickly testing code and where commands and outputs are
displayed, except plots
Plot (bottom right): where graphics are displayed
BASIC COMMANDS