DA Unit 2 Trio 1
Data Analytics
Introduction to Analytics
As enormous amounts of data are generated, extracting useful insights from it has become a must
for a business enterprise. Data Analytics plays a key role in improving a business. Here are four
main factors which signify the need for Data Analytics:
• Gather Hidden Insights – Hidden insights from data are gathered and then analyzed with
respect to business requirements.
• Generate Reports – Reports are generated from the data and are passed on to the
respective teams and individuals, who act on them to grow the business.
• Perform Market Analysis – Market Analysis can be performed to understand the
strengths and the weaknesses of competitors.
• Improve Business Requirement – Analysis of data allows a business to better meet
customer requirements and improve customer experience.
Data Analytics refers to the techniques used to analyze data in order to enhance productivity
and business gain. Data is extracted from various sources and is cleaned and categorized to analyze
different behavioral patterns. The techniques and the tools used vary according to the
organization or individual.
Data analysts translate numbers into plain English. A Data Analyst delivers value to their
company by taking information about specific topics and then interpreting, analyzing, and
presenting findings in comprehensive reports. So, if you have the capability to collect data from
various sources, analyze the data, gather hidden insights and generate reports, then you can
become a Data Analyst.
Commonly used data analytics tools include:
• R programming – R is a leading analytics tool used for statistics and data
modeling. R compiles and runs on various platforms such as UNIX, Windows, and Mac
OS. It also provides tools to automatically install packages as the user requires.
• Python – Python is an open-source, object-oriented programming language which is easy
to read, write and maintain. It provides various machine learning and visualization
libraries such as Scikit-learn, TensorFlow, Matplotlib, Pandas, Keras etc. It can also
work with data held in SQL Server, a MongoDB database or JSON.
• Tableau Public – This is free software that connects to data sources such as Excel,
corporate Data Warehouse etc. It then creates visualizations, maps, dashboards etc with
real-time updates on the web.
• QlikView – This tool offers in-memory data processing with the results delivered to the
end-users quickly. It also offers data association and data visualization with data being
compressed to almost 10% of its original size.
• SAS – A programming language and environment for data manipulation and analytics, this
tool is easily accessible and can analyze data from different sources.
• Microsoft Excel – This tool is one of the most widely used tools for data analytics.
Mostly used for clients’ internal data, it handles tasks that summarize the data,
for example with pivot tables.
• RapidMiner – A powerful, integrated platform that can connect to many data source
types such as Access, Excel, Microsoft SQL, Teradata, Oracle, Sybase etc. This tool is
mostly used for predictive analytics, such as data mining, text analytics and machine
learning.
• KNIME – Konstanz Information Miner (KNIME) is an open-source data analytics
platform, which allows you to analyze and model data. With the benefit of visual
programming, KNIME provides a platform for reporting and integration through its
modular data pipeline concept.
• OpenRefine – Formerly known as Google Refine, this data cleaning software will help you
clean up data for analysis. It is used for cleaning messy data, transforming data
and parsing data from websites.
• Apache Spark – A widely used large-scale data processing engine, this tool can execute
applications in Hadoop clusters up to 100 times faster in memory and 10 times faster on disk
than Hadoop MapReduce. It is also popular for data pipelines and machine learning model
development.
Apart from the above-mentioned capabilities, a Data Analyst should also possess skills
such as Statistics, Data Cleaning, Exploratory Data Analysis, and Data Visualization. Also, if
you have knowledge of Machine learning, then that would make you stand out from the
crowd.
Application of Modeling In Business:
Qualitative data are data that are not easily reduced to numbers: data related to the
concepts, opinions, values and behaviors of people in a social context. Examples include
transcripts of individual interviews and focus groups, field notes from observation of certain
activities, copies of documents, and audio/video recordings.
Nominal Data:-
Nominal data is defined as data that is used for naming or labelling variables, without any
quantitative value. It is sometimes called “named” data, a meaning derived from the word
nominal.
Ordinal Data:-
Ordinal data is a kind of categorical data with a set order or scale to it. For example,
ordinal data is collected when a respondent rates his/her financial happiness
level on a scale of 1-10. In ordinal data, there is no standard scale on which the difference
between scores is measured.
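The difference between the two types can be sketched in Python. This is a minimal illustration with made-up values; the names `colors`, `SATISFACTION_ORDER` and `responses` are assumptions, not part of the notes. Nominal values can only be counted, while ordinal values can also be ranked.

```python
from collections import Counter

# Nominal data: labels with no inherent order (hypothetical values).
colors = ["red", "blue", "red", "green", "blue", "red"]


def most_common_label(labels):
    """The mode is a valid summary for nominal data; ranking and averaging are not."""
    return Counter(labels).most_common(1)[0][0]


# Ordinal data: categories with a set order, but no standard distance between them.
SATISFACTION_ORDER = ["low", "medium", "high"]  # assumed scale, for illustration only
responses = ["high", "low", "medium", "low", "high"]


def sort_ordinal(values, order):
    """Rank ordinal values by their position on the scale."""
    rank = {label: position for position, label in enumerate(order)}
    return sorted(values, key=rank.__getitem__)


def median_ordinal(values, order):
    """The median only relies on order, so it is meaningful for ordinal data.

    For an even number of values this returns the upper of the two middle values.
    """
    ranked = sort_ordinal(values, order)
    return ranked[len(ranked) // 2]
```

Note that no function here subtracts one ordinal value from another, since the gap between "low" and "medium" need not equal the gap between "medium" and "high".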
Symmetric:-
Symmetric measures of association take on the same value, no matter which variable is
the independent variable and which is the dependent variable. ... For example, there may be a
non-linear relationship between the two variables.
Asymmetric:-
Some statistics take on different values, depending on
which of the two variables is the independent variable and which is the dependent variable.
These are called asymmetric measures of association. ... For example, there may be a non-linear
relationship between the two variables.
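The notes do not name specific measures. As an assumed illustration, Pearson's correlation coefficient is a symmetric measure, while the least-squares regression slope is asymmetric because it depends on which variable is treated as independent. The data (`hours`, `scores`) is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: swapping x and y gives the same value (symmetric)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def regression_slope(independent, dependent):
    """Least-squares slope of `dependent` on `independent` (asymmetric)."""
    n = len(independent)
    mean_x, mean_y = sum(independent) / n, sum(dependent) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(independent, dependent))
    sxx = sum((a - mean_x) ** 2 for a in independent)
    return sxy / sxx


# Hypothetical data: hours studied and test scores.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 70.0, 72.0]

r_forward = pearson_r(hours, scores)
r_reverse = pearson_r(scores, hours)                 # same value as r_forward
slope_scores_on_hours = regression_slope(hours, scores)
slope_hours_on_scores = regression_slope(scores, hours)  # a different value
```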
Quantitative Data
The aim is to classify features, count them, and construct statistical models in an attempt
to explain what is observed. The researcher knows clearly in advance what he/she is looking for.
Recommended during the latter phases of research projects. All aspects of the study are designed
before data is collected. The researcher uses tools, such as questionnaires or equipment, to
collect numerical data. Data is in the form of numbers and statistics. Objective: seeks precise
measurement and analysis of target concepts, e.g., uses surveys, questionnaires etc. Quantitative
data is more efficient and able to test hypotheses, but may miss contextual detail. The researcher
tends to remain objectively separated from the subject matter.
Discrete
All data that are the result of counting are called quantitative discrete data. These data take
on only certain numerical values. ... All data that are the result of measuring are quantitative
continuous data assuming that we can measure accurately.
Continuous
Continuous data technically have an infinite number of steps, which form a continuum.
Data modeling is the process through which data is stored in a structured format in a
database. Data modeling is important because it enables organizations to make data-driven
decisions and meet varied business goals.
The entire process of data modeling is not as easy as it seems, though. You are required to have
a deeper understanding of the structure of an organization and then propose a solution that aligns
with its end-goals and helps it achieve the desired objectives.
Data modeling can be achieved in various ways. However, the basic concept of each of them
remains the same. Let’s have a look at the commonly used data modeling methods:
1. Hierarchical model
2. Relational model
3. Network model
4. Object-oriented model
5. Entity-relationship model
1. Hierarchical model
As the name indicates, this data model makes use of hierarchy to structure the data in a
tree-like format. However, retrieving and accessing data is difficult in a hierarchical database.
This is why it is rarely used now.
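A minimal Python sketch of a tree-structured record set may help here. The organization tree and the helper `find_path` are hypothetical. It shows one reason retrieval can be awkward: reaching a record means walking down from the root.

```python
# Hypothetical organization tree: every record has exactly one parent.
hierarchy = {
    "name": "Company",
    "children": [
        {"name": "Sales", "children": [
            {"name": "Domestic", "children": []},
            {"name": "Export", "children": []},
        ]},
        {"name": "Engineering", "children": [
            {"name": "Platform", "children": []},
        ]},
    ],
}


def find_path(node, target, path=()):
    """Depth-first search from the root.

    Returns the list of names from the root to `target`, or None if absent.
    """
    path = path + (node["name"],)
    if node["name"] == target:
        return list(path)
    for child in node["children"]:
        found = find_path(child, target, path)
        if found is not None:
            return found
    return None
```

For example, `find_path(hierarchy, "Platform")` has to pass through "Company" and "Engineering" before it reaches the record it wants.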
4. Object-oriented model
This database model consists of a collection of objects, each with its own features and
methods. This type of database model is also called the post-relational database model.
Example of Object-oriented model
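As a minimal sketch of this idea in Python, an object bundles its own data (features) with the methods that act on it. The `Account` class and its fields are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """A hypothetical object: attributes hold its state, methods change it."""
    owner: str
    balance: float = 0.0
    history: list = field(default_factory=list)

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount
        self.history.append(("deposit", amount))

    def withdraw(self, amount):
        if amount <= 0 or amount > self.balance:
            raise ValueError("invalid withdrawal amount")
        self.balance -= amount
        self.history.append(("withdraw", amount))


account = Account(owner="Asha")
account.deposit(100.0)
account.withdraw(30.0)
```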
5. Entity-relationship model
Entity-relationship model, also known as ER model, represents entities and their
relationships in a graphical format. An entity could be anything – a concept, a piece of data, or
an object.
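ER models are normally drawn as diagrams, but the same structure can be sketched in code. In this assumed example, `Customer` and `Product` are entities and `Purchase` is the relationship that links them. All names and records are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:  # entity
    customer_id: int
    name: str


@dataclass(frozen=True)
class Product:  # entity
    product_id: int
    title: str


@dataclass(frozen=True)
class Purchase:  # relationship: a Customer purchases a Product
    customer_id: int
    product_id: int
    quantity: int


customers = {1: Customer(1, "Asha"), 2: Customer(2, "Ben")}
products = {10: Product(10, "Keyboard"), 11: Product(11, "Mouse")}
purchases = [Purchase(1, 10, 1), Purchase(1, 11, 2), Purchase(2, 11, 1)]


def products_bought_by(customer_id):
    """Follow the relationship from one customer to the products they bought."""
    return [products[p.product_id].title
            for p in purchases if p.customer_id == customer_id]
```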
Importance of Data Modeling:
Now that we have a basic understanding of data modeling, let’s see why it is important.
• A clear representation of data makes it easier to analyze the data properly. It provides a
quick overview of the data which can then be used by the developers in varied
applications.
• Data modeling represents the data properly in a model. It reduces the chances of data
redundancy and omission. This helps in clear analysis and processing.
• Data modeling improves data quality and enables the concerned stakeholders to make
data-driven decisions.
Since a lot of business processes depend on successful data modeling, it is necessary to adopt the
right data modeling techniques for the best results.
3- Organize Your Data Based On Facts, Dimensions, Filters, and Order
You can find answers to most business questions by organizing your data in terms of four
elements – facts, dimensions, filters, and order.
Let’s understand this better with the help of an example. Let’s assume that you run four
e-commerce stores in four different locations of the world. It is the year-end, and you want to
analyze which e-commerce store made the most sales.
In such a scenario, you can organize your data over the last year. The facts will be the
overall sales data of the last year, the dimension will be store location, the filter will be
the last 12 months, and the order will be the top stores in decreasing order of sales.
This way, you can organize all your data properly and position yourself to answer an
array of business intelligence questions without breaking a sweat.
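The store example can be sketched in Python as follows. The sales records, dates and the function name `top_stores` are hypothetical. The comments mark where each of the four elements appears.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales records: (store location, sale date, sale amount).
sales = [
    ("London", date(2023, 3, 14), 1200.0),
    ("Tokyo", date(2023, 7, 2), 900.0),
    ("New York", date(2023, 11, 20), 1500.0),
    ("Sydney", date(2023, 5, 9), 400.0),
    ("Tokyo", date(2023, 12, 1), 800.0),
    ("London", date(2022, 6, 30), 5000.0),  # falls outside the 12-month window
]


def top_stores(records, start, end):
    """Total sales per store within [start, end], highest first."""
    totals = defaultdict(float)
    for location, sold_on, amount in records:
        if start <= sold_on <= end:       # filter: the last 12 months
            totals[location] += amount    # fact: sales, grouped by dimension: location
    # order: top stores in decreasing order of sales
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = top_stores(sales, date(2023, 1, 1), date(2023, 12, 31))
```

Changing only the filter dates or the grouping key answers a different business question with the same data.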
Key takeaway:
It is highly recommended to organize your data properly using individual tables for
facts and dimensions to enable quick analysis.
Key takeaway:
Have a clear idea of how many datasets you want to keep. Maintaining more than
what is actually required wastes data modeling effort and leads to performance issues.
7-The Wrap Up
Data modeling plays a crucial role in the growth of businesses, especially when you
want your organization to base its decisions on facts and figures.
To achieve the varied business intelligence insights and goals, it is recommended to
model your data correctly and use appropriate tools to ensure the simplicity of the system.
MISSING DATA:
Missing data is always a problem in real life scenarios. Areas like
machine learning and data mining face severe issues in the accuracy
of their model predictions because of poor quality of data caused by
missing values. In these areas, missing value treatment is a major
point of focus to make their models more accurate and valid.
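The notes do not name a specific treatment. As an assumed illustration only, the sketch below uses mean imputation, one simple approach: missing entries (represented here as `None`) are replaced by the mean of the observed values. The `ages` list is made up.

```python
def count_missing(values):
    """Number of missing entries, where missing is represented as None."""
    return sum(1 for v in values if v is None)


def impute_mean(values):
    """Replace each missing entry with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Hypothetical column with two missing values.
ages = [25.0, None, 35.0, 30.0, None]
filled = impute_mean(ages)
```

Mean imputation keeps the column mean unchanged but shrinks its spread, so it is not always an appropriate choice.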
Missing at random:
Within these broad categories, each business plan is unique. Consider the shaving
industry. Gillette is happy to sell its Mach3 razor handle at cost or lower in order to get steady
customers for its more profitable razor blades. The business model rests on giving away the
handle to get those blade sales. This type of business model is actually called the razor-
razorblade model, but it can apply to companies in any business that sells a product at a deep
discount in order to supply a dependent good at a considerably higher price.
One way analysts and investors evaluate the success of a business model is by looking at
the company's gross profit. Gross profit is a company's total revenue minus the cost of goods
sold. Comparing a company's gross profit to that of its main competitor or its industry sheds light
on the efficiency and effectiveness of its business model.
Gross profit alone can be misleading, however. Analysts also want to see cash flow or net
income. That is gross profit minus operating expenses and is an indication of just how much real
profit the business is generating.
The two primary levers of a company's business model are pricing and costs. A company
can raise prices, and it can find inventory at reduced costs. Both actions increase gross profit.
Nevertheless, many analysts consider gross profit to be more important in evaluating a
business plan. A good gross profit suggests a sound business plan. If expenses are out of control,
the management could be at fault, and the problems are correctable. As this suggests, many
analysts believe that companies that run on the best business models can run themselves.
Examples of Business Plans
Consider a comparison of two competing business plans. Both companies rent and sell
movies. Before the advent of the Internet, both companies made $5 million in revenues after
spending $4 million on their inventories of movies.
That means that each company makes a gross profit calculated as $5 million minus $4
million, or $1 million. They also have the same gross profit margin, calculated as gross profit
divided by revenues, or 20%.
After the advent of the Internet, Company B decides to offer streaming movies online
instead of renting or selling physical copies of movies. This change disrupts the business model
in a positive way. The licensing fees do not change, but the cost of holding inventory is down
considerably. In fact, the change reduces storage and distribution costs by $2 million. The new
gross profit for the company is $5 million minus $2 million, or $3 million. The new gross profit
margin is 60%.
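The arithmetic in this example can be checked with a short Python sketch. The function names `gross_profit` and `gross_margin` are illustrative; the figures are the ones given above.

```python
def gross_profit(revenue, cost_of_goods_sold):
    """Gross profit = total revenue minus the cost of goods sold."""
    return revenue - cost_of_goods_sold


def gross_margin(revenue, cost_of_goods_sold):
    """Gross profit margin = gross profit divided by revenue."""
    return gross_profit(revenue, cost_of_goods_sold) / revenue


REVENUE = 5_000_000        # both companies
ORIGINAL_COST = 4_000_000  # both companies, before streaming
SAVINGS = 2_000_000        # Company B's reduced storage and distribution costs

profit_a = gross_profit(REVENUE, ORIGINAL_COST)            # $1 million
margin_a = gross_margin(REVENUE, ORIGINAL_COST)            # 20%
profit_b = gross_profit(REVENUE, ORIGINAL_COST - SAVINGS)  # $3 million
margin_b = gross_margin(REVENUE, ORIGINAL_COST - SAVINGS)  # 60%
```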
Meanwhile, Company A is stuck with its lower gross profit margin, and its sales will
soon begin sliding downwards. It failed to update its business plan. Company B isn't even
making more in sales, but it has revolutionized its business model, and that has greatly reduced
its costs.
The airline industry is a good place to look to find a business model that stopped making
sense. It includes companies that have suffered heavy losses and even bankruptcy.
For years, major carriers such as American Airlines, Delta, and Continental built their
businesses around a "hub-and-spoke" structure, in which all flights were routed through a
handful of major airports. By ensuring that most seats were filled most of the time, the business
model produced big profits. But a competing business model arose that made the strength of the
major carriers a burden. Carriers like Southwest and JetBlue shuttled planes between smaller
airports at a lower cost. They avoided some of the operational inefficiencies of the hub-and-
spoke model while forcing labor costs down. That allowed them to cut prices, increasing demand
for short flights between cities.