Unit-2 Pda

This document provides an overview of data analytics, covering its definition, importance, and applications in various industries such as banking, healthcare, and e-commerce. It outlines the data analytics life cycle, including phases like discovery, data preparation, model planning, and operationalization, as well as popular tools and environments used in data analysis. Additionally, it distinguishes between different types of analytics, including descriptive, diagnostic, and predictive analytics.

Uploaded by

greekathena0501
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views69 pages

Unit-2 Pda

This document provides an overview of data analytics, covering its definition, importance, and applications in various industries such as banking, healthcare, and e-commerce. It outlines the data analytics life cycle, including phases like discovery, data preparation, model planning, and operationalization, as well as popular tools and environments used in data analysis. Additionally, it distinguishes between different types of analytics, including descriptive, diagnostic, and predictive analytics.

Uploaded by

greekathena0501
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 69

UNIT-2

Data Analytics

V. Indivaru Teja
Assistant Professor
CMRCET
UNIT-2
Data Analytics:
Introduction to Analytics
Introduction to Tools and Environment
Application of Modeling in Business
Databases & Types of Data and variables
Data Modeling Techniques
Missing Imputations etc.
Need for Business Modeling.
INTRODUCTION
Introduction to Analytics:
➢ Analytics is a journey that involves a combination of skills, advanced technologies, applications, and processes used by firms to gain business insights from data and statistics, in order to support business planning.
➢ Data Analytics refers to the techniques used to
analyze data to enhance productivity and business
gain.
➢ Data is extracted from various sources and is cleaned
and categorized to analyze various behavioral
patterns. The techniques and the tools used vary
according to the organization or individual.
➢ Data Analytics plays a key role in improving your business, as it is used to gather hidden insights, generate reports, perform market analysis, and improve business requirements.
➢ Data analytics is the process of inspecting and transforming data to extract meaningful insights for decision making.
➢ Data analytics is a scientific process of converting data into useful information for decision makers.
Role of Data Analytics:
1. Gather Hidden Insights: Hidden insights from data are gathered and then analyzed with respect to business requirements.
2. Generate Reports: Reports are generated from the data and passed on to the respective teams and individuals so they can take further action to grow the business.
3. Perform Market Analysis: Market analysis can be performed to understand the strengths and weaknesses of competitors.
4. Improve Business Requirements: Analysis of data allows the business to better meet customer requirements and improve the customer experience.
Data analytics vs. data analysis:
Although data analytics and data analysis are frequently used interchangeably, data analysis is a subset of data analytics concerned with examining, cleansing, transforming, and modeling data to derive conclusions. Data analytics includes the tools and techniques used to perform data analysis.
Data analytics vs. data science:
Data analytics and data science are closely related. Data analytics is a component of data science, used to understand what an organization's data looks like. Generally, the output of data analytics is reports and visualizations. Data science takes the output of analytics to study and solve problems.
The difference between data analytics and data science is often seen as one of timescale: data analytics describes the current or historical state of reality, whereas data science uses that data to predict and/or understand the future.
Applications of Analytics
➢ In today's world, data drives most modern companies.
➢ To gain important insight from data as a whole, it is necessary to analyze it and draw out specific information that can be used to improve certain aspects of a market or of the business as a whole.
➢ There are several applications of data analytics, and businesses actively use them to stay competitive.
Fraud and Risk Detection: This is known as one of the earliest applications of data science, which emerged from the discipline of finance.
➢ Many organizations had suffered heavy losses from bad debt. Since they already had data that was collected when their customers applied for loans, they applied data science, which eventually rescued them from further losses.
➢ This led banks to learn to divide and conquer the data in their customers' profiles, recent expenditure, and other significant information made available to them.
➢ This made it easy for them to analyze and infer the probability of a customer defaulting.
➢ Policing/Security: Several cities all over the world have employed predictive analysis to predict areas that are likely to witness a surge in crime, using geographical and historical data.
➢ Although it is not possible to make arrests for every crime committed, the availability of data has made it possible to station police officers in such areas at certain times of the day, which has led to a drop in the crime rate.
Applications of Data Analytics
1.Banking: Banks employ data analytics to manage risks, gather
insights into customer behavior, and personalize financial
services.
➢ Using data analytics, banks and credit card companies can
customize their offerings, identify potential fraud, and assess
potential client creditworthiness by analyzing customer
demographics, transaction data, and credit histories.
➢ Data analytics also helps banks spot money laundering
activities and boost regulatory compliance.
2.Cybersecurity: Data analytics is pivotal in cybersecurity by
detecting and preventing cyber-attacks and similar threats.
➢ Security systems analyze user behavior, network traffic, and
system logs to locate anomalies and possible security
breaches.
➢ By leveraging data analytics, businesses and other
organizations can proactively enhance their security
measures, detect threats, respond to them in real-time, and
protect sensitive data.
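The anomaly-spotting step described above can be sketched with a simple z-score rule using only Python's standard library. The request counts and the 2.5-sigma threshold below are invented for illustration; real intrusion-detection systems use far more sophisticated models.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=2.5):
    """Flag values whose z-score (distance from the mean,
    measured in standard deviations) exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical requests-per-minute from a server log; 300 is a burst.
requests = [52, 48, 50, 51, 49, 47, 53, 50, 300]
print(find_anomalies(requests))  # → [300]
```

A real pipeline would apply such a rule per user or per endpoint over rolling time windows rather than over one static list.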
3. E-commerce: E-commerce platforms use data analytics to understand customer behavior, optimize marketing campaigns, and personalize shopping experiences.
➢ E-commerce companies offer personalized product
recommendations, target specific customer segments,
and improve customer satisfaction and retention by
analyzing customer preferences, purchase histories, and
browsing patterns.
4.Finance: Data analytics plays a vital role in investment
strategies, fraud detection, and risk assessment.
➢ Banks and other financial institutions analyze vast
volumes of data to predict creditworthiness, spot
suspicious transactions, and optimize their investment
portfolios.
➢ Data analytics allows finance companies to offer
personalized financial advice and develop creative
financial products and services.
5.Healthcare: Data analytics positively changes the
healthcare industry by offering better patient care,
disease prevention, and resource optimization.
➢ Hospitals can analyze patient data to spot high-risk
individuals and provide personalized treatment plans.
➢ Data analytics also helps find disease outbreaks,
monitor treatment effectiveness, and improve
healthcare operations.
6.Internet Searches: Data analytics powers Internet
search engines, letting users find relevant information
accurately and quickly.
➢ Search engines analyze colossal amounts of data (e.g.,
web pages, user queries, click-through rates) to deliver
the most relevant search results.
➢ Additionally, data analytics algorithms continuously
learn and adapt to individual user behaviors, providing
increasingly accurate and personalized search results.
7.Logistics: Data analytics is essential in managing fleet
operations, optimizing transportation routes, and
improving overall supply chain efficiency.
➢ By analyzing delivery times, routes, and vehicle performance data, logistics companies can reduce delivery times, minimize costs, and improve demand forecasting, inventory management, and customer satisfaction.
8.Manufacturing: Data analytics revolutionizes
manufacturing by optimizing production processes,
improving predictive maintenance, and enhancing
product quality.
➢ Data analytics allows manufacturers to analyze sensor
data, machine performance, and historical maintenance
records to predict equipment failures, minimize
downtime, and guarantee efficient operations.
➢ Data analytics also allows manufacturers to monitor
production lines in real time, leading to higher
productivity and savings.
9. Retail: Data analytics is changing the retail industry by optimizing pricing strategies, offering insights into customer preferences, and improving inventory management. Retailers can analyze customer feedback, sales data, and market trends to identify popular products, personalize offers, and forecast future demand. Data analytics helps retailers improve their marketing efforts, increase customer loyalty, and optimize store layouts.
10.Risk Management: Risks are common in the commercial world, and
data analytics is essential in risk management across many industries,
such as finance, insurance, and project management. Organizations
can use data analysis to assess risks, develop sound mitigation
strategies, and make informed decisions by analyzing market trends,
historical data, and external factors.
11.Supply Chain Management: The recent global pandemic has cast
supply chains in a new light. Consequently, data analytics improves
supply chain management by reducing costs, optimizing inventory
levels, and enhancing overall operational efficiency. Using data
analytics, organizations can analyze supply chain data to forecast
demand, identify bottlenecks, and improve logistics and distribution
processes. Data analytics also enhances transparency throughout the
supply chain.
Data Analytics Life Cycle
The data analytics lifecycle is designed for Big Data problems and data science projects. The cycle is iterative, to reflect a real project. To address the distinct requirements of performing analysis on Big Data, a step-by-step methodology is needed to organize the activities and tasks involved with acquiring, processing, analyzing, and repurposing data.
Phase 1: Discovery:
The data science team learns about and investigates the problem.
The team develops context and understanding.
The team identifies the data sources needed and available for the project.
The team formulates initial hypotheses that can later be tested with data.
Phase 2: Data Preparation:
Steps to explore, preprocess, and condition data prior to modeling and analysis.
It requires the presence of an analytic sandbox; the team performs extract, load, and transform (ELT) steps to get data into the sandbox.
Data preparation tasks are likely to be performed multiple times and not in a predefined order.
Several tools commonly used for this phase are Hadoop, Alpine Miner, OpenRefine, etc.
Phase 3: Model Planning:
The team explores the data to learn about the relationships between variables and subsequently selects key variables and the most suitable models. In this phase, the data science team develops data sets for training, testing, and production purposes.
Several tools commonly used for this phase are MATLAB and STATISTICA.
Phase 4: Model Building:
The team develops datasets for testing, training, and production purposes, and builds and executes models based on the work done in the model planning phase. The team also considers whether its existing tools will suffice for running the models or whether it needs a more robust environment for executing them.
Free or open-source tools: R and PL/R, Octave, WEKA.
Commercial tools: MATLAB, STATISTICA.
Phase 5: Communicate Results:
After executing the model, the team needs to compare the outcomes of modeling to the criteria established for success and failure.
The team considers how best to articulate the findings and outcomes to the various team members and stakeholders, taking caveats and assumptions into account.
The team should identify key findings, quantify the business value, and develop a narrative to summarize and convey the findings to stakeholders.
Phase 6: Operationalize:
The team communicates the benefits of the project more broadly and sets up a pilot project to deploy the work in a controlled way before broadening it to a full enterprise of users.
This approach enables the team to learn about the performance and related constraints of the model in a production environment on a small scale and make adjustments before full deployment.
The team delivers final reports, briefings, and code. Free or open-source tools: Octave, WEKA, SQL, MADlib.
Data Analytics Tools
Popular Data Analytics Tools
1.Excel: Microsoft Excel is one of the oldest, most well-known software
applications for data analysis. In addition to providing spreadsheet
functions to manage and organize vast data sets, Excel offers graphing
tools and computing capabilities such as automated summation
(AutoSum).
➢ Excel also includes the Analysis ToolPak, which includes data analysis
tools that perform regression, variance, and statistical analysis.
➢ Furthermore, Excel’s versatility and simplicity make it a robust data
analysis tool ideal for sorting, filtering, cleaning, managing, analyzing,
and visualizing data. Every aspiring data analyst should be proficient in
Excel.
2.Jupyter Notebook: Jupyter Notebook is a web-based interactive
environment where data analysts can share computational documents
or “notebooks.”
➢ Data analysts use Jupyter Notebooks to clean data, write and run code,
conduct data visualization, perform machine learning and statistical
analysis, and function with many other kinds of data analysis.
➢ Furthermore, Jupyter Notebook lets analysts merge data visualizations,
code, comments, and various programming languages into one place to
document the data analysis process better and share it with other
teams or stakeholders.
3.MySQL: MySQL is a popular open-source relational database
management system (also called RDBMS) that stores application data,
primarily web-based.
➢ Popular websites like Facebook, Twitter, and YouTube have used
MySQL.
➢ Structured Query Language (SQL) is most often used to manage relational database management systems, whose databases are typically structured into tables.
4.Python: Python is consistently ranked as one of the most popular
programming languages in the world. Unlike other programming
languages today, Python is relatively simple to learn and can be
employed in many different tasks, including software, web
development, and data analysis.
➢ Data analysts use Python to streamline, analyze, model, and visualize
data using built-in analytics tools.
➢ Python also offers data analytics professionals access to libraries like pandas and NumPy, which provide powerful analytics-related tools. Python is another application that new data analysts should be highly familiar with.
5. R: R is an open-source programming language typically used for statistical computing and graphics. R is regarded as a relatively easy-to-learn programming language, like Python above.
➢ R is usually used for data visualization, statistical analysis, and data manipulation. R's statistical focus makes it well-suited for statistical calculations.
➢ In addition, its included visualization tools make it an ideal language for creating compelling graphics such as graphs and scatter plots. Along with Python, R is one of the most essential programming languages data analysts use.
6.Tableau: Tableau is a data visualization application for business analytics
and business intelligence.
➢ Tableau is one of the most popular data visualization platforms in
today’s business world, mainly because it boasts an easily understood
user interface and smoothly turns data sets into understandable
graphics.
➢ Business users like Tableau because it’s easy to use, and data analysts
like it because it has powerful tools that perform advanced analytics
functions like cohort analysis, predictive analysis, and segmentation.
Data Analytics Environments
a. Integrated Development Environments (IDEs)
b. Cloud Platforms
c. Collaborative Tools
a. Integrated Development Environments (IDEs)
i) Jupyter Notebook: An open-source web
application that allows you to create and share
documents that contain live code, equations,
visualizations, and narrative text. It is particularly
popular for Python and R.
ii) RStudio: An IDE for R that provides tools for data
analysis, including a console, syntax-highlighting
editor, and tools for plotting and history tracking.
iii) PyCharm: A Python IDE that supports
development for data analytics, with features like
code completion, debugging, and integrated tools
for scientific computing.
b. Cloud Platforms
i) Google Cloud Platform (GCP): Offers tools like BigQuery for large-scale data analysis, as well as other data storage and processing solutions.
ii) Amazon Web Services (AWS): Provides a range of data analytics services, such as Amazon Redshift for data warehousing and AWS Glue for ETL.
iii) Microsoft Azure: Includes Azure Synapse Analytics for integrating big data and data warehousing, and Azure Machine Learning for building and deploying machine learning models.
c. Collaborative Tools
i) GitHub/GitLab: Platforms for version control
and collaboration on code projects. They allow
multiple users to work on the same project
and keep track of changes.
ii) Slack/Microsoft Teams: Communication tools
that facilitate collaboration and sharing of
insights among data analysts and teams.
TYPES OF DATA ANALYTICS
1. Descriptive Analytics
What is happening?
❖ With the help of descriptive analysis, we analyze and describe the features of a data set.
❖ It deals with the summarization of information.
❖ In descriptive analysis, we deal with past data to draw conclusions and present our data in the form of actionable results.
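The summarization that descriptive analytics performs can be sketched with Python's standard statistics module; the monthly sales figures below are invented for illustration.

```python
from statistics import mean, median, stdev

# Hypothetical monthly sales figures (past data):
sales = [120, 135, 128, 150, 142, 138]

# Descriptive analytics: summarize what has happened.
summary = {
    "count": len(sales),
    "mean": round(mean(sales), 2),
    "median": median(sales),
    "stdev": round(stdev(sales), 2),
    "min": min(sales),
    "max": max(sales),
}
print(summary)
```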
2. Diagnostic Analytics
Why is it happening?
❖ Diagnostic analytics is a form of advanced analytics that examines data or content to answer the question, "Why did it happen?"
❖ It is characterized by techniques such as data discovery, data mining, and correlations.
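As a sketch of the correlation technique mentioned above, the Pearson coefficient can be computed directly in Python. The ad-spend and sales numbers are made up and deliberately perfectly linear, so the coefficient comes out at 1.0.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical diagnostic question: did sales rise *with* ad spend?
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]  # exactly sales = 2 * ad_spend + 5
print(round(pearson(ad_spend, sales), 3))  # → 1.0
```

A coefficient near +1 or -1 suggests a strong linear relationship worth investigating further; it does not by itself prove causation.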
3. Predictive Analytics
What is likely to happen?
❖ With the help of predictive analysis, we determine the likely future outcome.
❖ Based on the analysis of historical data, we are able to forecast the future.
❖ With the help of data analytics, technological advancements, and machine learning, we are able to obtain effective predictions about the future.
❖ Predictive analytics is a complex field that requires a large amount of data, skilled implementation of predictive models, and tuning of those models to obtain accurate predictions.
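A minimal flavor of forecasting from historical data: fit a least-squares trend line and extrapolate one step ahead. The revenue history below is invented and deliberately follows an exact line; real predictive models are far richer.

```python
def fit_trend(xs, ys):
    """Ordinary least squares for a line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical revenue for years 1-5; predict year 6.
years = [1, 2, 3, 4, 5]
revenue = [100, 110, 120, 130, 140]
slope, intercept = fit_trend(years, revenue)
print(slope * 6 + intercept)  # → 150.0
```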
4. Prescriptive Analytics
What do I need to do?
❖ It builds on an understanding of what has happened, why it has happened, and a variety of "what-might-happen" analyses.
❖ It helps the user determine the best course of action to take.
❖ Prescriptive analysis typically deals not just with one individual action but with a host of related actions.
❖ An example is suggesting the best route home by considering the distance and likely speed of each route.
Various steps involved in Analytics:
1.Access
2.Manage
3.Analyze
4.Report
Types of Data Models
There are 5 types of data models:
1. Hierarchical Model
2. Relational Model
3. Network Model
4. Object-Oriented Model
5. Entity-Relationship Model
1.Hierarchical Model
➢ Organizes data in a tree-like structure with parent-child relationships, suitable
for data with clear hierarchical structures.
➢ As the name indicates, this model makes use of hierarchy to structure the data in a tree-like format. However, retrieving and accessing data is difficult in a hierarchical model.
➢ The hierarchy starts from the root which has root data and then it expands in the
form of a tree adding child node to the parent node. This model easily represents
some of the real-world relationships like food recipes, sitemap of a website etc.
Advantages:
1. It is very simple and fast to traverse a tree-like structure.
2. Any change in the parent node is automatically reflected in the child node, so the integrity of the data is maintained.
Disadvantages:
1. Complex relationships are not supported.
2. It does not support more than one parent per child node, so a complex relationship in which a child node needs two parent nodes cannot be represented in this model. Also, if a parent node is deleted, the child node is automatically deleted.
Hierarchical Model Example
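The parent-child structure above can be sketched with plain Python dictionaries (the sitemap data is invented). Note that each child appears under exactly one parent, which is precisely the limitation the disadvantages mention.

```python
# A website sitemap as a hierarchical (tree) model: one parent per node.
sitemap = {
    "Home": ["Products", "About"],
    "Products": ["Laptops", "Phones"],
    "About": ["Team"],
    "Laptops": [], "Phones": [], "Team": [],
}

def traverse(tree, node):
    """Depth-first traversal from a node, returning the visit order."""
    visited = [node]
    for child in tree[node]:
        visited += traverse(tree, child)
    return visited

print(traverse(sitemap, "Home"))
# → ['Home', 'Products', 'Laptops', 'Phones', 'About', 'Team']
```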
2. Relational Model
➢ Relational Model is the most widely used model. In this model, the
data is maintained in the form of a two-dimensional table.
➢ All the information is stored in the form of row and columns with
relationships between tables established using foreign keys ideal for
structured data and complex queries using SQL.
➢ The basic structure of a relational model is tables.
Advantages:
1. Simple: This model is simpler than the network and hierarchical models.
2. Scalable: This model can be easily scaled, as we can add as many rows and columns as we want.
3. Structural Independence: We can make changes to the database structure without changing the way the data is accessed. When changes to the database structure do not affect the DBMS's capability to access the data, structural independence has been achieved.
Relational Model Example
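The rows, columns, and foreign-key links described above can be sketched with Python's built-in sqlite3 module; the department/employee tables and names are invented for illustration.

```python
import sqlite3

# Two tables linked by a foreign key, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (1, 'Sales'), (2, 'HR');
    INSERT INTO employee VALUES (10, 'Asha', 1), (11, 'Ravi', 2);
""")
rows = conn.execute("""
    SELECT e.name, d.name
    FROM employee e JOIN department d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()
print(rows)  # → [('Asha', 'Sales'), ('Ravi', 'HR')]
```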
3. Network Model
➢ The network model is an extension of the hierarchical model.
However, unlike the hierarchical model, this model makes it easier
to convey complex relationships as each record can be linked with
multiple parent records.
➢ Allows complex relationships between entities, where a record
can have multiple parent records.
➢ The network model is a database model conceived as a flexible way
of representing objects and their relationships. Its distinguishing
feature is that the schema, viewed as a graph in which object types
are nodes and relationship types are arcs, is not restricted to being a
hierarchy or lattice.
Advantages:
1.The data can be accessed faster as compared to the hierarchical
model. This is because the data is more related in the network
model and there can be more than one path to reach a particular
node. So the data can be accessed in many ways.
2.As there is a parent-child relationship so data integrity is present.
Any change in parent record is reflected in the child record.
Disadvantages:
1. As more and more relationships need to be handled, the system may become complex, so a user must have detailed knowledge of the model to work with it.
2. Any change, such as an update, deletion, or insertion, is very complex.
Network Model Example
4. Object-Oriented Model
➢ This model consists of a collection of objects, each with its own features and methods. This type of model is also called the post-relational database model.
➢ It represents data as objects with attributes and methods, mirroring object-oriented programming concepts.
➢ Real-world problems are more closely represented through the object-oriented data model.
➢ In this model, two or more objects are connected through links. We use these links to relate one object to other objects.
For example, consider two objects, Employee and Department. All the data and relationships of each object are contained as a single unit.
The attributes, such as the Name and Job_title of the employee, and the methods performed by that object are stored as a single object. The two objects are connected through a common attribute, the Department_id, and communication between the two is done with the help of this common id.
5. Entity-Relationship Model
➢ Entity-Relationship Model or simply ER Model is a high-level data
model diagram.
➢ In this model, we represent the real-world problem in the
pictorial form to make it easy for the stakeholders to understand.
➢ An entity could be anything – a concept, a piece of data, or an
object.
Advantages:
i) Simple: Conceptually ER Model is very easy to build. If we know
the relationship between the attributes and the entities we can
easily build the ER Diagram for the model.
ii) Effective Communication Tool: This model is used widely by the
database designers for communicating their ideas.
iii) Easy Conversion to Other Models: This model maps well to the relational model and can be easily converted to it by mapping the ER model to tables. It can also be converted to any other model, such as the network model or the hierarchical model.
Disadvantages:
i) No industry standard for notation: There is no industry standard for developing an ER model, so one developer might use notations that are not understood by other developers.
ii) Hidden information: Some information might be lost or hidden in the ER model. As it is a high-level view, there is a chance that some details of the information remain hidden.
1.Structured data
➢ Structured data is particularly useful when you’re
dealing with discrete, numeric data.
➢ Examples of this type of data include financial
operations, sales and marketing figures, and
scientific modeling.
➢ You can also use structured data in any case
where records with multiple, short-entry text,
numeric, and enumerated fields are required,
such as HR records, inventory listings, and
housing data.
➢ Example: Relational data.
Example of Structured data
2. Semi-Structured Data
➢ Semi-structured data is information that does not reside in a relational database but has some organizational properties that make it easier to analyze.
➢ With some processing, you can store it in a relational database (this can be very hard for some kinds of semi-structured data), but semi-structured formats exist to save that space and effort.
➢ Example: XML data.
Example
3. Unstructured Data
➢ Unstructured data is used when a record is required and the data won't fit into a structured data format.
➢ Examples include video monitoring, company documents, and social media posts. You can also use unstructured data where it isn't efficient to store the data in a structured format, such as Internet of Things (IoT) sensor data, computer system logs, and chat transcripts.
Example of Unstructured data
Differences between structured, semi-structured, and unstructured data
Types of Data
Data types are an important concept in statistics which needs to be understood in order to correctly apply statistical measurements to your data and therefore to correctly draw certain conclusions about it.
➢ There are two types of variables you'll find in your data:
1. Numerical (Quantitative)
2. Categorical (Qualitative)
1. Quantitative Data (Numerical Data): It deals with numbers and things you can measure objectively: dimensions such as height, width, and length; temperature and humidity; prices; area and volume.
➢ Numerical data is information that is measurable, and it is, of course, represented as numbers and not words or text.
Numerical data can be divided into continuous or discrete
values.
a) Continuous Data: Continuous Data represents
measurements and therefore their values can’t be
counted but they can be measured.
➢ Continuous numbers are numbers that don’t have a
logical end to them.
➢ An example would be the height of a person, which you
can describe by using intervals on the real number line.
b) Discrete Data: We speak of discrete data if its values are
distinct and separate. In other words: We speak of
discrete data if the data can only take on certain values.
➢ Discrete numbers are the opposite; they have a logical
end to them.
➢ Some examples include variables for days in the month, or
number of bugs logged.
2. Categorical Data :
➢Categorical data represents characteristics. This is
any data that isn’t a number, which can mean a
string of text or date.
➢Therefore it can represent things like a person’s
gender, language etc.
➢These variables can be broken down into nominal
and ordinal values.
a) Nominal Data :Nominal values represent
discrete units and are used to label variables,
that have no quantitative value. Nominal
value examples include variables such as
“Country” or “Marital Status”.

Example:
b) Ordinal data : Ordinal values represent
discrete and ordered units. It is therefore
nearly the same as nominal data, except that
it’s ordering matters.
➢Examples of ordinal values include having a
priority on a bug such as “Critical” or “Low” or
the ranking of a race as “First” or “Third”.
Example:
c) Binary data : In addition to ordinal and
nominal values, there is a special type of
categorical data called binary. Binary data
types only have two values – yes or no.
➢This can be represented in different ways such
as “True” and “False” or 1 and 0.
➢Examples of binary variables can include
whether a person has stopped their
subscription service or not, or if a person
bought a car or not.
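The three categorical flavors described above can be illustrated in a few lines of Python; the variables and values are invented examples matching the descriptions.

```python
# Nominal: labels with no order.
countries = ["India", "Japan", "Brazil"]

# Binary: exactly two values (yes/no, True/False, 1/0).
churned = [True, False, False, True]

# Ordinal: labels whose order matters, so map them to ranks.
priority_levels = ["Low", "Medium", "High", "Critical"]
rank = {level: i for i, level in enumerate(priority_levels)}

bugs = ["High", "Low", "Critical"]
print(sorted(bugs, key=rank.get))  # → ['Low', 'High', 'Critical']
```

Sorting nominal labels alphabetically would be meaningless, but sorting ordinal labels by rank preserves their inherent order.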
Missing Imputations: In R, missing values are represented by the symbol NA (not available).
➢ Impossible values (e.g., dividing by zero) are represented by the symbol NaN (not a number). To remove missing values from our dataset we use the na.omit() function.
For example, we can create a new dataset without missing data as below:
newdata <- na.omit(mydata)
Or we can pass na.rm=TRUE as an argument to the function. In the example below, we use na.rm and get the desired result:
x <- c(1, 2, NA, 3)
mean(x, na.rm=TRUE)
# returns 2
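For comparison, the same clean-before-compute step can be sketched in Python, where missing values are commonly represented by math.nan; the vector mirrors the R example above.

```python
import math

x = [1, 2, math.nan, 3]

# Drop the missing value first, as na.omit() / na.rm=TRUE do in R.
clean = [v for v in x if not math.isnan(v)]
print(sum(clean) / len(clean))  # → 2.0
```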
Missing Imputations (MICE Package):
➢ MICE (Multivariate Imputation via Chained Equations) is one of the packages commonly used by R users.
➢ Creating multiple imputations, as opposed to a single imputation, takes care of the uncertainty in missing values.
➢ The mice package implements a method to deal with missing data.
➢ The MICE algorithm can impute mixes of continuous, binary, unordered categorical, and ordered categorical data.
For example:
➢ Suppose we have variables X1, X2, ..., Xk. If X1 has missing values, it will be regressed on the other variables X2 to Xk.
➢ The missing values in X1 will then be replaced by the predicted values obtained. Similarly, if X2 has missing values, then the X1 and X3 to Xk variables will be used in the prediction model as independent variables.
➢ Later, missing values will be replaced with predicted values. The mice package has a function known as md.pattern(); it returns a tabular view of the missing values present in each variable of a data set.
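One pass of the chained-equations idea described above (regress the column with missing values on a fully observed column, then fill in the predictions) can be sketched in pure Python. The two-column data is invented; the real mice package runs many such passes over many variables and adds proper uncertainty handling.

```python
def regression_impute(x, y):
    """Fill missing entries of y (marked None) using a least-squares
    regression of y on x fitted over the fully observed rows."""
    obs = [(a, b) for a, b in zip(x, y) if b is not None]
    n = len(obs)
    mx = sum(a for a, _ in obs) / n
    my = sum(b for _, b in obs) / n
    slope = (sum((a - mx) * (b - my) for a, b in obs)
             / sum((a - mx) ** 2 for a, _ in obs))
    intercept = my - slope * mx
    return [b if b is not None else slope * a + intercept
            for a, b in zip(x, y)]

# Hypothetical variables: X2 fully observed, X1 has one missing value.
x2 = [1, 2, 3, 4]
x1 = [10, 20, None, 40]
print(regression_impute(x2, x1))  # the None is replaced by ~30.0
```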
Syntax:
imputed_Data <- mice(data, m = 5, maxit = 5, method = NULL, seed = NA)
Precisely, the methods used by this package are:
1. pmm (Predictive Mean Matching) – for numeric variables.
2. logreg (Logistic Regression) – for binary variables (with 2 levels).
3. polyreg (Polytomous Regression) – for unordered factor variables (>= 2 levels).
4. polr (Proportional Odds Model) – for ordered factor variables (>= 2 levels).
Application of Modeling in Business
➢ A statistical model embodies a set of assumptions
concerning the generation of the observed data,
and similar data from a larger population.
➢ A model represents, often in considerably
idealized form, the data-generating process.
➢ Signal processing is an enabling technology that
encompasses the fundamental theory,
applications, algorithms, and implementations of
processing or transferring information contained
in many different physical, symbolic, or abstract
formats broadly designated as signals.
➢It uses mathematical, statistical,
computational, heuristic representations,
formalisms, and techniques for
representation, modeling, analysis, synthesis,
discovery, recovery, sensing, acquisition,
extraction, learning, security, or forensics.
