Unit - 1 Introduction-Statistical Inference
The process of collecting data from a small subset of the population and then
using it to generalize about the entire population is called Sampling.
Sample
Samples are used when:
• The population is too large to collect data from every member.
• Data cannot be reliably collected from the entire population.
• The population is hypothetical and is unlimited in size. Take the example of a study that
documents the results of a new medical procedure. It is unknown how the procedure will
affect people across the globe, so a test group is used to find out how people react to it.
Population Parameter:
Mean: μ = (ΣX) / N, where ΣX is the sum of all values in the population and N is the size of the
population
Standard Deviation: σ = √[(Σ(X-μ)²) / N], where X is a value in the population, μ is the population
mean, and N is the size of the population
Sample Statistic:
Mean: x̄ = (Σx) / n, where Σx is the sum of all values in the sample and n is the size of the sample
Standard Deviation: s = √[(Σ(x-x̄)²) / (n-1)], where x is a value in the sample and x̄ is the sample
mean
Note that the formulas for the population parameter and sample statistic are similar, but they use
different notation and have slightly different calculations. The population parameter uses the entire
population, while the sample statistic uses a subset (i.e., sample) of the population.
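As a quick illustration of that difference, here is a minimal Python sketch (the numbers are made
up) that computes the population parameters with a divisor of N and the sample statistics with the
n − 1 divisor:

```python
# Minimal sketch: population parameters vs. sample statistics.
# The "population" values below are invented for illustration.
import math
import random

population = [4, 8, 6, 5, 3, 7, 9, 5, 6, 4, 8, 7]

# Population parameters: use every value, divide the variance by N.
N = len(population)
mu = sum(population) / N
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / N)

# Sample statistics: use a subset, divide the variance by n - 1.
sample = random.sample(population, 6)
n = len(sample)
x_bar = sum(sample) / n
s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))

print(f"population: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"sample:     x_bar = {x_bar:.2f}, s = {s:.2f}")
```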
Examples of Statistical Inference Using Population and Sample Data
Statistical inference using population and sample data can be applied in various fields.
Here are some examples:
• Medical Research: In medical research, clinical trials are conducted on a sample of the
population to estimate the effects of a drug or treatment. Statistical inference is used to
estimate the effect size and determine the probability that the results are due to chance.
In applications like these, statistical inference using population and sample data is used to
draw conclusions or make predictions about the population of interest. By using
probability theory and statistical methods, researchers can estimate population parameters,
such as proportions or means, and determine the likelihood that the results are due to
chance.
Statistical Inference
The main types of statistical inference are:
• Estimation
• Hypothesis testing
Estimation
• Statistics from a sample are used to estimate population parameters.
• The most likely value is called a point estimate.
• There is always uncertainty when estimating.
• The uncertainty is often expressed as confidence intervals defined by a likely lowest
and highest value for the parameter.
• An example could be a confidence interval for the number of bicycles a Dutch person
owns:
"The average number of bikes a Dutch person owns is between 3.5 and 6."
Hypothesis Testing
• Hypothesis testing is a method to check if a claim about a population is true. More
precisely, it checks how likely it is that a hypothesis is true, based on the sample data.
• The steps of the test depend on:
• Type of data (categorical or numerical)
• If you are looking at:
❑ A single group
❑ Comparing one group to another
❑ Comparing the same group before and after a change
Some examples of claims or questions that can be checked with hypothesis testing:
• 90% of Australians are left-handed
• Is the average weight of dogs more than 40 kg?
• Do doctors make more money than lawyers?
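As an illustration, the dog-weight question above could be checked with a one-sample t-test. This
sketch uses made-up weights and assumes SciPy's ttest_1samp with its one-sided alternative option:

```python
# Minimal sketch: one-sample t-test for "average dog weight exceeds 40 kg".
import numpy as np
from scipy import stats

weights = np.array([38.5, 42.0, 45.3, 39.8, 41.2, 44.7, 40.1, 43.6])  # invented

# H0: mean weight <= 40 kg, H1: mean weight > 40 kg (one-sided test)
t_stat, p_value = stats.ttest_1samp(weights, popmean=40, alternative="greater")

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Reject H0: the data support a mean weight above 40 kg.")
else:
    print("Fail to reject H0: not enough evidence that the mean exceeds 40 kg.")
```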
What is a Data Model?
• A data model organizes data elements and standardizes how the data elements relate to
one another. Since data elements document real-life people, places, and things and the
events between them, the data model represents reality. For example, a house has many
windows, or a cat has two eyes.
• Data models are often used as an aid to communication between the business people
defining the requirements for a computer system and the technical people defining the
design in response to those requirements. They are used to show the data needed and
created by business processes.
• A data model explicitly determines the structure of data. Data models are specified in
a data modeling notation, which is often graphical in form.
• The creation of the data model is the critical first step that must be taken after business
requirements for analytics and reporting have been defined.
• A data model is sometimes referred to as a data structure, especially in the context
of programming languages. Data models are often complemented by function models.
• A model is an artificial construction where all extraneous detail has been removed or
abstracted.
What is Statistical Modeling?
• Statistical modeling is the process of describing the connections between variables in a
dataset using mathematical equations and statistical approaches.
• Statistical modeling is used to identify correlations between variables, generate
predictions, and influence decision-making in a range of professions and sectors. It can
be used in any case where we wish to improve our understanding of how different
variables are connected and make predictions based on that information.
What is Statistical Modeling?
• Predicting the number of people who will travel on a specific rail route is an example of
statistical modeling. To develop a statistical model, we would collect data on the number
of passengers who utilize the train route over time, as well as data on variables that
might affect passenger counts, such as time of day, day of the week, and weather.
• Then, using statistical approaches such as regression analysis, we can determine the
correlations between these factors and the number of passengers utilizing the railway
route. For example, we might discover that the number of passengers is higher during
rush hour and on weekdays, and lower when it is raining. We can use this information to build
a statistical model that forecasts the number of people who will use the railway route
depending on the time of day, day of the week, and weather conditions. This model can
then be used to anticipate future passenger numbers and make resource allocation
choices, such as adding trains during rush hour or running promotions during
severe weather.
• It is essential in statistical modeling to pick an appropriate statistical model that fits the
data and to evaluate the model to ensure accuracy and reliability. This might include
running the model on a new set of data or employing statistical tests to assess the
model’s performance.
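A minimal sketch of the ridership model described above, using scikit-learn; the feature names and
passenger counts are invented assumptions for illustration:

```python
# Minimal sketch: regression model for rail ridership from time/weather features.
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up observations: indicator features plus observed passenger counts.
data = pd.DataFrame({
    "rush_hour":  [1, 0, 1, 0, 1, 0, 1, 0],   # 1 = rush hour
    "weekday":    [1, 1, 0, 0, 1, 1, 0, 0],   # 1 = weekday
    "raining":    [0, 0, 0, 0, 1, 1, 1, 1],   # 1 = raining
    "passengers": [520, 310, 280, 190, 450, 260, 220, 150],
})

X = data[["rush_hour", "weekday", "raining"]]
y = data["passengers"]

model = LinearRegression().fit(X, y)

# Forecast ridership for a rainy weekday rush hour.
forecast = model.predict(pd.DataFrame(
    {"rush_hour": [1], "weekday": [1], "raining": [1]}))
print(f"expected passengers: {forecast[0]:.0f}")
```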
Types of Statistical Models
• There are several statistical models, each suited to a specific research question or data
format. Here are a few common types of statistical models and their applications:
• Linear regression models: These models are used to represent the connection between a
continuous result variable and one or more predictor variables. For example, depending on a
person’s height, age, and gender, a linear regression model may be used to estimate their weight.
• Logistic regression models: Logistic regression models are used to represent the connection
between a binary outcome variable (for example, yes/no) and one or more predictor variables.
For example, depending on age, blood pressure, and cholesterol levels, a logistic regression
model may be used to predict whether a patient will have a heart attack (a sketch follows this list).
• Time series models: Time series models are used to model data that changes over time, such as
stock prices, weather trends, or monthly sales numbers. These types of models may be applied to
data to find trends, seasonal patterns, and other forms of temporal correlations.
• Multilevel models: These models are used to model data having a hierarchical structure, such as
pupils in schools or patients in hospitals. Multilevel models can be used to investigate how
individual-level and group-level factors impact outcomes, as well as to account for the fact that
people in the same group may be more similar to each other than those in different groups.
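Picking up the heart-attack example from the logistic regression bullet above, here is a minimal
scikit-learn sketch; all patient values and the 0/1 outcomes are invented for illustration:

```python
# Minimal sketch: logistic regression for a binary heart-attack outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: age, systolic blood pressure, cholesterol (all made up).
X = np.array([
    [45, 120, 180], [61, 145, 240], [38, 115, 170], [66, 150, 260],
    [52, 130, 210], [70, 160, 280], [41, 118, 190], [58, 140, 230],
])
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # 1 = heart attack

clf = LogisticRegression().fit(X, y)

# Predicted probability of a heart attack for a new patient.
p = clf.predict_proba([[55, 135, 220]])[0, 1]
print(f"estimated risk: {p:.2f}")
```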
Types of Statistical Models
• Structural equation models: These types of models are used to represent complicated
interactions between several variables. Structural equation models can be used to
evaluate ideas regarding causal links between variables and to quantify their strength and
direction.
• Clustering models: Clustering models are used to group comparable
observations based on their similarity in features. Clustering algorithms can be
used to uncover patterns in data that would be difficult to detect using other approaches, as
sketched below.
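A minimal clustering sketch using k-means from scikit-learn; the 2-D points are made up and chosen
so that two groups are easy to separate:

```python
# Minimal sketch: k-means groups points by feature similarity.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([
    [1.0, 1.1], [0.9, 1.3], [1.2, 0.8],   # one tight group
    [5.0, 5.2], [5.3, 4.8], [4.9, 5.1],   # a second group
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)

print("labels:", kmeans.labels_)          # cluster assignment for each point
print("centers:\n", kmeans.cluster_centers_)
```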
What is Linear Regression?
• Linear regression is a type of statistical analysis used to model the relationship between
two variables.
• It assumes a linear relationship between the independent variable and the dependent
variable, and aims to find the best-fitting line that describes the relationship.
• The line is determined by minimizing the sum of the squared differences between the
predicted values and the actual values.
Simple Linear Regression
• In simple linear regression, there is one independent variable and one dependent
variable. The model estimates the slope and intercept of the line of best fit, which
represents the relationship between the variables.
• The slope represents the change in the dependent variable for each unit change in the
independent variable, while the intercept represents the predicted value of the
dependent variable when the independent variable is zero.
• Linear regression shows the linear relationship between the independent (predictor)
variable, plotted on the X-axis, and the dependent (output) variable, plotted on the
Y-axis. If there is a single input variable X (independent variable), such linear
regression is called simple linear regression.
Simple Linear Regression
• To calculate the best-fit line, linear regression uses the traditional slope-intercept form
given below:
Yi = β0 + β1Xi
where Yi = Dependent variable, β0 = Intercept, β1 = Slope, Xi = Independent variable.
• In regression, the difference between the observed value of the dependent variable (yi)
and the predicted value (ŷi) is called the residual or random error.
• εi = yi − ŷi
• where ŷi = β0 + β1Xi
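A minimal sketch of estimating β0 and β1 by ordinary least squares and computing the residuals; the
x and y values are made up:

```python
# Minimal sketch: simple linear regression by ordinary least squares.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form OLS estimates of slope (beta1) and intercept (beta0).
x_bar, y_bar = x.mean(), y.mean()
beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0 = y_bar - beta1 * x_bar

y_hat = beta0 + beta1 * x        # predicted values
residuals = y - y_hat            # epsilon_i = y_i - y_hat_i

print(f"y = {beta0:.2f} + {beta1:.2f} x")
print("residuals:", np.round(residuals, 2))
```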
Fitting a model
• Model Fitting is a measurement of how well a machine learning model adapts to data that is
similar to the data on which it was trained.
• The fitting process is generally built into models and is automatic.
• A well-fit model will accurately approximate the output when given new data, producing
more precise results. A model is fitted by adjusting the parameters within the model, leading
to improvements in accuracy. During the fitting process, the algorithm is run on labeled
data, i.e., data for which the true target values are known. Once the algorithm has finished
running, its predictions are compared to the real, observed values of the target variable in
order to measure the accuracy of the model.
• Using the results, the parameters of the algorithm can be further adjusted to better uncover
relationships and patterns between the inputs, outputs, and targets. The process can be
repeated until valid and accurate insights are obtained; a minimal sketch of this workflow follows.
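A minimal sketch of the fit-then-evaluate workflow described above, using scikit-learn on synthetic
data (the true relationship y = 3x + 2 plus noise is invented so the fit can be checked):

```python
# Minimal sketch: fit on labeled training data, evaluate against held-out
# observed values of the target variable.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X.ravel() + 2.0 + rng.normal(0, 1.0, size=100)  # known relationship

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)  # the automatic fitting step

# Compare predictions to the real, observed target values on unseen data.
mse = mean_squared_error(y_test, model.predict(X_test))
print(f"test MSE: {mse:.3f}")
```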
What Does a Well-Fit Model Look Like?
• A well-fit model should closely match the available data while also following the general
shape of the underlying trend. No model will be able to perfectly match the input data, but a
well-fit model will closely match the data and its general shape. In a typical plot of such a
model, the fitted line does not pass through every individual data point, but it does
capture the general curve.
Why is model fitting important?
Underfitting
Underfitting occurs when a model oversimplifies the data and fails to capture enough information on
the relationships within it. This is frequently a result of insufficient training or an overly
simple model. An underfit model can be identified when it performs poorly even on the training data.
Overfitting
Overfitting is the opposite of underfitting. It occurs when a model is overly sensitive to its
training data and ends up memorizing noise rather than general patterns. Overfitting is generally a
result of overtraining on the training data set. It can be identified when a model performs well on
the data used for training but does not adapt to new data and performs poorly; the sketch below
makes both failure modes concrete.
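A minimal sketch contrasting the two failure modes on synthetic quadratic data: a degree-1
polynomial tends to underfit, a high-degree polynomial tends to overfit, and degree 2 matches the
process that generated the data. All numbers are invented:

```python
# Minimal sketch: underfitting vs. overfitting with polynomial models.
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-3, 3, 30))
y = x ** 2 + rng.normal(0, 1.0, 30)       # training data: quadratic + noise
x_new = np.sort(rng.uniform(-3, 3, 30))   # fresh data from the same process
y_new = x_new ** 2 + rng.normal(0, 1.0, 30)

for degree in (1, 2, 10):
    coeffs = np.polyfit(x, y, degree)     # fit a polynomial on training data
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)
    # Underfit: high train and test error. Overfit: low train, high test error.
    print(f"degree {degree:2d}: train MSE {train_mse:6.2f}, test MSE {test_mse:6.2f}")
```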
A Well-Fit Model
A well-fit model performs well on training data and on evaluation data, because its fitted
parameters capture the relationships between the input variables and the target variable. Generally,
fitting is an automatic process in which the parameters are adjusted to best suit the data
provided. The use of well-fit models enables users to make better decisions and
draw accurate insights.