QR Midterm Memo

This document defines key concepts in statistics including data definition, variables, populations and samples, descriptive and inferential statistics, and probability functions. It discusses how data is collected and stored, and different types of variables. It also covers topics like statistical dependence, sampling bias, association vs causation, and normal distributions. Parameters and statistics are distinguished as summaries of populations and samples respectively.


Data Definition
- Collection of information; a set of measurements taken on a set of individual units

Data Storage & Presentation
- Data is usually stored and presented in a dataset comprised of variables measured on cases

Variable Definition
- Characteristic of an observed statistical unit; can take different values
- Values: categorical vs. numeric

Categorical vs. Numeric
- Categorical
  o Nominal: e.g. gender
  o Ordinal: e.g. educational level
- Numeric
  o Discrete: e.g. number of people in a household
  o Continuous: e.g. income
- Can depend on the measurement, e.g. a categorical variable can sometimes be coded as numeric

Shape
- Skewness: measures lack of symmetry (positive skew, symmetrical distribution, negative skew)
- Kurtosis: heavy-tailed / light-tailed compared with the normal distribution

Variability (Statistic vs. Parameter)
- Variance: statistic s² = 1/(n − 1) · ∑(x_i − x̄)², parameter σ² = 1/N · ∑(x_i − μ)²
- Standard deviation: s = √s², σ = √σ²
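A minimal R sketch of these variability and shape measures, using a small made-up vector (the values are purely illustrative; the skewness shown is the simple moment-based version, one of several common variants):

x <- c(2, 4, 4, 5, 7, 9, 14)            # made-up sample values
n <- length(x)
s2 <- sum((x - mean(x))^2) / (n - 1)    # sample variance s^2, same as var(x)
s  <- sqrt(s2)                          # sample standard deviation, same as sd(x)
skew <- mean(((x - mean(x)) / s)^3)     # > 0: positive skew, < 0: negative skew
c(variance = s2, sd = s, skewness = skew)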
Population & Sample Definition
- Population: all observations possible
- Sample: a set of observations drawn from the population

Statistical Inference
- Using the sample data to make inferences about (learn about) the population

Parameter vs. Statistic
- Parameter: summary of the population
- Statistic: summary of the sample data

Five Number Summary
- Min, 25%, 50%, 75%, Max (Median: p = 50, LQ: p = 25, UQ: p = 75)
- Interquartile range: IQR = UQ − LQ; Range = Max − Min

Z-Score
- z_i = (x_i − μ)/σ: deviation of an observation from the average, divided by σ
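In R, the five-number summary, IQR, range, and z-scores can be checked with base functions; the data below are made up, and the z-scores use the sample mean and sd as estimates of μ and σ:

x <- c(12, 15, 17, 18, 21, 24, 30, 35)  # made-up sample values
summary(x)                 # Min, LQ (25%), Median (50%), UQ (75%), Max (plus the mean)
IQR(x)                     # interquartile range: UQ - LQ
diff(range(x))             # range: Max - Min
(x - mean(x)) / sd(x)      # z-scores: deviation from the average divided by the sd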
Descriptive vs. Inferential Statistics
- Descriptive:
  o describing one variable
  o describing the relationship between two variables
- Inferential statistics: based on the sampling distribution

Sampling Bias & How to Avoid It
- Sampling bias: the selected sample does not accurately represent the population because it was selected in a way that makes it biased
- Avoid by RANDOM sampling
  o however, even with a random sample (without sampling bias) the data can still be biased

Statistical Dependence (more than one variable)
- Covariance: measures the relationship between two variables and their tendency to vary together
  COV(X, Y) = E[(X − μ_X)(Y − μ_Y)]
- Correlation: association not driven by arbitrary changes
  ρ(X, Y) = COV(X, Y) / (σ_X · σ_Y)
- Properties: |ρ(X, Y)| ≤ 1 (measures a linear relation); X ⊥ Y → ρ = 0 (not vice versa)
- ρ provides a measure of the extent to which X and Y are linearly related
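A short sketch of covariance and correlation in R with two made-up paired samples; cor() returns the Pearson correlation, which equals the covariance divided by the product of the standard deviations:

x <- c(1, 2, 3, 4, 5, 6)                # made-up values
y <- c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2)   # made-up values, roughly linear in x
cov(x, y)                               # sample covariance
cor(x, y)                               # Pearson correlation, always between -1 and 1
cov(x, y) / (sd(x) * sd(y))             # same number: rho = COV(X, Y) / (sigma_X * sigma_Y)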
Explanatory and Response Variables
- Explanatory variable (X): used to understand or predict values of the response variable
- Response variable (Y): values predicted by the explanatory variable
- Remember by f(x) = y

Association vs. Causation
- Association: the values of two variables are related
- Causal association: changing the value of the explanatory variable influences the value of the response variable
- An association may not be causal

Confounding Variable
- A third variable that is associated with both the explanatory and the response variable
- Is an explanatory variable for both X and Y
- A causal association cannot be determined if there is a confounding variable

Causation vs. Common Cause vs. Common Effect
- Causation: X causes Y
- Common cause: X and Y are both caused by a third variable Z
- Common effect: X and Y both predict the third variable Z

Probability Function
- Can be used to present the distribution of a random variable
- Probability function → discrete random variable
- Probability density function → continuous random variable

Probability Function (pf)
- Discrete random variable: X can take a finite number k of different values x_1, …, x_k
- f(x) = Pr(X = x)

Probability Density Function (pdf)
- Continuous random variable: X can assume every value in an interval
- Pr(a < X ≤ b) = ∫_a^b f(x) dx
- (1) f(x) ≥ 0 for all x; (2) ∫_{−∞}^{∞} f(x) dx = 1

Expectation (pf & pdf)
- Discrete X: E(X) = ∑_all x x · f(x)
- Continuous X: E(X) = ∫_{−∞}^{∞} x · f(x) dx
- E(X): expected value / mean of X
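A small sketch of both expectation formulas in R: the discrete case uses a fair six-sided die as an assumed example, the continuous case uses the standard normal pdf and numerical integration via integrate():

x <- 1:6; f <- rep(1/6, 6)                      # pf of a fair die (assumed example)
sum(x * f)                                      # discrete E(X) = sum of x * f(x) = 3.5
integrate(function(x) x * dnorm(x), -Inf, Inf)  # continuous E(X) for N(0,1), ~ 0
integrate(dnorm, -Inf, Inf)                     # the pdf integrates to 1 over the whole line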
Observational Study vs. Experiment
- Observational study: solely observing values as they exist
- Experiment: controlling one or more explanatory variables

Relative Frequency
- relative frequency = n_i / N

Summary Statistics
- Summarizing observations to communicate as much information as possible, as simply as possible

Variance of X
- Var(X) = E[(X − μ)²]
- Measure of the spread of the distribution around its mean μ; large variance: wide spread around μ

Normal Distribution
- f(x | μ, σ²) = 1/√(2πσ²) · e^(−(x − μ)² / (2σ²))
- If X is normally distributed:
  o Pr(μ − σ ≤ X ≤ μ + σ) ≈ 68.27%
  o Pr(μ − 2σ ≤ X ≤ μ + 2σ) ≈ 95.45%
  o Pr(μ − 3σ ≤ X ≤ μ + 3σ) ≈ 99.73%
- 95% rule: about 95% of values fall within 2 standard deviations of the mean
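The three probabilities above can be reproduced in R with pnorm(), working on the standardized scale (no data needed, only the standard normal cdf):

pnorm(1) - pnorm(-1)   # Pr(mu -  sigma <= X <= mu +  sigma) ~ 0.6827
pnorm(2) - pnorm(-2)   # Pr(mu - 2sigma <= X <= mu + 2sigma) ~ 0.9545
pnorm(3) - pnorm(-3)   # Pr(mu - 3sigma <= X <= mu + 3sigma) ~ 0.9973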
Central Tendency
- Mean: most important descriptive statistic for the center
  o Statistic: x̄ = (y_1 + y_2 + ⋯ + y_n)/n = (∑_{i=1}^n y_i)/n; Parameter: μ
- Median: middle measurement of the ordered sample
- Mode: most common observation

Point Estimate
- Point estimate: a single value estimating a population parameter
- Interval estimate: an interval defined by two numbers, between which a parameter is expected to lie
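A quick sketch of the three measures of central tendency in R for a made-up sample; base R has no built-in mode function, so the mode is read off a frequency table:

x <- c(3, 5, 5, 6, 7, 8, 5, 9)       # made-up sample values
mean(x)                              # mean (point estimate of mu)
median(x)                            # median
names(which.max(table(x)))           # mode: most common observation ("5")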
Sampling Distribution
- Sample statistic: a random variable; varies across samples
- Sampling distribution: probability distribution of a given random-sample-based statistic
- → Statistical inference: based on the sampling distribution, inferences about the population can be made

Sampling Distribution: Characteristics & Variability of the Sample Statistic (SE)
- Center: the sampling distribution will be centered around the population parameter
- Shape: if the sample size is large enough, the sampling distribution will be symmetric and bell-shaped, like the normal / t-distribution
- Standard error of a statistic (SE): standard deviation of the sample statistic
- Measures how much the statistic varies from sample to sample

Sample Size n & Standard Error SE
- Increasing n decreases the SE

Type I and Type II Error
- Type I error: rejecting the null hypothesis although it is true; the significance level α is the tolerated Type I error rate
- Type II error: not rejecting the null hypothesis although it is false

P-Value
- Test statistic: a random variable calculated from the sample data and used in the hypothesis test
- Measures the degree of difference between the sample data and H0
- P-value: the probability that the test statistic equals the observed value or a value even more extreme in the direction predicted by Ha
- Reject H0 if: p-value < α
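A simulation sketch of the standard error entries above: drawing many samples of size n from an assumed N(10, 2²) population, the standard deviation of the resulting sample means (the SE) shrinks as n grows:

set.seed(1)
se_of_mean <- function(n, reps = 5000) {
  means <- replicate(reps, mean(rnorm(n, mean = 10, sd = 2)))  # many sample means
  sd(means)                          # SD of the sampling distribution = SE
}
se_of_mean(10)                       # roughly 2 / sqrt(10)  ~ 0.63
se_of_mean(100)                      # roughly 2 / sqrt(100) = 0.20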

Central Limit Theorem
- Let X_1, X_2, …, X_n be a random sample of size n drawn from a population distribution with mean μ and variance σ². Suppose we are interested in X̄, the sample mean. Then, as n approaches +∞:
  √n(X̄ − μ)/σ →d N(0, 1), or equivalently, X̄ →d N(μ, σ²/n)

Implications of the Central Limit Theorem
- μ ≈ x̄ = (∑_{i=1}^n x_i)/n
- σ² ≈ s² = 1/(n − 1) · ∑_{i=1}^n (x_i − x̄)²
- If n approaches +∞, the SE of the sample mean will be s/√n

Standardizing Random Variables
- X (raw score): random variable with mean μ and standard deviation σ > 0
- Standardizing X: Z = (X − μ)/σ
- A standardized normal variable is standard normally distributed: N(0, 1)

Probability of a Tail Area
- Every normal distribution N(μ, σ²) can be standardized, therefore we only need one Z-table, which shows probabilities under the standard normal distribution, e.g. N(4, 4) → Z = (X − 4)/√4
- Therefore, to find the probability of a tail area, e.g. P(X ≤ a): compute P(Z ≤ (a − μ)/σ) → check the Z-table to find the value
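The N(4, 4) example can be checked in R; the cutoff a = 5 is an arbitrary illustrative value, and pnorm() plays the role of the Z-table:

a <- 5                         # illustrative cutoff
pnorm((a - 4) / sqrt(4))       # P(X <= a) after standardizing: Z = (X - 4) / sqrt(4)
pnorm(a, mean = 4, sd = 2)     # the same probability computed directly from N(4, 4)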
Characteristics of a Good Estimator
- Unbiasedness: expected value identical to the population parameter
- Efficiency: low variance
- Consistency: as the sample size n increases, the estimator will approach the parameter

Approximation Using the Normal / t-Distribution
- n ≤ 30: t-distribution, df = n − 1
- n > 30: normal distribution
- Degrees of freedom (df): number of values in the final calculation of a statistic that are free to vary, most of the time n − 1

Steps of a Significance Test (one-sided t-test)
1. Assumptions: (1) random sample, (2) quantitative variable, (3) population distribution is normal
2. Stating H0 and Ha
   - Two-sided test: H0: μ = μ0 vs. Ha: μ ≠ μ0
   - One-sided test: H0: μ = μ0 vs. Ha: μ < μ0 or μ > μ0
3. Calculate the test statistic: t = (x̄ − μ0)/se, with se = s/√n
4. Find the p-value based on the test statistic
   - Two-sided: p-value = 2 · tail probability; one-sided: p-value = tail probability
5. Compare the p-value with the pre-chosen α & 6. Make conclusions
   - If p ≤ α, we reject the null hypothesis
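A sketch of steps 3-6 for a one-sample t-test in R, with made-up data and an assumed null value μ0 = 5; t.test() reproduces the hand calculation:

x   <- c(5.4, 6.1, 4.8, 5.9, 6.3, 5.7, 6.0, 5.2)  # made-up sample
mu0 <- 5                                          # assumed null value
se  <- sd(x) / sqrt(length(x))
t_stat <- (mean(x) - mu0) / se                    # step 3: test statistic
2 * pt(-abs(t_stat), df = length(x) - 1)          # step 4: two-sided p-value
t.test(x, mu = mu0)                               # same test in one call; compare p to alpha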
Parameter vs. Statistic
- Mean: μ (parameter), x̄ (statistic)
- Proportion: p (parameter), p̂ (statistic)
- Standard deviation: σ (parameter), s (statistic)
- Correlation: ρ (parameter), r (statistic)

Types of Distributions
- Population distribution: probability distribution of the population
- Sample distribution: probability distribution of the sample drawn from the population
- Sampling distribution: probability distribution of sample statistics

Interval Estimate
- Gives a range of plausible values for a population parameter
- Common interval estimate: statistic ± margin of error
- Margin of error: precision of the sample statistic as a point estimate for the parameter; determined using the variability of the sampling distribution

Comparing Means Between Two Groups
- Comparison of means between two groups
- In the two independent samples test, the steps are the same, but steps 1 (Assumptions) and 3 (Test statistic) differ
  o Step 1, Assumptions: additional assumption that the population variances equal each other, σ1² = σ2² = σ²
  o Step 3, Test statistic: t = (ȳ1 − ȳ2)/se, with se = √(s1²/n1 + s2²/n2), df = (n1 − 1) + (n2 − 1)
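A sketch of the two independent samples test in R with made-up group data; var.equal = TRUE imposes the equal-variance assumption σ1² = σ2² from step 1:

y1 <- c(12.1, 13.4, 11.8, 14.0, 12.9, 13.3)   # made-up group 1
y2 <- c(10.9, 11.5, 12.2, 10.7, 11.9, 11.2)   # made-up group 2
t.test(y1, y2, var.equal = TRUE)              # pooled t-test, df = (n1 - 1) + (n2 - 1)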


95% Confidence Interval
- A 95% confidence interval will contain the true parameter for 95% of samples

Constructing a CI
- Sample statistic (θ̂), e.g. mean or proportion
- SE of the sample mean: σ̂ = s/√n; SE of the sample proportion: σ̂ = √(p̂(1 − p̂)/n)
- CI: [θ̂ − z_{α/2} · σ̂, θ̂ + z_{α/2} · σ̂]
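A sketch of the statistic ± z_{α/2} · SE construction in R, for a made-up sample mean and an assumed sample proportion (p̂ = 0.42 out of n = 200):

x <- c(4.2, 5.1, 4.8, 5.6, 4.9, 5.3, 4.7, 5.0)        # made-up sample
z <- qnorm(0.975)                                     # z_(alpha/2) for a 95% CI
mean(x) + c(-1, 1) * z * sd(x) / sqrt(length(x))      # CI for the mean

p_hat <- 0.42; n <- 200                               # assumed proportion and sample size
p_hat + c(-1, 1) * z * sqrt(p_hat * (1 - p_hat) / n)  # CI for the proportion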
Hypothesis Testing
- Goal of statistical inference: draw conclusions about the population by (dis)proving a hypothesis

Statistical Test Hypotheses
- Null hypothesis H0: claim that there is no effect/difference; the claim we want to disprove
- Alternative hypothesis Ha: claim for which we want to search for evidence

Statistical Test: One- vs. Two-Sided
- Two-sided test: H0: μ = μ0 vs. Ha: μ ≠ μ0
- One-sided test: H0: μ = μ0 vs. Ha: μ < μ0 or μ > μ0

Statistical Significance
- Results so extreme that they are unlikely to occur by random chance alone (assuming H0 is true) are called statistically significant
- If the sample statistic is statistically significant, we have convincing evidence against H0 and thus in favor of Ha

R-Programming
- Defining something (assignment): <-
- Package installing and loading: install.packages(), e.g. "haven"; library(), to load packages
- Working directory: setwd(), to set the working directory to a specified location
- Data import: read_dta(), read Stata files; view(), view the contents of a data frame; class(), checks the data type; summary(), summary of variables
- Describing variables (coding and distribution): factor(), creates a new factor variable with custom labels
- Creating tables: table(), creates a frequency table for a variable; CrossTable(), creates a cross-tabulation table for two variables
- Data visualisation: ggplot(), create visualizations; geom_bar(), bar plots for categorical variables; geom_point(), scatter plots for numeric variables; hist(), create histograms; labs(), add labels to the axes
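A workflow sketch tying these functions together. The file name survey.dta, the variable names educ and income, and the three-level coding of educ are all hypothetical; the haven and ggplot2 packages must be installed first:

# install.packages(c("haven", "ggplot2"))   # once, if not yet installed
library(haven)                              # for read_dta()
library(ggplot2)                            # for ggplot(), geom_bar(), labs()

setwd("~/qr-course")                        # hypothetical working directory
dat <- read_dta("survey.dta")               # hypothetical Stata file
class(dat$educ); summary(dat$income)        # data type and summary of assumed variables

dat$educ <- factor(dat$educ, labels = c("low", "middle", "high"))  # assumed 3-level coding
table(dat$educ)                             # frequency table of the categorical variable

ggplot(dat, aes(x = educ)) + geom_bar() + labs(x = "Educational level", y = "Count")
hist(dat$income)                            # histogram of a numeric variable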
