Session 3 Week 2

Nivedita Nadkarni
Statistics in the Courtroom
November 12th 2024, NLSIU
Today’s Statistical Topics
• Review of last time’s concepts
• Grouped mean
• Grouped variance
• Chebyshev’s Inequality
• Rates and Standardization
Types of variables
Variable type | Values | Examples
Continuous | An infinite number of real values in an interval | Height of students, systolic blood pressure
Binary | Either 0 or 1 | Presence or absence of diabetes
Categorical / Nominal | Any number of categories | Age groups like 0-14, 15-25, etc.
Ordinal | Ordered categories | Pain scale, Likert scale
Descriptive Statistics
• Descriptive statistics can help in summarizing data in the form of
simple quantitative measures such as percentages or means or in the
form of visual summaries such as histograms and box plots.
• Mean (s.d) / Median (IQR) / Mode are the typically used measures for
continuous data.
• Mean (s.d) denotes mean and the corresponding standard deviation.
• Median (IQR) represents the median value and the inter-quartile range.
• The mode is the value that occurs most frequently in the dataset.
It’s useful for identifying the most common value in a dataset.
Population and sample: parameters and
estimates
• N is population size; n is sample size.
• X, Y or Z are used to denote random variables.
• $X_i$, i = 1, ..., n is used to denote the random sample.
• $\bar{x}$ denotes the sample mean.
• s denotes the sample standard deviation.
• Sample mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Population and sample: parameters and
estimates
• The sample variance is denoted by $s^2$.
• Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
• Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
• The sample standard deviation is the square root of the sample
variance.
• n independent observations are used to obtain the sample mean $\bar{x}$.
• Hence, for estimating the second parameter $\sigma^2$, only (n-1) independent deviations remain, because the deviations from $\bar{x}$ must sum to zero. This count is referred to as the degrees of freedom.
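The difference between the two divisors can be seen in a minimal Python sketch. The data values below are hypothetical and chosen only for illustration; the variable names are assumptions, not notation from the slides.

```python
import statistics

# Hypothetical sample, used only to illustrate the formulas above.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)

# Sample mean: sum of the observations divided by n.
x_bar = sum(sample) / n

# Sum of squared deviations from the sample mean.
squared_deviations = sum((x - x_bar) ** 2 for x in sample)

# Sample variance divides by n - 1 (the degrees of freedom).
sample_variance = squared_deviations / (n - 1)

# Dividing by n instead gives the population-style formula.
population_style_variance = squared_deviations / n

# The sample standard deviation is the square root of the sample variance.
sample_sd = sample_variance ** 0.5

print(x_bar, sample_variance, population_style_variance, sample_sd)
```

The standard library functions `statistics.variance` and `statistics.pvariance` apply the same two divisors, so they can be used as a cross-check.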
Mean for binary data
• If the data contain only 0’s and 1’s (binary data), I can still arithmetically calculate the “mean” of the data.
• However, what it gives is the proportion of 1’s in the data.
• So, although the mean can be used to calculate the proportion, we can just as well count the number of 1’s and divide by the total, which by definition is the proportion of 1’s in the data.
• Personally, I would not recommend applying the mean function to binary data.
• The example in the book just demonstrates that it can be done arithmetically; it would not be used or recommended in practice.
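A short Python sketch of the point above, using hypothetical 0/1 values: the arithmetic mean and the directly counted proportion give the same number.

```python
# Hypothetical binary data (1 = condition present, 0 = absent).
binary_data = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]

# Arithmetic "mean" of the 0/1 values.
mean_of_binary = sum(binary_data) / len(binary_data)

# Direct calculation: count the 1's and divide by the total.
proportion_of_ones = binary_data.count(1) / len(binary_data)

print(mean_of_binary, proportion_of_ones)
```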
Dispersion and Distribution
• Dispersion is just the extent to which numerical data is likely to
vary about an average value.
• Distribution refers to the theoretical function that shows the
possible values a variable can take and how frequently they occur.
For example, see the figure on page 43.
• It shows the graph of two different distributions. Both have the same mean, median and mode but different variances.
• This means that the dispersion differs between the two distributions: one is spread more widely across the interval than the other.
Problems
• Complete problem 7 from last time, for both the variables.
• Let us discuss once everyone has finished solving the problem.
Grouped mean, variance and an inequality
• We know how to calculate the mean, but how about the grouped
mean?
• Consider the example on page 49.
• We can obtain the mean using the standard technique which gives
mean = 8.6 years.
• Now, if you notice, a few values occur multiple times in the same
table.
• Three 5’s, one 6, one 8, three 11’s and two 12’s.
• Therefore, we can compute the sum as in page 50.
• The mean can then be obtained by dividing this sum by 10.
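The same calculation can be sketched in Python using the repeated values listed above (three 5’s, one 6, one 8, three 11’s and two 12’s). The dictionary name is an assumption made for this illustration.

```python
# Distinct values and how often each occurs, as listed on the slide.
value_counts = {5: 3, 6: 1, 8: 1, 11: 3, 12: 2}

# Sum of value * frequency replaces adding every observation one by one.
weighted_sum = sum(value * count for value, count in value_counts.items())

# Total number of observations.
n = sum(value_counts.values())

grouped_mean = weighted_sum / n
print(weighted_sum, n, grouped_mean)
```

This reproduces the mean of 8.6 years obtained with the standard technique.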
Grouped mean, variance and an inequality
• This technique is useful as it can be applied to data that have
been summarized in the form of a frequency distribution.
• Data that are organized in this way are referred to as grouped data.
• Even if the original data is unavailable, and original values are not
known, we are able to determine the number of measurements
that fall into each specified interval.
• Refer to table 3.4 on page 51.
• $\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$, where k is the number of intervals, $m_i$ is the midpoint of the ith interval and $f_i$ is the frequency associated with the ith interval.
Grouped mean, variance and an inequality
• Therefore, the grouped mean is actually a weighted average of the
interval midpoints;
• Each midpoint is weighted by the frequency of observations within
the interval.
• What would be the variance or standard deviation of this data?
• $s^2 = \frac{\sum_{i=1}^{k} (m_i - \bar{x})^2 f_i}{\left(\sum_{i=1}^{k} f_i\right) - 1}$

• Where all terms are as defined for the mean.
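As a rough illustration of the grouped mean and grouped variance formulas, here is a Python sketch. The interval midpoints and frequencies are hypothetical and are not taken from table 3.4; the function names are also assumptions.

```python
def grouped_mean(midpoints, frequencies):
    """Weighted average of interval midpoints, weighted by frequency."""
    total = sum(frequencies)
    return sum(m * f for m, f in zip(midpoints, frequencies)) / total


def grouped_variance(midpoints, frequencies):
    """Grouped sample variance: divisor is (sum of frequencies) - 1."""
    x_bar = grouped_mean(midpoints, frequencies)
    total = sum(frequencies)
    squared = sum((m - x_bar) ** 2 * f for m, f in zip(midpoints, frequencies))
    return squared / (total - 1)


# Hypothetical frequency distribution: intervals 0-5, 5-10, 10-15.
midpoints = [2.5, 7.5, 12.5]
frequencies = [2, 5, 3]

mean_value = grouped_mean(midpoints, frequencies)
variance_value = grouped_variance(midpoints, frequencies)
sd_value = variance_value ** 0.5
print(mean_value, variance_value, sd_value)
```

Because every observation in an interval is treated as if it sat at the midpoint, these results approximate, rather than reproduce, the values that the raw data would give.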

The Bell Curve and the Empirical Rule

• In a bell-shaped distribution with mean μ and standard deviation σ:
• Approximately 68% of the observations fall within one standard deviation (σ) of the mean μ.
• Approximately 95% of the observations fall within two standard deviations (2σ) of the mean μ.
• Approximately 99.7% of the observations fall within three standard deviations (3σ) of the mean μ.
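The three percentages can be checked numerically. The sketch below assumes the bell-shaped distribution is exactly normal, in which case the fraction within k standard deviations of the mean equals erf(k/√2); the function name is an assumption made for this illustration.

```python
import math


def fraction_within_k_sd(k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(fraction_within_k_sd(k), 4))
```

The results are about 0.6827, 0.9545 and 0.9973, which the empirical rule rounds to 68%, 95% and 99.7%.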
Chebyshev’s Inequality
• The empirical rule is an approximation that applies only when the
data are symmetric and unimodal.
• If they’re not, Chebyshev’s inequality can be used instead to
summarize the distribution of values.
• This inequality is less specific than the empirical rule, but it is true
for any set of observations, no matter what its shape.
• Therefore, for any number k ≥ 1, at least $1 - \left(\frac{1}{k}\right)^2$ of the measurements in the data lie within k standard deviations of their mean.
Chebyshev’s Inequality
• For example, for k=2
• At least ¾ or 75% of the values lie within two standard deviations
of the mean.
• Equivalently, we could say that $\bar{x} \pm 2s$ encompasses at least 75% of the observations in the group.
• Similarly, for k=3, $\bar{x} \pm 3s$ contains at least 88.9% of the measurements.
• We can revisit the FEV1 data in table 3.1; see page 53.
• So, though conservative, this inequality allows us to use the mean
and sd of any set of data to describe the entire group!
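A small Python sketch of the inequality follows. The skewed data set is hypothetical (it is not the FEV1 data from table 3.1) and the function names are assumptions; the point is only that the observed fraction is never below the Chebyshev bound, even for a non-symmetric sample.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k standard deviations of the mean."""
    return 1 - (1 / k) ** 2


def fraction_within(data, k):
    """Observed fraction of values inside mean +/- k * (sample sd)."""
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    inside = [x for x in data if x_bar - k * s <= x <= x_bar + k * s]
    return len(inside) / len(data)


# Hypothetical right-skewed data, so the empirical rule need not apply.
skewed_data = [1, 1, 1, 2, 2, 3, 4, 5, 8, 20]

for k in (2, 3):
    print(k, chebyshev_lower_bound(k), fraction_within(skewed_data, k))
```

For k=2 the bound is 0.75 and for k=3 it is about 0.889, matching the figures quoted above.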
Rates and Standardization
• Demographic data and vital statistics are numbers that are used
to characterize a population.
• Demographic data includes information such as the size of the
population and its composition by gender, race and age.
• Vital statistics describe the life of a population: dealing with
births, deaths, marriages, divorces and disease occurrence.
• Both types of data are used to describe the health status of a
population, to spot trends and make projections.
• Vital statistics are also used to make comparisons between
groups.
Rates
• Rates are used to make comparisons between groups more
meaningful.
• A rate is defined as the number of cases of a particular outcome of interest that occur over a given time period, divided by the size of the population during that time period.
• Rate and proportion, though often used interchangeably, are not synonymous.
• A proportion is a ratio in which all individuals included in the
numerator must also be included in the denominator.
• Proportions do not have a unit of measurement unlike rates.
Types of frequently used rates
• Death rate or mortality rate: Total # deaths in a time period / Total # at
risk during the same period.
• Infant mortality rate: Total # deaths among infants under 1 in a time
period / Total # of live births during the same period.
• These mortality rates we have considered are all crude rates. Why?
• A crude rate is a single number computed as a summary measure for
an entire population; it disregards differences caused by age, gender,
race and other characteristics.
• Mortality rates calculated for individual age groups are called age-
specific mortality rates.
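To make the distinction between crude and age-specific rates concrete, here is a Python sketch with entirely hypothetical death counts and population sizes; the age bands and variable names are assumptions made for this illustration.

```python
# Hypothetical deaths and population at risk during one period, by age group.
deaths = {"0-14": 10, "15-64": 40, "65+": 150}
population = {"0-14": 20_000, "15-64": 60_000, "65+": 20_000}

# Crude mortality rate: one summary number for the entire population.
crude_rate_per_1000 = sum(deaths.values()) / sum(population.values()) * 1000

# Age-specific mortality rates: one rate per age group.
age_specific_per_1000 = {
    group: deaths[group] / population[group] * 1000 for group in deaths
}

print(crude_rate_per_1000)
print(age_specific_per_1000)
```

In this made-up population the crude rate of 2 per 1000 hides a rate of 7.5 per 1000 in the oldest group, which is the kind of difference a crude rate disregards.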
Standardization of rates
• What is a confounder?
• Consider two populations from different geographical areas: one composed entirely of males and the other only of females.
• How can we be sure whether a difference in their mortality rates is due to location or to some effect of gender?
• In this situation, gender is referred to as a confounder: since it is associated with both geographical area and death rate, it obscures the true relationship between these factors.
Example to motivate standardization
• Rate of impairment increases with age.
• Age is a confounder between hearing impairment and
employment as it is independently associated with each of these
quantities.
• We therefore cannot tell whether the higher rate of impairment among individuals not in the labour force is the result of some inherent characteristic of the members of the group or simply the effect of age.
• For a more accurate comparison, we need to consider the age-
specific impairment rates rather than the crude ones.
Direct and Indirect standardization
• Although the subgroup-specific rates provide a more accurate comparison among populations than the crude rates, with many more subgroups there would be an overwhelming number of rates to compare.
• It would therefore be convenient to summarize the entire situation with a single number calculated for each sub-population, a number that adjusts for differences in composition.
• Two ways: direct method of standardization and the indirect method.
• Both focus on two components: population composition and subgroup-specific rates.
• Each attempts to overcome the problem of confounding by holding one of these components constant across populations.
Direct method
• The direct method of adjusting for differences among populations focuses on computing the overall rates that would result if, instead of having different distributions, all populations being compared were to have the same standard composition.
• Steps:
• Select the standard distribution.
• For the hearing impairment example, use the total population
questioned in the survey.
• Calculate the numbers of impairments that would have occurred in each of the two employment status subgroups (currently employed and not in the labour force), assuming that each has this standard population distribution while retaining its own individual age-specific impairment rates.
Direct method
• Refer to the table on page 73
• Therefore, the age-adjusted impairment rates for each group are:
• Currently employed = 5.91 per 1000
• Not in the labour force = 5.54 per 1000
• These age adjusted rates are the impairment rates that would apply if
both the currently employed and those not in the labour force had the
same age distribution as the total surveyed population.
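The steps of the direct method can be sketched in Python as follows. All numbers here are hypothetical and are not the survey figures from the table on page 73; the age bands, group labels and function name are assumptions made for this illustration.

```python
def directly_standardized_rate(standard_population, rates_per_1000):
    """Overall rate per 1000 if a group had the standard age distribution
    but kept its own age-specific rates."""
    expected_cases = sum(
        standard_population[age] * rates_per_1000[age] / 1000
        for age in standard_population
    )
    return expected_cases / sum(standard_population.values()) * 1000


# Hypothetical standard age distribution (e.g. the total surveyed population).
standard_population = {"17-44": 5000, "45-64": 3000, "65+": 2000}

# Hypothetical age-specific impairment rates per 1000 for two groups.
rates_group_a = {"17-44": 2.0, "45-64": 8.0, "65+": 20.0}
rates_group_b = {"17-44": 1.5, "45-64": 7.0, "65+": 22.0}

adjusted_a = directly_standardized_rate(standard_population, rates_group_a)
adjusted_b = directly_standardized_rate(standard_population, rates_group_b)
print(adjusted_a, adjusted_b)
```

Both groups are evaluated against the same standard composition, so any remaining difference between the two adjusted rates cannot be attributed to differing age structures.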
Direct method
• After we control for the effect of age in this way, the adjusted
impairment rate for those who are employed is higher than the
adjusted rate for those who are not in the labour force.
• This is the opposite of what we observed when we looked at the
crude rates, implying that the crude rates were indeed being
influenced by the age structure of the underlying groups.
• Note that the choice of a different standard age distribution would have led to different adjusted impairment rates.
• This is not critical, since an adjusted rate has no meaning by itself.
Direct method
• An adjusted rate is merely a construct that is based on a hypothetical standard distribution;
• Unlike a crude or specific rate, it does not reflect the true impairment
rate of any population.
• Adjusted rates are meaningful only when comparing two or more
groups, and it has been shown that trends among the groups are
generally unaffected by the choice of a standard.
• If another reasonable age distribution were chosen, for instance, the magnitude of the difference between the adjusted impairment rates of the two sub-groups should not change drastically even if the rates themselves do; the currently employed would still have a slightly higher adjusted rate of impairment.
Indirect method
• The indirect method of adjusting for differences in composition
involves the use of a set of standard age-specific impairment rates
along with the actual age composition of each sub-population
being compared.
• Use the total surveyed population as the standard.
• This time, however, we calculate the number of impairments that would have occurred in the two population subgroups if each had taken on the age-specific impairment rates of the surveyed population as a whole while retaining its own individual age distribution.
• Refer to page 74.
Indirect method
• Next, divide the observed number of hearing impairments in each employment group by the total expected number of impairments.
• The resulting quantity is known as the standardized morbidity ratio.
• If the data pertained to deaths, then the resulting ratio would be
referred to as the standardized mortality ratio.
• Currently employed = 552/536.9 = 1.03 = 103%
• Not in the labour force = 368/372.4 = 0.99 = 99%
• This indicates that the group of currently employed individuals has a 3% higher impairment rate than the surveyed population as a whole, whereas the group not in the labour force has an impairment rate that is 1% lower than that of the total population.
Indirect method
• Recall that the total surveyed population also includes the group
of individuals not currently employed.
• Application of the indirect method often concludes with a comparison of the standardized ratios.
• We can go one step further and compute the actual age-adjusted impairment rates for each group.
• These are derived by multiplying the crude impairment rate for the total surveyed population by the appropriate standardized ratio.
Indirect method
• Currently employed: 5.80/1000 x 1.03 = 5.97 per 1000
• Not in the labour force: 5.80/1000 x 0.99 = 5.74 per 1000
• With the effect of age removed, the group of currently employed
individuals is again seen to have a slightly higher adjusted rate
than those not in the labour force.
• Note that though the rates themselves are different, we arrive at the same conclusion as when the direct method of standardization was applied.
• Let us review section 4.2.3 page 75 on the use of standardized
rates.
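The final arithmetic of the indirect method can be retraced in a short Python sketch using the observed and expected counts and the crude rate of 5.80 per 1000 quoted on the slides. The function and variable names are assumptions, and the ratio is rounded to two decimals first because that is how the slide figures appear to have been obtained.

```python
def standardized_ratio(observed, expected):
    """Observed count divided by the count expected under the standard rates."""
    return observed / expected


# Crude impairment rate for the total surveyed population, per 1000 (from the slides).
crude_rate_per_1000 = 5.80

# Observed and expected numbers of impairments quoted on the slides.
groups = {
    "Currently employed": (552, 536.9),
    "Not in the labour force": (368, 372.4),
}

results = {}
for name, (observed, expected) in groups.items():
    # Round the ratio to two decimals before multiplying (assumed rounding order).
    ratio = round(standardized_ratio(observed, expected), 2)
    adjusted_rate = round(crude_rate_per_1000 * ratio, 2)
    results[name] = (ratio, adjusted_rate)
    print(name, ratio, adjusted_rate)
```

Multiplying by the unrounded ratios instead would shift the adjusted rates slightly in the second decimal place, but the ordering of the two groups stays the same.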
Next session
• Events and Probability
• Bayes’ Theorem
• Sensitivity and Specificity – Prosecutor’s fallacy and defense
fallacy
• ROC Curve
• Calculation of prevalence
• Relative Risk and Odds ratio
Prescribed reading
• Please do read:
• From Statistical Science in the Courtroom
• Interpretation of Evidence, and Sample Size Determination (pages
64-68)
• Interpreting DNA evidence: Can Probability Theory Help? (4
Sampling, sections 4.1 and 4.2)
