
Assignment 1 s4b

The document discusses various statistical exercises and concepts, emphasizing the importance of distinguishing correlation from causation, as seen in the analysis of health and academic performance data. It also covers sampling techniques, biases in telemarketing, and descriptive statistics related to airline ticket purchases and sales data. Additionally, it explores probability concepts in the context of biometric security devices and demographic studies on smoking prevalence.

International University – Statistics for Business

Tuesday Morning Class – Group 4


Huỳnh Hương Thuỳ - BABAIU23270
ASSIGNMENT 1 (2024-2025)
CHAPTER 1
Exercise 1.22: A recent study showed that women who lived near a
freeway had an unusually high rate of rheumatoid arthritis. Sarah said,
“They should move away from freeways.” Is there a fallacy in Sarah’s
reasoning? Explain.
Scenario: A study showed that women living near freeways had a higher rate of
rheumatoid arthritis. Sarah responded, "They should move away from freeways."
Fallacy in Reasoning: Yes, there is a fallacy in Sarah’s reasoning —
specifically, confusing correlation with causation.
Chapter 1 emphasizes the importance of critical thinking in statistical
interpretation and warns against common statistical pitfalls such as:
• Assuming a cause-effect relationship from observational data.
• Ignoring lurking or confounding variables.
Sarah assumes that proximity to freeways causes rheumatoid arthritis, without
considering:
• Whether other factors (like pollution, socioeconomic status, or access to
healthcare) may be involved.
• Whether the relationship is causal or merely correlational.
Correct interpretation: The data suggests an association but does not prove that
living near freeways causes the disease. More controlled, experimental research
would be needed to establish causation.
Exercise 1.25: A research study showed that 7 percent of “A” students
smoke, while nearly 50 percent of “D” students do.
(a) List in rank order six factors that you think affect grades. Is
smoking on your list?
1. Study habits
2. Attendance/class participation
3. Time management
4. Parental/peer support
5. Sleep and health habits
6. Motivation
Smoking is not on this list directly, because it is likely an indirect factor at best.
(b) If smoking is not a likely cause of poor grades, can you
suggest reasons why these results were observed?
Yes. Several explanations exist:
• Confounding variables: Students who smoke may also engage in other
behaviors that correlate with lower academic performance (e.g., skipping
class, lack of motivation).
• Lifestyle correlation: Smoking could be part of a broader pattern of
risk-taking or stress-coping behavior that affects school performance.
• Reverse causality: Students with poor academic outcomes may experience
more stress or peer pressure, leading to higher smoking rates.
Thus, smoking may be a marker for other underlying issues rather than a direct
cause of poor grades.
(c) Assuming these statistics are correct, would “D” students
who give up smoking improve their grades? Why or why not?
Not necessarily. Giving up smoking alone is unlikely to improve academic
performance directly, unless quitting is accompanied by:
• Healthier lifestyle choices
• Improved study habits
• Reduced stress or better time management
This again ties back to Chapter 1’s warning against overinterpreting statistical
relationships: correlation is not causation.
CHAPTER 2
Exercise 2.46: The General Accounting Office conducted random
testing of retail gasoline pumps in Michigan, Missouri, Oregon, and
Tennessee. The study concluded that 49 percent of gasoline pumps
nationwide are mislabeled by more than one-half of an octane point.
What kind of sampling technique was most likely to have been used in
this study?
The most likely sampling technique used was cluster sampling.
• The GAO tested pumps in specific states (Michigan, Missouri, Oregon, and
Tennessee); these are geographical clusters.
• Cluster sampling involves selecting entire geographic regions or
groups (e.g., states or cities) at random, then testing all or a random sample
of units within those regions.
This method is efficient and cost-effective when conducting national studies,
especially where travel and logistics matter.
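The two-stage procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame, the state and pump names, and the function `cluster_sample` are all hypothetical and are not taken from the GAO study.

```python
import random


def cluster_sample(units_by_cluster, n_clusters, n_per_cluster, seed=None):
    """Two-stage cluster sample: pick clusters at random, then units within them."""
    rng = random.Random(seed)
    # Stage 1: choose whole clusters (e.g., states) at random.
    chosen = rng.sample(sorted(units_by_cluster), n_clusters)
    sample = {}
    for cluster in chosen:
        units = units_by_cluster[cluster]
        # Stage 2: choose a random subset of units inside each chosen cluster.
        k = min(n_per_cluster, len(units))
        sample[cluster] = rng.sample(units, k)
    return sample


# Hypothetical frame: 10 made-up states with 50 made-up pump IDs each.
frame = {
    f"state_{i:02d}": [f"state_{i:02d}-pump_{j:03d}" for j in range(50)]
    for i in range(10)
}

sample = cluster_sample(frame, n_clusters=4, n_per_cluster=5, seed=1)
for state, pumps in sample.items():
    print(state, pumps)
```

Only the four selected clusters are ever visited, which is where the travel and logistics savings come from.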
Exercise 2.57: Households can sign up for a telemarketing “no-call
list.” How might households who sign up differ from those who don’t?
What biases might this create for telemarketers promoting (a) financial
planning services, (b) carpet cleaning services, and (c) vacation travel
packages?
Households that sign up for the no-call list likely:
• Value privacy more.
• Are more affluent, educated, or internet-savvy.
• May already have established relationships with service providers and don’t
want unsolicited offers.
(a) Financial Planning Services
• Bias: You’ll miss affluent, financially cautious individuals who are, ironically,
your ideal clients.
• Effect: Marketing appears less effective than it is because it misses the
most promising leads.
(b) Carpet Cleaning Services
• Bias: Households that use professional cleaning services may be less
responsive to cold calls and more likely to block them.
• Effect: You may over-target lower-income segments and under-represent
real demand.
(c) Vacation Travel Packages
• Bias: People who travel often or can afford leisure travel may be more
privacy-conscious, hence less likely to accept cold calls.
• Effect: Telemarketers could wrongly conclude there’s little interest in travel
deals, missing high-potential clients.
CHAPTER 3
Exercise 3.31: Use the attached data file GPA1.dta instead, with hsGPA
on the X-axis and colGPA on the Y-axis.
Descriptive Statistics
Metric            hsGPA (X-axis)   colGPA (Y-axis)
Count             141              141
Mean              3.4              3.06
Std Dev           0.32             0.37
Min               2.4              2.2
25th Percentile   3.2              2.8
Median            3.4              3
75th Percentile   3.6              3.3
Max               4                4
Exercise 3.33: Use the attached data file LAWSCH85.dta instead, with
libvol on the X-axis and salary on the Y-axis.

Interpretation
• There appears to be a positive association: as the number of library
volumes increases, the median salary tends to rise.
• However, the relationship is not perfectly linear, and there’s
noticeable variability, especially among schools with similar library sizes.
• There may be diminishing returns at the higher end: beyond a certain point,
additional volumes are not associated with noticeably higher salaries.
CHAPTER 4
Exercise 4.65: How many days in advance do travelers purchase their
airline tickets? Below are data showing the advance days for a sample of
28 passengers on United Airlines Flight 815 from Chicago to Los Angeles.
11 7 11 4 15 14 71 29 8 7 16 28 17 249
0 20 77 18 14 3 15 52 20 0 9 9 21 3
(a) Calculate the mean, median, mode, and midrange.
• Mean: 26.71 days
• Median: 14.5 days
• Mode: no unique mode; the values 0, 3, 7, 9, 11, 14, 15, and 20 each occur
twice
• Midrange: (0 + 249) / 2 = 124.5 days
(b) Calculate the quartiles and midhinge.
Q1 (25th percentile): 7.75
Q2 (Median): 14.5
Q3 (75th percentile): 20.25
Midhinge: (7.75 + 20.25) / 2 = 14.0
These quartiles use linear interpolation between order statistics; other
quartile conventions give slightly different values.
(c) Why can’t you use the geometric mean for this data set?
You cannot use the geometric mean because the data contain zeros, and the
geometric mean is only defined for positive numbers. A single zero makes the
product of the observations zero, so the geometric mean collapses to zero
regardless of the other values.
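The summary measures for Exercise 4.65 can be checked with Python's standard `statistics` module. This is a minimal sketch, assuming the `"inclusive"` quantile method (linear interpolation between order statistics); a textbook or spreadsheet that uses another quartile convention may report slightly different Q1 and Q3. Note that `multimode` returns every value tied for the highest frequency rather than a single mode.

```python
import statistics

# Advance-purchase days for the 28 passengers, copied from the exercise.
days = [11, 7, 11, 4, 15, 14, 71, 29, 8, 7, 16, 28, 17, 249,
        0, 20, 77, 18, 14, 3, 15, 52, 20, 0, 9, 9, 21, 3]

mean = statistics.mean(days)
median = statistics.median(days)
modes = sorted(statistics.multimode(days))  # all values tied for most frequent
midrange = (min(days) + max(days)) / 2

# Assumed convention: "inclusive" interpolates linearly between data points.
q1, q2, q3 = statistics.quantiles(days, n=4, method="inclusive")
midhinge = (q1 + q3) / 2

print(f"mean={mean:.2f} median={median} modes={modes} midrange={midrange}")
print(f"Q1={q1} Q2={q2} Q3={q3} midhinge={midhinge}")
```

The large gap between the mean and the median, and the midrange of 124.5, both reflect the single extreme value of 249 days.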
Exercise 4.73: The table below shows average daily sales of Rice
Krispies in the month of June in 74 Noodles & Company restaurants.
32 8 14 20 28 19 37 31 16 16
16 29 11 34 31 18 22 17 27 16
24 49 25 18 25 21 15 16 20 11
21 29 14 25 10 15 8 12 12 19
21 28 27 26 12 24 18 19 24 16
17 20 23 13 17 17 19 36 16 34
25 15 16 13 20 13 13 23 17 22
11 17 17 9
(a) Make a histogram for the data.

(b) Would you say the distribution is skewed?
The distribution appears slightly right-skewed, as there's a visible tail
extending toward higher values (e.g., 49). This is consistent with the mean
(20.12) exceeding the median (18.5).
(c) Calculate the mean and standard deviation.
• Mean: 20.12 units/day
• Standard Deviation: 7.64 units/day (sample standard deviation)
(d) Are there any outliers?
Using the IQR method (flagging values more than 1.5 × IQR beyond the
quartiles), only one outlier was found: 49.
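The figures for Exercise 4.73 can be reproduced with the sketch below. It rests on a few assumptions that are not stated in the exercise: a bin width of 5 for the text histogram, the `"inclusive"` (linear interpolation) quartile method, and the usual 1.5 × IQR fences for flagging outliers.

```python
import statistics
from collections import Counter

# Average daily sales for the 74 restaurants, copied from the exercise.
sales = [32, 8, 14, 20, 28, 19, 37, 31, 16, 16,
         16, 29, 11, 34, 31, 18, 22, 17, 27, 16,
         24, 49, 25, 18, 25, 21, 15, 16, 20, 11,
         21, 29, 14, 25, 10, 15, 8, 12, 12, 19,
         21, 28, 27, 26, 12, 24, 18, 19, 24, 16,
         17, 20, 23, 13, 17, 17, 19, 36, 16, 34,
         25, 15, 16, 13, 20, 13, 13, 23, 17, 22,
         11, 17, 17, 9]

mean = statistics.mean(sales)
sd = statistics.stdev(sales)  # sample standard deviation (divides by n - 1)

# Quartiles and 1.5 * IQR fences (assumed conventions, see lead-in).
q1, _, q3 = statistics.quantiles(sales, n=4, method="inclusive")
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = sorted(x for x in sales if x < lower or x > upper)

# Text histogram: each bin is labelled by its lower edge, width 5 (assumed).
bin_width = 5
bins = Counter((x // bin_width) * bin_width for x in sales)
for start in range(min(bins), max(bins) + bin_width, bin_width):
    print(f"{start:2d}-{start + bin_width - 1:2d} | {'*' * bins[start]}")

print(f"mean={mean:.2f} sd={sd:.2f} fences=({lower}, {upper}) outliers={outliers}")
```

The printed bars peak in the 15-19 bin and thin out toward the right, which is the right-skewed shape described in part (b).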
CHAPTER 5
Problem 5.77: (a) In a certain state, license plates consist of three
letters (A–Z) followed by three digits (0–9). How many different plates
can be issued?
26^3 × 10^3 = 17,576,000 unique plates
(b) If the state allows any six-character mix (in any order) of
26 letters and 10 digits, how many unique plates are possible?
Total character options = 26 letters + 10 digits = 36
36^6 = 2,176,782,336 unique plates
(c) Why might some combinations of digits and letters be
disallowed?
To avoid offensive or misleading combinations, like:
• Slurs, swear words, or profanity
• Confusing combinations like “OOO000” or “ILL1L1”
Administrative policies might also restrict plates for certain groups (e.g.,
government or law enforcement).
(d) Would the system described in (b) permit a unique
license number for every car in the United States? For every
car in the world? Explain your assumptions.
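• System (b) allows ~2.18 billion combinations.
• U.S.: As of recent data, there are ~290 million registered vehicles, so yes, it
covers this.
• Worldwide: Estimates suggest about 1.4–1.5 billion vehicles, which is still
below 2.18 billion, so system (b) would cover them today. The margin is
limited, though: once disallowed combinations are removed and the vehicle
fleet grows, it could become insufficient unless retired numbers are reused.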
(e) If the letters O and I are not used because they look too
much like the numerals 0 and 1, how many different plates
can be issued?
24 letters, 10 digits
24^3 × 10^3 = 13,824,000 unique plates
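The counting arguments in Problem 5.77 are all applications of the multiplication rule, and a short script makes the arithmetic easy to verify. The vehicle totals below are the approximate figures quoted in part (d), not exact counts, and the variable names are illustrative only.

```python
LETTERS, DIGITS = 26, 10

# (a) three letters followed by three digits
plates_a = LETTERS**3 * DIGITS**3

# (b) any six characters drawn from 26 letters + 10 digits, repetition allowed
plates_b = (LETTERS + DIGITS)**6

# (e) same format as (a), but without the letters O and I
plates_e = (LETTERS - 2)**3 * DIGITS**3

# (d) approximate vehicle counts quoted in the answer (assumptions, not exact data)
us_vehicles = 290_000_000
world_vehicles = 1_500_000_000

print(f"(a) {plates_a:,}")
print(f"(b) {plates_b:,}")
print(f"(e) {plates_e:,}")
print("covers U.S. fleet: ", plates_b >= us_vehicles)
print("covers world fleet:", plates_b >= world_vehicles)
```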
Problem 5.94

(a) i. P(S):

P(S) = 320/1000 = 0.32 → 32% of the sampled males are smokers.

ii. P(W):

P(W) = 850/1000 = 0.85 → 85% of the sample are White males.

iii. P(S | W):

P(S | W) = 290/850 ≈ 0.3412 → Approximately 34.12% of White males are smokers.

iv. P(S | B):

P(S | B) = 30/150 = 0.2 → 20% of Black males are smokers.

v. P(S ∩ W) (i.e., P(S and W)):

P(S and W) = 290/1000 = 0.29 → 29% of the entire group are White smokers.

vi. P(N ∩ B) (i.e., P(N and B)):

P(N and B) = 120/1000 = 0.12 → 12% of the group are Black nonsmokers.

(b) To test independence, compare: P(S | W) = 0.3412 vs. P(S | B) = 0.2

Since these conditional probabilities differ, smoking is not independent of race in
this sample. If they were independent, P(S | W) and P(S | B) would both
approximately equal P(S) = 0.32.
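The probabilities in Problem 5.94 can be recomputed from the four cell counts implied by the answers above. In this sketch the White nonsmoker count (560) is derived as 850 − 290 rather than read from the original table, and the helper `n` is a hypothetical name. Exact fractions are used so that the independence check does not depend on rounding.

```python
from fractions import Fraction

# Cell counts: (smoking status, race) -> number of males in the sample.
# ("N", "W") = 560 is derived from 850 White males minus 290 White smokers.
counts = {("S", "W"): 290, ("N", "W"): 560,
          ("S", "B"): 30, ("N", "B"): 120}
total = sum(counts.values())


def n(smoke=None, race=None):
    """Count of people matching the given smoking status and/or race."""
    return sum(c for (s, r), c in counts.items()
               if smoke in (None, s) and race in (None, r))


p_s = Fraction(n(smoke="S"), total)
p_w = Fraction(n(race="W"), total)
p_s_given_w = Fraction(n("S", "W"), n(race="W"))
p_s_given_b = Fraction(n("S", "B"), n(race="B"))
p_s_and_w = Fraction(n("S", "W"), total)

# Independence would require P(S and W) = P(S) * P(W).
independent = p_s_and_w == p_s * p_w

print(f"P(S)={float(p_s):.2f}  P(S|W)={float(p_s_given_w):.4f}  "
      f"P(S|B)={float(p_s_given_b):.2f}")
print(f"P(S and W)={float(p_s_and_w):.3f} vs P(S)P(W)={float(p_s * p_w):.3f}")
print("independent:", independent)
```

The product P(S)·P(W) = 0.272 differs from the observed P(S and W) = 0.29, which is the same conclusion reached by comparing the conditional probabilities.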
(c) This is subjective, but one might reflect:
• It’s not uncommon for smoking prevalence to vary by demographic factors
(including socioeconomic status, culture, and geographic location).
• You might or might not find these numbers align with what you've observed
in your own community or region.
(d) Public health officials could use this data to:
• Target anti-smoking campaigns more effectively (e.g., focusing on
higher-risk groups).
• Allocate resources for education, outreach, and cessation programs.
• Monitor health disparities across different racial or ethnic groups.
This type of analysis is crucial for data-driven health policy and risk reduction
planning.

Problem 5.98: A biometric security device using fingerprints
erroneously refuses to admit 1 in 1,000 authorized persons from a
facility containing classified information. The device will erroneously
admit 1 in 1,000,000 unauthorized persons. Assume that 95 percent of
those who seek access are authorized. If the alarm goes off and a person
is refused admission, what is the probability that the person was really
authorized?

Given:

• P(Authorized) = 0.95, so P(Unauthorized) = 0.05
• P(Rejected | Authorized) = 1/1000
• P(Rejected | Unauthorized) = 999999/1000000

By Bayes' theorem:

• P(Authorized | Rejected) = (0.95 × 0.001) / (0.95 × 0.001 + 0.05 × 0.999999)
≈ 0.0186 or 1.86%

Interpretation: If the device denies access, there's only a 1.86% chance the
person was truly authorized; most rejections are unauthorized people.
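The posterior probability in Problem 5.98 follows from Bayes' theorem, and the arithmetic can be checked exactly with Python's `fractions` module. This is a minimal sketch; the variable names are illustrative, and the only inputs are the three rates given in the problem.

```python
from fractions import Fraction

p_auth = Fraction(95, 100)                         # P(Authorized)
p_unauth = 1 - p_auth                              # P(Unauthorized)
p_rej_given_auth = Fraction(1, 1000)               # false rejection rate
p_rej_given_unauth = 1 - Fraction(1, 1_000_000)    # 1 - false admission rate

# Total probability of a rejection, over both kinds of person.
p_rej = p_auth * p_rej_given_auth + p_unauth * p_rej_given_unauth

# Bayes' theorem: P(Authorized | Rejected).
posterior = p_auth * p_rej_given_auth / p_rej

print(f"P(Rejected) = {float(p_rej):.6f}")
print(f"P(Authorized | Rejected) = {posterior} ≈ {float(posterior):.4f}")
```

Even though 95 percent of people seeking access are authorized, almost all rejections come from the 5 percent who are not, because the device rejects nearly every unauthorized person and almost no authorized ones.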
