Ce 023 Module 5 and 6

This document provides an overview of hypothesis testing concepts and methods. It discusses estimating the difference between two population means using a confidence interval. It also covers the traditional method of hypothesis testing, which involves a null hypothesis (H0) stating no difference and an alternative hypothesis (Ha) stating a difference. Key steps are outlined, such as defining the hypotheses, choosing a significance level, calculating a test statistic, and determining whether to reject the null hypothesis. Examples of hypothesis tests are given for situations like comparing wage rates, medication effects, and process improvements. Common error types - type I and type II - are defined.

Uploaded by

George Yanela
Copyright
© All Rights Reserved

CE 023- ENGINEERING DATA ANALYSIS

MODULE 5 Hypothesis Testing

PART 1: Estimating the Difference Between Two Population Means

1. The wearing qualities of two types of automobile tires were compared by road-testing samples of n1 = n2 = 100 tires for each type and recording the number of kilometers until wearout, defined as a specific amount of tire wear. The test results are shown below. Estimate (μ1 − μ2), the difference in mean kilometers to wearout, using a 99% confidence interval. Is there a difference in the average wearing quality for the two types of tires?

2. As a group, students majoring in the engineering disciplines have the highest salary expectations, followed by those studying the computer science fields, according to a Michigan State University study. To compare the starting salaries of college graduates majoring in electrical engineering and computer science, random samples of 50 recent college graduates in each major were selected and the following information obtained:

a. Find a point estimate for the difference in the average starting salaries of college students majoring in electrical engineering and computer science. What is the margin of error for your estimate?

b. Based upon the results in part a, do you think that there is a significant difference in the average starting salaries for electrical engineers and computer scientists? Explain.
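A minimal sketch of the large-sample interval for the difference between two population means may help with problems like these. It assumes the usual form (x̄1 − x̄2) ± z·sqrt(s1²/n1 + s2²/n2) for independent samples with large n. The function name and the summary statistics below are hypothetical, since the handout's data tables are not reproduced here.

```python
from math import sqrt
from statistics import NormalDist

def diff_means_ci(mean1, s1, n1, mean2, s2, n2, confidence=0.95):
    """Large-sample confidence interval for mu1 - mu2 (independent samples)."""
    point_estimate = mean1 - mean2
    # Standard error of the difference between two independent sample means
    std_error = sqrt(s1**2 / n1 + s2**2 / n2)
    # Two-sided critical value, e.g. about 2.576 for 99% confidence
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * std_error
    return point_estimate, margin, (point_estimate - margin, point_estimate + margin)

# Hypothetical summary statistics (NOT the values from the handout's tables)
estimate, margin, (low, high) = diff_means_ci(
    26400, 1200, 100, 25100, 1400, 100, confidence=0.99
)
print(f"point estimate = {estimate:.1f}, margin of error = {margin:.1f}")
print(f"99% CI: ({low:.1f}, {high:.1f})")
# If the interval does not contain 0, the data suggest the two means differ
print("interval excludes 0:", not (low <= 0 <= high))
```

With real data, substitute the sample means, standard deviations, and sample sizes from the problem's table.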
MODULE 5 PART 2: Statistical Test of Hypothesis

A hypothesis is a claim, premise, or statement about a property of a population.
A hypothesis test, or test of significance, is a procedure for testing a claim about a property of a population.

The three methods used to test hypotheses are
1. The traditional method
2. The P-value method
3. The confidence interval method

Traditional Method
A statistical hypothesis is a conjecture or statement about a population parameter. This conjecture may or may not be true.

The null hypothesis, symbolized by H0, is a statistical hypothesis that states that there is no difference between a parameter and a specific value, or that there is no difference between two parameters.
The alternative hypothesis, symbolized by Ha, is a statistical hypothesis that states the existence of a difference between a parameter and a specific value, or states that there is a difference between two parameters.

Examples:
1. You wish to show that the average hourly wage of electricians in the state of California is different from $21, which is the national average.
2. A medical researcher is interested in finding out whether a new medication will have any undesirable side effects. The researcher is particularly concerned with the pulse rate of the patients who take the medication. Will the pulse rate increase, decrease, or remain unchanged after a patient takes the medication? Since the researcher knows that the mean pulse rate for the population under study is 82 beats per minute, the hypotheses for this situation are H0: μ = 82 and Ha: μ ≠ 82.
3. A chemist invents an additive to increase the life of an automobile battery. The mean lifetime of the automobile battery without the additive is 36 months.
4. A contractor wishes to lower heating bills by using a special type of insulation in houses. The average of the monthly heating bills is $78.
5. A researcher studies gambling in young people. She thinks those who gamble spend more than $30 per day.
6. A researcher wishes to see if police officers whose spouses work in law enforcement have a lower score on a work stress questionnaire than the average score of 120.
7. An engineer hypothesizes that the mean number of defects can be decreased in a manufacturing process of USB drives by using robots instead of humans for certain tasks. The mean number of defective drives per 1000 is 18.
8. A teacher feels that if an online textbook is used for a course instead of a hardback book, it may change the students’ scores on a final exam. In the past, the average final exam score for the students was 83.

Statistical Test
A statistical test uses the data obtained from a sample to make a decision about whether the null hypothesis should be rejected.
The numerical value obtained from a statistical test is called the test value.

There are four possible outcomes of a hypothesis test:
1. We reject the null hypothesis when it is true. This would be an incorrect decision and would result in a type I error.
2. We reject the null hypothesis when it is false. This would be a correct decision.
3. We do not reject the null hypothesis when it is true. This would be a correct decision.
4. We do not reject the null hypothesis when it is false. This would be an incorrect decision and would result in a type II error.

A type I error occurs if you reject the null hypothesis when it is true. That is, the null hypothesis is a true statement, but we end up rejecting it because the test statistic incorrectly suggested doing so.

A type II error occurs if you do not reject the null hypothesis when it is false.

The level of significance is the maximum probability of committing a type I error. This probability is symbolized by α (Greek letter alpha). That is, P(type I error) = α. The probability of a type II error is symbolized by β, the Greek letter beta. That is, P(type II error) = β.

Summary of Hypothesis Testing and Critical Values
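The statement P(type I error) = α can be checked with a small simulation. The sketch below repeatedly samples from a population in which the null hypothesis really is true and counts how often a two-tailed z test rejects it anyway. The value μ = 82 echoes the pulse-rate example. The standard deviation, sample size, and number of trials are assumptions chosen only for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_i_error_rate(mu0=82.0, sigma=5.0, n=36, alpha=0.05, trials=10000, seed=1):
    """Estimate how often a two-tailed z test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 (mu = mu0) really holds
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = (sample_mean - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # rejecting a true H0 is a type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"estimated P(type I error) = {rate:.3f}")
```

The estimated rate should land close to the chosen α, which is what the level of significance controls.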

The critical or rejection region is the range of test values that indicates that there is a significant difference and that the null hypothesis should be rejected.

The noncritical or nonrejection region is the range of test values that indicates that the difference was probably due to chance and that the null hypothesis should not be rejected.
The critical value separates the critical region from the noncritical region. The symbol for critical value is C.V.
A one-tailed test indicates that the null hypothesis should be rejected
when the test value is in the critical region on one side of the mean. A
one-tailed test is either a right-tailed test or a left-tailed test, depending
on the direction of the inequality of the alternative hypothesis.

Examples:
1. Find the critical value(s) for each situation and draw the appropriate figure, showing the critical region.
a. A left-tailed test with α = 0.10.
b. A two-tailed test with α = 0.02.
c. A right-tailed test with α = 0.005.
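Critical values for a z test are usually read from a standard normal table. As a cross-check, they can also be computed from the inverse of the standard normal cumulative distribution. The sketch below uses Python's standard-library `NormalDist`. The helper name `critical_values` is hypothetical.

```python
from statistics import NormalDist

def critical_values(alpha, tail):
    """Return the z critical value(s) for a 'left', 'right', or 'two' tailed test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two":
        # Split alpha evenly between the two tails
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError("tail must be 'left', 'right', or 'two'")

for alpha, tail in [(0.10, "left"), (0.02, "two"), (0.005, "right")]:
    values = ", ".join(f"{v:+.2f}" for v in critical_values(alpha, tail))
    print(f"{tail}-tailed, alpha = {alpha}: C.V. = {values}")
```

For the three situations above this gives approximately −1.28, ±2.33, and +2.58, which agree with the usual table values.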

Critical and Noncritical Regions for α = 0.01 (Right-Tailed Test)

Critical and Noncritical Regions for α = 0.01 (Left-Tailed Test)

In a two-tailed test, the null hypothesis should be rejected when the test value is in either of the two critical regions.

Z-Test for a Mean

The z test is a statistical test for the mean of a population. It can be used either when n ≥ 30 or when the population is normally distributed and σ is known. The formula for the z test is

z = (X̄ − μ) / (σ / √n)

where X̄ is the sample mean, μ is the hypothesized population mean, σ is the population standard deviation, and n is the sample size.
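A short sketch of the traditional method with a z test follows. It assumes the standard test value z = (X̄ − μ) / (σ / √n) and uses the figures from the doctors' age example in this section (X̄ = 48.6, μ = 46, σ = 4.2, n = 30, α = 0.05). Reading the claim "older than" as a right-tailed test (H0: μ = 46, Ha: μ > 46) is an interpretation, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_statistic(sample_mean, mu0, sigma, n):
    """Test value for a z test of a mean with known population sigma."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

# Right-tailed test: H0: mu = 46, Ha: mu > 46, alpha = 0.05
alpha = 0.05
z = z_test_statistic(sample_mean=48.6, mu0=46, sigma=4.2, n=30)
critical_value = NormalDist().inv_cdf(1 - alpha)

# Reject H0 when the test value falls in the critical region
reject_h0 = z > critical_value
print(f"z = {z:.2f}, C.V. = {critical_value:.3f}, reject H0: {reject_h0}")
```

Here the test value (about 3.39) exceeds the critical value (about 1.645), so the null hypothesis would be rejected under these assumptions.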

 
Examples:
1. A researcher believes that the mean age of medical doctors in a large hospital system is older than the average age of doctors in the United States, which is 46. Assume the population standard deviation is 4.2 years. A random sample of 30 doctors from the system is selected, and the mean age of the sample is 48.6. Test the claim at α = 0.05.
2. For a specific year, the average score on the SAT Math test was 515. The variable is normally distributed, and the population standard deviation is 100. The same superintendent in the previous example wishes to see if her students scored significantly below the national average on the test. She randomly selected 36 student scores, as shown. At α = 0.10, is there enough evidence to support the claim?
3. In Pennsylvania the average IQ score is 101.5. The variable is normally distributed, and the population standard deviation is 15. A school superintendent claims that the students in her school district have an IQ higher than the average of 101.5. She selects a random sample of 30 students and finds the mean of the test scores is 106.4. Test the claim at α = 0.05.
4. The average depth of the Hudson Bay is 305 feet. Climatologists were interested in seeing if warming and ice melt were affecting the water level. Fifty-five measurements over a period of randomly selected weeks yielded a sample mean of 306.2 feet. The population variance is known to be 3.6. Can it be concluded at the 0.05 level of significance that the average depth has increased? Is there evidence of what caused this to happen?
5. The average “moviegoer” sees 8.5 movies a year. A moviegoer is defined as a person who sees at least one movie in a theater in a 12-month period. A random sample of 40 moviegoers from a large university revealed that the average number of movies seen per person was 9.6. The population standard deviation is 3.2 movies. At the 0.05 level of significance, can it be concluded that this represents a difference from the national average?

P-Value
The P-value (or probability value) is the probability of getting a sample statistic (such as the mean) or a more extreme sample statistic in the direction of the alternative hypothesis when the null hypothesis is true.

Examples:
1. A researcher wishes to test the claim that the average cost of tuition and fees at a four-year public college is greater than $5700. She selects a random sample of 36 four-year public colleges and finds the mean to be $5950. The population standard deviation is $659. Is there evidence to support the claim at α = 0.05? Use the P-value method.
2. A researcher claims that the average wind speed in a certain city is 8 miles per hour. A sample of 32 days has an average wind speed of 8.2 miles per hour. The standard deviation of the population is 0.6 mile per hour. At α = 0.05, is there enough evidence to reject the claim? Use the P-value method.
3. A special cable has a breaking strength of 800 pounds. The standard deviation of the population is 12 pounds. A researcher selects a random sample of 20 cables and finds that the average breaking strength is 793 pounds. Can he reject the claim that the breaking strength is 800 pounds? Find the P-value. Should the null hypothesis be rejected at α = 0.01? Assume that the variable is normally distributed.
4. What is normal, when it comes to people’s body temperatures? A random sample of 130 human body temperatures, provided by Allen Shoemaker in the Journal of Statistical Education, had a mean of 98.25°F and a standard deviation of 0.73°F. Does the data indicate that the average body temperature for healthy humans is different from 98.6°F, the usual average temperature cited by physicians and others?
a. Test using the p-value approach with α = 0.05.
b. Test using the critical value approach with α = 0.05.
c. Compare the conclusions from parts a and b. Are they the same?

Confidence Interval Method

Example
1. Analyses of drinking water samples for 100 homes in each of two different sections of a city gave the following information on lead levels (in parts per million):
a. Calculate the test statistic and its p-value to test for a difference in the two population means. Use the p-value to evaluate the significance of the results at the 5% level.
b. Use a 95% confidence interval to estimate the difference in the mean lead levels for the two sections of the city.
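To illustrate the P-value method from this module, the sketch below works through the tuition example (X̄ = 5950, μ = 5700, σ = 659, n = 36, α = 0.05) as a right-tailed z test. The helper `p_value` is a hypothetical name. The decision rule "reject H0 when the P-value is at most α" is the usual convention and is assumed here.

```python
from math import sqrt
from statistics import NormalDist

def p_value(z, tail):
    """P-value of a z test value for a 'left', 'right', or 'two' tailed test."""
    std_normal = NormalDist()
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "two":
        # Area in both tails beyond |z|
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Tuition example: H0: mu = 5700, Ha: mu > 5700
alpha = 0.05
z = (5950 - 5700) / (659 / sqrt(36))
p = p_value(z, "right")
print(f"z = {z:.2f}, P-value = {p:.4f}, reject H0: {p <= alpha}")
```

The P-value comes out a little above 0.01, so the claim would be supported at α = 0.05 but not at α = 0.01.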
MODULE 6: Correlation and Regression

PART 1: Correlation
A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable x and the dependent variable y.
■ Figure 10-2(a): Distinct straight-line, or linear, pattern. We say that there is a positive linear correlation between x and y, since as the x values increase, the corresponding y values also increase.
■ Figure 10-2(b): Distinct straight-line, or linear, pattern. We say that there is a negative linear correlation between x and y, since as the x values increase, the corresponding y values decrease.
■ Figure 10-2(c): No distinct pattern, which suggests that there is no correlation between x and y.
■ Figure 10-2(d): Distinct pattern suggesting a correlation between x and y, but the pattern is not that of a straight line.

Example
1. Construct a scatter plot for the data shown for car rental companies in the United States for a recent year.

Correlation
Correlation Coefficient: Statisticians use a measure called the correlation coefficient to determine the strength of the linear relationship between two variables. There are several types of correlation coefficients.

The population correlation coefficient, denoted by the Greek letter ρ, is the correlation computed by using all possible pairs of data values (x, y) taken from a population.

The linear correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two quantitative variables. The symbol for the sample correlation coefficient is r.

The range of the linear correlation coefficient is from -1 to +1. If there is a strong positive linear relationship between the variables, the value of r will be close to +1. If there is a strong negative linear relationship between the variables, the value of r will be close to -1. When there is no linear relationship between the variables or only a weak relationship, the value of r will be close to 0.
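As a sketch of how r can be computed from paired sample data, the code below uses the common computational formula r = [nΣxy − ΣxΣy] / sqrt([nΣx² − (Σx)²][nΣy² − (Σy)²]). This form is an assumption, because the handout's own formula is not reproduced here. The data are hypothetical and are not taken from any exercise in this module.

```python
from math import sqrt

def correlation_coefficient(xs, ys):
    """Sample linear correlation coefficient r for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must be paired and contain at least 2 points")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator

# Hypothetical paired observations
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]
r = correlation_coefficient(x_values, y_values)
print(f"r = {r:.4f}")
```

For these made-up points r is about 0.77, a moderately strong positive linear relationship.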
2. Compute the linear correlation coefficient for the data in Example 1.

3. Compute the value of the linear correlation coefficient for the data obtained in the study of the number of absences and the final grade of the seven students in the statistics class.

4. You can monitor every step you take, your speed, your pace, or some other aspect of your daily activity. The data that follows lists the overall rating scores for 14 fitness trackers and their prices.
a. Use a scatterplot of the data to check for a relationship between the rating scores and prices for the fitness trackers.
b. Calculate the sample coefficient of correlation r and interpret its value.

5. Is there a relationship between the life expectancy for men and the life expectancy for women in a given country? A random sample of nonindustrialized countries was selected, and the life expectancy in years is listed for both men and women. Are the variables linearly related?

The Significance of the Correlation Coefficient

Examples:
1. Test the significance of the correlation coefficient. Use α = 0.05 and r = 0.982.
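One common way to test whether r is significant is a t test with n − 2 degrees of freedom, using t = r·sqrt((n − 2) / (1 − r²)). The handout does not show its formula, so treating this as the intended method is an assumption. The sketch uses r = 0.982 from the example, but the sample size n = 6 is hypothetical because the example's data are not reproduced. The critical value 2.776 is the usual t-table entry for a two-tailed test with α = 0.05 and 4 degrees of freedom.

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """t test value for H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * sqrt((n - 2) / (1 - r**2))

r = 0.982            # from the example
n = 6                # hypothetical sample size (not given in the text)
t_critical = 2.776   # two-tailed, alpha = 0.05, d.f. = 4, read from a t table

t = correlation_t_statistic(r, n)
significant = abs(t) > t_critical
print(f"t = {t:.2f}, d.f. = {n - 2}, significant: {significant}")
```

With a different sample size, both the test value and the table critical value change, so n must come from the actual data.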

2. The number of faculty and the number of students are shown for a random selection of small colleges. Is there a significant relationship between the two variables? Switch x and y and repeat the process. Which do you think is really the independent variable?

3. An architect wants to determine the relationship between the heights (in feet) of a building and the number of stories in the building. The data for a sample of 10 buildings in Pittsburgh are shown. Explain the relationship.
MODULE 6: Correlation and Regression
PART 2: Regression

In studying relationships between two variables, collect the data and then construct a scatter plot. The purpose of the scatter plot, as indicated previously, is to determine the nature of the relationship between the variables. The possibilities include a positive linear relationship, a negative linear relationship, a curvilinear relationship, or no discernible relationship. After the scatter plot is drawn and a linear relationship is determined, the next steps are to compute the value of the correlation coefficient and to test the significance of the relationship. If the value of the correlation coefficient is significant, the next step is to determine the equation of the regression line, which is the data’s line of best fit.

Given a scatter plot, you must be able to draw the line of best fit. Best fit means that the sum of the squares of the vertical distances from each point to the line is at a minimum.

The difference between the actual value y and the predicted value (that is, the vertical distance) is called a residual or a predicted error. Residuals are used to determine the line that best describes the relationship between the two variables.
The method used for making the residuals as small as possible is called the method of least squares. As a result of this method, the regression line is also called the least squares regression line.

Deterministic Model

Probabilistic Model

Formulas for the Regression Line y’ = a + bx
where a is the y’ intercept and b is the slope of the line.

Example:
1. Find the equation of the regression line for the data shown, and graph the line on the scatter plot of the data.

2. For each exercise, find the equation of the regression line and find the y’ value for the specified x value. Remember that no regression should be done when r is not significant.

a. The number of murders and robberies per 100,000 population for a random selection of states are shown. Find y’ when x = 4.5 murders.

b. Stories and heights of buildings data follow: Find y’ when x = 44.
3. Do a complete regression analysis by performing these steps.
a. Draw a scatter plot.
b. Compute the correlation coefficient.
c. State the hypotheses.
d. Test the hypotheses at . Use Table I.
e. Determine the regression line equation if r is significant.
f. Plot the regression line on the scatter plot, if appropriate.
g. Summarize the results.

These data were obtained for the years 1993 through 1998 and
indicate the number of fireworks (in millions) used and the related
injuries. Predict the number of injuries if 100 million fireworks are used
during a given year.
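The regression line y’ = a + bx can be fitted with the standard least squares formulas b = [nΣxy − ΣxΣy] / [nΣx² − (Σx)²] and a = [ΣyΣx² − ΣxΣxy] / [nΣx² − (Σx)²]. These are the usual textbook forms and are assumed here, since the handout's own formulas are not reproduced. The data and the function name are hypothetical. They are not the fireworks data or any other data set from the exercises.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares regression line y' = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x**2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator       # slope
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # y' intercept
    return a, b

# Hypothetical paired observations
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]
a, b = least_squares_line(x_values, y_values)
predicted = a + b * 6  # predicted y' for x = 6
print(f"y' = {a:.2f} + {b:.2f}x, prediction at x = 6: {predicted:.2f}")
```

As the handout notes, a prediction like this is only meaningful when r has first been found to be significant.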

Types of Variation for the Regression Model


Consider the following hypothetical regression model. Find the total
variation.
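A sketch of the variation calculation follows. It assumes the usual decomposition for a least squares line: total variation Σ(y − ȳ)² equals explained variation Σ(y’ − ȳ)² plus unexplained variation Σ(y − y’)². The data are hypothetical and the line is fitted with the standard least squares formulas, since the handout's model is not reproduced here.

```python
def variation_breakdown(xs, ys):
    """Return (total, explained, unexplained) variation for a least squares fit."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x**2
    b = (n * sum_xy - sum_x * sum_y) / denominator       # slope
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # intercept
    y_mean = sum_y / n
    predictions = [a + b * x for x in xs]
    total = sum((y - y_mean) ** 2 for y in ys)
    explained = sum((p - y_mean) ** 2 for p in predictions)
    unexplained = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    return total, explained, unexplained

# Hypothetical paired observations
total, explained, unexplained = variation_breakdown([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"total = {total:.2f}, explained = {explained:.2f}, unexplained = {unexplained:.2f}")
# The ratio explained / total is the coefficient of determination, r squared
print(f"r^2 = {explained / total:.2f}")
```

For a least squares line the explained and unexplained parts add up to the total, which gives a quick arithmetic check on hand calculations.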
