Ce 023 Module 5 and 6
1. We reject the null hypothesis when it is true. This would be an
incorrect decision and would result in a type I error.
2. We reject the null hypothesis when it is false. This would be a
correct decision.
3. We do not reject the null hypothesis when it is true. This would
be a correct decision.
4. We do not reject the null hypothesis when it is false. This would
be an incorrect decision and would result in a type II error.
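The four outcomes listed above can be sketched as a small lookup. This is only an illustration: the function name `classify_outcome` and its two boolean parameters are hypothetical, and in a real study we never actually know whether the null hypothesis is true.

```python
def classify_outcome(reject_null: bool, null_is_true: bool) -> str:
    """Name the outcome of a hypothesis-test decision.

    Both arguments are hypothetical inputs used for illustration only.
    """
    if reject_null and null_is_true:
        # Rejecting a true null hypothesis.
        return "type I error"
    if not reject_null and not null_is_true:
        # Failing to reject a false null hypothesis.
        return "type II error"
    # Rejecting a false null, or not rejecting a true null.
    return "correct decision"


for reject in (True, False):
    for truth in (True, False):
        outcome = classify_outcome(reject, truth)
        print(f"reject={reject}, null true={truth}: {outcome}")
```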
A type II error occurs if you do not reject the null hypothesis when
it is false.
The level of significance is the maximum probability of committing a
type I error. This probability is symbolized by α (the Greek letter
alpha). That is, P(type I error) = α. The probability of a type II
error is symbolized by β, the Greek letter beta. That is,
P(type II error) = β.
Examples:
1. You wish to show that the average hourly wage of electricians in
the state of California is different from $21, which is the national
average.
2. A medical researcher is interested in finding out whether a new
medication will have any undesirable side effects. The researcher is
particularly concerned with the pulse rate of
Summary of Hypothesis Testing and Critical Values
Examples:
1. Find the critical value(s) for each situation and draw the appropriate
figure, showing the critical region.
a. A left-tailed test with α = 0.10.
b. A two-tailed test with α = 0.02.
c. A right-tailed test with α = 0.005.
2. A researcher claims that the average wind speed in a certain city is
8 miles per hour. A sample of 32 days has an average wind speed of
8.2 miles per hour. The standard deviation of the population is 0.6 mile
per hour. At α = 0.05, is there enough evidence to reject the claim?
Use the P-value method.
3. A special cable has a breaking strength of 800 pounds. The
standard deviation of the population is 12 pounds. A researcher
selects a random sample of 20 cables and finds that the average
breaking strength is 793 pounds. Can he reject the claim that the
breaking strength is 800 pounds? Find the P-value. Should the null
hypothesis be rejected at α = 0.01? Assume that the variable is
normally distributed.
4. What is normal when it comes to people’s body temperatures? A
random sample of 130 human body temperatures, provided by Allen
Shoemaker in the Journal of Statistical Education, had a mean of
98.25°F and a standard deviation of 0.73°F. Do the data indicate
that the average body temperature for healthy humans is different from
98.6°F, the usual average temperature cited by physicians and
others?
a. Test using the p-value approach with α = 0.05.
b. Test using the critical value approach with α = 0.05.
c. Compare the conclusions from parts a and b. Are they the same?
Examples:
1. A researcher believes that the mean age of medical doctors in a
large hospital system is older than the average age of doctors in the
United States, which is 46. Assume the population standard deviation
is 4.2 years. A random sample of 30 doctors from the system is
selected, and the mean age of the sample is 48.6. Test the claim at
α = 0.05.
2. For a specific year, the average score on the SAT Math test was
515. The variable is normally distributed, and the population standard
deviation is 100. The same superintendent in the previous example
wishes to see if her students scored significantly below the national
average on the test. She randomly selected 36 student scores, as
shown. At α = 0.10, is there enough evidence to support the claim?
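For the critical-value exercise above (a left-tailed test with α = 0.10, a two-tailed test with α = 0.02, and a right-tailed test with α = 0.005), the cutoffs can be looked up in a z table or computed. The sketch below assumes a z test (standard normal distribution); the function name `z_critical_values` and its `tail` argument are hypothetical names chosen for illustration.

```python
from statistics import NormalDist


def z_critical_values(alpha: float, tail: str) -> tuple:
    """Critical z value(s) for a left-, right-, or two-tailed z test."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "left":
        # All of alpha lies in the left tail.
        return (z.inv_cdf(alpha),)
    if tail == "right":
        # All of alpha lies in the right tail.
        return (z.inv_cdf(1 - alpha),)
    if tail == "two":
        # alpha is split equally between the two tails.
        upper = z.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError("tail must be 'left', 'right', or 'two'")


for alpha, tail in [(0.10, "left"), (0.02, "two"), (0.005, "right")]:
    values = ", ".join(f"{v:.2f}" for v in z_critical_values(alpha, tail))
    print(f"{tail}-tailed, alpha = {alpha}: {values}")
```

The critical region is the area beyond each printed value in the direction of the tail; these agree with the usual table values of about -1.28, ±2.33, and +2.58.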
P-Value
The P-value (or probability value) is the probability of getting a sample
statistic (such as the mean) or a more extreme sample statistic in the
direction of the alternative hypothesis when the null hypothesis is true.
a. Calculate the test statistic and its p-value to test for a difference in
the two population means. Use the p-value to evaluate the significance
of the results at the 5% level.
b. Use a 95% confidence interval to estimate the difference in the
mean lead levels for the two sections of the city.
Examples:
1. A researcher wishes to test the claim that the average cost of tuition
and fees at a four-year public college is greater than $5700. She
selects a random sample of 36 four-year public colleges and finds the
mean to be $5950. The population standard deviation is $659. Is there
evidence to support the claim at α = 0.05? Use the P-value method.
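The P-value method for the tuition example can be sketched as follows. This is a minimal illustration, assuming a one-sample z test with known population standard deviation and hypotheses H0: μ = 5700 versus H1: μ > 5700; the function name `z_test_p_value` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(sample_mean, mu0, sigma, n, tail):
    """Return (z, P-value) for a one-sample z test with known sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z)  # area to the left of z
    if tail == "right":
        p = 1 - cdf
    elif tail == "left":
        p = cdf
    elif tail == "two":
        # Double the smaller tail area.
        p = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, p


# Values taken from the tuition example; the test is right-tailed.
z, p = z_test_p_value(5950, 5700, 659, 36, "right")
alpha = 0.05
print(f"z = {z:.2f}, P-value = {p:.4f}")
print("reject H0" if p <= alpha else "do not reject H0")
```

Because the P-value is compared directly with α, no critical value is needed: the null hypothesis is rejected whenever the P-value is less than or equal to α.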
CE 023 - ENGINEERING DATA ANALYSIS
MODULE 6: Correlation and Regression
PART 1: Correlation
A scatter plot is a graph of the ordered pairs (x, y) of numbers
consisting of the independent variable x and the dependent variable y.
■ Figure 10-2(a): Distinct straight-line, or linear, pattern. We say that
there is a positive linear correlation between x and y, since as the x
values increase, the corresponding y values also increase.
■ Figure 10-2(b): Distinct straight-line, or linear, pattern. We say that
there is a negative linear correlation between x and y, since as the x
values increase, the corresponding y values decrease.
■ Figure 10-2(c): No distinct pattern, which suggests that there is no
correlation between x and y.
■ Figure 10-2(d): Distinct pattern suggesting a correlation between x
and y, but the pattern is not that of a straight line.
Example
1. Construct a scatter plot for the data shown for car rental companies
in the United States for a recent year.
Correlation
Correlation Coefficient: Statisticians use a measure called the
correlation coefficient to determine the strength of the linear
relationship between two variables. There are several types of
correlation coefficients.
The range of the linear correlation coefficient r is from -1 to +1. If there
is a strong positive linear relationship between the variables, the value
of r will be close to +1. If there is a strong negative linear relationship
between the variables, the value of r will be close to -1. When there is
no linear relationship between the variables or only a weak
relationship, the value of r will be close to 0.
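As a rough sketch of how a linear correlation coefficient can be computed, the code below assumes the Pearson product-moment coefficient and its usual computational formula, which this module does not spell out here. The function name `pearson_r` and the data values are hypothetical and are not taken from any exercise.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Linear (Pearson) correlation coefficient of paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(f"r = {pearson_r(xs, ys):.3f}")
```

A result near +1 or -1 points to a strong linear relationship, and a result near 0 points to a weak one or none.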
3. Compute the value of the linear correlation coefficient for the data
obtained in the study of the number of absences and the final grade of
the seven students in the statistics class.
4. You can monitor every step you take, your speed, your pace, or
some other aspect of your daily activity. The data that follow list the
overall rating scores for 14 fitness trackers and their prices.
5. Is there a relationship between the life expectancy for men and the
life expectancy for women in a given country? A random sample of
nonindustrialized countries was selected, and the life expectancy in
years is listed for both men and women. Are the variables linearly
related?
Examples:
1. Test the significance of the correlation coefficient. Use α = 0.05
and r = 0.982.
2. The number of faculty and the number of students are shown for a
random selection of small colleges. Is there a significant relationship
between the two variables? Switch x and y and repeat the process.
Which do you think is really the independent variable?
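One common way to test the significance of r is a t test with n - 2 degrees of freedom, using t = r·sqrt((n - 2)/(1 - r²)). The sketch below applies it to r = 0.982 at α = 0.05. The sample size is not stated in the example, so n = 6 is purely an assumption made for illustration; the critical value 2.776 is the two-tailed t table entry for 4 degrees of freedom. The function name `t_statistic_for_r` is hypothetical.

```python
from math import sqrt


def t_statistic_for_r(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 data pairs")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt((n - 2) / (1 - r ** 2))


r = 0.982           # value given in the example
n = 6               # ASSUMPTION: the sample size is not given in the text
t_critical = 2.776  # t table, two-tailed, alpha = 0.05, d.f. = n - 2 = 4

t = t_statistic_for_r(r, n)
print(f"t = {t:.2f}, d.f. = {n - 2}")
print("r is significant" if abs(t) > t_critical else "r is not significant")
```

With a different sample size, both the test statistic and the table critical value change, so `n` and `t_critical` would need to be updated together.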
Given a scatter plot, you must be able to draw the line of best fit. Best
fit means that the sum of the squares of the vertical distances from
each point to the line is at a minimum.
The equation of the regression line is written as y’ = a + bx,
where a is the y’ intercept and b is the slope of the line.
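A minimal sketch of finding the line of best fit, assuming the line is written y' = a + bx and using the standard least squares formulas for the slope b and intercept a (the formulas themselves are not shown in this module). The function name `regression_line` and the data are hypothetical.

```python
def regression_line(xs, ys):
    """Return (a, b) for the least squares line y' = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    a = (sum_y - b * sum_x) / n                     # y' intercept
    return a, b


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = regression_line(xs, ys)
print(f"y' = {a:.2f} + {b:.2f}x")
```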
Example:
1. Find the equation of the regression line for the data shown, and
graph the line on the scatter plot of the data.
The difference between the actual value y and the predicted value (that
is, the vertical distance) is called a residual or a predicted error.
Residuals are used to determine the line that best describes the
relationship between the two variables.
The method used for making the residuals as small as possible is
called the method of least squares. As a result of this method, the
regression line is also called the least squares regression line.
2. For each exercise, find the equation of the regression line and find
the y value for the specified x value. Remember that no regression
should be done when r is not significant.
a. The number of murders and robberies per 100,000
population for a random selection of states are shown. Find
y’ when x = 4.5 murders.
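To illustrate what "making the residuals as small as possible" means, the sketch below fits a line to hypothetical data, sums the squared residuals, and compares that sum with the sum for a slightly shifted line. It then predicts y' for a specified x. All names (`fit_line`, `sum_squared_residuals`) and data values are assumptions for illustration and do not come from the exercises.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least squares intercept a and slope b, via deviations from the means."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y' = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)

fitted = sum_squared_residuals(xs, ys, a, b)
shifted = sum_squared_residuals(xs, ys, a + 0.5, b)  # same slope, moved up
print(f"fitted line:  {fitted:.2f}")
print(f"shifted line: {shifted:.2f}")

x_new = 3.5  # a value inside the range of the observed x values
print(f"predicted y' at x = {x_new}: {a + b * x_new:.2f}")
```

Any other line gives a larger sum of squared residuals than the fitted one, which is what the least squares criterion requires.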
Deterministic Model
These data were obtained for the years 1993 through 1998 and
indicate the number of fireworks (in millions) used and the related
injuries. Predict the number of injuries if 100 million fireworks are used
during a given year.