
Basic Statistics

This document outlines a course on statistics. It covers key topics including introduction to statistics, data collection and presentation, summarizing data, the normal distribution, and inferential statistics. Statistics is defined as dealing with collecting, organizing, analyzing, and interpreting numerical data. Descriptive statistics describes data through measures of central tendency, variability, and graphs/tables. Inferential statistics uses samples to make inferences about populations through hypothesis testing, analysis of variance, chi-square tests, correlation, and regression analysis. Data can be collected through surveys, experiments, existing records, and observation and presented textually, in tables, or graphs.

Workbook in Statistics
Name:

Control no.:
COURSE OUTLINE
I. INTRODUCTION
  A. Definition of Statistics
  B. Uses of Statistics
  C. Division of Statistics
  D. Population Versus Sample
  E. Variable
    1. Types
    2. Levels of Measurement
II. DATA COLLECTION AND PRESENTATION
  A. Kinds of Data
  B. Methods of Data Collection
  C. Sampling Technique
  D. Methods of Data Presentation
    1. Textual
    2. Tabular
    3. Graphical
III. SUMMARIZING DATA
  A. Parameter versus Statistic
  B. Measures of Central Tendency
  C. Measures of Variability
  D. Coefficient of Variation
IV. THE NORMAL DISTRIBUTION
  A. Importance
  B. Properties
  C. Areas Under the Normal Curve
V. INFERENTIAL STATISTICS
  A. Test of Hypothesis
    1. Types of Statistical Hypothesis
    2. Types of Test of Hypothesis
    3. Types of Error
    4. Level of Significance
    5. Measures in Decision Making
  B. Application
    1. Parameter
      a. Test of Significance
        a.1 One-sample z- or t-test
        a.2 Two-sample
    2. ANOVA
      a. Functions
      b. Rationale
      c. ANOVA Calculations
    3. CHI-SQUARE TEST
      a. Types
      b. Two-by-two Contingency Table
      c. Limitations of Chi-Square
    4. CORRELATION AND LINEAR REGRESSION
      a. Difference between Correlation and Regression
      b. The Scatter Diagram
      c. Correlation Coefficient
      d. Regression Analysis
    5. NONPARAMETRIC METHOD
      a. Advantages and Disadvantages
      b. Spearman Rank Order Correlation
I. INTRODUCTION
A. STATISTICS – deals with the collection, organization, presentation, analysis, and
interpretation of data obtained by conducting surveys and experiments.
- Also refers to the numerical facts themselves.

B. Uses of Statistics
- Essential tool in education, government, business, economics, medicine, psychology,
sociology, sports and others.
- To describe and draw inferences about the numerical properties of a population.
- To make sound, correct decisions based on data (how and where).
- Used by instructors in constructing and analyzing tests and in preparing grades.

C. Division of Statistics
1. Descriptive Statistics – consists of methods for organizing, displaying, and
describing data by using tables, graphs, and summary measures.
- Aims to give information about a large group of data without dealing with each and
every element of the group.
Tools: measures of central tendency, measures of variability, skewness, and kurtosis.
2. Inferential Statistics – consists of methods that use sample results to help
make decisions or predictions about a population.
- Aims to give information about a large group of data without dealing with each and
every element of the group.
- It uses only a small portion of the total set of data to draw conclusions.
Tools: testing of hypothesis using t-test, z-test, simple linear correlation, analysis of
variance, chi-square, regression analysis and time series analysis

D. Population vs. Sample


Population – refers to the complete set of elements (individuals, objects, places, or
events) that we are interested in.
- Values that describe a population are called parameters.

Sample – part/subset/representative of population that will be studied in detail.


- Values that describe a sample are called statistics.
E. Variables
Refers to a factor, property, attribute, characteristic, or behavior that differentiates one
group of persons, things, events, conditions, or approaches from another. A variable may
take descriptive (qualitative) or numerical (quantitative) values.

Kinds of variable
1. Independent – the presumed cause of the change that happens in the dependent variable.
2. Dependent – the presumed effect of a change in the independent variable.
Types of variables
1. Qualitative – expressed in kinds or categories.
- These are nonmeasurable characteristics that cannot assume a numerical
value.
Ex. Gender, civil status
2. Quantitative – expressed in amounts or numbers.
a. Continuous – values are obtained by measurement.
Ex. Height, weight, and time in minutes
b. Discrete – a variable whose values are countable. It can assume only certain
values, with no intermediate values.
Ex. Number of students in a Statistics class, number of subjects a student can enroll in
per semester.

F. Levels of Measurement/Measurement Scale (Stevens, 1946, 1958, 1968)

1. Nominal – numbers may serve as labels to identify items or classes.


- Each item is assigned to one and only one category based on specific criteria. This is the
most primitive, purely qualitative scale.
Example: gender, civil status, degree earned, location of residence, numbers
carried on the backs of athletes.

2. Ordinal Scale – tells whether one person, object, or aspect is greater than or less
than another, but does not tell how large the difference is.
- It orders the category of variables by ranking them.
Example: performance rating, prize won, I.Q. level

3. Interval Scale – provides numbers whose differences among items or classes are meaningful (equal intervals), but has no true zero.
Example: test score, I.Q. scores, performance score, Fahrenheit, Celsius, time,
etc.

4. Ratio Scale – the only scale that has a true zero, i.e., a fixed point of origin.
Example: Kelvin, height, weight, length, width, loudness, etc.
II. Data Collection and Presentation

Planning the study


1. Determine the existing problem
2. Come up with a title (SMART)
3. Assess resources (respondents, time, money, reading materials)
4. Determine the sample size needed (Slovin's formula: n = N / (1 + Ne²), where N is the
population size and e is the margin of error)
5. Prepare the questionnaires
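Step 4 can be checked with a few lines of Python. This is a minimal sketch, not the workbook's own procedure. The function name `slovin_sample_size` is hypothetical. Rounding the result up to the next whole respondent is an assumption; some texts round to the nearest whole number instead.

```python
import math


def slovin_sample_size(population_size, margin_of_error):
    """Slovin's formula n = N / (1 + N * e^2), rounded up to a whole respondent (assumption)."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)


# Illustration: a population of 2,200 with a 5% margin of error.
print(slovin_sample_size(2200, 0.05))  # 339
```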

A. Sources of Data
1. Documentary Sources – data contained in published or unpublished reports,
statistics, documents, manuscript, letter, etc.
a. Primary source – data gathered originally, first hand data
b. Secondary source – data obtained secondhand from original sources
2. Field Sources – living persons who have sufficient knowledge about social
conditions or who have been in intimate contact with the subjects over a considerable
period of time.

B. Methods of Collecting Data


1. Direct/Interview method – process of asking questions.
Characteristics: consistent, more precise
2. Indirect/Questionnaire method – easiest method of data gathering
3. Experiment – done by conducting scientific inquiry
4. Registration – utilizing existing records in compliance with the law, rules,
regulations, decrees and standard practices.
5. Observation – can be done directly or indirectly.
6. Telephone interview – utilized when questions to be asked are brief and few.

C. Sampling Technique
1. Non-Probability sampling/Non-Random sampling – there is no way of
estimating the probability that each individual or element will be included in sample.
Types
a. Accidental or incidental sampling
b. Quota sampling – the proportions of the various subgroups in the population are
determined, and the sample is drawn to have the same percentages.
c. Purposive sampling – based on certain criteria laid down by the researcher.
d. Convenience sampling
2. Probability sampling/Random sampling – every individual has an equal chance of
being selected in the sample before the selection is done.
Types
a. Simple random sampling – the items are picked for the sample at random.
b. Systematic random sampling – the items are chosen from the population at
uniform intervals (e.g., every kth item on a list).
c. Cluster sampling – used when the population is spread out geographically; whole
groups (clusters) are selected.
d. Stratified random sampling – this procedure divides the population into
subgroups called strata and draws a sample from each stratum.

Course      Frequency (f)       Relative Frequency   Percentage (%)   Sample (nc)
            (No. of Students)   rf = f/∑f            % = rf × 100     nc = rf × n
BSN         150
BSPT        60
BSRT        500
DEBTECH     50
BSPHARMA    75
BSBA        200
BSAC        60
BSCA        400
BSED        52
BSCRIM      180
BSHIM       250
BSIT        200
BSSW        23
TOTAL       N = ∑f = 2,200
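The empty columns of the table above follow from rf = f/∑f and nc = rf × n. The sketch below fills them in with Python. It assumes n comes from Slovin's formula with e = 0.05, rounded up; the table itself does not state n. The helper `proportional_allocation` is a hypothetical name. Rounded stratum sizes may not add up exactly to n, so a final adjustment by hand may be needed.

```python
import math

# Course enrolment (f) copied from the table above.
enrolment = {
    "BSN": 150, "BSPT": 60, "BSRT": 500, "DEBTECH": 50, "BSPHARMA": 75,
    "BSBA": 200, "BSAC": 60, "BSCA": 400, "BSED": 52, "BSCRIM": 180,
    "BSHIM": 250, "BSIT": 200, "BSSW": 23,
}


def proportional_allocation(strata, n):
    """Return {stratum: (rf, percentage, nc)} with rf = f / sum(f) and nc = rf * n."""
    total = sum(strata.values())
    rows = {}
    for name, f in strata.items():
        rf = f / total
        rows[name] = (rf, rf * 100, rf * n)
    return rows


N = sum(enrolment.values())              # 2,200
n = math.ceil(N / (1 + N * 0.05 ** 2))   # assumed: Slovin with e = 0.05, rounded up
for course, (rf, pct, nc) in proportional_allocation(enrolment, n).items():
    print(f"{course:<9} rf = {rf:.4f}   {pct:6.2f}%   nc = {nc:6.2f}")
```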

D. Methods of Data Presentation


1. Textual – data is presented in paragraph form. A weak means of presenting
quantitative comparisons.
2. Tabular – data is presented in rows and columns. An effective, easily understood way of
presenting relationships and comparisons.
3. Semi-tabular – combines the textual and tabular methods. Employed when there are only a
few figures.
4. Graphical – data is presented in visual form. The most effective method of presenting
statistical results and findings.
a. Frequency polygon (Xm vs. f)
b. Histogram – bars are placed adjacent to each other (CB vs. f)
c. Cumulative frequency polygon (ogive curve) – plots the cumulative frequencies of classes
less than or greater than a given value (Xm vs. <cf or >cf)
Definition of terms
Class Interval (CI) – the range from the lowest to the highest value that can be entered in each class.
Lower Class Limit (LCL) – the lowest value that can go in each class.
Upper Class Limit (UCL) – the highest value that can go in each class.
Frequency (f) – the number of values that fall in a given interval.
Midpoint or Class Mark (Xm) – the value that summarizes an interval, midway between its
lower and upper limits: Xm = (LCL + UCL)/2.
Class Boundary (CB) – the true limits of a class, found midway between the upper limit of
one class and the lower limit of the next.
Class Size (i) – the number of possible score values from the lower limit to the upper limit of a class.
Frequency Distribution Table (FDT) – refers to the tabular arrangement of data by classes
or categories with their corresponding class frequencies.

Example
Suppose a Statistics class with 60 students was given a 100-item examination, and the results are:
Table 1: Test Scores Obtained by the Sixty Students in the Statistics Class
45 73 57 69 68 35 70 62 47 60
46 70 49 45 53 60 39 65 38 59
50 69 62 35 58 69 45 28 58 65
38 49 28 36 41 58 37 35 61 48
36 51 59 55 60 37 55 59 57 36
70 36 50 63 68 30 56 70 53 57

Table 2: The Frequency Distribution of the Examination Results of Sixty Students in a
Statistics Class (Quantitative Data)
Exam Score (CI)   No. of Students (f)   Xm   CB   <cf   >cf   fXm
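One way to fill Table 2 from the Table 1 scores is sketched below in Python. The workbook does not fix the classes, so the class size i = 5 and the starting limit 25 are assumptions. The helper `frequency_distribution` is hypothetical. Classes are listed from lowest to highest. Because the scores are whole numbers, the class boundaries are taken 0.5 beyond the limits.

```python
scores = [
    45, 73, 57, 69, 68, 35, 70, 62, 47, 60,
    46, 70, 49, 45, 53, 60, 39, 65, 38, 59,
    50, 69, 62, 35, 58, 69, 45, 28, 58, 65,
    38, 49, 28, 36, 41, 58, 37, 35, 61, 48,
    36, 51, 59, 55, 60, 37, 55, 59, 57, 36,
    70, 36, 50, 63, 68, 30, 56, 70, 53, 57,
]


def frequency_distribution(data, start, width):
    """Rows of (CI, f, Xm, CB, <cf, >cf, fXm) for whole-number data, lowest class first."""
    classes = []
    lower = start
    while lower <= max(data):
        upper = lower + width - 1                   # class limits (LCL, UCL)
        f = sum(lower <= x <= upper for x in data)  # class frequency
        classes.append((lower, upper, f))
        lower += width
    total = sum(f for _, _, f in classes)
    rows, less_than = [], 0
    for lower, upper, f in classes:
        greater_than = total - less_than            # >cf: scores in this class or above
        less_than += f                              # <cf: scores in this class or below
        xm = (lower + upper) / 2                    # class mark
        cb = (lower - 0.5, upper + 0.5)             # class boundaries
        rows.append(((lower, upper), f, xm, cb, less_than, greater_than, f * xm))
    return rows


# Assumed grouping: class size i = 5, first class 25-29.
for row in frequency_distribution(scores, start=25, width=5):
    print(row)
```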
Table 3: The Frequency Distribution of the Sixty Students in Statistics Grouped
According to Gender (Qualitative Data)
Gender    No. of Students (f)   rf = f/∑f   % = rf × 100
Male      15
Female    45
Total     60
Name: Score:
Year & Section: Date:

Exercise 1.1

1. The data below show the frequency distribution of the religions of all the students of
Pala High School. Show how many samples will be taken from each of the following
categories using stratified random sampling. Use a 5% level of significance.
Religion Frequency N
Roman Catholic 355
Iglesia ni Cristo 250
Jehovah’s Witnesses 245
Others 150
Total

2. The data below show the number of clients/patients per day in a certain hospital in
Tuguegarao City:
30 34 36 48 44 43
44 43 33 30 33 46
33 51 34 35 35 49
49 41 45 34 30 34
32 24 33 33 29 48

a. Construct a frequency distribution


b. Construct frequency polygon.
c. Construct histogram.
d. Construct a greater-than cumulative frequency polygon (ogive).
e. Construct a less-than cumulative frequency polygon (ogive).
III. Summarizing Data
A. Measures of Central Tendency
1. Mean (X̄) – the most commonly used average; the most accurate measure for describing quantitative data.
- Characteristics: sensitive to extreme values
- Many statistics are based on the mean.
- Use when the data is either interval or ratio and when distributions are symmetrical
or approximately the shape of the normal curve.
- The sum of distance of the measures from the mean on one side is equal to the sum
of the distance on the other side.
Consider: a. 15, 15, 16, 18, 21
b. The ages of 10 students in a certain class were taken and shown below:
15, 18, 17, 24, 18,16,17,20,21,19.
c. The daily sales of Mercury Drug for the first seven days of a certain month
are P5,186, P1,826, P2,580, P5,186, P4,650, P3,635, and P8,625. Determine the
mean daily sales of the store for the first seven days.

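Ungrouped data:
$$\bar{X} = \frac{\sum X}{n}$$
where ∑X = sum of the scores or measurements and n = number of cases.

Weighted mean:
$$\bar{X} = \frac{\sum fX}{\sum f}$$
where f = frequency (weight) of each value X.

Grouped data:
$$\bar{X} = \frac{\sum f X_m}{\sum f}$$
where f = frequency and Xm = class mark or midpoint.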
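The three formulas above reduce to the same arithmetic, as this minimal Python sketch shows. The function names are illustrative only. The data are examples a, b, and c from the mean section, with the sales in pesos.

```python
def mean(values):
    """Ungrouped data: sum of X divided by n."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Weighted mean: sum of f*X divided by sum of f."""
    return sum(f * x for x, f in zip(values, weights)) / sum(weights)


def grouped_mean(class_marks, frequencies):
    """Grouped data: same arithmetic as the weighted mean, with class marks Xm as the values."""
    return weighted_mean(class_marks, frequencies)


ages_a = [15, 15, 16, 18, 21]
ages_b = [15, 18, 17, 24, 18, 16, 17, 20, 21, 19]
sales_c = [5186, 1826, 2580, 5186, 4650, 3635, 8625]   # daily sales in pesos

print(mean(ages_a))                                   # 17.0
print(mean(ages_b))                                   # 18.5
print(round(mean(sales_c), 2))                        # 4526.86
# Example a written as a frequency table (15 occurs twice) gives the same mean.
print(weighted_mean([15, 16, 18, 21], [2, 1, 1, 1]))  # 17.0
```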

2. Median (X̃) – a positional measure; not affected by extreme values.
- The point that divides a distribution in half: 50% of the scores fall below it.
- Used when the distribution departs from normal and when the data are ordinal.
Grouped data:
$$Md = LCB + \left[\frac{n/2 - cf}{f}\right] i$$
where LCB = lower class boundary of the median class,
cf = cumulative frequency before the median class,
f = frequency of the median class,
i = class size.
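The grouped-median formula can be sketched in Python as below. The input grouping is an assumption: the Table 1 scores counted into classes of size 5 starting at 25, which the workbook does not itself prescribe. The function name `grouped_median` is hypothetical.

```python
def grouped_median(classes):
    """Md = LCB + ((n/2 - cf) / f) * i for classes given as (LCL, UCL, f), lowest first.

    cf is the cumulative frequency before the median class; i is the class size.
    Assumes whole-number limits, so each class boundary is 0.5 beyond its limit.
    """
    n = sum(f for _, _, f in classes)
    half = n / 2
    cf = 0
    for lower, upper, f in classes:
        if cf + f >= half:              # first class whose <cf reaches n/2
            lcb = lower - 0.5           # lower class boundary
            i = upper - lower + 1       # class size
            return lcb + (half - cf) / f * i
        cf += f
    raise ValueError("empty distribution")


# Table 1 scores counted into classes of size 5 starting at 25 (assumed grouping).
table1 = [(25, 29, 2), (30, 34, 1), (35, 39, 12), (40, 44, 1), (45, 49, 8),
          (50, 54, 5), (55, 59, 12), (60, 64, 7), (65, 69, 7), (70, 74, 5)]
print(round(grouped_median(table1), 2))   # 54.92
```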
3. Mode (Mo) – the score or value that occurs most frequently in the distribution.
- The least reliable measure, but the most suitable for describing qualitative data.
- Used when the data are nominal.
Unimodal – a distribution that has only one mode
Bimodal – a distribution that has two modes
Multimodal – a distribution that has more than two modes

Grouped data:
$$Mo = LCB + \left[\frac{d_1}{d_1 + d_2}\right] i$$
where LCB = lower class boundary of the modal class,
d1 = difference between the frequency of the modal class and the frequency of the preceding class,
d2 = difference between the frequency of the modal class and the frequency of the succeeding class,
i = class size.
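A Python sketch of the grouped-mode formula follows. It assumes the same illustrative grouping of the Table 1 scores (classes of size 5 starting at 25). Under that grouping the distribution is bimodal, with two classes of f = 12, so the formula is applied to each modal class. The name `grouped_mode` is hypothetical. Treating a missing neighbour at either end as frequency 0 is an assumption.

```python
def grouped_mode(classes, k):
    """Mo = LCB + d1 / (d1 + d2) * i for the modal class at index k.

    d1 = f(modal) - f(preceding class); d2 = f(modal) - f(succeeding class).
    A missing neighbour at either end is treated as frequency 0 (assumption).
    """
    lower, upper, f = classes[k]
    f_before = classes[k - 1][2] if k > 0 else 0
    f_after = classes[k + 1][2] if k + 1 < len(classes) else 0
    d1, d2 = f - f_before, f - f_after
    lcb, i = lower - 0.5, upper - lower + 1
    return lcb + d1 / (d1 + d2) * i


# Table 1 scores counted into classes of size 5 starting at 25 (assumed grouping).
table1 = [(25, 29, 2), (30, 34, 1), (35, 39, 12), (40, 44, 1), (45, 49, 8),
          (50, 54, 5), (55, 59, 12), (60, 64, 7), (65, 69, 7), (70, 74, 5)]
top = max(f for _, _, f in table1)
modal = [k for k, (_, _, f) in enumerate(table1) if f == top]   # two modal classes here
for k in modal:
    print(table1[k][:2], round(grouped_mode(table1, k), 2))
```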
B. Measures of Variability
Measures of variation – values used to determine the scatter of values in a distribution.

1. Range (R) – the difference between the highest and the lowest value in
the distribution
- Easily determined; useful when the objective is to emphasize extreme variation.
- Disadvantages: always affected by extreme values; not all values are considered.
2. Mean Deviation (MD) – takes into account all the values in the distribution

Ungrouped data:
$$MD = \frac{\sum |x - \bar{X}|}{n}$$
where x = individual values and X̄ = mean of the distribution.

Grouped data:
$$MD = \frac{\sum f\,|X_m - \bar{X}|}{\sum f}$$
where Xm = midpoint of each class, X̄ = mean of the distribution, and f = frequency of each class.
3. Variance (s²)
- Most reliable because it can be treated mathematically and be used for deeper
analysis.
- The only measure of spread which can be used for statistical inferences.

Ungrouped:
$$s^2 = \frac{\sum (x - \bar{X})^2}{n}$$
where x = individual values and X̄ = mean of the distribution.

Grouped:
$$s^2 = \frac{\sum f (X_m - \bar{X})^2}{\sum f}$$
where Xm = midpoint of each class and X̄ = mean of the distribution.

4. Standard Deviation (s) – one of the most important measures of variation
- Obtained by extracting the square root of the variance.

Ungrouped:
$$s = \sqrt{\frac{\sum (x - \bar{X})^2}{n}}$$
where x = individual values and X̄ = mean of the distribution.

Grouped:
$$s = \sqrt{\frac{\sum f (X_m - \bar{X})^2}{\sum f}}$$
where Xm = midpoint of each class and X̄ = mean of the distribution.
5. Coefficient of Variation (CV)
$$CV = \frac{s}{\bar{X}}$$
(often multiplied by 100 and expressed as a percentage)
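The ungrouped measures of variability can be computed together, as in the minimal Python sketch below. It uses example a from the mean section. The function `dispersion` is a hypothetical name. Like the formulas above, it divides by n; the standard-library `statistics.pvariance` does the same, while `statistics.variance` divides by n - 1.

```python
import math
import statistics


def dispersion(values):
    """Range, mean deviation, variance and standard deviation (divisor n), and CV."""
    n = len(values)
    mean = sum(values) / n
    value_range = max(values) - min(values)
    mean_deviation = sum(abs(x - mean) for x in values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    sd = math.sqrt(variance)
    cv = sd / mean
    return value_range, mean_deviation, variance, sd, cv


data = [15, 15, 16, 18, 21]          # example a from the mean section
r, md, var, sd, cv = dispersion(data)
print(r, md, var, round(sd, 4), f"{cv:.2%}")
# statistics.pvariance also divides by n; statistics.variance divides by n - 1.
print(statistics.pvariance(data), statistics.variance(data))
```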
IV. THE NORMAL DISTRIBUTION

A. Importance

Abraham de Moivre (1667-1754) and Carl Friedrich Gauss (1777-1855)

- Mathematicians who contributed greatly to the development of the normal distribution.

Normal Distribution – also called the Gaussian distribution

- Is central in the study of statistics, as it is the basis for solving various types of
statistical problems.
- The distributions of variables such as grades of students, weights or heights of
persons, incomes of families, and IQs of children may be said to be approximately
normal.
[Figure: bell-shaped distribution of adult heights: few short, many of average height, few tall]

There are relatively few short adults, relatively few tall adults, and the height of most adults will
tend towards middle value between the shortest and the tallest.

Intervals:

A – Less than 4 ft.

B – 4 ft. to less than 4 ft. and 6 inches

C – 4 ft. and 6 inches to less than 5 ft.

D – 5 ft. to less than 5 ft. and 6 inches

E – 5 ft. and 6 inches to less than 6 ft.

F – 6 ft. to less than 6 ft. and 6 inches

G – Greater than 6 ft. and 6 inches


[Figure: frequency polygon of heights over intervals A to G: relatively few in the extreme intervals, relatively many in the middle]

The shape of the “smoothed-out” frequency polygon is called the normal curve, which represents a
normal distribution.

Skewed Distribution (non-normal)


Types:

1. Positively Skewed/Skewed to the right – a distribution that has a tail longer on the
right end.

Ex. Age at marriage; monthly incomes of families (most families have low monthly incomes)

2. Negatively Skewed/Skewed to the left – a distribution that has a longer tail on the left
end.

Ex. Most students in a class have high grades; most families have high incomes.

B. Properties of Normal Curve


1. It is symmetrical about the mean

2. The mean is equal to the median, which is also equal to the mode.

3. The tails or ends are asymptotic to the horizontal axis.

4. The total area under the normal curve is equal to 1 or 100%.

5. The normal curve area may be subdivided into at least three standard scores each to the left
and to the right of the vertical axis.
6. Along the horizontal axis, the distance from one integral standard score to the next integral
score is measured by the standard deviation.

7. Of the area under the normal curve and above the horizontal axis, 68.27% is within 1 standard
deviation of the mean, 95.45% is within 2 standard deviations of the mean, and 99.73% is
within 3 standard deviations of the mean.
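Property 7 can be checked numerically. For the standard normal curve, the area between z = -k and z = +k equals erf(k/√2), and Python's `math.erf` computes that directly. This is a minimal sketch; `area_within` is a hypothetical name.

```python
import math


def area_within(k):
    """Area under the standard normal curve between z = -k and z = +k."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, f"{area_within(k):.2%}")
```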

C. Areas under the Normal Curve

$$z = \frac{x - \bar{X}}{s}$$
Where: z = standard score
X̄ = mean
s = standard deviation
x = a given value of a particular variable

Ex. 1. Suppose that we have a distribution of test scores for which the following
statistics have been computed: the mean grade is 80 and the standard deviation is 16.
Find the z-score of each of the following grades:

A. x = 110

B. x = 77

C. x = 64

D. x = 120

E. x = 59

F. Find the grades of two students whose z-scores are -0.6 and 1.2, respectively.

2. Find the area under the normal curve from Z= 0 to z = 1.2

3. Find the area under the normal curve from Z= 0 to z = 1.25

4. Find the area under the normal curve from Z= -1.25 to z= 0

5. Find the area under the normal curve from Z= 1.5 to z = 0

6. Find the area under the normal curve from Z= 0.81 to z= 1.94

7. Find the area under the normal curve from Z= 0.01 to z= 1.2
8. Find the area under the normal curve from Z= 2.85 to z= 1.98

9. Find the area under the normal curve from Z= 1.27 to z= 0.02

10. Find the area under the normal curve from Z= 1.27 to z= 1.5

11. Find the area under the normal curve from Z= -2.4 to z= 2.5

12. Find the area to the right of Z= 1.52

13. Find P(z ≥ 1.52) and P(z ≤ 1.78)

14. Find the area to the left of Z = -2.70

15. Find P(z ≤ -1.52) and P(z ≥ 2.32)
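Readers without a printed z table can check the exercises above with Python's standard-library `statistics.NormalDist` (Python 3.8 or later). Its `cdf` method gives the area to the left of a value. This is a minimal sketch, and the helper names are hypothetical. Results can differ from a printed table in the fourth decimal place because table entries are rounded.

```python
from statistics import NormalDist

MEAN, SD = 80, 16          # Ex. 1: mean grade and standard deviation
Z = NormalDist()           # standard normal curve: mean 0, standard deviation 1


def z_score(x, mean=MEAN, sd=SD):
    """z = (x - mean) / s"""
    return (x - mean) / sd


def raw_score(z, mean=MEAN, sd=SD):
    """Inverse of the z formula: x = mean + z * s"""
    return mean + z * sd


def area_between(z1, z2):
    """Area under the standard normal curve between two z values, in either order."""
    lo, hi = sorted((z1, z2))
    return Z.cdf(hi) - Z.cdf(lo)


def area_right(z):
    return 1 - Z.cdf(z)


def area_left(z):
    return Z.cdf(z)


# Ex. 1, items A-E and F
print([z_score(x) for x in (110, 77, 64, 120, 59)])
print([round(raw_score(z), 1) for z in (-0.6, 1.2)])
# Items 2, 11, 12 and 14
print(round(area_between(0, 1.2), 4), round(area_between(-2.4, 2.5), 4))
print(round(area_right(1.52), 4), round(area_left(-2.70), 4))
```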

Applications:
1. The hourly wages of 500 skilled workers are found to approximate a normal
distribution. The average hourly wage is computed to be P100, with a standard
deviation of P10.

Find:
A. The percentage of workers whose hourly wages are:
a. from P100 to P110
b. from P100 to P110
c. from P80 to P90
B. The number of workers whose hourly wages are:
a. Greater than or equal to P115
b. from P75 to P118
c. Greater than or equal to P85
d. Less than or equal to P113
C. 1. The minimum hourly wage of the upper 10% of workers (that is, those with the highest
wages)
2. The maximum hourly wage of the lowest 20% of workers (that is, those with the lowest
wages)
2. One thousand skilled workers were given an examination to determine how much
they know about the job. If the scores are normally distributed and the score of one
worker, measured as a z-score, is 0.8, how many of the workers who took the examination
scored higher than or equal to this particular worker?
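The applications above can be worked with `statistics.NormalDist`, using `cdf` for areas and `inv_cdf` for the wage cut-offs in part C. This is a minimal sketch, not the workbook's table method. Rounding worker counts to whole workers is an assumption, and the variable names are illustrative.

```python
from statistics import NormalDist

wages = NormalDist(mu=100, sigma=10)   # hourly wages in pesos (Application 1)
workers = 500

pct_100_110 = wages.cdf(110) - wages.cdf(100)           # part A.a
count_ge_115 = workers * (1 - wages.cdf(115))           # part B.a
count_75_118 = workers * (wages.cdf(118) - wages.cdf(75))
min_upper_10 = wages.inv_cdf(0.90)    # wage exceeded only by the top 10%
max_lower_20 = wages.inv_cdf(0.20)    # wage at or below which the lowest 20% fall

# Application 2: 1,000 examinees; one worker's z-score is 0.8.
at_or_above = 1000 * (1 - NormalDist().cdf(0.8))

print(f"{pct_100_110:.2%}", round(count_ge_115), round(count_75_118))
print(round(min_upper_10, 2), round(max_lower_20, 2), round(at_or_above))
```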
Name: Score: __________

Year & Section: Date:

Exercise 1.3

1. Calculate the area under the standard normal curve.

a. P(z≤ 2.1)

b.P(z≤ -3.0)

c. P(z≥ 1.8)

d. P(z≥ 2.8)

2. Calculate the area under the normal curve with mean X̄ = 36 and standard deviation s = 5.

a. P(x≥28)

b. P(x≥42)

c. P(x≤38)

d. P(x≥39)

e. P(x≤45)

f. P(32≤ x≤40)

g. P(26≤ x≤35)

h. P(39≤ x≤42)

i. P(37≤ x ≤ 44)

j. P(47≤ x ≤48)
V. INFERENTIAL STATISTICS

A. TEST OF HYPOTHESIS
Statistical Hypothesis
- a conjecture about a population parameter.
- Statement/tentative theory which aims to explain facts about the real world.

2 KINDS OF HYPOTHESES

A. Null Hypothesis (H0)


- is a statistical hypothesis that states that there is no difference between a parameter and a
specific value, or that there is no difference between two parameters.
B. Alternative Hypothesis (Ha)
- is a statistical hypothesis that states a specific difference between a parameter and a specific
value, or that there is a difference between two parameters.

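The null and alternative hypotheses are stated together as shown:

Two-tailed test          Right-tailed test        Left-tailed test
Ho: parameter = value    Ho: parameter = value    Ho: parameter = value
Ha: parameter ≠ value    Ha: parameter > value    Ha: parameter < value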

Types of Tests
One-Tailed – when the rejection region is located at only one extreme of the range of values
for the test statistics
- > and < indicate the use of a one-tailed test in Ha
Two-Tailed – when the rejection region is located at both extremes (both tails) of the range
of values for the test statistic.
- ≠ indicates the use of a two-tailed test in Ha
Types of Error

               Ho is True          Ho is False
Reject Ho      Type I error        Correct decision
Accept Ho      Correct decision    Type II error

Level of Significance
- The probability of making a Type I (alpha) error in a test.
- The maximum value of the probability of rejecting the null hypothesis Ho
when it is in fact true.
- Commonly set at 0.05 (5%) or 0.01 (1%).
TESTING THE DIFFERENCE BETWEEN TWO MEANS

Z-test – used when the population standard deviation is known, or when the
population standard deviation is not known but the sample size is > 30.

t-test – used when the population standard deviation is not known and only the sample
standard deviation is available (sample size of 30 or less).

STEPS:
Test of Hypothesis:
1. State the hypothesis
Ho: There is no significant difference between items being compared.
Ha: There is a significant difference between items being compared.
2. Set the level of significance.
3. Determine the test to be used.
Use the z-test if the population S.D. is given.
Use the t-test if the sample S.D. is given.
4. Determine the tabular value for the test.
For the z-test, use Table C.
For the t-test, first compute the degrees of freedom; then look up Table D.
For single sample of t- test,
df =# of items – 1 = n-1.
For 2 samples of t – test,
df=n1 + n2 – 2

Where:
n1= refers to the # of items in the 1st sample
n2= refers to the # of items in the 2nd sample

5. Compute the z or t value.

Sample mean compared with population mean:
$$z = \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma} \qquad t = \frac{(\bar{X} - \mu)\sqrt{n}}{s}$$

Comparing two sample means:
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \qquad t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

Where:
X̄ – sample mean
µ – population mean
n – number of items within the sample
σ – population standard deviation (z-test)
s – sample standard deviation (t-test)
X̄1, X̄2 – means of the 1st and 2nd samples
n1, n2 – numbers of items in the 1st and 2nd samples
s1, s2 – standard deviations of the 1st and 2nd samples (t-test)

6. State your conclusion.
A. Reject Ho if |CV| > |TV| (computed value greater than tabular value).
B. Accept Ho if |CV| < |TV|.

CRITICAL VALUES OF Z AT VARYING SIGNIFICANCE LEVELS

Significance level     .10       .05       .025      .01
One-tailed test        ±1.28     ±1.645    ±1.96     ±2.33
Two-tailed test        ±1.645    ±1.96     ±2.24     ±2.58
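The critical values in the table can be regenerated from the inverse of the standard normal CDF. A one-tailed test puts all of α in one tail, so z = Φ⁻¹(1 − α). A two-tailed test splits α between the tails, so z = Φ⁻¹(1 − α/2). The sketch below uses Python's `statistics.NormalDist.inv_cdf` and is meant only as a check.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal distribution
print("alpha   one-tailed   two-tailed")
for alpha in (0.10, 0.05, 0.025, 0.01):
    one_tailed = Z.inv_cdf(1 - alpha)       # all of alpha in one tail
    two_tailed = Z.inv_cdf(1 - alpha / 2)   # alpha split equally between both tails
    print(f"{alpha:<7} {one_tailed:>10.3f} {two_tailed:>12.3f}")
```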

Problems /applications:
1. Data from a school census show that the mean weight of college students was 45 kilos,
with a standard deviation of 3 kilos. A sample of 200 college students was found to have
a mean weight of 47 kilos. Are the 200 college students really heavier than the rest,
using the .05 significance level?
2. A researcher knows that the average height of Filipino women is 1.525 meters. A
random sample of 26 women was taken and was found to have a mean height of 1.56
meters, with standard deviation of .10 meters. Is there a reason to believe that the 26
women in the sample are significantly smaller than the others at .05 significance level?
3. A researcher wishes to find out whether or not there is a significant difference between the
monthly allowances of morning and afternoon students in his school. By random sampling, he
took a sample of 239 students in the morning session. These students were found to have a
mean monthly allowance of P142.00. The researcher also took a sample of 209 students in the
afternoon session. They were found to have a mean allowance of P148.00. The total
population of students in that school has a standard deviation of P140. Is there a significant
difference between the two samples at the .01 level of significance?
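The z and t formulas of step 5 translate directly into Python, as in the minimal sketch below. The function names are hypothetical. The tabular values are taken from the critical-value table: 1.645 for Problem 1, which is right-tailed at .05, and 2.58 for Problem 3, which is two-tailed at .01. Problem 2 only gets its computed t here. Comparing it against Table D at df = 25 is left to the reader, since the standard library has no t distribution.

```python
import math


def z_one_sample(sample_mean, pop_mean, pop_sd, n):
    """z = (xbar - mu) * sqrt(n) / sigma"""
    return (sample_mean - pop_mean) * math.sqrt(n) / pop_sd


def z_two_sample(mean1, mean2, pop_sd, n1, n2):
    """z = (xbar1 - xbar2) / (sigma * sqrt(1/n1 + 1/n2))"""
    return (mean1 - mean2) / (pop_sd * math.sqrt(1 / n1 + 1 / n2))


def t_one_sample(sample_mean, pop_mean, sample_sd, n):
    """t = (xbar - mu) * sqrt(n) / s, with df = n - 1"""
    return (sample_mean - pop_mean) * math.sqrt(n) / sample_sd


def decide(computed, tabular):
    """Step 6: reject Ho when |CV| > |TV|."""
    return "reject Ho" if abs(computed) > abs(tabular) else "accept Ho"


z1 = z_one_sample(47, 45, 3, 200)              # Problem 1
print(round(z1, 2), decide(z1, 1.645))
t2 = t_one_sample(1.56, 1.525, 0.10, 26)       # Problem 2 (compare with Table D, df = 25)
print(round(t2, 2))
z3 = z_two_sample(142, 148, 140, 239, 209)     # Problem 3
print(round(z3, 2), decide(z3, 2.58))
```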
Name: Score:
Year & Section: Date:

EXERCISE 1.5

1. A researcher found out that the average age of Filipinos who get married is 21 years old. A
random sample of 50 married couples was taken and found to have an average age of 22.8,
with a standard deviation of 32. Is there reason to believe that the sample is significantly
older than the others, using a 1% level of significance?

2. The language cluster of MCNP-ISAP claims that the standard deviation of the latest English
Proficiency Exam of MCNP-ISAP is 4.0 and the mean score is 40. A random sample of 400
students from the school was taken and found to have a mean score of 44. At the 5% level of
significance, is there enough evidence to reject the claim?

3. JM Carpetech Inc. claims that the average cost of carpet installation and repairs is P 8,550. A
sample of 60 repairs has an average of P 8,600. The standard deviation of the sample is P 8,380.
At 5% level of significance, is there enough evidence to reject the company’s claim?

4. A researcher wishes to test whether or not the case method of teaching is more effective than
the traditional method. She picks two classes of approximately equal intelligence. She gathers a
sample of 20 students to whom she uses the case method and another sample of 21 students to
whom she uses the traditional method. After the experiment, an objective test revealed that the
first sample got a mean score of 29, while the second group got a mean score of 28.5. The
standard deviation of the population is 5.2. Based on the result of the administered test, can we
say that the case method is as effective as the traditional method?

Simple Analysis of Variance (ANOVA)


- Used to test the significance of the differences among more than two samples/groups simultaneously.
- Developed by R.A. Fisher (F-test).
- Data must be arranged so that the rows represent the items in each sample and the columns
represent the sample classifications (groups).

Assumptions:
1. The various groups are assumed to come from normal populations.
2. The variances of the different groups are assumed to be equal.
3. The random samples in the groups should be independent.

Steps:
1. State the hypothesis
Ho: there is no significant difference among the samples/group.
Ha: there is a significant difference among the samples/group.

2. Level of significance
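3. Compute the F value.
3a. Compute the sums of squares:
$$TSS = \sum x^2 - \frac{(\sum x)^2}{N}$$
$$SS_b = \frac{\sum (\text{sum of each column})^2}{r} - \frac{(\sum x)^2}{N}$$
$$SS_w = TSS - SS_b$$
where N = total number of observations and r = number of rows (items in each group).

3b. Compute the degrees of freedom:
dft = N - 1
dfb = k - 1 (k = number of groups)
dfw = dft - dfb

3c. Compute the mean sums of squares:
$$MSS_b = \frac{SS_b}{df_b} \qquad MSS_w = \frac{SS_w}{df_w}$$

3d. Compute the ANOVA (F) value:
$$F = \frac{MSS_b}{MSS_w}$$

4. Locate the tabular value in the F table using dfb (numerator) and dfw (denominator).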


5. CV vs TV,
CV<TV, Ho is accepted
CV > TV, Ho is rejected.

Problems:

1. The weights in kilograms of 3 groups of 5 members each are shown in the table below. Is there
unusual variation among the groups, using a 5% level of significance?

              First group   Second group   Third group
1st member    40            40             50
2nd member    50            50             55
3rd member    55            50             50
4th member    55            60             45
5th member    45            55             40

2. Below are the bowling scores of 5 groups of 4 players each. At the 2.5% level of significance,
find out if there is unusual variation among the groups.

Group A Group B Group C Group D Group E


Player A 98 100 87 90 98
Player B 90 90 92 95 95
Player C 95 90 90 96 90
Player D 100 85 90 97 80
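Steps 3a to 3d can be sketched in Python as below, applied to the weights in Problem 1. This minimal sketch assumes equal-sized groups, as the SSb formula (dividing by r) does. The function name `one_way_anova` is hypothetical. The resulting F is compared by hand with the F table at dfb and dfw.

```python
def one_way_anova(groups):
    """F ratio for equal-sized groups using the TSS / SSb / SSw formulas above."""
    k = len(groups)                  # number of groups (columns)
    r = len(groups[0])               # items per group (rows)
    all_x = [x for g in groups for x in g]
    N = len(all_x)
    correction = sum(all_x) ** 2 / N                       # (sum of x)^2 / N
    tss = sum(x * x for x in all_x) - correction
    ssb = sum(sum(g) ** 2 for g in groups) / r - correction
    ssw = tss - ssb
    dfb, dfw = k - 1, (N - 1) - (k - 1)
    f = (ssb / dfb) / (ssw / dfw)                          # MSSb / MSSw
    return {"TSS": tss, "SSb": ssb, "SSw": ssw, "dfb": dfb, "dfw": dfw, "F": f}


weights = [
    [40, 50, 55, 55, 45],   # first group
    [40, 50, 50, 60, 55],   # second group
    [50, 55, 50, 45, 40],   # third group
]
print(one_way_anova(weights))
```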
Name:

Year and Section:

Exercise 1.6

1. Aiza Grey, a manager of Elizza Manufacturing Company, wants to see whether the average time
(in minutes) it takes her employees to commute to work is different for three groups.
The data are shown here. At α = 0.05, can she conclude that there is a significant difference
among the means?

Managers Sales Office Clerks


30 10 26
20 10 20
18 15 10
20 10 45
12 25 32
35 25 30
30 45 23

2. Karl James of MND research Incorporated tests the lifetime (in hours) of four DVD disks.
The data are shown here. At α = 0.01, is there a difference in means?

Disk A Disk B Disk C Disk D


254 200 254 200
270 254 260 320
300 310 234 209
211 230 150 350
208 208 200 280
Analysis of Enumeration Data

Enumeration Data

Data expressed in the form of frequencies, which represent the number of items within specified
qualitative descriptions or categories.

2 Classifications
1. One-way classification – has one variable described by at least two categories.

Table 1

One-Way Classification of the Gender of 95 Students

Gender Frequency
Male 40
Female 55
total 95

Table 2

One-Way Classification of the IQ of 35 Students

High IQ Average IQ Low IQ total


12 15 8 35

Table 3

One-Way Classification on the Production Quality of 100 Products

Production Quality Number of Products


Non-defective 99
Defective 1
Total 100

2. Two-way classification – has two variables described by their respective categories.

- Best summarized and presented in a contingency table, which is made up of several rows
and columns: the rows represent the categories of one variable, and the columns
represent the categories of the other variable.
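This section does not spell out the statistic used on a contingency table. A common choice is Pearson's chi-square test of independence, sketched below in Python under that assumption. Each expected count is (row total × column total) / grand total, and df = (rows − 1)(columns − 1). No Yates continuity correction is applied. The function name `chi_square_independence` is hypothetical. The example input is the opinion table from Exercise 1.7, item 2.

```python
def chi_square_independence(table):
    """Chi-square statistic and df for an r x c contingency table of observed counts.

    Expected count for each cell = (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Exercise 1.7, item 2: Agree / Disagree by year level.
opinions = [[35, 68], [40, 75], [28, 80], [24, 86]]
chi2, df = chi_square_independence(opinions)
print(round(chi2, 3), df)   # compare with the chi-square table at df
```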

Name:

Year & Section:

Exercise 1.7

1. Test the hypothesis that IQ level is independent of educational attainment among 150
female respondents. Use the data below at the 0.05 level.

Secondary Tertiary Post Graduate


Low IQ 58 25 12
Average IQ 46 48 26
High IQ 52 41 66

2. A study was conducted to determine the opinion of college students regarding the increase
in tuition fees. The data below show the results of the survey. Use a 5% level of significance.

Agree Disagree
Freshmen 35 68
Sophomore 40 75
Juniors 28 80
Seniors 24 86
SIMPLE CORRELATION ANALYSIS

Correlation analysis – concerned with the relationship between the changes in two variables.

- Used to measure the degree of linear relationship or association between two variables.

Assumptions:

1. Linear relationship is present.


2. The variables are either interval or ratio.

3 DEGREES OF CORRELATION/RELATIONSHIP BETWEEN 2 VARIABLES

1. Perfect Correlation (+ & -) – on the scatter diagram, the points lie in a straight line. E.g.,
as the pressure ↑, the temperature also ↑ when the volume is constant.

2. Some Degree of Correlation (+ & -) – as one variable (the independent variable) ↑, the
other variable may rise or fall, although not in a straight-line fashion. E.g., weight vs. height;
average grade vs. IQ; GNP vs. aggregate investment.
3. No Correlation = points in the scatter diagram show no trend or direction at all.

Techniques/ Methods of Correlation

1. Pearson r (Pearson Product-Moment Correlation)

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$

Where: x = observed data for the independent variable
y = observed data for the dependent variable
n = size of the sample
r = degree of relationship between x & y

2. Spearman Rank / Spearman Rho

$$\rho = r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$
where: D = difference between the ranks of x & y
n = number of samples in ordered pairs
rho = degree of relationship between x & y

Interpretation: the degree of linear relationship is interpreted using the following ranges of
values.

1 (−1)                          Perfect Positive (Negative) Correlation
0.90 to 0.99 (−0.90 to −0.99)   Very High Positive (Negative) Correlation
0.70 to 0.89 (−0.70 to −0.89)   High Positive (Negative) Correlation
0.50 to 0.69 (−0.50 to −0.69)   Moderate Positive (Negative) Correlation
0.30 to 0.49 (−0.30 to −0.49)   Low Positive (Negative) Correlation
0.01 to 0.29 (−0.01 to −0.29)   Little, if any, Correlation
0                               No Correlation
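The interpretation ranges can be encoded as a simple lookup. This is a minimal sketch (the function name `interpret_r` is hypothetical), assuming r has been rounded to two decimal places; each lower bound is treated as inclusive, so a value that falls between rows, such as 0.895, is placed in the lower band (High).

```python
def interpret_r(r):
    """Verbal label for a correlation coefficient, following the interpretation ranges."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if r == 0:
        return "No correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        return f"Perfect {direction} correlation"
    if size >= 0.90:
        return f"Very high {direction} correlation"
    if size >= 0.70:
        return f"High {direction} correlation"
    if size >= 0.50:
        return f"Moderate {direction} correlation"
    if size >= 0.30:
        return f"Low {direction} correlation"
    return "Little, if any, correlation"
```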

Testing the significance of the correlation coefficient:

1. Ho: ρ = 0 (the population correlation is zero)
   Ha: ρ ≠ 0 (the population correlation is not zero)
2. α = ____
3. df = n − 2, TV (tabular value) = ____
4. Computed value:

$$t = r\sqrt{\frac{n-2}{1-r^2}}$$

5. Decision: reject Ho if |t| > TV.
   Conclusion: there exists a ____________ relationship between the two variables.
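Steps 3 to 5 can be sketched as below. The function names are hypothetical; the tabular value still has to be read from a t table for df = n − 2 at the chosen α (two-tailed). If SciPy happens to be available, `scipy.stats.t.ppf(1 - alpha / 2, df)` returns the same critical value, but the sketch does not depend on it.

```python
import math


def correlation_t(r, n):
    """Return (t, df) for testing Ho: rho = 0, with t = r * sqrt((n - 2) / (1 - r^2))."""
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if not -1 < r < 1:
        raise ValueError("t is undefined when |r| >= 1")
    df = n - 2
    return r * math.sqrt(df / (1 - r * r)), df


def is_significant(t, tabular_value):
    """Two-tailed decision: reject Ho when |t| exceeds the tabular value."""
    return abs(t) > tabular_value
```

For instance, the hypothetical values r = 0.5 and n = 11 give t = √3 ≈ 1.73 with df = 9.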


Example:
1. A study was made to determine the relationship between grades in Basic Statistics and
grades in Computer Literacy. A random sample of 10 computer students of ISAP was taken,
and the following grades were obtained. Is the relationship significant at the 1% level?
Use Pearson r.

Student no.          1    2    3    4    5    6    7    8    9    10
Basic Statistics     70   80   85   75   80   80   91   85   92   85
Computer Literacy    75   85   76   79   90   80   89   89   90   88

2. The following are the scores on the NSAT examination and the achievement grades of
15 students of a certain college. Determine the degree of relationship between the two
variables and the significance of the obtained r at the 0.05 level. Use Spearman rho.

Student no.   NSAT (x)   Achievement Grade (y)
1             60         85
2             95         91
3             80         92
4             75         80
5             80         90
6             72         80
7             85         90
8             72         88
9             86         92
10            95         93
11            88         93
12            85         80
13            85         87
14            85         85
15            90         90
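A sketch of the hand computations for Examples 1 and 2, using the Pearson and Spearman formulas and the t test for significance given in this section. For Example 1, Basic Statistics is taken as x and Computer Literacy as y (r is the same either way). The tabular values quoted are two-tailed critical values of t: 3.355 for df = 8 at α = 0.01, and 2.160 for df = 13 at α = 0.05.

Example 1 (n = 10, df = 8):

$$\sum x = 823,\quad \sum y = 841,\quad \sum xy = 69459,\quad \sum x^2 = 68145,\quad \sum y^2 = 71053$$

$$r = \frac{10(69459) - (823)(841)}{\sqrt{\left[10(68145) - (823)^2\right]\left[10(71053) - (841)^2\right]}} = \frac{2447}{\sqrt{(4121)(3249)}} \approx 0.6687$$

$$t = 0.6687\sqrt{\frac{10-2}{1-(0.6687)^2}} \approx 2.54$$

r ≈ 0.67 indicates a moderate positive correlation. Since t ≈ 2.54 < 3.355, Ho is not rejected: the relationship is not significant at the 1% level.

Example 2 (n = 15, df = 13). Each variable is ranked from lowest (1) to highest (15), tied values receiving the average of the ranks they occupy; the ranking direction does not affect rs as long as both variables are ranked the same way.

Rank of x: 1, 14.5, 5.5, 4, 5.5, 2.5, 8.5, 2.5, 11, 14.5, 12, 8.5, 8.5, 8.5, 13
Rank of y: 4.5, 11, 12.5, 2, 9, 2, 9, 7, 12.5, 14.5, 14.5, 2, 6, 4.5, 9
D: −3.5, 3.5, −7, 2, −3.5, 0.5, −0.5, −4.5, −1.5, 0, −2.5, 6.5, 2.5, 4, 4

$$\sum D^2 = 199.5,\qquad r_s = 1 - \frac{6(199.5)}{15(15^2 - 1)} = 1 - \frac{1197}{3360} \approx 0.6438$$

$$t = 0.6438\sqrt{\frac{15-2}{1-(0.6438)^2}} \approx 3.03$$

rs ≈ 0.64 indicates a moderate positive correlation. Since t ≈ 3.03 > 2.160, Ho is rejected: the relationship is significant at the 0.05 level.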
Name:                                   Score:
Year & Section:                         Date:

Exercise 1.8

1. Determine whether a relationship exists between the monthly income (x) and the monthly
expenditure (y) of a random sample of 10 families in a particular urban area. Using the data
below, test the significance of the r value at the 0.01 level. Use Spearman rank.

Family no.   Monthly Income (x)   Monthly Expenditure (y)
1            5600                 4700
2            3875                 3680
3            7200                 6500
4            4278                 3500
5            2695                 2588
6            5678                 5668
7            5325                 4256
8            3054                 3054
9            4487                 4074
10           6952                 6638

2. Find the relationship between the length and weight of 12 newborn babies in a certain
private hospital. Using the data below, test whether the relationship is significant at the
0.025 level.

Baby   Length (cm)   Weight (kg)
1      40.5          2.75
2      36.4          2.16
3      44.2          4.41
4      49.7          5.52
5      39.6          3.24
6      38.8          4.30
7      41.6          2.35
8      42.9          4.52
9      38.7          3.85
10     45.8          4.75
11     36.4          3.15
12     32.4          2.83

3. If a Pearson r value of -0.74 was computed from data on the width of the road and the
number of accidents, what interpretation could be deduced? If the sample size in this study
is 30, can we say that a real correlation exists between the two variables at the 0.01 level?

4. Which is more significant: a value of r = 0.84 from a sample of 15, or a value of r = 0.36
from a sample of 80?

REGRESSION ANALYSIS
- concerned with the problem of estimation, forecasting, and prediction
- literally, predicting the value of one variable by going back to (or regressing to) the values
of another, related variable
- makes it possible to estimate the value of the dependent variable corresponding to a given
value of the independent variable
- e.g., the weight of persons corresponding to specific heights; the job performance of an
applicant, from information available at the time of application; academic performance in
school, from scores on an intelligence test

Two ways to solve:
1. By graphing
2. By the regression formula

Trend line: a line drawn through a series of plotted points so that it approximates the general
direction of the points and passes through them.

1st method: Graphing

A correct trend line must fulfill the following conditions:

1. It approximates the general direction of the points.
2. It passes through the points.
3. The sum of the vertical distances (from the points to the trend line) of the points above
the line is approximately equal to the sum of the vertical distances of the points below
the line.

2nd method: Use the equation of the Least Squares Regression Line, or LSRL
(equation: Y = a + bX)
- The LSRL method reduces to finding the equation of the trend line, which in turn is found
by solving for a and b in the equation.
- Least squares means that the most accurate trend line that can be drawn is the one for which
the sum of the squares of the vertical distances of the points from the line is least, or
minimum.

FORMULAS:

$$a = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n(\sum X^2) - (\sum X)^2} \quad \text{or} \quad a = \bar{Y} - b\bar{X}$$

$$b = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}$$

Where:
∑Y = sum of the values of Y, the dependent variable
∑X = sum of the values of X, the independent variable
∑XY = sum of the column XY
∑X² = sum of the column X²
n = number of pairs of X and Y
X̄, Ȳ = means of X and Y
Example:

X     Y     XY     X²
1     1     1      1
3     2     6      9
4     4     16     16
6     4     24     36
8     5     40     64
9     7     63     81
11    8     88     121
14    9     126    196
∑X = 56   ∑Y = 40   ∑XY = 364   ∑X² = 524

n = 8

$$a = \frac{(40)(524) - (56)(364)}{8(524) - (56)^2} = \frac{576}{1056} = 0.545 \approx 0.54$$

$$b = \frac{8(364) - (56)(40)}{8(524) - (56)^2} = \frac{672}{1056} = 0.636 \approx 0.64$$

LSRL: Y = a + bX
Y = 0.54 + 0.64X   (X = 16, Y = ?)
Y = 0.54 + 0.64(16)
Y = 0.54 + 10.24
Y = 10.78
Y ≈ 10.8
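The least-squares computation in the example above can be checked with a short plain-Python sketch (the function names are hypothetical). It applies the same formulas for a and b, and it also confirms trend-line condition 3 from the graphing method: for the least-squares line, the vertical distances above and below the line balance exactly, so the residuals sum to zero.

```python
def least_squares_line(x, y):
    """Return (a, b) for the least-squares regression line Y = a + bX."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired samples of the same size (n >= 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denominator = n * sum_x2 - sum_x ** 2  # zero only if every X value is the same
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # equals mean(Y) - b * mean(X)
    return a, b


def predict(a, b, x_value):
    """Estimate Y for a given X from the fitted line."""
    return a + b * x_value


# Data from the worked example
X = [1, 3, 4, 6, 8, 9, 11, 14]
Y = [1, 2, 4, 4, 5, 7, 8, 9]
a, b = least_squares_line(X, Y)
print(f"Y = {a:.3f} + {b:.3f}X")                 # Y = 0.545 + 0.636X
print(f"Y at X = 16: {predict(a, b, 16):.2f}")   # 10.73 with unrounded a and b

# Trend-line condition 3: the residuals (vertical distances) sum to zero
residual_sum = sum(yi - predict(a, b, xi) for xi, yi in zip(X, Y))
```

Carried at full precision, the prediction at X = 16 is about 10.73; the 10.8 above comes from rounding a and b to two decimals before substituting.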
Name:
Year & Section:

Exercise 1.9

1. A researcher wants to know whether there is a relationship between the hours spent studying a
particular subject at home and the students' achievement in that subject. If a significant
relationship can be established, what prediction equation can be used to estimate achievement in
the subject from the number of hours spent studying it at home? Let α = 0.05. The following are
the results of the observation.

Student No.   Hours Spent (x)   Achievement Grade (y)
1             2.5               89
2             5.75              88
3             1.5               82
4             1.0               77
5             3.0               90
6             2.5               91
7             1.25              80
8             3.5               93
9             1.5               81
10            2.0               86

What would be the predicted achievement grade of a student who spent 3.25 hours studying the
subject?

2. Given the following data:

x    2    4    6    8    10   12
y    11   9    8    5    4    3

a. Estimate the value of y if x is 7, 10, 1, 2, 6, and 7.


3. The abstract reasoning scores and verbal ability scores of 15 students are as follows:

Student No.   Abstract Reasoning Score (x)   Verbal Ability Score (y)
1             35                             41
2             40                             57
3             45                             60
4             53                             68
5             38                             53
6             47                             59
7             60                             72
8             56                             54
9             65                             75
10            56                             50
11            38                             67
12            59                             43
13            44                             58
14            49                             87
15            70                             82

a. Find the equation of the regression line.
b. If the abstract reasoning score is 46, what would be the predicted verbal ability score?

4. The data below represent the midterm and final grades of 10 students during a particular
semester:

Student No.   Midterm (x)   Final (y)
1             83            86
2             87            91
3             79            80
4             77            80
5             85            88
6             78            82
7             94            94
8             87            85
9             84            86
10            81            85

a. Find the equation of the regression line.
b. Estimate the final grade of a student whose midterm grade is 92.
