
(LBOLYTC) Notes

Uploaded by mastersiops
Copyright © All Rights Reserved

MODULE 1: INTRODUCTION TO STATISTICS

STATISTICS
➢ a science that deals with collecting, organizing, presenting, analyzing, and interpreting data.

● PURPOSE OF STATISTICS
➔ provide information
➔ provide comparisons
➔ help discern relationships
➔ aid in decision making
➔ estimate unknown quantities
➔ justify claims or assertions
➔ predict future outcomes

BRANCHES OF STATISTICS
1. Descriptive Statistics
➢ consists of methods concerned with the collection, organization, summarization, and presentation of a set of data.
2. Inferential Statistics
➢ comprises the methods concerned with making predictions or inferences about an entire population based on information provided by a sample.

TERMS TO REMEMBER
● Population - consists of the totality of all the elements or entities about which you want to obtain information
● Sample - a subset of the population
● Census - the process of collecting information from the entire population
● Survey - the process of collecting information from a sample
● Parameter - a summary or numerical measure used to describe a population
● Statistic - a summary or numerical measure used to describe a sample
● Constant - a characteristic or property of a population or sample which makes the members similar to each other
● Variable - any characteristic or information that is measurable or observable on every element of the population or sample

QUALITATIVE (CATEGORICAL) VARIABLES
➢ variables that indicate what kind of a given characteristic an individual, object, or event possesses.

QUANTITATIVE (NUMERICAL) VARIABLES
➢ variables that indicate how much of a given characteristic an individual, object, or event possesses.
● TYPES OF QUANTITATIVE VARIABLE
○ Discrete Variables - values are obtained through counting
○ Continuous Variables - values are obtained through measuring
● ROLES OF VARIABLES IN A STUDY
○ Dependent Variable - a variable which is affected by another variable (IQ, test scores, etc.)
○ Independent Variable - a variable that affects the dependent variable (number of hours spent studying, etc.)

SCALES OF MEASUREMENT OF VARIABLES
1. Nominal (Categorical Scale)
➢ values are simply labels, names, or categories without any explicit or implicit ordering of the labels
➢ lowest level of measurement
2. Ordinal
➢ values are labels, names, or categories with an implied ordering among the labels
➢ ranking can be done on the data
➢ the distance between two labels cannot be determined
3. Interval
➢ values can be ordered, and the distance between any two labels is of known size
➢ always numeric, with no true zero point
4. Ratio
➢ values have all the properties of the interval scale, and the ratio of two values is meaningful
➢ has a true zero point
➢ highest level of measurement
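The following minimal Python sketch makes the parameter/statistic distinction concrete. The population of ages is randomly generated, hypothetical data. Computing from every element mimics a census and yields a parameter. Computing from a random subset mimics a survey and yields a statistic.

```python
import random
import statistics

# Hypothetical population: ages of 500 employees (randomly generated, illustrative only).
random.seed(1)
population = [random.randint(20, 60) for _ in range(500)]

# Census: information from every element gives a parameter (population mean, mu).
mu = statistics.mean(population)

# Survey: information from a subset gives a statistic (sample mean, x_bar).
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"parameter mu    = {mu:.2f}")
print(f"statistic x_bar = {x_bar:.2f}")
```

Drawing a different sample from the same population changes x_bar, while mu stays the same.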
MODULE 2: DATA PRESENTATION

DATA PRESENTATION
➢ numerical summaries focus on expected values; graphical summaries focus on unexpected values
➢ three forms: Textual, Tabular, Graphical

TEXTUAL
➢ data are presented in paragraph form
➢ involves enumerating important characteristics, emphasizing significant figures, and identifying the important features of the data

TABULAR - present data using tables

FREQUENCY DISTRIBUTION TABLE (FDT)
➢ tabular summary of data showing the frequency of items in each of several non-overlapping classes

GRAPHICAL - present data using graphs

ELEMENTS OF THE FDT
1. Class Size/Class Width
➢ difference between the upper (or lower) class limits of consecutive classes. All classes should have the same class width.
2. Lower Class Limit
➢ least value that can belong to a class
3. Upper Class Limit
➢ greatest value that can belong to a class
4. Class Boundaries (CB)
➢ numbers that separate classes without forming gaps between them
5. Class Mark/Midpoint (CM)
➢ middle value of each class
➢ the average of the upper and lower class limits
6. Relative Frequency (RF)
➢ obtained by dividing the frequency of the given class by the total number of observations
7. Cumulative Frequency (CF)
➢ number of data elements in that class and all previous classes (may be ascending or descending)
8. Less than CF (<CF)
➢ total number of observations whose values do not exceed the upper limit of the class (that class and all classes below it)
9. Greater than CF (>CF)
➢ total number of observations whose values are not less than the lower limit of the class (that class and all classes above it)

STEPS IN CONSTRUCTING FDT
1. Determine the range (R)
- difference between the highest value (HV) and lowest value (LV): R = HV – LV
2. Determine the number of classes (k)
3. Compute the class size (c)
4. Identify the class intervals (CI)
5. Identify the frequency in each CI by tallying
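The five steps can be sketched in Python as follows. This is a minimal sketch that makes three assumptions:

- The data are whole numbers, so class boundaries lie 0.5 beyond the class limits.
- The number of classes defaults to Sturges' rule, k = 1 + 3.322 log n. This is one common rule; the notes do not name one.
- The class size is rounded up from (R + 1)/k, so the highest value always lands in the last class.

The function name `build_fdt` and the scores are hypothetical.

```python
import math

def build_fdt(data, k=None):
    """Build FDT rows for whole-number data (sketch; see assumptions above)."""
    n = len(data)
    hv, lv = max(data), min(data)
    r = hv - lv                                       # Step 1: range R = HV - LV
    if k is None:
        k = math.ceil(1 + 3.322 * math.log10(n))      # Step 2: Sturges' rule (assumption)
    c = math.ceil((r + 1) / k)                        # Step 3: class size, rounded up
    rows = []
    less_cf = 0
    for i in range(k):                                # Step 4: class intervals
        lower = lv + i * c
        upper = lower + c - 1
        f = sum(lower <= x <= upper for x in data)    # Step 5: tallying
        greater_cf = n - less_cf                      # observations >= lower limit
        less_cf += f                                  # observations <= upper limit
        rows.append({
            "CI": (lower, upper),
            "f": f,
            "CB": (lower - 0.5, upper + 0.5),         # class boundaries
            "CM": (lower + upper) / 2,                # class mark
            "RF": f / n,                              # relative frequency
            "<CF": less_cf,
            ">CF": greater_cf,
        })
    return rows

scores = [12, 15, 21, 25, 28, 30, 33, 35, 38, 40, 41, 44, 47, 50, 52]  # hypothetical
table = build_fdt(scores)
for row in table:
    print(row["CI"], row["f"], row["CB"], row["CM"], round(row["RF"], 3), row["<CF"], row[">CF"])
```

Passing `k` explicitly overrides the Sturges default when a problem states the number of classes.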

● TYPES OF GRAPHS
1. Pie chart/circle graph - any data
2. Bar Graph
➢ Bar chart
➢ Histogram
3. Line Graph
➢ Frequency Polygon
➢ > Ogive, < Ogive

BAR CHART
➢ with gaps between bars = discrete
➢ x-axis: Class Interval (CI)
➢ y-axis: Frequency (f)

OGIVES
< Ogive
➢ x-axis: Upper Class Boundary (UCB)
➢ y-axis: < CF

> Ogive
➢ x-axis: Lower Class Boundary (LCB)
➢ y-axis: > CF

HISTOGRAM
➢ no gaps between bars = continuous
➢ x-axis: Class Boundary (CB)
➢ y-axis: Frequency (f)

FREQUENCY POLYGON
➢ continuous data
➢ x-axis: Class Mark (CM)
➢ y-axis: Frequency (f)
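As a rough sketch, the plotting points for each graph can be derived from an FDT. The class limits and frequencies below are hypothetical. The code assumes whole-number data, so boundaries are the limits ∓ 0.5. The resulting (x, y) pairs can be handed to any plotting library.

```python
# Hypothetical FDT (whole-number class limits and their frequencies).
limits = [(12, 20), (21, 29), (30, 38), (39, 47), (48, 56)]
freqs = [2, 3, 4, 4, 2]
n = sum(freqs)

# Histogram: each bar spans its class boundaries (no gaps), height = f.
histogram = [((lo - 0.5, hi + 0.5), f) for (lo, hi), f in zip(limits, freqs)]

# Frequency polygon: points (class mark, f).
polygon = [((lo + hi) / 2, f) for (lo, hi), f in zip(limits, freqs)]

# < ogive: points (upper class boundary, <CF).
less_ogive = []
running = 0
for (lo, hi), f in zip(limits, freqs):
    running += f
    less_ogive.append((hi + 0.5, running))

# > ogive: points (lower class boundary, >CF).
greater_ogive = []
remaining = n
for (lo, hi), f in zip(limits, freqs):
    greater_ogive.append((lo - 0.5, remaining))
    remaining -= f

for name, pts in [("polygon", polygon), ("<ogive", less_ogive), (">ogive", greater_ogive)]:
    print(name, pts)
```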
MODULE 3: DESCRIPTIVE MEASURES

I. Measures of Central Tendency
➢ describes the "center" of a given data set
➢ single value about which the observations tend to cluster

1. Mean (x̅)
➢ average of all observations
➢ $\bar{x} = \frac{\sum x}{n}$

2. Median (Md)
➢ middle value of an array
➢ when data has extreme values, the median is the preferred measure of central location
➢ most often reported for annual income and property value data
➢ a few extremely large incomes or property values can inflate the mean
➢ $Md = X_{\frac{n+1}{2}}$, if n is odd
➢ $Md = \frac{X_{\frac{n}{2}} + X_{\frac{n}{2}+1}}{2}$, if n is even

3. Mode (Mo)
➢ most frequent observation
➢ two modes, data are bimodal
➢ more than two modes, data are multimodal

                           Mean   Median   Mode
Exists                      ✓       ✓       -
Unique                      ✓       ✓       -
Sensitive (to extremes)     ✓       x       x

II. Measures of Variability/Dispersion
➢ describes the extent to which the data are dispersed
➢ how similar a set of scores are to each other
● More similar = lower dispersion
● Less similar = higher dispersion
● The more spread out a distribution is, the larger the measure of dispersion will be

1. Range (R)
➢ R = HV – LV
➢ rarely used in scientific work because it is fairly insensitive: it depends only on the two extreme values and ignores how the rest of the data are spread

2. Variance (s² or σ²)
➢ mean of the squared differences of the observations from their mean
➢ each difference is called a deviate or a deviation score
➢ deviate = how far a given score is from the typical, average score
➢ the deviate is a measure of dispersion for a given score
➢ $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, for samples
➢ $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$, for populations
➢ variance = (standard deviation)²

3. Standard Deviation (s or σ)
➢ positive square root of the variance
➢ standard deviation = $\sqrt{\text{variance}}$

4. Coefficient of Variation (CV)
➢ ratio of the standard deviation to the mean, expressed in percent
➢ used to compare the variability of two populations that are expressed in different units of measurement
➢ expressed as a percentage rather than in terms of the units of the particular data
➢ $CV = \frac{s}{\bar{x}} \times 100$
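The formulas above translate directly into Python. The sketch below applies them to a small hypothetical sample and cross-checks the results against the standard library's `statistics` module. That module provides `statistics.median`, `statistics.variance` (the n − 1 sample variance) and `statistics.pvariance` (the N population variance).

```python
import math
import statistics
from collections import Counter

data = [4, 8, 6, 5, 3, 8, 9, 7]  # hypothetical sample
n = len(data)

# Mean: sum of observations divided by n.
mean = sum(data) / n

# Median: middle value of the sorted array (formula positions shifted to 0-based indexes).
arr = sorted(data)
if n % 2 == 1:
    median = arr[(n + 1) // 2 - 1]                  # X_{(n+1)/2}
else:
    median = (arr[n // 2 - 1] + arr[n // 2]) / 2    # average of X_{n/2} and X_{n/2+1}

# Mode(s): every value tied for the highest count (two = bimodal, more = multimodal).
counts = Counter(data)
top = max(counts.values())
modes = [v for v, c in counts.items() if c == top]

# Dispersion measures.
data_range = max(data) - min(data)                       # R = HV - LV
s2 = sum((x - mean) ** 2 for x in data) / (n - 1)        # sample variance
sigma2 = sum((x - mean) ** 2 for x in data) / n          # population variance (data treated as the population)
s = math.sqrt(s2)                                        # standard deviation
cv = s / mean * 100                                      # coefficient of variation, in percent

print(mean, median, modes, data_range, s2, round(s, 4), round(cv, 2))
```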

III. Measures of Position
➢ measures that discriminate a group of scores from another group in the same data set

Quantiles - divide data into equal parts
1. Quartiles - divide the data into four equal parts (Q1, Q2, Q3)
2. Deciles - divide the data into ten equal parts (D1, D2, …, D9)
3. Percentiles - divide the data into 100 equal parts (P1, …, P99)

Steps on computing for the Percentile
1. Arrange the data in ascending order
2. Compute the index i, the position of the pth percentile: $i = \left(\frac{p}{100}\right) n$
3. If i is not an integer, round up. The pth percentile is the value in the ith position
4. If i is an integer, the pth percentile is the average of the values in positions i and i+1.

Ungrouped Measures of Position
➢ position of $P_k = \frac{k(n+1)}{100}$; use it to locate the percentile
○ if the position is not exact (w/ decimals), do interpolation

Interpolation
1. Get the two consecutive terms between which the quantile's position falls
2. Subtract the two terms
3. Multiply the difference between the two terms by the decimal part of the position
4. Add the product to the lesser of the two terms

IV. Measures of Skewness
➢ measure of symmetry in the distribution of scores
➢ $Sk = \frac{3(\text{mean} - \text{median})}{s}$
● Skewness
➢ Sk < 0, negative skew, to the left
➢ Sk > 0, positive skew, to the right
➢ Sk = 0, symmetrical

V. Measures of Kurtosis
➢ measures whether the scores are spread out more or less than they would be in a normal (Gaussian) distribution
➢ $K = \frac{\sum (x - \bar{x})^4}{n s^4}$, note that $s^4 = (s^2)^2$
● Kurtosis
➢ K = 3, mesokurtic
➢ K > 3, leptokurtic
○ seen when more of the data cluster near the mean, giving a sharper peak and heavier tails
➢ K < 3, platykurtic
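Below is a minimal Python sketch of the two percentile procedures, the skewness formula and the kurtosis formula above. It makes several assumptions:

- `percentile_by_index` expects 0 < p < 100.
- `percentile_interpolated` clamps positions outside 1..n to the smallest or largest value.
- Kurtosis uses the notes' formula with the sample s. Other texts and software use different divisors, so their values may differ slightly.

The function names and data are hypothetical.

```python
import math
import statistics

def percentile_by_index(data, p):
    """pth percentile via i = (p/100)n, assuming 0 < p < 100."""
    arr = sorted(data)                        # Step 1: ascending order
    n = len(arr)
    if (p * n) % 100:                         # Step 3: i is not an integer
        i = math.ceil(p * n / 100)            # round up
        return arr[i - 1]                     # value in the ith (1-based) position
    i = p * n // 100                          # Step 4: i is an integer
    return (arr[i - 1] + arr[i]) / 2          # average of positions i and i+1

def percentile_interpolated(data, k):
    """P_k located at position k(n+1)/100, interpolating a decimal position."""
    arr = sorted(data)
    n = len(arr)
    pos = k * (n + 1) / 100
    lower = math.floor(pos)
    if lower < 1:                             # assumption: clamp to the smallest value
        return arr[0]
    if lower >= n:                            # assumption: clamp to the largest value
        return arr[-1]
    lo_val, hi_val = arr[lower - 1], arr[lower]     # the two consecutive terms
    return lo_val + (pos - lower) * (hi_val - lo_val)

def skewness(data):
    """Sk = 3(mean - median) / s, with s the sample standard deviation."""
    return 3 * (statistics.mean(data) - statistics.median(data)) / statistics.stdev(data)

def kurtosis(data):
    """K = sum((x - mean)^4) / (n s^4), using s^4 = (s^2)^2."""
    m = statistics.mean(data)
    s2 = statistics.variance(data)
    return sum((x - m) ** 4 for x in data) / (len(data) * s2 ** 2)

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # hypothetical
print(percentile_by_index(scores, 25))       # i = 2.5, rounded up to position 3
print(percentile_interpolated(scores, 25))   # position 2.75, between 20 and 30
print(skewness([1, 2, 2, 3, 3, 3, 4, 10]))   # mean > median, so Sk > 0
print(kurtosis(scores))                      # evenly spread scores, so K < 3
```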
MODULE 4: SAMPLING TECHNIQUES

Population
➢ a set which includes all measurements of interest to the researcher (the collection of all responses, measurements, or counts that are of interest)

Sample
➢ a subset of the population

Why sampling?
➔ it is often impossible to study the whole population
➔ manageability of data
➔ economic reasons (lower cost)
➔ less time and effort

TYPES OF SAMPLING
● Probability Sampling
➢ each member of the population is given an equal chance or opportunity of being included in the sample
1. requires a complete sampling frame
2. selects a random sample from the population
3. results from a random sample can be generalized
4. can be more expensive and time consuming
● Non-probability Sampling
➢ each member of the population does not have an equal chance or opportunity of being included in the sample
1. used when there isn't an exhaustive population list available
2. not random
3. can be effective when trying to generate ideas and get feedback
4. more convenient and less costly
○ Convenience Sampling
➢ researcher uses subjects that are readily available or includes only people who are easy to reach
➢ Ex. using student volunteers as subjects for the research
○ Purposive Sampling
➢ researcher looks for predefined groups that will serve as samples
➢ Ex. a researcher wants to know what it takes to graduate summa cum laude in college → for first-hand advice, the sample = summa cum laude graduates

PROBABILITY SAMPLING
● Simple Random Sampling (SRS)
➢ all members of the population have an equal chance of being included in the sample
➢ fish bowl method
● Stratified Sampling
➢ the population is subdivided into several smaller groups (or strata), and SRS is then applied to get samples from each stratum
● Cluster Sampling
➢ uses randomly chosen clusters (groups) instead of individuals
● Systematic Sampling
➢ selects every kth member of the population, with the starting point determined at random
● Multi Stage Sampling

DETERMINING SAMPLE SIZE
● Slovin's formula
○ $n = \frac{N}{1 + Ne^2}$, where N = population size and e = margin of error
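The sketch below shows Slovin's formula and three of the probability sampling techniques in Python, using the standard `random` module. It makes these assumptions:

- Slovin's n is rounded up to a whole respondent.
- Stratified sampling uses proportional allocation (the same fraction from every stratum).
- The population IDs and strata are made up.

The helper names are hypothetical.

```python
import math
import random

def slovin(N, e):
    """Slovin's formula n = N / (1 + N e^2), rounded up (assumption)."""
    return math.ceil(N / (1 + N * e ** 2))

def simple_random_sample(population, n, rng):
    # "Fish bowl": every member has the same chance of being drawn.
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    # Every kth member, starting at a randomly chosen position among the first k.
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

def stratified_sample(strata, fraction, rng):
    # SRS inside each stratum; proportional allocation is an assumption here.
    return {name: rng.sample(members, max(1, round(len(members) * fraction)))
            for name, members in strata.items()}

rng = random.Random(42)
population = list(range(1, 1001))             # hypothetical IDs 1..1000
n = slovin(len(population), 0.05)             # 1000 / (1 + 1000 * 0.05^2)
srs = simple_random_sample(population, n, rng)
systematic = systematic_sample(population, n, rng)
strata = {"Grade 11": list(range(1, 601)), "Grade 12": list(range(601, 1001))}
stratified = stratified_sample(strata, 0.1, rng)
print(n, len(srs), systematic[:5], {k: len(v) for k, v in stratified.items()})
```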
MODULE 5: HYPOTHESIS TESTING

What is a hypothesis?
➢ an assumption about the population parameter
➢ an educated guess about the population parameter

Statistical Hypotheses
➢ guess or prediction made by the researcher regarding the possible outcome of the study

Types of Statistical Hypotheses
1. Null hypothesis (Ho)
● always hoped to be rejected
● always contains "=" sign
2. Alternative hypothesis (Ha)
● challenges Ho
● never contains "=" sign
● uses "<", ">", or "≠"
● generally represents the idea which the researcher wants to prove (researcher's hypothesis)

Hypothesis Testing
➢ process of making an inference or generalization on population parameters based on the results of the study on samples
➢ deciding between what is reality and coincidence

Steps in Hypothesis Testing
1. Formulate Ho and Ha
2. Set the level of significance α (usually given in the problem)
3. Formulate the decision rule (when to reject Ho); find the critical value/P-value
4. Compute the test statistic
5. Make your decision
6. Write a conclusion

Types of Hypothesis Tests
● One-tailed left directional test
○ used if Ha uses < symbol
● One-tailed right directional test
○ used if Ha uses > symbol
● Two-tailed test: Non-directional
○ used if Ha uses ≠ symbol

CRITERION:
● One-tailed test (right directional)
○ "Reject Ho if Zc ≥ Zt"
● One-tailed test (left directional)
○ "Reject Ho if Zc ≤ Zt" (here Zt is negative)
● Two-tailed test (both sides)
○ "Reject Ho if Zc ≥ +Zt or Zc ≤ −Zt", with Zt taken at α/2

Testing the hypothesized value of the mean
● $Z_c = \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma}$, large sample size (n ≥ 30)
● $t_c = \frac{(\bar{x} - \mu)\sqrt{n}}{s}$, small sample size (n < 30)

Testing the difference between two means
● $Z_c = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
● $t_c = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

Decisions made regarding Ho
(Reject Ho/Do not reject Ho)
● If we reject Ho, the sample gives enough evidence to conclude that Ho is false
● If we do not reject Ho, it doesn't mean Ho is correct; we just don't have enough evidence to reject it

Errors in Hypothesis Testing
➔ Type I (α error)
- rejecting a true Ho
➔ Type II (β error)
- accepting (failing to reject) a false Ho
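The test statistics and the z criterion above can be sketched in Python. The z critical values come from `statistics.NormalDist().inv_cdf` (Python 3.8+). The standard library has no t distribution, so t critical values would still be read from a t table. The function names and example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_one_mean(x_bar, mu, sigma, n):
    """Zc = (x_bar - mu) * sqrt(n) / sigma  (large samples)."""
    return (x_bar - mu) * math.sqrt(n) / sigma

def t_one_mean(x_bar, mu, s, n):
    """tc = (x_bar - mu) * sqrt(n) / s  (small samples)."""
    return (x_bar - mu) * math.sqrt(n) / s

def z_two_means(x1, x2, sigma1, sigma2, n1, n2):
    return (x1 - x2) / math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

def t_two_means_pooled(x1, x2, s1, s2, n1, n2):
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (x1 - x2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def z_decision(zc, alpha, tail):
    """Apply the criterion; tail is 'right', 'left' or 'two'."""
    nd = NormalDist()                      # standard normal distribution
    if tail == "right":
        reject = zc >= nd.inv_cdf(1 - alpha)
    elif tail == "left":
        reject = zc <= nd.inv_cdf(alpha)   # negative critical value
    else:
        reject = abs(zc) >= nd.inv_cdf(1 - alpha / 2)
    return "Reject Ho" if reject else "Do not reject Ho"

zc = z_one_mean(52, 50, 8, 64)             # hypothetical: x_bar = 52, mu = 50, sigma = 8, n = 64
print(zc, z_decision(zc, 0.05, "right"), z_decision(zc, 0.05, "two"))
```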
MODULE 6: TECHNIQUES AND TOOLS IN PREDICTIVE ANALYTICS

Pearson Product Moment Coefficient of Correlation/Pearson R (r)
➢ an index of relationship between two variables
➢ powerful test of relationship
➢ x = independent, y = dependent (the variable we want to predict)
➢ the value of r ranges from -1 through 0 to +1
○ -0.5 < r < 0.5 (weak)
○ r = -0.5 or 0.5 (moderate)
○ r < -0.5 or r > 0.5 (strong)
○ r = -1 or 1 (perfect correlation)
○ r = 0 (no linear relationship; x and y are uncorrelated)

When do we use r, the Pearson Product Moment Coefficient of Correlation?
➔ to determine the index of relationship between two variables, the independent and the dependent variables
➔ if there is a relationship between the independent and dependent variables, it can be said that x influences y or y depends on x
➔ if no relationship exists between x and y, then x and y are independent of each other

Coefficient of determination
➢ r²
➢ how much x can influence y
➢ how much y depends on x

Which is better? Positive correlation or negative correlation?
➔ Neither is better; the sign only shows direction. A positive r means the two variables increase together, while a negative r means the relationship is inverse/opposite (as one increases, the other decreases).

Why do we use r?
➔ to analyze if a relationship exists between two variables
➔ if there is a relationship between x and y, we can determine the extent to which x influences y using the coefficient of determination, which is equal to the square of r multiplied by 100%
➔ to answer or explain how much the independent variable influences the dependent variable, or how much y depends on x
◆ degree of relationship between x and y which cannot be seen in other statistical tests of relationships
➔ more powerful test of relationship compared with other nonparametric tests
➔ describes whether a relationship is positive or negative, and strong, moderate, or weak

Sample Problem:
Below are the midterm (x) and final (y) grades:
x   75  70  65  90  85  85  80  70  65  90
y   80  75  65  95  90  85  90  75  70  90

Solving by Stepwise Method:
I. Problem
➢ Is there a significant relationship between the midterm and the final examination of 10 students in Mathematics?

II. Hypotheses
➢ Ho: there is no significant relationship between the midterm and the final examinations of 10 students in Mathematics
➢ Ha: there is a significant relationship between the midterm and the final examinations of 10 students in Mathematics

III. Level of significance
➢ n = 10
➢ α = 0.05
➢ df = n - 2 = 10 - 2 = 8
➢ r.05 = 0.632

IV. Statistics
➢ use r formula
➢ x̅ = 77.50
➢ y̅ = 81.50
➢ r = 0.949 or 0.95

V. Decision Rule
➢ if the computed r value is greater than the tabular value, disconfirm (reject) Ho.

VI. Conclusion/Implication
➢ r = 0.949 > 0.632 (tabular value at 0.05 level of significance with 8 degrees of freedom)
➢ Null hypothesis (Ho) is disconfirmed
➢ There is a significant relationship between the midterm and the final examination of 10 students in Mathematics

Simple Linear Regression Analysis
➢ predicts the value of y given the value of x

When to use simple linear regression?
➔ when there is a relationship between the x and y variables
➔ the data should be normally distributed and measured on an interval or ratio scale

Why do we use simple linear regression?
➔ we are interested in predicting the value of y, the dependent variable
➔ used for forecasting and prediction

Formulas:
● $y = bx + a$
● $b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$
● $a = \bar{y} - b\bar{x}$

Where:
● y = dependent variable
● x = independent variable
● a = y-intercept
● b = slope of the line
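As a check on the sample problem, the sketch below computes r for the midterm and final grades. It uses the standard raw-score (computational) form of Pearson r, since the notes only say "use r formula". It then applies the decision rule with the notes' tabular value and fits the regression line with the b and a formulas above.

```python
import math

# Midterm (x) and final (y) grades from the sample problem.
x = [75, 70, 65, 90, 85, 85, 80, 70, 65, 90]
y = [80, 75, 65, 95, 90, 85, 90, 75, 70, 90]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Raw-score form of Pearson r.
numerator = n * sum_xy - sum_x * sum_y
r = numerator / math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

# Decision rule: compare with the tabular value at alpha = 0.05, df = n - 2 = 8.
r_tabular = 0.632
decision = "Reject Ho" if r > r_tabular else "Do not reject Ho"

# Regression line y = bx + a.
b = numerator / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

print(f"x_bar = {sum_x / n:.2f}, y_bar = {sum_y / n:.2f}, r = {r:.3f}, {decision}")
print(f"r^2 = {r ** 2 * 100:.1f}%, y = {b:.4f}x + {a:.4f}")
```

With these grades, r comes out to about 0.949, matching the value in the notes.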
MODULE 7: ANALYSIS OF VARIANCE (ANOVA)

F-test
➢ a parametric test used to compare the means of two or more groups of independent samples
➢ also known as the Analysis of Variance (ANOVA)

Three kinds of analysis of variance:
1. One-way analysis of variance - only one (1) variable involved
2. Two-way analysis of variance - two (2) variables involved, the column and the row variables; used to know if there are significant differences between and among columns and rows
3. Three-way analysis of variance - three (3) variables involved

Why do we use the F-test?
➔ to find out if there is a significant difference between and among the means of two or more independent groups

When to use the F-test?
➔ if the data are normally distributed and the level of measurement is interval or ratio (like the t-test and z-test)

How to use the F-test?
➔ to get the F-computed value, first compute the correction factor:
◆ $CF = \frac{(GT)^2}{N}$
Where: CF = correction factor
GT = grand total
N = total number of observations
➔ compute the following to construct the ANOVA table
1. TSS - total sum of squares: the sum of all squared observations minus CF, $TSS = \sum x^2 - CF$
2. BSS - between sum of squares: the sum of each group's squared total divided by its size, minus CF, $BSS = \sum \frac{T_j^2}{n_j} - CF$ (T_j = total of group j, n_j = number of observations in group j)
3. WSS - within sum of squares: the difference between TSS and BSS, $WSS = TSS - BSS$

ANOVA TABLE
➢ Mean Squares Between (MSB) is equal to BSS/df, where df = k − 1 (k = number of groups)
➢ Mean Squares Within (MSW) is equal to WSS/df, where df = N − k
➢ to get the F-computed value, divide MSB by MSW
➢ the F-computed value must be compared with the F-tabular value at a given level of significance, using the dfs of BSS and WSS

Note:
● if F-computed value > F-tabular value
○ Disconfirm the null hypothesis in favor of the research hypothesis
○ This means there is a significant difference between and among the means of the different groups

Sample Problem (One-way ANOVA):
A sari-sari store is selling 4 brands of shampoo. The owner is interested in whether there is a significant difference in the average sales of the four brands of shampoo for one week. The following data are recorded:

Perform the analysis of variance and test the hypothesis at 0.05 level of significance that the average sales of the 4 brands of shampoo are equal.

Solving by the Stepwise Method
I. Problem
➢ Is there a significant difference in the average sales of the four brands of shampoo?

II. Hypotheses
➢ Ho: There is no significant difference in the average sales of the four brands of shampoo
➢ Ha: There is a significant difference in the average sales of the four brands of shampoo

III. ANOVA Table

IV. Decision Rule
➢ If the F-computed value is greater than the F-tabular value, disconfirm Ho

V. Conclusion
➢ Since F-computed > F-tabular value (7.98 > 3.01) at 0.05 level of significance with 3 and 24 degrees of freedom, the null hypothesis is disconfirmed
○ This means that there is a significant difference in the average sales of the four brands of shampoo
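The correction-factor method can be sketched in Python as a small function. The notes do not reproduce the shampoo sales data, so the three groups below are hypothetical and only illustrate the arithmetic. The function name `one_way_anova` is also hypothetical. The F-tabular value for (k − 1, N − k) degrees of freedom would still come from an F table.

```python
def one_way_anova(groups):
    """One-way ANOVA by the correction-factor (CF) method."""
    values = [x for g in groups for x in g]
    N, k = len(values), len(groups)
    GT = sum(values)                                       # grand total
    CF = GT ** 2 / N                                       # correction factor
    TSS = sum(x ** 2 for x in values) - CF                 # total sum of squares
    BSS = sum(sum(g) ** 2 / len(g) for g in groups) - CF   # between sum of squares
    WSS = TSS - BSS                                        # within sum of squares
    df_between, df_within = k - 1, N - k
    MSB = BSS / df_between                                 # mean squares between
    MSW = WSS / df_within                                  # mean squares within
    return {"TSS": TSS, "BSS": BSS, "WSS": WSS,
            "df": (df_between, df_within), "MSB": MSB, "MSW": MSW, "F": MSB / MSW}

# Hypothetical sales of three products (not the shampoo data, which the notes omit).
sales = [[5, 7, 6], [8, 9, 10], [4, 5, 6]]
result = one_way_anova(sales)
print(result)
```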
