Lbolytc Finals Notes XXXX - Compress

The document provides an overview of introductory statistics concepts including: - Descriptive statistics which summarize data and inferential statistics which make predictions based on samples. Key terms defined are population, sample, parameter, statistic, and variables. - Frequency distribution tables are introduced as a way to organize data into classes and tally frequencies. Steps for constructing these tables are outlined. - Graphical representations of frequency distributions include histograms and frequency polygons. Additional descriptive statistics measures introduced are mean, median and common notation.
© All Rights Reserved
Lbolytc Finals Notes - xxxx

Business Analytics (BANA105)

ALL LECTURE NOTES


LBOLYTC (Introduction to Analytics)
Sec. K35 | Prof. Wilson Cordova | De La Salle University TERM 1 AY 2022-2023

INTRODUCTION TO STATISTICS

Definition of Terms
● Statistics - A branch of science that deals with the collection, organization, presentation, analysis, and interpretation of data.
    ○ Analysis - extracting information from the data
    ○ Interpretation - conclusions based on the analyzed information
● Descriptive Statistics - methods concerned with the collection, organization, summarization, and presentation of a set of data.
● Inferential Statistics - methods concerned with making predictions/inferences about a population based on information provided by a sample.
    ○ Population - the whole
    ○ Sample - a portion of the whole
● Census - collecting information from a population.
● Survey - collecting information from a sample.
● Parameter - a summary or numerical descriptive measure that describes a population.
● Statistic - a summary or numerical descriptive measure that describes a sample.
● Constant - a fixed number; a characteristic or property of a population or sample which makes its members similar to one another.
● Variables - any characteristic or information measurable or observable on every element of the population or sample.
    ○ Dependent variable - the variable that is affected by the independent variable.
    ○ Independent variable - the variable that affects the dependent variable.
    ○ Qualitative (Categorical) Variable - indicates what kind of a given characteristic an individual, object, or event possesses.
    ○ Quantitative (Numerical) Variable - indicates how much of a given characteristic an individual, object, or event possesses.

Types of Quantitative Variables
● Discrete Variables - values are obtained through counting.
● Continuous Variables - values are obtained through measuring.

Scales of Measurement of Variables
1. Nominal - names, labels, or categories that do not need to be in order; lowest level of measurement


2. Ordinal - names, labels, or categories that have an order; ranking can be done on the data; the distance between two labels cannot be determined.
3. Interval - values can be ordered and the distance between any two labels is of known size; always numeric; no true zero point.
4. Ratio - values have all the properties of the interval scale and the ratio of two values is meaningful; has a true zero point; highest level of measurement.

Nominal, Ordinal - Categorical
Interval, Ratio - Numerical

Purpose of Statistics
● To provide information
● To provide comparisons
● To help discern relationships
● To aid in decision making
● To estimate unknown quantities
● To justify claims or assertions
● To predict future outcomes

FREQUENCY DISTRIBUTION TABLE
● A tabular summary of data showing the frequency or number of items in each of several non-overlapping classes.

Steps in constructing the FDT
● Determine the range, denoted by R
    ○ R - difference between the highest and lowest value (HV - LV)
● Decide on the number of classes, denoted by k
    ○ k - number of non-overlapping intervals (usually given)
● Compute the class size, denoted by C
    ○ quotient of R and k (R/k)
● Identify the class intervals, denoted by CI
● Identify the frequency in each CI (tallying)

Definition of Terms
● Class Size / Width - The difference between the lower (or upper) class limits of consecutive classes. All classes should have the same class width.
● Lower Class Limit - The least value that can belong to a class.
● Upper Class Limit - The greatest value that can belong to a class.
● Class Boundaries (CB) - The numbers that separate classes without forming gaps between them.


    ○ Subtract the upper CL of the first class from the lower CL of the second class, then divide by 2. Subtract this half-gap from each lower CL and add it to each upper CL to get the boundaries.
● Class Mark / Midpoint (CM) - The middle value of each class.
    ○ Add the upper and lower class limits, then divide by 2.
● Relative Frequency (RF) - The frequency of the given class divided by the total number of observations.
    ○ ex. if f = 5 and n = 30 (n: total number of observations), RF = 5/30
● Cumulative Frequency - The number of data elements in that class and all previous classes.
● Less than CF (<CF) - Total number of observations whose values do not exceed the upper class limit of the class
    ○ Add frequencies from top to bottom
● Greater than CF (>CF) - Total number of observations whose values are not less than the lower class limit of the class
    ○ Add frequencies from bottom to top

Graphical Representation of the FDT
● Frequency Histogram (horizontal axis, HA: Class Boundary; vertical axis, VA: Frequency)
● Frequency Polygon (HA: Class Mark; VA: Frequency)
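The FDT steps and the frequency definitions in these notes (range, class size, class limits, class mark, relative frequency, <CF and >CF) can be sketched in Python as follows. This is only an illustration under stated assumptions: the data are whole numbers, R/k is rounded up to the next whole number so that k equal classes always reach the highest value, and the `scores` list and the names `build_fdt` and `FDTRow` are hypothetical.

```python
from collections import namedtuple

FDTRow = namedtuple("FDTRow", "lower upper mark freq rel_freq less_cf greater_cf")


def build_fdt(data, k):
    """Build a frequency distribution table for whole-number data with k classes."""
    lv, hv = min(data), max(data)
    r = hv - lv                      # range R = HV - LV
    # Assumption: R/k is rounded up to the next whole number so that
    # k classes of equal width always cover the highest value.
    c = r // k + 1                   # class size C
    n = len(data)

    # Class limits [lower, upper] and the tally of observations in each class.
    limits = [(lv + i * c, lv + (i + 1) * c - 1) for i in range(k)]
    freqs = [sum(lo <= x <= up for x in data) for lo, up in limits]

    rows = []
    running = 0
    for (lo, up), f in zip(limits, freqs):
        running += f                 # <CF: frequencies added from top to bottom
        rows.append(FDTRow(
            lower=lo,
            upper=up,
            mark=(lo + up) / 2,      # class mark: (lower + upper) / 2
            freq=f,
            rel_freq=f / n,          # relative frequency: f / n
            less_cf=running,
            greater_cf=n - running + f,  # >CF: this class plus all classes below it
        ))
    return rows


# Hypothetical data set of 15 scores, grouped into k = 5 classes.
scores = [12, 15, 17, 21, 22, 22, 25, 28, 30, 31, 33, 35, 38, 40, 41]
table = build_fdt(scores, k=5)
for row in table:
    print(f"{row.lower}-{row.upper}  f={row.freq}  CM={row.mark}  "
          f"RF={row.rel_freq:.2f}  <CF={row.less_cf}  >CF={row.greater_cf}")
```

With these scores, R = 29 and C = 6, so the classes run 12-17, 18-23, 24-29, 30-35, and 36-41. The last <CF and the first >CF both equal n, which is a quick check that every observation was tallied once.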


● < Ogive (HA: Upper CB; VA: <cf)
● > Ogive (HA: Lower CB; VA: >cf)
● Ogives (combined < and > ogive)

NUMERICAL DESCRIPTIVE MEASURES

Measures of Central Tendency
● Describes the “center” of a given data set. It is a single value about which the observations tend to cluster.
● Arithmetic Mean (or Mean)
    ○ Sum of all observations divided by the total number of observations, denoted by x̄
    ○ Properties: always exists, is unique, takes every observation into account, and is therefore easily affected by extreme values
● Median - the middle value of an array (the data arranged in order), denoted by Md (x̃)
    ○ Properties: not easily affected by extreme values, always exists, and is unique.
● Mode - the observation/s that occur most frequently in the given set of data, denoted by Mo.
    ○ Properties: no calculations required, may not exist, may not be unique.

Considerations for choosing a Measure of Central Tendency
● Nominal Variable - use mode only
● Ordinal Variable - mode and median MAY be used; median provides more information
● Interval-Ratio Variable - mode, median, and mean may be used. Mean provides


most information (distribution); Median is preferred if distribution is skewed.

Measures of Position
● Measures that discriminate a group of scores from another group in the same data set
● Quantile - divides data into an equal number of parts
● Quartile - values that divide a set of data into four equal parts, denoted by Q
● Decile - values that divide a set of data into ten equal parts, denoted by D
● Percentile - values that divide a set of data into one hundred equal parts, denoted by P

*Ungrouped Measures of Position
To locate the desired quantile:
● Pk = k(n+1) / 100 → position
● If Pk = k(n+1) / 100 is not a whole number, use interpolation
    ○ Interpolation computes a value between two identified (adjacent) values; the result is not necessarily in the middle
    ○ Subtract the 2 values located by the Pk formula → multiply by the decimal part of the position → add the lower value

Measures of Variability (or Measures of Dispersion)
● Describes the extent to which the data are dispersed
● Variability is descriptive statistics that describe how similar a set of scores are to one another
● Range - difference between the highest and lowest value in the data set (R = HV - LV)
    ○ Rarely used because of its sensitivity to extreme values
● Variance (s² or σ²) - the mean of the squared differences of the observations from their mean
    ○ Difference - deviate or deviation score
    ○ Deviate tells a user how far a given score is from the typical, or average, score; a measure of dispersion for a given score
● Standard Deviation (s or σ) - positive square root of the variance


    ○ Using the same data as for the variance, take the square root of the variance to get the SD
● Coefficient of Variation (CV) - the ratio of the SD to the mean, expressed in percent
    ○ Compares the variability of 2 populations that are expressed in different units of measurement
    ○ Expressed as a percentage rather than in terms of the units of the particular data

Measures of Skewness
● Skew is a measure of symmetry in the distribution of scores
● A frequency curve that is not symmetrical about the mean is said to be skewed.
    ○ Tails off to the right - Positive
    ○ Tails off to the left - Negative
    ○ Mean and Median - their relative position is related to the direction of skewness
● Mean > Median - positive curve
● Mean < Median - negative curve

If Sk < 0, then the distribution has a negative skew
If Sk > 0, then the distribution has a positive skew
If Sk = 0, then the distribution is symmetrical

Measures of Kurtosis
● Measures whether the scores are spread out more or less than they would be in a normal (Gaussian) distribution
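The position rule Pk = k(n+1)/100 with interpolation, together with the variance, SD, CV, and the mean-versus-median check for skew direction, can be sketched as below. This is a rough illustration rather than the course's own worked solution: the `scores` data and the function name `percentile` are hypothetical, the population formulas (dividing by n) are assumed for the variance and SD, and clamping positions that fall outside 1..n is an assumption.

```python
import statistics


def percentile(data, k):
    """k-th percentile using the position rule P_k = k(n + 1) / 100."""
    ordered = sorted(data)
    n = len(ordered)
    pos = k * (n + 1) / 100          # 1-based position in the ordered array
    # Assumption: positions outside 1..n are clamped to the nearest end.
    pos = min(max(pos, 1), n)
    whole = int(pos)                 # position of the lower value
    frac = pos - whole               # decimal part of the position
    lower = ordered[whole - 1]
    if frac == 0:
        return lower
    upper = ordered[whole]
    # Subtract the two values, multiply by the decimal, add the lower value.
    return lower + frac * (upper - lower)


# Hypothetical scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

p75 = percentile(scores, 75)              # position 6.75, between 5 and 7
mean = statistics.mean(scores)
median = statistics.median(scores)

variance = statistics.pvariance(scores)   # population variance (divide by n)
sd = statistics.pstdev(scores)            # positive square root of the variance
cv = 100 * sd / mean                      # SD relative to the mean, in percent

# Direction of skew from the relative position of the mean and the median.
if mean > median:
    skew_direction = "positive"
elif mean < median:
    skew_direction = "negative"
else:
    skew_direction = "symmetrical"

print(p75, mean, median, variance, sd, cv, skew_direction)
```

For sample data, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would replace the population versions used here.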


Note:
● Mesokurtic if K = 3
● Leptokurtic if K > 3
● Platykurtic if K < 3

SAMPLING TECHNIQUES

Population
● A set which includes all measurements of interest to the researcher

Sample
● A subset of the population

Why do sampling?
● Impossible to study the whole population
● Manageability of data
● Economic reasons
● Time and effort

Types of Sampling
● Probability Sampling - each member of the population is given an equal chance to become part of the sample
● Non-probability Sampling - each member of the population does not have an equal chance to become part of the sample

Probability Sampling
● Complete sampling frame
● Can select a random sample from the population
● Can generalize results from the random sample
● Can be more expensive and time-consuming

Non-Probability Sampling
● Used when there isn’t an exhaustive population list available
● Not random
● Can be effective when trying to generate ideas and get feedback
● More convenient and less costly

Types of Non-Probability Sampling
● Convenience Sampling
    ○ Uses subjects that are readily available or includes people who are easy to reach
● Purposive Sampling
    ○ Looks for predefined groups that serve as samples

Types of Probability Sampling
● Simple Random Sampling (SRS) - ALL members of the population have a chance of being part of the sample.
● Stratified Sampling - Used when the population can be subdivided into smaller groups (or strata) and then SRS is applied to get samples from each stratum
● Cluster Sampling - Uses clusters (groups), instead of individuals, that are randomly chosen
● Systematic Sampling - Selects every nth member of the population, with the starting point determined at random
● Multi-Stage Sampling

Sample Size, denoted by n
● To get a meaningful result, let n be at least 100.
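As a rough sketch of how a sample size and three of the probability sampling techniques could be carried out in code, the example below uses Slovin's formula from these notes, n = N / (1 + Ne^2), and Python's standard `random` module. Everything specific here is an assumption made for illustration: the population of 1,000 member IDs, the two strata, the 5% margin of error, the fixed seed, and the function names.

```python
import math
import random


def slovin(population_size, margin_of_error):
    """Slovin's formula n = N / (1 + N e^2), rounded up to a whole number."""
    return math.ceil(population_size / (1 + population_size * margin_of_error ** 2))


def simple_random_sample(population, n, rng):
    """SRS: draw n members at random, without replacement."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Take every step-th member, starting from a random point in the first interval."""
    step = len(population) // n
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(n)]


def stratified_sample(strata, fraction, rng):
    """Apply SRS inside each stratum, taking the same fraction from each."""
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


rng = random.Random(42)                      # fixed seed so the sketch is repeatable
population = list(range(1, 1001))            # hypothetical member IDs 1..1000
n = slovin(len(population), 0.05)            # sample size for N = 1000, e = 0.05

srs = simple_random_sample(population, n, rng)
systematic = systematic_sample(population, n, rng)
strata = {"A": population[:600], "B": population[600:]}   # hypothetical strata
stratified = stratified_sample(strata, n / len(population), rng)

print(n, len(srs), len(systematic), len(stratified))
```

Passing the random generator in as a parameter keeps each function repeatable under a fixed seed, which is convenient when checking results by hand.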


● Maximum sample size is 10% of the population, as long as it does not exceed 1,000

Determining Sample Size
● Census (N ≤ 100)
● Sample size which is 10% of N
● Published tables
● Using Slovin’s Formula: n = N / (1 + Ne^2), where N is the population size and e is the margin of error

HYPOTHESIS TESTING

Hypothesis
● Assumption about the population parameter
● An educated guess about the population parameter

Hypothesis Testing
● Process of making an inference or generalization on population parameters based on the results of the study on samples
● Deciding between what is reality and coincidence

Statistical Hypotheses
● A guess or prediction made by the researcher regarding the possible outcome of the study

Steps in Hypothesis Testing
1. Formulate Ho and Ha
2. Set the level of significance (α)
3. Formulate the decision rule; find the critical value or P-value
4. Compute the test statistic
5. Make your decision
6. Write a conclusion

Types of Statistical Hypothesis
● Null Hypothesis (Ho)
    ○ Always hoped to be rejected
    ○ Contains “=” sign
● Alternative Hypothesis (Ha)
    ○ Challenges the null hypothesis
    ○ Never contains “=”
    ○ Uses <, >, or “not equal to”
    ○ Represents the idea which the researcher wants to prove

Level of Significance, α and the rejection region
α = 0.05: probability of being wrong (rejecting a true Ho) is 5%; probability of being right is 95%
α = 0.01: probability of being wrong (rejecting a true Ho) is 1%; probability of being right is 99%

Types of Hypotheses Tests
1. One-tailed left directional test (Used if Ha uses <)
2. One-tailed right directional test (Used if Ha uses >)


3. Two-tailed test: Non-directional (Used if Ha uses ≠)

Criterion (Zc: computed value; Zt: tabular/critical value)
1. One-tailed left directional test
“Reject H0 if Zc ≤ Zt”
2. One-tailed right directional test
“Reject H0 if Zc ≥ Zt”
3. Two-tailed test: Non-directional
“Reject H0 if Zc ≥ +Zt” or “Reject H0 if Zc ≤ -Zt”

Testing the hypothesized: Value of the mean

For CLASSROOM DISCUSSION
● Z-test for large sample size
    ○ If n ≥ 30
● T-test for small sample size
    ○ If n < 30
    ○ Degrees of freedom (df) = v = n-1

Testing the difference between 2 means

Decisions made regarding Ho (Reject/Do not reject)
● If we reject Ho, it means the evidence indicates it is wrong
● If we accept Ho, it doesn’t mean it’s correct; we just don’t have enough evidence to reject it

Errors in Hypothesis Testing

Ho       Accept           Reject
True     ✔                Type I Error
False    Type II Error    ✔

Type I Error - rejecting a true Ho (“Sayang”, Tagalog for “what a waste”)
Type II Error - accepting a false Ho (“T*nga”, Tagalog for “foolish”)

PEARSON PRODUCT-MOMENT CORRELATION
● An index of relationship between two variables
● X = independent variable; Y = dependent variable
● Value of r ranges from -1 to +1
● r = +1 or -1 = perfect correlation; r = 0 = no linear relationship between x and y
● Scatter plot is going upward = r is positive (x increases, y increases)
● Scatter plot is going downward = r is negative (x increases, y decreases)
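As a rough illustration of the correlation index r, the sketch below computes it from the column sums of x, y, x², y², and xy using the standard raw-score formula. That formula is an assumption here, since the notes' own formula image is not preserved, and the data and function names are hypothetical. The same sums also give the slope and intercept of the least-squares line y = a + bx used in simple linear regression.

```python
import math


def column_sums(xs, ys):
    """Totals of the x, y, x^2, y^2 and xy columns of the working table."""
    return {
        "n": len(xs),
        "x": sum(xs),
        "y": sum(ys),
        "x2": sum(x * x for x in xs),
        "y2": sum(y * y for y in ys),
        "xy": sum(x * y for x, y in zip(xs, ys)),
    }


def pearson_r(s):
    """Raw-score formula for r (assumed; standard computational form)."""
    numerator = s["n"] * s["xy"] - s["x"] * s["y"]
    denominator = math.sqrt(
        (s["n"] * s["x2"] - s["x"] ** 2) * (s["n"] * s["y2"] - s["y"] ** 2)
    )
    return numerator / denominator


def least_squares_line(s):
    """Slope b and intercept a of y = a + b*x from the same sums."""
    b = (s["n"] * s["xy"] - s["x"] * s["y"]) / (s["n"] * s["x2"] - s["x"] ** 2)
    a = (s["y"] - b * s["x"]) / s["n"]
    return a, b


# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

sums = column_sums(xs, ys)
r = pearson_r(sums)
determination = r ** 2 * 100          # coefficient of determination, in percent
a, b = least_squares_line(sums)

print(round(r, 4), round(determination, 1), round(a, 2), round(b, 2))
```

Here r is positive, matching an upward-sloping scatter plot, and r² x 100% is the percentage of the variation in y that is explained by x.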


● Scatter plot is scattered - r = 0 (no correlation between x and y)

Why do we use r?
● To analyze if a relationship exists between two variables
● IF there is a relationship, we can determine how x influences y using the coefficient of determination (r² x 100%), which cannot be seen in other statistical tests
    ○ Can explain how the independent variable influences the dependent variable, or how y depends on x
● A more powerful test of relationship compared with nonparametric tests

FORMULA (image not preserved)
Use a table* that contains the following: x, y, x², y², xy

SIMPLE LINEAR REGRESSION
● Predicts the value of y given the value of x

When to use?
● When there is a relationship between x and y
● The data should be normally distributed, with the level of measurement expressed in interval or ratio (numerical) data

FORMULA (image not preserved)

Why do we use r?
● We are interested in predicting the value of y (dependent variable).
● Used for forecasting and prediction

FORMULA (image not preserved)

ONE-WAY ANALYSIS OF VARIANCE
● F-test or the Analysis of Variance (ANOVA) is a parametric test used to compare the means of two or more groups of independent variables

Kinds of ANOVA
● One-way ANOVA - only 1 variable is involved
● Two-way ANOVA - 2 variables involved; column and row variables; used to know if there is a significant difference between and among columns and rows
● Three-way ANOVA - 3 variables involved

Why do we use ANOVA?
● To determine if there is a significant difference between and among the means of two or more independent variables.

When to use?
● If there is a normal distribution and when the level of measurement is expressed in interval or ratio (numerical) data
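To make the F-test concrete, here is a minimal sketch of a one-way ANOVA computation using the quantities these notes name (TSS, BSS, WSS, the degrees of freedom, MSB, MSW). It assumes CF is the usual correction factor, (sum of all values)² / N. The three groups are hypothetical, and the tabular value is simply read from a standard F table for df = (2, 6) at α = 0.05.

```python
def one_way_anova(groups):
    """Return a dict with the sums of squares, mean squares and computed F."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)                 # N: total number of observations
    k = len(groups)                           # K: number of groups

    # Assumption: CF is the correction factor (sum of all values)^2 / N.
    cf = sum(all_values) ** 2 / n_total

    tss = sum(x * x for x in all_values) - cf              # total SS minus CF
    bss = sum(sum(g) ** 2 / len(g) for g in groups) - cf   # between SS minus CF
    wss = tss - bss                                        # within SS

    df_between = k - 1
    df_within = (n_total - 1) - (k - 1)
    msb = bss / df_between
    msw = wss / df_within
    return {
        "TSS": tss, "BSS": bss, "WSS": wss,
        "df_between": df_between, "df_within": df_within,
        "MSB": msb, "MSW": msw, "F": msb / msw,
    }


# Hypothetical scores for three independent groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
result = one_way_anova(groups)

f_tabular = 5.14   # F table value for df = (2, 6) at the 0.05 level
reject_null = result["F"] > f_tabular

print(result, reject_null)
```

With these groups, BSS = 26 and WSS = 6, so the computed F is 13 and the null hypothesis of equal means is rejected at the 0.05 level.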


Compute the following to construct the ANOVA table
1. TSS - total sum of squares minus CF (correction factor)
2. BSS - between sum of squares minus CF
3. WSS - within sum of squares (difference between TSS and BSS)

ANOVA TABLE
● F-computed value must be compared with the F-tabular value at a given level of significance with the corresponding dfs of BSS and WSS

MSB = BSS / df (between groups)
MSW = WSS / df (within group)
Between groups df = K-1
Within group df = (N-1)-(K-1)
Total df = N-1

Note:
● If Fc > Ft, then disconfirm the null hypothesis in favor of the research hypothesis
● This means there is a significant difference between and among the means of the different groups

Two-way ANOVA

Multiple Linear Regression
● Used to predict the dependent variable y given the independent variables (the x's).
● Determines the relationship between the dependent variable and the independent variables

When do we use MLR?
● Predicting the dependent variable y with 2 or more independent variables.
● Want to know if there is a relationship between the dependent and independent variables

Why do we use MLR?
● To know the extent of influence that the independent variables have on the dependent variable

Coefficient of determination = r² x 100%
Correlation = + or -

How to use the MLR?
Y = b0 + b1x1 + b2x2 + … + bnxn
Solving using the Stepwise Method


Y = dependent variable
X = independent variable
B = numerical constant

Three normal equations
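For two independent variables, the three normal equations of least squares can be set up from column sums and solved for b0, b1, and b2. The sketch below does this with exact fractions and Gauss-Jordan elimination. The elimination routine is a choice made for this illustration and is not necessarily the stepwise procedure taught in class; the data and function names are hypothetical.

```python
from fractions import Fraction


def solve_linear_system(a, b):
    """Gauss-Jordan elimination with exact fractions (a is square, b is the right side)."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot becomes 1.
        m[col] = [v / m[col][col] for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[-1] for row in m]


def sum_products(u, v):
    """Sum of element-wise products, e.g. the total of an x1*y column."""
    return sum(p * q for p, q in zip(u, v))


def fit_two_predictors(x1, x2, y):
    """Solve the three normal equations for Y = b0 + b1*x1 + b2*x2."""
    n = len(y)
    a = [
        [n,       sum(x1),              sum(x2)],
        [sum(x1), sum_products(x1, x1), sum_products(x1, x2)],
        [sum(x2), sum_products(x1, x2), sum_products(x2, x2)],
    ]
    b = [sum(y), sum_products(x1, y), sum_products(x2, y)]
    return solve_linear_system(a, b)


# Hypothetical data generated from Y = 1 + 2*x1 + 3*x2, so the fit is exact.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9, 8, 19, 18, 29]

b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(b0, b1, b2)
```

Using `Fraction` avoids rounding error, so the recovered coefficients can be compared directly with the values used to generate the data.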
