
Math Reviewer

1. There are various types of bias that can occur when collecting data from a sample, including non-response bias, response bias, and selection bias. 2. There are two main types of methods for collecting data - non-probability and probability sampling. Non-probability methods include convenience sampling and gathering volunteers, while probability methods involve simple random sampling, stratified random sampling, and cluster sampling. 3. There are two main types of studies - observational studies and experimental studies. Observational studies observe relationships without manipulation, while experimental studies involve random assignment of a treatment to draw causal conclusions.

Uploaded by

nininaricadecena

JUWEL'S REVIEWER (MATH)

CU1: OBTAINING DATA

STATISTICS – art and science of answering questions and exploring ideas through the processes of gathering data, describing data, and generalizing about a population based on a smaller sample.

POPULATION – any large collection of objects or individuals, such as Filipinos, students, or trees, about which information is desired.

PARAMETER – any summary number, like an average or percentage, that describes the entire population.

SAMPLE – representative group drawn from the population.

STATISTIC – any summary number, like an average or percentage, that describes the sample.

VARIABLE – any characteristic, number, or quantity that can be measured, counted, or observed for record.

DESCRIPTIVE STATISTICS – techniques of describing data in ways that capture the essence of the information in the data.

INFERENTIAL STATISTICS – techniques used to draw conclusions from data about the population.

TYPES OF COLLECTING DATA

1. PERSONAL INTERVIEW – People usually respond when asked by a person.

2. TELEPHONE INTERVIEW – Cost-effective, but it needs to be kept short since respondents tend to be impatient.

3. SELF-ADMINISTERED QUESTIONNAIRES – Cost-effective, but the response rate is lower and the respondents may be a biased sample.

4. DIRECT OBSERVATION – For certain quantities of interest, one may be able to measure them directly from the sample.

5. WEB-BASED SURVEY – Can only target the population that uses the web.

TYPES OF BIAS

1. NON-RESPONSE BIAS – a large percentage of those sampled do not respond or participate.

2. RESPONSE BIAS – participants either do not respond truthfully or give answers they feel the researcher wants to hear.

3. SELECTION BIAS – the sample selected does not reflect the population of interest. For instance, you are interested in the attitude of female students regarding campus safety, but when sampling you also include males. In this case, your population of interest was female students; however, your sample included subjects not in that population (i.e. males).

TYPES OF METHODS FOR COLLECTING DATA (TYPES OF SAMPLING)

1. NON-PROBABILITY METHODS

(A) CONVENIENCE SAMPLING (HAPHAZARD)
• Collecting data from subjects who are conveniently obtained.
• Example: surveying students as they pass by in the university's student union building.

(B) GATHERING VOLUNTEERS
• Collecting data from subjects who volunteer to provide data.
• Example: using an advertisement in a magazine or on a website inviting people to complete a form or participate in a study.

2. PROBABILITY METHODS

(A) SIMPLE RANDOM SAMPLE – making selections from a population where each subject in the population has an equal chance of being selected.

(B) STRATIFIED RANDOM SAMPLE – after identifying the population of interest, you divide this population into strata or groups based on some characteristic (e.g. sex, geographic region), then perform a simple random sample from each stratum.

(C) CLUSTER SAMPLE – where a random cluster of subjects is taken from the population of interest.
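The three probability methods above can be sketched in Python. This is only an illustration under stated assumptions: the function names, the `students` list, and its `sex` and `section` fields are hypothetical and not part of the reviewer, and the cluster version assumes one-stage cluster sampling (pick whole groups at random, then keep every subject in them).

```python
import random


def simple_random_sample(population, n, seed=None):
    """Every subject has an equal chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def stratified_random_sample(population, stratum_of, n_per_stratum, seed=None):
    """Split the population into strata, then take a simple random sample in each."""
    rng = random.Random(seed)
    strata = {}
    for subject in population:
        strata.setdefault(stratum_of(subject), []).append(subject)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random and keep all subjects inside them (assumed design)."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)
    return [subject for name in chosen for subject in clusters[name]]


# Hypothetical population of 100 students, used only for illustration.
students = [
    {"id": i, "sex": "F" if i % 2 else "M", "section": f"S{i % 5}"}
    for i in range(100)
]

srs = simple_random_sample(students, 10, seed=1)
stratified = stratified_random_sample(students, lambda s: s["sex"], 5, seed=1)

sections = {}
for student in students:
    sections.setdefault(student["section"], []).append(student)
clustered = cluster_sample(sections, 2, seed=1)
```

Passing a `seed` makes the draw repeatable, which helps when checking work by hand; leaving it as `None` gives a fresh random sample each run.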

TYPES OF STUDIES

1. OBSERVATIONAL – study where a researcher records or observes the observations or measurements without manipulating any variables. These studies show that there may be a relationship but not necessarily a cause-and-effect relationship.

2. EXPERIMENTAL – study that involves some random assignment of a treatment; researchers can draw cause and effect (or causal) conclusions. An experimental study may also be called a scientific study or an experiment.

VARIABLES

VARIABLE – any characteristic, number, or quantity that can be measured, counted, or observed for record.

(A) RESPONSE VARIABLE – Variable about which the researcher is posing the question. May also be called the outcome or the dependent variable.

(B) EXPLANATORY VARIABLE – Variables that explain changes in the response. They may also be called the predictor or independent variables.

(C) LURKING VARIABLE – neither the explanatory variable nor the response variable but has a relationship with the response and the explanatory variable. It is not considered in the study but could influence the relationship between the variables in the study.

(D) CONFOUNDING VARIABLE – variable that is in the study and is related to the other study variables, thus influencing the relationship between these variables.

PRINCIPLES OF EXPERIMENTAL DESIGN

1. CONTROL – Need to control for effects due to factors other than the ones of primary interest.

2. RANDOMIZATION – Subjects should be randomly divided into groups to avoid unintentional selection bias in the groups.

3. REPLICATION – a sufficient number of subjects should be used to ensure that randomization creates groups that resemble each other closely and to increase the chances of detecting differences among the treatments when such differences actually exist.

THE BENEFITS TO RANDOMIZATION ARE:

1. If a random assignment of treatment is done, then significant results can be concluded as causal or cause and effect conclusions. That is, that the treatment caused the result. This treatment can be referred to as the explanatory variable and the result as the response variable.

2. If random selection is done where the subjects are randomly selected from some population, then the results can be extended to that population. The random assignment is required for an experiment. When both random assignment and selection are part of the study, then we have a completely randomized experiment.

CLASSIFYING DATA

1. QUALITATIVE (CATEGORICAL) – Data that serves the function of a name only. Categorical values may be:

(A). BINARY – where there are two choices, e.g. Male and Female.

(B). ORDINAL – where the names imply levels with hierarchy or order of preference, e.g. level of education.

(C). NOMINAL – where no hierarchy is implied, e.g. political party affiliation.

2. QUANTITATIVE – Data that takes on numerical values that have a measure of distance between them. Quantitative values can be:

(A). DISCRETE – or “counted” as in the number of people in attendance.

(B). CONTINUOUS – or “measured” as in the weight or height of a person.

SUMMARIZING ONE QUALITATIVE VARIABLE

PROPORTION – fraction or part of the total that possesses a certain characteristic.

GRAPHING ONE QUALITATIVE VARIABLE

1. PIE CHART
• each sector of the circle represents the percentage of that category.
• may not be suitable for too many categories.
• Readers may find the pie chart more useful if the percentages are arranged in descending or ascending order.

2. BAR CHART
• The height of the bar for each category is equal to the frequency (number of observations) in the category.
• Leave space in between the bars to emphasize that there is no ordering in the classes.
• Though histograms also have bars, they describe the frequency of quantitative variables; the term bar chart is reserved for graphs that show the frequency of categorical variables.

SUMMARIZING ONE QUANTITATIVE VARIABLE

I. MEASURES OF CENTRAL TENDENCY – central tendency is an important aspect of quantitative data. It is an estimate of a “typical” value.
a. MEAN – is the average of the data.
b. MEDIAN – middle value of the ordered data.
c. MODE – is the value that occurs most often in the data.

EFFECTS OF OUTLIERS
• RESISTANT – measures that are not much affected by extreme values.
• SENSITIVE – measures that are affected by extreme values.

SHAPE

• SYMMETRIC
o mean, median, and mode are all the same here.
o no skewness is apparent.
o the distribution is described as symmetric.

• LEFT-SKEWED OR SKEWED LEFT
o mean < median.
o long tail on the left.

• RIGHT-SKEWED OR SKEWED RIGHT
o mean > median.
o long tail on the right.

II. MEASURES OF POSITION – give a range where a certain percentage of the data fall. The measures we consider here are percentiles and quartiles.

PERCENTILES
• The pth percentile of the data set is a measurement such that, after the data are ordered from smallest to largest, at least p% of the data are at or below this value and at least (100 - p)% are at or above it.
• A common application of percentiles is their use in determining passing or failure cutoffs for standardized exams. If you have a 95th percentile score, then you are at or above 95% of all test takers.
• The median is the value where fifty percent of the data values fall at or below it. Therefore, the median is the 50th percentile.

Q1 – is commonly called the lower quartile.

Q3 – is commonly called the upper quartile.

THE 5-NUMBER SUMMARY – a helpful summary of the data is called the five number summary. The five number summary consists of five values:
1. The minimum
2. The lower quartile, Q1
3. The median (also known as Q2)
4. The upper quartile, Q3
5. The maximum

III. MEASURES OF VARIABILITY – There are many ways to describe variability or spread including:

1. RANGE
• is the difference between the maximum and minimum values of a data set.
• The maximum is the largest value in the dataset and the minimum is the smallest value.
• The range is easy to calculate but it is very much affected by extreme values.

2. INTERQUARTILE RANGE (IQR)
• The interquartile range is the difference between the upper and lower quartiles and is denoted as IQR.
• Like the range, the IQR is a measure of variability, but you must find the quartiles in order to compute its value.
• IQR is not affected by extreme values. It is thus a resistant measure of variability.

3. VARIANCE AND STANDARD DEVIATION
• One way to describe spread or variability is to compute the standard deviation. The standard deviation is the square root of the variance.

a. VARIANCE – the average squared distance from the mean.
b. STANDARD DEVIATION – approximately the average distance the values of a data set are from the mean, or the square root of the variance.

4. COEFFICIENT OF VARIATION
• A popular statistic for comparing variability across data sets is the Coefficient of Variation or CV.
• This is a unit-free statistic and one where the higher the value the greater the dispersion.

GRAPHING ONE QUANTITATIVE VARIABLE

Now that we have discussed how to find summary statistics for quantitative variables, the next step is to graph the data. The graphs we will discuss include:

1. DOT PLOT
• Displays the data as dots on a number line. It is useful to show the relative positions of the data.
• Each of the observations is represented as a dot.
• If there is more than one observation with the same value, a dot is placed above the others.
• A dot plot provides us with a quick glance at the data. We can easily see the minimum and maximum values and the mode.
• Dot plots are generally used for small data sets.

2. STEM-AND-LEAF DIAGRAM
• To produce the diagram, the data need to be grouped based on the “stem”, which depends on the number of digits of the quantitative variable. The “leaves” represent the last digit.
• One advantage of this diagram is that the original data can be recovered from the diagram (except for the order in which the data were taken).

3. HISTOGRAM
• If there are many data points and we would like to see the distribution of the data, we can represent the data by a frequency histogram or a relative frequency histogram.
• A histogram looks similar to a bar chart but it is for quantitative data.
• To create a histogram, the data need to be grouped into class intervals.
• Then create a tally to show the frequency (or relative frequency) of the data in each interval.
• The relative frequency is the frequency in a particular class divided by the total number of observations.
• The bars are as wide as the class interval and as tall as the frequency (or relative frequency).

4. BOXPLOT – To create this plot we need the five number summary. Therefore, we need:
1. minimum value,
2. Q1 (lower quartile),
3. Q2 (median),
4. Q3 (upper quartile), and
5. maximum value.
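The summary measures covered in this part of the reviewer (five number summary, range, IQR, variance, standard deviation, and CV) can be computed with Python's standard `statistics` module. This is a minimal sketch under assumptions: the `scores` data set is made up, the quartiles use the module's "inclusive" interpolation rule (textbooks differ on quartile conventions, so hand-computed values may vary slightly), the variance is the sample version that divides by n - 1, and the CV is taken as standard deviation divided by mean, which the reviewer does not spell out.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(data)
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


# Hypothetical scores with one extreme value (40) to show its effect.
scores = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 40]

minimum, q1, median, q3, maximum = five_number_summary(scores)

data_range = maximum - minimum          # sensitive to the extreme value
iqr = q3 - q1                           # resistant to the extreme value
mean = statistics.mean(scores)          # pulled toward the long right tail
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # square root of the sample variance
cv = std_dev / mean                     # unit-free; assumed definition s / mean

print("five number summary:", minimum, q1, median, q3, maximum)
print("range:", data_range, "IQR:", iqr)
print("mean:", round(mean, 2), "variance:", round(variance, 2))
print("standard deviation:", round(std_dev, 2), "CV:", round(cv, 2))
```

Because of the single large score, the mean ends up above the median here, which matches the right-skewed pattern described under SHAPE, while the IQR stays small compared with the range.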
USING THE FIVE NUMBER SUMMARY, ONE CAN CONSTRUCT A SKELETAL BOXPLOT.
• Mark the five number summary above the horizontal axis with vertical lines.
• Connect Q1, Q2, Q3 to form a box, then connect the box to min and max with a line to form the whisker.
• Most statistical software does NOT create graphs of a skeletal boxplot but instead opts for the more detailed boxplot described below. Boxplots from statistical software are more detailed than skeletal boxplots because they also show outliers. However, if there are no outliers, what is produced by the software is essentially the skeletal boxplot.

The following terminology will prepare us to understand and draw this more detailed type of boxplot.
a. Potential outliers are observations that lie outside the lower and upper limits.
b. Lower limit = Q1 - 1.5 * IQR
c. Upper limit = Q3 + 1.5 * IQR
d. Adjacent values are the most extreme values that are not potential outliers.

BOXPLOTS AND DISTRIBUTION SHAPES

1. SYMMETRIC DATA – A symmetric distribution with its corresponding box plot: [figure not reproduced]

2. RIGHT-SKEWED DATA – A right-skewed distribution along with its corresponding box plot: [figure not reproduced]

3. LEFT-SKEWED DATA – A left-skewed distribution along with its corresponding box plot: [figure not reproduced]

CU2: PROBABILITY

PROBABILITY EXPERIMENT – is a chance process that leads to well-defined results called outcomes.

OUTCOME – is the result of a single trial of a probability experiment.

SAMPLE SPACE – is the set of all possible outcomes of a probability experiment.

TREE DIAGRAM – is a device consisting of line segments emanating from a starting point and from the outcome point. It is used to determine all possible outcomes of a probability experiment.

EVENT – consists of a set of outcomes of a probability experiment.

CLASSICAL PROBABILITY – uses sample spaces to determine the numerical probability that an event will happen.

COMPLEMENT OF AN EVENT E – is the set of outcomes in the sample space that are not included in the outcomes of event E.

EMPIRICAL PROBABILITY – probability where one observes the various frequencies and uses these frequencies to determine the probability of an outcome.

SUBJECTIVE PROBABILITY – uses a probability value based on an educated guess or estimate, employing opinions and inexact information.

Two events are MUTUALLY EXCLUSIVE EVENTS if they cannot occur at the same time (i.e., they have no outcomes in common).

Two events A and B are INDEPENDENT EVENTS if the fact that A occurs does not affect the probability of B occurring.

PROBABILITY NOTATION

PROBABILITY – is the likelihood of an outcome.

EVENT – a collection of outcomes, typically denoted by capital letters such as A, B, C, etc.

OUTCOME – The result of an event.

OUTCOME SPACE – The outcome space of a scenario is all the possible outcomes that can occur and is often denoted S. The outcome space may also be referred to as the sample space.

SET OPERATIONS
Set notation is used to represent set operations. Each operation will also be presented in a Venn Diagram. Set operations are important because they allow us to create a new event by manipulation of other events.

1. UNION

The union of two events, A and B (written A ∪ B), contains all of the outcomes that are in A, B or both. In statistics, ‘or’ means at least one event occurs and therefore includes the event where both occur.

2. INTERSECTION

The intersection of two events, A and B (written A ∩ B), contains all of the outcomes that are in both A and B.

3. COMPLEMENT

The complement of an event, A, contains all of the outcomes that are not in A.

4. MUTUALLY EXCLUSIVE

A and B are called mutually exclusive (or disjoint) if the occurrence of outcomes in A excludes the occurrence of outcomes in B.

INTERPRETATIONS OF PROBABILITY

1. CLASSICAL INTERPRETATION OF PROBABILITY
• The probability that event E occurs is denoted by P(E). When all outcomes are equally likely, P(E) = (number of outcomes in E) / (total number of possible outcomes).

2. SUBJECTIVE PROBABILITY
• Subjective probability reflects personal belief, which involves personal judgment, information, intuition, etc.

3. RELATIVE FREQUENCY CONCEPT OF PROBABILITY (EMPIRICAL APPROACH)
• If an experiment is repeated a large number of times, then the percentage of times a particular outcome happens is close to the true probability of that outcome.

PROBABILITY PROPERTIES

1. PROBABILITY OF AN EVENT
• Probabilities will always be between (and including) 0 and 1.
• A probability of 0 means that the event is impossible. A probability of 1 means an event is guaranteed to happen.
• A probability close to 0 means the event is "not likely" and a probability close to 1 means the event is "highly likely" to occur.

2. PROBABILITY OF A COMPLEMENT
• If A is an event, then the probability of A is equal to 1 minus the probability of the complement of A.

3. PROBABILITY OF THE EMPTY SET
• If A and B are mutually exclusive, then A ∩ B = ∅. Therefore, P(A ∩ B) = 0.

4. PROBABILITY OF THE UNION OF TWO EVENTS
• P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

CU3: PROBABILITY DISTRIBUTIONS

RANDOM VARIABLE (X)
• is a variable that takes on different values determined by chance. In other words, it is a numerical quantity that varies at random.

TYPES OF RANDOM VARIABLES

1. DISCRETE RANDOM VARIABLE
• When the random variable can assume only a countable, sometimes infinite, number of values.

2. CONTINUOUS RANDOM VARIABLE
• When the random variable can assume an uncountable number of values in a line interval.

PROBABILITY FUNCTIONS

A probability function is a mathematical function that provides probabilities for the possible outcomes of the random variable.

1. PROBABILITY MASS FUNCTION (PMF)
• If the random variable is a discrete random variable.

2. PROBABILITY DENSITY FUNCTION (PDF)


• If the random variable is a continuous random variable.

3. CUMULATIVE DISTRIBUTION FUNCTION (CDF)
• is a function that gives the probability that the random variable, X, is less than or equal to the value x; that is, F(x) = P(X ≤ x).

DISCRETE PROBABILITY DISTRIBUTIONS
• We can define the probabilities of each of the outcomes using the probability mass function (PMF). If we assume the probabilities of all the outcomes were the same, the PMF could be displayed in function form or a table.

EXPECTED VALUE (OR MEAN) OF A DISCRETE RANDOM VARIABLE
• For a discrete random variable, the expected value, usually denoted as μ or E(X), is calculated using: E(X) = Σ x · f(x), where the sum runs over all possible values x and f(x) is the PMF.

CONTINUOUS PROBABILITY DISTRIBUTIONS
• If the data is continuous, the distribution is modeled using a probability density function (or PDF). We define the probability density function (PDF) of Y as f(y), where P(a < Y < b) is the area under f(y) over the interval from a to b.

NORMAL DISTRIBUTION
• The Normal Distribution is a family of continuous distributions that can model many histograms of real-life data which are mound-shaped (bell-shaped) and symmetric (for example, height, weight, etc.).

A NORMAL CURVE HAS TWO PARAMETERS:

1. MEAN μ (center of the curve)

2. STANDARD DEVIATION σ (spread about the center) (and variance σ²)

The mean can be any real number and the standard deviation is greater than zero. The normal curve ranges from negative infinity to infinity. [Figure not reproduced: effect of the mean and standard deviation on the shape of the normal curve.]

PROBABILITIES FOR NORMAL RANDOM VARIABLES (Z-SCORES)
• The standard normal is important because we can use it to find probabilities for a normal random variable with any mean and any standard deviation.
• We can convert any normal distribution into the standard normal distribution in order to find probabilities and apply the properties of the standard normal. In order to do this, we use the z-value.

Z-VALUE, Z-SCORE, OR Z
• The Z-value (sometimes referred to as the Z-score or simply Z) represents the number of standard deviations an observation is from the mean for a set of data.

THE EMPIRICAL RULE

The Empirical Rule is sometimes referred to as the 68-95-99.7% Rule. The rule is a statement about normal or bell-shaped distributions.
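The z-value and the Empirical Rule can be checked numerically with `NormalDist` from Python's standard `statistics` module. This is a sketch under assumptions: the formula z = (x - mean) / standard deviation is the usual way of counting standard deviations from the mean, and the height distribution (mean 160, standard deviation 8) is a made-up example, not data from the reviewer.

```python
from statistics import NormalDist


def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def prob_within(k):
    """Probability that a normal variable falls within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# Hypothetical normal variable: heights with mean 160 and standard deviation 8.
heights = NormalDist(mu=160.0, sigma=8.0)
x = 172.0

z = z_score(x, heights.mean, heights.stdev)

# P(X <= x) computed directly and through the standard normal agree.
p_direct = heights.cdf(x)
p_via_z = standard_normal.cdf(z)

print("z-value:", z)
print("P(X <= 172):", round(p_direct, 4), "via z:", round(p_via_z, 4))

# Empirical Rule: proportions within 1, 2, and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

The three proportions printed at the end are where the 68-95-99.7 name comes from, and the agreement between `p_direct` and `p_via_z` shows why converting to the standard normal is enough to find probabilities for any normal variable.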
