Statistics Unit 1

The document provides an overview of statistics, including definitions of descriptive and inferential statistics, types of data, sampling methods, and levels of measurement. It discusses the importance of frequency distributions, histograms, and various graph types for data representation. Key concepts such as population, sample, parameters, and methods for collecting sample data are also outlined.

Math-119 (Statistics) Notes​

Chapter 1.1/1.2​
STATISTICAL AND CRITICAL THINKING​
Statistics: the science of planning studies and experiments, obtaining data, and
organizing, summarizing, presenting, analyzing, and interpreting those data and then
drawing conclusions based on them​
Descriptive Statistics: statistics that summarize or describe features of a data set, such
as its central tendency or dispersion​
-DESCRIBE the traits that you SEE (not guessing it)​
Inferential Statistics: drawing conclusions and/or making decisions concerning a
population based only on sample data​
-What can you INFER about the population based on your sample data?

TYPES OF DATA​
Data: collections of observations, such as measurements, genders, or survey responses​
Population: The complete collection of ALL measurements or data that are being
considered​
-Typically, a population is the COMPLETE collection of data that we would like to make
inferences about​
-The ENTIRE group that you’re interested in studying​
Parameter: a numerical measurement used to describe a characteristic of
the entire population being studied
-TRAIT​
Sample: a subcollection of members from a population​
-A PART of the population that represents the population being studied​
Statistic: a numerical measurement used to describe a characteristic of the sample being studied
Census: the collection of data from every member of a population​

Voluntary Response Sample/Self-Selected Sample: one in which the respondents
themselves decide whether to be included or not​
-Typically Seriously FLAWED​
-Voluntary Response Sample Example: the following types of polls are common
examples of voluntary response samples. By their very nature, all are seriously flawed
because we should not make conclusions about a population on the basis of samples
with a strong possibility of bias:​
→Internet polls in which people online can decide whether to respond​
→Mail-in polls, in which people can decide whether to reply​
→Telephone call-in polls, in which newspaper, radio, or television announcements ask
that you voluntarily call a special number to register your opinion​

Difference between Population and Census: the entire group of objects or individuals
about which information is WANTED is called the population​
-A Census is an attempt to gather information about EVERY individual in a population​

Quantitative (or numerical) Data: consists of NUMBERS representing counts or
measurements​
-Examples:​
​ >The weights of supermodels
>The ages of respondents​
-Can be further described by distinguishing between discrete and continuous types​
-Discrete Data: result when the data values are quantitative and the number of values is
FINITE, or COUNTABLE​
​ >The number of tosses of a coin before getting tails​
-Continuous (numerical) data: result from infinitely many possible quantitative values,
where the collection of values is not countable​
>Examples:
 →Lengths from 0 cm to 12 cm

Categorical (or qualitative or attribute) data: consists of NAMES or LABELS (not
numbers that represent counts or measurements)
-Data that reflects the qualitative characteristics​
-Examples:
 >The gender (male/female) of professional athletes
 >Shirt numbers on professional athletes' uniforms (substitutes for names)

Levels of Measurement: another way of classifying data is to use 4 levels of measurement
1.) Nominal: characterized by data that consists of names, labels, or categories only, and
the data CANNOT be arranged in some order (such as low to high)​
-Examples:​
​ >Survey responses of yes, no, and undecided​
2.) Ordinal: involves data that CAN be arranged in some order, but the differences
(obtained by subtraction) between data values either CANNOT be determined or are
MEANINGLESS​
-Examples:​
​ >Course grades A, B, C, D, or F​
3.) Interval: involves data that can be arranged in order, and the differences between data
values can be found and are meaningful. However, there is NO natural zero starting point
at which none of the quantity is present​
-Examples:​
​ >Years 1000, 2000, 1776, and 1492​
4.) Ratio: data can be arranged in order, differences can be found and are MEANINGFUL,
and there is a natural zero starting point (where zero indicates that none of the quantity
is present)​
-Differences and ratios are both meaningful​
-Examples:​
​ >Class times of 50 minutes and 100 minutes​

LEVELS OF MEASUREMENT:
1.) Nominal:​
A.) Qualitative/Categorical​
B.) Names, colors, labels, gender, etc.​
C.) Order does NOT MATTER​
2.) Ordinal Data:​
A.) Ranking/Placement​
B.) The order MATTERS​
C.) Difference cannot be measured​
3.) Interval:​
A.) The order MATTERS​
B.) Differences CAN be measured (but ratios are NOT meaningful)
C.) No true “0” starting point​
4.) Ratio:
A.) The order MATTERS​
B.) Difference CAN be measured (including ratios)​
C.) Contains “0” starting point​

Chapter 1.3​
COLLECTING SAMPLE DATA​
-The method used to collect sample data influences the quality of the statistical
analysis

Observational Study: observing and measuring specific characteristics without
attempting to MODIFY the individuals being studied

Experimental Study: apply some treatment and then proceed to observe its effects on
the individuals​
-The individuals in experiments are called experimental units; they are often called
subjects when they are people

Randomization: used when subjects are assigned to different groups through a process
of random selection.
-The logic is to use chance as a way to create 2 groups that are similar​

Simple Random Sample: a sample of n subjects is selected in such a way that every
possible sample of the same size n has the same chance of being chosen​
-A simple random sample is often called a random sample, but strictly speaking, a
random sample has the weaker requirement that all members of the population have
the same chance of being selected​

Systematic Sampling: select some starting point and then select every kth element in
the population

Convenience Sampling: use data that are very easy to get​


-Example: ​
​ >Online Poll​
​ >Survey your best friends​
​ >Asking for volunteers at the mall​



Stratified Sampling: subdivide the population into at least 2 different subgroups (or
strata) so that subjects within the same subgroup share the same characteristics. Then
draw a sample from each subgroup (or stratum)
Cluster Sampling: divide the population area into sections (or clusters), then randomly
select some of those clusters, and choose ALL the members from those selected clusters​
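
The sampling methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 numbered members, the strata and cluster names, the fixed seed, and the choice of a random starting point for systematic sampling are all hypothetical and are not part of the notes.

```python
import random

def simple_random_sample(population, n, rng):
    # Every possible sample of size n has the same chance of being chosen.
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # Pick a starting point among the first k members (random here, by assumption),
    # then take every kth member after it.
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    # Draw a sample from EACH subgroup (stratum).
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # Randomly select whole clusters, then keep ALL members of those clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(0)              # fixed seed only so the sketch is repeatable
population = list(range(1, 101))    # hypothetical population: members numbered 1..100

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)

strata = {"low": population[:50], "high": population[50:]}
stratified = stratified_sample(strata, 5, rng)

clusters = {f"section_{i}": population[i * 20:(i + 1) * 20] for i in range(5)}
cluster = cluster_sample(clusters, 2, rng)
```

Note the contrast between the last two: stratified sampling takes SOME members from EVERY subgroup, while cluster sampling takes ALL members from SOME clusters.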


LESS PREFERRED/EFFECTIVE SAMPLING METHODS​
Convenience Sampling: it is usually not random​
-An example would be sitting outside a supermarket and sampling the first 50 people
who come out

Voluntary Response Sampling: the most biased, as it is normally done by sending out a
survey and asking people to fill it out and send it back
-It can also be a radio talk show host posing a question to listeners and asking them to
call in with their opinion
-The sample will almost certainly consist mostly of people who feel strongly
either way, which introduces a good deal of bias
Chapter 2.1​
FREQUENCY DISTRIBUTIONS FOR ORGANIZING AND SUMMARIZING
DATA​
Descriptive Statistics: statistics that summarize or describe features of a data set, such
as its central tendency or dispersion​
-How often the data is appearing

CHARACTERISTICS OF DATA​
1.) Center: a representative value that shows us where the middle of the data set is
located
2.) Variation: a measure of the amount that values vary
3.) Distribution: the nature or shape of the spread of the data over a range of values (such
as bell-shaped)
4.) Outliers: sample values that lie far away from the vast majority of other sample
values
5.) Time: any change in the characteristics of the data over time​

FREQUENCY DISTRIBUTION​
-When working with large data sets, a frequency distribution (or frequency table) is
often helpful in organizing and summarizing data​
-A frequency distribution helps us to understand the nature of the distribution of a data
set​
Frequency Distribution: shows how the data are partitioned among several categories
(or classes) by listing the categories along with the number (frequency) of data values in
each of them​
Lower Class Limits: are the SMALLEST numbers that can belong to each of the different
classes​
Upper Class Limits: are the LARGEST numbers that can belong to each of the different
classes​
Class Boundaries: are the numbers used to SEPARATE the classes, but without the gaps
created by class limits​
Class midpoints: are the values in the middle of the classes​
Class Width: the difference between 2 consecutive lower class limits (or 2 consecutive
upper class limits) in a frequency distribution
PROCEDURE FOR CONSTRUCTING A FREQUENCY DISTRIBUTION​
-We construct Frequency Distributions to:​
>1.) Summarize large data sets​
>2.) See the distribution and identify outliers​
>3.) Have a basis for constructing graphs (such as histograms)​
-Technology can generate Frequency Distributions, but here are the steps for
MANUALLY constructing them:​
1.) Select the number of classes, usually between 5 and 20​
-The number of classes might be affected by the convenience of using round numbers​
2.) Calculate the class width:
$\text{class width} \approx \dfrac{(\text{maximum data value}) - (\text{minimum data value})}{\text{number of classes}}$
-Round this result UP to get a convenient number
-Using a specific number of classes is not too important, and it's usually wise to change
the number of classes so that they use convenient values for the class limits
3.) Choose the value for the first lower class limit by using either the minimum value or a
convenient value below the minimum​
4.) Using the first lower class limit and the class width, list the other lower-class limits​
-Do this by adding the class width to the first lower class limit to get the 2nd lower class
limit​
>Add the class width to the second lower class limit to get the 3rd lower class limit, and
so on​
5.) List the lower-class limits in a vertical column and then determine and enter the
upper-class limits​
6.) Take each individual data value and put a tally mark in the appropriate class.​
-Add the tally marks to find the total frequency for each class​
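
The manual procedure above can be sketched in Python. This is a minimal sketch, assuming whole-number data and using the minimum value as the first lower class limit; the data set and the function name `frequency_distribution` are hypothetical. The width is rounded up to the next whole number above (max - min) / (number of classes) so that the maximum value always lands in the last class.

```python
def frequency_distribution(data, num_classes):
    lowest, highest = min(data), max(data)
    # Steps 2 and 2.5: class width, rounded UP to the next whole number
    width = (highest - lowest) // num_classes + 1
    table = []
    for i in range(num_classes):
        lower = lowest + i * width        # step 4: add the width to get each lower limit
        upper = lower + width - 1         # step 5: upper limit (whole-number data assumed)
        frequency = sum(1 for x in data if lower <= x <= upper)  # step 6: tally
        table.append((lower, upper, frequency))
    return width, table

# Hypothetical data set of 15 whole-number values
data = [12, 15, 21, 22, 25, 28, 31, 33, 34, 38, 41, 45, 47, 52, 58]
width, table = frequency_distribution(data, 5)

for lower, upper, frequency in table:
    print(f"{lower}-{upper}: {frequency}")
```

With this data the class width is 10 and the classes are 12-21, 22-31, 32-41, 42-51, and 52-61.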

CLASS BOUNDARY​
-A Class Boundary is the value that lies halfway between the upper limit of one class
and the lower limit of the next class
-After one class boundary, add (or subtract) the class width to find the next class
boundary​
-The boundaries of a class are typically given in interval form as (lower boundary, upper
boundary)​
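
As a small illustration of class boundaries and class midpoints, the sketch below starts from a hypothetical list of class limits (the values are assumptions chosen for the example) and applies the rules described above.

```python
# Hypothetical class limits as (lower limit, upper limit) pairs
limits = [(12, 21), (22, 31), (32, 41), (42, 51), (52, 61)]

# Class width: difference between 2 consecutive lower class limits
width = limits[1][0] - limits[0][0]

# The first boundary between two classes lies halfway between the upper limit
# of one class and the lower limit of the next; step back one width to get
# the lowest boundary, then keep adding the class width.
between_first_two = (limits[0][1] + limits[1][0]) / 2
lowest_boundary = between_first_two - width
boundaries = [lowest_boundary + i * width for i in range(len(limits) + 1)]

# Class midpoint: the value in the middle of each class
midpoints = [(lower + upper) / 2 for lower, upper in limits]
```

Here the boundaries come out as 11.5, 21.5, 31.5, 41.5, 51.5, 61.5 and the midpoints as 16.5, 26.5, 36.5, 46.5, 56.5.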
RELATIVE FREQUENCY DISTRIBUTION​
-A variation of the basic frequency distribution is a relative frequency distribution or
percentage frequency distribution, in which each class frequency is replaced by a relative
frequency (or proportion) or a percentage​
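-Relative frequencies and percentages are calculated as follows:
$\text{relative frequency for a class} = \dfrac{\text{frequency for the class}}{\text{sum of all frequencies}}$
$\text{percentage for a class} = \dfrac{\text{frequency for the class}}{\text{sum of all frequencies}} \times 100\%$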

-The sum of the percentages in a relative frequency distribution must be very close to
100% (with a little wiggle room for rounding errors)​

CUMULATIVE FREQUENCY DISTRIBUTION​


-Another variation of a frequency distribution is a cumulative frequency distribution in
which the frequency for each class is the sum of the frequencies for that class and all
previous classes.
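
Relative and cumulative frequencies are simple to compute once the class frequencies are known. The sketch below uses a hypothetical list of five class frequencies; the variable names are illustrative only.

```python
from itertools import accumulate

frequencies = [3, 4, 4, 2, 2]     # hypothetical class frequencies
total = sum(frequencies)

# Relative frequency: class frequency divided by the sum of all frequencies
relative = [f / total for f in frequencies]

# Percentage frequency: the same proportion times 100, rounded to 1 decimal place
percentages = [round(100 * f / total, 1) for f in frequencies]

# Cumulative frequency: frequency for the class plus all previous classes
cumulative = list(accumulate(frequencies))
```

The last cumulative frequency always equals the total number of data values, which is a quick check on the arithmetic.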

NORMAL DISTRIBUTION​
1.) The frequencies start low, then increase to 1 or 2 high frequencies, and then decrease
to a low frequency​
2.) The distribution is approximately symmetric: frequencies preceding the maximum
frequency should be roughly a mirror image of those that follow the maximum
frequency
Chapter 2.2​
HISTOGRAMS​
Histogram: a graph consisting of bars of equal width drawn adjacent to each other
(unless there are gaps in the data).​
-The horizontal scale represents classes of quantitative data values, and the vertical scale
represents frequencies​
-The heights of the bars correspond to frequency values​
-With a very small data set, the true nature of the distribution cannot be seen with a
histogram

IMPORTANT USES OF A HISTOGRAM​
1.) Visually displays the shape of a distribution of the data​
2.) Shows the location of the center of the data​
3.) Shows the spread of the data​
4.) Identifies outliers
-When graphed as a histogram, a normal distribution has a “bell” shape​
-In a uniform distribution, the different possible values occur with approximately the same
frequency, so the heights of the bars in the histogram are approximately uniform
-A distribution of data is skewed if it is not symmetric and extends more to
1 side than to the other
>Data skewed to the right (positively skewed) have a longer right tail​
>Data skewed to the left (negatively skewed) have a longer left tail

CRITERIA FOR ASSESSING NORMALITY WITH A NORMAL QUANTILE PLOT
Normal Distribution: the population distribution is normal if the pattern of the points in
the normal quantile plot is reasonably close to a straight line, and the points do not show
some systematic pattern that is not a straight-line pattern

Not a Normal Distribution: the population distribution is not normal if the normal
quantile plot has either or both of these 2 conditions​
1.) The points do not lie reasonably close to a straight-line pattern​
2.) The points show some systematic pattern that is not a straight-line pattern
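
To make the idea of a normal quantile plot concrete, the sketch below computes the points that would be plotted: each sorted data value paired with the z score expected at that position under a normal distribution. The plotting positions (i - 0.5)/n are one common convention and are an assumption here, as are the function name and the sample data.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    n = len(data)
    ordered = sorted(data)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting positions: 1/(2n), 3/(2n), 5/(2n), ...
    z_values = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(ordered, z_values))

points = normal_quantile_points([7, 3, 9, 5, 1])   # hypothetical data
for x, z in points:
    print(f"x = {x}, z = {z:.3f}")
```

If the (x, z) points lie reasonably close to a straight line with no other systematic pattern, the criteria above point toward a normal distribution.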
How to arrange the data in an array (TI-84):
1.) Stat-Edit-enter the data in a list
2.) 2nd-Mode (Quit), then Stat-#2 (SortA, to sort the list in ascending order) or #3 (SortD, descending order)
3.) Go to the list to check the array
How to Create Histograms on the TI-84:
1.) Turn your Stat Plot ON and select the histogram icon​
2.) Go to STAT→Edit
3.) Type values into L1​
4.) Go to Zoom Stat (Zoom 9) to view and to create a friendly window​
5.) Use the TRACE button and arrow keys to toggle through the bars of the histogram​
Adjust Windows for Histogram​
-X-min: smallest data​
-X-max: LARGEST data​
-X-scale: class width​
-y-min: 0​
-y-max: highest class frequency
Chapter 2.3​
GRAPHS THAT ENLIGHTEN AND GRAPHS THAT DECEIVE​
-Some graphs are deceptive because they create impressions about data that are
somehow misleading or wrong

GRAPHS THAT ENLIGHTEN​
Dot Plots: consists of a graph of quantitative data in which each data value is plotted as a
point (or dot) above a horizontal scale of values​
-Dots representing equal values are stacked​
Features of a Dot Plot:​
>Displays the shape of the distribution of data​
>It is usually possible to recreate the original list of data values​
-Example:
“How long does it take you to eat breakfast?”​
Stem-Plots(Stem and leaf plot): represents quantitative data by separating each value
into 2 parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost
digit)​
Features of a Stem Plot:​
>Shows the shape of the distribution of the data​
>Retains the original data values​
>The sample data are sorted (arranged in order)​
-Example:​
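
A stem plot can be produced with a short Python sketch. This assumes non-negative whole numbers, with the last digit as the leaf and the remaining digits as the stem; the function name and the data are hypothetical.

```python
from collections import defaultdict

def stem_plot(values):
    stems = defaultdict(list)
    for value in sorted(values):          # sorting keeps the leaves in order
        stems[value // 10].append(value % 10)
    return [f"{stem} | {''.join(str(leaf) for leaf in leaves)}"
            for stem, leaves in sorted(stems.items())]

rows = stem_plot([31, 12, 25, 22, 15, 28, 34, 21, 33])
print("\n".join(rows))
```

For this data the rows are `1 | 25`, `2 | 1258`, and `3 | 134`, so every original value can still be read off the plot.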

Time-Series Graph: a graph of time-series data, which are
quantitative data that have been collected at different points in time, such as monthly or
yearly
Features of a Time-Series Graph:​
>Reveals information about trends over time

Bar Graphs: a bar graph uses bars of equal width to show frequencies of categories of
categorical/qualitative data. ​
-The bars may or may not be separated by small gaps​
Features of a Bar Graph:​
>Shows the relative distribution of categorical data so that it is easier to compare the
different categories​
-Example:​
→“Housing Types for Students in a Statistics Class”​


Pareto Charts: a bar graph for categorical data, with the added stipulation that the bars
are arranged in descending order according to frequencies, so the bars decrease in height
from left to right ​
Features of a Pareto Chart:​
>Shows the relative distribution of categorical data so that it is easier to compare the
different categories
>Draws attention to the more important categories
Pie Charts: a very common graph that depicts categorical data as slices of a circle, in
which the size of each slice is proportional to the frequency count for the category​
>Although pie charts are very common, they are not as effective as Pareto Charts​
Features of a Pie Chart:​
>Shows the distribution of categorical data in a commonly used format

Frequency Polygon: uses line segments connected to points located directly above class
midpoint values​
-Although frequency polygons are very similar to a histogram, a frequency polygon uses
line segments instead of bars ​
-A variation of the basic frequency polygon is the relative frequency polygon, which
uses relative frequencies (proportions or percentages) for the vertical scale​
-An advantage of relative frequency polygons is that 2 or more of them can be combined
on a single graph for easy comparison​
Features of a Frequency Polygon:​
>Consists of class midpoints as x-values and frequency or relative frequency as y-values
GRAPHS THAT DECEIVE​
Nonzero Vertical Axis: uses a vertical scale that starts at some value greater than zero to
exaggerate differences between groups​
-Always examine a graph carefully to see whether a vertical axis begins at some point
other than zero so that the differences are exaggerated

Pictographs: drawings of objects, called pictographs, are often misleading
-Data that are 1-dimensional in nature (such as budget amounts) are often depicted with
2-dimensional objects (such as dollar bills) or 3-dimensional objects (such as stacks of
coins, homes, or barrels)

-When examining data depicted with a pictograph, determine whether the graph is
misleading because objects of area or volume are used to depict amounts that are
actually 1-dimensional.​
>Histograms and bar charts represent 1-dimensional data with 2-dimensional bars, but
they use bars with the same width so that the graph is not misleading
Chapter 3.1​
MEASURES OF CENTER
THE 4 MEASURES OF CENTER​
1.) Mean (Arithmetic Mean): the sum of the data values, divided by the number of data
values​
Properties of the Mean:​
>Sample means drawn from the same population tend to vary less than other measures of center
>Uses every data value​
>Sensitive to outliers (the mean is not resistant)​
​ →A statistic is resistant if the presence of outliers does not cause it to change very
much​
∑: the sum of a set of data values
x: the variable usually used to represent the individual data values
n: the number of data values in a sample
N: the number of data values in a population
Sample mean: $\bar{x} = \dfrac{\sum x}{n}$
Population mean: $\mu = \dfrac{\sum x}{N}$

-NEVER use the term average when referring to a measure of center
>The word average is often used for the mean, but it is sometimes used for other
measures of center
>The term average is NOT USED by statisticians, and it will not be used by us
2.) Median: the middle value when the original data values are arranged in order of
increasing (or decreasing) magnitude​
Properties of the Median:
>A resistant measure of center (not sensitive to outliers)​
>Does not directly use every data value​
>Sometimes is denoted by ​
How to compute the Median:​
1.) Sort the values (arrange them in order)​
2.) Depends:​
>If the sample size of n is ODD, the median is the middle number in the sorted list
>If the sample size of n is EVEN, compute the mean of the 2 middle numbers in the
sorted list​

3.) Mode: the value(s) that occur with the greatest frequency
Properties of the Mode:​
>Can be found with qualitative data​
Finding the Mode:​
-No mode
 >When no data value is repeated, there is NO mode
-The data set is bimodal​
​ >When 2 data values occur with the same greatest frequency, each one is a mode​
-The data set is multimodal​
​ >When more than 2 data values occur with the same greatest frequency, each is a
mode​

4.) Midrange: the value halfway between the maximum and minimum values in the
original data set​
Properties of the Midrange:​
>Very sensitive to outliers (not resistant)​
>Easy to compute​
>Sometimes confused with median
→Median: half of the values are above and below it​
​ →Midrange: the mean of the max and min​
-Mean, Median, and Mode for Normal Distribution are ALL EQUAL​
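
The 4 measures of center can be computed with a short Python sketch. The data values are hypothetical, and the helper names `modes` and `midrange` are illustrative; `modes` follows the convention above that there is no mode when no value is repeated.

```python
from collections import Counter
from statistics import mean, median

def modes(values):
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                       # no data value is repeated: no mode
    return sorted(v for v, c in counts.items() if c == top)

def midrange(values):
    # Halfway between the maximum and minimum values
    return (max(values) + min(values)) / 2

data = [2, 3, 3, 5, 7, 10]              # hypothetical sample
with_outlier = data + [100]             # same sample plus one outlier

print(mean(data), median(data), modes(data), midrange(data))
print(mean(with_outlier), median(with_outlier), midrange(with_outlier))
```

Adding the single outlier moves the mean from 5 to about 18.6 and the midrange from 6 to 51, while the median only moves from 4 to 5. That is what it means for the median to be resistant and for the mean and midrange not to be.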


ROUND-OFF RULES FOR MEASURES OF CENTER​
1.) For the mean, median, and midrange, carry one more decimal place than is present in
the original set of values​
2.) For the mode, leave the value as is without rounding (because values of the mode are
the same as some of the original data values)​

GRAPH AND MEASURES OF CENTER​
-The Mode is the data value at which a distribution has its highest peak​
-The Median is the number that divides the area of the distribution in half​
-The Mean of a distribution will be pulled toward any outliers​
-On a graph, the mode is at the highest peak, the median divides the area under the graph into 2 equal
parts, and the mean is the center of gravity; if the distribution is skewed, the mean is pulled toward the longer tail
-Of the 3 measures of center, the mean is pulled closest to an outlier, while the median and mode
are typically closer to each other in value and are far less affected by the outlier
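Calculating the Mean From a Frequency Distribution
-Multiply each class midpoint x by its class frequency f, add the products, and divide by the sum of the frequencies:
$\bar{x} = \dfrac{\sum (f \cdot x)}{\sum f}$

Calculating a Weighted Mean
-When different x data values are assigned different weights w, we can compute a
weighted mean:
$\bar{x} = \dfrac{\sum (w \cdot x)}{\sum w}$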
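
Both calculations share the same pattern: multiply each value by its frequency or weight, add the products, and divide by the total frequency or weight. The sketch below assumes class midpoints stand in for the data values of each class; the function names and the numbers are hypothetical.

```python
def mean_from_frequency_table(midpoints, frequencies):
    # sum of (frequency * class midpoint) divided by the sum of the frequencies
    return sum(f * x for x, f in zip(midpoints, frequencies)) / sum(frequencies)

def weighted_mean(values, weights):
    # sum of (weight * value) divided by the sum of the weights
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical frequency distribution: class midpoints and class frequencies
table_mean = mean_from_frequency_table([16.5, 26.5, 36.5, 46.5, 56.5], [3, 4, 4, 2, 2])

# Hypothetical grade points (A=4, B=3, C=2) weighted by course credit hours
gpa = weighted_mean([4, 3, 2], [3, 4, 3])
```

The mean from a frequency table is an approximation, because every value in a class is treated as if it were equal to the class midpoint.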

Chapter 3.2​
MEASURES OF VARIATION
-Measure of Variation: a statistic that describes how spread out the data values
(data points) are
1.) Range: the range of a set of data values is the difference between the maximum data
value and the minimum data value​

Properties of Range:​
>Very sensitive to outliers (not resistant)​
>Does not take every data value into account (so it doesn’t reflect the variation among all
data values)

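2.) Standard Deviation
-The sample standard deviation, denoted by s, is a measure of how much data values deviate away from the mean

Formulas for Finding the Standard Deviation
1.) Sample Standard Deviation:
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
2.) Shortcut to find Sample Standard Deviation:
$s = \sqrt{\dfrac{n \sum x^2 - (\sum x)^2}{n(n - 1)}}$
3.) Population Standard Deviation:
$\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
4.) Sample Standard Deviation when a frequency distribution is given (x is the class midpoint, f is the class frequency, and n = ∑f):
$s = \sqrt{\dfrac{n \sum (f \cdot x^2) - \left[\sum (f \cdot x)\right]^2}{n(n - 1)}}$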
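
The four standard deviation calculations listed above can be sketched in Python using the usual textbook formulas (division by n - 1 for a sample, by N for a population). The function names and the data are hypothetical, and these formulas are stated here as an assumption about what the notes intended.

```python
import math

def sample_sd(data):
    n = len(data)
    x_bar = sum(data) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

def sample_sd_shortcut(data):
    # Algebraically equivalent form that avoids computing the mean first
    n = len(data)
    return math.sqrt((n * sum(x * x for x in data) - sum(data) ** 2) / (n * (n - 1)))

def population_sd(data):
    big_n = len(data)
    mu = sum(data) / big_n
    return math.sqrt(sum((x - mu) ** 2 for x in data) / big_n)

def sample_sd_from_frequencies(midpoints, frequencies):
    n = sum(frequencies)
    sum_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, frequencies))
    return math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

data = [2, 3, 3, 5, 7, 10]              # hypothetical sample
s = sample_sd(data)                      # sqrt(46 / 5), about 3.03
s_short = sample_sd_shortcut(data)       # same value as s
sigma = population_sd(data)              # sqrt(46 / 6), about 2.77
s_freq = sample_sd_from_frequencies([2, 3, 5, 7, 10], [1, 2, 1, 1, 1])
```

For the same data the population formula gives a smaller result than the sample formula, because it divides by N instead of n - 1.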


7 IMPORTANT PROPERTIES OF SAMPLE STANDARD DEVIATION​
1.) A measure of how much data values deviate from the mean​
2.) s ≥ 0 (s is never negative)
3.) s=0 only when all data values are exactly the same​
4.) Large s= More Variation​
5.) Very sensitive to outliers (not resistant)​
6.) Has the same units (such as minutes, feet, pounds) as the original data values​
7.) The Sample Standard Deviation s is a biased estimator of the population standard deviation sigma

Variance:
-The variance of a sample is denoted by s^2, the square of the sample standard deviation

6 IMPORTANT PROPERTIES OF SAMPLE VARIANCE​


1.) Sample standard deviation squared​
2.) s^2 ≥ 0 (s^2 is never negative)
3.) s^2=0 only when all data values are exactly the same​
4.) Very sensitive to outliers (not resistant)​
5.) Its units: the squared units of the original data values​
>More difficult to interpret than standard deviation​
6.) An unbiased estimator of population variance sigma^2​


ROUND-OFF RULE FOR MEASURE OF VARIATION​
-When rounding the value of a measure of variation, carry one more decimal place
than is present in the original set of data​

RANGE RULE OF THUMB FOR UNDERSTANDING STANDARD DEVIATION​
-The range rule of thumb is a crude but simple tool for understanding and interpreting
standard deviation​
-It is based on the principle that for many data sets, the vast majority (such as 95%) of
sample values lie within 2 standard deviations of the mean​

RANGE RULE OF THUMB FOR ESTIMATING A SAMPLE STANDARD DEVIATION
-To roughly estimate the standard deviation from a collection of sample data, use
$s \approx \dfrac{\text{range}}{4}$

EMPIRICAL (OR 68-95-99.7) RULE FOR DATA WITH A BELL-SHAPED DISTRIBUTION
1.) About 68% of all values fall within 1 standard deviation of the mean​
2.) About 95% of all values fall within 2 standard deviations of the mean​
3.) About 99.7% of all values fall within 3 standard deviations of the mean​
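
The range rule of thumb and the empirical rule can both be turned into quick calculations. The sketch below assumes the common form of the range rule of thumb, s ≈ range / 4; the function names, the data, and the mean of 100 with standard deviation 15 are hypothetical.

```python
def range_rule_sd_estimate(data):
    # Crude estimate: the range spans roughly 4 standard deviations
    return (max(data) - min(data)) / 4

def empirical_rule_intervals(mean, sd):
    # Intervals expected to hold about 68%, 95%, and 99.7% of bell-shaped data
    return {percent: (mean - k * sd, mean + k * sd)
            for k, percent in ((1, 68), (2, 95), (3, 99.7))}

estimate = range_rule_sd_estimate([2, 3, 3, 5, 7, 10])   # (10 - 2) / 4
intervals = empirical_rule_intervals(100, 15)
```

With a mean of 100 and a standard deviation of 15, about 95% of values in a bell-shaped distribution would be expected to fall between 70 and 130.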

COMPARING VARIATION IN DIFFERENT SAMPLES OR POPULATIONS​
-We should only compare sample standard deviation when the sample means are
approximately the same​
-If we’re trying to compare​
>Samples that have very different means, or​
>Samples or populations with different scales or units, we should use the coefficients of
variation​

COEFFICIENT OF VARIATION​
-The coefficient of variation (or CV) for a set of nonnegative sample or population data,
expressed as a percent, describes the standard deviation relative to the mean, and is
given by the following:​
Sample: $CV = \dfrac{s}{\bar{x}} \cdot 100\%$
Population: $CV = \dfrac{\sigma}{\mu} \cdot 100\%$
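
A short sketch of the sample coefficient of variation, assuming the usual definition (sample standard deviation divided by the sample mean, times 100%). The function name and the data are hypothetical.

```python
import math
from statistics import mean, stdev

def cv_percent(sample):
    # Sample standard deviation relative to the sample mean, as a percent
    return stdev(sample) / mean(sample) * 100

sample = [2, 4, 4, 4, 5, 5, 7, 9]               # hypothetical data
rescaled = [x * 10 for x in sample]             # same data in different units

cv_original = cv_percent(sample)
cv_rescaled = cv_percent(rescaled)
```

Rescaling the data by a positive factor (for example, converting cm to mm) leaves the CV unchanged, which is why it is suited to comparing variation across different scales or units.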


CALCULATOR​
To calculate Standard Deviation (and Mean)​
-Enter the data in calculator by going in stat-Edit-List​
-Exit out and then go to stat-calc-1 var stats-list-calculate​
To calculate Standard Deviation using frequency​
-Enter midpoint data in list1 and frequency in list 2​
-Exit out and then go to stat-calc-1 var stats-list1-freq-list2 and calculate
Chapter 3.3​
MEASURES OF RELATIVE STANDING AND BOXPLOTS​

Z SCORES​
-A Z score (or standard score or standardized value) is the number of standard deviations
that a given value x is above or below the mean​
-The Z score is calculated by using the following:
Sample: $z = \dfrac{x - \bar{x}}{s}$
Population: $z = \dfrac{x - \mu}{\sigma}$
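
As a quick illustration, a z score subtracts the mean from the value and divides by the standard deviation. The function name and the numbers (mean 100, standard deviation 15) are hypothetical.

```python
def z_score(x, mean, sd):
    # Number of standard deviations that x lies above (+) or below (-) the mean
    return (x - mean) / sd

above = z_score(130, 100, 15)    # 2 standard deviations above the mean
below = z_score(85, 100, 15)     # 1 standard deviation below the mean
```

A positive z score means the value is above the mean, and a negative z score means it is below the mean.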
