STA-2023
Statistics for Business and Economics
Textbook: McClave, Benson and Sincich, 12th edition
Vocabulary (Revised: Aug. 2014)
Chapter 1: Statistical Methods
Statistics is a branch of science dealing with methods used
for collecting, organizing, summarizing, analyzing and interpreting data sets
Descriptive Statistics consists of the procedures used to
organize and summarize data sets, as well as to describe their major characteristics
Inferential Statistics consists of the procedures used to
make estimations and decisions about a population based on the information contained in a representative sample
Population is the set of all units (subjects or objects) of
interest in any statistical study
Sample is a subset of units chosen from the defined
population with the purpose of making a statistical inference about the population
Representative Sample is a sample that reflects the relevant characteristics of the population. Representative samples can be obtained by using sampling techniques.
Simple Random Sampling is the most basic probability sampling technique. It involves a single list of all units of the population, each of which is given an equal chance to be included in the sample.
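Simple random sampling can be sketched with Python's standard library; the population of numbered units and the sample size below are hypothetical:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw a simple random sample of size n: every subset of
    size n is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population of 100 numbered units; draw a sample of 10.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
```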
Census is a type of statistical study conducted on the
entire population
Sampling survey is a type of statistical study that involves
sample units and a questionnaire
Units are the individuals (subjects or objects) included in
any statistical study
Variables are characteristics that vary from one unit to
another
Qualitative (Categorical) data are the observations
describing an attribute or categorical characteristic of the individuals
Quantitative data are the observations/measurements
describing a numerical characteristic of the individuals
Data/Data Set is the set of all observations/measurements collected for one or more variables on a particular set of units

Chapter 2: Descriptive Statistics
Frequency table is a way of organizing and summarizing the
information contained in a data set
Elements of a frequency table
Classes
Class limits
Class boundaries
Frequency or count
Relative frequency
Percent frequency
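These elements can be computed directly for a categorical data set; a minimal sketch using the standard library's `collections.Counter` (the data values are made up for illustration):

```python
from collections import Counter

data = ["A", "B", "A", "C", "B", "A", "B", "B"]  # hypothetical observations

counts = Counter(data)            # frequency (count) per class
n = len(data)
table = {
    cls: {"frequency": f,
          "relative": f / n,      # relative frequency
          "percent": 100 * f / n} # percent frequency
    for cls, f in counts.items()
}
```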
Frequency graphs for categorical data
Bar graph: unattached bars on a rectangular coordinate system
Pareto chart: bar graph where the bars are arranged in decreasing order of frequency from left to right
Pie chart: circle graph
Frequency graphs for quantitative data
Histogram: connected bars on a rectangular coordinate system
Stem & Leaf plot
Frequency Distribution Curves
Frequency Distribution Curve for a quantitative data set is a smooth curve that fits the relative frequency histogram.
Three typical patterns of frequency curves:
Bell curve is symmetric and mound shaped
Skewed to the right has a long right tail
Skewed to the left has a long left tail
Measures of Central Tendency (Center)
Mean is the simple average calculated over all data points
Median is a value located at the middle of the distribution when the data points are arranged in order
Mode is the most frequent or repeated data point
Modal Class of grouped data is the class (category or interval)
with the highest frequency
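The three measures of center can be computed with Python's `statistics` module; the data set below is hypothetical:

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical data set

mean = statistics.mean(data)      # simple average of all data points
median = statistics.median(data)  # middle value of the ordered data
mode = statistics.mode(data)      # most frequent data point
```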
Measures of Variability (Spread)
Range is the distance between the endpoints (highest and
lowest values) of the data set
Standard deviation is a measure of the average distance
of all data values relative to the mean
Outliers are extremely high or low data values disconnected
from the rest of the data set
Chebyshev and Empirical Rules provide the expected percent
of data values falling within 1, 2, and 3 standard deviations of the mean
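A sketch computing the range and standard deviation, and checking the share of values within 2 standard deviations of the mean (Chebyshev's rule guarantees at least 75% for any data set); the data are hypothetical:

```python
import statistics

data = [4, 5, 5, 6, 6, 6, 7, 7, 8, 16]  # hypothetical data; 16 is an outlier

rng = max(data) - min(data)   # range: distance between the endpoints
mu = statistics.mean(data)
s = statistics.stdev(data)    # sample standard deviation

# Proportion of data values within 2 standard deviations of the mean
within_2sd = sum(1 for x in data if abs(x - mu) <= 2 * s) / len(data)
```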
Percentiles are measures of relative standing that describe the percent of data points falling at or below any given data value.
Quartiles (Q1, Q2, Q3) are special percentiles that divide the data set into four (evenly weighted) subsets.
Interquartile Range (IQR) is the distance between the first and
third quartile (Q3 - Q1). It describes the spread of the central 50% of the data set.
Five Number Summary is a way of describing a data set using
five special percentiles (Min, Q1, Q2, Q3, Max)
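The five number summary and IQR can be computed with the standard library; the data are hypothetical, and `statistics.quantiles` with `method="inclusive"` is just one of several quartile conventions:

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13, 15]  # hypothetical, already sorted

q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_num = (min(data), q1, q2, q3, max(data))  # Min, Q1, Q2, Q3, Max
iqr = q3 - q1  # spread of the central 50% of the data
```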
Box and Whiskers Plot is the graphical representation of the Five Number Summary

Chapter 3: Probability
Random Experiment is an observable activity whose outcome
cannot be predicted with certainty
Sample Space is the set of all basic outcomes of a random
experiment
Sample points are the elements of the sample space
Event is any subset of basic outcomes of a random experiment
Impossible Event is an event containing no sample points
Certain Event is an event containing all sample points of the
sample space
Tree Diagram is a graphical tool used to determine the sample
space of random experiments
Venn Diagram is a way of graphically portraying the sample
space and various events
Mutually Exclusive Events are events that do not share any
sample point
Compound Events:
Intersection of two events A and B is the compound event
containing the sample points that belong to both A and B
Union of two events A and B is the compound event containing the sample points that belong to either A or B (or both).
Complement of any event A is the compound event containing the sample points that belong to S (the sample space) but do not belong to A.
Conditional Probability is a probability calculated on a
reduced sample space. This reduced sample space is given by a pre-established event or condition
Independent Events are events such that the occurrence of one
of them does not affect the probability of the other
Contingency Table is a two-way table containing frequency
data on two categorical variables
Probability Tree is a tree diagram involving probabilities of
given events
Probability Rules:
Addition P(A U B) = P(A) + P(B) – P(A ∩ B)
Complement P( AC ) = 1 – P(A)
Conditional P(A│B) = P(A ∩ B) / P(B)
Multiplication P(A ∩ B) = P(A) P(B│A) or = P(B) P(A│B)

Chapter 4, Part I: Discrete Probability Distributions
Random Variable is a numerical variable whose values are
associated with a random experiment and therefore cannot be predicted with certainty
Types of random variables: Discrete and Continuous.
Discrete random variables are random variables defined
on isolated real numbers. They are typically used for counting.
Continuous random variables are random variables
defined on a line interval of real numbers. They are typically used for measuring.
Discrete Probability Distribution is a table, graph or formula
assigning probabilities to each value of a discrete random variable.
Probability Histogram is a graphical representation of a
discrete probability distribution, associating the heights of bars with the given probabilities.
Point-line probability graph is a graphical representation of a
discrete probability distribution, associating the heights of vertical lines with the given probabilities.
Mean μ of a discrete probability distribution is the expected value of any given discrete random variable “X”. The expected value of “X” takes into account not only the X-values but also their associated probabilities.
Standard Deviation σ of a discrete probability distribution describes the variability of any discrete random variable “X” relative to the mean μ. It takes into account not only the deviations of the X-values from the mean but also their associated probabilities.
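Both quantities weight each value by its probability; a minimal sketch for a hypothetical discrete distribution:

```python
import math

# Hypothetical discrete probability distribution: value -> probability
dist = {0: 0.2, 1: 0.5, 2: 0.3}

mu = sum(x * p for x, p in dist.items())              # expected value E(X)
var = sum((x - mu) ** 2 * p for x, p in dist.items()) # probability-weighted squared deviations
sigma = math.sqrt(var)                                # standard deviation
```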
Binomial experiment is a random experiment involving a
number of identical and independent trials in which there are only two possible outcomes (success and failure).
Binomial random variable is a discrete random variable
describing the number of successes in a binomial experiment.
Parameters of the binomial probability distribution are the
number of trials “n” and the rate of success “p” (probability of success for each trial).
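Given n and p, the binomial probability of exactly k successes follows the standard formula; a sketch with made-up numbers:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials
    and success probability p on each trial."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical: probability of exactly 2 successes in 5 trials with p = 0.5
prob = binomial_pmf(2, 5, 0.5)
```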
Poisson experiment is a random experiment in which the number of occurrences of a given event during a specified period of time is observed. The occurrences of the event are assumed to be random and independent of one another.
Poisson random variable is a discrete random variable
describing the number of occurrences of a given event during a specified period of time.
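Its probabilities follow the standard Poisson formula; a sketch where the mean number of occurrences per period (often written λ) is a made-up value:

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam
    occurrences per period."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical: mean of 3 occurrences per period; probability of exactly 0
prob_zero = poisson_pmf(0, 3.0)
```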
Parameter of the Poisson probability distribution is the population or historical mean number of occurrences of the given event during a specified period of time.

Chapter 4, Part II: Continuous Probability Distributions
Continuous random variable is a random variable that can
assume any value inside an interval of real numbers.
Density curve is a smooth curve associated with the relative
frequency graph of a continuous random variable
Continuous Probability Distribution is the probability model
for a continuous random variable where probabilities are calculated as areas under the associated density curve
Normal random variable is a continuous random variable with
a density curve that is smooth, symmetric, and bell-shaped.
Normal or Bell Curve is the density curve for a normal
random variable
Normal probability distribution is the probability model for a
normal random variable.
Parameters of a normal probability distribution are the mean
and standard deviation of the associated normal random variable.
Normal population is a population where a normal random
variable has been defined.
Standard normal variable is a normal random variable with a
mean of zero and standard deviation of one.
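Any normal value x is converted to a standard normal value by z = (x − μ)/σ; a sketch with hypothetical exam scores:

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that a raw score x lies
    from the mean mu of a normal random variable."""
    return (x - mu) / sigma

# Hypothetical: scores normal with mean 70 and standard deviation 10
z = z_score(85, 70, 10)  # 1.5 standard deviations above the mean
```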
Z-scores are values of the standard normal variable. They indicate the number of standard deviations that any raw score (or value of any normal random variable) deviates from the mean.

Chapter 5. Sampling Distributions
Review of basic concepts from chapter 1
Inferential statistics scheme
Population
Sample
Representative sample
Simple random sampling
Statistical inference
Parameter is a descriptive numerical measure of the
population. Parameters are fixed numbers usually unknown because the associated population is very large
Statistic is a descriptive numerical measure of a sample.
Statistics are used to estimate parameters and vary from sample to sample.
Sampling Distribution is the probability distribution (model)
associated with any statistic when repeated samples (of the same size) are drawn from the defined population.
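A sampling distribution can be approximated by simulation: draw many samples of the same size and record the statistic each time. A sketch for the sample mean, with a hypothetical population:

```python
import random
import statistics

# Hypothetical population of unit values (population mean is 50.5)
population = list(range(1, 101))
rng = random.Random(0)

# Draw 2000 samples of size 30 and record each sample mean; the
# collection of sample means approximates the sampling distribution
# of the sample mean.
sample_means = [
    statistics.mean(rng.sample(population, 30)) for _ in range(2000)
]
grand_mean = statistics.mean(sample_means)  # close to the population mean
```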
Central Limit Theorem is a statistical property stating that the sampling distribution of the sample mean is approximately normal when the sample size is large enough.

Chapter 6. Estimation with confidence intervals
Estimation is the process of estimating or predicting the value
of a population parameter using a random sample and an estimator.
Estimator is a formula or statistic defined on sample data with
the purpose of estimating a parameter.
Estimate is a numerical result obtained by substituting the
sample data on any given estimator.
Types of estimates: point estimate and interval estimate.
Point estimate consists of a single figure used to predict the
value of a population parameter.
Interval estimate consists of a numerical range where the
parameter is expected to fall with certain confidence.
Confidence coefficient is a probability that measures the
reliability of any interval estimate.
Confidence level is the confidence coefficient expressed as a
percentage.
Confidence interval is an interval estimate calculated with a
specified confidence level.
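A sketch of a large-sample confidence interval for a population mean, using the z critical value 1.96 for 95% confidence; the data are made up for illustration (with a sample this small, a t critical value would normally be used):

```python
import math
import statistics

data = [12, 15, 14, 10, 13, 16, 11, 14, 15, 13,
        12, 14, 13, 15, 12, 13, 14, 11, 16, 12]  # hypothetical sample

xbar = statistics.mean(data)   # point estimate of the population mean
s = statistics.stdev(data)
n = len(data)

z = 1.96                       # critical value for 95% confidence
margin = z * s / math.sqrt(n)  # margin of error
ci = (xbar - margin, xbar + margin)  # 95% interval estimate
```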
Margin of error is a measure of the error of estimation involving the given confidence level and sample size.
Precision of any confidence interval is associated with the margin of error of the estimate. The smaller the margin of error, the better the precision; hence, the narrower the confidence interval, the more precise the interval estimate.

Chapter 7. Tests of Hypotheses based on a single sample
Research hypothesis is a statement or claim about
population parameters that can be tested using sample data.
Statistical hypotheses are the null and alternative
hypotheses.
Alternative hypothesis (Ha) describes the research
hypothesis of the problem.
Null hypothesis (Ho) describes the opposite of the
alternative hypothesis.
Test Statistic is a formula that summarizes the statistical
evidence collected in any test of hypotheses.
Rejection region is the set of values of the test statistic
indicating sufficient/convincing evidence against Ho.
Type I error consists of rejecting the null hypothesis Ho
when Ho is actually true.
Type II error consists of failing to reject the null
hypothesis Ho when Ho is actually false.
Alpha designates the probability of type I error.
Beta designates the probability of type II error.
p-value is a probability that measures the strength of the evidence against Ho (that is, in favor of Ha). It is the probability, computed assuming Ho is true, of observing a test statistic at least as extreme as the one actually obtained; the smaller the p-value, the stronger the evidence against Ho.
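The pieces above fit together in a single test; a sketch of a two-tailed one-sample z test (the sample figures are hypothetical, and the normal cdf is computed from the error function):

```python
import math

def one_sample_z_test(xbar, mu0, sigma, n):
    """Two-tailed z test of Ho: mu = mu0 against Ha: mu != mu0,
    assuming a known population standard deviation sigma.
    Returns the test statistic and its p-value."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    # Standard normal cdf via the error function
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)  # probability of a result at least this extreme
    return z, p_value

# Hypothetical: sample mean 52 from n = 36, testing Ho: mu = 50, sigma = 6
z, p = one_sample_z_test(52, 50, 6, 36)
```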