Introduction To Statistical Concepts - New1

Uploaded by

Jivyansh Mittal
Introduction to Statistical Concepts
Dr. Pooja Goel
Department of Commerce
Types of Statistics
Descriptive Statistics

Techniques that allow us to tabulate, summarize, and depict a collection of data in an abbreviated fashion.

Inferential Statistics

Techniques that allow us to employ inductive reasoning to infer the properties of an entire group or collection of individuals, a population, from a small number of those individuals, a sample.
Population
A population is defined as all members of a well-defined group.

A population may be broad or narrow in scope.

The key is that the population is well-defined.


Sample

A sample is defined as a subset of a population. The key is that the sample consists of some, but not all, of the members of the population.
Parameter

A parameter is defined as a characteristic of a population.

Statistic

A statistic is defined as a characteristic of a sample.

The field has become known as “statistics” simply because we are almost always dealing with sample statistics, since population data are rarely obtained.
In what situations may one choose sampling over a census?

When measuring every item in the population is:

Impossible
Inconvenient
Expensive

Or when the population is homogeneous.
Stages in the Selection of a Sample
1. Define the target population
2. Select a sampling frame
3. Choose the sampling method
4. Plan the procedure for selecting sampling units
5. Determine the sample size
6. Select the actual sampling units
Types of sampling method

A. Probability-based sampling (random) method

1. Simple random sampling
2. Stratified random sampling
a. Proportionate stratified random sampling
b. Disproportionate stratified random sampling
3. Systematic random sampling
4. Cluster sampling
5. Area sampling
Systematic Sampling
A management information systems researcher wanted to sample the manufacturers in
Texas. He had enough financial support to sample 1,000 companies (n). The Directory of
Texas Manufacturers listed approximately 17,000 total manufacturers in Texas (N) in
alphabetical order. The value of k was 17 (17,000/1,000) and the researcher selected every
17th company in the directory for his sample. In selecting every kth value, a simple
random number table should be used to select a value between 1 and k inclusive as a
starting point. The second element for the sample is the starting point plus k. In the
example, k = 17, so the researcher would have gone to a table of random numbers to
determine a starting point between 1 and 17. Suppose he selected the number 5. He
would have started with the 5th company, then selected the 22nd (5 + 17), and then
the 39th, and so on.
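
The procedure in this example can be sketched in a few lines of Python. This is only an illustration: the function name `systematic_sample` and the numbered stand-in for the directory are assumptions, and a pseudo-random generator takes the place of the random number table.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Return every k-th element of frame, starting at a random point in 1..k."""
    k = len(frame) // n              # sampling interval, k = N / n
    start = rng.randint(1, k)        # random start between 1 and k inclusive
    # Positions are 1-based in the text, so subtract 1 for list indexing.
    return [frame[i - 1] for i in range(start, len(frame) + 1, k)][:n]

# Hypothetical frame: 17,000 numbered directory entries standing in for companies.
directory = list(range(1, 17001))
sample = systematic_sample(directory, 1000, random.Random(5))
print(len(sample), sample[:3])
```

Whatever starting point is drawn, consecutive selections are 17 entries apart and the sample contains 1,000 companies.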
Systematic Sampling
A city’s telephone book lists 100,000 people. If the telephone book is the frame for a study,
how large would the sample size be if systematic sampling were done on every 200th
person?

If every 11th item is systematically sampled to produce a sample size of 75 items, approximately how large is the population?

If a company employs 3,500 people and if a random sample of 175 of these employees has
been taken by systematic sampling, what is the value of k? The researcher would start the
sample selection between what two values? Where could the researcher obtain a frame for
this study?
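
One possible way to work the three exercises, using only the relation k = N / n from the example above. The variable names are illustrative. For the last question, a company personnel or payroll list would be one plausible frame, though the slides do not name one.

```python
# Exercise 1: frame of 100,000 people, every 200th person sampled.
n1 = 100_000 // 200          # sample size n = N / k

# Exercise 2: every 11th item sampled, 75 items in the sample.
N2 = 11 * 75                 # approximate population size N = k * n

# Exercise 3: 3,500 employees, sample of 175.
k3 = 3_500 // 175            # interval k = N / n; the start is drawn from 1..k3

print(n1, N2, k3)            # 500 825 20
```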
Let’s try this….
Q: A pharmaceutical company wants to trace the effects of a new
drug on patients with specific health problems (muscular
dystrophy, sickle cell anemia, rheumatoid arthritis, etc.). It then
contacts such individuals and, with a group of voluntarily
consenting patients, tests the drug. Which sampling technique has this
company used?

Judgement Sampling
Let’s try this….
Q: A human resources director is interested in knowing why staff resign.
Which sampling technique will be useful in this case for conducting exit
interviews of all members completing their final papers in the human
resources department on the same day, before resigning? The sample
chosen for interview will be based on a simple random sampling of the
various clusters of personnel resigning on different days. The interviews
will help to understand the reasons for turnover of a heterogeneous
group of individuals (i.e., from various departments), and the study can
be conducted at a low cost.

Cluster Sampling
Types of sampling method

B. Non-probability-based sampling (non-random) method
1. Convenience sampling
2. Judgement sampling
3. Quota sampling
4. Snowball sampling
Which Sampling Technique to use?
What is the relevant target population of focus to the study?

What exactly are the parameters we are interested in investigating?

What kind of a sampling frame is available?

What costs are attached to the sampling design?

How much time is available to collect the data from the sample?
Comparative Analysis of Sampling Techniques
Errors associated with Sampling

A. Random Sampling Error

The difference between the sample result and the result of a census that arises from chance variation in the units selected for the sample.

B. Systematic (Non-sampling) Error

Errors resulting from some imperfect aspect of the research design that
causes respondent error or from a mistake in the execution of the
research. The many possible non-sampling errors include missing data,
recording errors, input processing errors, and analysis errors. Other
non-sampling errors result from the measurement instrument, such as
errors of unclear definitions, defective questionnaires, and poorly
conceived concepts.
Classification of Errors

29-08-2024 Dr. Pooja Goel 22


Respondent Error
A category of sample bias that results from some respondent action or inaction, such as
nonresponse or response bias.

Non-response error: can occur when subjects who refuse to take part in a
study, or who drop out before the study can be completed, are systematically
different from those who participate.

Examples include refusals and self-selection bias.



Response Bias: A bias that occurs when respondents either consciously or unconsciously tend to answer questions with a certain slant that misrepresents the truth.

Deliberate Falsification: A response bias may occur when people misrepresent answers to appear intelligent, conceal personal information, avoid embarrassment, and so on.

Unconscious Misrepresentation: Even when a respondent is consciously trying to be truthful and cooperative, response bias can arise from the question format, the question content, or some other stimulus.



Overestimating Patient Satisfaction

When companies conduct surveys to learn about customer
satisfaction, they face an important challenge: do the responses represent a cross-section of
customers? Maybe just the happiest or most angry customers participate. This problem also
occurs when the “customers” are the patients of a health-care provider. To investigate this issue,
a group of researchers in Massachusetts studied data from patient satisfaction surveys that rated
6,681 patients’ experiences with 82 primary-care physicians (internists and family practitioners)
at a health maintenance organization. These ratings represented response rates ranging from 11
to 55 percent, depending on the physician being rated. The researchers compared their
information about response rates with a set of simulated data for which they knew the
underlying distribution of responses. They found that the actual data closely matched simulated
data in which responses were biased so that responses were more likely when satisfaction was
higher. The researchers concluded that there was a significant correlation between the response
rate and average (mean) satisfaction rating. In other words, more-satisfied patients were more
likely to complete and return the survey. Thus, if the HMO were to use the data to evaluate how
satisfied patients are with their doctors, it would overestimate satisfaction. Also, it would have
less information about its lower-performing doctors. The researchers therefore concluded that it
is important to follow up with subjects to encourage greater response from less-satisfied patients.

Acquiescence Bias. Some respondents are very agreeable. They seem to agree to
practically every statement they are asked about. A tendency to agree (or disagree)
with all or most questions is known as acquiescence bias.

Extremity Bias. Some individuals tend to use extremes when responding to questions.
For example, they may choose only “1” or “10” on a ten-point scale.

Interviewer Bias. Response bias may arise from the interplay between interviewer
and respondent. Interviewer bias occurs if the interviewer’s presence influences
respondents to give untrue or modified answers.



Social Desirability Bias. Social desirability bias may occur either
consciously or unconsciously because the respondent wishes to create a
favorable impression or save face in the presence of an interviewer.

Administrative Error: An error caused by the improper administration or execution of the research task.

Data Processing Error: occurs because of incorrect data entry, incorrect computer programming, or other procedural errors during data analysis.



Sample selection error: results in an unrepresentative sample because of an
error in either the sample design or the execution of the sampling procedure.

Interviewer error: Mistakes made by interviewers failing to record survey responses correctly.

Interviewer cheating: occurs when an interviewer falsifies entire questionnaires or fills in answers to questions that have been intentionally skipped.



Determining the Sample size
Is a large sample better than a small sample?

Is it more representative?

How large should the sample be?

Factors affecting decisions on sample size:

• The research objective;
• The extent of precision desired (the confidence interval);
• The acceptable risk in predicting that level of precision (confidence level);
• The amount of variability in the population itself;
• The cost and time constraints;
• In some cases, the size of the population itself.
Precision

Precision refers to how close our estimate is to the true population characteristic.

We estimate the population parameter to fall within a range, based on the sample estimate.
Precision is a function of the range of variability in the sampling
distribution of the sample mean:

$S_{\bar{X}} = \dfrac{S}{\sqrt{n}}$

where S is the standard deviation of the sample, n is the sample size, and
$S_{\bar{X}}$ indicates the standard error or the extent of precision offered by the
sample.

Standard error varies inversely with the square root of the sample size.
Hence, if we want to reduce the standard error given a particular
standard deviation in the sample, we need to increase the sample size.
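
A small numerical sketch of this inverse square-root relationship. The sample standard deviation of 10 and the three sample sizes are hypothetical values chosen for illustration; `standard_error` is an illustrative helper name.

```python
import math

def standard_error(s, n):
    """Standard error of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)

# Hypothetical sample standard deviation of 10 at growing sample sizes.
for n in (16, 64, 256):
    print(n, standard_error(10, n))   # 2.5, then 1.25, then 0.625
```

Quadrupling the sample size halves the standard error, so each further gain in precision costs proportionally more observations.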
Confidence
Confidence denotes how certain we are that our estimates will really hold true
for the population.

Other things being equal, the narrower the range, the lower the confidence. In
other words, there is a trade-off between precision and confidence for any
given sample size.

The level of confidence can range from 0 to 100%. 95% confidence is the
conventionally accepted level for most business research, most commonly
expressed by denoting the significance level as p ≤ 0.05.

In other words, we say that at least 95 times out of 100 our estimate will
reflect the true population characteristic.
Precision VS Confidence Level
Sample data, Precision, and Confidence in Estimation
For example, we may want to estimate the mean dollar value of
purchases made by customers when they shop at department stores.
From a sample of 64 customers sampled through a systematic
sampling design procedure, we may find that the sample mean $\bar{X} = 105$
and the sample standard deviation S = 10. $\bar{X}$, the sample mean, is a point
estimate of μ, the population mean. We could construct a confidence
interval around $\bar{X}$ to estimate the range within which μ will fall. The
standard error $S_{\bar{X}}$ and the percentage or level of confidence we require
will determine the width of the interval, which can be represented by
the following formula, where K is the t statistic for the level of
confidence desired:

$\mu = \bar{X} \pm K S_{\bar{X}}$
Sample data, Precision, and Confidence in Estimation

If we desire a 90% confidence level in the above case, then, with a
standard error of 10/√64 = 1.25, μ = 105 ± 1.645 (1.25) (i.e., μ = 105 ± 2.056).
μ thus falls between 102.944
and 107.056. These results indicate that using a sample size of 64,
we could state with 90% confidence that the true population mean
value of purchases for all customers would fall between $102.94 and
$107.06. If we now want to be 99% confident of our results without
increasing the sample size, we necessarily have to sacrifice
precision, as may be seen from the following calculation: μ = 105 ±
2.576 (1.25) (i.e., μ = 105 ± 3.22).
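
The two intervals can be reproduced with a short sketch. It assumes, as the figures 1.645 and 2.576 in the text do, that K is taken from the standard normal distribution; the function name `mean_confidence_interval` is illustrative.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(mean, s, n, confidence):
    """Interval mean ± K * s / sqrt(n), with K taken from the standard normal."""
    k = NormalDist().inv_cdf(0.5 + confidence / 2)   # two-sided critical value
    margin = k * s / math.sqrt(n)
    return mean - margin, mean + margin

for level in (0.90, 0.99):
    low, high = mean_confidence_interval(105, 10, 64, level)
    print(f"{level:.0%}: {low:.2f} to {high:.2f}")
```

With the same 64 observations, the 99% interval comes out roughly 1.6 times as wide as the 90% interval.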
Sample data, Precision, and Confidence in Estimation

For a population with a known variance of 185, a sample of 64 individuals leads to 217 as an estimate of the mean.

a) Find the standard error of the mean.

b) Establish an interval estimate that should include the population mean 68.3 percent of the time.

Ans: a. 1.7 b. 215.3 to 218.7



Sample data, Precision, and Confidence in Estimation

Harman is interested in purchasing a used car. He randomly selected
125 want ads and found that the average price of a car in this sample
was $3,250. Harman knows that the standard deviation of used-car
prices in this city is $615. (se = 55.01)

a. Establish an interval estimate for the average price of a car so
that Harman can be 68.3 percent certain that the population mean lies within
this interval.
b. Establish an interval estimate for the average price of a car so
that Harman can be 95.5 percent certain that the population mean
lies within this interval.
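
One way to work this exercise, assuming the usual reading that 68.3 percent and 95.5 percent correspond to one and two standard errors on either side of the sample mean. The variable names are illustrative.

```python
import math

se = 615 / math.sqrt(125)                 # standard error of the mean
one_se = (3250 - se, 3250 + se)           # about 68.3%: mean ± 1 standard error
two_se = (3250 - 2 * se, 3250 + 2 * se)   # about 95.5%: mean ± 2 standard errors

print(round(se, 2))
print([round(x, 2) for x in one_se])
print([round(x, 2) for x in two_se])
```

The standard error agrees with the value of 55.01 given in the exercise, which puts the two intervals at roughly $3,195 to $3,305 and $3,140 to $3,360.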
Critical Values
Sample data, Precision, and Confidence in Estimation

In other words, the width of the interval has increased and
we are now less precise in estimating the population mean,
though we are a lot more confident about our estimation. It
is not difficult to see that if we want to maintain our original
precision while increasing the confidence, or maintain the
confidence level while increasing precision, or we want to
increase both the confidence and the precision, we need a
larger sample size.
Sample data, Precision, and Confidence in Estimation

In sum, the sample size, n, is a function of:

1. the variability in the population
2. precision or accuracy needed
3. confidence level desired
4. type of sampling plan used – for example, simple random sampling versus stratified random sampling.
Normal Distribution

Characteristics
1. Continuous distribution
2. Symmetrical distribution about its
mean
3. Asymptotic to the horizontal axis
4. It is unimodal
5. Family of curves
6. Area under curve is 1
Standard Normal Distribution
Calculation of Sample Size in case of mean
Calculation of Sample Size in case of Proportion

The above formula will be used if the value of population
proportion (proportion of occurrence of the event) p is
known. If, however, p is unknown, we substitute the
maximum value of pq in the above formula. It can be
shown that the maximum value of pq is 1/4 when p = 1/2
and q = 1/2.
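
A short derivation of that maximum, writing q = 1 − p. The last line uses the commonly quoted sample size expression for a proportion, which is assumed here rather than taken from the slides; z denotes the critical value for the chosen confidence level and e the allowable error.

```latex
\begin{aligned}
pq &= p(1-p) = p - p^{2} \\
\frac{d(pq)}{dp} &= 1 - 2p = 0 \;\Rightarrow\; p = \tfrac{1}{2},\quad q = \tfrac{1}{2} \\
(pq)_{\max} &= \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{4} \\
n &= \frac{z^{2}\,pq}{e^{2}} \;\le\; \frac{z^{2}}{4e^{2}}
\end{aligned}
```

Because the second derivative, −2, is negative, the stationary point is a maximum, so using pq = 1/4 gives the largest sample size any value of p could require.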
Sample Size
An economist is interested in estimating the average monthly household expenditure
on food items by the households of a town. Based on past data, it is estimated that
the population standard deviation of monthly expenditure on food items is
30 rupees. With allowable error set at 7 rupees, estimate the sample size required at
90 per cent confidence.
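
A sketch of one way to work this exercise. It assumes the commonly used expression n = (zσ/e)², with z the two-sided standard normal critical value, σ the population standard deviation and e the allowable error; the function name `sample_size_for_mean` is illustrative.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, error, confidence):
    """n = (z * sigma / error)^2, rounded up to the next whole unit."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / error) ** 2)

print(sample_size_for_mean(sigma=30, error=7, confidence=0.90))   # 50
```

The unrounded value is about 49.7, and rounding up keeps the allowable error from being exceeded.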
Sample Size
A consumer electronics company wants to determine the job satisfaction levels
of its employees. For this, they ask a simple question, ‘Are you satisfied with
your job?’ It was estimated that no more than 30 per cent of the employees
would answer yes. What should be the sample size for this company to
estimate the population proportion to ensure a 95 per cent confidence in result,
and to be within 0.04 of the true population proportion?
Sample Size
A manager of a department store would like to study women’s spending per
year on cosmetics. He is interested in knowing the population proportion of
women who purchase their cosmetics primarily from his store. If he wants to
have a 90 per cent confidence of estimating the true proportion to be within ±
0.045, what sample size is needed?
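
The two proportion exercises above can be worked with the same kind of sketch, assuming the expression n = z²pq/e². Treating "no more than 30 per cent" as p = 0.30 and the unknown proportion in the cosmetics case as p = 0.5 are assumptions of this illustration, as is the function name.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(p, error, confidence):
    """n = z^2 * p * (1 - p) / error^2, rounded up."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * p * (1 - p) / error ** 2)

# Job satisfaction: p taken as 0.30, the largest value allowed by "no more than 30 per cent".
print(sample_size_for_proportion(0.30, 0.04, 0.95))    # 505
# Cosmetics: p unknown, so p = 0.5 maximises p * (1 - p).
print(sample_size_for_proportion(0.50, 0.045, 0.90))   # 335
```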
Sample Size (Proportion)
Last year, Lipton tea company conducted a mall-intercept survey at six regional
malls around the country and found that 20 % of the public preferred tea over coffee
as a midafternoon hot drink. This year, Lipton wants to have a nationwide telephone
survey performed with random digit dialing. What sample size should be used in this
year’s study in order to achieve an accuracy level of ±2.5% at the 99% level of
confidence? What about at the 95% level of confidence?
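
A sketch comparing the two confidence levels, assuming last year's 20% is used as the planning value for p in n = z²pq/e². The variable names are illustrative.

```python
import math
from statistics import NormalDist

p, error = 0.20, 0.025   # last year's share and the ±2.5% accuracy target
sizes = {}
for confidence in (0.99, 0.95):
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    sizes[confidence] = math.ceil(z ** 2 * p * (1 - p) / error ** 2)

print(sizes)   # {0.99: 1699, 0.95: 984}
```

Moving from 95% to 99% confidence at the same accuracy raises the required sample by roughly 70%.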
Finite Population Correction

A policymaker is interested to know the percentage of companies that
are interested in a substance abuse counselling programme for their
employees offered by a local hospital. The director of counselling
services at Bharat Hospital would like the results to be accurate to
within ±5% at the 95% level of confidence. The total number of companies is 1,000.
What should the sample size be?
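
A sketch of one common form of the correction, n = n₀ / (1 + (n₀ − 1)/N), where n₀ is the sample size that ignores the finite population. Both this form and the choice p = 0.5 (no prior proportion is given) are assumptions of this illustration; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_with_fpc(population, error, confidence, p=0.5):
    """Proportion sample size adjusted for a finite population of the given size."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = z ** 2 * p * (1 - p) / error ** 2          # size ignoring the finite population
    return math.ceil(n0 / (1 + (n0 - 1) / population))

print(sample_size_with_fpc(1000, 0.05, 0.95))   # 278
```

Without the correction the same inputs call for about 385 companies, so the adjustment matters when the sample is a large share of the population.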
Normal Distribution
Khaitan & Khaitan, an auditor for a large credit card company, knows
that, on average, the monthly balance of any given customer is $112, and
the standard deviation is $56. If the auditor examines 50 randomly selected
accounts, what is the probability that the sample average monthly balance
is

• Below $100

• Between $100 and $130
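
A sketch of one way to compute both probabilities, assuming the sample mean of 50 accounts is approximately normal with mean 112 and standard error 56/√50. The variable names are illustrative.

```python
import math
from statistics import NormalDist

# Sampling distribution of the mean: normal with mean mu and standard error sigma / sqrt(n).
x_bar = NormalDist(mu=112, sigma=56 / math.sqrt(50))

p_below_100 = x_bar.cdf(100)
p_between = x_bar.cdf(130) - x_bar.cdf(100)
print(round(p_below_100, 4), round(p_between, 4))
```

The results are roughly 0.065 for a sample mean below $100 and roughly 0.92 for one between $100 and $130.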


In a sample of 16 observations from a normal distribution with a mean of
150 and a variance of 256, what is

a) $P(\bar{X} < 160)$?

b) $P(\bar{X} > 142)$?

If instead of 16 observations, 9 observations are taken, find the answers to (a) and (b).
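
Both sample sizes can be handled in one loop. This sketch only applies the standard error σ/√n with σ = √256 = 16; the variable names are illustrative.

```python
import math
from statistics import NormalDist

sigma = math.sqrt(256)                      # population standard deviation = 16
results = {}
for n in (16, 9):
    x_bar = NormalDist(mu=150, sigma=sigma / math.sqrt(n))
    # Store (P(mean < 160), P(mean > 142)) for this sample size.
    results[n] = (x_bar.cdf(160), 1 - x_bar.cdf(142))
    print(n, [round(p, 4) for p in results[n]])
```

With fewer observations the standard error grows from 4 to about 5.33, so both probabilities move closer to 0.5.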


In a normal distribution with mean 56 and standard deviation 21, how
large a sample must be taken so that there will be at least a 90 percent
chance that its mean is greater than 52?

In a normal distribution with mean 375 and standard deviation 48, how
large a sample must be taken so that the probability will be at least 0.95
that the sample mean falls between 370 and 380.
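
Both questions reduce to solving for n in the standardised distance (target − μ)/(σ/√n). A sketch under that reading, with illustrative variable names:

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf   # standard normal quantile function

# P(mean > 52) >= 0.90 with mu = 56, sigma = 21: need (56 - 52) / (21 / sqrt(n)) >= z(0.90).
n_one_sided = math.ceil((z(0.90) * 21 / (56 - 52)) ** 2)

# P(370 < mean < 380) >= 0.95 with mu = 375, sigma = 48: need 5 / (48 / sqrt(n)) >= z(0.975).
n_two_sided = math.ceil((z(0.975) * 48 / 5) ** 2)

print(n_one_sided, n_two_sided)   # 46 355
```

The first case is one-sided, so it uses the 0.90 quantile; the second splits the remaining 5% across both tails and uses the 0.975 quantile.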
