8614 - Assignment 2 Solved (AG)

This document discusses inferential statistics and how it is used in educational research. Inferential statistics allow researchers to make inferences about populations based on data from samples. It can be used to estimate population parameters and test hypotheses about populations. When collecting data from a sample, inferential statistics account for sampling error, which is the difference between population parameters and sample statistics. Common inferential techniques discussed include point estimates, interval estimates, and confidence intervals, which provide a range that a population parameter is likely to fall within based on a sample.

Uploaded by

Malik Alyan

Course: Educational Statistics (8614) Semester: Autumn, 2022

ASSIGNMENT-02

Q.1 How is mode calculated? Also discuss its merits and demerits.
Ans-

Mode is the value that occurs with the greatest frequency. For example, if
the given set of values is 2, 3, 2, 4, 5, 2, 3, 1, 6 the mode here would be
2, which appears three times. However, when two or more values appear with
the same highest frequency, the mode is said to be ill-defined. Such a
series is called bi-modal or multi-modal.
Mode is a more appropriate measure than the mean or median under certain
circumstances. For instance, while studying the income earned by the workers
in a company, the mode reflects the wage earned by the largest number of
workers. The average income of the workers, on the other hand, may be much
higher just because a few employees in higher positions earn a very high
level of income.
Majority votes are considered in decision making, where the mode is applied
to see the choice preferred by the largest number of people.
1. Individual Observations:
Example 1:
Calculate the mode from the data given below showing the marks obtained by
10 students.
75, 80, 82, 76, 82, 74, 75, 79, 82, 70
Solution:
The mode here is 82 as it appears with the highest frequency.
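The frequency count behind this answer can be sketched in Python (a hypothetical illustration, not part of the original assignment):

```python
# Count how often each mark occurs and pick the most frequent one.
from collections import Counter

marks = [75, 80, 82, 76, 82, 74, 75, 79, 82, 70]
counts = Counter(marks)
mode_value, frequency = counts.most_common(1)[0]
print(mode_value, frequency)  # 82 3
```

Here 82 occurs three times, more than any other value, so it is the mode.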
2. Discrete Series:
Example 2:
Calculate the mode for the data pertaining to the size of shoes.

Solution:
The mode here is 6 as it has the highest frequency.
3. Continuous Series:
Mode for data in the form of a continuous series is calculated using the
formula

Mode = L + ((f1 - f0) / (2f1 - f0 - f2)) * h

where L is the lower limit of the modal class, f1 the frequency of the modal
class, f0 the frequency of the class preceding it, f2 the frequency of the
class succeeding it, and h the class width.
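The standard grouped-data formula, Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * h, can be applied directly in code. The class data below are hypothetical (the original table is not reproduced here):

```python
# Grouped-data mode: Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * h
def grouped_mode(lower_limit, f_modal, f_before, f_after, width):
    """lower_limit: lower boundary of the modal class; f_modal: its frequency;
    f_before/f_after: frequencies of the neighbouring classes; width: class width."""
    return lower_limit + (f_modal - f_before) / (2 * f_modal - f_before - f_after) * width

# Hypothetical classes 30-40, 40-50, 50-60 with frequencies 12, 20, 8;
# the modal class is 40-50 because 20 is the highest frequency.
print(grouped_mode(40, 20, 12, 8, 10))  # 44.0
```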

Example 3:
Calculate the mode from the data given below pertaining to the marks
obtained by the students in a test.

Solution:
By observation, it is known that the modal class is 40 – 50 as this class has the
highest frequency.

Calculation of Mode – Grouping Method:


Ascertaining the mode by mere observation can be erroneous when there is a
very low frequency preceding or succeeding the highest frequency. In such
cases, a grouping table and an analysis table are prepared to ascertain the
modal class. A grouping table consists of six columns. The maximum frequency
is marked in the first column.
The frequencies are grouped in two’s in the second column. In the third
column, the first frequency is left out and the remaining frequencies are
grouped in two’s. In the fourth column, the frequencies are grouped in three’s.
In the fifth column, the first frequency is left out and the remaining
frequencies are grouped in three’s. In the sixth column, the first two
frequencies are left out and the remaining frequencies are grouped in three’s.
In each of these columns the maximum value is observed.
The analysis table is prepared taking the column numbers on the left and the
probable values of mode on the right. The probable values of mode are those
values against which the frequencies are the highest in the grouping table. The
values are entered by means of a bar in the analysis table. The column total is
then taken and the one which has the maximum value is the modal value.
Example 4:
Calculate the value of mode for the following data:

The modal value is 25 as it has the maximum total of 5 bars.


Merits of Mode:
1. It can be easily observed from the data.
2. It is easy to compute.
3. It is unaffected by extreme values.
4. Mode can be determined even if the distribution has open end class.
5. It can also be determined easily by graphic method.
6. It is easy to understand.
Demerits of Mode:
1. Mode is ill-defined for distributions with two or more modes.
2. It is not based on all the values.
3. It is affected considerably by sampling fluctuations.
4. When mode is computed through different methods, the value may differ in
each of the methods.

Q.2 What is meant by inferential statistics? How and why is it used in
educational research?
Ans-

While descriptive statistics summarize the characteristics of a data set,
inferential statistics help you come to conclusions and make predictions
based on your data.

When you have collected data from a sample, you can use inferential statistics
to understand the larger population from which the sample is taken.

Inferential statistics have two main uses:

• making estimates about populations (for example, the mean SAT score
of all 11th graders in the US).
• testing hypotheses to draw conclusions about populations (for example,
the relationship between SAT scores and family income).

Table of contents

1. Descriptive versus inferential statistics
2. Estimating population parameters from sample statistics
3. Hypothesis testing

Descriptive versus inferential statistics


Descriptive statistics allow you to describe a data set, while inferential
statistics allow you to make inferences based on a data set.

Descriptive statistics
Using descriptive statistics, you can report characteristics of your data:

• The distribution concerns the frequency of each value.
• The central tendency concerns the averages of the values.
• The variability concerns how spread out the values are.

In descriptive statistics, there is no uncertainty – the statistics precisely
describe the data that you collected. If you collect data from an entire
population, you can directly compare these descriptive statistics to those
from other populations.

Inferential statistics
Most of the time, you can only acquire data from samples, because it is too
difficult or expensive to collect data from the whole population that you’re
interested in.

While descriptive statistics can only summarize a sample’s characteristics,
inferential statistics use your sample to make reasonable guesses about the
larger population.

With inferential statistics, it’s important to use random and unbiased sampling
methods. If your sample isn’t representative of your population, then you can’t
make valid statistical inferences or generalize.

Example: Inferential statistics
You randomly select a sample of 11th graders in your state and collect data
on their SAT scores and other characteristics. You can use inferential
statistics to make estimates and test hypotheses about the whole population
of 11th graders in the state based on your sample data.

Sampling error in inferential statistics


Since the size of a sample is always smaller than the size of the population,
some of the population isn’t captured by sample data. This creates sampling
error, which is the difference between the true population values (called
parameters) and the measured sample values (called statistics).

Sampling error arises any time you use a sample, even if your sample is
random and unbiased. For this reason, there is always some uncertainty in
inferential statistics. However, using probability sampling methods reduces this
uncertainty.

Estimating population parameters from sample statistics


The characteristics of samples and populations are described by numbers
called statistics and parameters:
• A statistic is a measure that describes the sample (e.g., sample mean).
• A parameter is a measure that describes the whole population (e.g.,
population mean).

Sampling error is the difference between a parameter and a corresponding
statistic. Since in most cases you don’t know the real population parameter,
you can use inferential statistics to estimate these parameters in a way that
takes sampling error into account.

There are two important types of estimates you can make about the
population: point estimates and interval estimates.

• A point estimate is a single value estimate of a parameter. For instance,
a sample mean is a point estimate of a population mean.
• An interval estimate gives you a range of values where the parameter is
expected to lie. A confidence interval is the most common type of interval
estimate.

Both types of estimates are important for gathering a clear idea of where a
parameter is likely to lie.

Confidence intervals
A confidence interval uses the variability around a statistic to come up with an
interval estimate for a parameter. Confidence intervals are useful for
estimating parameters because they take sampling error into account.

While a point estimate gives you a precise value for the parameter you are
interested in, a confidence interval tells you the uncertainty of the point
estimate. They are best used in combination with each other.

Each confidence interval is associated with a confidence level. A confidence
level tells you the probability (as a percentage) that the interval would
contain the true parameter if you repeated the study.

A 95% confidence interval means that if you repeat your study with a new
sample in exactly the same way 100 times, you can expect your estimate to lie
within the specified range of values 95 times.

Although you can say that your estimate will lie within the interval a certain
percentage of the time, you cannot say for sure that the actual population
parameter will. That’s because you can’t know the true value of the population
parameter without collecting data from the full population.

However, with random sampling and a suitable sample size, you can
reasonably expect your confidence interval to contain the parameter a certain
percentage of the time.

Example: Point estimate and confidence interval
You want to know the average number of paid vacation days that employees at
an international company receive. After collecting survey responses from a
random sample, you calculate a point estimate and a confidence interval.
Your point estimate of the population mean paid vacation days is the sample
mean of 19 paid vacation days.

With random sampling, a 95% confidence interval of [16, 22] means you can be
reasonably confident that the average number of vacation days is between 16
and 22.
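A rough sketch of how such a point estimate and interval are computed, using hypothetical survey responses and the large-sample normal value 1.96 (a t value would be more precise for a sample this small):

```python
import math

# Hypothetical vacation-day responses; not the article's actual survey data.
days = [19, 21, 16, 18, 22, 17, 20, 19, 18, 20]
n = len(days)
mean = sum(days) / n                                           # point estimate
sd = math.sqrt(sum((x - mean) ** 2 for x in days) / (n - 1))   # sample standard deviation
margin = 1.96 * sd / math.sqrt(n)                              # 95% margin of error
print(f"point estimate: {mean:.1f}, 95% CI: [{mean - margin:.1f}, {mean + margin:.1f}]")
# point estimate: 19.0, 95% CI: [17.9, 20.1]
```

The interval reports the point estimate together with the uncertainty around it, which is exactly why the two are best used in combination.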

Hypothesis testing
Hypothesis testing is a formal process of statistical analysis using inferential
statistics. The goal of hypothesis testing is to compare populations or assess
relationships between variables using samples.

Hypotheses, or predictions, are tested using statistical tests. Statistical
tests also estimate sampling errors so that valid inferences can be made.

Statistical tests can be parametric or non-parametric. Parametric tests are
considered more statistically powerful because they are more likely to detect
an effect if one exists.

Parametric tests make assumptions that include the following:

• the population that the sample comes from follows a normal distribution of
scores
• the sample size is large enough to represent the population
• the variances, a measure of variability, of each group being compared are
similar

When your data violates any of these assumptions, non-parametric tests are
more suitable. Non-parametric tests are called “distribution-free tests”
because they don’t assume anything about the distribution of the population
data.

Statistical tests come in three forms: tests of comparison, correlation or
regression.

Comparison tests
Comparison tests assess whether there are differences in means, medians or
rankings of scores of two or more groups.

To decide which test suits your aim, consider whether your data meets the
conditions necessary for parametric tests, the number of samples, and
the levels of measurement of your variables.

Means can only be found for interval or ratio data, while medians and rankings
are more appropriate measures for ordinal data.

Comparison test                     Parametric?   What’s being compared?   Samples
t test                              Yes           Means                    2 samples
ANOVA                               Yes           Means                    3+ samples
Mood’s median                       No            Medians                  2+ samples
Wilcoxon signed-rank                No            Distributions            2 samples
Wilcoxon rank-sum (Mann-Whitney U)  No            Sums of rankings         2 samples
Kruskal-Wallis H                    No            Mean rankings            3+ samples

Correlation tests
Correlation tests determine the extent to which two variables are associated.

Although Pearson’s r is the most statistically powerful test, Spearman’s r is
appropriate for interval and ratio variables when the data doesn’t follow a
normal distribution.

The chi square test of independence is the only test that can be used
with nominal variables.

Correlation test                 Parametric?   Variables
Pearson’s r                      Yes           Interval/ratio variables
Spearman’s r                     No            Ordinal/interval/ratio variables
Chi square test of independence  No            Nominal/ordinal variables

Regression tests
Regression tests demonstrate whether changes in predictor variables cause
changes in an outcome variable. You can decide which regression test to use
based on the number and types of variables you have as predictors and
outcomes.

Most of the commonly used regression tests are parametric. If your data is not
normally distributed, you can perform data transformations.

Data transformations help you make your data normally distributed using
mathematical operations, like taking the square root of each value.

Q.3 Discuss the characteristics of correlation. Also explain the importance
of p-value in interpreting correlation.
Ans-

Correlation and P value


The two most commonly used statistical tools for establishing a relationship
between variables are correlation and the p-value. Correlation is a way to
test whether two variables have any kind of relationship, whereas the
p-value tells us whether the result of an experiment is statistically
significant. In this tutorial, we will take a look at how they are calculated
and how to interpret the numbers obtained.

What is correlation?
Correlation coefficient is used in statistics to measure how strong a
relationship is between two variables. There are several types of correlation
coefficients (e.g. Pearson, Kendall, Spearman), but the most commonly used is
the Pearson’s correlation coefficient. This coefficient is calculated as a number
between -1 and 1 with 1 being the strongest possible positive correlation and -
1 being the strongest possible negative correlation.

A positive correlation means that as one number increases the second number
will also increase. A negative correlation means that as one number increases
the second number decreases. However, correlation does not always imply
causation — correlation does not tell us whether change in one number is
directly caused by the other number, only that they typically move together.
To understand how correlation works, let’s look at a chart of height vs
weight.

We can observe that as weight increases, height also increases – which
indicates they are positively correlated. The correlation coefficient in this
case is 0.88, which supports our finding.
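Pearson’s r can be computed directly from its definition. The weight/height pairs below are hypothetical stand-ins for the chart’s data:

```python
import math

# Hypothetical paired observations (weight in kg, height in cm).
weight = [55, 60, 65, 70, 75, 80, 85]
height = [155, 158, 163, 167, 170, 175, 178]

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-movement of x and y
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))         # spread of x
    sy = math.sqrt(sum((b - my) ** 2 for b in y))         # spread of y
    return cov / (sx * sy)

print(round(pearson_r(weight, height), 3))  # close to 1: strong positive correlation
```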

What is a p-value?
P-value evaluates how well your data rejects the null hypothesis, which states
that there is no relationship between two compared groups. Successfully
rejecting this hypothesis tells you that your results may be statistically
significant. In academic research, p-value is defined as the probability of
obtaining results ‘as extreme’ or ‘more extreme’, given that the null
hypothesis is true — essentially, how likely it is that you would receive the
results (or more dramatic results) you did assuming that there is no correlation
or relationship (e.g. the thing that you’re testing) among the subjects. To
understand what this means, let us look at an example.

We are going to conduct an experiment to check whether a coin is biased or
not. To do this, let’s flip a coin 10 times. Intuitively, we can say that the
probability of getting 5 heads and 5 tails is highest, followed by 6 heads
and 4 tails or 6 tails and 4 heads, and so on. So first, let’s state the null
and alternative hypotheses. Since the assumption is that the coin is fair,
our null hypothesis is “The coin is unbiased with equal probability of heads
and tails”. We are conducting the experiment to prove or disprove this claim,
so our alternative hypothesis is “The coin is biased with unequal probability
of heads and tails”.

Assuming the null hypothesis is true (the coin is fair), let’s calculate the
probabilities of the various possible outputs i.e 0 heads & 10 tails, 1 head & 9
tails, 2 heads & 8 tails, and so on.

The probabilities are calculated using the binomial distribution, which gives
the probability of r successes in n trials using the formula:

P(r) = C(n, r) * p^r * (1 - p)^(n - r)

Where,
n = no. of trials = 10
r = no. of successes (heads)
p = probability of a success = 1/2
1 - p = probability of a failure = 1/2

Let’s consider a ‘success’ to be when heads appears in the coin toss. Also, it
won’t make a difference if ‘success’ is considered to be heads or tails. Let’s first
calculate the probability of obtaining 5 heads and 5 tails in 10 coin flips.

P(5 heads and 5 tails) = C(10, 5) * (1/2)^5 * (1/2)^5 = 0.24609375



Similarly, let’s generate the probabilities of all other possible
combinations of heads and tails:

Let’s plot the probabilities to understand the intuition behind the above
calculation:

We can observe from the chart that the probability of getting 5 heads is the
highest, and the probability of getting 0 heads or 0 tails is the lowest. Now,
let’s assume we get the output of this experiment as “9 heads and 1 tail”.

Let us calculate the p-value of the experiment. To reiterate the definition –
“p-value is the probability of obtaining results as extreme or more extreme,
given the null hypothesis is true”.

Now, we add the probabilities of all the possible outputs of the experiment
which are as probable as ‘9 heads and 1 tail’ and less probable than ‘9 heads
and 1 tail’.

P-value = P(9 heads and 1 tail) + P(10 heads and 0 tails) + P(9 tails and 1
head) + P(10 tails and 0 heads)
= 0.009765625 + 0.000976563 + 0.009765625 + 0.000976563 = 0.021484375 = 0.02
(approx.)
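The same arithmetic can be reproduced with the binomial formula, here sketched using Python’s exact `math.comb`:

```python
from math import comb

def binom_prob(n, r, p=0.5):
    """Probability of exactly r successes in n trials."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# Outcomes at least as extreme as 9 heads / 1 tail (both tails of the distribution).
p_value = sum(binom_prob(10, r) for r in (0, 1, 9, 10))
print(p_value)  # 0.021484375
```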

Now, we need to check whether the p-value is significant or not. This is done
by specifying a significance cutoff, known as the alpha value. Alpha is
usually set to 0.05, meaning we accept a 5% chance of rejecting the null
hypothesis when it is actually true. If the p-value is less than the
specified alpha value, then we reject the null hypothesis. Since 0.02 < 0.05,
we reject the hypothesis that “The coin is unbiased with equal probability of
heads and tails” and conclude that the coin is biased.

Conclusion
Though correlation and p-value provide us with the relationship between
variables, care should be taken to interpret them correctly. Correlation
tells us whether two variables have any sort of relationship; it does not
imply causation. If two variables A and B are highly correlated, there are
several possible explanations: (a) A influences B; (b) B influences A; (c) A
and B are influenced by one or more additional variables; (d) the
relationship observed between A and B was a chance error. Similarly, the
p-value should not be misused to manufacture a statistically significant
result. If analysis is done by exhaustively searching various combinations of
variables for correlation, it is known as p-hacking.

Q.4 Explain the rationale of applying ANOVA in educational statistics.
Ans-
Analysis of variance (ANOVA) is an analysis tool used in statistics that splits an
observed aggregate variability found inside a data set into two parts:
systematic factors and random factors. The systematic factors have a
statistical influence on the given data set, while the random factors do not.
Analysts use the ANOVA test to determine the influence that independent
variables have on the dependent variable in a regression study.

The t- and z-test methods developed in the early 20th century were used for
statistical analysis until 1918, when Ronald Fisher created the analysis of
variance method. ANOVA is also called the Fisher analysis of variance, and it
is an extension of the t- and z-tests. The term became well known in 1925,
after appearing in Fisher’s book, “Statistical Methods for Research
Workers.” It was employed in experimental psychology and later expanded to
subjects that were more complex.

What Does the Analysis of Variance Reveal?


The ANOVA test is the initial step in analyzing factors that affect a given data
set. Once the test is finished, an analyst performs additional testing on the
methodical factors that measurably contribute to the data set's inconsistency.
The analyst utilizes the ANOVA test results in an f-test to generate additional
data that aligns with the proposed regression models.

The ANOVA test allows a comparison of more than two groups at the same
time to determine whether a relationship exists between them. The result of
the ANOVA formula, the F statistic (also called the F-ratio), allows for the
analysis of multiple groups of data to determine the variability between
samples and within samples.

If no real difference exists between the tested groups, which is called the null
hypothesis, the result of the ANOVA's F-ratio statistic will be close to 1. The
distribution of all possible values of the F statistic is the F-distribution. This is
actually a group of distribution functions, with two characteristic numbers,
called the numerator degrees of freedom and the denominator degrees of
freedom.

Example of How to Use ANOVA


A researcher might, for example, test students from multiple colleges to see if
students from one of the colleges consistently outperform students from the
other colleges. In a business application, an R&D researcher might test two
different processes of creating a product to see if one process is better than
the other in terms of cost efficiency.

The type of ANOVA test used depends on a number of factors. It is applied
when data are experimental. Analysis of variance can be computed by hand when
there is no access to statistical software. It is simple to use and best
suited for small samples. With many experimental designs, the sample sizes
have to be the same for the various factor level combinations.

ANOVA is helpful for testing three or more variables. It is similar to
multiple two-sample t-tests. However, it results in fewer type I errors and
is appropriate for a range of issues. ANOVA groups differences by comparing
the means of each group and includes spreading out the variance into diverse
sources. It is employed with subjects, test groups, between groups and within
groups.
Course: Educational Statistics (8614) Semester: Autumn, 2022

One-Way ANOVA Versus Two-Way ANOVA


There are two main types of ANOVA: one-way (or unidirectional) and two-way.
There are also variations of ANOVA. For example, MANOVA (multivariate ANOVA)
differs from ANOVA in that the former tests for multiple dependent variables
simultaneously while the latter assesses only one dependent variable at a
time. One-way or two-way refers to the number of independent variables in
your analysis of variance test. A one-way ANOVA evaluates the impact of a
sole factor on a sole response variable. It determines whether all the
samples are the same. The one-way ANOVA is used to determine whether there
are any statistically significant differences between the means of three or
more independent (unrelated) groups.

A two-way ANOVA is an extension of the one-way ANOVA. With a one-way, you
have one independent variable affecting a dependent variable. With a two-way
ANOVA, there are two independent variables. For example, a two-way ANOVA
allows a company to compare worker productivity based on two independent
variables, such as salary and skill set. It is utilized to observe the
interaction between the two factors and tests the effect of two factors at
the same time.
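The F-ratio at the heart of ANOVA can be built by hand. The sketch below uses hypothetical test scores from three colleges (echoing the earlier example); the resulting F value would then be compared with a critical value from the F-distribution for the given degrees of freedom:

```python
# One-way ANOVA by hand: F = (between-group variance) / (within-group variance).
groups = [
    [85, 86, 88, 75, 78],   # college A (hypothetical scores)
    [82, 84, 80, 79, 85],   # college B
    [91, 89, 94, 90, 92],   # college C
]

k = len(groups)                          # number of groups
n = sum(len(g) for g in groups)          # total number of observations
grand_mean = sum(sum(g) for g in groups) / n

# Sums of squares between groups and within groups
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

df_between, df_within = k - 1, n - k
f_ratio = (ss_between / df_between) / (ss_within / df_within)
print(round(f_ratio, 2))  # 9.77 -- far from 1, so the group means likely differ
```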

Q.5 Discuss chi-square distribution. Why and where is it used?
Ans-
A chi-square (χ2) statistic is a test that measures how a model compares to
actual observed data. The data used in calculating a chi-square statistic must
be random, raw, mutually exclusive, drawn from independent variables, and
drawn from a large enough sample. For example, the results of tossing a fair
coin meet these criteria.

Chi-square tests are often used to test hypotheses. The chi-square statistic
compares the size of any discrepancies between the expected results and the
actual results, given the size of the sample and the number of variables in the
relationship.

For these tests, degrees of freedom are used to determine if a certain null
hypothesis can be rejected based on the total number of variables and
samples within the experiment. As with any statistic, the larger the sample
size, the more reliable the results.

What Does a Chi-Square Statistic Tell You?


There are two main kinds of chi-square tests: the test of independence, which
asks a question of relationship, such as, "Is there a relationship between
student gender and course choice?"; and the goodness-of-fit test, which asks
something like "How well does the coin in my hand match a theoretically fair
coin?"

Chi-square analysis is applied to categorical variables and is especially
useful when those variables are nominal (where order doesn't matter, like
marital status or gender).

Independence
When considering student gender and course choice, a χ2 test for
independence could be used. To do this test, the researcher would collect
data on the two chosen variables (gender and courses picked) and then
compare the frequencies at which male and female students select among the
offered classes using the chi-square formula and a χ2 statistical table.

If there is no relationship between gender and course selection (that is, if
they are independent), then the actual frequencies at which male and female
students select each offered course should be expected to be approximately
equal, or conversely, the proportion of male and female students in any
selected course should be approximately equal to the proportion of male and
female students in the sample.

A χ2 test for independence can tell us how likely it is that random chance can
explain any observed difference between the actual frequencies in the data
and these theoretical expectations.

Goodness-of-Fit
χ2 provides a way to test how well a sample of data matches the (known or
assumed) characteristics of the larger population that the sample is intended
to represent. This is known as goodness of fit.

If the sample data do not fit the expected properties of the population that
we are interested in, then we would not want to use this sample to draw
conclusions about the larger population.

Example
For example, consider an imaginary coin with exactly a 50/50 chance of
landing heads or tails and a real coin that you toss 100 times. If this coin
is fair, then it will also have an equal probability of landing on either
side, and the expected result of tossing the coin 100 times is that heads
will come up 50 times and tails will come up 50 times.

In this case, χ2 can tell us how well the actual results of 100 coin flips
compare to the theoretical model that a fair coin will give 50/50 results.
The actual toss could come up 50/50, or 60/40, or even 90/10. The farther
the actual results of the 100 tosses are from 50/50, the worse the fit of
this set of tosses is to the theoretical expectation of 50/50, and the more
likely we might conclude that this coin is not actually a fair coin.
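The comparison can be made concrete with the chi-square statistic, χ2 = Σ (observed − expected)² / expected. A sketch for a hypothetical 60/40 outcome:

```python
# Goodness of fit for 100 tosses: observed 60 heads / 40 tails vs expected 50/50.
observed = [60, 40]
expected = [50, 50]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_square)  # 4.0
```

With 1 degree of freedom, the 5% critical value is about 3.84, so even a 60/40 split would already cast doubt on the coin's fairness.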

When to Use a Chi-Square Test


A chi-square test is used to help determine if observed results are in line with
expected results, and to rule out that observations are due to chance.

A chi-square test is appropriate for this when the data being analyzed are
from a random sample, and when the variable in question is a categorical
variable. A categorical variable is one that consists of selections such as
type of car, race, educational attainment, male or female, or how much
somebody likes a political candidate (from very much to very little).

These types of data are often collected via survey responses or
questionnaires. Therefore, chi-square analysis is often most useful in
analyzing this type of data.

How to Perform a Chi-Square Test


These are the basic steps whether you are performing a goodness of fit test or
a test of independence:

• Create a table of the observed and expected frequencies;
• Use the formula to calculate the chi-square value;
• Find the critical chi-square value using a chi-square value table or
statistical software;
• Determine whether the chi-square value or the critical value is the larger
of the two;
• Reject or accept the null hypothesis.

Limitations of the Chi-Square Test


The chi-square test is sensitive to sample size. Relationships may appear to
be significant when they aren't, simply because a very large sample is used.

In addition, the chi-square test cannot establish whether one variable has a
causal relationship with another. It can only establish whether two variables
are related.

What Is a Chi-square Test Used for?


Chi-square is a statistical test used to examine the differences between
categorical variables from a random sample in order to judge goodness of fit
between expected and observed results.

Who Uses Chi-Square Analysis?


Since chi-square applies to categorical variables, it is most used by researchers
who are studying survey response data. This type of research can range from
demography to consumer and marketing research to political science and
economics.

Is Chi-Square Analysis Used When the Independent Variable Is Nominal or
Ordinal?
A nominal variable is a categorical variable that differs by quality and has
no meaningful numerical order. For instance, asking somebody their favorite
color would produce a nominal variable. Asking somebody's age, on the other
hand, would produce an ordinal set of data. Chi-square can be best applied
to nominal data.

The Bottom Line


There are two types of chi-square tests: the test of independence and the test
of goodness of fit. Both are used to determine the validity of a hypothesis or
an assumption. The result is a piece of evidence that can be used to make a
decision. For example:

In a test of independence, a company may want to evaluate whether its new
product, an herbal supplement that promises to give people an energy boost,
is reaching the people who are most likely to be interested. It is being
is reaching the people who are most likely to be interested. It is being
advertised on websites related to sports and fitness, on the assumption that
active and health-conscious people are most likely to buy it. It does an
extensive poll that is intended to evaluate interest in the product by
demographic group. The poll suggests no correlation between interest in this
product and the most health-conscious people.

In a test of goodness of fit, a marketing professional is considering
launching a new product that the company believes will be irresistible to
women over 45.
The company has conducted product testing panels of 500 potential buyers of
the product. The marketing professional has information about the age and
gender of the test panels. This allows the construction of a chi-square test
showing the distribution by age and gender of the people who said they would
buy the product. The result will show whether or not the likeliest buyer is a
woman over 45. If the test shows that men over 45 or women between 18 and
44 are just as likely to buy the product, the marketing professional will revise
the advertising, promotion, and placement of the product to appeal to this
wider group of customers.
