Unit 2 Rizvi Sir

The document provides an overview of data types, collection methods, and analyses, emphasizing the importance of accurate data gathering for research integrity. It distinguishes between qualitative and quantitative data, detailing their strengths and weaknesses, and outlines various data collection methods such as surveys, interviews, and observations. Additionally, it categorizes data collection into primary and secondary methods, highlighting the unique characteristics and applications of each approach.

Data Collection and Analyses

Unit II
Content….
• Data and its Types
• Data Collection
• Qualitative Analysis
• Quantitative Analysis
• Hypothesis Testing
• Null and Alternative Hypothesis
• Chi-Square Test
• F Test
Data
• In computing, data is information that has been translated into a form that is efficient for movement or processing.

• Data is a collection of facts, such as numbers, words, measurements, observations or just descriptions of things.

• Data is a collection of information gathered by observations, measurements, research or analysis.

• It may consist of facts, numbers, names, figures or even descriptions of things. Data is organized in the form of graphs, charts or tables.

• In the pursuit of knowledge, data is a collection of discrete values that convey information, describing quantity, quality, facts, statistics, or other basic units of meaning, or simply sequences of symbols that may be further interpreted.
Types of Data
Data can be qualitative or quantitative.

• Qualitative data is descriptive information (it describes something)

• Quantitative data is numerical information (numbers)


Qualitative Data
• Qualitative data is a set of information which cannot be measured using numbers. It generally consists of words and subjective narratives.

• The result of a qualitative data analysis can take the form of highlighted keywords, extracted information, and elaborated concepts. For example, consider a study on parents' perceptions of the current education system for their kids.

• The information collected from them might be in narrative form, and you need to deduce from the analysis whether they are satisfied, unsatisfied, or see a need for improvement in certain areas, and so on.
Qualitative Data
Strengths

Better understanding - Qualitative data gives a better understanding of the perspectives and needs of participants.

Provides explanation - Qualitative data, along with quantitative data, can explain the results of a survey and help verify the correctness of the quantitative data.

Better identification of behavior patterns - Qualitative data can provide detailed information which can prove useful in the identification of behavioral patterns.

Weaknesses

Limited reach - Being subjective in nature, only a small population is generally covered to represent the larger population.

Time consuming - Qualitative analysis is time consuming, as a large amount of data has to be read and understood.

Possibility of bias - Being a subjective analysis, evaluator bias is quite possible.


Quantitative Data
• Quantitative data is a set of numbers collected from a group of people and involves statistical analysis. For example, suppose you conduct a satisfaction survey and ask participants to rate their experience on a scale of 1 to 5.

• You can collect the ratings and, because they are numerical in nature, use statistical techniques to draw conclusions about participants' satisfaction.

• Quantitative data can be Discrete or Continuous:

o Discrete data can only take certain values (like whole numbers)
o Continuous data can take any value (within a range)

• Put simply: Discrete data is counted, Continuous data is measured
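The counted/measured distinction can be illustrated with a short Python sketch; the variable names and values below are made up for illustration:

```python
# Discrete data is counted: it can only take certain values (whole numbers).
visitors_per_day = [12, 8, 15, 10]

# Continuous data is measured: it can take any value within a range.
temperatures_c = [21.4, 19.8, 23.1, 20.6]

total_visitors = sum(visitors_per_day)  # counting keeps us in whole numbers
mean_temperature = sum(temperatures_c) / len(temperatures_c)  # measuring gives a real number

print(total_visitors)  # 45
print(mean_temperature)
```

Note that summing counts yields another count, while summarizing measurements generally yields a real number.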


Quantitative Data
Strengths

Specific: Quantitative data is clear and specific to the survey conducted.

High reliability: If collected properly, quantitative data is normally accurate and hence highly reliable.

Easy communication: Quantitative data is easy to communicate and elaborate using charts, graphs, etc.

Existing support: Many large datasets may already exist that can be analyzed to check the relevance of the survey.

Weaknesses

Limited options - Respondents are required to choose from limited options.

High complexity - Quantitative studies may need complex procedures to obtain a correct sample.

Requires expertise - Analysis of quantitative data requires certain expertise in statistical analysis.
Data Collection

• Data collection is the process of gathering, measuring, and analyzing accurate data from a
variety of relevant sources to find answers to research problems, answer questions, evaluate
outcomes, and forecast trends and probabilities.

• Accurate data collection is necessary to make informed business decisions, ensure quality assurance, and maintain research integrity.

• During data collection, the researchers must identify the data types, the sources of data, and
what methods are being used.

• Regardless of the field of study or preference for defining data (quantitative, qualitative),
accurate data collection is essential to maintaining the integrity of research.

• Both the selection of appropriate data collection instruments (existing, modified, or newly
developed) and clearly delineated instructions for their correct use reduce the likelihood of
errors occurring.
Common Data Collection Methods

Experiment
When to use: To test a causal relationship.
How to collect data: Manipulate variables and measure their effects on others.

Survey
When to use: To understand the general characteristics or opinions of a group of people.
How to collect data: Distribute a list of questions to a sample online, in person or over the phone.

Interview/focus group
When to use: To gain an in-depth understanding of perceptions or opinions on a topic.
How to collect data: Verbally ask participants open-ended questions in individual interviews or focus group discussions.

Observation
When to use: To understand something in its natural setting.
How to collect data: Measure or survey a sample without trying to affect them.

Ethnography
When to use: To study the culture of a community or organization first-hand.
How to collect data: Join and participate in a community and record your observations and reflections.

Archival research
When to use: To understand current or historical events, conditions or practices.
How to collect data: Access manuscripts, documents or records from libraries, depositories or the internet.

Secondary data collection
When to use: To analyze data from populations that you can't access first-hand.
How to collect data: Find existing datasets that have already been collected, from sources such as government agencies or research organizations.
Data Collection Methods

Data collection methods can be divided into two categories:

1. Secondary Data Collection Methods

2. Primary Data Collection Methods

Secondary Data Collection Methods
• Secondary data is a type of data that has already been published in books, newspapers, magazines, journals, online portals, etc. There is an abundance of data available in these sources about your research area in business studies, almost regardless of the nature of the research area. Therefore, the application of an appropriate set of criteria to select the secondary data to be used in the study plays an important role in increasing the levels of research validity and reliability.

• These criteria include, but are not limited to, the date of publication, the credentials of the author, the reliability of the source, the quality of discussion, the depth of analysis, the extent of the text's contribution to the development of the research area, etc. Secondary data collection is discussed in greater depth in the Literature Review chapter.

• Secondary data collection methods offer a range of advantages, such as saving time, effort and expense. However, they have a major disadvantage: secondary research does not contribute to the expansion of the literature by producing fresh (new) data.
Primary Data Collection Methods
Primary data is data that did not exist before the study; it consists of the unique findings of your research. Primary data collection and analysis typically require more time and effort compared to secondary data research. Primary data collection methods can be divided into two groups:

1. Quantitative Data Collection

2. Qualitative Data Collection


Primary Data Collection Methods
1. Quantitative data collection methods are based on mathematical calculations in various formats. Methods of quantitative data collection and analysis include questionnaires with closed-ended questions, methods of correlation and regression, mean, mode and median, and others.

Quantitative methods are cheaper to apply and can be applied within a shorter duration of time compared to qualitative methods. Moreover, due to the high level of standardisation of quantitative methods, it is easy to make comparisons between findings.

2. Qualitative research methods, on the contrary, do not involve numbers or mathematical calculations. Qualitative research is closely associated with words, sounds, feelings, emotions, colours and other elements that are non-quantifiable.

Qualitative studies aim to ensure a greater depth of understanding, and qualitative data collection methods include interviews, questionnaires with open-ended questions, focus groups, observation, games or role-playing, case studies, etc.

Your choice between quantitative and qualitative methods of data collection depends on the area of your research and the nature of your research aims and objectives.
Qualitative Data Collection Methods
Qualitative Data Collection Methods
Interviews

Interviews are one of the most common qualitative data-collection methods, and they're a great approach when you need to gather highly personalized information. Informal, conversational interviews are ideal for open-ended questions that allow you to gain rich, detailed context.

The interview questionnaire is designed in a manner that elicits the interviewee's knowledge or perspective related to a topic, program, or issue.

At times, depending on the interviewer's approach, the conversation can be unstructured or informal but focused on understanding the individual's beliefs, values, understandings, feelings, experiences, and perspectives on an issue.
Qualitative Data Collection Methods
Qualitative surveys

To develop an informed hypothesis, many


researchers use qualitative surveys
for data collection or to collect a piece of
detailed information about a product or an
issue. If you want to create questionnaires
for collecting textual or qualitative data,
then ask more open-ended questions. To
answer such questions, the respondent
has to write his/her opinion or
perspective concerning a specific topic or
issue. Unlike other qualitative data
collection methods, online surveys have a
wider reach wherein many people can
provide you quality data that is highly
credible and valuable.
Qualitative Data Collection Methods
Qualitative surveys

Paper surveys
Paper questionnaires are frequently used for qualitative data collection from participants. The survey consists of short text questions, which are often open-ended. The aim of these questions is to collect as much detailed information as possible in the respondents' own words. More often, survey questionnaires are designed to collect standardized data, and hence are used to collect responses from a larger population or large sample size.

Online surveys
An online survey or web survey is prepared using online survey software and either uploaded to a website or emailed to the selected sample with the aim of collecting reliable online data. Instead of writing down responses, the respondents use computers and keyboards to type their answers. With an online survey questionnaire, it becomes easier and smoother to collect qualitative data.
In addition, online surveys have a wider reach, and the respondent is not pressured to answer each question under the interviewer's supervision. One of the significant benefits that online surveys offer is that they allow the respondents to take the survey on any device, be it a desktop, tablet, or mobile.
Qualitative Data Collection Methods
Focus group discussions

• Focus group discussions can also be considered a type of interview, but they are conducted in a group discussion setting. Usually, the focus group consists of 8–10 people (the size may vary depending on the researcher's requirements).

• The researchers ensure appropriate space is given to the participants to discuss a topic or issue in context. The participants are allowed to either agree or disagree with each other's comments.

• With a focus group discussion, researchers learn how a particular group of participants perceives the topic. Researchers analyze what participants think of an issue, the range of opinions expressed, and the ideas discussed.

• The data is collected by noting down the variations or inconsistencies (if any exist) among the participants, especially in terms of beliefs, experiences, and practices.
Qualitative Data Collection Methods
Focus group discussions

• The participants of focus group discussions are selected based on the topic or issues for which the researcher wants actionable insights.

• For example, if the research is about the recovery of college students from drug addiction, the participants have to be college students who are studying and recovering from drug addiction.

• Other parameters such as age, qualification, financial background, social presence, and demographics are also considered, but not primarily, as the group needs diverse participants.

• Frequently, the qualitative data collected through focus group discussions is more descriptive and highly detailed.
Qualitative Data Collection Methods
Observations

• Observation is one of the traditional qualitative data collection methods used by researchers to gather descriptive text data by observing people and their behavior at events or in their natural settings.
• Observation is a useful qualitative data collection method, especially when you want to study an ongoing process, a situation, or reactions to a specific issue related to the people being observed.

• In this method, the researcher is completely immersed in watching people, taking a participatory stance in order to take down notes. Aside from taking notes, other techniques such as videos, photographs, audio recordings, and tangible items like artifacts and souvenirs are also used.

• There are two main types of observation:

o Covert: In this method, the observer is concealed, without letting anyone know that they are being observed. For example, a researcher studying the rituals of a wedding in a nomadic tribe might join them as a guest and quietly observe everything that goes on around him.
o Overt: In this method, everyone is aware that they are being watched. For example, a researcher or observer wants to study the wedding rituals of a nomadic tribe. To proceed with the research, the observer or researcher can reveal why he is attending the wedding and even use a video camera to shoot everything around him.
Quantitative Data Collection Methods
Quantitative Data Collection Methods
Probability sampling
A definitive method of sampling carried out by utilizing some form of random selection, enabling researchers to make a probability statement based on data collected at random from the targeted demographic. One of the best things about probability sampling is that it allows researchers to collect data from representatives of the population they are interested in studying. Besides, collecting the data randomly from the selected sample rules out the possibility of sampling bias.

There are three significant types of probability sampling:

• Simple random sampling: Every member of the targeted demographic has an equal chance of being chosen for inclusion in the sample.

• Systematic random sampling: Any member of the targeted demographic can be included in the sample, but only the first unit for inclusion is selected randomly; the rest are selected at a fixed interval, for example one out of every ten people on the list.

• Stratified random sampling: This allows selecting units from particular groups (strata) of the targeted audience while creating a sample. It is useful when the researchers are selective about including a specific set of people in the sample, e.g., only males or females, managers or executives, or people working within a particular industry.
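The three sampling types above can be mimicked with Python's standard random module. This is a minimal sketch under assumed inputs: the population of 100 units and the two strata ("managers"/"executives") are hypothetical, not from the original text:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

population = list(range(1, 101))  # a hypothetical population of 100 units

# Simple random sampling: every unit has an equal chance of selection.
simple = random.sample(population, k=10)

# Systematic random sampling: only the first unit is chosen randomly,
# then every 10th unit after it is taken in order.
interval = len(population) // 10
start = random.randrange(interval)
systematic = population[start::interval]

# Stratified random sampling: split the population into strata
# (here two made-up groups) and sample from each separately.
strata = {"managers": population[:20], "executives": population[20:]}
stratified = [u for group in strata.values() for u in random.sample(group, k=5)]

print(len(simple), len(systematic), len(stratified))  # 10 10 10
```

Each method yields a sample of ten units, but the guarantees differ: systematic sampling enforces even spacing across the list, while stratified sampling guarantees representation from every stratum.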
Quantitative Data Collection Methods
Interviews

Interviewing people is a standard method used for data collection. However, interviews conducted to collect quantitative data are more structured, wherein the researchers ask only a standard set of questions and nothing more than that.

There are three major types of interviews conducted for data collection:
• Telephone interviews: For years, telephone interviews ruled the charts of data collection methods. However, nowadays there is a significant rise in conducting video interviews over the internet, using Skype or similar online video calling platforms.

• Face-to-face interviews: These help in acquiring quality data, as they provide scope for asking detailed questions and probing further to collect rich and informative data.

• Computer-Assisted Personal Interviewing (CAPI): This is a setup similar to the face-to-face interview, except that the interviewer carries a desktop or laptop at the time of the interview to upload the data obtained directly into a database.
Quantitative Data Collection Methods
Surveys

• Conducting surveys is the most common quantitative data-collection method. Unlike qualitative surveys, in which participants answer open-ended questions and can share as much detail as they'd like, closed-ended surveys ask respondents to answer yes/no and/or multiple-choice questions.

• These surveys can also gather demographic data, like age, gender, income, or occupation.

• Another type of closed-ended survey question may ask respondents to rate something along a scale, for example, by presenting a statement and asking if the respondent strongly agrees, agrees, disagrees, or strongly disagrees.

• The surveys are designed in a manner that legitimizes the responses and builds the trust of the respondents.

• Participants can respond to surveys online or through the mail.


Quantitative Data Collection Methods
Surveys

There are two significant types of survey questionnaires used to collect online data for quantitative research.

1. Web-based questionnaire: This is one of the ruling and most trusted methods for internet-based or online research. In a web-based questionnaire, the respondent receives an email containing the survey link; clicking on it takes the respondent to a secure online survey tool where he/she can take the survey or fill in the survey questionnaire. Being cost-efficient, quick, and flexible, and having a wider reach, web-based surveys are preferred by researchers.
2. Mail questionnaire: In a mail questionnaire, the survey is mailed out to members of the sample population, enabling the researcher to connect with a wide range of audiences. The mail questionnaire typically consists of a packet containing a cover sheet that introduces the audience to the type of research and the reason why it is being conducted, along with a prepaid return envelope for collecting the responses. One of the major benefits of the mail questionnaire is that all the responses are anonymous, and respondents are allowed to take as much time as they want to complete the survey and be completely honest in their answers without fear of prejudice.
Quantitative Data Collection Methods
Observation

• Observation is a simple method of gathering quantitative data in which researchers observe or count subjects attending a specific event or using a service in a designated locale. It's a way to retrieve numerical data that focuses on the "what" rather than the "why."

• Collecting data this way is often referred to as "structured observation," in which researchers focus on observing and then quantifying specific, narrowly defined behaviors.

• In this method, researchers collect quantitative data through systematic observations, using techniques like counting the number of people present at a specific event at a particular time and venue, or the number of people attending an event in a designated place.
Quantitative Data Collection Methods
Document review

Document review is a process in which researchers analyze quantitative data they've found in existing primary documents, such as public records and personal documents. Researchers can use the supplementary data found in these documents to strengthen and support data from other quantitative data-collection methods.

Three primary document types are analyzed for collecting supporting quantitative research data:
• Public records: In this type of document review, official, ongoing records of an organization are analyzed for further research. Examples include annual reports, policy manuals, student activities, game activities in the university, etc.
• Personal documents: In contrast to public records, this type of document review deals with personal accounts of individuals' actions, behavior, health, physique, etc. Examples include the height and weight of students, the distance students travel to attend school, etc.
• Physical evidence: Physical evidence or physical documents deal with previous achievements of an individual or an organization in terms of monetary and scalable growth.
Qualitative Data Analysis
• Qualitative Data Analysis (QDA) is the range of processes and procedures whereby we move
from the qualitative data that have been collected into some form of explanation,
understanding or interpretation of the people and situations we are investigating.

• QDA is usually based on an interpretative philosophy. The idea is to examine the meaningful and symbolic content of qualitative data.

• Qualitative data analysis (QDA) is the process of organizing, analyzing, and interpreting
qualitative data—non-numeric, conceptual information and user feedback—to capture themes
and patterns, answer research questions, and identify actions to take to improve your product
or website.
Qualitative Data Analysis Methods
Here are five methods of qualitative data analysis:

1. Content analysis

2. Thematic analysis

3. Narrative analysis

4. Grounded theory analysis

5. Discourse analysis
Qualitative Data Analysis Methods
1 Content analysis

• Content analysis is a research method that examines and quantifies the presence of certain
words, subjects, and concepts in text, image, video, or audio messages. The
method transforms qualitative input into quantitative data to help you make reliable
conclusions about what customers think of your brand, and how you can improve their
experience and opinion.

• Content analysis is the procedure for the categorization of verbal or behavioural data for the purposes of classification, summarization and tabulation.

• The content can be analyzed on two levels:

– Descriptive: What is in the data?
– Interpretative: What was meant by the data?
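At the descriptive level, content analysis boils down to counting occurrences. A minimal Python sketch, using made-up feedback text (the responses below are hypothetical, not from the original text):

```python
import re
from collections import Counter

# Hypothetical customer feedback; in practice this could be survey answers,
# social media mentions, or interview transcripts.
responses = [
    "The delivery was slow but the support team was helpful.",
    "Helpful support, but delivery took too long.",
    "Great product, fast delivery this time.",
]

# Descriptive level: quantify how often each word appears across responses.
words = re.findall(r"[a-z]+", " ".join(responses).lower())
counts = Counter(words)

print(counts["delivery"])  # 3
print(counts["helpful"])   # 2
```

The interpretative level (what was meant) is left to the analyst: the counts only show that "delivery" dominates the feedback, not whether it is praised or criticized.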
Qualitative Data Analysis Methods
Examples of content analysis include:

• Analyzing brand mentions on social media to understand your brand's reputation

• Reviewing customer feedback to evaluate (and then improve) the customer and user experience (UX)

• Researching competitors' website pages to identify their competitive advantages and value propositions

• Interpreting customer interviews and survey results to determine user preferences, and setting the direction for new product or feature developments
Qualitative Data Analysis Methods
Content analysis benefits and challenges

Content analysis has some significant advantages for small teams:


1. You don’t need to directly interact with participants to collect data
2. The process is easily replicable once standardized
3. You can automate the process or perform it manually
4. It doesn’t require high investments or sophisticated solutions

On the downside, content analysis has certain limitations:

1. When conducted manually, it can be incredibly time-consuming
2. The results are usually affected by subjective interpretation
3. Manual content analysis can be subject to human error
4. The process isn't effective for complex textual analysis
Qualitative Data Analysis Methods
2 Thematic analysis

Thematic analysis helps to identify, analyze, and interpret patterns in qualitative data, and can be done with tools like Dovetail and Thematic.

While content analysis and thematic analysis seem similar, they're different in concept:

Content analysis can be applied to both qualitative and quantitative data, and focuses on identifying frequencies and recurring words and subjects.

Thematic analysis can only be applied to qualitative data, and focuses on identifying patterns and 'themes'.
Qualitative Data Analysis Methods
Thematic analysis benefits and challenges

Some benefits of thematic analysis:


1. It’s one of the most accessible analysis forms, meaning you don’t have
to train your teams on it
2. Teams can easily draw important information from raw data
3. It’s an effective way to process large amounts of data into digestible
summaries

And some drawbacks of thematic analysis:

1. In a complex narrative, thematic analysis can't capture the true meaning of a text
2. Thematic analysis doesn't consider the context of the data being analyzed
3. Similar to content analysis, the method is subjective and might drive results that don't necessarily align with reality
Qualitative Data Analysis Methods
3 Narrative analysis

• Narratives are transcribed experiences.

• Every interview/observation has a narrative aspect: the researcher has to sort out the narratives, reflect upon them, enhance them, and present them in a revised shape to the reader.

• The core activity in narrative analysis is to reformulate stories presented by people in different contexts and based on their different experiences.

• Narrative analysis is a method used to interpret research participants' stories (testimonials, case studies, interviews, and other text or visual data) with tools like Delve and AI-powered ATLAS.ti.

• Some formats narrative analysis doesn't work for are heavily structured interviews and written surveys, which don't give participants as much opportunity to tell their stories in their own words.
Qualitative Data Analysis Methods
Narrative analysis benefits and challenges

Benefits of narrative analysis are:

1. The method provides you with a deep understanding of your customers' actions, and the motivations behind them
2. It allows you to personalize customer experiences
3. It keeps customer profiles as wholes, instead of fragmenting them into components that can be interpreted differently

However, this data analysis method also has drawbacks:

1. Narrative analysis cannot be automated
2. It requires a lot of time and manual effort to draw conclusions from an individual participant's story
3. It's not scalable
Qualitative Data Analysis Methods
4 Grounded theory analysis

• Grounded theory analysis is a method of conducting qualitative research to develop theories by examining real-world data.

• The technique involves the creation of hypotheses and theories through the collection and evaluation of qualitative data, and can be performed with tools like MAXQDA and Delve.

• Unlike other qualitative data analysis methods, this technique develops theories from data, not the other way round.
Qualitative Data Analysis Methods
Grounded theory analysis benefits and challenges

Benefits of grounded theory analysis:

1. It explains events that can't be explained with existing theories
2. The findings are tightly connected to the data
3. The results are data-informed, and therefore represent the proven state of things
4. It's a useful method for researchers who know very little about the topic

Some drawbacks of grounded theory are:

1. The process requires a lot of objectivity, creativity, and critical thinking from researchers
2. Because theories are developed from data instead of the other way around, the method is sometimes considered overly theoretical, and may not provide concise answers to qualitative research questions
Qualitative Data Analysis Methods
5 Discourse analysis

• Discourse analysis is the act of researching the underlying meaning of qualitative data. It involves the observation of texts, audio, and videos to study the relationships between the information and its context.
• In contrast to content analysis, the method focuses on the contextual meaning of language: discourse analysis sheds light on what audiences think of a topic, and why they feel the way they do about it.
• It is a method of analyzing naturally occurring talk (spoken interaction) and all types of written texts.
• It focuses on ordinary people's methods of producing and making sense of everyday social life, i.e., how language is used in everyday situations:
– Sometimes people express themselves in a simple and straightforward way
– Sometimes people express themselves vaguely and indirectly
– The analyst must refer to the context when interpreting the message, as the same phenomenon can be described in a number of different ways depending on the context
Qualitative Data Analysis Methods
Discourse analysis benefits and challenges

Discourse analysis has the following benefits:

1. It uncovers the motivation behind your customers' or employees' words, written or spoken
2. It helps teams discover the meaning of customer data, competitors' strategies, and employee feedback

But it also has drawbacks:

1. Similar to most qualitative data analysis methods, discourse analysis is subjective
2. The process is time-consuming and labor-intensive
3. It's very broad in its approach
Quantitative Data Analysis Methods
What Is Descriptive Statistics?

• Descriptive statistics describes the characteristics of a data set. It is a simple technique to describe, show and summarize data in a meaningful way.
• You simply choose a group you're interested in, record data about the group, and then use summary statistics and graphs to describe the group's properties.
• There is no uncertainty involved, because you're just describing the people or items that you actually measure. You're not aiming to infer properties about a larger population.
• Descriptive statistics involves taking a potentially sizeable number of data points in the sample data and reducing them to certain meaningful summary values and graphs.
• The process allows you to obtain insights and visualize the data rather than simply poring over sets of raw numbers. With descriptive statistics, you can describe both an entire population and an individual sample.
Quantitative Data Analysis Methods
Types of Descriptive Statistics

There are three major types of Descriptive Statistics.

1. Frequency Distribution

2. Central Tendency

3. Variability or Dispersion
Quantitative Data Analysis Methods
1. Frequency Distribution

• Frequency distribution is used to show how often a response is given for quantitative as well as
qualitative data. It shows the count, percent, or frequency of different outcomes occurring in a
given data set.
• Frequency distribution is usually represented in a table or graph. Bar charts, histograms, pie
charts, and line charts are commonly used to present frequency distribution.
• Each entry in the graph or table is accompanied by how many times the value occurs in a
specific interval, range, or group.
• These tables or graphs are a structured way to depict a summary of grouped data classified on
the basis of mutually exclusive classes and the frequency of occurrence in each respective class.
Quantitative Data Analysis Methods
2. Central Tendency

• Central tendency includes the descriptive summary of a dataset using a single value that
reflects the center of the data distribution.
• It locates the center of the distribution and is used to show the average or most commonly
indicated response in a data set.
• Measures of central tendency or measures of central location include the mean, median, and
mode.
• The mean is the arithmetic average of the data set, the median is the middle score when the
data are arranged in increasing order, and the mode is the most frequent value.
Quantitative Data Analysis Methods
3. Variability or Dispersion

• A measure of variability identifies the range, variance, and standard deviation of scores in a
sample.
• This measure denotes the range and width of distribution values in a data set and determines
how spread apart the data points are from the center.
• The range shows the degree of dispersion or the difference between the highest and lowest
values within the data set.
• The variance refers to the degree of the spread and is measured as an average of the squared
deviations.
• The standard deviation measures the typical distance between the observed scores in the data
set and the mean value.
• This descriptive statistic is useful when you want to show how spread out your data is and
how it affects the mean.
• Descriptive Statistics is also used to determine measures of position, which describes how a
score ranks in relation to another. This statistic is used to compare scores to a normalized score
like determining percentile ranks and quartile ranks.
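A minimal sketch of the three dispersion measures, again using the standard library on an illustrative data set:

```python
import statistics

scores = [4, 8, 6, 5, 3, 8, 9]  # illustrative data set

data_range = max(scores) - min(scores)  # spread between highest and lowest: 9 - 3
variance = statistics.variance(scores)  # sample variance: mean of squared deviations
std_dev = statistics.stdev(scores)      # sample standard deviation: sqrt(variance)
```

`statistics.variance` and `statistics.stdev` use the sample (n − 1) denominator; `pvariance` and `pstdev` are the population versions.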
Quantitative Data Analysis Methods
What Is Inferential Statistics?

• In Inferential Statistics, the focus is on making predictions about a large group of data based on
a representative sample of the population.
• A random sample of data is considered from a population to describe and make inferences
about the population.
• This technique allows you to work with a small sample rather than the whole population.
• Since inferential statistics make predictions rather than stating facts, the results are often in
the form of probability.
• The accuracy of inferential statistics depends largely on the accuracy of sample data and how it
represents the larger population.
• This can be effectively done by obtaining a random sample. Results that are based on non-
random samples are usually discarded.
• Random sampling, though not always straightforward, is extremely important for carrying out
inferential techniques.
Quantitative Data Analysis Methods
Types of Inferential Statistics

Inferential Statistics helps to draw conclusions and make predictions based on a data set. It is
done using several techniques, methods, and types of calculations. Some of the most important
types of inferential statistics calculations are:

1. Regression Analysis

2. Hypothesis Tests

3. Confidence Intervals
Quantitative Data Analysis Methods
1. Regression Analysis

• Regression models show the relationship between a set of independent variables and a
dependent variable.

• This statistical method lets you predict the value of the dependent variable based on different
values of the independent variables.

• Hypothesis tests are incorporated to determine whether the relationships observed in sample
data actually exist in the population.
Quantitative Data Analysis Methods
2. Hypothesis Tests

• Hypothesis testing is used to compare entire populations or assess relationships between


variables using samples.

• Hypotheses or predictions are tested using statistical tests so as to draw valid inferences.
Quantitative Data Analysis Methods
3. Confidence Intervals

• The main goal of inferential statistics is to estimate population parameters, which are mostly
unknown or unknowable values.

• A confidence interval observes the variability in a statistic to draw an interval estimate for a
parameter.

• Confidence intervals take uncertainty and sampling error into account to create a range of
values within which the actual population value is estimated to fall.

• Each confidence interval is associated with a confidence level, which indicates the probability
(expressed as a percentage) that the interval would contain the parameter if you repeated the study.
Hypothesis Testing
What is Hypothesis?

A hypothesis is an educated guess about something in the world around you. It should be
testable, either by experiment or observation. For example:

• A new medicine you think might work.


• A way of teaching you think might be better.
• A possible location of new species.
• A fairer way to administer standardized tests.
• It can really be anything at all as long as you can put it to the test.
What is a Hypothesis Statement?
If you are going to propose a hypothesis, it’s customary to write a statement. Your
statement will look like this:

“If I…(do this to an independent variable)….then (this will happen to the dependent
variable).”
For example:
• If I (decrease the amount of water given to herbs) then (the herbs will increase in size).
• If I (give patients counseling in addition to medication) then (their overall depression
scale will decrease).
• If I (give exams at noon instead of 7) then (student test scores will improve).
• If I (look in this certain location) then (I am more likely to find new species).
A good hypothesis statement should:

• Include an “if” and “then” statement (according to the University of California).

• Include both the independent and dependent variables.

• Be testable by experiment, survey or other scientifically sound technique.

• Be based on information in prior research (either yours or someone else’s).


Null Hypothesis
• The Null Hypothesis is the assumption that the event will not occur. A null hypothesis has no
bearing on the study's outcome unless it is rejected.
• H0 is the symbol for it, and it is pronounced H-naught.

Alternate Hypothesis
• The Alternate Hypothesis is the logical opposite of the null hypothesis. The acceptance of the
alternative hypothesis follows the rejection of the null hypothesis. H1 is the symbol for it.

Example
A sanitizer manufacturer claims that its product kills 95 percent of germs on average.
To put this company's claim to the test, create a null and alternate hypothesis.

• H0 (Null Hypothesis): Average = 95%.


• Alternative Hypothesis (H1): The average is less than 95%.
What is Hypothesis testing?

• Hypothesis testing in statistics is a way for you to test the results of a survey or experiment to
see if you have meaningful results.
• You’re basically testing whether your results are valid by figuring out the odds that your results
have happened by chance.
• If your results may have happened by chance, the experiment won’t be repeatable and so has
little use.
• Hypothesis testing can be one of the most confusing aspects, mostly because before you can even
perform a test, you have to know what your null hypothesis is.

Examples of statistical hypothesis from real-life -


• A teacher assumes that 60% of his college's students come from lower-middle-class families.
• A doctor believes that 3D (Diet, Dose, and Discipline) is 90% effective for diabetic patients.
Step 1: State your null and alternate hypothesis
After developing your initial research hypothesis (the prediction that you want to investigate), it is important to
restate it as a null (H0) and alternate (Ha) hypothesis so that you can test it mathematically.
The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables.
The null hypothesis is a prediction of no relationship between the variables you are interested in.

You want to test whether there is a relationship between gender and height. Based on your
knowledge of human physiology, you formulate a hypothesis that men are, on average, taller than women. To
test this hypothesis, you restate it as:
H0: Men are, on average, not taller than women.
Ha: Men are, on average, taller than women.

Step 2: Collect data


For a statistical test to be valid, it is important to perform sampling and collect data in a way that is designed to
test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the
population you are interested in.
To test differences in average height between men and women, your sample should have an equal proportion of
men and women, and cover a variety of socio-economic classes and any other variables that might influence
average height.
You should also consider your scope (Worldwide? For one country?) A potential data source in this case might be
census data, since it includes data from a variety of regions and social classes and is available for many countries
around the world.
Step 3: Perform a statistical test
There are a variety of statistical tests available, but they are all based on the comparison of within-group
variance (how spread out the data is within a category) versus between-group variance (how different the
categories are from one another).
If the between-group variance is large enough that there is little or no overlap between groups, then your
statistical test will reflect that by showing a low p-value. This means it is unlikely that the differences between
these groups came about by chance.
Alternatively, if there is high within-group variance and low between-group variance, then your statistical test
will reflect that with a high p-value. This means it is likely that any difference you measure between groups is
due to chance.
Step 4: Decide whether the null hypothesis is supported or refuted
Based on the outcome of your statistical test, you will have to decide whether your null hypothesis is supported
or refuted.
In most cases you will use the p-value generated by your statistical test to guide your decision. And in most
cases, your cutoff for refuting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that
you would see these results if the null hypothesis were true.
In your analysis of the difference in average height between men and women, you find that the p-value of
0.002 is below your cutoff of 0.05, so you decide to reject your null hypothesis of no difference.
Step 5: Present your findings
The results of hypothesis testing will be presented in the results and discussion sections of your research paper.
In the results section you should give a brief summary of the data and a summary of the results of your
statistical test (for example, the estimated difference between group means and associated p-value). In the
discussion, you can discuss whether your initial hypothesis was supported or refuted.
In the formal language of hypothesis testing, we talk about refuting or accepting the null hypothesis. You will
probably be asked to do this in your statistics assignments.
Stating results in a statistics assignment
In our comparison of mean height between men and women we found an average difference of 14.3cm and a p-
value of 0.002; therefore, we can refute the null hypothesis that men are not taller than women and conclude
that there is likely a difference in height between men and women.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to
our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state
whether the result of our test was consistent or inconsistent with the alternate hypothesis.
If your null hypothesis was refuted, this result is interpreted as being consistent with your alternate hypothesis.
One-Tailed Hypothesis Testing
The One-Tailed test, also called a directional test, considers a critical region of data such that
the null hypothesis is rejected if the test sample falls into it, which implies acceptance of the
alternate hypothesis.
In a one-tailed test, the critical distribution area is one-sided, meaning the test sample is either
greater or lesser than a specific value.

There could be two possibilities, It could be Left-Tail Test and Right-Tail Test.

Left-Tail Test:
When we test whether the observed mean is less than the hypothesized mean, it is a Left-Tail
Test. This is the case when the inequality in the Alternate Hypothesis points to the left (a < symbol).
Ex- A consumer forum suspects that the 300 gm pack of Narasu’s Coffee is underweight.

Right-Tail Test:
When we test whether the observed mean is greater than the hypothesized mean, it is a Right-
Tail Test. This is the case when the inequality in the Alternate Hypothesis points to the right (a > symbol).
Ex- The Food Administration Authority received complaints that Maggi Noodle packs contain lead
beyond permissible limits.
Two-Tailed Hypothesis Testing
In a Two-Tailed test, the test sample is checked for being either greater than or less than a
range of values, implying that the critical distribution area is two-sided.

If the sample falls in either of these two critical regions, the alternate hypothesis will be
accepted, and the null hypothesis will be rejected.

Example:
Suppose H0: mean = 50 and H1: mean not equal to 50
According to the H1, the mean can be greater than or less than 50. This is an example of a Two-
tailed test.

Similarly, if H0: mean >= 50, then H1: mean < 50.

Here H1 allows only means less than 50, so it is a One-tailed test.
Type 1 and Type 2 Error
A hypothesis test can result in two types of errors.

Type 1 Error: A Type-I error occurs when sample results reject the null hypothesis despite being
true.

Type 2 Error: A Type-II error occurs when the null hypothesis is not rejected when it is false,
unlike a Type-I error.

Example:
Suppose a teacher evaluates the examination paper to decide whether a student passes or fails.

H0: Student has passed


H1: Student has failed

Type I error will be the teacher failing the student [rejects H0] although the student scored the
passing marks [H0 was true].

Type II error will be the case where the teacher passes the student [do not reject H0] although the
student did not score the passing marks [H1 is true].
Level of Significance

The alpha value is a criterion for determining whether a test statistic is statistically significant. In a
statistical test, Alpha represents an acceptable probability of a Type I error. Because alpha is a
probability, it can be anywhere between 0 and 1. In practice, the most commonly used alpha values
are 0.01, 0.05, and 0.1, which represent a 1%, 5%, and 10% chance of a Type I error, respectively
(i.e. rejecting the null hypothesis when it is in fact correct).

P-Value

A p-value is a metric that expresses the likelihood that an observed difference could have occurred
by chance. As the p-value decreases the statistical significance of the observed difference
increases. If the p-value is too low, you reject the null hypothesis.
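As a sketch of this decision rule, the two-tailed p-value for a standard normal test statistic can be computed from the error function in the standard library (the z value 1.2 below is purely illustrative):

```python
import math

def normal_cdf(z):
    """Standard normal CDF, expressed via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_tailed_p(z):
    """Probability of a result at least this extreme if H0 is true."""
    return 2 * (1 - normal_cdf(abs(z)))

alpha = 0.05               # chosen level of significance
p = two_tailed_p(1.2)      # illustrative test statistic
reject_h0 = p < alpha      # decision rule: reject H0 when p < alpha
```

Here p is about 0.23, well above alpha, so the null hypothesis would not be rejected; a statistic of 1.96 sits exactly at the 5% boundary for a two-tailed test.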
Chi-Square Test
Chi-Square Distributions

• As you know, there is a whole family of t-distributions, each one specified by a
parameter called the degrees of freedom, denoted df.

• Similarly, all the chi-square distributions form a family, and each of its members is also
specified by a parameter df, the number of degrees of freedom. Chi is a Greek letter
denoted by the symbol χ, and chi-square is often denoted by χ².

• A chi-square random variable is a random variable that assumes only positive values and
follows a chi-square distribution.
Chi-Square Distributions
Chi-Square Tests for Independence
Here we will investigate hypotheses that have to do with whether or not two random variables take
their values independently, or whether the value of one has a relation to the value of the other. Thus the
hypotheses will be expressed in words, not mathematical symbols. We build the discussion around the
following example.

Example: There is a theory that the gender of a baby in the womb is related to the baby’s heart rate:
baby girls tend to have higher heart rates. Suppose we wish to test this theory. We examine the heart
rate records of 40 babies taken during their mothers’ last prenatal checkups before delivery, and to
each of these 40 randomly selected records we compute the values of two random measures: 1) gender
and 2) heart rate. In this context these two random measures are often called factors. Since the burden
of proof is that heart rate and gender are related, not that they are unrelated, the problem of testing the
theory on baby gender and heart rate can be formulated as a test of the following hypotheses:
Chi-Square Tests for Independence
Example Continue….

The factor gender has two natural categories or levels: boy and girl. We divide the second factor, heart rate,
into two levels, low and high, by choosing some heart rate, say 145 beats per minute, as the cutoff between
them. A heart rate below 145 beats per minute will be considered low and 145 and above considered high.
The 40 records give rise to a 2 × 2 contingency table.

The four entries in boldface type are counts of observations from the sample of n = 40. There were 11 girls
with low heart rate, 17 boys with low heart rate, and so on. They form the core of the expanded table.
Chi-Square Tests for Independence
Example Continue….

In analogy with the fact that the probability of independent events is the product of the probabilities of
each event, if heart rate and gender were independent then we would expect the number in each core cell
to be close to the product of the row total R and column total C of the row and column containing it, divided
by the sample size n. Denoting such an expected number of observations E, these four expected values are:
1st row and 1st column: E = (R × C) ∕ n = 18 × 28 ∕ 40 = 12.6
1st row and 2nd column: E = (R × C) ∕ n = 18 × 12 ∕ 40 = 5.4
2nd row and 1st column: E = (R × C) ∕ n = 22 × 28 ∕ 40 = 15.4
2nd row and 2nd column: E = (R × C) ∕ n = 22 × 12 ∕ 40 = 6.6
Chi-Square Tests for Independence
Example Continue….

A measure of how much the data deviate from what we would expect to see if the factors really were
independent is the sum of the squares of the differences of the numbers in each core cell, or, standardizing
by dividing each square by the expected number in the cell, the sum Σ(O − E)² ∕ E. We would reject the null
hypothesis that the factors are independent only if this number is large, so the test is right-tailed. In this
example the random variable Σ(O − E)² ∕ E has the chi-square distribution with one degree of freedom. If we
had decided at the outset to test at the 10% level of significance, the critical value would be, reading from the
table "Critical Values of Chi-Square Distributions", χ²α = χ²0.10 = 2.706, so that the rejection region would be
the interval [2.706, ∞). When we compute the value of the standardized test statistic we obtain χ² ≈ 1.231.

Since 1.231 < 2.706, the decision is not to reject H0.
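The computation can be reproduced directly. The low-heart-rate counts (11 girls, 17 boys) are given in the example; the high-heart-rate counts of 7 and 5 follow from the row totals 18 and 22:

```python
# Observed 2 x 2 contingency table: rows = girl/boy, columns = low/high heart rate
observed = [[11, 7],
            [17, 5]]

row_totals = [sum(row) for row in observed]        # 18, 22
col_totals = [sum(col) for col in zip(*observed)]  # 28, 12
n = sum(row_totals)                                # 40

# Expected count for each cell under independence: E = (row total x column total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all four cells
chi_square = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
                 for i in range(2) for j in range(2))

print(round(chi_square, 3))  # 1.231, below the critical value 2.706
```

The expected counts come out to 12.6, 5.4, 15.4, and 6.6 as in the slides, and the statistic matches the 1.231 used in the decision.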


Chi-Square Tests for Independence
Example Continue….

Table: Critical Values of Chi-Square Distributions
Chi-Square Tests for Independence
Example Continue….

Baby Gender Prediction

The data do not provide sufficient evidence, at the 10% level of significance, to conclude that heart rate
and gender are related.
Chi-Square Tests for Independence
General Rule:
F-Test
F-tests for Equality of Two Variances

• The F-distribution arises in tests of hypotheses concerning whether or not two population
variances are equal and concerning whether or not three or more population means are
equal.
• Each member of the F-distribution family is specified by a pair of parameters called degrees
of freedom and denoted df1 and df2.
• An F random variable is a random variable that assumes only positive values and follows
an F-distribution.
• The parameter df1 is often referred to as
the numerator degrees of freedom and the
parameter df2 as the denominator degrees of
freedom.
• It is important to keep in mind that they are not
interchangeable.
• For example, the F-distribution with degrees of
freedom df1 = 3 and df2 = 8 is a different
distribution from the F-distribution with degrees
of freedom df1 = 8 and df2 = 3.
Definition

The value of the F random variable F with degrees of freedom df1 and df2 that cuts off a right tail
of area c is denoted Fc and is called a critical value.
Example:

Suppose F is an F random variable with degrees of freedom df1 = 5 and df2 = 4. Use
the tables to find
1. F0.10
2. F0.95

Solution:

1. The column headings of all the tables contain df1 = 5. Look for the table for which 0.10
is one of the entries on the extreme left (a table of upper critical values) and that has a row
heading df2 = 4 in the left margin of the table. A portion of the relevant table is provided.
The entry in the intersection of the column with heading df1 = 5 and the row with the
headings 0.10 and df2 = 4, which is shaded in the table provided, is the
answer, F0.10 = 4.05.
2. Look for the table for which 0.95 is one
of the entries on the extreme left (a table
of lower critical values) and that has a
row heading df2 = 4 in the left
margin of the table.

A portion of the relevant table is
provided. The entry in the intersection of
the column with heading df1 = 5 and
the row with the headings 0.95
and df2 = 4, which is shaded in the
table provided, is the
answer, F0.95 = 0.19.
A fact that sometimes allows us to find a critical value from a table that we could not read
otherwise is:
Example
F-Tests for Equality of Two Variances
For theoretical reasons it is easier to compare the squares of the population standard
deviations, the population variances σ1² and σ2². This is not a problem, since σ1 = σ2 precisely
when σ1² = σ2², σ1 < σ2 precisely when σ1² < σ2², and σ1 > σ2 precisely when σ1² > σ2².

The null hypothesis always has the form H0: σ1² = σ2². The three forms of the alternative
hypothesis, with the terminology for each case, are:

Just as when we test hypotheses concerning two population
means, we take a random sample from each population, of
sizes n1 and n2, and compute the sample standard
deviations s1 and s2.

In this context the samples are always independent. The
populations themselves must be normally distributed.
A test based on the test statistic F is called an F-test.
A most important point is that while the rejection region for a right-tailed test is exactly as in
every other situation that we have encountered, because of the asymmetry in the F-distribution
the critical value for a left-tailed test and the lower critical value for a two-tailed test have the
special forms shown in the following table:
Rejection Regions: (a) Right-Tailed; (b) Left-Tailed; (c) Two-Tailed
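A minimal sketch of the test statistic itself: the ratio of the two sample variances, computed here on two small illustrative samples that are assumed to come from normally distributed populations:

```python
import statistics

# Two independent samples (illustrative data only)
sample1 = [20.1, 19.8, 21.2, 20.5, 19.9, 20.8]
sample2 = [20.0, 20.2, 19.9, 20.1, 20.0, 20.3]

# Test statistic for H0: variances equal, as the ratio of sample variances
f_stat = statistics.variance(sample1) / statistics.variance(sample2)

# For a right-tailed test, H0 is rejected if f_stat exceeds the critical
# value F_alpha read from an F table with df1 = n1 - 1 and df2 = n2 - 1.
```

With df1 = df2 = 5 here, the statistic of roughly 13.9 would be compared against the tabled F critical value for the chosen level of significance.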
“Critical Value Approach” to hypothesis testing
Example
Solution
