MGMT603 - Business Research Methods
Key learning Outcomes
At the end of this module, the participant will be able to:
●● Identify different approaches of research
●● Elaborate planning a research project
●● Analyze Application of Research
Structure
1.1.3 Differences between Research Methods and Research Methodology
1.5.4 Application of Research in Production
1.5.5 Application of Research in Entrepreneurship
Unit Objectives:
At the end of this unit, the participant will be able to:
●● Identify objectives, significance and types of research
●● Definition and Objectives of Research
Authors and management gurus have defined research in different ways. Usually,
research is said to begin with a question or a problem. Research is defined as the
generation of new concepts, methodologies, and understandings through the creation
of new knowledge and/or the creative application of existing knowledge. This could
include synthesising and analysing previous research to the point where it produces
new and innovative results. By applying research, we are able to find solutions to
a problem through systematic and scientific methods. You could talk about
experimentation or innovation. You could use the word “risk” to describe the element of
danger that comes with discovery. Investigation may lead to analysis, and you may
conduct tests to prove your hypothesis. You could simply state that this work is
unique and has never been seen before. You could discuss what new knowledge will be
gained because of your work.
You could talk about a new method or a new data source that will result in a
breakthrough or a small improvement over current practice. You could state that it is a
Research Objectives
General Objectives: General objectives, also known as secondary objectives,
provide a detailed view of a study’s goal. In other words, they give a general idea
of what you want to have accomplished by the end of your study. For example, if you
want to investigate an organization’s contribution to environmental sustainability, your
broad goal could be to investigate sustainable practices and the organization’s use of
renewable energy.
Specific Objectives: Specific objectives define the primary aim of the study. In
most cases, general objectives serve as the foundation for identifying specific goals.
In other words, specific objectives are defined as general objectives that have been
broken down into smaller, logically connected objectives. They assist you in defining the
who, what, why, when, and how of your project. It’s much easier to develop and carry
out a research plan once you’ve identified the main goal.
●● To determine how the organisation has changed its practices and adopted new
solutions throughout its history.
●● To determine the impact of new practices, technology, and strategies on
overall effectiveness.
●● The findings should be repeatable. This asserts that previous research findings
should be able to be confirmed in a new environment and different settings with a
new group of subjects or at a different time.
●● The study should be fruitful. One of the most valuable characteristics of research
is that answering one question leads to the generation of a slew of new ones.
●● At all stages of the study, all parties involved (from policymakers to community
members) should be invited to participate.
●● The research must be straightforward, timely, and time-bound, with a
straightforward design.
●● The research should be as inexpensive as possible.
●● The research findings should be presented in formats that are most useful
to administrators, decision-makers, business managers, or members of the
community.
Although the names sound similar, research methods and research methodology are
different, as explained below:
primarily planned, scientific, and value-agnostic. Observations, theoretical procedures,
experimental studies, numerical schemes, statistical approaches, and so on are all
examples of these. We can use research methods to collect samples and data and to
arrive at a solution to a problem. Business and scientific research methods, in particular,
demand explanations based on collected facts, measurements, and observations,
rather than solely on reasoning. They only accept explanations that can be verified
through experiments.
their work of describing, explaining, and predicting phenomena. It can also be defined
as the study of methods for gaining knowledge. Its goal is to provide a research work
plan.
Unit Objectives:
At the end of this unit the participant will be able to:
etc.
●● Define hypothesis formulation and its types
●● Describe various hypothesis errors
1.2.1 Research Process
Early human inquiry was primarily based on introspection, that is, the examination of
one’s own conscious thoughts and feelings, and on understanding reached through
logical discussion in search of the truth. This procedure was accepted for a
millennium and was a well-established conceptual framework for understanding the
world. The knowledge seeker was an integral part of the inquiry process. With time,
this changed. The scientific method introduced several major components into the
research procedure, such as objectivity.
that reflects the true situation. While the researcher’s research philosophy will always
influence the research, it should be free of the researcher’s or management’s personal
or political biases.
For example, we may find out that our topic is too broad and needs to be narrowed,
sufficient information resources may not be available, what we learn may not support
our thesis or the size of the project does not fit the requirements.
There are nine main steps in the research process that are followed when
designing a research project. They are as follows.
●● The research’s purpose should be clearly defined, and common concepts
should be used.
●● The research procedure should be described in sufficient detail to allow
another researcher to continue the research for further advancement while
maintaining the integrity of what has already been accomplished.
●● The research’s procedural design should be meticulously planned to produce
objective results.
●● The researcher should be completely honest about any flaws in the procedure
design and estimate their impact on the findings.
●● The data analysis should be sufficient to reveal its significance, and the
analysis methods used should be appropriate. The data’s validity and
reliability should be double-checked.
●● Conclusions should be limited to those that are supported by the research
data and for which the data provide an adequate foundation.
●● If the researcher is experienced, has a good research reputation, and is a
deduced from propositions or premises. Deduction starts with a pattern “that is tested
against observations”, whereas induction “begins with observations and seeks to find a
pattern within them”.
Advantages of the deductive approach:
●● Making a decision to either confirm or reject the hypothesis on the basis of the
results examined, since it is essential to compare research findings against
literature review findings.
●● Theory modification when the hypothesis cannot be confirmed.
Inductive: The inductive approach, or inductive reasoning, starts with observations,
and theories are proposed towards the end of the research process as a result of
those observations. It involves the search for patterns in observations and
the development of explanations (theories) for those patterns through a series of
hypotheses. In this type of study, neither hypotheses nor theories apply at the start
of the research. The researcher is therefore free to make alterations to the
direction of the study even after the research process has started.
This approach doesn’t disregard theories at the time of formulating questions
and objectives for the research. Inductive approach helps to generate meanings from
the data set collected in order to identify patterns and relationships to build a theory.
This approach is mainly based on learning from experience. Previous patterns,
resemblances and regularities are observed in order to reach conclusions or to
generate theory.
Abductive Reasoning or Abductive Approach
The figure below illustrates the main differences between abductive, deductive and
inductive reasoning:
The abductive approach starts with ‘surprising facts’ or ‘puzzles’, and the research
process is devoted to their explanation. ‘Surprising facts’ or ‘puzzles’ may emerge
when a researcher encounters empirical phenomena that cannot be explained by
existing theories. In this approach, the researcher searches for the ‘best’ explanation
among many alternatives. The researcher can combine numerical and cognitive
reasoning to explain ‘surprising facts’ or ‘puzzles’.
The sample size is usually restricted to between 6 and 10 people. Open-ended
questions are used to get the maximum information from a given sample: they
encourage answers that lead the researcher to another question or to more
questions. The methods below are used for qualitative research:
●● One-to-one interview
●● Focus groups
●● Ethnographic research
●● Content/Text analysis
●● Case study research
Quantitative research: It is a structured process of data collection and analysis
in order to draw conclusions. This method uses computational and statistical processes
to collect and analyze data. Quantitative data is all about numbers. It involves a larger
population, as more people bring more data to the table, which helps to obtain more
accurate results. This research uses closed-ended questions because the researchers
are typically looking to gather statistical data. It involves the use of data collection
tools such as online surveys, questionnaires, and polls. There are various methods of
deploying surveys or questionnaires. Online surveys help the surveyor to reach a large
number of people or smaller focus groups for different types of research that meet
different goals.
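Because closed-ended questions yield a fixed set of answers, the responses can be tabulated directly into counts and percentages. The following is a minimal sketch of that tabulation; the answer options and the list of responses are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Hypothetical answers to one closed-ended survey question.
responses = ["Yes", "No", "Yes", "Yes", "Not sure", "No", "Yes", "Yes"]

# Count how often each answer option was chosen.
counts = Counter(responses)
total = len(responses)

# Convert the counts into percentages of all responses.
percentages = {
    option: round(100 * count / total, 1) for option, count in counts.items()
}

for option, share in percentages.items():
    print(f"{option}: {counts[option]} responses ({share}%)")
```

Open-ended answers would not fit this kind of tally without first being coded into categories, which is one reason quantitative surveys favour closed-ended questions.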
Researchers want their research to directly or indirectly result in some kind of reform,
for which they involve the group being studied in the research at all stages, so as to
avoid further marginalizing them.
The researchers may adopt a less neutral position than that which is usually
required in scientific research. This might involve interacting informally with, or even
living amongst, the research participants (the co-researchers). The findings of the
research can be reported in more personal terms, often using the precise words of the
research participants.
Unit Objectives:
At the end of this unit the participant will be able to:
1.3.1 Pure & Applied
Applied Research: Applied research is a type of study that aims to solve a specific
problem or offer novel solutions to issues that affect a person, a group, or a society.
Because it involves the practical application of scientific methods to everyday problems,
it is often referred to as a scientific method of inquiry or contractual research.
Because of its direct approach to finding a solution to a problem, applied research
is sometimes considered a non-systematic inquiry. It’s a type of follow-up research that
digs deeper into the findings of pure or basic research in order to validate them and use
them to develop innovative solutions.
Applied Research Example in Business
Pure research or fundamental research are other terms for basic research.
Between the late 19th and early 20th centuries, the concept of basic research arose as
a means of bridging the gaps in science’s societal utility.
data in order to improve one’s understanding, which can then be used to propose
solutions to a problem.
●● How does human retentive memory work?
●● How do different teaching methods affect students’ concentration in class?
Example: Effect of promotional events on sales
Correlational Research
a researcher measures two variables, understands and assesses the statistical
relationship between them with no influence from any extraneous variable. Our minds
can do some brilliant things.
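The statistical relationship between two measured variables is commonly summarised with a correlation coefficient. The sketch below computes Pearson's r by hand; the choice of Pearson's r, the function name `pearson_r` and the two data series are illustrative assumptions, not something prescribed by this text.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return covariation / sqrt(spread_x * spread_y)

# Hypothetical measurements of two variables for five observation periods.
variable_a = [1, 2, 3, 4, 5]
variable_b = [12, 15, 19, 22, 27]

r = pearson_r(variable_a, variable_b)
print(f"Pearson r = {r:.3f}")
```

A value close to +1 or -1 indicates a strong linear relationship, and a value near 0 a weak one. As with any correlational design, the coefficient describes association only and does not show that one variable causes the other.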
●● To determine the effects of foreign direct investment on Taiwanese economic
growth.
●● To investigate the impact of rebranding initiatives on customer loyalty.
●● To determine the nature of the impact of work process re-engineering on
employee motivation levels.
Did you know? Conceptual research is related to certain abstract ideas or theories
that are often applied by philosophers to develop new concepts or to rework
existing ones.
The longitudinal and cross-sectional studies are both observational studies. This
means that researchers record data about their subjects without tampering with the
research environment. We would simply measure the cholesterol levels of daily walkers
in the same age groups. We might even create gender subgroups. We would not,
however, consider past or future cholesterol levels because they would be outside the
scope. We’d only examine cholesterol levels at a single point in time.
educational level in relation to walking and cholesterol levels, for example, with little or
no extrapolation.
Cross-sectional studies, on the other hand, may not provide conclusive evidence
of cause-and-effect relationships. This is because such studies provide a snapshot of
a single moment in time and do not consider what occurs before or after the snapshot.
As a result, we can’t say for sure whether our daily walkers had low cholesterol levels
before starting their exercise routines or if the daily walking behaviour helped to lower
cholesterol levels that were previously high.
Longitudinal study
A longitudinal study is observational, just like a cross-sectional one. Once again,
researchers do not interfere with their subjects. A longitudinal study, on the
other hand, involves researchers making multiple observations of the same subjects
over a long period of time, sometimes many years.
in the characteristics of the target population at both the group and individual level. The
important thing to remember is that longitudinal studies go beyond a single point in
time. As a result, they can create event sequences.
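The difference between the two designs can be sketched with a small data set: a cross-sectional view takes one reading per subject at a single point in time, while a longitudinal view follows the same subjects across several points. The walker names, years and cholesterol values below are hypothetical, as are the two helper functions.

```python
# Hypothetical cholesterol readings (mg/dL) for the same daily walkers, by year.
readings = {
    "walker_1": {2019: 240, 2020: 225, 2021: 205},
    "walker_2": {2019: 190, 2020: 188, 2021: 189},
    "walker_3": {2019: 260, 2020: 230, 2021: 200},
}

def cross_section(data, year):
    """Cross-sectional view: one reading per subject at a single point in time."""
    return {subject: by_year[year] for subject, by_year in data.items()}

def change_over_time(data):
    """Longitudinal view: last reading minus first reading for each subject."""
    changes = {}
    for subject, by_year in data.items():
        years = sorted(by_year)
        changes[subject] = by_year[years[-1]] - by_year[years[0]]
    return changes

snapshot_2021 = cross_section(readings, 2021)
changes = change_over_time(readings)

print("Snapshot for 2021:", snapshot_2021)
print("Change from first to last year:", changes)
```

The snapshot alone cannot show whether a walker's level was previously higher, whereas the repeated readings reveal the sequence of events for each individual.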
In general, the design should be driven by the research. However, the progression
of the research can sometimes aid in determining which design is best. Longitudinal
studies take longer to complete than cross-sectional studies.
Even though it can be difficult to execute, this research method is widely used in a
variety of physical and social science fields. They are far more common in information
systems research than in library and information management research within the
information field.
Semi-Experimental (Quasi-Experimental): The prefix quasi means “similar to.” As a
result, quasi-experimental research is research that resembles experimental research
but isn’t actually experimental. Participants are not randomly assigned to conditions or
orders of conditions, even though the independent variable is manipulated (Cook &
Campbell, 1979). The directionality problem is eliminated in quasi-experimental
research because the independent variable is manipulated before the dependent
variable is measured. However, because participants are not assigned at random, there
is a chance that other differences exist between conditions. Thus, quasi-experimental
research does not eliminate the problem of confounding variables.
program while the other does not.
Non-Experimental: Non-experimental research is defined as research in which
no control or independent variable is manipulated. Researchers in non-experimental
research measure variables as they occur naturally, without any further manipulation.
This type of research is used when the researcher doesn’t have a specific research
question about a causal relationship between two variables and manipulating the
independent variable is impossible. It is also appropriate when:
●● The subject of the study is a causal relationship, but the independent variable
cannot be changed.
●● The study is broad and exploratory in nature.
●● The study focuses on a variable-to-variable non-causal relationship.
●● Only a limited amount of information about the research topic is available.
pan India. The researchers would focus on collecting data related to “what is the
impulse buying pattern of Indian consumers”, and the scope of their research would
be limited to that. The research does not explain the underlying reasons behind such
impulse buying practices or “why” such a buying pattern exists. Here, the scope of the
research is just to report the existence of such buying trends, not why people
resort to impulse buying. This is, hence, an ideal example of descriptive research.
Exploratory research
Exploratory research is the investigation of a problem that has not previously been
studied or thoroughly investigated. Exploratory research is usually done to gain a better
understanding of the problem at hand, but it rarely yields a conclusive result.
Exploratory research is used by researchers when they want to learn more about
an existing phenomenon and gain new insights into it in order to formulate a more
precise problem. It starts with a broad concept, and the research findings are used to
uncover related issues to the research topic.
grounded theory approach, provide answers to questions like what, how, and why.
The exploratory research conducted after product development will be the focus
of our attention. It’s known as the beta testing stage of product development for tech
products.
For example, if a new feature is added to an existing app, product researchers will
want to see how well the feature is received by users. The research is not exploratory if
the feature added to the app is something that already exists.
If Telegram adds a status feature to its app, for example, the app’s beta research
stage is not exploratory. This is because this feature is already available, and they can
When it comes to a new feature, such as Snapchat filters when they first launched,
the research is exploratory. A focus group of beta testers is used to conduct exploratory
research in this case.
Unit Outcomes:
At the end of this unit, participants will be able to:
●● Identify the salient features of a research project
problem in a research process. The researcher needs to identify the problem in order to
formulate it and then make it suitable for research. Usually, a research problem is
an unanswered question that the researcher encounters in a practical or theoretical
situation and for which a solution is needed. According to Kothari, a research problem
exists if the following conditions are met:
●● There is an organization or an individual (X) who is facing the problem. The
organization or the individual has an environment (Y) and is affected by
variables that are beyond control (Z).
●● There must be at least two courses of action (A1 and A2) to be pursued.
These are defined by one or more values of the controlled variables.
●● The above-mentioned courses of action must have at least two alternative
possible outcomes (B1 and B2). One of these will be preferred over the other;
it is what the researcher wants, and it becomes the objective.
●● The available courses of action must give the researcher some chance of
achieving the objective, but not the same chance. So, if P(Bj | X, Aj, Y)
represents the probability of the occurrence of an outcome Bj when X selects
Aj in Y, then P(B1 | X, A1, Y) ≠ P(B1 | X, A2, Y). From this we get that the
choices must not have equal efficiencies for the desired outcome.
Taking the above into consideration, the individual or organization can be said to
have a research problem only if X does not know which course of action is best. In
other words, X should be in doubt about the solution.
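The condition that the two courses of action must not be equally efficient for the desired outcome B1 can be checked numerically. The sketch below estimates each probability as a relative frequency; the observation records and the function name `outcome_probability` are hypothetical and only illustrate the inequality.

```python
from fractions import Fraction

# Hypothetical records of (course of action, outcome) for X in environment Y.
observations = [
    ("A1", "B1"), ("A1", "B1"), ("A1", "B2"), ("A1", "B1"),
    ("A2", "B1"), ("A2", "B2"), ("A2", "B2"), ("A2", "B2"),
]

def outcome_probability(records, action, outcome):
    """Estimate the probability of `outcome` given `action` as a relative frequency."""
    outcomes = [o for a, o in records if a == action]
    if not outcomes:
        raise ValueError(f"no observations for action {action!r}")
    return Fraction(outcomes.count(outcome), len(outcomes))

p_b1_given_a1 = outcome_probability(observations, "A1", "B1")
p_b1_given_a2 = outcome_probability(observations, "A2", "B1")

# A research problem requires that the two courses of action differ in efficiency.
unequal_efficiency = p_b1_given_a1 != p_b1_given_a2
print(p_b1_given_a1, p_b1_given_a2, unequal_efficiency)
```

If both estimates were equal, the choice between A1 and A2 would not matter for reaching B1, and by the conditions above there would be no research problem to investigate.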
●● Alternative means of pursuing the objective must be available, so that the
researcher has more than one alternative. Without a choice of alternative
options, there won’t be a problem for the researcher.
●● The researcher needs to have doubt about the alternative means and about
making a selection. This means that the research needs to answer the question of
the relative efficiency or suitability of the possible alternatives.
●● A context is needed to which the difficulty faced can be attributed.
Thus, identification of a research problem is something that happens even prior to
conducting research. A research problem requires the researcher to look for the best
available solution to the given problem. This means the researcher needs to find the
best course of action through which the research objective may be achieved optimally
in the context of a given situation.
1.4.2 Factors Influencing the Complication of a Research Problem:
There are factors that can complicate a research problem:
●● The environment may change, affecting the efficiency of the alternative courses
of action taken or the quality of the outcomes.
●● The number of available alternative courses of action may be large, and a person
who isn’t involved in the decision making may be affected by the change. The
reaction can be favourable or unfavourable.
●● Other similar factors may cause changes related to the research context. All of
these can be considered from the point of view of a research problem.
of data to facilitate the combining of relevance to the research purpose with economy in
procedure.
Selltiz and others stated that this is the conceptual structure within which research
is conducted; it constitutes the blueprint for the collection, measurement and analysis
of data. We can conclude that research design offers an outline of what the researcher
plans to execute in terms of framing the hypothesis, its operational implications and the
final data analysis. In particular, the research design highlights decisions which include:
●● The location of the study that is to be conducted
●● The nature of data required
●● Source of data that is to be collected
●● Time period of the study
●● Sample design type that can be used
●● Data collection techniques that are usable
●● Data analysis methods that can be applied
●● Structure of the report
Taking into consideration the research design decisions, the overall research
design may be divided into the following (Kothari 1988):
The operational design, which deals with the techniques by which the procedures
specified in the sampling, statistical and observational designs can be carried out.
Unit Objectives
At the end of this unit, the participants will be able to:
●● Explain how to promote a product
●● Analyze how to help entrepreneurs.
Research may be used in the area of marketing
and well-established areas is marketing research. Market potentials, sales forecasting,
product testing, sales analysis, market surveys, test marketing, consumer behaviour
studies, and marketing information systems are all examples of marketing research.
helpful if we work, or intend to work, outside of the finance domain such as conducting
academic research in finance. Research basically takes responsibilities on to
the organization-
●● wage rates
●● employment trends and best practices
To study:
●● Incentive schemes
●● Cost of living
●● Employee turnover rates
●● Performance appraisal techniques
●● Planning manpower and utilising human resources effectively
●● Framing human resources policies for the organisation
●● Compares its organization / division with another organization / division to
uncover areas of poor performance that need to improve
●● Relies on the expertise of a consultant to diagnose the causes of problems
●● With the help of existing records, generates statistical standards against which
activities and programs are evaluated
●● Uses the human resource information system to check compliance with laws and
company policies or procedures.
●● MBO (management by objectives) is applied to compare the actual
results with the stated objectives.
component that allows them to take a step back and consider how their product might
fit into the marketplace. Through market research, entrepreneurs gain valuable
information about industry trends, who their true competitors are, and which consumers
they should target and how. Market research aids start-up entrepreneurs in developing,
fine-tuning, and improving their specific product or service, which leads to increased
revenue from new customers.
Summary:
At the end of this module, the participants have covered:
●● Defining research methodology
●● Explaining research process
●● Identifying different approaches of research
●● Elaborating planning a research project
●● Analyzing Application of Research
Exercise:
1. The purpose of research is to find solutions through the application of ......................
and ...................... different methods.
a) Synthesizing and Analyzing
b) Applying and interpreting
c) Both a and b
d) none of the above
2. Which of the following scopes of research is related to human resource development?
a) Projecting demand
b) Studying performance appraisal techniques
c) Cost budgeting
d) Measuring effectiveness of promotional activities
3. Which of the following scopes of research is NOT exclusively related to the framing
of government policies?
a) Evolving the union finance budget
b) Modifying the five-year plan
c) Revising fiscal policies
b) Product research
c) Demand research
d) none of the above
5. Planning, organising, staffing, communicating, __________, ________ and
_________ are all management functions
a) coordinating
b) motivating
c) controlling
d) all of the above
Answers:
1. a) Synthesizing and Analyzing
2. b) Studying performance appraisal techniques
3. d) Revising monetary policies
4. a) Market research
5. d) all of the above
Key learning outcomes
At the end of this module the participant will be able to:
●● Define Data Collection and its Methods
●● Explain Questionnaire Designing
●● Describe Measurement and Scaling
●● Analyze Sampling
Structure
2.1.1 Types of Research Models
2.1.2 Importance of Research Model
2.1.3 Types of Research Models
2.1.4 Stages of a Research Model
2.1.5 Heuristic Research Model
2.1.6 Simulation Research Modelling
2.5.3 Steps involved in Sampling Process
2.5.4 Simple Random Sampling
2.5.6 Sampling and Non-Sampling Errors
Unit Objectives:
At the end of this unit, you will learn:
●● Types of Research Models
●● Stages of Research Model
●● Heuristic Research Model
●● Simulation Research Model
●● Data Considerations while analyzing Data for a Research
●● Qualitative Research Model
●● Quantitative Research Model
Qualitative Research Model
It involves non-numerical data collection and analysis in order to understand
concepts, opinions and experiences. This helps to gather in-depth insights into a
problem or develop new ideas for research. Qualitative research finds its use mostly
in the humanities and social sciences, in subjects such as anthropology, sociology,
education, health sciences, history, etc. Qualitative research helps to understand how
people experience the world. While there are many approaches to qualitative
research, they tend to be flexible and to focus on retaining rich meaning when
interpreting data.
Quantitative Research Model
It is about collecting and analysing numerical data. Used for finding patterns and
averages, this research model can make predictions, test causal relationships, and
help to generalise results to wider populations. Quantitative research finds wide use
in the natural and social sciences: biology, chemistry, psychology, economics,
sociology, marketing, etc.
●● Models are also very important to social scientists because they provide a
framework through which important questions are investigated.
2.1.3 Types of Research Models
Research Models are classified broadly into two types as mentioned below:
Qualitative Research Model
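It involves non-numerical data collection and analysis in order to understand
concepts, opinions and experiences. This helps to gather in-depth insights into a
problem or develop new ideas for research. Qualitative research finds its use mostly
in the humanities and social sciences, in subjects such as anthropology, sociology,
education, health sciences, history, etc. Qualitative research helps to understand how
people experience the world. While there are many approaches to qualitative
research, they tend to be flexible and to focus on retaining rich meaning when
interpreting data.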
Quantitative Research Model
It is about collecting and analysing numerical data. Used for finding patterns and
averages, this research model can make predictions, test causal relationships, and
help to generalise results to wider populations. Quantitative research finds wide use
in the natural and social sciences: biology, chemistry, psychology, economics,
sociology, marketing, etc.
These steps are: (1) choosing a topic, (2) defining the problem, (3) reviewing the
literature, (4) formulating a hypothesis, (5) selecting a research method, (6) collecting
data, (7) analysing the results, and (8) sharing the findings.
Other authors may identify more or fewer steps, but the fundamental model
remains the same. Validity and reliability are two important aspects of research. Validity
refers to whether or not the research actually measures what it claims to. The degree
Sociologists use six different research methods to conduct their studies: (1)
surveys, (2) participant observation, (3) secondary analysis, (4) documents, (5)
lesser-known models is the Heuristic research model. This research model was
developed by Clark Moustakas (an American psychologist and researcher). The name
‘heuristic’ is derived from the Greek word ‘heuriskein’ (which means to discover or find).
The research model has six phases:
●● Initial engagement
●● Immersion
●● Incubation
●● Illumination
●● Explication
●● Creative synthesis
Shelly Chaiken developed the heuristic-systematic model of information processing
(HSM), which attempts to explain how people receive and process persuasive
messages. According to the model, individuals can process messages in one of two
ways: heuristically or systematically. Heuristic processing entails the use of
simplifying decision rules or “heuristics” to quickly assess the message
content, whereas systematic processing entails the careful and deliberate processing
of a message. This model’s guiding belief is that people tend to minimise their use of
cognitive resources by relying on heuristics, which affects message intake and
processing. The elaboration likelihood model, or ELM, is very similar to the HSM. Both
models were developed primarily in the early to mid-1980s, and they share many of the
same concepts and ideas.
2.1.6 Simulation Research Modelling
By using statistical descriptions of the activities involved, simulation models
attempt to replicate the workings and logic of a real system. For example, a line
might produce 1000 units per hour on average. If we assume this is always the case,
we lose sight of what happens when there is, for example, a breakdown or a stoppage
for routine maintenance. When we consider the effect on downstream units, the effect
of such a delay may be amplified (or absorbed).
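The point about averages hiding stoppages can be made concrete with a very small simulation. The sketch below is only an illustration: the breakdown probability, the repair time, the hourly variation and the function name `simulate_line` are all assumed values chosen for the example, not figures from this text.

```python
import random

def simulate_line(hours, mean_rate=1000, breakdown_prob=0.05,
                  repair_hours=2, seed=0):
    """Simulate the hourly output of a line that occasionally breaks down."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    output = []
    down_for = 0  # remaining hours of an ongoing stoppage
    for _ in range(hours):
        if down_for > 0:
            # The line is still being repaired: nothing is produced.
            output.append(0)
            down_for -= 1
            continue
        if rng.random() < breakdown_prob:
            # A breakdown starts now and lasts repair_hours in total.
            output.append(0)
            down_for = repair_hours - 1
            continue
        # Normal hour: output varies randomly around the mean rate.
        output.append(max(0, round(rng.gauss(mean_rate, 50))))
    return output

hourly = simulate_line(hours=500)
average = sum(hourly) / len(hourly)
stopped_hours = hourly.count(0)
print(f"Average output per hour: {average:.0f}")
print(f"Hours with no output: {stopped_hours}")
```

Running the model for many hours shows how the long-run average is pulled away from the nominal rate by stoppages, and the hour-by-hour list could be fed into a model of the downstream units to study how a delay propagates.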
A simulation model contains two types of elements: ‘entities’ (e.g. machines,
materials, people, etc.) and ‘activities’ (e.g. processing, transporting, etc.). It also
includes an explanation of the logic that governs each activity. A processing activity,
for example, can only begin when a certain quantity of working material, a person to
operate the machine, and an empty conveyor to transport the product are all available.
Once an activity has begun, the time it will take to complete it is calculated, which is
frequently
Unit Outcomes
●● Types of Data Collection Methods
●● Tabulating and Validating the Collected Data
The researcher should know which data sources are required for the research.
Data or information is of two types:
●● Primary Data
●● Secondary Data
Information gathered through original or first-hand research, such as surveys and
focus group discussions, is referred to as primary data. Secondary data, on
the other hand, is information that has already been gathered by someone else, for
instance internet research, newspaper articles, and company reports.
Primary data is gathered from first-hand experience and has never been used
before. The data gathered through primary data collection methods is highly accurate
and specific to the research’s purpose.
Quantitative and qualitative data collection methods are the two types of primary
data collection methods.
Quantitative Methods:
Smoothing Techniques: Smoothing techniques can be used when the time series
lacks significant trends. They remove the random variation in historical demand, which aids
in the identification of patterns and demand levels in order to forecast future demand.
The simple moving average method and the weighted moving average method are the
two most common smoothing techniques for demand forecasting.
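The two smoothing methods named above can be sketched as follows. This is a minimal illustration; the function names, the demand figures and the weights are hypothetical and are not taken from the course text.

```python
def simple_moving_average(series, window):
    """Forecast the next period as the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    recent = series[-window:]
    return sum(recent) / window

def weighted_moving_average(series, weights):
    """Forecast the next period as a weighted mean of the latest observations.

    `weights` are listed oldest-to-newest and are normalised here, so they
    do not need to sum to 1.
    """
    n = len(weights)
    if n == 0 or n > len(series):
        raise ValueError("need between 1 and len(series) weights")
    recent = series[-n:]
    return sum(w * x for w, x in zip(weights, recent)) / sum(weights)

demand = [100, 110, 90, 120, 130]  # hypothetical historical demand
sma_forecast = simple_moving_average(demand, window=3)
wma_forecast = weighted_moving_average(demand, weights=[1, 2, 3])
print(sma_forecast, wma_forecast)
```

Giving larger weights to recent periods makes the weighted forecast react faster to recent changes than the simple average does.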
Barometric Method: Researchers use this method, also known as the leading
indicators approach, to predict future trends based on current events. When past events
are used to forecast future events, they are referred to as leading indicators.
Qualitative Methods:
Surveys: Surveys are used to gather information about the target audience’s
preferences, opinions, choices, and feedback on products and services. Most
survey software allows you to choose from a variety of question types.
You can also save time and effort by using a pre-made survey template. By
changing the theme, logo, and other elements, online surveys can be tailored to fit
the brand of the company. They can be distributed via a variety of channels, including
email, website, offline app, QR code, social media, and so on. You can choose the
channel based on the type and source of your audience. Once the data has been
collected, survey software can generate various reports and run analytics algorithms
to uncover hidden insights. A survey dashboard can show statistics such as response rate
and completion rate, and offers demographic filters and export and sharing options. Integrating
a survey builder with third-party apps can help you get the most out of your online data
collection efforts.
Polls: A poll asks a single question, which may be single-choice or multiple-choice. You can use
polls when you need to get a quick pulse on the audience’s feelings. Because polls are
short, it is easier to get responses from people.
Online polls, like surveys, can be integrated into a variety of platforms. After
the respondents have responded to the question, they can see how their responses
compare to those of others.
are many participants, repeating the same process is too time-consuming and tedious.
Delphi Technique: Market experts are given the estimates and assumptions
of forecasts made by other industry experts in this method. Based on the information
provided by other experts, experts may reconsider and revise their estimates and
assumptions. The final demand forecast is based on the consensus of all experts on
demand forecasts.
Focus Groups: A focus group is a small group of people (around 8-10 members)
who meet to discuss the problem’s common areas. Each person expresses his or
her viewpoint on the subject at hand. The discussion among the group members
is moderated by a moderator. The group comes to an agreement at the end of the
discussion.
Questionnaire: A questionnaire is a printed set of open-ended or closed-ended
questions. The respondents must respond based on their knowledge and experience
with the topic at hand. A survey includes a questionnaire, but a questionnaire’s
end-goal may or may not be a survey.
Sources of Secondary data:
The various sources for secondary data collection may be classified into two broad
categories:
●● Published Sources
●● Unpublished Sources
Published Sources:
International, governmental and local agencies publish statistical
data, among which the following are important:
●● International Publications: International institutions and bodies such as the
I.M.F, I.B.R.D, I.C.A.F.E and U.N.O occasionally publish
reports on statistical and economic matters.
●● Official Publications of Central and State Governments: Reports on
different subjects are published by several departments of the Central and
State Governments regularly. They collect all the additional information.
Important publications among these are: The Reserve Bank of India Bulletin,
Census of India, Statistical Abstracts of States, Agricultural Statistics of India,
Indian Trade Journal, etc.
●● Semi-Official Publications: Semi-official bodies such as Municipal Corporations,
District Boards and Panchayats publish reports relating to different matters
of public concern.
●● Publications of Research Institutions: Indian Statistical Institute (I.S.I),
Indian Council of Agricultural Research (I.C.A.R), Indian Agricultural Statistics
Wanchoo Committee’s Report on Taxation and Black Money, etc. are also
important sources of secondary data.
●● Journals and Newspapers: Journals and News Papers are the powerful
sources from where data is obtained. Current and important materials
on statistics and socio-economic problems are provided by journals and
value.” This data can be erroneous in different respects due to the biases and
prejudices of the information collectors, along with inadequate sample size,
mistakes in definition, mathematical errors and substitution
issues. Even without error, such data can still be unsuitable for the purpose of the
enquiry. According to Prof. Simon Kuznets, “the
degree of reliability of secondary source is to be assessed from the source,
the compiler and his capacity to produce correct statistics and the users
also, for the most part, tend to accept a series particularly one issued by a
government agency at its face value without enquiring its reliability”.
Thus the following factors need to be considered:
●● The Suitability of Data: This can be judged by comparing the scope and nature of
the present enquiry with those of the original one. For example, if we are looking for
the trend in retail prices while the data provided is for wholesale prices,
then it is of no use.
●● Adequacy of Data: Once it is ensured that the data is suitable for
investigation, it should be checked for adequacy for the purpose of the present analysis.
The geographical area covered by the original enquiry can be studied in this respect, along
with the time period for which the data is available. In the above example, if we
want to study the retail price trend of India and the acquired data covers only
the retail price trend in the state of UP, then it would not serve the purpose.
●● Reliability of Data: This issue concerns whether research findings can be
applied to a larger group than those who participated in the study. To put it
another way, would similar results have been obtained if a different group of
respondents or a different set of data points had been used? Is the information
obtained from these 40 people sufficient to conclude how the entire sales
force feels about company policies, for example, if 40 salespeople out of a
are similar, the data collection method is most likely reliable. The scientific research
method includes ensuring that research can be replicated and produces similar results.
While editing primary data, the following considerations should be borne in mind:
numeric data in rows and columns to facilitate comparison and statistical analysis. It
facilitates comparison by bringing related information close to each other and helps in
further statistical analysis and interpretation.
In other words, the method of placing organized data into a tabular form is called
tabulation. It may be simple, double or complex depending upon the nature of
categorization.
The objectives of tabulation of collected data are as follows:
●● To simplify the data collected
●● To conserve space
●● To ease comparison
●● To enable summation of items
●● To enable easy detection of errors and omissions
●● To facilitate statistical computations
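As a small illustration of simple (one-way) tabulation, the sketch below counts raw responses and arranges them into rows of category, frequency and percentage. The function name `tabulate` and the response data are hypothetical.

```python
from collections import Counter

def tabulate(responses):
    """Return rows of (category, frequency, percentage) sorted by category."""
    total = len(responses)
    if total == 0:
        return []
    counts = Counter(responses)
    return [(category, n, round(100 * n / total, 1))
            for category, n in sorted(counts.items())]

# Hypothetical raw answers to a single closed-ended question.
answers = ["Yes", "No", "Yes", "Yes", "No", "Undecided"]
table = tabulate(answers)
for category, frequency, percent in table:
    print(f"{category:<10} {frequency:>3} {percent:>6}%")
```

Once the data is in this form, omissions (for example a category with zero responses) and totals are easy to check.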
Validating Data
Definition of Data Validation: As defined by the United Nations Economic
Commission for Europe (UNECE 2013), data validation is an activity aimed at verifying
whether the value of a data item comes from the given (finite or infinite) set of
acceptable values.
Data validation means checking the accuracy and quality of source data before
using, importing or otherwise processing data. Different types of validation can be
For example, an email question will automatically check if the data entered is a
valid email. A phone number question can check whether the phone number has the
right number of digits, based on its country code.
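The two checks described above can be sketched as follows. The e-mail pattern is deliberately simplistic, and the `PHONE_DIGITS` lookup table is a hypothetical stand-in for whatever rules a real survey tool applies; neither is taken from a specific product.

```python
import re

# Deliberately simple pattern: text, "@", text, ".", text, with no spaces.
# Real e-mail validation is more involved; this is only an illustration.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical lookup of expected national number lengths per country code.
PHONE_DIGITS = {"+91": 10, "+1": 10}

def is_valid_email(value):
    """Check that the value looks like an e-mail address."""
    return bool(EMAIL_PATTERN.match(value))

def is_valid_phone(country_code, number):
    """Check that the number has the digit count expected for its country code."""
    expected = PHONE_DIGITS.get(country_code)
    digits = re.sub(r"\D", "", number)  # drop spaces, dashes and brackets
    return expected is not None and len(digits) == expected

print(is_valid_email("respondent@example.com"))
print(is_valid_phone("+91", "98765 43210"))
```

Both functions test membership in a set of acceptable values, which is how the UNECE definition describes validation.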
Types of Validity
1. Content validity: The extent to which the items’ content adequately represents
the universe of all relevant items under investigation. Are the samples
representative of the population/universe?
2. Criterion Validity: The extent to which each criterion can be measured
correctly. For instance, consider a family’s income.
3. Construct Validity: The construct validity of a scale or test refers to how well it
measures the construct.
For example, a doctor might assess the effectiveness of a painkiller. Each day, he
tries to assess the level of pain by asking his patients to rate pain on a 1-10 scale.
Construct validity asks whether he is really measuring pain, or something else such as numbness.
Unit Objectives
●● List the Steps to be followed for constructing a Questionnaire
●● Identify the types of questions to be asked in a Questionnaire
●● The Format of a Questionnaire
2.3.1 Introduction to Questionnaire
These days, the questionnaire is widely used for data collection in research. It is a
reasonably fair tool for gathering data from large, diverse and scattered social
groups. The questionnaire is the medium of communication between the investigator and
the respondents. According to Bogardus, “a questionnaire is a list of questions sent to a
number of persons for their answers and which obtains standardized results that can be
tabulated and treated statistically”.
The researcher should begin by reviewing the proposal and brief and making a list
of all of the objectives as well as the information needed to achieve them.
A list of all the questions that could be included in the questionnaire is now being
compiled. The goal at this point is to be as thorough as possible with the listing and not
to be concerned with the wording of the questions. That’ll be the next step.
The questions must now be refined to the point where they make sense and
generate the correct responses.
The order in which the questions are asked is crucial because it gives the interview
logic and flow. Typically, the respondent is eased into the task with relatively simple
questions, with the more difficult or sensitive ones being saved until the respondent has
warmed up. Unprompted questions about brand awareness are asked first, followed by
prompted questions.
from one another so that the wrong one is not circled.
The questionnaire must then be tested. Because the goal of a pilot is to ensure
that the questionnaire works, rather than to obtain pilot results, it is usually not necessary to conduct
more than 10 to 20 interviews. Ideally, the questionnaire should be piloted with
the interviewing method that will be used in the field (over the phone if telephone
interviews are to be used; self-completed if it will be a self-completion questionnaire).
If time and money prevent a proper pilot, it should at the very least be tested on one or two
colleagues for logic, flow, and clarity of instructions. The entire point
of the test is to see if any changes are required before final revisions are made.
When conducting the pilot, it is best to go over the questionnaire with the test
respondent and ask, for each question, “What went through your mind when you
were asked this question?”
2.3.3 Types of Questions to be asked in a Questionnaire
Open-ended questions
Open-ended questions let you start a conversation. They are good survey questions
for getting more meaningful responses, because people can provide additional
feedback via a text box. You’ll need to use a closed-ended question if you’re looking for a
yes/no response.
Closed-ended questions
Some questions only require a single-word answer, such as agree/disagree or yes/no. You
can use them to gather quick pieces of information, then segment your respondents
based on that information.
●● Did you order the chicken?
●● Do you like learning German?
●● Are you living in Australia?
Rating questions
Ratings can be given in stars, hearts, or smiles. Send
a rating question to your survey participants to see how they would rate something. It’s
a useful question to ask because it allows you to gauge people’s opinions across the
board.
●● How would you rate our service out of 5?
●● How many stars would you give our film?
●● Please, rate how valuable our training was today.
Likert scale questions
Likert scale questions are useful in surveys to determine what people think about
certain topics. They usually come in five, seven, or nine-point scales, and you’ve
probably used one before.
Multiple-choice questions
Do you want to send out a test or a quiz? Multiple-choice questions are your best
option. You can offer several answers while keeping the correct answer hidden. Multiple-choice
questions are also useful for choosing time periods or dates for an event. Plus, you
can group the options together in a dropdown menu.
Demographic questions
Questions in demographic surveys are a mix of different types of questions.
Whether you use a dropdown or an open-ended question with them is entirely up to
you. Take note that they all discuss topics that could be considered sensitive.
40 - 49
50 - 59
60 +
2. Gender:
Male
Female
Writer
Administrative Assistant
Journalist
Secretary
Academic
Professional
Technical expert
Student
Designer
Administrator/Manager
Other, please specify:
Part-2: To be completed during and/or after software use
1. With respect to the version of Microsoft Word currently installed on your machine,
please indicate the extent to which you agree or disagree with the following
statements:
SD = Strongly Disagree
D = Disagree
N = Neutral
A = Agree
SA = Strongly Agree
Part-3: To be completed once both versions of Microsoft Word have been used
by the subject.
1. If you could choose only one of the versions to continue using, which would it be?
Microsoft Word 2000
Microsoft Word Personal
2. What particular aspect(s) of Microsoft Word 2000 did you like?
3. What particular aspect(s) of Microsoft Word 2000 did you dislike?
4. What particular aspect(s) of Microsoft Word Personal did you like?
5. What particular aspect(s) of Microsoft Word Personal did you dislike?
6. There are a number of criteria listed below. Please select the version that would be
your 1st choice according to each of the criteria. If you really cannot make a choice
for a given criterion please select “Equal”.
Personal = Microsoft Word Personal
Equal = 2000 and Personal satisfy this criterion equally
Unit Objectives
●● Types of Scaling Techniques
●● Attitude Measurement Scales and its Types
Definition of Measurement
Measurement is the process of observing and recording the observations that are
collected as part of research. C.R. Kothari defines it as the “process of mapping aspects
of a domain onto other aspects of a range according to some rule of correspondence”.
The process of describing some property of a phenomenon of interest, usually
by assigning numbers in a reliable and valid manner, is known as measurement.
The numbers provide details about the object being measured. When numbers are
used, the researcher must follow a set of rules for assigning a numerical value to an
observation in a way that is accurate.
Definition of Scaling
Scaling is the procedure of measuring objects and assigning numbers to them
according to specified rules. In other words, the process of locating the measured
objects on a continuum, a continuous sequence of numbers to which the objects are
assigned, is called scaling.
The level of measurement refers to the relationship among the values that are
assigned to the attributes, feelings or opinions for a variable.
numbers:
●● Nominal scale
●● Ordinal scale
●● Interval scale
●● Ratio scale
Nominal Scale
This is the crudest among all measurement scales but it is also the simplest scale.
In this scale the different scores on a measurement simply indicate different categories.
The nominal scale does not express any values or relationships between variables.
The nominal scale is often referred to as a categorical scale.
The assigned numbers have no arithmetic properties and act only as labels. The
only statistical operation that can be performed on nominal scales is a frequency count.
The only average that can be determined is the mode.
Example: Labelling Apples as 1 and Oranges as 2 for data recording does not
mean Apples are tastier than Oranges.
Ordinal Scale
A system for assigning numbers and symbols to events so as to place them in rank order, but
not according to any interval rule.
It places events in order, from highest to lowest. For example, students can be
ranked by their exam results. The first-place finisher is not three times better than the
third-place finisher; he or she only outperforms the second- and third-placed students.
Interval Scale
This is a scale in which the numbers are used to rank attributes such that
numerically equal distances on the scale represent equal distances in the characteristic
being measured. An interval scale contains all the information of an ordinal scale, but it
also allows the differences/distances between attributes to be compared. Interval scales may
be in either numeric or semantic formats.
The interval scales allow the calculation of averages like:
●● Mean
●● Median
●● Mode
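The link between the level of measurement and the averages it supports can be sketched with Python's standard `statistics` module. The data values below are hypothetical; the fruit coding follows the Apples/Oranges example given for the nominal scale, and temperature in degrees Celsius is used as an assumed example of an interval scale.

```python
from statistics import mean, median, mode

# Nominal: codes are labels only (1 = Apples, 2 = Oranges), so only the mode applies.
fruit_codes = [1, 2, 1, 1, 2]
most_common_fruit = mode(fruit_codes)

# Ordinal: ranks carry order but not distance, so the median is also meaningful.
exam_ranks = [1, 2, 3, 4, 5]
middle_rank = median(exam_ranks)

# Interval: equal distances are meaningful, so the mean can be used as well.
temperatures_c = [20.0, 22.0, 24.0, 26.0]
average_temperature = mean(temperatures_c)

print(most_common_fruit, middle_rank, average_temperature)
```

Computing a mean of the fruit codes would run without error but would have no interpretation, which is why the choice of statistic has to follow the scale.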
Ratio Scale
This scale is the highest level of measurement scales. This has the properties of an
interval scale together with a fixed (absolute) zero point. The absolute zero point allows
Ratio scales permit the researcher to compare both differences in scores and
relative magnitude of scores. Examples of ratio scales include weights, lengths and
times.
Example: The number of customers of a bank’s ATM in the last three months is measured on a
ratio scale, because it can be compared meaningfully with the number in the previous three months.
Comparative Scales
In comparative scales, the respondent is asked to compare one object with
another.
Comparative scales can be further classified into the following types of scaling
techniques:
●● Paired Comparison Scale
●● Rank Order Scale
●● Constant Sum Scale
●● Q - sort Scale
The respondent must choose a preferred object from each of several pairs of objects based
on some property, which results in a rank ordering of the objects. Respondents often find
this time-consuming and exhausting.
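How choices between pairs turn into a rank order, and why the task becomes tiring, can be sketched as below. The function name, the objects P, Q and R, and the respondent's hidden preference order are hypothetical; ranking by number of wins is one simple way to summarise paired choices, not the only one.

```python
from itertools import combinations

def rank_from_paired_choices(objects, prefer):
    """Rank objects by how many paired comparisons each one wins.

    `prefer(a, b)` must return whichever of the two objects is preferred.
    """
    wins = {obj: 0 for obj in objects}
    for a, b in combinations(objects, 2):  # every pair is shown once
        wins[prefer(a, b)] += 1
    return sorted(objects, key=lambda obj: wins[obj], reverse=True)

# Hypothetical respondent whose true order of preference is Q, then R, then P.
hidden_order = ["Q", "R", "P"]

def respondent_choice(a, b):
    return a if hidden_order.index(a) < hidden_order.index(b) else b

ranking = rank_from_paired_choices(["P", "Q", "R"], respondent_choice)
pairs_for_six_objects = len(list(combinations(range(6), 2)))
print(ranking, pairs_for_six_objects)
```

With n objects the respondent faces n(n-1)/2 pairs, so six objects already require 15 separate choices.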
E.g. Rank the bikes in your order of preference. Place the number 1 by the most
preferred, 2 by the second choice, and so forth.
____ Pulsar
____ CBZ
____ Shine
____ Activa
____ Vigo
____ Pleasure
Constant Sum Scale
Respondent allocates points to more than one attribute or property, such that they
total a constant sum, usually 10 or 100.
Story: ____
Music: ____
Songs: ____
Casting: ____
Total: 100
Q - Sort Scale
The systematic study of participant viewpoints is known as Q-methodology (also
known as Q-sort). By having participants rank and sort a series of statements, the
Q-methodology is used to investigate the perspectives of participants who represent
different stances on an issue.
1. Non-Comparative Scales
A non-comparative scale is used to evaluate a product’s or object’s performance
across a variety of parameters. Some of the most common types are as follows:
Continuous Rating Scales (CRS) are a type of rating scale that is used
It’s a graphical rating scale in which respondents can place the object in any
position they want. It’s done by picking a point on a vertical or horizontal line that falls
between two extreme criteria and marking it.
Likert Scale: In a Likert scale, the researcher presents some statements to the
respondents and asks them to indicate their level of agreement or disagreement with
these statements by selecting one of five options.
object’s attributes based on personal preference.
●● Stapel Scale: A Stapel scale is an itemised rating scale that uses a unipolar rating
to measure the respondents’ response, perception, or attitude toward a specific
object. A Stapel scale ranges from -5 to +5 and has no neutral zero point.
The respondents use a comparative scale to compare two or more variables. The
various types of comparative scaling techniques are as follows:
1. Paired Comparison
A paired comparison denotes a situation in which the respondent must choose one
of two variables.
When comparing more than two objects, such as P, Q, and R, compare P with Q
first, then the superior one (i.e., the one with a higher percentage) with R.
2. Rank Order
In rank order scaling, the respondent must rank or arrange the given objects
according to his or her preference.
3. Constant Sum
It’s a scaling method in which a constant sum of units such as dollars, points,
chits or chips is distributed among the features, attributes, and values. Respondents
assign more units to the products or services they value more highly.
4. Q-Sort Scaling
Q-sort scaling is a method for selecting the most appropriate objects from a large
set of variables.
Unit-2.5: Sampling
Unit Objectives:
At the end of this unit, participants will be able to:
●● Describe sampling
●● Analyze sampling plan and sampling frame
●● List steps involved in sampling process
●● Identify different sample selection methods
●● Describe probability and non-probability sampling techniques
●● Identify sampling and non-sampling errors
It is a subset of population
sample can be applied to the entire population, the sample should be representative of
the population.
We obtain a sample of the population for a variety of reasons, including the fact
that it is rarely practical and almost never cost-effective to study the entire population.
●● Inaccessibility of some populations: Access to some populations is so difficult
that only a sample can be used. Prisoners, people with severe mental illness, and
disaster survivors are just a few examples. The inaccessibility could be
due to a lack of funds, time, or simply access.
●● Destructiveness of observation: Sometimes just observing a product’s desired
characteristic destroys it for its intended use. Quality control is a good example of
this. For example, a fuse must be destroyed to determine its quality and whether it
is defective. As a result, if you tested all of the fuses, they would all be destroyed.
●● Accuracy and sampling: A sample of the study population may be more accurate
than a study of the entire population. A population that has been incorrectly identified can
provide less reliable data than a sample that has been carefully selected.
A sampling plan is a term widely used in research studies; it provides an outline
on the basis of which research is conducted. It states which category is to be surveyed,
what the sample size should be and how the respondents should be chosen from the
population. The sampling plan is the base from which the research starts and includes the
following major decisions:
as possible.
procedure i.e., which method can be used such that every object in the population
has an equal chance of being selected. Generally, researchers use probability
sampling to determine the objects to be chosen, as this represents the population more
accurately.
A sampling frame is a list or database from which a sample can be drawn. In market
research terms, a sampling frame is a database of potential respondents who can be
invited to take part in a given research project.
The following are the series of steps that are involved in the sampling process:
●● Execute the sampling process
●● Probability Sampling
●● Non-Probability Sampling
●● Simple random sampling
●● Systematic Sampling
●● Stratified Sampling
●● Cluster Sampling
●● Multistage Sampling
Chance of selection = Sample size / Population size
= 25/591
= 0.042 or 4.2%
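The calculation above, and the drawing of the sample itself, can be sketched with Python's standard `random` module. The student ID numbers 1 to 591 and the function name are hypothetical; only the population size (591) and sample size (25) come from the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)
    return rng.sample(population, n)

students = list(range(1, 592))  # hypothetical ID numbers 1..591
sample = simple_random_sample(students, 25, seed=42)

selection_probability = 25 / len(students)
print(sorted(sample))
print(round(selection_probability, 3))
```

Fixing the seed only makes the draw repeatable for checking; in a real study the draw would normally not be seeded.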
Systematic Sampling
A probability sample drawn by applying a calculated skip interval to a sampling
frame.
E.g. Population = Total students in AMITY (591)
Skip interval (k) = Population sample frame (N) / Sample size (n)
= 591/25
= 23.64, rounded down to 23
k = 23
Select any number randomly between 1 and 23, and then select the remaining 24 numbers
with a gap of 23 numbers.
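The skip-interval procedure can be sketched as follows, assuming the units in the sampling frame are numbered 1 to N. The function name is hypothetical; the figures 591 and 25 are those of the example.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Return the skip interval k and the selected positions (1-based)."""
    k = population_size // sample_size      # 591 // 25 = 23
    rng = random.Random(seed)
    start = rng.randint(1, k)               # random start between 1 and k
    positions = [start + i * k for i in range(sample_size)]
    return k, positions

k, positions = systematic_sample(591, 25, seed=11)
print(k)
print(positions)
```

Because k is rounded down, the last selected position is at most 23 + 24 × 23 = 575, so the selection never runs past the end of the frame.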
Stratified Sampling
A probability sampling technique in which the population is divided into different
homogeneous sub-groups, or strata, and samples are randomly selected from each of these
sub-groups or strata.
Sub-groups: MPM = 59, DBM = 92, DIEM = 37, MCA = 59, MCM = 30
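Random selection within each stratum can be sketched as below. The sub-group sizes are those listed above; the unit labels, the 10% sampling fraction and the function name are assumptions made for illustration.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample of the same fraction from every stratum.

    `strata` maps a stratum name to the list of its units.
    """
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        n = round(len(units) * fraction)   # proportional allocation
        sample[name] = rng.sample(units, n)
    return sample

sizes = {"MPM": 59, "DBM": 92, "DIEM": 37, "MCA": 59, "MCM": 30}
# Hypothetical unit labels such as "MPM-1", "MPM-2", ...
strata = {name: [f"{name}-{i}" for i in range(1, size + 1)]
          for name, size in sizes.items()}

sample = stratified_sample(strata, fraction=0.10, seed=7)
sample_sizes = {name: len(units) for name, units in sample.items()}
print(sample_sizes)
```

Every stratum is guaranteed to appear in the sample, which a single simple random draw from the whole population does not ensure.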
Cluster Sampling
[Figure: cluster sampling of the student groups MMM = 42, MPM = 59, DBM = 92, DIEM = 37, MCA = 59, MCM = 30]
Non - Probability Sampling
Convenience Sampling
A non-probability sampling technique in which the researcher uses any readily available
individuals as participants. It is the least reliable sampling technique.
A non-probability sample that conforms to certain criteria. The units or elements are
purposively selected.
purpose of the study. Purposive sampling may involve studying the entire population of
some limited group or a subset of a population. As with other non-probability sampling
methods, purposive sampling does not produce a sample that is representative
of a larger population, but it can be exactly what is needed in some cases, such as the study of
an organization, community, or some other clearly defined and relatively limited group.
Quota Sampling
Group   Population   Percentage   Quota
MBA     272          46%          12
MMM     42           7%           2
MPM     59           10%          2
DBM     92           16%          4
DIEM    37           6%           2
MCA     59           10%          2
MCM     30           5%           1
Total   591          100%         25
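The quotas in the table follow from allocating the sample of 25 in proportion to each group's share of the 591 students. A minimal sketch, with a hypothetical function name:

```python
def proportional_quotas(group_sizes, sample_size):
    """Allocate the sample to groups in proportion to their size."""
    total = sum(group_sizes.values())
    return {group: round(sample_size * size / total)
            for group, size in group_sizes.items()}

groups = {"MBA": 272, "MMM": 42, "MPM": 59, "DBM": 92,
          "DIEM": 37, "MCA": 59, "MCM": 30}
quotas = proportional_quotas(groups, 25)
print(quotas)
print(sum(quotas.values()))
```

Simple rounding happens to add up to 25 here; with other figures the rounded quotas may not sum to the intended sample size and would need a manual adjustment.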
Snowball Sampling
A non-probability sampling technique in which subsequent participants are referred by
current sample elements.
These initial individuals refer others who are similar to them, and so on, so that the
sample grows like a snowball gathering snow as it rolls along.
Non-Sample Errors
Non-sample errors can be classified into:
●● Non-response Error
●● Response Error
Non-Response Error
A non-response error occurs when units selected as part of the sampling procedure
do not respond in whole or in part.
Response Error
A response or data error is any systematic bias that occurs during data collection,
analysis or interpretation.
●● Recording Errors
●● Poorly designed questionnaires.
●● Measurement errors.
Summary
At the end of this module, participants have covered:
●● Analyzing Research Modelling
●● Defining Data Collection and its Methods
●● Explaining Questionnaire Designing
●● Describing Measurement and Scaling
●● Analyzing Sampling
Exercise:
1. Sampling is divided into two types, viz. ...................... and......................
a) Data collection and Analyzing
b) Analyzing and Interpreting
c) Both a and b
d) None of the above
2. A __________, in research terms, is a group of people, objects, or items selected for
measurement from a larger population
a) Sample
b) Item
c) Object
d) None of the above
3. A __________is a term widely used in research studies that provide an outline on the
basis of which research is conducted
a) sampling plan
b) sorting plan
c) separating plan
d) none of the above
4. ________ is also known as ‘deliberate,’ ‘purposeful,’ or ‘judgement’ sampling
a) Non - Probability Sampling
b) Probability Sampling
c) Snowball Sampling
d) Quota sampling
a) purpose of the study
b) subject of the study
c) nature of the study
d) none of the above
Answers:
1. a) Data collection and Analyzing
2. a) Sample
3. a) sampling plan
4. a) Non - Probability Sampling
5. a) purpose of the study
Key learning Outcomes
At the end of this module, participants will be able to:
3. Identify the importance of parametric and non-parametric tests
4. Analyse and perform principal component factor analysis
5. Identify the importance of data analysis
Structure
Unit 3.1: Descriptive Statistics
3.1.1 Introduction to Descriptive Research Design
3.1.2 Applications of Descriptive Research
3.1.3 Descriptive Research Methods
3.3.2 Z-test
3.3.3 t-test
3.3.4 Correlation & Regression
Unit Objectives
●● Application of Descriptive Research with examples
●● Descriptive Research Methods
Introduction to Descriptive Research
investigate one or more variables. Unlike in experimental research, the researcher does not
control or manipulate any of the variables, but only observes and measures them.
Definition
A Descriptive Research Design is concerned with describing the characteristics of
a particular individual or a group.
phenomenon. For instance, finding out how many hours per week millennials
spend browsing the web. All this information helps the organization conducting the
research to make informed business decisions.
●● Measure Data Trends: Researchers measure data trends over time with a
questions like age, income, gender, geographical location, etc. This marketing
research helps the organization understand what aspects of the brand
appeal to the population and what aspects don’t. It also helps make product
or marketing fixes or even create a new product line to cater to high growth
potential groups.
●● Validate Existing Conditions: Researchers widely use descriptive research
to help ascertain the research object’s prevailing conditions and underlying
patterns. Because the research method is non-invasive and uses
quantitative observation along with some aspects of qualitative observation,
researchers can observe each variable and conduct an in-depth analysis.
Researchers also use it to validate any existing conditions that may be
prevalent in a population.
●● Conduct Research at different times: The analysis can be conducted at
different periods to establish any similarities or differences. This also allows
any number of variables to be evaluated. For verification, studies on prevailing
conditions may be repeated to identify trends.
Animal and human behaviour are closely observed using the observational method
(also known as field observation). Naturalistic observation and laboratory observation
are the two main types of observational methods.
Naturalistic observations are usually more time consuming and expensive than
laboratory observations. Both naturalistic and laboratory observation are important in
the advancement of scientific knowledge, of course.
Expectancy effects and atypical individuals are two serious issues with case
studies. Expectancy effects are underlying biases held by the experimenter that may
influence the actions taken during research. These biases can cause participants’
descriptions to be misrepresented. Describing atypical individuals can lead to faulty
generalisations and a loss of external validity.
Survey Method
Participants in survey method research respond to questions via interviews or
questionnaires. Researchers describe the responses given by participants after they
have answered the questions. The questions must be properly constructed in order
for the survey to be both reliable and valid. Questions should be written in a clear and
understandable manner.
Unit Outcomes
At the end of this unit, participants will be able to:
●● Define Hypothesis
●● Describe different types of Hypothesis
●● Explain Testing Hypothesis and its Significance Levels
●● Identify Type - I and Type - II Errors with Examples
●● Demonstrate One-tailed and Two-tailed Test
●● Analyze Confidence Intervals
●● Analyze Bayesian Statistics
“Hypothesis may be defined as a proposition or a set of propositions set forth as an
explanation for the occurrence of some specified group of phenomena either asserted
merely as a provisional conjecture to guide some investigation in the light of established
facts” (Kothari, 1988).
Characteristics of Hypothesis
●● A hypothesis must be precise and clear. If it’s not precise and clear, then the
disproved by observation”.
●● A hypothesis must state the relationship between two variables, in the case of
relational hypotheses.
●● To be considered reliable, the hypothesis must be clear and precise.
●● Simple Hypothesis
●● Complex Hypothesis
●● Empirical Hypothesis
●● Null Hypothesis
●● Alternative Hypothesis
●● Logical Hypothesis
●● Statistical Hypothesis
Simple Hypothesis: In a simple hypothesis there exists a relationship between two
variables: one is called the independent variable or cause, and the other is the
dependent variable or effect. For example:
●● High rate of unemployment leads to crimes.
Complex Hypothesis: In a complex hypothesis there exists a relationship among
several variables (more than two dependent and independent variables). For example:
rape, prostitution & killing etc.
Empirical / Working Hypothesis: When a theory is put to the test through
observation and experiment, it becomes an empirical hypothesis, or working
hypothesis. It’s no longer just a thought or a hypothesis. It’s a matter of trial and error,
and possibly rearranging those independent variables.
Null Hypothesis: The null hypothesis, H0, denotes a theory that has been
proposed but not proven, either because it is believed to be true or because it is
intended to be used as a basis for argument. The null hypothesis in a clinical trial of a
new drug, for example, might be that the new drug is no better than the current drug on
average. We’d write H0: on average, there’s no difference between the two drugs.
The null hypothesis is given special attention. This is because the null hypothesis
is concerned with the statement being tested, whereas the alternative hypothesis
is concerned with the statement that will be accepted if/when the null hypothesis is
rejected.
After the test has been completed, the final conclusion is always expressed in
terms of the null hypothesis. “Reject H0 in favour of H1” or “Do not reject H0” are the
only options; “Reject H1” or “Accept H1” are never options.
“Do not reject H0” does not imply that the null hypothesis is correct; rather, it
implies that there is insufficient evidence to reject H0 in favour of H1. When the null
hypothesis is rejected, it implies that the data support the alternative hypothesis.
An alternative hypothesis is that the new drug is, on average, superior to the
current drug. In this case, we’d write:
H1: On average, the new drug is better than the current drug.
Logical Hypothesis: A logical hypothesis is a proposed explanation for which
there is only a small amount of evidence. In general, you want to turn a logical
hypothesis into an empirical hypothesis by testing your theories or postulations.
Statistical Hypothesis: A statistical hypothesis is a claim about the parameters
or form of a probability distribution for a specific population or populations, or, more
broadly, about a probabilistic mechanism that is supposed to generate the observations.
Testing a Hypothesis
The null hypothesis (H0) states that two or more groups or factors have no effect,
relationship, or difference. A researcher’s primary goal in a research study is to disprove
the null hypothesis.
For example, there is no difference in intubation rates between children aged 0 and 5.
The survival rates of the intervention and control groups are identical (or, the
intervention does not improve survival rate).
There is no link between the type of injury and whether the patient was given an IV
in the prehospital setting.
The alternative hypothesis (H1) asserts that a difference or effect exists. This
is typically the hypothesis that the researcher is attempting to prove. The alternative
hypothesis can be one-sided (only provides one direction, for example, lower) or two-
sided (provides both directions). Even when our true hypothesis is one-sided, we
frequently use two-sided tests because accepting the alternative hypothesis requires
The success rate of intubation varies depending on the age of the patient being
treated (two-sided).
The intervention group’s time to resuscitation from cardiac arrest is shorter than the
control group’s (one-sided).
Step 3: Set the Significance Level (α)
The significance level is usually set at 0.05 (represented by the Greek letter
alpha, α). This means that if your null hypothesis is true, there is a 5% chance that
you will wrongly reject it and accept your alternative hypothesis. The smaller the
significance level, the greater the burden of proof required to reject the null
hypothesis, or to support the alternative hypothesis.
Step 4: Determine the Test Statistic and the P-Value that corresponds.
In most cases, hypothesis testing employs a test statistic that compares groups or
investigates relationships between variables. A confidence interval is commonly used to
describe a single sample without establishing relationships between variables.
P-value <= significance level (α) => Reject your null hypothesis in favour of your
alternative hypothesis. Your result is statistically significant.
P-value > significance level (α) => Fail to reject your null hypothesis. Your result is
not statistically significant.
If your null hypothesis is true, the p-value describes the likelihood of obtaining a
sample statistic as or more extreme by chance alone. The result of your test statistic
is used to calculate this p-value. Your p-value and significance level are used to draw
conclusions about the hypothesis.
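The decision rule above can be sketched in a few lines of Python. This is only an illustration: it assumes the test statistic follows a standard normal distribution (a z statistic), and the value z = 2.10 and the function names are hypothetical, chosen for the example.

```python
from statistics import NormalDist

def p_value_from_z(z, two_sided=True):
    """p-value for a z statistic under H0 (standard normal assumed)."""
    if two_sided:
        # Probability of a value at least as extreme in either direction
        return 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    # Upper-tailed test: probability of a value at least as large as z
    return 1.0 - NormalDist().cdf(z)

def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 when the p-value is at most alpha."""
    return "Reject H0" if p_value <= alpha else "Fail to reject H0"

z = 2.10  # hypothetical test statistic, for illustration only
p = p_value_from_z(z)
print(round(p, 4), decide(p))
```

With the same statistic, a stricter significance level such as 0.01 would lead to "Fail to reject H0", which shows how the choice of α changes the conclusion.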
Significance Level
The probability of rejecting the null hypothesis when it is true is known as the
significance level, also known as alpha or α. A significance level of 0.05, for example,
indicates a 5% chance of concluding that a difference exists when there is none.
The significance level determines how far the line on the graph will be drawn from
the null hypothesis value. We need to shade the 5% of the distribution that is furthest
away from the null hypothesis to graph a significance level of 0.05.
Although type I and type II errors are impossible to completely avoid, the
investigator can reduce the likelihood of them by increasing the sample size (the
larger the sample, the lesser is the likelihood that it will differ substantially from the
population).
Bias (observer, instrument, recall, etc.) can also lead to false-positive and
false-negative results. (Bias errors, on the other hand, are not classified as type I or
type II errors.) Such errors are troublesome because they are often difficult to detect
and cannot be quantified.
3.2.5 One - tailed and Two - tailed Tests:
One - tailed Test
A test of H0 which assumes that the difference between the sample statistic and the
population parameter is in only one direction.
e.g. A maximum of 15 kg of chemical waste is produced per batch of 60 kg. However, a
random sample of 100 batches gives an average of 16 kg of chemical waste per batch.
Test at the 10% level of significance whether the average quantity of waste per batch has
increased.
When the hypothesis about the population parameter is rejected only for values of the
sample statistic falling into one tail of the distribution, the test is known as a
one-tailed test.
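To make the chemical waste example concrete, here is a minimal sketch of an upper-tailed z test. The example does not state the standard deviation of waste per batch, so the value sigma = 4 kg below is purely an assumption made for illustration; with a different value the result would change.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 15.0    # hypothesised maximum mean waste per batch (kg), from the example
x_bar = 16.0  # sample mean (kg), from the example
n = 100       # number of batches sampled, from the example
alpha = 0.10  # level of significance, from the example
sigma = 4.0   # ASSUMED standard deviation (kg); not given in the example

# H0: mean <= 15 kg, H1: mean > 15 kg (upper-tailed test)
z = (x_bar - mu0) / (sigma / sqrt(n))
p_one_tailed = 1.0 - NormalDist().cdf(z)
z_critical = NormalDist().inv_cdf(1.0 - alpha)  # cut-off for the upper tail

print(f"z = {z:.2f}, critical z = {z_critical:.3f}, p = {p_one_tailed:.4f}")
print("Reject H0" if z > z_critical else "Fail to reject H0")
```

Because only increases in waste are of interest, the whole 10% rejection region sits in the upper tail; a two-tailed test would split it into 5% on each side.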
e.g. Average height of 20 students = 168 cms. Can this be considered as a sample
as a percentage.
Confidence intervals are most commonly used to bound the mean or standard
deviation, but they can also be used to bound regression coefficients, proportions, rates
of occurrence (Poisson), and population differences.
If you can assess many intervals and know the value of the population parameter,
the confidence level represents the theoretical ability of the analysis to produce
—Jerzy Neyman, the inventor of the confidence interval.
Because the procedure tends to produce intervals that contain the parameter,
confidence intervals serve as good estimates of the population parameter. The point
estimate (the most likely value) and a margin of error around that point estimate
make up confidence intervals. The margin of error describes the level of uncertainty
surrounding a sample estimate of a population parameter.
In this vein, confidence intervals can be used to evaluate the precision of a sample
estimate. A narrower confidence interval [90, 110] for a specific variable indicates a more
precise estimate of the population parameter than a wider confidence interval [50, 150].
Margin of Error
Let’s look at how confidence intervals are used to account for that margin of error.
We’ll use the same tools we used to understand hypothesis tests to accomplish this.
Using probability distribution plots, the t-distribution, and the variability in our data, I’ll
create a sampling distribution. Our confidence interval will be based on the energy cost
data set we’ve been using.
The shaded area depicts the range of sample means that you’d get 95% of the
time if you used our sample mean as the population mean point estimate. Our 95
percent confidence interval is this range [267, 394].
It’s easier to understand how a confidence interval represents the margin of error,
or the amount of uncertainty, around a point estimate when you look at the graph. Given
the information available, the sample mean is the most likely value for the population
mean. However, the graph shows that other random samples drawn from the same
population could have different sample means within the shaded area, which is not
unusual. Each of these other possible sample means would suggest a different value for
the population mean.
These graphs can be used to calculate probabilities for specific values. However,
because the population mean is unknown, you won’t be able to plot it on the graph.
As a result, as Neyman pointed out, you can’t calculate probabilities for the population
mean!
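The link between a confidence interval, its point estimate and its margin of error can be sketched as follows. The first function simply splits the interval quoted above into its two parts. The second builds an interval from raw data; note that it uses the normal approximation rather than the t-distribution used in the text, and the sample values are hypothetical, so it is an illustration and not a reproduction of the energy cost analysis.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def split_interval(lower, upper):
    """Return (point estimate, margin of error) for a symmetric interval."""
    return (lower + upper) / 2, (upper - lower) / 2

def normal_ci(sample, confidence=0.95):
    """Approximate CI for the mean using the normal distribution.

    For small samples a t critical value would give a wider interval.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    centre = mean(sample)
    margin = z * stdev(sample) / sqrt(len(sample))
    return centre - margin, centre + margin

point, margin = split_interval(267, 394)
print(point, margin)  # 330.5 63.5

costs = [310, 295, 402, 350, 288, 365, 330, 341, 299, 325]  # hypothetical data
low, high = normal_ci(costs)
print(round(low, 1), round(high, 1))
```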
to statistical problems. In Bayesian statistics, probability is interpreted as a
description of how certain we are that some statement, or proposition, is true.
●● If the probability is 0.5, then we are as uncertain as we would be about
a fair coin toss.
●● If the probability is 0.95, then we’re quite sure the statement is true, but it
wouldn’t be too surprising to us if we found out the statement was false.
[Figure: the probability scale, running from 0 to 1]
The figure above shows that probability can be used to describe degrees of
certainty, or how plausible some statement is. 0 and 1 are the two extremes of the scale
and correspond to complete certainty (that the statement is false or that it is true,
respectively). However, probabilities are not static quantities.
When we get more information, our probabilities can change.
It might sound like there is nothing more to Bayesian statistics than just thinking
about a question and then blurting out a probability that feels appropriate. For example,
we may be on “Who Wants to be a Millionaire?” and not know the answer to a question,
so we might think the probability that it is A is 25%. But if we call our friend using “phone
a friend”, and our friend says, “It’s definitely A”, then we would be much more confident
that it is A! Even so, our probability probably wouldn’t go all the way to 100%.
We will now look at a simple example to demonstrate the basics of how Bayesian
statistics works.
●● The probabilities we start with at the beginning of the problem are called
prior probabilities.
●● These get updated when we get more information; the updated probabilities
are called posterior probabilities.
●● To make all these clearer, we will use a table that we will call a Bayes’ Box to
is white and another is black so that we can label our two competing hypotheses BB
and BW. So, at the beginning of the problem, we know that one and only one of the
following statements/hypotheses is true:
and observes its colour. The result of this experiment is
A Bayesian analysis starts by choosing some values for the prior probabilities. We have
two competing hypotheses, BB and BW, and we need to choose some probability
values to describe how sure we are that each of these is true. Since we are considering two
hypotheses, there will be two prior probabilities, one for BB and one for BW. For
simplicity, we will assume that we don’t have much of an idea which is true, and so we
will use the following prior probabilities:
P (BB) = 0.5
P (BW) = 0.5.
The above two hypotheses are mutually exclusive (they can’t both be true) and
exhaustive (one of these is true; it can’t be some undefined third option). The choice of
0.5 for the two prior probabilities describes the fact that, before we did the experiment,
we were very uncertain about which of the two hypotheses was true. We now present a
Bayes’ Box, which lists all the hypotheses that might be true, and the prior probabilities.
There are some extra columns which we haven’t discussed yet, and which will be needed in
order to figure out the posterior probabilities in the final column. The first column of a
Bayes’ Box is the list of hypotheses we’re considering. In this case there are just
two. If you need to construct a Bayes’ Box for a new problem, just think about what the
possible answers to the problem are, and list them in the first column. The second column
lists the prior probabilities for every hypothesis. Above, before we did the experiment,
we decided to say that there was a 50% probability that BB is true and a 50%
probability that BW is true, hence the 0.5 values in this column. The prior column should
always sum to 1. Remember, the prior probabilities only describe our initial uncertainty,
before taking the data into account.
Hypotheses    prior
BB            0.5
BW            0.5
Totals:       1
Likelihood
The third column is called the likelihood, which we use in calculating the posterior
likelihoods, so you can tell from this that the likelihood is something different for each
hypothesis. But what is it exactly?
Here is the Bayes’ Box with the likelihood column filled in:
Hypotheses    prior    likelihood
BB            0.5      1
BW            0.5      0.5
Totals:       1
First calculate the value of the likelihood for the BB hypothesis. Remember, the
data we are analysing here is that we chose one of the balls in the bag “at random”, and
it was black. The likelihood for the BB hypothesis is therefore the probability that we
would get a black ball if BB is true.
Imagine that BB is true. That means both balls are black. What is the probability
that the experiment would result in a black ball? That’s easy – it’s 100%! So, we put the
number 1 in the Bayes Box as the likelihood for the BB hypothesis.
Now imagine instead that BW is true. That would mean one ball is black and
the other is white. If this were the case and we did the experiment, what would be
the probability of getting the black ball in the experiment? Since one of the two balls
is black, the chance of choosing this one is 50%. Therefore, the likelihood for the BW
hypothesis is 0.5, and that’s why we put 0.5 in the Bayes’ Box for the likelihood for BW.
In general, the likelihood is the probability of the data that you actually got,
assuming a particular hypothesis is true. In this example it was fairly easy to get the
likelihoods directly by asking “if this hypothesis is true, what is the probability of getting
the black ball when we do the experiment?” Sometimes this is not so easy, and it can
be helpful to think about ALL possible experimental outcomes/data you might have seen
– even though ultimately, we just need to select the one that actually occurred.
Hypotheses    Possible data    Probability
BB            Black Ball       1
              White Ball       0
considering not just the data that actually occurred, but all data that might have
occurred. Ultimately, it is only the probability of the data which actually occurred that
matters, so this is highlighted in blue.
The fourth column of the Bayes’ Box is the product of the prior probabilities and the
likelihoods, calculated by simple multiplication. The result will be called “prior times
likelihood”, but occasionally we will use the letter h for these quantities. This is the
unnormalized posterior. It does not sum to 1 as the posterior probabilities should, but it is
at least proportional to the actual posterior probabilities.
To find the posterior probabilities, we take the prior times likelihood column and divide
it by its sum, producing numbers that do sum to 1. This gives us the final posterior
probabilities, which were the goal all along. The completed Bayes’ Box is shown below:
Hypotheses    prior    likelihood    prior × likelihood    posterior
BB            0.5      1             0.5                   0.667
BW            0.5      0.5           0.25                  0.333
Totals:       1                      0.75                  1
We can see that the posterior probabilities are not the same as the prior
probabilities, because we have more information now! The experimental result made
BB a little bit more plausible than it was before. Its probability has increased from 1/2 to
2/3.
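The column-by-column procedure of the Bayes’ Box can be sketched in Python. The function name bayes_box is our own label for illustration; the priors and likelihoods are the ones used in the two-ball example.

```python
def bayes_box(priors, likelihoods):
    """Return (prior x likelihood, its total, posterior) for each hypothesis."""
    # Fourth column: prior times likelihood (the unnormalized posterior, h)
    h = {name: priors[name] * likelihoods[name] for name in priors}
    total = sum(h.values())
    # Final column: divide by the column total so the values sum to 1
    posterior = {name: value / total for name, value in h.items()}
    return h, total, posterior

priors = {"BB": 0.5, "BW": 0.5}       # second column
likelihoods = {"BB": 1.0, "BW": 0.5}  # third column: P(black ball | hypothesis)

h, total, posterior = bayes_box(priors, likelihoods)
for name in priors:
    print(name, priors[name], likelihoods[name], h[name], round(posterior[name], 3))
print("Total of prior x likelihood:", total)
```

The same function works for any number of mutually exclusive and exhaustive hypotheses, as long as the priors sum to 1.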
Interpretation
The posterior probabilities of the hypotheses are proportional to the product of the prior
probabilities and the likelihoods. A high prior probability will help a hypothesis have a
high posterior probability. To understand what this means about reasoning, consider the
meanings of the prior and the likelihood. There are two things that can contribute to a
hypothesis being plausible:
If the prior probability is high. That is, the hypothesis was already plausible, before
we got the data.
If the hypothesis predicted the data well. That is, the data was what we would have
expected to occur if the hypothesis had been true.
Bayes’ Rule:
P(A | B) = P(B | A) P(A) / P(B)
For example, the left-hand side of the equation is P(A | B), which means the
probability of A given B. That is, it’s the probability of ‘A’ after taking into account the
information ‘B’. In other words, P(A | B) is a posterior probability, and Bayes’ rule tells us
how to calculate it from other probabilities. Bayes’ rule is true for any statements A and
B.
In Bayesian statistics, most of the terms in Bayes’ rule have special names. Some
of them even have more than one name, with different scientific communities preferring
different terminology. Here is a list of the various terms and the names we will use for
them:
P(H) is the prior probability, which describes how sure we were that H was true,
before we observed the data D.
P(D | H) is the likelihood. If you were to assume that H is true, this is the probability
that we would have observed data D.
P(D) is the marginal likelihood. This is the probability that we would have observed
data D, whether H is true or not.
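As a worked illustration of how these terms fit together, take H to be the hypothesis BB from the two-ball example and D to be the observation of a black ball. The marginal likelihood is obtained by summing prior times likelihood over both hypotheses:

```latex
\[
P(BB \mid D)
  = \frac{P(BB)\,P(D \mid BB)}{P(D)}
  = \frac{P(BB)\,P(D \mid BB)}{P(BB)\,P(D \mid BB) + P(BW)\,P(D \mid BW)}
  = \frac{0.5 \times 1}{0.5 \times 1 + 0.5 \times 0.5}
  = \frac{0.5}{0.75} \approx 0.667
\]
```

This is the same value that appears in the posterior column of the Bayes’ Box.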
Unit Objectives:
At the end of this unit the participant will be able to learn:
●● t-test
●● F-test
●● Correlation and Regression
●● Chi-square Test
●● Factor Analysis
Parametric Test
A parametric test is a testing procedure that requires assumptions about the type of
population or its parameters.
Parametric tests have the following advantages and requirements:
1. Parametric tests are more powerful where the data are derived from interval and ratio
measurement.
2. In parametric tests, it is assumed that the data follow a normal distribution.
3. Observations must be independent, i.e., the selection of any one item should not
affect the chances of any other item being included in the sample.
Examples of parametric tests are (a) Z-Test, (b) T-Test and (c) F-Test.
The following tests are based on the assumption that the samples were drawn from
normally distributed populations:
●● F - test
●● t - test
●● Z - test
Non-Parametric Test
A group of alternative techniques known as non-parametric tests were developed
because it was not always possible to make a rigid assumption about the population
distribution from which the samples were being drawn. The prominent examples of
non-parametric tests are:
●● Goodness of fit
3.3.2 Z-test
1. One Sample Test: One sample tests can be categorized into 2 categories.
z Test 1. When sample size is > 30
P1 = Proportion in sample 1
P2 = Proportion in sample 2
Example: You are working as a purchase manager for a company. The following
information has been supplied by two scooter tire manufacturers.
                     Company A    Company B
Mean life (in km)    13000        12000
S.D (in km)          340          388
Sample size          100          100
In the above, the sample size is 100; hence a Z-test may be used.
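A minimal sketch of how the Z statistic for the two tire manufacturers could be computed is given below. It assumes the usual large-sample formula for the difference between two independent means, with the sample standard deviations standing in for the population values; the function name is our own.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Z statistic for H0: the two population means are equal."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error

# Figures taken from the table in the example
z = two_sample_z(13000, 340, 100, 12000, 388, 100)
critical = NormalDist().inv_cdf(0.975)  # two-tailed test at the 5% level

print(f"Z = {z:.2f}, critical value = {critical:.2f}")
print("Reject H0" if abs(z) > critical else "Fail to reject H0")
```

The statistic is far larger than the critical value, so on these figures the difference in mean tire life between the two companies is statistically significant.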
2. Testing the hypothesis about difference between two means: This can be used when
two population means are given and the null hypothesis is H0: P1 = P2
Example: In a city during the year 2000, 20% of households indicated that they
read Femina magazine. Three years later, the publisher had reason to believe that
circulation had gone up. A survey was conducted to confirm this. A sample of 1,000
respondents was contacted and it was found that 210 respondents confirmed that they
subscribe to the periodical ‘Femina’.
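From the above, can we conclude that there is a significant increase in the
circulation of ‘Femina’?
Solution: We will set up the null hypothesis and alternate hypothesis as follows:
H0: P = 0.20
H1: P > 0.20
Here p = 210/1000 = 0.21, P = 0.20, Q = 1 - P = 0.80 and n = 1000, so
Z = (p - P) / √(PQ/n) = (0.21 - 0.20) / √(0.20 × 0.80 / 1000) = 0.01 / 0.0126 = 0.79
As the value of Z at 0.05 = 1.64 (one-tailed) and the calculated value of Z (0.79) does not
fall in the rejection region, we fail to reject the null hypothesis, and therefore we cannot
conclude that the circulation of ‘Femina’ has increased significantly.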
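The statistic for this example can be recomputed directly from the figures given (a 20% baseline, and 210 subscribers in a sample of 1,000). The sketch below assumes the standard one-sample z test for a proportion, with the standard error evaluated under the null hypothesis; the function name is our own.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion_z(successes, n, p0):
    """Z statistic for H0: population proportion equals p0."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # computed under H0
    return (p_hat - p0) / standard_error

z = one_sample_proportion_z(210, 1000, 0.20)
critical = NormalDist().inv_cdf(0.95)      # upper-tailed test at the 5% level
p_value = 1.0 - NormalDist().cdf(z)

print(f"Z = {z:.2f}, critical value = {critical:.2f}, p = {p_value:.3f}")
print("Reject H0" if z > critical else "Fail to reject H0")
```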
3.3.3 T-test
T-test is used in the following circumstances: when the sample size n < 30. Consider
the following example:
There are two nourishment programmes: ‘A’ and ‘B’. Two groups of children are
subjected to these. Their weight is measured after six months. The first group of children,
subjected to programme ‘A’, weighed 44, 37, 48, 60, 41 kg at the end of the programme.
The second group of children were subjected to nourishment programme ‘B’ and their
weights were 42, 42, 58, 64, 64, 67, 62 kg at the end of the programme. From the
above, can we conclude that nourishment programme ‘B’ increased the weight of the
children significantly?
Null Hypothesis: There is no significant difference between Nourishment
programmes ‘A’ and ‘B’.
Alternative Hypothesis: Nourishment programme ‘B’ is better than ‘A’, i.e. Nourishment
programme ‘B’ increases the children’s weight significantly.
Solution:
X        (X - X̄) = (X - 46)    (X - X̄)²    Y      (Y - Ȳ) = (Y - 57)
44       -2                    4           42     -15
37       -9                    81          42     -15
48       2                     4           58     1
60       14                    196         64     7
41       -5                    25          64     7
                                           67     10
                                           62     5
Total    0                     310         399    0
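The deviations tabulated above can be turned into a two-sample t statistic. The sketch below assumes the usual pooled-variance (equal variance) form of the test, which is our assumption about the calculation intended here; the critical value 1.812 is the standard table value of t for 10 degrees of freedom in a one-tailed test at the 5% level.

```python
from math import sqrt
from statistics import mean

def pooled_t(sample_a, sample_b):
    """Pooled-variance t statistic and degrees of freedom for two samples."""
    mean_a, mean_b = mean(sample_a), mean(sample_b)
    ss_a = sum((x - mean_a) ** 2 for x in sample_a)  # sum of squared deviations
    ss_b = sum((y - mean_b) ** 2 for y in sample_b)
    df = len(sample_a) + len(sample_b) - 2
    pooled_sd = sqrt((ss_a + ss_b) / df)
    t = (mean_a - mean_b) / (pooled_sd * sqrt(1 / len(sample_a) + 1 / len(sample_b)))
    return t, df

programme_a = [44, 37, 48, 60, 41]
programme_b = [42, 42, 58, 64, 64, 67, 62]

t, df = pooled_t(programme_a, programme_b)
t_critical = 1.812  # table value: one-tailed, 5% level, 10 d.f.

print(f"t = {t:.3f} with {df} d.f.")
print("Reject H0" if abs(t) > t_critical else "Fail to reject H0")
```

The negative sign only reflects that the mean for programme ‘A’ is lower than that for programme ‘B’; the comparison with the table value uses the magnitude of t.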
●● F-Test:
Let there be two independent random samples of sizes n1 and n2 from two normal
populations:
Features of F- distribution
1. This distribution has two parameters v1 (= n1 – 1) and v2 (= n2 – 1).
2. The mean of the F-variate with v1 and v2 degrees of freedom is v2 / (v2 - 2).
We note that the mean will exist if v2 > 2 and the standard error will exist if v2 > 4.
Further, the mean > 1.
3. The random variate F can take only positive values from 0 to ∞. The curve is positively
skewed.
4. For large values of v1 and v2, the distribution approaches the normal distribution.
5. If a random variate follows the t-distribution with v degrees of freedom, then its square
follows the F-distribution with 1 and v d.f., i.e. t²(v) = F(1, v).
6. F and χ2 are also related as F(v1, v2) = χ2(v1) / v1 as v2 → ∞.
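As an illustration of where the two degrees of freedom come from, here is a minimal sketch that assumes the common variance-ratio form of the F statistic (the larger unbiased sample variance divided by the smaller). The data are the two nourishment programme samples from the t-test example, reused purely for illustration.

```python
from statistics import variance

def f_ratio(sample_1, sample_2):
    """Variance-ratio F statistic with the larger variance in the numerator.

    Returns (F, numerator d.f., denominator d.f.).
    """
    var_1, var_2 = variance(sample_1), variance(sample_2)  # unbiased (n - 1)
    if var_1 >= var_2:
        return var_1 / var_2, len(sample_1) - 1, len(sample_2) - 1
    return var_2 / var_1, len(sample_2) - 1, len(sample_1) - 1

programme_a = [44, 37, 48, 60, 41]
programme_b = [42, 42, 58, 64, 64, 67, 62]

f_value, v1, v2 = f_ratio(programme_a, programme_b)
print(f"F = {f_value:.3f} with v1 = {v1} and v2 = {v2} d.f.")
```

The computed F would then be compared with the table value of F for (v1, v2) degrees of freedom at the chosen level of significance.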
“If two or more quantities vary in sympathy so that movements in one tend to
be accompanied by corresponding movements in other(s) then they are said to
be correlated.”
L.R. Connor–
is known as correlation.”
Croxton and Cowden–
variables”.
YaLun Chou–
The existence of correlation between two or more variables implies that these
variables (i) either tend to increase or decrease together or (ii) an increase (or decrease)
in one is accompanied by a corresponding decrease (or increase) in the other.
Questions of the type whether changes in one variable are due to changes in the
other, i.e., whether a cause and effect type relationship exists between them, are not
answered by correlation analysis. If there is a correlation between two
1. One of the variables may be affecting the other: A correlation coefficient calculated
from the data on quantity and corresponding price of cashew would only reveal
that the degree of association between them is very high. It will not give us any idea
about whether price is affecting demand of cashew or vice-versa. In order to know
this, we need to have some additional information apart from the study of correlation.
For example if, on the basis of some additional information, we say that the price of
cashew affects its demand, then price will be the cause and quantity will be the effect.
The causal variable is also termed the independent variable while the other variable
is termed the dependent variable.
2. The two variables may act upon each other: Cause and effect relation exists in
this case also but it may be very difficult to find out which of the two variables is
independent.
Example: If we have data on price of wheat and its cost of production, the correlation
between them may be high because higher price of wheat may attract farmers to
produce more wheat and more production of wheat may mean higher cost of
production, assuming that it is an increasing cost industry. Further, the higher cost of
production may in turn raise the price of wheat.
For the purpose of determining a relationship between the two variables in such
situations, we can take any one of them as independent variable.
3. The two variables may be acted upon by outside influences: In this case we
might get a high degree of correlation between the two variables; however, apparently
no cause and effect type relation seems to exist between them.
Example: The demands of the two commodities, say X and Y, may be positively
correlated because the incomes of the consumers are rising. Coefficient of correlation
obtained in such a situation is called a spurious or nonsense correlation.
4. A high value of the correlation coefficient may be obtained due to sheer coincidence
(or pure chance): This is another situation of spurious correlation. Given the data on
any two variables, one may obtain a high value of correlation coefficient when in fact
they do not have any relationship.
Example: A high value of correlation coefficient may be obtained between the size of
shoe and the income of persons of a locality.
Merits and Limitations of Coefficient of Correlation
The only merit of Karl Pearson’s coefficient of correlation is that it is the most popular
method for expressing the degree and direction of linear association between the
two variables in terms of a pure number, independent of units of the variables. This
measure, however, suffers from certain limitations, given below:
1. Coefficient of correlation r does not give any idea about the existence of cause
and effect relationship between the variables. It is possible that a high value
of r is obtained although none of them seem to be directly affecting the other.
Hence, any interpretation of r should be done very carefully.
2. It is only a measure of the degree of linear relationship between two variables.
If the relationship is not linear, the calculation of r does not have any meaning.
3. Its value is unduly affected by extreme items.
4. If the data are not uniformly spread in the relevant quadrants the value of r may
give a misleading interpretation of the degree of relationship between the two
variables. For example, if some values are concentrated around a point in the first
quadrant and there is a similar concentration in the third quadrant, the value of r
will be very high although there may be no linear relation between the variables.
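The second limitation can be seen with a small calculation. The sketch below implements Karl Pearson’s coefficient from its definition and applies it to two made-up data sets: one perfectly linear and one perfectly related but non-linear (Y = X²). The data are hypothetical and chosen only to make the point.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation, computed from deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

x = [-2, -1, 0, 1, 2]
linear = [10 + 2 * value for value in x]  # exact linear relation
curved = [value ** 2 for value in x]      # exact but non-linear relation

print("r for the linear data:", pearson_r(x, linear))
print("r for the curved data:", pearson_r(x, curved))
```

Although Y is completely determined by X in the second data set, r comes out as zero, which is why r should be read only as a measure of linear association.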
Regression Analysis
If the coefficient of correlation calculated for bivariate data (Xi, Yi), i = 1, 2, ......n,
is fairly high and a cause-and-effect type of relation is also believed to
exist between them, the next logical step is to obtain a functional relation
between these variables. This functional relation is known as the regression equation
of Y on X. Since the coefficient of correlation is a measure of the degree of linear
association of the variables, we shall discuss only the simple linear regression equation.
The regression equations are useful for predicting the value of the dependent variable
for a given value of the independent variable. As pointed out earlier, the nature of a
Amity Directorate of Distance & Online Education
= 5. However, if Y = 10 + 2X is a regression equation, then Y = 20 is an average value
of Y when X = 5.
The term regression was first introduced by Sir Francis Galton in 1877. In his study
of the relationship between heights of fathers and sons, he found that tall fathers were
likely to have tall sons and vice-versa. However, the mean height of sons of tall fathers
was lower than the mean height of their fathers and the mean height of sons of short
fathers was higher than the mean height of their fathers. In this way, a tendency of the
human race to regress or to return to a normal height was observed. Sir Francis
Galton referred to this tendency of returning to the mean height of all men as regression
in his research paper, “Regression towards mediocrity in hereditary stature”. The term
‘Regression’, which originated in this particular context, is now used in various fields of
study, even though there may be no existence of any regressive tendency.
Simple Regression
For bivariate data (Xi, Yi), i = 1, 2, ......n, we are able to have either X or Y as the
independent variable. If X is the independent variable then we can estimate the average
values of Y for a given value of X. The relation used for such estimation is called the
regression of Y on X. If on the other hand Y is employed for estimating the average
values of X, the relation is termed the regression of X on Y. For bivariate data, there
will always be two lines of regression. It will be shown later that these two lines are
different, i.e., one can’t be derived from the other by mere transfer of terms, because
the derivation of each line is dependent on a different set of assumptions.
Line of Regression of Y on X
Y = a + bX
The above line is known if the values of a and b are known. These values are
estimated from the observed data (Xi, Yi), i = 1, 2, ...... n.
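A minimal sketch of how a and b might be estimated from observed data is shown below. It assumes the ordinary least-squares estimates for a line of the form Y = a + bX; the data points are hypothetical and were chosen to lie exactly on Y = 10 + 2X so that the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line Y = a + bX."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx          # slope: change in average Y per unit change in X
    a = mean_y - b * mean_x  # intercept: the line passes through the two means
    return a, b

xs = [1, 2, 3, 4, 5]         # hypothetical observations
ys = [12, 14, 16, 18, 20]

a, b = fit_line(xs, ys)
print(f"Y = {a} + {b}X")
print("Estimated average Y when X = 5:", a + b * 5)
```

With real data the points will not lie exactly on a line, and the fitted value is then interpreted as the average value of Y for the given X.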
A chi-square test (χ2 test) is a statistical hypothesis test in which the sampling
distribution of the test statistic is a chi-square distribution when the null hypothesis is
true, or asymptotically true, meaning that the sampling distribution can be made to
approximate a chi-square distribution as closely as desired by making the sample size
large enough.
Or total frequency should be greater than 50.
3. There should be a minimum of five observations in any cell. This is called cell
frequency constraint.
For instance: Chi-square
Persons Age Group Total
Under 20-40 20-40 41-51 51 & Over
Total 200 130 80 90 500
Is there any significant difference between the age group and preference for the
car?
Example: A company marketing tea claims that 70% of the population in a metro drinks
a particular brand (Wood Smoke) of tea. A competing brand challenged this claim. They
took a random sample of 200 families to gather data. During the study period, it was
found that 130 families were using this brand of tea. Would the competitor be correct
to conclude that the claim made by the company does not hold good at the 5%
level of significance?
Solution:
Hypothesis H0 – The proportion of people who drink the Wood Smoke brand is 70%.
If the hypothesis is true, then the expected number of families in the sample who drink this
brand is 200 × 0.7 = 140, and the expected number who do not is 200 − 140 = 60.
Group                          Observed (O)   Expected (E)   O − E   (O − E)²   (O − E)²/E
Those who drink branded tea    130            140            -10     100        0.714
Those who do not               70             60             10      100        1.667
Total                          200            200            0                  2.381
At the 5% (0.05) level of significance, the table value of χ² for 1 d.f. is 3.841. The
calculated value, 2.381, is lower. Therefore, we accept the hypothesis that 70% of the
people in that metro drink Wood Smoke branded tea.
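The arithmetic of the tea example can be checked with a few lines of code. This is a minimal sketch: it assumes the second category (families not using the brand) is simply the remainder of the sample, and it takes the critical value 3.841 from the text rather than computing it.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

sample_size = 200
claimed_share = 0.70      # the company's claim under H0
users = 130               # families found using the brand

observed = [users, sample_size - users]
expected = [sample_size * claimed_share, sample_size * (1 - claimed_share)]

chi2 = chi_square_statistic(observed, expected)
critical_value = 3.841    # table value quoted in the text for 1 d.f.
reject_h0 = chi2 > critical_value

print(round(chi2, 3), reject_h0)   # chi2 is about 2.381, so H0 is not rejected
```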
Each factor will account for one or more components. Each factor is a combination of
many variables.
There are two commonly employed factor analysis procedures or methods.
They are:
1. Principal component analysis
2. Common factor analysis.
When the objective is to summarise information from a large set of variables
into fewer factors, principal component analysis is used. On the other hand, if
the researcher wants to analyse the components of the main factor, common factor
analysis is used.
Example: Common factor – Inconvenience inside a car. The components may be:
1. Leg room
2. Seat arrangement
3. Entering the rear seat
4. Inadequate dickey (boot) space
Summary:
●● Define Correlation & Regression
●● Factor Analysis
●● Chi-square test
●● Different kinds of parametric/ nonparametric test
Questions:
Unit Objectives:
At the end of this unit the participant will be able to learn:
3. Analyse and run example on parametric test
4. Analyse and run example on non-parametric test
Purpose: To obtain customer feedback about a two-wheeler manufactured by a company.
Method: The MR manager prepares a questionnaire to study customer
feedback. The researcher has identified six variables or factors for this purpose. They
are as follows:
2. Durability (Life) (B)
3. Comfort (C)
4. Spare parts availability (D)
5. Breakdown frequency (E)
6. Price (F)
The questionnaire may be administered to 5,000 respondents. The opinions of the
customers are gathered. Let us allot points 1 to 10 for the variables (factors) A to F, where 1 is the
minimum and 10 is the maximum. Let us assume that application of factor analysis has
F into Factor - 2
C into Factor - 3
For future analysis, while conducting a study to obtain customers’ opinions, the three
factors mentioned above would be sufficient. One main purpose of using factor analysis
is to minimize the number of independent variables in the study. With too many
independent variables, the M.R. study will suffer from the following disadvantages:
1. Time for data collection is very high due to several independent variables.
2. Expenditure increases due to the time factor.
3. Computation time is more, resulting in delay.
SPSS is a powerful statistical software program with a graphical interface designed
for ease of use. Almost all commands and options can be accessed using pull-down
menus at the top of the main SPSS window. This design means that once you learn a
few basic steps to access programs, it is very easy to expand your knowledge of
SPSS through the help files. To access the online SPSS help, click on Help in the
menu and then click on Topics if you want help by topic, or on Tutorials for a step-by-step
hands-on guide.
How to get SPSS: SPSS is installed on all ITaP (Information Technology at Purdue)
machines in all ITaP labs around campus. To get into the program, click through the
menus Programs, Standard Software and Statistical Packages.
Opening data from external files
Sometimes you have already entered the SPSS session as described above,
worked on a data set for a while, and then want to open and work on another data set.
You do not have to quit the current SPSS session to do this. Simply click on the
File menu, follow Open then Data… and find your file.
Once in SPSS, in the SPSS Data Editor click on File, then Open, and then choose
Data as shown in Figure 3 and press Enter; the screen shown in Figure 4 then appears. Under
Look in:, specify the location of the data file; under File name:, specify its name; and
under Files of type:, specify the file type. The dataset we are working with is called
Cars.csv, as shown in Figure 4.
The figure below shows how one imports the data from Cars.csv. Since this file has
variable names at the top of the file, click the Yes button under “Are variable
names included at the top of your file?” Click on Next until the data appear in the Data
View. Similar steps can be taken to import data from Excel and many other spreadsheets and
text files.
When we open the SPSS program, a blank spreadsheet opens in Data View. If
a data set is already open but we want to create a new one, click File
(“ ,” “ ,” and so on). The rows will represent cases that will be a part of our data set.
When we enter values for our data in the spreadsheet cells, each value will correspond
to a specific variable (column) and a specific case (row).
●● Click the Variable View tab. Type the name for our first variable under the Name
column. We can also enter other information about the variable, such as the type
(the default is “numeric”), width, decimals, label, etc.
Type the name for each variable that we plan to include in our dataset. In this
example, I will type “School_Class” since I plan to include a variable for the class
level of each student (i.e., 1 = first year, 2 = second year, 3 = third year, and 4 =
fourth year). I will also specify 0 decimals since my variable values will only include
whole numbers. (The default is two decimals.)
Click the Data View tab. Any variable names that we entered in Variable View
will now be included in the columns (one variable name per column). We can see that
School_Class appears in the first column in this example.
Now we can enter values for each case. In this example, cases represent students.
For each student, enter a value for their class level in the cell that corresponds to the
appropriate row and column. For example, the 1st person’s information should appear
in the 1st row, under the variable column School_Class. In this example, the 1st
person’s class level is “ ,” the second person’s is “ ,” the third person’s is “ ,” the fourth
person’s is “ ,” and so on.
distributed a survey as part of your data collection, and each survey was labelled with
a number (“1,” “2,” etc.). In this example, the survey numbers essentially represent ID
numbers: numbers that help us to identify which pieces of information go with which
respondents in our sample. Without these ID numbers, we have no way of tracking
which information goes with which respondent, and it would be impossible to enter the
data accurately into SPSS. When entering data into SPSS, we need to enter
values for each variable that correspond to the correct person or object in our sample.
It might seem like a simple solution to use the conveniently labelled rows in SPSS as
ID numbers: we could enter our first respondent’s information in the row that is already
labelled “1,” the second respondent’s information in the row labelled “2,” and so on.
However, we should never rely on these pre-numbered rows for keeping track of
the specific respondents in our sample. This is because the row numbers
are visual guides only. They are not attached to specific lines of data, and thus cannot
be used to identify specific cases in our data. If our data become rearranged (e.g.,
after sorting), the row numbers will no longer be associated with the same cases
as when we first entered the data. Instead, we
should create a variable in our dataset that is used to identify each case, for example,
a variable called StudentID.
The following example illustrates why using the row numbers in SPSS as case
identifiers is flawed:
Let’s say that we have entered values for each person for the School_Class
variable. We relied on the row numbers in SPSS to correspond to our survey ID
numbers. Thus, for survey #1, we entered the first respondent’s information in row 1, for
survey #2 we entered the second person’s information in row 2, and so on.
But suppose the data get reorganised in the spreadsheet view. A common way of
reorganising data is by sorting. Sorting rearranges the rows of data so that the values
appear in ascending or descending order. If we right-click on any variable name, we can
select “Sort Ascending” or “Sort Descending.” In the example below, the data are sorted
in ascending order on the values for the variable School_Class.
Perhaps we need to double-check our entry of the data by comparing the original
surveys to the values we entered in SPSS. Now that the data have been rearranged,
there is no way to identify which row corresponds to which participant/survey number.
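The same problem can be reproduced outside SPSS. The sketch below uses plain Python lists as a stand-in for the Data View; the four survey values are hypothetical, and `StudentID` and `School_Class` are used only as illustrative field names.

```python
# Hypothetical paper surveys: survey number -> class level written on the form
surveys = {1: 3, 2: 1, 3: 4, 4: 2}

# Entry WITHOUT an ID variable: the row position is the only link to the survey.
values_only = [surveys[n] for n in sorted(surveys)]   # rows 1..4 hold 3, 1, 4, 2
values_sorted = sorted(values_only)                   # "Sort Ascending"
# After sorting, row 1 no longer holds the answer from survey #1.
row1_matches = values_sorted[0] == surveys[1]

# Entry WITH an ID variable: each case carries its own identifier.
cases = [{"StudentID": n, "School_Class": c} for n, c in surveys.items()]
cases_sorted = sorted(cases, key=lambda case: case["School_Class"])
# The ID still lets us match every sorted row back to its paper survey.
lookup = {case["StudentID"]: case["School_Class"] for case in cases_sorted}
all_match = all(lookup[n] == surveys[n] for n in surveys)

print(row1_matches, all_match)
```

The first check fails and the second succeeds, which is the reason for storing an explicit identifier variable rather than relying on row position.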
Inserting A Case
To insert a new case into a dataset:
●● In Data View, click a row number or individual cell below where we want our new
row to be inserted.
●● We can insert a case in several ways: click Edit > Insert Cases;
right-click on a row and select Insert Cases from the menu; or
click the Insert Cases icon.
A new, blank row will appear above the row or cell we selected. Values for each
existing variable in our dataset will be missing (indicated by either a “.” or a blank cell)
for our newly created case since we have not yet entered this information.
Deleting A Case
To delete an existing case from a dataset:
●● In the Data View tab, click the case number (row) that we wish to delete. This will
highlight the row for the case we selected.
●● Press Delete on our keyboard, or right-click on the case number and select
“Clear”. This will remove the entire row from the dataset.
Deleting A Variable
To delete an existing variable from a dataset:
●● In the Data View tab, click the column name (variable) that we wish to delete. This
will highlight the variable column.
●● Press Delete on our keyboard, or right-click on the selected variable and click
“Clear.” The variable and associated values will be removed.
●● Click on the row number corresponding to the variable we wish to delete. This will
highlight the row.
●● Press Delete on our keyboard, or right-click on the row number corresponding to
the variable we wish to delete and click “Clear”.
One-Sample Statistics
                      N     Mean    Std. Deviation   Std. Error Mean
Classroom Community   169   28.84   6.242            0.480
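The Std. Error Mean column in this output is the standard deviation divided by the square root of N. A quick check with the figures from the table (the variable names are illustrative):

```python
import math

n = 169            # N from the table
std_dev = 6.242    # Std. Deviation from the table

# Standard error of the mean = s / sqrt(N)
std_error = std_dev / math.sqrt(n)
print(round(std_error, 3))   # agrees with the 0.480 shown in the output
```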
The chi-square test could be used to determine if a basket of fruit contains equal
proportions of apples, bananas, oranges, and peaches.
Fruits   Count
orange   1
orange   1
mango    2
banana   3
lemon    4
banana   3
orange   1
lemon    4
lemon    4
orange   1
mango    2
banana   3
lemon    4
banana   3
orange   1
lemon    4
lemon    4
SPSS Steps:
Get the count in the test variable list
Total         17
              Count
Chi-Square    .176
df            3
Asymp. Sig.   .981
Interpretation
Here the p-value is 0.981, which is more than 0.05. Hence the result is not significant; we
fail to reject the null hypothesis and conclude that there is no significant difference in
the proportions of apples, bananas, oranges, and peaches.
We could also test to see if a basket of fruit contains 10% apples, 20% bananas,
50% oranges, and 20% peaches. For this we have to define the proportions by
checking the “Values” button and adding the expected proportions one by one.
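The calculation behind a chi-square goodness-of-fit test of this kind can be sketched in a few lines. The category counts used here (4, 4, 4 and 5) are an assumption chosen for illustration: they are one split of 17 cases over four categories that gives a statistic close to the one in the output table. The p-value helper uses a closed form that is valid only for exactly 3 degrees of freedom; the function names are hypothetical.

```python
import math

def chi_square_gof(observed, proportions=None):
    """Goodness-of-fit statistic and degrees of freedom.

    If no proportions are given, equal proportions are assumed.
    """
    total = sum(observed)
    k = len(observed)
    if proportions is None:
        proportions = [1 / k] * k
    expected = [total * p for p in proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, k - 1

def p_value_df3(statistic):
    """Upper-tail probability of a chi-square variable with exactly 3 d.f."""
    half = statistic / 2
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * statistic / math.pi) * math.exp(-half)

counts = [4, 4, 4, 5]              # assumed category counts, total 17
statistic, df = chi_square_gof(counts)
p_value = p_value_df3(statistic)

print(round(statistic, 3), df, round(p_value, 3))
```

Passing a list such as `[0.1, 0.2, 0.5, 0.2]` as `proportions` corresponds to testing unequal expected proportions, as described for the “Values” option.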
Summary
●● Introduction to SPSS
●● The way of Creating Data file
●● Run example on parametric test
Questions
1. What is SPSS, and what is its usefulness?
2. Give a pictorial representation of running an example of a parametric/non-parametric
test.
Exercises:
1. ______ research is aimed at expanding knowledge and does not involve inventing
or creating anything.
a. Basic
b. Exploratory
c. Action
d. Descriptive
d. Descriptive
3. Which of the following is an example of Applied research?
b. Devising solutions for arresting employee attrition
c. Conducting an archaeological study on a few historical artifacts
d. Conducting a survey related to preference of face wash products
4. Which of the following options is FALSE about empirical research?
a. The findings are subject to verification by experiment or observation
b. You need to collect data to prove or disprove the hypotheses
c. Involves guidelines and techniques by which you can utilise historical sources,
artifacts, and other evidence for researching and establishing facts
d. Is a data-based research technique
5. Comparing the carpentry tools used during the Gupta and Maurya dynasties is an
example of ______ research.
a. Explanatory
b. Exploratory
c. Descriptive
d. Historical
6. Which of the following research techniques are conducted by companies during later
phases of decision-making?
a. Descriptive
b. Exploratory
c. Explanatory
d. Both a and c
7. Which of the following is/are example/examples of Causal research?
a. How incentives affect employee performance
b. How employee attrition affects profitability
a. Unstructured
c. Formulating and testing research hypotheses
d. Conducted in the later phases of decision-making
10. Read the below statements and identify the correct option.
a. The findings of 1- _______ research offer a final conclusion or conclusive
evidence to the research problem.
b. ______ research involves studying only one variable
c. ______ research is aimed at devising solutions for an immediate problem in a
company, the society, or a person
d. _______ research is conducted to analyse how certain changes affect existing
standard procedures
11. Which of the following is FALSE about hypothesis?
a. Hypothesis is a tentative statement, which is subject to verification
b. Hypothesis is a tested, well-substantiated, complete explanation for a set of
proven factors
c. Hypothesis is conceptually different from theory
d. Hypothesis is a testable relationship between at least two variables
12. A research firm conducts a study to establish the minimum purchasing power
required for the medium and large retail stores as Rs. 150 million and Rs. 300 million,
respectively. Identify the INCORRECT statement.
a. The null hypothesis is - total purchasing power is less than Rs. 150 million
b. One of the alternative hypotheses is - total purchasing power is between Rs.
13. A renowned automobile company is under the process of launching a new luxury
car. The car is aimed at catering to the HNI (High Net Worth Individual) population.
The company wants to conduct a detailed market research on the popular choices
of luxury cars in the country. It recruits a research team, which comes up with
the research problem - “Is luxury car a popular among HNI clients?” Which of the
14. Read the below statements and identify the wrong one(s).
1. Complex hypothesis establishes a causal relationship between two variables
2. Null hypothesis establishes a relationship among more than two variables
3. An alternative hypothesis is used for a reverse strategy
4. “Performance at work is not related to salary alone” is an example of alternative
hypothesis
a. Only 1 and 3 are wrong
b. Only 4 is wrong
c. Only 2 and 4 are wrong
d. All 4 statements are wrong
15. A renowned fashion magazine conducts market research to find out the need for
advertisement. The research team constructs the below hypotheses:
H0: At least 30% of the readers are women
H1: Less than 30% of the readers are women
What decision would the management take if the research team commits a
type II error?
a. The management invests unnecessarily in advertisements
b. The management does not invest in advertisements
c. The management prepares a budget for promotional cost
d. The management hires more salespersons to carry out extensive sales across
the country
a. 1 - Descriptive
2 - Exploratory
3 - Applied
4 - Historical
b. 1 - Exploratory
2 - Descriptive
3 - Applied
4 - Explanatory
c. 1 - Action
2 - Causal
3 - Applied
4 - Explanatory
d. None of these
Answers
1- a
2- c
3- b
4- c
5- d
6- d
7- d
8- b
9- c
10- b
11- b
12- c
13- c
14- d
15- b
Key Learning Outcomes:
At the end of this module the participant will be able to:
2. Analyse the concept of regression
3. Identify the importance of data analysis
Structure:
Unit-4.1: Machine Learning
4.1.1 Challenges for Big Data Analytics
4.1.2 Introduction to Machine Learning
4.1.3 Concepts of Machine Learning
4.1.4 Use cases of Machine Learning in Research
Unit-4.2: Regression
4.2.1 Introduction to Regression
4.2.2 Ordinary Least Squares
4.2.3 Ridge Regression
4.2.4 Polynomial Regression
4.2.9 Clustering
Unit Objectives:
At the end of this unit, participants will be able to learn:
●● What is Machine Learning
●● Several use cases of Machine Learning.
In this digitalized world, we are producing a large amount of data
every minute. The amount of data produced every minute makes it
challenging to store, manage, utilize, and analyze. Even large business enterprises
are struggling to find ways to make this huge amount of data useful. Today,
the amount of data produced by large business enterprises is growing,
as mentioned before, at a rate of 40 to 60% per year. Simply storing this huge amount
of data is not going to be all that useful, and this is why
organizations are looking at options like data lakes and big data analysis tools that can
help them handle big data to a great extent. Now, let’s take a quick look at some
challenges faced in Big Data analysis:
sources. Data professionals may know what is going on, but others may not have a
clear picture.
For example, if employees do not understand the importance of data storage, they
may not keep backups of sensitive data. They may not use databases
properly for storage. As a result, when this important data is required, it cannot be
retrieved easily.
Solution
Big Data workshops and seminars must be held at companies for everybody. Basic
training programs must be arranged for all the workers who handle data regularly
and are a part of the Big Data projects. A basic understanding of data concepts must be
inculcated at all levels of the organization.
centers and databases of companies is increasing rapidly. As these data sets grow
exponentially with time, they become extremely difficult to handle.
Most of the data is unstructured and comes from documents, videos, audio, text
files and other sources. This means that it cannot be found in databases.
Solution
In order to handle these large data sets, companies are choosing modern
techniques, like compression, tiering, and deduplication.
●● Compression is employed for reducing the number of bits in the data, which
means reducing its overall size.
●● Deduplication is the process of removing duplicate and unwanted data
from a data set.
●● Data tiering is used to store data in different storage tiers. It ensures that the
data resides in the most appropriate storage space. Data tiers can be
public cloud, private cloud, and flash storage, depending on the data size and
importance.
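To make two of these ideas concrete, the sketch below removes duplicate records and then compresses what remains, using only Python's standard `zlib` module. The customer records are hypothetical, and real Big Data platforms apply these techniques at a very different scale and with their own tooling.

```python
import zlib

# Hypothetical records: 200 rows, but only 20 distinct customers
records = [f"customer={i % 20};city=Delhi;status=active" for i in range(200)]

# Removing duplicates: keep the first occurrence of each record, preserving order
unique_records = list(dict.fromkeys(records))

# Compression: encode the original rows as bytes and compress them losslessly
raw = "\n".join(records).encode("utf-8")
compressed = zlib.compress(raw)
restored = zlib.decompress(compressed)

print(len(records), "rows ->", len(unique_records), "unique rows")
print(len(raw), "bytes ->", len(compressed), "bytes after compression")
print(restored == raw)   # lossless: the original bytes come back unchanged
```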
Companies are also opting for Big Data tools, such as Hadoop, NoSQL and other
technologies.
Companies often get confused while choosing the best tool for Big Data
analysis and storage. Is HBase or Cassandra the best technology for data storage? Is
Hadoop MapReduce good enough or will Spark be a better option for data analytics
and storage?
These questions bother companies and sometimes they are unable to find
the answers. They end up making poor decisions and selecting an inappropriate
technology. As a result, money, time, effort and work hours are wasted.
Solution
The best approach is to seek professional help or hire experienced
professionals who have much more knowledge about these tools. Another option is Big
Data consulting. Here, consultants will recommend the best tools, based
on our company’s scenario. Based on their advice, we can work out a strategy and then
select the best tool for us.
much more experienced in working with the tools and making sense out of big data
sets. Data handling tools are changing rapidly, and for that reason companies face a
shortage of Big Data professionals.
Solution
Companies are investing more money in the recruitment of skilled professionals.
They can also arrange training programs for the existing staff to get the most out of
them.
5. Securing data
Amity Directorate of Distance & Online Education
analyzing their data sets that they push data security for later stages. But this is not
a smart move as unprotected data repositories can become breeding grounds for
malicious hackers.
Solution
Companies are recruiting more cyber security professionals to protect their data.
Other steps taken for securing data include:
●● Data encryption
●● Data segregation
●● Identity and access control
●● Implementation of endpoint security
●● Real-time security monitoring
●● Use of Big Data security tools, such as IBM Guardium
6. Integrating data from a variety of sources
Data in a company comes from a range of sources, such as social media pages,
ERP applications, customer logs, financial reports, e-mails, presentations and reports
created by employees. Combining all this data to prepare reports is a challenging task.
Data integration is crucial for analysis, reporting and business intelligence, so it has
to be perfect.
Solution
Companies need to solve their data integration problems by purchasing the
right tools. Some of the best data integration tools are mentioned below:
●● IBM InfoSphere
●● Xplenty
●● Informatica PowerCenter
●● CloverDX
●● Microsoft SQL
●● QlikView
●● Oracle Data Service Integrator
In order to put Big Data to the best use, companies need to begin doing
things differently. This means hiring better staff, changing the management, and reviewing
existing business policies and the technologies being used. To improve decision-making, they can
hire a Chief Data Officer, a step that has been taken by many of the Fortune 500 companies.
As data sets are getting larger and more diverse, there is a big challenge in
incorporating them into an analytical platform. If this is overlooked, it will create gaps and
lead to wrong messages and insights.
Introduction to Machine Learning for Beginners
Machine Learning has been a buzzword for the past few years. The reason
for this could be the large amount of data produced by applications, the rise in
computation power over the past few years and the development of better
algorithms. Machine Learning is used everywhere, from automating mundane
tasks to offering intelligent insights, and industries in every sector try to benefit from it. We
may already be using a device that utilizes it, for example a wearable fitness tracker
like Fitbit, or an intelligent home assistant like Google Home. But there are many more
examples of ML in use.
Considering the loan example, to compute the probability of a default, the system will
need to classify the available data in groups.
●● Image recognition: Machine learning can be used for face detection in an image
as well. There is a separate category for each person in a database of several
people.
●● Speech recognition: This is the translation of spoken words into text. It is
employed in voice searches and more. Voice user interfaces include voice dialing,
call routing, and appliance control. It can also be employed for simple data entry
Writing software is the bottleneck; we do not have enough good developers. Let
the data do the work instead of people. Machine learning is the way to
But in Machine Learning: data and output are run on the computer to create a program.
This program can then be used in traditional programming.
Uses of Machine Learning
●● Web search: ranking pages based on clicks so that the most relevant site appears first.
●● Computational biology: rational design of drugs on the computer based on past
experiments.
●● Finance: deciding who to send which credit card offers to, evaluating risk on
credit offers, and deciding where to invest money.
●● E-commerce: predicting customer churn, and whether or not a transaction is
fraudulent.
●● Space exploration: space probes and radio astronomy.
●● Robotics: how to handle uncertainty in new environments, as in self-driving cars.
●● Information extraction: asking questions over databases across the web.
There are tens of thousands of machine learning algorithms, and hundreds of new
algorithms are developed every year. But every machine learning algorithm
has three basic components:
Examples include decision trees, sets of rules, instances, graphical models, neural
networks, support vector machines, model ensembles and others.
●● Evaluation: the way to evaluate candidate programs (hypotheses).
Examples include accuracy, precision and recall, squared error, likelihood, posterior
probability, cost, margin, entropy, K-L divergence and others.
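Three of the evaluation measures named above, accuracy, precision and recall, can be computed directly from a list of actual and predicted labels. The following is a minimal sketch; the ten labels are hypothetical, with 1 standing for the positive class, and the function name `evaluate` is an illustrative choice.

```python
def evaluate(actual, predicted, positive=1):
    """Return (accuracy, precision, recall) for a binary classifier."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)  # true positives
    fp = sum(1 for a, p in pairs if a != positive and p == positive)  # false positives
    fn = sum(1 for a, p in pairs if a == positive and p != positive)  # false negatives
    accuracy = sum(1 for a, p in pairs if a == p) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical labels for ten cases
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

accuracy, precision, recall = evaluate(actual, predicted)
print(accuracy, precision, recall)   # 0.7 0.6 0.75
```

Accuracy counts all correct predictions, precision asks how many predicted positives were right, and recall asks how many actual positives were found, so the three values generally differ.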
Types of Learning
There are four types of machine learning:
●● Unsupervised learning: Training data does not include desired outputs. An example
is clustering. It is difficult to say what is good learning and what is not.
●● Semi-supervised learning: Training data includes a few desired outputs.
●● Reinforcement learning: Rewards from a sequence of actions. AI types like
it; it is the most ambitious type of learning.
Supervised learning is the
most mature, most studied and the type of learning used by most machine
learning algorithms. Learning with supervision is much easier than learning without
supervision. Inductive Learning is where we are given examples of a function
in the form of data (x) and the output of the function (f(x)). The goal of inductive
learning is to learn the function for new data (x).
●● Classification: when the function being learned is discrete.
●● Regression: when the function being learned is continuous.
●● Probability Estimation: when the output of the function is a probability.
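Inductive learning can be illustrated with one of the simplest possible learners, the nearest-neighbour rule: to predict f(x) for new data, return the output of the stored example whose x is closest. This is only one of many possible learners and is chosen here for brevity; the study-hours data are hypothetical.

```python
def nearest_neighbour_predict(examples, x_new):
    """Predict f(x_new) as the output of the closest known example (x, f(x))."""
    closest_x, closest_output = min(examples, key=lambda pair: abs(pair[0] - x_new))
    return closest_output

# Classification: the learned function has discrete outputs
# (hypothetical data: hours studied -> result)
classification_examples = [(1, "fail"), (2, "fail"), (3, "fail"),
                           (6, "pass"), (7, "pass"), (8, "pass")]
label = nearest_neighbour_predict(classification_examples, 7.4)

# Regression: the learned function has continuous outputs
# (hypothetical data: hours studied -> exam score)
regression_examples = [(1, 35.0), (3, 48.0), (5, 61.0), (7, 74.0)]
score = nearest_neighbour_predict(regression_examples, 4.6)

print(label, score)   # pass 61.0
```

The same procedure serves both tasks; what changes is whether the outputs in the examples are categories or numbers.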
●● Start Loop
●● Understand the domain, prior knowledge and goals. Talk to domain experts. Often
the goals are very unclear. We frequently have more things to try than we can
possibly implement.
●● Data integration, selection, cleaning and pre-processing. This is often the most
time-consuming part. It is important to have high-quality data. The more data we
have, the harder this becomes, because the data is dirty. Garbage in, garbage out.
●● Learning models. The fun part. This part is very mature. The tools are general.
●● Interpreting results. Sometimes it does not matter how the model works as long it
result that we can use in practice. Also, the data can change, requiring a new loop.
The different kinds of use cases of machine learning are discussed below.
the old solution of outsourcing issues to a call centre is simply unacceptable for many
of today’s customers. Advances in machine learning algorithms have made it possible
for chatbots and other automated systems to fill these needs. By automating routine
and low-priority tasks, companies can free up employees to handle more high-level
customer service. When implemented properly, machine learning in business can
streamline issue resolution and make sure that customers get the kind of helpful
assistance that turns them into loyal brand advocates.
2. Cyber Security
As networks become increasingly complex, cyber security experts have worked
hard to respond to the ever-expanding scope of security threats. Rapid changes
in malware and hacking techniques are hard enough to counter, but
the proliferation of Internet of Things (IoT) devices has fundamentally altered the cyber
security landscape. Attacks can come from anywhere, at any time, and in any form.
Fortunately, machine learning algorithms have allowed cyber security efforts to keep
pace with these rapid changes. Predictive analysis makes it possible to spot and
mitigate threats faster than ever, and machine learning can track user behaviour within
a network to identify irregularities and gaps in existing security measures.
3. Visual Perception
With machine learning applications, more and more devices now have
object recognition capabilities. An autonomous vehicle, for example, knows another car
when it sees one, even if programmers did not provide it with an exact example of that
car to use as a reference. Retail stores are even using this technology to help speed
up the checkout process. Cameras detect the items customers place in their carts and
may automatically charge their accounts when they leave the shop.
4. Fraud Detection
The growing number of monetary transactions happening online has raised consumer
awareness about various forms of fraud. While purchasers enjoy the convenience
of being able to make purchases and payments online, they need to know
that their financial data is being protected in the process. Credit card companies and
banks have responded by turning to machine learning algorithms that can review
vast amounts of transactional data to spot suspicious activity. While these sorts of
checks are nothing new, machine learning in business has drastically expanded and
accelerated the scope of these reviews. According to industry research, machine
learning solutions can detect up to 95 percent of fraud and cut investigation time
by 70 percent.
5. Communication
Avoiding mistakes and misunderstandings is very important in any kind of
communication, but especially so for today’s businesses. Whether it is email
correspondence, customer reviews, video conferencing, or text-based documents
in all their varied forms, simple grammatical errors, inappropriate tone, or
inaccurate translations can cause a range of problems. Machine learning programs
have taken communication far beyond the heady days of Microsoft’s Clippy. Thanks to
natural language processing, real-time language translation, and speech recognition,
these machine learning applications are able to help people communicate clearly and
accurately. While many people like to complain about autocorrect features, they also
appreciate being saved from embarrassing mistakes and inappropriate tone.
6. Digital Marketing
Many of today’s marketing initiatives are carried out online through a range of
digital platforms and software applications. As companies gather data about customers
and their purchasing habits, marketing teams can use that information to form a detailed
picture of their audience and identify which individuals are more likely to seek out their
products and services. Machine learning algorithms help marketers make sense of all
that data, identifying key trends and features that allow them to segment opportunities
more narrowly. The same technology enables digital marketing automation on
an enormous scale. Ad platforms can be set up to dynamically identify new potential
customers and direct the appropriate marketing material to them in the right place at the
right time.
deployed to boost efficiency, reduce costs, and deliver better user experiences.
7. Process Automation
Intelligent Process Automation (IPA) is the combination of artificial intelligence and
automation through the use of machine learning. Use cases range from automating
manual data entry to more complex tasks such as automating insurance risk assessments.
With cognitive technologies like natural language processing, machine vision and deep
learning, machines can augment traditional rule-based automation and, over time, learn
to perform tasks better. Most IPA solutions already use ML-powered capabilities
that go beyond simple rule-based automation. The business benefits are much more extensive
than cost saving and include better use of costly equipment or highly skilled employees,
faster decisions and actions, service and product innovations, and overall better
outcomes. By using ML to take over rote tasks within the enterprise, human workers are
freed to focus on product innovation and service improvement, allowing the company to
transcend conventional performance trade-offs and achieve unparalleled levels of quality
and efficiency.
8. Sales Optimization
Enterprises have been saving consumer data for years, and sales is also the area
with the most potential for immediate financial impact from implementing machine
learning. That is why enterprises seeking a competitive edge are applying
Amity Directorate of Distance & Online Education
Business Research Methods 103
intelligent content and ad placement or predictive lead scoring. By adopting machine
learning within the enterprise, companies can rapidly evolve and personalize content
to meet the ever-changing needs of prospective customers. ML models are also being
used for customer sentiment analysis, sales forecasting, and customer churn
prediction. With these solutions, sales managers are alerted in advance to specific deals or
customers that are at risk.
9. Collaboration
The key to getting the most out of machine learning in the enterprise lies
in tapping into the capabilities of both machine learning and human
intelligence. ML-enhanced collaboration tools have the potential to boost efficiency,
speed up the development of new ideas and lead to improved outcomes for teams that
collaborate from disparate locations. Nemertes’ 2018 UC and collaboration research concluded
that about 41 percent of enterprises plan to use AI in their unified communications and
collaboration applications. Some use cases in the collaboration space include:
●● Video intelligence, audio intelligence and image intelligence can add context to
content being shared, making it simpler for users to find the files they require.
Image intelligence coupled with object detection and text and handwriting recognition
helps improve metadata indexing for enhanced search.
●● Real-time language translation facilitates communication and collaboration
between global workgroups in their native languages.
●● Integrating chatbots into team applications enables natural-language interaction, such as
alerting team members or polling them for status updates.
That is just the tip of the iceberg, machine learning offers significant potential
Summary
●● Discuss big data challenges.
Questions
1. Discuss the different challenges of big data and how to overcome them.
2. Define several use cases of machine learning.
3. Discuss the various uses of machine learning.
Unit-4.2: Regression
Unit Objectives:
At the end of this unit, participants will be able to learn:
●● Learn about Clustering.
is logically high and a cause-and-effect relation is also believed to exist
between them, the next logical step is to obtain a functional relation between these
variables. This functional relation is known as the regression equation. The coefficient of
correlation is a measure of the degree of linear association between the variables.
Regression equations are useful for predicting the value of the dependent variable
for a given value of the independent variable. A regression equation differs in
character from a mathematical equation. For example,
if Y = 10 + 2X is a mathematical equation, then Y is exactly equal to 20
when X = 5.
However, if Y = 10 + 2X is a regression equation, then Y = 20 is the average value
of Y when X = 5.
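The distinction between an exact equation and a regression equation can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the data are simulated (they scatter around the line 10 + 2X with an arbitrary noise level), and `fit_line` is a hypothetical helper name, not part of any library.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

random.seed(0)  # fixed seed so the illustration is repeatable
xs = [random.uniform(0, 10) for _ in range(2000)]
# Simulated (hypothetical) data: Y scatters around the line 10 + 2X.
ys = [10 + 2 * x + random.gauss(0, 3) for x in xs]

intercept, slope = fit_line(xs, ys)
# The fitted value at X = 5 estimates the AVERAGE of Y there, not an exact value.
prediction_at_5 = intercept + slope * 5
print(round(intercept, 2), round(slope, 2), round(prediction_at_5, 2))
```

Individual observations near X = 5 still differ from the fitted value; only their average is close to 20.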
The term regression was first introduced by Sir Francis Galton in 1877.
In his study of the relationship between the heights of fathers and sons, he found that
tall fathers were likely to have tall sons and vice versa. However, the average height of
sons of tall fathers was lower than the average height of their fathers, and the average
height of sons of short fathers was higher than the average height of their fathers. In this
way, a tendency of the human race to regress, or return, to a normal height was observed. Sir
Francis Galton referred to this tendency of returning to the average height of all men as
regression in his research paper, “Regression towards mediocrity in hereditary stature”.
The term ‘regression’, which originated in this particular context, is now used in various
fields of study, even where no regressive tendency exists.
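For a model with p explanatory variables, the OLS regression model is written:
$$Y = \beta_0 + \sum_{j=1}^{p} \beta_j X_j + \varepsilon$$
where Y is the dependent variable, $\beta_0$ is the intercept of the model, $X_j$ is the jth
explanatory variable of the model (j = 1 to p), and $\varepsilon$ is the random error with
expectation 0 and variance $\sigma^2$.
For the ith of n observations, the predicted value of the dependent variable is:
$$\hat{y}_i = \beta_0 + \sum_{j=1}^{p} \beta_j X_{ij}$$
The estimates of the parameters and of the error variance are:
$$\hat{\beta} = (X'DX)^{-1} X'Dy, \qquad \hat{\sigma}^2 = \frac{1}{W - p^*} \sum_{i=1}^{n} w_i (y_i - \hat{y}_i)^2$$
where
y is the vector of the n observed values of the dependent variable,
$w_i$ is the weight of the ith observation and D is the diagonal matrix of these weights,
$p^*$ is the number of estimated coefficients (the p explanatory variables plus 1 for the intercept),
W is the sum of the $w_i$ weights.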
Problems may arise if the matrix to be inverted is not well behaved. If the matrix rank equals q,
where q is strictly lower than p+1, some variables are removed from the model, either because
they are constant or because they belong to a block of collinear variables.
Deleting some of the variables may, however, not be optimal: in some
cases we might not add a variable to the model because it is almost collinear with some other
variables or with a block of variables, but it might be more relevant to remove
a variable that is already in the model and to add the new variable instead.
For this reason, and also in order to handle cases where there are a lot of
explanatory variables, other methods have been developed.
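The estimator β = (X'DX)⁻¹X'Dy can be worked through by hand when there is a single explanatory variable, because X'DX is then only a 2×2 matrix. The sketch below is a minimal illustration, not a general implementation: the function names, the data and the weights are all hypothetical, and p* is assumed to be 2 (intercept plus one slope).

```python
def weighted_least_squares(xs, ys, ws):
    """Solve beta = (X'DX)^-1 X'Dy for an intercept and one predictor."""
    W = sum(ws)                                   # sum of the weights
    swx = sum(w * x for w, x in zip(ws, xs))
    swy = sum(w * y for w, y in zip(ws, ys))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = W * swxx - swx ** 2                     # determinant of X'DX
    b0 = (swxx * swy - swx * swxy) / det
    b1 = (W * swxy - swx * swy) / det
    return b0, b1

def weighted_residual_variance(xs, ys, ws, b0, b1, n_params=2):
    """sigma^2 = 1/(W - p*) * sum(w_i * (y_i - yhat_i)^2), assuming p* = n_params."""
    W = sum(ws)
    return sum(w * (y - b0 - b1 * x) ** 2 for w, x, y in zip(ws, xs, ys)) / (W - n_params)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.2, 4.9, 7.1, 30.0]        # hypothetical data; the last point is an outlier
equal = weighted_least_squares(xs, ys, [1, 1, 1, 1, 1])
downweighted = weighted_least_squares(xs, ys, [1, 1, 1, 1, 0])  # weight 0 ignores the outlier
print(equal, downweighted)
```

With equal weights this reduces to ordinary least squares. If det is zero (for example, when all x values are identical), the matrix cannot be inverted, which is the rank problem described above.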
Prediction
Linear regression is often used to predict output values for new samples.
Ridge regression is a method of model tuning used to analyse data that
suffers from multicollinearity. When multicollinearity occurs, least-squares estimates
are unbiased but their variances are large, so the predicted values can be far
away from the actual values.
The ridge cost function is:
$$\min_{\theta} \left( \lVert Y - X\theta \rVert^2 + \lambda \lVert \theta \rVert^2 \right)$$
λ is the penalty term. It is also denoted by an alpha parameter in the ridge
function. By changing the value of alpha, we control the penalty term. The higher the
value of alpha, the larger the penalty, and the more the magnitude of the coefficients is
reduced.
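The shrinking effect of the penalty is easiest to see with one predictor. For a single centred predictor, minimising the ridge cost gives the closed form slope = Sxy / (Sxx + alpha). The sketch below assumes that simplified setting; `ridge_slope` and the data are hypothetical.

```python
def ridge_slope(xs, ys, alpha):
    """Ridge slope for one centred predictor: minimises sum((y - theta*x)^2) + alpha*theta^2."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / (sxx + alpha)       # alpha = 0 gives the ordinary least-squares slope

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]      # hypothetical data
alphas = [0, 1, 10, 100]
slopes = [ridge_slope(xs, ys, a) for a in alphas]
for a, s in zip(alphas, slopes):
    print(a, round(s, 3))            # the slope shrinks towards zero as alpha grows
```

The coefficient never reaches exactly zero for any finite alpha, which is one way ridge regression differs from lasso regression.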
The standard regression equation is written as:
Y = XB + e
where Y is the dependent variable, X represents the independent variables,
B is the vector of regression coefficients to be estimated, and e represents the errors, or
residuals. Once the λ penalty is added to this equation, the variance that is not
explained by the general model is taken into account. Once the data has been identified as
suitable for L2 regularization, there are steps that one can undertake.
Standardization
ridge regression models on an actual dataset. However, following the general trend
which one needs to remember is:
The assumptions of ridge regression are the same as those of linear regression:
linearity, constant variance, and independence. However, as ridge regression does not
provide confidence limits, the errors need not be assumed to be normally
distributed.
Here we can apply polynomial regression to fit a polynomial equation
to the data. When the relationship is curved, it is very difficult to fit a straight
regression line with a low error. Hence, we use polynomial regression to fit a polynomial
curve so that we achieve a minimum error, or minimum cost function. For a single predictor
$x_1$, the equation of a second-degree polynomial regression is:
$$y = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2$$
The general equation of a polynomial regression of degree n is:
$$y = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \dots + \theta_n x_1^n$$
●● A polynomial provides the best approximation of the relationship between the dependent
and independent variables.
●● A broad range of functions can be fitted with it.
●● A polynomial fits a wide range of curvature.
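To see why a polynomial can reach a lower error than a straight line on curved data, the sketch below fits both by least squares and compares their sums of squared errors. It is an illustration under assumptions: the data are generated exactly from y = 1 + 2x + 0.5x², and `solve`, `fit_polynomial` and `sse` are hypothetical helpers written for this example using the normal equations.

```python
def solve(a, b):
    """Solve the linear system a.x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]       # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [theta_0, ..., theta_degree] from the normal equations."""
    rows = [[x ** k for k in range(degree + 1)] for x in xs]   # design matrix
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(degree + 1)]
           for i in range(degree + 1)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(degree + 1)]
    return solve(xtx, xty)

def sse(xs, ys, coeffs):
    """Sum of squared errors of the fitted polynomial."""
    return sum((y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))

xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1 + 2 * x + 0.5 * x ** 2 for x in xs]              # hypothetical curved data
linear = fit_polynomial(xs, ys, 1)
quadratic = fit_polynomial(xs, ys, 2)
linear_sse, quadratic_sse = sse(xs, ys, linear), sse(xs, ys, quadratic)
print(round(linear_sse, 3), round(quadratic_sse, 3))
```

On real, noisy data the higher-degree fit always has a training error no larger than the lower-degree one, so the degree should be chosen on held-out data rather than on training error alone.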
In Bayesian linear regression, the response is assumed to be drawn from a normal distribution:
$$y \sim N(\beta^{T} X, \sigma^2 I)$$
The mean for linear regression is the transpose of the weight matrix multiplied by the
predictor matrix X. The variance is the square of the standard deviation σ (multiplied by the
identity matrix because this is a multi-dimensional formulation of the model). The aim
of Bayesian regression is not to find the single “best” value of the model
parameters, but to determine the posterior distribution for the model parameters.
This is because the model parameters are also assumed to come from a distribution.
The posterior probability of the model parameters is conditional upon the training inputs
and outputs:
$$P(\beta \mid y, X) = \frac{P(y \mid \beta, X)\, P(\beta \mid X)}{P(y \mid X)}$$
Here, P(β|y, X) is the posterior probability distribution of the model parameters
given the inputs and outputs. This is equal to the likelihood of the data, P(y|β, X),
multiplied by the prior probability of the parameters and divided by a normalization
constant. It is a simple expression of Bayes’ theorem, the fundamental underpinning of
Bayesian inference:
$$\text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Normalization}}$$
In contrast to ordinary least squares, we have a posterior distribution
for the model parameters that is proportional to the likelihood of the data multiplied by
the prior probability of the parameters. The two primary benefits of
Bayesian regression are:
●● Priors: If we have domain knowledge about the model parameters, we
can include it in our model, unlike in the frequentist approach (a type of
statistical inference that draws conclusions from sample data by emphasizing the
frequency or proportion of the data), which assumes that everything
there is to know about the parameters comes from the data. If we have
no estimates ahead of time, we can use non-informative priors for the
parameters, such as a normal distribution.
●● Posterior: The result of performing Bayesian regression is a distribution
of possible model parameters based on the data and the prior. This allows us to
quantify our uncertainty about the model: if we have fewer data points, the posterior
distribution will be more spread out.
As the number of data points increases, the likelihood washes out the prior, and in
the case of infinite data, the outputs for the parameters converge to the values obtained
from OLS.
hypothesis, and as we collect data that either supports or disproves our ideas, we alter
our model of the world (ideally this is how we would reason)!
In practice, evaluating the posterior distribution for the model parameters is
intractable for continuous variables, so we use sampling methods to draw samples from
the posterior in order to approximate it. The technique of drawing random
samples from a distribution to approximate the distribution is one application of Monte
Carlo methods.
Rather than the code, we focus here on the basic procedure for
implementing Bayesian regression: specify priors for the model parameters (for example,
normal distributions), create a model mapping the training inputs to the training
outputs, and then have a Markov Chain Monte Carlo (MCMC) algorithm draw samples
from the posterior distribution for the model parameters. The end result is a set of posterior
distributions for the parameters. We can inspect these distributions to get a sense
of what is occurring.
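For readers who want to see the procedure in miniature, the following is a toy sketch of those three steps using a random-walk Metropolis sampler, one simple MCMC algorithm. Everything specific here is an assumption made for illustration: the simulated data (true intercept 2, true slope 3), the known noise standard deviation of 1, the normal priors with standard deviation 10, the step size, and the function names. It is not the model behind the figures discussed in this unit.

```python
import math
import random
from statistics import mean, stdev

def log_posterior(intercept, slope, xs, ys, noise_sd=1.0, prior_sd=10.0):
    """Log of likelihood x prior (up to a constant), with normal priors centred on 0."""
    log_prior = -(intercept ** 2 + slope ** 2) / (2 * prior_sd ** 2)
    sq_err = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return log_prior - sq_err / (2 * noise_sd ** 2)

def metropolis(xs, ys, steps=10000, step_size=0.1, seed=1):
    """Random-walk Metropolis: propose a nearby point, accept with prob min(1, ratio)."""
    rng = random.Random(seed)
    current = (0.0, 0.0)
    current_lp = log_posterior(*current, xs, ys)
    samples = []
    for _ in range(steps):
        proposal = (current[0] + rng.gauss(0, step_size),
                    current[1] + rng.gauss(0, step_size))
        proposal_lp = log_posterior(*proposal, xs, ys)
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples[steps // 2:]          # discard the first half as burn-in

data_rng = random.Random(0)
xs = [i / 5 - 5 for i in range(51)]      # centred inputs from -5 to 5
ys = [2 + 3 * x + data_rng.gauss(0, 1) for x in xs]   # simulated training outputs

samples = metropolis(xs, ys)
intercepts = [s[0] for s in samples]
slopes = [s[1] for s in samples]
posterior_mean = (mean(intercepts), mean(slopes))
posterior_sd = (stdev(intercepts), stdev(slopes))     # spread = remaining uncertainty
print(posterior_mean, posterior_sd)
```

The output is a collection of samples rather than a single estimate: the means serve as point estimates, and the standard deviations summarise how uncertain each parameter remains.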
The first figure shows the approximations of the posterior distributions of the model
parameters. These are the result of 1000 steps of MCMC, meaning the algorithm drew
1000 samples from the posterior distribution.
If we compare the mean values for the slope and intercept to those
obtained from OLS (the intercept from OLS was -21.83 and the slope was 7.17),
we see that they are very similar. However, while we can use the
mean as a single point estimate, we also have a range of possible values for the model
parameters. As the number of data points increases, this range will shrink and converge
on a single value, representing greater confidence in the model parameters.
When we want to show the linear fit from a Bayesian model, rather than
showing only a single estimate, we can draw a range of lines, each one
representing a different estimate of the model parameters. As the number of data points
increases, the lines begin to overlap because there is less uncertainty in the model
parameters. To demonstrate the effect of the number of data points in the
model, two models were used: the first, with the resulting fits shown on the left, used 500
data points, and the one on the right used 15000 data points. Each graph shows 100
possible models drawn from the model parameter posteriors.
There is much more variation in the fits when using fewer data points, which
represents a greater uncertainty in the model. With all of the data points, the OLS and
Bayesian fits are nearly identical because the priors are washed out by the likelihoods
from the data.
When predicting the output for a single data point using our Bayesian linear
model, we again do not get a single value but a distribution. The following is the probability
density plot for the number of calories burned when exercising for 15.5 minutes. The red
vertical line indicates the point estimate from OLS.
We see that the probability of the number of calories burned peaks around 89.3,
but the full estimate is a range of possible values.
Lasso regression is one of the regression models available for analysing
data. LASSO stands for Least Absolute Shrinkage and Selection Operator. The name was
introduced by Robert Tibshirani in 1996. It is essentially an alternative to the classic
least squares estimate that avoids many of the problems of overfitting when there is a large
number of independent variables.
Lasso regression is one of the regularization methods that create parsimonious models
in the presence of a large number of features, where large means either of the below two things:
The lasso objective is as follows:
$$\text{Objective} = \text{LS Obj} + \lambda \sum_{j} |\beta_j|$$
where LS Obj stands for the least squares objective (the linear regression objective
without regularization) and λ is the tuning factor that controls the amount of regularization.
The bias increases with increasing λ, and the variance decreases
as the amount of shrinkage (λ) increases.
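the algorithm starts modelling complex relations to estimate the output and ends up
overfitting the particular data. Lasso regression adds a penalty factor proportional to the
sum of the absolute values of the coefficients.
The lasso regression estimate is defined as
$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j X_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\}$$
Here λ ≥ 0 controls the amount of shrinkage:
●● When λ = 0: We get the same coefficients as simple linear regression
●● When λ = ∞: All coefficients are zero
●● When 0 < λ < ∞: We get coefficients between 0 and those of simple linear
regression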
So, when λ lies between the two extremes, we are balancing the following two ideas:
●● Fitting a linear model of y on X
●● Shrinking the coefficients
The nature of the L1 regularization penalty causes some coefficients to be
shrunk to exactly zero. Hence, unlike ridge regression, lasso regression is able to
perform variable selection in the linear model. As the value of λ increases, more
coefficients are set to zero (that is, fewer variables are selected), and
more shrinkage is applied to the remaining nonzero coefficients. The working
example below illustrates this.
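How the L1 penalty sets coefficients exactly to zero can be sketched with coordinate descent, one common way of minimising the lasso objective. The code below is a minimal illustration, not a production solver: it assumes no intercept, uses a tiny hypothetical design matrix with orthogonal columns so the result can be checked by hand, and the function names are made up for this example.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; values inside the band become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso(rows, ys, lam, sweeps=100):
    """Minimise sum((y - x.beta)^2) + lam * sum(|beta_j|) by coordinate descent."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # rho: column j against the residual computed without coefficient j
            rho = sum(row[j] * (y - sum(row[k] * beta[k] for k in range(p) if k != j))
                      for row, y in zip(rows, ys))
            z = sum(row[j] ** 2 for row in rows)
            beta[j] = soft_threshold(rho, lam / 2) / z
    return beta

# Hypothetical data: y = 3*x1 + 0.5*x2, with orthogonal predictor columns.
rows = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
ys = [3.5, 2.5, -2.5, -3.5]

for lam in [0, 4, 30]:
    print(lam, lasso(rows, ys, lam))
# lam = 0 reproduces least squares; larger lam first removes the weak
# predictor x2, then shrinks x1 to zero as well.
```

The soft-threshold step is what distinguishes lasso from ridge: a coefficient whose contribution is smaller than the threshold is removed entirely rather than merely reduced.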
Working example:
When analysing the prostate-specific antigen and the clinical measures
among patients who were about to have their prostates removed,
ridge regression will give good results provided there are a good number of true
coefficients. But if only a few coefficients predict the results, lasso
regression is the better option, since lasso can perform better
than ridge when the true coefficients are few.
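Recall that mean squared error (MSE) is a metric we can use to measure the
accuracy of a given model, and that the expected test MSE can be decomposed as:
$$\text{MSE} = \text{Variance} + \text{Bias}^2 + \text{Irreducible error}$$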
The basic idea of lasso regression is to introduce a little bias so that the variance
can be substantially reduced, which leads to a lower overall MSE.
Notice that as λ increases, variance drops substantially with very little increase in
bias. Beyond a certain point, though, variance decreases less rapidly and the shrinkage
in the coefficients causes them to be significantly underestimated which results in a
large increase in bias.
We can see from the chart that the test MSE is lowest when we choose a value for
λ that produces an optimal trade-off between bias and variance.
When λ = 0, the penalty term in lasso regression has no effect and thus it produces
This means the model fit by lasso regression will produce smaller test errors than
the model fit by least squares regression.
Step 1: Calculate the correlation matrix and VIF values for the predictor
variables.
First, we should produce a correlation matrix and calculate the variance
inflation factor (VIF) value for each predictor variable.
If we detect high correlation between predictor variables and high VIF values
(some texts define a “high” VIF value as 5 while others use 10) then lasso regression is
likely appropriate to use.
However, if there is no multicollinearity present in the data, there may
be no need to perform lasso regression in the first place. Instead, we can
perform ordinary least squares regression.
Step 2: Fit the lasso regression model and choose a value for λ.
Once we determine that lasso regression is appropriate to use, we can fit the model
(using popular programming languages like R or Python) using the optimal value for λ.
To determine the optimal value for λ, we can fit several models using
different values for λ and choose the value that produces the lowest test MSE.
Step 3: Compare lasso regression to ridge regression and ordinary
least squares regression.
Lastly, we can compare our lasso regression model to a ridge regression
model and a least squares regression model to determine which model produces
the lowest test MSE by using k-fold cross-validation.
Depending on the relationship between the predictor variables and the response
variable, it is entirely possible for any one of these three models to outperform the others in
different scenarios.
the input consists of the k closest training examples in the data set, whereas the output
depends on whether k-NN is used for classification or regression.
For k-NN regression, the neighbours are taken from a set of objects for which the object
property value is known. A peculiarity of the k-NN algorithm is that it is sensitive to the
local structure of the data.
KNN Algorithm
●● Load the data
●● Initialize K to the chosen number of neighbours
●● For each example in the data, calculate the distance between the query example and
the current example, then add the distance and the index of the example to an ordered
collection
●● Sort the ordered collection of distances and indices from smallest to largest (in
ascending order) by the distances
●● Pick the first K entries from the sorted collection
●● Get the labels of the selected K entries
●● If regression, return the mean of the K labels
●● If classification, return the mode of the K labels
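The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than an optimised implementation: `knn_predict`, the Euclidean distance choice and the small data sets are assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(data, query, k, mode="regression"):
    """data: list of (features, label) pairs; query: a feature tuple."""
    # Distance from the query to every example, stored with the example's index.
    distances = [(math.dist(features, query), index)
                 for index, (features, _) in enumerate(data)]
    distances.sort()                                    # ascending by distance
    labels = [data[index][1] for _, index in distances[:k]]
    if mode == "regression":
        return sum(labels) / len(labels)                # mean of the K labels
    return Counter(labels).most_common(1)[0][0]         # mode of the K labels

# Hypothetical regression data: one feature, numeric label.
prices = [((1,), 10), ((2,), 20), ((3,), 30), ((10,), 100)]
print(knn_predict(prices, (2,), k=3))                   # mean of 10, 20, 30

# Hypothetical classification data: two features, class label.
points = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b"), ((1, 0), "a")]
print(knn_predict(points, (0.5, 0.5), k=3, mode="classification"))
```

Because every prediction scans the whole data set, the cost grows with the number of stored examples; the choice of K and of the distance function both affect the result.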
The goal is to find a model for predicting the values of Y from new X values. In theory,
the solution is simply a partition of the X space into k disjoint sets, A1, A2, ..., Ak, such
that the predicted value of Y is j if X belongs to Aj, for j = 1, 2, ..., k. If the X variables take
ordered values, two classical solutions are linear discriminant analysis and
nearest neighbour classification. These methods output sets Aj with piecewise boundaries.
Classification tree methods output rectangular sets Aj by recursively partitioning the data
set one X variable at a time, which makes them easy to interpret. For example, Figure 1 gives
an example wherein there are three classes and two X variables, together with the
corresponding decision tree structure. A key
advantage of the tree structure is its applicability to any number of variables, whereas
the plot on its left is limited to at most two.
[Figure 1: data points plotted against X1 and X2 with the partitions, and the corresponding decision tree]
The first published classification tree algorithm is THAID. Employing a measure
of node impurity based on the distribution of the observed Y values in the node,
THAID splits a node by exhaustively searching over all X and S for the split {X ∈ S} that
minimizes the total impurity of its two child nodes. If X takes ordered values, the set
S is an interval of the form (−∞, c]. Otherwise, S is a subset of the values taken by X.
The process is applied recursively on the data in each child node. Splitting stops if the
relative decrease in impurity is below a prespecified threshold. Algorithm 1 gives the
pseudocode for the basic steps.
●● For each X, find the set S that minimizes the sum of the node impurities in the
two child nodes, and choose the split {X* ∈ S*} that gives the minimum over all X
and S.
●● If a stopping criterion is reached, exit. Otherwise, apply the previous step to each child
node in turn.
●● Choose the variable X* associated with the X′ that has the smallest
significance probability.
●● Find the split set {X* ∈ S*} that minimizes the sum of Gini indexes and
use it to split the node into two child nodes.
●● If a stopping criterion is reached, exit. Otherwise, apply steps 2–5 to each
child node.
●● Prune the tree with the CART method
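The exhaustive search for a split on one ordered variable can be sketched as follows. This is an illustration under assumptions, not a reproduction of THAID or CART: it uses the Gini index as the impurity measure, weights each child's impurity by its share of the observations, and the function names and data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini index of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(xs, labels):
    """Search all thresholds c for the split x <= c with the lowest weighted child impurity."""
    best = None
    for c in sorted(set(xs))[:-1]:      # the largest value would leave the right child empty
        left = [lab for x, lab in zip(xs, labels) if x <= c]
        right = [lab for x, lab in zip(xs, labels) if x > c]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(xs)
        if best is None or impurity < best[1]:
            best = (c, impurity)
    return best                         # (threshold, impurity), or None if no split exists

xs = [1, 2, 3, 10, 11, 12]
labels = ["a", "a", "a", "b", "b", "b"]
print(best_split(xs, labels))           # the split x <= 3 separates the classes perfectly
```

A full tree would repeat this search over every X variable, split the node on the best one, and then recurse on each child until a stopping criterion is met.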
A regression tree is similar to a classification tree, except that the Y variable takes
ordered values and a regression model is fitted to each node to give the predicted
values of Y. Historically, the first regression tree algorithm is AID [36], which appeared
several years before THAID. The AID and CART regression tree methods follow
Algorithm 1, with the node impurity being the sum of squared deviations about the
mean and the node prediction the sample mean of Y. This yields piecewise constant
models. Although they are simple to interpret, the prediction accuracy of these models
often lags behind that of models with more smoothness. It can be computationally
impracticable, however, to extend this approach to piecewise linear models, because
two linear models (one for each child node) must be fitted for every candidate split. A
regression tree algorithm by Quinlan [22] uses a more computationally efficient strategy
to construct piecewise linear models. It first constructs a piecewise constant tree and
then fits a linear regression model to the data in each leaf node. Because the tree
structure is the same as that of a piecewise constant model, the resulting trees tend to
be larger than those from other piecewise linear tree methods.
Because the total model complexity is shared between the tree structure and
the set of node models, the complexity of a tree structure often decreases as the
complexity of the node models increases. Therefore, the user can choose a model
by trading off tree structure complexity against node model complexity. Piece-wise
constant models are mainly used for the insights their tree structures provide. But
they tend to have low prediction accuracy, unless the data are sufficiently informative
and plentiful to yield a tree with many nodes. The trouble is that the larger the tree,
the harder it is to derive insight from it. Trees (a) and (e) are quite large, but because
they split almost exclusively on ht, we can infer from the predicted values in the leaf
nodes that FEV increases monotonically with ht. The piece-wise simple linear (b) and
quadratic (c) models reduce tree complexity without much loss (if any) of interpretability.
Instead of splitting the nodes, ht now serves exclusively as the predictor variable in
each node. This suggests that ht has strong linear and possibly quadratic effects. On
the other hand, the splits on age and sex point to interactions between them and ht.
These interactions can be interpreted with the help of Figures 6 and 7, which plot the
data values of FEV and ht and the fitted regression functions, with a different symbol
and colour for each node.
●● Clustering:
Clustering is the task of dividing the population or data points into a number
of groups such that data points in the same group are more similar to other data points
in the same group than to those in other groups. In simple words, the aim is to
segregate groups with similar traits and assign them to clusters.
Let us understand this with an example. Suppose you are the head of
a rental store and wish to understand the preferences of your customers in order to scale up your
business. Is it possible for you to look at the details of every customer and devise a unique
business strategy for each one of them? Definitely not. What you can do instead is
cluster all of your customers into, say, ten groups based on their purchasing habits and
use a separate strategy for the customers in each of these ten groups. This is
what we call clustering.
Now that we understand what clustering is, let us take a look at the
types of clustering.
Types of Clustering
Broadly speaking, clustering can be divided into two subgroups:
Since the task of clustering is subjective, there are many means of achieving
this goal. Every methodology follows a different set of rules for defining the
‘similarity’ among data points. In fact, more than 100 clustering algorithms are known,
but only a few of them are widely used. Let us look at them in detail:
●● Connectivity models: As the name suggests, these models are based on the
notion that data points closer together in data space exhibit more similarity to each other
than data points lying farther away. These models can follow two approaches. In the
first approach, they begin by classifying all data points into separate clusters and then
aggregate them as the distance decreases.
In the second approach, all data points are classified as one cluster and then
partitioned as the distance increases. The choice of distance function is
subjective. These models are very easy to interpret but lack scalability for handling
big data sets. Examples of these models are the hierarchical clustering algorithm and its
variants.
●● Centroid models: These are iterative clustering algorithms in which the
notion of similarity is derived from the closeness of a data point to the centroid of the
clusters. The K-Means clustering algorithm is a popular algorithm that falls into
this category. In these models, the number of clusters required at the end has to
be specified beforehand, which makes it important to have prior knowledge of
the data set. These models run iteratively to find a local optimum.
●● Distribution models: These clustering models are based on the notion of how
probable it is that all data points in the cluster belong to the same distribution
(for example, the normal, or Gaussian, distribution). These models often suffer from
overfitting. A popular example of these models is the Expectation-maximization algorithm,
which uses multivariate normal distributions.
●● Density models: These models search the data space for areas of varied
density of data points. They isolate the different density regions and
assign the data points within each region to the same cluster. Popular examples
of density models are DBSCAN and OPTICS.
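As an illustration of a centroid model, the sketch below implements the basic K-Means loop: assign each point to its nearest centroid, recompute each centroid as the mean of its points, and repeat until nothing changes. The function name, the starting centroids and the data are hypothetical, and the number of clusters is fixed in advance by the number of starting centroids.

```python
import math

def k_means(points, centroids, iterations=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean; an empty cluster keeps its old centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers described by two purchasing measures.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, [(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

Different starting centroids can lead to different final clusters, which is why the text describes the result as a local optimum.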
Summary
●● What is regression
●● Discuss the different kinds of regression.
●● Concept of Bayesian analysis
●● Concept of clustering
Questions
1. Discuss the different kinds of regression.
2. Define Bayesian analysis.
3. What is lasso regression?
4. What is the difference between logistic regression and a classification tree?
Unit Objectives:
At the end of this unit, participants will be able to learn:
●● Active Learning.
data sets consisting of input data without labelled responses. In different pattern
recognition problems, the training data consists of a set of input vectors x without any
corresponding target values. The goal of unsupervised learning problems may be to discover
groups of similar examples within the data (clustering), or to determine how the data is
distributed in the space (density estimation).
There are several issues such as:
●● We may want to use clustering to gain some insight into the structure of the data
before designing a classifier.
learning because there are no answer labels available and hence there is no correct
measure of accuracy available to check the result.
●● Non-parametric Unsupervised Learning
In non-parametric unsupervised learning, the data is grouped into clusters,
where each cluster indicates something about the categories and classes present in the
data. This method is commonly used to model and analyse data with small sample sizes.
Non-parametric models do not require the modeller to make any assumptions about
the distribution of the population, which is why they are sometimes called distribution-free
methods.
Planning, design, and analysis are the three main components of creating data
through designed experiments.
Some of the main considerations in this process are as follows:
●● What is the question you want answered?
●● What is the population in question?
●● What are the dependent and independent variables?
Analysis
When conducting an experiment, there are three main characteristics to consider. These
three aspects of an experiment allow us to assess our population’s variability.
●● Randomization
●● Replication
●● Blocking
Randomization
The purpose of randomization is to make sure that if there is variation in
outcomes associated with outside factors, it is distributed evenly across treatment
groups.
Replication
When conducting an experiment, we need to consider the variability of
outcomes. For instance, if I were to run a given experiment only once,
I might be looking at an outcome that occurred due to random chance. To
understand the broad spectrum of possible outcomes, it is vital that we
replicate the experiment accordingly.
Statistical Power
it is the likelihood that the result would not be due to random chance. Best practice is 80%
statistical power.
To simplify this even further: if our hypothesis is in fact correct, what is the
likelihood that we did not get that outcome just due to random chance?
Blocking
Blocking is used to help control variability by making treatment groups
more alike. Within a given group, we might see that differences are
minimal, whereas across groups they could be much larger. One example of
this might be blocking an experiment by gender.
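Randomization and blocking can be combined: units are first grouped into blocks, and treatments are then assigned at random within each block. The sketch below illustrates this idea under assumptions; the participant list, the treatment names and the function name are hypothetical.

```python
import random
from collections import Counter

def randomized_block_assignment(units, block_of, treatments, seed=0):
    """Shuffle units within each block, then deal treatments out in turn."""
    rng = random.Random(seed)
    blocks = {}
    for unit in units:
        blocks.setdefault(block_of(unit), []).append(unit)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)                          # randomization within the block
        for position, unit in enumerate(members):
            assignment[unit] = treatments[position % len(treatments)]
    return assignment

# Hypothetical participants, blocked by gender.
participants = [("p1", "F"), ("p2", "F"), ("p3", "F"), ("p4", "F"),
                ("p5", "M"), ("p6", "M"), ("p7", "M"), ("p8", "M")]
assignment = randomized_block_assignment(
    participants, lambda unit: unit[1], ["control", "treatment"])

for gender in "FM":
    counts = Counter(t for unit, t in assignment.items() if unit[1] == gender)
    print(gender, dict(counts))      # each block receives both treatments equally
```

Because every block contains each treatment in equal numbers, differences between blocks cannot be mistaken for a treatment effect.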
●● Randomized Complete Block Design (RCBD) experiment
T-test
After collecting data from our experiment, one quick and easy test
of statistical significance we might run is called a t-test.
Consider your hypothesis or central research question:
NULL hypothesis — let’s keep this simple, the null hypothesis is pretty much
when you’re wrong. For the mtcars dataset, the null hypothesis might be
something like a vehicle horse power has no effect on miles per gallon.
U
●● Alternative hypothesis — conversely the alternative hypothesis means that
there was a difference. If we are able to confirm with statistical significance the
impact of the independent variable on the dependent variable, we’d say that
we reject the null hypothesis and choose the alternative hypothesis.
●● Is this a one- or two-sided test?
ity
●● One sided test — when you are testing whether a given variable is greater
than another then it’s a one-sided test; if you’re testing whether it’s less than
another… still one-sided.
●● Two-sided test — when you are testing that a given variable is not equal to
another, then that is two sided ‘> or <’ than in a single test.
m
correct, statistical significance is effectively knowing that it’s not likely due to
random chance.
●● The standard here is 95% confidence, or a less than or equal to likelihood of
5%.
●● What is statistical power?
●● Similar to statistical significance; given that the alternative hypothesis is true,
(c
power represents the likelihood that the null hypothesis will be rejected.
●● The standard for power is 80%.
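To make the t-test concrete, here is a small sketch that computes the test statistic for two independent groups. It uses Welch's version of the t-test, which does not assume equal variances; that choice, the function name and the miles-per-gallon readings are assumptions made for illustration and are not taken from the mtcars dataset.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for Welch's two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # squared standard error contributed by each group
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical miles-per-gallon readings for low- and high-horsepower cars
low_hp = [30.4, 27.3, 32.4, 33.9, 26.0, 24.4]
high_hp = [15.8, 14.3, 13.3, 15.0, 19.7, 10.4]
t_stat, dof = welch_t(low_hp, high_hp)
```

The resulting t value is then compared with the critical value of the t distribution for the computed degrees of freedom. For a two-sided test at the 5% level, the null hypothesis is rejected when the absolute value of t exceeds that critical value.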
Sample Size
For a given experiment, one issue to consider is the sample size. Arriving
at the required number depends on a handful of other variables, including the
targeted statistical power and significance level.
Another measure is effect size. Effect size represents the difference
between the means of two groups divided by the standard deviation of both groups
combined.
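The definition above corresponds to the measure commonly called Cohen's d. The sketch below assumes the pooled standard deviation is used as the combined standard deviation; the function name and the two small data sets are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical scores for a control group and a treated group
control = [10, 12, 14, 16, 18]
treated = [13, 15, 17, 19, 21]
d = cohens_d(treated, control)  # means differ by 3, pooled variance is 10
```

Here the means differ by 3 and the pooled standard deviation is the square root of 10, so d is a little under 1, which is conventionally read as a large effect.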
The greater the distance between groups, the smaller the sample needed to validate it. The smaller the
difference, the greater the likelihood that the observed distance is only due to chance.
To calculate any of these values, including effect size, statistical power,
and p-value, load the R package pwr and use pwr.anova.test to solve for whichever
variable is left unspecified.
k = number of groups
f = effect size
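The relationship between effect size, significance, power and sample size can also be sketched without a statistics package. The example below is not a reimplementation of pwr.anova.test: it assumes the simpler case of comparing two group means with a two-sided test, and it uses the standard normal approximation n = 2(z_{1-α/2} + z_{power})² / d² per group. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-group comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # cut-off for significance
    z_power = NormalDist().inv_cdf(power)          # cut-off for power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


n_medium = sample_size_per_group(0.5)  # medium effect -> 63 per group
n_large = sample_size_per_group(0.8)   # large effect  -> 25 per group
```

The two results show the trade-off described above: the larger the difference between groups, the fewer observations are needed to detect it at 95% confidence and 80% power.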
“Play the Prince song 1999” or “Play River by Joni Mitchell”. In each text, labels
are attached to particular words: Song Name for “1999” and “River”, for example,
and Artist Name for Prince and Joni Mitchell. By analyzing annotated data, the
system learns to classify unannotated data on its own.
At the annual meeting of the North American Chapter of the Association for
annotated data (in which the labels were suppressed to simulate unannotated data), we
conducted a smaller trial with unlabeled data and human annotators and found that our
results held, with improvements of 4% to 9% relative to the baseline machine learning
models.
must be efficient. The classical way to select examples is to use a simple linear
classifier, which assigns every word in a sentence a weight. The sum of the weights
yields a score, and a score greater than zero indicates that the sentence belongs to a
particular category.
the category music, it would probably assign the word “play” a positive weight, because
music requests frequently begin with the word “play”. But it might assign the word
“video” a negative weight, because that’s a word that frequently denotes the customer’s
desire to play a video, and the video category is distinct from the music category.
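The scoring rule of such a linear classifier can be sketched in a few lines. The weights below are invented for illustration and are not learned from any data; the function name and the bias parameter are likewise assumptions.

```python
def score_sentence(sentence, weights, bias=0.0):
    """Sum the weights of the words in a sentence; unknown words count as 0."""
    return bias + sum(weights.get(word, 0.0) for word in sentence.lower().split())


# Hypothetical weights for the "music" category
music_weights = {"play": 1.5, "song": 1.0, "video": -2.0}

music_score = score_sentence("Play the Prince song 1999", music_weights)  # 2.5
video_score = score_sentence("Play the video", music_weights)             # -0.5
```

The first request scores above zero and is assigned to the music category, while the negative weight on “video” pulls the second request below zero.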
Such weights are learned from training examples. During training, the linear
classifier is optimized using a loss function, which measures the distance between its
performance and perfect classification of the training data.
Typically, in active learning, examples are selected for annotation if they receive
scores close to zero — whether positive or negative — which means that they are near
the decision boundary of the linear classifier. The hypothesis is that hard-to-classify
examples are the ones that a model will profit from most.
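This selection rule amounts to sorting examples by the absolute value of their score. A minimal sketch follows, in which the example requests, their scores and the annotation budget are all hypothetical.

```python
def select_for_annotation(scored_examples, budget):
    """Pick the `budget` examples whose scores lie closest to zero."""
    ranked = sorted(scored_examples, key=lambda item: abs(item[1]))
    return [text for text, _ in ranked[:budget]]


# Hypothetical (request, classifier score) pairs
scored = [
    ("play river", 2.1),
    ("play the video", -0.2),
    ("river levels today", 0.4),
    ("show me videos", -1.8),
]
chosen = select_for_annotation(scored, budget=2)
# chosen == ["play the video", "river levels today"]
```

The two confidently classified requests are skipped, and the annotation budget is spent on the two that sit nearest the decision boundary.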
A graph showing how different loss functions (black lines) divide training data in
different ways. Easily classified examples (red and green X’s) are less informative than
examples that fall closer to classification boundaries (grey X’s).
dissent. To select the most informative examples from that pool, we experimented with
several different re-ranking strategies.
If the CRF easily classifies the words of a request, the score increases; if the CRF
struggles, the score decreases. (Again, low-scoring requests are preferentially selected
for annotation.) Adding the CRF classifier does not significantly reduce the efficiency of
the algorithm because we execute the re-ranking only on examples where the majority
of models agreed.
For re-ranking, we add the committee scores and then take the absolute value of
the sum. This permits individual models on the committee to provide high-confidence
classifications, so long as strong positive scores are offset by strong negative scores.
The committee approaches reported in the literature enforced dissent among the
models; interestingly, using the criterion of majority scores greater than zero yielded
better results, even without the CRF. With the CRF, however, the error rate shrank by
an additional 1% to 2%.
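The re-ranking step described above can be sketched as follows. This is a simplified reading, not the authors' implementation: it keeps only examples on which a majority of committee models give a score greater than zero, ranks them by the absolute value of the summed scores, and leaves out the CRF component entirely. The requests and scores are hypothetical.

```python
def committee_rerank(candidates, budget):
    """candidates maps each request to the list of committee scores it received."""
    # keep requests on which a majority of the models score above zero
    pool = {
        text: scores
        for text, scores in candidates.items()
        if sum(s > 0 for s in scores) > len(scores) / 2
    }
    # low |sum of scores| means strong scores cancel out or all scores are weak
    ranked = sorted(pool, key=lambda text: abs(sum(pool[text])))
    return ranked[:budget]


# Hypothetical scores from a three-model committee
candidates = {
    "play river": [2.0, 1.8, 2.2],
    "play the stream": [0.9, 0.7, -1.5],
    "river cruise": [0.4, -0.3, 0.2],
    "show videos": [-1.0, -2.0, 0.5],
}
to_annotate = committee_rerank(candidates, budget=2)
# to_annotate == ["play the stream", "river cruise"]
```

“play the stream” ranks first because one strongly negative score offsets two positive ones, which is the kind of disagreement the absolute value of the sum is meant to surface.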
Ref: https://www.amazon.science/blog/active-learning-algorithmically-selecting-training-data-to-improve-alexas-natural-language-understanding
Typical RL Scenario
Here are some important terms used in reinforcement learning:
●● State (s): State refers to the current situation returned by the environment.
●● Policy (π): It is the strategy applied by the agent to decide the next action
based on the current state.
●● Q value or action value (Q): Q value is quite similar to value. The only
difference between the two is that it takes the current action as an additional
parameter.
Consider the scenario of teaching new tricks to a cat:
●● As the cat doesn’t understand English or any other human language, we can’t tell
her directly what to do. Instead, we follow a different strategy.
●● We emulate a situation, and the cat tries to respond in many different ways. If
the cat’s response is the desired one, we give her fish.
●● Now whenever the cat is exposed to the same situation, she executes a
similar action even more enthusiastically, in expectation of getting more
reward (food).
●● In this way the cat learns “what to do” from positive experiences.
●● At the same time, the cat also learns what not to do when faced with negative
experiences.
In this case,
●● Our cat is an agent that is exposed to the environment. In this case,
it is our house. An example of a state might be our cat sitting, and that we
Value-Based
In a value-based reinforcement learning method, we try to maximize a
value function V(s). In this method, the agent expects a long-term return from the
current state under policy π.
Policy-based
In a policy-based RL method, we try to come up with a policy such that the action
performed in every state helps us gain the maximum reward in the future.
●● Deterministic: For any state, the same action is produced by the policy π.
●● Stochastic: Every action has a certain probability, which is determined by
the following equation. Stochastic policy:
$\pi(a \mid s) = P[A_t = a \mid S_t = s]$
Model-Based
In this Reinforcement Learning method, we wish to create a virtual model for each
Positive:
It is defined as an event that occurs because of a specific behaviour. It
increases the strength and the frequency of the behaviour and impacts positively
Negative:
There are two important learning models in reinforcement learning:
●● Set of actions: A
●● Set of states: S
●● Reward: R
●● Policy: π
●● Value: V
The mathematical approach for mapping a solution in reinforcement learning is
known as a Markov Decision Process (MDP).
Q-Learning
●● There are five rooms inside a building, which are connected by doors.
●● Each room is numbered 0 to 4.
●● The outside of the building can be thought of as one big outside area (5).
●● The doors of rooms 1 and 4 lead into the building from room 5.
●● Doors which lead directly to the goal have a reward of 100.
●● Doors which are not directly connected to the target room give zero
reward.
●● As doors are two-way, two arrows are assigned for every room.
●● Every arrow in the above image carries an immediate reward value.
Explanation
In this image, each room represents a state.
In the following image, a state is described as a node, while the arrows show the
action.
For example, an agent traverses from room number 2 to 5.
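The rooms example can be turned into a short program. Several parts of the sketch below are assumptions rather than facts stated in the text: only the doors from rooms 1 and 4 to the outside area 5 are given above, so the remaining door layout is a guess at the missing figure, and the discount factor of 0.8 and the self-loop at the goal are also assumed. For simplicity the Q update, Q(s, a) = R + γ · max Q(s', a'), is applied in repeated sweeps over all doors instead of through randomly explored episodes.

```python
GOAL = 5
GAMMA = 0.8  # discount factor (assumed value)

# Assumed door layout: room -> rooms reachable through one door.
# Only the doors 1-5 and 4-5 are stated in the text above.
DOORS = {0: [4], 1: [3, 5], 2: [3], 3: [1, 2, 4], 4: [0, 3, 5], 5: [1, 4, 5]}


def reward(next_room):
    """Doors leading directly to the goal pay 100, all others pay 0."""
    return 100 if next_room == GOAL else 0


def learn_q(sweeps=50):
    """Apply the Q update to every (room, door) pair, `sweeps` times over."""
    q = {(s, a): 0.0 for s, exits in DOORS.items() for a in exits}
    for _ in range(sweeps):
        for (s, a) in q:
            best_next = max(q[(a, a2)] for a2 in DOORS[a])
            q[(s, a)] = reward(a) + GAMMA * best_next
    return q


def greedy_path(q, start):
    """Follow the highest-valued door from each room until the goal."""
    path = [start]
    while path[-1] != GOAL:
        room = path[-1]
        path.append(max(DOORS[room], key=lambda a: q[(room, a)]))
    return path


q_table = learn_q()
route = greedy_path(q_table, start=2)
```

Under these assumptions, following the largest Q value from room 2 takes the agent to the outside area in three moves, passing through room 3.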
Summary
●● Define Learning
●● Discuss unsupervised Learning.
●● Creating Data through several experiments.
●● Reinforcement Learning
Exercises:
1. Regression coefficient is independent of change of .......................
a) origin
b) Subject
c) Data
d) None of the above
a) simple linear
b) complex linear
c) non linear
d) none of the above
3. ……………..analysis is based on the statistical principle of multivariate statistics,
which involves observation and analysis of more than one statistical variable at a
time
a) Multivariate
b) Simple
c) Both a and b
d) None of the above
4. In discriminant analysis, .......................groups are compared.
a) Three
b) two or more
c) one
d) none of the above
5. If the discriminant analysis involves two groups, there are .......................centroids
a) Four
b) Three
c) Two
d) one
6. .......................analysis is concerned with the measurement of the joint effect of two
or more attributes.
a) Simple
b) Conjoint
c) Complex
c) Item
d) None of the above
8. The .......................is a part-worth or utility for each level of each attribute
a) Output
b) Input
c) Both a and b
d) None of the above
9. When the objective is to summarise information from a large set of variables into
fewer factors, ....................... analysis is used.
a) principal component factor
b) sub factors
c) data
d) none of the above
10. Correspondence analysis is a .......................technique.
a) Descriptive/Exploratory
b) Precise/gist
c) both a or b
d) None of the above
11. In a typical correspondence analysis, a cross-tabulation table of frequencies is first
.......................
a) standardized
b) rationalized
c) subdued
a) marketing
b) selling
c) buying
d) none of the above
14. An advantage of the non-metric models is that they permit the researcher to
b) identify, scrutinize
c) collect, analyze
d) none of the above
15. The spatial display of data provided by MDS is also sometimes referred to as
………………..
a) perceptual mapping
b) conceptual mapping
c) geographical mapping
d) none of the above
Answers:
1. origin
2. simple linear
3. Multivariate
4. two or more
5. two
6. Conjoint
7. attributes
8. output
9. principal component factor
10. descriptive/exploratory
11. standardized
12. Cluster
13. marketing
14. categorize, examine
Key Learning outcomes:
2. Analyse literature review writing
3. Analyse report writing
Structure
Unit 5.1 : Prewriting Considerations
5.1.1 Topic
5.1.2 Audience
5.1.3 Purpose
5.3.7 Footnotes
5.3.8 Key Considerations/factors
Unit Outcomes
●● How to get attention of audience.
●● Why Choose the topics.
5.1.1 : Topic
Prewriting is the first stage of the writing process, and it requires the writer to think
about three main factors: topic, audience, and purpose.
A student may be faced with one of two types of topics: assigned topics or topics
chosen by the student. If the topic has been assigned, the assignment instructions
will limit and determine the approach to take. Instructions must be read carefully and
followed to the letter. If the student is given free rein over the
topic, it is critical that they consider the value and significance of the final product.
A writer should choose something he is passionate about and knows a lot about,
but he should also think about the effect he wants to achieve and the reaction he
wants from the reader. Any topic can spark a lively debate if the following options are
considered: choosing an unusual topic or taking a fresh and unique approach to an old
one.
5.1.2 Audience
The question to consider is: What will the reader gain by reading this essay? The
goal will be to educate, entertain, or persuade. These goals are frequently combined in
a paper, with each goal serving as a function of the others.
The main goal of prewriting activities is to determine the paper’s focus. The point
of focus is where all of your energy is focused. The paper will be vague, superficial, and
5.1.3 Purpose
Consider the audience to see if the topic is narrow enough. If your audience lacks
specific knowledge of the subject, you may want to take a more general approach. Your
own knowledge of the subject also limits you. You can’t be precise about something you
don’t understand. Of course, research will provide you with the necessary information
on a subject.
After you’ve decided on a strategy, you can start gathering ideas. Remember that
you can always change your paper’s focus if you give yourself enough time to make
the necessary changes. If you’re having trouble narrowing down your topic, a prewriting
activity might help.
Summary:
●● What is Prewriting
●● Choosing a topic
●● The audience of the topics.
Questions:
●● Why is choosing a topic a vital issue in prewriting?
●● Who are the main audiences for different kinds of topics?
●● What is the purpose?
Unit Outcomes
●● What is the purpose
●● Structure and Writing style
●● Types of Literature Review
5.2.1 What is Literature Review?
Do you know that we, human beings, are the most intelligent living beings on
earth? Thanks to our stellar intelligence, we can utilise the knowledge that has been
preserved or accumulated over eons. Human knowledge comprises three equally
crucial phases - namely preservation, transmission, and advancement. Research helps
in advancement of knowledge so that an updated knowledge reservoir is created and
transmitted for the benefit of mankind.
Human beings build upon the recorded and accumulated knowledge of the past
and this constant endeavour of adding to the vast reservoir of knowledge in every
possible field makes advancement of human race possible. You, as a researcher, need
to ensure that considerable work has already been done on topics related to your field
of investigation. You are required to be familiar with all previous projects, research, and
theory related to the research problem you are dealing with. You need to conduct a
thorough review of research and theoretical literature to ensure such familiarity.
In this unit, you are going to study the meaning and importance of reviewing
literature, and identify the sources and steps of writing review of literature.
The term “review” means “to organise the knowledge of the specific research
area to create a knowledge pool so that your study adds on to and enriches the field
of research.” The term “literature” stands for “the knowledge of a specific area of
●● It provides the opportunity to show what research has already been done on
any given subject.
●● Review of Literature provides researchers with theories, ideas, explanations
evidence that solves the problem sufficiently this initiative avoids the
replication of research
●● Review of Literature serves as prominent sources for hypothesis - researchers
can formulate research hypotheses based on available studies
●● Review of Literature suggests data sources, methodology, and statistical
techniques apt for the solution of the research problem
●● Review of Literature helps researchers locate comparative data and findings
useful in the correct interpretation of results
Narrative literature review
Critiques and summarises the body of a work of literature. A narrative review can
also be used to draw conclusions about a topic and identify gaps or inconsistencies in a
body of knowledge. To conduct a narrative literature review, you must have a sufficiently
focused research question.
When you do a meta-analysis, you combine the results of multiple studies on the
same topic and analyse them using standardised statistical procedures. Patterns and
relationships are identified, and conclusions are drawn, in meta-analysis. Meta-analysis
is linked to a deductive research strategy.
the subject. Integrative literature review will be your only option if your research does
not involve primary data collection and analysis.
Theoretical literature review is concerned with a body of knowledge that has
accumulated in relation to a topic, concept, theory, or phenomenon. Theoretical
literature reviews are useful for determining what theories already exist, their
relationships, and the extent to which existing theories have been investigated, as well
as for developing new hypotheses to test.
5.2.4 Structure & Writing Style
Writing up the review
●● the methods we have used to find the papers
●● the papers we have read
●● the criteria used to analyze the papers
●● what we hoped to find in our review of the literature
●● what we found from reading the literature
●● what, if any, gaps in the literature we found
●● where our research question fits in
Style of writing
The style of writing we use is important and also needs to express:
●● Themes arising from the papers read, rather than a summary of each paper
●● Examples of where authors agree or disagree on particular points, ideas or
conclusions
●● Key theories being examined and how different authors are using or applying
the theories
●● Thoughts on the usefulness of the literature in response to your research
question
Literature review template
Here we discuss a simple pattern for writing up a systematic literature review.
This is a very simple outline, so be sure to discuss it with your supervisor to ensure that
their requirements are met and that specific elements of the literature review/research are
covered.
Jackson
RES 5000/6000
Consider the specific area of study. Think about the field of interests.
Talk to professor, brainstorm, and read lecture notes and recent issues of
periodicals in the field.
2. Search for literature
Define the source selection criteria (e.g., articles published within a specific date
range, focusing on a specific geographic region, or using a specific methodology).
Reference lists of recent articles and reviews can lead to other useful papers.
Include any studies contrary to your point of view.
Note the following:
Note: If each paragraph begins with a researcher’s name, it might indicate that,
instead of evaluating and comparing the research literature from an analytical point of
view, we have simply described what research has been done.
●● For example, look at the following two passages and note that Student A
merely describes the literature, whereas Student B takes a more analytical
and evaluative approach by comparing and contrasting. You can also see
that this evaluative approach is well signalled by linguistic markers indicating
Student B’s ability to synthesize knowledge.
Student A: San (2000) concludes that personal privacy in their living quarters is
the most important factor in nursing home residents’ perception of their autonomy. He
suggests that the physical environment in the more public spaces of the building did
not have much impact on their perceptions. Neither the layout of the building nor the
activities available seem to make much difference. Ram and Ramen make the claim
that the need to control one’s environment is a fundamental need of life (2001), and
suggest that the approach of most institutions, which is to provide total care, may be
as bad as no care at all. If people have no choices or think that they have none, they
become depressed.
Student B: After studying residents and staff from two intermediate care facilities
in Calgary, Alberta, San (2000) came to the conclusion that except for the amount of
personal privacy available to residents, the physical environment of these institutions
had minimal if any effect on their perceptions of control (autonomy). However, French
(1998) and Haroon (2000) found that availability of private areas is not the only aspect
of the physical environment that determines residents’ autonomy. Haroon interviewed
115 residents from 32 different nursing homes known to have different levels of
autonomy (2000). It was found that physical structures, such as standardized furniture,
heating that could not be individually regulated, and no possession of a house key
for residents limited their feelings of independence. Moreover, Hope (2002), who
interviewed 225 residents from various nursing homes, substantiates the claim that
characteristics of the institutional environment such as the extent of resources in the
facility, as well as its location, are features which residents have indicated as being of
great importance to their independence.
Make an outline of each section of the paper and decide whether you need to add
information, delete irrelevant information, or restructure sections.
Read your work out loud. This helps you identify where punctuation marks are needed to
signal pauses or divisions within sentences, where you have made grammatical errors, or
where the sentences are unclear.
Since the purpose of a literature review is to demonstrate that the writer is familiar
with the important professional literature on the chosen subject,
make certain that all of the citations and references are correct.
Text should be written in a clear and concise academic style; it should not be
descriptive in nature or use the language of everyday speech.
The following are ways to organize the literature review.
Chronological (by date): This is a common approach for topics that have been
talked about for a long time and have changed over their history. Organise it in stages of
how the topic has changed: the first definitions of it, then major time periods of change
as researchers talked about it, then how it is thought about today.
Broad-To-Specific: Another approach is to start with a section reviewing the
general type of issue, then narrow down to increasingly specific issues in the literature
until you reach the articles that are most specifically similar to the research question,
thesis statement, hypothesis, or proposal. This can be a good way to introduce a lot of
background and related facets of the selected topic when there is not much directly on topic
but we are tying together many related, broader articles.
Prominent Authors: If a certain researcher started a field, and there are several
famous people who developed it further, a good approach can be to group the famous
authors/researchers by what each is known to have said about the topic and then
organise other authors into groups according to which famous authors’ ideas they are following.
critically.
●● Our literature review helped us identify the various sub-topics.
●● We should describe our subject, show that we understand it, and explain what
research has been done and how it will affect our own research.
●● Our work should be fully referenced in order to avoid plagiarism.
●● We should use quotes from other authors as needed, but we should not rely on
them.
●● The introduction to a literature review should be written first. This is a brief
●● After the introduction, present the main body of our literature review, which will
make up the majority of our work.
●● Finally, present our conclusions, which should summarise our findings and,
hopefully, justify our choice of research topic.
●● A reference list and/or a bibliography are required. This is a comprehensive list of
all the sources we used and/or consulted.
Summary
●● Literature review
●● Different stages of Literature Review Development
●● Ways to organize Literature Review
●● Types of Literature Reviews
●● Writing Literature Review
Questions
1. How can you write a review?
2. Describe the different kinds of literature review.
3. How can you organise a literature review?
Unit Outcomes:
●● Types of report
●● APA style
●● Key factors.
5.3.1 Meaning of Research Report
A report is a very formal document that is written for a variety of purposes in the
sciences, social sciences, engineering and business disciplines. Generally, findings
pertaining to a given or specific task are written up into a report. It should be noted that
reports are considered to be legal documents in the workplace and, thus, they need to
be precise and accurate.
There are three features that, together, characterize report writing at a very basic
level:
●● a predefined structure,
●● independent sections
●● reaching unbiased conclusions.
Predefined structure: In bigger sense, these headings may indicate sections
within a report, such as an introduction, discussion, and conclusion.
twenty-page report is clearly lengthy. But where does the line of demarcation lie? Keep
in mind that as a report grows longer (or whatever length you decide), it begins to
resemble formal reports more.
Annual reports, monthly financial reports, and reports on employee absenteeism
are examples of informational reports that carry objective data from one part of
an organisation to another. Scientific research, feasibility reports, and real-estate
appraisals are examples of analytical reports that attempt to solve problems.
Proposal Report
The proposal is a problem-solving report with a twist. A proposal is a written
document that explains how one company can meet the needs of another. The
majority of government agencies use “requests for proposal,” or RFPs, to publicise
their requirements. The RFP identifies a requirement, and potential suppliers submit
proposal reports outlining how they will meet that requirement.
Vertical or Lateral Reports
The direction in which a report travels is classified in this way. Vertical reports
are reports that are more upward or downward in the hierarchy, and they help with
management control. Lateral reports, on the other hand, help with organisation
coordination. A lateral report is one that travels between units at the same
Internal reports circulate within the company. External reports, such as company
annual reports, are written for distribution outside of the organisation.
Periodic Reports
Periodic reports are sent out on a regular basis. They are usually upwardly directed
and serve to control management. The uniformity of periodic reports is aided by pre-
Functional Reports
Summary or Abstract
The abstract or summary informs the reader of the paper’s
main points and findings in a concise manner. This allows the reader to determine
whether or not the paper will be of interest to them. When looking for papers that are
relevant to your research, get into the habit of reading only the abstracts. Only read the
body of a paper if you believe it will be of use to you.
Introduction
The introduction informs the reader about the paper’s general topic, why it is
important, and what to expect in the body of the paper. Introductions should flow
from broad concepts to the paper’s specific topic. In some cases, introductions are
incorporated into literature reviews.
Review of Literature
The literature review informs the reader about what other researchers have
discovered about the topic of the paper or about other relevant research.
A literature review should influence how readers think about a topic by informing
them about what the academic community has to say about it and its related issues.
Often, what students refer to as a “research paper” is nothing more than a literature
review.
It states facts and ideas about the social world along the way, and it backs them up
with credit for where they came from. The literature review makes it clear that the author
is speculating if an idea cannot be substantiated by the community of scholars, and the
logic of the speculation is detailed. Information that isn’t relevant isn’t discussed.
The literature review has its own distinct personality. The information sources aren’t
heavily quoted or “copied and pasted.” Instead, the author rewrites facts and ideas in
his or her own words, citing the source of the information. Consider how you tell your
family about the exciting things you’ve learned in class... Consider how you talk about
sociology at cocktail parties. You claim things in your own words... You don’t copy and
paste or quote word for word.
Research Methodology:
This is the most important section of the report, as it contains all of the crucial
information. Readers can gain information about the topic while also evaluating the
quality of the content provided, and the research can be approved by other market
researchers. As a result, this section must be extremely informative, with each aspect
Research Results:
This section of the results will include a brief description of the findings as well as
the calculations used to achieve the goal. The exposition that follows data analysis is
usually done in the report’s discussion section.
Research Discussion:
In this section, the findings are discussed in great detail, as well as a comparison
of reports that may or may not exist in the same domain. In the discussion section, any
anomaly discovered during research will be discussed. When writing research reports,
the researcher must connect the dots to show how the findings can be applied in the
real world.
Finish by summarising all of the research findings and mentioning each and every
author, article, or other piece of content from which references were taken.
The American Psychological Association (APA) is a professional organisation
of psychologists. You’ll find answers to questions like “What is APA format?” in this
guide. For writing and organising your paper according to the standards of the
American Psychological Association, our APA citation page has instructions on how
to properly cite sources. Our guide was based on the official American Psychological
Association handbook, and we’ve included page numbers from it throughout. This page,
however, is not affiliated with the organisation.
If your paper is about science, you’ll almost certainly use the APA format. The
standards and guidelines of this organisation are used by many behavioural and social
sciences.
General Document Guidelines:
http://www.vanguard.edu/uploaded/research/apa_style_guide/apastyleessentials.pdf
Line Spacing: Double-space throughout the paper, including the title page
and references.
Spacing after Punctuation: Space once after commas, colons, and
semicolons within sentences. Insert two spaces after punctuation marks that end
sentences.
Alignment: Flush left (creating uneven right margin). Indent the first line of each
paragraph.
Pagination: The page number appears at the top right of every page. Title
page is page 1.
Running Head: The running head is a short title that appears at the top
left of the pages of a paper or published article. The running head should not exceed
50 characters, including punctuation and spacing. Using most word processors, the
running head and page number can be inserted into a header (in Microsoft word go to
Order of Pages:
Running Head: typed flush left (all uppercase) following “Running head:”
Centered on the page: Paper title, author, course name, professors name, date.
Body
The body of the paper begins on a new page. Subsections of the body of the
paper do not begin on new pages. The body of the paper includes:
Title: The title of the paper (in uppercase and lowercase letters) is centered at the
top of the first page of the body of the paper (so before the introduction paragraph) on
the first line below the running head.
Introduction: The introduction paragraph (which is not labelled with the word
Introduction) begins on the line following the paper title.
Headings: Headings may be used to help organize the paper. Main headings
would use Level 1 (centered, boldface), and subheadings would use Level 2 (flush left,
boldface).
Text citations: In-text citations should be included throughout the body of the
paper.
Reference Page: All sources included in the References section must be cited
in the body of the paper (and all sources cited in the paper must be included in the
References section).
Heading: The word References (centered on the first line below the running head)
Format: The references (with hanging indent - meaning the first line is not indented
but the following lines are) begin on the line following the References heading. Entries
are organized alphabetically by last names of first authors.
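The ordering and hanging-indent rules above can be sketched in a few lines of Python. This is only an illustration, not part of the APA guidelines: the function names, the 70-character line width and the 4-space indent are assumptions, and the two "Placeholder" entries are hypothetical references invented for the example. The sketch also assumes that each entry starts with the first author's surname followed by a comma.

```python
import textwrap


def first_author_surname(reference: str) -> str:
    """Return the first author's surname (assumed to be the text before the first comma)."""
    return reference.split(",", 1)[0].strip().lower()


def format_reference_list(references, width=70, indent=4):
    """Sort entries by first author's surname and apply a hanging indent.

    The first line of each entry is flush left; every following line of the
    same entry is indented by `indent` spaces.
    """
    ordered = sorted(references, key=first_author_surname)
    return [
        textwrap.fill(ref, width=width, subsequent_indent=" " * indent)
        for ref in ordered
    ]


if __name__ == "__main__":
    references = [
        # Hypothetical placeholder entry, for illustration only.
        "Zeta, A. (2001). Placeholder title. Placeholder Publisher.",
        # Entry taken from the examples in this unit.
        "Koenig, H. G. (1990). Research on religion and mental health in "
        "later life: A review and commentary. Journal of Geriatric "
        "Psychiatry, 23, 23-53.",
        # Hypothetical placeholder entry, for illustration only.
        "Alpha, B. (1999). Placeholder title. Placeholder Publisher.",
    ]
    print("References")
    for entry in format_reference_list(references):
        print(entry)
```

In a word processor the same effect is normally obtained with the paragraph setting for a hanging indent rather than with manual spaces; the sketch only makes the two rules explicit.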
In-Text citations:
Source material must be documented in the body of the paper by citing the
author(s) and date(s) of the sources. The underlying principle is that ideas and words
of others must be formally acknowledged. The reader can obtain the full source citation
from the list of references that follows the body of the paper.
●● When the names of the authors of a source are part of the formal structure
sentence.]
●● When the authors of a source are not part of the formal structure of the
sentence, both the authors and year of publication appear in parentheses.
Consider the following example: Reviews of research on religion and health
have concluded that at least some types of religious behaviours are related
to higher levels of physical and mental health (Gartner, Larson, & Allen, 1991;
Koenig, 1990; Levin & Vanderpool, 1991; Maton & Pargament, 1987; Paloma &
Pendleton, 1991; Payne, Bergin, Bielema, & Jenkins, 1991). [Note: & is used
when multiple authors are identified in parenthetical material. Note also that
when several sources are cited parenthetically, they are ordered alphabetically
by first authors’ surnames and separated by semicolons.]
●● When a source that has two authors is cited, both authors are included every
time the source is cited.
●● When a source that has three, four, or five authors is cited, all authors are
included the first time the source is cited (Payne, Bergin, Bielema, & Jenkins,
1991). When that source is cited again, the first author’s surname and “et al.”
are used: Payne et al. (1991) showed that ...
●● When a source that has six or more authors is cited, the first author’s
surname and “et al.” are used every time the source is cited (including the
first time).
●● Every effort should be made to cite only sources that you have actually read.
When it is necessary to cite a source that you have not read (“Grayson”
in the following example) that is cited in a source that you have read
(“Murzynski & Degelman” in the following example), use the following format for
the text citation and list only the source you have read in the References list:
Grayson (as cited in Murzynski & Degelman, 1996) identified four components
of body language that were related to judgments of vulnerability.
●● To cite a personal communication (including letters, emails, and telephone
interviews), include initials, surname, and as exact a date as possible.
Because a personal communication is not “recoverable” information, it is not
included in the References section. For the text citation, use the following
format: B. F. Skinner (personal communication, February 12, 1978) claimed...
●● To cite a Web document, use the author-date format. If no author is identified,
use the first few words of the title in place of the author. If no date is provided,
use “n.d.” in place of the date. Consider the following examples: Degelman
(2009) summarizes guidelines for the use of APA writing style. Changes
in Americans’ views of gender status differences have been documented
(Gender and Society, n.d.).
●● To cite the Bible, provide the book, chapter, and verse. The first time the Bible is cited in
the text, identify the version used. Consider the following example: “You are
forgiving and good, O Lord, abounding in love to all who call to you” (Psalm
86:5, New International Version). [Note: No entry in the References list is
and should be incorporated into the formal structure of the sentence. Consider the
following example:
Patients receiving prayer “required less diuretic and antibiotic therapy, had fewer
episodes of pneumonia, had fewer cardiac arrests, and were less frequently intubated
●● Authors: Authors are listed in the same order as specified in the source, using
each author’s last name and first initial. Commas separate all authors. If no
the closing parenthesis. If no publication date is identified, use n.d.
●● Source Reference: Includes title, journal, volume, pages (for journal article) or
title, city of publication, publisher (for book). Italicize titles of books, titles of
periodicals, and periodical volume numbers.
●● Electronic Retrieval Information: Electronic retrieval information may include
digital object identifiers (DOIs) or uniform resource locators (URLs). DOIs are
unique alphanumeric identifiers that lead users to digital source material. To
learn if an article has been assigned a DOI, go to
http://www.crossref.org/guestquery/.
Examples of sources (Note: On the reference page you do not write the type of
source as is done below in bold - it is just listed here to show you different types of
references. Also remember that on the reference page, references need to be double
spaced.)
Book:
Journal article:
Koenig, H. G. (1990). Research on religion and mental health in later life: A review
and commentary. Journal of Geriatric Psychiatry, 23, 23-53.
Journal article without DOI, retrieved online [Note: For articles retrieved from
databases, include the URL of the journal home page. Database information is not
needed. Do not include the date of retrieval.]
5.3.6 Citing & Referencing Sources
The Harvard system uses the author’s name and date of publication to
identify cited documents within the text. For example:
●● When referring generally to work by different authors on the subject, place the
authors in alphabetical order: (Baker, 1991; Lewis, 1991; Thornhill, 1993).
●● When referring to dual authors: (Saunders and Cooper, 1993).
●● When there are more than two authors: (Bryce et al., 1991).
●● For corporate authors, for instance a company report: (Hanson Trust Plc.,
1990).
●● For publications with no obvious author; for example, an employment gazette:
(Employment Gazette, 1993).
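The Harvard in-text patterns listed above can be expressed as a small Python sketch. It is a minimal illustration only: the function names and the way a source is represented (a list of author names plus a year) are assumptions made for the example, and the co-author names passed for the "Bryce et al." case are hypothetical placeholders, since the text does not name them.

```python
def harvard_citation(authors, year):
    """Format one author-date citation without the surrounding parentheses.

    `authors` is a list of surnames; a corporate author or a publication
    title with no obvious author is passed as a single-item list.
    """
    if len(authors) == 1:
        names = authors[0]
    elif len(authors) == 2:
        names = f"{authors[0]} and {authors[1]}"
    else:
        # More than two authors: first author followed by "et al."
        names = f"{authors[0]} et al."
    return f"{names}, {year}"


def harvard_group(sources):
    """Combine several sources in one parenthetical citation.

    Sources are placed in alphabetical order of first author and
    separated by semicolons.
    """
    ordered = sorted(sources, key=lambda source: source[0][0].lower())
    return "(" + "; ".join(harvard_citation(a, y) for a, y in ordered) + ")"


if __name__ == "__main__":
    # General reference to work by different authors
    print(harvard_group([(["Thornhill"], 1993), (["Baker"], 1991), (["Lewis"], 1991)]))
    # Dual authors
    print(harvard_group([(["Saunders", "Cooper"], 1993)]))
    # More than two authors (co-author names are hypothetical placeholders)
    print(harvard_group([(["Bryce", "Coauthor A", "Coauthor B"], 1991)]))
    # Corporate author
    print(harvard_group([(["Hanson Trust Plc."], 1990)]))
```

Run as a script, the four calls print the same citations as the first four examples in the list above.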
Referencing in the Text
When using footnotes, a number shows references within the research report.
For example: ‘Recent research indicates that…’ This number refers directly to the
references.
These list the referenced publications sequentially in the order they are referred
to in our research report. This can be useful as it enables us to include comments and
footnotes as well as references.
●● The layout of individual references in the bibliography is the same as that for
Abbreviation: Explanation
Op. cit. (opere citato): Meaning “in the work cited”. This refers to a work previously
referenced, so you must give the author and date and, if necessary, the page number,
like: Robson (1993) op. cit. pp. 23-4.
Loc. cit. (loco citato): Meaning “in the place cited”. This refers to the same page of a
work previously referenced, so you must give the author and date, like:
Robson (1993) loc. cit.
Ibid. (ibidem): Meaning “the same work given immediately before”. This refers to
the work referenced immediately before and replaces all details
5.3.7 Footnotes
Researchers must insert footnotes in the appropriate places. These fulfil two
purposes:
explanation of a point of view. The recent trend is to avoid footnotes. Some
people feel that they enhance display of the scholarship of the researchers.
But it is neither an end nor a means of displaying scholarship.
5.3.8 Key Considerations/factors
Using proper report form (for longer reports): the “form” filter
●● Report cover
●● Title page (Includes: title of report, for whom the report is prepared, by whom
it is prepared, release date. If the title does not contain the recommendation,
it normally indicates what problem the report tries to solve: Ways to Market
Communication Consulting Services)
●● Table of Contents (List headings exactly as they appear in the body of the
report, along with page numbers.)
●● List of Illustrations (Tables are numbered independently from figures (pie
charts, bar charts, drawings, etc.))
●● Executive Summary (A good summary can be understood by itself.
It summarizes the recommendation of the report, reasons for the
investigate the quality of the radios? The advertising campaign? The cost
of manufacturing? The demand for radios?). Depending on the situation,
may also have: Limitations (problems or factors that limit the validity of your
recommendations), Assumptions (statements whose truth you assume and
which you use to prove your final point), Methods (an explanation of how
you gathered your data), Criteria (factors you used to weigh in the decision),
Definitions (if you have terms to define). Background/History of the Problem
(Serves as a record for later readers of the report. For most of your cases,
this will not be necessary. However, in business reports, this is often a useful
component of a longer, formal report.)
●● Body (Presents and interprets information in words and visuals. Analyzes
●● Conclusions (Summarizes main points of report. The most widely read part
of reports. No new information should be included in the Conclusions.
Conclusions are usually presented in paragraphs, but you could also use a
numbered or bulleted list.)
●● Recommendations (Recommends actions to solve the problem. May be
combined with Conclusions, or may be put at the beginning of the body rather than
at the end (for direct order). Number the recommendations to make it easy
for people to discuss them. If they seem difficult or controversial, give a brief
paragraph of rationale after each recommendation. The recommendations will
also be in the Executive Summary.)
●● References (Document sources cited in the report. Use appropriate form for
citations)
●● Appendixes (Provide additional materials that the reader may want: copies of
questionnaires, interviews, computer printouts, previous reports, etc. Number
and title them, for example, Appendix A: Copy of Survey; Appendix B: Sample
Breakfast Menu Board, etc.)
Introduction: Explain clearly the decision problem and research objective. The
background information should be provided on the product and services provided by the
Limitations: Every report will have some limitations, such as time, geographical
area, the methodology adopted, correctness of the responses, etc.
Analysis and interpretations: Collected data will be tabulated. Statistical tools, if any,
will be applied to analyse the data and to take decisions.
Summary
●● Types of Report
●● Discussion of Correlation & Regression
●● Meaning of Research Report
●● The usefulness of APA style essentials
●● Key Considerations/factors
Questions
1. How many types of report are there? Discuss them.
2. What are the key factors in report writing?
3. Discuss the key features of report writing.
Exercises:
1. The research report will differ based on the ………… of the particular managers using
the report.
a) need
b) position
c) designation
d) none of the above
2. Accuracy refers to the degree to which information reflects……………..
a) reality
b) light
c) unreality
d) none of the above
3. Availability refers to the communication process between researcher and
the………………..
a) decision maker
b) trainees
c) other researchers
d) none of the above
4. …………….refers to the time span between completion of the research project and
presentation of the research report to management
a) Currency
b) custom
c) taxation
5. …………………is regarded as a major component of the research study
a) Research report
b) final report
c) formal report
d) none of the above
6. Writing of report is the ………..step in a research study and requires a set of skills
somewhat different from those called for in respect of the former stages of research.
a) final
b) semifinal
c) primary
d) none of the above
7. ………………means bringing out the meaning of data.
a) Interpretation
b) translation
c) transformation
d) none of the above
8. Successful interpretation depends on how well the data is……………...
a) analysed
b) collected
c) interpreted
d) none of the above
9. In the ………………method, one starts from observed data and then generalisation
is done
a) induction
b) conduction
c) coronation
d) invention
b) short
c) medium
d) none of the above
12. The …………….statement should explain the nature of the project, how it came
about and what was attempted.
a) opening
b) Closing
c) Starting
d) ending
13. The ………………..should indicate the various parts or sections of the report.
a) table of contents
b) chair of contents
c) stool of contents
d) none of the above
14. …………..Page should indicate the topic on which the report is prepared.
a) Title
b) introduction
c) conclusion
d) none of the above
15. A selected bibliography lists the items which the author thinks are of ………….
c) aligned
d) none of the above
17. Aim must be logical and ……………in the report presentation
a) systematic
b) unsystematic
c) illogical
d) none of the above
Answers:
1. need
2. reality
3. decision maker
4. Currency
5. Research report
6. final
7. Interpretation
8. analysed
9. induction
10. communication
11. Long
12. opening
13. table of contents
14. Title
15. primary
16. Consistency
17. systematic