BRM CH - 07

Uploaded by

natimul3424
CHAPTER SEVEN

DATA PROCESSING AND ANALYSIS

The goal of any research is to extract information from raw data. Once collected, the raw data have
to be processed and analysed according to the outline (plan) laid down for this purpose when the
research plan was developed. The compiled data must be classified, processed, analysed and
interpreted carefully before their full meaning and implications can be understood.

7.1 Data Processing


Data processing implies editing, coding, classification and tabulation of the collected data so that
they are amenable to analysis.
1. Editing:
Editing is the process of examining the collected raw data to detect errors, omissions and extreme
values, and to correct them where possible.
• It involves careful scrutiny of completed questionnaires or schedules.
• It is done to ensure that the data are:
    • Accurate
    • Consistent with other data gathered
    • Uniformly entered
    • As complete as possible
    • Well organized to facilitate coding and tabulation

Editing can be either field editing or central editing.

• Field editing: the investigator reviews the reporting forms to complete what was written in
abbreviated or illegible form at the time of recording the respondent's responses. This sort of
editing should be done as soon as possible after the interview or observation.

• Central editing: takes place at the research office. Its objective is to correct errors such as
an entry in the wrong place or an entry recorded in the wrong units (for example, weeks instead
of months).

2. Coding:
Coding refers to the process of assigning numerals or other symbols to answers so that responses
can be put into a limited number of categories or classes. Such classes should be appropriate to the
research problem under consideration. They must be exhaustive (there must be a class for every data
item) and mutually exclusive (a specific answer can be placed in one and only one cell in a given
category set). Coding decisions should usually be taken at the designing stage of the questionnaire.
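
As a minimal sketch of coding, the Python snippet below maps raw answers to numeric codes. The code book, the answer labels and the missing-value code 9 are hypothetical choices made for illustration; they are not taken from any particular questionnaire.

```python
# Hypothetical code book for one questionnaire item (illustrative only).
CODE_BOOK = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}
MISSING = 9  # assumed code for blank or unclassifiable answers


def code_response(answer):
    """Return the single numeric code for a raw answer.

    Every answer receives exactly one code, so the classes are
    exhaustive (via MISSING) and mutually exclusive.
    """
    if answer is None:
        return MISSING
    return CODE_BOOK.get(answer.strip().lower(), MISSING)


raw_answers = ["Agree", " neutral ", "", "Strongly Agree", None]
coded = [code_response(a) for a in raw_answers]
print(coded)
```

Reserving a separate code for missing answers keeps blanks from being confused with a real category at the tabulation stage.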
3. Classification:
Most research studies result in a large volume of raw data, which must be reduced to homogeneous
groups. Data classification is the process of arranging data in groups or classes on the basis of
common characteristics. Data having a common characteristic are placed in one class, and in this way
the entire data set is divided into a number of groups or classes.

3.1 Classification according to attributes:

Data are classified on the basis of common characteristics that are descriptive, such as literacy, sex,
honesty, etc. Such descriptive characteristics refer to qualitative phenomena, which cannot be
measured quantitatively: only their presence or absence in an individual item can be noticed. Data
obtained in this way on the basis of certain attributes are known as statistics of attributes, and their
classification is said to be classification according to attributes.

3.2 Classification according to class interval:

Unlike descriptive characteristics, numerical characteristics refer to quantitative phenomena,
which can be measured in some statistical unit. Data relating to income, age, weight, etc. come
under this category. Such data are known as statistics of variables and are classified on the basis of
class intervals. For example, individuals whose incomes are within 1001-1500 birr can form one
group, those whose incomes are within 1501-2000 birr can form another group, and so on, with no
overlap between groups. In this way the entire data set may be divided into a number of groups or
classes, usually called class intervals. Each class interval has an upper and a lower limit, known as
the class limits. The difference between the two class limits is known as the class magnitude. The
number of items that fall in a given class is known as the frequency of that class.
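
Classification by class interval can be sketched in Python as follows. The class limits and the income figures are made-up illustrative values, and the class magnitude is computed as upper limit minus lower limit, following the definition above.

```python
from collections import Counter

# Assumed class limits in birr (lower, upper): inclusive and non-overlapping.
CLASSES = [(501, 1000), (1001, 1500), (1501, 2000)]


def classify(income):
    """Return the class interval that contains the income, or None."""
    for lower, upper in CLASSES:
        if lower <= income <= upper:
            return (lower, upper)
    return None  # value falls outside every class


incomes = [650, 1200, 1499, 1501, 980, 1750, 1100]  # hypothetical data
frequency = Counter(classify(x) for x in incomes)

for lower, upper in CLASSES:
    magnitude = upper - lower  # difference between the two class limits
    print(f"{lower}-{upper} birr  magnitude={magnitude}  "
          f"frequency={frequency[(lower, upper)]}")
```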

7.2 Data Analysis

Data analysis is the further transformation of the processed data to look for patterns and relations
among data groups. By analysis we mean the computation of certain indices or measures along with
the search for patterns of relationship that exist among the data groups. Analysis, particularly in the
case of survey or experimental data, involves estimating the values of unknown parameters of the
population and testing hypotheses in order to draw inferences.
Analysis can be categorized as:

• Descriptive Analysis
• Inferential (Statistical) Analysis
7.2.1 Descriptive Analysis
Descriptive analysis is largely the study of the distribution of one variable. For most projects,
analysis begins with some form of descriptive analysis to reduce the data to a summary format.
Descriptive analysis refers to the transformation of raw data into a form that makes them easy to
understand and interpret.

The most common forms of describing the processed data are:

• Tabulation
• Percentage
• Measures of central tendency
• Measures of dispersion
• Measures of asymmetry
Tabulation

Tabulation refers to the orderly arrangement of data in a table or other summary format. It presents
the responses or observations on a question-by-question or item-by-item basis and provides the most
basic form of information. It tells the researcher how frequently each response occurs. Tabulation
may be done by hand or by mechanical or electronic devices such as the computer.

Tabulation may be classified as simple or complex:

Simple tabulation gives information about one or more groups of independent questions, resulting in
a one-way table.

Complex tabulation shows the division of data into two or more categories. It is designed to give
information concerning one or more sets of inter-related questions, and results in two-way or
higher-order tables.
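
The difference between simple and complex tabulation can be illustrated with a short Python sketch. The responses below are hypothetical; the one-way table counts a single question, while the two-way table cross-classifies two related questions.

```python
from collections import Counter

# Hypothetical responses: (sex, literacy status) for six respondents.
responses = [
    ("F", "literate"), ("M", "literate"), ("F", "illiterate"),
    ("F", "literate"), ("M", "illiterate"), ("M", "literate"),
]

# Simple (one-way) tabulation: frequency of each answer to one question.
one_way = Counter(sex for sex, _ in responses)

# Complex (two-way) tabulation: frequency of each combination of answers.
two_way = Counter(responses)

print("Sex   Count")
for sex in sorted(one_way):
    print(f"{sex:<5} {one_way[sex]}")

print("\nSex   literate  illiterate")
for sex in sorted(one_way):
    print(f"{sex:<5} {two_way[(sex, 'literate')]:<9} "
          f"{two_way[(sex, 'illiterate')]}")
```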

Need for Tabulation


• It conserves space and reduces explanatory and descriptive statements to a minimum.
• It facilitates the process of comparison.
• It facilitates the summation of items and the detection of errors and omissions.
• It provides a basis for various statistical computations.

Percentage:

Whether the data are tabulated by computer or by hand, it is useful to have percentages and
cumulative percentages. A table containing percentages alongside the frequency distribution is
easier to interpret.
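
A frequency table with percentages and cumulative percentages might be built as in the sketch below. The ratings are invented for illustration.

```python
from collections import Counter

ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]  # hypothetical responses on a 1-5 scale
counts = Counter(ratings)
total = len(ratings)

table = []          # rows of (value, frequency, percent, cumulative percent)
running_count = 0
for value in sorted(counts):
    running_count += counts[value]
    percent = 100 * counts[value] / total
    cumulative = 100 * running_count / total
    table.append((value, counts[value], percent, cumulative))

print("Value  Freq  Percent  Cum. percent")
for value, freq, percent, cumulative in table:
    print(f"{value:<6} {freq:<5} {percent:<8.1f} {cumulative:.1f}")
```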
Measures of Central Tendency

Describing the central tendency of the distribution with mean, median, or mode is another basic form
of descriptive analysis.

These measures are most useful when the purpose is to identify typical values of a variable or the
most common characteristics of a group. Measures of central tendency are also known as statistical
averages. Mean, median, and mode are the most popular averages.

• The most commonly used measure of central tendency is the mean. To compute the mean, you
add up all the numbers and divide by how many numbers there are. This is the arithmetic average:
not necessarily a halfway point, but a kind of center that balances high numbers against low
numbers. For this reason, it is most often reported along with some simple measure of dispersion,
such as the range, which is expressed as the lowest and highest number.

• The median is the number that falls in the middle of an ordered list of numbers. It is not the
arithmetic average; it is the halfway point. There are just as many numbers above the median as
below it. Where there is an even number of values, you average the two middle numbers. The
median is best suited for data that are ordinal, or ranked. It is also useful when you have extremely
low or high scores, because it is not pulled toward them as the mean is.

• The mode is the most frequently occurring number in a list of numbers. It is the closest thing to
what people mean when they say something is average or typical. The mode does not even have to
be a number: it will be a category when the data are nominal or qualitative. The mode is useful
when you have a highly skewed set of numbers, mostly low or mostly high. You can also have
two modes (a bimodal distribution) when one group of scores is mostly low and the other group
is mostly high, with few in the middle.
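
The three averages can be compared on a small, made-up data set with one extreme score, using Python's standard `statistics` module (`multimode` requires Python 3.8 or later). The sketch shows how the extreme value pulls the mean upward while the median and mode stay put.

```python
import statistics

scores = [2, 3, 3, 4, 5, 6, 40]  # hypothetical scores with one extreme value

mean = statistics.mean(scores)      # sum of values divided by their count
median = statistics.median(scores)  # middle value of the ordered list
mode = statistics.mode(scores)      # most frequently occurring value

# With an even number of values, the two middle values are averaged.
even_median = statistics.median([1, 2, 3, 4])

# The mode also works for nominal (category) data.
colour_mode = statistics.mode(["red", "blue", "red", "green"])

# A bimodal list has two equally frequent values.
modes = statistics.multimode([1, 1, 2, 9, 9])

print(mean, median, mode, even_median, colour_mode, modes)
```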

Measure of Dispersion:
A measure of dispersion describes how the values of an item are scattered around the true value of
the average. An average alone fails to give any idea of how the values of an item or variable are
spread around it. After identifying the typical value of a variable, the researcher can measure how
the values are scattered around the true value of the mean, that is, how far the values of the variable
lie from the average value. It measures the variation in the values of an item.

Important measures of dispersion are:

• Range: the difference between the maximum and the minimum value of the observed variable.
• Mean deviation: the average of the absolute deviations of the observations from the mean value.
• Variance: the mean of the squared deviations from the mean. It measures the variability of the
sample.
• Standard deviation: the square root of the variance.
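
The four measures listed above can be computed as in the following sketch. The data are invented, and the mean deviation is taken here as the average absolute deviation from the mean; the variance is shown both with divisor n (population form) and with divisor n - 1 (the usual sample estimate).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
n = len(data)
mean = sum(data) / n

value_range = max(data) - min(data)
mean_deviation = sum(abs(x - mean) for x in data) / n

population_variance = statistics.pvariance(data)  # divides by n
sample_variance = statistics.variance(data)       # divides by n - 1
population_sd = statistics.pstdev(data)           # square root of pvariance

print(value_range, mean_deviation, population_variance,
      sample_variance, population_sd)
```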
Measure of asymmetry (Skewness):
When the distribution of items is perfectly symmetrical and bell-shaped, we have a normal curve and
the corresponding distribution is the normal distribution. For such a curve,
Mean = Median = Mode.

Skewness is thus a measure of asymmetry and shows the manner in which the items are clustered
around the average. In a symmetric distribution (such as the normal distribution) the items show a
perfect balance on either side of the mode, but in a skewed distribution the balance is shifted to one
side. The amount by which the balance exceeds on one side measures the skewness.

Knowledge of the shape of the distribution is crucial to the use of statistical measures in research
analysis, since most methods make specific assumptions about the nature of the distribution.
A skewed distribution has one tail longer than the other.

• A positively skewed distribution has a longer tail to the right.
• A negatively skewed distribution has a longer tail to the left.
• A distribution with no skew (e.g. a normal distribution) is symmetrical.
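
One common way to quantify skewness is the moment coefficient: the third central moment divided by the 1.5 power of the second central moment. The sketch below uses that formula, which is one choice among several and is not prescribed by the text; the data sets are invented to show the sign of the result.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5.

    Positive for a longer right tail, negative for a longer left tail,
    and zero for a perfectly symmetrical data set.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 2, 3, 3, 3, 10]       # one large value stretches the right tail
left_tailed = [-v for v in right_tailed]    # mirror image: longer left tail

print(skewness(symmetric), skewness(right_tailed), skewness(left_tailed))
```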

7.2.2. Inferential Analysis
Most researchers wish to go beyond the simple tabulation of frequency distributions and the
calculation of averages and/or dispersion. They frequently seek to determine the relationship between
variables and to test statistical significance.

When the data set contains more than one variable, it is possible to measure the relationship
between them. Is there any association or correlation between two or more variables? If yes, to
what degree?

These questions are answered by the use of correlation techniques.

Correlation

The most commonly used relational statistic is correlation and it's a measure of the strength of some
relationship between two variables, not causality. Interpretation of a correlation coefficient does not
even allow the slightest hint of causality. The most a researcher can say is that the variables share
something in common; that is, are related in some way. The more two things have something in
common, the more strongly they are related. There can also be negative relations, but the important
quality of correlation coefficients is not their sign, but their absolute value. A correlation of -.58 is
stronger than a correlation of .43, even though with the former, the relationship is negative. The
following table lists the interpretations for various correlation coefficients:

Absolute value of r    Interpretation
.8 to 1.0              Very strong
.6 to .8               Strong
.4 to .6               Moderate
.2 to .4               Weak
.0 to .2               Very weak

Pearson's correlation coefficient, or small r, represents the degree of linear association between any
two variables. Unlike regression, correlation does not distinguish between the independent and the
dependent variable; for this reason, too, you cannot infer causality from it. Correlations are also
dimension-free, but they require a good deal of variability or randomness in your outcome measures.
A correlation coefficient always ranges from negative one (-1) to one (1). A negative coefficient of
-0.65 indicates a fairly strong tendency for one variable to be low when the other is high; it does not
mean that this happens 65% of the time. A positive coefficient of 0.65 indicates an equally strong
tendency for the two variables to be high or low together. Squaring the coefficient gives the
proportion of variance the two variables share: r = 0.65 gives r squared of about 0.42, or roughly
42%. A correlation coefficient at zero, or close to zero, indicates no linear relationship.
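
Pearson's r can be computed directly from its definition, as in this sketch. The variable names and data are hypothetical. Note that swapping the two arguments gives the same value, which reflects the point that correlation does not distinguish independent from dependent variables.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y divided by the
    square root of the product of their individual variations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]         # hypothetical study hours
score = [52, 55, 61, 70, 72]    # hypothetical test scores (rise with hours)
errors = [9, 8, 6, 3, 2]        # hypothetical error counts (fall with hours)

r_pos = pearson_r(hours, score)
r_neg = pearson_r(hours, errors)
print(f"r = {r_pos:.3f}, r squared = {r_pos ** 2:.3f}")
print(f"r = {r_neg:.3f}, r squared = {r_neg ** 2:.3f}")
```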

Is there any cause and effect (causal relationship) between two variables, or between one variable on
one side and two or more variables on the other side?

This question can be approached by the use of regression analysis. In regression analysis the
researcher tries to estimate or predict the average value of one variable on the basis of the value of
one or more other variables. Whether the estimated relationship can be read as causal depends on
the research design, not on the regression alone.

Regression

Regression is the closest thing to estimating causality in data analysis, because it predicts how well
the numbers "fit" a projected straight line. The most common form is linear regression, which uses
the least squares method to find the equation of the line that best represents what is called the
regression of y on x. Instead of finding a single best number, one is interested in finding the best
line: for a given data set there is one and only one least squares line (represented by an equation)
that best fits the data, regardless of how scattered the data points are. The slope of the line provides
information about the predicted direction of the relationship, and the estimated coefficient (or beta
weight) for the independent variable x indicates how strongly the dependent variable y responds
to it.

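Yi = B0 + B1 Zi

where:

Yi = outcome score for the ith unit (dependent variable, y in the discussion above)

B0 = coefficient for the intercept

B1 = coefficient for the slope

Zi = value of the independent variable for the ith unit (x in the discussion above)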
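
A minimal least squares sketch for the equation above is given below. It assumes a single independent variable and uses the standard closed-form estimates (slope = co-variation of Z and Y divided by the variation of Z; intercept = mean of Y minus slope times mean of Z). The function name `fit_line` and the data are hypothetical.

```python
def fit_line(z, y):
    """Return (b0, b1) of the least squares line y = b0 + b1 * z."""
    n = len(z)
    mean_z = sum(z) / n
    mean_y = sum(y) / n
    szy = sum((a - mean_z) * (b - mean_y) for a, b in zip(z, y))
    szz = sum((a - mean_z) ** 2 for a in z)
    b1 = szy / szz               # slope
    b0 = mean_y - b1 * mean_z    # intercept
    return b0, b1


z = [1, 2, 3, 4, 5]                 # hypothetical independent variable
y = [3.1, 4.9, 7.2, 8.8, 11.0]      # hypothetical outcome scores

b0, b1 = fit_line(z, y)
predicted = b0 + b1 * 6             # predicted average outcome at Z = 6
print(f"B0 = {b0:.2f}, B1 = {b1:.2f}, prediction at Z=6: {predicted:.2f}")
```

The sign of B1 gives the direction of the fitted relationship; the prediction is an estimate of the average outcome at that value of Z, not a guarantee for any single unit.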
