Edte 326 Statistics

Uploaded by

neshalvin7
Copyright
© All Rights Reserved

STATISTICS

• What is statistics?
• Statistics is a scientific method of collecting,
organizing, summarizing, analyzing, and
presenting data.
• There are two major areas of statistics, namely:
• i) Descriptive
• ii) Inferential
• Descriptive statistics refer to the entire sample or
class.
• It involves tabulating, depicting, presenting and
describing collections of data under
consideration.
• The data may either be quantitative (measures of
height, weight, age etc.) or qualitative (I.Q.,
personality, morality etc.)
• In essence descriptive statistics serve as a tool to
describe and reduce the data to manageable form
of information.
• Types of descriptive statistics
• Measures of central tendency
• Measures of variability/ Spread
• Measures of relationship
• Inferential statistics are used to infer/ predict population
parameters from a sample measure(s)
• Types of inferential statistics include:
• t-test
• F-ratio
• Analysis of variance (ANOVA)
• Analysis of covariance (ANCOVA)
• Multivariate analysis of variance (MANOVA)
• Multiple correlations
• Data from continuous variables are referred to as
continuous data, while data from discrete variables
are called discrete data.
• To measure variables we use four measurement
scales:
• Nominal scale
• Ordinal scale
• Interval scale
• Ratio scale
• Each of these levels has its own rules and
characteristics.
• The levels are hierarchical: each level incorporates
the properties of the levels below it.
• Nominal scale: used for categorizing and
identifying. It is the most elementary (crudest)
level of measurement.
• When numbers are assigned to categories on this
scale they have no numeric meaning; they only label
or code information.
• Ordinal scale: ranks in order. The ordinal scale of
measurement incorporates the classifying and
labeling function of the nominal scale and adds a
sense of order to the property being measured.
• Numbers are used to indicate rank order and nothing
more, e.g. arranging from the shortest to the tallest.
• When interpreting numbers from this scale, note
that the distance between ranks is unknown and not
necessarily equal.
• Interval scale: adds magnitude to whatever is
being measured. In addition to identifying and
rank ordering the data, the interval scale assigns
numbers to objects in such a way that equal
differences in the numbers correspond to equal
differences in the amount of the property measured.
• Ratio scale: the highest level of measurement. It
differs from the interval scale only in that its zero
point indicates total absence of the property being
measured.
• Parametric: those scales whose numbers have
numerical meaning (ratio and interval).
• Non-parametric: those whose numbers do not
(nominal and ordinal).
Data Presentation

• Raw data (raw scores) should be organized and
grouped systematically to promote analysis.
• So before analyzing the raw data it is good to
organize them systematically.
• The arranged data can then be presented in:
• i) Text form: writing in prose
• ii) Tabular form: using tables
• iii) Graphical form: using graphs
Common Methods Of Data Presentation
• Bar graphs
• Pie charts
• Frequency tables
• Histograms
• Frequency polygon
• Cumulative frequency graphs
Bar Graphs

• Portraying information by means of a bar graph
is particularly useful when dealing with data
gathered from discrete variables that are
measured on a nominal scale.
• It uses rectangular bars to represent discrete
categories of data, the length of each bar being
proportional to the frequency within the category.
• The categories are placed on the horizontal x-axis,
with each category being assigned a bar.
• The vertical y-axis indicates the observed
frequency in each category.

• Bar graph to be included.


Pie Chart

• Is best suited to simple comparison of discrete
variables.
• A circle is divided into sectors whose sizes are
proportional to the percentage of the frequency
distribution in each category.
• Insert a pie chart
Histogram

• Is similar to a bar graph, the only difference in
presentation being that the bars are joined
together.
• Insert a histogram
Tabulating Frequency Data

• As a teacher you will be dealing with large
amounts of data, usually in the form of test scores.
• As more and more scores accumulate, it gets more
and more difficult to make sense of the data.
• Q. What does that mean?
• What it means is that as data accumulates, it
becomes more difficult to answer questions such as:
• How many students are above average?
• How many scored above the cutoff passing score?
• Did most of the class do well on the test?
• What is the highest or lowest score?
• Following is a set of scores from 25 form two
students on a math test.
• 36 63 51 43 93 54 48 84 36 45
• 57 45 48 96 66 54 72 81 30 27
• 45 51 47 63 88
• Without doing anything to the scores it will be
difficult to answer the questions raised earlier.
• For example, the first two questions cannot be
answered until the teacher computes the average
score and establishes the cutoff or passing score.
• How about the last two questions?
• The questions may be answered but it will take a lot
of time to do so.
• For example to determine whether most of the
students did well, you may have crossed and counted
the scores in the 80s and 90s.
• Whatever strategy you used, it was used because
it was difficult to make sense of the 25 scores as
they were.
• There are several systematic ways to make sense
of the scores.
• One way is organizing or introducing some sort
of order to unorganized or unordered test scores.
• The first method is to simply list the scores in
ascending or descending numerical order.
• Let us now list our 25 test scores in a descending
order.
• 96 93 88 84 81 72 66 63 63 57
54 54 51 51 48 48 47 45 45
45 43 36 36 30 27
• Introducing some order into this group of scores
makes it easier to interpret.
• For example, at a glance we can tell the
highest and the lowest score.
• We can also easily see that only five students scored
above 80 on the test.
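• The listing step can be sketched in Python. This is only an illustration using the 25 scores above; the variable names are chosen here and are not part of the original notes.

```python
# The 25 math test scores listed above, in their original order.
scores = [36, 63, 51, 43, 93, 54, 48, 84, 36, 45,
          57, 45, 48, 96, 66, 54, 72, 81, 30, 27,
          45, 51, 47, 63, 88]

# List the scores in descending numerical order.
ordered = sorted(scores, reverse=True)

# With the scores ordered, the extremes sit at the two ends of the list.
highest = ordered[0]
lowest = ordered[-1]

# Count how many students scored above 80.
above_80 = sum(1 for s in scores if s > 80)

print(ordered)
print("highest:", highest, "lowest:", lowest, "above 80:", above_80)
```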
• Listing has helped us organize this set of scores.
But suppose the scores were of 100 or 1000
students?
• As the number of scores increases, the advantage
of simply listing scores decreases.
• Many scores will repeat themselves several
times.
• It becomes more and more difficult to make sense
of data and the scores would require a lot of
paper work.
• Also as you list the data, you will notice that many
scores are missing.
• For example in our case, 95, 94, 92, 91, 90, 89, 87 and
so on are missing.
• Failure to consider these missing scores can sometimes
result in misinterpretations of the data.
Simple Frequency Distribution
• This approach to tabulating data lists every
possible score between the highest and the lowest,
including those that no student obtained, together
with its frequency.
• It may however cause as much or more confusion
as the original group of unorganized scores.
• Unless your test yields a narrow spread of scores,
a simple frequency distribution tends to be so lengthy
that it is difficult to make sense of the data, which
is what we are trying to do.
• In summary simple frequency distribution will
summarize data effectively only if the spread of
scores is small.
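• A minimal Python sketch of a simple frequency distribution for the 25 scores, assuming the table runs from the highest score down to the lowest and gives a frequency of 0 to scores nobody obtained. The names used are illustrative only.

```python
from collections import Counter

scores = [36, 63, 51, 43, 93, 54, 48, 84, 36, 45,
          57, 45, 48, 96, 66, 54, 72, 81, 30, 27,
          45, 51, 47, 63, 88]

# Count how often each obtained score occurs.
counts = Counter(scores)

# One row per possible score from the highest down to the lowest;
# scores that nobody obtained get a frequency of 0.
simple_distribution = [
    (score, counts[score])
    for score in range(max(scores), min(scores) - 1, -1)
]

for score, f in simple_distribution:
    print(score, f)

# The table length shows why this method gets unwieldy for a wide spread.
print("rows in table:", len(simple_distribution))
```

• For these scores the table has 70 rows for only 25 students, which is the lengthiness problem described above.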
Grouped Frequency Distribution
• The grouped frequency distribution method of
tabulating data is very similar to the simple
distribution, except that ranges or intervals of
scores are used for categories rather than
considering each possible score as a category.
• The following is a grouped frequency distribution for the 25
scores.
• Interval f
• 91-97 2
• 84-90 2
• 77-83 1
• 70-76 1
• 63-69 3
• 56-62 1
• 49-55 4
• 42-48 7
• 35-41 2
• 28-34 1
• 21-27 1
• The grouped frequency distribution has two
major advantages over listing and simple
frequency distribution.
• It compresses the size of the table and makes the
data much more interpretable.
• At a glance it becomes apparent that most of the
class (as indicated by the numbers in the frequency
column) obtained scores of 55 or below.
• If we add the f column, we can see specifically that
15 ( 4+7+2+1+1= 15) of the 25 students in the
class scored 55 or below.
• That is, four students scored between 49 and 55,
seven scored between 42 and 48, two scored
between 35 and 41, one scored between 28 and 34 and
one scored between 21 and 27.
• Since most of the class scored 55 or lower, one
interpretation the grouped frequency distribution helps
us make is that the test may have been too difficult.
• However, there may be other possible interpretations;
• Perhaps the students did not prepare for the test;
• Perhaps the students need more instruction in this area
because they are slower than the teacher expected them to
be;
• Perhaps the instruction was ineffective or inappropriate.
• Grouped frequency distribution does help us
make sense of a set of scores.
• But there are some disadvantages of using
grouped frequency distribution.
• The main disadvantage of grouped frequency
distribution is that information about individual
scores is lost.
• As a result the information we deal with becomes
less accurate.
• Consider the interval 49-55 in the previous
grouped frequency distribution.
• We see that four scores fell in this interval.
• However, we can not tell exactly what the scores
were.
• Were the scores 49,51,53 and 55? Or were they 49,50,51 and
54? Or were all four scores 49? Or were two scores 52 and
two scores 53? Or 51?
• The four scores could be any conceivable combination of
scores.
• Without referring to the original list of scores we can not tell.
Steps in Constructing a Grouped Frequency Distribution
• 1. Determine the range of scores (symbolized by R).
• The range (or spread) of scores is determined by subtracting the
lowest score (L) from the highest score (H).
• Formula: R = H - L
• Application: R = 96 - 27 = 69
• The range of scores for the 25 students is 69.
• 2. Determine the appropriate number of intervals.
• The number of intervals or categories used in a
grouped frequency distribution is somewhat flexible.
• Different authorities will suggest that you select
among, 5, 10, or 15 intervals, or 8, 10,12, or 15
intervals.
• In our example we used 11 intervals, so what is
correct?
• Because we said this is flexible, you will decide
as a teacher.
• However don’t select too few intervals.
• On the other hand if you find more than one
interval with zero in the frequency column, note
that you decided on too many intervals.
• You can experiment and vary the number of
intervals until you find a suitable one that best
represents your data.
• It is generally best to begin with 8 or 10 intervals
when constructing a grouped frequency
distribution for a group of 25 to 30 scores. The
number of intervals can be increased for a larger set
of scores.
• 3. Divide the range by the number of intervals you
decide to use and round to the nearest odd number.
This will give you i, the interval width.
• Formula: i = R / (number of intervals)
• Application: i = 69 / 10 = 6.9, which rounds to 7
• The width of the interval is 7.
• If we decided to use 8 for our number of intervals, we
would arrive at a wider interval width.
• If we used 15 we would arrive at a narrower width than
we would with 10 or 8 intervals. It would be 4.6 which
would be 5 to the nearest odd number.
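• The width calculation for 8, 10 and 15 intervals can be sketched in Python. The helper `nearest_odd` is a hypothetical name introduced here; how to treat a value exactly halfway between two odd numbers is not stated in the notes, so sending ties to the higher odd number is an assumption.

```python
import math


def nearest_odd(x):
    """Round x to the nearest odd whole number.

    Assumption: a value exactly halfway between two odd numbers
    (an even whole number) goes to the higher odd number.
    """
    return 2 * math.floor((x - 1) / 2 + 0.5) + 1


R = 96 - 27  # range of the 25 scores

# Interval width i for several choices of the number of intervals.
widths = {}
for n_intervals in (8, 10, 15):
    raw_width = R / n_intervals
    widths[n_intervals] = nearest_odd(raw_width)
    print(n_intervals, "intervals -> raw width", raw_width,
          "-> i =", widths[n_intervals])
```

• The resulting widths are 9, 7 and 5, so fewer intervals give a wider interval.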
• Note that there is an inverse relationship between
the number of intervals and the width of each interval.
• That is, as fewer intervals are used, the width of
each interval increases; as more intervals are used
the width decreases.
• Also keep in mind that as i, the interval width,
increases, we lose more and more information
about individual scores.
• 4. Construct the interval column, making sure that
the lowest score in each interval, called the lower
limit (LL), is a multiple of the interval width.
• The upper limit of each interval (UL) is one point
less than the lower limit of the next interval.
• This means that the lowest score of each interval
should be a value that is equal to the interval width
times 1, 2, 3, etc.
• With an interval width of 7, the LL of each interval
could be 7, 14, 21, etc. (7x1, 7x2, 7x3, etc.)
• However, we eliminate the intervals that lie below
the interval that includes or “captures” the lowest
score and above the interval that captures the
highest score.
• Consider the following set of intervals for which
the highest score was 96 and the lowest score was
27. (list the scores on the board)
• We retain only intervals 21-27 through 91-97.
Thus the interval column of our grouped
frequency distribution should look like this:
• Interval
• 91-97
• 84-90
• 77-83
• 70-76
• 63-69
• 56-62
• 49-55
• 42-48
• 35-41
• 28-34
• 21-27
• 5. Construct the f, or frequency, column by
tallying the number of scores that are captured by
each interval.
• Intervals Tally f
• 91-97 II 2
• 84-90 II 2
• 77-83 I 1
• 70-76 I 1
• 63-69 III 3
• 56-62 I 1
• 49-55 IIII 4
• 42-48 IIII II 7
• 35-41 II 2
• 28-34 I 1
• 21-27 I 1
• You may decide to have 10 intervals but end up
with 11. This is because of what happens in step
3 (rounding off to the nearest odd number).
• So it is not unusual to end up with one more or
one less interval than you intended.
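• The five steps can be sketched in Python as follows. This is a rough illustration, assuming a starting choice of 10 intervals and the same tie rule for rounding to an odd number as any other reasonable choice would give here; all variable names are chosen for this sketch.

```python
import math

scores = [36, 63, 51, 43, 93, 54, 48, 84, 36, 45,
          57, 45, 48, 96, 66, 54, 72, 81, 30, 27,
          45, 51, 47, 63, 88]

# Step 1: range R = H - L.
H, L = max(scores), min(scores)
R = H - L

# Step 2: choose a number of intervals (10 is assumed here).
n_intervals = 10

# Step 3: divide R by the number of intervals and round to the
# nearest odd number to get the interval width i.
i = 2 * math.floor((R / n_intervals - 1) / 2 + 0.5) + 1

# Step 4: lower limits are multiples of i. Keep only the intervals from
# the one capturing the lowest score up to the one capturing the highest.
first_ll = (L // i) * i
last_ll = (H // i) * i
intervals = [(ll, ll + i - 1) for ll in range(last_ll, first_ll - 1, -i)]

# Step 5: tally the scores captured by each interval.
table = [(ll, ul, sum(1 for s in scores if ll <= s <= ul))
         for ll, ul in intervals]

for ll, ul, f in table:
    print(f"{ll}-{ul}  {'I' * f}  {f}")
```

• Starting from 10 intervals, the sketch ends up with 11, from 91-97 down to 21-27, because of the rounding in step 3.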
Forms of Frequency Distribution
• A frequency distribution can occur in an unlimited
number of shapes/forms:
• The normal curve
• Skewed curve
• Leptokurtic curve
• Mesokurtic curve
• Platykurtic curve
Normal Curve

• This is a bell-shaped curve with the peak of the
distribution in the center and the tails of the
distribution continuously approaching but never
touching the horizontal axis.
Skewed Curve

• A distribution is skewed if scores trail off in one
direction.
• The degree of skewness (the amount and extent
to which a distribution departs from symmetry)
can be determined by comparing a distribution
with the normal curve.
• Positively skewed: the tail tapers or extends
out towards the right.
• Negatively skewed: the tail tapers or extends to
the left.
• Kurtosis: the flatness or peakedness of a
distribution in relation to the normal curve.
• If it is more peaked it is termed leptokurtic.
• If it is less peaked (flatter) it is termed platykurtic.
• The normal distribution is spoken of as mesokurtic.
Measures of Central Tendency
• An examination of a graph of any given frequency
distribution reveals that:
• i) the values of variables tend to cluster around a
central value.
• ii) the values spread around a central area in a
specific way.
• Describing the central point around which
values in a distribution spread is what we mean
by a measure of central tendency.
• This measure gives some idea of the average or
representative score in a distribution.
• The three measures of central tendency are:
• Mean
• Median
• Mode
Ungrouped Data

• Mode: this is the most frequently occurring score or
value in a distribution, generally denoted Mo.
• Example: for 1, 4, 3, 4, 2, 4, 2, 7 the mode is 4.
• Median: this is a point in the distribution of scores
(arranged in order of merit) such that 50% of the
scores are located above it and the other 50% below it.
• It is the mid-point of a score distribution when the
scores are arranged in order of merit.
• For example, for 2, 7, 9, 10, 14, 16, 18 the median is 10.
• Mean: there are four types of mean: geometric,
harmonic, quadratic and arithmetic.
• Our main focus is the arithmetic mean.
• The mean is found by adding all the scores in a
distribution and dividing by the number of scores.
• Formula: M = ΣX / N
• For example, for 1, 3, 7, 9, 4:
M = (1 + 3 + 7 + 9 + 4) / 5 = 24 / 5 = 4.8
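• The three measures can be checked with Python's standard `statistics` module. This is a small sketch using the three example data sets above; the variable names are chosen here for illustration.

```python
from statistics import mean, median, mode

mode_example = [1, 4, 3, 4, 2, 4, 2, 7]
median_example = [2, 7, 9, 10, 14, 16, 18]
mean_example = [1, 3, 7, 9, 4]

Mo = mode(mode_example)       # most frequently occurring score
Mdn = median(median_example)  # middle score once the scores are ordered
M = mean(mean_example)        # sum of the scores divided by N

print("mode:", Mo)
print("median:", Mdn)
print("mean:", M)
```

• The mode is 4 (it occurs three times), the median is 10 (the fourth of seven ordered scores) and the mean is 24 / 5 = 4.8.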
Measures of Variability (Dispersion)
• While the measures of central tendency give us
information about representative scores,
measures of variability provide information about
the spread of scores in a distribution.
• There are five conventional measures of
dispersion, namely:
• i) Range
• ii) Interquartile range
• iii) Semi interquartile range
• iv) Variance
• v) Standard Deviation
• Range- Is the difference between the highest and
lowest score in a given distribution. It is
determined by subtracting the lowest score from
the highest score.
Interquartile Range(IQR)

• Is a measure of variability based on dividing a
data set into quartiles.
• Quartiles divide a rank-ordered data set into four
equal parts.
• The values that divide the parts are called the
first, second and third quartiles; they are
denoted by Q1, Q2 and Q3 respectively.
• Q1 is the “middle” value in the first half of the
rank-ordered data set.
• Q2 is the median value in the set.
• Q3 is the “middle” value in the second half of the
rank-ordered data set.
• The interquartile range is equal to Q3 minus Q1.
• For example, consider the following numbers:
1, 2, 3, 4, 5, 6, 7, 8.
• 1 2 | 3 4 | 5 6 | 7 8
• Q1 = 2.5 (between 2 and 3), Q2 = 4.5 (between 4 and 5),
Q3 = 6.5 (between 6 and 7), so the interquartile range
is 6.5 - 2.5 = 4.
Semi Interquartile Range

• SIQR = ½ IQ Range
= ½ (UQ – LQ)
• Or (Q3 – Q1)/2
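• A minimal Python sketch of the quartiles, IQR and SIQR for the numbers 1 to 8, using the "middle value of each half" rule. Leaving the median out of both halves when the number of scores is odd is one common convention and is an assumption here; the function name `quartiles` is hypothetical.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the 'middle of each half' rule.

    Assumption: when the number of scores is odd, the median itself
    is left out of both halves.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]       # first half of the rank-ordered data
    upper = ordered[n - half:]   # second half of the rank-ordered data
    return median(lower), median(ordered), median(upper)


q1, q2, q3 = quartiles([1, 2, 3, 4, 5, 6, 7, 8])
iqr = q3 - q1     # interquartile range
siqr = iqr / 2    # semi-interquartile range

print("Q1:", q1, "Q2:", q2, "Q3:", q3)
print("IQR:", iqr, "SIQR:", siqr)
```

• For these eight numbers the quartiles are 2.5, 4.5 and 6.5, giving an IQR of 4 and an SIQR of 2.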
Variance

• Variance is a measure of variability which
involves all the given scores in a distribution.
• To obtain the variance the following steps must
be followed:
• i) Compute the mean of the scores.
• ii) Subtract the mean from each score to get the
difference, called a deviation.
• iii) Square each deviation.
• iv) Sum the squared deviations.
• v) Divide the sum by the number of scores.
• The variance of one distribution can be compared
with the variance of another distribution.
• The smaller the variance the less the dispersion
or spread of scores within the distribution.
Example
• X: 05 15 12 03 04 19 37 11 06 34
• Y: 15 16 12 18 15 14 13 17 11 15
• The mean of X is 14.6 and the mean of Y is also 14.6.
• The higher the variability, the less consistent the
scores (X).
• The lower the variability, the more consistent the
scores (Y).
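• The five variance steps can be sketched in Python for the two sets of scores. The example table is read here as ten X scores and ten Y scores, an interpretation that matches the stated means of 14.6; the function name is chosen for this sketch.

```python
X = [5, 15, 12, 3, 4, 19, 37, 11, 6, 34]
Y = [15, 16, 12, 18, 15, 14, 13, 17, 11, 15]


def variance(scores):
    """Variance by the five steps above (dividing by N, the number of scores)."""
    n = len(scores)
    m = sum(scores) / n                     # i) mean of the scores
    deviations = [s - m for s in scores]    # ii) deviation of each score
    squared = [d ** 2 for d in deviations]  # iii) squared deviations
    total = sum(squared)                    # iv) sum of squared deviations
    return total / n                        # v) divide by the number of scores


var_x = variance(X)
var_y = variance(Y)

print("mean X:", sum(X) / len(X), "variance X:", round(var_x, 2))
print("mean Y:", sum(Y) / len(Y), "variance Y:", round(var_y, 2))
```

• Both sets have the same mean, but the variance of X (about 133.04) is far larger than that of Y (about 4.24), so the Y scores are the more consistent.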
Significance of Measures of Variability
• Determine the reliability of an average
• Serve as a basis for the control of the variability
• To compare variability of two or more series and
• Facilitate the use of other statistical measures.
Standard Deviation

• This is a measure of the typical deviation of
individual raw scores from the mean.
• It is the square root of the variance.
• Standard deviation is the estimate of variability that
accompanies the mean in describing a distribution.
• Standard deviation is used to indicate the
spread or scatter that may exist in a
distribution of scores.
• The standard deviation is affected by the spread
of scores from the mean.
• The smaller the standard deviation the more
homogeneous is the class for that particular
subject.
• The larger the standard deviation the more
heterogeneous the class is in a particular subject.
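• A short Python sketch of the standard deviation as the square root of the variance. The two lists are the X and Y scores from the variance example, read as ten scores each; treating them as two classes is purely for illustration, and the function name is hypothetical.

```python
import math

X = [5, 15, 12, 3, 4, 19, 37, 11, 6, 34]
Y = [15, 16, 12, 18, 15, 14, 13, 17, 11, 15]


def standard_deviation(scores):
    """Square root of the variance (variance computed by dividing by N)."""
    n = len(scores)
    m = sum(scores) / n
    variance = sum((s - m) ** 2 for s in scores) / n
    return math.sqrt(variance)


sd_x = standard_deviation(X)
sd_y = standard_deviation(Y)

print("SD of X:", round(sd_x, 2))
print("SD of Y:", round(sd_y, 2))

# The group with the smaller standard deviation is the more homogeneous.
more_homogeneous = "Y" if sd_y < sd_x else "X"
print("more homogeneous group:", more_homogeneous)
```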
Measures of Relationship

• This indicates the degree of association between
two subjects or variables.
• For example, the more years you spend in the
teaching profession, the better a teacher you
become.
• The unit of association is called the correlation
coefficient (r).
• Relationships are of three broad types:
• i) Positive relationship
• ii) Negative relationship
• iii) Zero relationship
• The maximum value of r is 1.00 and the minimum is -1.00.
• 1.00 implies a perfect positive relationship.
• -1.00 implies a perfect negative relationship.
Methods of Expressing Relationships
• Scatter plot / diagram
• Spearman's rho
• Pearson product moment correlation
• Correlation has two distinctions:
• Correlation which merely describes the presence
or absence of a relationship.
• Correlation that shows the degree/magnitude of
relationship between the two measures.
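• The two coefficients can be sketched in plain Python. The data below are hypothetical and made up only to show r values of 1.00 and -1.00; the function names are chosen for this sketch, and the ranking helper assumes there are no tied scores.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation between paired scores x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank each value from 1 (smallest). Assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical paired measures for five people.
years = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # increases in step with years
falling = [10, 8, 6, 4, 2]   # decreases in step with years
curved = [1, 4, 9, 16, 25]   # always increases, but not in a straight line

print("r, rising:", pearson_r(years, rising))
print("r, falling:", pearson_r(years, falling))
print("r, curved:", round(pearson_r(years, curved), 3))
print("rho, curved:", spearman_rho(years, curved))
```

• The curved data show the difference between the two methods: the rank order agrees perfectly, so rho is 1.00, while r is slightly below 1.00 because the points do not lie on a straight line.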
