Statistics Course Work Presentation
COURSE WORK
PRESENTATION
Kaire Agnes-VU-BPL-2307-0632-EVE
Kayongo Daniel- VU-BPL-2307-0646-EVE
Question 4
Discuss the concept of data presentation
• Tabular, e.g. Univariate frequency distributions, Simple frequency
distribution, Grouped frequency distribution.
• Graphical e.g. Histogram, Frequency polygon, Cumulative frequency
distribution.
• Diagrammatical e.g. Charts
Introduction
• Data presentation is the process of organizing and displaying data in a way that
makes it easy to understand and analyze. Effective data presentation transforms
raw data into meaningful information through various formats and techniques.
One of the most common formats for presenting data is tabular presentation. Let's
explore some specific types of tabular data presentations:
o Univariate Frequency Distributions
o Simple Frequency Distributions, and
o Grouped Frequency Distributions.
1. Univariate Frequency Distributions
• A univariate frequency distribution is used to show the frequency (i.e., the number
of times) each unique value occurs in a dataset for a single variable. This type of
distribution is helpful for understanding the distribution and central tendency of a
particular variable.
• Example:
• A univariate frequency distribution table showing exam scores of students at
Victoria University in year 1.3:
• 20,30,30,30,40,40,50,50,50,20,30,40,40,40,40,40,40,50,50,50,50,50,50,50,50,50,50
,50,50,60,60,60,60,60,60,60,60,60,60,70,70,70,70,70,70,80,80,80,90,90.
Table 01. A dataset showing exam scores of students at Victoria University in a year
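The frequency table for this example can be built by counting how often each score occurs. The Python sketch below does this for the scores listed above; the variable names are illustrative and are not part of the original course work.

```python
from collections import Counter

# Exam scores copied from the example dataset above.
raw = (
    "20,30,30,30,40,40,50,50,50,20,30,40,40,40,40,40,40,50,50,50,50,50,50,50,50,50,50"
    ",50,50,60,60,60,60,60,60,60,60,60,60,70,70,70,70,70,70,80,80,80,90,90"
)
scores = [int(s) for s in raw.split(",")]

# Count how many times each distinct score occurs (its frequency).
frequency = Counter(scores)

# Print the univariate frequency distribution in ascending order of score.
print(f"{'Score':>5}  {'Frequency':>9}")
for score in sorted(frequency):
    print(f"{score:>5}  {frequency[score]:>9}")
print(f"{'Total':>5}  {sum(frequency.values()):>9}")
```

The frequencies always add up to the number of observations, which is a quick check that no score was missed.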
Grouped Frequency Distributions
• When dealing with a large range of data, it can be helpful to group the data
into intervals or classes. A grouped frequency distribution summarizes data
by grouping adjacent values into class intervals and showing the frequency
of data points within each interval. This method is particularly useful for
continuous data or when the dataset is large (e.g. over 30 observations).
• Example:
A grouped frequency distribution for a dataset of exam scores for a class of
100 students might look like this:
Table 03: A dataset of scores for a class of students.
Age Range   Frequency   Relative Frequency   Cumulative Frequency
10-19       5           0.20                 5
20-29       8           0.32                 13
30-39       6           0.24                 19
40-49       4           0.16                 23
50-59       2           0.08                 25
Total       25          1.00
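The relative and cumulative frequency columns follow directly from the frequency column. As a minimal sketch (the variable names are illustrative), the Python below recomputes them from the frequencies in Table 03:

```python
from itertools import accumulate

# Class intervals and frequencies taken from Table 03
# (the first class is read as 10-19).
classes = ["10-19", "20-29", "30-39", "40-49", "50-59"]
frequencies = [5, 8, 6, 4, 2]

total = sum(frequencies)

# Relative frequency = class frequency / total number of observations.
relative = [f / total for f in frequencies]

# Cumulative frequency = running total of the frequencies.
cumulative = list(accumulate(frequencies))

for c, f, r, cf in zip(classes, frequencies, relative, cumulative):
    print(f"{c:<6} {f:>3} {r:>6.2f} {cf:>4}")
print(f"{'Total':<6} {total:>3} {sum(relative):>6.2f}")
```

The last cumulative frequency equals the total, and the relative frequencies sum to 1.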
Advantages of Tabular Data Presentation
• Clarity: Tables can present data in a clear and concise manner, making it
easier to identify patterns and trends.
• Ease of Comparison: Tables facilitate the comparison of different data
points or categories.
• Compactness: Tables can effectively summarize large amounts of data
without overwhelming the reader.
• Accessibility: Well-designed tables are straightforward to read and
understand, even for those without advanced statistical knowledge.
Graphical data presentation
Graphical data representation is a key component of data analysis, allowing
for the visualization of data patterns, trends, and distributions.
Here, we will delve into three specific types of graphical data presentations:
Histograms, Frequency polygons, and Cumulative frequency
distributions.
Histogram
• Concept: A histogram is a graphical representation of the distribution of
numerical data. It is an estimate of the probability distribution of a
continuous variable and consists of contiguous (touching) bars where each
bar represents the frequency of data points falling within the specific
interval.
• Example: a histogram can be drawn from the dataset of exam marks below.
Table 04. A dataset of exam marks for a class of BPL 1.3 business statistics
Class interval   Frequency   Class boundary   Cumulative frequency
20-25            10          19.5-25.5        10
26-30            28          25.5-30.5        38
31-35            32          30.5-35.5        70
36-40            45          35.5-40.5        115
41-45            50          40.5-45.5        165
46-50            35          45.5-50.5        200
51-55            12          50.5-55.5        212
Histogram
Figure: Histogram showing marks for 1.3 BPL business statistics (x-axis: class boundary; y-axis: frequency).
Figure: Frequency plotted against class mark (x-axis: class mark; y-axis: frequency).
Notes
• The frequency polygon can serve as an alternative to a histogram. Both
visual representations show the shape of a distribution.
• The frequency polygon represents the frequency distribution of
continuous data graphically. Its relevance lies in its ability to visually
represent data, allowing for more straightforward interpretation and
analysis. As a result, it is a valuable tool in statistics, helping researchers
to identify patterns and trends in large data sets.
3. Cumulative Frequency Distribution
• Cumulative frequency distribution is a tabular summary of the frequencies
of observations in a dataset, sorted from the smallest value to the largest.
• A cumulative frequency distribution (or cumulative frequency curve) shows the
cumulative frequency of data points up to a certain value.
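• It is constructed by plotting the cumulative frequency on the y-axis against the
corresponding class boundaries on the x-axis. In the usual "less than" curve, each
cumulative frequency is plotted at the upper class boundary of its class, starting
from zero at the lower boundary of the first class.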
• Cumulative frequency distributions are useful for analyzing the proportion of data
points below or above certain thresholds and identifying percentiles and quartiles.
Construction of a Graph for Cumulative
Frequency Distribution:
• Data Preparation: Organize the dataset in ascending order.
• Calculation: Calculate the cumulative frequency for each value by adding
up the frequencies as you progress through the dataset.
• Plotting: Plot the cumulative frequencies on the y-axis against their
corresponding class boundaries on the x-axis.
• For example, consider the following exam scores of a BBA statistics class:
70,75,80,85,90,75,85,80,85,90,95,85,90,85,80,65,70,75
Table 06. Cumulative frequency distribution for the exam scores of the BBA statistics class
CLASS INTERVAL   CLASS BOUNDARY   CLASS MARK   FREQUENCY   CUMULATIVE FREQUENCY
65-69            64.5-69.5        67           1           1
70-74            69.5-74.5        72           2           3
75-79            74.5-79.5        77           3           6
80-84            79.5-84.5        82           3           9
85-89            84.5-89.5        87           5           14
90-94            89.5-94.5        92           3           17
95-99            94.5-99.5        97           1           18
Total                                          18
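The construction steps above (sort into classes, count, then accumulate) can be sketched in Python as follows. The class width of 5 and the starting limit of 65 are read off Table 06; the variable names are illustrative.

```python
# Raw exam scores of the BBA statistics class, from the example above.
scores = [70, 75, 80, 85, 90, 75, 85, 80, 85, 90, 95, 85, 90, 85, 80, 65, 70, 75]

width = 5    # class width used in Table 06
start = 65   # lower class limit of the first class

rows = []
running_total = 0
for lower in range(start, max(scores) + 1, width):
    upper = lower + width - 1
    # Frequency: number of scores falling inside this class interval.
    freq = sum(lower <= s <= upper for s in scores)
    running_total += freq
    rows.append({
        "interval": f"{lower}-{upper}",
        "boundary": (lower - 0.5, upper + 0.5),  # class boundaries
        "mark": (lower + upper) / 2,             # class mark (midpoint)
        "frequency": freq,
        "cumulative": running_total,
    })

for row in rows:
    print(row)
```

Each row reproduces one line of the table; the cumulative frequency of the last class equals the number of scores.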
Cumulative frequency distribution curve (Ogive)
Figure: A graph showing the cumulative frequency distribution of exam scores of the BBA statistics class (x-axis: lower class boundary; y-axis: cumulative frequency).
Graph interpretation
In this cumulative frequency distribution graph:
• Each point corresponds to a class boundary, with the x-coordinate giving the boundary score and the
y-coordinate giving the cumulative frequency up to that boundary.
• The graph shows how the cumulative frequency increases as we move through the sorted dataset.
• The graph allows us to observe the distribution of scores and how the cumulative frequency
accumulates as scores increase.
• It provides insights into the spread of scores and the relative frequencies of different score ranges.
• Graphical presentation of cumulative frequency distribution facilitates a visual understanding of the
distribution of data, making it easier to interpret and analyze.
Advantages of graphical data presentation
• Visual clarity: Graphs and charts provide a clear and concise representation of data, making it easier to interpret and
understand complex patterns and relationships.
• Comparative analysis: Graphical presentations enable comparison between different datasets or variables, facilitating
insights into trends, differences, and similarities.
• Engaging communication: Visualizations are often more engaging and memorable than numerical tables, enhancing
audience comprehension and retention of information.
• Decision-making support: Graphical representations of data can aid decision-making processes by highlighting key
insights and trends, enabling stakeholders to make informed decisions based on evidence.
Disadvantages of graphical presentations
• Limited Detail: Graphs and charts often provide a condensed summary of data,
which may omit certain details or nuances present in the raw data. This can be a
disadvantage when a more comprehensive understanding of the data is required.
Disadvantages…cont’d
• Overemphasis on Visual Appeal: Sometimes, graphical presentations
prioritize aesthetics over clarity or accuracy. This can result in visually
appealing but misleading visualizations that fail to effectively communicate the
intended message.
• Pie Chart:
• A pie chart is a circular chart divided into slices, each representing a proportion of
the whole data.
• It is useful for showing the composition of a categorical variable as parts of a whole.
• Pie charts are ideal for visualizing percentages and relative proportions but may become less
effective when representing large amounts of data.
• Example: shares owned by shareholders of a company:
• Shareholder A: 40%, shareholder B: 30%, shareholder C: 20%, shareholder D: 10%
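To draw the pie chart by hand, each percentage is converted into a central angle out of 360 degrees. The short Python sketch below does this for the shareholder example; the dictionary name is illustrative.

```python
# Percentage of shares owned by each shareholder, from the example above.
shares = {"A": 40, "B": 30, "C": 20, "D": 10}

total = sum(shares.values())

# Central angle of a slice = (share / total) * 360 degrees.
angles = {name: pct * 360 / total for name, pct in shares.items()}

for name, angle in angles.items():
    print(f"Shareholder {name}: {shares[name]}% -> {angle:.0f} degrees")
```

The angles add up to 360 degrees, just as the percentages add up to 100%.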
Pie chart
Figure: A pie chart showing percentages of shares owned by shareholders (slices: 40%, 30%, 20%, 10%).
END
Question Seven
Probability theory is a branch of mathematics that deals with the study of chance
events and their likelihood of occurrence.
It provides a mathematical framework for analyzing and modelling uncertain events,
making predictions and estimating the likelihood of occurrences, e.g. in:
o weather forecasting
o sports outcomes
o card games and other chance games
o insurance
o medical diagnosis
o election outcomes
o Shopping recommendations, etc.
Key concepts in probability theory
Events – These are the occurrences or outcomes of a random experiment or
activity. E.g., when tossing a coin, the events are either getting a head or a
tail.
Probability – This is a number between 0 and 1 representing the
likelihood of an event occurring. It can also be expressed as percentages
ranging from 0% to 100%. A probability of 0 indicates that there is no
chance that a particular event will occur, whereas a probability of 1(100%)
indicates that an event is certain to occur. A probability of 0.45 (45%)
indicates that there are 45 chances out of 100 of the event occurring.
Key concepts….cont’d
Random variables – These are variables whose possible values are determined by chance.
E.g. when tossing a coin, the number of heads obtained (0 or 1) is a random variable.
Sample space (S) – The set of all possible outcomes of a random experiment. E.g., when a
coin is tossed, there are only two possible outcomes, head and tail. So the sample space is
S = {H, T}, which contains 2 outcomes.
Independence – This is when the probability of an event is not affected by whether another
event occurs. E.g. the probability of getting a head when tossing a coin is independent of
getting a 6 when a die is rolled.
Conditional probability- This is the probability of an event given that another event has
occurred.
Probability of an event
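The probability, P, of an event, E, is the likelihood that the event will occur.
For any event E, 0 ≤ P(E) ≤ 1, where P(E) is the probability of E.
When all outcomes are equally likely,
P(E) = (number of outcomes favourable to E) / (total number of possible outcomes)
For example, when we toss a fair coin, the probability of getting a head (H)
is calculated as below:
P(H) = (number of ways of getting a head) / (total number of possible outcomes)
= 1/2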
Rules of Probability
• Non-negativity: There’s no negative probability. The probability of an
impossible event is 0 and the probability of a certain event is 1. Therefore,
for any event A, the range of possible probabilities is: 0 ≤ P(A) ≤ 1
• Normalization: The sum of all the probabilities for all possible events
( sample space) of a random experiment is equal to 1. E.g, when a coin is
tossed, the sum of the probability of getting a head , P(H) and the
probability of getting a tail, P(T) is equal to 1.
P(H)+P(T)=1
Rules of Probability…cont’d
• Complementarity: The probability of the complement (opposite) of an event is 1
minus the probability of the event. Thus, for any event A, P(A’) = 1 - P(A).
• Mutual exclusivity: If two events, A and B, are mutually exclusive (also called
disjoint events), then A and B cannot occur at the same time. Thus the probability
that both events occur is P(A ∩ B) or P(A and B) = 0.
The probability of either event happening is given by P(A ∪ B) or P(A or B) = P(A)
+ P(B).
If the two events are NOT mutually exclusive, then P(A or B) = P(A) + P(B) - P(A and
B).
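The complement and addition rules can be checked by listing outcomes. The sketch below uses one roll of a fair die; the events A, B and C are hypothetical examples chosen for illustration and do not come from the original slides.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (illustrative example).
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Classical probability: favourable outcomes / all possible outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}  # an even number
B = {5, 6}     # a number greater than 4 (overlaps A at 6)
C = {1, 3}     # mutually exclusive with A

# Complement rule: P(A') = 1 - P(A).
p_not_a = prob(sample_space - A)

# Addition rule for events that are NOT mutually exclusive.
p_a_or_b = prob(A) + prob(B) - prob(A & B)

# Addition rule for mutually exclusive events: P(A and C) = 0.
p_a_or_c = prob(A) + prob(C)

print(p_not_a, p_a_or_b, p_a_or_c)
```

Using `Fraction` keeps every probability exact, so each rule can be compared directly with the probability obtained by counting outcomes in the union.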
Rules of Probability…cont’d
• Dependency (Conditional Probability): This is the probability of an event
given that another event has occurred. For events A and B, the probability
that both occur is
P(A and B) = P(A)*P(B|A) = P(B)*P(A|B).
• Note: The straight line symbol, |, does not mean divide! It means "conditional"
or "given". For instance, P(A|B) means the probability that event A occurs given
that event B has occurred. It is given by:
• P(A|B) = P(A and B)/P(B) and P(B|A) = P(A and B)/P(A).
For mutually exclusive events, P(A|B) = P(A and B)/P(B) = 0/P(B) = 0. Also, P(B|A) = 0/P(A) = 0.
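As a small illustration of conditional probability, the sketch below uses the standard definition P(A|B) = P(A and B) / P(B) on one roll of a fair die. The events and function names are hypothetical and chosen only for this example.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (illustrative example).
sample_space = set(range(1, 7))

def prob(event):
    """Classical probability: favourable outcomes / all possible outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

def conditional(a, b):
    """P(a | b) = P(a and b) / P(b); only defined when P(b) > 0."""
    if prob(b) == 0:
        raise ValueError("cannot condition on an event with probability 0")
    return prob(a & b) / prob(b)

A = {2, 4, 6}  # an even number
B = {4, 5, 6}  # a number greater than 3
C = {1, 3}     # mutually exclusive with A

p_a_given_b = conditional(A, B)  # only 4 and 6 are even among {4, 5, 6}
p_a_given_c = conditional(A, C)  # A cannot happen once C has happened

print(p_a_given_b, p_a_given_c)
```

The multiplication rule can be confirmed from the same numbers: P(A and B) equals both P(B)*P(A|B) and P(A)*P(B|A).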
Rules of Probability…cont’d
• Independence: If A and B are independent events, neither event
influences or affects the probability that the other event occurs. The
probability of independent events is given by P(A ∩ B) or P(A and B) =
P(A)*P(B). This particular rule extends to more than two independent
events. E.g., P(A and B and C) = P(A)*P(B)*P(C).
• Inclusion–Exclusion principle (Rule): This states that the probability of a
union of any two events is the sum of their probabilities minus the
probability of their intersection, i.e. P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
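The coin and die example of independence can be verified by listing the combined sample space. This is a minimal sketch that assumes a fair coin and a fair die; the names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Combined sample space: toss a fair coin and roll a fair die (12 outcomes).
sample_space = set(product("HT", range(1, 7)))

def prob(event):
    """Classical probability over the combined sample space."""
    return Fraction(len(event), len(sample_space))

heads = {o for o in sample_space if o[0] == "H"}  # coin shows a head
six = {o for o in sample_space if o[1] == 6}      # die shows a 6

# Independence: P(head and 6) = P(head) * P(6).
p_both = prob(heads & six)

# Inclusion-exclusion: P(head or 6) = P(head) + P(6) - P(head and 6).
p_either = prob(heads) + prob(six) - p_both

print(p_both, p_either)
```

Here P(head and 6) comes out as 1/12, which is exactly 1/2 * 1/6, and the inclusion-exclusion result matches the probability found by counting the outcomes in the union.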
PROBABILITY DISTRIBUTIONS
• A probability distribution is a statistical function that describes all the
possible values and probabilities for a random variable within a given
range.
• This range is bounded by the minimum and maximum possible values, but
where a value is likely to fall within that range depends on a number of
factors, such as the mean (average), standard deviation, skewness, and
kurtosis of the distribution.
Types of Probability Distribution
Probability distributions are divided into two types:
• Discrete Probability Distributions
• Continuous Probability Distributions
Discrete Probability Distribution
• A discrete distribution describes the probability of occurrence of
each value of a discrete random variable(one which may take on
only a countable number of distinct values such as 0,1,2,3,4).
Binomial distribution
A binomial probability distribution describes the number of successes in a fixed
number of independent, repeated trials, each of which has only two possible
outcomes. The result of each trial is classified as either a success or a failure
(often coded as one or zero), and the probability of success is the same on every
trial. E.g., flipping a coin gives one of the two outcomes {Heads, Tails} on each flip.
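As a worked illustration, the binomial probability of exactly k successes in n trials is C(n, k) * p^k * (1 - p)^(n - k). The sketch below applies this standard formula to a hypothetical case of 3 flips of a fair coin, with a head counted as a success; the numbers are assumptions made for the example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 3    # number of coin flips (hypothetical)
p = 0.5  # probability of a head on each flip (fair coin assumed)

# Probability of getting 0, 1, 2 or 3 heads.
distribution = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob_k in enumerate(distribution):
    print(f"P({k} heads in {n} flips) = {prob_k}")
```

The probabilities over all possible numbers of heads sum to 1, in line with the normalization rule.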
Types…cont’d
Bernoulli distribution
Bernoulli distributions are similar to binomial distributions because there are
two possible outcomes but only one trial is conducted. The outcomes in a
Bernoulli distribution are labeled as either a zero or one. A one indicates
success, and a zero means failure.(one trial is called a Bernoulli trial).
E.g., if you used one green marble (for success) and one red marble (for
failure) in a covered bowl and chose without looking, you would record each
result as a zero or one rather than success or failure for your sample.
Discrete Probability Distributions…cont’d
Poisson Distribution
The Poisson distribution expresses the probability that a given number of
events will occur over a fixed period, when the events occur independently
and at a constant average rate.
For instance, say you have a covered bowl with one red and one green
marble, and your chosen period is two minutes. Your test is to record
whether you pick the green or red marble, with the green indicating success.
After each test, you place the marble back in the bowl and record the results.
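For reference, the Poisson probability of observing exactly k events is rate^k * e^(-rate) / k!, where the rate is the average number of events per period. The slides do not state a rate, so the value of 2 events per period used below is purely an assumption for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, rate):
    """Probability of exactly k events in one period when events
    occur at the given average rate per period."""
    return rate**k * exp(-rate) / factorial(k)

rate = 2.0  # assumed average number of events per period (hypothetical)

# Probabilities of 0 to 5 events in a single period.
for k in range(6):
    print(f"P({k} events) = {poisson_pmf(k, rate):.4f}")

# Summing over a long range of k shows the probabilities add up to about 1.
total_probability = sum(poisson_pmf(k, rate) for k in range(50))
print(f"Sum of probabilities = {total_probability:.6f}")
```

Unlike the binomial case, there is no fixed number of trials here; only the average rate per period is needed.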
Discrete Probability Distributions…cont’d
Multinomial distributions.
Multinomial distributions occur when there is a probability of more than two
outcomes with multiple counts.
For instance, say you have a covered bowl with one green, one red, and one
yellow marble. For your test, you record the number of times you randomly
choose each of the marbles for your sample.
Continuous Probability Distributions
• A continuous distribution describes the probabilities of possible values of
a continuous random variable.
• A continuous random variable has an infinite and uncountable set of
possible values (known as the range). E.g., height could be any one of the
infinitely many values between 5 and 5.9 feet.
• The probability that a continuous random variable falls within an interval
is given by the area under the curve of the Probability Density Function (PDF).
Probability density function
• The probability density function given by the following equation,
describes the probability that variable x falls between two values (a and
b), and is equal to the area under the curve from a to b.
P(a ≤ x ≤ b) = ∫_a^b f(x) dx, where f(x) ≥ 0 for all x
x is a continuous random variable that can take on any value within a given
range of values.
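The idea of probability as an area under the curve can be illustrated numerically. The slides do not name a particular density, so the sketch below assumes a standard normal density (mean 0, standard deviation 1) and approximates the area between a and b with the trapezoidal rule; the function names are illustrative.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution (assumed here for illustration)."""
    z = (x - mean) / sd
    return exp(-0.5 * z * z) / (sd * sqrt(2 * pi))

def area_under_curve(f, a, b, steps=10_000):
    """Approximate the integral of f from a to b with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# P(-1 <= x <= 1): area under the density between a = -1 and b = 1.
p_within_one_sd = area_under_curve(normal_pdf, -1.0, 1.0)

# The area under the whole curve should be (very close to) 1.
p_total = area_under_curve(normal_pdf, -10.0, 10.0)

print(f"P(-1 <= x <= 1) is approximately {p_within_one_sd:.4f}")
print(f"Total area is approximately {p_total:.4f}")
```

Any other valid density function could be passed to `area_under_curve` in the same way, as long as it is non-negative and its total area is 1.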
Types of continuous probability distribution
Thank you