
Exercises LU0: Descriptive Statistics: Statistics and Probability Distributions

This document provides examples and exercises related to descriptive statistics and probability distributions. It includes examples of qualitative and quantitative variables, continuous and discrete variables, and statistical distributions related to market research studies. The exercises ask students to identify variable types, calculate and interpret descriptive statistics like the mean, median, mode, variance and quartiles for different data distributions.

Uploaded by

Tiago Paulino

Statistics and Probability Distributions

Bachelor’s degree in Information Management
Bachelor’s degree in Information Systems
Bachelor’s degree in Data Science

Exercises
LU0: Descriptive Statistics

TEACHERS

Ana Cristina Costa
Email: ccosta@novaims.unl.pt

Leonor Fernandes
Email: mfernandes@novaims.unl.pt

February 2022
1. For each of the cases presented, check the type of variable by placing a cross ("X") in
the appropriate field of the following table:

TYPES OF VARIABLES
                                                             Qualitative          Quantitative
Cases                                                        Nominal   Ordinal    Discrete   Continuous
Sector of activity: Agriculture, Industry, Services
Rating (1): Weak, Good, Excellent
Rating (2): 1, 2, 3, 4, 5
Number of children in household
Person's height in centimetres
Countries: Portugal, Spain, Italy, Rest of the World
T-shirt size: XS (extra small), S (small), M (medium), L (large), XL (extra-large)
Air temperature
Number of errors per page in a book
Number of people who enter a bank per hour
Time between two phone calls in a call centre

2. In the following list, identify with a cross ("X") which variables are continuous and
which are discrete.

Variable                                                                  Discrete   Continuous
The number of times that “6” appears when a die is rolled
The weight of 7 students in this class
The time it takes each student to get to the university
The number of goals scored by Benfica last season
The heights of the football team players
The time it takes for a person to run 100 meters
The number of matches in a box labelled "contains 50 units on average"
Exam scores with a maximum of 20 points
The weight of a bag of rice
The number of cars parked in a car park

3. In a market study on weekly newspaper reading implemented in the year 2000, 1000
readers of both genders were asked about the weekly newspaper they usually read.
The results were as follows:

Weekly Journal
Gender    WS Journal   NY Times   Independent   Economic News
Female    130          75         130           40
Male      300          175        75            75

This study was repeated in the following years, and the following results were obtained:

Weekly Journal   2001   2002   2003   2004
WS Journal       430    462    470    505
NY Times         250    210    185    200
Independent      205    230    220    160
Economic News    115    98     125    135

a) What are the statistical units of the market study?


b) Interpret the results provided in the tables and plot them in Excel.
c) In relative terms, what was the increase in the number of readers of the WS Journal
from 2001 to 2002? And from 2001 to 2004?
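
For part c), a relative increase is the change divided by the starting value. A minimal Python sketch using the WS Journal counts from the table above (the function name `relative_increase` is illustrative and not part of the exercise sheet):

```python
def relative_increase(old: float, new: float) -> float:
    """Return the change from old to new as a fraction of old."""
    return (new - old) / old


# WS Journal readers per year, copied from the second table of exercise 3.
ws_journal = {2001: 430, 2002: 462, 2003: 470, 2004: 505}

for year in (2002, 2004):
    change = relative_increase(ws_journal[2001], ws_journal[year])
    print(f"2001 -> {year}: {change:.1%}")
```

Multiplying the returned fraction by 100 gives the percentage increase.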

4. Twenty financial analysts forecast the value per share of a given company for
the following year. The results are shown in the following table:

Value per share   Number of   Proportion    Cumulative proportion   Class
(in euros)        analysts    of analysts   of analysts             mark
9.95 – 10.45      2
10.45 – 10.95     8
10.95 – 11.45     6
11.45 – 11.95     3
11.95 – 12.45     1

a) What are the statistical units of this study? What is the study variable?
b) Complete the columns of the table.
c) What is the mean predicted value per share?
d) What is the median value? Explain in your own words what this value means.
e) What is the value of the third quartile? Explain in your own words what this value
means.
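
For parts c) to e), one common approach with grouped data is to use class marks for the mean and linear interpolation inside a class for the median and quartiles. The sketch below assumes that approach and that observations are spread evenly within each class. The function names are illustrative.

```python
def grouped_mean(bounds, counts):
    """Mean of grouped data, using each class mark (midpoint) as the class value."""
    marks = [(lo + hi) / 2 for lo, hi in bounds]
    return sum(m * c for m, c in zip(marks, counts)) / sum(counts)


def grouped_quantile(bounds, counts, p):
    """Quantile of order p (0 < p < 1), interpolated linearly inside the
    first class whose cumulative count reaches p * n."""
    n = sum(counts)
    target = p * n
    cumulative = 0
    for (lo, hi), c in zip(bounds, counts):
        if c > 0 and cumulative + c >= target:
            return lo + (target - cumulative) / c * (hi - lo)
        cumulative += c
    return bounds[-1][1]


# Classes and counts from the table of exercise 4.
bounds = [(9.95, 10.45), (10.45, 10.95), (10.95, 11.45), (11.45, 11.95), (11.95, 12.45)]
counts = [2, 8, 6, 3, 1]

print("mean:", grouped_mean(bounds, counts))
print("median:", grouped_quantile(bounds, counts, 0.50))
print("3rd quartile:", grouped_quantile(bounds, counts, 0.75))
```

Other interpolation conventions exist, so results from statistical software may differ slightly.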

5. In a study carried out to analyse the germination capacity of a certain type of cereal,
five seeds were sown in each pot. The pots all contained the same type of soil, and the
number of germinated seeds in each pot was recorded. No seeds germinated in 16 pots,
1 seed germinated in 32 pots, 2 seeds in 89 pots, 3 seeds in 137 pots, 4 seeds in
98 pots, and 5 seeds in only 25 pots.

a) Complete the following table based on the previous information (display 4 decimal
places in the results).

Nr. of seeds          Number    Proportion   Cumulative       Cumulative
germinated per pot    of pots   of pots      number of pots   proportion of pots

Total

b) Calculate the value of the mean, mode, and median of the distribution of the
number of seeds germinated per pot and explain the meaning of these values.
c) What are the statistical units of this study? What is the study variable?
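
The table in part a) and the measures in part b) can be sketched in Python as follows. The counts come from the exercise statement, while the variable names and the printed layout are illustrative. The median shortcut in the loop assumes an odd number of observations, which is the case here.

```python
# Number of pots observed for each number of germinated seeds (exercise 5).
counts = {0: 16, 1: 32, 2: 89, 3: 137, 4: 98, 5: 25}
n = sum(counts.values())

print(f"{'Seeds':>5} {'Pots':>5} {'Prop.':>7} {'Cum.':>5} {'Cum. prop.':>10}")
cumulative = 0
median = None
for value in sorted(counts):
    cumulative += counts[value]
    # With n odd, the median is the value of observation number (n + 1) / 2.
    if median is None and cumulative >= (n + 1) / 2:
        median = value
    print(f"{value:>5} {counts[value]:>5} {counts[value] / n:>7.4f} "
          f"{cumulative:>5} {cumulative / n:>10.4f}")

mean = sum(value * c for value, c in counts.items()) / n
mode = max(counts, key=counts.get)  # value with the largest count
print(f"mean={mean:.4f} mode={mode} median={median}")
```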

6. Mr. Noble decided to devote his business to raising piglets. He sells the piglets
when they reach two months of age and weigh more than 9 kg. To study the
profits obtained from this activity, he weighed 60 piglets at the age of two
months and obtained the following results:
Weight       Number of   Proportion   Cumulative          Cumulative
(in kg)      piglets     of piglets   number of piglets   proportion of piglets
[4, 6[       3           0.0500       3                   0.0500
[6, 8[       7           0.1167       10                  0.1667
[8, 10[      18          0.3000       28                  0.4667
[10, 12[     17          0.2833       45                  0.7500
[12, 14[     12          0.2000       57                  0.9500
[14, 16[     3           0.0500       60                  1
Total        60          1

For the distribution of the piglet weight, calculate and interpret:

a) The value of the mean.

b) The value of sample variance.

c) The value of the coefficient of variation.

d) The value of the 1st quartile.

e) The value of the 3rd quartile.

f) The value of the interquartile range.
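
For parts a) to c), a possible sketch of the grouped-data mean, sample variance (divisor n - 1) and coefficient of variation is given below. It works from the exact class counts, so the figures can differ in the third or fourth decimal place from values obtained with the rounded proportions in the table. The function name is illustrative.

```python
from math import sqrt

# Classes and counts from the table of exercise 6.
bounds = [(4, 6), (6, 8), (8, 10), (10, 12), (12, 14), (14, 16)]
counts = [3, 7, 18, 17, 12, 3]


def grouped_mean_and_variance(bounds, counts):
    """Return (mean, sample variance) using class marks as class values."""
    n = sum(counts)
    marks = [(lo + hi) / 2 for lo, hi in bounds]
    mean = sum(m * c for m, c in zip(marks, counts)) / n
    squared_dev = sum(c * (m - mean) ** 2 for m, c in zip(marks, counts))
    return mean, squared_dev / (n - 1)


mean, variance = grouped_mean_and_variance(bounds, counts)
cv = sqrt(variance) / mean  # coefficient of variation as a fraction

print(f"mean={mean:.4f} kg, variance={variance:.4f}, CV={cv:.1%}")
```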

7. The following table shows data on the amounts deposited by 1000 customers at a
financial institution.
Amount          Number of   Proportion of   Cumulative            Cumulative
(in euro)       customers   customers       number of customers   proportion of customers
[500, 1000[     150         0.150           150                   0.150
[1000, 1500[    425         0.425           575                   0.575
[1500, 2000[    260         0.260           835                   0.835
[2000, 2500[    165         0.165           1000                  1
Total           1000        1

a) What was the mean amount deposited by those customers?

b) Calculate the sample standard deviation of the amounts deposited by customers.

c) Determine the median of the amounts deposited by customers.

d) Calculate the mode value of the distribution of the amounts deposited by customers.

e) Describe, in your own words, the information you consider most relevant about the
distribution of the amounts deposited by customers.
f) What are the statistical units of this study? What is the study variable?
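
For part d), the solutions to exercise 16 mention King's method for the mode of grouped data. Assuming the same method is intended here and that all classes have the same width, the mode is located in the modal class at L + h * f_next / (f_prev + f_next), where L is the lower limit, h the class width, and f_prev and f_next the counts of the neighbouring classes. A sketch with an illustrative function name:

```python
def king_mode(bounds, counts):
    """Mode of grouped data by King's formula (equal-width classes assumed)."""
    i = max(range(len(counts)), key=counts.__getitem__)  # index of the modal class
    lo, hi = bounds[i]
    prev_count = counts[i - 1] if i > 0 else 0
    next_count = counts[i + 1] if i < len(counts) - 1 else 0
    if prev_count + next_count == 0:
        return (lo + hi) / 2  # no neighbours to weigh: fall back to the class mark
    return lo + (hi - lo) * next_count / (prev_count + next_count)


# Classes and counts from the table of exercise 7.
bounds = [(500, 1000), (1000, 1500), (1500, 2000), (2000, 2500)]
counts = [150, 425, 260, 165]

print(f"mode = {king_mode(bounds, counts):.2f} euro")
```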

8. A market research company was asked to assess the success of some cell phone
models. For this, some university students were asked to give their opinion on 7
models on a scale of 1 to 5 (1 – very bad; 2 – bad; 3 - medium; 4 - good; 5 - very
good). The number of cell phones sold for these models was also obtained. The
following table shows the results.
Cell phone   More frequent   Number of cell
model        opinion         phones sold
A            1               475
B            2               500
C            2               420
D            3               650
E            4               920
F            5               1100
G            5               1050

a) What are the statistical units of this study? What are the study variables?
b) Check if there is a strong relationship between the students’ opinion on cell phone
models and the quantity sold.

9. A market research company was asked to assess the success of some washing
detergents. For this, some housewives were asked to give an opinion on 7 detergents
(1 – very bad; 2 – bad; 3 – medium; 4 – good; 5 – very good). The quantity sold of
these detergents was also recorded in tons per year. The following table shows the
results.
Detergent   Opinion   Quantity sold
A           1         9.0
B           2         9.2
C           2         9.1
D           3         9.5
E           4         9.6
F           5         9.8
G           5         9.7

Check if there is a strong relationship between the opinion on detergents and the
quantity sold.
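
Exercises 8 and 9 do not name the measure of association to use. Because opinion is an ordinal variable, the sketch below assumes Spearman's rank correlation, computed with average ranks for ties and the shortcut formula 1 - 6·Σd² / (n(n² - 1)). That formula is exact only when there are no ties, so with the tied opinions here it should be read as an approximation. Function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's coefficient via the shortcut formula on rank differences."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


opinion = [1, 2, 2, 3, 4, 5, 5]                        # models/detergents A to G
phones_sold = [475, 500, 420, 650, 920, 1100, 1050]    # exercise 8
detergent_tons = [9.0, 9.2, 9.1, 9.5, 9.6, 9.8, 9.7]   # exercise 9

print(f"exercise 8: {spearman(opinion, phones_sold):.4f}")
print(f"exercise 9: {spearman(opinion, detergent_tons):.4f}")
```

A coefficient close to 1 indicates a strong positive monotonic relationship between opinion and quantity sold.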

10. The following data refer to the length (in cm) of a group of 40 premature babies
(gestational age less than 36 weeks) born during one month in a maternity hospital:

10.1   18.0   19.4   23.8   29.7   33.1   36.8   38.2
11.2   18.3   19.4   24.3   29.9   33.3   37.4   39.2
13.6   18.6   19.7   24.6   30.0   33.5   37.7   39.6
16.2   19.1   20.1   24.7   31.6   33.7   37.8   40.2
17.2   19.1   20.4   27.3   32.2   34.7   38.2   41.4

a) What are the statistical units of this study? What is the study variable?
b) Calculate and interpret the following statistical measures:

i) Mean.

ii) Sample standard deviation.

iii) Coefficient of variation.

c) Create the frequency table (simple and cumulative frequencies).

d) Based on the frequency table of the previous paragraph, plot the histogram of the
distribution of the length of premature babies.

e) Calculate the quartiles and interquartile range of this distribution based on the
histogram.
f) Check for potential outliers and, if there are any, identify the corresponding observations.
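
Parts b) and f) can be sketched with Python's standard `statistics` module. The fences below assume the usual boxplot rule (1.5 and 3 times the interquartile range beyond the quartiles) and take the quartile values from the solutions at the end of this sheet. The helper name `fences` is illustrative.

```python
from statistics import mean, stdev

# Lengths (cm) of the 40 premature babies, read row by row from the table.
lengths = [
    10.1, 18.0, 19.4, 23.8, 29.7, 33.1, 36.8, 38.2,
    11.2, 18.3, 19.4, 24.3, 29.9, 33.3, 37.4, 39.2,
    13.6, 18.6, 19.7, 24.6, 30.0, 33.5, 37.7, 39.6,
    16.2, 19.1, 20.1, 24.7, 31.6, 33.7, 37.8, 40.2,
    17.2, 19.1, 20.4, 27.3, 32.2, 34.7, 38.2, 41.4,
]

avg = mean(lengths)
sd = stdev(lengths)   # sample standard deviation (divisor n - 1)
cv = sd / avg * 100   # coefficient of variation, in percent


def fences(q1, q3, k):
    """Lower and upper fences located k interquartile ranges beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles as stated in the solutions (assumed, not recomputed here).
inner = fences(18.708, 36.125, 1.5)
outer = fences(18.708, 36.125, 3.0)
outliers = [x for x in lengths if x < inner[0] or x > inner[1]]

print(f"mean={avg:.2f} cm, sd={sd:.3f} cm, CV={cv:.2f}%")
print("inner fences:", inner, "outer fences:", outer, "outliers:", outliers)
```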

11. Consider the number of mistakes made by an office assistant in 30 pages of written
text:

1 0 2 1 3 4 1 1 2 3 5 2 0 1 1
4 1 2 0 3 1 1 2 1 3 2 2 3 1 0

a) What are the statistical units of this study? What is the study variable? Classify the
statistical variable.

b) Create the table of simple frequencies, absolute and relative.

c) Graphically represent the distribution of simple relative frequencies.

d) Calculate the cumulative frequencies, absolute and relative.

e) Should class intervals be created to produce a frequency table? What advantages
and disadvantages would it have in this case?

f) What is the mean number of errors per page?

12. Match each histogram with the corresponding boxplot:

[Figure: five histograms labelled (a) to (e) and five boxplots labelled 1) to 5)]

13. A group of 50 financial analysts forecasted a company's earnings per share, in euros,
for the coming year. The results are presented in the following table using 7 classes
of equal width:
Class      Class mark (Ci)   Number of analysts   Ni    Fi
___        5                 4                    ___   0.08
___        ___               ___                  8     ___
[8; 10[    ___               ___                  ___   ___
[10; 12[   ___               8                    27    ___
___        13                ___                  37    0.74
___        ___               ___                  ___   ___
___        17                5                    ___   ___

a) Complete the table.

b) Calculate the median and interpret its value.

c) Explain, without performing calculations, whether the following statements are true
or false:

i. The 74th percentile is equal to 13 euros.

ii. 84% of the analysts predicted a gain per share of more than 8 euros.

d) Complete the following statement:

___ % of the analysts predicted a gain per share of less than 12.40 euros.
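
For part d), the proportion of analysts below a value that falls inside a class can be estimated by linear interpolation, assuming forecasts are spread evenly within each class. The sketch below uses the class counts from the completed table given in the solutions. The function name is illustrative.

```python
def proportion_below(bounds, counts, x):
    """Estimated proportion of observations below x in grouped data,
    assuming a uniform spread of observations inside each class."""
    n = sum(counts)
    total = 0.0
    for (lo, hi), c in zip(bounds, counts):
        if x >= hi:
            total += c                          # whole class lies below x
        elif x > lo:
            total += c * (x - lo) / (hi - lo)   # part of the class lies below x
    return total / n


# Classes and counts as in the completed table (see the solutions).
bounds = [(4, 6), (6, 8), (8, 10), (10, 12), (12, 14), (14, 16), (16, 18)]
counts = [4, 4, 11, 8, 10, 8, 5]

print(f"{proportion_below(bounds, counts, 12.40):.0%} predicted less than 12.40 euros")
```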

14. The commercial department of a wine importing company has two stores (A and B)
open to the public and decided to study the number of units sold daily for 6 consecutive
weeks. The results were as follows:
Number of units sold per day   [0; 10[   [10; 20[   [20; 30[   [30; 40[   [40; 50[
Shop A                         10        15         3          2          2
Shop B                         6         18         4          1          1

a) Which store has the highest mean daily sales?

b) Using central tendency measures, compare the sales of the two stores for
symmetry. How do you interpret these results regarding the sales of the two stores?

c) Sketch the histogram.

15. Using data on real estate sales commissions (in thousands of euros) of a set of
mediators, the following SPSS output was obtained:

a) What are the statistical units of this study? What is the study variable?
Classify the statistical variable.

b) Interpret the values of the statistics shown in the output.

c) Sketch the boxplot of the distribution of real estate sales commissions.

d) Justify whether the following statement is true or false: “18 mediators
receive commissions above 62 thousand euros.”

16. Football is a very popular game in many countries. The following output shows the
results obtained in SPSS for the useful game time (in minutes) in 70 football games.
[SPSS chart with axes "Nr. of games" and "Useful game time (minutes)"]

Based on the information provided:


a) Create the frequency table.
b) Complete the following statement: "75% of the games had at least ___ minutes of
useful play time."
c) What is the mean useful play time per game? And the most frequent one?
d) Do you consider that the mean is representative of the data? Justify.

e) "It was observed that the frequency of games with extremely high game time (i.e.,
between 65 and 69 minutes) was double the frequency of those with
extremely low game time (i.e., between 41 and 45 minutes)." Do you agree with
this statement? Justify.

17. The following table shows the number of complaints filed per day in April at a given
office.
Day Complaints Day Complaints
01 1 16 1
02 1 17 0
03 1 18 0
04 1 19 2
05 0 20 0
06 0 21 3
07 0 22 1
08 1 23 1
09 2 24 1
10 0 25 0
11 1 26 0
12 1 27 2
13 3 28 0
14 4 29 0
15 0 30 0

a) Create the frequency table.

b) Plot the distribution of the simple absolute frequencies.

c) Graphically represent the distribution of the cumulative relative frequencies.

d) Calculate location measures of central tendency using the observations in the above
table.

e) Calculate the values of the mean, median, and mode from the frequency table, and
interpret the results.

f) Calculate the quartile values from the frequency table and interpret the results.

g) Calculate the range, sample variance, standard deviation, and coefficient of
variation using the observations in the above table.

h) Calculate the range, interquartile range, sample variance, standard deviation, and
coefficient of variation from the frequency table, and interpret the results.

18. The following table shows the mean duration of court cases (in years) related to a
particular type of crime.
Process   Duration (years)   Process   Duration (years)
1 0.1 16 2.77
2 0.54 17 3.22
3 0.63 18 3.44
4 0.65 19 3.45
5 0.8 20 4.19
6 0.82 21 4.21
7 1.57 22 4.3
8 1.88 23 4.6
9 1.91 24 5.14
10 2.07 25 5.42
11 2.1 26 6.2
12 2.1 27 6.31
13 2.25 28 7.4
14 2.37 29 8.55
15 2.48 30 9.56

a) Create the frequency table.

b) Graphically represent the distribution of simple relative frequencies and the
frequency polygon.

c) Graphically represent the cumulative frequency function.

d) Calculate location measures of central tendency using the observations in the above
table.

e) Calculate the values of the mean, median, and mode from the frequency table, and
interpret the results.

f) Calculate the quartile values from the frequency table and interpret the results.

g) Calculate the range, sample variance, standard deviation, and coefficient of
variation using the observations in the above table.

h) Calculate the range, interquartile range, sample variance, standard deviation, and
coefficient of variation from the frequency table, and interpret the results.
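
For part a), continuous data such as these durations need class intervals. The exercise does not prescribe how many classes to use, so the sketch below assumes Sturges' rule (k = 1 + log2(n), rounded) and equal-width classes starting at the minimum. In practice the class limits would usually be rounded to convenient values. The function name is illustrative.

```python
from math import log2

# Durations (years) of the 30 court cases, in the order of the table.
durations = [
    0.1, 0.54, 0.63, 0.65, 0.8, 0.82, 1.57, 1.88, 1.91, 2.07,
    2.1, 2.1, 2.25, 2.37, 2.48, 2.77, 3.22, 3.44, 3.45, 4.19,
    4.21, 4.3, 4.6, 5.14, 5.42, 6.2, 6.31, 7.4, 8.55, 9.56,
]


def frequency_table(data, k=None):
    """Return rows (lower, upper, count, proportion, cumulative proportion)."""
    n = len(data)
    if k is None:
        k = round(1 + log2(n))  # Sturges' rule: an assumption, not set by the exercise
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        i = min(int((x - lo) / width), k - 1)  # the maximum goes into the last class
        counts[i] += 1
    rows, cumulative = [], 0
    for i, c in enumerate(counts):
        cumulative += c
        rows.append((lo + i * width, lo + (i + 1) * width, c, c / n, cumulative / n))
    return rows


rows = frequency_table(durations)
for lower, upper, count, prop, cum_prop in rows:
    print(f"[{lower:.2f}, {upper:.2f}[  {count:>2}  {prop:.4f}  {cum_prop:.4f}")
```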

SOLUTIONS

3.
a) Readers
c) 7.4% and 17.4%

4.
a) Financial analysts; value per share of a given company
c) 11.03 €
d) 10.95 €
e) 11.37 €

5.
b) mean = 2.8665 ≈ 3; mode = 3; median = 3
c) Pots; nr. of seeds germinated

6.
a) 10.2332 kg
b) 5.9789
c) 24%
d) 8.5553 kg
e) 12 kg
f) 3.445 kg

7.
a) 1470 €
b) 468.32 €
c) 1411.76 €
d) 1317.07 €
f) Customers; amount deposited

8.
a) Cell phone models; more frequent opinion, and number of cell phones sold
b) 0.9286

9. 0.9821

10.
a) Premature babies; length
b) i) 27.33 cm; ii) 9.108 cm; iii) 33.32%
c) Nr. classes=6; width=5.2 but may consider 5.5 due to rounding; start the 1st class in 10
d) Min=10 cm; 1st quartile=18.708 cm; Median=27.6 cm; 3rd quartile=36.125 cm; Max=43 cm; IQ=17.417 cm
e) Lower outer fence (BEI)=−33.5 cm; lower inner fence (BII)=−7.4 cm; upper inner fence (BIS)=62.3 cm; upper outer fence (BES)=88.4 cm

11.
a) Pages of written text; nr. of mistakes; discrete variable
f) 1.766 ≈ 2

12. a-3; b-1; c-5; d-2; e-4.

13.
a)
Class      Class mark   Number of analysts   Ni   Fi
[4; 6[     5            4                    4    0.08
[6; 8[     7            4                    8    0.16
[8; 10[    9            11                   19   0.38
[10; 12[   11           8                    27   0.54
[12; 14[   13           10                   37   0.74
[14; 16[   15           8                    45   0.9
[16; 18[   17           5                    50   1
b) 11.50
c) i) False; ii) True
d) 58%

14.
a) In store B (Mean A=15.9; Mean B=16)
b) Mode A=12.3; Mean A=15.9; Median A=14; Mode B=14; Mean B=16; Median B=15. Both distributions are positively asymmetric.

15. Mediators; real estate sales commissions; continuous variable

16.
b) 50.4225 min
c) Mean=55.5509 min; Mode=51.5263 min (King's method)
d) Yes, because CV=10.88% <50%
e) I agree with the statement, because there were 4 games with useful game time between 65
and 69 minutes, and only 2 games between 41 and 45 minutes.
