Exercises LU0: Descriptive Statistics: Statistics and Probability Distributions
TEACHERS
February 2022
1. For each of the cases presented, check the type of variable by placing a cross ("X") in
the appropriate field of the following table:
TYPES OF VARIABLES
Cases                                                                                   Qualitative         Quantitative
                                                                                        Nominal   Ordinal   Discrete   Continuous
Sector of activity: Agriculture, Industry, Services
Rating (1): Weak, Good, Excellent
Rating (2): 1, 2, 3, 4, 5
Number of children in household
Person's height in centimetres
Countries: Portugal, Spain, Italy, Rest of the World
T-shirt size: XS (extra small), S (small), M (medium), L (large), XL (extra-large)
Air temperature
Number of errors per page in a book
Number of people who enter a bank per hour
Time between two phone calls in a call centre
2. In the following list, identify with a cross ("X") which variables are continuous and
which are discrete.
3. In a market study on weekly newspaper reading conducted in 2000, 1000 readers
of both genders were asked which weekly newspaper they usually read.
The results were as follows:
          Weekly newspaper
Gender    WS Journal   NY Times   Independent   Economic News
Female    130          75         130           40
Male      300          175        75            75
This study was repeated in the following years, and the following results were obtained:
4. 20 financial analysts made a forecast on the values per share of a given company for
the following year. The results are shown in the following table:
Value per share   Number of   Proportion    Cumulative proportion   Class
(in euros)        analysts    of analysts   of analysts             mark
9.95 – 10.45      2
10.45 – 10.95     8
10.95 – 11.45     6
11.45 – 11.95     3
11.95 – 12.45     1
a) What are the statistical units of this study? What is the study variable?
b) Complete the columns of the table.
c) What is the mean predicted value per share?
d) What is the median value? Explain in your own words what this value means.
e) What is the value of the third quartile? Explain in your own words what this value
means.
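The calculations in b) to e) can be checked with a short Python sketch such as the one below. It is only one possible approach: it assumes that the forecasts are spread uniformly inside each class, so the median and the third quartile are found by linear interpolation within the class that contains them. The function name `grouped_quantile` is illustrative and not part of the exercise.

```python
# Class lower limits and frequencies taken from the table in exercise 4.
lower = [9.95, 10.45, 10.95, 11.45, 11.95]
width = 0.50
freq = [2, 8, 6, 3, 1]

n = sum(freq)
marks = [lo + width / 2 for lo in lower]                     # class marks
prop = [f / n for f in freq]                                 # proportions
cum_prop = [sum(prop[: i + 1]) for i in range(len(prop))]    # cumulative proportions

# Mean of grouped data: class marks weighted by their frequencies.
mean = sum(m * f for m, f in zip(marks, freq)) / n


def grouped_quantile(p):
    """Quantile of order p, assuming a uniform spread inside each class."""
    target = p * n
    cum = 0
    for lo, f in zip(lower, freq):
        if cum + f >= target:
            return lo + (target - cum) / f * width
        cum += f
    return lower[-1] + width


median = grouped_quantile(0.50)
q3 = grouped_quantile(0.75)
print(round(mean, 4), round(median, 4), round(q3, 4))
```

A different interpolation convention, or working from the raw forecasts if they were available, could give slightly different values for the median and the quartile.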
5. In a study carried out to analyse the germination capacity of a certain type of cereal,
five seeds were sown in each pot. All pots contained the same type of soil, and the
number of germinated seeds in each pot was recorded. No seeds germinated in 16 pots,
1 seed germinated in 32 pots, 2 seeds in 89 pots, 3 seeds in 137 pots, 4 seeds in
98 pots, and all 5 seeds in only 25 pots.
a) Complete the following table based on the previous information (display 4 decimal
places in the results).
Total
b) Calculate the value of the mean, mode, and median of the distribution of the
number of seeds germinated per pot and explain the meaning of these values.
c) What are the statistical units of this study? What is the study variable?
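As a hedged illustration of a) and b), the sketch below builds the frequency table from the counts given in the statement and then computes the mean, median, and mode. The variable names are illustrative, and expanding the table into one observation per pot is just a convenient way to reuse Python's `statistics` module.

```python
from statistics import median, mode

# Number of pots observed for each count of germinated seeds (exercise 5).
pots = {0: 16, 1: 32, 2: 89, 3: 137, 4: 98, 5: 25}

n = sum(pots.values())

# Frequency table: value, count, proportion, cumulative proportion (4 decimals).
cumulative = 0.0
for seeds, count in pots.items():
    proportion = count / n
    cumulative += proportion
    print(f"{seeds}  {count:4d}  {proportion:.4f}  {cumulative:.4f}")

# Expand the table back into one observation per pot.
observations = [seeds for seeds, count in pots.items() for _ in range(count)]

mean_seeds = sum(observations) / n
median_seeds = median(observations)
mode_seeds = mode(observations)
print(f"mean={mean_seeds:.4f} median={median_seeds} mode={mode_seeds}")
```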
6. Mr. Noble decided to go into the business of raising piglets. He sells the piglets
when they reach two months of age and weigh more than 9 kg. In order to study the
profits obtained from this activity, he weighed 60 piglets at the age of two
months and obtained the following results:
Weight      Number of   Proportion   Cumulative number   Cumulative proportion
(in kg)     piglets     of piglets   of piglets          of piglets
[4, 6[      3           0.0500       3                   0.0500
[6, 8[      7           0.1167       10                  0.1667
[8, 10[     18          0.3000       28                  0.4667
[10, 12[    17          0.2833       45                  0.7500
[12, 14[    12          0.2000       57                  0.9500
[14, 16[    3           0.0500       60                  1
Total       60          1
7. The following table shows data on the amounts deposited by 1000 customers at a
financial institution.
Amount          Number of   Proportion of   Cumulative number   Cumulative proportion
(in euro)       customers   customers       of customers        of customers
[500, 1000[     150         0.150           150                 0.150
[1000, 1500[    425         0.425           575                 0.575
[1500, 2000[    260         0.260           835                 0.835
[2000, 2500[    165         0.165           1000                1
Total           1000        1
d) Calculate the mode value of the distribution of the amounts deposited by customers.
e) Describe, in your own words, the information you consider most relevant about the
distribution of the amounts deposited by customers.
f) What are the statistical units of this study? What is the study variable?
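For part d), a minimal sketch of the mode of grouped data is given below. It assumes King's method, which is the method named in the solution to exercise 16; other conventions (for example Czuber's formula) would give a somewhat different value. All classes in the table have the same width, so the modal class is simply the one with the highest count.

```python
# Classes and counts from the table in exercise 7 (all classes have width 500).
lower = [500, 1000, 1500, 2000]
width = 500
count = [150, 425, 260, 165]

# The modal class is the one with the highest count (equal widths assumed).
i = count.index(max(count))
before = count[i - 1] if i > 0 else 0
after = count[i + 1] if i < len(count) - 1 else 0

# King's method: the mode is pulled towards the larger neighbouring class.
mode_king = lower[i] + width * after / (before + after)
print(round(mode_king, 2))
```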
8. A market research company was asked to assess the success of some cell phone
models. For this, some university students were asked to give their opinion on 7
models on a scale of 1 to 5 (1 – very bad; 2 – bad; 3 - medium; 4 - good; 5 - very
good). The number of cell phones sold for these models was also obtained. The
following table shows the results.
Cell phone   Most frequent   Number of cell
model        opinion         phones sold
A            1               475
B            2               500
C            2               420
D            3               650
E            4               920
F            5               1100
G            5               1050
a) What are the statistical units of this study? What are the study variables?
b) Check if there is a strong relationship between the students’ opinion on cell phone
models and the quantity sold.
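Because the opinion is measured on an ordinal scale, one reasonable way to approach b) is Spearman's rank correlation coefficient. The exercise does not name a specific measure, so this choice is an assumption. The sketch below computes it as the Pearson correlation of the ranks, giving tied values the average of their ranks; the helper names `average_ranks` and `pearson` are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1      # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson linear correlation coefficient of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Data from the table in exercise 8 (models A to G).
opinion = [1, 2, 2, 3, 4, 5, 5]
sold = [475, 500, 420, 650, 920, 1100, 1050]

spearman = pearson(average_ranks(opinion), average_ranks(sold))
print(round(spearman, 4))
```

A coefficient close to 1 would indicate a strong positive monotonic relationship between opinion and quantity sold.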
9. A market research company was asked to assess the success of some washing
detergents. For this, some housewives were asked to give an opinion on 7 detergents
(1 – very bad; 2 – bad; 3 - medium; 4 - good; 5 - very good). The quantity sold of
these detergents was also recorded in tons per year. The following table shows the
results.
Detergent   Opinion   Quantity sold
A           1         9.0
B           2         9.2
C           2         9.1
D           3         9.5
E           4         9.6
F           5         9.8
G           5         9.7
Check if there is a strong relationship between the opinion on detergents and the
quantity sold.
10. The following data refer to the length (in cm) of a group of 40 premature babies
(gestational age less than 36 weeks) born during one month in a maternity hospital:
a) What are the statistical units of this study? What is the study variable?
b) Calculate and interpret the following statistical measures:
i) Mean.
d) Based on the frequency table of the previous paragraph, plot the histogram of the
distribution of the length of premature babies.
e) Calculate the quartiles and interquartile range of this distribution based on the
histogram.
f) Check for potential outliers and, if so, identify their observations.
11. Consider the number of mistakes made by an office assistant in 30 pages of written
text:
1 0 2 1 3 4 1 1 2 3 5 2 0 1 1
4 1 2 0 3 1 1 2 1 3 2 2 3 1 0
a) What are the statistical units of this study? What is the study variable? Classify the
statistical variable.
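As a starting point for working with these data, the sketch below tabulates the 30 observations into a frequency table (value, count, proportion, cumulative proportion). It is only an illustration; the variable names are not part of the exercise.

```python
from collections import Counter

# Mistakes per page for the 30 pages listed in exercise 11.
mistakes = [1, 0, 2, 1, 3, 4, 1, 1, 2, 3, 5, 2, 0, 1, 1,
            4, 1, 2, 0, 3, 1, 1, 2, 1, 3, 2, 2, 3, 1, 0]

counts = Counter(mistakes)
n = len(mistakes)

# One row per distinct value, in increasing order.
cumulative = 0
for value in sorted(counts):
    cumulative += counts[value]
    print(f"{value}  {counts[value]:2d}  {counts[value] / n:.4f}  {cumulative / n:.4f}")
```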
13. A group of 50 financial analysts forecasted a company's earnings per share, in euros,
for the coming year. The results are presented in the following table using 7 classes
of equal width:
Class   Class mark (Ci)   Number of analysts   Ni   Fi
5 4 0.08
8
[8; 10[
[10; 12[ 8 27
13 37 0.74
17 5
c) Explain, without performing calculations, whether the following statements are true
or false:
ii. 84% of the analysts predicted a gain per share of more than 8 euros.
___ % of the analysts predicted a gain per share of less than 12.40 euros.
14. The commercial department of a wine importing company has two stores (A and B)
open to the public and decided to study the number of units sold daily for 6 consecutive
weeks. The results were as follows:
Number of units sold per day   [0; 10[   [10; 20[   [20; 30[   [30; 40[   [40; 50[
Store A                        10        15         3          2          2
Store B                        6         18         4          1          1
b) Using central tendency measures, compare the sales of the two stores for
symmetry. How do you interpret these results regarding the sales of the two stores?
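One way to approach b) is to compare the mean and the median of each store: a mean above the median suggests a distribution skewed to the right (a few days with unusually high sales). The sketch below is a hedged illustration that assumes sales are spread uniformly inside each class; the helper names `grouped_mean` and `grouped_median` are illustrative.

```python
# Class lower limits and daily counts from the table in exercise 14.
lower = [0, 10, 20, 30, 40]
width = 10
days = {"A": [10, 15, 3, 2, 2], "B": [6, 18, 4, 1, 1]}


def grouped_mean(freq):
    """Mean of grouped data, using class marks weighted by frequency."""
    n = sum(freq)
    return sum((lo + width / 2) * f for lo, f in zip(lower, freq)) / n


def grouped_median(freq):
    """Median by linear interpolation inside the median class."""
    target = sum(freq) / 2
    cum = 0
    for lo, f in zip(lower, freq):
        if cum + f >= target:
            return lo + (target - cum) / f * width
        cum += f


summary = {store: (grouped_mean(f), grouped_median(f)) for store, f in days.items()}
for store, (mean, med) in summary.items():
    relation = "mean > median" if mean > med else "mean <= median"
    print(store, round(mean, 4), round(med, 4), relation)
```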
15. Using data on real estate sales commissions (in thousands of euros) of a set of
mediators, the following SPSS output was obtained:
16. Football is a very popular game in many countries. The following output shows the
results obtained in SPSS for the useful game time (in minutes) in 70 football games.
[SPSS output not reproduced; axis label: Nr. of games]
e) "It was observed that games with extremely high game time (i.e., between 65 and
69 minutes) were twice as frequent as those with extremely low game time (i.e.,
between 41 and 45 minutes)." Do you agree with this statement? Justify your
answer.
17. The following table shows the number of complaints filed per day in April at a given
office.
Day Complaints Day Complaints
01 1 16 1
02 1 17 0
03 1 18 0
04 1 19 2
05 0 20 0
06 0 21 3
07 0 22 1
08 1 23 1
09 2 24 1
10 0 25 0
11 1 26 0
12 1 27 2
13 3 28 0
14 4 29 0
15 0 30 0
d) Calculate location measures of central tendency using the observations in the above
table.
e) Calculate the values of the mean, median, and mode from the frequency table, and
interpret the results.
f) Calculate the quartile values from the frequency table and interpret the results.
h) Calculate the range, interquartile range, sample variance, standard deviation, and
coefficient of variation from the frequency table, and interpret the results.
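The dispersion measures in h) can be checked with Python's standard `statistics` module, as in the sketch below. It works from the 30 daily observations rather than from the frequency table; for a discrete variable such as this one, both routes should give the same results. The quartiles use the module's default "exclusive" method, which is an assumption, since textbooks differ on the quartile convention.

```python
from statistics import mean, quantiles, stdev, variance

# Complaints filed on days 01 to 30 of April (exercise 17).
complaints = [1, 1, 1, 1, 0, 0, 0, 1, 2, 0, 1, 1, 3, 4, 0,
              1, 0, 0, 2, 0, 3, 1, 1, 1, 0, 0, 2, 0, 0, 0]

avg = mean(complaints)
value_range = max(complaints) - min(complaints)
q1, q2, q3 = quantiles(complaints, n=4)   # default "exclusive" method
iqr = q3 - q1
s2 = variance(complaints)                 # sample variance, divides by n - 1
s = stdev(complaints)                     # sample standard deviation
cv = 100 * s / avg                        # coefficient of variation, in percent

print(f"range={value_range} IQR={iqr} s2={s2:.4f} s={s:.4f} CV={cv:.2f}%")
```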
18. The following table shows the mean duration of court cases (in years) related to a
particular type of crime.
Process Duration (years) Process Duration (years)
1 0.1 16 2.77
2 0.54 17 3.22
3 0.63 18 3.44
4 0.65 19 3.45
5 0.8 20 4.19
6 0.82 21 4.21
7 1.57 22 4.3
8 1.88 23 4.6
9 1.91 24 5.14
10 2.07 25 5.42
11 2.1 26 6.2
12 2.1 27 6.31
13 2.25 28 7.4
14 2.37 29 8.55
15 2.48 30 9.56
d) Calculate location measures of central tendency using the observations in the above
table.
e) Calculate the values of the mean, median, and mode from the frequency table, and
interpret the results.
f) Calculate the quartile values from the frequency table and interpret the results.
h) Calculate the range, interquartile range, sample variance, standard deviation, and
coefficient of variation from the frequency table, and interpret the results.
SOLUTIONS
15. Mediators; real estate sales commissions; continuous variable
16.
b) 50.4225 min
c) Mean = 55.5509 min; Mode = 51.5263 min (King's method)
d) Yes, because CV = 10.88% < 50%
e) I agree with the statement, because there were 4 games with useful game time between 65
and 69 minutes, and only 2 games between 41 and 45 minutes.