Describing Data With Tables
• In order to find the frequency distribution of quantitative data, we can use the
following table that gives information about "the number of smartphones owned
per family."
• When observations are sorted into classes of single values, the result is
referred to as a frequency distribution for ungrouped data. It is the
representation of ungrouped data and is typically used when we have a smaller
data set.
• A frequency distribution is a means to organize a large amount of data. It takes
data from a population based on certain characteristics and organizes them so
that anyone who wants to draw conclusions about that population can readily
understand them.
1. Grouped data:
• Grouped data refers to the data which is bundled together in different classes
or categories.
• Data are grouped when the variable stretches over a wide range and there are a
large number of observations, so that listing every single value in order would
be impractical and time-consuming. In that case, the values are combined into
classes, each of which is called a class interval, and the frequency is recorded
for each class interval.
• Suppose we conduct a survey in which we ask 15 families how many pets they
have in their home. The results are as follows:
1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6, 7, 8
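The pet survey above can be tabulated in a few lines of Python. This is only a minimal sketch: the class width of 2 is an assumption chosen for illustration, not a value given in these notes, and the names `ungrouped` and `grouped` are made up for the example.

```python
from collections import Counter

# Survey results from the text: number of pets in each of 15 families.
pets = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6, 7, 8]

# Ungrouped data: one class per distinct value.
ungrouped = Counter(pets)

# Grouped data: class intervals of width 2 (assumed for illustration).
width = 2
lowest = min(pets)
grouped = Counter()
for value in pets:
    lower = lowest + ((value - lowest) // width) * width
    grouped[(lower, lower + width - 1)] += 1

for (lower, upper), freq in sorted(grouped.items()):
    print(f"{lower}-{upper}: {freq}")
```

With these assumed classes, every observation falls into exactly one interval and the frequencies add up to the 15 families surveyed.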
2. Classes should be set up so that they do not overlap and so that each piece of
data belongs to exactly one class.
3. List all classes, even those with zero frequencies.
• The real limits are located at the midpoint of the gap between adjacent tabled
boundaries; that is, one-half of one unit of measurement below the lower tabled
boundary and one-half of one unit of measurement above the upper tabled
boundary.
• Table 2.3.4 gives a frequency distribution of the IQ test scores for 75 adults.
• If the lower class limit for the second class, 95, is added to the upper class
limit for the first class, 94, and the sum is divided by 2, the result is the
upper boundary for the first class and the lower boundary for the second class:
(94 + 95) / 2 = 94.5. Table 2.3.5 gives all the boundaries for Table 2.3.4.
• If the lower class limit is added to the upper class limit for any class and the
sum is divided by 2, the class mark for that class is obtained. The class mark
is the midpoint of the class and is sometimes called the class midpoint.
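The rules for class boundaries and class marks can be sketched as a small Python helper. Only the limits 94 and 95 come from the text; the other limits (80 and 109) and the function name `boundaries_and_marks` are hypothetical, chosen so the example runs. The helper assumes at least two classes with the same gap between every pair of adjacent classes.

```python
def boundaries_and_marks(classes):
    """classes: list of (lower_limit, upper_limit) tuples in ascending order."""
    # Gap between adjacent tabled limits, e.g. 95 - 94 = 1 unit of measurement.
    gap = classes[1][0] - classes[0][1]
    rows = []
    for lower, upper in classes:
        rows.append({
            "limits": (lower, upper),
            # Boundaries sit half a gap below and above the tabled limits.
            "boundaries": (lower - gap / 2, upper + gap / 2),
            # Class mark: midpoint of the lower and upper class limits.
            "mark": (lower + upper) / 2,
        })
    return rows

# Hypothetical classes; only the limits 94 and 95 appear in the text.
rows = boundaries_and_marks([(80, 94), (95, 109)])
for row in rows:
    print(row)
```

The upper boundary of the first class and the lower boundary of the second class coincide, which is what makes the boundaries close the gap between adjacent classes.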
Example 2.3.1: The following table gives the frequency distribution for the
cholesterol values of 45 patients in a cardiac rehabilitation study. Give the
lower and upper class limits and boundaries as well as the class marks for
each class.
• Solution: The table below gives the limits, boundaries, and marks for the classes.
Example 2.3.2: The IQ scores for a group of 35 school dropouts are as
follows:
b) Specify the real limits for the lowest class interval in this frequency
distribution.
(123 - 69) / 10 = 54 / 10 = 5.4 ≈ 5
Example 2.3.3: Given below are the weekly pocket expenses (in Rupees) of
a group of 25 students selected at random.
37, 41, 39, 34, 41, 26, 46, 31, 48, 32, 44, 39, 35, 39, 37, 49, 27, 37, 33, 38, 49,
45, 44, 37, 36
Solution:
• In the given data, the smallest value is 26 and the largest value is 49. So, the
range of the weekly pocket expenses = 49-26=23.
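One way to continue from the range is to build a grouped frequency table in Python. This is a sketch under stated assumptions: the class width of 5 and the starting limit of 25 are illustrative choices, not values given in the example, and the names `classes` and `table` are made up here.

```python
# Weekly pocket expenses (in Rupees) of the 25 students, from the text.
expenses = [37, 41, 39, 34, 41, 26, 46, 31, 48, 32, 44, 39, 35, 39, 37, 49,
            27, 37, 33, 38, 49, 45, 44, 37, 36]

value_range = max(expenses) - min(expenses)  # 49 - 26 = 23

# Assumed for illustration: classes of width 5 starting at 25.
width = 5
start = 25
classes = [(lo, lo + width - 1) for lo in range(start, max(expenses) + 1, width)]

# Start every class at zero so that empty classes would still be listed.
table = {c: 0 for c in classes}
for x in expenses:
    for lo, hi in classes:
        if lo <= x <= hi:
            table[(lo, hi)] += 1
            break  # classes do not overlap, so each value has exactly one class

for (lo, hi), freq in table.items():
    print(f"{lo}-{hi}: {freq}")
```

The classes do not overlap, and each of the 25 observations belongs to exactly one of them, in line with the guidelines listed earlier.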
Outliers
• 'In statistics, an Outlier is an observation point that is distant from other
observations.'
• An outlier is a value that falls outside the normal range of the data and can
cause anomalies in the results obtained through algorithms and analytical
systems. Therefore, outliers always need some degree of attention.
• Understanding outliers is critical in analyzing data for at least two reasons:
• The simplest way to find outliers is to look directly at the data table, or
the dataset, as data scientists call it. The following table clearly shows a
typing error, that is, an error in data entry.
• The age field for the individual Antony Smith shows 470 years, which certainly
is not his real age. Looking at the table, it is possible to identify the
outlier, but it is difficult to say what the correct age would be. There are
several possibilities, such as 47, 70, or even 40 years.
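For a larger table, the same visual check can be approximated with a simple range test. The sketch below is illustrative only: the entry for Antony Smith comes from the text, while the other rows and the plausible age range of 0 to 120 are hypothetical assumptions.

```python
records = [
    {"name": "Antony Smith", "age": 470},  # value from the text
    {"name": "Person A", "age": 34},       # hypothetical row
    {"name": "Person B", "age": 52},       # hypothetical row
]

# Assumed plausible range for a human age; adjust to the data at hand.
MIN_AGE, MAX_AGE = 0, 120

# Flag every record whose age lies outside the plausible range.
suspect = [r for r in records if not MIN_AGE <= r["age"] <= MAX_AGE]

for r in suspect:
    print(f"Check the age recorded for {r['name']}: {r['age']}")
```

A check like this only flags the suspicious entry. As noted above, it cannot tell which value (47, 70, or 40) was intended.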
• A relative frequency distribution lists the data values along with the
proportion (or percent) of all observations belonging to each group. Relative
frequencies are calculated by dividing the frequency for each group by the
total number of observations; multiplying by 100 gives the percent.
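As a minimal sketch of this calculation, the Python snippet below reuses the pet survey of 15 families from earlier in these notes. The variable names are made up for the example.

```python
from collections import Counter

# Pet survey data from earlier in these notes (15 families).
pets = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6, 7, 8]

n = len(pets)
frequencies = Counter(pets)

# Relative frequency = frequency of the group / total number of observations.
relative = {value: freq / n for value, freq in sorted(frequencies.items())}

for value, rel in relative.items():
    print(f"{value}: {rel:.3f} ({rel * 100:.1f}%)")
```

The relative frequencies always add up to 1 (or 100 percent), which is a quick way to check the table.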
• Example: Suppose we take a sample of 200 Indian families and record the
number of people living in each household. We obtain the following:
Cumulative frequency:
• A cumulative frequency distribution can be useful for ordered data (e.g. data
arranged in intervals, measurement data, etc.). Instead of reporting the
frequency of each value, it reports the sum of the frequencies for all values
less than or equal to the current value.
• Example: Suppose we take a sample of 200 Indian families and record the
number of people living in each household. We obtain the following:
• To convert a frequency distribution into a cumulative frequency distribution,
add to the frequency of each class the sum of the frequencies of all classes
ranked below it.
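The conversion can be sketched in Python with a running sum. The example reuses the pet survey of 15 families from earlier in these notes, and the variable names are illustrative.

```python
from collections import Counter
from itertools import accumulate

# Pet survey data from earlier in these notes (15 families).
pets = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6, 7, 8]

counts = Counter(pets)
values = sorted(counts)                       # classes in ascending order
frequencies = [counts[v] for v in values]

# Each cumulative frequency is the class frequency plus the sum of the
# frequencies of all classes ranked below it.
cumulative = list(accumulate(frequencies))

for v, f, c in zip(values, frequencies, cumulative):
    print(f"{v}: frequency={f}, cumulative={c}")
```

The last cumulative frequency equals the total number of observations, here 15.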