Unit 1 Computational Statistics
Q. What is statistics?
Ans. Statistics is the science of collecting, organizing, summarizing, analysing, and
interpreting information. Statistics computed from a sample are used to draw conclusions
about the population from which the sample was taken. In many everyday situations,
statistics helps us distinguish what we know from what we don't know.
For example, it can turn a vague statement like "This medication may cause nausea," or
"You could die if you don't take this medication" into a specific statement like "Three in one
thousand patients experienced nausea when they took this medication," or "If you don't
take this medication, there is a 95% chance that you will die." Without statistics, the
interpretation of data can quickly become seriously flawed. Hence the need for statistics.
Statistical Data
Q. Categorical Data
Ans. Categorical data refers to information that is stored and identified by names or
labels. It is a type of qualitative data that is grouped into categories instead of being
measured numerically. Categorical values are usually given as natural language
descriptions rather than numbers. Numbers can sometimes represent them, but those
numbers don't mean anything mathematically.
For example: birthdate, favourite sport, hair colour, postcode. This data type is made up of
categorical variables that show things like a person's gender, hometown, and so on. In the
above example, both the birthdate and the postcode are made up of numbers. They are
still regarded as categorical data, because the numbers act only as labels.
Trying to calculate the average is a simple way to decide whether the provided data is
categorical or numerical. If a meaningful average can be computed, the data is numerical. If
no meaningful average exists, the data is categorical.
Types:
a) Boolean: Boolean data are data which can only have two possible values. For example:
female/male, smoker/non-smoker, True/False
b) Nominal: Sometimes classifications require more than two categories. Such data is called
nominal data. Example: married/single/divorced.
c) Ordinal: In contrast to nominal data, ordinal data are ordered and have a logical sequence.
Example: very few/few/some/many/very many.
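One possible way to represent the three types in Python is sketched below. The variable names and the helper `rank` are assumptions made for this illustration; the category values come from the examples above.

```python
# Boolean: exactly two possible values.
is_smoker = False

# Nominal: more than two categories with no inherent order, so a set is enough.
marital_status = {"married", "single", "divorced"}

# Ordinal: categories with a logical sequence; the list position encodes the rank.
amount_levels = ["very few", "few", "some", "many", "very many"]

def rank(level):
    """Return the position of an ordinal level (0 is the lowest)."""
    return amount_levels.index(level)

# Ordinal values can be compared through their rank; nominal values cannot.
print(rank("few") < rank("many"))  # True
```

Note that the ranks only give an order. The distance between "few" and "some" is not defined, so the ranks should not be averaged.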
Q. Numerical (Continuous)
Ans. Numerical data, as the name suggests, consists of numbers. It represents quantitative
information and can be measured and counted. This data type is often used to perform
mathematical operations and statistical analysis. It is a cornerstone in making informed
decisions, drawing conclusions, and discovering patterns. A numerical variable is one whose
values are numbers on which arithmetic is meaningful.
Numerical data variables can be further categorized into two main types: discrete and
continuous data.
1. Discrete Data
Discrete data consists of distinct and separate values. These values are typically integers and
do not have fractional or decimal components. Example: number of students in a class,
number of cars in a parking lot, number of customer complaints. You can't have 0.5
complaints or 1.5 students.
2. Continuous Data
Continuous data, on the other hand, can take any value within a specific range. These values
can be integers or decimals. Example: height of individuals (6 ft 1 in), temperature (23.2 °C),
weight (23.5 kg).
Q. Define mean, median, mode, variance, standard deviation and harmonic mean.
Ans.
Mean: The arithmetic mean of a variable, often called the average, is computed by adding
up all the values and dividing by the total number of values.
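As a minimal sketch of this definition in Python (the function name `arithmetic_mean` and the sample values are hypothetical):

```python
def arithmetic_mean(values):
    """Add up all the values and divide by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

marks = [2, 4, 6, 8]  # hypothetical sample
print(arithmetic_mean(marks))  # 5.0
```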
Median: The median of a variable is the middle value of the data set when the data are
sorted in order from least to greatest. It splits the data into two equal halves with 50% of
the data below the median and 50% above the median.
Data Set: 23, 27, 29, 31, 35, 39, 40, 42, 44, 47, 51 (n = 11, odd)
Median: 39 (the 6th value)
To calculate the median with an even number of values (n is even), first sort the data from
smallest to largest and take the average of the two middle values.
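Both cases, odd and even n, can be sketched in one Python function. The function name `median` and the even-sized data set (the data set above with its last value dropped) are assumptions made for this illustration.

```python
def median(values):
    """Return the middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two middle values

odd_data = [23, 27, 29, 31, 35, 39, 40, 42, 44, 47, 51]
even_data = odd_data[:-1]  # 10 values; the two middle values are 35 and 39

print(median(odd_data))   # 39
print(median(even_data))  # 37.0
```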
Mode: The mode is the most frequently occurring value in the dataset.
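A short Python sketch of the mode, using `collections.Counter` from the standard library. The function name `modes` and the sample values are hypothetical. It returns a list because a data set can have more than one most frequent value.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())  # raises ValueError for an empty data set
    return [value for value, count in counts.items() if count == highest]

print(modes([2, 3, 3, 5, 7, 7, 7, 9]))                  # [7]
print(modes(["red", "blue", "red", "blue", "green"]))   # ['red', 'blue']
```

Unlike the mean and the median, the mode also works for categorical data, as the second call shows.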
Variance: The (sample) variance is the sum of the squared differences between each
value and the arithmetic mean, divided by the number of elements minus 1.
Standard Deviation: It is a measure of how spread out the data are around the mean. The
standard deviation is the square root of the variance.
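The two definitions above can be sketched in Python as follows. The function names and the sample values are hypothetical, and the divisor is n - 1, as in the variance definition given here.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean = 5
# Squared deviations: 9, 1, 1, 1, 0, 0, 4, 16 (sum = 32), so variance = 32 / 7.
print(sample_variance(data))
print(sample_std(data))
```

Python's standard library offers the same quantities as `statistics.variance` and `statistics.stdev`.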
Harmonic Mean: The Harmonic Mean (HM) is defined as the ratio of the number of elements
in the data set to the sum of the reciprocals of the values.
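A minimal Python sketch of this definition follows. The function name, the restriction to positive values, and the speed example are assumptions made for this illustration.

```python
def harmonic_mean(values):
    """Number of elements divided by the sum of the reciprocals of the values."""
    if not values:
        raise ValueError("the harmonic mean of an empty data set is undefined")
    if any(v <= 0 for v in values):
        raise ValueError("this sketch only handles positive values")
    return len(values) / sum(1 / v for v in values)

# Hypothetical example: the same distance driven at 40 km/h and then at 60 km/h.
speeds = [40, 60]
print(round(harmonic_mean(speeds), 6))  # 48.0
```

The result, 48 km/h, is lower than the arithmetic mean of 50 km/h, because more time is spent at the slower speed.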