QM 1
Qualitative data are data for which the measurement scale is categorical
Classification of Data
• Data: Qualitative or Quantitative
• Quantitative data: Discrete or Continuous
Processing Data
John Snow created a map depicting where cases of cholera occurred in London’s West End (1854) and found them clustered
around a water pump on Broad Street.
Analytics
A. 50 million
B. 52 million
C. 22 million
D. 49 million
Scale of Measurement
Terms you are likely to encounter:
▪ Data are the facts and figures that are collected, summarized, analyzed,
and interpreted
▪ Elements are the entities on which data are collected
▪ A variable is a characteristic of interest for the elements
▪ A data set with n elements contains n observations
▪ Predictor variable: A variable thought to predict an outcome variable. This
term is basically another way of saying ‘independent variable or cause’.
▪ Outcome variable: A variable thought to change as a function of changes in
a predictor variable (i.e., the dependent variable or effect).
▪ Variables are measured constructs that vary across entities in the sample.
▪ In contrast, parameters are not measured and are (usually) constants
believed to represent some fundamental truth about the relations
between variables in the model (e.g., the mean and median, or the
correlation and regression coefficients).
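A minimal Python sketch of these definitions, assuming a hypothetical data set of three companies: each object is an element, each field other than the name is a variable, and n elements give n observations. The class, field names and values are invented for illustration.

```python
from dataclasses import dataclass, fields

@dataclass
class Company:
    """One element of the data set; each field is a variable recorded for it."""
    name: str              # identifies the element (not itself analysed)
    sector: str            # categorical (nominal) variable
    employees: int         # quantitative, discrete variable
    annual_revenue: float  # quantitative, continuous variable

# Hypothetical data set with n = 3 elements, hence n = 3 observations.
data_set = [
    Company("Alpha Ltd", "Retail", 120, 45.5),
    Company("Beta Corp", "Banking", 860, 310.0),
    Company("Gamma Inc", "Retail", 45, 12.75),
]

n = len(data_set)
variables = [f.name for f in fields(Company) if f.name != "name"]
print(n, variables)  # 3 ['sector', 'employees', 'annual_revenue']
```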
For Instance: [example table in which each row is an element (Name of Element) and each column is a variable (Variables)]
Types of Measurement scale
• Variables can be split into categorical and continuous, and within these types
there are different levels of measurement:
• Categorical (entities are divided into distinct categories):
• Binary variable: There are only two categories (e.g., dead or alive).
• Nominal variable: There are more than two categories (e.g., whether someone is an
omnivore, vegetarian, vegan, or fruitarian).
• Ordinal variable: The same as a nominal variable but the categories have a logical
order (e.g., whether people got a fail, a pass, a merit or a distinction in their exam).
• Continuous or Quantitative (entities get a distinct score):
• Interval variable: Equal intervals on the variable represent equal differences in the
property being measured (e.g., the difference between 6 and 8 is equivalent to the
difference between 13 and 15).
• Ratio variable: The same as an interval variable, but the ratios of scores on the scale
must also make sense (e.g., a score of 16 on an anxiety scale means that the person is,
in reality, twice as anxious as someone scoring 8). For this to be true, the scale must
have a meaningful zero point.
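The levels differ in which comparisons they support. The sketch below encodes that idea in Python; the `MEANINGFUL_OPERATIONS` table, the `is_meaningful` helper and the `Grade` enum are hypothetical names, and the operation labels follow the usual textbook convention rather than any library.

```python
from enum import IntEnum

# Hypothetical table: operations meaningful at each level (each level keeps those below it).
MEANINGFUL_OPERATIONS = {
    "binary": {"equal / not equal"},
    "nominal": {"equal / not equal"},
    "ordinal": {"equal / not equal", "greater / less than"},
    "interval": {"equal / not equal", "greater / less than", "difference"},
    "ratio": {"equal / not equal", "greater / less than", "difference", "ratio"},
}

class Grade(IntEnum):
    """Ordinal exam result: the integers encode order only, not distance."""
    FAIL = 1
    PASS = 2
    MERIT = 3
    DISTINCTION = 4

def is_meaningful(level: str, operation: str) -> bool:
    """Return True if `operation` is meaningful for a variable at `level`."""
    return operation in MEANINGFUL_OPERATIONS[level]

print(Grade.MERIT > Grade.PASS)                 # True: order is meaningful
print(is_meaningful("ordinal", "difference"))   # False: MERIT - PASS is not an amount
print(8 - 6 == 15 - 13)                         # True: equal intervals (interval scale)
print(is_meaningful("ratio", "ratio"), 16 / 8)  # True 2.0: ratios need a true zero
```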
What is the level of measurement of the following variables?
• The gender of the people giving the bands their phone numbers
https://academo.org/demos/dice-roll-statistics/#:~:text=If%20you%20roll%20a%20fair,%22roll%20automatically%22%20button%20above.
https://www.youtube.com/watch?v=zeJD6dqJ5lo
Descriptive Statistics
▪ Numerical Measures
▪ Measures of Location
▪ Measures of Variability
Measures of Location
▪ Mean
▪ Median
▪ Mode
▪ Percentiles
▪ Quartiles
Measures of central tendency
• We can calculate where the centre of a frequency distribution lies
using three commonly used measures: the mean, the mode and the
median.
• The mean is the sum of all scores divided by the number of scores;
it provides a measure of central location. Its value can be influenced
quite heavily by extreme scores.
• The median is the middle score when the scores are placed in
ascending order. It is not as influenced by extreme scores as the
mean.
• The mode is the score that occurs most frequently.
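A minimal sketch with Python's standard `statistics` module. The scores are hypothetical: the second list adds one extreme score to show that the mean shifts a lot while the median barely moves.

```python
import statistics

scores = [2, 3, 3, 5, 7]      # hypothetical scores
with_outlier = scores + [60]  # the same scores plus one extreme value

for label, data in [("original", scores), ("with outlier", with_outlier)]:
    print(
        label,
        "mean =", round(statistics.mean(data), 2),
        "median =", statistics.median(data),
        "mode =", statistics.mode(data),
    )
# original: mean 4, median 3, mode 3
# with outlier: mean 13.33, median 4.0, mode 3
```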
Business Scenario: Mean
• Suppose you want to run a campaign to advertise racing bikes or the
latest fashion trend at a particular location.
• Whenever a data set has extreme values, the median is the preferred
measure of central location. The median of a data set is the value in
the middle when the data items are arranged in ascending order.
• As a general rule, use the median rather than the mean to describe the typical
value of an uneven (skewed) data set, i.e., one with a few very large or very small values.
Median
• Odd number of scores: the median is the middle score.
• Even number of scores: the median is the average of the two middle scores.
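The odd/even rule can be sketched as a short Python function. `median` here is a hypothetical helper written out for illustration; the standard library's `statistics.median` applies the same rule.

```python
def median(scores):
    """Return the middle score if n is odd, or the mean of the two middle scores if n is even."""
    if not scores:
        raise ValueError("median of empty data")
    ordered = sorted(scores)  # the scores must be in ascending order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd: the single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: average the two middle scores

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```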
The mean annual loan amount of the population is 13,50,000 INR. This amount is higher than
the loan amount of 80% of the population.
Customer | Loan Amount (in Rs)
1 | 8,00,000
4 | 9,50,000
5 | 32,50,000
However, the median is not an impeccable statistic. There are several things that we should
consider when using it for communicating statistical information.
Practical use (Salary Analysis): Use the median to report typical salaries when a few extremely high or low
salaries could skew the mean.
Limitations of Using the Median
Practical use (Customer Preferences): Use the mode to identify the most preferred product or service feature.
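For categorical data such as product preferences, the mode is simply the most frequent category. A minimal sketch with Python's `collections.Counter` and `statistics.mode` / `statistics.multimode`; the survey responses are hypothetical.

```python
from collections import Counter
import statistics

# Hypothetical survey: the feature each customer rated most important.
responses = ["price", "battery life", "camera", "battery life", "price", "battery life"]

counts = Counter(responses)
print(counts.most_common())             # frequency table, most frequent first
print(statistics.mode(responses))       # battery life
print(statistics.multimode(responses))  # every tied mode; here ['battery life']
```

If two features tie, `statistics.mode` returns the first one encountered, so `multimode` is the safer choice when ties matter.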
Dispersion of a Distribution
• SS = 5.20
• Total error = Σ(outcome_i − model_i)²
• This equation shows how something we have used before (the sum of
squares) can be used to assess the total error in any model (not just the
mean).
• Although the sum of squared errors (SS) is a good measure of the
accuracy of our model, it depends on the amount of data that has
been collected: the more data points, the higher the SS.
• To estimate the mean error in the population, we divide not by the
number of scores contributing to the total, but by the degrees of freedom
(df = N − 1), which is the number of scores used to compute the total, adjusted for
the fact that we are trying to estimate the population value.
• Our model is the mean, so let’s replace the ‘model’ with the mean (x̄), and
the ‘outcome’ with the letter x (to represent a score on the outcome), giving SS = Σ(x_i − x̄)².
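A minimal Python sketch of SS = Σ(x_i − x̄)² and of dividing by the degrees of freedom, s² = SS/(N − 1). The scores 1, 2, 3, 3, 4 are an assumption: they are one data set whose mean is 2.6 and whose SS is the 5.20 quoted above. `sum_of_squares` and `sample_variance` are hypothetical helper names; `statistics.variance` is the standard-library function, which also divides by N − 1.

```python
import statistics

def sum_of_squares(scores):
    """SS: sum of squared deviations of each score from the mean (the model)."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

def sample_variance(scores):
    """Estimate of the mean squared error in the population: SS / df, with df = N - 1."""
    df = len(scores) - 1
    return sum_of_squares(scores) / df

scores = [1, 2, 3, 3, 4]  # assumed data: mean 2.6, SS = 5.20

print(round(sum_of_squares(scores), 2))       # 5.2
print(round(sum_of_squares(scores * 2), 2))   # 10.4: twice the data, twice the SS
print(round(sample_variance(scores), 2))      # 1.3
print(round(statistics.variance(scores), 2))  # 1.3: the library also uses N - 1
```

Duplicating the data doubles SS even though the spread of scores is unchanged, which is why the total is divided by the degrees of freedom.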