Types of Data

Uploaded by

akshatsharma1278
Copyright
© © All Rights Reserved

Data Processing: Introduction to Concepts, Variables, Attributes, Types of Data


Vocabulary of Statistics
Those new to statistics sometimes find its terminology
difficult, so at this stage it is important to understand
some new words and concepts.

This session introduces some of the basic terms which are
essential in carrying out statistical data analysis.
Some of the Statistical Expressions

Data: Data refers to any group of measurements that helps
in providing information.
Quantitative Data: Data that possess numerical properties
are known as quantitative data.
Qualitative Data: Qualitative data reflect non-numeric
features or qualities of experimental units.
Ex: colour, gender, good, high, low.
Statistics: Statistics is the use of data to help
decision-makers reach better decisions.
Variable: A variable is a characteristic that may take on
different values at different times, places or situations.
Ex: income, wages, population, no. of SHGs, political
parties, voters.
Basic concepts:

Constants, Variables, Cases, Values
Discrete And Continuous Variables
Values
Categorizing And Coding The Variables
Nominal Scale
Ordinal Scale
Interval Scale
Grouped And Ungrouped Data
Constants
In math and statistics, a constant is a number that is
fixed and known, unlike a variable, which changes with
the context.

A symbol which has a fixed numerical value is called a
constant. For example, 2, 5, 0, -3 and -7 are constants.

A constant is a specific number or a symbol that is
assigned a fixed value. For example, in the equation
below, "y" and "x" are variables, while the numbers 2
and 3 are constants.
y = 2x - 3
A few more examples of constants: the number of days in
a week is a constant. In the expression 5x + 10, the
constant term is 10.
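The difference between the constants and the variables in y = 2x - 3 can be sketched in a few lines of Python. The names SLOPE, OFFSET and y_of are illustrative only and do not come from the slides.

```python
# Constants from the slide's equation y = 2x - 3 (names are illustrative).
SLOPE = 2
OFFSET = 3

def y_of(x):
    """Return y for a given x; SLOPE and OFFSET never change."""
    return SLOPE * x - OFFSET

# x is the variable: it takes a different value on each evaluation.
ys = [y_of(x) for x in (0, 1, 2, 5)]
print(ys)  # [-3, -1, 1, 7]
```

Whatever value x takes, the two constants stay fixed; only x and y vary.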
Variables, Cases, Values:

In any study the researcher is concerned with a
particular population or universe.

Population refers to a specific group of people,
institutions, occurrences or observations about which
the researcher wishes to make descriptive or analytical
statements.

When resources are limited, the researcher will draw
sample observations, either randomly or according to
some agreed strategy, as the basis for investigation.
Cases-Units of observations

Within a population or sample, each individual unit is
called a case or observation.
A case is the basic unit of analysis; it could be an
individual, an organization or an occurrence of some
event.
The population or sample then consists of all the
available cases with which the study is concerned.
The unit of observation is the unit for which data are
collected. Common examples include the individual,
household, community or school.

Clearly identifying the unit of observation is important
for a logical survey design, organized data collection
and a sound data folder set-up.
Variables

We must define the characteristics of the population or
sample units to understand the sample or universe
better. Each characteristic of a population is termed a
variable because these are attributes which vary
between cases.

By definition, a variable is any characteristic, number
or quantity that can be measured or counted.
A variable is a characteristic that may take on
different values for different cases at different
times, places or situations. Ex: incomes of different
individuals, votes obtained by different political
parties, population of different small towns, no. of
SHGs in each district etc.

A variable may also be called a data item. Some more
examples of variables are age, sex, country of birth,
class grades, eye colour etc.
Example:

Cases: Individuals such as X, Y, Z
Variable: Gender is the variable, since it varies among
individuals. Coding: Male = 1, Female = 2
Some variables have a larger number of categories.
Example: Number of children in a family

Cases: Families X, Y, Z
Variable: The number of children in each family is the
variable. It may be zero, one child, two children and
so on.
Discrete and Continuous Variables

A variable may be either continuous or discrete. A
continuous variable is capable of manifesting every
conceivable fractional value within the range of
possibilities, such as the height or weight of persons
(Ex. 55.6, 60.4, 72.8 kg).

On the other hand, a discrete variable is one which can
vary only by 'finite' jumps and cannot manifest every
conceivable fractional value.

For some variables the values cannot logically be
subdivided. For example, the number of children in a
family or the size of the family can only take certain
values such as 1, 2 or 3.
Values
These are the possible outcomes for a single variable.
They are different for the different cases. Values can
be numbers or named categories. For example, the
variable Gender has two values, "male" and "female".
Some people (cases) are men, and some are women.
Example:
Cases: Individuals such as X, Y, Z
Variable: Gender is the variable, since it varies among
individuals. Coding: Male = 1, Female = 2.
Values: The values of the variable gender are 1 and 2,
the codes for male and female.
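As a rough sketch of how cases, variables and values fit together, the example can be written as a small Python table. The children and weight_kg figures are invented for illustration; only the gender coding (Male = 1, Female = 2) comes from the slide. Here children is a discrete variable and weight_kg a continuous one.

```python
# Hypothetical cases X, Y, Z; all values below are made up for illustration.
GENDER_CODES = {1: "Male", 2: "Female"}  # coding scheme from the slide

cases = {
    "X": {"gender": 1, "children": 2, "weight_kg": 60.4},
    "Y": {"gender": 2, "children": 0, "weight_kg": 55.6},
    "Z": {"gender": 2, "children": 3, "weight_kg": 72.8},
}

# The variables are the characteristics recorded for every case.
variables = sorted(next(iter(cases.values())))

# The values of one variable are the outcomes observed across the cases.
gender_values = sorted({record["gender"] for record in cases.values()})
gender_labels = [GENDER_CODES[code] for code in gender_values]

print(variables)      # ['children', 'gender', 'weight_kg']
print(gender_values)  # [1, 2]
print(gender_labels)  # ['Male', 'Female']
```

Each key of the outer dictionary is a case, each inner key is a variable, and each stored entry is that case's value for the variable.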
Types of variables
Categorizing and coding the Variables

An important stage of the research process is the
allocation of numerical values to the categories of
each variable.
This is called coding, for example non-literates = 0
and literates = 1. The process of coding helps the
researcher to categorize the population or sample
observations.
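A minimal sketch of coding, assuming a handful of invented survey responses; only the coding scheme itself (non-literate = 0, literate = 1) is taken from the slide.

```python
# Coding scheme from the slide: non-literate = 0, literate = 1.
LITERACY_CODES = {"non-literate": 0, "literate": 1}

# Hypothetical survey responses, one per case.
responses = ["literate", "non-literate", "literate", "literate"]

coded = [LITERACY_CODES[r] for r in responses]

# With 0/1 coding, the sum of the codes is the number of literates.
literate_count = sum(coded)

print(coded)           # [1, 0, 1, 1]
print(literate_count)  # 3
```

Once the responses are coded, counting the cases in each category becomes a simple arithmetic operation.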

Categorical variables:
Categorical variables have values that describe a
'quality' or 'characteristic' of a data unit, like 'what
type' or 'which category'.

There are four levels of measurement, or scales, used to
measure variables.
a) Nominal scale: Variables measured on the nominal scale
are essentially qualitative rather than quantitative in form.
The values of the variables are categories, not true numbers,
and cannot be ordered in any mathematically meaningful way.
A nominal variable with only two possible values is referred
to as a dichotomous variable. An example might be asking a
person whether they own a mobile phone. Here, we may
categorize mobile phone ownership as either "Yes" or "No".

We can also have more categories for a variable, such as
religious belief. These are called polytomous variables.

Hindu      1
Muslim     2
Christian  3
Buddhist   4
Here in the nominal scale each value of the variable
represents a category; the values imply no particular
order or relationship between the categories.
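The sketch below, which uses invented responses, illustrates what can be done with nominal codes: the cases in each category can be counted and the most frequent category found, but the codes themselves are only labels.

```python
from collections import Counter

# Nominal codes from the slide; the numbers are labels, not quantities.
RELIGION_CODES = {1: "Hindu", 2: "Muslim", 3: "Christian", 4: "Buddhist"}

# Hypothetical coded responses for eight cases.
coded = [1, 2, 1, 3, 1, 4, 2, 1]

# Counting cases per category is meaningful on a nominal scale.
counts = Counter(RELIGION_CODES[c] for c in coded)
mode_label, mode_count = counts.most_common(1)[0]

print(dict(counts))
print(mode_label, mode_count)  # Hindu 4
```

An arithmetic mean of these codes could be computed, but it would have no interpretation, because the numbering of the categories is arbitrary.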
b) Ordinal scale:
The nominal scale of measurement permits only the
classification of observations into different
categories, whereas the ordinal scale of measurement
permits the ordering of those categories into ranks.
We can distinguish between the values in terms of
degree but cannot measure the degree of difference
between them.

Example: A group of workers' opinions about the work
environment.
Very poor     0
Poor          1
Satisfactory  2
Good          3
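A short sketch of what the ordinal codes allow, assuming five invented opinions: order comparisons and the median are valid because they rely only on ranking, not on the size of the gaps between categories.

```python
from statistics import median_low

# Ordinal codes from the slide: a higher code means a better opinion.
OPINION_CODES = {"Very poor": 0, "Poor": 1, "Satisfactory": 2, "Good": 3}
LABELS = {code: label for label, code in OPINION_CODES.items()}

# Hypothetical opinions of five workers.
opinions = ["Good", "Poor", "Satisfactory", "Good", "Very poor"]
coded = [OPINION_CODES[o] for o in opinions]

# Order comparisons are valid on an ordinal scale.
good_beats_poor = OPINION_CODES["Good"] > OPINION_CODES["Poor"]

# The median only needs ordering, so it is a valid summary here.
median_label = LABELS[median_low(coded)]

print(good_beats_poor)  # True
print(median_label)     # Satisfactory
```

The sketch uses median_low so that the result is always one of the observed categories rather than a value halfway between two codes.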
c) Interval scale: The interval scale implies both an ordering
of categories and a measure of the distance between
them. The differences between points on the scale are
measurable and exactly equal.

Example: Number of absences each employee had in an
organization in a month.

No absences  0
One day      1
Two days     2
Three days   3 and so on.

The numbers of days here are categories which are ordered,
and they allow us to measure exactly, in a standard unit, that
three days is more than one day but less than six days.
Four days' absence is twice as many as two days, and so on.
Note that a count of absences also has a true zero, so the
"twice as many" comparison is strictly a ratio-scale property.
A purely interval example is temperature in degrees Celsius:
the differences are equal, but 20 degrees is not twice as hot
as 10 degrees.
d) Ratio scale:
A ratio scale is a quantitative scale where there is a
true zero and equal intervals between neighboring
points.
Unlike on an interval scale, a zero on a ratio scale
means there is a total absence of the variable you
are measuring.

Ex: Length, area and population are measured on ratio
scales.
The term "ratio scale" is also used in a different sense
in map-making, where it is one of the most common ways
to depict scale on maps. It tells the map reader that
one unit on the map is equal to a certain number of
units in the real world. For example, 1:2500 means that
1 cm on the map = 2500 cm (25 m) on the ground.
Age is typically considered to be measured on a ratio
scale. This is because age has a true zero point,
which means that a value of zero represents the
absence of age.

In addition, it is possible to perform mathematical
operations such as addition, subtraction,
multiplication and division on age values.

The most common examples of ratio scale variables are
height, money, age, weight etc.
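The role of the true zero can be illustrated with a small sketch. Temperature in degrees Celsius is used here as an assumed example of an interval scale (it does not appear in the slides), and weight in kilograms as a ratio scale.

```python
# Assumed illustration: Celsius is an interval scale (its zero is arbitrary),
# while weight in kilograms is a ratio scale (zero means no weight).
def celsius_to_kelvin(celsius):
    """Convert to Kelvin, whose zero is a true zero."""
    return celsius + 273.15

# Ratios of weights are meaningful: 80 kg is twice 40 kg.
weight_ratio = 80.0 / 40.0

# The naive ratio of Celsius readings suggests "twice as hot" ...
naive_ratio = 20.0 / 10.0
# ... but on a scale with a true zero the ratio is close to 1.
true_ratio = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)

print(weight_ratio)          # 2.0
print(naive_ratio)           # 2.0
print(round(true_ratio, 3))  # 1.035
```

Division gives a meaningful answer for the weights because zero kilograms means no weight at all, whereas zero degrees Celsius does not mean an absence of temperature.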
Attributes:
An attribute refers to the quality of a characteristic.
The theory of attributes deals with qualitative types
of characteristics that are analysed by using
quantitative measurements.
Therefore, attributes need slightly different kinds of
statistical treatment from variables.
For example, eye color is an attribute of a person.
Attributes refer to the characteristics of the item
under study, like the habit of smoking or drinking. So
'smoking' and 'drinking' are both examples of
attributes.
In statistics, classifying data based on attributes or
characteristics is known as qualitative classification
of data. Examples of attributes are region, caste etc.
Grouped and Ungrouped Data:

Ungrouped Data: The data obtained in original form are
called raw data or ungrouped data.
Example: The ranks obtained by 2500 students in a
certain examination are given below:

25, 8, 37, 16, 45, 40, 29, 12, 42, 25, 14, 16, 16, 20, 10, 36,
33, 24, 25, 35, 11, 30, 45, 48….

This is ungrouped data, which is in original form without
any ordering or grouping.
Grouped Data:

To put the data in a more condensed form, we make
groups of suitable size and mention the frequency of
each group. Such a table is called a grouped
frequency distribution table. Here we aggregate or
group the data into ordered categories.

Employees' age   No. of cases
16-20 years      470
21-30 years      950
31-40 years      670
41-50 years      710
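Grouping can be sketched with the 24 raw values listed in the ungrouped example. The class intervals of width 10 are an assumption made for illustration; the slides do not specify intervals for that data set.

```python
# The 24 raw values listed on the slide before the ellipsis.
raw = [25, 8, 37, 16, 45, 40, 29, 12, 42, 25, 14, 16,
       16, 20, 10, 36, 33, 24, 25, 35, 11, 30, 45, 48]

# Assumed class intervals of width 10 (inclusive bounds); the slide does
# not specify intervals for this data set.
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]

# Frequency = number of raw values falling inside each class interval.
frequency = {
    f"{low}-{high}": sum(low <= value <= high for value in raw)
    for low, high in classes
}

for label, count in frequency.items():
    print(f"{label:<6} {count}")
```

The frequencies add up to the number of raw values, so nothing is lost except the individual figures inside each group.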
Inductive Statistics:

The branch of statistics dealing with generalizations,
predictions, estimations and arriving at conclusions
based on data from a sample is called inductive statistics.

When we do this we are inducing or inferring the
characteristics of the population from the characteristics
of the sample.

The purpose of inductive statistics is to help the
researcher assess how representative a sample is of the
population. Inductive statistics are also commonly
called inferential statistics.
Example: a significance level of alpha = 0.05 in
hypothesis testing.
Here in inductive statistics we discuss the following:

Why we use a sample
Various sampling procedures such as random and
non-random sampling methods
Random sampling error, bias
Estimating the population mean from the sample
mean, normal distribution, standard error, confidence
levels, testing of hypotheses etc.
Concepts of Distributions
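As a rough sketch of estimating a population mean from a sample mean, the code below computes the standard error and a 95% confidence interval, which corresponds to alpha = 0.05. The income figures are invented, and the normal critical value 1.96 is an assumption made for simplicity.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of monthly incomes (in thousands) for ten cases.
sample = [12.0, 15.5, 11.0, 14.0, 13.5, 16.0, 12.5, 15.0, 13.0, 14.5]

n = len(sample)
sample_mean = mean(sample)

# Standard error of the mean: sample standard deviation / sqrt(n).
standard_error = stdev(sample) / sqrt(n)

# alpha = 0.05 corresponds to a 95% confidence level; 1.96 is the
# two-sided critical value of the standard normal distribution.
Z_95 = 1.96
lower = sample_mean - Z_95 * standard_error
upper = sample_mean + Z_95 * standard_error

print(f"mean = {sample_mean:.2f}, SE = {standard_error:.3f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
```

For a sample this small, a critical value from the t distribution would usually be preferred over 1.96; the normal value is used here only to keep the sketch short.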
Concepts of Cross-section, Time Series, Panel data
Cross-sectional data:

Definition: Cross-sectional data is information that is
gathered at one point in time to reflect social conditions.

Cross-sectional data, or a cross section of a population,
in statistics is a type of data collected by observing
individuals, firms, countries or regions at some point of
time, or without regard to differences in time.

Analysis of cross-sectional data usually consists of
comparing the differences among the subjects
(individuals, firms, countries or regions).

Example: Number of habitations in a region in 1996.

For example, if we want to measure current obesity
levels in a population, we could draw a sample of 1,000
people randomly from that population. This is also
known as a cross section of that population.

We then measure their weight and height and calculate
what percentage of that sample is categorized as obese.

This cross-sectional sample provides us with a snapshot
of that population at that point of time.

Note that we do not know, based on one cross-sectional
sample, whether obesity is increasing or decreasing; we
can only describe the current proportion.
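The obesity calculation can be sketched as follows. The five measurements are invented, and the cut-off of BMI >= 30 is an assumption; the slides do not say how obesity is defined.

```python
# Hypothetical cross-section: (weight in kg, height in m) for five people
# measured at a single point in time.
sample = [(95.0, 1.70), (60.0, 1.65), (82.0, 1.80), (110.0, 1.75), (55.0, 1.60)]

# Assumption: obesity is defined as BMI >= 30, with BMI = kg / m^2.
OBESE_BMI = 30.0

def bmi(weight_kg, height_m):
    """Body mass index in kg per square metre."""
    return weight_kg / height_m ** 2

obese = [bmi(w, h) >= OBESE_BMI for w, h in sample]
percent_obese = 100 * sum(obese) / len(sample)

print(f"{percent_obese:.0f}% of the sample is categorized as obese")
```

The result describes only this one snapshot; a second cross-section taken later would be needed to say anything about a trend.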
Time Series data:

Time series data differ from cross-sectional data in
that the units of observation are observed at various
points of time.

A time series is a collection of observations made
sequentially through time. The interval between
observations can be any time interval (hours within
days, days, weeks, months, years, etc).

Some areas of application:

Time series can occur in a wide range of fields, from economics to
sociology, meteorology, geography, financial investment, etc.
Some examples of time series are:
- Malaria or Covid-19 incidence or deaths over calendar years
- Daily maximum temperatures
- Hourly records of babies born at a maternity hospital
Can you suggest other examples?
Air pollution etc.
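A time series can be sketched as a list of (date, value) pairs kept in time order. The temperatures below are invented for illustration.

```python
from datetime import date

# Hypothetical daily maximum temperatures (degrees Celsius) for one station.
series = [
    (date(2024, 1, 1), 21.5),
    (date(2024, 1, 2), 23.0),
    (date(2024, 1, 3), 22.0),
    (date(2024, 1, 4), 25.5),
]

# A time series is ordered by time, so sort on the date first.
series.sort(key=lambda observation: observation[0])

# Change between consecutive observations (here, day to day).
changes = [
    (later_day, later_value - earlier_value)
    for (_, earlier_value), (later_day, later_value) in zip(series, series[1:])
]

for day, change in changes:
    print(day.isoformat(), f"{change:+.1f}")
```

Because the order of the observations carries information, the day-to-day change can be computed, which is not possible with a single cross-section.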
Panel Data
Panel data (also called time-series cross-sectional
(TSCS) data or longitudinal data) combine both
cross-sectional and time series ideas and look at how
multiple subjects (units of observation such as
households, individuals or small towns) change over
time.

Panel data examine changes in variables over time
and differences in variables between the subjects.

Examples include estimating the effect of education
on income, with data across time and individuals.
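A minimal sketch of a panel, assuming two invented individuals observed in two years: the same records support both a time-series view (change within each individual) and a cross-sectional view (difference between individuals in one year).

```python
# Hypothetical panel: the same two individuals observed in two years.
# Each record is (individual, year, years_of_education, income).
panel = [
    ("A", 2020, 10, 200),
    ("A", 2021, 12, 260),
    ("B", 2020, 14, 320),
    ("B", 2021, 14, 330),
]

# Time-series view: change in income over time within each individual.
first_year, last_year = 2020, 2021
income = {(person, year): value for person, year, _, value in panel}
change_over_time = {
    person: income[(person, last_year)] - income[(person, first_year)]
    for person in sorted({record[0] for record in panel})
}

# Cross-sectional view: difference between individuals within one year.
gap_between_individuals = {
    year: income[("B", year)] - income[("A", year)]
    for year in (first_year, last_year)
}

print(change_over_time)         # {'A': 60, 'B': 10}
print(gap_between_individuals)  # {2020: 120, 2021: 70}
```

Estimating the effect of education on income would require a proper regression model; the sketch shows only the data layout that makes such an analysis possible.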
Panel data contain observations of multiple
phenomena obtained over time periods for the
same units of observation or individuals.
The term longitudinal data is often used for panel
Thank you
