Statistics For Economics Class 11 Notes - For Merge

The document discusses the importance of economics in addressing the problem of scarcity, highlighting the roles of consumption, production, and distribution. It emphasizes the significance of statistics in economics for data collection, analysis, and policy formulation, while distinguishing between primary and secondary data sources. Additionally, it covers various methods of data collection, including personal interviews, mailing surveys, and telephone interviews, along with the concepts of census and sampling.

Uploaded by

Ujjwal Mittal
Copyright
© All Rights Reserved

STATISTICS FOR ECONOMICS

CHAPTER-1 INTRODUCTION
WHY DO WE NEED ECONOMICS?
Human wants are unlimited, but resources are limited. To strike a balance between the two, we
need a subject called economics. Economics seeks to understand and address the problem of
scarcity.
Alfred Marshall (one of the founders of modern economics) called economics “the study of man
in the ordinary business of life”.
Scarcity is the root of all economic problems. Had there been no scarcity, there would have
been no economic problem. We face scarcity because the things that satisfy our wants are
limited in availability.
Consumption, Production and Distribution
Economics involves the study of man engaged in economic activities of various kinds. For this,
you need to know reliable facts about all the diverse economic activities like production,
consumption and distribution.
We want to know how the consumer decides, given his income and many alternative goods to
choose from, what to buy when he knows the prices. This is the study of Consumption.
We also want to know how the producer, similarly, chooses what and how to produce for the
market. This is the study of Production.
Finally, we want to know how the national income or the total income arising from what has
been produced in the country (called the Gross Domestic Product or GDP) is distributed
through salaries, profits and interest. This is the study of Distribution.
“Economics is the study of how people and society choose to employ scarce resources that
could have alternative uses in order to produce various commodities that satisfy their wants
and to distribute them for consumption among various persons and groups in society.”
STATISTICS IN ECONOMICS
Such studies require that we know more about economic facts. Such economic facts are also
known as economic data. The purpose of collecting data about these economic problems is to
understand and explain these problems in terms of the various causes behind them. In other
words, we try to analyse them.
For example, when we analyse the hardships of poverty, we try to explain them in terms of
various factors such as unemployment, low productivity of people, backward technology, etc.
But what purpose does the analysis of poverty serve unless we are able to find ways to
mitigate it?
We may, therefore, also try to find those measures that help solve an economic problem. In
Economics, such measures are known as policies.
WHAT IS STATISTICS?
Statistics deals with the collection, analysis, interpretation and presentation of numerical data.
It is a branch of mathematics and is also used in disciplines such as accounting, economics,
management, physics, finance, psychology and sociology.
For example, a statement in Economics like “the production of rice in India has increased from
39.58 million tonnes in 1974–75 to 106.5 million tonnes in 2013–14” is quantitative data. In
addition to quantitative data, Economics also uses qualitative data.
For example, ‘gender’ distinguishes a person as man/woman or boy/girl. It is often
possible to state the information about an attribute of a person in terms of degrees (like
better/worse; sick/healthy/more healthy; unskilled/skilled/highly skilled, etc.). Such
qualitative information or statistics is often used in Economics.
The next step is to present the data in tabular, diagrammatic and graphic forms. The data
are then summarised by calculating various numerical indices, such as mean, variance,
standard deviation, etc.
WHAT DOES STATISTICS DO?
1. Statistics is an indispensable tool for an economist that helps him to understand an
economic problem. Using its various methods, effort is made to find the causes behind it with
the help of qualitative and quantitative facts of an economic problem. Once the causes of the
problem are identified, it is easier to formulate certain policies to tackle it.
2. Exact facts are more convincing than vague statements.
For instance, saying that 310 people died in the recent earthquake in Kashmir is more factual
and is, thus, statistical data, whereas saying that hundreds of people died is not.
3. Statistics also helps in condensing mass data into a few numerical measures (such as mean,
variance, etc., about which you will learn later). These numerical measures help to summarise
data.
For example, it would be impossible for you to remember the incomes of all the people in a
data set if the number of people is very large. Yet one can easily remember a summary figure
like the average income that is obtained statistically.
4. In this way, Statistics summarises and presents meaningful overall information about a
mass of data.
5. Quite often, Statistics is used in finding relationships between different economic factors.
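
The idea of condensing mass data into a few numerical measures (point 3 above) can be sketched in a few lines of Python. This is only an illustration: the income figures are hypothetical and do not come from these notes, and the standard `statistics` module is just one convenient way to compute the measures.

```python
import statistics

# Hypothetical monthly incomes (in rupees) of ten people; not data from the notes.
incomes = [12000, 15000, 9000, 22000, 18000, 11000, 30000, 14000, 16000, 13000]

mean_income = statistics.mean(incomes)     # average income
variance = statistics.pvariance(incomes)   # variance, treating the list as the whole population
std_dev = statistics.pstdev(incomes)       # standard deviation of the same population

print(f"Mean income: {mean_income}")
print(f"Variance: {variance}")
print(f"Standard deviation: {std_dev:.2f}")
```

Ten separate incomes are hard to remember, but the single average figure is easy to recall and compare.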

Q1. Mark the following statements as true or false.

(i) Statistics can only deal with quantitative data.


False

(ii) Statistics solves economic problems.


True

(iii) Statistics is of no use to Economics without data.


True

Q2. Make a list of activities that constitute the ordinary business of life. Are these economic
activities?
Answer. The activities that constitute the ordinary business of life are:
→ Buying of goods and services.
→ Rendering services to a company by employees and workers.
→ Selling of goods and services.
Yes, the above-mentioned activities are regarded as economic activities, as they involve the
exchange of money to earn a livelihood.

Q3. 'The Government and policy makers use statistical data to formulate suitable policies of
economic development'. Illustrate with two examples.
Answer. Statistical data are important for the Government and policy makers in formulating
suitable policies of economic development. They not only help in analysing and evaluating the
outcomes of past policies but also assist in taking corrective measures and in
formulating new policies accordingly. This is clear from the following examples:
(i) It can be ascertained easily by using statistical techniques whether the policy of family
planning is effective in checking the problem of a rapidly growing population.
(ii) In preparing the annual government budget, previous data on government expenditure and
government revenue are taken into consideration for estimating the allocation of funds
among various projects.

Q4. "You have unlimited wants and limited resources to satisfy them." Explain by giving two
examples.

Answer. Every individual has unlimited wants, but the resources for satisfying those wants are
limited. Scarcity is the root of all economic problems. Had there been no scarcity, there would
have been no economic problem. This can be understood through examples:
(i) A child's pocket money is limited, so he/she has to choose only those things that he/she
wants the most. He/she cannot purchase all the things he/she wants.
(ii) A piece of land can be put to either agricultural or industrial use. The same land cannot
be used for both activities.
Q5. How will you choose the wants to be satisfied?
Answer. An individual fulfils his/her wants according to his/her needs, the satisfaction
expected and the priority attached to different wants. Moreover, the choice of wants also
depends on the need of the hour, the availability of the goods and the availability of means
(money) to purchase them.

Q6. What are your reasons for studying Economics?


Answer. The reasons for studying economics are:
1. To study the Theory of Consumption: We want to know how the consumer decides, given his
income and many alternative goods to choose from, what to buy when he knows the prices.
2. To study the Theory of Production: We also want to know how the producer, similarly,
chooses what to produce for the market when he knows the costs and prices.
3. To study the Theory of Distribution: We want to know how the national income or the total
income arising from what has been produced in the country is distributed through wages (and
salaries), profits and interest.
4. The study of economics also helps us to understand and analyse the root causes of basic
problems faced by an economy, like poverty, unemployment, income disparity, etc., and
helps in taking various corrective measures.

Q7. Statistical methods are no substitute for common sense. Comment.


Answer. It is true that statistical methods are no substitute for common sense. Statistical
data should not be believed blindly, as they can be misinterpreted or misused. Statistical data
may involve personal bias or may have undergone manipulation. Also, statistical data and
methods fail to reveal the errors committed by an investigator while surveying and collecting
data. This can be understood through a story.
It is said that a family of four persons (husband, wife and two children) once set out to cross a
river. The father knew the average depth of the river. So he calculated the average height of
his family members. Since the average height of his family members was greater than the
average depth of the river, he thought they could cross safely. Consequently, some members
of the family (the children) drowned while crossing the river. Thus, common sense must be
used while applying statistical methods.
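
The father's mistake can be made concrete with a small Python sketch. All heights and depths below are hypothetical figures chosen only to illustrate the story. The point is that comparing two averages hides the shortest family members and the deepest part of the river.

```python
# Hypothetical figures (in feet) chosen only to illustrate the story.
heights = {"father": 5.8, "mother": 5.4, "child_1": 3.5, "child_2": 3.0}
depths = [2.0, 3.0, 5.0, 4.0, 3.5]  # river depth measured at five points

avg_height = sum(heights.values()) / len(heights)
avg_depth = sum(depths) / len(depths)

# The father's reasoning: compare the two averages.
print("Average height exceeds average depth:", avg_height > avg_depth)

# The common-sense check: compare each person with the deepest point.
deepest = max(depths)
at_risk = [name for name, height in heights.items() if height < deepest]
print("Shorter than the deepest point:", at_risk)
```

Under these assumed numbers the averages suggest a safe crossing, while the person-by-person check shows that both children are shorter than the deepest point.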

CHAPTER-2 COLLECTION OF DATA
This chapter should enable you to:
• understand the meaning and purpose of data collection;
• distinguish between primary and secondary sources;
• know the mode of collection of data;
• distinguish between Census and Sample Surveys;
• be familiar with the techniques of sampling;
• know about some important sources of secondary data.
The purpose of collection of data is to show evidence for reaching a sound and clear
solution to a problem.
WHAT ARE THE SOURCES OF DATA?
Statistical data can be obtained from two sources.
The researcher may collect the data by conducting an enquiry. Such data are called Primary
Data.
Suppose you want to know about the popularity of a film star among school students. For this,
you will have to enquire from a large number of school students, by asking them questions
to collect the desired information. The data you get are an example of primary data.
If the data have been collected and processed by some other agency, they are called
Secondary Data.
HOW DO WE COLLECT THE DATA?
Preparation of Instrument
The most common type of instrument used in surveys is the questionnaire/interview schedule.
The questionnaire is either self-administered by the respondent or administered by the
researcher (enumerator) or a trained investigator.
While preparing the questionnaire/interview schedule, you should keep in mind the following
points:
• The questionnaire should not be too long. The number of questions should be as small
as possible.
• The questionnaire should be easy to understand and avoid ambiguous or difficult words.
• The questions should be arranged in an order such that the person answering feels
comfortable.
• The series of questions should move from general to specific.
There are three basic ways of collecting data:
1. Personal Interviews
2. Mailing (questionnaire) Surveys
3. Telephone Interviews

1. Personal Interviews
These are face-to-face interviews with the respondents. Personal contact is made between the
respondent and the interviewer.
Advantages
1. There is an opportunity to explain the study and answer the queries of respondents.
2. The interviewer can request the respondent to expand on answers that are particularly
important.
3. Misinterpretation and misunderstanding can be avoided. Watching the reactions of
respondents can provide supplementary information.
Disadvantages
1. It is expensive, as it requires trained interviewers.
2. It takes a longer time to complete the survey.
3. The presence of the researcher may inhibit respondents from saying what they really think.
2. Mailing Questionnaire
When the data in a survey are collected by mail, the questionnaire is sent to each individual by
mail with a request to complete and return it by a given date.
Advantages
1. It is less expensive.
2. It allows the researcher to have access to people in remote areas too.
3. It does not allow influencing of the respondents by the interviewer.
4. It also permits the respondents to take sufficient time to give thoughtful answers to the
questions.
These days online surveys or surveys through short messaging service, i.e., SMS, are popular.
Disadvantages
1. There is less opportunity to provide assistance in clarifying instructions, so there is a
possibility of misunderstanding the questions.
2. Response rates are low due to factors such as returning the questionnaire without
completing it, not returning the questionnaire at all, loss of the questionnaire in the mail itself, etc.
3. Telephone Interviews
In a telephone interview, the investigator asks questions over the telephone.
Advantages
1. They are cheaper than personal interviews.
2. They can be conducted in a shorter time.
3. They allow the researcher to assist the respondent by clarifying the questions.
4. They are better in cases where the respondents are reluctant to answer certain questions in
personal interviews.
Disadvantages
1. Many people may not own telephones.

Pilot Survey
Once the questionnaire is ready, it is advisable to conduct a try-out with a small group, which is
known as a Pilot Survey or Pre-testing of the questionnaire.
1. The pilot survey helps in providing a preliminary idea about the survey.
2. It helps in pre-testing of the questionnaire, so as to know the shortcomings and drawbacks
of the questions.
3. It helps in assessing the suitability of questions, clarity of instructions, performance of
enumerators and the cost and time involved in the actual survey.
CENSUS AND SAMPLE SURVEYS
Census or Complete Enumeration
A survey which includes every element of the population is known as a Census or the Method
of Complete Enumeration. If certain agencies are interested in studying the total population in
India, they have to obtain information from all the households in rural and urban India. The
Census of India is carried out every ten years.
Population and Sample
Population or the Universe in statistics means totality of the items under study.
Once the population is identified, the researcher selects a method of studying it. If the
researcher finds that survey of the whole population is not possible, then he/ she may decide
to select a Representative Sample.
Sample
A sample refers to a group or section of the population from which information is to be
obtained. A good sample (representative sample) is generally smaller than the population and is
capable of providing reasonably accurate information about the population at a much lower cost
and in a shorter time.
Suppose you want to study the average income of people in a certain region. According to the
Census method, you would be required to find out the income of every individual in the
region, add them up and divide by number of individuals to get the average income of people
in the region. This method would require huge expenditure, as a large number of enumerators
have to be employed.
Alternatively, you select a representative sample, of a few individuals, from the region and find
out their income. The average income of the selected group of individuals is used as an
estimate of average income of the individuals of the entire region.
Random Sampling
As the name suggests, random sampling is one where the individual units from the population
(samples) are selected at random. This is also called the lottery method.
Suppose the government wants to determine the impact of the rise in petrol price on the
household budget of a particular locality. For this, a representative (random) sample of 30
households has to be taken and studied. The names of all 300 households of that area are
written on slips of paper and mixed, and then 30 names to be interviewed are selected one by one.
In random sampling, every individual has an equal chance of being selected. In the above
example, all 300 sampling units of the population (the list of these units is called the sampling
frame) got an equal chance of being included in the sample of 30 units, and hence the sample
thus drawn is a random sample.
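
The lottery method described above can be imitated on a computer. The sketch below is only an illustration: the household labels are made up, and the fixed seed is an assumption added so that the draw can be repeated. `random.sample` draws without replacement, so no household is picked twice, just as a paper slip is not returned to the bowl.

```python
import random

# Hypothetical labels standing in for the 300 household names on paper slips.
households = [f"Household {i}" for i in range(1, 301)]

# Fixed seed so the draw is repeatable (an assumption for illustration only).
rng = random.Random(42)

# Draw 30 slips without replacement; every household has an equal chance.
sample = rng.sample(households, k=30)

print(len(sample), "households selected")
print(sample[:5])
```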
Exit Polls: You must have seen that when an election takes place, the television networks
provide election coverage. They also try to predict the results. This is done through exit polls,
wherein a random sample of voters who exit the polling booths are asked whom they voted
for. From the data of the sample of voters, the prediction is made. You might have noticed
that exit polls do not always predict correctly. Why?
Using the Random Number Tables, how will you select your sample years?
Non-Random Sampling
There may be a situation in which you have to select 10 out of 100 households in a locality. You
have to decide which households to select and which to reject. You may select the households
conveniently situated or the households known to you or your friend. In this case, you are
using your judgement (bias) in selecting the 10 households. This way of selecting 10 out of 100
households is not a random selection. In a non-random sampling method, not all the units of
the population have an equal chance of being selected, and the convenience or judgement of
the investigator plays an important role in the selection of the sample. Such samples are mainly
selected on the basis of judgement, purpose, convenience or quota and are non-random samples.
SAMPLING AND NON-SAMPLING ERRORS
Sampling Errors
Sampling error refers to the difference between the sample estimate and the corresponding
population parameter, that is, the difference between the actual value of a parameter of the
population and its estimate.
It is possible to reduce the magnitude of sampling error by taking a larger sample.
Example: Consider the incomes of 5 farmers of Manipur. The variable x (income of
farmers) has measurements 500, 550, 600, 650, 700.
We note that the population average is (500 + 550 + 600 + 650 + 700) ÷ 5 = 3000 ÷ 5 = 600.
Now, suppose we select a sample of two individuals where x has measurements of 500 and
600. The sample average is (500 + 600) ÷ 2 = 1100 ÷ 2 = 550.
Here, the sampling error of the estimate = 600 (true value) – 550 (estimate) = 50.
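
The farmers example can be checked with a short Python sketch. The first part repeats the arithmetic above. The second part is an added illustration, not part of the original example: it lists every possible sample of a given size and finds the largest error, to show how the worst case shrinks as the sample grows. The helper name `max_abs_error` is hypothetical.

```python
from itertools import combinations
from statistics import mean

incomes = [500, 550, 600, 650, 700]   # incomes of the 5 farmers (the population)
population_mean = mean(incomes)

sample = [500, 600]                   # the sample of two farmers used above
sample_mean = mean(sample)
sampling_error = population_mean - sample_mean

print("Population mean:", population_mean)
print("Sample mean:", sample_mean)
print("Sampling error:", sampling_error)


def max_abs_error(size):
    """Largest absolute sampling error over all possible samples of the given size."""
    return max(abs(population_mean - mean(s)) for s in combinations(incomes, size))


print("Worst error with samples of 2:", max_abs_error(2))
print("Worst error with samples of 4:", max_abs_error(4))
```

In this tiny population the worst possible error is smaller for samples of four farmers than for samples of two, which agrees with the statement that a larger sample reduces sampling error.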
Non-Sampling Errors
Non-sampling errors are more serious than sampling errors because a sampling error can be
minimised by taking a larger sample. It is difficult to minimise non-sampling error, even by
taking a large sample.
Some of the non-sampling errors are:
Sampling Bias: Sampling bias occurs when the sampling plan is such that some members of the
target population could not possibly be included in the sample.
Non-Response Errors: Non-response occurs if an interviewer is unable to contact a person
listed in the sample or a person from the sample refuses to respond. In this case, the sample
observation may not be representative.
Errors in Data Acquisition: This type of error arises from the recording of incorrect responses.
Suppose the teacher asks the students to measure the length of the teacher’s table in the
classroom. The measurements by the students may differ. The differences may occur due to
differences in measuring tape, carelessness of the students, etc.
Similarly, suppose we want to collect data on prices of oranges. We know that prices vary
from shop to shop and from market to market. Prices also vary according to the quality.
Therefore, we can only consider the average prices. Recording mistakes can also take place, as
the enumerators or the respondents may commit errors in recording or transcribing the
data; for example, he/she may record 13 instead of 31.
CENSUS OF INDIA AND NSSO
There are agencies at both the national and the state level that collect statistical data.
Some of the agencies at the national level are the Census of India and the following:
1. National Sample Survey (NSS)
It conducts nationwide surveys on socio-economic issues.
The NSS provides periodic estimates of literacy, school enrolment, utilisation of educational
services, employment, unemployment, manufacturing and service sector enterprises,
morbidity, maternity, child care, utilisation of the public distribution system, etc.
2. Central Statistics Office (CSO)
3. Registrar General of India (RGI)
4. Directorate General of Commercial Intelligence and Statistics (DGCIS)
5. Labour Bureau, etc.
The Census of India provides the most complete and continuous demographic record of
population.
The Census has been conducted regularly every ten years since 1881. The first Census after
Independence was conducted in 1951. The Census officials collect information on various
aspects of population such as the size, density, sex ratio, literacy, migration, rural-urban
distribution, etc.

Q1. Frame at least four appropriate multiple-choice options for following questions:
(i) Which of the following is the most important when you buy a new dress?
Answer (a) Colour (b) Price (c) Brand (d) Quality of cloth

(ii) How often do you use computers?


Answer
(a) Everyday
(b) 6 times a week
(c) 4 times a week
(d) 2 times a week

(iii) Which of the following newspaper/s do you read regularly?


Answer
(a) The Times of India
(b) The Hindu
(c) Indian Express
(d) Any other

(iv) Rise in the price of petrol is justified.


Answer
(a) Yes (b) No (c) Don't Know (d) None of the above

(v) What is the monthly income of your family?

Answer-(a) Less than Rs 10,000 (b) Rs 10,000 to Rs 20,000 (c) Rs 20,000 to Rs 30,000 (d) More
than Rs 30,000

Q2. Frame five two-way questions (with 'Yes' or 'No').

Answer
(i) Do you own car?
(ii) Do you smoke?
(iii) Do you own two-wheeler?
(iv) Have you visited any foreign country?
(v) Are you satisfied with your present income?

Q3.(i) There are many sources of data (true/false).


Answer- False

(ii) Telephone survey is the most suitable method of collecting data, when the population is
literate and spread over a large area (true/false).
Answer-False

(iii) Data collected by investigator is called the secondary data (true/false).


Answer-False

(iv) There is a certain bias involved in the non-random selection of samples (true/false).
Answer-True

(v) Non-sampling errors can be minimised by taking large samples (true/ false).
Answer-False

Q4. What do you think about the following questions? Do you find any problem with these
questions? If yes, how?
(i) How far do you live from the closest market?
Answer-The question is not clear. It does not specify how the distance is to be expressed.

(ii) If plastic bags are only 5 percent of our garbage, should it be banned?
Answer-The question is too long, which discourages people from answering. It also gives a clue
about how the respondent should answer.
(iii) Wouldn't you be opposed to increase in price of petrol?
Answer-The question contains two negatives, which confuses the respondents and
may lead to a biased response.

(iv) (a) Do you agree with the use of chemical fertilisers?


(b) Do you use fertilisers in your fields?
(c) What is the yield per hectare in your field?

Answer-The order of the questions is incorrect. General questions should be asked first,
followed by specific ones. The correct order should be:
(i) What is the yield per hectare in your field?
(ii) Do you use fertilisers in your fields?
(iii) Do you agree with the use of chemical fertilisers?

Q5. You want to research on the popularity of Vegetable Atta Noodles among children.
Design a suitable questionnaire for collecting this information.

Answer-QUESTIONNAIRE

Name: ........................
Age: ..........
Sex: ☐ Male ☐ Female

1. Do you eat Noodles?


☐ Yes ☐ No

2. Do you like Vegetable Atta Noodles more than other snacks?


☐ Yes ☐ No

3. How many packets do you consume in one month?


☐ Less than 2 ☐ 2 to 5 ☐ More than 5

4. Do you prefer Atta noodles over Maida noodles?


☐ Yes ☐ No

5. Which vegetable according to you should be added in present Atta noodles?


...................................................................

6. When do you prefer to have Vegetable Atta Noodles?


☐ Breakfast ☐ Lunch ☐ Evening Snacks ☐ Dinner
7. Do your parents accompany you while having noodles?
☐ Yes ☐ No

Q6. In a village of 200 farms, a study was conducted to find the cropping pattern. Out of the
50 farms surveyed, 50% grew only wheat. Identify the population and the sample here.

Answer-Population or the Universe in statistics means the totality of the items under study. So,
the population here is the 200 farms.
Sample refers to a group or section of the population from which information is to be
obtained. Out of the 200 farms, only 50 farms were selected for the survey. Therefore, the
sample is the 50 farms surveyed.

Q7. Give two examples each of sample, population and variable.

Answer-Example 1: A study was conducted to know the average income of people in a village.
The total number of persons was 750. Out of these, 70 villagers were selected and their average
income was recorded. So, in this example:
(i) Population is the total number of villagers, which is equal to 750.
(ii) Sample is the 70 villagers whose average income was recorded.
(iii) Variable under study is the income of the villagers.

Example 2: In order to record the level of sugar in the blood, blood samples of
1000 people were taken out of 10,000 people. So, in this example:
(i) Population is the total number of people, i.e., 10,000.
(ii) Sample is the 1000 people.
(iii) Variable is the sugar level.

Q8. Which of the following methods give better results and why?
(a) Census
(b) Sample

Answer-The Sample Method gives better results than the Census Method because:
→ Less time consuming: It requires a lot of time to conduct a census, as every record has to be
obtained, while a sample survey can be done in less time.
→ Economically feasible: The cost of approaching individual units for questioning and
collection of data is comparatively lower due to the small size of the sample.
→ Accuracy: Although the census method can provide more accurate and reliable results than
the sample method, errors in a sample can be located and rectified more easily because of the
smaller number of items.
→ Lesser Non-sampling Errors: The probability of non-sampling errors is also low, as the
sample size is smaller than that of the Census Method.

Q9. Which of the following errors is more serious and why?(a) Sampling error (b) Non-
Sampling error

Answer-Non-sampling errors are more serious than sampling errors because a sampling error
can be minimised by taking a larger sample. It is difficult to minimise non-sampling error, even
by taking a large sample, as it arises from the use of faulty means of collection of data.

Q10. Suppose there are 10 students in your class. You want to select three out of them. How
many samples are possible?

Answer-We have to use combinations to determine the number of samples which are
possible. The formula for the number of such combinations is
nCr = n! / ((n − r)! × r!), where n! = n(n − 1)(n − 2)(n − 3) ... (3)(2)(1) (Note: 0! = 1)
Therefore, the answer is 10C3 = (10 × 9 × 8)/(3 × 2 × 1) = 720/6 = 120
Number of samples possible = 120
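
The count can be verified in Python in two ways: with the formula (via `math.comb`, available in Python 3.8 and later) and by actually listing every possible group of three. The serial numbers 1 to 10 used for the students are an assumption for illustration.

```python
import math
from itertools import combinations

students = list(range(1, 11))   # students labelled 1 to 10 (illustrative labels)

# Using the formula nCr = n! / ((n - r)! * r!)
n_samples = math.comb(10, 3)

# Listing every possible sample of 3 students and counting them
all_samples = list(combinations(students, 3))

print("By formula:", n_samples)
print("By listing:", len(all_samples))
print("First few samples:", all_samples[:3])
```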

Q11. Discuss how you would use the lottery method to select 3 students out of 10 in your
class?

Answer-Make ten paper slips of equal size and write the name of one student on each. Now,
there are ten slips available. Mix them well. Then draw three slips at random, one by one,
without replacement. By this method we can select three students.
Q12. Does the lottery method always give you a random sample? Explain.

Answer-Yes, the lottery method always gives a random sample if it is used in the proper
manner without any bias. In a random sample, each individual unit has an equal chance of
getting selected. Similarly, in a lottery method, each individual unit is selected at random from
the population and thereby has equal opportunity of getting selected.

Q13. Explain the procedure of selecting a random sample of 3 students out of 10 in your
class, by using random number tables.

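Answer-To select a random sample of 3 students out of 10 using random number tables, first
give each student a serial number. Since there are only 10 students, one-digit random numbers
(0 to 9) are enough if the tenth student is given the serial number 0. Start at any point in the
table and read the digits one by one, skipping any digit that has already been selected. The
first three distinct digits give the sample. For example, if the digits read are 5, 9 and 2, the
3 selected students out of 10 are those with serial numbers 5, 9 and 2.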
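
Reading a random number table can be sketched as follows. This is a minimal illustration under stated assumptions: the students are given serial numbers 0 to 9, the row of digits is made up rather than copied from a published table, and the function name `select_from_digits` is hypothetical.

```python
def select_from_digits(digits, sample_size):
    """Read one-digit random numbers in order, skipping repeats, until the sample is full."""
    chosen = []
    for digit in digits:
        if digit not in chosen:      # a student cannot be selected twice
            chosen.append(digit)
        if len(chosen) == sample_size:
            break
    return chosen


# A made-up row of one-digit random numbers (not taken from a real table).
table_row = [5, 5, 9, 2, 7, 0, 9, 4]

selected = select_from_digits(table_row, 3)
print("Selected serial numbers:", selected)
```

The repeated 5 in the row is skipped, so the three serial numbers chosen are all different.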

Q14. Do samples provide better results than surveys? Give reasons for your answer.
Answer-Samples provide better results than surveys because:
→ A sample can provide reasonably reliable and accurate information at a lower cost and
in a shorter time.
→ As samples are smaller than the population, more detailed information can be collected by
conducting intensive enquiries.
→ A sample needs a smaller team of enumerators, so it is easier to train them and supervise
their work more effectively.

CHAPTER-3 ORGANISATION OF DATA

RAW DATA
The unclassified data or raw data are highly disorganised. They are often very large and
cumbersome to handle. To draw meaningful conclusions from them is a tedious task. Therefore,
proper organisation and presentation of such data is needed before any systematic statistical
analysis is undertaken. Hence, after collecting data, the next step is to organise and present
them in a classified form.
Suppose you want to know the performance of students in mathematics and you have collected
data on marks in mathematics of 100 students of your school. These data are of little use
unless they are organised. Once they are organised, the data begin to make sense.

CLASSIFICATION OF DATA
Raw data are classified in various ways depending on the purpose.
1. Chronological Data
Data can be grouped according to time. Such a classification is known as a Chronological
Classification, and data so classified are called Chronological or Time-Series Data.
In such a classification, data are classified either in ascending or in descending order with
reference to time such as years, quarters, months, weeks, etc. Such data form a Time Series,
as they depict a series of values for different years.
2. Spatial Classification
Data are classified with reference to geographical locations such as countries, states, cities,
districts, etc.
3. Qualitative Data
Sometimes you come across characteristics that cannot be expressed quantitatively. Such
characteristics are called Qualities or Attributes.
For example, nationality, literacy, religion, gender, marital status, etc. They cannot be
measured. Such a classification of data on attributes is called a Qualitative Classification.
For example, the population of a country can be grouped on the basis of the
qualitative variable “gender”. An observation could either be a male or a female. These two
groups could be further classified on the basis of marital status.
4. Quantitative Data
Characteristics like height, weight, age, income, marks of students, etc., are quantitative in
nature. When the collected data of such characteristics are grouped into classes, it becomes a
Quantitative Classification.

VARIABLES: CONTINUOUS AND DISCRETE


Continuous variable
A continuous variable can take any numerical value.
It may take integral values (1, 2, 3, 4, ...), fractional
values (1/2, 2/3, 3/4, ...), and values that
are not exact fractions (√2 = 1.414, √3 = 1.732,
…, √7 = 2.645).
Examples of a continuous variable are
weight, time, distance, etc.
Discrete variable
A discrete variable can take only certain values. Its value changes only by finite “jumps”.
It “jumps” from one value to another but does not take any intermediate value between them.
For example-a variable like the “number of students in a class”, for different classes, would
assume values that are only whole numbers. It cannot take any fractional value like 0.5
because “half of a student” is absurd.

It cannot take a value like 25.5 between 25 and 26. Instead its value is either 25
or 26. As its value changes from 25 to 26, it does not take any of the fractional values
in between.

WHAT IS A FREQUENCY DISTRIBUTION?


A frequency distribution is a comprehensive way to
classify raw data of a quantitative variable. It shows
how different values of a variable are distributed in
different classes along with their corresponding class
frequencies.
In this case we have ten classes of marks: 0–10, 10–20, …, 90–100.
The term Class Frequency means the number of
values in a particular class.
For example, in the class 30–40 we find 7 values of marks from the raw data.
Each class in a frequency distribution table is bounded by Class Limits. Class limits are the two
ends of a class.
The lowest value is called the Lower Class Limit and the highest value the Upper Class Limit.
For example, the class limits for the class: 60–70 are 60 and 70. Its lower class limit is 60 and
its upper class limit is 70.
Class Interval or Class Width is the difference between the upper class limit and the lower
class limit.
For the class 60–70, the class interval is 10 (upper
class limit minus lower class limit).
Class Mid-Point or Class Mark is the middle value of a class. It lies
halfway between the lower class limit and the
upper class limit of a class and can be ascertained
in the following manner:
Class Mid-Point or Class Mark = (Upper Class Limit + Lower Class Limit)/2
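The definitions of class interval and class mark can be checked with a few lines of Python. This is only an illustrative sketch: the function names are made up for this example, and the class 60–70 is the one used in the text above.

```python
def class_width(lower, upper):
    """Class interval (width) = upper class limit - lower class limit."""
    return upper - lower


def class_mark(lower, upper):
    """Class mid-point = (upper class limit + lower class limit) / 2."""
    return (upper + lower) / 2


lower_limit, upper_limit = 60, 70
width = class_width(lower_limit, upper_limit)
mark = class_mark(lower_limit, upper_limit)
print("Class interval:", width)  # 10
print("Class mark:", mark)       # 65.0
```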
Frequency Curve
A frequency curve is a graphic representation of a frequency distribution.
To draw it, we plot the class marks on the X-axis and the frequencies on the Y-axis.
How to prepare a Frequency Distribution?
While preparing a frequency distribution, the following five questions need to be addressed:
1. Should we have equal or unequal sized class intervals?
2. How many classes should we have?
3. What should be the size of each class?
4. How should we determine the class limits?
5. How should we get the frequency for each class?
Should we have equal or unequal sized class intervals?
There are two situations in which unequal sized intervals are used.
1.When we have data on income and other similar variables where the range is very high.
For example, income per day may range from nearly zero to many hundred crores of rupees.
In such a situation, equal class intervals are not suitable because
(i) If the class intervals are of moderate size and equal, there would be a large number of
classes.
(ii) If the class intervals are large, we would tend to suppress information on either very small
levels or very high levels of income.
2.When a large number of values are concentrated in a small part of the range, equal class
intervals would lead to lack of information on many values.
In all other cases, equal sized class intervals are used in frequency distributions.
How many classes should we have?
The number of classes is usually between six and fifteen.
If we are using equal sized class intervals, then the number of classes can be calculated
by dividing the range (the difference between the largest and the smallest values of the variable)
by the size of the class intervals.
What should be the size of each class?
We can determine the number of classes once we decide the class interval. Thus, we find that
these two decisions are interlinked. We cannot decide on one without deciding on the other.
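The link between the range, the class size and the number of classes can be sketched in Python as follows. The function name is hypothetical, and rounding up is an assumption made here so that the classes cover the whole range even when the division is not exact.

```python
import math


def number_of_classes(smallest, largest, class_size):
    """Number of equal-sized classes needed to cover the range."""
    data_range = largest - smallest
    # Round up so that the classes span the whole range.
    return math.ceil(data_range / class_size)


# Marks from 0 to 100 with a class size of 10 give the ten classes 0-10, ..., 90-100.
n_classes = number_of_classes(0, 100, 10)
print(n_classes)  # 10
```

Choosing a larger class size gives fewer classes and the reverse, which is why the two decisions cannot be taken separately.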
How should we determine the class limits?
Class limits should be definite and clearly stated. Generally, open-ended classes such as “70
and over” or “less than 10” are not desirable. The lower and upper class limits should be
determined in such a manner that frequencies of each class tend to concentrate in the middle
of the class intervals.

Class intervals are of two types:


1.Inclusive class intervals: Values equal to the lower
and upper limits of a class are included in the
frequency of that same class. The limits of adjacent
classes do not overlap.
2.Exclusive class intervals: An item equal to either
the upper or the lower class limit (conventionally the
upper limit) is excluded from the frequency of that
class. The upper limit of one class is the lower limit
of the next, so the class limits overlap.
In the case of discrete variables, both exclusive and
inclusive class intervals can be used.
In the case of continuous variables, inclusive class
intervals are used very often.
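The difference between the two types of class interval shows up when a value falls exactly on a class limit. The sketch below is illustrative only: the function names are made up, the exclusive classes are taken as 0–10, 10–20, ... with the upper limit excluded, and the inclusive classes are assumed to be 1–10, 11–20, ... for whole-number values of 1 or more.

```python
def exclusive_class(value, width=10):
    """Exclusive classes 0-10, 10-20, ...: lower limit included, upper limit excluded."""
    lower = (value // width) * width
    return (lower, lower + width)


def inclusive_class(value, width=10):
    """Inclusive classes 1-10, 11-20, ...: both limits belong to the class."""
    lower = ((value - 1) // width) * width + 1
    return (lower, lower + width - 1)


# A value of 10 sits on a class limit.
print(exclusive_class(10))  # (10, 20): excluded from 0-10, counted in 10-20
print(inclusive_class(10))  # (1, 10): counted in the class whose upper limit it is
```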

Finding class frequency by tally marking


A tally (/) is put against a class for
each student whose marks are
included in that class.
For example, if the marks
obtained by a student are 57, we
put a tally (/) against class 50–60.
If the marks are 71, a tally is put
against the class 70–80. If
someone obtains 40 marks, a tally
is put against the class 40–50.
The counting of tallies is made
easier when four of them are put
as //// and the fifth tally is placed diagonally across them.
Tallies are then counted as groups of five. So if there are 16 tallies in a class, we put them as
three crossed groups of five followed by a single tally (/) for the sake of convenience. Thus the
frequency in a class is equal to the number of tallies against that class.
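Tally marking can be imitated in Python. In this sketch the helper names are hypothetical, the classes are exclusive with a width of 10, and a bundle of five tallies is written as "/////" because plain text cannot show the fifth stroke drawn across the other four.

```python
from collections import Counter


def class_of(mark, width=10):
    """Label of the exclusive class (e.g. '50-60') that contains the mark."""
    lower = (mark // width) * width
    return f"{lower}-{lower + width}"


def tally_string(count):
    """Write a frequency as bundles of five tallies plus any remaining tallies."""
    groups, rest = divmod(count, 5)
    parts = ["/////"] * groups
    if rest:
        parts.append("/" * rest)
    return " ".join(parts)


marks = [57, 71, 40]  # the three marks mentioned in the text
frequency = Counter(class_of(m) for m in marks)
for label, count in sorted(frequency.items()):
    print(label, tally_string(count), count)

print(tally_string(16))  # three bundles of five and one single tally
```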

Frequency distribution with unequal classes


By now you are familiar with frequency
distributions of equal class intervals. You know
how they are constructed out of raw data. But
in some cases frequency distributions with
unequal class intervals are more appropriate.
If you observe the frequency distribution in
this table, you will notice that most of the
observations are concentrated in classes 40–50, 50–60 and 60–70.
Their respective frequencies are 21, 23 and 19. It means that out of 100 students, 63
(21+23+19) students are concentrated in these classes.
Thus, 63 per cent are in the middle range of 40-70. The remaining 37 per cent of data are in
classes 0–10, 10–20, 20–30, 30–40, 70–80, 80–90 and 90–100.
These classes are sparsely populated with observations.
Further you will also notice that observations in these classes deviate more from their
respective class marks than those in other
classes. But if classes are to be
formed in such a way that class
marks coincide, as far as
possible, with a value around
which the observations in a
class tend to concentrate, then
unequal class intervals are more
appropriate.
This Table shows the same
frequency distribution of
previous table in terms of
unequal classes.
Each of the classes 40–50, 50–60 and 60–70 is split into two. The class 40–50 is divided into
40–45 and 45–50.
The class 50–60 is divided into 50–55 and 55–60.
And the class 60–70 is divided into 60–65 and 65–70.
The new classes 40–45, 45–50, 50–55, 55–60, 60–65 and 65–70 have a class interval of 5.
The other classes: 0–10, 10–20, 20–30, 30–40, 70–80, 80–90 and 90–100 retain their old class
interval of 10. The last column of this table shows the new values of class marks for these
classes.
Compare them with the old values of class marks in Table 3.6. Notice that the observations in
these classes deviated more from their old class mark
values than their new class mark values. Thus the new
class mark values are more representative of the data
in these classes than the old values.
Figure 3.2 shows the frequency curve of the
distribution in Table 3.7. The class marks of the table
are plotted on X-axis and the frequencies are plotted
on Y-axis.

Frequency array
For a discrete variable, the classification of its data is
known as a Frequency Array.
Since a discrete variable takes values and not
intermediate/fractional values between two integral
values, we have frequencies that correspond to each of
its integral values.
This table illustrates a Frequency Array.
The variable “size of the household” is a discrete
variable that only takes integral values as shown in the
table.
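A frequency array simply counts how often each distinct value occurs. A minimal Python sketch, using a short list of household sizes that is invented for illustration and is not the data of the table:

```python
from collections import Counter

# Hypothetical household sizes for ten households (not the data from the table).
household_sizes = [1, 3, 2, 4, 3, 5, 2, 3, 4, 3]

# Count each distinct value and list the values in ascending order.
frequency_array = dict(sorted(Counter(household_sizes).items()))

for size, households in frequency_array.items():
    print(size, households)
print("Total", sum(frequency_array.values()))
```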

BIVARIATE FREQUENCY DISTRIBUTION

A Bivariate Frequency Distribution can be defined as the frequency distribution of two
variables.
For example, we have taken a
sample of 20 companies from
the list of companies based in
a city. Suppose that we collect
information on sales and
expenditure on
advertisements from each
company. In this case, we
have bivariate sample data.
Such bivariate data can be
summarised using a Bivariate
Frequency Distribution.
This Table shows the frequency distribution of two variables, sales and advertisement
expenditure (in Rs. lakhs) of 20 companies.
For example, there are 3 firms whose sales are between Rs 135 and Rs 145 lakh and their
advertisement expenditures are between Rs 64 and Rs 66 thousand.
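A bivariate frequency distribution counts the observations that fall in each pair of classes. The Python sketch below uses six invented (sales, advertisement expenditure) pairs, not the 20 companies of the table; the class starting points (115 and 62) and widths (10 and 2) are also assumptions chosen for illustration.

```python
from collections import Counter


def class_label(value, start, width):
    """Label of the exclusive class that contains the value."""
    lower = start + ((value - start) // width) * width
    return f"{lower}-{lower + width}"


# Hypothetical (sales, advertisement expenditure) pairs.
companies = [(118, 63), (137, 65), (142, 64), (139, 65), (151, 67), (128, 62)]

# Each cell of the bivariate table is keyed by (sales class, advertisement class).
bivariate_table = Counter(
    (class_label(sales, 115, 10), class_label(adv, 62, 2))
    for sales, adv in companies
)

for (sales_class, adv_class), firms in sorted(bivariate_table.items()):
    print(sales_class, adv_class, firms)
```

In this invented sample, three firms fall in the cell for sales 135-145 and advertisement expenditure 64-66.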
EXERCISE

1. Which of the following alternatives is true?

(i) The class midpoint is equal to:


(a) The average of the upper class limit and the lower class limit
(b) The product of upper class limit and the lower class limit

(c) The ratio of the upper class limit and the lower class limit
(d) None of the above
Answer:(a) The average of the upper class limit and the lower class limit.

(ii) The frequency distribution of two variables is known as


(a) Univariate Distribution
(b) Bivariate Distribution
(c) Multivariate Distribution
(d) None of the above
Answer:(b) Bivariate Distribution

(iii) Statistical calculations in classified data are based on
(a) the actual values of observations
(b) the upper class limits
(c) the lower class limits
(d) the class midpoints
Answer (d) the class midpoints

(iv) Under Exclusive method,


(a) the upper class limit of a class is excluded in the class interval
(b) the upper class limit of a class is included in the class interval

(c) the lower class limit of a class is excluded in the class interval

(d) the lower class limit of a class is included in the class interval
Answer (a) the upper class limit of a class is excluded in the class interval

(v) Range is the


(a) difference between the largest and the smallest observations
(b) difference between the smallest and the largest observations

(c) average of the largest and the smallest observations


(d) ratio of the largest to the smallest observation
Answer:(a) difference between the largest and the smallest observations

2. Can there be any advantage in classifying things? Explain with an example from your daily
life.
Answer:-Yes, there are many advantages of classifying things. These are:

1.It saves our time and energy by making it easy to locate specific data.

2.It facilitates analysis, tabulation and interpretation.

3.It makes data comparable.
4.It makes data easy to summarise.
For example: We keep a separate notebook for each subject.

3. What is a variable? Distinguish between a discrete and a continuous variable.


Answer:A characteristic, number, or quantity whose value can change from one observation to
another or over time is called a variable.
For example: weight, income, etc. A variable can be either discrete or continuous.

Discrete Variable
• A variable that takes only whole numbers as its value is called a discrete variable.
• These variables increase in jumps or in complete numbers.
• For example, number of people in a family, number of students in a class, etc.

Continuous Variable
• A variable that can take any value within a reasonable limit is called a continuous variable.
• These variables assume a range of values or increase in fractions and not in jumps.
• For example, age, height, weight, etc.

Q4. Explain the 'exclusive' and 'inclusive' methods used in classification of data.
Answer:-Exclusive method: The classes, by this method, are formed in such a way that the
upper class limit of one class equals the lower class limit of the next class, for example, 0-10,
10-20, and so on. Thus, the continuity of the data is maintained. The upper class limit is
excluded but the lower class limit of a class is included in the interval. This method is most
appropriate for data of continuous variables.
Inclusive method: This method does not exclude the upper class limit in a class interval. It
includes the upper class limit in the class. Thus both class limits are parts of the class interval,
for example, 1-5, 6-10, 11-15 and so on. The interval 1-5 includes both the limits, i.e. 1 and 5.

Q5. Use the data in Table 3.2 that relate to monthly household expenditure (in Rs) on food
of 50 households and

(i) Obtain the range of monthly household expenditure on food.
Answer:-Range = Highest Value - Lowest Value
Highest Value = 5090
Lowest Value = 1007
So, Range = 5090 - 1007 = 4083

(ii) Divide the range into appropriate number of class intervals and obtain the frequency
distribution of expenditure.
Answer

(iii) Find the number of households whose monthly expenditure on food is

(a) less than Rs 2000

(b) more than Rs 3000


(c) between Rs 1500 and Rs 2500
Answer:-(a) Number of households whose monthly expenditure on food is less than Rs 2000
= 20 + 13 = 33
(b) Number of households whose monthly expenditure on food is more than Rs 3000=
2+1+2+0+1 = 6

(c) Number of households whose monthly expenditure on food is between Rs 1500 and Rs
2500 = 13 + 6 = 19
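The three sums above can be reproduced from a frequency table in Python. The table used in this sketch is an assumption: the class limits (width 500, starting at 1000) and the frequency 5 for the class 2500-3000 are inferred from the sums quoted in the answer and from the total of 50 households, since the answer table itself is not reproduced in these notes. The function name is hypothetical.

```python
# (lower limit, upper limit, frequency) for exclusive classes of width 500.
expenditure_table = [
    (1000, 1500, 20), (1500, 2000, 13), (2000, 2500, 6),
    (2500, 3000, 5),  # assumed so that the frequencies total 50
    (3000, 3500, 2), (3500, 4000, 1), (4000, 4500, 2),
    (4500, 5000, 0), (5000, 5500, 1),
]


def households_between(table, low, high):
    """Total frequency of the classes lying wholly between low and high."""
    return sum(f for lower, upper, f in table if lower >= low and upper <= high)


less_than_2000 = households_between(expenditure_table, 1000, 2000)
more_than_3000 = households_between(expenditure_table, 3000, 5500)
between_1500_2500 = households_between(expenditure_table, 1500, 2500)
print(less_than_2000, more_than_3000, between_1500_2500)  # 33 6 19
```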

Q 6. In a city 45 families were surveyed for the number of domestic appliances they used.
Prepare a frequency array based on their replies as recorded below.

Answer

No. of Domestic Appliances    No. of Households
0                             1
1                             7
2                             15
3                             12
4                             5
5                             2
6                             2
7                             1
Total                         45

Q 7. What is 'loss of information' in classified data?


Answer:-Although classified data summarise the raw data, making them concise and comprehensible,
they do not show the details that are found in raw data. Once the data are grouped into classes,
an individual observation has no significance in further statistical calculations. Further, the
statistical calculations are based on the values of the class marks, ignoring the exact
observations of the data. This leads to the problem of loss of information.

Q 8. Do you agree that classified data is better than raw data?

Answer:-The raw data are usually large and fragmented, so it is very difficult to draw any
meaningful conclusion from them. Classification makes the raw data comprehensible by
summarising them into groups. When facts of similar characteristics are placed in the same class,
it enables one to locate them easily, make comparisons, and draw inferences without any
difficulty. Therefore, classified data is better than raw data.
Q 9. Distinguish between Univariate and Bivariate frequency distribution.
Answer :-The frequency distribution of a single variable is called a Univariate Distribution.
Income of people, marks scored by students, etc. are examples of Univariate Distribution.
The frequency distribution of two variables is called Bivariate distribution. Sales and
advertisement expenditure, weight and height of individuals, etc. are examples of Bivariate
distribution.

Q 10. Prepare a frequency distribution by inclusive method taking class interval of 7 from
the following data:

Answer

CHAPTER-4 PRESENTATION OF DATA

INTRODUCTION - As data are generally voluminous, they need to be put in a compact and
presentable form. There are generally three forms of presentation of data:
1.Textual or Descriptive presentation
2.Tabular presentation
3.Diagrammatic presentation.
1.TEXTUAL PRESENTATION OF DATA-In textual presentation, data are described within the
text. When the quantity of data is not too large this form of presentation is more suitable.
EXAMPLE- In a bandh call given on 08 September 2005 protesting the hike in prices of petrol
and diesel, 5 petrol pumps were found open and 17 were closed whereas 2 schools were
closed and remaining 9 schools were found open in a town of Bihar.
2.TABULAR PRESENTATION OF DATA - In a tabular presentation, data are presented in rows
(read horizontally) and columns (read vertically).
For example -Tabulating information about literacy rates. It has three rows (for male, female
and total) and three columns (for urban, rural and total). It is called a 3 × 3 Table giving 9 items
of information in 9 boxes called the "cells" of the Table.
Classification used in tabulation is of four kinds:
(A).Qualitative
(B).Quantitative
(C).Temporal
(D).Spatial
(A).Qualitative Classification
When classification is done according to attributes,
such as social status, physical status, nationality, etc.,
it is called qualitative classification.
(B).Quantitative Classification
In quantitative classification, the data are classified on
the basis of characteristics which are quantitative in
nature.

For example, age, height, production, income, etc. are
quantitative characteristics.

(C).Temporal Classification
In this classification time becomes the classifying
variable and data are categorised according to time.
Time may be in hours, days, weeks, months, years,
etc.
(D).Spatial Classification
When classification is done on the basis of place, it is
called spatial classification. The place may be a
village, block, district, state, country, etc.

TABULATION OF DATA AND PARTS OF A TABLE

3.DIAGRAMMATIC PRESENTATION OF DATA
This is the third method of presenting data. This method provides the quickest understanding
of the actual situation to be explained by data. It translates quite effectively the highly abstract
ideas contained in numbers into a more concrete and easily comprehensible form.
Three types of diagram
(I) Geometric diagram
(II) Frequency diagram
(III) Arithmetic line graph
(I) Geometric Diagram
Bar diagrams and pie diagrams come in the category of geometric
diagrams. The bar diagrams are of three types:
(A).Simple,
(B).Multiple
(C).Component bar diagrams.
(A).Simple Bar Diagram - A simple bar diagram comprises a group of equally spaced, equal-width
rectangular bars.
The height of a bar reads the magnitude of the data.
The lower end of the bar touches the base line such that the height of a bar starts from the
zero unit.
Bars of a bar diagram can be visually compared by their relative height and accordingly data
are comprehended quickly.
Data for this can be of frequency or non-frequency type.

(B).Multiple Bar Diagram - Used for comparing two or more sets of data.
For example, income and expenditure or import and export for different years, marks
obtained in different subjects in different classes, etc.

(C).Component Bar Diagram - Component bar diagrams are very useful in comparing the sizes of
different component parts. They are usually shaded or coloured suitably.

Pie Diagram
A pie diagram is also a component diagram, but unlike a bar diagram, here it is a circle whose
area is proportionally divided among the components. The circle is divided into as many parts
as there are components by drawing straight lines from the centre to the circumference.
Pie charts usually are not drawn with absolute values of a category. The values of each
category are first expressed as a percentage of the total value of all the categories. A circle in a
pie chart, irrespective of its radius, is thought of as having 100 equal parts of 3.6°
(360°/100) each. To find the angle that a component subtends at the centre of the
circle, its percentage figure is multiplied by 3.6°.
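The conversion from values to percentages to angles can be sketched in Python. The cost figures below are invented for illustration and the function name is hypothetical.

```python
def pie_angles(values):
    """Angle (in degrees) subtended at the centre by each component."""
    total = sum(values.values())
    angles = {}
    for name, value in values.items():
        percentage = value * 100 / total  # share of the total, in per cent
        angles[name] = percentage * 3.6   # each per cent is 3.6 degrees
    return angles


# Hypothetical components of cost (any unit).
cost_components = {"Raw material": 50, "Labour": 30, "Overheads": 20}
angles = pie_angles(cost_components)
for name, angle in angles.items():
    print(name, round(angle, 1))
```

The angles of all components together should add up to 360°, which is a quick check on the arithmetic.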

(II) Frequency Diagram
Data in the form of grouped frequency distributions are generally
represented by frequency diagrams like
(A).Histogram,
(B).Frequency Polygon
(C).Frequency Curve

(D).Ogive
(A).Histogram
A histogram is a two dimensional diagram.
If the class intervals are of equal width (which they generally are), the areas of the rectangles
are proportional to their respective frequencies.
Since the bars of a histogram are rectangles, a line parallel to the base line and of the same
magnitude is drawn at a vertical distance equal to the frequency of the class.
Since, for continuous variables, the lower class boundary of a class interval fuses with the
upper class boundary of the previous interval, whether the intervals are equal or unequal, the
rectangles are all adjacent and there is no open space between two consecutive rectangles.
If the classes are not continuous, they are first converted into continuous classes.
A histogram looks similar to a bar diagram, but there are more differences than similarities.
In a histogram no space is left between two rectangles, but in a bar diagram some space must
be left between consecutive bars.
(B).Frequency Polygon
Frequency polygon is an
alternative to histogram and
is also derived from histogram
itself.
The simplest method of
drawing a frequency polygon
is to join the midpoints of the
topside of the consecutive
rectangles of the histogram.
Broken lines or dots may join
the two ends with the base line.
Frequency polygon is the most
common method of presenting
grouped frequency distribution.
Both class boundaries and class marks
can be used along the X-axis, the
distances between two
consecutive class marks being
proportional/equal to the width
of the class intervals.
(C).Frequency Curve
The frequency curve is obtained by drawing a smooth freehand curve passing through the
points of the frequency polygon as closely as possible. It may not necessarily pass through all
the points of the frequency polygon.
(D).Ogive
An ogive is also called a cumulative frequency curve. As there are two types of cumulative
frequencies, the “less than” type and the “more than” type, there are accordingly two ogives
for any grouped frequency distribution.
For the “less than” ogive the cumulative frequencies are plotted against the respective upper
limits of the class intervals, whereas for the “more than” ogive the cumulative frequencies are
plotted against the respective lower limits of the class intervals.
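The points to be plotted for the two ogives can be worked out in Python. The class limits and frequencies in this sketch are invented for illustration.

```python
from itertools import accumulate

# Hypothetical grouped frequency distribution of marks.
class_limits = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
frequencies = [5, 12, 20, 9, 4]

# "Less than" cumulative frequencies: running total from the first class.
less_than_cf = list(accumulate(frequencies))
# "More than" cumulative frequencies: running total from the last class.
more_than_cf = list(accumulate(reversed(frequencies)))[::-1]

# Less than ogive: plot against upper limits. More than ogive: against lower limits.
less_than_points = [(upper, cf) for (_, upper), cf in zip(class_limits, less_than_cf)]
more_than_points = [(lower, cf) for (lower, _), cf in zip(class_limits, more_than_cf)]

print(less_than_points)
print(more_than_points)
```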

(III) Arithmetic Line Graph
An arithmetic line graph is also called a time series graph. In this graph, time is plotted along
the X-axis and the value of the variable along the Y-axis.
It helps in understanding the trend, periodicity, etc., in long term time series data.

Here you can see from Fig. 4.9 that for the period 1993-94 to 2013-14, the imports were more
than the exports all through the period.
You may notice the value of both exports and imports rising rapidly after 2001-02. Also the gap
between the two (imports and exports) has widened after 2001-02.

Answer the following questions, 1 to 10, choosing the correct answer.

Question 1.Bar diagram is a


(a) one-dimensional diagram
(b) two-dimensional diagram
(c) diagram with no dimension
(d) None of these

Answer:(a) Bar diagrams are one-dimensional diagrams. Though these are drawn on a
plane with two axes in the form of rectangular bars, the width is of no consequence and only the
length depicts the frequency.
Question 2.Data represented through a histogram can help in finding graphically the
(a) mean
(b) mode
(c) median
(d) All of these
Answer:(b) Histogram gives value of mode of the frequency distribution graphically through
the highest rectangle.

Question 3.Ogives can be helpful in locating graphically the
(a) mode
(b) mean
(c) median
(d) None of these
Answer:(c) Intersection point of the less than and more than ogives gives the median.
Question 4.Data represented through arithmetic line graph help in understanding
(a) long term trend
(b) cyclicity in data
(c) seasonality in data
(d) All of the above
Answer:(a) Arithmetic line graph helps in understanding the trend, periodicity, etc in a long
term time series data.
Question 5.Width of bars in a bar diagram need not be equal. (True/False)
Answer:False
Bar diagram comprises a group of equispaced and equiwidth rectangular bars for each class or
category of data.
Question 6.Width of rectangles in a histogram should essentially be equal. (True/False)
Answer:False
If the class intervals are of equal width, the area of the rectangles are proportional to their
respective frequencies and width of rectangles will be equal. However, sometimes it is
convenient or necessary to use varying width of class intervals and hence unequal width of
rectangles.
Question 7.Histogram can only be formed with continuous classification of data. (True/False)
Answer:True
A histogram is never drawn for a discrete variable/data. If the classes are not continuous they
are first converted into continuous classes.
Question 8.Histogram and column diagram are the same method of presentation of data.
(True/False)
Answer:False
Histogram is a two dimensional diagram drawn for continuous data and the rectangles do not
have spaces in between while column diagram is one dimensional with space in between
every column (bar).
Question 9.Mode of a frequency distribution can be known graphically with the help of
histogram. (True/False)
Answer:True
Histogram gives value of mode of the frequency distribution graphically through the highest
rectangle.
Question 10.Median of a frequency distribution cannot be known from the ogives.
(True/False)
Answer:False
Intersection-point of the less than and more than ogives gives the median.

Question 11.What kind of diagrams are more effective in representing the following?
(a) Monthly rainfall in a year
(b) Composition of the population of Delhi by religion
(c) Components of cost in a factory

Answer:(a) The monthly rainfall in a year can be best represented by a bar diagram as only one
variable, i.e., monthly rainfall, is to be presented diagrammatically. The months are plotted on
the X-axis and the rainfall of each month on the Y-axis.

(b) Composition of the population of Delhi by religion can be represented by a component bar
diagram. A component bar diagram shows the bar and its sub-divisions into two or more
components. Thus, the total population can be sub divided in terms of religion and presented
through a component bar diagram.

(c) Different components of cost in a factory can most effectively be depicted through a pie
chart. The circle represents the total cost and various components of costs are shown by
different portions of the circle drawn according to percentage of total cost each component
covers.
Question 12.Suppose you want to emphasise the increase in the share of urban non-workers
and lower level of urbanisation in India as shown in Example 4.2. How would you do it in the
tabular form?

Answer:Share of urban workers and non workers in India

Question 13.How does the procedure of drawing a histogram differ when class intervals are
unequal in comparison to equal class intervals in a frequency table?
Answer:A histogram is a set of rectangles with bases as the intervals between class boundaries
(along X-axis) and with areas proportional to the class frequency. If the class intervals are of
equal width, the areas of the rectangles are proportional to their respective frequencies.

However, sometimes it is convenient, or at times necessary, to use class intervals of varying
width. For graphical representation of such data, the height of each rectangle is the
quotient of the class frequency and the base, i.e., the width of the class interval. When intervals
are equal, all rectangles have the same base and the area can conveniently be represented by the
frequency of the interval.

But when the bases vary in width, the heights of the rectangles are adjusted to yield
comparable measurements by dividing the class frequency by the width of the class interval
instead of using the absolute frequency. This gives us the frequency density for the purpose of
comparison.
Thus, Frequency density (height of rectangle) = Class frequency / Width of the class interval
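The adjustment of heights for unequal class intervals can be sketched in Python. The classes and frequencies below are invented for illustration and the function name is hypothetical.

```python
def frequency_density(lower, upper, frequency):
    """Height of the rectangle = class frequency / width of the class interval."""
    return frequency / (upper - lower)


# Hypothetical classes of unequal width: (lower limit, upper limit, frequency).
classes = [(0, 10, 4), (10, 30, 12), (30, 35, 6)]

for lower, upper, frequency in classes:
    height = frequency_density(lower, upper, frequency)
    area = height * (upper - lower)  # the area gives back the class frequency
    print(f"{lower}-{upper}: height {height:.2f}, area {area:.1f}")
```

The class 10–30 has the largest frequency but not the tallest rectangle, because its frequency is spread over a wider base.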
Question 14.The Indian Sugar Mills Association reported that, ‘sugar production during the
first fortnight of December, 2001 was about 3,87,000 tonnes, as against 3,78,000 tonnes
during the same fortnight last year (2000). The off-take of sugar from factories during the first
fortnight of December, 2001 was 2,83,000 tonnes for internal consumption and 41,000 tonnes
for exports as against 1,54,000 tonnes for internal consumption and nil for exports during the
same fortnight last season.’

(i) Present the data in tabular form.


(ii) Suppose you were to present these data in diagrammatic form which of the diagrams
would you use and why?
(iii) Present these data diagrammatically.

Answer:
(i) Data in tabular form.
Sugar Production in India

(ii) The data can effectively be presented diagrammatically using the multiple bar diagram. This
is because multiple bar diagrams are used for comparing two or more sets of data for different
years or classes, etc.

Question 15.
The following table shows the estimated sectoral real growth rates (percentage change over
the previous year) in GDP at factor cost.

Represent the data as multiple time-series graphs.

CHAPTER-5 MEASURES OF CENTRAL TENDENCY

There are several statistical measures of central tendency or “averages”. The three most
commonly used averages are:
1.Arithmetic Mean
2.Median
3.Mode

ARITHMETIC MEAN
Suppose the monthly income (in Rs) of six families is given as: 1600, 1500, 1400, 1525, 1625,
1630.
The mean family income is obtained by adding up the incomes and dividing by the number of
families:
$$\frac{1600 + 1500 + 1400 + 1525 + 1625 + 1630}{6} = \frac{9280}{6} \approx \text{Rs } 1{,}547$$
It implies that on an average, a family earns Rs 1,547.
Arithmetic mean is the most commonly used measure of central tendency. It is defined as the
sum of the values of all observations divided by the number of observations and is usually
denoted by x̄.
In general, if there are N observations X1, X2, X3, ..., XN, then the Arithmetic Mean is
given by
$$\bar{x} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N}$$
The right hand side can be written as
$$\frac{\sum_{i=1}^{N} X_i}{N}$$
Here, i is an index that takes the values 1, 2, 3, ..., N.
Thus
$$\bar{x} = \frac{\sum X}{N}$$
where ΣX = sum of all observations
and N = total number of observations.


How Arithmetic Mean is Calculated
1. Arithmetic Mean for Ungrouped Data.
2. Arithmetic Mean for Grouped Data.
1.Arithmetic Mean for Series of Ungrouped Data
a.Direct Method

Arithmetic mean by direct method is the sum of all observations in a series divided by the total
number of observations.
Example 1 Calculate Arithmetic Mean from the data showing marks of students in a class in an
economics test: 40, 50, 55, 78, 58.

$$\bar{x} = \frac{\sum X}{N} = \frac{40 + 50 + 55 + 78 + 58}{5} = 56.2$$

The average mark of students in the economics test is 56.2.
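The direct method translates into one line of arithmetic. A minimal Python sketch, with a function name chosen only for this example, applied to the marks of Example 1:

```python
def arithmetic_mean(observations):
    """Direct method: sum of all observations divided by their number."""
    return sum(observations) / len(observations)


marks = [40, 50, 55, 78, 58]  # marks from Example 1
mean_marks = arithmetic_mean(marks)
print(round(mean_marks, 1))  # 281 / 5 = 56.2
```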


b.Assumed Mean Method
If the number of observations in the data is large or the figures are large, it is difficult to
calculate the arithmetic mean by the direct method.
The calculation can be made easier by using the assumed mean method.
Here you assume a particular figure in the data as the arithmetic mean on the basis of logic.
Then you take the deviation of each observation from the assumed mean. You
can then take the summation of these deviations and divide it by the number of observations
in the data.
The actual arithmetic mean is estimated by taking the sum of the assumed mean and the ratio
of the sum of deviations to the number of observations.
$$\bar{x} = A + \frac{\sum d}{N}$$
A = assumed mean
X = individual observations
N = total number of observations
d = deviation of an individual observation from the assumed mean, i.e. d = X – A
Example 2 The following data shows the weekly income of 10 families.
Family A B C D E F G H I J
Weekly Income (in Rs) 850 700 100 750 5000 80 420 2500 400 360
Compute mean family income.
Computation of Arithmetic Mean by Assumed Mean Method

Families    Income (X)    d = X – 850    d' = (X – 850)/10
A           850           0              0
B           700           -150           -15
C           100           -750           -75
D           750           -100           -10
E           5000          4150           415
F           80            -770           -77
G           420           -430           -43
H           2500          1650           165
I           400           -450           -45
J           360           -490           -49
Total       11160         2660           266

$$\bar{x} = A + \frac{\sum d}{N} = 850 + \frac{2660}{10} = \text{Rs } 1{,}116$$
Thus, the average weekly income of a family is Rs 1,116.
You can check this by using the direct method: 11160/10 = Rs 1,116.
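The assumed mean calculation of Example 2 can be verified with a short Python sketch. The function name is hypothetical; the incomes and the assumed mean of 850 are those of the example.

```python
def assumed_mean_method(observations, assumed_mean):
    """Mean = A + (sum of deviations from A) / N."""
    deviations = [x - assumed_mean for x in observations]
    return assumed_mean + sum(deviations) / len(observations)


weekly_income = [850, 700, 100, 750, 5000, 80, 420, 2500, 400, 360]
mean_income = assumed_mean_method(weekly_income, 850)
direct_mean = sum(weekly_income) / len(weekly_income)
print(mean_income, direct_mean)  # both 1116.0
```

Any figure can serve as the assumed mean; the result is the same, only the size of the deviations changes.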
Step Deviation Method
The calculations can be further simplified by dividing all the deviations taken from assumed
mean by the common factor ‘c’.
The objective is to avoid large numerical figures, i.e., if d = X – A is very large, then find d'.
This can be done as follows:
$$d' = \frac{d}{c} = \frac{X - A}{c}$$

The formula is given below:

$$\bar{x} = A + \frac{\sum d'}{N} \times c$$

where d' = (X − A)/c,
c = common factor,
N = number of observations,
A = assumed mean.
Thus, you can calculate the arithmetic mean in Example 2 by the step deviation method:

$$\bar{x} = 850 + \frac{266}{10} \times 10 = \text{Rs } 1{,}116$$
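A minimal sketch of the step deviation method for the same income data, with A = 850 and c = 10 as in the table. The function name is illustrative.

```python
# Weekly incomes (in Rs) from Example 2
incomes = [850, 700, 100, 750, 5000, 80, 420, 2500, 400, 360]


def step_deviation_mean(values, a, c):
    """Mean = A + (sum of d' / N) * c, where d' = (X - A) / c."""
    step_deviations = [(x - a) / c for x in values]
    return a + c * sum(step_deviations) / len(values)


step_mean = step_deviation_mean(incomes, a=850, c=10)
print(step_mean)  # 1116.0
```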

2. Calculation of arithmetic mean for Grouped data


(i) Discrete Series
(a) Direct Method
In case of discrete series, the frequency against each observation is multiplied by the value of
the observation.
The values so obtained are summed up and divided by the total number of frequencies.

$$\bar{x} = \frac{\sum fX}{\sum f}$$

where ∑fX = sum of the products of variables and frequencies,
∑f = sum of frequencies.
Example: Plots in a housing colony come in only three sizes: 100 sq. metre, 200 sq. metre
and 300 sq. metre, and the number of plots are respectively 200, 50 and 10.
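For the housing colony plots, the mean plot size by the direct method can be worked out with a short sketch. The function name is illustrative.

```python
# Plot sizes (sq. metre) and the number of plots of each size
sizes = [100, 200, 300]
plots = [200, 50, 10]


def frequency_mean(values, freqs):
    """Mean of a discrete series = sum of f*X divided by sum of f."""
    total_fx = sum(f * x for x, f in zip(values, freqs))
    return total_fx / sum(freqs)


mean_plot_size = frequency_mean(sizes, plots)
print(round(mean_plot_size, 2))  # 126.92 sq. metre (33000 / 260)
```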

Assumed Mean Method - As in case of individual series, the calculations can be simplified by
using the assumed mean method, as described earlier, with a simple modification. Since frequency
(f) of each item is given here, we multiply each deviation (d) by the frequency to get fd.
Then we get Σfd. The next step is to get the total of all frequencies, i.e. Σf. Then find out Σfd/Σf.
Finally, the arithmetic mean is calculated using the assumed mean method by

$$\bar{x} = A + \frac{\sum fd}{\sum f}$$

Step Deviation Method - In this case, the deviations are divided by the common factor ‘c’,
which simplifies the calculation. Here we estimate

$$d' = \frac{d}{c} = \frac{X - A}{c}$$

in order to reduce the size of numerical figures for easier calculation.
Then get fd' and Σfd'. The formula for arithmetic mean using step deviation method is given
as

$$\bar{x} = A + \frac{\sum fd'}{\sum f} \times c$$
Continuous Series - Here, class intervals are given. The process of calculating arithmetic mean
in case of continuous series is the same as that of a discrete series.
The only difference is that the mid-points of the various class intervals are taken. We already
know that class intervals may be exclusive or inclusive or of unequal size.
Example of exclusive class interval is, say, 0–10, 10–20 and so on.
Example of inclusive class interval is, say, 0–9, 10–19 and so on.
Example of unequal class interval is, say, 0–20, 20–50 and so on.
In all these cases, calculation of arithmetic mean is done in a similar way.
Example 4 Calculate average marks of the following students using (a) Direct method (b) Step
deviation method.
Direct Method
Marks            0–10   10–20   20–30   30–40   40–50   50–60   60–70
No. of Students  5      12      15      25      8       3       2

Marks (x)   No of Students (f)   Mid-Value (m)   f × m   d' = (m − 35)/10   fd'
0-10        5                    5               25      -3                 -15
10-20       12                   15              180     -2                 -24
20-30       15                   25              375     -1                 -15
30-40       25                   35              875     0                  0
40-50       8                    45              360     1                  8
50-60       3                    55              165     2                  6
60-70       2                    65              130     3                  6
Total       70                                   2110                       -34

Steps:
1. Obtain mid values for each class denoted by m.
2. Obtain Σfm and apply the direct method formula:

$$\bar{x} = \frac{\sum fm}{\sum f} = \frac{2110}{70} = 30.14 \text{ marks}$$

Step deviation method

1. Obtain d' = (m − A)/c
2. Take A = 35 and c = 10 (c = common factor)

$$\bar{x} = A + \frac{\sum fd'}{\sum f} \times c = 35 + \frac{(-34)}{70} \times 10 = 30.14 \text{ marks}$$
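Both calculations in Example 4 can be reproduced with a short sketch. The variable names are illustrative. The classes are taken as exclusive intervals, so the mid-value is the average of the two class limits.

```python
# Class intervals and frequencies from Example 4
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freqs = [5, 12, 15, 25, 8, 3, 2]

# Mid-value of each class
mids = [(lower + upper) / 2 for lower, upper in classes]

# Direct method: sum of f*m divided by sum of f
direct_mean = sum(f * m for f, m in zip(freqs, mids)) / sum(freqs)

# Step deviation method with A = 35 and c = 10
A, c = 35, 10
sum_fd = sum(f * (m - A) / c for f, m in zip(freqs, mids))  # -34.0
step_mean = A + c * sum_fd / sum(freqs)

print(round(direct_mean, 2), round(step_mean, 2))  # 30.14 30.14
```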

Two interesting properties of A.M.


(i) The sum of deviations of items about the arithmetic mean is always equal to zero.
Symbolically, Σ(X − X̄) = 0.
(ii) The arithmetic mean is affected by extreme values. Any large value, on either end, can
push it up or down.

Weighted Arithmetic Mean


Sometimes it is important to assign weights to various items according to their importance
when you calculate the arithmetic mean.
For example, there are two commodities, mangoes and potatoes. You are interested in finding
the average price of mangoes (P1) and potatoes (P2). The arithmetic mean will be

$$\frac{P_1 + P_2}{2}$$

However, you might want to give more importance to the rise in price of potatoes (P2).
To do this, you may use as ‘weights’ the share of mangoes in the budget of the consumer
(W1) and the share of potatoes in the budget (W2).
Now the arithmetic mean weighted by the shares in the budget would be

$$\frac{W_1 P_1 + W_2 P_2}{W_1 + W_2}$$

In general the weighted arithmetic mean is given by

$$\frac{W_1 X_1 + W_2 X_2 + \dots + W_n X_n}{W_1 + W_2 + \dots + W_n} = \frac{\sum WX}{\sum W}$$
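To see the effect of weights, here is a small sketch. The prices (Rs 80 and Rs 20) and the budget shares (20 and 80) are hypothetical figures chosen only for illustration; they do not come from the notes.

```python
# Hypothetical prices (Rs per kg) of mangoes and potatoes
prices = [80, 20]
# Hypothetical budget shares (per cent) used as weights
weights = [20, 80]


def weighted_mean(values, wts):
    """Weighted arithmetic mean = sum of W*X divided by sum of W."""
    return sum(w * x for x, w in zip(values, wts)) / sum(wts)


simple_average = sum(prices) / len(prices)
weighted_average = weighted_mean(prices, weights)
print(simple_average)    # 50.0
print(weighted_average)  # 32.0, pulled towards the heavily weighted potato price
```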

Median
Median is that positional value of the variable which divides the distribution into two equal
parts, one part comprises all values greater than or equal to the median value and the other
comprises all values less than or equal to it.
The Median is the “middle” element when the data set is arranged in order of the magnitude.
Since the median is determined by the position of different values, it remains unaffected if,
say, the size of the largest value increases.
Computation of median
The median can be easily computed by sorting the data from smallest to largest and finding
out the middle value.
Example 5 Suppose we have the following observation in a data set:
5, 7, 6, 1, 8, 10, 12, 4, and 3. Arranging the data, in ascending order you have: 1, 3, 4, 5, 6, 7, 8,
10, 12.
The “middle score” is 6, so the median is 6.
Half of the scores are larger than 6 and half of the scores are smaller. If there is an even
number of observations in the data, there will be two observations which fall in the middle.
The median in this case is computed as the arithmetic mean of the two middle values.
Example 6 The following data provides marks of 20 students. You are required to calculate the
median marks.
25, 72, 28, 65, 29, 60, 30, 54, 32, 53, 33, 52, 35, 51, 42, 48, 45, 47, 46, 33.
Arranging the data in an ascending order, you get
25, 28, 29, 30, 32, 33, 33, 35, 42, 45, 46, 47, 48, 51, 52, 53, 54, 60, 65, 72.
You can see that there are two observations in the middle, 45 and 46.
The median can be obtained by taking the mean of the two observations:

$$\text{Median} = \frac{45 + 46}{2} = 45.5 \text{ marks}$$
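The two cases (odd and even number of observations) can be sketched together. The function name `median` is illustrative.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if N is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


example_5 = [5, 7, 6, 1, 8, 10, 12, 4, 3]
example_6 = [25, 72, 28, 65, 29, 60, 30, 54, 32, 53,
             33, 52, 35, 51, 42, 48, 45, 47, 46, 33]

print(median(example_5))  # 6
print(median(example_6))  # 45.5
```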

In order to calculate median it is important to know the position of the median, i.e. the item
at which the median lies.
The position of the median can be calculated by the following formula:

$$\text{Position of median} = \frac{(N + 1)}{2}\text{th item}$$

where N = number of items.
Median is computed by the formula:

$$\text{Median} = \text{size of } \frac{(N + 1)}{2}\text{th item}$$

Discrete Series
In case of discrete series the position of median, i.e. the (N+1)/2th item, can be located
through cumulative frequency.
The corresponding value at this position is the value of median.
Computation of Median for Discrete Series
Example 7 The frequency distribution of the number of persons and their respective incomes
(in Rs) is given below. Calculate the median income.
Income (in Rs):     10   20   30   40
Number of persons:  2    4    10   4

Income (Rs)   No of Persons (f)   Cumulative frequency (c.f.)
10            2                   2
20            4                   6
30            10                  16
40            4                   20

The median is located in the (N+1)/2 = (20+1)/2 = 10.5th observation. This can be easily
located through cumulative frequency. The 10.5th observation lies in the c.f. of 16. The
income corresponding to this is Rs 30, so the median income is Rs 30.
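Locating the median through cumulative frequency can be sketched as below. The helper names are illustrative. When the position is fractional (such as 10.5), this sketch averages the items just below and just above that position, which for Example 7 are both Rs 30.

```python
import math


def item_at(values, freqs, k):
    """Value of the k-th item (1-based), located through cumulative frequency."""
    cumulative = 0
    for value, f in zip(values, freqs):
        cumulative += f
        if cumulative >= k:
            return value
    raise ValueError("k exceeds the number of observations")


def discrete_median(values, freqs):
    """Median of a discrete series whose values are in ascending order."""
    position = (sum(freqs) + 1) / 2  # (N + 1) / 2
    lower = item_at(values, freqs, math.floor(position))
    upper = item_at(values, freqs, math.ceil(position))
    return (lower + upper) / 2


incomes = [10, 20, 30, 40]
persons = [2, 4, 10, 4]
print(discrete_median(incomes, persons))  # 30.0
```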

Continuous Series
In case of continuous series you have to locate the median class where the (N/2)th item [not
the (N+1)/2th item] lies. The median can then be obtained as follows:

$$\text{Median} = L + \frac{\frac{N}{2} - c.f.}{f} \times h$$

where L = lower limit of the median class,
c.f. = cumulative frequency of the class preceding the median class,
f = frequency of the median class,
h = magnitude of the median class interval.
No adjustment is required if frequency is of unequal size or magnitude.

Example 8 Following data relates to daily wages of persons working in a factory. Compute the
median daily wage.
Daily Wages (in Rs)  55-60  50-55  45-50  40-45  35-40  30-35  25-30  20-25
No of Workers        7      13     15     20     30     33     28     14

Daily wages (in Rs)   No of Workers (f)   Cumulative Frequency (c.f.)
20-25                 14                  14
25-30                 28                  42
30-35                 33                  75
35-40                 30                  105
40-45                 20                  125
45-50                 15                  140
50-55                 13                  153
55-60                 7                   160

Here N = 160, so N/2 = 80. The 80th item lies in the class 35-40, so L = 35, c.f. = 75, f = 30
and h = 5.

$$\text{Median} = L + \frac{\frac{N}{2} - c.f.}{f} \times h = 35 + \frac{80 - 75}{30} \times (40 - 35) = \text{Rs } 35.83$$

Thus, the median daily wage is Rs 35.83. This means that 50% of the workers are getting less
than or equal to Rs 35.83 and 50% of the workers are getting more than or equal to this wage.
You should remember that median, as a measure of central tendency, is not sensitive to all the
values in the series. It concentrates on the values of the central items of the data.
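The median formula for a continuous series can be sketched as follows, using the wage data of Example 8 arranged in ascending order. The function name is illustrative.

```python
def grouped_median(classes, freqs):
    """Median = L + ((N/2 - c.f.) / f) * h for classes in ascending order."""
    half = sum(freqs) / 2  # N / 2
    cumulative = 0         # c.f. of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= half:
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("empty distribution")


wage_classes = [(20, 25), (25, 30), (30, 35), (35, 40),
                (40, 45), (45, 50), (50, 55), (55, 60)]
workers = [14, 28, 33, 30, 20, 15, 13, 7]

median_wage = grouped_median(wage_classes, workers)
print(round(median_wage, 2))  # 35.83
```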
Quartiles
Quartiles are the measures which divide the data into four equal parts, each containing an
equal number of observations. There are three quartiles.
The first Quartile (denoted by Q1 ) or lower quartile has 25% of the items of the distribution
below it and 75% of the items are greater than it.
The second Quartile (denoted by Q2 ) or median has 50% of items below it and 50% of the
observations above it.
The third Quartile (denoted by Q3 ) or upper Quartile has 75% of the items of the distribution
below it and 25% of the items above it. Thus, Q1 and Q3 denote the two limits within which
central 50% of the data lies.
Calculation of Quartiles The method for locating the quartiles is the same as that of the median
in case of individual and discrete series. The values of Q1 and Q3 of an ordered series can be
obtained by the following formulas, where N is the number of observations.

$$Q_1 = \text{size of } \frac{(N + 1)}{4}\text{th item} \qquad Q_3 = \text{size of } \frac{3(N + 1)}{4}\text{th item}$$

Example: Calculate the value of lower quartile from the data of the marks obtained by ten
students in an examination: 22, 26, 14, 30, 18, 11, 35, 41, 12, 32.
Arranging the data in an ascending order: 11, 12, 14, 18, 22, 26, 30, 32, 35, 41.

$$Q_1 = \text{size of } \frac{(N + 1)}{4}\text{th item} = \text{size of } \frac{(10 + 1)}{4}\text{th item} = \text{size of 2.75th item}$$

= 2nd item + 0.75 (3rd item − 2nd item)
= 12 + 0.75 (14 − 12) = 13.5 marks
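The interpolation used for the 2.75th item can be sketched for both Q1 and Q3. The helper names are illustrative, and the Q3 value is computed here as an extension of the worked example; it is not given in the notes.

```python
def item_value(ordered, position):
    """Value at a fractional 1-based position, by linear interpolation."""
    whole = int(position)
    fraction = position - whole
    if whole < 1:
        return ordered[0]
    if whole >= len(ordered):
        return ordered[-1]
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])


def quartiles(values):
    """Q1 = size of (N+1)/4 th item, Q3 = size of 3(N+1)/4 th item."""
    ordered = sorted(values)
    n = len(ordered)
    return item_value(ordered, (n + 1) / 4), item_value(ordered, 3 * (n + 1) / 4)


marks = [22, 26, 14, 30, 18, 11, 35, 41, 12, 32]
q1, q3 = quartiles(marks)
print(q1)  # 13.5
print(q3)  # 32.75
```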
MODE
Sometimes, you may be interested in knowing the most typical value of a series or the value
around which maximum concentration of items occurs.
For example, a manufacturer would like to know the size of shoes that has maximum demand
or style of the shirt that is more frequently demanded. Here, Mode is the most appropriate
measure. The word mode has been derived from the French word “la Mode” which signifies
the most fashionable values of a distribution, because it is repeated the highest number of
times in the series. Mode is the most frequently observed data value. It is denoted by Mo.

Computation of Mode
Discrete Series
Consider the data set 1, 2, 3, 4, 4, 5. The mode for this data is 4 because 4 occurs most
frequently (twice) in the data.

Example 10 Look at the following discrete series:
Variable 10 20 30 40 50
Frequency 2 8 20 10 5
Here, as you can see the maximum frequency is 20, the value of mode is 30.
In this case, as there is a unique value of mode, the data is unimodal.
But, the mode is not necessarily unique, unlike arithmetic mean and median. You can have
data with two modes (bi-modal) or more than two modes (multi-modal).
It is also possible that there is no mode, if no value appears more frequently than any
other value in the distribution. For example, in a series 1, 1, 2, 2, 3, 3, 4, 4, there is no mode.
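The unimodal, multi-modal and no-mode cases can be sketched together. The function names are illustrative, and the bi-modal series in the last line is a made-up example.

```python
from collections import Counter


def modes_from_frequencies(freq):
    """Return all values with the highest frequency, or [] when every
    value occurs equally often (no mode)."""
    highest = max(freq.values())
    if len(freq) > 1 and all(f == highest for f in freq.values()):
        return []
    return sorted(value for value, f in freq.items() if f == highest)


def modes(values):
    return modes_from_frequencies(Counter(values))


print(modes([1, 2, 3, 4, 4, 5]))        # [4]
print(modes_from_frequencies({10: 2, 20: 8, 30: 20, 40: 10, 50: 5}))  # [30]
print(modes([1, 1, 2, 2, 3, 3, 4, 4]))  # [] (no mode)
print(modes([1, 1, 2, 3, 3]))           # [1, 3] (bi-modal)
```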
Continuous Series
In case of continuous frequency distribution, modal class is the class with largest frequency.
Mode can be calculated by using the formula:
$$M_o = L + \frac{D_1}{D_1 + D_2} \times h$$

where L = lower limit of the modal class,
D1 = difference between the frequency of the modal class and the frequency of the class
preceding the modal class (ignoring signs),
D2 = difference between the frequency of the modal class and the frequency of the class
succeeding the modal class (ignoring signs),
h = class interval of the distribution.

You may note that in case of continuous series, class intervals should be equal and the series
should be exclusive to calculate the mode.
If mid points are given, class intervals are to be obtained.

Example 11 Calculate the value of modal worker family’s monthly income from the following data:

Less than cumulative frequency distribution of income per month (in ’000 Rs)

Income per month (in ’000 Rs)   Cumulative Frequency
Less than 50                    97
Less than 45                    95
Less than 40                    90
Less than 35                    80
Less than 30                    60
Less than 25                    30
Less than 20                    12
Less than 15                    4

As you can see this is a case of cumulative frequency distribution. In order to calculate mode,
you will have to convert it into an exclusive series.
In this example, the series is in the descending order. This table should be converted into an
ordinary frequency table to determine the modal class.

The value of the mode lies in 25–30 class interval.


By inspection also, it can be seen that this is a modal class.
Now L = 25, D1 = (30 – 18) = 12, D2 = (30 – 20) = 10, h = 5.
Using the formula, you can obtain the value of the mode, Mo (in ’000 Rs), as:

$$M_o = L + \frac{D_1}{D_1 + D_2} \times h = 25 + \frac{12}{12 + 10} \times 5 = 27.273$$

Thus the modal worker family’s monthly income is Rs 27,273.

Question /Answer
Question 1.
Which average would be suitable in the following cases?
(i) Average size of readymade garments.
(ii) Average intelligence of students in a class.
(iii) Average production in a factory per shift.
(iv) Average wages in an industrial concern.
(v) When the sum of absolute deviations from average is least.
(vi) When quantities of the variable are in ratios.
(vii) In case of open-ended frequency distribution.

Answer:
(i) Mode Average size of any readymade garments should be the size for which demand is the
maximum. Hence, the modal value, which represents the value with the highest frequency,
should be taken as the average size to be produced.

(ii) Median It is the value that divides the series into two equal parts. Therefore, Median will
be the best measure for calculating the average intelligence of students in a class as it will give
the average intelligence such that there are equal number of students above and below this
average. It will not be affected by extreme values.

(iii) Arithmetic Mean The average production in a factory per shift is best calculated by
Arithmetic Mean as it will capture all types of fluctuations in production during the shifts.

(iv) Arithmetic Mean Arithmetic Mean will be the most suitable measure. It is calculated by
dividing the sum of wages of all the workers by the total number of workers in the industrial
concern. It gives a fair idea of average wage bill taking into account all the workers.

(v) Median The sum of absolute deviations (deviations taken ignoring signs) is least when the
deviations are measured from the median. The property that the algebraic sum of deviations
is zero belongs to the arithmetic mean and is a different property.

(vi) Median Median will be the most suitable measure in case the variables are in ratios as it is
least affected by the extreme values.

(vii) Median Median is the most suitable measure as it can be easily computed even in case of
open ended frequency distribution and will not get affected by extreme values.

Question 2.
Indicate the most appropriate alternative from the multiple choices provided against each
question.
(i) The most suitable average for qualitative measurement is
(a) Arithmetic mean
(b) Median
(c) Mode
(d) Geometric mean
(e) None of these
Answer:
(b) Median is the most suitable average for qualitative measurement because Median divides
a series in two equal parts, thus representing the average qualitative measure without being
affected by extreme values.

(ii) Which average is affected most by the presence of extreme items?


(a) Median
(b) Mode
(c) Arithmetic Mean
(d) Geometric Mean
(e) Harmonic Mean
Answer:
(c) Arithmetic mean is defined as the sum of the values of all observations divided by the
number of observations, and therefore it is affected the most by extreme values.

(iii) The algebraic sum of deviation of a set of n values from AM is


(a) n
(b) 0
(c) 1
(d) None of these
Answer:
(b) This is one of the mathematical properties of arithmetic mean that the algebraic sum of
deviation of a set of n values from AM is zero.

Question 3.
Comment whether the following statements are true or false.
(i) The sum of deviation of items from median is zero.
(ii) An average alone is not enough to compare series.
(iii) Arithmetic mean is a positional value.
(iv) Upper quartile is the lowest value of top 25% of items.
(v) Median is unduly affected by extreme observations.
Answer:
(i) False
This mathematical property applies to the arithmetic mean and not to median.
(ii) True
Average is not enough to compare the series as it does not explain the extent of deviation of
different items from the central tendency and the difference in the frequency of values. These
are measured by measures of dispersion and kurtosis.
(iii) False
Median is a positional value.
(iv) True
The upper quartile also called the third quartile, has 75 % of the items below it and 25 % of
items above it.
(v) False
Arithmetic mean is unduly affected by extreme observations.

Question 4.
If the arithmetic mean of the data given below is 28, find (a) the missing frequency and (b) the
median of the series

Answer:
(a) Let the missing frequency be f1.
Arithmetic Mean = 28

or 2240 − 2100 = 35f1 − 28f1
or 140 = 7f1
f1 = 20
Hence, the missing frequency is 20.
(b)

So, the median class = size of (N/2)th item = 50th item.
The 50th item lies in the 57th cumulative frequency and the corresponding class interval is 20-30.

Question 5.
The following table gives the daily income of ten workers in a factory. Find the arithmetic
mean.

Answer:

N = 10
$$\bar{X} = \frac{\sum X}{N} = \frac{2400}{10} = 240$$
Arithmetic Mean = ₹ 240
Question 6.
Following information pertains to the daily income of 150 families. Calculate the arithmetic
mean.

Answer:

Question 7.The size of land holdings of 380 families in a village is given below. Find the median
size of land holdings.

Answer:

So, the median class = size of (N/2)th item = 190th item

The 190th item lies just beyond the cumulative frequency of 129, so the corresponding class
interval (the median class) is 200-300.

Median size of land holdings = 241.22 acres


Question 8.The following series relates to the daily income of workers employed in a firm.
Compute (a) highest income of lowest 50% workers, (b) minimum income earned by the top
25% workers and (c) maximum income earned by lowest 25% workers.

Answer:

(a) Highest income of lowest 50% workers will be given by the median. Σf = N = 65
Median class = size of (N/2)th item = size of (65/2)th item = 32.5th item
The 32.5th item lies in the 50th cumulative frequency and the corresponding class interval is
24.5 – 29.5.

(b) Minimum income earned by the top 25% workers will be given by the upper quartile Q3.
Class interval of Q3 = 3(N/4)th item
= 3(65/4)th item
= 3 × 16.25th item
= 48.75th item
The 48.75th item lies in the 50th cumulative frequency and the corresponding class interval is
24.5 – 29.5.

(c) Maximum income earned by the lowest 25% workers will be given by the lower quartile Q1.
Class interval of Q1 = (N/4)th item
= (65/4)th item = 16.25th item
The 16.25th item lies in the 30th cumulative frequency and the corresponding class interval is
19.5 – 24.5.

Question 9.
The following table gives production yield in kg per hectare of wheat of 150 farms in a village.
Calculate the mean, median and mode production yield.

Answer:
(i) Mean

(ii) Median

(iii) Mode
Grouping Table

Analysis Table

Chapter 6 Correlation

Correlation is a statistical technique that measures the quantitative relationship between
different variables, such as between price and demand.
According to Croxton and Cowden, “When the relationship is of a quantitative nature, the
appropriate statistical tool for discovering and measuring the relationship and expressing it in
a brief formula is known as correlation.”

Types of Correlation
Correlation is commonly classified into negative and positive correlation.

• Positive Correlation When two variables move in the same direction, such a relation is
called positive correlation, e.g., the relationship between price and supply.
• Negative Correlation When two variables change in opposite directions, it is called
negative correlation, e.g., the relationship between price and demand.

Degree of Correlation
Degree of correlation refers to the coefficient of correlation

(ii) Absence of Correlation


(iii) Limited Degree of correlation
The degree of correlation between 0 and 1 may be rated as

• High (between 0.75 and 1)
• Moderate (between 0.25 and 0.75)
• Low (between 0 and 0.25)

Methods of Estimating Correlation
(i) Scatter Diagram A scatter diagram offers a graphic expression of the direction and
degree of correlation.

Karl Pearson’s Coefficient of Correlation


This is also known as product moment correlation and simple correlation coefficient.
Karl Pearson has given a quantitative method of calculating correlation. Karl Pearson’s
coefficient of correlation is generally written as r.
Formula According to Karl Pearson’s method, the coefficient of correlation is measured as

$$r = \frac{\sum xy}{N \sigma_x \sigma_y}$$

Where,
r = Coefficient of correlation;
x = X − X̄ (deviation of each X value from the mean of X)
y = Y − Ȳ (deviation of each Y value from the mean of Y)
σx = Standard deviation of x series
σy = Standard deviation of y series
N = Number of observations
If there is no need to calculate the standard deviations of x and y, r can be obtained
directly using the following formula

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$$

Here, x = (X − X̄), y = (Y − Ȳ)
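The second formula (deviations taken from the actual means) can be sketched as follows. The function name and both data sets are hypothetical and chosen only for illustration. The second data set follows Y = X², so the relation is exact but not linear, and r comes out as zero.

```python
import math


def pearson_r(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x and y taken as
    deviations from the respective means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


# Hypothetical data: Y is exactly twice X, so r is +1
r_linear = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
# Hypothetical data: Y = X squared, a non-linear relation with r = 0
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
print(r_linear, r_curved)
```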

Short-cut Method
This method is used when the mean value is not a whole number but a fraction. In this method,
deviations are calculated from an assumed mean in both the series.

Coefficient of correlation is calculated using the following formula

$$r = \frac{\sum d_x d_y - \frac{(\sum d_x)(\sum d_y)}{N}}{\sqrt{\sum d_x^2 - \frac{(\sum d_x)^2}{N}} \; \sqrt{\sum d_y^2 - \frac{(\sum d_y)^2}{N}}}$$

Here, dx = deviation of x series from the assumed mean = (x – A)
dy = deviation of y series from the assumed mean = (y – A)
Σdxdy = sum of the products of dx and dy
Σdx2 = sum of squares of dx
Σdy2 = sum of squares of dy
Σdx = sum of deviations of x-series
Σdy = sum of deviations of y-series
N = Total number of items
Step Deviation Method
Coefficient of correlation is calculated using the following formula

Spearman’s Rank Correlation Coefficient


In 1904, Charles Edward Spearman developed a formula to calculate the coefficient of
correlation of qualitative variables. It is popularly known as Spearman’s rank difference
formula or method.

Coefficient of Rank Correlation when Ranks are Equal formula

Here, m = number of items of equal ranks.
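As a hedged illustration, the standard rank difference formula when no ranks are tied is r = 1 − 6ΣD² / (N³ − N), where D is the difference between the two ranks of an item. The sketch below covers only this no-ties case; the correction for equal ranks is not implemented. The function names and the scores are hypothetical.

```python
def ranks(values):
    """Rank 1 goes to the largest value. Assumes no two values are equal."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_rank(xs, ys):
    """r = 1 - 6 * sum(D^2) / (N^3 - N), valid when there are no tied ranks."""
    n = len(xs)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


# Hypothetical scores given by two judges to five contestants
judge_1 = [10, 20, 30, 40, 50]
judge_2 = [12, 25, 28, 60, 45]

# Ranks are [5, 4, 3, 2, 1] and [5, 4, 3, 1, 2], so sum of D^2 = 2
rank_r = spearman_rank(judge_1, judge_2)
print(round(rank_r, 2))  # 0.9
```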

Importance or Significance of Correlation

• The study of correlation shows the direction and degree of relationship between the
variables.
• Correlation coefficient sometimes suggests cause and effect relationship.
• Correlation analysis facilitates business decisions because the trend path of one variable
may suggest the expected changes in the other.
• Correlation analysis also helps policy formulation.

Question 1.The unit of correlation coefficient between height in feet and weight in kgs is
(a) kg/feet
(b) percentage
(c) non-existent

Answer:(c) Correlation coefficient (r) has no unit. It is a pure number. It means units of
measurement are not part of r.

Question 2.The range of simple correlation coefficient is


(a) 0 to infinity
(b) minus one to plus one
(c) minus infinity to infinity
Answer:(b) The value of the correlation coefficient lies between minus one and plus one, -1 ≤ r
≤ 1. If the value of r is outside this range it indicates error in calculation.

Question 3.If rXY is positive the relation between X and Y is of the type
(a) when Y increases X increases
(b) when Y decreases X increases
(c) when Y increases X does not change
Answer:(a) If r is positive the two variables move in the same direction. e.g., when the price of
coffee rises, the demand for tea also rises as coffee is a substitute of tea. Therefore, the r
between price of coffee and demand for tea will be positive.

Question 4.If rXY = 0, the variable X and Y are


(a) linearly related
(b) not linearly related
(c) independent
Answer:(b) If rXY = 0, it means the two variables are uncorrelated and there is no linear relation
between them. However, other types of relation may be there and they may not be
independent.

Question 5.Of the following three measures which can measure any type of relationship?
(a) Karl Pearson’s coefficient of correlation
(b) Spearman’s rank correlation
(c) Scatter diagram
Answer:(c) The scatter diagram gives a visual presentation of the relationship and is not
confined to linear relations. Karl Pearson’s coefficient of correlation and Spearman’s rank
correlation are strictly the measures of linear relationship.

Question 6.If precisely measured data are available the simple correlation coefficient is
(a) more accurate than rank correlation coefficient
(b) less accurate than rank correlation coefficient

(c) as accurate as the rank correlation coefficient


Answer:(a) Rank correlation should be used only when the variables cannot be measured
precisely. Generally it is not as accurate as the simple correlation coefficient, as all the
information concerning the data is not utilised in it.

Question 7.Why is r preferred to covariance as a measure of association?


Answer:Both, correlation coefficient and covariance measure the degree of linear relationship
between two variables, but correlation coefficient is generally preferred to covariance due to
the following reasons

• The correlation coefficient (r) has no unit.
• The correlation coefficient is independent of origin as well as scale.

Question 8.Can r lie outside the -1 and 1 range depending on the type of data?
Answer:No, the value of the correlation coefficient lies between minus one and plus one, -1 ≤ r
≤ 1. If the value of r is outside this range in any type of data, it indicates error in calculation.

Question 9.Does correlation imply causation?


Answer:No, correlation measures do not imply causation. Correlation measures co-variation
and not causation.
Correlation does not imply cause and effect relation. The knowledge of correlation only gives
us an idea of the direction and intensity of change in a variable when the correlated variable
changes. The presence of correlation between two variables X and Y simply means that when
the value of one variable is found to change in one direction, the value of the other variable is
found to change either in the same direction (i.e., positive change) or in the opposite direction
(i.e., negative change), in a definite way.

Question 10.When is rank correlation more precise than simple correlation coefficient?
Answer:Rank correlation is more precise than simple correlation coefficient in the following
situations

• When the Measurements of the Variables are Suspect e.g., in a remote village where
measuring rods or weighing scales are not available, height and weight of people cannot
be measured precisely but the people can be easily ranked in terms of height and weight.
• When Data is Qualitative It is difficult to quantify qualities such as fairness, honesty etc.
Ranking may be a better alternative to quantification of qualities.
• When Data has Extreme Values Sometimes the correlation coefficient between two
variables with extreme values may be quite different from the coefficient without the
extreme values. Under these circumstances rank correlation provides a better alternative
to simple correlation.

Question 11.Does zero correlation mean independence?
Answer:No, zero correlation does not mean independence. If there is zero correlation (rXY = 0),
it means the two variables are uncorrelated and there is no linear relation between them.
However, other types of relation may be there and they may not be independent.

Question 12.Can simple correlation coefficient measure any type of relationship?


Answer:No, simple correlation coefficient can measure only linear relationship.

Question 13.List some variables where accurate measurement is difficult.


Answer:Accurate measurement is difficult in case of

• Qualitative variables such as beauty, intelligence, honesty, etc.
• It is also difficult to measure subjective variables such as poverty, development, etc.,
which are interpreted differently by different people.

Question 14.Interpret the values of r as 1, -1 and 0.


Answer:

• If r = 0 the two variables are uncorrelated. There is no linear relation between them.
However, other types of relation may be there and hence the variables may not be
independent.
• If r = 1 the correlation is perfectly positive. The relation between them is exact in the
sense that if one increases, the other also increases in the same proportion and if one
decreases, the other also decreases in the same proportion.
• If r = -1 the correlation is perfectly negative. The relation between them is exact in the
sense that if one increases, the other decreases in the same proportion and if one
decreases, the other increases in the same proportion.

Question 15.Why does rank correlation coefficient differ from Pearsonian correlation
coefficient?
Answer:Rank correlation coefficient differs from Pearsonian correlation coefficient in the
following ways

• Rank correlation coefficient is generally lower than or equal to Karl Pearson’s coefficient.
• Rank correlation coefficient is preferred to measure the correlation between qualitative
variables as these variables cannot be measured precisely.
• The rank correlation coefficient uses ranks instead of the full set of observations, which
leads to some loss of information.
• If extreme values are present in the data, then the rank correlation coefficient is more
precise and reliable.

Question 16.Calculate the correlation coefficient between the heights of fathers in inches (X)
and their sons (Y).

Answer:

Note: The answer printed in NCERT is incorrect.

Question 17.Calculate the correlation coefficient between X and Y and comment on their
relationship.

Answer:

As the value of r is zero, so there is no linear correlation between X and Y.

Question 18.Calculate the correlation coefficient between X and Y and comment on their
relationship.

Solution

As the correlation coefficient between the two variables is +1, the two variables are
perfectly positively correlated.

Chapter 7 Index Number

An index number is a statistical device for measuring changes in the magnitude of a group of
related variables. It represents the general trend of diverging ratios from which it is calculated.
According to Croxton and Cowden, “Index numbers are devices for measuring difference in the
magnitude of a group of related variables.”

Methods of Constructing Index Numbers

Construction of Simple Index Numbers


There are two methods of constructing simple index numbers.
(i) Simple Aggregative Method In this method, we use the following formula

$$P_{01} = \frac{\sum P_1}{\sum P_0} \times 100$$

Here, P01 = Price index of current year
ΣP1 = Sum of prices of the commodities in the current year
ΣP0 = Sum of prices of the commodities in the base year
(ii) Simple Average of Price Relatives Method
According to this method, we first find out the price relative of each commodity and then take
the simple average of all the price relatives.

$$\text{Price relative} = \frac{\text{Current year price } (P_1)}{\text{Base year price } (P_0)} \times 100$$

We can find out price index number of the current year by using the following formula

$$P_{01} = \frac{\sum \left[ \frac{P_1}{P_0} \times 100 \right]}{N}$$
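Both simple index numbers can be sketched as below. The base year and current year prices of the four commodities are hypothetical figures used only for illustration, and the function names are illustrative.

```python
def simple_aggregative_index(base_prices, current_prices):
    """P01 = (sum of current year prices / sum of base year prices) * 100."""
    return sum(current_prices) / sum(base_prices) * 100


def average_of_price_relatives(base_prices, current_prices):
    """P01 = sum of price relatives / N, each relative being (P1 / P0) * 100."""
    relatives = [p1 * 100 / p0 for p0, p1 in zip(base_prices, current_prices)]
    return sum(relatives) / len(relatives)


# Hypothetical prices (Rs) of four commodities
p0 = [2, 5, 4, 2]  # base year
p1 = [4, 6, 5, 3]  # current year

aggregative = simple_aggregative_index(p0, p1)
relatives_index = average_of_price_relatives(p0, p1)
print(round(aggregative, 2))      # 138.46
print(round(relatives_index, 2))  # 148.75
```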

Construction of Weighted Index Numbers
(i) Weighted Average of Price Relative Method
According to this method, the weighted sum of the price relatives is divided by the sum total
of the weights. In this method, goods are given weight according to their quantity, thus

$$P_{01} = \frac{\sum RW}{\sum W}$$

Here, P01 = Index number for the current year in relation to the base year
W = weight
R = price relative
(ii) Weighted Aggregative Method Under this method, different goods are accorded weight
according to the quantity bought. Different techniques of weighting have been suggested;
some of the well-known methods are as under

Fisher’s Method is considered as ‘Ideal’ because

• It is based on variable weights.
• It takes into consideration the price and quantities of both the base year and current
year.
• It is based on Geometric Mean (GM) which is regarded as the best mean for calculating
index number.
• Fisher’s index number satisfies both the Time Reversal Test and Factor Reversal Test.
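As a hedged sketch of the weighted aggregative approach, the code below uses the standard definitions: Laspeyres' index weights prices by base year quantities, Paasche's index weights them by current year quantities, and Fisher's index is the geometric mean of the two. The prices and quantities are hypothetical, and the function names are illustrative.

```python
import math


def laspeyres(p0, q0, p1):
    """Sum(p1*q0) / Sum(p0*q0) * 100: base year quantities as weights."""
    return sum(a * b for a, b in zip(p1, q0)) / sum(a * b for a, b in zip(p0, q0)) * 100


def paasche(p0, p1, q1):
    """Sum(p1*q1) / Sum(p0*q1) * 100: current year quantities as weights."""
    return sum(a * b for a, b in zip(p1, q1)) / sum(a * b for a, b in zip(p0, q1)) * 100


def fisher(p0, q0, p1, q1):
    """Geometric mean of the Laspeyres and Paasche indices."""
    return math.sqrt(laspeyres(p0, q0, p1) * paasche(p0, p1, q1))


# Hypothetical prices (Rs) and quantities of four commodities
p0, q0 = [2, 5, 4, 2], [10, 12, 20, 15]  # base year
p1, q1 = [4, 6, 5, 3], [5, 10, 15, 10]   # current year

l_index = laspeyres(p0, q0, p1)
p_index = paasche(p0, p1, q1)
f_index = fisher(p0, q0, p1, q1)
print(round(l_index, 2))  # 135.26
print(round(p_index, 2))  # 132.14
```

Because it is a geometric mean, Fisher's index always lies between the other two.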

Consumer Price Index or Cost of Living Index Number


The consumer price index is the index number which measures the average change in prices
paid by a specific class of consumers for goods and services consumed by them in the
current year in comparison with the base year.

Construction of Consumer Price Index

• Selection of the consumer class
• Information about the family budget
• Choice of base year
• Information about prices
• Weightage: there are two ways of according weights
    • Quantity weight
    • Expenditure weight
The following formula is used to find the consumer price index
Consumer Price Index (CPI) = ΣWR / ΣW
Wholesale Price Index (WPI)
The Wholesale Price Index (WPI) measures the relative changes in the prices of commodities
traded in the wholesale markets. In India, the wholesale price index numbers are constructed
on a weekly basis.

Industrial Production Index


The index number of industrial production measures changes in the level of industrial
production comprising many industries. It includes the production of the public and the
private sector. It is a weighted average of quantity relatives. The formula for the index is
P01 = (Σ(q1 × W) / ΣW) × 100
Construction of Index Number of Industrial Production

• Classification of industries
• Statistics or data related to industrial production
• Weightage

Agricultural Production Index


The index number of agricultural production is a weighted average of quantity relatives.

Sensex
Sensex is the index showing changes in the Indian stock market. It is the short form of the Bombay
Stock Exchange Sensitive Index. It is constructed with 1978-79 as the reference year or the
base year. It consists of 30 stocks of leading companies in the country.

Purpose of Constructing Index Number

• The purpose of constructing an index number of prices is to know the relative or
percentage change in the price level over time. A rising general price level over time is a pointer
towards inflation, while a falling general price level over time is a pointer towards
deflation.
• The purpose of constructing an index number of quantity is to know the relative or
percentage change in the quantum or volume of output of different goods and services.
A rising index of quantity suggests a rising level of economic activity and vice-versa.

Question 1. An index number which accounts for the relative importance of the items is known
as
(i) weighted index
(ii) simple aggregative index
(iii) simple average of relatives
Answer: (i) An index number becomes a weighted index when the relative importance of the items
is taken care of. A weighted index is the weighted average of different goods.

Question 2. In most of the weighted index numbers the weight pertains to
(i) base year
(ii) current year
(iii) both base and current year
Answer: (i) In general, base period weights are preferred in calculating a weighted index
number. Laspeyres' method uses base year quantities as weights, Paasche's method uses
current year quantities as weights and Fisher's method uses both base and current year
quantities.

Question 3. The impact of change in the price of a commodity with little weight in the index
will be
(i) small
(ii) large
(iii) uncertain
Answer: (i) An equal rise in the price of an item with little weight will have a smaller effect
on the overall change in the price index than that of an item with more weight.

Question 4. A consumer price index measures changes in
(i) retail prices
(ii) wholesale prices
(iii) producers’ prices
Answer: (i) The Consumer Price Index (CPI), also known as the cost of living index, measures the
average change in retail prices, which shows most accurately the impact of a price rise on the cost
of living of common people.

Question 5. The item having the highest weight in consumer price index for industrial workers
is
(i) food
(ii) housing
(iii) clothing
Answer: (i) Food is given around 57% weight in the CPI for industrial workers as it constitutes
the major proportion of their total consumption.

Question 6. In general, inflation is calculated by using
(i) wholesale price index
(ii) consumer price index
(iii) producer’s price index
Answer: (i) The WPI is widely used to measure the rate of inflation. The weekly inflation rate is
given by
((Xt − Xt−1) / Xt−1) × 100
where Xt and Xt−1 refer to the WPI for the (t)th and (t−1)th weeks.
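The weekly inflation rate formula can be checked with a short Python sketch. The two WPI values are hypothetical and are not taken from any official series.

```python
def inflation_rate(previous_index, current_index):
    """Return ((Xt - Xt-1) / Xt-1) * 100, the percentage change in the index."""
    return (current_index - previous_index) / previous_index * 100


# Hypothetical WPI values for two consecutive weeks.
wpi_last_week = 200.0
wpi_this_week = 201.0

weekly_inflation = inflation_rate(wpi_last_week, wpi_this_week)
print(weekly_inflation)  # 0.5 (per cent for the week)
```

The same function works for annual inflation if the two arguments are the index values of consecutive years.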

Question 7. Why do we need an index number?
Answer: An index number enables us to calculate a single measure of change of a large number of
items. Index numbers are needed for the following general and specific purposes

• Measurement of Change in the Price Level or the Value of Money: An index number
measures the value of money during different periods of time, and we can also use it to
know the impact of the change in the value of money on different sections of society. It
can be used to correct the inflationary and deflationary gaps in the system.
• Information on Foreign Trade: Indices of exports and imports provide useful information
regarding foreign trade, which helps in formulating the policies of export and import.
• Calculating Real Wages: The CPI is used in calculating the purchasing power of money and
the real wage as follows
    • Purchasing power of money = 1/Cost of living index
    • Real wage = (Money wage/Cost of living index) × 100
• Measuring and Comparing Output: The Index of Industrial Production (IIP) gives us a
quantitative figure about the change in production in the industrial sector and thus helps
in comparing industrial output in different periods. Similarly, the agricultural production
index provides us an estimate of the production in the agricultural sector.
• Policy Making of Government: With the help of index numbers, the government determines
its monetary and fiscal policy and takes the necessary steps to develop the country.
• Indicating Stock Prices: Sensex and Nifty are index numbers of share prices on the BSE and
NSE respectively. They serve as a useful guide for investors in the stock market. If the
Sensex and Nifty are rising, investors have positive expectations about the future
performance of the economy and it is an appropriate time for investment.
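The two formulas listed under "Calculating Real Wages" can be sketched in Python as follows. The money wage and the cost of living index are hypothetical figures used only to show the calculation.

```python
def purchasing_power(cost_of_living_index):
    """Purchasing power of money = 1 / cost of living index."""
    return 1 / cost_of_living_index


def real_wage(money_wage, cost_of_living_index):
    """Real wage = (money wage / cost of living index) * 100."""
    return money_wage / cost_of_living_index * 100


# Hypothetical figures: a money wage of Rs 12,000 when the index stands at 150.
wage_in_base_prices = real_wage(12_000, 150)
print(wage_in_base_prices)  # 8000.0

# The purchasing power of money falls as the index rises.
print(purchasing_power(150) < purchasing_power(100))  # True
```

A money wage of ₹ 12,000 at an index of 150 therefore buys only what ₹ 8,000 bought in the base year.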

Question 8. What are the desirable properties of the base period?
Answer: The base period should have the following properties

• The base year should be a normal period. Periods in which extraordinary events have
occurred should not be taken as base periods, as they are not appropriate for general
comparisons.
• Periods with extreme values should not be selected as the base period.
• The period should not be too far in the past, as a meaningful comparison with the current period
cannot be made with such a base year because policies, economic and social conditions change with time.
• The base period should be updated periodically.

Question 9. Why is it essential to have different CPI for different categories of consumers?
Answer: The Consumer Price Index (CPI) in India is calculated for different categories as under

• CPI for industrial workers.
• CPI for urban non-manual employees.
• CPI for agricultural labourers.

The reason behind calculation of three different CPIs is that the consumption pattern of the
three groups (i.e., industrial workers, urban non-manual workers and agricultural labourers)
differs significantly from each other. Therefore, to assess the impact of the price change on the
cost of living of the three groups, component items included in the index need to be given
different weights for each of the group. This necessitates the calculation of different CPI for
different categories of consumers.

Question 10. What does a consumer price index for industrial workers measure?
Answer: The consumer price index for industrial workers measures the average change in retail
prices of a basket of commodities which an industrial worker generally consumes. The consumer
price index for industrial workers is increasingly being considered the appropriate indicator of
general inflation, which shows the most accurate impact of price rise on the cost of living of
common people.

The items included in the CPI (Consumer Price Index) for industrial workers are food, pan, supari,
tobacco, fuel and lighting, housing, clothing, and miscellaneous expenses, with food being
accorded the highest weight. This implies that food price changes have a significant impact
on the CPI.

Question 11. What is the difference between a price index and a quantity index?
Answer: The difference between a price index and a quantity index is as follows

• Price index numbers measure and allow for comparison of the prices of certain goods,
while quantity index numbers measure the changes in the physical volume of production,
construction or employment.
• Price index numbers are more widely used as compared to quantity index numbers.
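• Price index numbers can be constructed as unweighted or weighted index numbers, while the
quantity index numbers described here (industrial and agricultural production) are weighted
averages of quantity relatives.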

Question 12. Is the change in any price reflected in a price index number?
Answer: No, the change in any price is not reflected in a price index number. Price index
numbers measure and permit comparison of the prices of certain goods included in the basket
being used to compare prices in the base period with prices in the current period. Moreover,
an equal rise in the price of an item with large weight and that of an item with low weight will
have different implications for the overall change in the price index.

Question 13. Can the CPI number for urban non-manual employees represent the changes in
the cost of living of the President of India?
Answer: The CPI for the urban non-manual employees cannot represent the changes in the
cost of living of the President of India. This is because the consumption basket of an average
non-manual employee does not consist of the items that would be a part of the consumption
basket of the President of India.

Question 14. The monthly per capita expenditure incurred by workers for an industrial centre
during 1980 and 2005 on the following items is given below. The weights of these items are
75, 10, 5, 6 and 4 respectively.
Prepare a weighted index number for cost of living for 2005 with 1980 as the base.

Answer:

Question 15. Read the following table carefully and give your comments.
Answer: Index of Industrial Production (Base 1993-94)

The following conclusions can be made by analysing the above table

• Manufacturing industry has the highest weight of 79.58% in the Index of Industrial
Production (IIP), while mining and quarrying and electricity industries account for 10.73%
and 10.69% respectively.
• Manufacturing industry has registered the highest growth among all industrial sectors in
both the years 1996-97 and 2003-04.
• Mining and quarrying has registered the lowest growth rate in both the years.
• The General Index shows that industrial production increased by 30.8% in 1996-97 as compared to
1993-94 and by 89% in 2003-04.

Question 16. Try to list the important items of consumption in your family.
Answer: (This is a general example. You can use the actual consumption items in your family.)
The following items constitute the total consumption needs of a family

• Food
• Clothing
• House rent/EMI of housing loan
• Education
• Electricity
• Entertainment and recreation
• Miscellaneous expenses

Question 17. If the salary of a person in the base year is ₹ 4,000 per annum and the current
year salary is ₹ 6,000, by how much should his salary rise to maintain the same standard of
living if the CPI is 400?
Answer:
Base CPI = 100
Current CPI = 400
Base Year Salary = ₹ 4,000
Current Year Salary = ₹ 6,000
When the base CPI is 100, the salary is ₹ 4,000.
Current salary equivalent to base year salary = (Base year salary/100) × CPI of current year
When the current CPI is 400, the salary should be
= (4,000/100) × 400 = ₹ 16,000
Thus, his salary should be ₹ 16,000 to maintain his purchasing power. Therefore, in the current
year his salary should increase by ₹ 16,000 - ₹ 6,000 = ₹ 10,000 so as to maintain the same
standard of living in the current year as in the base year.
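The arithmetic of this answer can be reproduced with a short Python sketch. The function name is hypothetical; the figures are those given in the question.

```python
def equivalent_salary(base_salary, base_cpi, current_cpi):
    """Salary needed in the current year to match the base year's purchasing power."""
    return base_salary / base_cpi * current_cpi


base_salary = 4_000     # rupees per annum in the base year
current_salary = 6_000  # rupees per annum in the current year

required_salary = equivalent_salary(base_salary, base_cpi=100, current_cpi=400)
required_rise = required_salary - current_salary
print(required_salary, required_rise)  # 16000.0 10000.0
```

Since the CPI has quadrupled, the salary must also quadruple to leave the standard of living unchanged.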

Question 18. The consumer price index for June, 2005 was 125. The food index was 120 and
that of other items 135. What is the percentage of the total weight given to food?
Answer:
Let the total weight = 100
W1 denotes weight of food
W2 denotes weight of other items
So,
W1 + W2 = 100 ...(i)
CPI = (120 W1 + 135 W2) / (W1 + W2) = 125, so 120 W1 + 135 W2 = 12,500 ...(ii)
Multiplying both sides of Eq. (i) by 135 and subtracting Eq. (ii) from it, we get
135 W1 + 135 W2 = 13,500
15 W1 = 1,000
So, W1 = 1,000/15 = 66.67
Substituting the value of W1 in Eq. (i), we get
W1 + W2 = 100
or 66.67 + W2 = 100
W2 = 33.33
Therefore, the percentage of total weight given to food is 66.67% and to other items 33.33%.
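The result can be verified with a short Python sketch that solves the same two equations. The function name is hypothetical, and the value 135 for the index of other items is read from the step "multiplying both sides of Eq. (i) by 135" in the answer.

```python
def food_weight(cpi, food_index, other_index, total_weight=100):
    """Solve W1 + W2 = total and (food*W1 + other*W2) / total = cpi for W1."""
    return total_weight * (other_index - cpi) / (other_index - food_index)


w_food = food_weight(cpi=125, food_index=120, other_index=135)
w_other = 100 - w_food
print(round(w_food, 2), round(w_other, 2))  # 66.67 33.33

# Check: the weighted average of the two group indices gives back the CPI.
check_cpi = (120 * w_food + 135 * w_other) / 100
print(round(check_cpi, 6))  # 125.0
```

The overall CPI of 125 lies closer to the food index (120) than to the index of other items (135), which is why food receives the larger weight.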

Question 19. An enquiry into the budgets of the middle class families in a certain city gave the
following information

What is the cost of living index of 2004 as compared with 1995?


Answer:

Cost of Living Index = 134.50
Thus, prices rose by 34.50% between 1995 and 2004.

Question 20. Record the daily expenditure, quantities bought and prices paid per unit of the
daily purchases of your family for two weeks. How has the price change affected your family?
Answer:
This is a practical exercise. Record the daily expenditure, quantities bought and prices paid per
unit of the daily purchases of your family for two weeks. Try to analyse whether the quantities
purchased decrease with a rise in the price of the respective items, and also note whether the
percentage change in quantity brought about by a percentage change in price differs for
different types of items.

Question 21. Given the following data

Source: Economic Survey, Government of India, 2004-2005
(i) Calculate the inflation rates using different index numbers.
(ii) Comment on the relative values of the index numbers.
(iii) Are they comparable?

Answer: (i) (a) Inflation using CPI of Industrial Workers

(b) Inflation using CPI of Non-manual Employees


(c) Inflation using CPI of Agricultural Labourers

(d) Inflation using WPI

(ii) The inflation rate calculated using the CPI for industrial workers with the base year 1982 is the
highest, and the inflation rate calculated using the WPI with the base year 1993-94 is the lowest.
(iii) No, the index numbers are not comparable because of the following reasons

• Base periods for the CPI of industrial workers, urban non-manual employees, agricultural
labourers and the WPI are different.
• Commodities and their weightage in different index numbers may be different.
