Statistics For Economics Class 11 Notes - For Merge
CHAPTER-1 INTRODUCTION
WHY DO WE NEED ECONOMICS?
Human wants are unlimited but resources are limited. To
strike a balance we need a subject called
economics. Economics seeks to understand and address the
problem of scarcity.
Alfred Marshall (one of the founders of modern economics)
called economics “the study of man in the ordinary business of life”.
Scarcity is the root of all economic problems. Had there been no scarcity, there would have
been no economic problem. We face scarcity because the things that satisfy our wants are
limited in availability.
Consumption, Production and Distribution
Economics involves the study of man engaged in economic activities of various kinds. For this,
you need to know reliable facts about all the diverse economic activities like production,
consumption and distribution.
We want to know how the consumer decides, given his income and many alternative goods to
choose from, what to buy when he knows the prices. This is the study of Consumption.
We also want to know how the producer, similarly, chooses what and how to produce for the
market. This is the study of Production.
Finally, we want to know how the national income or the total income arising from what has
been produced in the country (called the Gross Domestic Product or GDP) is distributed
through salaries, profits and interest. This is the study of Distribution.
“Economics is the study of how people and society choose to employ scarce resources that
could have alternative uses in order to produce various commodities that satisfy their wants
and to distribute them for consumption among various persons and groups in society.”
STATISTICS IN ECONOMICS
Such studies require that we know more about economic facts. Such economic facts are also
known as economic data. The purpose of collecting data about these economic problems is to
understand and explain these problems in terms of the various causes behind them. In other
words, we try to analyse them.
For example, when we analyse the hardships of poverty, we try to explain it in terms of the
various factors such as unemployment, low productivity of people, backward technology, etc.
But what purpose does the analysis of poverty serve unless we are able to find ways to
mitigate it?
We may, therefore, also try to find those measures that help solve an economic problem. In
Economics, such measures are known as policies.
WHAT IS STATISTICS?
Statistics deals with the collection, analysis, interpretation and presentation of numerical data.
It is a branch of mathematics and is also used in disciplines such as accounting, economics,
management, physics, finance, psychology and sociology.
For example, a statement in Economics like “the production of rice in India has increased from
39.58 million tonnes in 1974–75 to 106.5 million tonnes in 2013–14” is quantitative data. In
addition to quantitative data, Economics also uses qualitative data.
For example, ‘gender’ distinguishes a person as man/woman or boy/girl. It is often
possible to state the information about an attribute of a person in terms of degrees (like
better/worse; sick/healthy/more healthy; unskilled/skilled/highly skilled, etc.). Such
qualitative information or statistics is often used in Economics.
The next step is to present the data in tabular, diagrammatic and graphic forms. The data,
then, are summarised by calculating various numerical indices, such as mean, variance,
standard deviation, etc.
WHAT STATISTICS DOES?
1. Statistics is an indispensable tool for an economist that helps him to understand an
economic problem. Using its various methods, effort is made to find the causes behind it with
the help of qualitative and quantitative facts of an economic problem. Once the causes of the
problem are identified, it is easier to formulate certain policies to tackle it.
2. Exact facts are more convincing than vague statements.
For instance, a statement with precise figures, such as 310 people died in the recent earthquake in
Kashmir, is more factual and is, thus, statistical data, whereas saying that hundreds of people died
is not.
3. Statistics also helps in condensing mass data into a few numerical measures (such as mean,
variance etc., about which you will learn later). These numerical measures help to summarise
data.
For example, it would be impossible for you to remember the incomes of all the people in a
data set if the number of people is very large. Yet, one can easily remember a summary figure like
the average income that is obtained statistically.
4. In this way, Statistics summarises and presents meaningful overall information about a
mass of data.
5. Quite often, Statistics is used in finding relationships between different economic factors.
Q2. Make a list of activities that constitute the ordinary business of life. Are these economic
activities?
Answer. The activities that constitute the ordinary business of life are:
→ Buying of goods and services.
→ Rendering services to a company by employees and workers.
→ Selling of goods and services.
Yes, the above mentioned activities are regarded as economic activities as they involve the
exchange of money to earn a livelihood.
Q3. 'The Government and policy makers use statistical data to formulate suitable policies of
economic development'. Illustrate with two examples.
Answer. Statistical data are important for the Government and policy makers in formulating
suitable policies of economic development. They not only help in analysing and evaluating the
outcomes of past policies but also assist in taking corrective measures and
formulating new policies accordingly. This is clear from the following examples:
(i) It can be ascertained easily by using statistical techniques whether the policy of family
planning is effective in checking the problem of rapidly growing population.
(ii) In preparing annual government budget, previous data of government expenditures and
government revenues are taken into consideration for estimating the allocation of funds
among various projects.
Q4. "You have unlimited wants and limited resources to satisfy them." Explain by giving two
examples.
Answer. Every individual has unlimited wants, but the resources for satisfying the wants are
limited. Scarcity is the root of all economic problems. Had there been no scarcity, there would
have been no economic problem. This can be understood through examples:
(i) A child's pocket money is limited, so he/she has to choose only those things that he/she
wants the most. He/she can't purchase all the things he/she wants.
(ii) The land available can be put to either agricultural or industrial use. We can't use the same
land for both activities.
Q5. How will you choose the wants to be satisfied?
Answer. Any individual fulfils his/her wants according to his/her needs, satisfactions and
priority attached to different wants. Moreover, the choice of want also depends on the need
of the hour and availability of the goods and also on the availability of means (money) to
purchase that want.
CHAPTER-2 COLLECTION OF DATA
This chapter should enable you to:
• understand the meaning and purpose of data collection;
• distinguish between primary and secondary sources;
• know the mode of collection of data;
• distinguish between Census and Sample Surveys;
• be familiar with the techniques of sampling;
• know about some important sources of secondary data.
The purpose of collection of data is to show evidence for reaching a sound and clear
solution to a problem.
WHAT ARE THE SOURCES OF DATA?
Statistical data can be obtained from two sources.
The researcher may collect the data by conducting an enquiry. Such data are called Primary
Data.
Suppose you want to know about the popularity of a filmstar among school students. For this,
you will have to enquire from a large number of school students, by asking them questions
to collect the desired information. The data you get is an example of primary data.
If the data have been collected and processed by some other agency, they are called
Secondary Data.
HOW DO WE COLLECT THE DATA?
Preparation of Instrument
The most common type of instrument used in surveys is the questionnaire/interview schedule.
The questionnaire is either self-administered by the respondent or administered by the
researcher (enumerator) or trained investigator.
While preparing the questionnaire/interview schedule, you should keep in mind the following
points:
• The questionnaire should not be too long. The number of questions should be as few
as possible.
• The questionnaire should be easy to understand and should avoid ambiguous or difficult words.
• The questions should be arranged in an order such that the person answering should feel
comfortable.
• The series of questions should move from general to specific.
There are three basic ways of collecting data:
1. Personal Interviews,
2. Mailing (questionnaire) Surveys,
3. Telephone Interviews.
1.Personal Interviews
These are face-to-face interviews with the respondents. Personal contact is made between the
respondent and the interviewer.
Advantages
1. Opportunity of explaining the study and answering the queries of respondents.
2. The interviewer can request the respondent to expand on answers that are particularly
important.
3. Misinterpretation and misunderstanding can be avoided. Watching the reactions of
respondents can provide supplementary information.
Disadvantages
It is expensive, as it requires trained interviewers. It takes a longer time to complete the survey.
Presence of the researcher may inhibit respondents from saying what they really think.
2.Mailing Questionnaire
When the data in a survey are collected by mail, the questionnaire is sent to each individual by
mail with a request to complete and return it by a given date.
Advantages
1. It is less expensive.
2. It allows the researcher to have access to people in remote areas too.
3. It does not allow influencing of the respondents by the interviewer.
4. It also permits the respondents to take sufficient time to give thoughtful answers to the
questions.
These days online surveys or surveys through short messaging service, i.e., SMS are popular.
Disadvantages
1. Less opportunity to provide assistance in clarifying instructions, so there is a possibility of
misunderstanding the questions.
2. Low response rates due to certain factors, such as returning the questionnaire without
completing it, not returning the questionnaire at all, loss of questionnaire in the mail itself, etc.
3.Telephone Interviews
In a telephone interview, the investigator asks questions over the telephone.
Advantages
1. Cheaper than personal interviews.
2. Can be conducted in a shorter time.
3. They allow the researcher to assist the respondent by clarifying the questions.
4. Better in cases where the respondents are reluctant to answer certain questions in personal
interviews.
Disadvantages
1. Many people may not own telephones.
Pilot Survey
Once the questionnaire is ready, it is advisable to conduct a try-out with a small group which is
known as Pilot Survey or Pre-testing of the questionnaire.
1. The pilot survey helps in providing a preliminary idea about the survey.
2. It helps in pre-testing of the questionnaire, so as to know the shortcomings and drawbacks
of the questions.
3. It helps in assessing the suitability of questions, clarity of instructions, performance of
enumerators and the cost and time involved in the actual survey.
CENSUS AND SAMPLE SURVEYS
Census or Complete Enumeration
A survey which includes every element of the population is known as a Census or the Method
of Complete Enumeration. If certain agencies are interested in studying the total population in
India, they have to obtain information from all the households in rural and urban India. In India,
the Census is carried out every ten years.
Population and Sample
Population or the Universe in statistics means the totality of the items under study.
Once the population is identified, the researcher selects a method of studying it. If the
researcher finds that survey of the whole population is not possible, then he/ she may decide
to select a Representative Sample.
Sample
A sample refers to a group or section of the population from which information is to be obtained. A
good sample (representative sample) is generally smaller than the population and is capable of
providing reasonably accurate information about the population at a much lower cost and
shorter time.
Suppose you want to study the average income of people in a certain region. According to the
Census method, you would be required to find out the income of every individual in the
region, add them up and divide by number of individuals to get the average income of people
in the region. This method would require huge expenditure, as a large number of enumerators
have to be employed.
Alternatively, you select a representative sample, of a few individuals, from the region and find
out their income. The average income of the selected group of individuals is used as an
estimate of average income of the individuals of the entire region.
Random Sampling
As the name suggests, random sampling is one where the individual units from the population
(samples) are selected at random. This is also called the lottery method.
The government wants to determine the impact of the rise in petrol price on the household
budget of a particular locality. For this, a representative (random) sample of 30 households
has to be taken and studied. The names of all 300 households of that area are written on
paper slips and mixed, then the 30 households to be interviewed are selected one by one.
In random sampling, every individual has an equal chance of being selected. In the above
example, all 300 sampling units (also called the sampling frame) of the population got an equal
chance of being included in the sample of 30 units and hence the sample, so drawn, is a
random sample.
Exit Polls: You must have seen that when an election takes place, the television networks
provide election coverage. They also try to predict the results. This is done through exit polls,
wherein a random sample of voters who exit the polling booths are asked whom they voted
for. From the data of the sample of voters, the prediction is made. You might have noticed
that exit polls do not always predict correctly. Why?
Using the Random Number Tables, how will you select your sample years?
Non-Random Sampling
There may be a situation that you have to select 10 out of 100 households in a locality. You
have to decide which household to select and which to reject. You may select the households
conveniently situated or the households known to you or your friend. In this case, you are
using your judgement (bias) in selecting 10 households. This way of selecting 10 out of 100
households is not a random selection. In a non-random sampling method, all the units of the
population do not have an equal chance of being selected, and the convenience or judgement of
the investigator plays an important role in the selection of the sample. Such samples are mainly selected
on the basis of judgement, purpose, convenience or quota and are non-random samples.
SAMPLING AND NON-SAMPLING ERRORS
Sampling Errors
Sampling error refers to the difference between the sample estimate and the corresponding
population parameter.
It is possible to reduce the magnitude of sampling error by taking a larger sample.
Thus, the difference between the actual value of a parameter of the population and its
estimate is the sampling error.
Example: Consider a case of incomes of 5 farmers of Manipur. The variable x (income of
farmers) has measurements 500, 550, 600, 650, 700.
We note that the population average is (500+550+600+650+700) ÷ 5 = 3000 ÷ 5 = 600.
Now, suppose we select a sample of two individuals where x has measurements of 500 and
600. The sample average is (500 + 600) ÷ 2 = 1100 ÷ 2 = 550.
Here, the sampling error of the estimate = 600 (true value) – 550 (estimate) = 50.
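The arithmetic of this example can be reproduced with a short Python sketch. The variable names are illustrative only; the figures are the ones given in the example above.

```python
from statistics import mean

# Incomes of the 5 farmers (the whole population in the example)
population = [500, 550, 600, 650, 700]
# The two farmers picked as the sample in the example
sample = [500, 600]

population_mean = mean(population)   # true value of the parameter
sample_mean = mean(sample)           # estimate obtained from the sample

# Sampling error = true value - estimate
sampling_error = population_mean - sample_mean
print(population_mean, sample_mean, sampling_error)
```

A different pair of farmers would give a different estimate and hence a different sampling error.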
Non-Sampling Errors
Non-sampling errors are more serious than sampling errors because a sampling error can be
minimised by taking a larger sample. It is difficult to minimise non-sampling error, even by
taking a large sample.
Some of the non-sampling errors are:
Sampling Bias: Sampling bias occurs when the sampling plan is such that some members of the
target population could not possibly be included in the sample.
Non-Response Errors: Non-response occurs if an interviewer is unable to contact a person
listed in the sample or a person from the sample refuses to respond. In this case, the sample
observation may not be representative.
Errors in Data Acquisition: This type of error arises from recording of incorrect responses.
Suppose, the teacher asks the students to measure the length of the teacher’s table in the
classroom. The measurement by the students may differ. The differences may occur due to
differences in measuring tape, carelessness of the students, etc.
Similarly, suppose, we want to collect data on prices of oranges. We know that prices vary
from shop to shop and from market to market. Prices also vary according to the quality.
Therefore, we can only consider the average prices. Recording mistakes can also take place, as
the enumerators or the respondents may commit errors in recording or transcribing the
data; for example, he/she may record 13 instead of 31.
CENSUS OF INDIA AND NSSO
There are some agencies, both at the national and state level, that collect statistical data.
Some of the agencies at the national level are the Census of India and the following:
1. National Sample Survey (NSS)
It conducts nationwide surveys on socio-economic issues.
The NSS provides periodic estimates of literacy, school enrolment, utilisation of educational
services, employment, unemployment, manufacturing and service sector enterprises,
morbidity, maternity, child care, utilisation of the public distribution system, etc.
2. Central Statistics Office (CSO),
3. Registrar General of India (RGI),
4. Directorate General of Commercial Intelligence and Statistics (DGCIS),
5. Labour Bureau, etc.
The Census of India provides the most complete and continuous demographic record of
population.
The Census has been conducted regularly every ten years since 1881. The first Census after
Independence was conducted in 1951. The Census officials collect information on various
aspects of population such as the size, density, sex ratio, literacy, migration, rural-urban
distribution, etc.
Q1. Frame at least four appropriate multiple-choice options for the following questions:
(i) Which of the following is the most important when you buy a new dress?
Answer (a) Colour (b) Price (c) Brand (d) Quality of cloth
(v) What is the monthly income of your family?
Answer-(a) Less than Rs 10,000 (b) Rs 10,000 to Rs 20,000 (c) Rs 20,000 to Rs 30,000 (d) More
than Rs 30,000
Answer
(i) Do you own a car?
(ii) Do you smoke?
(iii) Do you own a two-wheeler?
(iv) Have you visited any foreign country?
(v) Are you satisfied with your present income?
(ii) Telephone survey is the most suitable method of collecting data, when the population is
literate and spread over a large area (true/false).
Answer-False
(iv) There is a certain bias involved in the non-random selection of samples (true/false).
Answer-True
(v) Non-sampling errors can be minimised by taking large samples (true/ false).
Answer-False
Q4. What do you think about the following questions? Do you find any problem with these
questions? If yes, how?
(i) How far do you live from the closest market?
Answer-The question is not clear. It does not specify how the distance should be stated.
(ii) If plastic bags are only 5 percent of our garbage, should it be banned?
Answer-The question is too long, which discourages people from answering. It also gives a clue about
how the respondent should answer.
(iii) Wouldn't you be opposed to increase in price of petrol?
Answer-The question contains two negatives, which confuses respondents and
may lead to a biased response.
Answer-The order of the questions is incorrect. General questions should be asked first, then
specific ones. The correct order should be:
(i) What is the yield per hectare in your field?
(ii) Do you use fertilisers in your fields?
(iii) Do you agree with the use of chemical fertilisers?
Q5. You want to research on the popularity of Vegetable Atta Noodles among children.
Design a suitable questionnaire for collecting this information.
Answer-QUESTIONNAIRE
Name: ........................
Age: ..........
Sex: ☐ Male ☐ Female
Q6. In a village of 200 farms, a study was conducted to find the cropping pattern. Out of the
50 farms surveyed, 50% grew only wheat. Identify the population and the sample here.
Answer-Population or the Universe in statistics means totality of the items under study. So,
the population here is 200 farms.
Sample refers to a group or section of the population from which information is to be
obtained. Out of 200 farms, only 50 farms are selected for survey. Therefore, the sample
is 50 farms.
Answer-Example 1: A study was conducted to know the average income of people in a village.
The total number of persons was 750. Out of these, 70 villagers were selected and their average
income was recorded. So, in this example:
(i) Population is the number of total villagers which is equal to 750.
(ii) Sample is the 70 villagers whose average income was recorded.
(iii) Variable under study is the income of the villagers.
Example 2: In order to record the level of sugar in the blood, blood samples of
1000 people were taken from 10,000 people. So, in this example:
(i) Population is the total number of people i.e., 10,000.
(ii) Sample is the 1000 people.
(iii) Variable is the sugar level.
Q8. Which of the following methods give better results and why?
(a) Census
(b) Sample
Answer-The Sample Method gives better results than the Census Method because:
→ Less time consuming: A census takes a lot of time because every record has to be
obtained, while a sample survey can be completed in less time.
→ Economically feasible: The cost of approaching each individual unit for interrogation and
collection of data is comparatively lower due to the small size of the sample.
→ Accuracy: Although the census method provides more accurate and reliable results
than the sample method, errors in the sample method can be located and rectified more
easily because of the smaller number of items.
→ Fewer Non-sampling Errors: The probability of non-sampling errors is also lower, as the
sample size is smaller than that of the Census Method.
Q9. Which of the following errors is more serious and why?(a) Sampling error (b) Non-
Sampling error
Answer-Non-sampling errors are more serious than sampling errors because a sampling error
can be minimised by taking a larger sample. It is difficult to minimise non-sampling error, even
by taking a large sample, as it arises from faulty means of collection of data.
Q10. Suppose there are 10 students in your class. You want to select three out of them. How
many samples are possible?
Answer-We have to use combinations to determine the number of samples which are
possible. The formula for the number of such combinations is
nCr = n! / [(n − r)! × r!], where n! = n(n − 1)(n − 2)...(3)(2)(1) (Note: 0! = 1)
Therefore, the answer is 10C3 = (10 × 9 × 8)/(3 × 2 × 1) = 720/6 = 120
Number of samples possible = 120
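The count can be cross-checked with a small Python sketch. This is only an illustration: the variable names are assumptions, and students are represented by serial numbers 1 to 10.

```python
import math
from itertools import combinations

n_students = 10   # size of the class
sample_size = 3   # students to be selected

# Direct use of the formula nCr = n! / [(n - r)! * r!]
count_by_formula = math.factorial(n_students) // (
    math.factorial(n_students - sample_size) * math.factorial(sample_size)
)

# The same value from the standard library helper
count_builtin = math.comb(n_students, sample_size)

# Listing every possible sample of 3 serial numbers and counting them
all_samples = list(combinations(range(1, n_students + 1), sample_size))

print(count_by_formula, count_builtin, len(all_samples))
```

All three approaches should agree with the hand calculation.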
Q11. Discuss how you would use the lottery method to select 3 students out of 10 in your
class?
Answer-Make ten paper slips of equal size, each bearing the name of one student. Now, there are ten
slips available. Mix them well. Now draw three slips at random, one by one, without replacement.
By this method we can select three students.
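The lottery procedure can be imitated on a computer. The sketch below is a minimal illustration under assumptions: the student names are placeholders, `lottery_draw` is a hypothetical helper, and the optional `seed` is there only to make a run repeatable.

```python
import random

def lottery_draw(names, k, seed=None):
    """Imitate the lottery method: mix the slips, then draw k without replacement."""
    rng = random.Random(seed)
    slips = list(names)        # one slip per student
    rng.shuffle(slips)         # mix the slips well
    drawn = []
    for _ in range(k):
        drawn.append(slips.pop())  # a drawn slip is not put back
    return drawn

# Placeholder names for the ten students
students = [f"Student {i}" for i in range(1, 11)]
selected = lottery_draw(students, 3, seed=1)
print(selected)
```

Because each slip is removed once drawn, no student can be selected twice.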
Q12. Does the lottery method always give you a random sample? Explain.
Answer-Yes, the lottery method always gives a random sample if it is used in the proper
manner without any bias. In a random sample, each individual unit has an equal chance of
getting selected. Similarly, in a lottery method, each individual unit is selected at random from
the population and thereby has equal opportunity of getting selected.
Q13. Explain the procedure of selecting a random sample of 3 students out of 10 in your
class, by using random number tables.
Answer-To select a random sample of 3 students out of 10 using random number tables, we
first assign serial numbers 1 to 10 to the students. We then read numbers from the table,
skipping any number greater than 10 (the largest serial number) and any number that repeats
an earlier pick, until three numbers are obtained. For example, if the first three valid numbers
are 5, 9 and 2, the students with serial numbers 5, 9 and 2 are selected.
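The skip-and-pick rule can be written as a short Python sketch. The row of numbers below is made up for illustration and is not taken from any published random number table; `pick_from_table` is a hypothetical helper.

```python
def pick_from_table(table_numbers, population_size, k):
    """Read numbers in order, skipping those out of range or already chosen."""
    chosen = []
    for number in table_numbers:
        in_range = 1 <= number <= population_size
        if in_range and number not in chosen:
            chosen.append(number)
        if len(chosen) == k:
            break
    return chosen

# Made-up entries standing in for a row of a random number table
table_row = [5, 12, 9, 5, 37, 2, 8]
selected = pick_from_table(table_row, population_size=10, k=3)
print(selected)
```

Here 12 and 37 are skipped because they exceed 10, and the second 5 is skipped because it repeats an earlier pick.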
Q14. Do samples provide better results than surveys? Give reasons for your answer.
Answer-Samples provide better results than surveys because
→ A sample can provide reasonably reliable and accurate information at a lower cost and
shorter time.
→ As samples are smaller than population, more detailed information can be collected by
conducting intensive enquiries.
→ A sample needs a smaller team of enumerators, so it is easier to train them and supervise their
work more effectively.
CHAPTER-3 ORGANISATION OF DATA
RAW DATA
The unclassified data or raw data are highly
disorganised. They are often very large and
cumbersome to handle. To draw meaningful
conclusions from them is a tedious task. Therefore,
proper organisation and presentation of such data is
needed before any systematic statistical analysis is
undertaken. Hence after collecting data the next
step is to organise and present them in a
classified form.
Suppose you want to know the performance of
students in mathematics and you have collected
data on marks in mathematics of 100 students
of your school. This data is useless unless it is
organised.
Once organised, the data begin to make sense.
CLASSIFICATION OF DATA
1.Chronological Data
Raw data is classified in various ways depending on
the purpose. They can be grouped according to
time. Such a classification is known as a
Chronological Data or Time-Series Data
In such a classification, data are classified either in
ascending or in descending order with reference to
time such as years, quarters, months, weeks, etc.
Such data are also called a Time Series, as they depict a series of values for
different years.
2.Spatial Classification
Data are classified with
reference to geographical locations such as
countries, states, cities, districts, etc.
3.Qualitative Data
Sometimes you come across characteristics that cannot be expressed quantitatively. Such
characteristics are called Qualities or Attributes.
For example, nationality, literacy, religion, gender, marital status, etc. They cannot be
measured. Such a classification of data on attributes is called a Qualitative Classification.
In the following example, we find population of a country is grouped on the basis of the
qualitative variable “gender”. An observation could either be a male or a female. These two
characteristics could be further classified on the
basis of marital status.
4.Quantitative Data
Characteristics, like height, weight, age, income,
marks of students, etc., are quantitative in nature.
When the collected data of such characteristics
are grouped into classes, it becomes a
Quantitative Classification.
A discrete variable cannot take a value like 25.5 between 25 and 26. Instead, its value can be either 25
or 26. What we observe is that as its value changes from 25 to 26, the fractional values in between
are not taken by it.
A frequency curve is a graphic representation of a frequency distribution.
We plot the class marks on the X-axis and the frequency on the Y-axis.
How to prepare a Frequency Distribution?
While preparing a frequency distribution, the following five questions need to be addressed:
1. Should we have equal or unequal sized class intervals?
2. How many classes should we have?
3. What should be the size of each class?
4. How should we determine the class limits?
5. How should we get the frequency for each class?
Should we have equal or unequal sized class intervals?
There are two situations in which unequal sized intervals are used.
1. When we have data on income and other similar variables where the range is very high.
For example, income per day may range from nearly zero to many hundred crores of rupees.
In such a situation, equal class intervals are not suitable because
(i) if the class intervals are of moderate size and equal, there would be a large number of
classes;
(ii) if class intervals are large, we would tend to suppress information on either very small levels
or very high levels of income.
2. When a large number of values are concentrated in a small part of the range, equal class intervals
would lead to lack of information on many values.
In all other cases, equal sized class intervals are used in frequency distributions.
How many classes should we have?
The number of classes is usually between six and fifteen.
If we are using equal sized class intervals, then the number of classes can be calculated
by dividing the range (the difference between the largest and the smallest values of the variable)
by the size of the class intervals.
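As a rough illustration, the division can be sketched in Python. The marks below are hypothetical, `number_of_classes` is an illustrative helper, and rounding up when the range is not an exact multiple of the class size is an assumption of this sketch rather than a rule stated in the notes.

```python
import math

def number_of_classes(values, class_size):
    """Number of equal sized classes = range / class size (rounded up)."""
    data_range = max(values) - min(values)   # largest value minus smallest value
    return math.ceil(data_range / class_size)

# Hypothetical marks of ten students
marks = [0, 12, 25, 37, 48, 53, 61, 74, 88, 100]
classes_needed = number_of_classes(marks, class_size=10)
print(classes_needed)
```

With a range of 100 and a class size of 10 this gives ten classes, which falls within the usual six to fifteen.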
What should be the size of each class?
We can determine the number of classes once we decide the class interval. Thus, we find that
these two decisions are interlinked. We cannot decide on one without deciding on the other.
How should we determine the class limits?
Class limits should be definite and clearly stated. Generally, open-ended classes such as “70
and over” or “less than 10” are not desirable. The lower and upper class limits should be
determined in such a manner that frequencies of each class tend to concentrate in the middle
of the class intervals.
Tallies are then counted as groups of five. So if there are 16 tallies in a class, we put them as
three groups of five followed by one single tally (//// //// //// /) for the sake of convenience.
Thus, the frequency in a class is equal to the number of tallies against that class.
Frequency array
For a discrete variable, the classification of its data is
known as a Frequency Array.
Since a discrete variable takes only integral values and not the
intermediate/fractional values between two integral
values, a frequency is recorded against each integral value.
This table illustrates a Frequency Array
The variable “size of the household” is a discrete
variable that only takes integral values as shown in the
table.
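A frequency array can be built by counting how often each integral value occurs. The sketch below uses hypothetical household sizes, not the figures of the table referred to above.

```python
from collections import Counter

# Hypothetical sizes of ten households (a discrete variable)
household_sizes = [3, 4, 2, 5, 4, 3, 4, 6, 2, 4]

# Count the households at each size and list the sizes in ascending order
frequency_array = dict(sorted(Counter(household_sizes).items()))

for size, frequency in frequency_array.items():
    print(size, frequency)
```

The frequencies add up to the number of households observed.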
A Bivariate Frequency Distribution can be defined as the frequency distribution of two
variables.
For example, suppose we have taken a sample of 20 companies from the list of companies based in
a city, and we collect information on sales and expenditure on advertisements from each
company. In this case, we have bivariate sample data. Such bivariate data can be
summarised using a Bivariate Frequency Distribution.
This Table shows the frequency distribution of two variables, sales (in Rs lakh) and advertisement
expenditure (in Rs thousand), of 20 companies.
For example, there are 3 firms whose sales are between Rs 135 lakh and Rs 145 lakh and whose
advertisement expenditures are between Rs 64 thousand and Rs 66 thousand.
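The idea of counting firms by a pair of classes can be sketched in Python. Everything in the sketch is an assumption made for illustration: the five firms are invented, and the class boundaries (sales classes of width 10 starting at 115, advertisement classes of width 2 starting at 62) are chosen only so that the classes 135 to 145 and 64 to 66 appear.

```python
from collections import Counter

def class_of(value, start, width):
    """Return the exclusive class (lower, upper) that contains the value."""
    lower = start + ((value - start) // width) * width
    return (lower, lower + width)

# Invented data: (sales in Rs lakh, advertisement expenditure in Rs thousand)
firms = [(137, 64.5), (141, 65.2), (128, 62.1), (139, 64.9), (152, 67.3)]

# Each firm is counted in the cell given by its sales class and its advertisement class
bivariate_table = Counter(
    (class_of(sales, 115, 10), class_of(ads, 62, 2)) for sales, ads in firms
)

for (sales_class, ads_class), frequency in sorted(bivariate_table.items()):
    print(sales_class, ads_class, frequency)
```

Each cell of a bivariate frequency table is obtained in this way, by counting the observations that fall in one class of the first variable and one class of the second.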
EXERCISE
(c) The ratio of the upper class limit and the lower class limit
(d) None of the above
Answer (a) The average of the upper class limit and the lower class limit.
(iii) Statistical calculations in classified data are based on
(a) the actual values of observations
(b) the upper class limits
(c) the lower class limits
(d) the class midpoints
Answer (d) the class midpoints
(c) the lower class limit of a class is excluded in the class interval
(d) the lower class limit of a class is included in the class interval
Answer (a) the upper class limit of a class is excluded in the class interval
2. Can there be any advantage in classifying things? Explain with an example from your daily
life.
Answer:-Yes, there are many advantages of classifying things. These are:
1. It saves our time and energy by making it easy to locate specific data.
Q4. Explain the 'exclusive' and 'inclusive' methods used in classification of data.
Answer:-Exclusive method: The classes, by this method, are formed in such a way that the
upper class limit of one class equals the lower class limit of the next class, for example, 0-10,
10-20, and so on. Thus, the continuity of the data is maintained. The upper class limit is
excluded but the lower class limit of a class is included in the interval. This method is most
appropriate for data of continuous variables.
Inclusive method: This method does not exclude the upper class limit in a class interval. It
includes the upper class limit in the class. Thus both class limits are parts of the class interval,
for example, 1-5, 6-10, 11-15 and so on. The interval 1-5 includes both the limits, i.e. 1 and 5.
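The difference between the two methods can be sketched in Python. This is only an illustration, not part of the original answer: the function names `exclusive_class` and `inclusive_class`, the starting limits and the class widths are assumptions chosen to match the examples 0-10, 10-20 and 1-5, 6-10, and whole-number values are assumed.

```python
def exclusive_class(value, lower=0, width=10):
    """Exclusive method: a value belongs to the class with lower <= value < upper."""
    start = lower + ((value - lower) // width) * width
    return (start, start + width)


def inclusive_class(value, lower=1, width=5):
    """Inclusive method: a value belongs to the class with lower <= value <= upper."""
    start = lower + ((value - lower) // width) * width
    return (start, start + width - 1)


# A value of 10 falls in 10-20 (not 0-10) because the upper limit is excluded.
print(exclusive_class(10))   # (10, 20)
# A value of 5 stays in 1-5 because the upper limit is included.
print(inclusive_class(5))    # (1, 5)
```

The only difference between the two functions is the upper limit they report, which is exactly the distinction the answer draws.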
Q5. Use the data in Table 3.2 that relate to monthly household expenditure (in Rs) on food
of 50 households and obtain the range of
monthly household expenditure on food.
(c) Number of households whose monthly expenditure on food is between Rs 1500 and Rs
2500 = 13 + 6 = 19
Q 6. In a city 45 families were surveyed for the number of domestic appliances they used.
Prepare a frequency array based on their replies as recorded below.
Answer
No. of Domestic Appliances    No. of Households
0                             1
1                             7
2                             15
3                             12
4                             5
5                             2
6                             2
7                             1
Total                         45
Answer:-Raw data are usually large and fragmented, so it is very difficult to draw any
meaningful conclusion from them. Classification makes the raw data comprehensible by
summarising them into groups. When facts of similar characteristics are placed in the same class,
one can locate them easily, make comparisons, and draw inferences without any
difficulty. Therefore, classified data are better than raw data.
Q 9. Distinguish between Univariate and Bivariate frequency distribution.
Answer :-The frequency distribution of a single variable is called a Univariate Distribution.
Income of people, marks scored by students, etc. are examples of Univariate Distribution.
The frequency distribution of two variables is called Bivariate distribution. Sales and
advertisement expenditure, weight and height of individuals, etc. are examples of Bivariate
distribution.
Q 10. Prepare a frequency distribution by inclusive method taking class interval of 7 from
the following data:
Answer
CHAPTER-4 PRESENTATION OF DATA
INTRODUCTION - As data are generally voluminous, they need to be put in a compact and
presentable form. There are generally three forms of presentation of data:
1.Textual or Descriptive presentation
2.Tabular presentation
3.Diagrammatic presentation.
1.TEXTUAL PRESENTATION OF DATA-In textual presentation, data are described within the
text. When the quantity of data is not too large this form of presentation is more suitable.
EXAMPLE- In a bandh call given on 08 September 2005 protesting the hike in prices of petrol
and diesel, 5 petrol pumps were found open and 17 were closed whereas 2 schools were
closed and remaining 9 schools were found open in a town of Bihar.
2.TABULAR PRESENTATION OF DATA -In a tabular presentation, data are presented in rows
(read horizontally) and columns (read vertically).
For example, consider a table giving information about literacy rates. It has three rows (for
male, female and total) and three columns (for urban, rural and total). It is called a 3 × 3 Table
giving 9 items of information in 9 boxes called the "cells" of the Table.
Classification used in tabulation is of four kinds:
(A).Qualitative
(B).Quantitative
(C).Temporal
(D).Spatial
(A).Qualitative Classification
When classification is done according to attributes,
such as social status, physical status, nationality, etc.,
it is called qualitative classification.
(B).Quantitative Classification
In quantitative classification, the data are classified on
the basis of characteristics which are quantitative in
nature. For example, age, height, production, income, etc. are
quantitative characteristics.
(C).Temporal Classification
In this classification time becomes the classifying
variable and data are categorised according to time.
Time may be in hours, days, weeks, months, years,
etc.
(D).Spatial Classification
When classification is done on the basis of place, it is
called spatial classification. The place may be a
village, block, district, state, country, etc.
3.DIAGRAMMATIC PRESENTATION OF DATA
This is the third method of presenting data. This method provides the quickest understanding
of the actual situation to be explained by data. It translates quite effectively the highly abstract
ideas contained in numbers into a more concrete and easily comprehensible form.
Three types of diagram
(I) Geometric diagram
(II) Frequency diagram
(III) Arithmetic line graph
(I) Geometric Diagram Bar diagram and pie diagram come in the category of geometric
diagram. The bar diagrams are of three types —
(A).Simple,
(B).Multiple
(C).Component bar diagrams.
(A)Simple Bar Diagram -Comprises a group of equally spaced, equal-width rectangular bars.
The height of the bar reads the magnitude of the data.
The lower end of the bar touches the base line such that the height of a bar starts from the
zero unit.
Bars of a bar diagram can be visually compared by their relative height and accordingly data
are comprehended quickly.
Data for this can be of frequency or non-frequency type.
(C)Component Bar Diagram -Component bar diagrams are very useful in comparing the sizes of
different component parts.
Component bar diagrams are usually shaded or
coloured suitably.
Pie Diagram
A pie diagram is also a component diagram, but unlike a bar diagram, here it is a circle whose
area is proportionally divided among the components. The circle is divided into as many parts
as there are components by drawing straight lines from the centre to the circumference.
Pie charts usually are not drawn with absolute values of a category. The values of each
category are first expressed as a percentage of the total value of all the categories. A circle in a
pie chart, irrespective of its radius, is thought of as having 100 equal parts of 3.6°
(360°/100) each. To find the angle that a component subtends at the centre of the
circle, its percentage figure is multiplied by 3.6°.
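The percentage-to-angle conversion can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `pie_angles` and the cost figures are hypothetical and do not come from the notes.

```python
def pie_angles(values):
    """Convert absolute values to angles (in degrees) at the centre of the pie."""
    total = sum(values.values())
    angles = {}
    for name, value in values.items():
        percentage = value / total * 100   # share of the total, in per cent
        angles[name] = percentage * 3.6    # each 1% corresponds to 3.6 degrees
    return angles


# Hypothetical cost components (illustrative figures only).
costs = {"Labour": 25, "Bricks": 15, "Cement": 20,
         "Steel": 15, "Timber": 10, "Supervision": 15}
angles = pie_angles(costs)
for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

Whatever the data, the angles should add up to 360°, which is a quick check before drawing the diagram.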
(D).Ogive
(A).Histogram
A histogram is a two dimensional diagram.
If the class intervals are of equal width (which they generally are), the areas of the rectangles
are proportional to their respective frequencies.
Since histograms are rectangles, a line parallel to the base line and of the same magnitude is
to be drawn at a vertical distance equal to the frequency.
Since, for continuous variables, the lower class boundary of a class interval fuses with the
upper class boundary of the previous interval, equal or unequal, the rectangles are all adjacent
and there is no open space between two consecutive rectangles.
If the classes are not continuous, they are first converted into continuous classes.
A histogram looks similar to a bar diagram, but there are more differences than similarities.
In a histogram no space is left between two rectangles, but in a bar diagram some space must
be left between consecutive bars.
(B).Frequency Polygon
Frequency polygon is an alternative to histogram and is also derived from histogram itself.
The simplest method of drawing a frequency polygon is to join the midpoints of the topside of
the consecutive rectangles of the histogram. Broken lines or dots may join the two ends with
the base line.
Frequency polygon is the most common method of presenting grouped frequency distribution.
Both class boundaries and class-marks can be used along the X-axis, the distances between two
consecutive class marks being proportional/equal to the width of the class intervals.
Frequency Curve
The frequency curve is obtained by drawing a smooth freehand curve passing through the
points of the frequency polygon as closely as possible. It may not necessarily pass through all
the points of the frequency polygon but it passes through them as closely as possible
Ogive
Ogive is also called cumulative frequency curve. As there are two types of cumulative
frequencies, namely the ‘‘less than’’ type and the ‘‘more than’’ type, there are accordingly two
ogives for any grouped frequency distribution data.
For the ‘‘less than’’ ogive the cumulative frequencies are plotted against the respective upper
limits of the class intervals, whereas for the ‘‘more than’’ ogive the cumulative frequencies are
plotted against the respective lower limits of the class intervals.
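The points to be plotted for the two ogives can be worked out as in the sketch below. It is an illustration only: it borrows the daily wage distribution used later in Example 8 of these notes, and the variable names are assumptions.

```python
from itertools import accumulate

# Daily wage classes (in Rs) and number of workers, as in Example 8.
classes = [(20, 25), (25, 30), (30, 35), (35, 40),
           (40, 45), (45, 50), (50, 55), (55, 60)]
freqs = [14, 28, 33, 30, 20, 15, 13, 7]
total = sum(freqs)

# "Less than" ogive: cumulative frequency against the UPPER class limit.
less_than = list(zip((upper for _, upper in classes), accumulate(freqs)))

# "More than" ogive: workers at or above each LOWER class limit.
below = accumulate([0] + freqs[:-1])
more_than = list(zip((lower for lower, _ in classes),
                     (total - b for b in below)))

print(less_than)
print(more_than)
```

Each pair is an (x, y) point: joining the first list gives the rising ‘‘less than’’ ogive and joining the second gives the falling ‘‘more than’’ ogive.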
Here you can see from Fig. 4.9 that for the period 1993-94 to 2013-14, the imports were more
than the exports all through the period.
You may notice the value of both exports and imports rising rapidly after 2001-02. Also, the gap
between the two (imports and exports) has widened after 2001-02.
Answer:(a) Bar diagrams are one-dimensional diagrams. Though these are represented on a
plane of two axes in the form of rectangular bars, the width is of no consequence and only the
length depicts the frequency.
Question 2.Data represented through a histogram can help in finding graphically the
(a) mean
(b) mode
(c) median
(d) All of these
Answer:(b) Histogram gives value of mode of the frequency distribution graphically through
the highest rectangle.
Question 3.Ogives can be helpful in locating graphically the
(a) mode
(b) mean
(c) median
(d) None of these
Answer:(c) Intersection point of the less than and more than ogives gives the median.
Question 4.Data represented through arithmetic line graph help in understanding
(a) long term trend
(b) cyclicity in data
(c) seasonality in data
(d) All of the above
Answer:(a) Arithmetic line graph helps in understanding the trend, periodicity, etc in a long
term time series data.
Question 5.Width of bars in a bar diagram need not be equal. (True/False)
Answer:False
Bar diagram comprises a group of equispaced and equiwidth rectangular bars for each class or
category of data.
Question 6.Width of rectangles in a histogram should essentially be equal. (True/False)
Answer:False
If the class intervals are of equal width, the areas of the rectangles are proportional to their
respective frequencies and the widths of the rectangles will be equal. However, sometimes it is
convenient or necessary to use varying widths of class intervals and hence unequal widths of
rectangles.
Question 7.Histogram can only be formed with continuous classification of data. (True/False)
Answer:True
A histogram is never drawn for a discrete variable/data. If the classes are not continuous they
are first converted into continuous classes.
Question 8.Histogram and column diagram are the same method of presentation of data.
(True/False)
Answer:False
Histogram is a two dimensional diagram drawn for continuous data and the rectangles do not
have spaces in between while column diagram is one dimensional with space in between
every column (bar).
Question 9.Mode of a frequency distribution can be known graphically with the help of
histogram. (True/False)
Answer:True
Histogram gives value of mode of the frequency distribution graphically through the highest
rectangle.
Question 10.Median of a frequency distribution cannot be known from the ogives.
(True/False)
Answer:False
Intersection-point of the less than and more than ogives gives the median.
Question 11.What kind of diagrams are more effective in representing the following?
(a) Monthly rainfall in a year
(b) Composition of the population of Delhi by religion
(c) Components of cost in a factory
Answer:(a) The monthly rainfall in a year can be best represented by a bar diagram as only one
variable i.e., monthly rainfall is to be presented diagrammatically. The rainfall is plotted on Y-
axis in the corresponding month that is plotted on the X-axis.
(b) Composition of the population of Delhi by religion can be represented by a component bar
diagram. A component bar diagram shows the bar and its sub-divisions into two or more
components. Thus, the total population can be sub divided in terms of religion and presented
through a component bar diagram.
(c) Different components of cost in a factory can most effectively be depicted through a pie
chart. The circle represents the total cost and various components of costs are shown by
different portions of the circle drawn according to percentage of total cost each component
covers.
Question 12.Suppose you want to emphasise the increase in the share of urban non-workers
and lower level of urbanisation in India as shown in Example 4.2. How would you do it in the
tabular form?
Question 13.How does the procedure of drawing a histogram differ when class intervals are
unequal in comparison to equal class intervals in a frequency table?
Answer:A histogram is a set of rectangles with bases as the intervals between class boundaries
(along X-axis) and with areas proportional to the class frequency. If the class intervals are of
equal width, the areas of the rectangles are proportional to their respective frequencies.
But when the bases vary in width, the heights of the rectangles are adjusted to yield
comparable measurements: each height is the class frequency divided by the width of the class
interval, instead of the absolute frequency. This gives us the frequency density for the purpose
of comparison.
Thus, Frequency density (height of rectangle) = Class frequency / Width of the class interval
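The adjustment for unequal class intervals can be sketched as follows. This is a minimal illustration: the function name `frequency_densities` and the distribution are hypothetical and are not taken from the notes.

```python
def frequency_densities(classes, freqs):
    """Height of each histogram rectangle = class frequency / class width."""
    return [f / (upper - lower) for (lower, upper), f in zip(classes, freqs)]


# Hypothetical distribution with unequal class widths (10, 10, 20 and 30).
classes = [(10, 20), (20, 30), (30, 50), (50, 80)]
freqs = [4, 12, 20, 15]

densities = frequency_densities(classes, freqs)
for (lower, upper), f, d in zip(classes, freqs, densities):
    print(f"{lower}-{upper}: frequency {f}, height {d:.2f}")
```

In this illustration the class 30-50 has the largest frequency but not the tallest rectangle, because its frequency is spread over a wider interval.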
Question 14.The Indian Sugar Mills Association reported that, ‘sugar production during the
first fortnight of December, 2001 was about 3,87,000 tonnes, as against 3,78,000 tonnes
during the same fortnight last year (2000). The off-take of sugar from factories during the first
fortnight of December, 2001 was 2,83,000 tonnes for internal consumption and 41,000 tonnes
for exports as against 1,54,000 tonnes for internal consumption and nil for exports during the
same fortnight last season.’
Answer:
(i) Data in tabular form.
Sugar Production in India
(ii) The data can effectively be presented diagrammatically using the multiple bar diagram. This
is because multiple bar diagrams are used for comparing two or more sets of data for different
years or classes, etc.
Question 15.
The following table shows the estimated sectoral real growth rates (percentage change over
the previous year) in GDP at factor cost.
CHAPTER-5 MEASURES OF CENTRAL TENDENCY
There are several statistical measures of central tendency or “averages”. The three most
commonly used averages are:
1.Arithmetic Mean
2.Median
3.Mode
ARITHMETIC MEAN
Suppose the monthly income (in Rs) of six families is given as: 1600, 1500, 1400, 1525, 1625,
1630.
The mean family income is obtained by adding up the incomes and dividing by the number of
families:
$$\frac{1600 + 1500 + 1400 + 1525 + 1625 + 1630}{6} = \frac{9280}{6} \approx \text{Rs } 1{,}547$$
It implies that on an average, a family earns Rs 1,547.
Arithmetic mean is the most commonly used measure of central tendency. It is defined as the
sum of the values of all observations divided by the number of observations and is usually
denoted by x̄.
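In general, if there are N observations X1, X2, X3, ..., XN, then the Arithmetic Mean is
given by
$$\bar{x} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$$
Here, i is an index which takes the successive values 1, 2, 3, ..., N.
Thus
$$\bar{x} = \frac{\sum X}{N}$$
where ΣX = sum of all observations.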
Arithmetic mean by direct method is the sum of all observations in a series divided by the total
number of observations.
Example 1 Calculate Arithmetic Mean from the data showing marks of students in a class in an
economics test: 40, 50, 55, 78, 58.
$$\bar{x} = \frac{\sum X}{N} = \frac{40 + 50 + 55 + 78 + 58}{5} = 56.2$$
Arithmetic mean by the assumed mean method is given by
$$\bar{x} = A + \frac{\sum d}{N}$$
where
A = assumed mean
X = individual observations
N = total number of observations
d = deviation of an individual observation from the assumed mean, i.e. d = X − A
Example 2 The following data shows the weekly income of 10 families.
Family:                 A    B    C    D    E     F   G    H     I    J
Weekly Income (in Rs):  850  700  100  750  5000  80  420  2500  400  360
Compute mean family income.
Computation of Arithmetic Mean by Assumed Mean Method
Families   Income (X)   d = X − 850   d' = (X − 850)/10
A          850          0             0
B          700          -150          -15
C          100          -750          -75
D          750          -100          -10
E          5000         4150          415
F          80           -770          -77
G          420          -430          -43
H          2500         1650          165
I          400          -450          -45
J          360          -490          -49
Total      11160        2660          266
$$\bar{x} = A + \frac{\sum d}{N} = 850 + \frac{2660}{10} = \text{Rs } 1{,}116$$
Arithmetic mean by the step deviation method is given by
$$\bar{x} = A + \frac{\sum d'}{N} \times c$$
where d' = (X − A)/c,
c = common factor,
N = number of observations,
A = assumed mean.
Thus, you can calculate the arithmetic mean in Example 2 by the step deviation method:
$$\bar{x} = 850 + \frac{266}{10} \times 10 = \text{Rs } 1{,}116$$
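The three methods can be checked against each other with a short sketch. It uses the incomes of Example 2 with A = 850 and c = 10 as in the table; the function names are assumptions made for illustration.

```python
# Weekly incomes (in Rs) of the ten families in Example 2.
incomes = [850, 700, 100, 750, 5000, 80, 420, 2500, 400, 360]


def mean_direct(xs):
    """Direct method: sum of observations / number of observations."""
    return sum(xs) / len(xs)


def mean_assumed(xs, a):
    """Assumed mean method: A + (sum of d) / N, where d = X - A."""
    return a + sum(x - a for x in xs) / len(xs)


def mean_step_deviation(xs, a, c):
    """Step deviation method: A + (sum of d') / N * c, where d' = (X - A) / c."""
    return a + sum((x - a) / c for x in xs) / len(xs) * c


print(mean_direct(incomes))
print(mean_assumed(incomes, a=850))
print(mean_step_deviation(incomes, a=850, c=10))
```

All three calls give the same mean, because the assumed mean and the common factor only simplify the arithmetic; they do not change the result.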
Assumed Mean Method -As in the case of individual series, the calculations can be simplified by
using the assumed mean method, as described earlier, with a simple modification. Since the
frequency (f) of each item is given here, we multiply each deviation (d) by the frequency to get fd.
Then we get Σfd. The next step is to get the total of all frequencies, i.e. Σf. Then find out Σfd/Σf.
Finally, the arithmetic mean is calculated using the assumed mean method by
$$\bar{x} = A + \frac{\sum fd}{\sum f}$$
Step Deviation Method In this case, the deviations are divided by the common factor ‘c’,
which simplifies the calculation.
Here we estimate
$$d' = \frac{d}{c} = \frac{X - A}{c}$$
$$\bar{x} = A + \frac{\sum fd'}{\sum f} \times c = 35 + \frac{(-34)}{70} \times 10 = 30.14 \text{ marks}$$
However, you might want to give more importance to the rise in price of potatoes (P2).
To do this, you may use as ‘weights’ the share of mangoes in the budget of the consumer
(W1) and the share of potatoes in the budget (W2).
Now the arithmetic mean weighted by the shares in the budget would be
$$\frac{W_1 P_1 + W_2 P_2}{W_1 + W_2}$$
In general the weighted arithmetic mean is given by
$$\frac{W_1 X_1 + W_2 X_2 + \dots + W_n X_n}{W_1 + W_2 + \dots + W_n} = \frac{\sum WX}{\sum W}$$
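A small sketch shows how the weights change the average. The figures are hypothetical (a 10% rise in the price of mangoes, a 20% rise in the price of potatoes, and budget shares in the ratio 1 : 3), and the function name `weighted_mean` is an assumption.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum of (W * X) / sum of W."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical figures: percentage rise in price of mangoes and potatoes,
# weighted by their assumed shares in the consumer's budget.
price_rise = [10, 20]
budget_share = [1, 3]

simple = sum(price_rise) / len(price_rise)
weighted = weighted_mean(price_rise, budget_share)
print(simple, weighted)
```

With these assumed figures the weighted mean lies closer to the rise in the price of potatoes than the simple mean does, because potatoes carry the larger weight.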
Median
Median is that positional value of the variable which divides the distribution into two equal
parts, one part comprises all values greater than or equal to the median value and the other
comprises all values less than or equal to it.
The Median is the “middle” element when the data set is arranged in order of the magnitude.
Since the median is determined by the position of different values, it remains unaffected if,
say, the size of the largest value increases.
Computation of median
The median can be easily computed by sorting the data from smallest to largest and finding
out the middle value.
Example 5 Suppose we have the following observation in a data set:
5, 7, 6, 1, 8, 10, 12, 4, and 3. Arranging the data, in ascending order you have: 1, 3, 4, 5, 6, 7, 8,
10, 12.
The “middle score” is 6, so the median is 6.
Half of the scores are larger than 6 and half of the scores are smaller. If the number of
observations is even, there will be two observations which fall in the middle.
The median in this case is computed as the arithmetic mean of the two middle values.
Example 6 The following data provides marks of 20 students. You are required to calculate the
median marks.
25, 72, 28, 65, 29, 60, 30, 54, 32, 53, 33, 52, 35, 51, 42, 48, 45, 47, 46, 33.
Arranging the data in an ascending order, you get
25, 28, 29, 30, 32, 33, 33, 35, 42, 45, 46, 47, 48, 51, 52, 53, 54, 60, 65, 72.
You can see that there are two observations in the middle, 45 and 46.
The median can be obtained by taking the mean of the two observations:
$$\text{Median} = \frac{45 + 46}{2} = 45.5 \text{ marks}$$
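Both cases, an odd and an even number of observations, can be handled by one short function. The sketch below uses the data of Examples 5 and 6; the function name `median` is an assumption made for illustration.

```python
def median(values):
    """Median of an individual series: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd N: the single middle item
    return (ordered[mid - 1] + ordered[mid]) / 2   # even N: mean of the two middle items


example_5 = [5, 7, 6, 1, 8, 10, 12, 4, 3]
example_6 = [25, 72, 28, 65, 29, 60, 30, 54, 32, 53,
             33, 52, 35, 51, 42, 48, 45, 47, 46, 33]

print(median(example_5))   # 6
print(median(example_6))   # 45.5
```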
In order to calculate the median it is important to know the position of the median, i.e. the
item at which the median lies.
The position of the median can be calculated by the following formula:
$$\text{Position of median} = \left(\frac{N + 1}{2}\right)^{\text{th}} \text{ item}$$
Discrete Series
In case of discrete series the position of the median, i.e. the (N+1)/2th item, can be located
through cumulative frequency.
The corresponding value at this position is the value of the median.
Computation of Median for Discrete Series
Example 7 The frequency distribution of the number of persons and their respective incomes
(in Rs) is given below. Calculate the median income.
Income (in Rs):     10   20   30   40
Number of persons:  2    4    10   4
Income (Rs)   No. of Persons (f)   Cumulative frequency (c.f.)
10            2                    2
20            4                    6
30            10                   16
40            4                    20
The median is located in the (N+1)/2 = (20+1)/2 = 10.5th observation. This can be easily
located through cumulative frequency. The 10.5th observation lies in the c.f. of 16. The
income corresponding to this is Rs 30, so the median income is Rs 30.
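The cumulative frequency search in Example 7 can be sketched as follows. The function name `median_discrete` is an assumption, and the sketch simply returns the first value whose cumulative frequency reaches the (N+1)/2th position, which is how the example locates the median.

```python
from itertools import accumulate


def median_discrete(values, freqs):
    """Median of a discrete series located through cumulative frequency."""
    n = sum(freqs)
    position = (n + 1) / 2                 # position of the median item
    for value, cf in zip(values, accumulate(freqs)):
        if cf >= position:                 # first c.f. that covers the position
            return value


incomes = [10, 20, 30, 40]
persons = [2, 4, 10, 4]
print(median_discrete(incomes, persons))   # 30
```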
Continuous Series
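In case of continuous series you have to locate the median class where the N/2th item [not
the (N+1)/2th item] lies. The median can then be obtained as follows:
$$\text{Median} = L + \frac{\frac{N}{2} - c.f.}{f} \times h$$
where L = lower limit of the median class, c.f. = cumulative frequency of the class preceding
the median class, f = frequency of the median class, and h = magnitude of the median class
interval.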
Example 8 Following data relates to daily wages of persons working in a factory. Compute the
median daily wage.
Daily Wages (in Rs):  55-60  50-55  45-50  40-45  35-40  30-35  25-30  20-25
No. of Workers:       7      13     15     20     30     33     28     14
Daily wages (in Rs)   No. of Workers (f)   Cumulative Frequency (c.f.)
20-25                 14                   14
25-30                 28                   42
30-35                 33                   75
35-40                 30                   105
40-45                 20                   125
45-50                 15                   140
50-55                 13                   153
55-60                 7                    160
Here N/2 = 160/2 = 80, and the 80th item lies in the c.f. of 105, so the median class is 35-40.
$$\text{Median} = L + \frac{\frac{N}{2} - c.f.}{f} \times h = 35 + \frac{80 - 75}{30} \times (40 - 35) = \text{Rs } 35.83$$
Thus, the median daily wage is Rs 35.83. This means that 50% of the workers are getting less
than or equal to Rs 35.83 and 50% of the workers are getting more than or equal to this wage.
You should remember that median, as a measure of central tendency, is not sensitive to all the
values in the series. It concentrates on the values of the central items of the data.
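The calculation in Example 8 can be reproduced with a short sketch. The function name `median_continuous` is an assumption; the classes must be supplied in ascending order, as in the cumulative frequency table.

```python
def median_continuous(classes, freqs):
    """Median = L + (N/2 - c.f.) / f * h for a continuous series."""
    half = sum(freqs) / 2
    cf = 0                                   # c.f. of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cf + f >= half:                   # this is the median class
            return lower + (half - cf) / f * (upper - lower)
        cf += f


# Daily wages (in Rs) and number of workers from Example 8, in ascending order.
classes = [(20, 25), (25, 30), (30, 35), (35, 40),
           (40, 45), (45, 50), (50, 55), (55, 60)]
workers = [14, 28, 33, 30, 20, 15, 13, 7]

print(round(median_continuous(classes, workers), 2))   # 35.83
```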
Quartiles
Quartiles are the measures which divide
the data into four equal parts, each portion contains equal number of observations. There are
three quartiles.
The first Quartile (denoted by Q1 ) or lower quartile has 25% of the items of the distribution
below it and 75% of the items are greater than it.
The second Quartile (denoted by Q2 ) or median has 50% of items below it and 50% of the
observations above it.
The third Quartile (denoted by Q3 ) or upper Quartile has 75% of the items of the distribution
below it and 25% of the items above it. Thus, Q1 and Q3 denote the two limits within which
central 50% of the data lies.
Calculation of Quartiles The method for locating the quartiles is the same as that of the median
in case of individual and discrete series. The values of Q1 and Q3 of an ordered series can be
obtained by the following formulae, where N is the number of observations.
$$Q_1 = \text{size of } \left(\frac{N + 1}{4}\right)^{\text{th}} \text{ item} \qquad Q_3 = \text{size of } \left(\frac{3(N + 1)}{4}\right)^{\text{th}} \text{ item}$$
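Example:-Calculate the value of the lower quartile from the data of the marks obtained by ten
students in an examination: 22, 26, 14, 30, 18, 11, 35, 41, 12, 32.
Arranging the data in an ascending order: 11, 12, 14, 18, 22, 26, 30, 32, 35, 41.
$$Q_1 = \text{size of } \left(\frac{N + 1}{4}\right)^{\text{th}} \text{ item} = \text{size of } \left(\frac{10 + 1}{4}\right)^{\text{th}} \text{ item} = \text{size of } 2.75^{\text{th}} \text{ item}$$
The 2.75th item lies between the 2nd and 3rd items, so
Q1 = 2nd item + 0.75 × (3rd item − 2nd item) = 12 + 0.75 × (14 − 12) = 13.5 marks.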
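Locating a fractional position such as the 2.75th item can be sketched in Python. The helper names `item_at` and `quartiles` are assumptions, the fractional part is handled by interpolating between the two neighbouring items, and at least three observations are assumed so that both positions fall inside the series.

```python
def item_at(ordered, position):
    """Size of the item at a 1-based, possibly fractional, position."""
    whole = int(position)
    fraction = position - whole
    lower = ordered[whole - 1]
    if fraction == 0:
        return lower
    upper = ordered[whole]
    return lower + fraction * (upper - lower)   # interpolate between neighbours


def quartiles(values):
    """Q1 and Q3 as the (N+1)/4th and 3(N+1)/4th items of the ordered series."""
    ordered = sorted(values)
    n = len(ordered)
    return item_at(ordered, (n + 1) / 4), item_at(ordered, 3 * (n + 1) / 4)


marks = [22, 26, 14, 30, 18, 11, 35, 41, 12, 32]
q1, q3 = quartiles(marks)
print(q1, q3)
```

For these marks the lower quartile comes from the 2.75th item and the upper quartile from the 8.25th item of the ordered series.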
Computation of Mode
Discrete Series
Consider the data set 1, 2, 3, 4, 4, 5. The mode for this data is 4 because 4 occurs most
frequently (twice) in the data.
Example 10 Look at the following discrete series:
Variable 10 20 30 40 50
Frequency 2 8 20 10 5
Here, as you can see the maximum frequency is 20, the value of mode is 30.
In this case, as there is a unique value of mode, the data is unimodal.
But, the mode is not necessarily unique, unlike arithmetic mean and median. You can have
data with two modes (bi-modal) or more than two modes (multi-modal).
It may be possible that there may be no mode if no value appears more frequent than any
other value in the distribution. For example, in a series 1, 1, 2, 2, 3, 3, 4, 4, there is no mode.
Continuous Series
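In case of continuous frequency distribution, modal class is the class with the largest frequency.
Mode can be calculated by using the formula:
$$M_o = L + \frac{D_1}{D_1 + D_2} \times h$$
where L = lower limit of the modal class, D1 = difference between the frequency of the modal
class and the frequency of the class preceding it, D2 = difference between the frequency of the
modal class and the frequency of the class succeeding it, and h = class interval of the
distribution.
$$M_o = 25 + \frac{12}{12 + 10} \times 5 = 27.727$$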
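The formula can be sketched in Python as follows. The function name `mode_continuous` is an assumption, and since the frequency table for this example is not reproduced here, the three frequencies below are hypothetical values chosen only so that L = 25, D1 = 12, D2 = 10 and h = 5, matching the substitution shown.

```python
def mode_continuous(classes, freqs):
    """Mode = L + D1 / (D1 + D2) * h, using the class with the largest frequency."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])      # index of the modal class
    lower, upper = classes[i]
    before = freqs[i - 1] if i > 0 else 0                   # frequency of preceding class
    after = freqs[i + 1] if i < len(freqs) - 1 else 0       # frequency of succeeding class
    d1 = freqs[i] - before
    d2 = freqs[i] - after
    return lower + d1 / (d1 + d2) * (upper - lower)


# Hypothetical frequencies giving D1 = 30 - 18 = 12 and D2 = 30 - 20 = 10.
classes = [(20, 25), (25, 30), (30, 35)]
freqs = [18, 30, 20]

mode = mode_continuous(classes, freqs)
print(round(mode, 3))
```

Because D1 is larger than D2 here, the mode lies in the upper half of the modal class 25-30.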
Question /Answer
Question 1.
Which average would be suitable in the following cases?
(i) Average size of readymade garments.
(ii) Average intelligence of students in a class.
(iii) Average production in a factory per shift.
(iv) Average wages in an industrial concern.
(v) When the sum of absolute deviations from average is least.
(vi) When quantities of the variable are in ratios.
(vii) In case of open-ended frequency distribution.
Answer:
(i) Mode Average size of any readymade garments should be the size for which demand is the
maximum. Hence, the modal value, which represents the value with the highest frequency,
should be taken as the average size to be produced.
(ii) Median It is the value that divides the series into two equal parts. Therefore, Median will
be the best measure for calculating the average intelligence of students in a class as it will give
the average intelligence such that there are equal number of students above and below this
average. It will not be affected by extreme values.
(iii) Arithmetic Mean The average production in a factory per shift is best calculated by
Arithmetic Mean as it will capture all types of fluctuations in production during the shifts.
(iv) Arithmetic Mean Arithmetic Mean will be the most suitable measure. It is calculated by
dividing the sum of wages of all the workers by the total number of workers in the industrial
concern. It gives a fair idea of average wage bill taking into account all the workers.
(v) Median The sum of absolute deviations (ignoring signs) of values is least when the
deviations are taken from the Median. The Arithmetic Mean has a different property: the
algebraic sum of the deviations of values about it is zero.
(vi) Median Median will be the most suitable measure in case the variables are in ratios as it is
least affected by the extreme values.
(vii) Median Median is the most suitable measure as it can be easily computed even in case of
open ended frequency distribution and will not get affected by extreme values.
Question 2.
Indicate the most appropriate alternative from the multiple choices provided against each
question.
(i) The most suitable average for qualitative measurement is
(a) Arithmetic mean
(b) Median
(c) Mode
(d) Geometric mean
(e) None of these
Answer:
(b) Median is the most suitable average for qualitative measurement because Median divides
a series in two equal parts, thus representing the average qualitative measure without being
affected by extreme values.
Question 3.
Comment whether the following statements are true or false.
(i) The sum of deviation of items from median is zero.
(ii) An average alone is not enough to compare series.
(iii) Arithmetic mean is a positional value.
(iv) Upper quartile is the lowest value of top 25% of items.
(v) Median is unduly affected by extreme observations.
Answer:
(i) False
This mathematical property applies to the arithmetic mean and not to median.
(ii) True
Average is not enough to compare the series as it does not explain the extent of deviation of
different items from the central tendency and the difference in the frequency of values. These
are measured by measures of dispersion and kurtosis.
(iii) False
Median is a positional value.
(iv) True
The upper quartile also called the third quartile, has 75 % of the items below it and 25 % of
items above it.
(v) False
Arithmetic mean is unduly affected by extreme observations.
Question 4.
If the arithmetic mean of the data given below is 28, find (a) the missing frequency and (b) the
median of the series
Answer:
(a) Let the missing frequency be f1.
Arithmetic Mean = 28
Question 5.
The following table gives the daily income of ten workers in a factory. Find the arithmetic
mean.
Answer:
N = 10
$$\bar{X} = \frac{\sum X}{N} = \frac{2400}{10} = 240$$
Arithmetic Mean = ₹ 240
Question 6.
Following information pertains to the daily income of 150 families. Calculate the arithmetic
mean.
Answer:
Question 7.The size of land holdings of 380 families in a village is given below. Find the median
size of land holdings.
Answer:
Answer:
(a) Highest income of lowest 50% workers will be given by the median. Σf = N = 65
Median class = Size of (N/2)th item = Size of (65/2)th item = 32.5th item
32.5th item lies in the 50th cumulative frequency and the corresponding class interval is 24.5 –
29.5.
(b) Minimum income earned by top 25% workers will be given by the upper quartile Q3.
Class interval of Q3 = 3(N/4)th item
= 3(65/4)th item
= 3 × 16.25th item
= 48.75th item
48.75th item lies in the 50th cumulative frequency and the corresponding class interval is 24.5 –
29.5.
(c) Maximum income earned by lowest 25% workers will be given by the lower quartile Q1.
Class interval of Q1 = (N/4)th item
= (65/4)th item = 16.25th item
16.25th item lies in the 30th cumulative frequency and the corresponding class interval is 19.5
– 24.5.
Question 9.
The following table gives production yield in kg per hectare of wheat of 150 farms in a village.
Calculate the mean, median and mode production yield.
Answer:
(i) Mean
(ii) Median
(iii) Mode
Grouping Table
Analysis Table
CHAPTER-6 CORRELATION
Types of Correlation
Correlation is commonly classified into negative and positive correlation.
Positive Correlation When two variables move in the same direction, such a relation is
called positive correlation, e.g., the relationship between price and supply.
Negative Correlation When two variables change in opposite directions, it is called
negative correlation, e.g., the relationship between price and demand.
Degree of Correlation
Degree of correlation refers to the coefficient of correlation
Methods of Estimating Correlation
(i) Scatter Diagram A scatter diagram offers a graphic expression of the direction and
degree of correlation.
Short-cut Method
This method is used when the mean value is not a whole number but a fraction. In this
method, deviations are calculated from an assumed mean in both the series.
The study of correlation shows the direction and degree of relationship between the
variables.
Correlation coefficient sometimes suggests a cause and effect relationship.
Correlation analysis facilitates business decisions because the trend path of one variable
may suggest the expected changes in the other.
Correlation analysis also helps policy formulation.
Question 1.The unit of correlation coefficient between height in feet and weight in kgs is
(a) kg/feet
(b) percentage
(c) non-existent
Answer:(c) Correlation coefficient (r) has no unit. It is a pure number. It means units of
measurement are not part of r.
Question 3.If rXY is positive the relation between X and Y is of the type
(a) when Y increases X increases
(b) when Y decreases X increases
(c) when Y increases X does not change
Answer:(a) If r is positive the two variables move in the same direction. e.g., when the price of
coffee rises, the demand for tea also rises as coffee is a substitute of tea. Therefore, the r
between price of coffee and demand for tea will be positive.
Question 5.Of the following three measures which can measure any type of relationship?
(a) Karl Pearson’s coefficient of correlation
(b) Spearman’s rank correlation
(c) Scatter diagram
Answer:(c) The scatter diagram gives a visual presentation of the relationship and is not
confined to linear relations. Karl Pearson’s coefficient of correlation and Spearman’s rank
correlation are strictly the measures of linear relationship.
Question 6.If precisely measured data are available the simple correlation coefficient is
(a) more accurate than rank correlation coefficient
(b) less accurate than rank correlation coefficient
Answer:(a) Rank correlation should be used only when the variables cannot be measured
precisely. Generally, it is not as accurate as the simple correlation coefficient, as all the
information concerning the data is not utilised in it.
Question 8.Can r lie outside the -1 and 1 range depending on the type of data?
Answer:No, the value of the correlation coefficient lies between minus one and plus one, -1 ≤ r
≤ 1. If the value of r is outside this range in any type of data, it indicates an error in calculation.
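The range of r can be checked with a short sketch. The formula used is the standard Karl Pearson product-moment definition, r = Σ(X − X̄)(Y − Ȳ) / √(Σ(X − X̄)² × Σ(Y − Ȳ)²), which is not written out in this section; the function name `pearson_r` and the data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical data: the first Y moves exactly with X, the second exactly against it.
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))
print(pearson_r(x, [10, 8, 6, 4, 2]))
print(pearson_r(x, [3, 1, 4, 1, 5]))
```

The first two series sit at the two limits of the range, and any other series, such as the third, gives a value between them.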
Question 10.When is rank correlation more precise than simple correlation coefficient?
Answer:Rank correlation is more precise than simple correlation coefficient in the following
situations
When the Measurements of the Variables are Suspect e.g., in a remote village where
measuring rods or weighing scales are not available, height and weight of people cannot
be measured precisely but the people can be easily ranked in terms of height and weight.
When Data is Qualitative It is difficult to quantify qualities such as fairness, honesty etc.
Ranking may be a better alternative to quantification of qualities.
When Data has Extreme Values Sometimes the correlation coefficient between two
variables with extreme values may be quite different from the coefficient without the
extreme values. Under these circumstances rank correlation provides a better alternative
to simple correlation.
Question 11.Does zero correlation mean independence?
Answer:No, zero correlation does not mean independence. If there is zero correlation (rXY = 0),
it means the two variables are uncorrelated and there is no linear relation between them.
However, other types of relation may be there, so the variables may not be independent.
If r = 0 the two variables are uncorrelated. There is no linear relation between them.
However, other types of relation may be there and hence the variables may not be
independent.
If r = 1 the correlation is perfectly positive. The relation between them is exact in the
sense that if one increases, the other also increases in the same proportion and if one
decreases, the other also decreases in the same proportion.
If r = -1 the correlation is perfectly negative. The relation between them is exact in the
sense that if one increases, the other decreases in the same proportion and if one
decreases, the other increases in the same proportion.
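The three cases above can be illustrated with a minimal sketch. The data are hypothetical and the function name pearson_r is only illustrative; it applies the usual deviation form of Karl Pearson's coefficient.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]

r_rising = pearson_r(x, [2 * v + 3 for v in x])    # exact rising straight line
r_falling = pearson_r(x, [-2 * v + 3 for v in x])  # exact falling straight line
r_square = pearson_r(x, [v * v for v in x])        # exact relation, but not linear

print(r_rising, r_falling, r_square)
```

Here Y = X² depends completely on X, yet r comes out as zero, which is why zero correlation does not mean independence.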
Question 15.Why does rank correlation coefficient differ from Pearsonian correlation
coefficient?
Answer:Rank correlation coefficient differs from Pearsonian correlation coefficient in the
following ways
Question 16.Calculate the correlation coefficient between the heights of fathers in inches (X)
and their sons (Y).
Answer:
Question 17.Calculate the correlation coefficient between X and Y and comment on their
relationship.
Answer:
Question 18.Calculate the correlation coefficient between X and Y and comment on their
relationship.
Solution
As the correlation coefficient between the two variables is +1, the two variables are
perfectly positively correlated.
Chapter 7 Index Number
An index number is a statistical device for measuring changes in the magnitude of a group of
related variables. It represents the general trend of diverging ratios from which it is calculated.
According to Croxton and Cowden, “Index numbers are devices for measuring difference in the
magnitude of a group of related variables.”
(i) Simple Aggregative Method
P01 = (ΣP1 / ΣP0) × 100
Here, P01 = Price index of current year
ΣP1 = Sum of prices of the commodities in the current year
ΣP0 = Sum of prices of the commodities in the base year
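As a minimal sketch of this formula, the prices below are hypothetical and the variable names are only illustrative.

```python
# Hypothetical prices of four commodities (in rupees)
base_prices = [10, 20, 30, 40]     # P0, base year
current_prices = [12, 25, 33, 50]  # P1, current year

# P01 = (sum of current year prices / sum of base year prices) x 100
simple_aggregative_index = sum(current_prices) / sum(base_prices) * 100

print(simple_aggregative_index)  # 120.0
```

An index of 120 means that, taken together, the prices are 20% higher than in the base year.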
(ii) Simple Average of Price Relatives Method
According to this method, we first find the price relative for each commodity and then take
the simple average of all the price relatives.
Price relative = [Current year price (P1) / Base year price (P0)] × 100
We can find the price index number of the current year by using the following formula
P01 = Σ[(P1 / P0) × 100] / N
Here, N = Number of commodities
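The method can be sketched as follows. The prices are hypothetical and the names are only illustrative; the simple aggregative figure is computed alongside for comparison.

```python
# Hypothetical prices of four commodities (in rupees)
base_prices = [10, 20, 30, 40]     # P0
current_prices = [15, 22, 33, 40]  # P1

# Price relative of each commodity = (P1 / P0) x 100
price_relatives = [p1 / p0 * 100 for p0, p1 in zip(base_prices, current_prices)]

# Index = sum of price relatives / number of commodities
average_of_relatives = sum(price_relatives) / len(price_relatives)

# For comparison: sum of current prices / sum of base prices x 100
simple_aggregative = sum(current_prices) / sum(base_prices) * 100

print(round(average_of_relatives, 2), round(simple_aggregative, 2))  # 117.5 110.0
```

The two unweighted methods need not agree: here the 50% rise in the cheapest commodity pulls the average of relatives up more than it pulls the aggregate.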
Construction of Weighted Index Numbers
(i) Weighted Average of Price Relative Method
According to this method, the weighted sum of the price relatives is divided by the sum total of
the weights. In this method, goods are given weights according to their quantity, thus
P01 = ΣRW / ΣW
Here, P01 = Index number for the current year in relation to the base year
W = weight
R = price relative
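A minimal sketch of the formula, assuming hypothetical prices and weights (the names are only illustrative):

```python
# Hypothetical data for four commodities
base_prices = [10, 20, 30, 40]     # P0
current_prices = [15, 22, 33, 40]  # P1
weights = [40, 30, 20, 10]         # W, relative importance of each commodity

# R = price relative of each commodity
relatives = [p1 / p0 * 100 for p0, p1 in zip(base_prices, current_prices)]

# P01 = sum of (R x W) / sum of W
weighted_index = sum(r * w for r, w in zip(relatives, weights)) / sum(weights)

print(round(weighted_index, 2))  # 125.0
```

Because the commodity whose price rose by 50% also carries the largest weight, the weighted index (125) is higher than the simple average of the same relatives (117.5).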
(ii) Weighted Aggregative Method Under this method, different goods are accorded weights
according to the quantity bought. Different techniques of weighting have therefore been
suggested; some of the well-known methods are as under
Quantity weight
Expenditure weight
The following formula is used to find the consumer price index
Consumer Price Index (CPI) = ΣWR / ΣW
Wholesale Price Index (WPI)
The Wholesale Price Index (WPI) measures the relative changes in the prices of commodities
traded in the wholesale markets. In India, the wholesale price index numbers are constructed
on a weekly basis.
Classification of industries
Statistics or data related to industrial production
Weightage
Sensex
Sensex is the index showing changes in the Indian stock market. It is the short form of Bombay
Stock Exchange Sensitive Index. It is constructed with 1978-79 as the reference year or the
base year. It consists of 30 stocks of leading companies in the country.
Question 1.An index number which accounts for the relative importance of the items is known
as
(i) weighted index
(ii) simple aggregative index
(iii) simple average of relatives
Answer:(i) An index number becomes a weighted index when the relative importance of items
is taken care of. A weighted index is the weighted average of different goods.
Question 2.In most of the weighted index numbers the weight pertains to
(i) base year
(ii) current year
(iii) both base and current year
Answer:(i) In general, the base period weight is preferred in calculating a weighted index
number. Laspeyres' method uses base year quantities as weights, Paasche's method uses
current year quantities as weights and Fisher's index method uses both base and current year
quantities.
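The three weighting schemes can be compared in a short sketch. The formulas are the standard textbook forms, assumed here because they are not written out in these notes: Laspeyres = (ΣP1Q0 / ΣP0Q0) × 100, Paasche = (ΣP1Q1 / ΣP0Q1) × 100, and Fisher = square root of (Laspeyres × Paasche). The data and names are hypothetical.

```python
import math

# Hypothetical prices and quantities of four commodities
p0 = [2, 5, 4, 2]      # base year prices
q0 = [10, 12, 20, 15]  # base year quantities
p1 = [4, 6, 5, 3]      # current year prices
q1 = [5, 10, 15, 10]   # current year quantities

def aggregate(prices, quantities):
    """Total expenditure: sum of price x quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))

# Laspeyres: base year quantities as weights
laspeyres = aggregate(p1, q0) / aggregate(p0, q0) * 100

# Paasche: current year quantities as weights
paasche = aggregate(p1, q1) / aggregate(p0, q1) * 100

# Fisher: geometric mean of the two, so it uses both sets of quantities
fisher = math.sqrt(laspeyres * paasche)

print(round(laspeyres, 2), round(paasche, 2), round(fisher, 2))
```

With these figures Laspeyres gives about 135.26 and Paasche about 132.14; Fisher's index always lies between the two.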
Question 3.The impact of change in the price of a commodity with little weight in the index
will be
(i) small
(ii) large
(iii) uncertain
Answer:(i) An equal rise in the price of an item with little weight will have a smaller effect
on the overall change in the price index than that of an item with more weight.
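A small sketch makes the point concrete. The weights 75, 10, 5, 6 and 4 are borrowed from Question 14 purely for illustration, the 20% price rise is hypothetical, and the function name is only illustrative.

```python
weights = [75, 10, 5, 6, 4]  # illustrative weights; the first item dominates

def weighted_index(relatives, weights):
    """Weighted average of price relatives: sum(R x W) / sum(W)."""
    return sum(r * w for r, w in zip(relatives, weights)) / sum(weights)

# The same 20% price rise, applied to one item at a time
heavy_item_rises = weighted_index([120, 100, 100, 100, 100], weights)
light_item_rises = weighted_index([100, 100, 100, 100, 120], weights)

print(heavy_item_rises, light_item_rises)  # 115.0 100.8
```

The same 20% rise lifts the index by 15 points when it falls on the item with weight 75, but by less than 1 point when it falls on the item with weight 4.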
Question 5.The item having the highest weight in consumer price index for industrial workers
is
(i) food
(ii) housing
(iii) clothing
Answer:(i) Food is given around 57% weight in the CPI for industrial workers as it constitutes
the major proportion of their total consumption.
Measurement of Change in the Price Level or the Value of Money Index numbers
measure the value of money during different periods of time, and we can use them to
know the impact of the change in the value of money on different sections of society. They
can be used to correct the inflationary and deflationary gaps in the system.
Information of Foreign Trade Index of export and import provides useful information
regarding foreign trade which helps in formulating the policies of export and import.
Calculating Real Wages CPI is used in calculating the purchasing power of money and
real wages as follows
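The formulas themselves are not reproduced in these notes. A minimal sketch, assuming the usual textbook forms (purchasing power of money = 100 / CPI when the base year CPI is 100, and real wage = money wage / CPI × 100) and using hypothetical figures:

```python
# Hypothetical figures; the base year CPI is taken as 100
cpi = 150           # consumer price index of the current year
money_wage = 12000  # current wage in rupees

# Assumed form: what one current rupee is worth in base year rupees
purchasing_power_of_rupee = 100 / cpi

# Assumed form: current wage expressed in base year prices
real_wage = money_wage / cpi * 100

print(round(purchasing_power_of_rupee, 2), real_wage)  # 0.67 8000.0
```

So with a CPI of 150, a rupee buys only about two-thirds of what it bought in the base year, and a money wage of ₹ 12,000 is worth ₹ 8,000 in base year prices.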
The base year should be a normal period and periods in which extraordinary events have
occurred should not be taken as base periods as they are not appropriate for general
comparisons.
Extreme values should not be selected as base period.
The period should not be too far in the past as comparison with current period cannot be
done with such base year as policies, economic and social conditions change with time.
Base period should be updated periodically.
Question 9.Why is it essential to have different CPI for different categories of consumers?
Answer:The Consumer Price Index (CPI) in India is calculated for different categories as under
The reason behind calculation of three different CPIs is that the consumption pattern of the
three groups (i.e., industrial workers, urban non-manual workers and agricultural labourers)
differs significantly from each other. Therefore, to assess the impact of the price change on the
cost of living of the three groups, component items included in the index need to be given
different weights for each of the group. This necessitates the calculation of different CPI for
different categories of consumers.
Question 10.What does a consumer price index for industrial workers measure?
Answer:Consumer price index for industrial workers measures the average change in retail
prices of a basket of commodities which an industrial worker generally consumes. Consumer
price index for industrial workers is increasingly being considered the appropriate indicator of
general inflation, which shows the most accurate impact of price rise on the cost of living of
common people.
The items included in CPI (Consumer Price Index) for industrial workers are food, pan, supari,
tobacco, fuel and lighting, housing, clothing, and miscellaneous expenses with food being
accorded the highest weight. This implies that the food price changes have a significant impact
on the CPI.
Question 11.What is the difference between a price index and a quantity index?
Answer:The difference between a price index and a quantity index is as follows
Price index numbers measure and allow for comparison of the prices of certain goods
while quantity index number measure the changes in the physical volume of production,
construction or employment.
Price index numbers are more widely used as compared to quantity index numbers.
Price index is known as unweighted index number while quantity index number is known
as weighted index number.
Question 12.Is the change in any price reflected in a price index number?
Answer:No, the change in any price is not reflected in a price index number. Price index
numbers measure and permit comparison of the prices of certain goods included in the basket
being used to compare prices in the base period with prices in the current period. Moreover,
an equal rise in the price of an item with large weight and that of an item with low weight will
have different implications for the overall change in the price index.
Question 13.Can the CPI number for urban non-manual employees represent the changes in
the cost of living of the President of India?
Answer:The CPI for the urban non-manual employees cannot represent the changes in the
cost of living of the President of India. This is because the consumption basket of an average
non-manual employee does not consist of the items that would be a part of the consumption
basket of the President of India.
Question 14.The monthly per capita expenditure incurred by workers for an industrial centre
during 1980 and 2005 on the following items is given below. The weights of these items are
75, 10, 5, 6 and 4 respectively.
Prepare a weighted index number for cost of living for 2005 with 1980 as the base.
Answer:
Question 15.Read the following table carefully and give your comments.
Answer:Index of Industrial Production Base 1993-94
Food
Clothing
House-Rent/EMI of Housing loan
Education
Electricity
Entertainment and recreation
Miscellaneous expenses
Question 17.If the salary of a person in the base year is ₹ 4,000 per annum and the current
year salary is ₹ 6,000, by how much should his salary rise to maintain the same standard of
living if the CPI is 400?
Answer:
Base CPI = 100
Current CPI = 400
Base Year Salary = ₹ 4,000
Current Year Salary = ₹ 6,000
When the base CPI is 100, the salary is ₹ 4,000.
Current salary equivalent to base year salary = (Base year salary/100) × CPI of current year
When the current CPI is 400, the salary should be
= (4,000/100) × 400 = ₹ 16,000
Thus, his salary should be ₹ 16,000 to maintain his purchasing power. Therefore, in the current
year his salary should increase by ₹ 16,000 – ₹ 6,000 = ₹ 10,000 so as to maintain the same
standard of living in the current year as in the base year.
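The arithmetic of this answer can be checked with a short sketch using the figures given in the question (the variable names are only illustrative):

```python
base_cpi = 100
current_cpi = 400
base_salary = 4000     # rupees per annum in the base year
current_salary = 6000  # rupees per annum in the current year

# Salary needed now to buy what the base year salary bought
required_salary = base_salary / base_cpi * current_cpi

# Rise needed over the actual current salary
required_rise = required_salary - current_salary

print(required_salary, required_rise)  # 16000.0 10000.0
```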
Question 18.The consumer price index for June, 2005 was 125. The food index was 120 and
that of other items
What is the percentage of the total weight given to food?
Answer:
Multiplying both sides of Eq. (i) by 135 and subtracting Eq. (ii) from (i) we get
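The weight of food can also be found numerically. This sketch assumes that the index for other items is 135, which is only inferred from the multiplier used in the step above and is not stated in the surviving question; the variable names are illustrative.

```python
cpi = 125          # overall consumer price index (given)
food_index = 120   # index for food (given)
other_index = 135  # ASSUMED index for other items, inferred from the solution step

# If food has w% of the total weight and other items have (100 - w)%, then
#   cpi = (food_index * w + other_index * (100 - w)) / 100
# Solving this equation for w:
food_weight = (other_index - cpi) / (other_index - food_index) * 100
other_weight = 100 - food_weight

# Check: the two weighted indices should combine back to the overall CPI
check_cpi = (food_index * food_weight + other_index * other_weight) / 100

print(round(food_weight, 2), round(check_cpi, 2))  # 66.67 125.0
```

Under that assumption, food carries about 66.67% of the total weight.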
Question 19.An enquiry into the budgets of the middle class families in a certain city gave the
following information
Cost of Living Index = 134.50
Thus, prices rose by 34.50% between 1995 and 2004.
Question 20.Record the daily expenditure quantities bought and prices paid per unit of the
daily purchases of your family for two weeks. How has the price change affected your family?
Answer:
This is a practical exercise. Record the daily expenditure, quantities bought and prices paid per
unit of the daily purchases of your family for two weeks and try to analyse whether the quantities
purchased decrease with a rise in the price of the respective items. Also note whether the percentage
change in quantity brought about by a percentage change in price differs for different types of
items.
Question 21.
Given the following data
Source Economic Survey, Government of India 2004-2005
(i) Calculate the inflation rates using different index numbers.
(ii) Comment on the relative values of the index numbers.
(iii) Are they comparable?
(ii) The inflation rate calculated using the CPI for industrial workers with the base year 1982 is the
highest and the inflation rate calculated using the WPI with the base year 1993-94 is the lowest.
(iii) No, the index numbers are not comparable because of the following reasons
Base periods for CPI of industrial workers, urban non-manual workers, agricultural
labourers and WPI are different.
Commodities and their weightage in different index numbers may be different.
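The inflation rate asked for in part (i) is the percentage change in an index from one period to the next. A minimal sketch, with hypothetical index values since the data table is not reproduced here (the function name is only illustrative):

```python
def inflation_rate(previous_index, current_index):
    """Percentage change in an index number from one period to the next."""
    return (current_index - previous_index) / previous_index * 100

# Hypothetical values of one index in two successive years
index_last_year = 120
index_this_year = 126

rate = inflation_rate(index_last_year, index_this_year)

print(round(rate, 2))  # 5.0
```

The rate should be computed separately for each index, using two values of the same index.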