P & S Unit-1 Material

The document discusses the scope and importance of statistics. It explains that statistics is useful for planning, government, mathematics, economics, business, accounting, auditing, insurance, physical sciences, and astronomy. Statistics helps with decision making, forecasting, cost accounting, life tables, and analyzing movement of heavenly bodies.

Uploaded by

Ganesh Degala

Probability and Statistics

UNIT 1 :: DESCRIPTIVE STATISTICS AND METHODS OF DATA SCIENCE

SYLLABUS: Descriptive Statistics and Methods for Data Science: Data science – Statistics
Introduction - Population Vs Sample – Collection of Data - Primary and Secondary Data –
Type of Variable: Dependent and Independent Categorical and Continuous Variables – Data
Visualization – Measures of Central Tendency – Measures of Variability (Spread or Variance) –
Skewness, Kurtosis.

1. DATA SCIENCE:

Q1: State whether statistics is a science or art.

Answer: There have been many arguments over whether statistics is a science or an art. Science
is defined as a systematized body of knowledge. It focuses on cause-and-effect
relationships, expressed in terms of scientific principles or laws, in order to make clear
generalizations. In simple words, science provides the knowledge that is helpful in finding a
way, but it does not show the direction to be chosen. In contrast to science, art refers to the
ability to handle facts in order to attain set goals. It provides ways of handling and
presenting data, making logical judgments and obtaining effective and relevant results.

Earlier, natural scientists were in a dilemma over whether statistics should be considered a
distinct science. In today’s world, all sciences are considered statistical.

When statistics is considered a science, it is not similar to sciences such as
chemistry, physics, zoology and so on. This is because statistical phenomena
are influenced by a multiplicity of causes that cannot be measured accurately. In short,
statistical science is less exact than the natural sciences. Statistics is thus
considered a scientific method, and its tools are widely applied in scientific studies.

“Statistics is not a body of substantive knowledge but a body of methods for obtaining
knowledge”.

Science is knowledge whereas art is action. Thus, from this point of view, statistics can
also be considered an “art”. The basic function of statistics is to apply the given methods in
order to extract facts and results so as to arrive at effective conclusions.

2. STATISTICS INTRODUCTION

Q2: Define statistics and explain its characteristics.

Answer: Definition of Statistics: WEBSTER defined statistics as “the classified
facts representing the conditions of the people in a state, especially those facts which can be
stated in number or in tables of numbers or in any tabular or classified arrangement”.

The simple definition of statistics given by CROXTON and COWDEN is as follows:

“Statistics may be defined as a science of collection, presentation, analysis and
interpretation of numerical data”.

The methods used in the analysis of statistical data are called statistical methods.

Definitions by A.L.BOWLEY are as follows:

1. “Statistics may be called the science of counting”.
2. “Statistics may rightly be called the science of averages”.
3. “Statistics is the science of the measurement of social organism, regarded as a whole in
all its manifestations”.

Characteristics of Statistics:

1. Aggregate of Facts: Statistics is an aggregate of information, i.e., facts and
figures that can be related and compared. A single figure cannot be termed statistics.
For example, if the heights of different students in a class are given, the data can be
compared and conclusions can be drawn about the heights of the students in the class.
Therefore, such data is considered statistics. If only the height of a single
student is given, comparison is impossible and the figure is not considered
statistics. Likewise, single figures concerning the sales of a firm, marks of a student, price,
demand, exports, etc. cannot be considered statistics.
2. Affected by Multiplicity of Causes: In statistics, the figures and facts are significantly
affected by various factors operating together.
3. Statistics are Expressed Numerically: Statistics are numerical statements of facts. In
other words, statistics are expressed in numbers or quantitative terms. For example,
the production level of XYZ manufacturing company increased from 100 tonnes in the
year 2000 to 150 tonnes in the year 2002. A qualitative statement such as “the sales of XYZ
company are increasing year after year” cannot be regarded as statistics.
4. Statistics are Estimated as Per Some Reasonable Standards of Accuracy: Numerical
facts can be collected either by actual counting and measurement or by estimation.
Estimates may or may not be accurate, whereas an actual count or measurement gives
accurate results. For instance, if the strength of a class is counted and found to be
90 students, then it can be said that this is an accurate figure.

If an estimate is made that 10000 people participated in a protest, then it
is an approximate figure; the true number can be either more or less than 10000. In
statistics, attaining mathematical accuracy is a complicated task. Therefore, estimation
has to be carried out by applying reasonable standards of accuracy.

5. Statistics are Gathered in a Systematic Manner: Statistical data must be collected in
a systematic manner. A sequential action plan must be designed before collecting data.
Data collected in an unplanned manner would result in wrong conclusions.
6. Statistics are Collected for a Predetermined Purpose: The purpose for which the
data is collected must be decided first. For instance, if the intention of the study is to
collect data related to the marketing costs of a firm, then the researcher can
attain the expected results only when he/she collects data on the cost of marketing
activities outsourced by the firm or the cost of marketing activities undertaken by the firm
itself.
7. Statistics should be Placed in Relation to Each Other: In order to interpret
statistical data, the numerical facts and figures must be comparable. It must be
remembered that the quantitative terms or numbers which are compared must be
homogeneous in order to get the accurate results. For instance, comparing the weights
of the children with adults would not give accurate results.

Q3: Explain the Scope and importance of statistics.

Answer: Scope and Importance / Applications of Statistics:

1. Statistics in Planning: Statistics in planning may be at the business, economic or
government level. The modern age is regarded as the ‘age of planning’. Planning plays
a prominent role in every organization as it helps in the efficient and effective working of
the organization. The National Sample Survey (NSS) in India was established in 1950 for
collecting statistical data for planning in India.

2. Statistics in State: Statistics was also known as the ‘Science of Statecraft’ and was used
for collecting data for making fiscal and military policies. Nowadays, statistical
data on prices, production, income, consumption, expenditure and profits is used by
the government. It plays a vital role in increasing the welfare of the state.
3. Statistics in Mathematics: Statistics and mathematics are two interrelated
subjects. Both draw on the ‘theory of probability’. The
significant role of mathematics in statistics has given rise to a new branch of statistics
called ‘Mathematical Statistics’. Statistics, according to CONNOR, “is a branch of
applied mathematics which specializes in data”.
4. Statistics in Economics: The relation between statistics and economics was first
explained by William Petty in his book ‘Political Arithmetic’. Statistical techniques
are used in solving economic problems such as production, consumption,
distribution of income, wealth, profits, etc. The relation of mathematics and statistics
with economics has evolved into a new branch called ‘econometrics’.
5. Statistics in Business and Management: Statistics has an extensive role in business
as well as management. According to Prof. Ya-Lun-Chou, “Statistics is a method of
decision-making in the face of uncertainty on the basis of numerical data and
calculated risks”. Most managerial decisions and business forecasting
techniques are based on statistical information.
6. Statistics in Accountancy and Auditing: Statistics has various applications in
accountancy and auditing. An example of the application of statistics in accountancy is
the ‘Method of Inflation Accounting’. Various statistical techniques are used for cost
accounting and auditing purposes.
7. Statistics in Insurance: Statistics in insurance is applied on the basis of probability
theory. The foundations of life insurance were laid by Edmund Halley, who published a
life table in 1693. The problems of life insurance were solved by using the life tables
given by Halley. The success of the insurance industry mainly depends on the application
of statistical data in life tables.
8. Statistics in Physical Sciences and Astronomy: Astronomy is a physical
science in which statistics has various applications. Kepler gave his three famous laws
relating to the movements of heavenly bodies with the help of the statistical data collected
by Tycho Brahe. On the basis of Kepler’s laws, Sir Isaac Newton gave his famous law of
gravitation. Statistics is widely used in physical sciences such as astronomy, engineering,
geology, physics, etc.

9. Statistics in Social Science: According to Prof. A.L. Bowley, “Statistics is the science
of measurement of social organism regarded as a whole in all its manifestations.”
Sampling techniques and estimation theory are the most useful tools of statistics
used in social science for conducting social surveys. An important application of
statistics in sociology is the study of death rates, birth rates, population growth, etc.
10. Statistics in Biology and Medical Sciences: The study of ‘correlation analysis’ given
by Prof. Karl Pearson is purely based on statistics. Moreover, in medical sciences,
statistical data related to the causes and incidence of diseases are of great importance.
For example, statistical papers are used in the study of heart beats through the
electrocardiogram [E.C.G.]. Statistics also has important applications in psychology and
education, which led to ‘psychometry’, and in war. Thus statistics plays a vital role in
various disciplines and has wide managerial applications.

Q4: Explain the factors responsible for the development of statistics in modern
times.

Answer: Today, statistics is considered an important tool for taking decisions
under ‘uncertainty’. Almost every branch of science makes use of statistics to solve
its problems.

The development of statistics is attributed to two main factors, which are:

1. Increased Demand for Statistics: In modern times, there has been tremendous
development in business, science, governmental activities, research, commerce, etc.
Statistics acts as an important tool in formulating appropriate policies in those fields.
In the business context, statistics is essential in solving the problems of complexity
and growing needs. In the case of governmental activities, the demand for statistics has
also increased. Earlier, government was primarily involved in the maintenance of law
and order, but today government is present in almost all spheres. With the
growing number of governmental functions, the need for statistics arises. Similarly, the
advancement of science and extensive research have called for the assistance
of statistics. Thus, the demand for statistics has greatly increased.
2. Decreasing Cost of Statistics: The development of electronic machines like
computers and calculators has reduced the cost and time required for data collection.
Due to this reason, statistics is increasingly used in solving problems. In addition
to this, the development of statistical theory has led to a reduced cost of data
collection and processing. Also, a branch of statistics known as “Design of
experiments” has developed. It is helpful in collecting and analyzing data more
rapidly and economically. Even though many scholars contributed to the
science of statistics, Sir Ronald Fisher (1890-1962) must be credited for the
development and progress of statistics. His contribution has brought outstanding
development in the statistical sciences. Although statistical tools are widely used in
solving problems, at times they are not accurate. It can be concluded that
statistical methods are effective ways of drawing conclusions so as to arrive at
a better result.

Q5: Discuss functions of statistics.

Answer: Functions of Statistics: The various functions of statistics are,

1. Makes the Complicated Data Easily Understandable: In statistics, statistical
methods such as totals, averages, percentages and so on are used to condense huge
data. The huge data is converted into a few significant figures so that the user can
understand the data easily.
2. Studies the Relationship between Two or More Variables: Statistics basically
studies the relationship between two or more variables with the help of correlation
analysis. This study helps to a great extent in forecasting and estimating future
changes. For instance, studying the relationship between supply and demand,
advertisement and sales and so on are essential and helpful in effective planning.
3. Presents the Data in a Definite Form: Statistics uses quantitative statements of facts
rather than vague statements. Quantitative statements of facts are clear and specific.
Therefore, they are easy to understand. For instance, consider the following two
statements.
Statement 1: There is a 10% increase in the Indian population from last year to this
year.
Statement 2: The population in India is increasing.
Statement 1 is clearer and more definite than statement 2 because
statement 1 uses a quantitative figure.
4. Assists in Forecasting Activities: Statistics is an important tool which is used for
analyzing the activities related to commerce, trade and industry. This analysis helps in
ascertaining the trends in trade, commerce and industrial activities, which can be used
as a basis for making forecasts regarding various aspects of the study.
5. Offers a Technique for Comparison of Quantitative Facts: Statistics offers a
technique for comparing quantitative facts. This is one of the important functions
of statistics. For instance, the data relating to sales, import, export, production and so
on is compared location-wise and period-wise. This comparison assists in ascertaining
the performance of economic activities.

6. Assists in Testing and Formulating Theories: Statistics not only assists in testing and
formulating theories in various fields but also assists in measuring the impact of such
theories on various fields. For instance, in agricultural and biological science,
statistical techniques are used for ascertaining the role of growth and development
activities of the plant. Consumer surveys and market surveys, when carried out
effectively, act as the basis for formulating specific and clear production policies.
7. Assists in Formulating Policies: Statistical methods are useful in formulating various
economic and business policies. Surveys conducted with respect to exports, imports,
production, wages and so on are useful in formulating policies and plans in the
respective fields.
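
The first two functions in the list above (condensing data into a few figures, and studying the relationship between two variables) can be sketched in a few lines of Python. This is only an illustrative sketch: the advertising and sales figures are invented, and the helper names `summarize`, `percentage_change` and `pearson_r` are hypothetical names chosen for this example, not part of the original material.

```python
from math import sqrt


def summarize(values):
    """Condense a list of numbers into a total and an average."""
    total = sum(values)
    return {"total": total, "mean": total / len(values)}


def percentage_change(old, new):
    """Percentage change from an old figure to a new figure."""
    return (new - old) / old * 100


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical figures, invented purely for illustration.
advertising = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]

summary = summarize(sales)                        # a few significant figures
growth = percentage_change(sales[0], sales[-1])   # a definite, quantitative statement
r = pearson_r(advertising, sales)                 # strength of the linear relationship

print(summary, growth, r)
```

A value of `r` close to +1 or -1 indicates a strong linear relationship between the two variables, while a value near 0 indicates a weak one.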

Q6: Discuss the limitations of statistics.

Answer: Limitations of Statistics:

1. Statistics are Not Qualitative: Statistics are numerical statements. Statistics is applied
only to those areas which can be measured quantitatively. Hence a statement such as “the
number of students in a college has increased when compared to last year” does not
form statistics.
2. Statistics Does Not Constitute Isolated Facts and Figures: In statistics, the facts and
figures are always aggregate in nature rather than single or isolated. For example, the price
of a single product, the marks of a single student, etc., cannot be regarded as statistics, as
these figures cannot be either related to or compared with each other. Aggregate figures
such as household income, expenditure, and profits and sales of a firm over different
years constitute statistics.
3. Statistical Laws are Probabilistic in Nature: Statistical laws are not exact; they are
probabilistic in nature. The conclusions drawn by following statistical laws give
approximate results/values and not exact figures.
4. Statistics Can be Misused: Statistics can be misused if it is used by non-experts.
The figures can also be misused by politicians or unethical workers who manipulate the
facts for their personal, selfish intentions. Statistics does not prove or disprove
anything. It is only a tool which can be very useful if utilized appropriately. If
it is misused by inexperienced or unethical statisticians, it may result in
false conclusions and may prove highly dangerous to the firm under consideration.

Q7: Explain in detail about the branches of statistics.

Answer: Branches of statistics:

The two main branches of statistics are 1] Descriptive statistics and 2] Inferential statistics.

1. Descriptive Statistics: The statistical methods included in this branch are collection,
presentation and characterization of data so as to explain the different characteristics of
the set of data.
The various methods of descriptive statistics include,
(i) Graphic Method: Bar charts, pie charts and line graphs.
(ii) Numeric measures: Dispersion, kurtosis, measures of central tendency and
skewness.
2. Inferential Statistics: Inferential statistics includes the statistical methods involving
the estimation of population characteristics or decision making regarding the
population based on the sample results. The term population refers to a large group of
units about which inferences are to be made, whereas a sample is a fraction, portion or
subset of the population.
Inferential statistics is classified as
(i) Parametric Statistics: It is based on the assumption that the population from
which the sample is drawn is normally distributed. It can only be used when the
data collected is on a ratio scale or interval scale.
(ii) Non-parametric Statistics: It makes no explicit assumption about the normality of
the distribution in the population. It is typically used when the data is collected on
an ordinal or nominal scale. When data is sought for a large number of elements such
as companies, households, customers, products, voters or individuals, there arises
the need for sampling to draw conclusions about the population. Hence the
data is collected from a small portion of the population (i.e., a sample) due to time, cost
and other concerns.

The concept of inferential statistics can be clearly understood from the following
definitions.

(a) Process: A process is a set of rules that collectively
transform inputs into outputs. Example: Banking transaction.
(b) Population: It is the set of elements or observations associated with the phenomenon
under study, about which better comprehension and knowledge is required.
(c) Statistical Variable: It is a feature / characteristic of a population / process,
defined operationally. It describes the quantity to be measured or observed.
(d) Sample: It is a set of a few elements or observations of a process or population.
(e) Parameter: It is a descriptive measure related to a statistical variable that outlines
the features of the entire population.
(f) Statistic: It is a numerical quantity that outlines the features of a sample drawn
from a population.
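
The difference between a parameter and a statistic can be illustrated with a short Python sketch. The population below is simulated and entirely hypothetical (heights in cm generated with an assumed mean of 165 and spread of 8); the point is only that the parameter is computed from all N elements, while the statistic is computed from the n sampled elements.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: simulated heights (cm) of N = 1000 students.
population = [random.gauss(165, 8) for _ in range(1000)]
N = len(population)

# Parameter: a descriptive measure of the entire population.
population_mean = mean(population)

# Sample: a few elements drawn from the population.
n = 50
sample = random.sample(population, n)

# Statistic: a numerical quantity describing the sample only.
sample_mean = mean(sample)

print(f"N = {N}, population mean (parameter) = {population_mean:.2f}")
print(f"n = {n}, sample mean (statistic)     = {sample_mean:.2f}")
```

In inferential statistics the sample mean is used as an estimate of the population mean, which in practice is usually unknown.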
3. POPULATION Vs SAMPLE

Q8. Define the term population and sample. What are the ways into which
samples are classified?

Answer: POPULATION: The term population refers to the group of objects or observations
about which inferences are to be made. Population size, denoted as “N”, represents the number
of objects or observations in the population. A population may be finite or infinite depending
upon N being finite or infinite.

Examples: 1] Engineering students in Andhra Pradesh

2] Budget of India.

SAMPLE: The term sample refers to a finite subset of the population. Sample size is
represented as ‘n’ denoting the number of objects or observation in the sample.

Examples: 1] Engineering students of SACET

2] Budget of Andhra Pradesh.

TYPES OF SAMPLES: There are two ways into which samples are classified.

(a) Large Sampling: If the sample comprises 30 or more objects [i.e., n >= 30],
it is known as large sampling.
(b) Small Sampling: If the sample comprises fewer than 30 objects [i.e., n < 30], it is
known as small sampling.
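
The rule above amounts to a single comparison against 30. A minimal sketch follows; the function name `classify_sample` and its `threshold` argument are hypothetical names introduced for illustration.

```python
def classify_sample(n, threshold=30):
    """Classify a sample of size n as 'large' (n >= threshold) or 'small' (n < threshold)."""
    if n < 1:
        raise ValueError("sample size must be a positive integer")
    return "large" if n >= threshold else "small"


print(classify_sample(45))
print(classify_sample(12))
```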

Q9: What are the different methods of sampling?

Answer: Methods of Sampling: The different methods of sampling are (i) Purposive sampling
(ii) Random sampling (iii) Simple sampling (iv) Stratified sampling.

1. Purposive Sampling: If the sample elements are selected with some purpose, then it
is said to be purposive sampling.
Example: If there is a complaint about the defectiveness of components produced,
then only the elements which are defective are considered, whereas the others are not
considered. This is purposive sampling.
2. Random Sampling: If every element in the sample space has an equal chance of
being included in the sample, then it comes under random sampling.

3. Simple Sampling: It is a special type of random sampling in which the chance of
selecting an element does not depend on the previous selections made.
Example: A selection made with a coin or a die comes under simple sampling.
4. Stratified Sampling: A heterogeneous population is divided into subpopulations, or
strata, each of which is more homogeneous within itself than the whole population.
Example: The population of people watching movies can be subdivided into strata of
people watching Hindi movies and people watching English movies. After dividing into
strata, a random selection of individuals is made from each stratum.
The aggregate of the sampled individuals of each stratum is known as a stratified sample.
This technique of selection is known as the stratified sampling technique.
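
The random, simple and stratified methods described above can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions: the population of 60 movie viewers is invented, the function names are hypothetical, random sampling is modelled as drawing without replacement, and simple sampling is modelled as independent draws with replacement (like repeated throws of a die).

```python
import random


def random_sample(population, n, seed=None):
    """Random sampling: every element has an equal chance of inclusion (no repeats)."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def simple_sample(population, n, seed=None):
    """Simple sampling: each draw is independent of the previous draws (repeats allowed)."""
    rng = random.Random(seed)
    return rng.choices(population, k=n)


def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Stratified sampling: split into strata, then sample randomly within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], n_per_stratum))
    return sample


# Hypothetical population of 60 viewers, each tagged with the language they watch.
viewers = [(f"viewer{i}", "Hindi" if i % 3 else "English") for i in range(60)]

random_pick = random_sample(viewers, 10, seed=1)
simple_pick = simple_sample(viewers, 10, seed=1)
stratified_pick = stratified_sample(viewers, lambda v: v[1], 5, seed=1)

print(random_pick)
print(simple_pick)
print(stratified_pick)
```

Here the stratified sample always contains exactly five Hindi viewers and five English viewers, whereas the composition of the other two samples varies from draw to draw.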
4. COLLECTION OF DATA, PRIMARY AND SECONDARY DATA:

Q10: What do you mean by data collection? What are the different types of
data?

Answer: Data Collection: The collection of data simply refers to the process of gathering all the
facts and figures related to a particular subject that is under investigation. These facts and
figures are often referred to as data. In other words, “Meaningful facts that are expressed in a
quantitative form can be termed as data”.

Data may be needed in any situation for making decisions. The success or failure of any
statistical investigation depends upon the accuracy and reliability of the available data.

Therefore, data collection is considered the basic activity in the decision-making
process.

TYPES OF DATA:

Data can be classified into two types based on the sources from where it is collected.
They are, 1] Primary Data 2] Secondary Data.

1. Primary Data: Primary data refers to data collected specifically for the
purpose of the research problem. It is first-hand information collected by the research
firm or by an external agent with the objective of solving a research problem. There are
different methods of collecting primary information. Researchers can conduct
experiments to gather the required information. Other methods include questionnaires,
mail, and interviews of individuals, families, organizations, representatives, etc.
For example, the data from a study of the working conditions of labourers in a big industry,
conducted by the investigator himself or by his agent, is known as primary data.

2. Secondary Data: Primary data for one party sometimes acts as secondary data for
another party. Secondary data refers to existing data that has been collected
with an objective other than the current research. It could be data collected by the firm
itself for some other purpose, or by an external party for the same or another research
problem.
For example, inventory records maintained by a firm as a part of its routine operating
function act as internal secondary data for evaluating the seasonal
demand for its product. Alternatively, data gathered on industry demand by a
marketing research firm can also be used for the same research problem.

Q11: Explain the methods of collecting primary data with merits and demerits.

Answer: The following are the various methods of collecting primary data,

1. Direct personal investigation
2. Indirect oral investigation
3. Information through local sources or correspondents
4. Questionnaires sent via mail
5. Investigation by enumerators.

1. Direct Personal Investigation: In this method, the investigator collects the data
according to his/her own requirements either by direct or by personal contact. Here,
the investigator must be present on the spot and collect the data by himself. This
method is very important as the investigator himself inspects the situation, by avoiding
unnecessary collection of data. This method is used only when,
(i) The area of investigation is limited
(ii) The area of conclusion is required
(iii) Secrecy of data is required
(iv) The problem is complicated.

Merits:

(i) It results in originality and accuracy of data.
(ii) This is the most suitable method if the area of investigation is limited.
(iii) As the data is collected by a single person, the investigator, similarity and
homogeneity in the data is present.
(iv) Checking of data is automatically done at the time of data collection.
(v) Allied information is also collected by the investigator present on the spot.

Demerits:

(i) It is not practically applicable in case of extensive enquiries.
(ii) It requires too much labor and time.
(iii) The prejudice of the investigator influences the data to a great extent.
(iv) It is not scientific.
2. Indirect Oral investigation:
In this method, the investigator does not contact the person directly. Instead of asking
questions to him the investigator makes enquiries from other persons [or third parties]
who have complete knowledge about that person. This method is generally used by
commission and enquiry committees.
This method is used only when
(i) The investigator faces difficulty to contact the concerned person or on making
contact the person refuses to give information or shows his disinterest in
providing the desired information.
(ii) The area of investigation is vast.
(iii) The investigation is required to be kept confidential from the related person.

MERITS:

(i) This method is easy.
(ii) It is very useful in extensive enquiry.
(iii) It is the most inexpensive method as little amount of time, labor and money is
required.
(iv) It is free from the prejudices of the investigator.

DEMERITS:

(i) The data collected depends upon the indirectly obtained information i.e., the
information is gathered from those persons that are not related with the facts
and so the results are likely to be inaccurate and unreliable.
(ii) The information received is not free from the prejudices and ignorance of the
informers.

3. Information through Local Sources or Correspondents:


In this method, the investigator engages local people or correspondents who obtain
information in their own way and communicate the same to the investigator. Generally,
these informers obtain the information according to their own guess and experience
rather than collecting it. Hence, the accuracy and reliability of this information is
doubtful. This method is generally adopted by newspapers and magazines or where
accuracy in investigation is not so important.

MERITS:
(i) This method is applicable in situations where the field of enquiry is extensive
and places of enquiry are scattered.
(ii) This method is economical as it saves time, labor and money.

DEMERITS:

(i) The information obtained by this method is non-homogeneous as it is collected
by different correspondents using different methods.
(ii) The information is not reliable and accurate as it depends upon guesses.
(iii) The delay in sending information often makes the investigation ineffective.
4. Questionnaires Sent Via Mail:
In this method, the investigator prepares a list of questions related to every aspect of
the problem under investigation and places them in a certain order. This is called a
‘questionnaire’. It is sent by mail to the related persons with a letter that requests
answers and explains the reasons for and utility of the investigation. The
informants then send their replies to the questionnaire to the investigator. On certain
occasions the informant requests the investigator to keep the information secret and the
investigator assures him about the secrecy of the information. This method is used in case
of income tax or sales tax returns.
MERITS:
(i) This method can be applied for the vast area of investigation since the replies of
questionnaires can be obtained via mail from any corner of the world.
(ii) Since the information is received directly from the related persons, it is original,
reliable and free from prejudices.
(iii) It is economical as it saves time and labor to a great extent.

DEMERITS:

(i) The greatest disadvantage of this method is the unwillingness of a person to
reply to or fill in the questionnaire due to carelessness or fear of exposure.
(ii) This method requires the informers to be educated and intelligent enough to
understand and reply to the questions, which limits the area of investigation.
(iii) The non-availability of answers to all questions compels the investigator to
modify his investigation according to the data available.
(iv) The results obtained from the data collected by this method sometimes go wrong
due to improper replies to questions which are not very clear.
5. Investigation by Enumerators:
This method is an improvement over the previous method. Here some enumerators are
appointed who contact the related persons and, after making enquiries from them, fill in
the schedules [i.e., blank forms with questions printed and space for noting the replies].
These enumerators are allotted to different areas, so that each person is contacted by
only one enumerator.
This method is adopted by the government, especially during the census.
MERITS:
(i) By this method the information can be obtained from the uneducated persons
too.
(ii) Homogeneous information is obtained because the schedules are fully explained
to each enumerator.
(iii) This method can be applied to large areas of investigation.
(iv) The replies are obtained from all the related persons since the enumerators
contact them personally.
(v) The prejudices of individual enumerators do not matter much, because a large
number of enumerators is employed.

DEMERITS:

(i) If the enumerators are inefficient and careless, the results obtained from the
information given by them are wrong and unreliable.
(ii) This is the most uneconomical method as it requires more time, labor and
money. Hence, it is generally adopted by the government organizations only.

Q12: Write the advantages and disadvantages of primary data.

Answer: Advantages of Primary Data:

1. It eliminates the possibility of copying errors, as it does not involve copying
figures or text from other material.
2. Primary data provides more accurate and dependable information, suited to the
objects and purpose of the investigation.

Disadvantages of Primary Data:

1. Gathering data from a primary source requires a large number of planning and
execution processes, which take a lot of time.
2. The correctness of primary data relies on the honesty, integrity and sincerity of the
investigator carrying out the investigation and on the amount of response from the
related persons.
3. Primary data must be gathered by efficient, skillful, intelligent, tactful, sincere
and trained investigators, which often leads to complexities such as time and money problems.
4. There is always a possibility of improper information being provided because of a lack of
integrity among investigators and the related persons.

Q13: Discuss about the editing of primary data.

Answer: Editing of Primary Data: Immediately after gathering the data from primary or
secondary sources, the data must be edited. This editing process involves identification of
errors and mistakes present in the collected data. The degree of accuracy and the extent of
acceptable errors are decided early to avoid any confusion in later stages. However, editing of
primary data is an extensive process.

The primary data can be edited on the basis of the following factors.

1. Completeness 2. Consistency 3. Uniformity 4. Accuracy.

1. Completeness: The editor should check whether the answer to each and every
question in the questionnaire is furnished or not. If any question is found to be
unanswered, the respondent must be contacted to obtain the answer to that
question. If the editor fails to contact the respondent, ‘No report’ must be
marked under that question. Moreover, if the question that remains unanswered is
of vital importance, the editor should discard the questionnaire itself.
2. Consistency: The editor should check whether the answers to the questions in the
questionnaire are mutually contradictory or not. If any contradictory answers are found,
the editor must take the necessary action to obtain the correct answers. This
problem can be solved either by referring back to the questionnaire or by contacting
the respondent.
3. Uniformity: The editor should check whether all the respondents have answered the
questions in the same sense or not, because sometimes a single question is
understood in different ways by different respondents.
For example, consider a question on salary. Different respondents may take this
question in different senses. That is, some may answer with their yearly salary
while others give their monthly salary.
Therefore, if a question is found to be answered by different respondents in different
senses, the answers to that question must be reduced to some common base.
4. Accuracy: The editor should check whether the questionnaire that has been received
provides correct information or not. If the information received is incorrect,
the conclusions of the investigation may be misleading. Therefore,
appropriate action must be taken to detect and correct wrong information. Editing
data for accuracy is one of the most complex tasks, but it must be carried out
in order to obtain reliable conclusions from an investigation.
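
The completeness and uniformity checks lend themselves to a small script. The following is a minimal Python sketch, assuming hypothetical questionnaire records whose field names (`id`, `salary`, `period`, `age`) are invented for illustration; it is not a prescribed editing procedure.

```python
# Hypothetical questionnaire records; the field names are illustrative assumptions.
records = [
    {"id": 1, "salary": 30000, "period": "monthly", "age": 34},
    {"id": 2, "salary": 420000, "period": "yearly", "age": 41},
    {"id": 3, "salary": None, "period": "monthly", "age": 29},
]

def is_complete(record):
    """Completeness: every question in the record has an answer."""
    return all(value is not None for value in record.values())

def yearly_salary(record):
    """Uniformity: reduce salary answers to a common (yearly) base."""
    factor = 12 if record["period"] == "monthly" else 1
    return record["salary"] * factor

# Records with a missing answer are set aside for follow-up with the respondent.
incomplete_ids = [r["id"] for r in records if not is_complete(r)]

# Complete records are brought to the common yearly base.
uniform = {r["id"]: yearly_salary(r) for r in records if is_complete(r)}

print(incomplete_ids)  # [3]
print(uniform)         # {1: 360000, 2: 420000}
```

Here the third record would be referred back to the respondent, while the first two become comparable once both salaries are expressed per year.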

Q14: List and explain the various sources of secondary data.

Answer: Sources of Secondary Data: The major sources of secondary data are of two types.
They are [1] Published sources [2] Unpublished sources.

1. Published Sources: The main sources of published data are,
(i) Government Publications: From the reports of central or state governments
or of the various commissions appointed by them, such as the Statistical Abstract of
India or the Wanchoo Commission’s report on taxation etc.,
(ii) Semi-official Publications: From the reports of municipalities or various
boards like Khadi-Gramodyog Board or U.P. Education Board etc.,
(iii) Business Publication: From the publication of various banks, trade journals,
annual reports of chambers of commerce, market reports of stock exchanges
etc.,
(iv) News Agencies: From newspapers, magazines, journals etc.,
(v) Publications of International Bodies: From the publications of international
bodies or foreign governments, like the Statistical Yearbook published by the U.N.
International agencies of this kind include the United Nations Organization (UNO), World Health
Organization (WHO), International Labour Organization (ILO), International
Monetary Fund (IMF), World Bank, etc., All these agencies provide valuable
statistical data on a variety of socio-economic and political events.
2. Unpublished Sources: Not all statistical data is published. Much of the statistical
data produced by government, semi-government, private and public organizations is
drawn from their internal records. This data is authentic and is much cheaper when
compared to primary data. Some examples of internal records include employee payrolls,
the amount of raw materials supplied, cash receipts and cash books. It is, however,
very difficult to get access to unpublished data.
Great care should always be taken while using secondary data. Other people’s
statistics, especially, are full of pitfalls for the user unless used with caution. It is never safe to
take published statistics at their face value without knowing their meaning and
limitations, and it is always necessary to criticize arguments based on them.

Q15. Discuss about editing of secondary data.


Answer: Editing of Secondary Data: The editing process carried out on secondary data is
much simpler than the editing of primary data. However, secondary data must be used
with utmost care. In other words, secondary data must be used only if it is reliable, accurate,
adequate and compatible with the problem under investigation. All the inconsistencies, errors
and omissions present in the secondary data must be eliminated before using it. The process
of examining the secondary data for any inconsistencies, errors, omissions etc., can also be
referred to as ‘scrutinization of secondary data’. Thus it can be said that it is never safe to use
secondary data without proper scrutinization.

The following are the factors that must be considered while using the secondary data.

1. Reliability of data 2. Compatibility of data 3. Adequacy of data.


1. Reliability of Data: The reliability of data can be judged from the
reliability, integrity and expertise of the collecting organization. In addition, the
reliability of the information sources and of the methods used for obtaining the data must
also be considered. The collecting organization must also be examined to identify its
intention behind the collection, compilation and presentation of the data. Furthermore,
it must be verified that the data was not collected during a period of natural
calamity, economic slowdown etc., because such data will differ from the data of normal
times. Apart from this, the sample from which the data was obtained must be of
adequate size and should reflect the characteristics of the entire population. The data
should be as free from sampling errors and bias as possible; this is helped by employing
trained, experienced and unbiased investigators. The population parameters must also be
carefully considered, so that the accuracy of the data is maintained.
2. Compatibility of Data: After verifying the reliability of the collected data, it must be
checked for whether it is suitable for the investigation or not. This can be done by
comparing the objectives, nature and scope of the current enquiry with those of the original
investigation. The terms and units defined in the original investigation must be clear
and uniform and must also be suitable for the current enquiry. Differences in the
time of collection and in the homogeneity of conditions between the current enquiry and the
original investigation must also be considered.
3. Adequacy of Data: Even after verifying the reliability and compatibility of the
secondary data, it must not be used until it is checked for adequacy. The current
enquiry cannot be carried out if the available data is inadequate, that is, if the coverage
of the original investigation is narrower or wider than the desired one.
Example:

Consider data on the consumption of petrol and oil in Andhra Pradesh. This data will be
inadequate if the enquiry needs to measure the consumption of oil and petrol for the entire
country, because consumption may vary from state to state. Similarly, if only the
consumption data for the entire country is available, it is difficult to identify the
consumption in each individual state.

Therefore, for obtaining the accurate and reliable data, the secondary data must be
subjected to a thorough scrutiny and complete editing process before it is used.

5] TYPE OF VARIABLE: DEPENDENT AND INDEPENDENT, CATEGORICAL AND
CONTINUOUS VARIABLES

Q16: Define variable. Discuss various types of variables.

Answer: Variable: Statistical data is gathered from observations of several
individuals, and each observation collected as data is distinct.

A variable is a quantity or quality that varies from one individual to the
other in the same population. The observations collected for a variable are called variates.
Since the value a variable takes for a given individual cannot be predicted in advance,
these variables are treated as random variables.

Example: Ages of patients, heights of adult males etc.,

A variable can be dependent or independent.

(i) Independent Variable: It is also called the experimental variable, as it is the variable
that is varied in an experiment to capture its effect on the dependent variable.
(ii) Dependent Variable: It is also called the outcome variable, as its outcome is
dependent on the independent variable.

There are two different types of variables. They are, (a) Quantitative variable
(b) Qualitative [or] categorical variable.

(a) Quantitative Variable: When variables are differentiated from each other
by measurements, they are called quantitative variables. That means the
observation carried out on each individual is expressed as a quantity or a number.
Example: Blood pressure, height, age, body temperature etc.,
There are two different types of quantitative variables. They are, (i) Continuous variables
(ii) Discontinuous variables.
(i) Continuous Variables: These variables can take any value within a particular
range, including fractional values. Most biological and medical
variables are continuous variables. Example: Height, weight, age etc.,
(ii) Discontinuous Variables: These variables (also called discrete variables) take only
fixed numerical values, without any intermediate values between them. Example:
Number of children in a family, pulse rate, and blood pressure or blood
sugar when recorded as whole-number readings etc.,
(b) Qualitative [or] Categorical Variable: A variable whose value cannot be measured but
which can be used to categorize individuals on the basis of some quality is called a qualitative
variable.
Example: Marital status, qualification, sex etc., of an individual or the color of a flower
etc.,
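
The distinction between qualitative, discontinuous and continuous variables can be illustrated with a rough rule of thumb in Python. This is only a sketch: the function name `classify_variable` and the sample values are assumptions made for illustration, and the rule looks only at how the values were recorded, so it would mislabel a continuous variable such as height if it happened to be recorded in whole centimetres.

```python
def classify_variable(values):
    """Rough heuristic based on recorded values; not a statistical test."""
    # Labels such as "married" cannot be measured, only categorized.
    if any(isinstance(v, str) for v in values):
        return "qualitative"
    # Whole numbers only: no intermediate values were recorded.
    if all(float(v).is_integer() for v in values):
        return "quantitative, discontinuous"
    # At least one fractional value was recorded.
    return "quantitative, continuous"

marital_status = ["single", "married", "married"]   # illustrative data
children_per_family = [2, 0, 3, 1]                  # illustrative data
heights_cm = [162.5, 170.2, 158.0]                  # illustrative data

print(classify_variable(marital_status))       # qualitative
print(classify_variable(children_per_family))  # quantitative, discontinuous
print(classify_variable(heights_cm))           # quantitative, continuous
```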

6] DATA VISUALIZATION

Q17: Explain in detail about data visualization and also list its major
advantages.

Answer: Data Visualization: Data visualization is the process of converting numeric data into
meaningful images that are easily interpreted by humans. In other words, it can be
defined as a graphical representation of information that provides the viewer with a qualitative
interpretation of the information. It is a study of visual representation, wherein the information is
abstracted in some schematic format. Earlier, the field of data visualization was related to
information graphics and statistical graphics, but now it has become important in the areas of
research, teaching and development.

The main purpose of employing this technique is to reduce the difficulty in
understanding the textual data which is generally collected from numerous sources like
satellites, surveys or computer simulations. Current visualization techniques depend on
computer graphics to represent the data in an innovative way, so that the complete
information becomes obvious as well as easy to understand via images and animations. The
common visualization techniques include charts, graphs, plots, maps, 3D surfaces etc.,

The human brain can process visual information, consisting of different physical
objects, more quickly than textual information. The primary
function of a data visualization tool is to help users examine complex data sets. This
analysis is done by mapping the data to visual properties such as transparency,
curvature, speed, color and lighting effects.
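
As a small illustration of turning numbers into a picture, the following Python sketch draws a text bar chart using only the standard library. The survey answers and the function name `bar_chart` are assumptions made for this example; a real project would more likely use a plotting library.

```python
from collections import Counter

# Hypothetical survey answers on mode of travel (illustrative data).
answers = ["bus", "car", "bus", "walk", "bus", "car", "bus", "walk", "car", "bus"]

def bar_chart(values):
    """Return one text bar per category, most frequent category first."""
    counts = Counter(values)
    width = max(len(label) for label in counts)
    return [
        f"{label.ljust(width)} | {'#' * n} {n}"
        for label, n in counts.most_common()
    ]

chart = bar_chart(answers)
print("\n".join(chart))
# bus  | ##### 5
# car  | ### 3
# walk | ## 2
```

Even in this crude form, the relative lengths of the bars show the pattern (most respondents travel by bus) faster than the raw list of answers does.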

Advantages of Data Visualization: Following are the advantages of data visualization,

1. Data visualization helps in identifying the hidden patterns present within the data.
2. It helps in developing polling applications that enable people to focus only on the
important issues.
3. It helps the organization to gain a competitive advantage by identifying the important
trends in corporate and market data.
4. It plays an important role in big data and advanced analytics projects.
5. It has become an effective standard for modern business intelligence.
6. Data visualization tools have been important in democratizing data and analytics and
making data-driven insights available to workers in an organization.
7. It is typically easier to operate than traditional statistical analysis software or earlier
versions of business intelligence software.
8. It assists in including interactive capabilities, enabling users to drill into the data for
querying and analysis.

Q18: State the various application areas of data visualization techniques.

Answer: Application areas of data visualization techniques are as follows,

1. Telecommunication: Data visualization is used in the telecommunication areas where
the visualized information helps in performing the following:
(i) Managing the network operations.
(ii) Managing the call-center environment.
(iii) Analyzing the service policy.
(iv) Analyzing the market status.
(v) Analyzing the load patterns.
2. Retail Businesses: It is used in the retail businesses where visualized information
helps in performing the following:
(i) Analyzing the customer / product cross-selling.
(ii) Creating a strategy to determine the product’s price.
3. Banking Sector: It is used in banking areas where visualized information helps in
performing the following:
(i) Managing the E-banking
(ii) Determining the risks obtained through credit card transactions.
4. Insurance Sector: It is used in insurance areas where visualized information helps in
performing the following:
(i) Managing the assets and liabilities.
(ii) Modeling and calculating the insurance risks and premiums.
(iii) Managing and analyzing the workflows.
5. Government Sector: It is used in the government areas where visualized information
helps in performing the following:
(i) Analyzing the budget.
(ii) Managing the resources.
(iii) Analyzing the economic cost.
(iv) Detecting the illegal actions.
6. Transportation Sector: It is used in the transportation areas where visualized
information helps in performing the following:
(i) Analyzing the outputs.
(ii) Utilizing the assets.
(iii) Managing the movements.
7. Capital Markets Sector: It is used in capital market areas where visualized
information helps in performing the following:
(i) Evaluating the risks.
(ii) Trading the derivatives.
(iii) Marketing the retail products.
(iv) Creating institutional sales systems.
8. Healthcare Sector: It is used in the healthcare and medical areas where visualized
information helps in performing the following:
(i) Analyzing the remedies / therapies.
(ii) Analyzing claims, i.e., the requests made for compensation in terms of an
insurance policy.
9. Asset Management Sector: It is used in asset management areas where visualized
information helps in performing the following:
(i) Analyzing portfolio performance.
(ii) Optimizing portfolios.
(iii) Allocating assets.

7] MEASURES OF CENTRAL TENDENCY

Q19: What do you mean by measures of central tendency? List out the
characteristics of a good measure of central tendency.

Answer: Measures of Central Tendency: Measures of central tendency are also known as
averages. An average is a single value, lying between the smallest and the largest observations,
that represents the whole of the given data. The five important measures of central tendency are
Arithmetic mean, Median, Mode, Geometric mean and Harmonic mean. All these averages can
be calculated for individual, discrete and continuous series.

Characteristics of a Good Measure of Central Tendency:

Some of the characteristics of a good measure of central tendency are as follows:

(i) It should be clearly defined and must not be ambiguous.
(ii) It should be easy to understand and simple to calculate.
(iii) All observations must be taken into consideration while calculating the average
value.
(iv) It should not be unduly affected by extreme observations.
(v) It should be capable of further mathematical or algebraic treatment.

Q20: What is arithmetic mean? Write its merits and demerits.

Answer: Arithmetic Mean: Arithmetic mean is the most popular measure of central tendency
and is used only in the case of quantitative data. It is simply referred to as the ‘Mean’. It is defined as
the sum of all observations divided by the total number of observations. Generally, the mean of a
variable X is denoted as $\bar{X}$.

The computation of mean can be performed in different ways for different types of data
based on the way they are distributed. Basically there are two ways in which data can be
distributed. They are named as ungrouped (or individual series) data and grouped data.

Merits: The arithmetic mean is the most widely used measurement of central tendency due to
the following merits,

1. It is simple to understand.
2. It is simple to compute.
3. It is rigidly defined (i.e., unique).
4. It is least affected by fluctuations of sampling.
5. It considers all the observations.
6. It can be used for comparison.

Demerits: Following are the demerits of arithmetic mean,

1. Extreme items can affect the arithmetic mean excessively.
2. The accurate mean cannot be obtained if even a single value is unknown.
3. It is not possible to compute the mean for open-end classes unless the size of the
class intervals is assumed.
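
The first demerit is easy to see numerically. The sketch below uses Python's standard `statistics.mean` on made-up figures (the income values are illustrative assumptions, not data from the text) and shows how one extreme item pulls the mean far away from the bulk of the observations.

```python
from statistics import mean

incomes = [20, 22, 24, 26, 28]   # illustrative values
with_extreme = incomes + [300]   # the same series with one extreme item added

mean_without = mean(incomes)      # (20 + 22 + 24 + 26 + 28) / 5 = 24
mean_with = mean(with_extreme)    # 420 / 6 = 70

print(mean_without, mean_with)
```

With the extreme item included, the mean (70) is larger than every one of the five original observations, so it no longer describes a typical value of the series.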

Q21. Discuss about the computation of mean for ungrouped data.

Answer: Mean for Ungrouped Data: Ungrouped data is raw data that has not undergone any
statistical treatment.

If X is a variable with N observations $x_1, x_2, \dots, x_N$, then the
mean of X can be calculated by any of the following methods.

(i) Direct Method (or) Actual Mean Method: In this method, the mean is obtained simply
by adding the values of all observations in the given series and dividing the sum by the total
number of observations in that series. It is mathematically denoted as,

$$\bar{X} = \frac{\sum x_i}{N}$$

where, $\sum x_i = x_1 + x_2 + \dots + x_N$
N = Number of observations.
(ii) Short-cut Method (or) Indirect Method (or) Assumed Mean Method: For data
containing a large number of observations or large figures, the calculation of the mean by
the direct method becomes complex. This complexity can be reduced by using the short-cut
method. In the short-cut method, an arbitrary value from the given data set is assumed
as the mean, and the deviation of each observation from it is taken. After
this, all the deviations are added and the sum is divided by the total number of
observations. Finally, the resultant value is added to the assumed mean to
obtain the actual mean. It is mathematically denoted as,

$$\bar{X} = A + \frac{\sum d_i}{N}$$

Where, A = Assumed Mean
$\sum d_i = d_1 + d_2 + \dots + d_N$
$d_i = x_i - A$
N = Number of observations.
(iii) Step Deviation Method: The calculation of the mean can be further simplified by
using the step deviation method. In this method, all the deviations obtained from the
assumed mean are divided by a common factor (say ‘C’) of all the given observations.
The formula to calculate the mean by step deviation is given as,

$$\bar{X} = A + \frac{\sum d'_i}{N} \times C$$

Where, A = Assumed Mean
$\sum d'_i = d'_1 + d'_2 + \dots + d'_N$
$d'_i = \dfrac{x_i - A}{C}$
C = Common factor of observations
N = Number of observations.
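
The three methods for ungrouped data can be checked against each other with a short Python sketch. The function names, the data, the assumed mean A = 20 and the common factor C = 10 are all illustrative assumptions; any choice of A and non-zero C should give the same mean.

```python
def mean_direct(xs):
    """Direct method: sum of the observations divided by their number."""
    return sum(xs) / len(xs)

def mean_shortcut(xs, a):
    """Short-cut method: assumed mean a plus the average deviation from a."""
    deviations = [x - a for x in xs]
    return a + sum(deviations) / len(xs)

def mean_step_deviation(xs, a, c):
    """Step deviation method: deviations from a are first divided by c."""
    step_deviations = [(x - a) / c for x in xs]
    return a + c * sum(step_deviations) / len(xs)

observations = [10, 20, 30, 40, 50]  # illustrative data

print(mean_direct(observations))                      # 30.0
print(mean_shortcut(observations, a=20))              # 30.0
print(mean_step_deviation(observations, a=20, c=10))  # 30.0
```

With A = 20 the deviations are -10, 0, 10, 20, 30 (sum 50), and with C = 10 the step deviations are -1, 0, 1, 2, 3 (sum 5), so the arithmetic shrinks at each stage while the answer stays at 30.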

Q22: Discuss about the computation of mean for grouped data.

Answer: Mean for Grouped Data: Grouped data is data organized in a tabular format
containing different groups. There are two forms of grouped data: discrete series and
continuous series.

I. Mean for Discrete Series: A discrete series consists of the values of a variable, without
class intervals, together with the frequencies of those values. If $x_1, x_2, \dots, x_N$ are the
different values of the variable with frequencies $f_1, f_2, \dots, f_N$ respectively, then the
mean for the discrete series can be calculated by any of the following methods.
(i) Direct Method:

$$\bar{X} = \frac{\sum f_i x_i}{\sum f_i}$$

where, $\sum f_i x_i = f_1 x_1 + f_2 x_2 + \dots + f_N x_N$
$\sum f_i = f_1 + f_2 + \dots + f_N$
(ii) Short-cut Method:

$$\bar{X} = A + \frac{\sum f_i d_i}{\sum f_i}$$

where, A = Assumed mean from the observations and $d_i = x_i - A$
$\sum f_i d_i = f_1 d_1 + f_2 d_2 + \dots + f_N d_N$
$\sum f_i = f_1 + f_2 + \dots + f_N$
(iii) Step Deviation Method:

$$\bar{X} = A + \frac{\sum f_i d'_i}{\sum f_i} \times C$$

Where, A = Assumed Mean
$\sum f_i d'_i = f_1 d'_1 + f_2 d'_2 + \dots + f_N d'_N$
$d'_i = \dfrac{x_i - A}{C}$; C = Common factor of observations; $\sum f_i$ = Total frequency.
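
For a discrete series, the same three methods apply with each value weighted by its frequency. The following is a minimal Python sketch under assumed data; the function names, values, frequencies, A = 20 and C = 10 are illustrative and do not come from the text.

```python
def discrete_mean_direct(xs, fs):
    """Direct method: sum of f*x divided by the total frequency."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)

def discrete_mean_shortcut(xs, fs, a):
    """Short-cut method: assumed mean a plus sum of f*d over total frequency."""
    return a + sum(f * (x - a) for x, f in zip(xs, fs)) / sum(fs)

def discrete_mean_step(xs, fs, a, c):
    """Step deviation method: d' = (x - a) / c, result scaled back by c."""
    weighted_steps = sum(f * (x - a) / c for x, f in zip(xs, fs))
    return a + c * weighted_steps / sum(fs)

values = [10, 20, 30, 40]      # illustrative values of the variable
frequencies = [2, 3, 4, 1]     # illustrative frequencies, total 10

print(discrete_mean_direct(values, frequencies))            # 24.0
print(discrete_mean_shortcut(values, frequencies, a=20))    # 24.0
print(discrete_mean_step(values, frequencies, a=20, c=10))  # 24.0
```

Here the sum of f·x is 20 + 60 + 120 + 40 = 240 over a total frequency of 10, giving 24; the short-cut and step deviation routes reach the same figure through sums of 40 and 4 respectively.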

II. Mean for Continuous Series: In a continuous series, the values of the variable X are given
in terms of class intervals. Therefore, the mid-point (m) of each class is considered, and
using these mid-points and the given frequencies the calculation of the mean is
performed in the same way as for a discrete series.

If $l_1 - u_1, l_2 - u_2, \dots, l_N - u_N$ are different class intervals with frequencies
$f_1, f_2, \dots, f_N$, then the mean for this series can be calculated by any one of the
following methods.
(i) Direct Method:

$$\bar{X} = \frac{\sum f_i m_i}{\sum f_i}$$

where, $m_i = \dfrac{l_i + u_i}{2}$ [Mid-point of each class interval]
$\sum f_i m_i = f_1 m_1 + f_2 m_2 + \dots + f_N m_N$
$\sum f_i = f_1 + f_2 + \dots + f_N$
(ii) Short-cut Method:

$$\bar{X} = A + \frac{\sum f_i d_i}{\sum f_i}$$

where, A = Assumed mean from the mid values and
$d_i = m_i - A$ [Deviation of mid value from assumed mean]
$\sum f_i d_i = f_1 d_1 + f_2 d_2 + \dots + f_N d_N$
$\sum f_i = f_1 + f_2 + \dots + f_N$
(iii) Step Deviation Method:

$$\bar{X} = A + \frac{\sum f_i d'_i}{\sum f_i} \times C.I.$$

Where, A = Assumed Mean from the mid values
f = Frequency; d = Deviation; d' = Step Deviation
$\sum f_i d'_i = f_1 d'_1 + f_2 d'_2 + \dots + f_N d'_N$
$d'_i = \dfrac{m_i - A}{C.I.}$; C.I. = Common Interval (class width); $\sum f_i$ = Total frequency.

NOTE: In continuous series, the class intervals can be of three types:

(i) Exclusive: Ex. 0-10, 10-20 and so on
(ii) Inclusive: Ex. 0-9, 10-19 and so on
(iii) Unequal: Ex. 0-10, 10-30 and so on

For all such class intervals, the arithmetic mean is calculated in a similar manner.
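
The continuous-series calculation can be sketched in Python by first reducing each class to its mid-point. The class intervals, frequencies, function names, A = 15 and C.I. = 10 below are illustrative assumptions, using exclusive classes of equal width.

```python
def class_midpoints(intervals):
    """Mid-point m = (l + u) / 2 of each class interval (l, u)."""
    return [(lower + upper) / 2 for lower, upper in intervals]

def continuous_mean_direct(intervals, fs):
    """Direct method: sum of f*m divided by the total frequency."""
    mids = class_midpoints(intervals)
    return sum(f * m for m, f in zip(mids, fs)) / sum(fs)

def continuous_mean_step(intervals, fs, a, ci):
    """Step deviation method with d' = (m - a) / ci, scaled back by ci."""
    mids = class_midpoints(intervals)
    weighted_steps = sum(f * (m - a) / ci for m, f in zip(mids, fs))
    return a + ci * weighted_steps / sum(fs)

classes = [(0, 10), (10, 20), (20, 30), (30, 40)]  # illustrative exclusive classes
frequencies = [4, 6, 8, 2]                         # illustrative frequencies, total 20

print(class_midpoints(classes))                                  # [5.0, 15.0, 25.0, 35.0]
print(continuous_mean_direct(classes, frequencies))              # 19.0
print(continuous_mean_step(classes, frequencies, a=15, ci=10))   # 19.0
```

The sum of f·m is 20 + 90 + 200 + 70 = 380 over a total frequency of 20, which gives 19; with A = 15 the step deviations are -1, 0, 1, 2 and the weighted sum is 8, leading to 15 + (8 / 20) × 10 = 19 as well.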

Q23: State the properties of arithmetic mean.

Properties of Arithmetic Mean:

1. The sum of the deviations of a series of values taken from their arithmetic mean is
zero. i.e., if $x_1, x_2, \dots, x_N$ are a set of values with frequencies $f_1, f_2, \dots, f_N$, then

$$\sum_{i=1}^{N} f_i (x_i - \bar{x}) = 0$$

2. The sum of the squared deviations of a set of values is minimum when the deviations are
taken from the arithmetic mean.

3. If $n_1$ and $n_2$ are the sizes of two data sets with means $\bar{x}_1$ and $\bar{x}_2$ respectively,
then the combined mean is defined as

$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$
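
These three properties can be verified numerically. The Python sketch below uses assumed data (the values, frequencies and group sizes are illustrative, and the helper names are hypothetical); it checks the properties on one example rather than proving them.

```python
values = [10, 20, 30, 40]     # illustrative values
frequencies = [2, 3, 4, 1]    # illustrative frequencies
n = sum(frequencies)
mean = sum(f * x for x, f in zip(values, frequencies)) / n  # 24.0

def sum_of_deviations(about):
    """Sum of f * (x - about) over the whole series."""
    return sum(f * (x - about) for x, f in zip(values, frequencies))

def sum_of_squared_deviations(about):
    """Sum of f * (x - about)^2 over the whole series."""
    return sum(f * (x - about) ** 2 for x, f in zip(values, frequencies))

def combined_mean(n1, mean1, n2, mean2):
    """Mean of two data sets taken together."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)

# Property 1: deviations from the mean add up to zero.
print(sum_of_deviations(mean))  # 0.0

# Property 2: squared deviations are smaller about the mean than about 23 or 25.
print(sum_of_squared_deviations(mean))  # 840.0
print(sum_of_squared_deviations(23), sum_of_squared_deviations(25))  # 850 850

# Property 3: 40 items with mean 50 combined with 60 items with mean 60.
print(combined_mean(40, 50, 60, 60))  # 56.0
```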
