1) Biostatistics Introduction Note
Learning objective
One can understand the importance, use and meaning of Statistics after going through
this module.
STATISTICS
Meaning of Statistics
The word ‘Statistics’ comes from the Latin word ‘status’, the Italian word ‘statista’,
the German word ‘statistik’ or the French word ‘statistique’, each of which means political
state.
In early days, facts and figures about financial resources, births and deaths,
army strength and income were collected for the purpose of efficient
administration; this collection was called statistics, i.e., anything pertaining to the state.
Drvet.in 2020
Nowadays Statistics is not only the science of the state; it plays an important role
in all walks of life and in all branches of scientific enquiry. In fact, statistics has
become one of the essential tools in modern biology.
Usually, the word 'statistics' carries different meanings depending on the
context in which it is used.
o For example, it may mean statistical data, which refers to quantitative
information; statistical method, which means the methods dealing with
quantitative information; or statistical measures of a sample, i.e.,
arithmetic mean, standard deviation etc. of a sample.
By statistical data, we mean the aggregate of facts which are affected by a
multiplicity of causes, numerically expressed, estimated to a reasonable
standard of accuracy and collected in a systematic manner for a
predetermined purpose.
Statistical method includes collection, classification, tabulation, presentation,
analysis and interpretation of data.
Biostatistics is the application of statistical methods to the problems of biology
including human biology, medicine and public health.
Biostatistics is also called Biometry meaning "biological measurement".
Functions of Statistics
Limitations of Statistics
DEFINITIONS
Population
When a few units are selected from a population, the selection is called a sample. (e.g.)
animals of a particular breed in a farm.
Variable
Constant
It is a numerical value which is the same for all the units in the population. (e.g.) number
of credit hours for B.V.Sc students.
Attribute
It refers to the qualitative character of the items chosen. (e.g.) breed of an animal.
Parameter
Statistic
Continuous variable
Learning objective
The learner will get an idea of the ways of collecting and simplifying the data
after going through this module.
COLLECTION OF DATA
A statistical investigation always begins with the collection of data. One can collect the data
either oneself or from available records.
The data collected by the investigator himself or by his agent from the sample or
population are called primary data.
The source from which one gathers primary data is called the primary source.
The data collected from available sources are known as secondary data.
The source from which secondary data are obtained is known as the secondary
source.
PRIMARY DATA
Direct personal observation: The investigator himself goes to the field of enquiry
and collects the data.
Indirect personal observation: The investigator collects data from a third person
(called a witness), who knows about the data being gathered.
Data collection through agents, local reporters etc: Here the investigator
appoints persons, called agents or local reporters, to collect information on
his behalf.
Data collection through questionnaires: The investigator frames the information
needed for the particular study as a set of questions, called a
questionnaire, and sends it to the respondents to collect the data from
them.
give information.
Note
SECONDARY DATA
The data collected from available sources such as published reports, documents,
journals etc. are called secondary data.
The source from which the secondary data are collected is called the secondary
source of data.
While primary data are collected for a specific purpose, secondary data
are gathered from sources which were compiled for some other purpose.
Merits
It saves time, labour and money.
Demerits
CLASSIFICATION OF DATA
Classification is the process of arranging data into sequences and groups according to
their common characteristics or separating them into different but related parts.
Objectives of Classifications
Methods of Classification
Numerical Classification
o Classification of data according to quantitative characters (e.g.,
classification of animals in a farm according to their weight).
Descriptive Classification
o Classification according to attributes, i.e., qualitative characters (e.g.,
classification of animals according to breeds).
Spatial or Geographical Classification
o Classification according to geographical area (e.g., district-wise
livestock population in Tamil Nadu).
Temporal or Chronological Classification
o Classification according to time (e.g., livestock population in different
years).
Classification according to class interval or frequency distribution
o When the data are grouped into classes of appropriate interval, showing
the number in each class, we get a frequency distribution. This is called
grouped data. The original data are called raw data.
The following frequency table shows the distribution of chicks in
different weight classes.
Weight class    Number of chicks
44-48           17
48-52           05
52-56           06
56-60           10
Total           75
Terms used in Frequency distribution
Data are classified or grouped into regular intervals covering the range of values of
the data (Class Intervals), whose lower and upper limits are known as
Class Limits.
True Class Interval
o When the Class Intervals are continuous, i.e., the upper boundary of one
class is the lower boundary of the next, it is called True or Exclusive
Class Interval.
Apparent Class Interval
o When there is a small gap between the upper limit of any class and the
lower limit of the successive class, the Class Interval is called
Apparent or Inclusive Class Interval.
Width or length of the Class Interval is the difference between the upper
boundary and lower boundary of the same class.
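As a small illustration of the terms above, the following Python sketch converts apparent class limits into continuous class boundaries and then computes the class width. The helper names (true_boundaries, class_width), the example limits, and the convention of moving each limit outward by half the gap are assumptions made for illustration; they are not taken from the text.

```python
def true_boundaries(apparent_classes):
    """Convert apparent class limits, e.g. (10, 19), (20, 29), into
    continuous boundaries by closing the gap between successive classes.

    Assumes at least two classes and the same gap between every pair
    (an illustrative convention, not a rule stated in the notes).
    """
    gap = apparent_classes[1][0] - apparent_classes[0][1]
    half_gap = gap / 2
    return [(lower - half_gap, upper + half_gap)
            for lower, upper in apparent_classes]


def class_width(lower_boundary, upper_boundary):
    """Width = upper boundary minus lower boundary of the same class."""
    return upper_boundary - lower_boundary


# Hypothetical apparent classes with a gap of 1 between successive classes.
apparent = [(10, 19), (20, 29), (30, 39)]
boundaries = true_boundaries(apparent)
widths = [class_width(lo, hi) for lo, hi in boundaries]
print(boundaries)
print(widths)
```

Note that the width is taken from the boundaries, not from the apparent limits; using the limits directly would understate it by the size of the gap.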
Class Mark
TABULATION OF DATA
Tables are more comprehensive and intelligible and carry a lasting impression on
the mind of the reader.
Tables facilitate quick comparisons.
Tables facilitate economy of space (while presenting) and time (while reading).
Relationships and other relevant characteristics of items can be easily marked out
in tabulated data.
The title should be short but clear and it should give a full idea of its contents.
The column and row headings should be self explanatory.
Footnotes may be given if absolutely necessary.
Prominence may be given to important facts by different methods of ruling and
spacing.
To have better clarity, space should be left after every five to ten rows.
If the table is taken from secondary data, it is advisable to give a source note for
the table mentioning the source from which the data were collected.
Types of table
The class interval should be of equal width and of such size that the characteristic
features of the distribution are displayed.
Classes should be neither too large nor too small. If too large, considerable
error is introduced by assuming that the midpoint of each class interval is
the average of that class. If too small, there will be many classes with zero
or small frequency. There are, however, certain types of data which
may require the use of unequal or varying class intervals.
When the data are irregularly spread, with wide and fluctuating gaps among the values,
varying class intervals should be used; otherwise some classes may contain
no observations at all.
The range of the classes should cover the entire range of data and the classes
must be continuous.
It is convenient to have the midpoint of the class interval to be an integer. As a
general rule, the number of classes should be in the range of 6-16 and never
more than 30.
First we have to form the class interval. The difference between the maximum and
minimum values in the collected data is noted and divided by the
number of required classes. This value should be rounded off to a convenient
figure.
The number of required classes can be calculated using the formula suggested
either by Sturges' rule or Yule's rule.
Sturges' rule
Yule's rule
K = 2.5 × n^(1/4) (approx.), where K is the number of classes and n is the
number of observations.
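The two rules can be compared with a short Python sketch. Yule's rule is coded as given in the text. The formula for Sturges' rule does not survive in these notes, so the function below uses its commonly quoted textbook form, K = 1 + 3.322 log10(n); treat that, the function names and the sample values as assumptions.

```python
import math


def sturges_classes(n):
    """Sturges' rule in its commonly quoted form (an assumption here,
    since the notes name the rule without giving the formula)."""
    return 1 + 3.322 * math.log10(n)


def yule_classes(n):
    """Yule's rule as given in the text: K = 2.5 * n^(1/4)."""
    return 2.5 * n ** 0.25


def class_width(minimum, maximum, number_of_classes):
    """Range of the data divided by the number of required classes."""
    return (maximum - minimum) / number_of_classes


n = 75  # hypothetical number of observations
k_sturges = round(sturges_classes(n))
k_yule = round(yule_classes(n))
print(k_sturges, k_yule)

# Hypothetical minimum and maximum values; round the result to taste.
print(class_width(40, 60, k_sturges))
```

For 75 observations both rules suggest about seven classes; the computed width would then be rounded to a convenient figure, as described above.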
After forming the class intervals, they are written one below the other, and
for each item in the collected data a stroke is marked against the class interval
in which it falls.
Usually after every four such strokes in a class interval, the fifth item is
indicated by striking through the previous four strokes, thus making it easy to count.
These strokes are counted, and this is called formation of a frequency distribution
by the method of tally marks.
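The tally procedure above can be sketched in Python as follows. The weights are made-up values, the function names are hypothetical, and the sketch assumes that each class includes its lower boundary and excludes its upper boundary; the text does not state which convention it uses.

```python
def frequency_distribution(values, lowest_boundary, width, number_of_classes):
    """Count how many values fall in each class of equal width.

    Assumes each class includes its lower boundary and excludes its
    upper boundary (an illustrative convention).
    """
    counts = [0] * number_of_classes
    for value in values:
        index = int((value - lowest_boundary) // width)
        if not 0 <= index < number_of_classes:
            raise ValueError(f"{value} lies outside the range of the classes")
        counts[index] += 1
    classes = [(lowest_boundary + i * width, lowest_boundary + (i + 1) * width)
               for i in range(number_of_classes)]
    return list(zip(classes, counts))


def tally_marks(count):
    """Text stand-in for tally marks: '||||/' represents a struck group of five."""
    groups = ["||||/"] * (count // 5)
    if count % 5:
        groups.append("|" * (count % 5))
    return " ".join(groups)


# Hypothetical chick weights, not data from the notes.
weights = [41, 43, 45, 47, 47, 50, 52, 55, 58, 59]
table = frequency_distribution(weights, lowest_boundary=40, width=4,
                               number_of_classes=5)
for (lower, upper), frequency in table:
    print(f"{lower}-{upper}  {tally_marks(frequency):<8} {frequency}")
```

Raising an error for values outside the classes reflects the requirement that the classes cover the entire range of the data.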
Array Method
Learning objective
This helps the reader to know about the various ways of representing the data
by means of diagrams and graphs so that the voluminous numerical data can
be exhibited by attractive pictures.
PRESENTATION OF DATA
Introduction
Functions
Limitations
Histogram
Frequency Polygon
If points are plotted with the x co-ordinate equal to the mid value of the class
intervals and the corresponding frequencies as the y co-ordinate and these
points are joined by straight lines, we obtain the frequency polygon.
These points are the midpoints of the tops of the bars in the histogram.
Frequency Curve
If points are plotted with the x co-ordinate equal to the mid value of the class
intervals and the corresponding frequencies as the y co-ordinate and these
points are joined by a smooth curve, we get the frequency curve.
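The points used for both the frequency polygon and the frequency curve are the same: one point per class, at the mid value of the class and the height of its frequency. A minimal Python sketch, with a hypothetical function name and made-up classes, is:

```python
def polygon_points(classes, frequencies):
    """Return (mid value, frequency) pairs, one per class.

    Joining these points with straight lines gives the frequency polygon;
    joining them with a smooth curve gives the frequency curve.
    """
    return [((lower + upper) / 2, frequency)
            for (lower, upper), frequency in zip(classes, frequencies)]


# Hypothetical classes and frequencies, not data from the notes.
classes = [(40, 44), (44, 48), (48, 52), (52, 56), (56, 60)]
frequencies = [2, 3, 1, 2, 2]
points = polygon_points(classes, frequencies)
print(points)
```

The resulting list can be handed to any plotting tool; the sketch deliberately stops at the coordinates so that it does not depend on a particular plotting library.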
Ogive
A frequency distribution gives the number of observations that lie in each class
interval, whereas a cumulative frequency distribution gives the number of
observations that lie below or above any given mark.
When derived from a frequency distribution, the cumulative frequency
distribution of the first kind gives the number of observations less than the lower
boundary of each successive class, and the cumulative frequency distribution of
the second kind gives the number of observations that equal or exceed the lower
boundary of each class. These are known respectively as the less than and
greater than cumulative frequency distributions.
If we draw frequency polygons for the above two distributions, we get the cumulative
frequency polygons (less than and greater than).
If we draw frequency curves for the above two distributions in the same graph, we
get the cumulative frequency curves or Ogives.
The x co-ordinate of the point of intersection of the less than and greater than
cumulative frequency curves is the median.
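The two cumulative distributions and their point of intersection can be sketched in Python. The sketch joins the points with straight lines (cumulative frequency polygons) rather than smooth curves, which is an assumption made to keep the arithmetic simple; with that assumption the two lines cross where the less than cumulative frequency equals half the total. The function names and the data are hypothetical.

```python
from itertools import accumulate


def cumulative_tables(boundaries, frequencies):
    """Return the less than and greater than cumulative tables.

    boundaries has one more entry than frequencies: the lower boundary of
    the first class followed by the upper boundary of every class.
    """
    less = [0] + list(accumulate(frequencies))
    total = less[-1]
    greater = [total - c for c in less]
    return list(zip(boundaries, less)), list(zip(boundaries, greater))


def median_from_ogives(boundaries, frequencies):
    """x co-ordinate where the two cumulative polygons intersect,
    found by linear interpolation (an assumption of this sketch)."""
    less, _ = cumulative_tables(boundaries, frequencies)
    half = less[-1][1] / 2
    for (x0, c0), (x1, c1) in zip(less, less[1:]):
        if c0 <= half <= c1 and c1 > c0:
            return x0 + (half - c0) * (x1 - x0) / (c1 - c0)
    raise ValueError("no intersection found; check the frequencies")


# Hypothetical class boundaries and frequencies, not data from the notes.
boundaries = [40, 44, 48, 52, 56, 60]
frequencies = [2, 3, 1, 2, 2]
less_than, greater_than = cumulative_tables(boundaries, frequencies)
print(less_than)
print(greater_than)
print(median_from_ogives(boundaries, frequencies))
```

At every boundary the two cumulative frequencies add up to the total number of observations, which is why the crossing point falls at half the total.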
Lorenz Curve
This is a modification of the Ogive when the variables and the cumulative
frequencies are expressed as percentages.
It serves to measure the evenness of the distribution and is useful in picturing the
distribution and dispersion of wealth, sales, profits etc.
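One common way to obtain the points of a Lorenz curve is sketched below in Python: the units are arranged in ascending order, and the cumulative percentage of units is paired with the cumulative percentage of the total held by those units. The function name and the sample values are hypothetical, and the construction shown is one usual reading of the description above rather than a method spelled out in the text.

```python
def lorenz_points(values):
    """Return (cumulative % of units, cumulative % of total) pairs,
    starting from (0, 0), with the units taken in ascending order."""
    ordered = sorted(values)
    total = sum(ordered)
    count = len(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for position, value in enumerate(ordered, start=1):
        running += value
        points.append((100 * position / count, 100 * running / total))
    return points


# Hypothetical incomes of four units.
even = lorenz_points([10, 10, 10, 10])
uneven = lorenz_points([1, 3])
print(even)
print(uneven)
```

When every unit holds the same amount, the points lie on the diagonal line of equal distribution; the further the points fall below that line, the less even the distribution.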
Types of a diagram
Line diagram
o This requires vertical lines to be drawn at equal intervals each of length
proportional to the magnitude of the variable for the different items.
o It has no width and hence has a very poor visual effect.
o It makes comparison easy although it is less attractive.
Bar Diagram
o It is the simplest of all statistical diagrams.
o It consists of bars of equal width (all horizontal or vertical) standing on
a common base line at equal intervals, the length of the bars being
proportional to the magnitude of the variable for different items.
Sub-divided bar diagram or component bar diagram
o Sometimes the variable is capable of being sub-divided into two or more
component parts each representing a sub variable.
o In this case, all the bars are subdivided by lines in the same order so
that each subdivision represents the parts in magnitude in the same
scale.
o They are properly coloured or marked differently for visual guidance.
o Small squares should be given below the diagram containing the same
colour or mark to show their significance.
Superimposed or Multiple bar diagram
o Bars may sometimes be superimposed for comparative purpose.
Percentage bar diagram
o When the component parts are expressed in percentages of the whole,
the resulting bar diagram is called a percentage bar diagram.
o In this case all the bars are of equal length.
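For the percentage bar diagram, each component is converted to a percentage of its own bar's total before drawing. A minimal Python sketch, with a hypothetical function name and made-up figures, is:

```python
def percentage_components(components):
    """Express each component of one bar as a percentage of the bar's total."""
    total = sum(components)
    return [100 * component / total for component in components]


# Hypothetical component values for two bars of different totals.
bar_a = percentage_components([20, 30, 50])
bar_b = percentage_components([10, 10, 20])
print(bar_a)
print(bar_b)
```

Both bars add up to 100 per cent, which is why all bars in a percentage bar diagram have the same length even when the original totals differ.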
Pie diagram
Circles with areas proportional to the magnitudes of the data are drawn (i.e., with radii
proportional to the square root of the magnitude of the data), and the
components (sub variables) are drawn as sectors proportional in area to their
magnitudes.
A circle subtends an angle of 360° at the centre, and this represents the total. The
required angle of the sector representing each component is calculated, the sectors are
distinguished by different colours or markings, and a key for these should be given.
It is usual to start from a horizontal radius to the right and proceed in the
anticlockwise direction, giving the quantities in descending order of magnitude,
except the miscellaneous, which is shown at the end.
Lengths have more visual effect than areas, and hence the pie diagram is of less use for
comparative purposes. It is commonly used to represent a single observation with
different components.
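The sector angles and the radius for a pie diagram can be worked out with a short Python sketch. Each angle is the component's share of the total multiplied by 360 degrees, and the radius is taken proportional to the square root of the total. The function names, the label used for the miscellaneous group, the scale factor and the livestock figures are all assumptions made for illustration.

```python
import math


def sector_angles(components, misc_label="Miscellaneous"):
    """Return (label, angle in degrees) pairs in drawing order:
    descending magnitude, with the miscellaneous group placed last."""
    total = sum(components.values())
    ordered = sorted((label for label in components if label != misc_label),
                     key=lambda label: components[label], reverse=True)
    if misc_label in components:
        ordered.append(misc_label)
    return [(label, 360 * components[label] / total) for label in ordered]


def radius_for_total(total, scale=1.0):
    """Radius proportional to the square root of the total, so that the
    area of the circle is proportional to the total."""
    return scale * math.sqrt(total)


# Hypothetical livestock numbers, not data from the notes.
livestock = {"Cattle": 50, "Sheep": 25, "Miscellaneous": 15, "Goats": 10}
angles = sector_angles(livestock)
print(angles)
print(radius_for_total(sum(livestock.values())))
```

The angles always add up to 360 degrees, which gives a quick check on the arithmetic before drawing.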
PICTOGRAM
Learning objective
Readers of this module will come to know the methods of condensing the data
by means of a single figure and comparing two or more distributions.
MEASURES OF AVERAGE
Need of an average