RMBS BPT402

The document outlines concepts related to research methodology and bio-statistics. It discusses various measures of central tendency (mean, median, mode), measures of variation (range, standard deviation), probability, correlation, regression, sampling, and vital statistics. Key summary measures covered include the mean, median, mode, quartiles, range, and standard deviation. Methods of data collection, presentation, and calculating various statistical values are also presented.

Uploaded by

Atul Dahiya
Research Methodology and Bio-Statistics
BPT-402
Syllabus
• Measures of central tendency or measures of location: mean, median and mode in ungrouped and grouped series.
• Partition values: quartiles, deciles and percentiles in ungrouped and grouped series. Graphical determination of median, mode and partition values.
• Measures of skewness: Pearson’s and Bowley’s coefficients of skewness.
• Measures of dispersion or variation: range, mean deviation, standard deviation.
• Probability: random experiment, sample space, events, probability of an event, addition and multiplication laws of probability, use of permutations and combinations in calculation of probabilities, random variable, probability distribution of a random variable, binomial distribution.
• Correlation: bivariate distribution, scatter diagram, coefficient of correlation, calculation and interpretation of correlation coefficient.
• Regression: lines of regression, calculation of regression coefficient.
• Sampling variability and significance: sampling distribution, standard error, null hypothesis, alternative hypothesis, Type I and Type II errors, tests of significance, acceptance and rejection of null hypothesis, level of significance, Z test, t test (paired and unpaired), chi-square test.
• Estimation of confidence limits and intervals.
• Vital statistics
  1) Rates and ratios of vital events.
  2) Measures of mortality: crude death rate, specific death rate, age-specific death rate, standardized death rates, infant mortality rate.
  3) Measures of fertility: crude birth rate, general fertility rate, specific fertility rate, age-specific fertility rate and total fertility rate.
  4) Measurement of population growth: crude rate of natural increase and Pearl’s vital index, gross reproduction rate, net reproduction rate.
  5) Measures of morbidity: morbidity incidence rate, morbidity prevalence rate.
  6) Life tables or mortality table.
Summary Measures
• Central Tendency: Mean, Median, Mode, Geometric Mean
• Quartiles
• Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Collection and Presentation of Data
• Collection
  ◦ Data: foundation of statistical analysis and interpretation
  ◦ Data sources: primary and secondary, internal and external records
• Presentation
  ◦ Classification
    - Chronological, geographical, qualitative, quantitative
    - Quantitative: frequency distribution
    - Class intervals: class limits, class mid-point, inclusive and exclusive methods
  ◦ Tabulation
  ◦ Charting
Some important concepts
• Variable
  ◦ Continuous: obtained by measurement (height, weight, etc.)
  ◦ Discrete/discontinuous: obtained by counting (number of rooms, number of persons)
• Frequency
Measures of Central Tendency
• The tendency of data to cluster around some central value
• Averaging
  ◦ Example: average age, average weight
• Why averaging:
  ◦ To get one single value that describes the characteristics of the entire data
  ◦ To facilitate comparison
Measures of Central Tendency
• Mean
  ◦ Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
  ◦ Population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
• Median
• Mode
Mean/ Arithmetic Mean
• Average
• Obtained by adding together all the observations and dividing this total by the number of observations
Mean (Arithmetic Mean)
• Mean (arithmetic mean) of data values
  ◦ Sample mean (n = sample size):
    $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
  ◦ Population mean (N = population size):
    $\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$
Mean (Arithmetic Mean) (continued)
• The most common measure of central tendency
• Affected by extreme values (outliers)
[Figure: two dot plots on number lines. The first, on an axis from 0 to 10, has Mean = 5; the second, on an axis extending to 14, has Mean = 6.]
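The effect of an extreme value on the mean can be checked with a minimal Python sketch. The two data sets below are assumptions for illustration only (the points plotted on the slide are not recoverable); they are chosen so that the means come out as 5 and 6, as on the slide.

```python
from statistics import mean

# Assumed illustrative data, not taken from the slide.
without_outlier = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]  # largest value replaced by an extreme one

mean_without = mean(without_outlier)
mean_with = mean(with_outlier)
print(mean_without, mean_with)
```

Changing a single observation from 9 to 14 pulls the mean up by a full unit.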
Mean (Arithmetic Mean) (continued): From a Frequency Distribution
• Approximating the arithmetic mean
• Used when raw data are not available
  $\bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n}$
  n = sample size
  c = number of classes in the frequency distribution
  m_j = midpoint of the jth class
  f_j = frequency of the jth class
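A short Python sketch of the grouped-data approximation follows. The function name `grouped_mean` and the class midpoints and frequencies are hypothetical, chosen only to illustrate the formula.

```python
def grouped_mean(midpoints, frequencies):
    """Approximate the mean as sum(m_j * f_j) / n, where n = sum(f_j)."""
    n = sum(frequencies)
    return sum(m * f for m, f in zip(midpoints, frequencies)) / n

# Assumed classes 0-10, 10-20, 20-30, 30-40 (midpoints 5, 15, 25, 35).
midpoints = [5, 15, 25, 35]
frequencies = [2, 3, 4, 1]
approx_mean = grouped_mean(midpoints, frequencies)
print(approx_mean)
```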
Median
• Robust measure of central tendency
• Not affected by extreme values
[Figure: two dot plots on number lines, one on an axis from 0 to 10 and one on an axis extending to 14; Median = 5 in both.]
• In an ordered array, the median is the ‘middle’ number
  ◦ If n or N is odd, the median is the middle number
  ◦ If n or N is even, the median is the average of the 2 middle numbers
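Median
• Positional average
• Appears in the middle of an ordered sequence of values
• Not influenced by extreme values
• Being a positional average, it does not take all observations into consideration
• Not capable of algebraic treatment
• Calculation of median (ungrouped data)
  Median = size of the ((N + 1)/2)th observation
• Calculation of median (grouped data)
  Use N/2 to locate the median class
  $\text{Median} = L + \frac{N/2 - \text{p.c.f.}}{f} \times i$
  where L = lower limit of the median class, p.c.f. = cumulative frequency of the class preceding the median class, f = frequency of the median class, i = width of the median class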
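Both calculations can be sketched in Python as below. The function names and the frequency table are hypothetical; the grouped version assumes continuous classes of equal width given by their lower limits.

```python
def median_ungrouped(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def median_grouped(lower_limits, width, frequencies):
    """Median = L + ((N/2 - p.c.f.) / f) * i for the class that contains N/2."""
    n = sum(frequencies)
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and cumulative + f >= n / 2:
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("empty frequency table")

# Assumed classes 0-10, 10-20, 20-30, 30-40.
grouped = median_grouped([0, 10, 20, 30], 10, [3, 5, 8, 4])
print(median_ungrouped([7, 1, 5, 3, 9]), grouped)
```

Here N/2 = 10 falls in the class 20-30 (p.c.f. = 8, f = 8), so the median is 20 + (2/8) × 10 = 22.5.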
Related Positional Measures
• Divide the series into a number of equal parts
  ◦ Quartiles (Q) divide the total frequency into 4 equal parts
  ◦ Deciles (D) divide the total frequency into 10 equal parts
  ◦ Percentiles (P) divide the total frequency into 100 equal parts
• 3 quartiles (Q1 to Q3)
• 9 deciles (D1 to D9)
• 99 percentiles (P1 to P99)
Calculation of Quartiles, Deciles and Percentiles
• $Q_j = L + \frac{jN/4 - \text{p.c.f.}}{f} \times i$, j = 1 to 3
• $D_k = L + \frac{kN/10 - \text{p.c.f.}}{f} \times i$, k = 1, 2, ..., 9
• $P_l = L + \frac{lN/100 - \text{p.c.f.}}{f} \times i$, l = 1, 2, ..., 99
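Since the three formulas differ only in the divisor (4, 10 or 100), one helper can cover all of them. The sketch below is a hypothetical illustration with an assumed frequency table; `partition_value` is not a library function.

```python
def partition_value(lower_limits, width, frequencies, j, parts):
    """Return L + ((j*N/parts - p.c.f.) / f) * i for the class containing j*N/parts.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    n = sum(frequencies)
    target = j * n / parts
    cumulative = 0  # p.c.f.: cumulative frequency before the current class
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("empty frequency table")

# Assumed classes 0-10, 10-20, 20-30, 30-40 with N = 20.
lowers, width, freqs = [0, 10, 20, 30], 10, [3, 5, 8, 4]
q1 = partition_value(lowers, width, freqs, 1, 4)
q3 = partition_value(lowers, width, freqs, 3, 4)
d5 = partition_value(lowers, width, freqs, 5, 10)
p90 = partition_value(lowers, width, freqs, 90, 100)
print(q1, q3, d5, p90)
```

The fifth decile D5 coincides with the median, which is a quick check on any implementation.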
Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• There may not be a mode
• There may be several modes
[Figure: two dot plots on number lines. The first, on an axis from 0 to 6, has no mode; the second, on an axis from 0 to 14, has Mode = 9.]
Calculation of Mode
• Ungrouped data
  ◦ Tally marks
• Grouped data
  $Mo = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$
  L = lower limit of the modal class
  ∆1 = difference between the frequency of the modal class and the frequency of the pre-modal class
  ∆2 = difference between the frequency of the modal class and the frequency of the post-modal class
  i = size (width) of the modal class
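The grouped-data formula can be sketched as follows. The function name and the frequency table are assumptions for illustration; the sketch takes the class with the highest frequency as the modal class and treats a missing neighbouring class as having frequency 0.

```python
def grouped_mode(lower_limits, width, frequencies):
    """Mo = L + (d1 / (d1 + d2)) * i for the class with the highest frequency."""
    k = frequencies.index(max(frequencies))  # position of the modal class
    f1 = frequencies[k]
    f0 = frequencies[k - 1] if k > 0 else 0                     # pre-modal class
    f2 = frequencies[k + 1] if k + 1 < len(frequencies) else 0  # post-modal class
    d1 = f1 - f0
    d2 = f1 - f2
    return lower_limits[k] + d1 / (d1 + d2) * width

# Assumed classes 0-10, 10-20, 20-30, 30-40.
mode_value = grouped_mode([0, 10, 20, 30], 10, [3, 5, 8, 4])
print(round(mode_value, 2))
```

With modal class 20-30, ∆1 = 8 - 5 = 3 and ∆2 = 8 - 4 = 4, so Mo = 20 + (3/7) × 10.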
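Relationship between Mean, Median and Mode
• In a moderately skewed distribution: Mode = 3 Median - 2 Mean (Karl Pearson’s empirical relationship)
• In a symmetrical distribution the three averages coincide; in a skewed distribution they differ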
Measures of Variation
• The extent to which observations vary from one another and from some average value
• Spread/variation
• Measures the amount of variation, not the direction of variation
[Figure: three pairs of distributions. A: same mean, different variations. B: different means, same variation. C: different means, different variations.]
Why measure variation?
• To determine the reliability of an average
• To serve as a basis for controlling variability
• To compare two or more series on the basis of variability
• To make the use of other statistical measures possible
Methods of Measuring Variation
• Range
• Interquartile Range
• Average Deviation
• Standard Deviation
• Lorenz Curve
Absolute and Relative Measures
• Absolute: expressed in the same statistical units as the original data
  ◦ Used to compare two distributions with the same units and almost the same average value
• Relative: used to compare distributions with different statistical units or different average values
  ◦ Coefficient: a pure number independent of the unit of measurement
• Measure of relative variation (coefficient of variation) = measure of absolute variation / average
1. Range
• Simplest method of studying variation
• Difference between the value of the largest observation and the value of the smallest observation
• Formulae
  ◦ Ungrouped data: Range = L - S, where L = largest value and S = smallest value
  ◦ Coefficient of Range = (L - S)/(L + S)
  ◦ Grouped data: Range = upper limit of the highest class - lower limit of the lowest class
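As a quick illustration, the range and its coefficient for ungrouped data can be computed as below; the observations are assumed values.

```python
# Assumed illustrative observations.
values = [12, 7, 25, 18, 3]

largest, smallest = max(values), min(values)
value_range = largest - smallest                            # L - S
coefficient_of_range = value_range / (largest + smallest)   # (L - S) / (L + S)
print(value_range, round(coefficient_of_range, 3))
```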
Limitations
• Most unreliable measure of variation
• Not based on each and every observation
• Subject to fluctuations of considerable magnitude from sample to sample
• Cannot tell us about the character of the distribution within the 2 extreme observations
2. Interquartile Range and Quartile Deviation
• The range which includes the middle 50% of the observations, i.e. it excludes one quartile at the lower end and one quartile at the upper end
• Interquartile Range = Q3 - Q1
• Quartile Deviation: the average amount by which the two quartiles differ from the median (an absolute measure)
• Semi-interquartile range or Quartile Deviation = (Q3 - Q1)/2
• Coefficient of QD = (Q3 - Q1)/(Q3 + Q1)
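A minimal Python sketch of these three measures follows. The data are assumed, and the quartiles come from the standard-library `statistics.quantiles`, whose default "exclusive" method places Q_j at position j(n + 1)/4 of the ordered data; other quartile conventions can give slightly different values.

```python
from statistics import quantiles

# Assumed illustrative observations.
values = [2, 4, 6, 8, 10, 12, 14]

q1, q2, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
interquartile_range = q3 - q1
quartile_deviation = (q3 - q1) / 2
coefficient_of_qd = (q3 - q1) / (q3 + q1)
print(q1, q3, interquartile_range, quartile_deviation, coefficient_of_qd)
```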
Limitations
• Ignores 50% of the items
• Not capable of mathematical manipulation
• A measure of partition rather than of variation
Average Deviation
• Calculated by taking the absolute deviations of each observation from the mean or the median and then averaging these deviations by taking their arithmetic mean
• Limitation: signs are ignored
• AD(Med) = (∑|X - Med|)/N
• AD(Mean) = (∑|X - Mean|)/N
• Coeff. of AD(Med) = AD(Med)/Median
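The two versions of the average deviation can be compared with a short sketch; the observations are assumed and chosen so that the mean and the median differ.

```python
from statistics import mean, median

# Assumed illustrative observations (mean 4, median 3).
values = [1, 2, 3, 4, 10]
n = len(values)

med = median(values)
avg = mean(values)
ad_median = sum(abs(x - med) for x in values) / n  # deviations taken from the median
ad_mean = sum(abs(x - avg) for x in values) / n    # deviations taken from the mean
coefficient_ad_median = ad_median / med
print(ad_median, ad_mean, round(coefficient_ad_median, 3))
```

The deviation measured from the median (2.2) is smaller than the one measured from the mean (2.4), as expected, since the sum of absolute deviations is least when taken about the median.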
Standard Deviation
• Introduced by Karl Pearson in 1893
• Most widely used measure in studying variation
• Measures the spread or variability present in a sample; a small value means the observations are very close to each other
• Also called ‘root mean square deviation’
• $\sigma = \sqrt{\text{Variance}}$
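Calculation of Standard Deviation
• Ungrouped data
  1. Deviations from actual mean: $\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$
  2. Deviations from assumed mean: $\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}$, where d = X - A and A is the assumed mean
• Grouped data
  1. Deviations from actual mean: $\sigma = \sqrt{\frac{\sum f (m - \bar{X})^2}{N}}$, where m is the class midpoint and $N = \sum f$
  2. Assumed mean: $\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2}$, where d = m - A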
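The sketch below checks, for ungrouped data, that the actual-mean and assumed-mean methods give the same result. The function names and data are hypothetical, and both functions divide by N (the population form used on these slides), not by N - 1.

```python
from math import sqrt

def sd_actual_mean(values):
    """sqrt(sum((X - mean)^2) / N)."""
    n = len(values)
    m = sum(values) / n
    return sqrt(sum((x - m) ** 2 for x in values) / n)

def sd_assumed_mean(values, assumed):
    """sqrt(sum(d^2)/N - (sum(d)/N)^2) with d = X - assumed mean."""
    n = len(values)
    d = [x - assumed for x in values]
    return sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)

# Assumed illustrative observations (mean 5).
values = [2, 4, 4, 4, 5, 5, 7, 9]
sd_1 = sd_actual_mean(values)
sd_2 = sd_assumed_mean(values, assumed=4)
print(sd_1, sd_2)
```

Any assumed mean gives the same answer; choosing one close to the true mean only keeps the hand arithmetic small.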
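Mathematical Properties of Standard Deviation
• 1. Combined S.D.: $\sigma_{12} = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}}$, where $d_1 = \bar{X}_1 - \bar{X}_{12}$ and $d_2 = \bar{X}_2 - \bar{X}_{12}$
• 2. S.D. of the first N natural numbers: $\sigma = \sqrt{\frac{N^2 - 1}{12}}$
• 3. The sum of squares of deviations of all the observations from their arithmetic mean is minimum
• 4. In a symmetrical (normal) distribution:
  Mean ± 1σ covers 68.27% of the observations
  Mean ± 2σ covers 95.45% of the observations
  Mean ± 3σ covers 99.73% of the observations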
Merits & Limitations
• Merits
  ◦ High degree of accuracy
  ◦ Takes into account all the observations
  ◦ Capable of further mathematical treatment
• Limitation
  ◦ Gives more weightage to extreme values
Correcting Incorrect Value of SD
• Example:
  N = 100
  Mean = 40
  SD = 5
  The computer by mistake took the value 50 in place of 40 for one of the observations.
  Find the correct mean and variance.
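One way to work this example is to recover the totals ∑X and ∑X² implied by the wrong mean and SD, swap the wrong observation for the correct one, and recompute. The Python sketch below follows that route; the variable names are illustrative, and the variance uses the divide-by-N form.

```python
n = 100
wrong_mean = 40
wrong_sd = 5
wrong_value, correct_value = 50, 40

# Totals implied by the reported figures: sum(X) = N * mean,
# sum(X^2) = N * (SD^2 + mean^2).
wrong_sum = n * wrong_mean
wrong_sum_sq = n * (wrong_sd ** 2 + wrong_mean ** 2)

# Remove the wrong observation and add the correct one.
correct_sum = wrong_sum - wrong_value + correct_value
correct_sum_sq = wrong_sum_sq - wrong_value ** 2 + correct_value ** 2

correct_mean = correct_sum / n
correct_variance = correct_sum_sq / n - correct_mean ** 2
print(correct_mean, round(correct_variance, 2))
```

The corrected mean is 3990/100 = 39.9 and the corrected variance is 1616 - 39.9² = 23.99, so the corrected SD is about 4.9.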
MEASURES OF SKEWNESS & KURTOSIS
• The measures of central tendency and variation discussed so far do not reveal the entire story about a frequency distribution
• Two distributions may have the same mean and SD but differ in the shape of the distribution
SKEWNESS
• In a symmetrical distribution, mean, median and mode are equal
• In a skewed distribution, these values differ
  ◦ Mean > Mode: positively skewed
  ◦ Mean < Mode: negatively skewed
Difference between Variation and Skewness
• Variation tells about the amount of variation; skewness tells about the direction of variation
• Variation is more practically applicable than skewness
Measures of Skewness
• Skewness is lack of symmetry or departure from symmetry
• Measures: absolute and relative
• The farther apart the mean and mode, the higher the skewness
• The distance between mean and mode is Karl Pearson’s basis for measuring skewness
• Relative skewness = absolute skewness / SD
• Karl Pearson’s coefficient of skewness: $Sk_P = \frac{\text{Mean} - \text{Mode}}{\sigma}$
• Bowley’s coefficient of skewness: $Sk_B = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$
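The two coefficients can be sketched as small Python functions. The formulas used are the standard textbook ones, (Mean - Mode)/SD for Karl Pearson and (Q3 + Q1 - 2 Median)/(Q3 - Q1) for Bowley; the function names and input figures are assumptions for illustration.

```python
def pearson_skewness(mean, mode, sd):
    """Karl Pearson's coefficient: (Mean - Mode) / SD."""
    return (mean - mode) / sd

def bowley_skewness(q1, median, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2 * Median) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)

# Assumed summary figures for two hypothetical distributions.
sk_p = pearson_skewness(mean=50, mode=44, sd=12)
sk_b = bowley_skewness(q1=20, median=30, q3=50)
print(sk_p, round(sk_b, 4))
```

Both values are positive, which indicates positive skewness (mean above mode, upper quartile farther from the median than the lower one).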
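Moments
• Used to describe the characteristics of a distribution. They represent a convenient and unifying method for summarizing many of the most commonly used descriptive statistical measures such as central tendency, variation, skewness and kurtosis.
• The r-th moment about the mean is represented by $\mu_r$
• Ungrouped data: $\mu_r = \frac{\sum (X - \bar{X})^r}{N}$
• Grouped data: $\mu_r = \frac{\sum f (m - \bar{X})^r}{N}$, where m is the class midpoint and $N = \sum f$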
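Kurtosis
• Shows the degree of concentration: whether the values are concentrated in the area around the mode (peaked curve) or spread away from the mode towards both tails of the frequency curve (flat-topped curve)
• From the Greek ‘kurtosis’, meaning ‘bulginess’
• Measured relative to the peakedness of a normal curve
Measures of Kurtosis
• $\beta_2 = \frac{\mu_4}{\mu_2^2}$, where $\mu_2$ and $\mu_4$ are the second and fourth moments about the mean
• $\beta_2 = 3$ for a normal (mesokurtic) curve, $\beta_2 > 3$ for a peaked (leptokurtic) curve and $\beta_2 < 3$ for a flat-topped (platykurtic) curve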
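Moments about the mean and the usual moment-based coefficients can be sketched as follows. The names `central_moment`, `beta_1` and `beta_2` and the data are illustrative assumptions; β1 = μ3²/μ2³ and β2 = μ4/μ2² are the standard moment measures of skewness and kurtosis.

```python
def central_moment(values, r):
    """r-th moment about the mean: sum((X - mean)^r) / N."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** r for x in values) / n

# Assumed illustrative observations (mean 5).
values = [2, 4, 4, 4, 5, 5, 7, 9]

mu_2 = central_moment(values, 2)  # variance
mu_3 = central_moment(values, 3)
mu_4 = central_moment(values, 4)

beta_1 = mu_3 ** 2 / mu_2 ** 3    # moment measure of skewness
beta_2 = mu_4 / mu_2 ** 2         # moment measure of kurtosis (3 for a normal curve)
print(mu_2, mu_3, mu_4, round(beta_1, 4), beta_2)
```

For these data β2 is below 3, so the distribution is flatter than the normal curve.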
CORRELATION
• Topics to be discussed:
  ◦ Bivariate distribution
  ◦ Scatter diagram
  ◦ Coefficient of correlation
  ◦ Calculation and interpretation of correlation coefficient
Correlation
• “The tool with the help of which the relationship between two or more than two variables is studied is called correlation”
• So far we have discussed problems relating to one variable only
• Correlation is for 2 or more than 2 variables
• Measure = coefficient of correlation
• Denoted by ‘r’
Why use correlation?
• To measure in one figure the degree of relationship existing between the variables
• With the help of regression we can estimate the value of one variable given the value of another
• To list and locate critically important variables on which others depend
Types of Correlation
• Positive and negative
• Simple, partial and multiple
• Linear and non-linear
Methods of Studying Correlation
• Scatter diagram
• Karl Pearson’s coefficient of correlation
• Spearman’s rank correlation coefficient
• Method of least squares
1. Scatter Diagram
• Also called dot chart or dotogram
• The data are plotted on a graph to form an idea of whether the variables are related or not
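2. Karl Pearson’s Coefficient of Correlation
• Most widely used
• Denoted by ‘r’
• Deviations from actual mean:
  $r = \frac{\sum xy}{N \sigma_x \sigma_y}$ or $r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$, where $x = X - \bar{X}$ and $y = Y - \bar{Y}$
• When original observations are used instead of deviations:
  $r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$
• r = +1: perfect positive correlation
• r = -1: perfect negative correlation
• r = 0: no correlation
• Taking deviations from assumed mean
• Correlation between bivariate grouped data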
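The deviation form and the raw-observation form of the coefficient should agree, which the following Python sketch checks on assumed data. The formulas are the standard ones, r = ∑xy / √(∑x² ∑y²) with deviations from the means, and the equivalent expression in ∑X, ∑Y, ∑XY, ∑X², ∑Y²; the function names are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """r from deviations about the actual means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def pearson_r_raw(xs, ys):
    """r computed directly from the original observations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Assumed illustrative paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_dev = pearson_r(xs, ys)
r_raw = pearson_r_raw(xs, ys)
print(round(r_dev, 4), round(r_raw, 4))
```

Here ∑xy = 6, ∑x² = 10 and ∑y² = 6, so r = 6/√60 ≈ 0.77, a fairly strong positive correlation.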
Properties of Correlation Coefficient
• Lies between -1 and +1
• Independent of change of origin and scale
• Geometric mean of the two regression coefficients
• r = 0 in case of independent variables
• Limitations
  ◦ Assumes a linear relationship
  ◦ Affected by extreme values
  ◦ Time-consuming method
Regression Analysis
• Topics to be studied
  ◦ Lines of regression
  ◦ Calculation of regression coefficient
Regression
• “The statistical tool with the help of which we are in a position to estimate (or predict) the unknown values of one variable from known values of another variable”
• Dictionary meaning: act of returning or going back
• Francis Galton, 1877, study of heights of fathers and sons
• The line describing this tendency to regress or go back is called the ‘regression line’
Types of Regression
• Simple regression
• Multiple regression
Linear Bivariate Regression Model
• Assumptions
  ◦ The value of the dependent variable Y depends to some degree upon the independent variable X
  ◦ There is a linear relationship between X and Y
Regression Lines
• 2 variables, 2 regression lines: the regression line of X on Y and the regression line of Y on X
• Perfect positive or perfect negative correlation: the two lines coincide into one
• Two lines far from each other: lesser degree of correlation
• Two lines closer to each other: higher degree of correlation
• Independent variables (r = 0): the lines are at right angles to each other
• The regression lines cut each other at the point of the averages of X and Y, $(\bar{X}, \bar{Y})$
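Regression Equations
• Regression equations are algebraic expressions of the regression lines
• Two lines, two equations
  ◦ Regression line of Y on X: Y = a + bX
  ◦ Regression line of X on Y: X = a + bY
• Deviations from the means of X and Y: $Y - \bar{Y} = b_{yx}(X - \bar{X})$ and $X - \bar{X} = b_{xy}(Y - \bar{Y})$
• Deviations from assumed mean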
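Regression Coefficient
• “b” in the regression equation is called the regression coefficient or slope coefficient
• Measures the amount of change in one variable corresponding to a unit change in another variable
• 2 equations, so 2 regression coefficients: the regression coefficient of X on Y and that of Y on X
  ◦ Regression coefficient of X on Y is represented by bxy
  ◦ Regression coefficient of Y on X is represented by byx
Regression Coefficient of X on Y: $b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{\sum y^2}$
Regression Coefficient of Y on X: $b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{\sum x^2}$
Correlation and Regression Coefficients: $r = \pm\sqrt{b_{xy} \times b_{yx}}$, where r takes the sign of the regression coefficients and $x = X - \bar{X}$, $y = Y - \bar{Y}$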
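The relationship between the two regression coefficients and r can be illustrated with a small Python sketch. It uses the standard deviation-form results byx = ∑xy/∑x² and bxy = ∑xy/∑y²; the function name and the paired data are assumptions.

```python
from math import sqrt

def regression_coefficients(xs, ys):
    """Return (mean_x, mean_y, b_yx, b_xy) using deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy / sxx, sxy / syy

# Assumed illustrative paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
mean_x, mean_y, b_yx, b_xy = regression_coefficients(xs, ys)

# Regression line of Y on X: Y - mean_y = b_yx * (X - mean_x)
predicted_y = mean_y + b_yx * (6 - mean_x)

# r is the geometric mean of the two coefficients (both positive here).
r = sqrt(b_yx * b_xy)
print(b_yx, b_xy, round(r, 4), round(predicted_y, 2))
```

With byx = 0.6 and bxy = 1.0, r = √0.6 ≈ 0.77, and the line of Y on X predicts Y ≈ 5.8 at X = 6.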
SAMPLING VARIABILITY & SIGNIFICANCE
• SAMPLING DISTRIBUTION
  ◦ Census vs. sampling
  ◦ Sample statistic vs. population parameter
  ◦ Laws of statistical regularity and inertia of large numbers
  ◦ Statistical inference
• SAMPLING DESIGN
  ◦ Selecting a subset of units from a target population for the purpose of collecting information
  ◦ Economical and accurate research process
  ◦ Types: probability and non-probability sampling
Probability Sampling Techniques
• Simple Random Sampling
• Systematic Random Sampling
• Stratified Sampling
• Cluster Sampling
Non-Probability Sampling Techniques
• Convenience Sampling
• Purposive Sampling
• Quota Sampling
• Snowball Sampling
• Multi-Stage Sampling
Sampling Error
• A statistical error that occurs when the researcher does not select a sample that represents the entire population of data
• Biased errors and unbiased errors
HYPOTHESIS
• An assumption about a population parameter, to be tested on the basis of sample information
• To be tested using statistical tools
• Null and alternate hypothesis
  ◦ The null hypothesis (H0) asserts that there is no difference between the sample statistic and the population parameter under consideration
  ◦ The alternate hypothesis (H1) is the hypothesis that is different from the null hypothesis
• Rejection of the null hypothesis indicates that the differences have statistical significance; acceptance of the null hypothesis indicates that the differences are due to chance
Significance level
• The confidence with which an experimenter rejects or retains the null hypothesis depends on the significance level adopted
• Denoted by α
• Generally 5% or 1%
  ◦ α = 0.05 (significant test result)
  ◦ α = 0.01 (highly significant)
Critical region
• The set of values of the test statistic that lead to rejection of H0 and acceptance of H1
Two-tailed test
• The hypothesis about the population mean is rejected for values of the test statistic falling into either tail of the sampling distribution
• H0: μ = 100 and H1: μ ≠ 100
One-tailed test
• The hypothesis about the population mean is rejected only for values of the test statistic falling into one of the tails of the sampling distribution
• Right-tailed test or left-tailed test
• H0: μ = 100 and H1: μ < 100 (left-tailed) or H1: μ > 100 (right-tailed)
Type I and II errors
• When a statistical hypothesis is tested, there are 4 possible results
  ◦ Hypothesis true and accepted: correct decision
  ◦ Hypothesis false and rejected: correct decision
  ◦ Hypothesis true but rejected: Type I error
  ◦ Hypothesis false but accepted: Type II error
• The probability of committing a Type I error is the level of significance (α)
• The probability of committing a Type II error is denoted by beta (β)
T-test
• The t distribution is a probability distribution that is similar to the normal distribution in its bell shape but has heavier tails, so it gives a greater chance of extreme values than the normal distribution
• When the original population is normally distributed, the SD of the population is unknown and the sample size is small (less than 30), the sample statistic follows the t-distribution
• At about n = 30, it approaches the normal distribution
• $t = \frac{\bar{X} - \mu}{S}\sqrt{n}$
  ◦ $\bar{X}$ = sample mean
  ◦ μ = hypothesised population mean
  ◦ n = sample size
  ◦ S = SD of the sample
• For 2 sample means: $t = \frac{\bar{X}_1 - \bar{X}_2}{S}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}$, where S is the combined (pooled) SD of the two samples
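A hedged Python sketch of the two t statistics follows. It assumes that S is the sample SD computed with n - 1 in the denominator (as `statistics.stdev` does) and that the two-sample test pools the variances of two independent samples; the function names and the data are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu):
    """t = (sample mean - mu) / (S / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(sample)
    return (mean(sample) - mu) / (stdev(sample) / sqrt(n))

def two_sample_t(a, b):
    """Unpaired t with pooled SD, with n1 + n2 - 2 degrees of freedom."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * stdev(a) ** 2 + (n2 - 1) * stdev(b) ** 2) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / (sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2))

# Assumed illustrative samples.
t_one = one_sample_t([2, 4, 4, 4, 5, 5, 7, 9], mu=4)
t_two = two_sample_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(round(t_one, 4), round(t_two, 4))
```

The calculated values are then compared with the table value of t for the relevant degrees of freedom (7 and 8 respectively in this sketch).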
Inference
• Calculated t value > table value of t: null hypothesis rejected; otherwise it is retained
Degrees of freedom
• Number of useful items of information generated by a sample of given size with respect to the estimation of a given population parameter
• The maximum number of logically independent values, i.e. values that have the freedom to vary
• ν = n - 1 or n - k
Chi square test
• When the assumption of a normal population cannot be justified, it is necessary to use procedures that do not require this condition. These tests or procedures are called “non-parametric tests”
• Chi-square describes the magnitude of the discrepancy between theory and observation, i.e. whether a given discrepancy between theory and observation can be attributed to chance or whether it results from the inadequacy of the theory to fit the observed facts
• $\chi^2 = \sum \frac{(O - E)^2}{E}$, with $E = \frac{RT \times CT}{N}$ for a contingency table
• χ² = 0 means the observed and expected frequencies completely coincide
• The greater the value of χ², the greater the discrepancy between observed and expected frequencies
  ◦ O = observed frequency
  ◦ E = expected frequency
  ◦ RT = row total
  ◦ CT = column total
  ◦ N = total number of observations
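For a contingency table, the chi-square statistic can be sketched in Python as below, using the standard results χ² = ∑(O - E)²/E with E = (RT × CT)/N and (rows - 1)(columns - 1) degrees of freedom. The function name and the 2 × 2 table are assumptions for illustration.

```python
def chi_square(observed):
    """Return (chi-square, degrees of freedom) for a table of observed frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency E = RT * CT / N
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Assumed 2 x 2 table of observed frequencies.
chi2, df = chi_square([[20, 30], [30, 20]])
print(chi2, df)
```

Every expected frequency here is 25, so χ² = 4 × (5²/25) = 4 with 1 degree of freedom, which exceeds the 5% table value of 3.84.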
Application of Chi Square
• Requires at least 50 observations in total, with an expected frequency of not less than 5 in any cell
VITAL STATISTICS
• Branch of biometry that deals with data on vital events
• Covers all population statistics
• Based on registration of births, deaths and marriages
Uses
• For individuals and operating agencies
• Demographic and medical research
• Population estimation and projection
• Public administration
• International uses
Methods
• Registration method
• Census enumeration
• Analytical method
  ◦ Estimation of vital rates using census data
  $P_t = P_0 + (B - D) + (I - E)$
  P_t = total population at a point of time t
  P_0 = total population recorded at the previous census
  B = no. of births during the given period
  D = no. of deaths during the given period
  I = no. of immigrants during the given period
  E = no. of emigrants during the given period
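The analytical method is usually written as the balancing equation: population now = recorded population + (births - deaths) + (immigrants - emigrants). A minimal Python sketch with assumed figures:

```python
def estimate_population(p0, births, deaths, immigrants, emigrants):
    """Recorded population plus natural increase plus net migration."""
    return p0 + (births - deaths) + (immigrants - emigrants)

# Assumed illustrative figures.
pt = estimate_population(p0=100_000, births=2_500, deaths=900,
                         immigrants=300, emigrants=400)
print(pt)
```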
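Measurement of Fertility
• Fertility determines the speed at which a population grows
• Crude Birth Rate
  ◦ (No. of live births which occurred among the population of a given geographic area during a given year / mid-year total population of the given geographic area during the same year) × 1000
• Specific Fertility Rate
  ◦ Takes into account the effect of factors like age, sex, marriage, state, region, urban and rural, etc.
• General Fertility Rate
  ◦ (No. of live births which occurred among the population of a given geographic area during a given year / mid-year female population of age 15-49 in the given geographic area during the same year) × 1000
• Total Fertility Rate
  ◦ Mean number of children which a female aged 15 can expect to bear if she lives until at least the age of 50
  ◦ TFR = T × ∑ (age-specific fertility rates), where T is the magnitude (width) of the age class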
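The fertility rates above can be sketched as small Python functions. The function names and all figures are assumed; the total fertility rate is taken as the class width T times the sum of the age-specific fertility rates, divided by 1000 here because the rates are entered per 1000 women.

```python
def crude_birth_rate(live_births, mid_year_population):
    """Live births per 1000 of the mid-year population."""
    return live_births / mid_year_population * 1000

def general_fertility_rate(live_births, women_15_49):
    """Live births per 1000 women aged 15-49."""
    return live_births / women_15_49 * 1000

def total_fertility_rate(asfr_per_1000, class_width):
    """Children per woman: T * sum(age-specific rates per 1000 women) / 1000."""
    return class_width * sum(asfr_per_1000) / 1000

# Assumed figures for a hypothetical area and year.
cbr = crude_birth_rate(2_500, 100_000)
gfr = general_fertility_rate(2_500, 25_000)
# Assumed age-specific fertility rates for the seven 5-year groups from 15-19 to 45-49.
tfr = total_fertility_rate([20, 120, 150, 100, 50, 15, 5], class_width=5)
print(round(cbr, 1), round(gfr, 1), round(tfr, 2))
```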


Reproduction Rates
• Gross Reproduction Rate
• Net Reproduction Rate
• Why only female births are taken into consideration for the reproduction rate:
  ◦ The number of births in a population depends upon the number of women in the reproductive age
  ◦ Reproduction depends on the number of female children born
• NRR equal to 1: population is maintaining itself
• NRR more than 1: population is increasing
• NRR less than 1: population is decreasing
Measurement of Mortality
• Crude Death Rate
  = (No. of deaths which occurred among the population of a given geographic area during a given year / mid-year total population of the given geographic area during the same year) × 1000
• Specific Death Rate
  = (No. of deaths which occurred among a specific age group of the population of a given geographic area during a given year / mid-year population of the specified age group in the given geographic area during the same year) × 1000
• Standardised Death Rates
• Infant Mortality Rate
  = (No. of deaths under 1 year of age which occurred among the population of a given geographic area in a given year / no. of live births which occurred among the population of the given geographic area during the same year) × 1000
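All of these mortality measures share the same "events per 1000" form and differ only in the numerator and denominator, as the following Python sketch shows. The helper name and every figure are assumptions for illustration.

```python
def rate_per_1000(events, base):
    """Generic vital rate: events per 1000 of the relevant base population."""
    return events / base * 1000

# Assumed figures for a hypothetical area and year.
crude_death_rate = rate_per_1000(900, 100_000)     # all deaths / mid-year population
age_specific_death_rate = rate_per_1000(60, 8_000) # deaths in an age group / its mid-year population
infant_mortality_rate = rate_per_1000(75, 2_500)   # deaths under 1 year / live births
print(round(crude_death_rate, 1), round(age_specific_death_rate, 1),
      round(infant_mortality_rate, 1))
```

Note that the infant mortality rate uses live births, not a mid-year population, as its base.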
Measurement of Population Growth
• Natural increase in population = CBR - CDR
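Measures of Morbidity
• Morbidity Incidence Rate: new cases of a disease arising during a given period, per 1000 of the population exposed to risk
• Morbidity Prevalence Rate: all existing cases of a disease at a given point of time, per 1000 of the population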
Life Table
