
03 Statistics in Analytical Chemistry

Statistics play an important role in analytical chemistry by providing estimates of measurement error and uncertainty. Quantitative measurements obtained from instruments can vary over time due to noise or drift. Statistics are used to define probability distributions for measurement data and determine whether differences between values are statistically significant or could be explained by random error. Analytical chemists use statistical analysis to evaluate data quality and accuracy.

Uploaded by

Simran singh

STATISTICS IN ANALYTICAL CHEMISTRY
Statistics in Analytical Chemistry

 Modern analytical chemistry is concerned with
 Detection, identification, and measurement of the
chemical composition of unknown substances through
use of existing and new instrumental techniques
 It is a quantitative science, meaning that the desired
result is almost always numeric
Statistics in Analytical Chemistry

 Quantitative data about a sample are obtained with
the help of an instrument from an observable signal
 Over time, due to noise and/or drift within the
instrument there is some variation in the signal
 This variation in signal causes error in measurement
 Calibration of response is necessary in order to
obtain meaningful quantitative data
 However, even after calibration, replicate measurements never give exactly identical data
Statistics in Analytical Chemistry

 Statistics, therefore, provides
 An estimate of the likely value of that error; or
 Help in establishing the uncertainty associated with the
measurement
Statistics in Analytical Chemistry

 The fundamental hypothesis in statistics is the null
hypothesis
 The null hypothesis states that random error is
sufficient to explain differences between two values
 Statistical tests are designed to test the null
hypothesis
Statistics in Analytical Chemistry

 Passing a statistical test means that the null
hypothesis is retained
 However, it is impossible to show that two values are
the same
 It is only possible to show they are different
 Hence, retaining (not rejecting) the null hypothesis
indicates that
 There is insufficient evidence to show that there is a
difference between the samples
What does statistics involve?

 Defining properties of probability distributions for
infinite populations
 Application of these properties to treatment of finite
(real-world) data sets
 Probabilistic approaches to:
 Finite sampling
 Experimental design
 Data treatment
 Reporting data
Some useful statistics terms

 Mean
 Average of a set of values
 Median
 Mid-point of a set of values
 Population
 A collection of an infinite number of measurements
carrying the property being studied
 N → ∞
 Sample
 A finite set of measurements which represent the
population
Some useful statistics terms

 True value (true mean)
 Mean value for the population (μ)
 Observed Mean
 Mean value of the sample set (x̄)
Accuracy and Precision

 Accuracy is a measure of rightness
 Precision is a measure of exactness
 Accurate means “capable of providing a correct
reading or measurement.” A measurement is accurate
if it correctly reflects the size of the thing being
measured
 Precise means “repeatable, reliable, getting the same
measurement each time”
Precision

 Every experimentally measured value has an
associated uncertainty
 Precision is characterized as the distribution of
random fluctuations about the ‘true’ value
 Statistics assumes that the distribution is Gaussian
(a.k.a. ‘normal’)
 A Gaussian distribution’s width is defined by a single
parameter, the standard deviation, σ
Precision

 The figure illustrates the dependence on σ
 68.3 % of the Gaussian's area is contained between −σ
and +σ
 95.4 % between −2σ and +2σ, and
 99.7 % between −3σ and +3σ
Precision

 σ of a series of observations is used to determine the
certainty with which we can report a value
 It is impossible to reduce σ to zero, even with an
infinite number of observations
 A multitude of factors affect precision:
 Instrument (detector sensitivity, noise, etc.)
 Experimental technique (pipetting, weighing, filling,
etc.)
 Sample inhomogeneity
Accuracy

 Accuracy is a measure of the difference between an
experimental value and the true value
 Any difference is due to systematic error(s)
 Accuracy can only be determined where the ‘true’
value of a sample is known, i.e., for a reference
 Since the actual value of a reference is known, systematic errors
can be detected by comparing the experimental value
with the true value
Accuracy

 Accuracy is usually expressed as either an absolute
error, E
$E = x_i - \mu$
 or a percent relative error, Er
$E_r = \frac{x_i - \mu}{\mu} \times 100\%$
Absolute Error

 Absolute error is the amount of physical error in a
measurement
 E.g.
 Let’s say a meter stick is used to measure a given
distance
 The error is ± 1 mm
 This is the absolute error of the measurement
 Therefore, absolute error = ± 1 mm (0.001 m)
Relative Error

 It gives an indication of how good a measurement is
relative to the size of the thing being measured
 E.g.
 Height of a room = 3.215 m ± 1 mm (0.001 m)
 Height of a cylinder = 0.075 m ± 1 mm (0.001 m)
 The relative errors (about 0.03 % for the room, 1.3 % for the
cylinder) show that the room height is the more accurate measurement
 Comparative accuracy of the measurements can be
determined by looking at their relative errors
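The comparison above can be reproduced with a few lines. This is a small sketch using the two heights and the ± 1 mm absolute error from the slide; the function name `percent_relative_error` is an illustrative choice.

```python
def percent_relative_error(value_m: float, absolute_error_m: float) -> float:
    """Absolute error expressed as a percentage of the measured value."""
    return 100 * absolute_error_m / abs(value_m)

# Both measurements share the same absolute error of 1 mm (0.001 m).
room = percent_relative_error(3.215, 0.001)
cylinder = percent_relative_error(0.075, 0.001)

print(f"room:     {room:.3f} %")
print(f"cylinder: {cylinder:.3f} %")
```

The same absolute error is a much larger fraction of the small cylinder height than of the room height.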

Accuracy and Precision
Tolerance

 Tolerance is not a statistical parameter
 It is the permissible range of variation from the
expected standard
 E.g., If tolerance of a 10.00 mL volumetric pipette is
±0.02 mL
 The pipette is guaranteed to deliver between 9.98 mL
and 10.02 mL
 It does not mean that the pipette will deliver an
average of 10.00 mL
 A given pipette might routinely deliver 9.997 mL or
10.015 mL or 9.981 mL
Tolerance

 Unlike precision, tolerance does not have a Gaussian
distribution
 Analytical chemists can repeatedly deliver within
±0.02 mL with a 10.00 mL pipette
 It is a systematic error if you report the volume
delivered by a 10 mL pipette as (10.00 ± 0.02) mL,
which is the tolerance, when the pipette actually
delivers (10.011 ± 0.004) mL
Common Questions

 When I repeat a measurement, I get different
numbers; which do I use?
 Remember that the inherent variation associated with
any real measurement means you would always expect
to get somewhat different values for replicate
measurements
 Calculate the mean and standard deviation of the
values; this is the starting point for any statistical
evaluation of your data
Common Questions

 Did I get the ‘right’ answer?
 It’s almost impossible to answer this question in any
meaningful way for real samples!
 If you know what the true value is, you can assess
whether your result is significantly different using a
t-test
 If true value (or an accepted true value) is not known,
determine the range on either side of the measurement
within which the true value most probably lies
 This is called the confidence interval, and is a measure
of the uncertainty associated with any measurement
Common Questions

 One sample gives a value of 2.1, the other gives 2.2;
these are different values, right?
 This depends on the uncertainty associated with each
measurement
 A significance test has to be performed to determine if
the values can be considered same or different
 Test whether there is a significant difference in the
spread of replicate measurements
Common Questions

 One of the values is quite different from the others;
can I simply ignore it?
 This depends on
 The range of the values you obtained,
 How different the suspect value is from all the others,
and
 How close the remaining results are to one another
 Use either Dixon's Q test or Grubbs' test on the data
Rejecting data

 It is good practice to check outliers in a data set to
see if they can statistically be rejected
Outliers

 A data object that deviates significantly from the
normal objects as if it were generated by a different
mechanism
 Outliers are different from the noise data
 Noise is random error or variance in a measured
variable
 Noise should be removed before outlier detection
 Outliers are interesting because they violate the mechanism
that generates the normal data
Outliers

 Three kinds of outliers
 Global
 Contextual
 Collective
Outliers

 Global outlier (or point outlier) (Og)
 Object is Og if it significantly deviates from the rest of
the data set
 Simplest type of outlier
Outliers

 Contextual outlier (or conditional outlier) (Oc)
 Object is Oc if it deviates significantly based on a
selected context
 E.g.
 18 °C in Mumbai: outlier?
 Depends on whether it is summer or winter
Outliers

 Contextual outlier (or conditional outlier) (Oc)
 Attributes of data objects should be divided into two
groups
 Behavioral attributes: characteristics of the object, used
in outlier evaluation, e.g., temperature
 Contextual attributes: define the context, e.g., time &
location
Outliers

 Collective Outliers
 A subset of data objects collectively deviate
significantly from the whole data set, even if the
individual data objects may not be outliers
Outliers

 Collective Outliers
 Detection of collective outliers
 Consider not only behavior of individual objects, but also
that of groups of objects
 Need to have the background knowledge on the
relationship among data objects, such as a distance or
similarity measure on objects
Outliers

 Simple example
 Global outlier
 A fist-size meteorite impacting a house in your
neighbourhood
 It’s a truly rare event that meteorites hit buildings
 Contextual outlier
 Your neighbourhood getting buried in two feet of snow
 If snowfall happened in middle of summer and you
normally don’t get any snow outside of winter
Outliers

 Simple example
 Collective outlier
 Every one of your neighbours is moving out of the
neighbourhood on the same day
 Although it’s definitely not rare that people move from
one residence to next, it is very unusual that an entire
neighbourhood relocates at same time
Rejecting Outliers

 Q-test for rejecting outliers
$Q_{calc} = d / w$
d = Gap between the suspect value and its nearest value, e.g. |x5 – x6| when x6 is the suspect among six ordered values
w = Range/spread of the data, e.g. |x1 – x6|
Rejecting Outliers

 Q-test for rejecting outliers

 Qtab is looked up in a table and compared with Qcalc
 If Qcalc > Qtab, the outlier data point can be rejected at
the specified confidence level
Rejecting Outliers

 Consider the data set:
0.189, 0.167, 0.187, 0.183, 0.186, 0.182, 0.181,
0.184, 0.181, 0.177
 Find if there is any outlier in the above data
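One way to work this exercise is sketched below, using the gap/range definition of Q from the preceding slides. The function name `dixon_q` is illustrative, and the two tabulated critical values for n = 10 are assumptions quoted from commonly published Q tables; they should be checked against whichever table is actually in use.

```python
def dixon_q(values):
    """Return (suspect value, Q_calc) for the more extreme end of the data."""
    ordered = sorted(values)
    w = ordered[-1] - ordered[0]             # range (spread) of the data
    gap_low = ordered[1] - ordered[0]        # gap if the lowest value is suspect
    gap_high = ordered[-1] - ordered[-2]     # gap if the highest value is suspect
    if gap_low >= gap_high:
        return ordered[0], gap_low / w
    return ordered[-1], gap_high / w

data = [0.189, 0.167, 0.187, 0.183, 0.186, 0.182, 0.181, 0.184, 0.181, 0.177]

# Assumed table values for n = 10 (verify against your own Q table).
Q_TAB_90 = 0.412
Q_TAB_95 = 0.466

suspect, q_calc = dixon_q(data)
print(f"suspect = {suspect}, Q_calc = {q_calc:.3f}")
print("reject at 90 %:", q_calc > Q_TAB_90)
print("reject at 95 %:", q_calc > Q_TAB_95)
```

With these assumed table values, the suspect 0.167 gives Qcalc of about 0.45, so the decision depends on the confidence level chosen, which is why the level should always be stated alongside the result.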


Rejecting Outliers

 Grubbs' test for rejecting outliers
$G_{calc} = \frac{\bar{Y} - Y_{min}}{s}$ or $G_{calc} = \frac{Y_{max} - \bar{Y}}{s}$
 Ȳ denotes sample mean and ‘s’ denotes standard
deviation
 Ymin and Ymax denote minimum and maximum values,
respectively, that are suspect for outlier
 Gtab is looked up in a table and compared with Gcalc
 If Gcalc > Gtab, the outlier data point can be rejected at
the specified confidence level
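As an illustration, the same ten-value data set used for the Q-test can be run through Grubbs' test. This is a sketch only: `grubbs_g` is an illustrative name, and the critical value 2.290 (two-sided, 95 %, n = 10) is an assumption taken from commonly published tables and should be verified.

```python
import statistics

def grubbs_g(values):
    """Return (suspect value, G_calc) for the value farthest from the sample mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    suspect = max(values, key=lambda y: abs(y - mean))
    return suspect, abs(suspect - mean) / s

data = [0.189, 0.167, 0.187, 0.183, 0.186, 0.182, 0.181, 0.184, 0.181, 0.177]

G_TAB_95 = 2.290  # assumed table value for n = 10; check your own table

suspect, g_calc = grubbs_g(data)
print(f"suspect = {suspect}, G_calc = {g_calc:.3f}")
print("reject at 95 %:", g_calc > G_TAB_95)
```

Because Grubbs' test uses the mean and standard deviation of all the data rather than only the gap and range, it can reach a different verdict from the Q-test on the same suspect value.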
Experimental Errors

 Successive measurements of same parameter, for the
same sample and method, will result in a set of
values which vary from the ‘true’ value by differing
amounts
 It means, our measurements are subject to error
 This is the principal reason why a result based on a
single measurement is meaningless in scientific terms
 Formally, the error is defined as the result of the
measurement minus the true value, (xi – μ)
 Consequently, errors have both sign and units
Experimental Errors

 Experimental scientists make a fundamental
distinction between three types of error
 These are known as
 Random (indeterminate) error,
 Systematic (determinate) error, and
 Gross error
Experimental Errors

 Random (indeterminate) error
 Affects precision – repeatability or reproducibility
 These are caused by many uncontrollable variables
 Caused by both humans and equipment
 Cannot be avoided
 Random distribution
 Mathematical laws of probability
 Normal distribution or Gaussian curve
 Cause replicate results to fall on either side of mean
Experimental Errors

 Random (indeterminate) error
 Represent the experimental uncertainty that occurs in
any measurement
 Small difference on successive measurements
 Can be minimized by good technique but not
eliminated
 If measurements are repeated enough times, random errors
tend to cancel out when calculating the mean
Experimental Errors

 Random (indeterminate) error

Random errors follow a Gaussian or normal distribution


Experimental Errors

 Systematic (determinate) error
 Affects accuracy – produces bias
 An overall deviation of result from the true value even
when random errors are very small
 Cause all results to be affected in one sense (direction)
only
 All too high or all too low
 Cannot be detected by using replicate measurements
 Can be corrected, e.g. by using standard methods and
materials, calibrating the instrument etc.
 Caused by both humans and equipment
Experimental Errors

 E.g. Data demonstrating random and systematic errors
 Four students (A–D) each perform an analysis
 10.00 mL of 0.1 M NaOH is titrated with 0.1 M HCl
 Each student performs five replicate titrations, with the
results shown below

Student   Results (mL)                          Mean (mL)
A         10.08  10.11  10.09  10.10  10.12     10.10
B          9.88  10.14  10.02   9.80  10.21     10.01
C         10.19   9.79   9.69  10.05   9.78      9.90
D         10.04   9.98  10.02   9.97  10.04     10.01
Experimental Errors

 Bias and precision: dot-plots of the data
 Student A: x̄ = 10.10, precise, biased
 Student B: x̄ = 10.01, imprecise, unbiased
 Student C: x̄ = 9.90, imprecise, biased
 Student D: x̄ = 10.01, precise, unbiased
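The precise/imprecise and biased/unbiased labels can be made concrete by computing each student's mean and standard deviation from the table. The sketch below assumes the expected titre is 10.00 mL (equal concentrations of acid and base); the names `titrations_ml` and `TRUE_VALUE_ML` are illustrative.

```python
import statistics

# Replicate titration results (mL) for students A-D, from the table.
titrations_ml = {
    "A": [10.08, 10.11, 10.09, 10.10, 10.12],
    "B": [9.88, 10.14, 10.02, 9.80, 10.21],
    "C": [10.19, 9.79, 9.69, 10.05, 9.78],
    "D": [10.04, 9.98, 10.02, 9.97, 10.04],
}
TRUE_VALUE_ML = 10.00  # assumed expected titre

summary = {}
for student, results in titrations_ml.items():
    mean = statistics.mean(results)
    s = statistics.stdev(results)  # spread of the replicates (precision)
    summary[student] = (mean, s, mean - TRUE_VALUE_ML)
    print(f"{student}: mean = {mean:.2f} mL, s = {s:.3f} mL, "
          f"bias = {mean - TRUE_VALUE_ML:+.2f} mL")
```

A small s with a large bias (student A) signals systematic error, while a large s with a small bias (student B) signals random error.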
Experimental Errors

 Gross error
 They are so serious that there is no alternative to
abandoning the experiment and making a completely
fresh start
 E.g.
 A complete instrument breakdown
 Accidentally dropping or discarding a crucial sample
 Discovering during the course of the experiment that a
supposedly pure reagent was in fact badly contaminated
Sources of Errors

 There are several sources of experimental errors that
affect accuracy and precision
 Sampling errors
 Instrument errors
 Method errors
 Personal or operational errors
Sources of Errors

 Determinate sampling error
 Occurs when our sampling strategy does not provide a
representative sample
 E.g. if you monitor the environmental quality of a lake
by sampling a single location near a point source of
pollution, such as an outlet for industrial effluent, then
your results will be misleading
Sources of Errors

 Determinate instrument error
 Arise from the way a measuring device is used
 E.g. pipettes, burettes, and volumetric flasks may have
volumes slightly different than those indicated by their
graduations
 These differences may result from
 Using the glassware at a temperature different than its
calibration temperature
 Contamination on the inner surfaces of the glass, or
 Distortions in the container walls due to heating
 These errors are usually corrected by proper calibration
of instruments and cleaning of glassware
Sources of Errors

 Determinate method error
 Arise from
 Non-ideal chemical or physical behavior of materials used
in analysis (Insolubility of precipitates, co-precipitates,
decomposition, and volatilization)
 Slow or incomplete reactions and
 Instability of some reactant species
 E.g. Use of indicator dye to signal end of titration
 Error due to small excess of reagent required to cause the
dye to change color
 This error can be minimized or eliminated by carefully
choosing the experimental method
Sources of Errors

 Determinate personal error
 Occur because most measurements require the analyst
to make a personal judgment
 E.g.
 Estimating level of a liquid between two graduation
marks on a burette. Some analysts may consistently read
a meniscus high
 Gauging color of a solution at the end point of a titration.
An analyst with color insensitivity may use excessive
reagent in a volumetric analysis
Errors, Uncertainty, and Residuals

 It is not uncommon for analytical chemists to use the
terms, “error” and “uncertainty” somewhat
interchangeably, although this can cause confusion
 The primary aim in analytical chemistry is to
determine
 How close a result is to the ‘true’ value (the accuracy),
and
 How well replicate values agree with one another (the
precision)
 Systematic and random errors cannot actually be
determined unless the true value, μ, is known
Error and Uncertainty

 E.g.,
 Consider a titration in which same 25.00 mL pipette is
used to dispense portions of sample for replicate
determinations
 Suppose the tolerance on the volume of pure water delivered by the
pipette at a specified temperature is ±0.03 mL
 It means the volume might be as low as 24.97 mL (a systematic error
of -0.03 mL) or as high as 25.03 mL (a systematic error of +0.03
mL)
 In theory, the error can be determined by calibrating the pipette:
weighing replicate volumes it dispenses and converting the mass
of water to volume
Error and Uncertainty

 This, however, raises other sources of error:
 Each weight will have its own associated error
 The operator will not use the pipette in exactly the
same way every time, introducing additional error
 To do the calculation, we need to measure the
temperature, which also has an associated error
 Evaporation losses, and changes in temperature and
humidity can also contribute to variation in the
measured volumes
Error and Uncertainty

 Clearly, it is unrealistic to try and account for all these
errors just to perform a titration
 We therefore use an estimate of the error in the volume
dispensed by the pipette, which we term as uncertainty
 Similarly, any measured value has an associated
measurement uncertainty, which is used as an estimate
of the range within which the error falls either side of
the actual value
 Since we cannot easily tell whether the result is above
or below the true value, such uncertainties are treated
in the same way as random errors
Residuals

 Residual is simply the difference between a single
observed value and the sample mean, (xi – x̄)
 It has both sign and units
 Residuals can provide a useful comparison between
successive individual values within a set of
measurements, particularly when presented visually
in the form of a residual plot
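As a small illustration of the definition, the residuals for one set of replicates can be computed directly. The sketch below borrows student A's titration results from the earlier example in these slides; the variable names are illustrative.

```python
import statistics

# Student A's five replicate titration results (mL).
results_ml = [10.08, 10.11, 10.09, 10.10, 10.12]

mean_ml = statistics.mean(results_ml)
# Each residual keeps its sign and has the same units as the measurement.
residuals_ml = [x - mean_ml for x in results_ml]

for i, (x, r) in enumerate(zip(results_ml, residuals_ml), start=1):
    print(f"run {i}: x = {x:.2f} mL, residual = {r:+.2f} mL")
```

Plotting these residuals against run order is one simple form of residual plot; a steady upward or downward trend would suggest drift rather than purely random scatter. By construction, residuals about the sample mean always sum to zero (within floating-point rounding).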
Residuals

 Such plots can reveal useful information about the
quality of the data set such as
 Whether there is a systematic drift in an instrument
under calibration, or
 If there might be cross-contamination between samples
of high and low concentration
Reporting Analytical Data

 Data are generated using appropriate procedures or
test plans
 Data generated is certified by analyst to be correct to
the best of the analyst's knowledge, with the
certification documented by a dated signature or
initials on the data
 If errors are found during the generation of the data,
the analyst corrects the errors. An explanation is
required if the reason for the correction is not
obvious
Reporting Analytical Data

 Significant Figures
 Significant figures reflect the accuracy and precision of
a given result
 A result should always be rounded to the number of
figures that are consistent with the confidence that can
be placed on it
 Thus, the number of significant figures is the number
of digits remaining after the data is rounded
 Reported values should contain only significant figures
Reporting Analytical Data

 Significant Figures
 A value is made up of significant figures when it
contains all digits known to be true and one last digit
in doubt
 E.g., if a value is reported as 18.86 mg/L, the 18.8 must be
firm while the 0.06 is somewhat uncertain
 Final zeros after a decimal point are always meant to
be significant figures
 E.g. 9.800 g would be considered as having 4 significant
figures
Reporting Analytical Data

 Significant Figures
 Zeros before a decimal point with non-zero digits
preceding them are assumed to be significant
 With no preceding non-zero digit, a zero before the
decimal point is not considered significant
 With no non-zero digits preceding a decimal point, the
zeros after the decimal point but preceding other non-
zero digits are not considered to be significant. These
zeros only indicate the position of the decimal point
Reporting Analytical Data

 Significant Figures
 Final zeros in a whole number may or may not be
significant (e.g., in 1,000 g, zeros may not be
significant and only indicate magnitude of the number)
 A good way to determine if zeros interspersed in a
number are significant or not is by determining
whether they can be dropped by expressing the number
in exponential form
 E.g., no zeros can be dropped when expressing 100.08 g
in exponential form; therefore, the zeros are significant
 However, 0.0008 g can be expressed in exponential form
as 8 × 10⁻⁴ g; therefore, the zeros are not significant
Reporting Analytical Data

 Significant Figures
 Once the number of significant figures obtainable from
a given type of analysis is established, data resulting
from such analyses are reduced according to set rules
for rounding
Reporting Analytical Data

 Rounding Off Numbers
 If the figure following those to be retained is less than
5, the figure is dropped, and the retained figures are
kept unchanged (e.g., 11.443 is rounded off to 11.44)
 If the figure following those to be retained is greater
than 5, the figure is dropped, and the last retained
figure is raised by 1 (e.g., 11.446 is rounded off to
11.45)
Reporting Analytical Data

 Rounding Off Numbers
 If the figure following those to be retained is 5, and if
there are no figures other than zeros beyond the five,
the figure 5 is dropped, and
 The last-place figure retained is increased by one if it is
an odd number (e.g. 11.435 is rounded off to 11.44) or
 The last-place figure is kept unchanged if it is an even
number (e.g. 11.425 is rounded off to 11.42)
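The rounding rules above amount to "round half to even". A minimal sketch using Python's standard `decimal` module follows; values are passed as strings because binary floats cannot represent numbers such as 11.435 exactly. The helper name `round_half_even` is illustrative.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_half_even(value: str, places: str = "0.01") -> Decimal:
    """Round a decimal string to the given place, sending exact halves to the even digit."""
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_EVEN)

# The four examples from the slides.
for raw in ("11.443", "11.446", "11.435", "11.425"):
    print(raw, "->", round_half_even(raw))
```

Rounding exact halves to the even digit avoids the slight upward bias that always rounding a 5 upward would introduce over many values.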
Reporting Analytical Data

 Ways of Expressing Accuracy
 Absolute Errors: difference between true and measured
values
 Mean Errors: difference between true and measured
mean values
Reporting Analytical Data

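 Ways of Expressing Accuracy
 Relative Error: Absolute or Mean Errors expressed as a
percentage of the true value
 $\frac{x_i - \mu}{\mu} \times 100\%$ = % Relative Error (with x̄ in place of xi for the mean error)
 Relative Accuracy: measured or mean value expressed
as a percentage of true value
 $\frac{x_i}{\mu} \times 100\%$ = % Relative Accuracy (with x̄ in place of xi for the mean value)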
Reporting Analytical Data

 The main measures in quantitative statistics are:
 Measures of central tendency
 Mean
 Median
 Mode
 Measures of dispersion
 Variance
 Standard deviation
 Relative standard deviation
 Standard error of mean

 These measures form the basis of any statistical analysis


Population vs. Sample Mean & Standard Deviation

 If we make only a limited number of measurements
(called replicates), some will be closer to the ‘true’
value than others
 This is because there can be variations in
 The amount of chemical being measured (e.g. as a
result of evaporation or reaction)
 The actual measurement itself (e.g. due to random
electrical noise in an instrument, or fluctuations in
ambient temperature, pressure, or humidity)
Population vs. Sample Mean & Standard Deviation

 This variability contributes to dispersion in the
measured values
 The greater the variability, the greater the likelihood of
the measured values differing significantly from the ‘true’ value
 To adequately take the variability into account and
determine the actual dispersion, all possible values will
have to be obtained
 Infinite number of replicate measurements required
 n → ∞
 This would allow us to determine the population mean
and standard deviation, μ and σ
Population vs. Sample Mean & Standard Deviation

 This is hardly practical, for a number of reasons!
 Therefore limited number of replicate measurements
are performed
 On the same sample, using the same instrument and
method, under the same conditions
 This allows us to calculate the sample mean and
standard deviation, x̄ and s
 The sample mean, standard deviation, and variance
(s²) provide estimates of the population values
Mean

 Used for describing an entire set of observations with
a single value representing the center of the data
 Many statistical analyses use the mean as a standard
reference point
 The mean is the sum of all observations divided by
the number of observations
 It has the same units as each individual measurement
value
 Denoted by μ (population) or x̄ (sample)
Calculating the Mean

 The sample mean is the average value for a finite set
of replicate measurements on a sample
 It provides an estimate of the population mean for the
sample using the specific measurement method
 The sample mean, denoted x̄, is calculated using the
formula:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Median

 Used for describing an entire set of observations with
a single value representing the center of the data
 Half of the observations are above the median, half
are below it
 Determined by ranking the data and finding the
middle observation
Mode

 It is the value that occurs most frequently in a set of
observations
 It may be used with mean and median to give an
overall characterization of your data distribution
 Identifying mode can help you understand your
distribution
 A distribution with more than one mode may indicate
that you actually have sampled from a mixed
population
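The three measures of central tendency can be compared on a single data set. The sketch below reuses the ten values from the outlier exercise earlier in these slides and relies on Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

data = [0.189, 0.167, 0.187, 0.183, 0.186, 0.182, 0.181, 0.184, 0.181, 0.177]

mean = statistics.mean(data)        # sum of observations / number of observations
median = statistics.median(data)    # middle of the ranked data (average of the two middle values here)
modes = statistics.multimode(data)  # most frequent value(s); a list, since there may be several

print(f"mean   = {mean:.4f}")
print(f"median = {median:.4f}")
print(f"mode   = {modes}")
```

The mean sits slightly below the median here because the low value 0.167 pulls it down, whereas the median is unaffected by how extreme that value is.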
Variance (σ²)

 Represents the spread (the dispersion) of the repeated
measurements on either side of the mean
 As the notation implies, the units of the variance are
the square of the units of the mean value
 The greater the variance, the greater the probability
that any given measurement will have a value
noticeably different from the mean
Calculating the Variance

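 It is the average of the squared differences between
each measurement and the sample mean (with n – 1 in
place of n for a finite sample)
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
 When n is sufficiently large so that n ≈ (n – 1), the
sample mean and variance approximate the
population values and we can use the equation:
$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$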
Standard Deviation (σ)

 Also provides a measure of the spread of repeated
measurements on either side of the mean
 An advantage of the standard deviation over the
variance is that its units are the same as those of the
measurement
Standard Deviation (σ)

 Standard deviation – the most important statistic
 Standard Deviation (σ) of an infinite set of
experimental data is theoretically given by
 $\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$
 xi = individual measurement
 μ = mean of infinite number of measurements (true
value)
 n = number of measurements
Standard Deviation (σ)

 Standard deviation
 Estimated Standard Deviation, s (n < 30)
 $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
 For finite sets the precision is represented by ‘s’
Calculating the Standard Deviation

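 It is the square root of average of the squared
differences between each measurement and the
sample mean (with n – 1 in place of n for a finite sample)
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
 When n is sufficiently large so that n ≈ (n – 1), the
sample mean and standard deviation approximate the
population values and we can use the equation
$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$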
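The difference between the (n – 1) and n denominators is easiest to see on a small data set. The sketch below uses student A's five titration results from the earlier example and Python's standard `statistics` module, where `variance`/`stdev` use n – 1 and `pvariance`/`pstdev` use n.

```python
import statistics

# Student A's five replicate titration results (mL).
results_ml = [10.08, 10.11, 10.09, 10.10, 10.12]

s2 = statistics.variance(results_ml)        # sample variance, divides by (n - 1)
s = statistics.stdev(results_ml)            # sample standard deviation
sigma2 = statistics.pvariance(results_ml)   # population formula, divides by n
sigma = statistics.pstdev(results_ml)

print(f"s^2     = {s2:.6f} mL^2   s     = {s:.4f} mL")
print(f"sigma^2 = {sigma2:.6f} mL^2   sigma = {sigma:.4f} mL")
```

With only five replicates the two results differ noticeably; the gap shrinks as n grows, which is the sense in which n ≈ (n – 1) for large n.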
Reporting the Results

 It is more convenient to use the standard deviation,
which is simply the square root of the variance,
$s = \sqrt{s^2}$
 The final value for the sodium content of the soup
would be written as:
C = 102.1 ± 4.7 mg (mean ± s, n = 5)
Relative Standard Deviation

 It tells whether the standard deviation is a small or
large quantity when compared to the mean for the
data set
 It is calculated by the following formula:
$\mathrm{RSD} = \frac{s}{|\bar{x}|} \times 100\%$
Relative Standard Deviation

 Relative Standard Deviation and Coefficient of
Variation
 In some cases, the coefficient of variation and the RSD
are the same thing
 However, RSD cannot be negative while the
Coefficient of Variation can be positive or negative
 This is because the two formulas differ in a minor way
 Coefficient of Variation divides by the mean
$\mathrm{CV} = \frac{s}{\bar{x}} \times 100\%$
 RSD divides by the absolute value of the mean
$\mathrm{RSD} = \frac{s}{|\bar{x}|} \times 100\%$
Standard Error of mean

 It is the standard deviation of the distribution of sample
means about the population mean
 Standard deviation of the mean, smean
$\mathrm{SEM} = s_{mean} = \frac{s}{\sqrt{n}}$
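Both the relative standard deviation and the standard error of the mean follow directly from s, the mean, and n. The sketch below again uses student A's titration results from the earlier example; the variable names are illustrative.

```python
import math
import statistics

# Student A's five replicate titration results (mL).
results_ml = [10.08, 10.11, 10.09, 10.10, 10.12]

n = len(results_ml)
mean = statistics.mean(results_ml)
s = statistics.stdev(results_ml)

rsd_percent = 100 * s / abs(mean)  # RSD: s relative to |mean|, as a percentage
sem = s / math.sqrt(n)             # standard error of the mean

print(f"RSD = {rsd_percent:.3f} %")
print(f"SEM = {sem:.4f} mL")
```

The SEM is smaller than s by a factor of √n, reflecting that a mean of several replicates is known more precisely than any single measurement.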
Test of Significance

 Analytical method should be free from bias
 I.e., the value it gives for the amount of analyte should
be the true value
 However, even if there are no systematic errors, the measured
amount will not be exactly equal to the standard due
to random errors
 Significance test helps to decide whether the
difference between the measured and standard
amounts is significant or can be accounted for by
random errors
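A one-sample t-test is one common way to make this decision when a standard (true) value is available. The sketch below applies it to student A's titration results from the earlier example, assuming a true value of 10.00 mL; the critical value 2.776 is the standard two-sided 95 % value of t for 4 degrees of freedom, and the variable names are illustrative.

```python
import math
import statistics

# Student A's five replicate titration results (mL).
results_ml = [10.08, 10.11, 10.09, 10.10, 10.12]
TRUE_VALUE_ML = 10.00  # assumed standard value
T_TAB_95 = 2.776       # two-sided 95 % critical t for n - 1 = 4 degrees of freedom

n = len(results_ml)
mean = statistics.mean(results_ml)
s = statistics.stdev(results_ml)
sem = s / math.sqrt(n)

# t compares the observed difference with the standard error of the mean.
t_calc = abs(mean - TRUE_VALUE_ML) / sem
significant = t_calc > T_TAB_95

# 95 % confidence interval: range within which the true value most probably lies.
half_width = T_TAB_95 * sem

print(f"t_calc = {t_calc:.2f}, significant at 95 %: {significant}")
print(f"95 % CI: {mean:.3f} ± {half_width:.3f} mL")
```

Here the difference from 10.00 mL is far larger than random error can explain, so the null hypothesis is rejected, consistent with student A's results being precise but biased.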
Test of Significance

 Types of significance tests
 One-sided and two-sided tests
 Comparison of two experimental means
 Unpaired t-test
 Paired t–test
 F-test
 Analysis of variance
 Chi-squared test
