Unit 2A Error Analysis and Statistics
Outline
1.1 Random and systematic errors
[Answer: A represents an analysis with measurements that were neither accurate nor precise; B represents
a set of data that is fairly accurate but not precise; C illustrates a set of data that is accurate and precise,
whilst D describes a situation where the measurements collected were precise but not accurate.]
Important terms
• Replicate measurements: Samples of the same size that are analysed in exactly the
same way. In any testing protocol, a critical decision is how many measurements to
make/items to test.
• Precision: Describes the internal agreement between results that were obtained in
the same way (how close or similar the collected values are). It is estimated by
evaluating standard deviations or confidence limits.
• Accuracy: A measure of how close the results are to the true or
accepted value for the measured quantity. It is estimated by comparing results with
those obtained using other methods and in other laboratories, or through the use of
standard reference materials (SRMs). Estimates are available for standard methods
(ISO, ASTM, DIN, BS, SANS).
• True value: A theoretical value referring to the measured quantity without any
error.
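The distinction between accuracy and precision can be sketched numerically. The short Python example below is an illustration only: the accepted value and all three sets of replicate results are invented for this sketch and do not come from the course data. It uses the difference between the mean and the accepted value as a rough gauge of accuracy and the sample standard deviation as a gauge of precision.

```python
import statistics

# Hypothetical accepted value and replicate results, invented for illustration only.
accepted_value = 10.00
data_sets = {
    "scattered and biased": [9.2, 11.5, 12.1, 10.9],
    "tight but biased": [10.80, 10.82, 10.79, 10.81],
    "tight and unbiased": [10.01, 9.99, 10.00, 10.02],
}


def summarise(values, accepted):
    """Return (bias, s): bias gauges accuracy, s gauges precision."""
    mean = statistics.mean(values)
    bias = mean - accepted        # how far the mean lies from the accepted value
    s = statistics.stdev(values)  # sample standard deviation (N-1 denominator)
    return bias, s


for label, values in data_sets.items():
    bias, s = summarise(values, accepted_value)
    print(f"{label}: bias = {bias:+.3f}, s = {s:.3f}")
```

A small bias together with a small standard deviation corresponds to data that are both accurate and precise; a small standard deviation with a large bias corresponds to data that are precise but not accurate.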
1.2 Gaussian distribution
Important terms
• Population data set: A large set of data containing all possible data values.
• Sample data set: A smaller set of data which is part of or a subset of the
population data set.
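As a rough illustration of these two terms, the Python sketch below simulates a large set of results and then draws a small sample from it. The centre (50.0), the spread (2.0), the population size and the random seed are all assumptions chosen for this sketch, and a finite simulated list is only a stand-in for a true population.

```python
import random
import statistics

random.seed(1)  # fixed seed so that the sketch is repeatable

# Simulated "population": 10 000 hypothetical results scattered about 50.0
population = [random.gauss(50.0, 2.0) for _ in range(10_000)]

# "Sample": five results drawn from that population without replacement
sample = random.sample(population, 5)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
```

The sample mean will generally differ somewhat from the population mean, because only a handful of values have been drawn.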
1.3 Mean value and standard deviation
• Chemists typically perform between two and five replicates of a chemical
measurement using the same analytical procedure.
• As already mentioned, when only a small number of measurements are collected,
we refer to this data set as a sample of the larger population data set.
• Rarely are the values obtained from small sample data sets identical, but if a
normal (Gaussian) distribution of results is assumed, a mean value, $\bar{x}$ (an
average), can be calculated as the best estimate of the true value of the measured
quantity (the central value of the distribution, around which the other values are
spread symmetrically if systematic errors are absent).
• A mean can be calculated for the sample of data ($\bar{x}$) and for the population data (μ).
• Going forward, we will focus mostly on evaluating small samples of data, as this
is what we will typically work with in chemical analyses.
• To calculate the sample mean we begin by taking the sum of all the measurements
and dividing that total by N, the number of measurements as given by the
equation below (note that in some texts, the N value is not given as the uppercase
letter but as the lowercase letter, n).
$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$$
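The sample mean calculation described above can be sketched in a few lines of Python. The function name `sample_mean` and the four replicate values are chosen for illustration only.

```python
def sample_mean(values):
    """Sum the measurements and divide by N, the number of measurements."""
    n = len(values)
    if n == 0:
        raise ValueError("at least one measurement is required")
    return sum(values) / n


# Hypothetical replicate results, used here for illustration only
replicates = [2.3, 2.5, 2.8, 2.6]
print(f"mean = {sample_mean(replicates):.2f}")
```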
• The median is the middle result or value from a set of measurements that have
been arranged in ascending order (smallest to largest) e.g. for values 2, 3, 6, 8, 13,
the median value would be 6.
• The standard deviation, s, measures the spread of repeated measurements
(replicates) in a sample data set i.e. how clustered they are around the mean.
• The smaller the standard deviation, the more closely the data points are clustered
around the mean.
• This value can be calculated by using the following equation (standard deviation
can also be computed on a calculator using automated functions).
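$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N-1}} = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N-1}} = \sqrt{\frac{\sum x_i^2 - N\bar{x}^2}{N-1}}$$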
• Take note that the $(x_i - \bar{x})$ term represents the deviation of each measured value
($x_i$) from the mean.
• The N-1 term represents the number of degrees of freedom and is used so that the
estimated standard deviation more closely resembles the population standard
deviation (σ).
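As a minimal sketch of the standard deviation calculation, the Python functions below compute s in two ways: from the deviations about the mean, and from the sums of $x_i$ and $x_i^2$. The function names are chosen for this illustration, and the data are simply the five values 2, 3, 6, 8, 13 from the median example, reused here as a convenient test set.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation from the deviations about the mean."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two measurements are required")
    mean = sum(values) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_sq_dev / (n - 1))  # N-1 degrees of freedom


def sample_std_shortcut(values):
    """Same quantity from the sums of x and x squared."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two measurements are required")
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


values = [2, 3, 6, 8, 13]
print(f"s (deviations) = {sample_std(values):.4f}")
print(f"s (shortcut)   = {sample_std_shortcut(values):.4f}")
print(f"s (statistics) = {statistics.stdev(values):.4f}")
```

All three results should agree. The shortcut form subtracts two numbers of similar size, so it can lose significant figures when the spread is very small relative to the mean; the deviation form is usually the safer choice in code.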
Example: The ratio of the number of atoms of the isotopes ³⁵Cl and ³⁷Cl was measured in
eight different samples to help improve the reported atomic mass of chlorine on the periodic
table. Below you will find a summary of the measured ratios. Find the mean, median and
standard deviation for the set of data.
• Mean: The mean is the 'average' of all the measurements and is found by adding all
the values together and dividing by the number of measurements. It indicates what is
considered as the most representative value for a data set of measurements.
• Median: The 'middle' number, the measurement that is centrally listed in the data
when arranged from lowest to highest value. If two values are in the middle (i.e., N is
even), then the median is calculated by adding the values together and dividing by two
(calculating the average of the two values).
• Standard deviation: This acts as a measure of how close replicate measurements are
to one another in either a sample data set or the population data set i.e. how clustered
measurements are around the mean given a normal distribution of results.
• Degrees of freedom: Defined as the number of members in a statistical sample that
provide an independent measure of the precision for a data set.
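The median rule for odd and even N can be sketched as follows. The function name is chosen for illustration; the first data set is the one from the earlier median example, and the second (even N) set is hypothetical.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if N is even."""
    ordered = sorted(values)  # arrange from lowest to highest
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one measurement is required")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([2, 3, 6, 8, 13]))  # odd N: 6
print(median([8, 2, 6, 3]))      # even N: (3 + 6) / 2 = 4.5
```

Note that the data are sorted inside the function, so the values may be supplied in any order.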
1.4 Variance and other measures of precision
• The sample standard deviation is the most common way to report on the
precision of an experiment, but there are three other terms in analytical
chemistry that you should be familiar with.
• Variance is simply the square of the standard deviation as shown by the equation
below.
$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N-1} = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N-1} = \frac{\sum x_i^2 - N\bar{x}^2}{N-1}$$
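• The relative standard deviation (RSD, $s_r$) is the standard deviation divided by
the mean.
$$\mathrm{RSD} = s_r = \frac{s}{\bar{x}}$$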
• The relative standard deviation when multiplied by 100% is called the coefficient
of variation (CV).
$$\mathrm{CV} = \frac{s}{\bar{x}} \times 100\,\%$$
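• The standard deviation of the mean (standard error), $s_{\bar{x}}$, is the standard
deviation divided by the square root of N.
$$s_{\bar{x}} = \frac{s}{\sqrt{N}}$$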
• The spread or range (w) of the sample data is the calculated difference between
the largest value measured in the data set and the smallest, e.g., for values 2.3,
2.5, 2.8, 2.6, the spread will be 2.8 - 2.3 = 0.5.
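The measures of precision in this section can be gathered into one short Python sketch. The function name and dictionary keys are chosen for illustration, and the data are the four values from the spread example above. The sketch also includes the standard error of the mean, taken here as s divided by the square root of N.

```python
import math
import statistics


def precision_summary(values):
    """Variance, RSD, CV, standard error of the mean and spread for a sample."""
    n = len(values)
    mean = statistics.mean(values)  # must be non-zero for RSD and CV
    s = statistics.stdev(values)    # sample standard deviation (N-1 denominator)
    return {
        "variance": s ** 2,
        "rsd": s / mean,
        "cv_percent": s / mean * 100,
        "standard_error": s / math.sqrt(n),
        "spread": max(values) - min(values),
    }


summary = precision_summary([2.3, 2.5, 2.8, 2.6])
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Because RSD and CV divide by the mean, they are only meaningful when the mean is not close to zero.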
• The ratio of the number of atoms of the isotopes ³⁵Cl and ³⁷Cl was measured
in eight different samples to help improve the reported atomic mass of chlorine on
the periodic table. Below you will find a summary of the measured ratios. Using
the calculated mean and standard deviation, find the variance, relative standard
deviation, coefficient of variation and spread for the data set.