
QUALITY CONTROL: METHODOLOGY AND APPLICATIONS

PIERRE GAUTHIER, CLÉMENT CHOUINARD


AND BRUCE BRASNETT
Meteorological Service of Canada
Dorval, Québec, Canada

1. Introduction

All assimilation methods rely on data collected from many sources. Conventional data sources such as surface observations, radiosondes, aircraft and ships are now complemented by an increasing amount of satellite data (see also the chapters Observing the atmosphere by R. Swinbank and Assimilation of remote sensing observations in Numerical Weather Prediction by J.N. Thépaut). Each instrument is prone to errors that can be systematic, due to an incorrect calibration, or random, reflecting the accuracy and representativeness of the measurement. The assimilation methods presented during this course assume that the data used in the assimilation have unbiased errors and are devoid of any serious error due to a malfunction of the instrument. Such errors are referred to as gross errors.
There are several sources of gross errors: the calibration of the instrument, transmission problems when data are disseminated to operational centres, pre-processing of the data at reception, etc. Methods had to be developed to detect such errors and to reject all data that have a high probability of being in gross error. The background field, xb, reflects our a priori knowledge of the current state of the atmosphere gained from the information of past observations. To check whether the observations, represented here as a vector y, are within a reasonable range of their expected value, they are compared against the background state equivalent of the observations, H(xb), where H stands for the observation operator. The innovation vector, y − H(xb), is therefore a measure of the departure of each observation from a common atmospheric state. This information is extremely useful to assess whether gross errors of any type contaminate the new data. These ideas can be formalised using conditional probabilities to assess the probability of a datum y being correct given a background state xb with known (or assumed) error probability distribution. Quality control will be presented in this context, using a Bayesian perspective on data assimilation.
In this lecture, innovations will be used to diagnose potential problems with an
instrument through the monitoring phase of observations, which is done routinely at
operational centres to assess whether something wrong is going on with an instrument.
After a brief introduction to conditional probabilities and Bayes' theorem, the
background check procedure and variational quality control will be presented.

2. Bayesian approach to data assimilation

In Lorenc (1986), it is shown that the statistical estimation problem of data assimilation
can be approached using conditional probabilities and Bayes' theorem. This section
briefly introduces this approach and the reader is referred to this paper and to Rodgers
(2000) for more details. The joint probability distribution of x, the atmospheric state
vector, and y, the observation vector, being p(x,y), one can define the marginal
probability densities P(x) and P(y) as

P(x) = ∫ p(x, y) dy,    P(y) = ∫ p(x, y) dx,
where P(x) and P(y) are obtained by integrating over all possible values of y and x
respectively. Assuming the background error probability distribution to be Gaussian,
then
P(x) = (1/C) exp[ −(1/2)(x − xb)^T B^-1 (x − xb) ],    (1)
where B is the background error covariance matrix and C, a normalization constant. P(x) represents our a priori knowledge of the state of the atmosphere contained within the background state xb: in the absence of any other information, the most probable state would have to be xb. Similarly, P(y) represents our
a priori knowledge about the true value of the observation. For example, in the
Gaussian case, it would correspond to
P(y) = (1/C2) exp[ −(1/2)(y − yo)^T R^-1 (y − yo) ],    (2)
where R is the observation error covariance matrix, C2, a normalization constant and yo,
the actual value of the measurement.
The conditional probability p(y | x = x0) stands for the probability of y taking a
particular value given that x = x0. This can be expressed as
p(y | x = x0) = p(x0, y) / ∫ p(x0, y) dy ≡ p(x0, y) / P(x0).    (3)

Similarly,
p(x | y = y0) = p(x, y0) / ∫ p(x, y0) dx ≡ p(x, y0) / P(y0).    (4)

From (3) and (4), one gets that
p(x = x0, y = y0) = p(x | y = y0) P(y = y0) = p(y | x = x0) P(x = x0).    (5)

Taking x0 to be the true value xt of the atmospheric state while y0 is associated with the
actual values of the observations, then (4) and (5) imply that
p(x = xt | y = yo) = p(y = yo | x = xt) P(x = xt) / P(y = yo).    (6)
This is Bayes' theorem on conditional probabilities, expressing the probability distribution that x is the true value of the atmospheric state given that yo has been observed. The crux of the argument is that if x = xt, then yt = H(xt), so that the probability that y = yo corresponds to, in the Gaussian case,

p(y = yo | x = xt) = (1/C2) exp[ −(1/2)(yo − H(xt))^T R^-1 (yo − H(xt)) ].    (7)
Here, we assume that R includes the representativeness error. This conditional
probability is then entirely described by the observation error probability distribution
with its mean centered at H(xt). Bayes' theorem therefore implies that

J(x) = −ln( p_a posteriori(x) ) ≡ −ln p(x | y) = −ln p(y | x) − ln P(x) + C,    (8)

with p ( x | y ) , the a posteriori probability distribution of x being the true value. This
states that the most probable state, the maximum likelihood estimate, is obtained by
minimizing (8). In the Gaussian case, (1) and (7) apply and we get that
J(x) = (1/2)(x − xb)^T B^-1 (x − xb) + (1/2)(H(x) − y)^T R^-1 (H(x) − y).    (9)
This corresponds to the variational form of the statistical estimation problem when the
error distributions are Gaussian. However, (8) makes it possible to consider non-
Gaussian probability distributions. This will be discussed in section 4.
Remark. In the case where H is linear, (9) implies that the a posteriori probability distribution is also Gaussian, with the analysis xa as its mean and a covariance Pa given by Pa^-1 = B^-1 + H^T R^-1 H.
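For a linear observation operator, the minimizer of (9) and the a posteriori covariance of the Remark can be checked numerically. A minimal sketch with NumPy; the matrices B, R, H and the values of xb and y are small invented examples, not taken from any operational system:

```python
import numpy as np

# Toy problem: 2-element state, 1 observation (all values invented)
B = np.array([[1.0, 0.5], [0.5, 1.0]])   # background error covariance
R = np.array([[0.25]])                    # observation error covariance
H = np.array([[1.0, 0.0]])                # linear observation operator
xb = np.array([0.0, 0.0])                 # background state
y = np.array([1.0])                       # observation

# A posteriori (analysis) error covariance: Pa^-1 = B^-1 + H^T R^-1 H
Pa = np.linalg.inv(np.linalg.inv(B) + H.T @ np.linalg.inv(R) @ H)

# Minimizer of J(x): the maximum likelihood analysis
xa = xb + Pa @ H.T @ np.linalg.inv(R) @ (y - H @ xb)

# Cross-check: the gradient of J vanishes at xa
grad = np.linalg.inv(B) @ (xa - xb) + H.T @ np.linalg.inv(R) @ (H @ xa - y)
```

The analysis is pulled from the background toward the observation, with the correlated component of B spreading the increment to the unobserved variable.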

3. Monitoring the observations and the background check quality control

The background field used in the assimilation being our a priori estimate of the
atmosphere, it offers a common ground against which all observations can be compared.
The innovations are such that y − H(xb) ≅ εo − H′εb, where H(xt) = yt and H′ = ∂H/∂x is the Jacobian of the observation operator. It is assumed here that εo includes the
representativeness error. Averaging the innovations by observation types will reveal
whether the observation and background errors are unbiased. This phase is the monitoring of the observations and allows one to detect serious problems with the data.

Figure 1. Time series of innovations for ATOVS Level 1d radiances. Innovations (O−F departures) are plotted in red and the departures from the analysis (O−A) are plotted in blue. Both the bias (dashed line) and standard deviations (solid line) are plotted.

Fig. 1 shows a time series of innovations computed here for radiance data over four microwave channels of ATOVS level 1d data. The innovations (plotted in red) have been averaged over all data received for the assimilation, and are shown for the month of January 2001. This figure shows that channel MSU-3 had a bias (dashed curve) that was at the same level as the standard deviations (solid line). Such data can be very harmful to the analysis and must be rejected. When a new instrument is added, monitoring is very useful to characterize the measurement errors and can be used for the calibration.
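The monitoring statistics amount to accumulating the bias and standard deviation of the innovations per instrument or channel. A minimal sketch in Python; the innovation sample is synthetic (a deliberately biased channel, mimicking the MSU-3 case) and the 0.25 threshold is an invented illustration, not an operational criterion:

```python
import numpy as np

# Synthetic innovations y - H(xb) for one channel over a month.
# A bias comparable to the standard deviation mimics the MSU-3 case of Fig. 1.
rng = np.random.default_rng(0)
innovations = rng.normal(loc=1.0, scale=1.2, size=500)

bias = innovations.mean()        # should be near 0 for unbiased obs and background
std = innovations.std(ddof=1)    # spread, of order sqrt(sigma_o^2 + sigma_b^2)

# Flag the channel when the bias is a sizeable fraction of the spread
suspicious = bool(abs(bias) > 0.25 * std)
```

In practice such statistics are accumulated by observation type, channel and region, and plotted as time series as in Fig. 1.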
Assuming now that both the observation and background errors are unbiased, the innovations are such that

⟨(εo − H′εb)(εo − H′εb)^T⟩ = R + H′BH′^T,

which represents the error covariance associated with the probability distribution of the innovations. Assuming those to be Gaussian, one has that

p(y − H(xb)) = (1/C) exp[ −(1/2)(y − H(xb))^T (R + H′BH′^T)^-1 (y − H(xb)) ].    (10)
For each datum, this implies that, at the observation location, the innovation is distributed as

p(y − Hxb) = (2π(σo^2 + σb^2))^(-1/2) exp[ −(y − Hxb)^2 / (2(σo^2 + σb^2)) ].    (11)

Based on this, one concludes that it is very unlikely that the innovation be such that

(y − Hxb)^2 > λ^2 (σo^2 + σb^2),

with λ being defined as the rejection criterion, which may depend on observation type and geographical location. For instance, if λ = 2, the probability of such an event occurring is less than 5%. This background check procedure is used to flag data that have a high probability of being in error. It is important to note that it is best to use the background check only to eliminate data that are obviously in error. In this case, λ should be chosen rather high.
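The criterion above translates directly into a vectorized check. A minimal sketch with invented numbers, writing the threshold as λ^2 (σo^2 + σb^2) so that λ counts standard deviations of the innovation:

```python
import numpy as np

def background_check(y, hxb, sigma_o2, sigma_b2, lam=2.0):
    """Flag data whose squared innovation exceeds lam^2 * (sigma_o^2 + sigma_b^2).

    lam is the rejection criterion; in practice it is chosen rather high so
    that only obvious gross errors are eliminated. True means rejected.
    """
    return (y - hxb) ** 2 > lam**2 * (sigma_o2 + sigma_b2)

# Four observations against their background equivalents (invented values):
# the third datum is far from its expected value and gets flagged.
y = np.array([2.1, 1.9, 9.5, 2.0])
hxb = np.full(4, 2.0)
rejected = background_check(y, hxb, sigma_o2=1.0, sigma_b2=1.0)
```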

4. Variational quality control

Observation error associated with measurement accuracy and representativeness is random by nature and can often be described as normally distributed with mean yt and variance σo^2: this will be denoted as N(yt, σo^2). However, gross errors also occur and have a probability that is independent of the true value yt. As discussed in Lorenc and Hammon (1988), gross errors can be assumed to be equally probable over a range of
admissible values ŷ − D/2 ≤ y ≤ ŷ + D/2. Following Dharssi et al. (1992) and Ingleby and Lorenc (1993), the probability distribution for observation error is taken to be of the form

po(y) = P pG(y) + (1 − P) N(ŷ, σo),    (12)

with P being the overall probability of having a gross error while pG(y) = 1/D, a constant. Finally, N(ŷ, σo) = (σo √(2π))^-1 exp( −(y − ŷ)^2 / (2σo^2) ) and it is assumed that σo/D << 1. At this stage, ŷ = H(x) stands for the estimate of the true value of the observation based on our current knowledge of the atmosphere, denoted here by x.

Figure 2. Variational QC-Var cost function (JQC), its gradient (grad JQC), the Gaussian cost function JoN and the associated weight WQC, represented as a function of the normalized departure (y − yt)/σo from the estimated true value yt = ŷ of the observation value y. The probability of gross error has been set to P = 0.01.
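The mixture (12) can be evaluated and sanity-checked numerically. A minimal sketch in Python; the values P = 0.01 and D = 20 are invented for illustration, consistent with σo/D << 1:

```python
import math

def p_obs(y, yhat, sigma_o, P=0.01, D=20.0):
    """Observation error density of eq. (12): a uniform gross-error part
    of width D around yhat, plus a Gaussian N(yhat, sigma_o)."""
    gauss = math.exp(-((y - yhat) ** 2) / (2 * sigma_o**2)) / (sigma_o * math.sqrt(2 * math.pi))
    uniform = 1.0 / D if abs(y - yhat) <= D / 2 else 0.0
    return P * uniform + (1 - P) * gauss

# With sigma_o/D << 1, the density integrates to ~1 over the admissible range
# (simple Riemann sum over [-D/2, D/2] with yhat = 0, sigma_o = 1)
total = sum(p_obs(-10.0 + 0.01 * i, 0.0, 1.0) * 0.01 for i in range(2001))
```

The flat gross-error part dominates in the tails, which is precisely what prevents large departures from receiving Gaussian-sized penalties.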
In the Bayesian formulation, the argument is that p(y | x) = po(y − Hx). Assuming those observation errors are uncorrelated, then

Jo(x) = −ln po(y − Hx) = −ln( γ + (1 − P) N(Hx, σo) )

with γ = P/D. As in Andersson and Järvinen (1999), it is convenient to rewrite this in terms of the Gaussian form JoN = (1/2)(H(x) − y)^2/σo^2, which leads to

Jo(x) = −ln[ (1 − P)/(σo √(2π)) exp( −JoN(x) ) + γ ]
      = −ln( exp( −JoN(x) ) + γ̃ ) + Const.    (13)

where γ̃ = (P σo √(2π)) / ((1 − P) D), while its gradient is

∇x Jo(x) = [ exp(−JoN) / ( γ̃ + exp(−JoN) ) ] ∇x JoN(x) ≡ WQC ∇x JoN(x).    (14)

Figure 3. Schematic of a situation in which a comparison of 5 wind observations (Vobs) with the background would lead to the rejection of 4 good observations. With QC-Var, it is the datum that agrees the most with the background state that ends up being rejected.
In this case, observation error is considered uncorrelated so that JoN can be computed as in the Gaussian case. It then suffices to modify the value of Jo using (13) and the gradient using (14) for each datum. This is referred to as the QC-Var cost function, which can be implemented very easily with slight modifications to the cost function associated with Gaussian error statistics. Fig. 2 illustrates the associated cost function, gradient and weight WQC as a function of y′ = (y − ŷ)/σo; this figure is similar to Fig. 2 of Andersson and Järvinen (1999). For departures exceeding ~4σo, the gradient becomes quite flat and the observation has little influence on the minimization.
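The weight WQC of (14) is cheap to evaluate. A minimal sketch of its behaviour; P = 0.01 and D = 20 are invented values chosen so that σo/D << 1:

```python
import math

def w_qc(dep, sigma_o, P=0.01, D=20.0):
    """A posteriori weight W_QC of eq. (14) for a departure dep = y - H(x)."""
    jo_n = 0.5 * (dep / sigma_o) ** 2                       # Gaussian cost J_o^N
    gamma = (P * sigma_o * math.sqrt(2 * math.pi)) / ((1 - P) * D)
    return math.exp(-jo_n) / (gamma + math.exp(-jo_n))

# Small departures keep essentially full weight; beyond ~4 sigma_o the
# gradient is damped and the observation barely influences the minimization.
w_small = w_qc(0.5, sigma_o=1.0)
w_large = w_qc(6.0, sigma_o=1.0)
```

The transition region between full weight and near-zero weight is what produces the flattening of the gradient visible in Fig. 2.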
The background-check procedure introduced earlier presumes that the background
state yields a reasonable model equivalent of the observation and any significant
departure is an indication that the observation is in error. However, situations do occur
where it is the background state itself that is at fault and can disagree significantly with
the observation. As illustrated in Fig.3, the background-check procedure would lead to
the rejection of good data and the acceptance of bad ones. This figure is a schematic that
emphasises a situation where the forecast has wrongly placed a low-pressure system
while most of the data in the area would tend to reposition the system eastward. If the
comparison is made against the background state, all data would end up being rejected
except the one located closest to the forecast low. However, performing a preliminary
analysis based on all available data would lead to a repositioning of the system. In this
case, the observation departures with respect to this analysis would be the most
important for the datum that initially agreed the most with the background state.
In the context of optimal interpolation (OI), Lorenc (1981) proposed a quality control procedure using a reduced analysis involving only a small number of observations y1, y2, ..., yK. A set of K analyses xa(k) is obtained, each based on all data except yk. If the observation is such that |yk − H(xa(k))| > λ (σo^2 + σa^2)^(1/2), with λ being a pre-set criterion, then the datum is rejected. This requires the computation of the analysis error variance, which can be done when few observations are used. This procedure therefore indirectly compares the consistency among a subset of observations and rejects the data that do not concur with the rest. A formal treatment of this procedure can be found in Ingleby and Lorenc (1993). It will be referred to as the OI quality control.
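The leave-one-out idea can be illustrated for K collocated scalar observations. This is only a schematic: the full OI analysis is replaced here by a variance-weighted mean of the background and the remaining data, and all numbers are invented:

```python
import numpy as np

def oi_leave_one_out(y, xb, sigma_o2, sigma_b2, lam=3.0):
    """Schematic leave-one-out check for K collocated scalar observations.

    Each datum y[k] is compared with an analysis x_a^(k) built from the
    background and all other data (a simple variance-weighted mean, standing
    in for the full OI analysis). Returns a boolean rejection mask.
    """
    K = len(y)
    rejected = np.zeros(K, dtype=bool)
    for k in range(K):
        others = np.delete(y, k)
        # Precision-weighted analysis of background + remaining observations
        w = np.r_[1.0 / sigma_b2, np.full(K - 1, 1.0 / sigma_o2)]
        xa_k = np.average(np.r_[xb, others], weights=w)
        sigma_a2 = 1.0 / w.sum()                 # analysis error variance
        rejected[k] = (y[k] - xa_k) ** 2 > lam**2 * (sigma_o2 + sigma_a2)
    return rejected

# Four consistent wind reports and one outlier (invented values, m/s):
# only the outlier disagrees with the analysis made from the rest.
winds = np.array([10.2, 9.8, 10.5, 10.1, 22.0])
flags = oi_leave_one_out(winds, xb=10.0, sigma_o2=1.0, sigma_b2=4.0)
```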
In the QC-Var algorithm, this is achieved by allowing the estimate to benefit partly from the information contained in the ensemble of the observations (Andersson and Järvinen, 1999). QC-Var is turned off for a number of iterations at the beginning of the minimization to let the iterate build the main features of the analysis based on all observations. At this point, QC-Var is activated and the observation departures y − H(x) in the QC-Var cost function use an a posteriori estimate of the model state to assess whether data should be implicitly rejected or not. If QC-Var were activated at the beginning of the minimization, little weight would be given to the data with large departures from the background. This would give more confidence to observations closest to the background state and implicitly reject those that strongly disagree with it. This is why it is important to first let the minimization proceed for a number of iterations, to let the bulk of the data reposition the analysis. If a hundred iterations are needed for convergence, experimentation shows that turning off QC-Var for the first 30 iterations or so is sufficient. When QC-Var is turned on, the bad data are then given less weight, as they should be.
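The delayed activation can be sketched for a scalar analysis minimized by gradient descent. This is a toy illustration, not the operational algorithm; the step size, P, D and the data values are all invented:

```python
import math

def minimize_with_delayed_qc(y, xb, sigma_o, sigma_b, n_iter=100, qc_start=30,
                             P=0.01, D=20.0, step=0.1):
    """Scalar gradient descent on the QC-Var cost, with W_QC held at 1
    for the first qc_start iterations so the bulk of the data can first
    reposition the analysis."""
    gamma = (P * sigma_o * math.sqrt(2 * math.pi)) / ((1 - P) * D)
    x = xb
    for it in range(n_iter):
        grad = (x - xb) / sigma_b**2
        for yk in y:
            jo_n = 0.5 * ((x - yk) / sigma_o) ** 2
            w = 1.0 if it < qc_start else math.exp(-jo_n) / (gamma + math.exp(-jo_n))
            grad += w * (x - yk) / sigma_o**2
        x -= step * grad
    return x

# Four consistent data near 5 and one datum near the background at 0:
# the analysis settles close to the bulk of the observations.
x_final = minimize_with_delayed_qc([5.0, 5.2, 4.8, 5.1, 0.3], xb=0.0,
                                   sigma_o=1.0, sigma_b=2.0)
```

During the first 30 iterations all data pull the iterate toward the cluster; once QC is activated, the datum close to the original background receives a small weight instead of the cluster.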

5. Experimentation with QC-Var

The results presented here were obtained with the 3D-Var data assimilation system of the Canadian Meteorological Centre (CMC), which includes both a background check and a variational quality control. This system was implemented in December 2001 (Gauthier et al. 1998, 1999). It is shown here how the probability distribution for observation errors has been estimated, and a comparison is presented of the amount of data rejected relative to the rejection rates obtained from the OI quality control. Previously, the CMC 3D-Var relied on an OI quality control (OI-QC) inherited from the optimal interpolation analysis formerly used at CMC (Mitchell et al., 1996). The QC-Var does not categorically accept or reject data. Here, data were considered rejected by QC-Var if their a posteriori weight was less than 0.25. Table 1 gives the rejection rates

Table 1. Comparison of rejection rates and limits for the quality control used in the CMC optimal interpolation analysis and for QC-Var, which also includes the background check. The period extends from 19 January 2001 to 31 January 2001.

Obs. Type   Obs. Quantity       Rejection Ratio (%)    Approximate Rejection Limits
                                QC-Var    OI-QC        QC-Var        OI-QC
SYNOP       Pressure (height)   2.7       1.9          3.6 hPa       n/a
            (T − Td)            0.3       0.0          8.5 K         22 K
            Temperature         2.2       1.2          6.6 K         16.6 K
SHIP        Wind                7.6       0.5          8 m/s         19 m/s
            Pressure (height)   2.3       3.5          8.5 hPa       n/a
            (T − Td)            0.3       0.0          9.5 K         26 K
            Temperature         1.5       0.9          5.7 K         11.7 K
DRIBU       Pressure (height)   2.8       3.1          6.6 hPa       n/a
            Temperature         3.1       2.4          5.8 K         6.2 K
TEMP        Wind                2.7       0.4          8 - 14 m/s    11 - 20 m/s
            (T − Td)            1.8       0.0          5 - 16 K      14 - 22 K
            Temperature         3.0       1.3          2.1 - 6.6 K   3.4 - 9.4 K
AMDAR       Wind                1.0       0.4          11 m/s        15 m/s
            Temperature         0.7       0.5          4.0 K         5.0 K
SATOB       Wind                1.3       0.2          13 - 27 m/s   16 - 36 m/s
AIREP       Wind                5.2       1.0          13 m/s        29 m/s
            Temperature         1.7       0.8          5.7 K         9.2 K
ACARS       Wind                2.3       1.0          10 m/s        14 m/s
            Temperature         1.6       2.1          4.0 K         5.0 K
and limits used by the background check and QC-Var (weight WQC evaluated at convergence) and by the OI-QC, which corresponds to the procedure of Lorenc (1981) described above.

Figure 4. Histogram of observation departures from the background for AIREP temperatures over the period of March-April 2002. The number of accepted data is 31,926 (in gray), while 103 data were rejected by the background check (in white) and 303 were rejected by QC-Var (in black).
Table 1 shows the rejection limits defined as the largest difference between
observation and analysis, computed from samples of accepted observations. Note that it
is the QC-Var algorithm that determines the rejection limit since the definition specifies
the departure from the analysis, not the background. These results indicate that QC-Var
rejects more of almost every type of data. Among the most noteworthy differences are those for the wind data, which are rejected twice as often by QC-Var as by the OI-QC.
This is appropriate since, at various times over the past ten years, meteorologists at
CMC had noted that incorrect wind reports were too often allowed in the analysis.
Another significant difference between the two systems is in the rejection rates of
dewpoint depression (T- Td). The OI-QC rejects less than 0.1% of these while QC-Var
typically rejects between 0.3% and 1.8%. The rejection limits in Table 1 indicate that, in
the OI-QC, some dewpoint depression observations that differ by more than 20°C from
the analysis were not rejected while the QC-Var does not tolerate such large departures.
Fig. 4 shows the distribution of the number of data as a function of observation
departure from the background (mean value has been removed). It indicates that the
data rejected by QC-Var lie in that region where the Gaussian distribution differs from
the actual distribution of observation error, which includes the probability of gross
error. Fig. 5 represents the probability distributions resulting from having a probability of gross error set to 10% and 1%.

Figure 5. Comparison of the QC-Var cost functions and of WQC when the probability of having a gross error is P = 0.01 (black dots) and when P = 0.1 (white dots).

As discussed in Andersson and Järvinen (1999), the probability of gross error can be estimated from the distribution of the innovations (y − Hxb). As expected, the region where the weight WQC is close to 1 narrows as the probability of gross error increases.

6. Conclusion

In this chapter, a short introduction to quality control has been presented. It has been
shown in particular that a careful monitoring of the observations is necessary to detect
systematic biases in the measurements that can often be related to the physical
characteristics of the instrument or its environment. Monitoring is a useful and
necessary step for new instruments, satellite instruments in particular, during the
calibration phase before the data are assimilated (see the chapter Assimilation of remote
sensing observations in Numerical Weather Prediction). There are numerous cases in
which error in the new data could be related to instrument parameters (e.g., pointing
angle of an antenna on board a satellite). All operational centres that are involved in
data assimilation can then see in near real time any problem with the data they are
receiving.
In the first stage, the data pass through a number of crude checks associated with the
transmission and encoding/decoding phase. A comparison against the background state
is then performed to eliminate a good part of data that are obviously in error. However,
comparing against the background state can lead to rejection of good data because the
forecast itself is in error. To address this problem, the OI-QC performs a preliminary
analysis against which all data are individually compared. Data can then be flagged as
rejected and a new analysis without the rejected data is performed. This is the method
presented in Lorenc (1981). An alternative is to use a variational quality control that
implicitly compares the data individually against a preliminary analysis corresponding
to the current iterate of the variational analysis (Ingleby and Lorenc, 1993; Andersson
and Järvinen, 1999).
The presentation here has been kept deliberately simple. However, it must be
stressed that there are numerous complexities that need to be considered. For instance,
as discussed in Andersson and Järvinen (1999), the QC-Var algorithm becomes much
more complex when the observation error is correlated in space or even in time, a
situation that can occur in 4D-Var (Järvinen et al., 1999).
Acknowledgement. The authors would like to thank Peter Lynch for his comments on
the manuscript.

References

Andersson, E. and H. Järvinen, 1999: Variational quality control. Quart. J.R. Meteor. Soc., 125, 697-722.
Dharssi, I., A.C. Lorenc and N.B. Ingleby, 1992: Treatment of gross errors using maximum probability
theory. Quart. J.R. Meteor. Soc., 118, 1017-1036.
Gauthier, P., C. Charette, L. Fillion, P. Koclas and S. Laroche, 1999: Implementation of a 3D variational data
assimilation system at the Canadian Meteorological Centre. Part I: The global analysis. Atmosphere-
Ocean, 37, 103-156.
Gauthier, P., M. Buehner and L. Fillion, 1998: Background-error statistics modelling in a 3D variational data assimilation scheme: estimation and impact on the analyses. Proceedings of the ECMWF Workshop on diagnosis of data assimilation systems, Reading, UK, p. 131-145.
Ingleby, N.B., and A.C. Lorenc, 1993: Bayesian quality control using multivariate normal distributions.
Quart. J.R. Meteor. Soc., 119, 1195-1225.
Järvinen, H., E. Andersson and F. Bouttier, 1999: Variational assimilation of time sequences of surface
observations with serially correlated errors. Tellus, 51A, 468-487.
Lorenc, A.C., 1986: Analysis methods for numerical weather prediction. Quart. J.R. Meteor. Soc., 112, 1177-1194.
Lorenc, A.C., 1981: A global three-dimensional multivariate statistical interpolation scheme. Mon. Wea. Rev., 109, 701-721.
Lorenc, A.C. and O. Hammon, 1988: Objective quality control of observations using Bayesian methods. Theory, and a practical implementation. Quart. J.R. Meteor. Soc., 114, 515-543.
Mitchell, H.L., C. Charette, R. Hogue and S.J. Lambert, 1996: Impact of a revised analysis algorithm on an
operational data assimilation system. Mon Wea. Rev., 124, 1243-1255.
Rodgers, C.D., 2000: Inverse methods for atmospheric sounding: theory and practice. Series on atmospheric,
oceanic and planetary physics, vol.2, World Scientific Ed., New York, 238 pages.
