QPCR Analysis Differently
Address: (1) Department of Plant Sciences, University of Tennessee, Knoxville, TN 37996, USA, (2) University of Tennessee Institute of Agriculture Genomics Hub, University of Tennessee, Knoxville, TN 37996, USA and (3) Statistical Consulting Center, University of Tennessee, Knoxville, TN 37996, USA
Email: Joshua S Yuan - syuan@utk.edu; Ann Reed - annreed@utk.edu; Feng Chen - fengc@utk.edu; C Neal Stewart* - nealstewart@utk.edu
* Corresponding author
Abstract
Background: Even though real-time PCR has been broadly applied in biomedical sciences, data
processing procedures for the analysis of quantitative real-time PCR are still lacking, specifically in
the realm of appropriate statistical treatment. Confidence interval and statistical significance
considerations are not explicit in many of the current data analysis approaches. Based on the
standard curve method and other useful data analysis methods, we present and compare four
statistical approaches and models for the analysis of real-time PCR data.
Results: In the first approach, a multiple regression analysis model was developed to derive ∆∆Ct
from estimation of interaction of gene and treatment effects. In the second approach, an ANCOVA
(analysis of covariance) model was proposed, and the ∆∆Ct can be derived from analysis of effects
of variables. The other two models involve calculation of ∆Ct followed by a two group t-test and its
non-parametric analogue, the Wilcoxon test. SAS programs were developed for all four models and data
output for analysis of a sample set are presented. In addition, a data quality control model was
developed and implemented using SAS.
Conclusion: Practical statistical solutions with SAS programs were developed for real-time PCR
data and a sample dataset was analyzed with the SAS programs. The analysis using the various
models and programs yielded similar results. Data quality control and analysis procedures
presented here provide statistical elements for the estimation of the relative expression of genes
using real-time PCR.
BMC Bioinformatics 2006, 7:85 http://www.biomedcentral.com/1471-2105/7/85
[...] (E). ∆Ct for each gene (target or reference) is then calculated by subtracting the Ct number of the treatment sample from that of the control sample. As shown in Equation 1, the ratio of target gene expression in treatment versus control can be derived from the ratio between the target gene efficiency ($E_{target}$) to the power of the target ∆Ct ($\Delta Ct_{target}$) and the reference gene efficiency ($E_{reference}$) to the power of the reference ∆Ct ($\Delta Ct_{reference}$). The ∆∆Ct model can be derived from the efficiency-calibrated model if both target and reference genes reach their highest PCR amplification efficiency. In this circumstance, both the target efficiency ($E_{target}$) and the reference efficiency ($E_{reference}$) equal 2, indicating amplicon doubling during each cycle, and the same expression ratio is obtained from $2^{-\Delta\Delta Ct}$ [7,9].

\[ \text{Ratio} = \frac{(E_{target})^{\Delta Ct_{target}}}{(E_{reference})^{\Delta Ct_{reference}}} \tag{Equation 1} \]

where $\Delta Ct_{target} = Ct_{control} - Ct_{treatment}$ for the target gene and $\Delta Ct_{reference} = Ct_{control} - Ct_{treatment}$ for the reference gene.

\[ \text{Ratio} = 2^{-\Delta\Delta Ct} \tag{Equation 2} \]

where $\Delta\Delta Ct = \Delta Ct_{reference} - \Delta Ct_{target}$.

Even though both the efficiency-calibrated and ∆∆Ct models are widely applied in gene expression studies, few papers thoroughly discuss the statistical considerations in analyzing the effect of each experimental factor or in significance testing. One of the few studies that employed substantial statistical analysis used the REST® program [8]. The software presented in that article is based on the efficiency-calibrated model and employs randomization tests to obtain the significance level. However, the article did not provide a detailed model for the effects of the different experimental factors involved. Another statistical study of real-time PCR data used a simple linear regression model to estimate the ratio through Ct calculation [10]. However, the logarithm-based fluorescence was used as the dependent variable in the model, which we believe does not adequately reflect the nature of real-time PCR data. In our view, Ct should be the dependent variable for statistical analysis, because it is the outcome value directly influenced by treatment, concentration and sample effects. Both studies used efficiency-calibrated models. Despite the publication of these two methods, many research articles reporting real-time PCR data do not present P values and confidence intervals [11-13]. We believe that these statistics are desirable to facilitate robust interpretation of the data.

A priori, we consider the confidence interval and P value of ∆∆Ct data to be very important because these directly influence the interpretation of the ratio. Without proper statistical modeling and analysis, the interpretation of real-time PCR data may lead the researcher to false positive conclusions, which is especially troublesome in clinical applications. We therefore developed four statistical methodologies for processing real-time PCR data using a modified ∆∆Ct method. The statistical methodologies can be adapted to other mathematical models with modifications. SAS programs implementing the methodologies and data quality control are presented with real-time PCR practitioners in mind for turnkey data analysis. Standard deviations, confidence levels and P values are presented directly from the SAS output. We also include the analysis of the sample data set and the SAS programs for the analysis in the online supplementary materials.

Results and discussion
Data quality control
From the two mathematical models for relative quantification of real-time PCR data, we observe disparities between data quality standards. For the efficiency-calibrated method, the author who described the procedure [7] assumed that the amplification efficiency for each gene (target and reference) is the same among different experimental samples (treatment and control); an amplification efficiency of 2 is not required. In contrast, the ∆∆Ct method is more stringent, assuming that all reactions reach an amplification efficiency of 2. In other words, the amount of product should double during each cycle [9]. Moreover, the ∆∆Ct method assumes that the PCR amplification efficiency for each sample will be 2 if the PCRs for one set of the samples reach full amplification efficiency. However, this assumption neglects the effect of different cDNA samples.

Data quality can be examined through a correlation model. Even though examining the correlation between Ct number and concentration can provide effective quality control, a better approach might be to examine the correlation between Ct and the logarithm (base 2) transformed concentration of template, which should yield a significant simple linear relationship for each gene and sample combination. For example, for a target gene in the control sample, the Ct number should correlate with the logarithm transformed concentration following the simple linear regression model in Equation 3. In the equation, $X_{lcon}$ represents the logarithm transformed concentration, $\beta_0$ represents the intercept of the regression line, and $\beta_{con}$ represents the slope of the regression line [14]. Acceptable real-time PCR data should show two features in the regression analysis. First, the slope should not be significantly different from -1. Second, the slopes for all four combinations of genes and samples as shown in Table 1 should not be significantly different from one another. A SAS program was developed to perform the data quality control in Program1_QC.sas (additional file 1).

\[ Ct = \beta_0 + \beta_{con} X_{lcon} + \varepsilon \tag{Equation 3} \]

Table 1: The sample real-time PCR data for analysis. In this data set, there are two types of samples (treatment and control), two genes (reference and target), and four concentrations of each combination of gene and sample. For data quality control and ANCOVA analysis, the real-time PCR sample data set can be grouped into four groups according to the combination of sample and gene. The Control-Target combination effect was named group 1, Treatment-Target group 2, Control-Reference group 3 and Treatment-Reference group 4.

The input data are grouped as shown in Table 1 and additional file 2. Each combination of gene and sample was classified into one group, numbered from 1 to 4. The SAS procedure PROC MIXED was used to perform simple linear regression for each group based on the model described above. The 95% confidence intervals for the slopes were estimated; the slopes are expected not to be significantly different from -1. The abbreviated SAS output for the analysis of a sample data set is presented in SASOutput.doc (additional file 3). The slopes of Ct against logarithm transformed concentration for all four groups were not significantly different from -1 at the 95% confidence level. In addition to the numeric output, the program also provides a visualization of data quality as shown in Figure 2, where the Ct number is plotted against logarithm transformed template concentration. A simple linear relationship should be observed between the Ct number and logarithm transformed concentration.

Figure 2: Data quality control. The four classes represent four different combinations of sample and gene: reference gene in control sample, target gene in control sample, reference gene in treatment sample, and target gene in treatment sample. Each class should show a linear correlation between Ct and logarithm transformed concentration of PCR product with a slope of -1.

Multiple regression model
Several effects need to be taken into consideration in the ∆∆Ct method, namely the effects of treatment, gene, concentration, and replicates. If we consider these effects as quantitative variables and relate the Ct number to these multiple effects and their interactions, we can develop a multiple regression model as in Equation 4.

\[
\begin{aligned}
Ct = {} & \beta_0 + \beta_{con}X_{icon} + \beta_{treat}X_{itreat} + \beta_{gene}X_{igene} + \beta_{contreat}X_{icon}X_{itreat} \\
& + \beta_{congene}X_{icon}X_{igene} + \beta_{genetreat}X_{igene}X_{itreat} + \beta_{congenetreat}X_{icon}X_{itreat}X_{igene} + \varepsilon
\end{aligned}
\tag{Equation 4}
\]

In this model, Ct is the dependent variable, $\beta_0$ is the intercept, the remaining βs are the regression coefficients for the corresponding X (independent) terms, and ε is the error term [14].
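
As a worked illustration of why the gene-by-treatment term in Equation 4 carries the quantity of interest, the short derivation below assumes 0/1 indicator coding, with $g = X_{igene} = 1$ for the target gene and $t = X_{itreat} = 1$ for the treatment sample. This coding, and the symbol $\mu_{g,t}(c)$ for the expected Ct at a fixed logarithm transformed concentration $c$, are assumptions introduced here for illustration only; the text does not specify the coding used.

```latex
\begin{aligned}
\mu_{g,t}(c) &= (\beta_0 + \beta_{con}c) + (\beta_{treat} + \beta_{contreat}c)\,t
              + (\beta_{gene} + \beta_{congene}c)\,g
              + (\beta_{genetreat} + \beta_{congenetreat}c)\,g\,t \\
\mu_{1,1}(c) - \mu_{1,0}(c) &= (\beta_{treat} + \beta_{contreat}c) + (\beta_{genetreat} + \beta_{congenetreat}c) \\
\mu_{0,1}(c) - \mu_{0,0}(c) &= \beta_{treat} + \beta_{contreat}c \\
\bigl[\mu_{1,1}(c) - \mu_{1,0}(c)\bigr] - \bigl[\mu_{0,1}(c) - \mu_{0,0}(c)\bigr] &= \beta_{genetreat} + \beta_{congenetreat}c
\end{aligned}
```

Under this assumed coding, the treatment-minus-control Ct difference of the target gene, corrected by the same difference for the reference gene, is a difference of differences of the four group means. When the slopes are equal across groups, the three-way term vanishes and the difference of differences reduces to $\beta_{genetreat}$ alone.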
The model considers the effects of concentration, treatment, gene and their interactions. We are principally interested in the interaction between gene and treatment, which captures the degree to which the Ct difference between target gene and reference gene differs between treated and control samples, i.e., ∆∆Ct. ∆∆Ct can therefore be estimated from the different combinational values of $\beta_{genetreat}$. The four groups in Table 1 also represent the combinational effects of treatment and gene. The goal is to test statistically for differences between target and reference genes in treatment vs. control samples. Therefore, the null hypothesis is that the Ct differences between target and reference genes are the same in treatment and control samples, which can be represented in terms of combinational effects (CE) as CE1-CE3 = CE2-CE4. An equivalent formula is CE1-CE2-CE3+CE4 = 0, which yields an estimate of ∆∆Ct. If the null hypothesis is not rejected, the ∆∆Ct is not significantly different from 0; otherwise, the ∆∆Ct can be derived from the estimate of the test. In this way, we can perform a test of the different combinational effects of $\beta_{genetreat}$ and estimate the ∆∆Ct from it. As shown in the ∆∆Ct formula in Equation 2, if ∆∆Ct is equal to 0, the ratio will be 1, which indicates no change in gene expression between control and treatment.

A SAS program for multiple regression model
The SAS procedure PROC GLM was used for ∆∆Ct estimation in Program2_MR.sas in additional file 4. The multiple regression model is stated in a model statement. The combinational effects of gene and treatment are evaluated in the estimate and contrast statements. The null hypothesis of CE1-CE2-CE3+CE4 = 0 is tested in the contrast statement, and the parameter estimation yields the ∆∆Ct value. The SAS input file is available in additional file 5 and the SAS output for the multiple regression is in SASOutput.doc (additional file 3).

The SAS output gives a very comprehensive analysis of the data. We are interested in two aspects of the analysis. First, we want to test whether the ∆∆Ct value is significantly different from 0 at P = 0.05. If the ∆∆Ct is not significantly different from 0, we conclude that the treatment does not have a significant effect on target gene expression; otherwise, the opposite is concluded. Second, if the effect is significant, we are interested in the point estimate and standard error of the ∆∆Ct value, from which we can derive the ratio of gene expression as discussed later. The SAS output provides the point estimate (-0.6848) and standard error (0.1185) for the ∆∆Ct. PROC GLM and PROC MIXED are interchangeable in this application. If the experiments involve multiple biological replicates, the replicate effect can also be considered by modifying the SAS program. The estimate will then be the combined effect of gene, treatment and replicate.

Analysis of covariance and SAS code
Another way to approach real-time PCR data analysis is to use an analysis of covariance (ANCOVA). A simplified model can be derived by transforming the data into grouped data as shown in Table 1 and additional file 2, resulting in Equation 5.

\[ Ct = \beta_0 + \beta_{con}X_{icon} + \beta_{group}X_{igroup} + \beta_{groupcon}X_{igroup}X_{icon} + \varepsilon \tag{Equation 5} \]

We are interested in two questions here. First, are the covariance adjusted averages among the four groups equal? Second, what is the Ct difference of the target gene between treatment and control samples after correction by the reference gene? In this case, the null hypothesis is (µ2-µ1)-(µ4-µ3) = 0, and the test yields a parameter estimate of ∆∆Ct as shown in Program3_ANCOVA.sas (additional file 6).

The SAS code implementing the ANCOVA model is similar to that of the multiple regression model. Either of the SAS procedures PROC GLM and PROC MIXED can be employed to implement the ANCOVA model; we used PROC MIXED here. The class statement defines which variables will be grouped for significance testing. In this case, the variables are concentration and group, and ANCOVA assumes that these are co-varying in nature. The contrast and estimate statements were used to contrast the group effects, which yields the ∆∆Ct (-0.6848) as well as its standard error (0.1185) and 95% confidence interval (-0.9262, -0.4435). The SAS output with both confidence level and P value is presented in SASOutput.doc (additional file 3).

Simplified alternatives: t-test and Wilcoxon two group test
Simpler alternatives can be used to analyze real-time PCR data with biological replicates for each experiment. The primary assumption with this approach is that the additive effect of concentration, gene, and replicate can be adjusted by subtracting the Ct number of the reference gene from that of the target gene, which provides ∆Ct as shown in Table 2. The ∆Ct values for treatment and control can then be subjected to a simple t-test, which yields the estimate of ∆∆Ct.

As a non-parametric alternative to the t-test, a Wilcoxon two group test can also be used to analyze the two pools of ∆Ct values. Two of the assumptions of the t-test are that both groups of ∆Ct have Gaussian distributions and that they have equal variances. However, these assumptions are not valid in many real-time PCR experiments with realistically small sample sizes. Therefore, a distribution-free Wilcoxon test is a more robust and appropriate alternative in this case [15].
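
The ∆Ct-based alternatives can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' SAS implementation: the Ct values are made up, the function names `delta_ct` and `rank_sum_exact` are hypothetical, ∆Ct is taken as target minus reference, and the exact enumeration assumes no tied values and small group sizes.

```python
from itertools import combinations
from statistics import mean


def delta_ct(ct_target, ct_reference):
    """Per-replicate delta Ct: target Ct minus reference Ct (assumed convention)."""
    return [t - r for t, r in zip(ct_target, ct_reference)]


def rank_sum_exact(x, y):
    """Exact two-sided Wilcoxon rank-sum test for two small groups without ties.

    Returns the rank sum of x and the exact two-sided p-value.
    """
    pooled = sorted(x + y)
    if len(set(pooled)) != len(pooled):
        raise ValueError("ties present: this exact enumeration assumes none")
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    w_obs = sum(rank[value] for value in x)
    n, total = len(x), len(pooled)
    centre = n * (total + 1) / 2  # mean of the rank sum under the null
    # Enumerate every way of assigning n of the ranks to the first group.
    sums = [sum(c) for c in combinations(range(1, total + 1), n)]
    extreme = sum(abs(s - centre) >= abs(w_obs - centre) for s in sums)
    return w_obs, extreme / len(sums)


# Hypothetical Ct values, four replicates per sample (not from the paper).
control = delta_ct([25.10, 25.40, 24.90, 25.30], [20.00, 20.08, 19.95, 20.09])
treatment = delta_ct([24.50, 24.70, 24.40, 24.60], [20.09, 20.10, 20.05, 20.08])

# With delta Ct = target - reference, delta-delta Ct is treatment minus control.
ddct = mean(treatment) - mean(control)
w, p_value = rank_sum_exact(treatment, control)
print(f"ddCt = {ddct:.3f}, rank sum = {w}, exact p = {p_value:.4f}")
```

With four replicates per group, the smallest two-sided p-value this test can produce is 2/70, which illustrates why the distribution-free test is conservative at small sample sizes.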
Table 2: ∆Ct calculation. The table presents the calculation of ∆Ct, which is derived by subtracting the Ct number of the reference gene
from that of the target gene. Con stands for concentration.
A SAS program has been developed for both the t-test and the Wilcoxon two group test as shown in the attached program Program4_TW.sas (additional file 7). The SAS procedures TTEST and NPAR1WAY were used to analyze the data. The SAS macro 'moses.sas' [15] in additional file 8 was employed to derive the confidence levels. The SAS input file is in additional file 9 and the SAS output for the sample data analysis is available in SASOutput.doc (additional file 3). Since the estimate of the difference is derived by subtracting the treatment from the control sample, the actual ∆∆Ct is the negative of the output estimate.

Comparison of four approaches and data presentation
A comparison of the four approaches is presented in Table 3. Multiple regression and ANCOVA yield exactly the same result for ∆∆Ct estimation, because both methods employ the same mathematical approach for parameter estimation. The t-test provides the same point estimate of ∆∆Ct; however, its standard error is slightly greater, which leads to a wider confidence interval. The Wilcoxon two group test provides a slightly smaller estimate of ∆∆Ct. The highly similar results from the four approaches validate the models and SAS programs presented. The choice of models and programs will depend on the experimental design and on the stringency and quality of the experiment. However, the most conservative test, owing to its nonparametric nature, is the Wilcoxon two group test, which is distribution-independent.

Table 3: The comparison of four approaches. The table lists the ∆∆Ct, standard error, P-value and confidence interval derived from the four methods presented in the article. Neither the SAS package nor the macro used provides the standard error for the Wilcoxon two group test. We consider the confidence interval to be sufficient for further data transformation.

Data quality control
Many current real-time PCR experiments do not include a standard curve design, nor do they use a method to estimate the amplification efficiency. We argue here that real-time PCR data without proper quality controls are not reliable, since the efficiency of real-time PCR can have a significant impact on the ratio estimation and dynamic range. For example, if a PCR has a percentage amplification efficiency (PE) of 0.8 (i.e. the PCR product increases $2^{0.8}$ times instead of two times per cycle), a ∆Ct value of 3 translates into only a 5.27-fold difference in ratio instead of an 8-fold difference. This problem is amplified when the ∆∆Ct or ∆Ct values are larger and the amplification efficiency is lower, which could lead to severely skewed interpretations.

We therefore propose two standards for real-time PCR data quality control according to the model, using the SAS programs presented in this paper. First, experiments with a serial dilution of template need to be included in order to estimate the amplification efficiency of each gene with each sample. Some researchers assume that the amplification efficiency for each gene is the same in different samples because the same primer pair and amplification conditions are used. However, we found that the sample effect does have an impact on the amplification efficiency. In other words, the amplification efficiency could be different for the same gene when amplified from different cDNA template samples. We therefore consider the experimental design with a standard curve for each gene and sample combination to be optimal. Second, under optimal conditions, a plot of the Ct number against the logarithm (base 2) of the template amount should yield a slope not significantly different from -1, which indicates an amplification efficiency of nearly 2. Even though both effi[...]

[...] standard deviation of ∆∆Ct; and the confidence interval of the ratio should be derived from the confidence interval of ∆∆Ct. In other words, the point estimate of the ratio should be $2^{-\Delta\Delta Ct}$ and the confidence interval for the ratio should be $(2^{-\Delta\Delta Ct_{HCL}}, 2^{-\Delta\Delta Ct_{LCL}})$. Since Ct is the observed value from experimental procedures, it should be the subject of statistical analysis. The practice of performing statistical analysis on the ratio directly is not appropriate. The presentation of data needs to refer to the ∆∆Ct and subsequently to the ratio and confidence intervals derived from $2^{-\Delta\Delta Ct}$.
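
The transformation from ∆∆Ct to the expression ratio can be sketched as follows, using the ∆∆Ct estimate (-0.6848) and 95% confidence limits (-0.9262, -0.4435) reported for the sample data set. The function name `ratio_with_ci` is hypothetical, and the sketch assumes an amplification efficiency of 2, as in the $2^{-\Delta\Delta Ct}$ method.

```python
def ratio_with_ci(ddct, lcl, hcl):
    """Map a delta-delta Ct estimate and its confidence limits to the ratio scale.

    lcl and hcl are the lower and upper confidence limits of delta-delta Ct.
    """
    point = 2.0 ** (-ddct)
    # 2 ** (-x) is a decreasing function, so the upper delta-delta Ct limit
    # gives the lower ratio limit and vice versa.
    return point, (2.0 ** (-hcl), 2.0 ** (-lcl))


# Estimate and 95% confidence limits reported in the text for the sample data.
ratio, (ratio_low, ratio_high) = ratio_with_ci(-0.6848, -0.9262, -0.4435)
print(f"ratio = {ratio:.2f}, 95% CI = ({ratio_low:.2f}, {ratio_high:.2f})")
```

For these values the ratio is roughly 1.61 with an interval of about (1.36, 1.90). The interval is not symmetric around the point estimate on the ratio scale, which is one reason to carry out the statistical analysis on ∆∆Ct and transform afterwards.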
[...]matical complexity and the reliability controversy, this method is not as widely applied as the traditional standard curve method [10,16,18]. In our method, a standard curve already exists and can be used to derive the amplification efficiency (E). Considering the simple linear regression model in Equation 3, if $X_{lcon}$ represents the base-10 logarithm transformed concentration, the amplification efficiency (E) is $10^{-(1/\text{slope})}$ or $10^{-(1/\beta_{con})}$ according to Rasmussen 2001 and Pfaffl 2001 [7,21]. In our model, $X_{lcon}$ represents the base-2 logarithm transformed concentration; the amplification efficiency (E) is therefore $2^{-(1/\text{slope})}$ or $2^{-(1/\beta_{con})}$, where the PE can be represented as $-(1/\beta_{con})$.

In the first scenario discussed above, all PCR amplifications have the same efficiency, but the percentage efficiency (PE) is not equal to 1. The ratio of gene expression can then be represented by the following equation.

\[ \text{Ratio} = E^{-\Delta\Delta Ct} = \left[2^{-(1/\beta_{con})}\right]^{-\Delta\Delta Ct} = 2^{-\Delta\Delta Ct \cdot PE} \tag{Equation 6} \]

where $PE = -(1/\beta_{con})$ and $\Delta\Delta Ct_{adjust} = PE \cdot \Delta\Delta Ct$.

In Equation 6, $\beta_{con}$ is the pooled slope of the plot of Ct against base-2 logarithm transformed concentration. The $\beta_{con}$ can be calculated with a correlation function in SAS as shown in Program5_LowQualityData.sas in additional file 10. In the second scenario, the amplification efficiency differs by gene only. According to Equation 1, we have the following equation, in which $\beta_{con}$ is the pooled slope of the plot of Ct against $\log_2$(concentration) for each gene.

\[ \text{Ratio} = \frac{(E_{target})^{\Delta Ct_{target}}}{(E_{control})^{\Delta Ct_{control}}} = \frac{\left[2^{-(1/\beta_{conTarget})}\right]^{\Delta Ct_{target}}}{\left[2^{-(1/\beta_{conControl})}\right]^{\Delta Ct_{control}}} = 2^{(PE_{target} \cdot \Delta Ct_{target} - PE_{control} \cdot \Delta Ct_{control})} \tag{Equation 7} \]

where $PE_{target} = -(1/\beta_{conTarget})$, $PE_{control} = -(1/\beta_{conControl})$, and $\Delta\Delta Ct_{adjust} = PE_{control} \cdot \Delta Ct_{control} - PE_{target} \cdot \Delta Ct_{target}$, so that Ratio $= 2^{-\Delta\Delta Ct_{adjust}}$ as in Equation 6.

In Equation 7, $\beta_{conTarget}$ and $\beta_{conControl}$ are the pooled slopes of the plot of Ct against base-2 logarithm transformed concentration for the target gene and the reference gene respectively. The slopes can be calculated by Program5_LowQualityData.sas (additional file 10). The $\Delta\Delta Ct_{adjust}$ can be calculated with the same program. Theoretically, an equation can also be derived for the third scenario, in which PCR amplification efficiency differs both by gene and by sample. However, in practice, we do not consider data in the third scenario acceptable because of the significant variation in amplification efficiency [10,18].

Program5_LowQualityData.sas in additional file 10 provides the solution to derive the adjusted ∆∆Ct in the first and second scenarios. A data set in which amplification efficiency differs by gene is provided in LowQualityData.txt in additional file 11 to illustrate the use of the SAS program. The data set is of lower quality mainly because of the limited number of replicates involved in the experiment. Four steps are involved in calculating the $\Delta\Delta Ct_{adjust}$. The first step is to perform the data quality control test as shown in Methods. From the SAS output, we can conclude that the LowQualityData dataset does not meet the requirements for the $2^{-\Delta\Delta Ct}$ method, since one group of PCRs has a percentage amplification efficiency significantly different from 1, as shown in the data quality control for the LowQualityData dataset part of SASOutput.doc (additional file 3).

The second step is to test for equal PCR efficiency (or slope) by examining the Type III sums of squares for the lcon and class interaction. A low p value indicates an interaction between the groups of PCR (class) and the logarithm transformed concentration, which in turn indicates unequal slopes among the different groups of PCR. If all PCR amplification efficiencies are equal, the pooled amplification efficiency can be calculated and integrated into the SAS program for $\Delta\Delta Ct_{adjust}$ calculation. In this data set, the Type III sums of squares has a p value smaller than 0.05, so the amplification efficiencies are not equal for all PCRs. Tests of equal slopes are then performed for each gene to decide whether the PCR amplification efficiency is the same within each gene. For either gene, the amplification efficiency is not significantly different between samples at an α of 0.05. All of the Type III sums of squares outputs can be found in SASOutput.doc (additional file 3).

The next step is to calculate the pooled slope ($\beta_{con}$) for each gene to derive the percentage amplification efficiency ($PE = -(1/\beta_{con})$) for each gene. The pooled slopes are derived from the correlation between Ct and base-2 logarithm transformed concentrations. The $\beta_{con}$ values for the two genes are -1.0813 and -1.0137 respectively, as shown in SASOutput.doc (additional file 3) for the amplification efficiency calculation of the LowQualityData dataset. With the $\beta_{con}$, $-(1/\beta_{con})$ or PE can be calculated for each gene as 0.925 and 0.987 respectively. The $\Delta\Delta Ct_{adjust}$ can then be computed with the PEs substituting for the 1 for each gene in the 'estimate' and 'contrast' statements. The SAS program is as follows in additional file 10.

Title2 'Calculate the deltadeltaCt with Adjusted efficiency';
PROC MIXED data=TR2 Order=Data;
CLASS Class Con;
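
The arithmetic behind the efficiency adjustment in Equations 6 and 7 can be sketched in Python. This is a hedged illustration rather than a translation of Program5_LowQualityData.sas: the function names are hypothetical, the slopes are assumed to come from regressions of Ct on base-2 logarithm transformed concentration, and the ∆Ct inputs follow the Equation 1 convention (control minus treatment).

```python
def percent_efficiency(slope):
    """PE = -(1/slope), for a slope of Ct against log2(concentration)."""
    return -1.0 / slope


def ratio_pooled(ddct, pooled_slope):
    """Equation 6: one pooled slope for all PCRs, Ratio = 2 ** (-PE * ddCt)."""
    return 2.0 ** (-percent_efficiency(pooled_slope) * ddct)


def ratio_by_gene(dct_target, dct_control, slope_target, slope_control):
    """Equation 7: efficiency differs by gene only.

    dct_target and dct_control are the control-minus-treatment Ct differences
    for the target gene and the reference gene respectively.
    """
    exponent = (percent_efficiency(slope_target) * dct_target
                - percent_efficiency(slope_control) * dct_control)
    return 2.0 ** exponent


# PE for the two pooled slopes quoted in the text.
pe_genes = [percent_efficiency(s) for s in (-1.0813, -1.0137)]

# A slope of -1.25 corresponds to PE = 0.8; a three-cycle difference then maps
# to 2 ** 2.4 rather than 2 ** 3.
ratio_pe08 = ratio_pooled(-3.0, -1.25)
print([round(pe, 3) for pe in pe_genes], round(ratio_pe08, 3))
```

When both slopes equal -1, `ratio_by_gene` reduces to Equation 1 with an efficiency of 2 for both genes, which is a quick sanity check on the sign conventions.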