http://dx.doi.org/10.18576/jsap/100311
Abstract: Comparison of treatments is a frequent task in clinical studies. Accelerated failure time (AFT) models, which
express the relationship between the logarithm of survival time and covariates, are used for such comparisons. Three
log-location-scale models, Weibull, log-normal and log-logistic, are evaluated to compare two treatment procedures on
head-and-neck cancer data. Censored data are analyzed under a Bayesian framework using the Stan language. The models
are assessed on the basis of LOOIC and WAIC.
Keywords: Accelerated failure time, log-location-scale, Head-and-neck cancer, Stan, LOOIC, WAIC
1 Introduction
In survival analysis, the main response variable is the time between a well-defined origin and an event. Comparison of
treatments is frequently made in clinical studies. Researchers in this arena are interested in whether a new treatment
procedure prolongs survival more than an existing standard treatment procedure.
Accelerated failure time models and proportional hazards models are those most used for comparing treatments. The
proportional hazards (PH) model, proposed by [1], is a popular choice for analyzing survival data. In PH models, the
main assumption is that the hazard rate of one individual is proportional to the hazard rate of another individual. The
logarithm of the hazard ratio does not depend on time and, as such, no parametric model is required for survival times.
Under the proportional hazards assumption, the logarithm of the hazard rate is expressed as a linear combination of a
number of potential covariates, and the effects of the covariates are measured in terms of hazards.
The accelerated failure time (AFT) model is considered an alternative to Cox's proportional hazards model for studying the
effect of covariates on the time to event [2]. AFT models do not require the proportionality assumption. [1]
mentioned accelerated life tests in his famous article 'Regression models and life-tables'. AFT models establish a
relationship between the logarithm of survival time and covariates and have become popular among survival analysis
researchers because of their easy interpretation in terms of lifetime. Professor Nancy Reid of the University of Toronto had a
long conversation with Sir D. R. Cox on October 26 and 27, 1993. During the conversation, comparing PH and AFT
models, Cox mentioned "that accelerated lifetime models are in many ways more appealing because of their direct
physical interpretation" [3].
In this paper, we analyze the head-and-neck cancer data with Bayesian modelling of the Weibull, log-normal and log-logistic
distributions as survival models; the Stan language is used for the analysis, treatments are compared through estimated survival
and hazard curves, and finally the fitted models are assessed based on LOOIC and WAIC.
∗ Corresponding author e-mail: mdashrafulal@gmail.com
c 2021 NSP
Natural Sciences Publishing Cor.
716 Md. Ashraf-Ul-Alam, A. A. Khan: Comparison of Accelerated Failure Time Models:...
Location-scale models
An accelerated failure time model of survival time T is also known as a log-location-scale model as the distribution of
Y = log(T ) is a location-scale model. A random variable Y is said to have a location-scale distribution if its probability
density function (pdf ) f (y), cumulative distribution function (cdf ) F(y) and survival function S(y) have the following
form [5]:
f (y|µ, σ) = (1/σ) g((y − µ)/σ)
F(y|µ, σ) = G((y − µ)/σ)                (3)
S(y|µ, σ) = S((y − µ)/σ)
where −∞ < y < ∞, µ (−∞ < µ < ∞) is a location parameter, σ (> 0) is a scale parameter, and g(z) and G(z) are the
pdf and the cdf of the standardized location-scale distribution of Z = (y − µ)/σ, respectively.
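The defining relations above can be checked numerically. The following Python sketch (illustrative only, not part of the paper's R/Stan code) takes the standardized logistic distribution for g and G and verifies that f(y|µ, σ) = (1/σ)g((y − µ)/σ) integrates to one and that S(µ|µ, σ) = 1/2 by symmetry:

```python
import math

# Standardized logistic pdf g(z) and cdf G(z): a location-scale family.
def g(z):
    e = math.exp(-z)
    return e / (1.0 + e) ** 2

def G(z):
    return 1.0 / (1.0 + math.exp(-z))

def f(y, mu, sigma):
    # f(y | mu, sigma) = (1/sigma) * g((y - mu)/sigma)
    return g((y - mu) / sigma) / sigma

def S(y, mu, sigma):
    # S(y | mu, sigma) = 1 - G((y - mu)/sigma)
    return 1.0 - G((y - mu) / sigma)

mu, sigma = 2.0, 1.5
# Numerical check: the density integrates to ~1 over a wide grid.
step = 0.01
total = sum(f(mu + k * step, mu, sigma) * step for k in range(-3000, 3000))
print(round(total, 3))   # ~1.0
# At y = mu the survival probability is exactly 1/2 (logistic symmetry).
print(S(mu, mu, sigma))
```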
A random variable T is said to have a log-location-scale distribution if Y = log(T ) has a location-scale distribution given
by Equation (3) and the pdf, cdf and survival function of T are given as [5]:
f (t|µ, σ) = (1/(σt)) g((log(t) − µ)/σ)
F(t|µ, σ) = G((log(t) − µ)/σ)           (4)
S(t|µ, σ) = S((log(t) − µ)/σ)
Suppose that a random variable Z has a standard form location-scale distribution with survival function S0(z). Then the
survival function of T defined by log(t) = x′β + σz = µ + σz can be written as

S(t) = Pr(T > t) = Pr(Z > (log(t) − µ)/σ)
     = S0((log(t) − µ)/σ)
     = S0*((t/exp(x′β))^(1/σ))          (5)
J. Stat. Appl. Pro. 10, No. 3, 715-738 (2021) / www.naturalspublishing.com/Journals.asp 717
where S0*(t) = S0(log(t)) is the survival function of the standard form of log(t). The survival time t is rescaled by
exp(x′β) = e^µ, and the effect of this rescaling can be thought of as 'accelerating time', which is the rationale for
considering log-location-scale distributions as accelerated failure time models. The effect of the covariates in an accelerated
failure time model is to change the scale, but not the location, of a baseline distribution of survival time.
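This time-rescaling property can be illustrated directly: under an AFT model, the survival curve of the treated group equals the baseline survival curve evaluated at time rescaled by the acceleration factor. A short Python sketch (coefficient values are hypothetical, not from the paper) checks this for an extreme value error distribution:

```python
import math

# AFT property: covariates rescale time. With log(T) = b0 + b1*x + sigma*Z
# and Z standard extreme value (S0(z) = exp(-exp(z))), the survival under
# x = 1 equals the baseline (x = 0) survival evaluated at t / exp(b1).
b0, b1, sigma = 5.0, 0.8, 0.9   # hypothetical values for illustration
AF = math.exp(b1)               # acceleration factor

def S(t, x):
    z = (math.log(t) - (b0 + b1 * x)) / sigma
    return math.exp(-math.exp(z))

for t in (100.0, 500.0, 1500.0):
    lhs = S(t, 1)        # survival of the treated group at time t
    rhs = S(t / AF, 0)   # baseline survival at the rescaled time t/AF
    assert abs(lhs - rhs) < 1e-12
print(round(AF, 2))  # → 2.23
```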
The present paper discusses the log-location-scale models of T, Weibull, log-normal and log-logistic, corresponding to the
standard extreme value, normal and logistic location-scale distributions of Z = (log(T) − µ)/σ, for the analysis of
head-and-neck cancer data, comparing whether the treatments accelerate or decelerate the survival process.
The Weibull distribution is a widely used model in reliability and survival analysis. Its hazard function is monotone:
increasing, constant or decreasing according as the shape parameter is greater than, equal to or less than one. Moreover,
algebraic expressions for the survival and hazard functions can be obtained explicitly. Because of the flexibility and
tractability of its hazard and survival functions, the Weibull model is popular among researchers.
Suppose survival time T follows a Weibull distribution with shape α (> 0) and scale parameter λ (> 0); then the pdf f(t),
survival function S(t) and hazard function h(t) of the Weibull(α, λ) distribution are given as follows [5]:

f (t|α, λ) = (α/λ)(t/λ)^(α−1) exp[−(t/λ)^α], t > 0
S(t|α, λ) = exp[−(t/λ)^α]               (6)
h(t|α, λ) = (α/λ)(t/λ)^(α−1)
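A quick Python check (illustrative parameter values, not part of the paper's code) of these Weibull expressions, of the identity h(t) = f(t)/S(t), and of the monotone-hazard behaviour described above:

```python
import math

# Weibull(alpha, lam) pdf, survival and hazard; parameter values illustrative.
def weib_f(t, a, lam):
    return (a / lam) * (t / lam) ** (a - 1) * math.exp(-((t / lam) ** a))

def weib_S(t, a, lam):
    return math.exp(-((t / lam) ** a))

def weib_h(t, a, lam):
    return (a / lam) * (t / lam) ** (a - 1)

a, lam, t = 1.5, 2.0, 1.3
# The identity h(t) = f(t)/S(t) holds.
assert abs(weib_h(t, a, lam) - weib_f(t, a, lam) / weib_S(t, a, lam)) < 1e-12
# Monotone hazard: increasing for alpha > 1, decreasing for alpha < 1.
print(weib_h(1.0, 1.5, 2.0) < weib_h(2.0, 1.5, 2.0))  # → True
print(weib_h(1.0, 0.5, 2.0) > weib_h(2.0, 0.5, 2.0))  # → True
```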
The density and hazard curves of Weibull model for different values of parameters are shown in Figure 1.
Fig. 1: Weibull density (a) and hazard (b) functions for different values of the shape parameter (0.5, 1.5, 3.0), with scale equal to unity.
The log-normal distribution is a survival model whose hazard increases from zero to a maximum and then decreases to zero as
time approaches infinity [5]. The survival and hazard functions of the log-normal distribution cannot be expressed explicitly.
The log-normal model does not behave well in the presence of heavy censoring. Accordingly, this model should be applied to
describe non-monotonic hazards only when the survival data do not contain many censored observations [7].
Suppose that lifetime T is such that Y = log(T) follows a normal distribution with mean µ and variance σ²; then T follows a
log-normal distribution, LogNormal(µ, σ²), with location parameter µ (−∞ < µ < ∞) and scale parameter σ (> 0), having pdf
f(t), survival function S(t) and hazard function h(t) given below:
f (t|µ, σ) = (1/(√(2π) σ t)) exp[−(1/2)((log(t) − µ)/σ)²], t > 0
S(t|µ, σ) = 1 − Φ((log(t) − µ)/σ)       (7)
h(t|µ, σ) = f (t|µ, σ)/S(t|µ, σ)

where Φ(z) = (1/√(2π)) ∫_{−∞}^{z} exp(−u²/2) du.
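The unimodal hazard described above can be verified numerically. The Python sketch below (illustrative parameter values; Φ is written via math.erf so no external library is needed) locates an interior maximum of the log-normal hazard:

```python
import math

# Phi via math.erf, so the check needs no external library.
def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ln_f(t, mu, sigma):
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * t)

def ln_h(t, mu, sigma):
    # h(t) = f(t) / S(t) with S(t) = 1 - Phi((log t - mu)/sigma)
    return ln_f(t, mu, sigma) / (1.0 - Phi((math.log(t) - mu) / sigma))

mu, sigma = 0.0, 0.5   # illustrative values
hs = [ln_h(0.05 * k, mu, sigma) for k in range(1, 200)]
peak = hs.index(max(hs))
# The hazard rises from ~0 to a single maximum, then decays: interior peak.
print(0 < peak < len(hs) - 1)  # → True
```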
The density and hazard functions of log-normal model for different values of the parameters are shown in Figure 2.
Fig. 2: Log-normal density (a) and hazard (b) functions for different values of the scale parameter (1.50, 0.50, 0.25), with location equal to zero.
The log-logistic distribution is also a frequently used model in reliability and survival analysis. Its hazard function is either
monotone decreasing or unimodal, increasing to a single maximum and decreasing thereafter [8,5]. The shapes of its density
and hazard are similar to those of the log-normal distribution. The log-logistic model has explicit algebraic expressions for
the survival and hazard functions, which makes it more suitable for the analysis of censored survival data than the log-normal
model. It is the only lifetime model that belongs to both the accelerated failure time and the proportional odds families. [9]
and [10] explored the distribution as a survival model in the classical framework. [11] studied this distribution as a reliability
model from a Bayesian perspective.
Suppose that survival time T follows a log-logistic distribution with shape parameter α (> 0) and scale parameter λ (> 0).
Then the pdf f(t), survival function S(t) and hazard function h(t) of the log-logistic distribution, LLogist(α, λ) [5], are given
as follows:
f (t|α, λ) = (α/λ)(t/λ)^(α−1) [1 + (t/λ)^α]^(−2), t > 0
S(t|α, λ) = [1 + (t/λ)^α]^(−1)          (8)
h(t|α, λ) = (α/λ)(t/λ)^(α−1) [1 + (t/λ)^α]^(−1)
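For α > 1 the log-logistic hazard is unimodal with its maximum at t = λ(α − 1)^(1/α), a closed-form property the log-normal lacks. A Python sketch (illustrative parameter values, not part of the paper's code) checks the identity h = f/S and locates the hazard peak:

```python
# Log-logistic LLogist(alpha, lam) survival, hazard and density.
def ll_S(t, a, lam):
    return 1.0 / (1.0 + (t / lam) ** a)

def ll_h(t, a, lam):
    return (a / lam) * (t / lam) ** (a - 1) / (1.0 + (t / lam) ** a)

def ll_f(t, a, lam):
    return (a / lam) * (t / lam) ** (a - 1) / (1.0 + (t / lam) ** a) ** 2

a, lam, t = 2.0, 1.0, 0.7   # illustrative values
assert abs(ll_h(t, a, lam) - ll_f(t, a, lam) / ll_S(t, a, lam)) < 1e-12

# For alpha > 1 the hazard peaks at t = lam*(a-1)^(1/a); here that is 1.0.
grid = [0.01 * k for k in range(1, 500)]
hs = [ll_h(u, a, lam) for u in grid]
t_peak = grid[hs.index(max(hs))]
print(round(t_peak, 2))  # → 1.0
```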
The density and hazard curves of log-logistic model for different values of parameters are shown in Figure 3.
Fig. 3: Log-logistic density (a) and hazard (b) functions for different values of the shape parameter, with scale equal to unity.
The fundamental assumption of Bayesian statistics is that parameters are random variables having prior distributions
p(θ). In a Bayesian analysis, we seek the exact distribution of the parameters, called the posterior distribution, by combining
the prior distribution with the data. Bayesian statistics is based on Bayes' theorem. Suppose that the data values y
= (y1, y2, . . . , yn) are obtained independently from the model f(y|θ); then the likelihood function is given by

L(θ|y) = f (y1, y2, . . . , yn |θ) = ∏_{i=1}^{n} f (yi |θ)          (9)
For right-censored data D = (ti, δi), i = 1, . . . , n, where δi = 1 for an observed event and δi = 0 for a censored
observation, the likelihood is L(σ, β|D) = ∏_{i=1}^{n} [f (ti |σ, β)]^{δi} [S(ti |σ, β)]^{1−δi}. Taking the logarithm of
both sides of the likelihood function, the log-likelihood can be written in the following two alternative forms:

l(σ, β|D) = ∑_{i=1}^{n} [δi (log f (ti |σ, β) − log S(ti |σ, β)) + log S(ti |σ, β)]          (12)

l(σ, β|D) = ∑_{i=1}^{n} [δi log h(ti |σ, β) + log S(ti |σ, β)]          (13)
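The two forms coincide because h = f/S. A Python check with a toy right-censored Weibull sample (both the data and the parameter values below are illustrative, not the paper's):

```python
import math

# Right-censored Weibull log-likelihood in both forms: the (12)-style sum of
# delta*(log f - log S) + log S equals the (13)-style sum of
# delta*log h + log S, since h = f/S. Toy data: (t_i, delta_i) pairs.
a, lam = 1.2, 400.0
data = [(120.0, 1), (300.0, 1), (169.0, 0), (528.0, 0)]

def logf(t):
    return math.log(a / lam) + (a - 1) * math.log(t / lam) - (t / lam) ** a

def logS(t):
    return -((t / lam) ** a)

def logh(t):
    return math.log(a / lam) + (a - 1) * math.log(t / lam)

ll12 = sum(d * (logf(t) - logS(t)) + logS(t) for t, d in data)
ll13 = sum(d * logh(t) + logS(t) for t, d in data)
assert abs(ll12 - ll13) < 1e-12
print(round(ll12, 4))
```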
[13] reported the survival times of two groups of head-and-neck cancer patients treated with two
treatments in a randomized clinical trial conducted by the Northern California Oncology Group. One
group of patients was treated with radiation therapy alone (RT), and the patients in the other group
were treated with radiation plus chemotherapy (RCT). Survival time was measured in days. Table 1
shows the survival times of the patients treated with RT, and the survival times of the patients treated
with RCT are given in Table 2. Censored observations are indicated with a plus sign. Efron analyzed
the data with classical parametric and nonparametric methods and compared the survival curves under
the two treatments; the treatment procedure RCT showed higher survival than RT. [14] discussed the
data with a Bayesian approach using a log-normal model.
Table 2: Survival times (in days) of 45 HNC patients treated with RCT
37, 84, 92, 94, 110, 112, 119, 127, 130, 133, 140, 146, 155,
159, 169+, 173, 179, 194, 195, 209, 249, 281, 319, 339, 432, 469,
519, 528+, 547+, 613+, 633, 725, 759+, 817, 1092+, 1245+, 1331+,
1557+, 1642+, 1771+, 1776, 1897+, 2023+, 2146+, 2297+
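Reading Table 2 into the (time, event indicator) form used by the likelihood in Equations (12)-(13) is a small exercise; the Python sketch below parses the RCT column as printed, with '+' marking right-censoring:

```python
# Survival times of the RCT group as printed in Table 2; '+' = right-censored.
rct = """37 84 92 94 110 112 119 127 130 133 140 146 155
159 169+ 173 179 194 195 209 249 281 319 339 432 469
519 528+ 547+ 613+ 633 725 759+ 817 1092+ 1245+ 1331+
1557+ 1642+ 1771+ 1776 1897+ 2023+ 2146+ 2297+""".split()

times = [int(s.rstrip('+')) for s in rct]
events = [0 if s.endswith('+') else 1 for s in rct]  # delta_i: 1=death, 0=censored
print(len(times), sum(events), len(events) - sum(events))  # → 45 30 15
```

So the RCT arm has 45 patients, of whom 30 died and 15 were censored.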
Stan is a probabilistic programming language for Bayesian analysis, in the sense that a random variable
is a bona fide first-class object. In Stan, variables may be treated as random; among the random
variables, some are observed and some are unknown and need to be estimated or used for posterior
predictive inference. Stan uses the No-U-Turn sampler (NUTS), an adaptive form of Hamiltonian Monte
Carlo sampling that is more efficient than other Metropolis-Hastings algorithms, especially for high-
dimensional models, regardless of whether the priors are conjugate or not [15, 16]. A complete Stan
program consists of six code blocks. A sequence of programming statements surrounded by curly
braces {} forms a block. A statement ends with a semi-colon. A comment in Stan is indicated by a
double slash //. Each block contains a list of instructions for specific tasks. In Stan, statements are
executed imperatively in the order in which they occur in a program. [17] named the language 'Stan'
in honor of Stanislaw Ulam, one of the inventors of Monte Carlo methods. The component blocks of
a Stan program are described below.
–Data Block: The data block declares the variables that must be input to the algorithm; the type,
dimension and name of every variable has to be declared.
–Transformed Data Block: The transformed data block may be used to define new variables that can
be computed based on the data. Any temporary variable used to store a transformation performed
on the data without the involvement of parameters should be defined here. The transformed data
block starts with a sequence of variable declarations and continues with a sequence of statements
defining the variables. For example, standardized versions of data can be defined in a transformed
data block.
–Parameters Block: In the parameters block, all the unknown model parameters are declared that
are to be sampled by Stan from the posterior density.
–Transformed Parameters Block: Transformed parameters are functions of data and parameters.
Any variable declared as a transformed parameter is part of the output produced for samples. Any
variable that is defined wholly in terms of data or transformed data should be declared and defined
in the transformed data block; defining such quantities in the transformed parameters block is legal,
but much less efficient than defining them as transformed data.
–Model Block: The model block contains the model specification. This block is the core of the
code structure in which the Bayesian model is defined. The variables defined in the model block
are local variables, i.e. other blocks do not know about the variable initialized in this block. After
defining the local variables, the model block defines a sampling statement. The sampling statement
indicates the priors and the likelihood. The default prior distribution for a parameter is uniform over
its support. Stan does not require proper priors.
–Generated Quantities Block: The generated quantities block allows computing values that depend on
parameters and data but do not affect the sampled parameter values. The block is executed only
after a sample has been generated. It may be used to calculate posterior expectations,
log-likelihoods and deviances, to generate predictions for new data, and to carry out forward
simulation for posterior predictive checks. Pseudo-random number generators are also available in
the generated quantities block.
Assessing convergence of MCMC algorithm and evaluating model fit:
After implementing a Stan program, it is essential to check whether the MCMC algorithm converges
to the target posterior distribution, because all inferences are made from the samples simulated from
that distribution. Convergence of the MCMC sampling process to the target posterior distribution is
checked quantitatively by the potential scale reduction factor R̂ [18], the effective sample size n_eff and
the Monte Carlo (MC) error se_mean, and visually by trace plots and autocorrelation plots [19, 20, 21]. R̂
is defined from the between-chain and within-chain variances and is approximately 1 if
convergence is reached. The effective sample size n_eff is a measure of the number of independent
samples from the posterior distribution; the larger the effective sample size, the greater the precision
of the MCMC estimates. The Monte Carlo error se_mean is a measure of the variability of each estimate
due to simulation and is obtained by dividing the standard deviation (sd) by the square root of the
effective sample size. An MC error that is low relative to the standard deviation corresponds to a large
number of independent samples, which is what is expected. [19] recommended, as acceptable limits,
an effective sample size n_eff ≥ 100 and a potential scale reduction factor R̂ < 1.1.
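The diagnostics above can be sketched in a few lines of Python (toy, self-contained versions; in practice the values come from Stan's print() summary). The rhat function below implements the classic between/within-chain ratio, and se_mean is just sd divided by √n_eff:

```python
import math
import random

# Toy potential scale reduction factor from between- and within-chain
# variances; the chains here are synthetic, for illustration only.
def rhat(chains):
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)  # between-chain
    W = sum(sum((v - mu) ** 2 for v in c) / (n - 1)
            for c, mu in zip(chains, means)) / m              # within-chain
    return math.sqrt(((n - 1) / n * W + B / n) / W)

random.seed(1)
mixed = [[random.gauss(0, 1) for _ in range(1000)] for _ in range(4)]  # converged
stuck = [[random.gauss(k, 1) for _ in range(1000)] for k in range(4)]  # not mixed
print(round(rhat(mixed), 2))  # ~1.0, below the 1.1 threshold
print(rhat(stuck) > 1.1)      # → True: chains explore different regions

# se_mean = sd / sqrt(n_eff): small relative to sd when n_eff is large.
sd, n_eff = 0.25, 1600
print(sd / math.sqrt(n_eff))  # → 0.00625
```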
Visual interpretation of convergence is also important. The trace plot is obtained by plotting the values
of the draws against the iteration number. If all the values stay within a band showing no discernible
periodic tendencies, then convergence can be assumed. Adjacent samples produced by MCMC
algorithms are autocorrelated. If the autocorrelation function quickly decreases to zero as the lag (the
distance between successive samples) increases, the MCMC algorithm can be said to have converged.
A converged chain is a stationary chain: adding more samples will not meaningfully change the
location and shape of the posterior density, and so will not change the estimates and other relevant
results.
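The geometric decay looked for in autocorrelation plots can be reproduced with a toy chain. The Python sketch below (illustrative, not part of the paper's code) computes the empirical lag-k autocorrelation of an AR(1)-like chain with coefficient 0.6, for which the autocorrelation decays like 0.6^k:

```python
import random

# Empirical lag-k autocorrelation of a single chain.
def acf(x, k):
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x)
    return sum((x[i] - mu) * (x[i + k] - mu) for i in range(n - k)) / var

random.seed(2)
x = [0.0]
for _ in range(20000):
    x.append(0.6 * x[-1] + random.gauss(0, 1))  # AR(1) chain, coefficient 0.6

print(round(acf(x, 1), 1))   # ≈ 0.6
print(acf(x, 10) < 0.1)      # → True: near zero by lag 10
```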
A fitted Bayesian model is accepted as adequate if it predicts future observations that are consistent
with the present data. The posterior predictive density plot is used for evaluating model fit.
Suppose that a random variable Z has a standard extreme value distribution with density function
g(z) = exp[z − exp(z)] and survival function S(z) = exp[−exp(z)]. Substituting z = (log(t) − x′β)/σ
from Equation (1) in the extreme value distribution and using Equation (4) and Equation (6), the Weibull
AFT model, T ∼ Weibull(1/σ, exp(x′β)), is obtained as follows:

f (t|σ, β) = (1/(σ exp(x′β))) (t/exp(x′β))^(1/σ−1) exp[−(t/exp(x′β))^(1/σ)]
S(t|σ, β) = exp[−(t/exp(x′β))^(1/σ)]            (14)
h(t|σ, β) = (1/(σ exp(x′β))) (t/exp(x′β))^(1/σ−1)

where h(t) = f (t)/S(t).
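The construction behind Equation (14) can be verified by simulation: drawing Z from the extreme value distribution and setting log(T) = µ + σZ must reproduce the Weibull survival function with shape 1/σ and scale e^µ. A Python sketch with hypothetical parameter values:

```python
import math
import random

# Simulation check: Z standard extreme value (S(z) = exp(-exp(z))) and
# log(T) = mu + sigma*Z imply T ~ Weibull(1/sigma, exp(mu)).
# mu and sigma below are illustrative, not estimates from the paper.
random.seed(3)
mu, sigma = 5.0, 0.8

def draw_T():
    u = random.random()
    z = math.log(-math.log(u))   # inverts the survival function exp(-exp(z))
    return math.exp(mu + sigma * z)

t0 = 150.0
n = 100000
emp = sum(draw_T() > t0 for _ in range(n)) / n           # simulated Pr(T > t0)
theo = math.exp(-(t0 / math.exp(mu)) ** (1.0 / sigma))   # Weibull survival (14)
assert abs(emp - theo) < 0.01
print(round(theo, 3))
```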
Bayesian fitting of Weibull AFT model:
The weakly informative prior distribution for the scale parameter σ is taken as half-Cauchy(0, 25) and
for the regression coefficients β as normal(0, 100); that is, p(σ) = half-Cauchy(0, 25) and p(βj)
= normal(0, 100) [22]. [23] used half-Cauchy(0, 25) as a prior for a scale parameter. Thus, assuming
the parameters are independent, the joint posterior distribution of the parameters (σ, β) = (σ, β0, β1,
. . . , βp) of the Weibull AFT model can be written using Equations (10) and (11) as follows:
are calculated and stored for future use. Three functions, namely (i) the log survival function, (ii) the
log hazard function and (iii) the log-likelihood function, are defined at the beginning, before the code
blocks. The Stan code is written in the RStudio editor [24]; Stan interfaces with R [25] through the
rstan package [26].
library(rstan)
stancode_waft = "
functions{
// defines the log survival
vector log_S (vector t, real shape, vector scale){
vector[num_elements(t)] log_S;
for (i in 1:num_elements(t)){
log_S[i] = weibull_lccdf(t[i]|shape, scale[i]);
}
return log_S;
}
//defines the log hazard
vector log_h (vector t,real shape, vector scale){
vector[num_elements(t)] log_h ;
vector[num_elements(t)] ls ;
ls = log_S(t,shape, scale) ;
for (i in 1:num_elements(t)){
log_h[i] = weibull_lpdf(t[i]|shape,scale[i])-ls[i];
}
return log_h;
}
//defines the log likelihood for right censored data
real surv_weibull_lpdf( vector t,vector d,
real shape,vector scale){
vector[num_elements(t)] log_lik;
real prob;
log_lik = d .* log_h(t,shape,scale)+log_S(t,shape,scale);
prob = sum(log_lik);
return prob;
}
}
//data block
data{
int N; // number of observations
vector <lower=0> [N] y; // observed times
vector <lower=0,upper=1> [N] event;//censoring (1=obs.,
// 0=cens.)
int M; // number of covariates
matrix[N,M] x;//matrix of covariates (N rows, M columns)
}
//parameters block
parameters{
vector [M] beta;//coeff. in the linear predictor
real<lower=0> sigma;//scale parameter sigma=1/shape
}
// transformed parameters block
transformed parameters{
vector[N] linpred;
vector[N] mu;
linpred = x*beta;//linear predictor
for (i in 1:N){
mu[i] = exp(linpred[i]);
}
}
// model block
model{
sigma ~ cauchy(0,25);//prior for sigma
beta ~ normal(0,100);//prior for beta coefficients
y ~ surv_weibull(event,1/sigma,mu);//model for data
}
//generated quantities block
generated quantities{
vector[N] y_rep;//posterior predictive value
vector[N] log_lik;//log-likelihood
for(n in 1:N)
log_lik[n] = ((weibull_lpdf(y[n]|1/sigma,exp(x[n,]*beta))-
weibull_lccdf(y[n]|1/sigma,exp(x[n,]*beta)))*event[n])+
weibull_lccdf(y[n]|1/sigma,exp(x[n,]*beta));
for(n in 1:N)
y_rep[n] = weibull_rng(1/sigma,exp(x[n,]*beta));
}
"
Model fitting
To fit the Weibull AFT model under the Bayesian framework and to simulate from the posterior
distribution, the function stan() from the package rstan [26] is called and a stanfit object M1
(say) is created. In Stan, the default choices for the number of chains and iterations are 4 and 2000,
respectively. We have likewise used 4 chains and 2000 iterations per chain; that is, for each of the 4
chains, 2000 samples are drawn for each parameter. Stan uses half of the iterations as warmup, so
the number of post-warmup draws per chain is 1000.
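The resulting sample budget is simple arithmetic, sketched here in Python for concreteness:

```python
# Post-warmup sample budget: 4 chains x 2000 iterations, half used as warmup.
chains, iters = 4, 2000
warmup = iters // 2
post_warmup_per_chain = iters - warmup
total_draws = chains * post_warmup_per_chain
print(post_warmup_per_chain, total_draws)  # → 1000 4000
```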
M1 <- stan(model_code=stancode_waft,data=dat1,
iter=2000,chains=4)
Summarizing output of Stanfit Weibull AFT model:
Using the print() command, summary results are obtained from the fitted object M1 and are reported in
Table 3. Trace plots and autocorrelation plots are made for visual convergence checking. The bayesplot
package [27] is used for the posterior predictive density plot, and the loo package [28] for the model
comparison criteria LOOIC and WAIC, which are reported in Table 6.
print(M1,c("beta","sigma"),digits=3,
probs= c(0.025,0.50,0.975))
require(bayesplot)
require(loo)
stan_trace(M1, pars=c("beta","sigma"))+
ggtitle("Trace plot (Weibull AFT model)")
stan_ac(M1, pars=c("beta","sigma"))+grid_lines()+
ggtitle("Autocorrelation plot (Weibull AFT model)")
#posterior predictive check
# posterior predictive value y_rep
y_rep <- as.matrix(M1,pars="y_rep")
ppc_dens_overlay(y,y_rep[100:130,])+grid_lines()+
ggtitle("PPD plot (Weibull AFT model)")
# Caterpillar plot for showing credible interval
stan_plot(M1,pars=c("beta","sigma"),ci_level=0.95)+
grid_lines()+
ggtitle("Caterpillar plot (Weibull AFT model)")
#calculating LOOIC and WAIC using loo package
log_lik_1 <- extract_log_lik(M1,parameter_name="log_lik",
merge_chains = TRUE)
loo_1 <- loo(log_lik_1,r_eff=NULL,save_psis=FALSE)
print(loo_1)
waic1 <- waic(log_lik_1)
print(waic1)
Fig. 4: Trace plots of the fitted Weibull AFT model, obtained by plotting the parameter values (one panel per parameter, four chains) against the iteration number; the absence of any periodic tendency indicates convergence of the algorithm.
Fig. 5: Autocorrelation plots of the fitted Weibull AFT model show that the autocorrelation drops close to zero by around lag 4.
Fig. 6: Posterior predictive density (PPD) plot of Weibull AFT model is done by plotting the data y and then overlaying the density of
the predicted values y rep. The plot shows that the posterior predictive density fits the data well.
of the treatment is statistically significant. The acceleration factor is exp(0.794) = 2.21 for a patient
treated with RCT. The time to death of a patient treated with RCT is therefore delayed by a factor of
about 2.21 compared to a patient treated with RT under the Weibull AFT model.
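The acceleration factor is simply the exponential of the treatment coefficient; in Python, with the reported posterior estimate 0.794:

```python
import math

# Acceleration factor exp(beta) for the treatment coefficient reported
# for the Weibull AFT model: RCT stretches event times by ~2.21 vs RT.
beta_trt = 0.794
af = math.exp(beta_trt)
print(round(af, 2))  # → 2.21
```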
Fitted survival and hazard curves are drawn in Figure 8, and the curves agree with the numerical
results: a patient treated with RCT would survive longer than a patient treated with RT.
Fig. 7: Caterpillar plot of the Weibull AFT model (parameters beta[1], beta[2], sigma) shows that the 95% credible intervals of the parameters do not include zero, so the parameters are statistically significant.
Fig. 8: Fitted survival (a) and hazard (b) curves of the Weibull AFT model. The survival curve is higher and the hazard rate is lower for the patients treated with RCT than for the patients treated with RT.
Suppose that a random variable Z has a standard normal distribution with density function g(z) given
by the N(0, 1) density and survival function S(z) = 1 − Φ(z). Substituting z = (log(t) − x′β)/σ from
Equation (1) in the standard normal distribution and using Equations (4) and (7), the log-normal AFT
model, T ∼ LogNormal(x′β, σ²), is obtained as follows:

f (t|β, σ) = (1/(√(2π) σ t)) exp[−(1/2)((log(t) − x′β)/σ)²]
S(t|β, σ) = 1 − Φ((log(t) − x′β)/σ)             (16)
h(t|β, σ) = f (t|β, σ)/S(t|β, σ)
Bayesian fitting of the log-normal AFT model:
The weakly informative prior distributions are taken as p(σ) = half-Cauchy(0, 25) and p(βj) =
normal(0, 100) [22]. Thus, assuming independence of the parameters, the joint posterior distribution
of the parameters (σ, β) = (σ, β0, β1, . . . , βp) of the log-normal AFT model can be written using
Equations (10) and (11) as below:
library(rstan)
stancode_lnaft = "
functions{
// defines the log survival
vector log_S (vector t,vector location,real scale){
vector[num_elements(t)] log_S ;
for (i in 1:num_elements(t)){
log_S[i] = lognormal_lccdf(t[i]|location[i],scale);
}
return log_S;
}
//defines the log hazard
vector log_h (vector t,vector location,real scale){
vector[num_elements(t)] log_h ;
vector[num_elements(t)] ls ;
ls = log_S(t,location, scale) ;
for (i in 1:num_elements(t)){
log_h[i] = lognormal_lpdf(t[i]|location[i],scale)-
ls[i];
}
return log_h;
}
//defines the sampling distribution for right censored data
real surv_lognormal_lpdf( vector t,vector d,
vector location,real scale){
vector[num_elements(t)] log_lik;
real prob;
log_lik = d .* log_h(t, location, scale) +
log_S(t, location, scale);
prob = sum(log_lik);
return prob;
}
}
//data block
data{
int N; // number of observations
vector <lower=0> [N] y; // observation vector
vector <lower=0,upper=1> [N] event;//censoring(1=obs.,
// 0=cens.)
int M; // number of covariates
matrix [N,M] x;//matrix of covariates (N rows, M columns)
}
//parameters block
parameters{
vector [M] beta;//coeff. in the linear predictor
real<lower=0> sigma; // scale parameter sigma=1/shape
}
// transformed parameters block
transformed parameters{
vector[N] linpred;
vector[N] mu;
linpred = x*beta;//linear predictor
for (i in 1:N){
mu[i] = linpred[i];
}
}
// model block
model{
sigma ~ cauchy(0,25);// prior for sigma
beta ~ normal(0,100);//prior for beta coefficients
y ~ surv_lognormal(event,mu,sigma); // model for data
}
//generated quantities block
generated quantities{
vector[N] y_rep;//posterior predictive value
vector[N] log_lik;//log-likelihood
for(n in 1:N)
log_lik[n] = (((lognormal_lpdf(y[n]|(x[n,]*beta),sigma))-
(lognormal_lccdf(y[n]|(x[n,]*beta),sigma)))*
event[n])+
(lognormal_lccdf(y[n]|(x[n,]*beta),sigma));
for(n in 1:N)
y_rep[n] = lognormal_rng((x[n,]*beta), sigma);
}
"
The complete set of code blocks is saved as stancode_lnaft. The same head-and-neck cancer data
object (dat1) prepared for the Weibull AFT model is used here for fitting the log-normal AFT model.
Convergence diagnostics and evaluating model fit for log-normal AFT model:
The summary results show that R̂ is 1, n_eff is greater than 100 and se_mean is small relative to the
standard deviation for all of the parameters, indicating convergence of the MCMC algorithm. The trace
plot (Figure 9) and the autocorrelation plot (Figure 10) also show that the MCMC sampling process
converged to the joint posterior distribution. Moreover, from the posterior predictive density plot
(Figure 11), it is observed that the log-normal model is well suited to the data.
Fig. 9: Trace plots of the fitted log-normal AFT model show no tendency of periodicity, indicating convergence of the algorithm.
Fig. 10: Autocorrelation plots of the fitted log-normal AFT model show that the autocorrelation drops close to zero as the lag increases.
Fig. 11: Posterior predictive density (PPD) plot of log-normal AFT model shows that the PPD fits the data well.
732 Md. Ashraf-Ul-Alam, A. A. Khan: Comparison of Accelerated Failure Time Models:...
include zero. It is evident from the credible intervals and the caterpillar plot (Figure 12) that the
coefficient of the treatment is statistically significant. The acceleration factor is exp(0.623) = 1.86,
meaning that the time to death of a patient treated with RCT is delayed by a factor of about 1.86
compared to a patient treated with RT under the log-normal AFT model.
Fitted survival and hazard curves are drawn in Figure 13, and the curves agree with the numerical
results: a patient treated with RCT would experience a slower progression to death than a patient
treated with RT.
Fig. 12: Caterpillar plot of the log-normal AFT model (parameters beta[1], beta[2], sigma) shows that the 95% credible intervals of the parameters do not include zero, so the parameters are statistically significant.
Fig. 13: Fitted survival (a) and hazard (b) curves of the log-normal AFT model. The survival curve is higher and the hazard rate is lower for the patients treated with RCT than for the patients treated with RT.
The likelihood function L(σ, β|D) is obtained by substituting f(ti|σ, β) and S(ti|σ, β) from
Equation (17) into Equation (11). The joint posterior distribution is obtained using the Bayesian
software Stan, and the MCMC algorithm is implemented to find the estimates and other relevant
results. The Stan code is given in the following section.
Stan code for fitting the Bayesian log-logistic AFT model:
The Stan code, with comments, for fitting the log-logistic accelerated failure time (AFT) model under
the Bayesian setting is given below.
library(rstan)
stancode_llaft = "
functions{
// defines the log survival
vector log_S (vector t,real shape,vector scale){
vector[num_elements(t)] log_S ;
for (i in 1:num_elements(t)){
log_S[i] = -log(1+(t[i]/scale[i])^shape);
}
return log_S;
}
//defines the log hazard
vector log_h (vector t,real shape,vector scale){
vector[num_elements(t)] log_h ;
vector[num_elements(t)] ls ;
ls = log_S(t,shape, scale) ;
for (i in 1:num_elements(t)){
log_h[i] = log(shape)-shape*log(scale[i])+
(shape-1)*log(t[i])-
2*log(1+(t[i]/scale[i])^shape)-ls[i];
}
return log_h;
}
//defines the log likelihood for right censored data
real surv_llogist_lpdf( vector t,vector d,
real shape,vector scale){
vector[num_elements(t)] log_lik;
real prob;
log_lik = d .* log_h(t,shape,scale)+
log_S(t,shape,scale);
prob = sum(log_lik);
return prob;
}
}
//data block
data{
int N; // number of observations
vector <lower=0> [N] y;//observation vector(times)
vector <lower=0,upper=1> [N] event;//censoring (1=obs.,
// 0=cen.)
int M; // number of covariates
matrix[N,M] x;//matrix of covariates(N rows, M columns)
}
//parameters block
parameters{
vector [M] beta;//coeff.in the linear predictor
real<lower=0> sigma;//scale parameter sigma=1/shape
}
// transformed parameters block
transformed parameters{
vector[N] linpred;
vector[N] mu;
linpred = x*beta;//linear predictor
for (i in 1:N){
mu[i] = exp(linpred[i]);
}
}
// model block
model{
sigma ~ cauchy(0,25);//prior for sigma
beta ~ normal(0,100);//prior for beta coefficients
y ~ surv_llogist(event,1/sigma,mu);//density for data
}
// generated quantities block
generated quantities{
  vector[N] y_rep;                    // posterior predictive values
  vector[N] log_lik;                  // pointwise log-likelihood
  { // log_lik[n] = event[n]*log h(y[n]) + log S(y[n])
    for (n in 1:N)
      log_lik[n] = (((log(1/sigma) - (1/sigma)*(x[n,]*beta) +
                   ((1/sigma)-1)*log(y[n]) -
                   2*log(1+(y[n]/(exp(x[n,]*beta)))^(1/sigma))) -
                   (-log(1+(y[n]/(exp(x[n,]*beta)))^(1/sigma)))) * event[n]) +
                   (-log(1+(y[n]/(exp(x[n,]*beta)))^(1/sigma)));
  }
  { // posterior predictive draws via the log-logistic inverse CDF
    real u;
    u = uniform_rng(0, 1);
    for (n in 1:N){
      // quantile function Q(u) = scale * (u/(1-u))^sigma
      // (reconstruction: the remainder of this block is missing in the source)
      y_rep[n] = exp(x[n,]*beta) * (u/(1-u))^sigma;
    }
  }
}
"
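The log-survival and log-hazard expressions in the functions block above can be sanity-checked numerically outside Stan. The following Python sketch mirrors those formulas; the log-density helper `log_f` is added here only for the check and is not part of the paper's code.

```python
import math

def log_S(t, shape, scale):
    # log-logistic log-survival: log S(t) = -log(1 + (t/scale)^shape)
    return -math.log(1.0 + (t / scale) ** shape)

def log_h(t, shape, scale):
    # log-hazard, mirroring the Stan expression term by term
    return (math.log(shape) - shape * math.log(scale)
            + (shape - 1) * math.log(t)
            - 2.0 * math.log(1.0 + (t / scale) ** shape)
            - log_S(t, shape, scale))

def log_f(t, shape, scale):
    # log-density of the log-logistic distribution (added for the check)
    z = (t / scale) ** shape
    return (math.log(shape) - math.log(scale)
            + (shape - 1) * (math.log(t) - math.log(scale))
            - 2.0 * math.log(1.0 + z))

# h(t) = f(t)/S(t), so log h must equal log f - log S
t, shape, scale = 150.0, 1.2, 400.0
assert abs(log_h(t, shape, scale)
           - (log_f(t, shape, scale) - log_S(t, shape, scale))) < 1e-12
```

The assertion confirms that the hazard coded in Stan is consistent with the density and survival function of the same log-logistic parameterization (scale μ, shape 1/σ).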
Fig. 14: Trace plots of the fitted log-logistic AFT model show no trend or periodicity, indicating convergence of the algorithm.
Fig. 15: Autocorrelation plot of fitted log-logistic AFT model shows that autocorrelation drops to values close to zero as lag increases.
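The autocorrelations summarized in Fig. 15 can be computed directly from the saved draws of a chain. A minimal Python sketch (the function and the toy sequence are illustrative, not from the paper):

```python
def autocorr(chain, lag):
    # sample autocorrelation of a sequence of MCMC draws at a given lag
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    cov = sum((chain[i] - mean) * (chain[i + lag] - mean)
              for i in range(n - lag)) / n
    return cov / var

# an alternating sequence is maximally anticorrelated at lag 1
draws = [1.0, -1.0] * 500
print(round(autocorr(draws, 1), 3))  # prints -0.999
```

Well-mixing chains behave in the opposite way: their autocorrelation decays toward zero within a few lags, which is what Fig. 15 displays.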
Fig. 16: Posterior predictive density (PPD) plot of log-logistic AFT model shows that the PPD fits the data well.
Fig. 17: Caterpillar plot of the log-logistic AFT model shows that the 95% credible intervals of the parameters do not contain zero, so the parameters are statistically significant.
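The 95% credible intervals displayed in the caterpillar plot are simply posterior quantiles. A hedged Python sketch, assuming the posterior draws for a parameter are available as a plain list (the helper name and toy draws are illustrative):

```python
def credible_interval(draws, level=0.95):
    # equal-tailed credible interval from posterior draws
    s = sorted(draws)
    n = len(s)
    lo = s[int((1 - level) / 2 * n)]
    hi = s[int((1 + level) / 2 * n) - 1]
    return lo, hi

# toy draws standing in for a parameter's posterior sample
draws = [float(i) for i in range(1, 101)]
lo, hi = credible_interval(draws)
# the parameter is "significant" in the paper's sense if the interval excludes zero
print(lo > 0)  # prints True
```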
Fig. 18: The survival curve is higher and the hazard rate lower for patients treated with RCT than for patients treated with RT.
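Fig. 18 reflects the defining AFT property: with acceleration factor phi = exp(beta) greater than one, the RCT survival curve is a time-stretched copy of the RT curve, S_RCT(t) = S_RT(t/phi). A Python sketch with a log-logistic baseline (all parameter values are illustrative, not fitted estimates):

```python
def S_loglogistic(t, shape, scale):
    # log-logistic survival function S(t) = 1 / (1 + (t/scale)^shape)
    return 1.0 / (1.0 + (t / scale) ** shape)

shape, scale_rt = 1.2, 300.0   # illustrative values only
phi = 1.8                      # acceleration factor exp(beta), assumed > 1

# AFT property: the treated group's clock runs slower by a factor phi
def S_rct(t):
    return S_loglogistic(t / phi, shape, scale_rt)

# equivalently, the scale parameter is multiplied by phi
def S_rct_alt(t):
    return S_loglogistic(t, shape, scale_rt * phi)

for t in (100.0, 500.0, 1500.0):
    assert abs(S_rct(t) - S_rct_alt(t)) < 1e-12
    assert S_rct(t) > S_loglogistic(t, shape, scale_rt)  # RCT curve lies above RT
```

The two formulations coincide because (t/phi)/scale = t/(phi*scale); this is why a positive treatment coefficient shifts the whole survival curve upward, as seen in the figure.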
4 Model Comparison
Selecting the best model from among several competing models is always crucial, in Bayesian as well as in classical statistics. The fitted models are compared using two information criteria: leave-one-out cross-validation (LOO) and the widely applicable (or Watanabe-Akaike) information criterion (WAIC) [29, 30, 19]. Pointwise log-likelihoods are calculated in the generated quantities block of the Stan program, and the 'loo' package [28] then extracts these quantities to compute LOOIC (the LOO information criterion) and WAIC. A model with smaller LOOIC or WAIC fits better than the others. On the basis of these measures (Table 6), the log-normal and log-logistic models are almost indistinguishable in fitting the head-and-neck cancer data; both fit the data better than the Weibull model.
Table 6: LOOIC and WAIC for model comparison and their standard errors (SE)
Model LOOIC SE WAIC SE
Log-logistic 1067.5 47.4 1067.5 47.4
Log-normal 1066.4 48.0 1066.4 48.0
Weibull 1082.8 48.7 1082.8 48.7
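The WAIC values in Table 6 can be reproduced from the pointwise log-likelihood matrix saved in the generated quantities block. A minimal pure-Python sketch (the draws-by-observations layout and the toy matrix are assumptions for illustration):

```python
import math

def waic(log_lik):
    # log_lik[s][i]: log-likelihood of observation i under posterior draw s
    S = len(log_lik)
    lppd, p_waic = 0.0, 0.0
    for i in range(len(log_lik[0])):
        col = [log_lik[s][i] for s in range(S)]
        m = max(col)
        # log of the mean likelihood over draws (log-sum-exp for stability)
        lppd += m + math.log(sum(math.exp(x - m) for x in col) / S)
        mean = sum(col) / S
        # effective number of parameters: posterior variance of the log-likelihood
        p_waic += sum((x - mean) ** 2 for x in col) / (S - 1)
    return -2.0 * (lppd - p_waic)  # deviance scale, as reported in Table 6

# toy check: two identical draws, two observations -> no variance penalty
print(round(waic([[-1.0, -2.0], [-1.0, -2.0]]), 3))  # prints 6.0
```

With identical draws the penalty p_waic is zero and WAIC reduces to -2 times the total log-likelihood; real posterior samples add a positive penalty, which is what separates the Weibull model from the other two in Table 6.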
5 Conclusion
Three accelerated failure time models, Weibull, log-normal and log-logistic, are fitted under the Bayesian framework to the head-and-neck cancer data. For all three models, the treatment variable was statistically significant. The acceleration factor was greater than one, i.e. survival time is longer for the patients treated with radiation and chemotherapy (RCT) than for the patients treated with radiation therapy (RT) alone. Considering the posterior predictive density plots and comparing LOOIC and WAIC, it can be concluded that the log-normal model fits the data somewhat better than the log-logistic and Weibull models.
Conflict of Interest
References
[1] D.R. Cox, Journal of the Royal Statistical Society: Series B (Methodological), 34(2), 187–202 (1972).
[2] L.-J. Wei, Statistics in Medicine, 11(14-15), 1871–1879 (1992).
[3] N. Reid, Statistical Science, 9(3), 439–455 (1994).
[4] D. Collett, Modelling Survival Data in Medical Research, Chapman and Hall/CRC, 2015.
[5] J.F. Lawless, Statistical Models and Methods for Lifetime data, John Wiley & Sons, 2003.
[6] E.T. Lee, and J. Wang, Statistical Methods for Survival Data Analysis, John Wiley & Sons, 2013.
[7] X. Liu, Survival Analysis: Models and Applications, John Wiley & Sons, 2012.
[8] D.R. Cox and D. Oakes, Analysis of Survival Data, Chapman and Hall, 1984.
[9] S. Bennett, Journal of the Royal Statistical Society: Series C (Applied Statistics), 32(2), 165–171 (1983).
[10] J. O’Quigley and L. Struthers, Computer Programs in Biomedicine, 15(1), 3–11 (1982).
[11] M.T. Akhtar and A.A. Khan, American Journal of Mathematics and Statistics, 4(3), 162–170 (2014).
[12] X. Wang, Y.R. Yue and J.J. Faraway, Bayesian Regression Modeling with INLA, Chapman and Hall/CRC, 2018.
[13] B. Efron, Journal of the American Statistical Association, 83(402), 414–425 (1988).
[14] P. Makkar, P.K. Srivastava, R.S. Singh and S.K. Upadhyay, Communications in Statistics-Theory and Methods, 43(2), 392–407
(2014).
[15] B. Carpenter, A. Gelman, M.D. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. Brubaker, J. Guo, P. Li and A. Riddell, Journal
of Statistical Software, 76(1), (2017).
[16] M.D. Hoffman and A. Gelman, Journal of Machine Learning Research, 15(1), 1593–1623 (2014).
[17] Stan Development Team, Stan Modeling Language Users Guide and Reference Manual, Version 2.16.0, 2017, http://mc-stan.org/.
[18] A. Gelman and D.B. Rubin, Statistical Science, 7(4), 457–472 (1992).
[19] A. Gelman, H.S. Stern, J.B. Carlin, D.B. Dunson, A. Vehtari and D.B. Rubin, Bayesian Data Analysis, Chapman and Hall/CRC,
2013.
[20] I. Ntzoufras, Bayesian Modeling using WinBUGS, Vol. 698, John Wiley & Sons, 2009.
[21] G. Hamra, R. MacLehose and D. Richardson, International Journal of Epidemiology, 42(2), 627–634 (2013).
[22] A. Gelman, Bayesian Analysis, 1(3), 515–534 (2006).
[23] N. Khan and A.A. Khan, Austrian Journal of Statistics, 47(4), 1–15 (2018).
[24] RStudio Team, RStudio: Integrated Development Environment for R, Boston, MA, http://www.rstudio.com/, 2015.
[25] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna,
Austria, http://www.R-project.org/, 2017.
[26] Stan Development Team, RStan: the R interface to Stan, R package version 2.17.3, http://mc-stan.org/, 2018.
[27] J. Gabry, T. Mahr, P.-C. Bürkner, M. Modrák and M. Barrett, bayesplot: Plotting for Bayesian models, R package version 1.6.0, http://mc-stan.org/bayesplot, 2018.
[28] A. Vehtari, A. Gelman, J. Gabry and Y. Yao, loo: Efficient leave-one-out cross-validation and WAIC for Bayesian models, R package version 2.0.0, 2018.
[29] A. Vehtari, A. Gelman and J. Gabry, Statistics and Computing, 27(5), 1413–1432 (2017).
[30] R. McElreath, Statistical rethinking: A Bayesian course with examples in R and Stan, Chapman and Hall/CRC, 2015.
[31] T. Oetiker, H. Partl, I. Hyna and E. Schlegel, The Not So Short Introduction to LaTeX2e, 2016.