Allergic Rhinitis RCT Data Analysis
Abstract
The article Efficacy of Bifidobacterium longum and Lactobacillus plantarum (NVP-1703) in Children With
Allergic Rhinitis: A Randomized Controlled Trial (https://doi.org/10.3346/jkms.2024.39.e266) appeared in the
Journal of Korean Medical Science on the 21st of October 2024. It can be found online here:
https://jkms.org/pdf/10.3346/jkms.2024.39.e266
Upon reading it, we noticed some inconsistencies in the numerical data presented in Table 2.
This prompted the further investigation that is the topic of this report. While we accuse no one of fraud, the
inconsistencies found go beyond what simple chance can explain, and we found no discussion of them in the paper. Since
they harm the credibility of the paper's conclusions, we think they warrant further
investigation.
All details of the analysis have been compiled in a single R script that is given in section A1 for verification.
This report was written on the 7th of November 2024 by Cédric Picard. This analysis is unaffiliated with any
private or public research establishment or effort and we report no conflict of interest.
Issue 1: invalid changes from baseline
Table 2 provides changes from baseline. These are the main statistics on which the relevance of the
paper's results rests. In all cases the change from baseline should be the difference between the
studied mean and the baseline mean.
However, this is not always the case in the paper:
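As an illustration, the check can be sketched in R on two rows of Table 2 (control group, TNSS, morning; values as listed in section A2). The variable names and the 0.01 rounding tolerance are assumptions of this sketch, not necessarily what the script in section A1 does.

```r
# Values copied from Table 2 (control group, TNSS, AM), as listed in section A2.
baseline_mean <- 5.59  # week 0
followup <- data.frame(
  week            = c(2, 4),
  mean            = c(4.92, 4.64),
  reported_change = c(-0.67, -1.02)
)

# Expected change from baseline: studied mean minus baseline mean.
followup$expected_change <- round(followup$mean - baseline_mean, 2)

# Assumption: a gap of up to 0.01 is tolerated, because both means are
# already rounded to two decimals before being subtracted.
tolerance <- 0.01
gap <- abs(followup$expected_change - followup$reported_change)
followup$consistent <- gap <= tolerance + 1e-9

print(followup)
```

With these values, the week 2 change matches (4.92 - 5.59 = -0.67), while the week 4 change does not (4.64 - 5.59 = -0.95, against a reported -1.02).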
Issue 3: impossible means
The TNSS presented in Table 2 is the sum of 4 evaluations, each scored from 0 to 3. These are integers, and
therefore the TNSS should be an integer as well. The same is true for the NSDS.
This matters because not all values are possible for means of integers. This insight is the key behind the
GRIM test (https://doi.org/10.7287/peerj.preprints.2064v1).
The test is best explained by hand with an example. Let's say we have a mean of 5.49, computed over 32
TNSS scores. Since the scores are integers, their sum must be an integer as well. What was it?
5.49×32=175.68 isn't an integer, but that is expected since the reported mean is rounded. The sum must have been either 175 or 176.
However, 175/32 rounds to 5.47 and 176/32=5.50, so no integer sum yields a mean of 5.49: the reported value is impossible.
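The hand check above can be automated. The following is a minimal sketch of a GRIM-style check in R; the function name `grim_consistent` and its parameters are chosen for this illustration and are not taken from the script in section A1. It accepts a mean whenever some integer sum falls within half a unit of the last reported decimal, which makes it tolerant of either rounding convention.

```r
# GRIM consistency check: is there an integer sum whose mean, rounded to
# `digits` decimals, equals the reported mean?
grim_consistent <- function(reported_mean, n, digits = 2) {
  half_unit <- 0.5 * 10^(-digits)  # largest rounding error on the mean
  tol <- 1e-9                      # guards against floating-point noise
  # Range of integer sums compatible with the reported (rounded) mean.
  lowest_sum  <- ceiling((reported_mean - half_unit) * n - tol)
  highest_sum <- floor((reported_mean + half_unit) * n + tol)
  lowest_sum <= highest_sum
}

grim_consistent(5.49, 32)  # FALSE: neither 175 nor 176 gives 5.49
grim_consistent(5.59, 32)  # TRUE: 179 / 32 = 5.59375, which rounds to 5.59
```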
Issue 4: improbable standard deviations
In issue 2 we discussed the fact that there is a strong link between the means of morning, evening and
overall day measurements. For the same reason the variances of these measurements should also be
related, although this relationship is much less direct.
We decided to study that point with a statistical test. The null hypothesis is that the reported standard
deviation for the overall day is coherent with a distribution resulting from the combination of the two
reported distributions for the morning and evening of the same day. The alternative hypothesis is that it
is not.
The idea we developed to test this is, for each day, to sample randomly from the morning and evening
distributions, combine them into an overall day dataset and compute the standard deviation of this combined
sample. By doing this several thousand times we obtain an empirical distribution of the standard deviation
expected under the null hypothesis. We then compute the p-value of the reported standard deviation against
this simulated distribution. Finally, we convert these p-values to two-tailed p-values and adjust them for
multiple comparisons using the Benjamini-Hochberg procedure.
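The last step can be sketched in R as follows. The p-values below are invented for illustration and are not results of the analysis, and the two-tailed conversion shown (doubling the smaller tail) is an assumption about how that conversion is done. `p.adjust` is the base R function implementing the Benjamini-Hochberg procedure.

```r
# Hypothetical one-tailed p-values, invented for illustration only
# (these are NOT results from the analysis).
one_tailed <- c(0.001, 0.020, 0.400, 0.985)

# Two-tailed conversion (assumed): double the smaller tail, capped at 1.
two_tailed <- pmin(1, 2 * pmin(one_tailed, 1 - one_tailed))

# Benjamini-Hochberg adjustment for multiple comparisons.
adjusted <- p.adjust(two_tailed, method = "BH")

print(round(adjusted, 4))
```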
Here are the results (some variation is expected from random sampling):
Figure 3: SD checks
As we see, all adjusted p-values are below 0.02. We consider all these differences significant. All reported
standard deviations are therefore improbable.
Conclusion
Here is an overview of all the inconsistencies found:
References
• The GRIM test: A simple technique detects numerous anomalies in the reporting of results in
psychology. https://doi.org/10.7287/peerj.preprints.2064v1
A1: R script for analysis
This script reproduces the analysis performed. It requires the CSV file in section A2 that contains all
numerical data from Table 2.
Example output (some variability is to be expected due to random sampling):
GRIM test failures: 72.5 %
5.49 11.08 4.92 9.63 4.64 4.7 5.13 5.24 5.36 4.85 4.63 9.48 4.8 9.52 5.55
11.13 4.27 3.88 8.15 3.65 3.41 7.07 5.6 11.32 4.51 4.18 3.79 3.63 7.41
SD check: Adjusted p_values: 0.0107 0.0107 0.0107 0.0107 0.0107 0.0107 0.0107 0.0107 0.0107 0.0107 0.0107 0.0107
All SD checks significantly failed (p<0.05)
data_analysis.r
data <- read.table(file = "jkms_data.csv",
                   header = TRUE, sep = ",", stringsAsFactors = TRUE)
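# Simulate the SD of the AM+PM total and compare it with the reported SD.
check_sd <- function(n, mean_1, sd_1, mean_2, sd_2, reported_sd, round=FALSE) {
    total <- numeric(10000)
    for (i in seq_along(total)) {
        # Draw n scores for each period from normal distributions
        data_1 <- rnorm(n, mean_1, sd_1)
        data_2 <- rnorm(n, mean_2, sd_2)
        if (round) {
            data_1 <- round(data_1)
            data_2 <- round(data_2)
        }
        # SD of the combined (AM + PM) sample
        total[i] <- sd(data_1 + data_2)
    }
    # The line computing p is missing from this listing. Assumption: p is the
    # share of simulated SDs at or below the reported SD (one-tailed).
    p <- mean(total <= reported_sd)
    p
}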
# Check variance in all cases where we have AM, PM and AM+PM data (weeks 0 to 4)
p_values <- c()
invalid_mean_sum_count <- 0
for (group in unique(data$group)) {
n <- data[data$group == group, ]$n[1]
sde <- data$sd[ data$group == group
& data$test == test
& data$week == week
& data$period == "AMPM"]
if (invalid_mean_sum_count > 0) {
number_mean_sum <- length(na.omit(data$mean[data$period == "AMPM"]))
cat("Percentage of invalid mean sum",
round(invalid_mean_sum_count * 100 / number_mean_sum, 1),
"%\n")
}
cat("\n")
& data$period == period]
if (invalid_change_count > 0) {
    number_changes <- length(na.omit(data$mean_change))
    cat("Percentage of invalid changes from baseline",
        round(invalid_change_count * 100 / number_changes, 1),
        "%\n")
}
cat("\n")
A2: CSV file of Table 2 data
This is the file jkms_data.csv read by the script in section A1.
group,n,test,week,period,mean,sd,mean_change,sd_change
control,32,TNSS,0,AM,5.59,2.07,,
control,32,TNSS,0,PM,5.49,1.95,,
control,32,TNSS,0,AMPM,11.08,3.83,,
control,32,TNSS,2,AM,4.92,1.91,-0.67,2.01
control,32,TNSS,2,PM,4.72,2.16,-0.78,1.49
control,32,TNSS,2,AMPM,9.63,3.83,-1.45,3.11
control,32,TNSS,4,AM,4.64,2.52,-1.02,2.62
control,32,TNSS,4,PM,4.70,2.55,-0.88,2.22
control,32,TNSS,4,AMPM,9.34,5.02,-1.90,4.64
control,32,TNSS,6,AM,,,,
control,32,TNSS,6,PM,5.13,3.05,-0.37,2.82
control,32,TNSS,6,AMPM,,,,
control,32,NSDS,0,AM,5.24,2.13,,
control,32,NSDS,0,PM,5.36,2.10,,
control,32,NSDS,0,AMPM,10.59,4.14,,
control,32,NSDS,2,AM,4.85,2.14,-0.39,1.67
control,32,NSDS,2,PM,4.63,2.39,-0.73,1.49
control,32,NSDS,2,AMPM,9.48,4.31,-1.12,2.87
control,32,NSDS,4,AM,4.72,2.53,-0.60,2.29
control,32,NSDS,4,PM,4.80,2.58,-0.65,2.18
control,32,NSDS,4,AMPM,9.52,5.06,-1.25,4.35
control,32,NSDS,6,AM,,,,
control,32,NSDS,6,PM,5.03,2.96,-0.33,2.75
control,32,NSDS,6,AMPM,,,,
treatment,36,TNSS,0,AM,5.55,2.11,,
treatment,36,TNSS,0,PM,5.58,2.27,,
treatment,36,TNSS,0,AMPM,11.13,4.25,,
treatment,36,TNSS,2,AM,4.27,1.74,-1.28,1.87
treatment,36,TNSS,2,PM,3.88,1.71,-1.70,1.72
treatment,36,TNSS,2,AMPM,8.15,3.28,-2.99,3.46
treatment,36,TNSS,4,AM,3.65,1.69,-1.90,2.07
treatment,36,TNSS,4,PM,3.41,1.76,-2.17,2.14
treatment,36,TNSS,4,AMPM,7.07,3.32,-4.07,4.05
treatment,36,TNSS,6,AM,,,,
treatment,36,TNSS,6,PM,4.25,2.42,-1.33,2.78
treatment,36,TNSS,6,AMPM,,,,
treatment,36,NSDS,0,AM,5.72,2.07,,
treatment,36,NSDS,0,PM,5.60,2.21,,
treatment,36,NSDS,0,AMPM,11.32,4.20,,
treatment,36,NSDS,2,AM,4.51,2.15,-1.21,1.65
treatment,36,NSDS,2,PM,4.18,2.04,-1.42,1.49
treatment,36,NSDS,2,AMPM,8.69,4.12,-2.63,3.02
treatment,36,NSDS,4,AM,3.79,1.97,-1.94,1.97
treatment,36,NSDS,4,PM,3.63,2.21,-1.97,2.05
treatment,36,NSDS,4,AMPM,7.41,4.09,-3.91,3.91
treatment,36,NSDS,6,AM,,,,
treatment,36,NSDS,6,PM,4.17,2.17,-1.43,2.10
treatment,36,NSDS,6,AMPM,,,,