
Allergic Rhinitis RCT Data Analysis

Abstract
The article Efficacy of Bifidobacterium longum and Lactobacillus plantarum (NVP-1703) in Children With
Allergic Rhinitis: A Randomized Controlled Trial (https://doi.org/10.3346/jkms.2024.39.e266) appeared in the
Journal of Korean Medical Science on the 21st of October 2024. It can be found online here:
https://jkms.org/pdf/10.3346/jkms.2024.39.e266
Upon reading it, we noticed some inconsistencies with the numerical data presented in Table 2.

Figure 1: Table 2 from the paper

This prompted the further investigation that is the topic of this report. We accuse no one of fraud, but the anomalies found are beyond simple chance, and we found no discussion of these issues in the paper. Since these inconsistencies harm the credibility of the paper's conclusions, we believe they warrant further scrutiny.
All details of the analysis have been compiled in a single R script that is given in section A1 for verification.
This report was written on the 7th of November 2024 by Cédric Picard. This analysis is unaffiliated with any
private or public research establishment or effort and we report no conflict of interest.

Issue 1: invalid changes from baseline
Table 2 provides changes from baseline. These are the main statistics on which the relevance of the paper's results rests. In every case, the baseline mean plus the reported change from baseline should equal the reported mean for that week.
However, this turns out not always to be the case in the paper:

• Control TNSS week 4 AM: 5.59 - 1.02 = 4.57 ≠ 4.64
• Control TNSS week 4 PM: 5.49 - 0.88 = 4.61 ≠ 4.70
• Control TNSS week 4 AMPM: 11.08 - 1.90 = 9.18 ≠ 9.34
• Control NSDS week 4 AM: 5.24 - 0.60 = 4.64 ≠ 4.72
• Control NSDS week 4 PM: 5.36 - 0.65 = 4.71 ≠ 4.80
• Control NSDS week 4 AMPM: 10.59 - 1.25 = 9.34 ≠ 9.52
This affects 21.4% of the results. It is worth noting that all of the control group's week-4 values are affected, and only those. Week 4 also happens to be the week with the majority of significant results in this table.
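The arithmetic behind this check is simple enough to verify independently. Here is a minimal Python sketch (our own illustration, not part of the original analysis), using two rows of Table 2 as examples, one consistent and one inconsistent:

```python
# Cross-check of Issue 1: the reported mean at a given week should equal
# the baseline mean plus the reported change, up to rounding.
rows = [
    # (label, baseline mean, reported change, reported mean)
    ("control TNSS w2 AM", 5.59, -0.67, 4.92),  # consistent
    ("control TNSS w4 AM", 5.59, -1.02, 4.64),  # inconsistent: implies 4.57
]

for label, baseline, change, reported in rows:
    implied = round(baseline + change, 2)
    status = "OK" if abs(implied - reported) <= 0.01 else "MISMATCH"
    print(f"{label}: {baseline} + ({change}) = {implied} vs {reported} -> {status}")
```

The 0.01 tolerance allows for rounding of the published two-decimal values; the week-4 discrepancies (0.07 and larger) are well beyond it.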

Issue 2: invalid sums of means


Table 2 provides the results of the TNSS and NSDS evaluations for the morning, the evening, and the overall day. Since the overall-day score is the sum of the morning and evening scores, the morning mean plus the evening mean should equal the overall-day mean.
This turns out not to be the case in 4 instances (33.3%):

• Control TNSS week 2: 4.92 + 4.72 = 9.64 ≠ 9.63
• Control NSDS week 0: 5.24 + 5.36 = 10.60 ≠ 10.59
• Treatment TNSS week 4: 3.65 + 3.41 = 7.06 ≠ 7.07
• Treatment NSDS week 4: 3.79 + 3.63 = 7.42 ≠ 7.41
The sums are only slightly off from the reported values, but they are nevertheless incorrect.
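This check can also be reproduced in a few lines of Python (an illustration of ours, with values taken from Table 2):

```python
# Cross-check of Issue 2: morning mean + evening mean should equal the
# reported overall-day mean.
checks = [
    # (label, AM mean, PM mean, reported AM+PM mean)
    ("control TNSS week 0", 5.59, 5.49, 11.08),  # consistent
    ("control TNSS week 2", 4.92, 4.72, 9.63),   # inconsistent: sum is 9.64
]

for label, am, pm, day in checks:
    total = round(am + pm, 2)
    verdict = "consistent" if total == day else f"inconsistent ({total} != {day})"
    print(f"{label}: {verdict}")
```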

Issue 3: impossible means
The TNSS presented in Table 2 is the sum of 4 symptom evaluations, each an integer from 0 to 3, so the TNSS must be an integer as well. The same is true for the NSDS.
This matters because not every value is possible for the mean of integers. This insight is the key behind the GRIM test (https://doi.org/10.7287/peerj.preprints.2064v1).
The test is best explained by hand with an example. Say we have a mean of 5.49 computed over 32 TNSS scores. Since the scores are integers, their sum must be an integer as well. What was it? 5.49 × 32 = 175.68 is not an integer, but that is expected with rounding. The sum must have been either 175 or 176.

• If it was 175, then 175/32 = 5.468… ≈ 5.47 when rounded to 2 decimals
• If it was 176, then 176/32 = 5.5 exactly

Neither rounds to 5.49. There is no integer sum that, divided by 32, gives a mean of 5.49: this value is impossible.
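This check is easy to automate. A minimal Python version follows, mirroring the grim_test function in the appendix R script (like that script, it tests only the nearest integer sum):

```python
def grim_consistent(mean, n, decimals=2):
    """GRIM check: can `mean`, rounded to `decimals` places, arise from
    an integer sum over `n` observations?"""
    implied_sum = round(mean * n)  # nearest integer sum to mean * n
    return round(implied_sum / n, decimals) == mean

print(grim_consistent(5.49, 32))  # False: no integer sum yields 5.49
print(grim_consistent(5.47, 32))  # True: 175 / 32 rounds to 5.47
```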
Of course, an impossible mean can be the result of many common mistakes: a wrong cell range in a spreadsheet, copy-paste errors, an unreported change in sample size… Such issues are common and do not in themselves indicate fraud. In this case, however, 72.5% of all reported means in Table 2 suffer from this issue: almost three reported means in four are impossible. They are shown in red in the table:

Figure 2: In red, impossible means

Issue 4: improbable standard deviations
In Issue 2 we discussed the strong link between the means of the morning, evening, and overall-day measurements. For the same reason, the variances of these measurements should also be related, although this relationship is much less direct.
We decided to study this point by testing the hypothesis that the reported overall-day standard deviation does not fit the distribution obtained by combining the two reported distributions for the morning and evening of the same day. The null hypothesis is the opposite: that the overall-day standard deviation is coherent with the combination of the reported morning and evening distributions for the same day.
To test this, for each day we sample randomly from the morning and evening distributions, combine the samples into an overall-day dataset, and compute the standard deviation of this combined sample. Repeating this several thousand times yields the sampling distribution of that standard deviation. We then computed the p-value of each reported standard deviation against the corresponding simulated distribution, doubled it to obtain a two-tailed p-value, and adjusted for multiple comparisons using Hochberg's step-up procedure.
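For readers who prefer Python, here is a minimal stdlib-only sketch of this resampling procedure. The authoritative implementation is the R script in section A1; the function names here are our own, and exact figures vary with the random seed:

```python
import random
import statistics
from statistics import NormalDist

def expected_day_sd(n, am_mean, am_sd, pm_mean, pm_sd,
                    sims=2000, integer_scores=True, seed=42):
    """Monte Carlo estimate of the overall-day SD implied by the reported
    AM and PM distributions (AM and PM sampled independently)."""
    rng = random.Random(seed)
    sds = []
    for _ in range(sims):
        am = [rng.gauss(am_mean, am_sd) for _ in range(n)]
        pm = [rng.gauss(pm_mean, pm_sd) for _ in range(n)]
        if integer_scores:  # the underlying symptom scores are integers
            am = [round(x) for x in am]
            pm = [round(x) for x in pm]
        sds.append(statistics.stdev([a + b for a, b in zip(am, pm)]))
    return statistics.mean(sds), statistics.stdev(sds)

def one_tailed_p(reported_sd, mu, sigma):
    """Normal-approximation p-value, taking the tail on the side of the
    reported value (doubled later for a two-tailed test)."""
    p = NormalDist(mu, sigma).cdf(reported_sd)
    return p if reported_sd <= mu else 1 - p

# Example: control TNSS week 0 (AM 5.59 ± 2.07, PM 5.49 ± 1.95, n = 32)
mu, sigma = expected_day_sd(32, 5.59, 2.07, 5.49, 1.95)
p = one_tailed_p(3.83, mu, sigma)  # reported overall-day SD is 3.83
```

For this row the simulation gives an expected overall-day SD of roughly 2.85 ± 0.36, far below the reported 3.83, in line with the first row of the table of results.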
Here are the results (some variation is expected from random sampling):

Figure 3: SD checks

Test                   Expected SD   Reported SD  Raw p-value  Adjusted p-value
control TNSS week 0    2.85 ± 0.36   3.83         0.00334      0.0112
control TNSS week 2    2.89 ± 0.37   3.83         0.00560      0.0112
control TNSS week 4    3.58 ± 0.46   5.02         0.00080      0.0112
control NSDS week 0    3.00 ± 0.38   4.14         0.00136      0.0112
control NSDS week 2    3.21 ± 0.41   4.31         0.00362      0.0112
control NSDS week 4    3.61 ± 0.47   5.06         0.00091      0.0112
treatment TNSS week 0  3.11 ± 0.37   4.25         0.00105      0.0112
treatment TNSS week 2  2.46 ± 0.30   3.28         0.00273      0.0112
treatment TNSS week 4  2.45 ± 0.30   3.32         0.00174      0.0112
treatment NSDS week 0  3.03 ± 0.36   4.20         0.00067      0.0112
treatment NSDS week 2  2.97 ± 0.35   4.12         0.00059      0.0112
treatment NSDS week 4  2.97 ± 0.35   4.09         0.00079      0.0112

As we see, all adjusted p-values are below 0.02. We consider all these differences significant. All reported
standard deviations are therefore improbable.
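The multiple-comparison step can be cross-checked independently. The appendix script uses R's p.adjust(..., method="hochberg"); a minimal Python sketch of that step-up adjustment (our own re-implementation), applied to the doubled raw p-values from the table above, reproduces the constant 0.0112 column:

```python
def hochberg_adjust(pvals):
    """Hochberg step-up adjustment, mirroring R's
    p.adjust(method="hochberg"): for ascending p-values, adjusted p is the
    running minimum, from the largest p downwards, of (n - k) * p_(k)."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # ascending p-values
    adjusted = [0.0] * n
    running = 1.0
    for k in range(n - 1, -1, -1):  # from the largest p-value down
        i = order[k]
        running = min(running, (n - k) * pvals[i])
        adjusted[i] = min(running, 1.0)
    return adjusted

# Raw one-tailed p-values from the table, doubled for a two-tailed test
raw = [0.00334, 0.00560, 0.0008, 0.00136, 0.00362, 0.00091,
       0.00105, 0.00273, 0.00174, 0.00067, 0.00059, 0.00079]
adjusted = hochberg_adjust([2 * p for p in raw])
```

Every (n − k) · p term for the smaller p-values exceeds 1 × 0.0112, so the running minimum makes all twelve adjusted values equal to the largest doubled p-value, exactly as reported.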

Conclusion
Here is a map of all inconsistencies found:

Figure 4: map of inconsistencies

• In pink: Issue 1, invalid changes from baseline
• In blue: Issue 2, invalid sums of means
• In red: Issue 3, impossible means
• In orange: Issue 4, improbable standard deviations
No analysis of the standard deviations of the changes from baseline was attempted.
As we can see, few results are free of inconsistencies. We are not claiming that this is due to fraudulent manipulation, but there are too many issues for them to be due to chance alone, and the paper offers no explanation for them. Regardless of their source, these inconsistencies harm the credibility of the results and, in our opinion, warrant further investigation to ascertain the solidity of the paper's conclusions.

References
• The GRIM test: A simple technique detects numerous anomalies in the reporting of results in
psychology. https://doi.org/10.7287/peerj.preprints.2064v1

A1: R script for analysis
This script reproduces the analysis performed. It requires the CSV file in section A2 that contains all
numerical data from Table 2.
Example output (some variability is to be expected due to random sampling):
GRIM test failures: 72.5 %
5.49 11.08 4.92 9.63 4.64 4.7 5.13 5.24 5.36 4.85 4.63 9.48 4.8 9.52 5.55
11.13 4.27 3.88 8.15 3.65 3.41 7.07 5.6 11.32 4.51 4.18 3.79 3.63 7.41

Expected: 2.850149 ± 0.3647353 Reported: 3.83 p: 0.003610565
Invalid mean sum: control TNSS w2 4.92 + 4.72 = 9.64 ≠ 9.63
Expected: 2.890757 ± 0.3680843 Reported: 3.83 p: 0.005359863
Expected: 3.575875 ± 0.4514869 Reported: 5.02 p: 0.0006904883
Invalid mean sum: control NSDS w0 5.24 + 5.36 = 10.6 ≠ 10.59
Expected: 2.991917 ± 0.3819479 Reported: 4.14 p: 0.001324138
Expected: 3.20899 ± 0.4209109 Reported: 4.31 p: 0.004451211
Expected: 3.608333 ± 0.4578368 Reported: 5.06 p: 0.0007603385
Expected: 3.103841 ± 0.3707028 Reported: 4.25 p: 0.0009945563
Expected: 2.45525 ± 0.2947731 Reported: 3.28 p: 0.002571685
Invalid mean sum: treatment TNSS w4 3.65 + 3.41 = 7.06 ≠ 7.07
Expected: 2.450287 ± 0.2962692 Reported: 3.32 p: 0.001664775
Expected: 3.031759 ± 0.3640678 Reported: 4.2 p: 0.0006663193
Expected: 2.967232 ± 0.3530322 Reported: 4.12 p: 0.0005466737
Invalid mean sum: treatment NSDS w4 3.79 + 3.63 = 7.42 ≠ 7.41
Expected: 2.962367 ± 0.3548726 Reported: 4.09 p: 0.0007425678
Percentage of invalid mean sum 33.3 %

SD check: Adjusted p_values: 0.0107 0.0107 0.0107 0.0107 0.0107 0.0107 0.0107 0.0107 0.0107 0.0107 0.0107 0.0107
All SD checks significantly failed (p<0.05)

Invalid change: control TNSS w4 AM 5.59-1.02 = 4.57 ≠ 4.64
Invalid change: control TNSS w4 PM 5.49-0.88 = 4.61 ≠ 4.7
Invalid change: control TNSS w4 AMPM 11.08-1.9 = 9.18 ≠ 9.34
Invalid change: control NSDS w4 AM 5.24-0.6 = 4.64 ≠ 4.72
Invalid change: control NSDS w4 PM 5.36-0.65 = 4.71 ≠ 4.8
Invalid change: control NSDS w4 AMPM 10.59-1.25 = 9.34 ≠ 9.52
Percentage of invalid changes 21.4 %

data_analysis.r

data <- read.table(file="jkms_data.csv",
                   header=TRUE, sep=",", stringsAsFactors=TRUE)

grim_test <- function(n, m, decimals) {
    # Rounding to limit floating point comparison issues
    round(round(n * m) / n, decimals) == m
}

means_grim_fail <- c()

for (group in unique(data$group)) {
    n <- data[data$group == group & data$week == 0, ]$n[1]
    for (m in na.omit(data$mean[data$group == group])) {
        if (grim_test(n, m, 2) == FALSE) {
            means_grim_fail <- c(means_grim_fail, m)
        }
    }
}

cat("GRIM test failures:",
    round(length(means_grim_fail) / length(na.omit(data$mean)) * 100, 1), "%\n")
cat(means_grim_fail, "\n")
cat("\n")

# Check variance of sum by sampling from normal distributions
check_sd <- function(n, mean_1, sd_1, mean_2, sd_2, reported_sd, round=FALSE) {
    total <- c()
    for (i in 1:10000) {
        data_1 <- rnorm(n, mean_1, sd_1)
        data_2 <- rnorm(n, mean_2, sd_2)
        if (round) {
            data_1 <- round(data_1)
            data_2 <- round(data_2)
        }
        total <- c(total, sd(data_1 + data_2))
    }

    # One-tailed p-value, switching tail so that p is always the
    # probability of a value at least as extreme as the reported SD
    p <- pnorm(reported_sd, mean(total), sd(total),
               lower.tail=reported_sd <= mean(total))

    cat("Expected:", mean(total), "±", sd(total),
        "Reported:", reported_sd, "p:", p, "\n")

    p
}

# Check variance in all cases where we have AM, PM and AM+PM data (weeks 0 to 4)
p_values <- c()
invalid_mean_sum_count <- 0
for (group in unique(data$group)) {
    n <- data[data$group == group, ]$n[1]

    for (test in unique(data$test)) {
        for (week in c(0, 2, 4)) {
            m1  <- data$mean[data$group == group
                             & data$test == test
                             & data$week == week
                             & data$period == "AM"]
            sd1 <- data$sd[data$group == group
                           & data$test == test
                           & data$week == week
                           & data$period == "AM"]
            m2  <- data$mean[data$group == group
                             & data$test == test
                             & data$week == week
                             & data$period == "PM"]
            sd2 <- data$sd[data$group == group
                           & data$test == test
                           & data$week == week
                           & data$period == "PM"]
            me  <- data$mean[data$group == group
                             & data$test == test
                             & data$week == week
                             & data$period == "AMPM"]
            sde <- data$sd[data$group == group
                           & data$test == test
                           & data$week == week
                           & data$period == "AMPM"]

            if (round(m1 + m2, 2) != me) {
                invalid_mean_sum_count <- invalid_mean_sum_count + 1
                cat(sprintf("Invalid mean sum: %s %s w%s\t%s + %s = %s ≠ %s\n",
                            group, test, week, m1, m2, m1 + m2, me))
            }

            # Rounded because the original scores are integer values
            p <- check_sd(n, m1, sd1, m2, sd2, sde, round=TRUE)
            p_values <- c(p_values, p)
        }
    }
}

if (invalid_mean_sum_count > 0) {
    number_mean_sum <- length(na.omit(data$mean[data$period == "AMPM"]))
    cat("Percentage of invalid mean sum",
        round(invalid_mean_sum_count * 100 / number_mean_sum, 1),
        "%\n")
}
cat("\n")

# Double for two-tailed p-values, then adjust for multiple
# comparisons with Hochberg's step-up procedure
adjusted_p_values <- p.adjust(2 * p_values, method="hochberg")
cat("SD check: Adjusted p_values:", round(adjusted_p_values, 4), "\n")
if (all(adjusted_p_values < 0.05)) {
    cat("All SD checks significantly failed (p<0.05)\n")
} else if (any(adjusted_p_values < 0.05)) {
    cat("Some SD checks significantly failed (p<0.05)\n")
}
cat("\n")

# Check baseline change consistency
invalid_change_count <- 0
for (group in unique(data$group)) {
    for (test in unique(data$test)) {
        for (week in c(2, 4, 6)) {
            for (period in unique(data$period)) {
                # No data for week 6 aside from PM
                if (week == 6 & period != "PM") {
                    next
                }

                baseline <- data$mean[data$group == group
                                      & data$test == test
                                      & data$week == 0
                                      & data$period == period]
                reported_mean <- data$mean[data$group == group
                                           & data$test == test
                                           & data$week == week
                                           & data$period == period]
                reported_change <- data$mean_change[data$group == group
                                                    & data$test == test
                                                    & data$week == week
                                                    & data$period == period]

                diff <- round(abs(baseline + reported_change - reported_mean), 2)

                # Allow for rounding errors
                if (diff > 0.01) {
                    invalid_change_count <- invalid_change_count + 1
                    cat(sprintf("Invalid change: %s %s w%s %s\t%s%s = %s ≠ %s\n",
                                group, test, week, period,
                                baseline, reported_change,
                                baseline + reported_change, reported_mean))
                }
            }
        }
    }
}

if (invalid_change_count > 0) {
    number_changes <- length(na.omit(data$mean_change))
    cat("Percentage of invalid changes",
        round(invalid_change_count * 100 / number_changes, 1),
        "%\n")
}
cat("\n")

A2: CSV file of table 2 data
The file jkms_data.csv used by the script in section A1.

group,n,test,week,period,mean,sd,mean_change,sd_change
control,32,TNSS,0,AM,5.59,2.07,,
control,32,TNSS,0,PM,5.49,1.95,,
control,32,TNSS,0,AMPM,11.08,3.83,,
control,32,TNSS,2,AM,4.92,1.91,-0.67,2.01
control,32,TNSS,2,PM,4.72,2.16,-0.78,1.49
control,32,TNSS,2,AMPM,9.63,3.83,-1.45,3.11
control,32,TNSS,4,AM,4.64,2.52,-1.02,2.62
control,32,TNSS,4,PM,4.70,2.55,-0.88,2.22
control,32,TNSS,4,AMPM,9.34,5.02,-1.90,4.64
control,32,TNSS,6,AM,,,,
control,32,TNSS,6,PM,5.13,3.05,-0.37,2.82
control,32,TNSS,6,AMPM,,,,
control,32,NSDS,0,AM,5.24,2.13,,
control,32,NSDS,0,PM,5.36,2.10,,
control,32,NSDS,0,AMPM,10.59,4.14,,
control,32,NSDS,2,AM,4.85,2.14,-0.39,1.67
control,32,NSDS,2,PM,4.63,2.39,-0.73,1.49
control,32,NSDS,2,AMPM,9.48,4.31,-1.12,2.87
control,32,NSDS,4,AM,4.72,2.53,-0.60,2.29
control,32,NSDS,4,PM,4.80,2.58,-0.65,2.18
control,32,NSDS,4,AMPM,9.52,5.06,-1.25,4.35
control,32,NSDS,6,AM,,,,
control,32,NSDS,6,PM,5.03,2.96,-0.33,2.75
control,32,NSDS,6,AMPM,,,,
treatment,36,TNSS,0,AM,5.55,2.11,,
treatment,36,TNSS,0,PM,5.58,2.27,,
treatment,36,TNSS,0,AMPM,11.13,4.25,,
treatment,36,TNSS,2,AM,4.27,1.74,-1.28,1.87
treatment,36,TNSS,2,PM,3.88,1.71,-1.70,1.72
treatment,36,TNSS,2,AMPM,8.15,3.28,-2.99,3.46
treatment,36,TNSS,4,AM,3.65,1.69,-1.90,2.07
treatment,36,TNSS,4,PM,3.41,1.76,-2.17,2.14
treatment,36,TNSS,4,AMPM,7.07,3.32,-4.07,4.05
treatment,36,TNSS,6,AM,,,,
treatment,36,TNSS,6,PM,4.25,2.42,-1.33,2.78
treatment,36,TNSS,6,AMPM,,,,
treatment,36,NSDS,0,AM,5.72,2.07,,
treatment,36,NSDS,0,PM,5.60,2.21,,
treatment,36,NSDS,0,AMPM,11.32,4.20,,
treatment,36,NSDS,2,AM,4.51,2.15,-1.21,1.65
treatment,36,NSDS,2,PM,4.18,2.04,-1.42,1.49
treatment,36,NSDS,2,AMPM,8.69,4.12,-2.63,3.02
treatment,36,NSDS,4,AM,3.79,1.97,-1.94,1.97
treatment,36,NSDS,4,PM,3.63,2.21,-1.97,2.05
treatment,36,NSDS,4,AMPM,7.41,4.09,-3.91,3.91
treatment,36,NSDS,6,AM,,,,
treatment,36,NSDS,6,PM,4.17,2.17,-1.43,2.10
treatment,36,NSDS,6,AMPM,,,,
