Cite as: Saaibi Meléndez M, Botero-Rodríguez F, Rincón Rodríguez CJ. Samples in randomized clinical trials with interim analysis. Rev Peru Med Exp Salud Publica. 2023;40(2):220-8. doi: 10.17843/rpmesp.2023.402.12217.

Correspondence: Michelle Saaibi Meléndez; msaaibi@javeriana.edu.co

Received: 11/10/2022
Approved: 26/04/2023
Online: 30/06/2023

This work is licensed under a Creative Commons Attribution 4.0 International license.
SPECIAL SECTION

SAMPLES IN RANDOMIZED CLINICAL TRIALS WITH INTERIM ANALYSIS

Michelle Saaibi Meléndez 1,2,a, Felipe Botero-Rodríguez 1,2,a, Carlos Javier Rincón Rodríguez 1,2,b

1 Semillero de Bioestadística, School of Medicine, Pontificia Universidad Javeriana, Bogotá, Colombia.
2 Department of Clinical Epidemiology and Biostatistics, Pontificia Universidad Javeriana, Bogotá, Colombia.
a Physician; b Statistician, Master in Clinical Epidemiology.
ABSTRACT
This article introduces randomized clinical trials and basic concepts of statistical inference. We present methods for calculating the sample size by type of outcome and by the hypothesis to be tested, together with code in the R programming language. We also describe four methods for adjusting the original sample size for interim analyses. We sought to introduce these topics in a simple and concrete way, presenting the mathematical expressions that support the results and their implementation in available statistical programs, thereby bringing health students closer to statistics and the use of statistical software, aspects that are rarely considered during their training.
Keywords: Sample Size; Clinical Trials; Hypothesis Tests (source: MeSH NLM).
INTRODUCTION
The approach to medicine has shifted from an initial paternalistic view to pragmatic reductionism. This change occurred because of the drive to improve the quality of care, decrease individual economic incentives and prioritize the importance of research for improving the quality of evidence (1). Evidence-based medicine emerged as a new paradigm in the 1990s as scientific support for clinical decision-making and is based on a hierarchy of three statements: a) randomized clinical trials (RCTs) or systematic reviews of many experiments usually provide more evidence than observational studies; b) analytical clinical studies provide better evidence than pathophysiological rationale alone; and c) analytical clinical studies provide more evidence than expert judgment (2).
Obtaining valid results from RCTs depends on the quality of the data, which must be sufficient to address the research question. To obtain such quality data, the sample size must be large enough to yield an accurate estimate of the effect of the intervention. Random errors will not affect the interpretability of the results as long as the sample is large enough; however, a systematic error can invalidate a study (3).
An interim analysis consists of setting one or more observation points so that the behavior of the sample can be assessed up to that point. Depending on the results, a committee may determine whether the study is relevant enough to continue (3). This article seeks to provide an introduction to the calculation of sample size by type of outcome and hypothesis. We also aim to provide information on its adjustment for interim analysis, considering the mathematical
formulas and their implementation in available statistical programs such as the R programming language. The objective is to bring health personnel closer to statistics and the use of statistical software, aspects that receive little attention in their training. Although there are already several sources that develop these topics, there are not many documents that merge theory and practice, including all the aspects mentioned above regarding RCTs. Review articles allow young researchers and health professionals to make a first approach to these topics without generating an initial rejection due to their complexity. Connecting mathematical expressions with their implementation in a statistical program seeks to prevent young researchers from simply executing pre-established functions such as TwoSampleMean.Equality or TwoSampleMean.NIS (included in the "TrialSize" package (4)) without understanding where the results come from, the effect of the parameters on the sample size, or the need to choose parameter values consistent with the type of hypothesis being evaluated. The aim is to promote understanding rather than the mechanical execution of tasks merely to meet the requirements of an evaluation committee.
Randomized clinical trials
The equipoise principle corresponds to a state of uncertainty regarding the therapeutic results of a treatment, which justifies conducting an RCT (5). RCTs with a control group are prospective studies that compare the outcomes of one or more interventions with the best available alternative. In these studies, patient safety should always be a priority, so the possible benefits, harms and treatment alternatives for the patient's condition should be explained. Although it may have limitations, the RCT is considered the best alternative for evaluating the efficacy or safety of an intervention (6,7). It is characterized by: a) an intervention that is compared with a control group receiving either placebo or the usual treatment; b) randomized assignment of the interventions, which reduces possible confounding bias by producing homogeneous groups and reduces the possibility of selection bias by preventing foreknowledge of the group to which a patient will be assigned; and c) blinding of the treatment groups, which can be applied to researchers, patients or analysts, minimizing possible information biases (6,7).
RCTs are divided into four phases. Phase I seeks to determine possible toxic effects, absorption, distribution and metabolism of the drug in a group of 20-80 healthy people. Phase II is conducted in a diseased population to determine the safety and efficacy of the drug, based on biological markers and evaluating adverse reactions. Phase III is performed when there is evidence on the safety and efficacy of the intervention and additional information is sought on the safety and effectiveness of the drug in a larger number of participants; the intervention is compared with the usual therapy or placebo during long-term follow-up in order to identify possible side effects. In phase IV, after the molecule has been approved for marketing, it is compared with other existing products in the general population; pharmacovigilance is also carried out in order to look for adverse events not identified in phase III due to their low incidence or long periods of occurrence (3,6).
In this article we will focus on phase III and IV studies, which require a sample size calculation. Additionally, we will work with parallel RCTs, characterized by simultaneous follow-up of each group to which participants were assigned (3).
Inferential statistics
Inferential statistics allows estimating the behavior of the entire population from the results obtained in a sample. This behavior is summarized in measures such as means, proportions or variances, which, if obtained for the whole population, would be called parameters (7). There are two alternatives that yield consistent results: confidence intervals and hypothesis tests. The first seeks a range of values that, with a given degree of confidence, contains the parameter of interest, while the second evaluates a statement about the parameter of interest and leads to the decision to reject it or not.
Since this paper presents the sample size calculation in parallel RCTs to evaluate statements about parameters, we will describe the process of hypothesis testing. Initially, two hypotheses are proposed: the null hypothesis (H0), which is a statement about the parameter, and the alternative hypothesis (Ha), which is its negation; almost always it is the alternative hypothesis (8), related to the research question, that one seeks to support. At the end, a decision is made to reject H0 or not. Taking into account that this decision depends on results obtained only from a sample, there is the probability of committing two errors: type I error or significance level (α), which occurs when rejecting H0 when it is true, and type II error (β), which occurs when not rejecting H0 when it is false (9). The complement of the type I error is the confidence level (1-α), which corresponds to the probability of not rejecting H0 when it is true, and the complement of the type II error (1-β), which is the power, is the probability of rejecting H0 when it is false (9). When performing a hypothesis test, the probabilities of committing type I and type II errors should be low, which implies that the confidence level and the power are high (typically α=0.05 and
β=0.1 or 0.2). In order to guarantee these values, it is necessary to calculate the sample size.
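As a minimal illustration, and only as a sketch with typical values, the standard normal quantiles associated with these probabilities are the building blocks of the sample size expressions presented later; in R they are obtained with qnorm():

# Standard normal quantiles used later in the sample size expressions
alpha <- 0.05   # significance level (type I error)
beta  <- 0.20   # type II error (power = 1 - beta = 0.80)

qnorm(1 - alpha/2)  # z_(1-alpha/2) ~ 1.96, used in two-sided (equality) hypotheses
qnorm(1 - alpha)    # z_(1-alpha)   ~ 1.64, used in one-sided hypotheses
qnorm(1 - beta)     # z_(1-beta)    ~ 0.84, reflects the desired power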
In order to make the decision to reject H0, a calculation is carried out with the values from the sample (the test statistic) and contrasted with the behavior that would be expected if H0 were true. If the value found for the test statistic is unlikely, this is evidence that H0 is false and it is rejected in favor of Ha; otherwise, it is considered that there is not enough evidence to reject H0. The probability reflecting the evidence "for" or "against" H0 is called the p-value (10,11), and it is equal to the probability, assuming the null hypothesis is true, of obtaining a value of the test statistic "as extreme or more extreme (in the appropriate direction of Ha) than the value actually calculated" (11,12); finally, H0 is rejected when p<α. Statistical significance, commonly evaluated by means of the p-value, does not account for clinical significance: we speak of statistical significance when the condition p<α is fulfilled, while clinical significance is defined by those results that improve the physical, mental and social functionality of the patient, which can lead to an improvement in the quality or quantity of life, depending on the context (14).
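As a minimal sketch of the previous paragraph, and assuming a test statistic that follows a standard normal distribution under H0, the two-sided p-value can be obtained in R as follows (the value of the statistic is fictitious):

# Two-sided p-value for a test statistic z under a standard normal null distribution
z <- 2.1                   # fictitious value of the test statistic
p <- 2 * pnorm(-abs(z))    # probability of a value as extreme or more extreme than z
p                          # ~0.036; since p < 0.05, H0 would be rejected at alpha = 0.05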
Types of hypotheses
There can be different research questions in an RCT, and these relate to four different ways of stating H0. The sample size calculation depends on the type of hypothesis to be tested; therefore, Table 1 presents their definitions along with an example.
Sample size
Generally, it is not possible to study the entire population; therefore, a specific sample size (n) is required to represent its behavior. As the sample size increases, the results approach those of the population, so that beyond a certain size the results no longer change substantially, making it unnecessary to continue recruiting participants (15). Recruiting more subjects than necessary increases both the complexity of the logistical operation and the costs, and poses an ethical dilemma by unnecessarily assigning subjects to a treatment that has not proven its benefit. On the other hand, defining a very small sample size implies a high risk of the type II error mentioned above. Calculating the sample size makes it possible to determine whether a study is feasible based on a priori assumptions, given the power, the significance level and the background of previous studies addressing the same research question, taking into account the ethical considerations of subjecting people to an experiment (13,16).
In addition, when conducting an RCT, the possibility arises of observing the results obtained as the sample is collected. These are called "interim analyses", and they should be planned from the beginning of the investigation, during the preparation of the protocol. These additional analyses increase the probability of type I and II errors, and for this reason the sample size must be adjusted to maintain the overall confidence level and power throughout the RCT. The above reflects the importance of the sample size calculation; therefore, this article presents how to calculate the sample size for RCTs, showing the expression from which it is obtained and its application using the R programming language (17). Additionally, we present how to perform the adjustment for interim analysis, together with an example.
MATERIALS AND METHODS
Based on the review of the book "Sample size calculations in clinical research" by Chow et al. (13), this article presents how to calculate the sample size for a parallel RCT, by: 1) type of outcome (dichotomous, continuous) and 2) type of hypothesis to be evaluated (equality, non-inferiority, superiority and equivalence). The corresponding mathematical expressions and the code to create a function in the R (17) and RStudio (18) programs are included. To use this code, the reader is required to have a basic knowledge of these programs; the function must be copied and executed, and can then be called with the required parameters described in the results section. For each scenario, an example with fictitious data is included, and specific considerations related to the function parameters are mentioned as well.
We then describe the methods of Pocock, O'Brien and Fleming, Wang and Tsiatis, and the Inner Wedge, which adjust the original sample size obtained from the functions created previously in order to perform interim analyses. The adjustment consists of multiplying the original sample size by the coefficients included in Annexes 1 to 4, depending on the method used and considering the number of planned evaluations (R), the power and the significance level defined for the study. In addition, the expression of the test statistic used at each evaluation, by type of outcome, is included, based on the information from the participants who have entered the study. In summary, we present the following for each method: 1) the critical values, which correspond to the values of the standard normal distribution that determine the rejection zone for evaluating the null hypothesis at each point in time, and 2) the coefficients for adjusting the sample size calculation.
Table 1. Types of hypotheses in randomized clinical trials.

Equality
Definition (24): Evaluates whether there are differences between the treatment and control groups.
Hypotheses (25,26):
H0: There is no difference between the two therapies. Example: pressure over the estimated sternal projection of the aortic valve on the sternum is not associated with a change in hemodynamic parameters in the hypotensive patient.
Ha: There is a difference between the two therapies. Example: pressure over the estimated sternal projection of the aortic valve on the sternum is associated with a change in hemodynamic parameters in the hypotensive patient.
Interpretation of the example: patients who underwent a pressure of 6 mm depth over the estimated sternal projection of the aortic valve on the sternum, maintained for 90 seconds, showed a homogeneous decrease of blood pressure and heart rate parameters (27).

Non-inferiority
Definition (24): Evaluates whether the effect of a new treatment (whose effect is lower than that of the conventional treatment, but greater than the placebo) is within an accepted range, established on the basis of the best available evidence. This difference is justified by side effects or feasibility.
Hypotheses (25,26):
H0: The effect of the new intervention is less than or equal to the placebo. Example: the new antimicrobial has the same effectiveness as the placebo.
Ha: The effect of the new intervention is greater than the placebo. Example: the new antimicrobial is more effective than the placebo.
Interpretation of the example: the new antimicrobial, although better tolerated than conventional therapy, is less effective clinically and statistically, so it cannot be recommended as first line (28).

Superiority
Definition (24): Seeks to evaluate whether a new intervention generates better clinical outcomes than a well-established therapy or placebo.
Hypotheses (25,26):
H0: The new intervention is not superior to the established therapy. Example: volunteering does not reduce social isolation or lead to better mental health outcomes.
Ha: The new intervention is superior to the established therapy. Example: volunteering reduces social isolation and leads to better mental health outcomes.
Interpretation of the example: volunteering did not prove to be superior compared to the control group regarding mental health outcomes or isolation (29).

Equivalence
Definition (24): Seeks to evaluate whether the effect of the treatment is identical to that of another therapy.
Hypotheses (25,26):
H0: The therapies are not equivalent. Example: the inclusion of metformin, combined with oral contraceptives in the treatment of polycystic ovary syndrome, is not as effective as monotherapy with oral contraceptives alone.
Ha: The therapies are equivalent. Example: oral contraceptive monotherapy is as effective as oral contraceptive therapy plus metformin for the treatment of polycystic ovary syndrome.
Interpretation of the example: the ultrasound remission time was shorter, there were fewer symptoms and the recurrence rate at 3 months was lower with the combined therapy, which shows greater effectiveness compared to the study group that received monotherapy (30).

H0: null hypothesis; Ha: alternative hypothesis.
Sample size calculation for a dichotomous outcome
As an example, we assume that two treatments are to be compared and the outcome of interest is the proportion of deaths. For all the expressions below, we denote pT and pC as the proportion of deceased in the treatment and control group, respectively; ϵ is the expected difference between these two proportions (ϵ = pT - pC); δ is the margin of tolerance or superiority defined by the researchers; and k is the ratio between the sample size of the treatment group and that of the control group (k = nT/nC), i.e., nT = k·nC. Finally, we denote α and β as the type I and type II errors, respectively, and z(q) as the q quantile of the standard normal distribution. In Table 2, we present the expressions to obtain nC and, alongside them, the code in the R programming language that creates a function for their implementation, together with an example where α=0.05, β=0.2 and k=1.
In all four hypotheses, the smaller the expected difference (ϵ) and the closer the proportions are to 0.5, the larger the sample size. When testing a non-inferiority hypothesis, if a higher proportion of the event means greater effectiveness, then δ<0; if a lower proportion of the event means greater effectiveness, then δ>0. When testing a superiority hypothesis, if a higher proportion of the event means greater effectiveness, then δ>0; if a lower proportion of the event means greater effectiveness, then δ<0. When testing an equivalence hypothesis, δ>0 always.
Continuous outcome sample size calculation
As an example, we assume that two treatments are to be compared and the outcome is systolic blood pressure (SBP) in mmHg. For all the expressions presented below, we denote μT and μC as the mean SBP in the treatment and control group, respectively; ϵ is the expected difference between the two means (ϵ = μT - μC) and s is the standard deviation of the two samples combined. δ, k, α, β and z(q) represent the same values as in the previous section. In Table 3, we present the expressions to obtain nC and the code in the R programming language for their implementation, with an example where α=0.05, β=0.2 and k=1.
In all four hypotheses, a larger s and a smaller ϵ require larger sample sizes. When testing a non-inferiority hypothesis, if a higher μ means greater effectiveness, then δ<0; if a lower μ means greater effectiveness, then δ>0. When testing a superiority hypothesis, if a higher μ means greater effectiveness, then δ>0; if a lower μ means greater effectiveness, then δ<0. When testing an equivalence hypothesis, δ>0 always.
RESULTS
Interim Analysis
In an RCT, the study hypothesis can be tested sequentially as the sample is collected, giving the possibility of interrupting the collection if a clear benefit of the intervention is identified early. Depending on the number of evaluations (R) that are programmed, it is necessary to adjust the initial sample size to maintain the overall significance level of the study, and to establish the critical values on the distribution of the test statistic for rejecting or accepting the null hypothesis at each evaluation. The R evaluations are performed as subjects accumulate, and the test statistic zr (r=1,2,...,R) for a dichotomous outcome is equal to:
$$z_r = \frac{\sqrt{n_r}\,(\hat{p}_{T,r} - \hat{p}_{C,r})}{\sqrt{\hat{p}_{T,r}(1-\hat{p}_{T,r}) + \hat{p}_{C,r}(1-\hat{p}_{C,r})}}$$

where nr, p̂T,r and p̂C,r are the sample size per intervention group and the estimated proportions of the outcome at the r-th assessment for the treatment group and the control group, respectively.
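As a minimal sketch of this calculation in R, assuming hypothetical accumulated data at the r-th evaluation (the counts below are fictitious and the function name is ours, not part of any package):

# Interim test statistic for a dichotomous outcome at the r-th evaluation
z.interim.prop <- function(n_r, events_T, events_C) {
  pT_hat <- events_T / n_r            # estimated proportion in the treatment group
  pC_hat <- events_C / n_r            # estimated proportion in the control group
  sqrt(n_r) * (pT_hat - pC_hat) /
    sqrt(pT_hat * (1 - pT_hat) + pC_hat * (1 - pC_hat))
}

# Example: 100 subjects per group collected so far, 12 vs. 22 deaths
z.interim.prop(n_r = 100, events_T = 12, events_C = 22)  # ~ -1.90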
For a continuous outcome, the test statistic is equal to:

$$z_r = \frac{\dfrac{1}{n_r}\sum_{j=1}^{n_r} x_{Tj} - \dfrac{1}{n_r}\sum_{j=1}^{n_r} x_{Cj}}{\sqrt{\dfrac{s_{T,r}^{2} + s_{C,r}^{2}}{n_r}}}$$

where nr, s²T,r and s²C,r are the sample size in each intervention group and the estimated variances at the r-th evaluation for the treatment group and the control group, respectively; xTj and xCj are the observed values of the outcome for each subject collected up to time r.
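Again only as a sketch, assuming two fictitious vectors with the outcome values accumulated up to the r-th evaluation and equal group sizes:

# Interim test statistic for a continuous outcome at the r-th evaluation
z.interim.mean <- function(x_T, x_C) {
  n_r <- length(x_T)                  # subjects per group collected so far (equal groups assumed)
  (mean(x_T) - mean(x_C)) / sqrt((var(x_T) + var(x_C)) / n_r)
}

# Example with simulated systolic blood pressure values (mmHg)
set.seed(123)
x_T <- rnorm(80, mean = 150, sd = 28)   # treatment group
x_C <- rnorm(80, mean = 160, sd = 28)   # control group
z.interim.mean(x_T, x_C)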
We present four methods that allow adjusting the sample size depending on the number of programmed evaluations, the significance level and the power established for an equality hypothesis. The first is the Pocock method, in which the sample size is adjusted by multiplying the sample size initially obtained from the expressions presented in the previous section by the coefficients included in Annex 1, depending on the number of evaluations and the significance and power levels established. At each evaluation, if |zr| > CP(r,α), H0 is rejected and data collection is suspended; otherwise, collection continues. The critical values CP(r,α) are presented in Annex 1 for defined R and α.
Table 2. Types of hypotheses with a dichotomous outcome and their code in the R programming language.

Equality hypothesis

$$n_C = \frac{(z_{(1-\alpha/2)} + z_{(1-\beta)})^{2}}{\epsilon^{2}}\left(\frac{p_T(1-p_T)}{k} + p_C(1-p_C)\right)$$

Then, if it is expected that the proportions of deaths in the treatment group and in the control group are pT=0.15 and pC=0.2, nC=nT=903.

n.2prop.igual <- function(alpha, beta, k, pT, pC){
  nC <- (qnorm(1-alpha/2)+qnorm(1-beta))^2/(pT-pC)^2*(pT*(1-pT)/k+pC*(1-pC))
  nT <- k*nC
  Grupo <- c("Tratamiento=", "Control=")
  n <- ceiling(c(nT, nC))
  n <- data.frame(Grupo, n)
  print(n)
}

Non-inferiority hypothesis

$$n_C = \frac{(z_{(1-\alpha)} + z_{(1-\beta)})^{2}}{(\epsilon-\delta)^{2}}\left(\frac{p_T(1-p_T)}{k} + p_C(1-p_C)\right)$$

Then, if the expected proportions of deaths in the treatment group and in the control group are pT=0.2 and pC=0.22, and an increase in mortality of up to δ=0.03 is tolerated, nC=nT=821.

n.2prop.noinf <- function(alpha, beta, k, pT, pC, delta){
  nC <- (qnorm(1-alpha)+qnorm(1-beta))^2/((pT-pC)-delta)^2*(pT*(1-pT)/k+pC*(1-pC))
  nT <- k*nC
  Grupo <- c("Tratamiento=", "Control=")
  n <- ceiling(c(nT, nC))
  n <- data.frame(Grupo, n)
  print(n)
}

Superiority hypothesis

$$n_C = \frac{(z_{(1-\alpha)} + z_{(1-\beta)})^{2}}{(\epsilon-\delta)^{2}}\left(\frac{p_T(1-p_T)}{k} + p_C(1-p_C)\right)$$

Then, if the expected proportions of deaths in the treatment group and in the control group are pT=0.18 and pC=0.25, and the new treatment is considered superior if it reduces mortality by at least δ=-0.01, nC=nT=576.

n.2prop.sup <- function(alpha, beta, k, pT, pC, delta){
  nC <- (qnorm(1-alpha)+qnorm(1-beta))^2/((pT-pC)-delta)^2*(pT*(1-pT)/k+pC*(1-pC))
  nT <- k*nC
  Grupo <- c("Tratamiento=", "Control=")
  n <- ceiling(c(nT, nC))
  n <- data.frame(Grupo, n)
  print(n)
}

Equivalence hypothesis

$$n_C = \frac{(z_{(1-\alpha)} + z_{(1-\beta/2)})^{2}}{(\delta-|\epsilon|)^{2}}\left(\frac{p_T(1-p_T)}{k} + p_C(1-p_C)\right)$$

Then, if the expected proportions of deaths in the treatment group and in the control group are pT=0.22 and pC=0.18, and the treatments are defined as equivalent if they do not differ by more than |δ|=0.1, nC=nT=760.

n.2prop.equi <- function(alpha, beta, k, pT, pC, delta){
  nC <- (qnorm(1-alpha)+qnorm(1-beta/2))^2/(delta-abs(pT-pC))^2*(pT*(1-pT)/k+pC*(1-pC))
  nT <- k*nC
  Grupo <- c("Tratamiento=", "Control=")
  n <- ceiling(c(nT, nC))
  n <- data.frame(Grupo, n)
  print(n)
}
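As a usage sketch: once the function for the relevant hypothesis has been copied into R and executed, the equality example of Table 2 can be reproduced with a call such as the following.

# Reproduces the equality example of Table 2: alpha = 0.05, beta = 0.2, k = 1, pT = 0.15, pC = 0.2
n.2prop.igual(alpha = 0.05, beta = 0.2, k = 1, pT = 0.15, pC = 0.2)
#          Grupo   n
# 1 Tratamiento= 903
# 2     Control= 903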
The second method is that of O'Brien and Fleming; the coefficients for adjusting the initial sample size are presented in Annex 2. In this approach, H0 is rejected at an evaluation if |zr| > COF(r,α)·√(R/r); otherwise, collection continues. The critical values COF(r,α) are presented in Annex 2 according to the number of evaluations and the significance level. The third method is that of Wang and Tsiatis, which includes a new parameter Δ; the coefficients for the sample size adjustment for α=0.05 are included in Annex 3. In this method, H0 is rejected if |zr| > CWT(r,α,Δ)·(r/R)^(Δ-0.5); otherwise, collection continues. The critical values CWT(r,α,Δ) are presented in Annex 3 for α=0.05. The methods of Pocock and of O'Brien and Fleming are particular cases of the Wang and Tsiatis method when Δ=0.5 and Δ=0, respectively; therefore, the critical values for these values of Δ are obtained from Annexes 1 and 2.
Finally, we present the Inner Wedge method. In this method, unlike the previous three, two critical values are proposed: if |zr| ≥ br, H0 is rejected and collection is suspended, with the conclusion that a significant treatment effect was found; if |zr| < ar, H0 is not rejected and collection is suspended, with the conclusion that no difference between treatment and control is going to be found; otherwise, collection continues. The critical values ar and br are equal to:

$$a_r = \left[C_{W1(r,\alpha,\beta,\Delta)} + C_{W2(r,\alpha,\beta,\Delta)}\right]\left(\frac{r}{R}\right)^{1/2} - C_{W2(r,\alpha,\beta,\Delta)}\left(\frac{r}{R}\right)^{\Delta-0.5}, \quad \text{with } a_r = 0 \text{ if } a_r < 0,$$

and

$$b_r = C_{W1(r,\alpha,\beta,\Delta)}\left(\frac{r}{R}\right)^{\Delta-0.5}$$
Table 3. Types of hypotheses with a continuous outcome and their code in the R programming language.

Equality hypothesis

$$n_C = \frac{(z_{(1-\alpha/2)} + z_{(1-\beta)})^{2}\, s^{2}\,(1+1/k)}{\epsilon^{2}}$$

Then, if the means are μT=150 and μC=160 and s=28, nC=nT=124.

n.2mu.igual <- function(alpha, beta, k, muT, muC, s){
  nC <- (qnorm(1-alpha/2)+qnorm(1-beta))^2*s^2*(1+1/k)/(muT-muC)^2
  nT <- k*nC
  Grupo <- c("Tratamiento =", "Control =")
  n <- ceiling(c(nT, nC))
  n <- data.frame(Grupo, n)
  print(n)
}

Non-inferiority hypothesis

$$n_C = \frac{(z_{(1-\alpha)} + z_{(1-\beta)})^{2}\, s^{2}\,(1+1/k)}{(\epsilon-\delta)^{2}}$$

Then, if μT=155, μC=160 and s=28, and the new treatment is defined as non-inferior if the mean increases by at most δ=5, nC=nT=97.

n.2mu.noinf <- function(alpha, beta, k, muT, muC, s, delta){
  nC <- (qnorm(1-alpha)+qnorm(1-beta))^2*s^2*(1+1/k)/((muT-muC)-delta)^2
  nT <- k*nC
  Grupo <- c("Tratamiento =", "Control =")
  n <- ceiling(c(nT, nC))
  n <- data.frame(Grupo, n)
  print(n)
}

Superiority hypothesis

$$n_C = \frac{(z_{(1-\alpha)} + z_{(1-\beta)})^{2}\, s^{2}\,(1+1/k)}{(\epsilon-\delta)^{2}}$$

Then, if μT=145, μC=160 and s=28, and the new treatment is considered superior if it decreases the mean by at least δ=-10, nC=nT=388.

n.2mu.sup <- function(alpha, beta, k, muT, muC, s, delta){
  nC <- (qnorm(1-alpha)+qnorm(1-beta))^2*s^2*(1+1/k)/((muT-muC)-delta)^2
  nT <- k*nC
  Grupo <- c("Tratamiento =", "Control =")
  n <- ceiling(c(nT, nC))
  n <- data.frame(Grupo, n)
  print(n)
}

Equivalence hypothesis

$$n_C = \frac{(z_{(1-\alpha)} + z_{(1-\beta/2)})^{2}\, s^{2}\,(1+1/k)}{(\delta-|\epsilon|)^{2}}$$

Then, if μT=150, μC=160 and s=28, and the treatments are defined as equivalent if they do not differ by more than |δ|=5, nC=nT=538.

n.2mu.equi <- function(alpha, beta, k, muT, muC, s, delta){
  nC <- (qnorm(1-alpha)+qnorm(1-beta/2))^2*s^2*(1+1/k)/(delta-abs(muT-muC))^2
  nT <- k*nC
  Grupo <- c("Tratamiento =", "Control =")
  n <- ceiling(c(nT, nC))
  n <- data.frame(Grupo, n)
  print(n)
}
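As with Table 2, and purely as a usage sketch, the equality example above can be reproduced by calling the corresponding function after it has been executed in R:

# Reproduces the equality example of Table 3: alpha = 0.05, beta = 0.2, k = 1, muT = 150, muC = 160, s = 28
n.2mu.igual(alpha = 0.05, beta = 0.2, k = 1, muT = 150, muC = 160, s = 28)
#           Grupo   n
# 1 Tratamiento = 124
# 2     Control = 124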
Annex 4 presents the values of Cw1 and Cw2 for α=0.05 and a power of 0.8 and 0.9; the columns coef.t contain the coefficients by which the original sample size must be multiplied in order to perform the R evaluations.
As an example, consider that we want to compare drug A vs. placebo and the outcome is the proportion of deaths at the end of follow-up. For an equality hypothesis, assuming α=0.05, β=0.1, pT=0.1 and pC=0.2 (i.e., ϵ=-0.1), and two groups of the same size (k=1), the required sample size in each group is 263 subjects. If we plan to perform R=5 evaluations, the sample size adjusted by Pocock's method is 263 x 1.207=318 for each group, and the critical value at each evaluation is CP(r,0.05)=2.413. The sample size adjusted by the method of O'Brien and Fleming is 263 x 1.026=270 for each group, and the critical values for the five evaluations are COF(r,0.05)=4.562, 3.226, 2.634, 2.281 and 2.040. The sample size adjusted with the method of Wang and Tsiatis for each group, with Δ=0.25, would be 263 x 1.066=281, and the critical values at each evaluation would be CWT(r,0.05,0.25)=3.194, 2.686, 2.427, 2.259 and 2.136. Finally, the adjusted sample size with the Inner Wedge method for each group, with Δ=0.25, would be 263 x 1.199=316, and the critical values for the five evaluations are ar=0, 0.388, 1.072, 1.613 and 2.073, and br=3.100, 2.607, 2.355, 2.192 and 2.073.
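The numbers in this example can be reproduced with the functions from Table 2 and the adjustment coefficients quoted above (the coefficients themselves come from Annexes 1 to 4; here they are simply typed in as given in the text), for instance:

# Original sample size per group for the equality hypothesis of the example
n.2prop.igual(alpha = 0.05, beta = 0.1, k = 1, pT = 0.1, pC = 0.2)   # 263 per group

# Adjustment for R = 5 interim evaluations, using the coefficients quoted in the text
n0 <- 263
ceiling(n0 * 1.207)   # Pocock: 318 per group
ceiling(n0 * 1.026)   # O'Brien and Fleming: 270 per group
ceiling(n0 * 1.066)   # Wang and Tsiatis (Delta = 0.25): 281 per group
ceiling(n0 * 1.199)   # Inner Wedge (Delta = 0.25): 316 per group

At each of the five evaluations, the interim statistic zr computed as shown earlier would then be compared with the corresponding critical value of the chosen method.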
DISCUSSION
In this article we present an approach to sample size adjustment for interim analysis in parallel RCTs, starting from the calculation of the original sample size and its subsequent adjustment by one of the four methods described. This paper is aimed at students and young researchers, mainly from the health field, who will find here an initial context on RCTs and a review of the main concepts of statistical inference based on hypothesis testing. We seek, in a simple and concrete way, to provide an introduction to this topic, integrating different aspects such as the mathematical expressions that support the results and their implementation in available
statistical programs. Although there are other resources available for calculating sample size, such as web pages (19) or packages in the R programming language (4), mostly in languages other than Spanish, we believe that giving students the possibility of applying the theory in a statistical program leads to a greater understanding of these topics, as opposed to following a sequence of steps mechanically, often without understanding what the different programs or resources generate. This brings students in the health fields closer to statistics and the use of statistical software, an aspect often considered unimportant during their training.
This article allows the reader to plan a parallel RCT by defining the sample size and allowing the results to be monitored during the course of the study. At this point, we recommend reviewing additional methods that provide more flexibility, for example, planning the interim evaluations on specific dates rather than when a fixed number of participants has been completed in both groups, which is the main restriction of the four methods presented in this article. The method proposed by Lan and DeMets (20) and R packages such as gsDesign (21) would be interesting material for further exploring these issues.
Finally, the decision to discontinue an RCT, whether because of great benefits, potential harms or because it is very unlikely to obtain benefits (futility), should be taken by a group of people independent of the researchers, made up of experts in the clinical area under study, in methodological aspects (such as epidemiologists or biostatisticians) and in ethical aspects (3,22,23). All the necessary aspects should be considered in this decision, not only the result of a statistical test. Planning an interim analysis, and adjusting the sample size accordingly when the study is being designed, will better support the value of this criterion during the decision-making process.
Authorship contributions. All authors declare that they meet the authorship criteria recommended by the ICMJE.

Roles according to CRediT. MSM: Conceptualization. Methodology. Investigation. Writing – original draft. Writing – review and editing. Visualization. FBR: Methodology. Investigation. Writing – original draft. Visualization. CJR: Conceptualization. Methodology. Software. Investigation. Writing – original draft. Writing – review and editing.

Funding. Self-funded.

Conflicts of interest. The authors declare that they have no conflicts of interest.

Supplementary material. Available in the digital version of the RPMESP.
REFERENCES
1. Berwick DM. Era 3 for Medicine and Health Care. JAMA. 2016;315(13):1329-30. doi: 10.1001/jama.2016.1509.
2. Gordon G, Drummond R, Maureen OM, Deborah JC. Users' Guides to the Medical Literature, 3rd ed [Internet]. McGraw Hill; 2008 [cited 2022 Oct 11]. Available from: https://jamaevidence.mhmedical.com/content.aspx?bookid=847&sectionid=69030714.
3. Friedman LM, Furberg CD, DeMets DL, Reboussin DM, Granger CB. Fundamentals of Clinical Trials [Internet]. Springer; 2015 [cited 2022 Oct 11]. Available from: https://link.springer.com/book/10.1007/978-3-319-18539-2.
4. Zhang E, Wu VQ, Chow S-C, Zhang HG. TrialSize: R Functions for Chapter 3,4,6,7,9,10,11,12,14,15 of Sample Size Calculation in Clinical Research. R package version 1.4 [Internet]. 2020 [cited 2022 Oct 11]. Available from: https://cran.r-project.org/package=TrialSize.
5. Miller FG, Joffe S. Equipoise and the Dilemma of Randomized Clinical Trials. N Engl J Med. 2011;364(5):476-80. doi: 10.1056/nejmsb1011301.
6. Lazcano-Ponce E, Salazar-Martínez E, Gutiérrez-Castrellón P, Angeles-Llerenas A, Hernández-Garduño A, Viramontes JL. Ensayos clínicos aleatorizados: variantes, métodos de aleatorización, análisis, consideraciones éticas y regulación. Salud Publica Mex. 2004;46(6):559-84.
7. Diana MF. Conceptos básicos sobre bioestadística descriptiva y bioestadística inferencial. Rev Argentina Anestesiol. 2006;64(6):241-51.
8. Flight L, Julious SA. Practical guide to sample size calculations: An introduction. Pharm Stat. 2016;15(1):68-74. doi: 10.1002/pst.1709.
9. Cohen HW. P values: Use and misuse in medical literature. Am J Hypertens. 2011;24(1):18-23. doi: 10.1038/ajh.2010.205.
10. Prel J-B du, Hommel G, Röhrig B, Blettner M. Confidence interval or P value. Dtsch Arztebl. 2009;106(19):335-9. doi: 10.3238/arztebl.2009.0335.
11. Goodman S. A Dirty Dozen: Twelve P-Value Misconceptions. Semin Hematol. 2008;45(3):135-40. doi: 10.1053/j.seminhematol.2008.04.003.
12. Wayne DW. Bioestadística. Base para el análisis de las ciencias de la salud. 4ta ed. LIMUSA WILEY; 2006.
13. Chow S-C, Shao J, Wang H, Lokhnygina Y. Sample size calculations in clinical research. 2nd ed. Chapman & Hall; 2008.
14. Armijo-Olivo S, Warren S, Fuentes J, Magee DJ. Clinical relevance vs. statistical significance: Using neck outcomes in patients with temporomandibular disorders as an example. Man Ther. 2011;16:563-72. doi: 10.1016/j.math.2011.05.006.
15. Chan A, Tetzlaff JM, Altman DG, Laupacis A, Gøtzsche PC, Krleža-Jerić K, et al. Declaración SPIRIT 2013: definición de los elementos estándares del protocolo de un ensayo clínico. Rev Panam Salud Publica. 2015;38(6):506-14.
16. Gowda GS, Komal S, Sanjay TN, Mishra S, Kumar CN, Math SB. Sociodemographic, legal, and clinical profiles of female forensic inpatients in Karnataka: A retrospective study. Indian J Psychol Med. 2019;41(2):138-43. doi: 10.4103/IJPSYM.IJPSYM_152_18.
17. R Core Team. R: A language and environment for statistical computing [Internet]. R Foundation for Statistical Computing, Vienna, Austria; 2022 [cited 2022 Oct 11]. Available from: https://www.r-project.org/.
18. RStudio Team. RStudio: Integrated Development Environment for R [Internet]. RStudio, PBC, Boston, MA; 2022 [cited 2022 Oct 11]. Available from: http://www.rstudio.com/.
19. Sample Size Calculator [Internet]. Cleveland Clinic; 2022 [cited 2022 Oct 11]. Available from: https://riskcalc.org/samplesize/.
20. Demets DL, Lan KKG. Interim analysis: the alpha spending approach. Stat Med. 1994;13:1341-52. doi: 10.1002/sim.4780131308.
21. Anderson K. gsDesign: Group Sequential Design [Internet]. R package version 3.4.0; 2022 [cited 2022 Oct 11]. Available from: https://CRAN.R-project.org/package=gsDesign.
22. Ellenberg S, Fleming T, DeMets D. Data Monitoring Committees in Clinical Trials: A Practical Perspective. Wiley; 2003.
23. Fisher M, Roecker E, DeMets D. The role of an independent statistical analysis center in the industry-modified National Institutes of Health model. Drug Inf J. 2001;(35):115-29. doi: 10.1177/009286150103500113.
24. Kumbhare D, Alavinia M, Furlan J. Hypothesis Testing in Superiority, Noninferiority, and Equivalence Clinical Trials. Am J Phys Med Rehabil. 2019;98(3):226-30. doi: 10.1097/PHM.0000000000001023.
25. U.S. Department of Health and Human Services, Food and Drug Administration. Non-Inferiority Clinical Trials to Establish Effectiveness: Guidance for Industry [Internet]. FDA; 2016 [cited 2022 Oct 11]. Available from: https://www.fda.gov/media/78504/download.
26. Flight L, Julious SA. Practical guide to sample size calculations: an introduction. Pharm Stat. 2016;15(1):68-74. doi: 10.1002/pst.1709.
27. Benito MM, Marín RC. Cambios en la presión arterial y frecuencia cardíaca después de una presión sobre la válvula aórtica en sujetos con hipertensión arterial esencial. Osteopat Cient. 2008;3(3):100-7. doi: 10.1016/S1886-9297(08)75758-8.
28. Chadwick D. Safety and efficacy of vigabatrin and carbamazepine in newly diagnosed epilepsy: a multicentre randomised double-blind study. Lancet. 1999;354:13-9. doi: 10.1016/s0140-6736(98)10531-7.
29. Priebe S, Chevalier A, Hamborg T, Golden E, King M, Pistrang N. Effectiveness of a volunteer befriending programme for patients with schizophrenia: Randomised controlled trial. Br J Psychiatry. 2019;1-7. doi: 10.1192/bjp.2019.42.
30. Bascope EL, Ortiz YM, Llanos GRL, Lizbeth MHA, Lazo L. Metformina en el tratamiento del síndrome de ovarios poliquísticos. Un ensayo clínico aleatorizado. Rev Cient Cienc Med. 2017;20(2):45-52.