Cite as: Saaibi Meléndez M, Botero-Rodríguez F, Rincón Rodríguez CJ. Samples in randomized clinical trials with interim analysis. Rev Peru Med Exp Salud Publica. 2023;40(2):220-8. doi: 10.17843/rpmesp.2023.402.12217.

Correspondence: Michelle Saaibi Meléndez; msaaibi@javeriana.edu.co

Received: 11/10/2022
Approved: 26/04/2023
Online: 30/06/2023

This work is licensed under a Creative Commons Attribution 4.0 International license.
SPECIAL SECTION

SAMPLES IN RANDOMIZED CLINICAL TRIALS WITH INTERIM ANALYSIS

Michelle Saaibi Meléndez 1,2,a, Felipe Botero-Rodríguez 1,2,a, Carlos Javier Rincón Rodríguez 1,2,b

1 Semillero de Bioestadística, School of Medicine, Pontificia Universidad Javeriana, Bogotá, Colombia.
2 Department of Clinical Epidemiology and Biostatistics, Pontificia Universidad Javeriana, Bogotá, Colombia.
a Physician; b Statistician, Master in Clinical Epidemiology.
ABSTRACT
This article introduces randomized clinical trials and basic concepts of statistical inference. We present methods for calculating the sample size by type of outcome and by the hypothesis to be tested, together with code in the R programming language. We also describe four methods for adjusting the original sample size for interim analyses. We sought to introduce these topics in a simple and concrete way, presenting the mathematical expressions that support the results and their implementation in available statistical programs, thereby bringing health students closer to statistics and the use of statistical software, aspects that are rarely considered during their training.
Keywords: Sample Size; Clinical Trials; Hypothesis Tests (source: MeSH NLM).
INTRODUCTION
The approach to medicine has shifted from an initial paternalistic view to pragmatic reductionism. This change occurred because of the drive to improve the quality of care, decrease individual economic incentives and prioritize the importance of research for improving the quality of evidence (1). Evidence-based medicine emerged as a new paradigm in the 1990s as scientific support for clinical decision-making and is based on a hierarchy of three statements: a) randomized clinical trials (RCTs) or systematic reviews of many experiments usually provide more evidence than observational studies; b) analytical clinical studies provide better evidence than pathophysiological rationale alone; and c) analytical clinical studies provide more evidence than expert judgment (2).
Obtaining valid results from RCTs depends on the quality of the data, which must be sufficient to address the research question. To obtain such quality data, the sample size must be large enough to yield an accurate estimate of the effect of the intervention. Random errors will not affect the interpretability of the results as long as the sample is large enough; however, a systematic error can invalidate a study (3).
An interim analysis consists of setting one or more observation points so that the behavior of the sample can be assessed up to that point. Depending on the results, a committee may determine whether the study is relevant enough to continue (3). This article seeks to provide an introduction to the calculation of sample size by type of outcome and hypothesis. We also aim to provide information on its adjustment for interim analysis, considering the mathematical
formulas and their implementation in available statistical programs such as the R programming language. The objective is to bring health personnel closer to statistics and the use of statistical software, aspects that receive little attention in their training. Although there are already several sources that develop these topics, there are not many documents that merge theory and practice, including all the aspects mentioned above regarding RCTs. Review articles allow young researchers and health professionals to make a first approach to these topics without generating an initial rejection due to their complexity. Connecting mathematical expressions with their implementation in a statistical program seeks to prevent young researchers from simply executing pre-established functions such as TwoSampleMean.Equality or TwoSampleMean.NIS (included in the "TrialSize" package (4)) without understanding where the results come from, the effect of the parameters on the sample size, or the need to choose parameter values consistent with the type of hypothesis being evaluated. The aim is to promote understanding rather than the mechanical execution of tasks merely to meet the requirements of an evaluation committee.
Randomized clinical trials
The equipoise principle corresponds to a state of uncertainty regarding the therapeutic results of a treatment, which justifies conducting an RCT (5). RCTs with a control group are prospective studies that compare the outcomes of one or more interventions with the best available alternative. In these studies, patient safety should always be a priority, so the possible benefits, harms and treatment alternatives for the patient's condition should be explained. Although it may have limitations, the RCT is considered the best alternative for evaluating the efficacy or safety of an intervention (6,7). It is characterized by: a) an intervention that is compared with a control group receiving either placebo or the usual treatment; b) randomized assignment of the interventions, which reduces possible confounding bias by producing homogeneous groups and reduces the possibility of selection bias by preventing foreknowledge of the group to which a patient will be assigned; and c) blinding of the treatment groups, which can be applied to researchers, patients or analysts, minimizing possible information biases (6,7).
RCTs are divided into four phases. Phase I seeks to determine possible toxic effects, absorption, distribution and metabolism of the drug in a group of 20-80 healthy people. Phase II is conducted in a diseased population to determine the safety and efficacy of the drug, based on biological markers and evaluating adverse reactions. Phase III is performed when there is evidence on the safety and efficacy of the intervention and additional information is sought on the safety and effectiveness of the drug in a larger number of participants; the intervention is compared with the usual therapy or placebo during long-term follow-up in order to identify possible side effects. In phase IV, after the molecule has been approved for marketing, it is compared with other existing products in the general population; pharmacovigilance is also carried out in order to look for adverse events not identified in phase III due to their low incidence or long periods of occurrence (3,6).
In this article we will focus on phase III and IV studies, which require a sample size calculation. Additionally, we will work with parallel RCTs, characterized by simultaneous follow-up of each group to which participants were assigned (3).
Inferential statistics
Inferential statistics allows estimating the behavior of the entire population from the results obtained in a sample. This behavior is summarized in measures such as means, proportions or variances, which, if obtained for the whole population, would be called parameters (7). There are two alternatives that yield consistent results: confidence intervals and hypothesis tests. The first seeks a range of values that, with a given degree of confidence, contains the parameter of interest, while the second evaluates a statement about the parameter of interest and leads to the decision to reject it or not.
Since this paper presents the sample size calculation in parallel RCTs to evaluate statements about parameters, we will describe the process of hypothesis testing. Initially, two hypotheses are proposed: the null hypothesis (H0), which is a statement about the parameter, and the alternative hypothesis (Ha), which is its negation; almost always it is the alternative hypothesis (8), related to the research question, that one seeks to support. At the end, a decision is made to reject H0 or not. Taking into account that this decision depends on results obtained only from a sample, there is the probability of committing two errors: type I error or significance level (α), which occurs when rejecting H0 when it is true, and type II error (β), which occurs when not rejecting H0 when it is false (9). The complement of the type I error is the confidence level (1-α), which corresponds to the probability of not rejecting H0 when it is true, and the complement of the type II error (1-β), which is the power, is the probability of rejecting H0 when it is false (9). When performing a hypothesis test, the probabilities of committing type I and type II errors should be low, which implies that the confidence level and the power are high (typically α=0.05 and
β=0.1 or 0.2). In order to guarantee these values, it is necessary to calculate the sample size.
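As a minimal illustration, and only as a sketch with typical values, the standard normal quantiles associated with these probabilities are the building blocks of the sample size expressions presented later; in R they are obtained with qnorm():

# Standard normal quantiles used later in the sample size expressions
alpha <- 0.05   # significance level (type I error)
beta  <- 0.20   # type II error (power = 1 - beta = 0.80)

qnorm(1 - alpha/2)  # z_(1-alpha/2) ~ 1.96, used in two-sided (equality) hypotheses
qnorm(1 - alpha)    # z_(1-alpha)   ~ 1.64, used in one-sided hypotheses
qnorm(1 - beta)     # z_(1-beta)    ~ 0.84, reflects the desired power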
In order to make the decision to reject H0, a calculation is carried out with the values from the sample (the test statistic) and contrasted with the behavior that would be expected if H0 were true. If the value found for the test statistic is unlikely, this is evidence that H0 is false and it is rejected in favor of Ha; otherwise, it is considered that there is not enough evidence to reject H0. The probability reflecting the evidence "for" or "against" H0 is called the p-value (10,11), and it is equal to the probability, assuming the null hypothesis is true, of obtaining a value of the test statistic "as extreme or more extreme (in the appropriate direction of Ha) than the value actually calculated" (11,12); finally, H0 is rejected when p<α. Statistical significance, commonly evaluated by means of the p-value, does not account for clinical significance: we speak of statistical significance when the condition p<α is fulfilled, while clinical significance is defined by those results that improve the physical, mental and social functionality of the patient, which can lead to an improvement in the quality or quantity of life, depending on the context (14).
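As a minimal sketch of the previous paragraph, and assuming a test statistic that follows a standard normal distribution under H0, the two-sided p-value can be obtained in R as follows (the value of the statistic is fictitious):

# Two-sided p-value for a test statistic z under a standard normal null distribution
z <- 2.1                   # fictitious value of the test statistic
p <- 2 * pnorm(-abs(z))    # probability of a value as extreme or more extreme than z
p                          # ~0.036; since p < 0.05, H0 would be rejected at alpha = 0.05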
Types of hypotheses
There can be different research questions in an RCT, and these relate to four different ways of stating H0. The sample size calculation depends on the type of hypothesis to be tested; therefore, Table 1 presents their definitions along with an example.
Sample size
Generally, it is not possible to study the entire population; therefore, a specific sample size (n) is required to represent its behavior. As the sample size increases, the results approach those of the population, so that beyond a certain size the results no longer change substantially, making it unnecessary to continue recruiting participants (15). Recruiting more subjects than necessary increases both the complexity of the logistical operation and the costs, and poses an ethical dilemma by unnecessarily assigning subjects to a treatment that has not proven its benefit. On the other hand, defining a very small sample size implies a high risk of the type II error mentioned above. Calculating the sample size makes it possible to determine whether a study is feasible based on a priori assumptions, given the power, the significance level and the background of previous studies addressing the same research question, taking into account the ethical considerations of subjecting people to an experiment (13,16).
In addition, when conducting an RCT, the possibility arises of observing the results obtained as the sample is collected. These are called "interim analyses", and they should be planned from the beginning of the investigation, during the preparation of the protocol. These additional analyses increase the probability of type I and II errors, and for this reason the sample size must be adjusted to maintain the overall confidence level and power throughout the RCT. The above reflects the importance of the sample size calculation; therefore, this article presents how to calculate the sample size for RCTs, showing the expression from which it is obtained and its application using the R programming language (17). Additionally, we present how to perform the adjustment for interim analysis, together with an example.
MATERIALS AND METHODS
Based on the review of the book "Sample size calculations in clinical research" by Chow et al. (13), this article presents how to calculate the sample size for a parallel RCT, by: 1) type of outcome (dichotomous, continuous) and 2) type of hypothesis to be evaluated (equality, non-inferiority, superiority and equivalence). The corresponding mathematical expressions and the code to create a function in the R (17) and RStudio (18) programs are included. To use this code, the reader is required to have a basic knowledge of these programs; the function must be copied and executed, and can then be called with the required parameters described in the results section. For each scenario, an example with fictitious data is included, and specific considerations related to the function parameters are mentioned as well.
We then describe the methods of Pocock, O'Brien and Fleming, Wang and Tsiatis, and the Inner Wedge, which adjust the original sample size obtained from the functions created previously in order to perform interim analyses. The adjustment consists of multiplying the original sample size by the coefficients included in Annexes 1 to 4, depending on the method used and considering the number of planned evaluations (R), the power and the significance level defined for the study. In addition, the expression of the test statistic used at each evaluation, by type of outcome, is included, based on the information from the participants who have entered the study. In summary, we present the following for each method: 1) the critical values, which correspond to the values of the standard normal distribution that determine the rejection zone for evaluating the null hypothesis at each point in time, and 2) the coefficients for adjusting the sample size calculation.
Table 1. Types of hypotheses in randomized clinical trials.

Equality
Definition (24): Evaluates whether there are differences between the treatment and control groups.
Hypotheses (25,26):
H0: There is no difference between the two therapies. Example: pressure over the estimated sternal projection of the aortic valve on the sternum is not associated with a change in hemodynamic parameters in the hypotensive patient.
Ha: There is a difference between the two therapies. Example: pressure over the estimated sternal projection of the aortic valve on the sternum is associated with a change in hemodynamic parameters in the hypotensive patient.
Interpretation of the example: patients who underwent a pressure of 6 mm depth over the estimated sternal projection of the aortic valve on the sternum, maintained for 90 seconds, showed a homogeneous decrease of blood pressure and heart rate parameters (27).

Non-inferiority
Definition (24): Evaluates whether the effect of a new treatment (whose effect is lower than that of the conventional treatment, but greater than the placebo) is within an accepted range, established on the basis of the best available evidence. This difference is justified by side effects or feasibility.
Hypotheses (25,26):
H0: The effect of the new intervention is less than or equal to the placebo. Example: the new antimicrobial has the same effectiveness as the placebo.
Ha: The effect of the new intervention is greater than the placebo. Example: the new antimicrobial is more effective than the placebo.
Interpretation of the example: the new antimicrobial, although better tolerated than conventional therapy, is less effective clinically and statistically, so it cannot be recommended as first line (28).

Superiority
Definition (24): Seeks to evaluate whether a new intervention generates better clinical outcomes than a well-established therapy or placebo.
Hypotheses (25,26):
H0: The new intervention is not superior to the established therapy. Example: volunteering does not reduce social isolation or lead to better mental health outcomes.
Ha: The new intervention is superior to the established therapy. Example: volunteering reduces social isolation and leads to better mental health outcomes.
Interpretation of the example: volunteering did not prove to be superior compared to the control group regarding mental health outcomes or isolation (29).

Equivalence
Definition (24): Seeks to evaluate whether the effect of the treatment is identical to that of another therapy.
Hypotheses (25,26):
H0: The therapies are not equivalent. Example: the inclusion of metformin, combined with oral contraceptives in the treatment of polycystic ovary syndrome, is not as effective as monotherapy with oral contraceptives alone.
Ha: The therapies are equivalent. Example: oral contraceptive monotherapy is as effective as oral contraceptive therapy plus metformin for the treatment of polycystic ovary syndrome.
Interpretation of the example: the ultrasound remission time was shorter, there were fewer symptoms and the recurrence rate at 3 months was lower with the combined therapy, which shows greater effectiveness compared to the study group that received monotherapy (30).

H0: null hypothesis; Ha: alternative hypothesis.
Sample size calculation for a dichotomous outcome
As an example, we assume that two treatments are to be compared and the outcome of interest is the proportion of deaths. For all the expressions below, we denote pT and pC as the proportion of deceased in the treatment and control group, respectively; ϵ is the expected difference between these two proportions (ϵ = pT - pC); δ is the margin of tolerance or superiority defined by the researchers; and k is the ratio between the sample size of the treatment group and that of the control group (k = nT/nC), i.e., nT = k·nC. Finally, we denote α and β as the type I and type II errors, respectively, and z(q) as the q quantile of the standard normal distribution. In Table 2, we present the expressions to obtain nC and, alongside them, the code in the R programming language that creates a function for their implementation, together with an example where α=0.05, β=0.2 and k=1.
In all four hypotheses, the smaller the expected difference (ϵ) and the closer the proportions are to 0.5, the larger the sample size. When testing a non-inferiority hypothesis, if a higher proportion of the event means greater effectiveness, then δ<0; if a lower proportion of the event means greater effectiveness, then δ>0. When testing a superiority hypothesis, if a higher proportion of the event means greater effectiveness, then δ>0; if a lower proportion of the event means greater effectiveness, then δ<0. When testing an equivalence hypothesis, δ>0 always.
Continuous outcome sample size calculation
As an example, we assume that two treatments are to be compared and the outcome is systolic blood pressure (SBP) in mmHg. For all the expressions presented below, we denote μT and μC as the mean SBP in the treatment and control group, respectively; ϵ is the expected difference between the two means (ϵ = μT - μC) and s is the standard deviation of the two samples combined. δ, k, α, β and z(q) represent the same values as in the previous section. In Table 3, we present the expressions to obtain nC and the code in the R programming language for their implementation, with an example where α=0.05, β=0.2 and k=1.
In all four hypotheses, a larger s and a smaller ϵ require larger sample sizes. When testing a non-inferiority hypothesis, if a higher μ means greater effectiveness, then δ<0; if a lower μ means greater effectiveness, then δ>0. When testing a superiority hypothesis, if a higher μ means greater effectiveness, then δ>0; if a lower μ means greater effectiveness, then δ<0. When testing an equivalence hypothesis, δ>0 always.
RESULTS
Interim Analysis
In an RCT, the study hypothesis can be tested sequentially as the sample is collected, giving the possibility of interrupting the collection if a clear benefit of the intervention is identified early. Depending on the number of evaluations (R) that are programmed, it is necessary to adjust the initial sample size to maintain the overall significance level of the study, and to establish the critical values on the distribution of the test statistic for rejecting or accepting the null hypothesis at each evaluation. The R evaluations are performed as subjects accumulate, and the test statistic zr (r=1,2,...,R) for a dichotomous outcome is equal to:
$$z_r = \frac{\sqrt{n_r}\,(\hat{p}_{T,r} - \hat{p}_{C,r})}{\sqrt{\hat{p}_{T,r}(1-\hat{p}_{T,r}) + \hat{p}_{C,r}(1-\hat{p}_{C,r})}}$$

where nr, p̂T,r and p̂C,r are the sample size per intervention group and the estimated proportions of the outcome at the r-th assessment for the treatment group and the control group, respectively.
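As a minimal sketch of this calculation in R, assuming hypothetical accumulated data at the r-th evaluation (the counts below are fictitious and the function name is ours, not part of any package):

# Interim test statistic for a dichotomous outcome at the r-th evaluation
z.interim.prop <- function(n_r, events_T, events_C) {
  pT_hat <- events_T / n_r            # estimated proportion in the treatment group
  pC_hat <- events_C / n_r            # estimated proportion in the control group
  sqrt(n_r) * (pT_hat - pC_hat) /
    sqrt(pT_hat * (1 - pT_hat) + pC_hat * (1 - pC_hat))
}

# Example: 100 subjects per group collected so far, 12 vs. 22 deaths
z.interim.prop(n_r = 100, events_T = 12, events_C = 22)  # ~ -1.90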
For a continuous outcome, the test statistic is equal to:

$$z_r = \frac{\dfrac{1}{n_r}\sum_{j=1}^{n_r} x_{Tj} - \dfrac{1}{n_r}\sum_{j=1}^{n_r} x_{Cj}}{\sqrt{\dfrac{s_{T,r}^{2} + s_{C,r}^{2}}{n_r}}}$$

where nr, s²T,r and s²C,r are the sample size in each intervention group and the estimated variances at the r-th evaluation for the treatment group and the control group, respectively; xTj and xCj are the observed values of the outcome for each subject collected up to time r.
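Again only as a sketch, assuming two fictitious vectors with the outcome values accumulated up to the r-th evaluation and equal group sizes:

# Interim test statistic for a continuous outcome at the r-th evaluation
z.interim.mean <- function(x_T, x_C) {
  n_r <- length(x_T)                  # subjects per group collected so far (equal groups assumed)
  (mean(x_T) - mean(x_C)) / sqrt((var(x_T) + var(x_C)) / n_r)
}

# Example with simulated systolic blood pressure values (mmHg)
set.seed(123)
x_T <- rnorm(80, mean = 150, sd = 28)   # treatment group
x_C <- rnorm(80, mean = 160, sd = 28)   # control group
z.interim.mean(x_T, x_C)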
We present four methods that allow adjusting the sample size depending on the number of programmed evaluations, the significance level and the power established for an equality hypothesis. The first is the Pocock method, in which the sample size is adjusted by multiplying the sample size initially obtained from the expressions presented in the previous section by the coefficients included in Annex 1, depending on the number of evaluations and the significance and power levels established. At each evaluation, if |zr| > CP(r,α), H0 is rejected and data collection is suspended; otherwise, collection continues. The critical values CP(r,α) are presented in Annex 1 for defined R and α.
Table 2. Types of hypotheses with a dichotomous outcome and their code in the R programming language.

Equality hypothesis

$$n_C = \frac{(z_{(1-\alpha/2)} + z_{(1-\beta)})^{2}}{\epsilon^{2}}\left(\frac{p_T(1-p_T)}{k} + p_C(1-p_C)\right)$$

Then, if it is expected that the proportions of deaths in the treatment group and in the control group are pT=0.15 and pC=0.2, nC=nT=903.

n.2prop.igual <- function(alpha, beta, k, pT, pC){
  nC <- (qnorm(1-alpha/2)+qnorm(1-beta))^2/(pT-pC)^2*(pT*(1-pT)/k+pC*(1-pC))
  nT <- k*nC
  Grupo <- c("Tratamiento=", "Control=")
  n <- ceiling(c(nT, nC))
  n <- data.frame(Grupo, n)
  print(n)
}

Non-inferiority hypothesis

$$n_C = \frac{(z_{(1-\alpha)} + z_{(1-\beta)})^{2}}{(\epsilon-\delta)^{2}}\left(\frac{p_T(1-p_T)}{k} + p_C(1-p_C)\right)$$

Then, if the expected proportions of deaths in the treatment group and in the control group are pT=0.2 and pC=0.22, and an increase in mortality of up to δ=0.03 is tolerated, nC=nT=821.

n.2prop.noinf <- function(alpha, beta, k, pT, pC, delta){
  nC <- (qnorm(1-alpha)+qnorm(1-beta))^2/((pT-pC)-delta)^2*(pT*(1-pT)/k+pC*(1-pC))
  nT <- k*nC
  Grupo <- c("Tratamiento=", "Control=")
  n <- ceiling(c(nT, nC))
  n <- data.frame(Grupo, n)
  print(n)
}

Superiority hypothesis

$$n_C = \frac{(z_{(1-\alpha)} + z_{(1-\beta)})^{2}}{(\epsilon-\delta)^{2}}\left(\frac{p_T(1-p_T)}{k} + p_C(1-p_C)\right)$$

Then, if the expected proportions of deaths in the treatment group and in the control group are pT=0.18 and pC=0.25, and the new treatment is considered superior if it reduces mortality by at least δ=-0.01, nC=nT=576.

n.2prop.sup <- function(alpha, beta, k, pT, pC, delta){
  nC <- (qnorm(1-alpha)+qnorm(1-beta))^2/((pT-pC)-delta)^2*(pT*(1-pT)/k+pC*(1-pC))
  nT <- k*nC
  Grupo <- c("Tratamiento=", "Control=")
  n <- ceiling(c(nT, nC))
  n <- data.frame(Grupo, n)
  print(n)
}

Equivalence hypothesis

$$n_C = \frac{(z_{(1-\alpha)} + z_{(1-\beta/2)})^{2}}{(\delta-|\epsilon|)^{2}}\left(\frac{p_T(1-p_T)}{k} + p_C(1-p_C)\right)$$

Then, if the expected proportions of deaths in the treatment group and in the control group are pT=0.22 and pC=0.18, and the treatments are defined as equivalent if they do not differ by more than |δ|=0.1, nC=nT=760.

n.2prop.equi <- function(alpha, beta, k, pT, pC, delta){
  nC <- (qnorm(1-alpha)+qnorm(1-beta/2))^2/(delta-abs(pT-pC))^2*(pT*(1-pT)/k+pC*(1-pC))
  nT <- k*nC
  Grupo <- c("Tratamiento=", "Control=")
  n <- ceiling(c(nT, nC))
  n <- data.frame(Grupo, n)
  print(n)
}
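As a usage sketch: once the function for the relevant hypothesis has been copied into R and executed, the equality example of Table 2 can be reproduced with a call such as the following.

# Reproduces the equality example of Table 2: alpha = 0.05, beta = 0.2, k = 1, pT = 0.15, pC = 0.2
n.2prop.igual(alpha = 0.05, beta = 0.2, k = 1, pT = 0.15, pC = 0.2)
#          Grupo   n
# 1 Tratamiento= 903
# 2     Control= 903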
The second method is that of O'Brien and Fleming; the coefficients for adjusting the initial sample size are presented in Annex 2. In this approach, H0 is rejected at an evaluation if |zr| > COF(r,α)·√(R/r); otherwise, collection continues. The critical values COF(r,α) are presented in Annex 2 according to the number of evaluations and the significance level. The third method is that of Wang and Tsiatis, which includes a new parameter Δ; the coefficients for the sample size adjustment for α=0.05 are included in Annex 3. In this method, H0 is rejected if |zr| > CWT(r,α,Δ)·(r/R)^(Δ-0.5); otherwise, collection continues. The critical values CWT(r,α,Δ) are presented in Annex 3 for α=0.05. The methods of Pocock and of O'Brien and Fleming are particular cases of the Wang and Tsiatis method when Δ=0.5 and Δ=0, respectively; therefore, the critical values for these values of Δ are obtained from Annexes 1 and 2.
Finally, we present the Inner Wedge method. In this method, unlike the previous three, two critical values are proposed: if |zr| ≥ br, H0 is rejected and collection is suspended, with the conclusion that a significant treatment effect was found; if |zr| < ar, H0 is not rejected and collection is suspended, with the conclusion that no difference between treatment and control is going to be found; otherwise, collection continues. The critical values ar and br are equal to:

$$a_r = \left[C_{W1(r,\alpha,\beta,\Delta)} + C_{W2(r,\alpha,\beta,\Delta)}\right]\left(\frac{r}{R}\right)^{1/2} - C_{W2(r,\alpha,\beta,\Delta)}\left(\frac{r}{R}\right)^{\Delta-0.5}, \quad \text{with } a_r = 0 \text{ if } a_r < 0,$$

and

$$b_r = C_{W1(r,\alpha,\beta,\Delta)}\left(\frac{r}{R}\right)^{\Delta-0.5}$$
Table 3. Types of hypotheses with a continuous outcome and their code in the R programming language.

Equality hypothesis

$$n_C = \frac{(z_{(1-\alpha/2)} + z_{(1-\beta)})^{2}\, s^{2}\,(1+1/k)}{\epsilon^{2}}$$

Then, if the means are μT=150 and μC=160 and s=28, nC=nT=124.

n.2mu.igual <- function(alpha, beta, k, muT, muC, s){
  nC <- (qnorm(1-alpha/2)+qnorm(1-beta))^2*s^2*(1+1/k)/(muT-muC)^2
  nT <- k*nC
  Grupo <- c("Tratamiento =", "Control =")
  n <- ceiling(c(nT, nC))
  n <- data.frame(Grupo, n)
  print(n)
}

Non-inferiority hypothesis

$$n_C = \frac{(z_{(1-\alpha)} + z_{(1-\beta)})^{2}\, s^{2}\,(1+1/k)}{(\epsilon-\delta)^{2}}$$

Then, if μT=155, μC=160 and s=28, and the new treatment is defined as non-inferior if the mean increases by at most δ=5, nC=nT=97.

n.2mu.noinf <- function(alpha, beta, k, muT, muC, s, delta){
  nC <- (qnorm(1-alpha)+qnorm(1-beta))^2*s^2*(1+1/k)/((muT-muC)-delta)^2
  nT <- k*nC
  Grupo <- c("Tratamiento =", "Control =")
  n <- ceiling(c(nT, nC))
  n <- data.frame(Grupo, n)
  print(n)
}

Superiority hypothesis

$$n_C = \frac{(z_{(1-\alpha)} + z_{(1-\beta)})^{2}\, s^{2}\,(1+1/k)}{(\epsilon-\delta)^{2}}$$

Then, if μT=145, μC=160 and s=28, and the new treatment is considered superior if it decreases the mean by at least δ=-10, nC=nT=388.

n.2mu.sup <- function(alpha, beta, k, muT, muC, s, delta){
  nC <- (qnorm(1-alpha)+qnorm(1-beta))^2*s^2*(1+1/k)/((muT-muC)-delta)^2
  nT <- k*nC
  Grupo <- c("Tratamiento =", "Control =")
  n <- ceiling(c(nT, nC))
  n <- data.frame(Grupo, n)
  print(n)
}

Equivalence hypothesis

$$n_C = \frac{(z_{(1-\alpha)} + z_{(1-\beta/2)})^{2}\, s^{2}\,(1+1/k)}{(\delta-|\epsilon|)^{2}}$$

Then, if μT=150, μC=160 and s=28, and the treatments are defined as equivalent if they do not differ by more than |δ|=5, nC=nT=538.

n.2mu.equi <- function(alpha, beta, k, muT, muC, s, delta){
  nC <- (qnorm(1-alpha)+qnorm(1-beta/2))^2*s^2*(1+1/k)/(delta-abs(muT-muC))^2
  nT <- k*nC
  Grupo <- c("Tratamiento =", "Control =")
  n <- ceiling(c(nT, nC))
  n <- data.frame(Grupo, n)
  print(n)
}
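As with Table 2, and purely as a usage sketch, the equality example above can be reproduced by calling the corresponding function after it has been executed in R:

# Reproduces the equality example of Table 3: alpha = 0.05, beta = 0.2, k = 1, muT = 150, muC = 160, s = 28
n.2mu.igual(alpha = 0.05, beta = 0.2, k = 1, muT = 150, muC = 160, s = 28)
#           Grupo   n
# 1 Tratamiento = 124
# 2     Control = 124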
Annex 4 presents the values of Cw1 and Cw2 for α=0.05 and a power of 0.8 and 0.9; the columns coef.t contain the coefficients by which the original sample size must be multiplied in order to perform the R evaluations.
As an example, consider that we want to compare drug A vs. placebo and the outcome is the proportion of deaths at the end of follow-up. For an equality hypothesis, assuming α=0.05, β=0.1, pT=0.1 and pC=0.2 (i.e., ϵ=-0.1), and two groups of the same size (k=1), the required sample size in each group is 263 subjects. If we plan to perform R=5 evaluations, the sample size adjusted by Pocock's method is 263 x 1.207=318 for each group, and the critical value at each evaluation is CP(r,0.05)=2.413. The sample size adjusted by the method of O'Brien and Fleming is 263 x 1.026=270 for each group, and the critical values for the five evaluations are COF(r,0.05)=4.562, 3.226, 2.634, 2.281 and 2.040. The sample size adjusted with the method of Wang and Tsiatis for each group, with Δ=0.25, would be 263 x 1.066=281, and the critical values at each evaluation would be CWT(r,0.05,0.25)=3.194, 2.686, 2.427, 2.259 and 2.136. Finally, the adjusted sample size with the Inner Wedge method for each group, with Δ=0.25, would be 263 x 1.199=316, and the critical values for the five evaluations are ar=0, 0.388, 1.072, 1.613 and 2.073, and br=3.100, 2.607, 2.355, 2.192 and 2.073.
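The numbers in this example can be reproduced with the functions from Table 2 and the adjustment coefficients quoted above (the coefficients themselves come from Annexes 1 to 4; here they are simply typed in as given in the text), for instance:

# Original sample size per group for the equality hypothesis of the example
n.2prop.igual(alpha = 0.05, beta = 0.1, k = 1, pT = 0.1, pC = 0.2)   # 263 per group

# Adjustment for R = 5 interim evaluations, using the coefficients quoted in the text
n0 <- 263
ceiling(n0 * 1.207)   # Pocock: 318 per group
ceiling(n0 * 1.026)   # O'Brien and Fleming: 270 per group
ceiling(n0 * 1.066)   # Wang and Tsiatis (Delta = 0.25): 281 per group
ceiling(n0 * 1.199)   # Inner Wedge (Delta = 0.25): 316 per group

At each of the five evaluations, the interim statistic zr computed as shown earlier would then be compared with the corresponding critical value of the chosen method.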
DISCUSSION
In this article we present an approach to sample size adjustment for interim analysis in parallel RCTs, starting from the calculation of the original sample size and its subsequent adjustment by one of the four methods described. This paper is aimed at students and young researchers, mainly from the health field, who will find here an initial context on RCTs and a review of the main concepts of statistical inference based on hypothesis testing. We seek, in a simple and concrete way, to provide an introduction to this topic, integrating different aspects such as the mathematical expressions that support the results and their implementation in available
statistical programs. Although there are other resources available for calculating sample size, such as web pages (19) or packages in the R programming language (4), mostly in languages other than Spanish, we believe that giving students the possibility of applying the theory in a statistical program leads to a greater understanding of these topics, as opposed to following a sequence of steps mechanically, often without understanding what the different programs or resources generate. This brings students in the health fields closer to statistics and the use of statistical software, an aspect often considered unimportant during their training.
This article allows the reader to plan a parallel RCT by defining the sample size and allowing the results to be monitored during the course of the study. At this point, we recommend reviewing additional methods that provide more flexibility, for example, planning the interim evaluations on specific dates rather than when a fixed number of participants has been completed in both groups, which is the main restriction of the four methods presented in this article. The method proposed by Lan and DeMets (20) and R packages such as gsDesign (21) would be interesting material for further exploring these issues.
Finally, the decision to discontinue an RCT, whether because of great benefits, potential harms or because it is very unlikely to obtain benefits (futility), should be taken by a group of people independent of the researchers, made up of experts in the clinical area under study, in methodological aspects (such as epidemiologists or biostatisticians) and in ethical aspects (3,22,23). All the necessary aspects should be considered in this decision, not only the result of a statistical test. Planning an interim analysis, and adjusting the sample size accordingly when the study is being designed, will better support the value of this criterion during the decision-making process.
Authorship contributions. All authors declare that they meet the authorship criteria recommended by the ICMJE.

Roles according to CRediT. MSM: Conceptualization. Methodology. Investigation. Writing – original draft. Writing – review and editing. Visualization. FBR: Methodology. Investigation. Writing – original draft. Visualization. CJR: Conceptualization. Methodology. Software. Investigation. Writing – original draft. Writing – review and editing.

Funding. Self-funded.

Conflicts of interest. The authors declare that they have no conflicts of interest.

Supplementary material. Available in the digital version of the RPMESP.
REFERENCES
1. Berwick DM. Era 3 for Medicine and Health Care. JAMA. 2016;315(13):1329-30. doi: 10.1001/jama.2016.1509.
2. Gordon G, Drummond R, Maureen OM, Deborah JC. Users' Guides to the Medical Literature, 3rd ed [Internet]. McGraw Hill; 2008 [cited 2022 Oct 11]. Available from: https://jamaevidence.mhmedical.com/content.aspx?bookid=847&sectionid=69030714.
3. Friedman LM, Furberg CD, DeMets DL, Reboussin DM, Granger CB. Fundamentals of Clinical Trials [Internet]. Springer; 2015 [cited 2022 Oct 11]. Available from: https://link.springer.com/book/10.1007/978-3-319-18539-2.
4. Zhang E, Wu VQ, Chow S-C, Zhang HG. TrialSize: R Functions for Chapter 3,4,6,7,9,10,11,12,14,15 of Sample Size Calculation in Clinical Research. R package version 1.4 [Internet]. 2020 [cited 2022 Oct 11]. Available from: https://cran.r-project.org/package=TrialSize.
5. Miller FG, Joffe S. Equipoise and the Dilemma of Randomized Clinical Trials. N Engl J Med. 2011;364(5):476-80. doi: 10.1056/nejmsb1011301.
6. Lazcano-Ponce E, Salazar-Martínez E, Gutiérrez-Castrellón P, Angeles-Llerenas A, Hernández-Garduño A, Viramontes JL. Ensayos clínicos aleatorizados: variantes, métodos de aleatorización, análisis, consideraciones éticas y regulación. Salud Publica Mex. 2004;46(6):559-84.
7. Diana MF. Conceptos básicos sobre bioestadística descriptiva y bioestadística inferencial. Rev Argentina Anestesiol. 2006;64(6):241-51.
8. Flight L, Julious SA. Practical guide to sample size calculations: An introduction. Pharm Stat. 2016;15(1):68-74. doi: 10.1002/pst.1709.
9. Cohen HW. P values: Use and misuse in medical literature. Am J Hypertens. 2011;24(1):18-23. doi: 10.1038/ajh.2010.205.
10. Prel J-B du, Hommel G, Röhrig B, Blettner M. Confidence interval or P value. Dtsch Arztebl. 2009;106(19):335-9. doi: 10.3238/arztebl.2009.0335.
11. Goodman S. A Dirty Dozen: Twelve P-Value Misconceptions. Semin Hematol. 2008;45(3):135-40. doi: 10.1053/j.seminhematol.2008.04.003.
12. Wayne DW. Bioestadística. Base para el análisis de las ciencias de la salud. 4ta ed. LIMUSA WILEY; 2006.
13. Chow S-C, Shao J, Wang H, Lokhnygina Y. Sample size calculations in clinical research. 2nd ed. Chapman & Hall; 2008.
14. Armijo-Olivo S, Warren S, Fuentes J, Magee DJ. Clinical relevance vs. statistical significance: Using neck outcomes in patients with temporomandibular disorders as an example. Man Ther. 2011;16:563-72. doi: 10.1016/j.math.2011.05.006.
15. Chan A, Tetzlaff JM, Altman DG, Laupacis A, Gøtzsche PC, Krleža-Jerić K, et al. Declaración SPIRIT 2013: definición de los elementos estándares del protocolo de un ensayo clínico. Rev Panam Salud Publica. 2015;38(6):506-14.
16. Gowda GS, Komal S, Sanjay TN, Mishra S, Kumar CN, Math SB. Sociodemographic, legal, and clinical profiles of female forensic inpatients in Karnataka: A retrospective study. Indian J Psychol Med. 2019;41(2):138-43. doi: 10.4103/IJPSYM.IJPSYM_152_18.
17. R Core Team. R: A language and environment for statistical computing [Internet]. R Foundation for Statistical Computing, Vienna, Austria; 2022 [cited 2022 Oct 11]. Available from: https://www.r-project.org/.
18. RStudio Team. RStudio: Integrated Development Environment for R [Internet]. RStudio, PBC, Boston, MA; 2022 [cited 2022 Oct 11]. Available from: http://www.rstudio.com/.
19. Sample Size Calculator [Internet]. Cleveland Clinic; 2022 [cited 2022 Oct 11]. Available from: https://riskcalc.org/samplesize/.
20. Demets DL, Lan KKG. Interim analysis: the alpha spending approach. Stat Med. 1994;13:1341-52. doi: 10.1002/sim.4780131308.
21. Anderson K. gsDesign: Group Sequential Design [Internet]. R package version 3.4.0; 2022 [cited 2022 Oct 11]. Available from: https://CRAN.R-project.org/package=gsDesign.
22. Ellenberg S, Fleming T, DeMets D. Data Monitoring Committees in Clinical Trials: A Practical Perspective. Wiley; 2003.
23. Fisher M, Roecker E, DeMets D. The role of an independent statistical analysis center in the industry-modified National Institutes of Health model. Drug Inf J. 2001;(35):115-29. doi: 10.1177/009286150103500113.
24. Kumbhare D, Alavinia M, Furlan J. Hypothesis Testing in Superiority, Noninferiority, and Equivalence Clinical Trials. Am J Phys Med Rehabil. 2019;98(3):226-30. doi: 10.1097/PHM.0000000000001023.
25. U.S. Department of Health and Human Services, Food and Drug Administration. Non-Inferiority Clinical Trials to Establish Effectiveness: Guidance for Industry [Internet]. FDA; 2016 [cited 2022 Oct 11]. Available from: https://www.fda.gov/media/78504/download.
26. Flight L, Julious SA. Practical guide to sample size calculations: an introduction. Pharm Stat. 2016;15(1):68-74. doi: 10.1002/pst.1709.
27. Benito MM, Marín RC. Cambios en la presión arterial y frecuencia cardíaca después de una presión sobre la válvula aórtica en sujetos con hipertensión arterial esencial. Osteopat Cient. 2008;3(3):100-7. doi: 10.1016/S1886-9297(08)75758-8.
28. Chadwick D. Safety and efficacy of vigabatrin and carbamazepine in newly diagnosed epilepsy: a multicentre randomised double-blind study. Lancet. 1999;354:13-9. doi: 10.1016/s0140-6736(98)10531-7.
29. Priebe S, Chevalier A, Hamborg T, Golden E, King M, Pistrang N. Effectiveness of a volunteer befriending programme for patients with schizophrenia: Randomised controlled trial. Br J Psychiatry. 2019;1-7. doi: 10.1192/bjp.2019.42.
30. Bascope EL, Ortiz YM, Llanos GRL, Lizbeth MHA, Lazo L. Metformina en el tratamiento del síndrome de ovarios poliquísticos. Un ensayo clínico aleatorizado. Rev Cient Cienc Med. 2017;20(2):45-52.