Analysis of a Crossover Clinical Trial by Permutation Methods


Phillip Good (Huntington Beach, CA 92648, USA)


Fang Xie (Biostatistics, Cephalon Inc., 41 Moores Road, Frazer, PA 19355, USA), to whom requests for reprints should be addressed



Summary. To obtain exact significance levels, we analyze an initially balanced crossover design by permutation methods. The analysis is first restricted to those subjects who completed all nine periods. Next, it is extended to include those who completed at least the first two of the three blocks of treatments (six periods). Finally, we show how these results may be extended to any number of treatment sequences and treatments.

Key words: permutation methods, crossover design, randomized blocks, choosing a test statistic, choosing a randomization set.



1. Introduction


Several two-treatment by nine-period (2x9) crossover designs involving large numbers of patients are to be analyzed. The designs are balanced in the sense that the ratio of treatment to control is always 2:1, both within each period and across all periods. An example of such a design, considered at length in what follows, is one in which patients were assigned at random to one of three possible treatment sequences:





This is a balanced scheme: the three “super blocks” 110, 101, and 011 (1 denoting the active treatment and 0 the control) form a Latin square. So much for the theory. In practice, different numbers of patients $\{n_i,\ i = 1, 2, 3\}$ were assigned to the three treatment sequences, and not all patients completed all nine periods. We had three motivations for undertaking a permutation analysis of the data. First, we wanted a test that would provide exact significance levels regardless of the underlying distribution. Second, Good and Lunneborg (2006) showed that the permutation method provides greater power than the analysis of variance when the sample sizes are unequal. Last, but not least, the FDA requested that we employ a distribution-free test when analyzing our data.


A balanced treatment assignment for each patient (a 2:1 ratio of active treatment to placebo) is necessary for an unbiased treatment comparison under the analysis of variance model. This balance is achieved if equal proportions of patients are assigned to each treatment sequence and every patient completes the assigned sequence. In practice, however, these conditions are rarely met. The resulting imbalance can induce a treatment-by-period interaction that may be confounded with the treatment effect in the analysis of variance model. A permutation test provides a means to examine the robustness of the analysis of variance model to such confounding.

In what follows, we first restrict the analysis to those subjects who completed all nine periods.  Next, we analyze the full data set. Last, we show how these results may be extended to any number of treatment sequences and treatments.


2. Analysis of a Complete Balanced Design

Our test statistic is the sum of the observations that carry the control label, $S = \sum_i \sum_j \sum_k C_{ijk}$, or simply $\sum C$, where $i$ denotes the treatment sequence, $j$ the period, and $k$ the subject. Other test statistics are possible, but those that differ from $S$ only by sums that are invariant under permutations of the labels, such as $2\sum C - \sum T = 3\sum C - (\sum T + \sum C)$, are equivalent to the one proposed here and are therefore unnecessarily complex.


The traditional test for the two-sample comparison exchanges labels between the subjects who receive the control treatment and those who do not (see, for example, Section 4.2 of Good, 2005). Such exchanges have also been employed by authors who considered permutation tests for the 2x2 crossover (for example, Johnson and Mercante, 1996; Guilbaud, 1999; Patefield, 2000). But when treatments are administered over more than two periods, as in the present case, such exchanges of treatment labels would confound carry-over effects with main effects.

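We obtain the permutation distribution of our test statistic by rearranging the sequence labels among the subjects so that the number of subjects $n_i$ in each sequence is preserved. Each subject's own observations stay together; what changes is which of them carry the control label.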
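A minimal sketch of this relabeling scheme in Python with NumPy follows. The three sequence strings are hypothetical: they arrange the super blocks 110, 101 and 011 cyclically (1 = T, 0 = C) and stand in for the study's own table, which is not reproduced here. The names `SEQUENCES`, `control_sum` and `permutation_test`, the simulated data, and the two-sided Monte Carlo p-value convention (counting the observed labeling as one of the B + 1 arrangements) are illustrative assumptions, not part of the authors' analysis.

```python
import numpy as np

# Hypothetical treatment sequences (T = active, C = control): a cyclic Latin square
# of the super blocks 110, 101, 011. Replace with the study's actual sequences.
SEQUENCES = ["TTCTCTCTT", "TCTCTTTTC", "CTTTTCTCT"]
CONTROL_MASK = np.array([[ch == "C" for ch in s] for s in SEQUENCES])  # (n_seq, n_periods)


def control_sum(y, labels):
    """S: sum of the observations that carry the control label.

    y      -- array of shape (n_subjects, n_periods), one row per subject
    labels -- integer sequence index (0, 1, 2, ...) for each subject
    """
    return y[CONTROL_MASK[labels]].sum()


def permutation_test(y, labels, n_perm=1600, seed=None):
    """Monte Carlo permutation test that reshuffles sequence labels among subjects."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = control_sum(y, labels)
    perm = np.empty(n_perm)
    for b in range(n_perm):
        # Shuffling the label vector keeps each n_i fixed; every subject keeps
        # its own row of observations, and only its control periods change.
        perm[b] = control_sum(y, rng.permutation(labels))
    lower = (np.sum(perm <= observed) + 1) / (n_perm + 1)
    upper = (np.sum(perm >= observed) + 1) / (n_perm + 1)
    return observed, perm, min(1.0, 2 * min(lower, upper))


# Simulated example (not the study data): 22, 22 and 17 subjects per sequence.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], [22, 22, 17])
y = rng.normal(4.0, 3.0, size=(labels.size, 9))
y[~CONTROL_MASK[labels]] += 5.0  # simulated treatment effect on the T periods
observed, perm, p = permutation_test(y, labels, seed=1)
print(f"S = {observed:.1f}, two-sided p = {p:.4f}")
```

Nothing in the sketch depends on there being three sequences: a longer `SEQUENCES` list of nine-character strings works unchanged.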

Note that because the sum of all the observations and the sum of the squares of all the observations remain invariant under rearrangements of the labels, the test statistic $S$ yields the same results as the more complex test statistics $S_1 = 2\sum C - \sum T$ or $F = S_1/\sigma_T$, where $\sigma_T^2$ is the variance of all the observations about the grand mean.
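The equivalence can be spelled out in a few lines. Write $G = \sum T + \sum C$ for the grand total of all observations, and treat $\sigma_T$ as the spread of all the observations about the grand mean; both are unchanged by any relabeling, and $\sum C = S$.

```latex
\begin{aligned}
S_1 &= 2\sum C - \sum T = 2S - (G - S) = 3S - G,\\
F   &= \frac{S_1}{\sigma_T} = \frac{3S - G}{\sigma_T}.
\end{aligned}
```

Because $G$ and $\sigma_T$ are the same for every relabeling and $3/\sigma_T > 0$, both $S_1$ and $F$ are strictly increasing functions of $S$. They therefore order the relabeled data sets exactly as $S$ does and yield identical permutation p-values.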

As the design is balanced and the exchange of subject labels results in an exchange of observations within periods, differences among the periods do not affect the permutation distribution. Indeed, if the treatment has no effect and all observations have equal inherent variance, then all relabelings of the data are equally likely. On the other hand, if the treatment decreases the expected value of an observation, one would expect the original value of the test statistic to lie at the high end of the permutation distribution; if it increases the expected value, as in the study described next, at the low end.
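The invariance to period effects can be checked numerically. The sketch below is illustrative only: it uses a hypothetical three-sequence design (TTC TCT CTT and its cyclic shifts), a tiny simulated data set of six subjects, and enumerates all 90 distinct relabelings. Adding an arbitrary effect to each period changes every value of $S$ by one common constant, because every relabeling that preserves the $n_i$ places the same number of control observations in each period; the rank of the observed value, and so the p-value, is unchanged.

```python
import itertools

import numpy as np

# Hypothetical sequences (T = active, C = control); not the study's actual table.
SEQUENCES = ["TTCTCTCTT", "TCTCTTTTC", "CTTTTCTCT"]
CONTROL_MASK = np.array([[ch == "C" for ch in s] for s in SEQUENCES])


def control_sum(y, labels):
    """Sum of the observations that carry the control label."""
    return y[CONTROL_MASK[np.asarray(labels)]].sum()


rng = np.random.default_rng(2)
labels = (0, 0, 1, 1, 2, 2)                      # two subjects per sequence
y = rng.normal(size=(len(labels), 9))            # simulated observations
period_effect = rng.normal(scale=10.0, size=9)   # arbitrary, large period effects

# All 6! / (2! 2! 2!) = 90 distinct relabelings that keep n_i = 2.
arrangements = sorted(set(itertools.permutations(labels)))
base = np.array([control_sum(y, a) for a in arrangements])
with_periods = np.array([control_sum(y + period_effect, a) for a in arrangements])

# Spread of the shifts across relabelings: zero up to rounding error.
shift = with_periods - base
print(len(arrangements), shift.max() - shift.min())
```

In this hypothetical Latin square each period is a control period for exactly one sequence, so the common shift equals $2\sum_j$ (period effect $j$).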

When we applied this method to the data from the 61 patients who completed all nine periods in a recent clinical study by Portenoy et al. (2007), with the patients divided (22, 22, 17) among the three treatment sequences, the observations ranged from –8 to 27 units. The mean of the control observations was 3.83 and the mean of the treatment observations was 9.11. The value of the test statistic for the data as originally labeled was 3.83*61*3, that is, the control mean times the 183 control observations. The values of the test statistic for 1600 random relabelings of the data ranged from 3.88*61*3 to 9.47*61*3, with the 2.5th percentile equal to 5.40*61*3. As the observed value lies below every one of the relabeled values, we concluded that there was a statistically significant difference between the two treatments.
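The reported figures can be turned into an approximate p-value. The sketch below assumes the common Monte Carlo convention p = (number of relabelings at least as extreme + 1)/(B + 1), doubled for a two-sided test; the paper itself reports only the percentile comparison, so the resulting value is an illustration under that assumption rather than a figure from the study.

```python
n_patients = 61
controls_per_patient = 3                            # 3 of the 9 periods are control
n_control_obs = n_patients * controls_per_patient   # 183 control observations

observed = 3.83 * n_control_obs             # control mean times number of control obs
smallest_relabeled = 3.88 * n_control_obs    # smallest of the 1600 relabeled values
n_relabelings = 1600

# No relabeled value is as small as the observed one.
assert observed < smallest_relabeled
n_as_extreme = 0

p_one_sided = (n_as_extreme + 1) / (n_relabelings + 1)
p_two_sided = min(1.0, 2 * p_one_sided)
print(f"S = {observed:.2f}, two-sided Monte Carlo p = {p_two_sided:.5f}")
```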

3. Analysis of a Balanced Design When Not All Subjects Complete Treatment

Although 74 patients entered our study, only 61 completed all nine periods, a not unexpected result in a clinical trial. Still, 9 of the remaining 13 patients completed at least the first six periods. As the first six periods also constitute a balanced design, we made use of the data from these patients as follows:

All the data from patients who completed all nine periods were retained as a single block. A second block was formed from the data for the first six periods of all patients who completed at least six but no more than eight periods. Thus, in each block, the retained data for every patient included exactly twice as many treatment observations as control observations.

As before, our test statistic was the sum of the observations labeled as controls, but the relabeling was now conducted separately and independently within each block. This division into blocks ensures that the observations are exchangeable under the null hypothesis, so that the resulting significance level is exact (see Lehmann, 1986, pp. 233-234).
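Relabeling within blocks amounts to a stratified permutation test. A minimal sketch follows, assuming hypothetical sequence strings (the study's own table is not reproduced here), simulated data, and a two-sided Monte Carlo p-value convention chosen for convenience. Labels are shuffled only among subjects of the same block, and the statistic adds the control sums of both blocks; the six-period block uses the first six characters of each sequence. The names `control_sum` and `stratified_test` are illustrative.

```python
import numpy as np

# Hypothetical sequences (T = active, C = control); not the study's actual table.
SEQUENCES = ["TTCTCTCTT", "TCTCTTTTC", "CTTTTCTCT"]
CONTROL_MASK = np.array([[ch == "C" for ch in s] for s in SEQUENCES])


def control_sum(y, labels):
    """Control sum for one block; y may keep all 9 periods or only the first 6."""
    mask = CONTROL_MASK[labels][:, : y.shape[1]]
    return y[mask].sum()


def stratified_test(blocks, n_perm=1600, seed=None):
    """blocks: list of (y, labels); labels are permuted separately within each block."""
    rng = np.random.default_rng(seed)
    observed = sum(control_sum(y, lab) for y, lab in blocks)
    perm = np.array([
        sum(control_sum(y, rng.permutation(lab)) for y, lab in blocks)
        for _ in range(n_perm)
    ])
    lower = (np.sum(perm <= observed) + 1) / (n_perm + 1)
    upper = (np.sum(perm >= observed) + 1) / (n_perm + 1)
    return observed, min(1.0, 2 * min(lower, upper))


# Simulated example: a 9-period block (22, 22, 17) and a 6-period block (2, 3, 4).
rng = np.random.default_rng(3)
blocks = []
for sizes, n_periods in [([22, 22, 17], 9), ([2, 3, 4], 6)]:
    lab = np.repeat([0, 1, 2], sizes)
    y = rng.normal(4.0, 3.0, size=(lab.size, n_periods))
    y[~CONTROL_MASK[lab][:, :n_periods]] += 5.0   # simulated treatment effect
    blocks.append((y, lab))

observed, p = stratified_test(blocks, seed=4)
print(f"S = {observed:.1f}, two-sided p = {p:.4f}")
```

Keeping the blocks separate means a patient who dropped out after six periods is only ever exchanged with other such patients, which is what preserves exchangeability under the null hypothesis.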

Of the 74 patients who began the clinical study described above, 61 completed all nine periods, and a further 9, divided (2, 3, 4) among the three treatment sequences, completed at least the first six periods. As before, the contribution to the test statistic of those who completed all nine periods was 3.83*61*3 = 700.9. The contribution of those who completed at least six but no more than eight periods was 43.4. The value of our new test statistic for the data as originally labeled is therefore 744 (700.9 + 43.4). The values of the test statistic for 1600 random relabelings of the data ranged from 1255 to 1490, with the 2.5th percentile equal to 1487. Again, we concluded that there was a statistically significant difference between the two treatments.

4. Discussion

The permutation test for treatment in a crossover design described here yields exact p-values that do not depend on the underlying distribution. Our research was motivated by the FDA's request for just such a distribution-free exact test.

A large number of treatment sequences can arise inadvertently in a crossover study when patients do not follow the designated order of the treatment sequences. As a result, the designed treatment balance is violated, and the treatment comparison can be complicated by confounding factors such as a treatment-by-period interaction.

Fortunately, the procedures described here are readily extended to multiple treatments and to large numbers of treatment sequences. For example, if there were 18 treatment sequences, the permutation distribution would be derived by assigning the $\{n_1, n_2, \ldots, n_{18}\}$ sequence labels at random to the subjects, and thus to the observations within each period, before recomputing the test statistic.

In designing the clinical study considered in this article, the treatment sequences were restricted so that a patient would never go two days in a row without receiving the active treatment. For example, a treatment sequence such as TTC CTT TCT, in which the control falls in two consecutive periods (the third and fourth), was not permitted. Relabeling the observations to obtain a permutation distribution by the method described here creates sequences not envisioned in the original study (though without risk to the patients). The authors invite comments on the validity of this approach; see, for example, the discussion in Kempthorne (1979).
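The restriction can be stated as a simple check. The sketch below assumes, as the super-block notation suggests, that every block of three periods contains exactly one control, and treats consecutive periods as the consecutive days referred to above; the function name `has_consecutive_controls` is illustrative. It enumerates all 27 sequences built from the super blocks and discards those with two controls in a row, which can happen only when one block ends in C and the next begins with C.

```python
from itertools import product

SUPER_BLOCKS = ["TTC", "TCT", "CTT"]   # 110, 101, 011 with 1 = T and 0 = C


def has_consecutive_controls(seq):
    """True if the control appears in two adjacent periods (spaces are ignored)."""
    return "CC" in seq.replace(" ", "")


all_sequences = ["".join(b) for b in product(SUPER_BLOCKS, repeat=3)]
permitted = [s for s in all_sequences if not has_consecutive_controls(s)]

print(len(all_sequences), len(permitted))
print(has_consecutive_controls("TTC CTT TCT"))   # the excluded example from the text
```

Under these assumptions 6 of the 27 block-balanced sequences are excluded, leaving 21; the same check could be applied to any sequence produced by relabeling.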


[1] Good P, Lunneborg CE. Limitations of the analysis of variance: the one-way design. Journal of Modern Applied Statistical Methods 2006; 5: 41-43.


[2] Good P. Introduction to Statistics via Resampling Methods and R. New York: Wiley, 2005.


[3] Johnson WD, Mercante DE. Analyzing multivariate data in crossover designs using permutation tests. Journal of Biopharmaceutical Statistics 1996; 6(3): 327-342.


[4] Guilbaud O. Exact comparisons of means and within-subject variances in 2 x 2 crossover trials. Drug Information Journal 1999; 33: 455-469.


[5] Patefield M. Conditional and exact tests in crossover trials. Journal of Biopharmaceutical Statistics 2000; 10(1): 109-129.


[6] Portenoy RK, Messina J, Xie F, Peppin J. Fentanyl buccal tablet (FBT) for relief of breakthrough pain in opioid-treated patients with chronic low back pain: a randomized, placebo-controlled study. Current Medical Research and Opinion 2007; 23: 223-233.


[7] Lehmann EL. Testing Statistical Hypotheses. 2nd ed. New York: John Wiley, 1986.


[8] Kempthorne O. In dispraise of the exact test: reactions. Journal of Statistical Planning and Inference 1979; 3: 199-213.


The data set analyzed here is posted at for those who may wish to subject it to alternative analyses.