Phillip Good (statisticsonline.info, Huntington Beach, CA 92648, USA, courses@statisticsonline.info)

and

Fang Xie (Biostatistics, Cephalon Inc., 41 Moores Road, Frazer, PA 19355, USA, fxie@cephalon.com), to whom requests for reprints should be addressed

**Summary.**
To obtain exact significance levels, an initially balanced crossover design is analyzed by permutation methods. The analysis is first restricted to those subjects who completed all nine periods. Next, it is extended to those who successfully completed two of the three blocks of treatments, that is, six periods. Finally, we show how these results may be extended to any number of treatment sequences and treatments.

Key words: permutation methods, cross-over design, randomized blocks, choosing test statistic, choosing randomization set.


**1. Introduction.**

Several two-treatment by nine-period (2x9) crossover designs involving large numbers of patients are to be analyzed. The designs are balanced in the sense that the ratio of treatment versus control is always 2:1 within a period and across all periods. An example of such a design, to be considered at length in what follows, is one in which patients were assigned at random to one of the three possible treatment sequences:

TTC TCT CTT

TCT CTT TTC

CTT TTC TCT

This is a balanced scheme, and the three “super blocks” (coding T as 1 and C as 0: 110, 101, and 011) form a Latin square. So much for the theory. In practice, different numbers of patients {n_{i}, i=1,2,3} were assigned to each of the treatment sequences, and not all patients completed all nine periods. We had three motivations for undertaking a permutation analysis of the data. First, we wanted a test that would provide exact significance levels regardless of the underlying distribution. Second, Good and Lunneborg (2006) showed that the permutation method provides greater power than the analysis of variance when there are unequal numbers in the various samples. Last, but not least, the FDA requested that we employ a distribution-free test when analyzing our data.
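The balance of the design can be checked mechanically. The following is a minimal Python sketch, not part of the original analysis; the names `SEQUENCES` and `check_balance` are illustrative, and the check assumes equal numbers of patients in each sequence.

```python
# The three treatment sequences from the design, written without spaces.
SEQUENCES = ["TTCTCTCTT", "TCTCTTTTC", "CTTTTCTCT"]

def check_balance(sequences):
    """Return (T, C) counts for each period and for each sequence."""
    n_periods = len(sequences[0])
    per_period = [
        (sum(s[j] == "T" for s in sequences), sum(s[j] == "C" for s in sequences))
        for j in range(n_periods)
    ]
    per_sequence = [(s.count("T"), s.count("C")) for s in sequences]
    return per_period, per_sequence

def is_latin_square(sequences, block_len=3):
    """Check that each three-period block appears once in every row and column."""
    rows = [[s[i:i + block_len] for i in range(0, len(s), block_len)] for s in sequences]
    cols = list(zip(*rows))
    size = len(rows)
    return all(len(set(r)) == size for r in rows) and all(len(set(c)) == size for c in cols)

per_period, per_sequence = check_balance(SEQUENCES)
print(per_period)    # (T, C) counts in each of the nine periods
print(per_sequence)  # (T, C) counts in each sequence
print(is_latin_square(SEQUENCES))
```

With unequal numbers of patients per sequence, the per-period counts would need to be weighted by the group sizes, which is exactly where the 2:1 ratio can break down.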

A balanced treatment assignment for each patient (2:1 ratio for active vs placebo) is necessary to produce an unbiased statistical treatment comparison using the analysis of variance model. This balance can be achieved if an equal proportion of patients are assigned to each treatment sequence and each patient completes the treatment sequence. However, these conditions are unlikely to be met in practice. Consequently, the resulting imbalance could induce a treatment-by-period interaction that may be confounded with the treatment effect in the analysis of variance model. A permutation test provides a means to examine the robustness of the analysis of variance model to such a confounding effect.

In what follows, we first restrict the analysis to those subjects who completed all nine periods. Next, we analyze the full data set. Last, we show how these results may be extended to any number of treatment sequences and treatments.


**2. Analysis of a Complete Balanced Design**

Our test statistic is the sum of the observations with the control label, S = ∑∑∑C_{ijk}, or simply ∑C, where i denotes the treatment sequence, j the period, and k the subject. Other test statistics are possible, but those that differ from ∑C only by summations that are invariant under permutations of the labels, such as 2∑C - ∑T = 3∑C - (∑T + ∑C), are equivalent to the one proposed here and thus are unnecessarily complex.

The traditional test for the two-sample comparison involves an exchange of labels between those subjects that receive the control treatment and those who do not (see, for example, Section 4.2 of Good, 2005). Such exchanges also have been employed by those authors who considered permutation tests for the 2x2 crossover (for example, Johnson and Mercante, 1996; Guilbaud (1999); and Patefield, 2000). But when treatments are administered over more than two periods as in the present case such exchanges of treatment labels would confound carry-over with main effects.

We obtain the permutation distribution of our test statistic by rearranging the *sequence labels* on the subjects so that the number of subjects {n_{i}} in each sequence is preserved.
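The relabeling just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the data are synthetic (a treatment effect of 5 units and no noise), and the lower-tail proportion is only one of several conventions for estimating a significance level.

```python
import random

SEQUENCES = ["TTCTCTCTT", "TCTCTTTTC", "CTTTTCTCT"]

def control_sum(observations, labels, sequences=SEQUENCES):
    """Sum the observations that fall in periods labeled C by each subject's sequence."""
    return sum(
        y
        for obs, lab in zip(observations, labels)
        for y, t in zip(obs, sequences[lab])
        if t == "C"
    )

def permutation_distribution(observations, labels, n_relabelings=1600, seed=0):
    """Recompute the statistic after randomly rearranging the sequence labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    values = []
    for _ in range(n_relabelings):
        rng.shuffle(shuffled)  # a shuffle preserves the counts n_i per sequence
        values.append(control_sum(observations, shuffled))
    return values

def lower_tail_p(observed, values):
    """Proportion of relabelings with a statistic at least as small as the observed one."""
    return sum(v <= observed for v in values) / len(values)

# Synthetic illustration only: 4 subjects per sequence, 9 periods each.
labels = [0] * 4 + [1] * 4 + [2] * 4
observations = [[5.0 if t == "T" else 0.0 for t in SEQUENCES[lab]] for lab in labels]
observed = control_sum(observations, labels)
values = permutation_distribution(observations, labels, n_relabelings=200)
print(observed, min(values), max(values), lower_tail_p(observed, values))
```

Each subject's row of observations stays intact; only the sequence label attached to the row changes, so observations are exchanged within periods and never across them.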

Note that because the sum of all the observations and the sum of the squares of all the observations remain invariant under rearrangements of the labels, use of the test statistic S provides the same results as would use of the more complex test statistics S_{1} = 2∑C - ∑T or F = S_{1}/σ_{T}, where σ_{T} is the variance of all the observations about the grand mean.
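The equivalence can be written out in one line. Here G denotes the sum of all the observations, a symbol introduced only for this illustration; G is unchanged by any rearrangement of the labels because ∑T + ∑C always covers every observation.

```latex
S_{1} = 2\sum C - \sum T = 2S - (G - S) = 3S - G,
\qquad G = \sum T + \sum C .
```

Since S_{1} is an increasing linear function of S, and σ_{T} is likewise fixed under relabeling, the statistics S, S_{1}, and F order the rearrangements identically and so yield the same significance level.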

As the design is balanced and the exchange of subject labels results in the exchange of observations within the periods, differences among the periods will not affect the permutation distribution. Indeed, if the treatment has no effect, and all observations have equal inherent variance, then all values of the permutation distribution are equally likely. On the other hand, if the treatment results in a decrease in the expected value of an observation, then one would expect the original value of the test statistic to be at the high end of the permutation distribution.

When we applied this method to the data from the 61 patients who completed all nine periods in a recent clinical study by Portenoy et al. (2007), with the patients divided {22, 22, 17} among the three treatment sequences, the observations ranged from -8 to 27 units. The mean of the control observations was 3.83 and the mean of the treatment observations was 9.11. The value of the test statistic for the data as labeled originally was 3.83*61*3. The values of the test statistic for 1600 random relabelings of the data ranged from 3.88*61*3 to 9.47*61*3, with the 2.5^{th} percentile equal to 5.40*61*3. Since the original value fell below every one of the relabeled values, we concluded that there was a statistically significant difference between the two treatments.
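The arithmetic behind these figures is easy to reproduce. In the short Python sketch below the variable names are illustrative; the factor 61*3 is the number of control observations, since each of the 61 completers contributes three control periods.

```python
n_control_obs = 61 * 3  # 61 completers, 3 control periods each

observed = 3.83 * n_control_obs      # statistic for the original labeling
perm_min = 3.88 * n_control_obs      # smallest value among the 1600 relabelings
perm_pct_2_5 = 5.40 * n_control_obs  # 2.5th percentile of the relabelings
perm_max = 9.47 * n_control_obs      # largest value among the 1600 relabelings

# The observed statistic lies below every relabeled value reported in the text.
below_all = observed < perm_min
below_cutoff = observed < perm_pct_2_5

print(round(observed, 2), round(perm_min, 2), round(perm_pct_2_5, 2), round(perm_max, 2))
print(below_all, below_cutoff)
```

In other words, none of the 1600 relabelings produced a control sum as small as the one observed.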

**3. Analysis of a Balanced Design When Not All Subjects Complete Treatment**

Although 74 patients entered our study, only 61 patients completed all 9 periods, a not unexpected result in a clinical trial. Still, 9 of the remaining patients completed at least the first six periods. As the first six periods also constitute a balanced design, we proceeded to make use of the data from the patients who did not complete all 9 periods as follows:

All the data from patients who completed all nine periods was retained as a single block. A second block was formed from the data for the first six periods from all patients who completed at least six but no more than eight periods. Thus, in either block, all the retained data included exactly twice as many treatment observations as control observations for each patient.

As before, our test statistic was the sum of the observations labeled as controls. But the relabeling was conducted separately and independently within each block. This division into blocks ensures that the observations are exchangeable under the null hypothesis so that the resulting significance level is exact (see Lehmann, 1986, pp 233-4).
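A minimal Python sketch of the blocked relabeling follows. It is an illustration only: the function names, the tuple layout `(observations, labels, n_periods)`, and the synthetic data are assumptions made here, not the authors' implementation.

```python
import random

SEQUENCES = ["TTCTCTCTT", "TCTCTTTTC", "CTTTTCTCT"]

def block_control_sum(observations, labels, n_periods):
    """Sum control-labeled observations over the first n_periods of each subject."""
    return sum(
        y
        for obs, lab in zip(observations, labels)
        for y, t in zip(obs[:n_periods], SEQUENCES[lab][:n_periods])
        if t == "C"
    )

def blocked_relabeling(blocks, rng):
    """One relabeling: shuffle sequence labels separately within each block."""
    total = 0.0
    for observations, labels, n_periods in blocks:
        shuffled = list(labels)
        rng.shuffle(shuffled)  # labels never move between blocks
        total += block_control_sum(observations, shuffled, n_periods)
    return total

def synthetic(label, n_obs):
    """Synthetic subject: 5 units on treatment days, 0 on control days."""
    return [5.0 if t == "T" else 0.0 for t in SEQUENCES[label][:n_obs]]

# Block 1: completers (nine periods). Block 2: subjects with 6 to 8 periods,
# of which only the first six are retained.
complete = ([synthetic(lab, 9) for lab in (0, 1, 2)], [0, 1, 2], 9)
partial = ([synthetic(lab, 7) for lab in (0, 1, 2)], [0, 1, 2], 6)
blocks = [complete, partial]

rng = random.Random(1)
observed = sum(block_control_sum(o, l, n) for o, l, n in blocks)
values = [blocked_relabeling(blocks, rng) for _ in range(200)]
print(observed, min(values), max(values))
```

Keeping the shuffles separate means a completer's label is only ever exchanged with another completer's, which is what preserves exchangeability under the null hypothesis.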

In the clinical study described above, 74 patients began the study; 61 completed all nine periods, and a further 9 patients, divided (2,3,4) among the three treatment sequences, completed at least the first six periods. As before, the contribution to the test statistic of those who completed all nine periods was 3.83*61*3 = 700.9. The contribution of those who completed at least six periods but no more than eight was 43.4. The value of our new test statistic for the data as labeled originally is 744. The values of the test statistic for 1600 random relabelings of the data ranged from 1255 to 1490 with the 2.5^{th} percentile equal to 1487. Again, we concluded that there was a statistically significant difference between the two treatments.

**4. Discussion**

The permutation test for treatment in a cross-over design described here yields exact p-values which are independent of the underlying distribution. Our research was motivated by the desire of the FDA for just such a distribution-free exact test.

A large number of treatment sequences could arise inadvertently in a crossover study when patients do not follow the designated order of the treatment sequences. As a result, the designed treatment balance would be violated and the treatment comparison could be complicated by confounding factors such as a treatment-by-period interaction.

Fortunately, the procedures described here are readily extended to the case of multiple treatments and to large numbers of treatment sequences. For example, if there were 18 treatment sequences, the permutation distribution would be derived by applying the {n_{1}, n_{2}, …, n_{18}} sequence labels at random to the observations within each period before recomputing the test statistic.

In designing the clinical study considered in this article, the treatment sequences were restricted so that a patient would never go two days in a row without receiving the active treatment. A treatment sequence such as TTC CTT TCT was not permitted. Relabeling the observations to obtain a permutation distribution using the method described here creates sequences not envisioned in the original study (though without risk to the patients). The authors invite comments on the validity of this approach. See, for example, the discussion in Kempthorne (1979).
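The design restriction is simple to state as a check on a sequence string. The Python sketch below is illustrative only; the function name is hypothetical, and it assumes one period per day, as the wording above suggests.

```python
def goes_two_days_without_active(sequence):
    """True if the sequence contains two consecutive control (C) days."""
    return "CC" in sequence.replace(" ", "")

design = ["TTC TCT CTT", "TCT CTT TTC", "CTT TTC TCT"]
excluded_example = "TTC CTT TCT"

print([goes_two_days_without_active(s) for s in design])
print(goes_two_days_without_active(excluded_example))
```

A check of this kind could be used to flag which relabeled sequences fall outside the set the original design allowed.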

**References**

[1] Good P, Lunneborg L. Limitations of the analysis of variance. The one-way design. *Journal of Modern Applied Statistical Methods* 2006; **5**: 41-43.

[2] Good P. *Introduction to Statistics via Resampling Methods and R*. New York: Wiley, 2005.

[3] Johnson WD, Mercante DE. Analyzing multivariate data in crossover designs using permutation tests. *Journal of Biopharmaceutical Statistics* 1996; **6**(3): 327-342.

[4] Guilbaud O. Exact comparisons of means and within-subject variances in 2 x 2 crossover trials. *Drug Information Journal* 1999; **33**: 455-469.

[5] Patefield M. Conditional and exact tests in crossover trials. *Journal of Biopharmaceutical Statistics* 2000; **10**(1): 109-129.

[6] Portenoy RK, Messina J, Xie F, Peppin J. Fentanyl buccal tablet (FBT) for relief of breakthrough pain in opioid-treated patients with chronic low back pain: a randomized, placebo-controlled study. *Current Medical Research and Opinion* 2007; **23**: 223-233.

[7] Lehmann E. *Testing Statistical Hypotheses*. New York: John Wiley, 1986.

[8] Kempthorne O. In dispraise of the exact test: reactions. *Journal of Statistical Planning and Inference* 1979; **3**: 199-213.

The data set analyzed here is posted at statisticsonline.info/blk.csv for those who may wish to subject it to alternative analyses.