## Experimental design

To ensure that impurities in the two samples are not responsible for a difference in perceived odour, our experiments will use a gas chromatograph to produce pure samples.

### Statistical details

Let the two chemicals be called A and B. We plan to ask S (e.g. 5) subjects each to participate in T (e.g. 4) trials. In any one trial, each subject will be presented with a training set (A followed by B, or vice versa, with the subject being told which is which) and then asked to identify each of a sequence of N (e.g. 5) test samples. The identities of the N test samples will be assigned completely at random, and blinding will be used as much as possible. The subjects will be told that the N test samples were chosen by flipping a coin. (With N=5 this gives a 1/16 chance per run that the test samples are AAAAA or BBBBB, but I think this is the best way to do it; otherwise the subject may be tempted to use 'the law of averages' to boost their performance when they are uncertain.)
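The coin-flip assignment described above can be sketched in a few lines. This is only an illustration, assuming fair and independent flips; the function names `draw_test_sequence` and `prob_all_same` are hypothetical and not part of any planned protocol.

```python
import random
from fractions import Fraction


def draw_test_sequence(n_samples=5, rng=None):
    """Assign each of the N test samples to 'A' or 'B' by an independent fair coin flip."""
    if rng is None:
        rng = random.Random()
    return [rng.choice("AB") for _ in range(n_samples)]


def prob_all_same(n_samples=5):
    """Chance that a run is all A or all B: 2 * (1/2)^N."""
    return Fraction(2, 2 ** n_samples)


if __name__ == "__main__":
    print(draw_test_sequence(5))   # e.g. ['B', 'A', 'A', 'B', 'A']; varies from run to run
    print(prob_all_same(5))        # 1/16
```

Because each sample is drawn independently, a subject gains nothing by balancing their answers between A and B.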

For each of the N test samples, the subject will be asked to label the sample A or B, and to assign a strength of confidence on a three-level scale (very unsure, reasonably sure but not certain, certain).

Unfortunately, perfect double-blinding of the experiment will be impossible, since the operator of the gas chromatograph will be able to infer what each sample is from the time it takes to traverse the machine. Moreover, in order to space the arrival times of the samples at the subjects evenly, the operator will have to insert the samples at appropriately chosen uneven times. We will therefore have to recruit an independent observer to communicate between the operator, who knows what's what, and the other experimenters and subjects.

The hypotheses that we wish to compare are:
- H_0 (null): That all labels assigned by the subjects are independent of the truth; the probability that each guess is right is f=1/2.
- H_1: That all subjects have the same ability f>1/2 to identify the truth, and f is the same for all trials. This model has one parameter, f, whose prior distribution might be assigned uniform in (1/2,1).
- H_S: That each subject s has a personal ability f_s>1/2 to identify the truth, and f_s is the same for all trials. This model has S parameters. We might use uniform priors, or perhaps a hierarchical prior might be preferred by professional Bayesians.
- H_3S: That each subject s has an ability to identify the truth that varies from test sample to test sample, because of order-dependence of odours, fatigue, or whatever; and that the subject is able to identify these variations in ability and report them using the confidence indicator. This model has 3S parameters. I'd suggest assigning priors like this `_|\`, `__--`, `_/|` to the three categories of parameter.

If we have to resort to statistics to reveal the result, then the experiment will have failed. We hope to design an experiment where the result is obvious to the eye.

Let's anticipate the sort of evidence we might obtain if a subject has an ability f=75%. (We'll assume H_1 to be true.) In TN=20 test samples, we expect roughly 15 successes. Such a result gives an evidence ratio

```
P(d=15|H_1)     Int(2*x^15*(1-x)^5, x = 1/2..1)
-----------  =  -------------------------------  =  6.4/1
P(d=15|H_0)                (1/2)^20
```
which by itself is not overwhelming evidence. (If the result were d=10 then the evidence ratio would be 1/3.7.)
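The integral in this ratio can be evaluated exactly by expanding (1-x)^(n-d) with the binomial theorem and integrating term by term. The sketch below does this with rational arithmetic, assuming the uniform prior on (1/2,1) given for H_1 (density 2 on that interval). The function name `evidence_ratio` is a hypothetical helper, and the binomial coefficient counting orderings is omitted because it is common to both hypotheses and cancels.

```python
from fractions import Fraction
from math import comb


def evidence_ratio(d, n):
    """P(d | H_1) / P(d | H_0) for d successes in n test samples.

    H_1: f uniform on (1/2, 1), i.e. prior density 2 on that interval.
    H_0: f = 1/2.
    """
    m = n - d  # number of failures
    integral = Fraction(0)
    # Integrate x^d * (1-x)^m from 1/2 to 1, one binomial-expansion term at a time:
    # the k-th term is C(m,k) * (-1)^k * x^(d+k), whose antiderivative is x^p / p.
    for k in range(m + 1):
        p = d + k + 1
        integral += Fraction(comb(m, k) * (-1) ** k, p) * (1 - Fraction(1, 2 ** p))
    marginal_h1 = 2 * integral
    likelihood_h0 = Fraction(1, 2 ** n)
    return marginal_h1 / likelihood_h0


if __name__ == "__main__":
    print(float(evidence_ratio(15, 20)))      # about 6.4
    print(1 / float(evidence_ratio(10, 20)))  # about 3.7, i.e. a ratio of 1/3.7
```

The same function can be called with pooled counts from several subjects, as long as H_1's assumption of a single shared f is kept.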

If all S=5 subjects have the same ability and each does the same TN=20 test samples, we expect roughly 75 successes in 100 test samples, and the evidence ratio becomes

```
P(d=75|H_1)
-----------  ≈  100,000 .
P(d=75|H_0)
```
Thus 5 subjects each doing 20 tests (100 test samples in all) is sufficient to get a good strong result in favour of the rather simplistic model H_1, assuming H_1 is true and f=0.75.
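The count d=75 is only the expected outcome; the actual number of successes will scatter around it. A quick simulation gives a feel for that scatter. This is a rough sketch, assuming H_1 is true with f=0.75 and that every test sample is an independent guess; the function names and the number of simulated experiments are illustrative choices, not part of the design.

```python
import random
from statistics import mean, stdev


def simulate_total_successes(n_subjects=5, n_trials=4, n_samples=5, f=0.75, rng=None):
    """Total correct identifications in one simulated experiment under H_1."""
    if rng is None:
        rng = random.Random()
    total = n_subjects * n_trials * n_samples
    # Each test sample is identified correctly with probability f.
    return sum(rng.random() < f for _ in range(total))


def simulate_many(n_experiments=2000, seed=0, **kwargs):
    """Repeat the simulated experiment with a seeded generator for reproducibility."""
    rng = random.Random(seed)
    return [simulate_total_successes(rng=rng, **kwargs) for _ in range(n_experiments)]


if __name__ == "__main__":
    counts = simulate_many()
    print("mean successes:", mean(counts))
    print("standard deviation:", stdev(counts))
    print("range:", min(counts), "to", max(counts))
```

For 100 samples at f=0.75 the binomial standard deviation is sqrt(100 × 0.75 × 0.25) ≈ 4.3, so totals a few successes either side of 75 should be unsurprising.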