To ensure that impurities in the two samples are not
responsible for a difference in perceived odour,
our experiments will use a gas chromatograph to
produce pure samples.
Let the two chemicals be called A and B.
We plan to ask S (e.g. 5) subjects each to participate in T (e.g. 4) trials.
In any one trial, each subject will be presented with
a training set (A followed by B, or vice versa, with the subject being told which is which)
and then asked to identify each of a sequence of N (e.g. 5) test samples.
The identities of the N test samples will be assigned completely randomly,
and blinding will be used as much as possible. The subjects will be told that the
N test samples were chosen by flipping a coin.
(With N=5 this gives a 1/16 chance per trial that the test samples are AAAAA
or BBBBB, but I think this is the best way to do it; otherwise the
subject may be tempted to use 'the law of averages' to boost
their performance when they are uncertain.)
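The coin-flip assignment described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and Python's pseudo-random generator stands in for a physical coin.

```python
import random
from fractions import Fraction

def draw_test_sequence(n=5, rng=random):
    """Assign each of n test samples to 'A' or 'B' by an independent fair coin flip."""
    return "".join(rng.choice("AB") for _ in range(n))

def chance_all_same(n=5):
    """Probability that n fair coin flips give all A or all B: 2 / 2**n."""
    return Fraction(2, 2 ** n)

print(draw_test_sequence())   # one random sequence of five labels
print(chance_all_same(5))     # 1/16
```

Because every sample is drawn independently, a subject gains nothing by counting how many A's have already appeared.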
For each of the N test samples, the subject will be asked
to label the sample A or B,
and to assign a strength of confidence on a three-level scale
(very unsure, reasonably sure but not certain, certain).
Unfortunately, perfect double-blinding of the experiment will
be impossible, since the operator of the gas chromatograph
will be able to infer what each sample is from
the time it takes to traverse the machine. Moreover,
in order to space evenly the arrival times of the samples
at the subjects,
the operator will have to insert the samples at appropriately
chosen uneven times. We will therefore have to recruit an
independent observer to communicate between the
operator, who knows what's what, and the other experimenters.
The hypotheses that we wish to compare are:
H_0: That all labels assigned by the subjects
are independent of the truth; the probability
that each guess is right is f=1/2.
H_1: That all subjects have the same ability f>1/2 to identify the truth,
and f is the same for all trials.
This model has one parameter, f, whose prior distribution might be assigned uniform in (1/2,1).
H_2: That each subject s has a personal ability f_s>1/2 to identify the truth,
and f_s is the same for all trials.
This model has S parameters. We might use uniform priors, though
professional Bayesians might prefer a hierarchical prior.
H_3: That each subject s has an ability to identify the truth
that varies from test sample to test sample, because of order-dependence of odours, fatigue,
or whatever; and that the subject is able to identify these variations in ability
and report them using the confidence indicator. This model has 3S
parameters (one per subject per confidence level). I'd suggest assigning priors
like this _|\, __--, _/| to the three categories of parameter.
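As a rough sketch of how the one-parameter and S-parameter models could be compared with the chance model, assume the uniform priors on (1/2,1) mentioned above (a hierarchical prior would need a different calculation). With one shared f the subjects' counts are pooled before integrating; with independent f_s the integral factorises over subjects. The function names and the counts below are hypothetical, not data.

```python
from fractions import Fraction
from math import comb, prod

def ratio_vs_chance(d, n):
    """P(d | f uniform on (1/2, 1)) / P(d | f = 1/2) for d correct out of n."""
    # Expand (1-x)^(n-d) by the binomial theorem and integrate each term exactly.
    integral = sum(
        (-1) ** k * comb(n - d, k) * (1 - Fraction(1, 2 ** (d + k + 1))) / (d + k + 1)
        for k in range(n - d + 1)
    )
    return 2 * integral * 2 ** n   # prior density 2, divided by (1/2)^n

def evidence_shared_f(counts):
    """One shared f for all subjects: pool the counts, then integrate once."""
    d = sum(d_s for d_s, _ in counts)
    n = sum(n_s for _, n_s in counts)
    return ratio_vs_chance(d, n)

def evidence_personal_f(counts):
    """Independent f_s per subject: the evidence is a product over subjects."""
    return prod(ratio_vs_chance(d_s, n_s) for d_s, n_s in counts)

# Hypothetical (correct, total) counts for S = 5 subjects; not real data.
counts = [(17, 20), (12, 20), (19, 20), (14, 20), (16, 20)]
print(float(evidence_shared_f(counts)), float(evidence_personal_f(counts)))
```

The binomial coefficients cancel in each ratio, which is why only the counts of correct labels are needed.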
If we have to resort to statistics to reveal the result, then the experiment will
have failed. We hope to design an experiment where the result is
obvious to the eye.
Let's anticipate the sort of evidence we might obtain
if a subject has an ability f=75%. (We'll assume H_1 to be true.)
In TN=20 test samples, we expect roughly 15 successes.
Such a result gives an evidence ratio
P(d=15|H_1)   Int(2*x^15*(1-x)^5, x = 1/2..1)
----------- = ------------------------------- = 6.4/1
P(d=15|H_0)              (1/2)^20
which by itself is not overwhelming evidence.
(If the result were d=10 then the evidence ratio would be 1/3.7.)
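These two figures can be checked with exact rational arithmetic. The sketch below assumes the uniform prior on (1/2,1) for f and the chance model f=1/2 as the alternative; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def upper_half_integral(a, b):
    """Exact integral of x^a (1-x)^b over 1/2 <= x <= 1."""
    total = Fraction(0)
    for k in range(b + 1):
        p = a + k + 1
        # Term k of the binomial expansion of (1-x)^b, integrated from 1/2 to 1.
        total += (-1) ** k * comb(b, k) * (1 - Fraction(1, 2 ** p)) / p
    return total

def evidence_ratio(d, n):
    """Evidence for a uniform-prior f over chance, given d correct out of n."""
    numerator = 2 * upper_half_integral(d, n - d)   # prior density is 2 on (1/2, 1)
    denominator = Fraction(1, 2 ** n)               # chance model: (1/2)^n
    return numerator / denominator

print(float(evidence_ratio(15, 20)))   # about 6.36
print(float(evidence_ratio(10, 20)))   # about 0.270, i.e. 1/3.7
```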
If all S=5 subjects have the same ability and do
the same number of trials, then with d=75 successes in 100 test samples
the evidence ratio would become
P(d=75|H_1)
----------- = 100,000 (approximately).
P(d=75|H_0)
Thus 5 subjects each doing 20 tests is sufficient to get a
good strong result in favour of the rather simplistic model H_1,
assuming H_1 is true and f=0.75.
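Since the number of successes in a real experiment will fluctuate around its expected value, it may help to simulate a few pooled experiments under the same assumptions (a shared ability f=0.75, 5 subjects, 20 tests each) and look at the spread of evidence ratios. This is only a sketch: the function names and the fixed seed are arbitrary choices.

```python
import random
from fractions import Fraction
from math import comb

def evidence_ratio(d, n):
    """P(d | f uniform on (1/2, 1)) / P(d | f = 1/2) for d correct out of n."""
    integral = sum(
        (-1) ** k * comb(n - d, k) * (1 - Fraction(1, 2 ** (d + k + 1))) / (d + k + 1)
        for k in range(n - d + 1)
    )
    return float(2 * integral * 2 ** n)

def simulate_pooled_count(f=0.75, subjects=5, tests_per_subject=20, rng=random):
    """Number of correct labels when every test is right with probability f."""
    n = subjects * tests_per_subject
    return sum(rng.random() < f for _ in range(n))

rng = random.Random(0)   # fixed seed, chosen arbitrarily for repeatability
for _ in range(5):
    d = simulate_pooled_count(rng=rng)
    print(d, evidence_ratio(d, 100))
```

The pooled case with exactly d=75 can be checked directly with `evidence_ratio(75, 100)`.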