The One-Way Between-Subjects Analysis of Variance
John K. Adams
For example, suppose that we have three groups of subjects and each group has been treated differently:
|  | Group 1 | Group 2 | Group 3 |
|---|---|---|---|
|  | 1 | 3 | 5 |
|  | 2 | 4 | 6 |
|  | 3 | 5 | 7 |
|  | 4 | 6 | 8 |
|  | 5 | 7 | 9 |
| Mean | 3 | 5 | 7 |
Note the value and notation of the grand mean: $\bar{X}_{..} = 5$.
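As a quick check on the table, the group means and the grand mean can be computed directly. This is a minimal sketch using only the Python standard library; the variable names (`groups`, `group_means`, `grand_mean`) are illustrative and not part of the original notes.

```python
from statistics import mean

# Scores from the data table above, one list per treatment group.
groups = {
    "Group 1": [1, 2, 3, 4, 5],
    "Group 2": [3, 4, 5, 6, 7],
    "Group 3": [5, 6, 7, 8, 9],
}

# Mean of each treatment group.
group_means = {name: mean(scores) for name, scores in groups.items()}

# Grand mean: the mean of all 15 scores pooled together.
all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

print(group_means)  # {'Group 1': 3, 'Group 2': 5, 'Group 3': 7}
print(grand_mean)   # 5
```

Because every group here has the same number of scores, the grand mean also equals the mean of the three group means.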
Given these observations, we ask the question: “Are the apparent differences in the group means $\bar{X}_{.j}$ due to the different treatments of the three groups, or could the differences be due to sampling error alone?”
Formally: If we let $\gamma_j$ (say “gamma sub jay”) stand for “the treatment effect for group j,” then we can form a set of statistical hypotheses appropriate for the one-way ANOVA:
$H_0: \gamma_j = 0$ for all j
$H_1: \gamma_j \neq 0$ for some j
The set of hypotheses is “admissible” because it satisfies two requirements: The members of the set are mutually exclusive (i.e., they don’t overlap); and they are exhaustive (i.e., between them, the two hypotheses cover all possibilities).
The alternate hypothesis requires some explanation. The word
some means from “one” to “all,” inclusive. So if there are k
treatment means, the word some means that, under the alternate
hypothesis, from one mean to all k means could be associated with
a non-zero treatment effect. This complexity arises from the fact that, with
ANOVA, the number of treatment means k can be greater than 2.
With ANOVA, everything hinges on the concept of variation.
Let’s begin our discussion by asking, What could make a given score $X_{ij}$ vary from the grand mean $\bar{X}_{..}$?
Here’s the data table again:
|  | Group 1 | Group 2 | Group 3 |
|---|---|---|---|
|  | 1 | 3 | 5 |
| → | 2 | 4 | 6 |
|  | 3 | 5 | 7 |
|  | 4 | 6 | 8 |
|  | 5 | 7 | 9 |
| Mean | 3 | 5 | 7 |
Consider the score $X_{21}$, the second score in Group 1 (marked with an arrow in the table). The score has the value of “2.” Its deviation from the grand mean $\bar{X}_{..}$ is
$X_{21} - \bar{X}_{..} = 2 - 5 = -3$
Now, we could think of that deviation as being composed of two separate deviations:
$X_{21} - \bar{X}_{.1} = 2 - 3 = -1$ (a within-groups deviation)
and
$\bar{X}_{.1} - \bar{X}_{..} = 3 - 5 = -2$ (a between-groups deviation).
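The same split can be checked for every score, not just the one marked in the table. The sketch below assumes the data set shown above; the helper name `decompose` is hypothetical and introduced only for illustration.

```python
# Data from the table above, one list per treatment group.
groups = [[1, 2, 3, 4, 5], [3, 4, 5, 6, 7], [5, 6, 7, 8, 9]]

all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)


def decompose(score, group):
    """Split a score's deviation from the grand mean into two parts."""
    group_mean = sum(group) / len(group)
    within = score - group_mean        # score vs. its own group mean
    between = group_mean - grand_mean  # group mean vs. grand mean
    total = score - grand_mean         # score vs. grand mean
    return total, within, between


# The score discussed in the text: the value 2 in Group 1.
total, within, between = decompose(2, groups[0])
print(total, within, between)  # -3.0 -1.0 -2.0

# The identity total = within + between holds for every score.
all_hold = all(
    decompose(x, g)[0] == decompose(x, g)[1] + decompose(x, g)[2]
    for g in groups
    for x in g
)
print(all_hold)  # True
```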
Now, what causes the within-groups deviation, $X_{ij} - \bar{X}_{.j}$? The answer is “error
alone.” That answer makes sense because every subject in a particular group was
treated identically. Identical treatment cannot be expected to produce
differences. The only remaining source of differences is something that is
always present, error.
What causes the between-groups deviation, $\bar{X}_{.j} - \bar{X}_{..}$? The answer is
“error plus any treatment effect.” Here, because the different groups were
“treated differently” during the experiment, a treatment effect can play
a role in the deviation (in addition to error).
We are now ready to approach one of the core ideas of ANOVA:
We want to get some idea of the variability in the data that is due to “error”
and some idea of the variability that is due to treatments ($\gamma_j$). We accomplish this by partitioning the total
variability into “error-alone” and “error-plus-treatment” components.
Partitioning makes use of a familiar concept, the sum of squares (SS). We know that, in general,
$SS = \sum (X - \bar{X})^2$.
There are three SS’s we are interested in for the one-way between-subjects ANOVA, and the relationship among them is
$SS_{total} = SS_{within} + SS_{between}$
This relationship reflects the “mini-partitioning” we carried out earlier for an individual score when we divided its deviation from the grand mean into two components: the within-groups deviation of the score from its own group mean; and the between-groups deviation of the group mean from the grand mean. Try to convince yourself that the within-groups deviation is an “error-alone” component and the between-groups deviation is an “error-plus-any-treatment-effect” component.
But now, for the partitioning we are about to carry out, and unlike the simple treatment we gave the single score, we are going to work with all of the scores at once; moreover, instead of working with simple differences, which are linear commodities, we are going to work with sums of squares (and, ultimately, true variance estimates), which are area commodities. By the way, the fact that we end up with true variance estimates for some final purpose in an analysis tells us exactly how ANOVA got its name.
Now let’s turn to the partitioning. For this purpose, we will use a reduced data set to keep things manageable. Consider the following data set with three treatment groups:
|  | Group 1 | Group 2 | Group 3 |
|---|---|---|---|
|  | 1 | 5 | 9 |
|  | 2 | 6 | 10 |
|  | 3 | 7 | 11 |
| Mean | 2 | 6 | 10 |
Here is the partitioning of this data set:
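| Score | $(X_{ij} - \bar{X}_{.j})^2$ | $(\bar{X}_{.j} - \bar{X}_{..})^2$ | $(X_{ij} - \bar{X}_{..})^2$ |
|---|---|---|---|
| 1 | 1 | 16 | 25 |
| 2 | 0 | 16 | 16 |
| 3 | 1 | 16 | 9 |
| 5 | 1 | 0 | 1 |
| 6 | 0 | 0 | 0 |
| 7 | 1 | 0 | 1 |
| 9 | 1 | 16 | 9 |
| 10 | 0 | 16 | 16 |
| 11 | 1 | 16 | 25 |
| Sum | 6 | 96 | 102 |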
Here are the formulae that summarize the operations carried out in the partitioning:
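$SS_{within} = \sum_j \sum_i (X_{ij} - \bar{X}_{.j})^2 = 6$
$SS_{between} = \sum_j \sum_i (\bar{X}_{.j} - \bar{X}_{..})^2 = 96$
$SS_{total} = \sum_j \sum_i (X_{ij} - \bar{X}_{..})^2 = 102$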
Also please note that the relationship among sums of squares described by the equation
$SS_{total} = SS_{within} + SS_{between}$
is seen in the partitioning. That is, after substituting the sums we obtained from the partitioning into the equation, we have
$102 = 6 + 96$
which is obviously a valid expression. So everything holds, as it should.
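The partitioning can be reproduced in a few lines of Python. This is a minimal sketch over the reduced data set, using plain lists and no external libraries; the names `ss_within`, `ss_between`, and `ss_total` are chosen here for readability.

```python
# Reduced data set: three treatment groups of three scores each.
groups = [[1, 2, 3], [5, 6, 7], [9, 10, 11]]

all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)
group_means = [sum(g) / len(g) for g in groups]

# Within groups: each score's squared deviation from its own group mean.
ss_within = sum(
    (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
)

# Between groups: the squared deviation of the group mean from the grand
# mean, counted once for every score in that group.
ss_between = sum(
    len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
)

# Total: each score's squared deviation from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

print(ss_within, ss_between, ss_total)    # 6.0 96.0 102.0
print(ss_total == ss_within + ss_between)  # True
```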
As you know, the SS is the numerator of a variance estimate. In analysis of variance, we want actual variance estimates. In ANOVA, we call these variance estimates mean squares (MS).
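You know, in general, that the estimate of a population variance is:
$s^2 = \frac{SS}{N - 1}$.
We do something like that here. But we get two independent variance estimates:
$MS_{between} = \frac{SS_{between}}{k - 1}$
$MS_{within} = \frac{SS_{within}}{N - k}$
(Note that k is the number of levels of the independent variable and N is the total number of scores.)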
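The ratio $MS_{between} / MS_{within}$ is distributed as $F$. Note that F has two parameters, the numerator and denominator degrees of freedom (and their order is important). Putting everything together we have:
$F(k - 1, N - k) = \frac{MS_{between}}{MS_{within}}$.
For the current partitioning we have:
$F(2, 6) = \frac{96 / 2}{6 / 6} = \frac{48}{1} = 48.00$.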
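The steps from sums of squares to the F ratio can be collected into one small function. This is a sketch only: `one_way_anova` is a hypothetical helper written for these notes, not a library routine, and it assumes every group contains at least one score and that there are more scores than groups.

```python
def one_way_anova(groups):
    """Return (df_between, df_within, ms_between, ms_within, f_ratio)."""
    k = len(groups)                        # number of treatment levels
    n_total = sum(len(g) for g in groups)  # total number of scores
    grand_mean = sum(x for g in groups for x in g) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between  # error plus any treatment effect
    ms_within = ss_within / df_within     # error alone
    return df_between, df_within, ms_between, ms_within, ms_between / ms_within


result = one_way_anova([[1, 2, 3], [5, 6, 7], [9, 10, 11]])
print(result)  # (2, 6, 48.0, 1.0, 48.0)
```

The first two values are the two parameters of the F distribution, in the order numerator then denominator.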
If we consult an F table and locate the “.05” cutoff
for the F(2, 6) distribution, we would find that an F of 48.00 is
way beyond the cutoff (i.e., way out into the tail of the distribution). By the
way, there is a point of potential confusion that needs to be addressed: An F
test is a “one-tailed” test on the distribution of F (because it
is a distribution of positive values that can only expand in one direction,
toward $+\infty$); however, conceptually, F is a “two-tailed”
test (because F represents values of squared components which,
unlike linear components, are directionless). So, in order for your F
and t tests to be comparable, all your t-tests should be two-tailed.
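One way to see why F corresponds to a two-tailed t test is the special case of two groups, where F equals the square of the pooled-variance t statistic, so the sign of t is lost. The sketch below illustrates this with the first two groups of the reduced data set; that two-group comparison is an assumption made for illustration and is not part of the analysis above, and the function names are hypothetical.

```python
import math


def pooled_t(a, b):
    """Independent-groups t statistic using the pooled variance estimate."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    std_error = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (mean_a - mean_b) / std_error


def f_ratio(groups):
    """One-way between-subjects F ratio (MS between / MS within)."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(x for g in groups for x in g) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (len(groups) - 1)) / (ss_within / (n_total - len(groups)))


group_1, group_2 = [1, 2, 3], [5, 6, 7]
t = pooled_t(group_1, group_2)
f = f_ratio([group_1, group_2])

print(t < 0)                   # True: t keeps the direction of the difference
print(math.isclose(t ** 2, f))  # True: F only reflects its squared size
```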
Recall that higher-order sampling distributions (like t
and F) are models of reality under the assumption that the null
hypothesis is true. With ANOVA, the null hypothesis states that all
treatment effects are zero. So if F is a model representing the null
hypothesis, and if we observe a value of $MS_{between}$ that is 48 times
as large as $MS_{within}$, that would be pretty unusual under the assumption that
$H_0$ is true. That is, if we consider the null hypothesis to be
true, the “expected value” of F is approximately 1.0. (Look at the definition of the F
ratio above and try to understand why this is so.)
Now if we observe an unusual outcome under a model that assumes the null hypothesis is true, then we are moved to “reject” the null hypothesis as fitting the reality we are observing. Of course, it’s possible that an unusual outcome occurred with the null hypothesis actually true (i.e., our outcome is, after all, represented on the F distribution). If we reject the null hypothesis when it is, in fact, true, then a Type 1 error has occurred. But as long as the chance of a Type 1 error is less than 1 in 20 (i.e., p < .05) in a given situation, we usually decide to reject the null hypothesis as a model for the reality we are observing. In statistical decision-making, the concern is not whether we are actually right or wrong, because that is unknowable; the concern is how we bet.
The more extreme the outcome, the safer the bet. So, if the value of the test statistic is extreme under the assumption that the null hypothesis is true, then there is less chance of a Type 1 error when we reject the null hypothesis. In the present situation, for example, the probability that we are making a Type 1 error is very small, p = .0002 (i.e., only 2 chances in 10,000).
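The tail probability quoted above, and the “.05” cutoff from the F table, can be checked without a table in this particular case. The sketch below relies on a closed form that holds only when the numerator degrees of freedom equal 2, as they do here; for other cases a statistics library (for example, the F distribution in SciPy) would be the usual route. The function names are hypothetical.

```python
def f_upper_tail_df1_is_2(f_value, df_within):
    """P(F >= f_value) for an F(2, df_within) distribution.

    Valid only for 2 numerator degrees of freedom, where the upper-tail
    probability reduces to (1 + 2F/df_within) ** (-df_within / 2).
    """
    return (1 + 2 * f_value / df_within) ** (-df_within / 2)


def f_critical_df1_is_2(alpha, df_within):
    """Cutoff with upper-tail area alpha for F(2, df_within)."""
    # Obtained by solving the tail formula above for F.
    return (df_within / 2) * (alpha ** (-2 / df_within) - 1)


p_value = f_upper_tail_df1_is_2(48.0, 6)
cutoff = f_critical_df1_is_2(0.05, 6)

print(round(p_value, 4))  # 0.0002
print(round(cutoff, 2))   # 5.14
```

The observed F of 48.00 lies far beyond the cutoff of about 5.14, in agreement with the table lookup.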
Once again, good experimenting is all about power: The more powerful the experiment, the larger the observed value of the test statistic will be (if there is, in fact, a causal relationship between IV and DV), and the more extreme that test statistic will be on a distribution representing reality under the null hypothesis, and the more likely you will be to reject the null hypothesis as an adequate model of reality, and the less likely it will be that, in rejecting it, you will be committing a Type 1 error.