The One-Way Between-Subjects Analysis of Variance

 

John K. Adams

 

An analysis of variance (ANOVA) tests two or more means for treatment effects.

 

For example, suppose that we have three groups of subjects and each group has been treated differently:

 

Group 1    Group 2    Group 3
   1          3          5
   2          4          6
   3          5          7
   4          6          8
   5          7          9

 

 

The group means are $\bar{X}_1 = 3$, $\bar{X}_2 = 5$, and $\bar{X}_3 = 7$. Note the value and notation of the grand mean: $\bar{X}_G = 5$.
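As a quick check on the table, the group means and the grand mean can be computed directly. This is only a minimal sketch; the variable names (groups, group_means, grand_mean) are illustrative choices, not part of the original notes.

```python
# Scores from the example table, one list per treatment group.
groups = {
    "Group 1": [1, 2, 3, 4, 5],
    "Group 2": [3, 4, 5, 6, 7],
    "Group 3": [5, 6, 7, 8, 9],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# One mean per treatment group.
group_means = {name: mean(scores) for name, scores in groups.items()}

# The grand mean is the mean of all scores, ignoring group membership.
all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

print(group_means)  # {'Group 1': 3.0, 'Group 2': 5.0, 'Group 3': 7.0}
print(grand_mean)   # 5.0
```

Because the groups here are the same size, the grand mean also equals the mean of the three group means.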

 

Given these observations, we ask the question: “Are the apparent differences in the group means $\bar{X}_j$ due to the different treatments of the three groups, or could the differences be due to sampling error alone?”

 

Formally: If we let $\gamma_j$ (say “gamma sub jay”) stand for “the treatment effect for group j,” then we can form a set of statistical hypotheses appropriate for the one-way ANOVA:

$H_0$: $\gamma_j = 0$ for all j

$H_1$: $\gamma_j \neq 0$ for some j

 

The set of hypotheses is “admissible” because it satisfies two requirements: The members of the set are mutually exclusive (i.e., they don’t overlap); and they are exhaustive (i.e., between them, the two hypotheses cover all possibilities).

 

The alternate hypothesis requires some explanation. The word “some” means from “one” to “all,” inclusive. So if there are k treatment means, the word “some” means that, under the alternate hypothesis, from one mean to all k means could be associated with a non-zero treatment effect. This complexity arises from the fact that, with ANOVA, the number of means is $k \geq 2$.

 

With ANOVA, everything hinges on the concept of variation. Let’s begin our discussion by asking, What could make a given score $X_{ij}$ vary from the grand mean $\bar{X}_G$?

 

 

 

 

Here’s the data table again:

 

Group 1    Group 2    Group 3
   1          3          5
-> 2          4          6
   3          5          7
   4          6          8
   5          7          9

 

 

Consider the score $X_{21}$, the second score in Group 1 (marked with an arrow in the table). The score has the value of “2.” Its deviation from the grand mean $\bar{X}_G$ is

$X_{21} - \bar{X}_G = 2 - 5 = -3$

Now, we could think of that deviation as being composed of two separate deviations:

$X_{21} - \bar{X}_1 = 2 - 3 = -1$        (a within-groups deviation)

and

$\bar{X}_1 - \bar{X}_G = 3 - 5 = -2$        (a between-groups deviation).
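The split of a single deviation into its two parts can be sketched in a few lines. The names below (score, group_mean, grand_mean, and so on) are illustrative assumptions; the numbers are the ones from the table.

```python
# The three treatment groups from the data table.
group_1 = [1, 2, 3, 4, 5]
group_2 = [3, 4, 5, 6, 7]
group_3 = [5, 6, 7, 8, 9]

# The score under discussion: the second score in Group 1.
score = group_1[1]

group_mean = sum(group_1) / len(group_1)            # mean of Group 1
all_scores = group_1 + group_2 + group_3
grand_mean = sum(all_scores) / len(all_scores)      # mean of all 15 scores

total_deviation = score - grand_mean         # score vs. grand mean
within_deviation = score - group_mean        # score vs. its own group mean
between_deviation = group_mean - grand_mean  # group mean vs. grand mean

print(total_deviation, within_deviation, between_deviation)  # -3.0 -1.0 -2.0

# The two components add back up to the total deviation.
assert total_deviation == within_deviation + between_deviation
```

The same identity holds for every score in the table, because the group mean simply cancels when the two components are added.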

 

Now, what causes the within-groups deviation, $X_{ij} - \bar{X}_j$?  The answer is “error alone.” That answer makes sense because every subject in a particular group was treated identically. Identical treatment cannot be expected to produce differences. The only remaining source of differences is something that is always present, error.

 

What causes the between-groups deviation, $\bar{X}_j - \bar{X}_G$?  The answer is “error plus any treatment effect.” Here, because the different groups were “treated differently” during the experiment, a treatment effect can play a role in the deviation (in addition to error).

 

We are now ready to approach one of the core ideas of ANOVA: We want to get some idea of the variability in the data that is due to “error” and some idea of the variability that is due to treatments ($\gamma_j$). We accomplish this by partitioning the total variability into “error-alone” and “error-plus-treatment” components.

 

Partitioning makes use of a familiar concept, the sum of squares (SS). We know that, in general,

$SS = \sum (X - \bar{X})^2$.

There are three SS’s we are interested in for the one-way between-subjects ANOVA (total, within-groups, and between-groups), and the relationship among them is

$SS_T = SS_W + SS_B$

This relationship reflects the “mini-partitioning” we carried out earlier for an individual score when we divided its deviation from the grand mean into two components: the within-groups deviation of the score from its own group mean; and the between-groups deviation of the group mean from the grand mean. Try to convince yourself that the within-groups deviation is an “error-alone” component and the between-groups deviation is an “error-plus-any-treatment-effect” component.
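For readers who want to see why the squared deviations add up the same way the single deviations do, here is a short derivation sketch. The symbols are assumptions introduced for illustration: $X_{ij}$ is score i in group j, $\bar{X}_j$ is the mean of group j, $\bar{X}_G$ is the grand mean, $n_j$ is the number of scores in group j, and $SS_T$, $SS_W$, $SS_B$ are the total, within-groups, and between-groups sums of squares.

```latex
\begin{align*}
SS_T &= \sum_{j}\sum_{i} (X_{ij} - \bar{X}_G)^2 \\
     &= \sum_{j}\sum_{i} \bigl[(X_{ij} - \bar{X}_j) + (\bar{X}_j - \bar{X}_G)\bigr]^2 \\
     &= \sum_{j}\sum_{i} (X_{ij} - \bar{X}_j)^2
      + \sum_{j} n_j (\bar{X}_j - \bar{X}_G)^2
      + 2 \sum_{j} (\bar{X}_j - \bar{X}_G) \sum_{i} (X_{ij} - \bar{X}_j) \\
     &= SS_W + SS_B + 0
\end{align*}
```

The cross-product term drops out because, within any group, the deviations of the scores from their own group mean sum to zero.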

 

But now, for the partitioning we are about to carry out, and unlike the simple treatment we gave the single score, we are going to work with all of the scores at once; moreover, instead of working with simple differences, which are linear commodities, we are going to work with sums of squares (and, ultimately, true variance estimates), which are area commodities. By the way, the fact that we end up with true variance estimates for some final purpose in an analysis tells us exactly how ANOVA got its name.

 

Now let’s turn to the partitioning. For this purpose, we will use a reduced data set to keep things manageable. Consider the following data set with three treatment groups:

 

Group 1

Group 2

Group 3

 

1

5

9

 

2

6

10

 

3

7

11

 

 

Here is the partitioning of this data set:

 

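Score    Within    Between    Total
  1         1        16        25
  2         0        16        16
  3         1        16         9
  5         1         0         1
  6         0         0         0
  7         1         0         1
  9         1        16         9
 10         0        16        16
 11         1        16        25
Sum         6        96       102

Column key: Within = $(X_{ij} - \bar{X}_j)^2$; Between = $(\bar{X}_j - \bar{X}_G)^2$; Total = $(X_{ij} - \bar{X}_G)^2$. The column sums are $SS_W = 6$, $SS_B = 96$, and $SS_T = 102$.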

 

 

 

 

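Here are the formulae that summarize the operations carried out in the partitioning:

$SS_W = \sum_{j}\sum_{i} (X_{ij} - \bar{X}_j)^2 = 6$

$SS_B = \sum_{j}\sum_{i} (\bar{X}_j - \bar{X}_G)^2 = \sum_{j} n_j (\bar{X}_j - \bar{X}_G)^2 = 96$

$SS_T = \sum_{j}\sum_{i} (X_{ij} - \bar{X}_G)^2 = 102$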

 

Also please note that the relationship among sums of squares described by the equation

$SS_T = SS_W + SS_B$

is seen in the partitioning. That is, after substituting the sums we obtained from the partitioning into the equation, we have

$102 = 6 + 96$

which is obviously a valid expression. So everything holds, as it should.
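The partitioning can also be reproduced with a short script. This is a minimal sketch over the reduced data set; the variable names (ss_within, ss_between, ss_total) are illustrative and not taken from the original notes.

```python
# Reduced data set from the text: three treatment groups of three scores each.
groups = [[1, 2, 3], [5, 6, 7], [9, 10, 11]]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


grand_mean = mean([x for g in groups for x in g])

ss_within = 0.0   # squared deviations of scores from their own group mean
ss_between = 0.0  # squared deviations of group means from the grand mean
ss_total = 0.0    # squared deviations of scores from the grand mean

# One pass over every score, accumulating the three columns of the table.
for g in groups:
    group_mean = mean(g)
    for x in g:
        ss_within += (x - group_mean) ** 2
        ss_between += (group_mean - grand_mean) ** 2
        ss_total += (x - grand_mean) ** 2

print(ss_within, ss_between, ss_total)  # 6.0 96.0 102.0

# The total sum of squares equals the sum of its two components.
assert ss_total == ss_within + ss_between
```

Note that the between-groups term is added once per score, which is why each group's squared deviation ends up weighted by its group size.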

 

As you know, the SS is the numerator of a variance estimate. In analysis of variance, we want actual variance estimates. In ANOVA, we call these variance estimates mean squares (MS).

 

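You know, in general, that the estimate of a population variance is:

$s^2 = \frac{SS}{df} = \frac{\sum (X - \bar{X})^2}{n - 1}$.

We do something like that here. But we get two independent variance estimates:

$MS_B = \frac{SS_B}{df_B} = \frac{SS_B}{k - 1}$

$MS_W = \frac{SS_W}{df_W} = \frac{SS_W}{N - k}$

(Note that k is the number of levels of the independent variable and N is the total number of scores.)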

 

 

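The ratio $MS_B / MS_W$ is distributed as $F(k - 1, N - k)$. Note that F has two parameters, the between-groups and within-groups degrees of freedom (and their order is important). Putting everything together we have:

$F(k - 1, N - k) = \frac{MS_B}{MS_W} = \frac{SS_B / (k - 1)}{SS_W / (N - k)}$.

For the current partitioning (k = 3 groups, N = 9 scores) we have:

$F(2, 6) = \frac{96 / 2}{6 / 6} = \frac{48}{1} = 48.00$.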
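The step from sums of squares to mean squares to F can be sketched as a small function. The function name one_way_anova and its return layout are hypothetical choices made for this illustration only.

```python
def one_way_anova(groups):
    """Return (ms_between, ms_within, f_ratio, df_between, df_within)."""
    k = len(groups)                           # number of treatment groups
    n_total = sum(len(g) for g in groups)     # total number of scores
    grand_mean = sum(x for g in groups for x in g) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-groups SS: each group mean's squared deviation, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-groups SS: each score's squared deviation from its own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within, df_between, df_within


result = one_way_anova([[1, 2, 3], [5, 6, 7], [9, 10, 11]])
print(result)  # (48.0, 1.0, 48.0, 2, 6)
```

The sketch assumes at least two groups and some within-group variability; with a within-groups sum of squares of zero the ratio would be undefined.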

 

If we consult an F table and locate the “.05” cutoff for the F(2, 6) distribution (5.14), we would find that an F of 48.00 is far beyond the cutoff (i.e., far out in the tail of the distribution). By the way, there is a point of potential confusion that needs to be addressed: An F test is a “one-tailed” test on the distribution of F (because it is a distribution of positive values that can only expand in one direction, toward $+\infty$); however, conceptually, F is a “two-tailed” test (because F represents values of squared components which, unlike linear components, are directionless). So, in order for your F and t tests to be comparable, all your t tests should be two-tailed.
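If no F table is at hand, the “.05” cutoff for this particular case can be computed directly. The sketch below relies on a closed form that holds only when the numerator degrees of freedom equal 2, as they do here; it is not a general F-table replacement, and the function name is a hypothetical one chosen for illustration.

```python
def f_cutoff_numerator_df_2(alpha, df_within):
    """Upper-tail cutoff of F(2, df_within).

    Valid ONLY when the numerator df is 2. In that case the upper-tail
    probability is (1 + 2F/df_within) ** (-df_within / 2), and solving
    that expression for F at probability alpha gives the cutoff.
    """
    return (df_within / 2) * (alpha ** (-2 / df_within) - 1)


cutoff = f_cutoff_numerator_df_2(0.05, 6)
observed_f = 48.00

print(round(cutoff, 2))     # 5.14
print(observed_f > cutoff)  # True
```

For other numerator degrees of freedom, a printed table or a statistics package would be needed instead.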

 

Recall that higher-order sampling distributions (like t and F) are models of reality under the assumption that the null hypothesis is true. With ANOVA, the null hypothesis states that all treatment effects are zero. So if F is a model representing the null hypothesis, and if we observe a value of $MS_B$ (the between-groups mean square) that is 48 times larger than $MS_W$ (the within-groups mean square), that would be pretty unusual under the assumption that $H_0$ is true. That is, if we consider the null hypothesis to be true, the “expected value” of F is approximately 1.0. (Look at the definition of the F ratio above and try to understand why this is so.)

 

Now if we observe an unusual outcome under a model that assumes the null hypothesis is true, then we are moved to “reject” the null hypothesis as fitting the reality we are observing. Of course, it’s possible that an unusual outcome occurred with the null hypothesis actually true (i.e., our outcome is, after all, represented on the F distribution). If we reject the null hypothesis when it is, in fact, true, then a Type 1 error has occurred. But as long as the chance of a Type 1 error is less than 1 in 20 (i.e., p < .05) in a given situation, we usually decide to reject the null hypothesis as a model for the reality we are observing. In statistical decision-making, the concern is not whether we are actually right or wrong, because that is unknowable; the concern is how we bet.
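The “1 in 20” figure can be illustrated with a small simulation in which the null hypothesis is true by construction. This is only a sketch under stated assumptions: three groups of three scores, all drawn from the same normal population, and a hard-coded cutoff of 5.14 (the tabled .05 value for F(2, 6)). The function and variable names are hypothetical.

```python
import random


def f_ratio(groups):
    """Between-groups mean square divided by within-groups mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(x for g in groups for x in g) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


random.seed(0)           # fixed seed so the run is repeatable
CUTOFF = 5.14            # tabled .05 cutoff for F(2, 6)
n_experiments = 20000
rejections = 0

for _ in range(n_experiments):
    # Every group comes from the SAME population, so all treatment effects are zero.
    groups = [[random.gauss(0, 1) for _ in range(3)] for _ in range(3)]
    if f_ratio(groups) > CUTOFF:
        rejections += 1  # rejecting a true null hypothesis: a Type 1 error

type_1_rate = rejections / n_experiments
print(type_1_rate)       # should land close to 0.05
```

Every rejection counted here is a Type 1 error, so the printed proportion estimates how often the .05 decision rule rejects a null hypothesis that is actually true.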

 

The more extreme the outcome, the safer the bet. So, if the value of the test statistic is extreme under the assumption that the null hypothesis is true, then there is less chance of a Type 1 error when we reject the null hypothesis. In the present situation, for example, the probability of observing an F this large or larger when the null hypothesis is true is very small, p = .0002 (i.e., only 2 chances in 10,000).
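The p value quoted for this example can be checked by hand. The sketch below uses a closed-form upper-tail probability that applies only when the numerator degrees of freedom equal 2 (as in F(2, 6)); the function name is a hypothetical one chosen for illustration.

```python
def f_upper_tail_numerator_df_2(f_value, df_within):
    """P(F >= f_value) for an F(2, df_within) distribution.

    Valid ONLY when the numerator df is 2.
    """
    return (1 + 2 * f_value / df_within) ** (-df_within / 2)


# Observed F of 48.00 on 2 and 6 degrees of freedom.
p_value = f_upper_tail_numerator_df_2(48.0, 6)
print(round(p_value, 4))  # 0.0002
```

With these numbers the expression reduces to 17 raised to the power of -3, or about 1 chance in 4,913.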

 

Once again, good experimenting is all about power: The more powerful the experiment, the larger the observed value of the test statistic will be (if there is, in fact, a causal relationship between IV and DV), and the more extreme that test statistic will be on a distribution representing reality under the null hypothesis, and the more likely you will be to reject the null hypothesis as an adequate model of reality, and the less likely it will be that, in rejecting it, you will be committing a Type 1 error.