Friday, 20 December 2013

Interpreting the two-way ANOVA test

In this blog post, I will try to explain how to interpret the two-way ANOVA test using a simple example.

Suppose we are testing how the yield of a crop plant depends on the seed type and on the field it is planted in, so we have two factors: Seed Type and Field Type. The yield could be the number of grains in a plant. For the first factor, let us assume we have three seed types, which we call Seed 1, Seed 2 and Seed 3. For the second factor, let us assume we have two field types, which we denote Field 1 and Field 2. For each combination of field type and seed type, let us assume we have three samples (also known as replicates). We can represent the results in a table as below, where the entry $a_{ij}(k)$ is the number of grains in the $k$-th replicate plant for Field $i$ and Seed $j$.


| | Seed 1 | Seed 2 | Seed 3 |
|---|---|---|---|
| Field 1 | $a_{11}(1),a_{11}(2),a_{11}(3)$ | $a_{12}(1),a_{12}(2),a_{12}(3)$ | $a_{13}(1),a_{13}(2),a_{13}(3)$ |
| Field 2 | $a_{21}(1),a_{21}(2),a_{21}(3)$ | $a_{22}(1),a_{22}(2),a_{22}(3)$ | $a_{23}(1),a_{23}(2),a_{23}(3)$ |

Now, in a two-way ANOVA test, we calculate an F statistic for factor 1, for factor 2 and for the interaction between them. From each F statistic, we then calculate a p-value for factor 1, factor 2 and the interaction. What do we mean by these values?
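To make the calculation concrete, here is a minimal sketch of how the three F statistics can be computed for a balanced design like the one in the table. The function name `two_way_anova` and the yield numbers are made up for illustration; they are not real data. The sketch assumes every cell has the same number of replicates.

```python
def two_way_anova(cells):
    """Balanced two-way ANOVA with replication.

    cells[i][j] holds the replicate values for field i and seed j.
    Returns {effect: (F, df_effect, df_error)}.
    """
    a = len(cells)        # number of field types
    b = len(cells[0])     # number of seed types
    n = len(cells[0][0])  # replicates per cell (assumed equal in every cell)

    def mean(xs):
        return sum(xs) / len(xs)

    # Cell means, marginal means for each factor, and the grand mean.
    cell = [[mean(cells[i][j]) for j in range(b)] for i in range(a)]
    field = [mean(cell[i]) for i in range(a)]
    seed = [mean([cell[i][j] for i in range(a)]) for j in range(b)]
    grand = mean(field)

    # Sums of squares for each source of variation.
    ss_field = b * n * sum((m - grand) ** 2 for m in field)
    ss_seed = a * n * sum((m - grand) ** 2 for m in seed)
    ss_inter = n * sum(
        (cell[i][j] - field[i] - seed[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_error = sum(
        (x - cell[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )

    df_field, df_seed = a - 1, b - 1
    df_inter = df_field * df_seed
    df_error = a * b * (n - 1)
    ms_error = ss_error / df_error

    # Each F statistic is the effect's mean square over the error mean square.
    return {
        "field": (ss_field / df_field / ms_error, df_field, df_error),
        "seed": (ss_seed / df_seed / ms_error, df_seed, df_error),
        "interaction": (ss_inter / df_inter / ms_error, df_inter, df_error),
    }


# Made-up yields: yields[i][j] = replicates for Field i+1, Seed j+1.
yields = [
    [[9, 10, 11], [14, 15, 16], [19, 20, 21]],  # Field 1
    [[5, 6, 7], [10, 11, 12], [15, 16, 17]],    # Field 2
]

results = two_way_anova(yields)
for effect, (f_stat, df_effect, df_error) in results.items():
    print(f"{effect}: F = {f_stat:.2f} on ({df_effect}, {df_error}) degrees of freedom")
```

With these made-up numbers the F statistics are 72 for the field factor, 150 for the seed factor and 0 for the interaction. Each p-value is the upper-tail probability of the F distribution at the statistic, with the effect's degrees of freedom and the error degrees of freedom; in SciPy, for instance, this is `scipy.stats.f.sf(F, df_effect, df_error)`.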

A very low p-value for factor 1 (Seed Type), i.e. a significant result for the first factor, indicates that the mean yields are not the same for all seed types. Suppose this is indeed the case, with Seed 3 having the highest yield, followed by Seed 2, then Seed 1. The mean values of the yield for Field 1 could look as follows.

[Figure: Replicate means for field 1. Axes: seed yield means against seed type number.]

Now let us look at the second factor, Field Type, and suppose its p-value is very low as well (i.e. the result is significant for the second factor). This tells us that the plant yields differ between the field types. Suppose that Field 2 has the lower-yielding plants, as it has poorer irrigation than Field 1. If we plot the means of the three seed types for the two fields, we might obtain the result below.

[Figure: Replicate means for fields 1 and 2. Axes: seed yield means against seed type number.]

Examining the plot above, we are in a position to describe what p-value the interaction will take. Note that the two lines of means are parallel: the difference in means between Field 1 and Field 2 is the same for all three seed types. The F statistic for the interaction will then be close to zero, so its p-value will tend to 1, and there will be no significant interaction.

A worthwhile question to pose is: what if there were a significant interaction? In such a scenario, the difference in yield between Field 1 and Field 2 would not be the same for every seed type. For example, the difference in means for Seed 3 could be much smaller, resulting in the plot below. This means that there is an interaction between seed type and field type: Seed 3 appears to be more resistant to a lower water supply, for example.

[Figure: Replicate means for fields 1 and 2, with interaction. Axes: seed yield means against seed type number.]
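The contrast between parallel and non-parallel mean lines can also be checked numerically. The sketch below computes, for each cell, the part of the cell mean that is left over after removing the field effect and the seed effect; these leftovers are what the interaction sum of squares is built from. The function names and the cell means are hypothetical, chosen only to mimic the two scenarios described above.

```python
def interaction_effects(cell_means):
    """Cell mean minus field mean minus seed mean plus grand mean, per cell."""
    a, b = len(cell_means), len(cell_means[0])
    field = [sum(row) / b for row in cell_means]
    seed = [sum(cell_means[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(field) / a
    return [
        [cell_means[i][j] - field[i] - seed[j] + grand for j in range(b)]
        for i in range(a)
    ]


def field_gaps(cell_means):
    """Difference between Field 1 and Field 2 means for each seed type."""
    return [f1 - f2 for f1, f2 in zip(cell_means[0], cell_means[1])]


# Hypothetical cell means; rows are Field 1 and Field 2, columns are Seed 1 to 3.
parallel = [[10.0, 15.0, 20.0], [6.0, 11.0, 16.0]]
non_parallel = [[10.0, 15.0, 20.0], [6.0, 11.0, 19.0]]  # Seed 3 barely affected

for name, means in [("parallel", parallel), ("non-parallel", non_parallel)]:
    print(name, "gaps:", field_gaps(means))
    print(name, "interaction effects:", interaction_effects(means))
```

In the parallel case the gap between the fields is the same for every seed type and every interaction effect is zero, so the interaction sum of squares vanishes. In the non-parallel case the gap for Seed 3 is smaller and the effects are non-zero; whether that is statistically significant still depends on how large they are relative to the variation between replicates.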