# Libraries

```
library(plyr)
library(tidyr)
```

# Reviewing Analysis of Variance (ANOVA)

I’ll be following Chapter 10 of *A Primer of Ecological Statistics* by Gotelli and Ellison

Key Concepts:

ANOVA aims to determine differences in a continuous variable between 2 or more groups.

ANOVA bulit on partitioning on the concept of paritioning of the sum of squares (\(SS_{total}\)). How do we calculate this?

The total sum of squares of the data is the sum of squared deviations of each observation (\(Y_i\)) from the grand mean (\(\bar{Y}\)). There are \(i\) = 1 to \(a\) treatment levels and \(j\) = 1 to \(n\) replicates per treatment.

**recap**:

- \(i\) refers to the treatment levels
- it looks like \(a\) represents the number of different treatments

- it looks like \(a\) represents the number of different treatments
- \(j\) refers to each observations, with \(n\) being the number of replicates

\[ SS_{total} = \sum_{i=1}^a\sum_{j=1}^n (Y_{ij}-\bar{Y})^2\]

The total sum of squares is the deviation of each observation from the grand mean.

\(SS_{total}\) can then be partitioned into different components, mainly **among and within groups**.

\[SS_{total} = SS_{among} + SS_{within}\] So for the *among* group SS:

\[ SS_{among} = \sum_{i=1}^a\sum_{j=1}^n (\bar{Y}_{i}-\bar{Y})^2\] So for the *within* group SS:

\[ SS_{within} = \sum_{i=1}^a\sum_{j=1}^n (Y_{ij}-\bar{Y}_i)^2\]

**Bringing it all together**:

\[ \sum_{i=1}^a\sum_{j=1}^n (Y_{ij}-\bar{Y})^2 = \sum_{i=1}^a\sum_{j=1}^n (\bar{Y}_{i}-\bar{Y})^2 + \sum_{i=1}^a\sum_{j=1}^n (Y_{ij}-\bar{Y}_i)^2\]

## Enter data

```
#data,
# a = 3, 3 treatments
# n = 4, 4 reps
n=4
unman<-c(10,12,12,13)
control<-c(9,11,11,12)
treat<-c(12,13,15,16)
wide.dat<-data.frame(unman,control,treat);wide.dat
```

```
## unman control treat
## 1 10 9 12
## 2 12 11 13
## 3 12 11 15
## 4 13 12 16
```

`long.dat<-gather(wide.dat,treatment,measure,unman:treat);long.dat`

```
## treatment measure
## 1 unman 10
## 2 unman 12
## 3 unman 12
## 4 unman 13
## 5 control 9
## 6 control 11
## 7 control 11
## 8 control 12
## 9 treat 12
## 10 treat 13
## 11 treat 15
## 12 treat 16
```

```
#global mean
grandmean<-round(mean(c(unman,control,treat)),2);grandmean
```

`## [1] 12.17`

## \(SS_{total}\) calculations

`sum((long.dat$measure-grandmean)^2)`

`## [1] 41.6668`

## \(SS_{among}\) calculations

```
## calculating 1 case
(mean(unman)-grandmean)^2
```

`## [1] 0.1764`

```
### making a whole function
#with ddply
ssa<-function(n=n,vec=c(1,3,3),grandmean=grandmean){
SSa<-(mean(vec)-grandmean)^2
SSa
}
ssa(vec=unman,n=n,grandmean=grandmean) # verify function
```

`## [1] 0.1764`

```
## executing function
ssam<-ddply(long.dat,.(treatment),summarize,ssamong=ssa(vec=measure,n=n,grandmean=grandmean));ssam
```

```
## treatment ssamong
## 1 control 2.0164
## 2 treat 3.3489
## 3 unman 0.1764
```

`SSAM<-n*sum(ssam$ssamong);SSAM`

`## [1] 22.1668`

`# for a balanced design!`

## \(SS_{within}\) calculations

```
## calculating 1 case
sum((unman-mean(unman))^2)
```

`## [1] 4.75`

```
### making a whole function
#with ddply
sswi<-function(x){
SSwithin<-sum((x-mean(x))^2)
SSwithin
}
sswi(unman) # verify function
```

`## [1] 4.75`

`SSwi<-ddply(long.dat,.(treatment),summarize,sswi=sswi(measure));SSwi`

```
## treatment sswi
## 1 control 4.75
## 2 treat 10.00
## 3 unman 4.75
```

`sumwithin<-sum(SSwi$sswi);sumwithin`

`## [1] 19.5`

# Assumptions of ANOVAs

- Samples are indepednent and identically distributed.
- Variances are homogeneous among groups
- variance within each group approx to variance within other groups

- variance within each group approx to variance within other groups
- Residuals are normally distributed

- Samples are classified correctly

- Maine effects are additive

# Hypothesis testing

Definitions:

- \(Y_{ij}\): replicated \(j\) associated with treatment level \(i\)

- \(\mu\) is the
*true*grand mean or average

- \(A_i\) is the additive linear component associated with level \(i\) of treatment \(A\).
- There is a different coefficient \(A_i\) associated with each treatment level (\(i\)).

- positive coefficients mean that the treatment level has a higher value than grand mean

Alternative Hypothesis, \(H_a\): \(Y_{ij} = \mu + A_i + \epsilon_{ij}\)

Null Hypothesis, \(H_o\): \(Y_{ij} = \mu + \epsilon_{ij}\)

# Verify with aov() function

`knitr::kable(round(summary(aov(measure~treatment,data=long.dat))[[1]],2))`

Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
---|---|---|---|---|---|

treatment | 2 | 22.17 | 11.08 | 5.12 | 0.03 |

Residuals | 9 | 19.50 | 2.17 | NA | NA |