
where j = 1 if an observation has the propofol treatment, and j = 2 if it does not. The two unknown variance parameters are σ1 and σ2. An explanatory variable that is used in the variance is called a variance covariate. With three drug treatment variables there is a lot of choice, as we can also use combinations of them (e.g. j = 1 if an observation received both the propofol and ethanol treatments, and j = 2 otherwise). All these variance structures can be compared with each other using the AIC. However, instead of applying every possible combination, it is better to try to understand why we have heterogeneity, and for this we need to plot the residuals of the GAM in Eq. 8.8 against each drug treatment, each combination of drug treatments, Series and Larval stage. The explanatory variable that shows the strongest heterogeneity can be selected as the variance covariate. In the previous section, we only showed the graph of residuals plotted versus Larval stage, as it showed the strongest violation of homogeneity. This means that Larval stage is the prime candidate to be the first variance covariate. Therefore, we can use the variance structure in Eq. 8.9b, where j = 1, ..., 4. Hence, we have four variances, and each variance corresponds to a Larval stage. The observations from a particular stage all have the same residual spread, but different stages are allowed to have different spreads.
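To make the idea concrete, here is a minimal sketch (plain Python; the residual values are invented purely for illustration) of estimating one residual spread per Larval stage, as in Eq. 8.9b:

```python
import statistics
from collections import defaultdict

# Hypothetical (stage, residual) pairs from a GAM such as Eq. 8.8; the
# numbers are invented purely to illustrate the variance covariate idea.
residuals = [
    (1, -0.2), (1, 0.1), (1, 0.3), (1, -0.1),
    (2, -0.8), (2, 0.9), (2, 0.5), (2, -0.6),
    (3, -1.5), (3, 1.8), (3, 1.2), (3, -1.1),
    (4, -2.9), (4, 3.1), (4, 2.4), (4, -2.6),
]

# Group residuals by Larval stage and estimate one spread sigma_j per
# stage (j = 1, ..., 4), mirroring Eq. 8.9b.
by_stage = defaultdict(list)
for stage, r in residuals:
    by_stage[stage].append(r)

sigma_by_stage = {j: statistics.stdev(rs) for j, rs in sorted(by_stage.items())}
for j, s in sigma_by_stage.items():
    print(f"stage {j}: sigma_hat = {s:.2f}")
```

A clear increase of the estimated spread with stage is exactly the pattern that makes Larval stage a good variance covariate.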

The first thing we have to do is to investigate whether this is actually a significant improvement. This is also part of the model selection process. Unfortunately, for an (additive) linear mixed effects model, the model selection process is slightly more complicated and consists of the following steps.

1. Start with a linear regression model or additive model that you think contains all the important explanatory variables in the fixed part. If this model shows homogeneity and independence, you are finished.

2. If the model shows heterogeneity or dependence, then refit the model using restricted maximum likelihood (REML).

3. Inspect the residuals of the model fitted, and add a correlation or heterogeneity structure (or random intercepts).

4. Compare the model from step 2 with that in step 3. This can be done with the AIC or a likelihood ratio test. Both models must contain the same fixed structure.

5. Inspect the residuals of the new model; if the residuals are homogeneous and independent, go to step 6, else go to step 3 and add another heterogeneity or dependence structure.

6. Now that we have the optimal random structure it is time to find the optimal fixed structure. This process is similar to a model selection in linear regression; drop terms in turn and apply a likelihood ratio test. To compare nested models via a likelihood ratio test, use maximum likelihood estimation.

7. Present the final model estimated with REML and give an interpretation.
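The AIC comparison in step 4 is easy to sketch (Python; the two log-likelihood values below are invented, chosen only so that their difference reproduces the likelihood ratio statistic reported later in this section):

```python
# Toy AIC comparison for step 4 of the protocol: a model with one residual
# variance versus a model with four stage-specific variances. The
# log-likelihoods are hypothetical; k counts estimated parameters.
def aic(log_lik, k):
    return -2.0 * log_lik + 2.0 * k

loglik_one_variance = -250.00    # hypothetical
loglik_four_variances = -161.92  # hypothetical; 3 extra parameters

aic_one = aic(loglik_one_variance, k=6)
aic_four = aic(loglik_four_variances, k=9)

# The likelihood ratio statistic is twice the log-likelihood difference.
L_stat = 2.0 * (loglik_four_variances - loglik_one_variance)
print(aic_one, aic_four, L_stat)
```

The model with the lower AIC is preferred; note that the AIC penalises the three extra variance parameters, whereas the likelihood ratio test converts the same log-likelihood difference into a formal test.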

The justification of this protocol can be found in Diggle et al. (2002), and applications can be found in West et al. (2006) and Zuur et al. (2007, 2009). It contains phrases like REML estimation and nested models, but an explanation of these terms is outside the scope of this chapter.

We believe that the model in Eq. 8.8 is a reasonable starting point for step 1 of the protocol. We then added the variance structure in Eq. 8.9b, where Larval stage was the variance covariate. The likelihood ratio test shows that adding the multiple variances gives a significant improvement (L = 176.16, df = 3, p < 0.001). In step 5 of the protocol, we need to assess whether adding a variance covariate results in a model that complies with the assumptions. For this, you need to use the standardised residuals, which are defined as observed minus fitted values, divided by the square root of the variance in Eq. 8.9b. We plotted the standardised residuals versus Larval stage (Fig. 8.7a), fitted values (Fig. 8.7b) and Series (Fig. 8.7c) and none of these graphs showed any clear violation of homogeneity or independence, hence we can proceed to step 6 of the protocol.
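The reported p-value can be checked against the chi-squared distribution with 3 degrees of freedom. A stdlib-only sketch, using the closed-form survival function for df = 3:

```python
import math

def chi2_sf_df3(x):
    # Survival function P(X > x) of a chi-squared variable with 3 df:
    # P(X > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

# Likelihood ratio statistic from the text: L = 176.16 with df = 3.
p_value = chi2_sf_df3(176.16)
print(f"p = {p_value:.2e}")  # far smaller than 0.001
```

Strictly speaking, testing on the boundary of the parameter space makes this p-value conservative, but with a statistic this large the conclusion does not change.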

Fig. 8.7 (a) Standardised residuals versus Larval stage. (b) Standardised residuals versus fitted values. (c) Standardised residuals versus Series. Note that there is no clear violation of homogeneity or independence in these graphs

Table 8.3 Estimated regression parameters and approximate significance of the smoother:

|                              | Estimate | S.E.    | t value | Pr(>\|t\|) |
|------------------------------|----------|---------|---------|------------|
| (Intercept)                  | 13.87033 | 0.09789 | 141.697 | <2e-16     |
| Amino_FZ with                | 0.52095  | 0.11529 | 4.519   | 7.3e-06    |
| Propofol with                | -0.29017 | 0.08106 | -3.580  | 0.000368   |
| Ethanol with                 | 0.11874  | 0.11413 | 1.040   | 0.298516   |
| Amino_FZ with × Ethanol with | -0.56055 | 0.16221 | -3.456  | 0.000582   |

Approximate significance of smooth terms:

|           | edf  | Est.rank | F     | p-value |
|-----------|------|----------|-------|---------|
| s(Series) | 8.56 | 9        | 2,737 | <2e-16  |

In the second part of the protocol, we need to find the optimal fixed structure for the selected random structure. This means that we need to assess the significance of the three-way drug interaction term and the smoother of Series. Using a likelihood ratio test, the three-way interaction was not significant. Using the same procedure, two two-way drug interaction terms were dropped, and the final model is as follows.

Length_i = α + β1 × Amino_FZ_i + β2 × Propofol_i + β3 × Ethanol_i

+ β4 × Amino_FZ_i × Ethanol_i + f(Series_i) + ε_i  (8.10)

Hence, there are three main terms, a two-way interaction between Amino_FZ and Ethanol, and a smoother representing the growth curve. So far, we have not presented any numerical output, as we first wanted to go over the underlying assumptions that need to be fulfilled. As this seems to be the best model, we now concentrate on the main terms and interactions (the fixed part of the model), and we provide information on p-values and fitted curves.

The estimated values for the regression parameters are given in Table 8.3. Note that the two-way interaction between Amino_FZ and Ethanol is highly significant.
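To read Table 8.3 as predictions, the following sketch (Python, using the rounded estimates; the smoother's contribution is left out) computes the parametric contribution for all eight drug treatment combinations:

```python
from itertools import product

# Rounded estimates from Table 8.3; coding: 1 = "with", 0 = "without",
# in the order (Amino_FZ, Propofol, Ethanol).
intercept = 13.87
b_amino, b_prop, b_eth, b_inter = 0.52, -0.29, 0.12, -0.56

shifts = {}
for a, p, e in product((0, 1), repeat=3):
    shifts[(a, p, e)] = (intercept + b_amino * a + b_prop * p
                         + b_eth * e + b_inter * a * e)

# An observation with only the Amino_FZ treatment:
print(round(shifts[(1, 0, 0)], 2))  # 13.87 + 0.52 = 14.39
```

Each value is the vertical level around which the smoother for that treatment combination is drawn.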

When confronted with a GAM for the first time, one tends to be confused by the output and its interpretation. We therefore present the GAM that we are fitting again, but this time with the estimated parameters substituted.

Length_i = 13.9 + 0.5 × Amino_FZ_i − 0.3 × Propofol_i + 0.1 × Ethanol_i − 0.6 × Amino_FZ_i × Ethanol_i + f(Series_i)  (8.11a)

The coding of the drug variables in the statistical software is such that if a drug treatment is equal to "without", the corresponding variable is 0, and 1 otherwise. Therefore, for an observation that was without any drug treatment (Amino_FZ = Propofol = Ethanol = "without"), we get the following equation:

Length_i = 13.9 + f(Series_i)  (8.11b)

If an observation has, for example, only the Amino_FZ treatment, then we have to add 0.52 to the intercept. Hence, the estimated regression parameters in Table 8.3 show the difference between "without" and "with" for each drug treatment and the interaction. The last thing we need to discuss is the role of the smoother, which is given in Fig. 8.8.

Fig. 8.8 Estimated smoother for the GAM in Eq. 8.11a. The solid line is the smoother and the dotted lines are the 95% point-wise confidence bands. The horizontal axis shows the values of Series, and the vertical axis is the contribution of the smoother to the fitted values. The smoother has 8.56 degrees of freedom

Fig. 8.9 Fitted values and 95% confidence intervals for the mean. Bootstrapping was used to obtain the 95% confidence intervals. The solid lines are the predicted values and the dotted lines the 95% confidence intervals for the mean. The notation APE stands for the drug treatments Amino_FZ, Propofol and Ethanol, and 0 for "without" and 1 for "with"

Suppose that we have an observation that did not receive any drug treatment; its underlying equation is then given in Eq. 8.11b. In order to get the fitted values for all such observations, we need to add the value of 13.9 to the smoother in Fig. 8.8. This means that the smoother is shifted vertically by 13.9. Similar vertical shifts can be calculated for the other drug treatment combinations, and the resulting eight curves are given in Fig. 8.9. Note that the confidence intervals


are for the mean relationship. This means that if you repeat this experiment a large number of times, in 95% of the cases the real mean relationship is within the confidence bands. It is also possible to add confidence intervals for the population, and these show the range of possible length values at a certain time and drug treatment combination (Fig. 8.10).

Fig. 8.10 Fitted values and 95% confidence intervals for the population. Bootstrapping was used to obtain the 95% confidence intervals

For ordinary linear regression models and GAMs, it is trivial to get the confidence intervals for the population; see, for example, Montgomery and Peck (1992) or Draper and Smith (1998). Suppose that the variance of the fitted values is given by σ_fit² and that of the residuals by σ². A confidence interval for the mean around a linear regression line or smoother is based on σ_fit². For the population confidence intervals, you need to add these two variances, and the square root of the sum can be used to obtain confidence intervals for the population values. Hence, the confidence intervals for the population are always larger. A problem arises because we use multiple variances σ_j² for the residuals, where j stands for Larval stage. Therefore, we cannot use standard statistical equations to obtain a confidence interval for the population values around the smoother. There are multiple ways of still obtaining a confidence interval for the population, and whichever approach we used, the results were similar. The easiest approach is bootstrapping.
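The bootstrap itself is simple to sketch. Below is a minimal percentile bootstrap for a mean (plain Python; the length values are invented, and the real analysis would resample the full model fit rather than a single sample):

```python
import random
import statistics

random.seed(42)

# Hypothetical lengths for one treatment combination at one time point;
# invented to illustrate the mechanics, not taken from the data.
lengths = [13.1, 14.0, 13.6, 14.4, 13.8, 14.9, 13.3, 14.2, 13.9, 14.6]

def percentile(sorted_vals, q):
    # Simple nearest-rank percentile, good enough for a sketch.
    idx = min(int(q * len(sorted_vals)), len(sorted_vals) - 1)
    return sorted_vals[idx]

# Resample with replacement many times; the 2.5% and 97.5% quantiles of
# the resampled means form a 95% percentile interval.
boot_means = []
for _ in range(2000):
    sample = [random.choice(lengths) for _ in lengths]
    boot_means.append(statistics.fmean(sample))
boot_means.sort()

lower = percentile(boot_means, 0.025)
upper = percentile(boot_means, 0.975)
print(f"95% bootstrap CI for the mean: ({lower:.2f}, {upper:.2f})")
```

For population intervals such as those in Fig. 8.10, the same resampling idea is applied to the fitted model, so that the stage-specific residual spread is carried through to the interval.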
