Bootstrapping

To obtain the 95% confidence intervals in Figs. 8.9 and 8.10, bootstrapping was used, following Keele (2008). In a bootstrapping procedure, we create a large number (say 1,000) of similar data sets, and we fit the GAMM on each of these data sets.

The 1,000 estimated smoothers can be used to obtain confidence intervals. We implemented the following algorithm.

1. Fit the GAMM and store the fitted values and residuals. Denote these by F_i and E_i, respectively.

2. For j = 1 to 1,000, carry out the following steps.

a. Permute the residuals, and call these E_i^b, where i is the observation index. Because the variance structure in Eq. 8.9b is used, we cannot close our eyes and randomly permute all the residuals. Instead, we need to permute only among residuals that have the same σ. Formulated differently, we permute the residuals within the same larval stage.

b. Add the residuals E_i^b to the fitted values F_i and apply the GAM in Eq. 8.10 to the bootstrapped data Y_i^b = F_i + E_i^b.

c. Predict the fitted values for each drug combination and store these in a matrix B.

3. Once the bootstrap loop has finished, sort the 1,000 bootstrapped values for each observation i, and take the median and the 25th and 975th sorted values. The latter two form the lower and upper bands of the 95% quantile confidence interval.
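The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the figures: the function names are hypothetical, a simple running mean stands in for the GAMM and the GAM of Eq. 8.10, and predictions are made at the observed points rather than for each drug combination.

```python
import random
import statistics


def moving_average(y, half_window=2):
    """Stand-in smoother (NOT a GAM): centred running mean of y."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def permute_within_groups(resid, groups, rng):
    """Step 2a: permute residuals only among observations with the same group label."""
    shuffled = list(resid)
    by_group = {}
    for idx, g in enumerate(groups):
        by_group.setdefault(g, []).append(idx)
    for idxs in by_group.values():
        values = [resid[i] for i in idxs]
        rng.shuffle(values)
        for i, v in zip(idxs, values):
            shuffled[i] = v
    return shuffled


def bootstrap_bands(y, groups, n_boot=1000, seed=1):
    """Return (lower, median, upper) 95% quantile bands, one value per observation."""
    rng = random.Random(seed)
    fitted = moving_average(y)                           # step 1: fitted values F_i
    resid = [yi - fi for yi, fi in zip(y, fitted)]       # step 1: residuals E_i
    B = []                                               # one row per bootstrap sample
    for _ in range(n_boot):                              # step 2
        e_b = permute_within_groups(resid, groups, rng)  # 2a
        y_b = [f + e for f, e in zip(fitted, e_b)]       # 2b: Y_i^b = F_i + E_i^b
        B.append(moving_average(y_b))                    # 2c: refit and store
    lo_idx = max(0, round(0.025 * n_boot) - 1)           # 25th sorted value if n_boot = 1000
    hi_idx = min(n_boot - 1, round(0.975 * n_boot) - 1)  # 975th sorted value if n_boot = 1000
    lower, median, upper = [], [], []
    for i in range(len(y)):                              # step 3
        col = sorted(row[i] for row in B)
        lower.append(col[lo_idx])
        median.append(statistics.median(col))
        upper.append(col[hi_idx])
    return lower, median, upper
```

Here groups would hold the larval stage of each observation, so that residuals are exchanged only between observations that share a residual variance.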

The method above gives the bootstrapped 95% quantile confidence interval for the mean of a GAM (Keele 2008). To obtain the 95% quantile confidence interval for the population, there are various options. In the first option that we tried, we added a random value ε from a Normal distribution with mean 0 and variance σ_j^2, the variance for larval stage j, to the predicted values in step 2c. The only problem is that we don't know the exact larval stage at a certain time. We therefore sampled larval stages from a multinomial logistic regression model in which drug treatment and Series were used as explanatory variables. This was done in each bootstrap iteration. As an alternative, larval stages can be drawn from a distribution based on the observed frequencies in the measured data. Another option is to store the 1,000 predicted standard errors for the population as well, and use the median standard error in Fig. 8.10.
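As a rough illustration of the alternative based on observed frequencies, the sketch below draws a larval stage for each prediction point and then adds Normal noise with that stage's standard deviation. All names, the stage labels, and the input layout (one dictionary of stage counts per prediction point, one standard deviation per stage) are assumptions made for the example; it does not implement the multinomial logistic regression variant.

```python
import random


def draw_stage(stage_counts, rng):
    """Draw one stage with probability proportional to its observed count."""
    stages = list(stage_counts)
    weights = [stage_counts[s] for s in stages]
    return rng.choices(stages, weights=weights, k=1)[0]


def add_population_noise(predicted, stage_counts_at, sigma_by_stage, rng):
    """Add stage-specific Normal noise to each predicted value.

    predicted        -- predicted means from step 2c, one per point
    stage_counts_at  -- hypothetical list of {stage: count}, one dict per point
    sigma_by_stage   -- hypothetical {stage: residual standard deviation}
    """
    noisy = []
    for mu, counts in zip(predicted, stage_counts_at):
        stage = draw_stage(counts, rng)                   # stage is unknown, so sample it
        noisy.append(mu + rng.gauss(0.0, sigma_by_stage[stage]))
    return noisy
```

Calling add_population_noise inside each bootstrap iteration, before storing the predictions, would turn the bands from step 3 into bands for the population rather than for the mean.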

This all sounds overly complicated, but all it does is generate 1,000 similar data sets for each of the 8 drug treatments, based on our (hopefully valid) model. The complicating factor is that for the confidence interval for the population, we need to know the larval stage at a certain time. Because there is considerable variation in the distribution of the larval stages over time, it takes a bit of bootstrapping effort to create 1,000 realistic data sets. The different bootstrap approaches give very similar results.
