Blue-Green Algae
Bootstrapping the Blue-Green Algae.
By
|
|
|
||
|
Many Danish lakes encounter problems during the summer with growth of the poisonous Blue-Green Algae. The Blue-Green algal density of the lakes are caused by an increased adicity (the pH level) of the lakes. During the summer, biologists measure the water's hydrogenion concentration, i.e. the pH levels, and report the risk for Blue-Green Algal poissoning by signs erected at the lake indicating the dangers. Regional Council officials are urging potential bathers to take the warnings seriously, as this particular strain of algae is extremely poisonous and could result in death. The concern in particular is aimed at young children and dogs who tend to swallow water while bathing. |
|
The biologists have suggested two different treatments, A and B say, with four and two levels, respectively. The levels within treatment A are:
Treatment B indicates the addition of two specific additives, that is, the levels within treatment B are
Hereby the biologists have gathered eight samples. The samples have been monitored daily for two weeks and the algal density was recorded. The growth curves are shown in Figure 1.
|
|
|
|
|
|
|
|
||
|
|
Figure 1: Algal density plotted against time. |
||
A naive bootstrapping approach
The data illustrated in Figure 1 are evidently growth data. When dealing with growth data non-linear modelling is the path to follow, and we consider the non-linear model named the Mitscherlich equation

where growth are expected to be exponential, but the exponential growth parameter is modified as growth progresses. This explains the parameters
b and t. There is an upper bound to growth, which is symbolized by a. The parameter d denotes the displacement. The usual error term is eijk, which is assumed identically and identically normally distributed for any i, j, and k. HereThe eight different models were fitted in S-Plus using the non-linear least squares function nls, see Table 1 below for results. For all eight sets of parameter estimates we find a relatively low standard error.
|
Model |
|
|
|
|
|
|
|
|
1 |
4.271 |
0.1917 |
0.1254 |
0.01203 |
-0.6089 |
0.1316 |
|
|
2 |
7.378 |
0.5817 |
0.1020 |
0.01546 |
-0.4255 |
0.2019 |
|
|
3 |
4.936 |
0.1298 |
0.1396 |
0.00891 |
-0.2190 |
0.1029 |
|
|
4 |
6.047 |
0.1450 |
0.2092 |
0.01582 |
-0.5713 |
0.1159 |
|
|
5 |
8.021 |
1.1872 |
0.0554 |
0.01234 |
0.2098 |
0.2459 |
|
|
6 |
6.081 |
0.3067 |
0.1406 |
0.01553 |
-0.9838 |
0.1391 |
|
|
7 |
4.657 |
0.1372 |
0.1657 |
0.01318 |
-0.3943 |
0.1265 |
|
|
8 |
4.675 |
0.0855 |
0.2116 |
0.01273 |
-0.4182 |
0.0975 |
|
|
Table 1.: |
Achieved parameter estimates for the Mitscherlich equation. The parameter estimates are obtained by non-linear least squares techniques. |
||||||
To examine the effects of the different treatments, we propose a bootstrapping method, which permutes the residuals obtained when fitting the models. Hence the null hypothesis becomes
|
H0: |
No difference in treatments |
|
|
No day effect |
The assumption of no day effect implies that observations are independent, which is certainly not the case (cf. Figure 1). With this in mind we carry on with the following bootstrap algorithm:
|
1. |
Fit data by nonlinear regression model via least squares, |
|
|
2. |
Estimate residuals, |
|
|
3. |
Permute residuals, |
|
|
4. |
Add them to fitted values, |
|
|
5. |
Goto 3. |
|
Hereby we permute the obtained residuals within each treatment allowing us to create the 95% bootstrap confidence intervals. The results are shown in Figure 2.
|
|
||
|
|
Figure 2.: |
Achieved 95% bootstrap confidence intervals. The four possible treatments in A are shown horizontically, whereas the treatments in B are shown vertically. It is seen that only some of the treatments are significantly different. The best combination seems to be no shaking, no CO2, and additive X. |
A bootstrapping approach taking serial dependence into account
The serial dependence present in the time series have been ignored so far. Modelling the depencies which are obviously present in Figure 1 are done by an autoregressive process or order 1, i.e. an AR(1) process. This model can be fitted by using the mixed random effects model implemented in S-Plus. Hence the model becomes

where both
a and b are assumed random. The displacement parameter d is assumed fixed. Furthermore, the pertubation vector ejk = (e1,jk,…, e14,jk)T is assumed to be 14-dimensionally normally distributed with mean 0 and variance S given by
Hence the assumptions can be stated as in the following null-hypothesis
|
H0: |
The random effects are normally distributed, i.e. |
|
|
a jk ~ N(a,sa2) |
|
|
b jk ~ N(b,sb2) |
|
|
d fixed |
|
|
e jk ~ N14(0, S) |
We now present a bootstrap method to perform a parametric test on the above hypothesis. The method is based on the nlme-function in S-Plus, which fits any non-linear mixed effects model and returns the estimated parameters. Moreover, it is also possible to obtain the fitted values, the residual values for any given model, and what is more important the correlation structure of the residuals. For each curve we generate a set of residuals with the same correlation structure as the ones estimated with the nlme-function. These sets of residuals are added to the corresponding curves whereby we obtain new sets of observed values representing growth curves, which we again fit with the non-linear Mitscherlich equation in S-Plus. The obtained estimated parameters are recorded and we again simulate new residual values and so forth. In this way bootstrapped confidence intervals are obtained by ordering the recorded parameters and extracting the percentiles corresponding to a prescribed confidence level. The algorithm is sketched below:
|
1. |
Fit data by NLME model |
|
2. |
Compute fitted values and estimate residuals |
|
3. |
Simulate residuals within treatment levels |
|
4. |
Obtain new obervations |
|
5. |
Fit new models, and record parameter estimates |
|
6. |
Goto 3 |
The obtained estimates and bootstrap confidence intervals based on 200 simulations are shown in Table 2 below. Also the standard confidence intervals based on an approximatively normal theory from the first fit with the nlme-function is shown.
|
Parameter (model) |
Bootstrap |
Standard |
|||||||||||
|
shake |
CO2 |
add-on |
Lower |
Mean |
Upper |
Lower |
Estimate |
Upper |
|||||
|
a |
+ |
+ |
X |
3.4486 |
4.5802 |
6.3260 |
3.0651 |
4.4878 |
5.9104 |
||||
|
a |
+ |
- |
Y |
4.4681 |
5.6762 |
7.3829 |
4.2162 |
5.6389 |
7.0615 |
||||
|
a |
+ |
+ |
X |
5.8842 |
6.9977 |
8.8092 |
5.6077 |
7.0304 |
8.4530 |
||||
|
a |
+ |
- |
Y |
5.4787 |
6.4914 |
7.7146 |
5.0199 |
6.4425 |
7.8651 |
||||
|
a |
- |
+ |
X |
3.8002 |
4.7534 |
5.8796 |
3.3057 |
4.7283 |
6.1509 |
||||
|
a |
- |
- |
Y |
4.1587 |
4.8526 |
5.9222 |
3.3462 |
4.7689 |
6.1915 |
||||
|
a |
- |
+ |
X |
5.4599 |
6.2647 |
7.3367 |
4.8532 |
6.2758 |
7.6984 |
||||
|
a |
- |
- |
Y |
4.0088 |
4.7348 |
5.4861 |
3.3055 |
4.7281 |
6.1507 |
||||
|
|
|
|
|
|
|
|
|
|
|||||
|
b |
+ |
+ |
X |
-2.6168 |
-2.1733 |
-1.7867 |
-2.2875 |
-2.1633 |
-2.0390 |
||||
|
b |
+ |
- |
Y |
-2.5495 |
-2.2481 |
-1.9279 |
-2.3880 |
-2.2638 |
-2.1395 |
||||
|
b |
+ |
+ |
X |
-2.4590 |
-2.1928 |
-1.9711 |
-2.3359 |
-2.2117 |
-2.0875 |
||||
|
b |
+ |
- |
Y |
-2.4387 |
-2.1572 |
-1.9045 |
-2.2847 |
-2.1604 |
-2.0362 |
||||
|
b |
- |
+ |
X |
-2.0708 |
-1.8390 |
-1.6295 |
-1.9699 |
-1.8456 |
-1.7214 |
||||
|
b |
- |
- |
Y |
-2.1154 |
-1.8635 |
-1.6554 |
-1.9734 |
-1.8491 |
-1.7249 |
||||
|
b |
- |
+ |
X |
-1.9172 |
-1.7203 |
-1.5373 |
-1.8521 |
-1.7279 |
-1.6036 |
||||
|
b |
- |
- |
Y |
-1.7991 |
-1.5944 |
-1.3645 |
-1.7170 |
-1.5928 |
-1.4686 |
||||
|
|
|
|
|
|
|
|
|
|
|||||
|
d |
(same for all models) |
-0.4730 |
-0.3330 |
-0.1648 |
-0.4718 |
-0.3251 |
-0.1784 |
||||||
|
Table 2 |
95% bootstrap and standard confidence intervals for model parameters. |
||||||||||||
We observe that more of standard intervals are narrower than the bootstrap intervals. Since estimation of non-linear mixed effects models is highly computer intensive we only managed to bootstrap a sample of 200 sets of parameters. This number is far from enough to ensure bootstrap confidence intervals comparable with standard confidence intervals.