Blue-Green Algae

Bootstrapping the Blue-Green Algae.

By

Frede Aakmann Tøgersen

Kim Emil Andersen

Danish Institute of Agricultural Science

Department of Mathematical Sciences

Department of Agricultural Systems

Aalborg University

 

 

 

Many Danish lakes encounter problems during the summer with growth of the poisonous Blue-Green Algae. The Blue-Green algal density of the lakes are caused by an increased adicity (the pH level) of the lakes. During the summer, biologists measure the water's hydrogenion concentration, i.e. the pH levels, and report the risk for Blue-Green Algal poissoning by signs erected at the lake indicating the dangers.

Regional Council officials are urging potential bathers to take the warnings seriously, as this particular strain of algae is extremely poisonous and could result in death. The concern in particular is aimed at young children and dogs who tend to swallow water while bathing.

The biologists have suggested two different treatments, A and B say, with four and two levels, respectively. The levels within treatment A are:

    1. No shaking and no CO2 aeration,
    2. CO2 bubbled through the culture,
    3. Continuous shaking, but no CO2 aeration, and
    4. Continuous shaking and CO2 bubbled trough the culture.

Treatment B indicates the addition of two specific additives, that is, the levels within treatment B are

    1. Additive X added and
    2. Additive Y added.

Hereby the biologists have gathered eight samples. The samples have been monitored daily for two weeks and the algal density was recorded. The growth curves are shown in Figure 1.

 

 

 

 

 

Figure 1: Algal density plotted against time.

 

A naive bootstrapping approach

The data illustrated in Figure 1 are evidently growth data. When dealing with growth data non-linear modelling is the path to follow, and we consider the non-linear model named the Mitscherlich equation

The Mitscherlich equation

where growth are expected to be exponential, but the exponential growth parameter is modified as growth progresses. This explains the parameters b and t. There is an upper bound to growth, which is symbolized by a. The parameter d denotes the displacement. The usual error term is eijk, which is assumed identically and identically normally distributed for any i, j, and k. Here

The eight different models were fitted in S-Plus using the non-linear least squares function nls, see Table 1 below for results. For all eight sets of parameter estimates we find a relatively low standard error.

Model

1

4.271

0.1917

0.1254

0.01203

-0.6089

0.1316

2

7.378

0.5817

0.1020

0.01546

-0.4255

0.2019

3

4.936

0.1298

0.1396

0.00891

-0.2190

0.1029

4

6.047

0.1450

0.2092

0.01582

-0.5713

0.1159

5

8.021

1.1872

0.0554

0.01234

0.2098

0.2459

6

6.081

0.3067

0.1406

0.01553

-0.9838

0.1391

7

4.657

0.1372

0.1657

0.01318

-0.3943

0.1265

8

4.675

0.0855

0.2116

0.01273

-0.4182

0.0975

 

Table 1.:

 

Achieved parameter estimates for the Mitscherlich equation. The parameter estimates are obtained by non-linear least squares techniques.

To examine the effects of the different treatments, we propose a bootstrapping method, which permutes the residuals obtained when fitting the models. Hence the null hypothesis becomes

H0:

No difference in treatments

 

No day effect

The assumption of no day effect implies that observations are independent, which is certainly not the case (cf. Figure 1).  With this in mind we carry on with the following bootstrap algorithm:

1.

Fit data by nonlinear regression model via least squares,

 

2.

Estimate residuals,

 

3.

Permute residuals,

 

4.

Add them to fitted values,

 

5.

Goto 3.

 

Hereby we permute the obtained residuals within each treatment allowing us to create the 95% bootstrap confidence intervals. The results are shown in Figure 2.

 

Figure 2.:

Achieved 95% bootstrap confidence intervals. The four possible treatments in A are shown horizontically, whereas the treatments in B are shown vertically. It is seen that only some of the treatments are significantly different. The best combination seems to be no shaking, no CO2, and additive X.

 

 

A bootstrapping approach taking serial dependence into account

The serial dependence present in the time series have been ignored so far. Modelling the depencies which are obviously present in Figure 1 are done by an autoregressive process or order 1, i.e. an AR(1) process. This model can be fitted by using the mixed random effects model implemented in S-Plus. Hence the model becomes

where both a and b are assumed random. The displacement parameter d is assumed fixed. Furthermore, the pertubation vector ejk = (e1,jk,…, e14,jk)T is assumed to be 14-dimensionally normally distributed with mean 0 and variance S given by

Hence the assumptions can be stated as in the following null-hypothesis

H0:

The random effects are normally distributed, i.e.

 

ajk ~ N(a,sa2)

 

bjk ~ N(b,sb2)

 

d fixed

 

ejk ~ N14(0, S)

We now present a bootstrap method to perform a parametric test on the above hypothesis. The method is based on the nlme-function in S-Plus, which fits any non-linear mixed effects model and returns the estimated parameters. Moreover, it is also possible to obtain the fitted values, the residual values for any given model, and what is more important the correlation structure of the residuals. For each curve we generate a set of residuals with the same correlation structure as the ones estimated with the nlme-function. These sets of residuals are added to the corresponding curves whereby we obtain new sets of observed values representing growth curves, which we again fit with the non-linear Mitscherlich equation in S-Plus. The obtained estimated parameters are recorded and we again simulate new residual values and so forth. In this way bootstrapped confidence intervals are obtained by ordering the recorded parameters and extracting the percentiles corresponding to a prescribed confidence level. The algorithm is sketched below:

1.

Fit data by NLME model

2.

Compute fitted values and estimate residuals

3.

Simulate residuals within treatment levels

4.

Obtain new obervations

5.

Fit new models, and record parameter estimates

6.

Goto 3

 

 The obtained estimates and bootstrap confidence intervals based on 200 simulations are shown in Table 2 below. Also the standard confidence intervals based on an approximatively normal theory from the first fit with the nlme-function is shown.

Parameter (model)

 

Bootstrap

 

Standard

shake

CO2

add-on

Lower

Mean

Upper

Lower

Estimate

Upper

a

+

+

X

 

3.4486

4.5802

6.3260

 

3.0651

4.4878 

 5.9104

a

+

-

Y

 

4.4681

5.6762

7.3829

 

4.2162

 5.6389

7.0615 

a

+

+

X

 

5.8842

6.9977

8.8092

 

5.6077

7.0304 

8.4530 

a

+

-

Y

 

5.4787

6.4914

7.7146

 

5.0199

6.4425

7.8651  

a

-

+

X

 

3.8002

4.7534

5.8796

 

3.3057

4.7283 

6.1509 

a

-

-

Y

 

4.1587

4.8526

5.9222

 

3.3462

4.7689 

6.1915 

a

-

+

X

 

5.4599

6.2647

7.3367

 

4.8532

6.2758

7.6984  

a

-

-

Y

 

4.0088

4.7348

5.4861

 

3.3055

4.7281 

6.1507 

 

 

 

 

 

 

 

 

 

b

+

+

X

 

-2.6168

-2.1733

-1.7867

 

-2.2875

-2.1633 

 -2.0390

b

+

-

Y

 

-2.5495

-2.2481

-1.9279

 

-2.3880

-2.2638 

-2.1395 

b

+

+

X

 

-2.4590

-2.1928

-1.9711

 

-2.3359

-2.2117 

-2.0875 

b

+

-

Y

 

-2.4387

-2.1572

-1.9045

 

-2.2847

-2.1604

-2.0362 

b

-

+

X

 

-2.0708

-1.8390

-1.6295

 

-1.9699

-1.8456 

-1.7214 

b

-

-

Y

 

-2.1154

-1.8635

-1.6554

 

-1.9734

-1.8491 

-1.7249 

b

-

+

X

 

-1.9172

-1.7203

-1.5373

 

-1.8521

-1.7279 

-1.6036 

b

-

-

Y

 

-1.7991

-1.5944

-1.3645

 

-1.7170

-1.5928 

-1.4686 

 

 

 

 

 

 

 

 

   

 

 

d

(same for all models)

 

-0.4730

-0.3330

-0.1648

 

-0.4718

-0.3251 

-0.1784 

 

Table 2

95% bootstrap and standard confidence intervals for model parameters.

 

We observe that more of standard intervals are narrower than the bootstrap intervals. Since estimation of non-linear mixed effects models is highly computer intensive we only managed to bootstrap a sample of 200 sets of parameters. This number is far from enough to ensure bootstrap confidence intervals comparable with standard confidence intervals.