Nordic Informatics Network in the Agricultural Sciences

Bootstrap Methods and Their Applications

Umeå Campus, SLU, Sweden, June 16-27, 2008

Background

Bootstrap methods, introduced by Bradley Efron (1979), belong to the class of computer-intensive methods for statistical analysis. They are based on simulation techniques and can be used to calculate standard errors and confidence intervals and to carry out significance tests. Today the methods are used not only by statisticians but also by researchers in the life sciences, medical sciences, social sciences, business, econometrics, and other areas where statistics is traditionally used. The methods apply at any level of modelling and can be used for parametric, semi-parametric, and nonparametric analysis. This course will present a broad and up-to-date coverage of bootstrap methods with many applied examples.

The explicit recognition and understanding of uncertainty is essential to statisticians and to the statistical sciences in general. One works with probability models, likelihoods, estimation procedures and tests, and often one needs variances as a measure of uncertainty or natural variation. Sometimes it is possible to do explicit calculations or to use approximations, but an alternative, as Efron pointed out, is to use simulation. Moreover, simulation can be used in much more complicated problems than approximate or explicit methods can handle.

The key idea of the Bootstrap is to resample from the original data, either directly or via a fitted model. In this way one creates replicated data sets from which the uncertainty of the quantities of interest can be assessed. Bootstrap methods make it easier for researchers to avoid over-simplified models.
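The resampling idea described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: it is written in Python rather than the S-Plus or R used in the course, the function name `bootstrap_standard_error` is hypothetical, and the data values are invented. It resamples directly from the data (the nonparametric case) and uses the spread of the replicated statistics as a standard error.

```python
import random
import statistics

def bootstrap_standard_error(data, statistic, n_replicates=1000, seed=0):
    """Estimate the standard error of `statistic` by nonparametric resampling.

    Hypothetical helper written for illustration; not taken from the course.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    replicates = []
    for _ in range(n_replicates):
        # Draw n observations from the original data, with replacement.
        resample = [rng.choice(data) for _ in range(n)]
        replicates.append(statistic(resample))
    # The spread of the replicated statistics estimates the uncertainty.
    return statistics.stdev(replicates)

# Invented measurements, for illustration only.
yields = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.5, 4.9]
se_mean = bootstrap_standard_error(yields, statistics.mean)
se_median = bootstrap_standard_error(yields, statistics.median)
```

For the mean, the result should be close to the textbook value (sample standard deviation divided by the square root of n); for the median, where no equally simple formula exists, the same code applies unchanged.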

Goals

Resampling from the original data in order to estimate measures of uncertainty is an important tool in data analysis. The course should present a balanced account of both the theoretical statistical ideas and the potential of bootstrap methods in applications where analytic solutions are difficult or impossible to obtain. The course should present a number of techniques showing how the methods can be used, and evaluate their performance.

This course

The course starts with a discussion of the properties of Bootstrap/resampling methods, including methods for single samples in parametric and nonparametric models. Particular attention is given to practical issues such as the number of replicate data sets needed. The delta method for variance approximation, based on different forms of the so-called jackknife method, will be presented.
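As a rough illustration of the jackknife mentioned above, the following sketch computes the ordinary leave-one-out jackknife standard error. It is an assumption-laden example, not the course's own material: it is written in Python rather than S-Plus or R, the function name `jackknife_standard_error` is hypothetical, and the data are invented.

```python
import math
import statistics

def jackknife_standard_error(data, statistic):
    """Leave-one-out jackknife standard error (hypothetical helper)."""
    n = len(data)
    # Recompute the statistic with each observation left out in turn.
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    centre = sum(leave_one_out) / n
    # Jackknife variance: (n - 1) / n times the sum of squared deviations.
    variance = (n - 1) / n * sum((t - centre) ** 2 for t in leave_one_out)
    return math.sqrt(variance)

# Invented measurements, for illustration only.
sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.5, 4.9]
jack_se = jackknife_standard_error(sample, statistics.mean)
classic_se = statistics.stdev(sample) / math.sqrt(len(sample))
```

For the sample mean the jackknife value coincides with the usual formula, which makes this a convenient check before applying the function to less tractable statistics.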

The course will discuss how the basic ideas of Bootstrap sampling can be extended to several samples. Semiparametric and smooth models, simple cases where data have a hierarchical structure or are sampled from finite populations, and missing or censored data will also be discussed.

The course will consider the basic principles of significance testing, in particular Monte Carlo tests and tests using the parametric Bootstrap. The problem of constructing confidence intervals has a long history in the Bootstrap literature. The course will describe how simple intervals based on simulation can be used, as well as more complex methods such as the studentized Bootstrap, percentile methods and the double Bootstrap.
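One of the simpler interval constructions named above, the percentile method, might be sketched as follows. This is a minimal example under stated assumptions: it is in Python rather than the course's S-Plus or R, the function name `percentile_interval` is hypothetical, the data are invented, and it takes the interval limits as ordered replicate values at positions (R + 1)α/2 and (R + 1)(1 − α/2), which is one common convention among several.

```python
import random
import statistics

def percentile_interval(data, statistic, level=0.95, n_replicates=999, seed=0):
    """Bootstrap percentile interval (hypothetical helper, illustration only)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Sorted values of the statistic over nonparametric resamples.
    replicates = sorted(
        statistic([rng.choice(data) for _ in range(n)])
        for _ in range(n_replicates)
    )
    alpha = 1.0 - level
    # 1-based positions of the ordered replicates used as interval limits.
    # Assumes n_replicates is large enough that both positions are valid.
    lower_pos = round((n_replicates + 1) * alpha / 2)
    upper_pos = round((n_replicates + 1) * (1 - alpha / 2))
    return replicates[lower_pos - 1], replicates[upper_pos - 1]

# Invented measurements, for illustration only.
yields = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.5, 4.9]
lower, upper = percentile_interval(yields, statistics.mean)
```

With 999 replicates and a 95% level, the limits are the 25th and 975th ordered values. The studentized and double Bootstrap methods refine this basic construction and are not shown here.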

The course will also focus on regression models: linear, non-linear, semiparametric and nonparametric regression, which are widely used in, for example, the agricultural sciences. Special attention will also be given to survival analysis and generalized linear models.

Some time will be devoted to how variance reduction techniques such as balanced control variates and importance sampling can be adapted to improve simulations, with the aim of reducing the number of simulations needed.

A lot of practical training will be carried out during the course. In particular, several data sets will be analysed with the help of the S-Plus language or R.

Learning outcome

The participants should be familiar with programs for applying Bootstrap techniques to estimation and testing problems. In particular, participants should know how to construct Bootstrap confidence intervals. Moreover, participants should know how to use R for Bootstrap calculations.

Required knowledge

Familiarity with computers at user level, with basic probability calculus and with basic statistics.

Target audience

PhD students and other researchers, primarily within the agricultural and biological sciences, who intend to analyse data from their area of study in situations where simple linear and multilinear models cannot be used.

Key Words

Throughout the course, the theory will be supplemented with exercises and computer assignments. At the end of the course, the students will work on a two-day project that involves both modelling and computing aspects.

Scientifically responsible

Professor Anthony Davison, Institute of Mathematics, Ecole Polytechnique Fédérale de Lausanne, IMA-FSB-EPFL, Station 8, CH-1015 Lausanne, Switzerland; Professor David Hinkley, Statistics and Applied Probability, University of California, Santa Barbara, California 93106-3110, USA; Jun Yu, Forest Economics, Swedish University of Agricultural Sciences, Sweden; Dietrich von Rosen, Department of Biometry and Engineering, Swedish University of Agricultural Sciences, Sweden.

Organisational responsible

Jun Yu, Forest Economics, Swedish University of Agricultural Sciences, Sweden; Dietrich von Rosen, Department of Biometry and Engineering, Swedish University of Agricultural Sciences, Sweden.

External Lecturers

Refer to the appendices for a presentation of the teachers.

Teaching methods

Lectures alternating with intensive use of computer exercises. The availability of network-connected computers is therefore essential for the benefit of the students. A small project is carried out by the students at the end of the course (individually or preferably in small groups).

Examination

Examination will be based on a written project report handed in at the end of the course in combination with an oral presentation. The number of credits proposed is 6 ECTS.

Author: phd@dina.kvl.dk. Updated: 10 April 2007