To give a broad introduction to the modern use of computer intensive methods in applied statistics. To give the participants a thorough understanding of the principles behind the two most important such methodologies, the bootstrap and Markov chain Monte Carlo (MCMC) simulation. To give the participants a working knowledge of and practical experience with these methods, using specialised software or modules in general-purpose programming languages.
The recent massive increases in computer power have led to the development of new statistical methods, so-called computer intensive methods, that rely on heavy computation. Many of these methods are based on (stochastic) simulation, that is, generating artificial observations from a particular statistical model. Two main trends may be seen in these developments.
The first one focuses on relaxing the model assumptions of classical statistics, e.g. normal distribution models, as these assumptions are difficult to check and frequently severely violated in practice. This field is known as robust statistics, robust in the sense that the results are less sensitive to specific model assumptions. An (extreme) example is nonparametric statistics, where model assumptions about a particular distributional form of the data are avoided. In general, nonparametric tests rely on simulation or on tables constructed by simulation. A particularly successful robust method for point and interval estimation is bootstrapping, which in a sense is based entirely on the data. The basic idea is to draw inference from samples simulated from the empirical distribution of the observed data, that is, by resampling the data with replacement.
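As an illustration of the bootstrap idea, the following is a minimal sketch of a percentile bootstrap confidence interval for the median, using only the Python standard library. The function name `bootstrap_percentile_ci`, the small data set, the number of resamples and the fixed seed are all assumptions made for this example; they are not part of the course material, and other interval constructions are possible.

```python
import random
import statistics


def bootstrap_percentile_ci(data, stat=statistics.median,
                            n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Each replicate: resample n points with replacement from the data
    # (i.e. from the empirical distribution) and recompute the statistic.
    replicates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the replicates.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return replicates[lo_idx], replicates[hi_idx]


# Made-up skewed sample with one large value, where a normal model is doubtful.
sample = [2.1, 3.4, 1.9, 5.6, 2.8, 3.3, 4.7, 2.2, 3.9, 12.5]
low, high = bootstrap_percentile_ci(sample)
print(f"median = {statistics.median(sample):.2f}, "
      f"95% bootstrap interval = ({low:.2f}, {high:.2f})")
```

No distributional form is assumed anywhere in this sketch: the only input is the observed sample, and the interval comes from the spread of the recomputed medians.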
The second trend in the development of computer intensive statistical methods is that analysis is becoming feasible in statistical models of high complexity that precluded analysis only a few years ago. Examples include image analysis, spatial statistics, huge data sets with hierarchical structures, and many more. The Markov chain Monte Carlo (MCMC) approach is based on the somewhat astonishing fact that in models where exact computation is infeasible due to high-dimensional and intractable integrals, artificial observations can be obtained by running (simulating) suitably constructed Markov chains until they converge to the desired model (the target distribution). The Markov chains may be far less complex than the target distribution itself; however, difficult statistical problems arise when deciding whether a chain has converged to the target distribution.
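To make the MCMC idea concrete, here is a minimal sketch of a random-walk Metropolis sampler, one common way of constructing such a Markov chain. The target is a standard normal density known only up to its normalising constant; it was chosen for the example because the answer is known, not because MCMC is needed for it. The names `log_target` and `metropolis`, the step size, the deliberately poor starting point and the burn-in length are all assumptions made for this illustration.

```python
import math
import random
import statistics


def log_target(x):
    # Unnormalised log density of a standard normal (example target).
    return -0.5 * x * x


def metropolis(log_density, n_steps, start=0.0, step_size=1.0, seed=0):
    """Random-walk Metropolis sampler in one dimension (illustrative sketch)."""
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    chain = []
    for _ in range(n_steps):
        # Propose a move by adding symmetric Gaussian noise to the current state.
        proposal = x + rng.gauss(0.0, step_size)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)); only the ratio
        # is needed, so the normalising constant never has to be computed.
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        chain.append(x)  # on rejection the current state is repeated
    return chain


chain = metropolis(log_target, n_steps=20000, start=5.0, step_size=2.0)
burn_in = 1000  # assumed burn-in; judging convergence is the hard part
draws = chain[burn_in:]
print(f"mean = {statistics.fmean(draws):.3f}, "
      f"variance = {statistics.pvariance(draws):.3f}")
```

The burn-in length here is simply assumed. In realistic models, deciding how long to run the chain and how many initial draws to discard is exactly the convergence problem mentioned above.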
Computer intensive statistical methods have increasingly been found useful in biological and agricultural applications. In particular, the analysis of biologically motivated spatial models has been made possible by MCMC methods. The robust methods, including bootstrapping, have by now become a standard statistical tool that is as useful in biological contexts as in other areas. However, as the methods are relatively new, and as most applied statistics computer packages focus on classical statistical methods, the dissemination of computer intensive methods into applied work has been slow.
A few references to groups within biology/agriculture where computer intensive methods are being studied for research and applications.
A textbook or lecture notes on the bootstrap and MCMC, plus notes on the remaining course topics.
A basic understanding of statistical theory (theory of estimation, tests and confidence intervals) and a good working knowledge of practical statistical modelling and analysis, including regression models. As a general rule, this corresponds to at least two statistics courses.
No prior knowledge of particular programming languages or packages is assumed, only a general acquaintance with statistical analysis on computers.
Author: phd@dina.kvl.dk. Updated: 23 September 1998