Module 11
Prepared by The Statistics Group, KVL
- Last modified: Feb 27, 2004
Module 11: Repeated measures I, simple methods

11.1 Notes
11.1.1 Main example: Activity of rats
11.1.2 Separate analysis for each time-point
11.1.3 Analysis of summary statistic
11.1.4 Random effects approach
11.1.5 Pros and cons of simple approaches
11.1 Notes
This module describes various simple approaches for analyzing "repeated measurements", and shows how these analyses can be carried out in SAS. Data referred to as "repeated measurements" (or sometimes as "longitudinal data") are characterized by having several measurements on the same individuals or experimental units. These measurements are typically taken at different times, or at different positions within the individuals. Consider for instance the following experimental design, to compare two drugs (A and B) for reducing blood pressure:
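This module describes some fairly simple (and maybe crude) methods for analyzing these data types. These methods include:
- Separate analysis for each time-point
- Analysis of a summary statistic
- Random effects approach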
The simplest approach to analyse repeated measurements would be to include time as a factor, and ignore the dependence between two observations on the same individual. Such an approach may lead to completely wrong conclusions. The essence of the problem is that this is the same as pretending to have more observations than are actually available. Two correlated observations contain less information than two independent observations, because one is partly explained by the other. This approach is unacceptable.
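The loss of information can be made concrete with a small calculation. As an illustration (the symbols $Y_1$, $Y_2$, $\sigma^2$ and $\rho$ are introduced here and are not part of the original notes), take two measurements on the same individual, each with variance $\sigma^2$ and with correlation $\rho$ between them. The variance of their average is then:

```latex
\[
\operatorname{Var}\!\left(\frac{Y_1 + Y_2}{2}\right)
  = \frac{1}{4}\left(\sigma^2 + \sigma^2 + 2\rho\,\sigma^2\right)
  = \frac{\sigma^2}{2}\,(1 + \rho).
\]
```

For independent observations ($\rho = 0$) this is $\sigma^2/2$. For positively correlated observations it is larger, and in the extreme case $\rho = 1$ it equals $\sigma^2$, which is no better than a single observation. An analysis that assumes $\rho = 0$ when $\rho > 0$ therefore reports too much precision.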
11.1.1 Main example: Activity of rats
To investigate the effect of a certain type of exposure on the activity of rats, the following experiment was carried out. The experimental unit was a cage with two rats. During the entire experimental period the rats were exposed daily to the substance under investigation, in a concentration of 1, 2 or 3 units (treatment 1, 2 and 3, respectively). Once per month for 10 months the activity of the rats was measured by placing the rats from one cage in a chamber in which each interruption of a light beam was counted. The total count over a period of 57 hours was used as the result for that cage. Notice that in this setting the "individual" variable is cage.

Summary of experiment:
11.1.2 Separate analysis for each time-point
One way to avoid the problem of correlated measurements is to do a separate analysis for each point in time. Each analysis then uses only one observation from each individual, and hence the observations within it are independent. This way of analyzing repeated measurements is not wrong, but it is very inefficient, as all the remaining observations are wasted. This approach avoids the problem instead of dealing with it.

Separate analyses can be carried out for all the observed time-points, but it will likely be very difficult to reach a coherent conclusion from all these sub-tests. The sub-tests will be correlated, and because the correlation structure is not part of the model, it is not possible to tell how strong this correlation is.

Separate analyses can also be carried out for selected time-points "far apart". This will (hopefully) cause the separate sub-tests to be uncorrelated, or at least less correlated. Even with uncorrelated tests it will be difficult to reach a coherent conclusion, because of a problem known as mass significance (or multiplicity). For instance, if 20 tests are carried out at a 5% significance level, then on average one of them will be falsely significant even when there is no real effect. This problem is partly solved by using the Bonferroni correction when performing n tests (one for each time-point). The Bonferroni correction simply states that each test should be carried out at the significance level 0.05/n instead of the usual 0.05.

When selecting time-points far apart, it is important that the selection is done independently of the actual observations. Naturally, the time-points must not be selected systematically where there is a large (or small) difference between treatments. Ideally the time-points should be selected before the data are collected.
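The size of the multiplicity problem can be worked out directly. The calculation below is an illustration added here, with $\alpha$ denoting the significance level and $n$ the number of tests, and it assumes that all null hypotheses are true. For independent tests, the probability of at least one false significance is:

```latex
\[
P(\text{at least one false significance}) = 1 - (1 - \alpha)^n ,
\qquad
1 - 0.95^{20} \approx 0.64 \quad (\alpha = 0.05,\; n = 20).
\]
With each test carried out at level $\alpha / n$ instead, Boole's inequality gives
\[
P(\text{at least one false significance}) \;\le\; n \cdot \frac{\alpha}{n} = \alpha ,
\]
and this bound holds whether or not the tests are independent.
```

For ten monthly tests, as in the rats example, the corrected level is 0.05/10 = 0.005.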
Example: Activity of rats analyzed separately for each month
For the analysis in this section it is assumed that the data have been read into a SAS data set named rats with the columns treatm (=1,2,3), cage (=cage number), month (=1,...,10), and lnc (=log(counts)). The data set has 300 lines. Here are the first few lines:

  Obs   treatm   cage   month      lnc
    1        1      1       1   9.9323
    2        1      1       2   9.6447
    3        1      1       3   9.7628
    4        1      1       4   9.6014
    5        1      1       5   9.3227
    6        1      1       6   9.2463
    .        .      .       .        .
    .        .      .       .        .
    .        .      .       .        .
To analyze the rats data set separately for each month,
a simple one-way analysis of variance model with treatment
treatm as the only factor is used. The information about cage
cannot be included, as there is only one observation from
each cage in each monthly analysis. The model for each month is:
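Written out, a one-way analysis of variance model of the kind described takes the following form. The notation is an assumption of this sketch ($\mu$ for the overall level, $\alpha$ for the treatment effect, $\varepsilon_i$ for the residual, and $i$ indexing the cages observed in the given month) and may differ from the notation used elsewhere in the course:

```latex
\[
\mathrm{lnc}_i = \mu + \alpha(\mathrm{treatm}_i) + \varepsilon_i ,
\qquad
\varepsilon_i \sim N(0, \sigma^2) \ \text{independent}.
\]
```

The hypothesis of no treatment effect is that $\alpha(1) = \alpha(2) = \alpha(3)$, and it is tested with the usual F-test.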
In SAS, this analysis can be run as follows:

  proc sort data=rats; by month; run;
  proc mixed data=rats method=REML;
    class treatm month;
    model lnc = treatm;
    by month;
  run;

The first line ensures that the data set is sorted in ascending order by month. This is required by proc mixed in order to use the by statement. The second line calls proc mixed with the rats data set, and requests that the restricted/residual maximum likelihood method is used to estimate the parameters. The third line specifies the factors, and the fourth specifies the model. The fifth line states that the model should be fitted for each month separately. This line effectively cuts the data set into ten data sets (one for each observed month) and runs the model on each data set independently of the others. The SAS output is summarized in the following table of F-tests for no treatment effect:
A few significant values are found, and one of them remains significant even when the Bonferroni correction is used, so the conclusion should be that weak evidence of a difference between the groups has been seen.
It is possible to make a correct analysis time-by-time, but it is weak and often confusing, because it does not combine all information into one test.
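The ten monthly F-tests can be collected into a single data set and compared with both the usual and the Bonferroni-corrected level. The following is a minimal sketch, assuming the rats data set described above. The data set name ftests and the variables signif and bonf_signif are illustrative names introduced here, not part of the original notes.

```sas
/* Sketch only: ftests, signif and bonf_signif are illustrative names. */
proc sort data=rats; by month; run;

/* Capture the Type 3 tests of fixed effects from each monthly fit */
ods output Tests3=ftests;
proc mixed data=rats method=REML;
  class treatm;
  model lnc = treatm;
  by month;
run;

/* Flag months significant at 5%, and at the Bonferroni level 0.05/10 */
data ftests;
  set ftests;
  signif      = (ProbF < 0.05);
  bonf_signif = (ProbF < 0.05/10);
run;

proc print data=ftests noobs;
  var month NumDF DenDF FValue ProbF signif bonf_signif;
run;
```

The printed data set has one row per month, so the pattern of significant and non-significant months can be read off in one place.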
11.1.3 Analysis of summary statistic
Another way to avoid the problem of correlated measurements is to choose a single measure to summarize the individual curves, and then base the analysis on this measure. This again reduces the data set to independent "observations", one for each individual. To analyze the summary data set, standard methods for independent observations, for instance analysis of variance, can be used.

The key is to choose a good summary measure. One possibility is to choose the value at a given time-point, which reduces this summary method to the separate time-point analysis described in the previous section. This choice is poor in most cases, because all other measurements are wasted. It is difficult to give general advice about the choice of summary measure. Ideally, the summary measure should capture the most important feature of the curve. In some situations the most important feature is the net growth (last minus first), the average growth (slope), or the time to reach the maximum point. It depends on the problem at hand. Some common choices of summary measures are:
With the right choice of summary measure this type of analysis can be very useful, at least as a first step. These models have relatively few assumptions, and they can be checked via standard residual methods. Of course the downside of this method is that information may be lost by reducing each curve to one single measure.
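As an illustration of a summary measure other than a single time-point, the average growth (slope) of each curve can be estimated by a straight-line regression per individual. The following is a minimal sketch, assuming the rats data set from section 11.1.2. The data set name slopes and the variable slope are illustrative names introduced here, and the straight-line fit is an assumption of the sketch, not a model proposed in the notes.

```sas
/* Sketch only: slopes and slope are illustrative names. */
proc sort data=rats; by cage treatm; run;

/* Fit lnc = a + b*month separately for each cage;
   outest= stores the estimated coefficients, one row per cage */
proc reg data=rats outest=slopes noprint;
  by cage treatm;
  model lnc = month;
run;
quit;

/* In the outest= data set the slope is stored in a variable
   named after the regressor (month); rename it for clarity */
data slopes;
  set slopes(keep=cage treatm month rename=(month=slope));
run;

/* One-way ANOVA on the per-cage slopes */
proc mixed data=slopes;
  class treatm;
  model slope = treatm;
run;
```

Note that this leaves rats sorted by cage, so it must be sorted by month again before a by month analysis is run.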
Example: Activity of rats analyzed via summary measure
The choice of summary measure for the rats data set is partly inspired by figure 11.1. It seems that the average slope is similar for the three treatments, but that the curves from dose=3 tend to be slightly higher than the rest of the curves. To see if this is a significant difference, the logarithm of the total count during all ten months, lnTot=log(Total count), is used as summary measure. To calculate this summary measure from the previously described data set, the variable lnc containing the log counts from each month must be transformed back to the original counts, then the sum must be calculated, and finally the logarithm must be applied to the sum. These operations can be done in SAS by writing:

  data ratsTot;
    set rats;
    count=exp(lnc);
  run;
  proc sort data=ratsTot; by cage treatm; run;
  proc means data=ratsTot sum noprint;
    var count;
    by cage treatm;
    output out=TenMonthTot sum=count10;
  run;
  data TenMonthTot;
    set TenMonthTot;
    lnTot=log(count10);
  run;

The new data set is called TenMonthTot and the variable containing the logarithm of the total counts is called lnTot.
This summary data set consists of independent measurements, as
each cage is only used to generate one summary observation.
Because the observations are now independent, they can be analyzed with
a simple one-way ANOVA model:
  proc mixed data=TenMonthTot;
    class treatm;
    model lnTot = treatm;
  run;

Notice that the new data set TenMonthTot and the response variable lnTot are used. The P-value for no treatment effect in this summary model is 5.23%. This is above the standard 5% significance level, but only slightly. In this analysis the entire curve has been summarized into a single measure, so a lot of information has been lost. A P-value this low for the crude summary analysis could indicate that a significant treatment effect might be found with a more sophisticated analysis.
11.1.4 Random effects approach
The two approaches described above both illustrate ways to reduce the data set to independent measures. This section explains the first step in modeling the actual covariance structure in the data set. As seen in previous modules, for instance the module about hierarchical random effects, the effect of adding a random effect is that two observations from the same level are allowed to be positively correlated. Adding the "individual" factor to the model as a random effect will therefore allow two observations from the same individual to be positively correlated.
Example: Activity of rats analyzed via random effects model
It is reasonable to assume that two observations from the
same cage could be correlated, so the model with cage as
random effect is used. The factor month and the
interaction between month and treatment are included.
This was not possible in the previous models, because
each curve was reduced to one number. In this analysis
all observations are included in one coherent analysis.
The model is:
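In formula form, a model with the fixed effects and the random cage effect described above can be sketched as follows. The symbols $\sigma_d^2$ and $\sigma^2$ correspond to the cage and residual variance parameters reported in the SAS output; the remaining notation ($\mu$, $\alpha$, $\beta$, $\gamma$, $d$, $\varepsilon$) is an assumption of this sketch:

```latex
\[
\mathrm{lnc}_i = \mu + \alpha(\mathrm{treatm}_i) + \beta(\mathrm{month}_i)
  + \gamma(\mathrm{treatm}_i, \mathrm{month}_i)
  + d(\mathrm{cage}_i) + \varepsilon_i ,
\qquad i = 1, \ldots, 300,
\]
\[
d(\mathrm{cage}) \sim N(0, \sigma_d^2), \qquad
\varepsilon_i \sim N(0, \sigma^2), \qquad
\text{all independent}.
\]
```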
Recall from previous modules that the covariance structure
for this model is:
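For a model with one random cage effect, with $\sigma_d^2$ the variance of the random cage effect and $\sigma^2$ the residual variance, the covariance between two observations follows from the independence of the random terms. Index notation is an assumption of this sketch:

```latex
\[
\operatorname{cov}(\mathrm{lnc}_i, \mathrm{lnc}_j) =
\begin{cases}
0 & \text{if } \mathrm{cage}_i \neq \mathrm{cage}_j, \\[2pt]
\sigma_d^2 & \text{if } \mathrm{cage}_i = \mathrm{cage}_j \text{ and } i \neq j, \\[2pt]
\sigma_d^2 + \sigma^2 & \text{if } i = j.
\end{cases}
\]
```

Observations from different cages are thus independent, while any two observations from the same cage share the covariance $\sigma_d^2$, regardless of how far apart in time they were taken.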
The following lines implement this model in SAS:

  proc mixed data=rats;
    class treatm cage month;
    model lnc = treatm month treatm*month / ddfm=satterth;
    random cage;
  run;

Notice the random cage statement, which specifies the random cage effect, and the option ddfm=satterth, which chooses Satterthwaite's approximation of the degrees of freedom. The relevant part of the SAS output is listed below:
  Covariance Parameter Estimates

  Cov Parm     Estimate
  cage          0.02748
  Residual      0.03790

  -2 Res Log Likelihood           8.6

                  Num     Den
  Effect           DF      DF    F Value    Pr > F
  treatm            2      27       3.22    0.0557
  month             9     243      46.11    <.0001
  treatm*month     18     243       2.12    0.0059
This output gives estimates of the variance parameters (σ_d² = 0.02748 and σ² = 0.03790), twice the negative restricted/residual log likelihood (-2 l_re = 8.6), and an ANOVA table for the fixed effects of the model. From this ANOVA table it is seen that the interaction between treatment and month is significant, with a P-value of 0.0059. The conclusion from this model is that treatment does have an effect on the activity, but the effect is not the same in all ten months.
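The two variance estimates also determine how strongly two measurements from the same cage are estimated to be correlated. The calculation below is an illustration added here; the symbol $\hat\rho$ for the estimated within-cage correlation is not part of the original notes:

```latex
\[
\hat\rho = \frac{\hat\sigma_d^2}{\hat\sigma_d^2 + \hat\sigma^2}
         = \frac{0.02748}{0.02748 + 0.03790}
         = \frac{0.02748}{0.06538}
         \approx 0.42 .
\]
```

Under this model the same correlation applies to every pair of months within a cage, whether the measurements are one month or nine months apart.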
The main problem with this random effects approach is that all measurements on the same individual are assumed to be equally correlated, but some measurements are taken far apart and some are taken close to each other, so this assumption is not always valid. The next module will suggest a few ways to deal with this problem. However, this random effects approach may give reasonable results for short series (with 2, 3, or 4 measurements on each individual), since the assumption of equal correlation may be acceptable in those cases.

This random effects approach is also known as the split-plot approach, or the split-plot model. It is possible to view repeated measurements data as resulting from a kind of split-plot experiment, with individuals as the "main-plots" to which the treatments are applied. The "sub-plots" are then the single measurements on each individual. This interpretation is a bit weak, as the single measurements on each individual (typically at different times) cannot be randomized within the individual.
11.1.5 Pros and cons of simple approaches
In this module a few simple approaches to the analysis of repeated measurements have been described. In many practical cases these simple approaches, especially the summary method, will give a sufficient and useful analysis of the data. Even in those cases where more sophisticated models are needed, it is often helpful to run a few simple models first. Here follow a few pros and cons of the different methods: