## From the Executive Editor

# Critiquing the statistical methods section of a manuscript

In the last editorial, I discussed two fundamental principles of statistical tests: pigs are randomly allocated to treatment, and each observation is independent of the rest. This editorial will cover the analysis of normally distributed outcomes and controlling for clustering.

Statistical tests are selected on the basis of the expected distribution of the outcome parameter, also called the dependent variable. Think about the height of AASV members. Most people are of average height, some are very short, others very tall, and the rest are in between. Height represents normally distributed data and forms a bell-shaped curve when it is graphed. We describe these data using the average, or mean, height and the variation around the mean, usually expressed as the standard deviation. The standard error, which is the standard deviation divided by the square root of the number of observations, describes how precisely the mean has been estimated. The standard error is used in statistical tests to determine whether differences between two groups are larger than we would expect by chance alone. If we compare the height of AASV members with blue eyes to that of members with brown eyes, we will likely find that the heights of the two groups do not differ. However, if we compare the height of women versus that of men, we will likely conclude that they differ and that the difference is statistically significant. Although the height of men is normally distributed and the height of women is normally distributed, they represent two distinct bell-shaped curves. As a rough rule of thumb, a statistical test asks whether the average height of men minus two standard errors overlaps the average height of women plus two standard errors. Formally, the test compares the difference between the two means with the standard error of that difference.
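To make the distinction between standard deviation and standard error concrete, here is a minimal sketch in Python. The heights are invented for illustration and the variable names are my own; only the standard `statistics` and `math` modules are used.

```python
import math
import statistics

# Hypothetical heights in cm, invented for illustration only.
heights_cm = [162.0, 168.5, 171.0, 175.5, 158.0, 180.0, 166.5, 173.0]

n = len(heights_cm)
mean_height = statistics.mean(heights_cm)

# Standard deviation: how much individual heights vary around the mean
# (sample formula, n - 1 in the denominator).
sd_height = statistics.stdev(heights_cm)

# Standard error of the mean: how precisely the mean itself is estimated.
se_height = sd_height / math.sqrt(n)

print(f"n    = {n}")
print(f"mean = {mean_height:.2f} cm")
print(f"SD   = {sd_height:.2f} cm")
print(f"SE   = {se_height:.2f} cm")
```

The standard error shrinks as more people are measured, whereas the standard deviation does not. That is why a methods section should state which of the two is being reported.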

Average daily gain (ADG) is a continuous variable that can be measured to several decimal places and, within a group of pigs of similar ages, tends to be normally distributed. We can use a Student’s t-test to compare the ADG between two groups. For example, we might compare the ADG of vaccinated versus unvaccinated pigs in a finisher barn. The assumptions of the test are that the observations are independent of one another, the variation in ADG is the same in vaccinated and unvaccinated pigs, and the data are normally distributed. However, the only way we can fulfill the assumption of independence is to put only one pig in each pen. Once pigs are grouped in pens, the assumption of independence is violated. Average daily gain may cluster by pen for many reasons: for example, if the feeder becomes plugged or the pen is drafty. Anything that might affect the ADG of pigs in a pen makes the pigs within the pen more similar than pigs from two different pens. These are not independent observations, and if we treat them as such, we have violated a key assumption of all statistical tests. We control for this clustering by adding pen to the analysis using multiple linear regression.
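The Student's t-test described above can be sketched in a few lines. This is an illustration under stated assumptions, not a recommended analysis. The ADG values are invented, the function name `pooled_t_statistic` is my own, and the calculation is only appropriate when each observation is independent (for example, one pig per pen) and the two groups have similar variances.

```python
import math
import statistics


def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees of freedom) for Student's two-sample t-test.

    Assumes independent observations and equal variances in both groups.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se_diff, df


# Hypothetical ADG values in kg/day, one pig per pen, invented for illustration.
vaccinated = [0.86, 0.91, 0.88, 0.93, 0.85, 0.90]
unvaccinated = [0.81, 0.84, 0.79, 0.87, 0.83, 0.80]

t_stat, df = pooled_t_statistic(vaccinated, unvaccinated)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

The t statistic is the difference in means divided by the standard error of that difference. Converting it to a P value requires the t distribution, which is not in the Python standard library, so in practice a statistics package would be used for that step.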

Important assumptions of multiple linear regression are that the observations are independent, that the errors around the fitted model are normally distributed with a variance that does not change in a systematic manner with the independent variables, and that the errors sum to zero. Multiple linear regression allows us to determine whether or not ADG differs by vaccination status after controlling for other variables that may affect ADG. We are asking “Is vaccination status associated with ADG after we control for pen, initial weight of the pig, and gender of the pig?”
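The question above can be sketched as an ordinary least squares fit in which pen enters the model as a set of indicator variables. The pigs, pens, and values below are invented, gender is omitted for brevity, and the helper functions `solve_linear_system` and `fit_ols` are my own plain-Python stand-ins for what a statistics package would do.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - known) / aug[i][i]
    return solution


def fit_ols(design, outcome):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical pigs: (pen, vaccinated 1/0, initial weight in kg, ADG in kg/day).
pigs = [
    ("A", 1, 24.0, 0.90), ("A", 1, 26.5, 0.93), ("A", 0, 25.0, 0.86), ("A", 0, 23.5, 0.84),
    ("B", 1, 25.5, 0.85), ("B", 1, 24.5, 0.86), ("B", 0, 26.0, 0.80), ("B", 0, 24.0, 0.79),
    ("C", 1, 23.0, 0.92), ("C", 1, 27.0, 0.95), ("C", 0, 25.5, 0.89), ("C", 0, 24.5, 0.87),
]

mean_weight = sum(p[2] for p in pigs) / len(pigs)

# Columns: intercept, vaccinated, centred initial weight, pen B, pen C.
# Pen A is the reference pen, so it has no indicator column.
design = [
    [1.0, float(vacc), weight - mean_weight, float(pen == "B"), float(pen == "C")]
    for pen, vacc, weight, _ in pigs
]
adg = [p[3] for p in pigs]

coefficients = fit_ols(design, adg)
fitted = [sum(b * x for b, x in zip(coefficients, row)) for row in design]
residuals = [y - f for y, f in zip(adg, fitted)]

labels = ["intercept", "vaccinated", "initial weight (per kg)", "pen B vs A", "pen C vs A"]
for label, b in zip(labels, coefficients):
    print(f"{label:>24}: {b:+.4f}")
```

The coefficient for `vaccinated` is the estimated difference in ADG between vaccinated and unvaccinated pigs that share a pen and an initial weight. This sketch gives point estimates only; standard errors and P values would come from a statistics package.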

Once we have fitted the multiple linear regression model, we must test its assumptions using a series of standard tests of the residuals. The residuals are the differences between the observed ADG and the ADG estimated by the model. If any of the tests identifies a problem, then the data must be reanalyzed using another statistical technique. For example, the dependent variable may have to be transformed mathematically so that its distribution more closely matches a normal distribution. As a critical reader of the statistical section of the materials and methods, I hope that you verify that the authors have tested the assumptions of the models. If they have violated the assumptions of a model, then the conclusions made on the basis of the model are invalid.
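As a rough illustration of what examining the residuals involves, the sketch below fits a regression with a single predictor, computes the residuals, and prints three crude summaries. The data, function names, and the choice of summaries are my own assumptions. They are not the formal tests an author would report, which would normally come from a statistics package together with residual plots.

```python
import statistics


def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def skewness(values):
    """Population skewness: near 0 for a symmetric distribution."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical data, invented for illustration only.
initial_weight_kg = [23.0, 23.5, 24.0, 24.5, 25.0, 25.5, 26.0, 26.5, 27.0, 27.5]
adg = [0.80, 0.83, 0.82, 0.85, 0.84, 0.88, 0.86, 0.90, 0.89, 0.92]

intercept, slope = fit_simple_regression(initial_weight_kg, adg)
fitted = [intercept + slope * w for w in initial_weight_kg]

# Residual = observed ADG minus the ADG estimated by the model.
residuals = [obs - fit for obs, fit in zip(adg, fitted)]

# Crude check of constant variance: compare the spread of residuals
# for the lower and upper halves of the fitted values.
order = sorted(range(len(fitted)), key=lambda i: fitted[i])
half = len(order) // 2
sd_low = statistics.stdev([residuals[i] for i in order[:half]])
sd_high = statistics.stdev([residuals[i] for i in order[half:]])

print(f"sum of residuals      = {sum(residuals):.2e}")
print(f"skewness of residuals = {skewness(residuals):.2f}")
print(f"residual SD, low vs high fitted values = {sd_low:.4f} vs {sd_high:.4f}")
```

Residuals that are strongly skewed, or whose spread grows with the fitted values, suggest that the assumptions are not met and that a transformation or a different technique should be considered.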

How does the analysis differ if we are still interested in measuring ADG, but the treatment is applied to the pen rather than to the individual pig? The unit of analysis must be the smallest unit to which the treatment is applied. Therefore, if we compare feed additives that are fed to the whole pen, the unit of analysis is the pen, and we compare the average ADG of the pigs in each pen.
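A minimal sketch of analyzing at the pen level, using invented records and names of my own choosing: each pig's ADG is first averaged within its pen, and the treatment comparison is then made on the pen means.

```python
import statistics
from collections import defaultdict

# Hypothetical records: (pen, treatment applied to the pen, pig ADG in kg/day).
records = [
    ("pen1", "additive", 0.88), ("pen1", "additive", 0.90), ("pen1", "additive", 0.92),
    ("pen2", "additive", 0.85), ("pen2", "additive", 0.87), ("pen2", "additive", 0.89),
    ("pen3", "control", 0.80), ("pen3", "control", 0.82), ("pen3", "control", 0.84),
    ("pen4", "control", 0.83), ("pen4", "control", 0.85), ("pen4", "control", 0.87),
]

adg_by_pen = defaultdict(list)
treatment_by_pen = {}
for pen, treatment, adg in records:
    adg_by_pen[pen].append(adg)
    treatment_by_pen[pen] = treatment

# One value per pen: the pen is the unit of analysis.
pen_means = {pen: statistics.mean(values) for pen, values in adg_by_pen.items()}

pen_means_by_treatment = defaultdict(list)
for pen, mean_adg in pen_means.items():
    pen_means_by_treatment[treatment_by_pen[pen]].append(mean_adg)

for treatment, means in pen_means_by_treatment.items():
    print(f"{treatment}: {len(means)} pens, mean of pen means = {statistics.mean(means):.3f}")
```

Twelve pigs were weighed, but only four pens received a treatment, so the comparison rests on four observations, not twelve.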

The term mixed model typically refers to a multiple linear regression that includes a random effect as well as fixed effects. Fixed effects, such as gender and parity, are variables that we can reliably measure in one study and repeat in another study. Random effects cannot be repeated. Examples include farm and pen, each of which represents a cluster of pigs. If I do a study in Ontario on 10 different farms, I need to control for the farm effect. However, no one can repeat the study in Iowa with the identical farm effect measured in Ontario. By putting farm into the model as a random effect, we are controlling for the random variation due to farm and then saying “After controlling for the random variation due to farm, is there any variation in our outcome parameter due to the treatment?” This random farm effect includes measurable and unmeasurable farm factors that are not in the model.
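To show what "random variation due to farm" means numerically, the sketch below splits the variation in ADG into a between-farm part and a within-farm part. It is a simple method-of-moments calculation for balanced data, not a full mixed model, which would be fitted with statistical software. The farms, values, and the function name `variance_components` are invented for illustration.

```python
import statistics


def variance_components(groups):
    """Estimate between-cluster variance, within-cluster variance, and the
    intraclass correlation from a one-way layout with equal cluster sizes."""
    sizes = {len(values) for values in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes the same number of pigs per cluster")
    n = sizes.pop()   # pigs per cluster
    k = len(groups)   # number of clusters

    cluster_means = [statistics.mean(values) for values in groups.values()]
    grand_mean = statistics.mean(cluster_means)

    ms_between = n * sum((m - grand_mean) ** 2 for m in cluster_means) / (k - 1)
    ss_within = sum(
        (v - statistics.mean(values)) ** 2 for values in groups.values() for v in values
    )
    ms_within = ss_within / (k * (n - 1))

    # Between-cluster variance cannot be negative, so truncate at zero.
    var_between = max(0.0, (ms_between - ms_within) / n)
    icc = var_between / (var_between + ms_within)
    return var_between, ms_within, icc


# Hypothetical ADG values (kg/day) for four pigs on each of three farms.
adg_by_farm = {
    "farm1": [0.90, 0.92, 0.88, 0.91],
    "farm2": [0.80, 0.83, 0.79, 0.82],
    "farm3": [0.86, 0.85, 0.88, 0.87],
}

var_farm, var_pig, icc = variance_components(adg_by_farm)
print(f"between-farm variance  = {var_farm:.5f}")
print(f"within-farm variance   = {var_pig:.5f}")
print(f"intraclass correlation = {icc:.2f}")
```

An intraclass correlation well above zero indicates that pigs on the same farm resemble one another more than pigs on different farms, which is the clustering that a random farm effect is meant to absorb.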

The key concepts are these: the statistical methods must be valid for the results to be valid, the statistical models must be evaluated to determine whether or not the methods are valid, and finally, the statistical methods must be explained in sufficient detail for the reader to critique the methods and for another researcher to repeat the study. In the next editorial, I will discuss dependent variables that are not normally distributed.

--Cate Dewey, DVM, MSc, PhD