Question your statistical assumptions

John Roberts

College of Veterinary Medicine, North Carolina State University, Raleigh, NC 27695-7621

Swine veterinarians often try to solve production problems by visually scanning pages of computer-generated statistics in an attempt to identify factors that might be associated with the problem. This scanning process rests on two assumptions. If either is faulty, your interpretation may be misleading.

The first assumption is that your benchmarks are appropriate for the farm under investigation. As practitioners, we judge which values are atypical against our previous experience, and that experience biases the judgment. Remember, expected production values may vary across farms, depending on farm size, weaning age, or farm purpose (e.g., commercial farm or seedstock producer).

When you find a production value that you consider abnormal, you must question whether your benchmark is applicable. Doing this forces you to better understand the determinants of the value and the benchmark.

The second assumption is that reported values are the end point of the available information, so that a summary figure tells the whole story. For instance, if preweaning mortality is 20%, you could assume that every sow has 10 live-born pigs and loses two. However, it is more likely that, with 10 live-born pigs per litter, sows of parity 3 or less lose one pig each, sows of parity 4 or more lose three, and the herd contains as many old sows as young ones. Both scenarios produce the same 20% mortality, but the second is more descriptive and better identifies the problem: it points to losses concentrated in older sows.
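The arithmetic behind the two scenarios can be checked with a short sketch. It is written in Python purely for illustration; the function name `preweaning_mortality` and the herd size of 100 sows are hypothetical choices, while the litter size and losses are those described above.

```python
# Illustrative sketch: two herds with the same 20% preweaning mortality.
# Litter size (10 born alive) and losses follow the scenarios in the text;
# the herd size of 100 sows is an arbitrary assumption.

def preweaning_mortality(litters):
    """Return pigs dead before weaning as a fraction of pigs born alive.

    `litters` is a list of (born_alive, died_before_weaning) tuples, one per sow.
    """
    born_alive = sum(b for b, _ in litters)
    died = sum(d for _, d in litters)
    return died / born_alive

# Scenario 1: every sow has 10 live-born pigs and loses 2.
uniform_herd = [(10, 2)] * 100

# Scenario 2: half the sows are parity 3 or less and lose 1 pig;
# the other half are parity 4 or more and lose 3 pigs.
young_sows = [(10, 1)] * 50
old_sows = [(10, 3)] * 50
mixed_herd = young_sows + old_sows

print(f"Uniform herd: {preweaning_mortality(uniform_herd):.0%}")
print(f"Mixed herd:   {preweaning_mortality(mixed_herd):.0%}")
print(f"  parity <= 3: {preweaning_mortality(young_sows):.0%}")
print(f"  parity >= 4: {preweaning_mortality(old_sows):.0%}")
```

Both herds report 20%, but only the breakdown of the mixed herd (10% in young sows, 30% in old sows) reveals where the losses occur.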

Production statistics contain much information, but what they do not show may be more valuable than what they reveal. Populations are composed of many subpopulations (e.g., different parities, breeding groups, or barn locations). A production statistic is actually a weighted average of the values for all the subpopulations in a farm, where each weight is the proportion of the population contained in that subpopulation.
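Written as a formula (the notation is ours, not the author's): suppose a herd of N animals is split into k subpopulations, where subpopulation i contains n_i animals and has production value \bar{y}_i. The reported statistic \bar{y} is then:

```latex
\bar{y} = \sum_{i=1}^{k} w_i \, \bar{y}_i,
\qquad w_i = \frac{n_i}{N},
\qquad \sum_{i=1}^{k} w_i = 1

% Parity example, assuming 10 pigs born alive per litter and equal numbers of young and old sows:
\bar{y} = (0.5)(10\%) + (0.5)(30\%) = 20\%
```

For a rate such as preweaning mortality, n_i is most naturally counted in the rate's denominator (pigs born alive); this matches each group's share of sows only when litter sizes are equal.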

Weights are easily determined; the challenge is finding meaningful subpopulations. Calculating the production value for each group under a proposed subdivision tests whether that subdivision matters: it is meaningful when the production values of its groups are clearly different. When a single group makes up a large proportion of the population, an unusual value in that group has a correspondingly large effect on the reported summary value.
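A minimal sketch of this procedure in Python, assuming sow-level records with hypothetical fields `parity`, `barn`, `born_alive`, and `died`. The herd is invented for illustration: it reuses the parity pattern from the preweaning mortality example (10 pigs born alive; young sows lose 1, old sows lose 3) and assigns sows alternately to two barns. Each group's weight is taken as its share of pigs born alive. The function names `breakdown`, `weighted_average`, `spread`, and `make_herd` are our own.

```python
from collections import defaultdict

def parity_group(record):
    """Hypothetical subdivision: young (parity 3 or less) vs old (parity 4 or more)."""
    return "parity <= 3" if record["parity"] <= 3 else "parity >= 4"

def barn_group(record):
    """Hypothetical subdivision by barn location."""
    return record["barn"]

def breakdown(records, key):
    """Group sow records by `key`; return {group: (weight, mortality)}.

    weight    = the group's share of all pigs born alive
    mortality = pigs dead before weaning / pigs born alive within the group
    """
    born = defaultdict(int)
    died = defaultdict(int)
    for r in records:
        g = key(r)
        born[g] += r["born_alive"]
        died[g] += r["died"]
    total_born = sum(born.values())
    return {g: (born[g] / total_born, died[g] / born[g]) for g in born}

def weighted_average(groups):
    """Recombine group values into the herd-level statistic."""
    return sum(w * rate for w, rate in groups.values())

def spread(groups):
    """Highest minus lowest group value; near zero means the split adds little."""
    rates = [rate for _, rate in groups.values()]
    return max(rates) - min(rates)

def make_herd(n_young, n_old):
    """Invented herd: 10 pigs born alive per litter; young sows lose 1 pig,
    old sows lose 3; sows alternate between barns A and B."""
    herd = []
    for i in range(n_young):
        herd.append({"parity": 2, "barn": "AB"[i % 2], "born_alive": 10, "died": 1})
    for i in range(n_old):
        herd.append({"parity": 5, "barn": "AB"[i % 2], "born_alive": 10, "died": 3})
    return herd

herd = make_herd(n_young=50, n_old=50)
for label, key in [("parity", parity_group), ("barn", barn_group)]:
    groups = breakdown(herd, key)
    print(f"By {label}: herd {weighted_average(groups):.0%}, spread {spread(groups):.0%}")
    for g, (w, rate) in groups.items():
        print(f"  {g}: weight {w:.0%}, mortality {rate:.0%}")

# Same group values, but old sows now make up 80% of the herd.
old_heavy = breakdown(make_herd(n_young=20, n_old=80), parity_group)
print(f"Old-sow-heavy herd: {weighted_average(old_heavy):.0%}")
```

Splitting by parity separates groups at 10% and 30% mortality, while splitting by barn yields two identical 20% groups, so only the parity subdivision is informative in this invented herd. Keeping the same group values but making old sows 80% of the herd raises the herd figure to 26%, showing how a heavily weighted group dominates the summary.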