This model-free framework can subsequently be extended to a multivariate estimator with two or more discrete variables. In the bivariate case, the distribution of one variable is compared across each encoded level of the other in order to decompose the total entropy in the joint encoding into three terms: the conditional entropy unique to the first variable, the conditional entropy unique to the second variable, and the mutual information that is redundant between the two encodings . This mutual information estimate in turn reflects how much information we learn about one encoded variable if we know the value of the other, and subsequently can be used to reflect the strength of a bivariate association between two sets of encoded data regardless of the underlying dynamic ±linear, quadratic, exponential, etc. Suppose our farmer with the overstocked cows, now fully aware of the welfare issues this management choice has created, reduces their stocking rate to a 1:1 ratio and continues to monitor the lying time of the animals to provide proof to their milk buyer that the issue has been resolved. After several months at this lower stocking rate, they review the data and are dismayed to find that there are still days where animals have inadequate lying times. To solve this new problem, they hire yet another consultant to help them track down the source of the problem. Suppose that this new dataset were collected in the summer, high quality shelving systems and so naturally, this consultant includes Temperature-Humidity Index as one of many candidate variables to consider as a potential source of the continued welfare concerns for this herd .
Using stochastic sampling techniques, the full details of which are provided in Supplemental Materials, we have simulated a fairly straight forward but nonlinear dynamic between these two variables. On days where the observed THI values are low, animals are not heat stressed and so spend the majority of their day out on pasture grazing. As the THI rises, cows become heat stressed for progressively larger proportions of the day, resulting in a gradual increase in the proportion of each day that cows spend lying down in the shade of their free stall barn. Above a certain high THI threshold, however, cows struggle to thermoregulate while lying down, causing them to stand for extended periods of time. When lying time is plotted against THI, as in Figure 7A, we can see that there is a clear nonrandom pattern in this data that is perhaps best characterized by a threshold model ±a dynamic that is commonly found when a single behavioral response is subject to the influence of competing underlying behavioral response mechanisms. If a simple linear effect were utilized to probe for a significant bivariate association between these two variables, however, a near-zero slope would be returned, as shown by the red line overlaid in Figure 7A. In this case, not only would a Pearson correlation test fail to identify this nonrandom but also nonlinear pattern, but because this pattern is also not monotonically increasing, even anonparametric Spearman Rank correlation test would fail to identify THI as a significant influence on lying times within this herd. Suppose both the THI and lying time measurements are discretized using simple equal-sized binning rules. If we compare the mutual information estimate from the two observed encodings against estimates generated from a simple permutation, the resulting p-value for this test of bivariate association would be highly significant .
To further characterize this dynamic, a simple contingency table, wherein each cell represents the total number of observations for each joint encoding, can be easily visualized as in Figure 7B. The mutual information estimate for the overall bivariate association can subsequently be decomposed into point wise mutual information estimates to reflect how much each cell in the observed table differs from the counts that would be expected by multiplying the marginal probabilities, which would be the distribution of joint observations anticipated if no association existed between the two encodings . Here blue cells indicate that there are fewer observations with the corresponding joint encoding than would be expected if no association between these variables were present, whereas orange cells are over-represented relative to the null. From this visualization we can clearly see that the probability of observing a given lying pattern is being shifted in different directions based on the level of the THI encoding. Thus, absent any prior intuition or assumptions about the relationship between these two variables, an information theoretic approach has successfully identified a significant bivariate association and provided insights into the underlying dynamic to inform further interpretation of the underlying behavioral mechanisms at play and subsequently the correct management interventions needed to remediate this welfare concern.For much of its history, ethological research in livestock has relied on human observers to encode behaviors of interest . While developing a detailed ethogram and observer training protocols constitute no simple task, there are several inherent advantages to this approach for subsequent statistical analyses. Continuous involvement of a human in the incoming data stream allows many erroneous data points to be identified and excluded from downstream analyses that they might otherwise destabilize. Extensive involvement of research personnel in the data collection phase also nurtures a deeper familiarity with the system under study. This not only aids in the specification of an appropriate statistical model and interpretation of results, but is often critical in identifying unexpected behavioral patterns that can inspire novel hypotheses. Unfortunately, the inherent quality of such data imposes practical limitations on the quantity that can be produced. This can restrict both the number of animals utilized in a study and the period of time over which they are observed. Restrictions on the number of animals that can be studied, on the other hand, can fundamentally alter the behavioral mechanisms at play in a herd. For example, the linearity of dominance hierarchies are known to change with group size .As commercial herds and flocks become ever larger, this only serves to broaden the gap between experimental findings and the welfare challenges they are meant to inform. Subsampling of animals or observations windows may be employed to reduce the number of observations collected without restricting the size of the study system. If the pre-existing base of scientific literature does not provide clear guidance on the selection of target animals or focal periods, however, such strategies may risk overlooking finer-grain behavioral patterns and skewing inferences about the collective behavior of the group . In recent years, livestock sensor technologies have become a popular alternative to visual observation . While the behaviors recorded are neither as complex or as detailed as those quantified via an observational ethogram, such devices have the capacity to continuously monitor hundreds or even thousands of animals for extended periods of time. Such a substantial expansion in the bandwidth capacity of ethological studies creates many new opportunities to better understand the behavior of livestock, particularly in large-scale commercial settings, commercial drying rack but also raises new methodological challenges.
Replacing nuanced human intuition with basic computer logic may increase the risk of erroneous data points, an issue that is only further compounded by the scale of data produced by such technologies, which renders many conventional visualizations techniques ineffective in identifying outliers. Observations recorded over extended time periods with high sampling frequency from large heterogeneous social groups may also contain a range of complex stochastic features ±autocorrelation, temporal nonstationary, heterogeneous variance structures, non-independence between experimental units, etc. ±that can lead to spurious inferences when not appropriately specified in a conventional liner model. The hands-off and somewhat black-boxed nature of many sensor platforms, however, do not nurture the intuition needed to identify many of these model structures a priori. Such insights must instead be drawn directly from the data itself, but here again, standard visualization tools may not scale to such large datasets. Unsupervised machine learning tools offer a distinct empirical approach to knowledge discovery that are purpose built for large and complex datasets . Whereas conventional linear models excel at providing answers to targeted experimental hypotheses, UML algorithms strive to identify and characterize the nonrandom patterns hiding beneath the stochastic surface of a dataset using model-free iterative techniques that impose few structural assumptions. This open-ended and highly flexible approach to data exploration may offer an empirical means by which to recover much of the familiarity with a study system that is lost with the shift from direct observation to sensor platforms. The purpose of this research was contrast the behavioral insights gleaned from UML algorithms with those recovered using conventional exploratory data analysis techniques, and to then explore how such information could be best integrated into standard linear analysis pipelines. Milking order, or the sequence in which cows enter the parlor to be milked, is recorded in all RFID equipped milking systems, making such records one of the most universal automated data streams to be found on modern dairies. Despite their ubiquity, such records are seldom used to inform individual or herd-level management strategies. This lack of utility, however, has not been for lack of study. The modest base of scientific literature that has since been compiled on this topic, however, has struggled to recover repeatable evidence of such associations . While such inconsistency may simply reflect non-uniformity in the behavioral strategies driving queueing patterns across different herds and farm environments, misspecification of the linear models used to describe this system could also contribute to volatility in these statistical inferences. The objective of this methodological case study will be visualize the various stochastic aspects of such records using UML tools in an effort to identify erroneous data points and heterogeneous variance structures that may not be recovered using conventional EDA techniques.Data for this case study was repurposed from a feed trial assessing the effect of an organic fat supplement on cow health and productivity through the first 150 days of lactation . All animal handling and experimental protocols were approved by the Colorado State University Institution of Animal Care and Use Committee . The study ran from January to July of 2017 on a certified organic dairy in Northern Colorado. A total of 200 mixed-parity Holstein cows were enrolled over a 1.5 month period as study-eligible animals calved. Cows were maintained in a closed herd for the duration of the study, with sick animals temporarily removed to a hospital pen when necessary. The study pen was an open-sided free stall barn, stocked at just above half capacity with respect to bunk space and beds, with free access to an adjacent outdoor dry lot. At roughly the midpoint of the trial, cows were moved overnight to a grass pasture that conformed with organic grazing requirements . Cows had access to total mixed ration ration following each milking. Animals were temporarily split into two subsections of the pen following the morning milking to facilitate administration of control and treatment diets. Cows remained locked for roughly 45 minutes following this division so that farm and research staff could collect health and fertility data. Additionally, all animals were fitted with CowManager® ear tag accelerometers . This commercial sensor platform, while designed and optimized for disease and heat detection, also provided hourly time budget estimates for total time engaged in a range of behaviors – eating, rumination, non-activity, activity, and high activity – as well as average skin temperature.Raw milk logs were exported from the rotary parlor following each morning milking , and were processed using data wrangling tools available in R version 3.5.1 . To account for missing records due to illnesses and RFID reader errors, ordinal entry positions were normalized by the total number of cows recorded in a given milking . For example, if a cow were always the last animal to enter the parlor, her ordinal entry position might vary widely with herd size, but her entry quantile would always be 1. The first 55 days of records were excluded from analyses to allow all animals to enter the herd over the rolling enrollment period and become established in their parlor entry position . To avoid irregularities in cow movements, several observation days surrounding management changes were also dropped, including: the two days preceding transition to pasture, the four days following pasture access, and the final seven days on trial.