
Part Two: The Study (https://www.mdpi.com/2076-393X/12/1/96)
Abstract
This study evaluates the immunocontraceptive efficacy of Porcine Zona Pellucida (PZP) treatment in the Virginia Range free-roaming horse population, analysing the impact of PZP on fertility over four years (2019-2022). Researchers monitored 2,817 mares, tracking vaccination records and reproductive outcomes. The analysis showed a substantial reduction in foaling rates, a decrease of nearly 60% that the authors attribute to PZP treatment.
However, the study’s methodology has been criticised for lacking rigorous statistical analysis, insufficiently controlling for confounding variables, and relying on descriptive statistics without inferential modelling. Recommendations for future research emphasise mixed-effects models and survival analysis to better assess vaccine efficacy and duration of effect, strengthening the robustness of findings in wild horse population management.
Critical Scholarly Evaluation
Scientific Rigour and Methodology
- Strengths: Comprehensive field dataset; basic tracking of age, treatment, and reproductive outcomes.
- Major Weaknesses:
  - No statistical testing or modelling to establish causation.
  - No control group or comparison cohort.
  - No adjustment for confounders (e.g., predation, environmental change, mortality).
  - Reliance on visual trends rather than hypothesis-driven methods.
- Verdict: Methodologically weak. Fails to meet the baseline standard for quantitative evaluation of treatment efficacy in population science.
Statistical Soundness
- Strengths: Descriptive statistics are presented clearly.
- Major Weaknesses:
  - No inferential statistics, p-values, confidence intervals, or effect size estimates.
  - No regression, survival analysis, or repeated-measures modelling, despite rich longitudinal data.
  - Assumptions about the duration of vaccine efficacy were not tested against observed outcomes.
- Verdict: The study’s lack of statistical rigour is a serious flaw, given that it makes population-level claims about contraceptive efficacy.
Journal Quality (MDPI Vaccines)
- Concerns:
  - MDPI has been criticised for predatory tendencies, including fast-tracked peer review and questionable editorial practices.
  - While Vaccines is indexed and has an impact factor, it is not a top-tier veterinary science, ecology, or public health journal.
  - Given its analytic weaknesses, the article likely received minimal statistical scrutiny in peer review.
- Verdict: Publishing in MDPI Vaccines undermines the credibility of an already under-analysed dataset.
Overall Scholarly Contribution
- Comment: While the field data is valuable and the observational findings are intuitively aligned with known PZP effects, the analytical execution is too weak to support evidence-based decision-making. It reads more as a program summary than a rigorous scientific study.
Final Assessment
This study would not pass peer review in a journal with strict standards for statistical analysis or epidemiological rigour. It is unsuitable for drawing firm conclusions about efficacy, duration, or policy recommendations without reanalysis using appropriate statistical models.
Detailed Evaluation of Statistical Methodology in the PZP Wild Horse Study
Introduction
The study “Immunocontraceptive Efficacy of Native Porcine Zona Pellucida (PZP) Treatment of Nevada’s Virginia Range Free-Roaming Horse Population” (Vaccines 2024, 12, 96) evaluated the effects of dart-delivered PZP fertility control on birth rates and demographic patterns in a large wild horse population. Researchers kept monthly records for each mare in the population over four years, from 2019 through 2022, covering vaccination history, birth status, and other factors. The goal was to evaluate vaccine effectiveness by tracking annual changes in birth rates and population dynamics as PZP treatment reached more horses. Our assessment examines the study’s statistical techniques (their suitability, their reliability, how they control for bias, and how they model vaccine persistence and analyse longitudinal data) to determine whether alternative analytical strategies would have produced stronger findings.
Summary of Data and Statistical Methods Used
Data Structure: The study followed 2,817 female horses across 48 months, with one record for each month a mare was present in the study. For each mare-month, the authors recorded or computed several variables, including pre-2019 vaccination status and cumulative PZP vaccination count (the latter adjusted under each efficacy model). Other key variables included pregnancy status, conception status, foaling status, age class, social group affiliation, and injection-site granulomas or abscesses. This detailed, individual-level longitudinal data allowed the researchers to analyse treatment delivery and reproductive outcomes throughout the observation period.
The analysis used vaccine efficacy scenarios to model PZP’s duration of effect. Because the field longevity of PZP’s contraceptive effect is unknown, the researchers defined four duration scenarios: a permanent effect, and efficacy lasting eighteen, twelve, or six months. For each scenario, they adjusted every mare’s cumulative vaccination record by removing vaccinations that had expired under that interval; under the six-month scenario, for example, a dose stopped counting six months after it was given, so a mare’s protection lapsed unless she received a booster. The team then calculated the average number of active vaccinations per mare over time for each scenario. The program began with primer and booster doses, followed by yearly boosters as needed. Under the twelve-month efficacy assumption, the average number of active vaccinations per mare settled at approximately one by 2022, after the initial boost (Vaccines 2024, 12, 96). These scenarios served as a sensitivity analysis showing how different assumptions about vaccine persistence would affect both coverage measures and booster requirements.
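The scenario adjustment described above can be illustrated with a small Python sketch. It assumes each mare’s history is a list of study months in which she was darted (month 1 being the first study month) and that a dose simply stops counting once the scenario’s duration has elapsed, with `None` standing for the permanent scenario. The function names and the three-mare herd are hypothetical; this is not the authors’ code.

```python
from statistics import mean

def active_doses(vacc_months, month, duration):
    """Count doses given at or before `month` that are still active.

    duration=None models the permanent-effect scenario; otherwise a dose given
    in month m is treated as active for months m .. m + duration - 1.
    """
    given = [m for m in vacc_months if m <= month]
    if duration is None:
        return len(given)
    return sum(1 for m in given if month - m < duration)

def mean_active_doses(herd, month, duration):
    """Average number of active doses per mare at a given study month."""
    return mean(active_doses(v, month, duration) for v in herd)

# Hypothetical herd: primer, booster, then roughly annual boosters for the first mare.
herd = [[1, 2, 14, 26, 38], [3, 5, 17, 29], [6, 8]]
for label, dur in [("permanent", None), ("18 mo", 18), ("12 mo", 12), ("6 mo", 6)]:
    print(label, round(mean_active_doses(herd, 48, dur), 2))
```

Under the permanent scenario every dose accumulates, which is why a cumulative mean (such as the study’s 3.74 shots per mare at month 48) can far exceed the number of doses still counted as active under a finite duration.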
The analysis was primarily descriptive. The authors used database queries for basic counts and summary statistics, and R for exploratory data analysis and graphing. They graphed key metrics, including mean vaccinations per mare and the proportion of mares foaling at different time points, and reported results as proportions, counts, and time-series plots of change over the four years. Vaccination coverage reached 72.5% of the herd in the fourth year (2022), while foaling outcomes changed substantially. The foaling rate (the percentage of mares giving birth) declined steadily: by 2022 only about 10% of mares became pregnant, compared with approximately 33% before the program. The authors report that the foaling rate fell by approximately 58-60% and present this as evidence of contraceptive effectiveness. However, these changes were reported without significance tests or confidence intervals, so the observed population-level associations carry no formal measure of uncertainty.
The study did not use statistical models to estimate the vaccine’s effect on outcomes while accounting for other influencing variables, and it reports no p-values, confidence intervals, or model coefficients. The authors describe patterns (e.g. “foaling rates approximately halved in 2021 and by 2022 were further reduced by ~60%”) that they attribute to rising vaccine coverage, yet they do not fit, for example, a logistic regression of the probability of foaling on treatment variables. In this observational, descriptive design, the study population served as its own before-and-after comparison over time. Rather than performing hypothesis tests, the study presented visual results showing fertility metrics falling as the vaccination program expanded.
Appropriateness and Rigour of the Statistical Methods
The study used descriptive statistics to analyse a whole-population field program that had no untreated control group. It clearly linked PZP vaccine distribution with dramatic reductions in foaling. Descriptive analysis communicates raw results effectively, for example by contrasting an annual foaling rate of 1 in 3 mares before the program with 1 in 10 mares by 2022. The analysis was strengthened by its use of monthly data and by separating mares into mature and yearling age classes, which distinguishes juvenile non-breeders from adult breeders. Calculating foaling rates over mature mares only gave the descriptive statistics the correct denominator and made them more accurate.
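As a sketch of why the denominator matters, the Python function below computes a yearly foaling rate over mature mares only. It assumes one row per mare-year with an age-class label and a foaled flag; the row layout, the `"mature"` label and the example rows are illustrative assumptions rather than the study’s actual schema.

```python
from collections import defaultdict

def foaling_rate_by_year(records, mature_classes=("mature",)):
    """Share of mares at risk that foaled, per year.

    records: iterable of (mare_id, year, age_class, foaled) tuples, one row
    per mare-year. Rows whose age class is not in `mature_classes` are dropped
    from both the numerator and the denominator.
    """
    at_risk = defaultdict(set)
    foaled = defaultdict(set)
    for mare, year, age_class, did_foal in records:
        if age_class not in mature_classes:
            continue
        at_risk[year].add(mare)
        if did_foal:
            foaled[year].add(mare)
    return {y: len(foaled[y]) / len(at_risk[y]) for y in sorted(at_risk)}

# Hypothetical mare-year rows.
rows = [
    ("A", 2020, "mature", True), ("B", 2020, "mature", False),
    ("C", 2020, "yearling", False), ("A", 2021, "mature", False),
    ("B", 2021, "mature", False), ("C", 2021, "mature", False),
]
print(foaling_rate_by_year(rows))
```

Passing `mature_classes=("mature", "yearling")` for the same rows lowers the 2020 rate from 0.5 to about 0.33, showing how counting non-breeding yearlings in the denominator would dilute the foaling rate.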
Without inferential tests or models, however, the analysis cannot properly quantify uncertainty or establish causality. The study attributes the observed outcomes to PZP treatment but presents no statistical evidence for this attribution; for instance, the authors report no analysis comparing the foaling rates of vaccinated and unvaccinated mares after adjusting for age. Without statistical modelling, it is also impossible to assess whether year-to-year changes are statistically significant or how much responses vary between individuals and subgroups. The lack of error measures (such as confidence intervals on the 58% foaling reduction or the 10% conception rate) means we cannot judge how much natural variation or unreported variables contributed to the observed changes. The analysis shows associations, but it neither establishes causation statistically nor quantifies the uncertainty in the treatment effect.
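To show the kind of uncertainty statement that is missing, the sketch below computes a risk ratio for foaling (later period versus baseline) with a log-scale Wald confidence interval and expresses it as a percentage reduction. The counts are hypothetical, chosen only to mimic roughly one in three versus one in ten mares. The method also assumes the two periods are independent samples, which is only approximate when the same mares appear in both.

```python
import math
from statistics import NormalDist

def risk_ratio_ci(events_after, n_after, events_before, n_before, level=0.95):
    """Risk ratio (after / before) with a Wald confidence interval on the log scale.

    Treats the two periods as independent binomial samples, which is only an
    approximation when the same mares contribute to both periods.
    """
    rr = (events_after / n_after) / (events_before / n_before)
    se_log = math.sqrt(1 / events_after - 1 / n_after + 1 / events_before - 1 / n_before)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)

# Hypothetical counts: 300 of 900 mature mares foaled at baseline, 90 of 900 later.
rr, lo, hi = risk_ratio_ci(90, 900, 300, 900)
print(f"reduction {1 - rr:.0%} (95% CI {1 - hi:.0%} to {1 - lo:.0%})")
```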
Because the authors included every observed animal, the results are population-level descriptions, and sampling error in the usual sense does not arise. If the outcome is treated as a complete census, hypothesis testing becomes less central. Even so, analysts often use modelling to generalise findings from what could be classified as a “census”, and inferential analysis would have helped confirm that other factors did not drive the observed patterns. In particular, a statistical model could have accounted for the missing 2019 data and tested whether the decrease from 2020 to 2022 exceeded what chance variation or natural trends would produce. Without statistical evidence for “efficacy”, the study’s conclusion rests on the assumption that no other substantial changes occurred during the program period.
Because the study is observational, that assumption needs explicit evaluation. A more rigorous approach would show that similar declines did not occur in control areas or untreated periods. The descriptive methods provided essential insight into the data but fall short of the standard needed for causal conclusions. The field-report approach suited an initial publication, yet it leaves room for further analysis. The following section examines the confounding and bias issues that arise from this study design.
Treatment of Confounders, Assumptions, and Biases in the Study
The researchers took several data-handling steps to reduce potential bias. They protected the efficacy analysis from some confounding by excluding mares from certain areas and mares that entered treatment only in the final year. Horses living south of a highway were excluded because they missed one year of treatment opportunities, which would have distorted the results had they been included. Mares from the Fernley area, which joined the program only in its last year, were removed because their observation period was too short. These exclusions were appropriate: including largely untreated areas could have diluted the apparent vaccine effect or introduced geographic confounders, and removing them left a core population that was treated consistently. The study also separated immature from mature females so that growth in the yearling population would not confound foaling rates. This age-based stratification ensured that foaling rates reflected the fertility of breeding-capable mares rather than including young, non-reproductive animals.
Uncontrolled Confounders: Despite these steps, several potential confounders and biases remain unaddressed or are acknowledged only qualitatively:
Time Trends and Environmental Factors: The study spanned four years, during which other changes could have influenced foaling rates. Forage availability can fluctuate, droughts can occur, disease can spread, and predation can be significant; all of these can affect reproduction and foal survival independently of PZP. The authors noted very high foal mortality (up to 63% in 2022), attributed to predation. Such mortality does not prevent conception, but it could mean that some foals were never documented (if predation occurred shortly after birth). No adjustment or sensitivity analysis was done for year-specific environmental factors. A more rigorous approach might include year effects in a model or compare results with historical foaling data for the herd under similar ecological conditions.
Population Structure Changes: The program itself likely altered population structure: fewer foals born means the age distribution shifts older and fewer young females join the mature class each year. The authors observed a decline in the absolute number of mature mares over time, which could partly result from contraception (fewer new females) but could also reflect natural mortality or removals. By the end of 2022, 1,089 of the study mares were classified as “deceased” (natural attrition) and 22 as “removed” by management. If mortality was high (possibly related to drought or predation), the decrease in foaling could partly reflect fewer mares alive or healthy enough to reproduce, not just contraceptive effects. Beyond expressing foaling as a percentage, the analysis did not control for the changing number of mares at risk each year. That is largely acceptable, since a rate inherently accounts for the number of mares. Still, if particular subsets of mares were more likely to die or be removed (for example, untreated mares ranging in risky areas suffering higher predation), comparisons could be biased. The study did not examine whether the mares that remained in the dataset differed from those lost, which is a potential source of bias.
Pre-existing Fertility Differences: In any observational study, treated and untreated subjects might differ. Treatment rollout was widespread here, but not every mare was darted immediately. One potential confounder is individual mare fertility or social status. It’s conceivable that the easiest mares to dart (those often near people or water) might have different reproductive rates than those in remote areas (which possibly had lower baseline foaling or higher foal predation). Since treatment wasn’t randomised, any such differences could confound results. The study did not explicitly compare foaling rates in untreated versus treated mares in the same period – a comparison that could have been informative. Instead, all mares in treated areas were analysed as one group, with increasing overall treatment coverage. This means we assume no systematic differences between the first and last mares darted, which might not hold strictly. The authors did not address this possible bias.
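One way to make the treated-versus-untreated comparison without pooling across years is a Mantel-Haenszel odds ratio stratified by year, a standard technique swapped in here for illustration rather than one the study used. The sketch assumes a 2×2 table per year of treated and untreated mares by foaling outcome; the tables are hypothetical. Stratifying by year removes confounding by year-level conditions such as drought, but it cannot correct for non-random selection of which mares were darted first.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio across strata (here, years).

    Each stratum is a 2x2 table (a, b, c, d):
      a = treated mares that foaled,   b = treated mares that did not,
      c = untreated mares that foaled, d = untreated mares that did not.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        num += a * d / n
        den += b * c / n
    return num / den

# Hypothetical yearly tables for two treatment years.
by_year = [(10, 90, 30, 70), (8, 152, 12, 28)]
print(round(mantel_haenszel_or(by_year), 3))
```

An odds ratio well below 1 would indicate that treated mares had lower odds of foaling than untreated mares observed in the same year.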
Initial Conditions and Lag: The authors acknowledge a key temporal bias: many mares were already pregnant when vaccination started (April 2019, peak foaling season). Consequently, the first foaling season’s data do not reflect the vaccine’s impact (those foals were conceived pre-program), and even the second year included some foals from mares vaccinated mid-pregnancy. They correctly note that a lag in effect was expected until the second year. They handled this by interpreting 2019’s foaling rate as artificially low (due to undercounting) and focusing on drops after 2020. However, they did not formally adjust the analysis to exclude those initial pregnancies or to account for whether a mare had already foaled when first treated. A more rigorous analysis could, for example, have excluded foals born in 2019 from the efficacy calculation, or started the “clock” for each mare after her last pre-treatment foal, to measure new conceptions under treatment more cleanly. The study’s efficacy metric (a 58% reduction in foaling) was computed at the population level without such refinement, which could slightly underestimate true efficacy because of the initial lag.
The authors took adequate steps to eliminate obvious confounding effects (entirely untreated subpopulations) and acknowledged some biases (for example, the underestimation of foals at the beginning). However, they did not use statistical controls for confounders in the analysis. Their approach attributes all changes in foaling to the treatment, which, although likely correct, is not definitively shown without either a control group or a multivariate analysis. The study would have been more rigorous had the authors explicitly controlled for year effects, regional differences, or pre-treatment fertility in a model. A before-and-after comparison of the same mares, or a subgroup analysis (e.g. of horses that remained untreated longer), could have helped isolate the vaccine’s effect.
There remains some (albeit small) possibility that factors other than PZP contributed to the observed outcomes.
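The suggested exclusion of foals conceived before treatment could be sketched as follows. The Python code estimates each foal’s conception date by subtracting an assumed gestation of about 340 days (a typical equine figure, not a value reported by the study) and keeps only foals whose estimated conception falls on or after the dam’s first vaccination. The mare IDs, dates and data layout are hypothetical.

```python
from datetime import date, timedelta

GESTATION_DAYS = 340  # approximate equine gestation; an assumption, not a study value

def conceived_under_treatment(foal_date, first_vaccination):
    """True if the estimated conception date is on or after the first vaccination."""
    est_conception = foal_date - timedelta(days=GESTATION_DAYS)
    return est_conception >= first_vaccination

def efficacy_eligible_foals(foals, first_vacc):
    """Keep only foals plausibly conceived after the dam was first treated.

    foals: list of (mare_id, foal_date); first_vacc: dict mare_id -> date.
    Foals of never-treated mares are left out of this treated-period tally.
    """
    return [
        (mare, d) for mare, d in foals
        if mare in first_vacc and conceived_under_treatment(d, first_vacc[mare])
    ]

# Hypothetical records: both mares first darted in mid-April 2019.
first_vacc = {"A": date(2019, 4, 15), "B": date(2019, 4, 15)}
foals = [
    ("A", date(2019, 5, 1)), ("B", date(2020, 3, 1)), ("B", date(2020, 5, 1)),
    ("A", date(2021, 6, 1)), ("C", date(2021, 5, 1)),
]
print(efficacy_eligible_foals(foals, first_vacc))
```

Because gestation length varies between mares, foals near the cut-off will be classified with some error; repeating the calculation over a range of gestation lengths would be a sensible sensitivity check.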
Modelling of Vaccine Efficacy and Assumptions about Longevity
One of the strengths of the study’s methodology was its attempt to bracket the unknown duration of PZP’s contraceptive effect by analysing multiple scenarios. The four efficacy-duration scenarios (permanent, eighteen months, twelve months, six months) adjusted cumulative vaccine counts per mare. This sensitivity analysis is commendable for recognising a key assumption (how long a single primer plus booster prevents pregnancy) and for showing how different assumptions would change the interpretation of how many “effective treatment units” each mare received. For example, by month 48, mares had received 3.74 shots on average (no efficacy loss scenario), but under a 12-month efficacy assumption this translated to maintaining roughly one active shot per mare (since older shots “expired”). The study found that under a 12-month efficacy model, the program reached a steady state of ~1 vaccination per mare per year after the second year, which the authors cite as a “robust recommendation for treatment frequency” (i.e. an annual booster). In practical terms, their data support the idea that yearly boosters are sufficient to keep fertility low, in line with the 12-month efficacy assumption.
The scenario-based modelling was helpful but relatively simplistic, and it assumes rather than infers the vaccine’s longevity. The study did not test which assumption was most consistent with the observed foaling data. Ideally, one could try to infer the duration of vaccine effect from the data, for instance, by examining if pregnancies occur around 12+ months after treatment without a booster. The paper did not report an analysis correlating time since vaccination with pregnancy risk. Instead, it effectively sidestepped that by presenting all scenarios. This is conservative (it doesn’t over-claim how long PZP works), but it means the study doesn’t pinpoint vaccine longevity. They lean on the 12-month scenario as a likely case, noting the system “reached stability” at annual boosting, which suggests the authors’ interpretation that ~12 months is close to the actual duration of strong efficacy.
A more rigorous approach could have used time-to-event modelling or regression to estimate how vaccine efficacy decays. For example, a survival analysis could treat conception or foaling as the “event” and include a time-dependent covariate indicating whether the mare is within X months of a vaccination. This would estimate how the hazard of conception changes as the months since the last treatment increase; a jump in hazard after 12 months would empirically support a one-year duration. Alternatively, a logistic regression for each breeding season could include months since the last shot as a predictor of pregnancy, to test whether efficacy drops off significantly at 12-18 months. The study did not undertake these analyses. Instead, its fixed-duration scenarios provided a range of possible outcomes, from the best case (permanent) to the worst case (six months), and showed that even in the worst case the program achieved a substantial fertility reduction because boosters were given frequently enough. This answers the question “how sensitive are our results to vaccine longevity?” but not “what is the vaccine longevity, given our results?”
The study judged success by the reduction in foaling rate, using foaling as a surrogate for conception and therefore as an inverse measure of contraceptive efficacy. Foaling is a reasonable proxy for pregnancy prevention because a mare can foal at most once a year. The data indicate that about one in three mares became pregnant each year before the program, compared with about one in ten thereafter. This corresponds to a decrease in conception of about 67%, broadly consistent with the reported 58–60% decrease in foaling, with the gap plausibly reflecting some initial underreporting of foals.
The study’s efficacy assessment was essentially binary, tracking whether mares were pregnant or not without estimating individual-level vaccine performance or foaling probabilities. Smaller controlled studies have reported that a primer followed by a booster prevents foaling in about 90% of mares for one year. The field study did not report a comparable statistic, because mares received multiple vaccinations and there was no untreated control group for comparison. Instead, the researchers used the 58% reduction in foaling as their primary program metric.
The study presented its vaccine efficacy assumptions transparently, using simple methods: it laid out realistic scenarios rather than statistically estimating how vaccine effectiveness changed over time. Future research should fit models that estimate efficacy-decay patterns from the data. For example, mixed-effects models with random mare effects could relate foaling probability to months since vaccination; this requires calculating each mare’s “time since last vaccination” to show how the probability of fertility evolves after treatment. Such an analysis would provide data-based estimates of how long PZP remains effective in this population. The modelling could be made more precise by incorporating individual vaccine efficacy established in prior studies (e.g., two initial doses producing about 90% contraceptive success in the first year) as prior knowledge or constraints. The study supports annual boosters as an operational schedule but does not precisely estimate vaccine duration, which leaves an opportunity for further research.
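A first step toward the time-since-vaccination analysis could look like the Python sketch below: it derives the months since a mare’s most recent dose at a given month, then tabulates per-season conception rates by time-since-dose bin. The bins, record layout and example seasons are assumptions for illustration. A rising rate beyond 12 months would be consistent with waning efficacy, but a full analysis would enter the gap as a covariate in a logistic or discrete-time survival model with mare random effects rather than rely on this raw tabulation.

```python
from collections import defaultdict

def months_since_last_dose(vacc_months, month):
    """Months since the most recent dose given at or before `month`; None if untreated so far."""
    past = [m for m in vacc_months if m <= month]
    return month - max(past) if past else None

def conception_rate_by_gap(records, bins=((0, 6), (6, 12), (12, 18), (18, 999))):
    """Per-season conception rate grouped by time since last dose.

    records: iterable of (months_since_last_dose, conceived) pairs, one per
    mare and breeding season. Never-treated records (None) are skipped.
    Returns {(lo, hi): (conceptions, mares_at_risk, rate)} for non-empty bins.
    """
    tally = defaultdict(lambda: [0, 0])
    for gap, conceived in records:
        if gap is None:
            continue
        for lo, hi in bins:
            if lo <= gap < hi:
                tally[(lo, hi)][0] += int(conceived)
                tally[(lo, hi)][1] += 1
                break
    return {b: (c, n, c / n) for b, (c, n) in sorted(tally.items())}

# Hypothetical mare-seasons: (months since last dose, conceived that season?)
seasons = [(3, False), (8, False), (13, True), (14, False), (20, True), (None, True)]
for gap_bin, (c, n, rate) in conception_rate_by_gap(seasons).items():
    print(gap_bin, c, n, round(rate, 2))
```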
The data’s longitudinal structure lends itself to more advanced statistical analysis. Because the same mares were monitored for four consecutive years, repeated-measures methods such as mixed-effects models, generalised estimating equations (GEE), and time-series methods could exploit the repeated observations while accounting for individual differences. The authors did not use such models in their published work; this section evaluates that decision.
The analysis simplified the data by aggregating monthly observations into yearly summaries and did not explicitly model mare-specific patterns, implicitly treating observations as independent. Yet a mare’s reproductive outcomes in consecutive years are correlated: consistently treated mares, for example, were more likely not to foal in both 2021 and 2022. Statistical tests that ignored these repeated measures would understate uncertainty, because the 2,817 × 48 mare-month records cannot be treated as independent data points. The authors avoided this error only because they performed no statistical tests. They used the longitudinal data to calculate population-level metrics, such as monthly foaling rates, which pool information across mares. Because the analysis is descriptive rather than inferential, pseudo-replication is not a practical problem here.
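The scale of the independence problem can be gauged with the Kish design effect, which shrinks the nominal sample size according to the within-mare intraclass correlation (ICC). The sketch below assumes equal cluster sizes (48 records for every mare), which overstates the data because many mares died, were removed, or entered late, and the ICC of 0.5 is purely illustrative, not an estimate from the study.

```python
def effective_sample_size(n_clusters, obs_per_cluster, icc):
    """Nominal record count divided by the Kish design effect 1 + (m - 1) * ICC."""
    n = n_clusters * obs_per_cluster
    design_effect = 1 + (obs_per_cluster - 1) * icc
    return n / design_effect

# 2,817 mares x 48 monthly records each; ICC = 0.5 is an illustrative guess.
print(round(effective_sample_size(2817, 48, 0.5)))
```

Under these assumptions, roughly 135,000 mare-month records carry about the information of a few thousand independent observations, which is why tests treating every record as independent would report far too much certainty.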
Without a mixed-effects model, however, the study could not quantify variability between mares or bands. Some mares may fail to become pregnant regardless of treatment, while others might conceive despite repeated vaccinations, possibly because they do not mount an adequate immune response. A mixed-effects logistic regression with mare-specific random intercepts could measure variation in baseline fertility between mares and estimate the vaccine effect while accounting for these individual differences. It would also quantify intra-mare correlation (how consistent each mare’s status remains) and residual variance. The current aggregate view might hide important patterns, such as foals born to a few specific “problem” mares that either missed boosters or failed to respond immunologically. A more detailed model could address this question.
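Before fitting a full mixed model, the consistency of individual mares could be checked with the one-way ANOVA estimator of the intraclass correlation, sketched below in Python. It assumes every mare contributes the same number of yearly 0/1 foaling outcomes; the six histories are hypothetical. This estimator is only a rough stand-in for the random-intercept variance that a mixed-effects logistic model would estimate, it is approximate on binary data, and it can come out negative when within-mare variation exceeds between-mare variation.

```python
from statistics import mean

def anova_icc(groups):
    """One-way ANOVA estimator of the intraclass correlation.

    groups: list of equal-length lists, e.g. one list of yearly foaled (1/0)
    outcomes per mare. Values near 1 mean mares are consistently fertile or
    infertile; values near 0 mean a mare's years look unrelated.
    """
    g, k = len(groups), len(groups[0])
    if any(len(x) != k for x in groups):
        raise ValueError("this simple estimator needs equal group sizes")
    grand = mean(v for x in groups for v in x)
    msb = k * sum((mean(x) - grand) ** 2 for x in groups) / (g - 1)  # between-mare mean square
    msw = sum((v - mean(x)) ** 2 for x in groups for v in x) / (g * (k - 1))  # within-mare mean square
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical four-year foaling histories for six mares.
histories = [[1, 1, 1, 0], [0, 0, 0, 0], [1, 1, 0, 1], [0, 0, 0, 0], [1, 0, 1, 1], [0, 0, 1, 0]]
print(round(anova_icc(histories), 3))
```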
Time-series approaches treat the herd as a single unit and analyse monthly foal counts and rates. Because horses breed seasonally, the monthly data show clear seasonal patterns superimposed on an overall decline. A seasonal ARIMA or state-space model could characterise this trend and test its significance against the variability in observed counts. An interrupted time-series analysis could treat April 2019 as the start of treatment and test whether the level or slope of the foaling rate changed after the intervention relative to the pre-treatment period. The main obstacle is the limited pre-intervention data: only a few months of 2019 preceded vaccination, and those were partly missing. Historical foal counts from 2018 would allow seasonal patterns to be established and compared with the changes observed in 2019–2022. The authors did not apply this method; instead, they noted that the timing of the breeding season remained normal while the amplitude of the foaling peak decreased. Seasonal decomposition and seasonal ARIMA models could have provided stronger evidence that peak timing and duration did not shift while foaling rates declined. The authors demonstrate these findings graphically, but models would provide statistical support for their claims.
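The interrupted time-series idea corresponds to a segmented regression, sketched below in plain Python with ordinary least squares. It estimates a baseline level and slope plus a level change and a slope change at the intervention month. The series is hypothetical and noise-free; the sketch deliberately omits seasonal terms and autocorrelated errors, which a real analysis of monthly foal counts would need (for example, seasonal ARIMA errors or Fourier terms), and its estimates can be no better than the short pre-intervention period allows.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def segmented_regression(y, t0):
    """OLS fit of y_t = b0 + b1*t + b2*post + b3*(t - t0)*post, where post = (t >= t0).

    Returns [b0, b1, b2, b3]: baseline level, baseline slope, level change and
    slope change at intervention month t0 (months indexed from 0).
    """
    rows = []
    for t in range(len(y)):
        post = 1.0 if t >= t0 else 0.0
        rows.append([1.0, float(t), post, (t - t0) * post])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(4)]
    return solve(xtx, xty)  # normal equations (X'X) b = X'y

# Hypothetical noise-free series: 12 pre-treatment months, then a drop and a steeper decline.
series = [30 + 0.1 * t + ((-10 - 0.3 * (t - 12)) if t >= 12 else 0) for t in range(48)]
print([round(v, 3) for v in segmented_regression(series, 12)])
```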
The data structure (many mares, each observed many times) makes a generalised linear mixed model (GLMM) a suitable choice. For example, a logistic GLMM could model whether a mare foals in a given year (or conceives in a given season), with fixed effects for treatment metrics (e.g., number of shots received, or a binary treated versus untreated indicator for that year) and random effects for mare and perhaps for year or herd area. Such a model could directly estimate the impact of treatment on the odds of foaling. If adequately specified, it could answer questions such as: how much does each additional vaccine dose reduce the odds of a mare foaling? Or what is the odds ratio of foaling for a treated versus an untreated mare, controlling for other factors? This would turn a largely qualitative efficacy claim into a quantitative one. Other researchers have used logistic regression to estimate contraceptive effects: Roelle et al. (2017) modelled the probability of foaling in treated versus control groups and reported that treated mares had dramatically lower odds of foaling, with odds ratios and p-values to demonstrate significance. A similar approach here would let the authors state something like “PZP treatment was associated with an X-fold reduction in the odds of foaling (p < 0.001)”, a far more precise statistical statement. The current study instead uses phrasing like “associated with a 58% reduction in foaling” without statistical inference; a GLMM would put that claim on much firmer quantitative ground.
Mixed models also handle group-level effects. The Virginia Range is large, and the data recorded different herd areas and bands. Outcomes may vary by band (harem): differences in stallion behaviour or band terrain, for example, could influence foal outcomes. A hierarchical model could include a random effect for band or geographic area to account for these clustered outcomes. This was not explored; all data were pooled. Although such random effects might not change the main conclusions dramatically, including them would give more accurate uncertainty estimates and make it possible to check that results are not driven by, say, one particular sub-area.
Use of Repeated Measures for Precision: By not using the data’s longitudinal structure in an inferential model, the study forgoes statistical power. Each mare’s history provides multiple data points that, if modelled appropriately, could strengthen confidence in the effect. For example, if each mare serves as her own control (pre- versus post-treatment), the self-comparison accounts for individual fertility and improves detection of the treatment effect. A paired before-and-after analysis could have been done for mares with known fertility both before and after PZP. The authors did not conduct such an analysis, but the data could be used that way (e.g., a finding such as “of mares that had foals before treatment, 80% had no foal after treatment” would be compelling evidence of efficacy). Instead, they examined aggregate foaling rates year by year.
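A paired before-and-after comparison of this kind is classically tested with McNemar’s test, which uses only the mares whose status changed. The Python sketch below computes the exact two-sided version from the two discordant counts; the counts in the example are hypothetical, and mares with the same status before and after (concordant pairs) do not enter the test.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar test on discordant pairs.

    b = mares that foaled before treatment but not after,
    c = mares that did not foal before treatment but did after.
    Under no treatment effect, b ~ Binomial(b + c, 0.5).
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical discordant counts among mares observed before and after PZP.
print(mcnemar_exact(40, 5))
```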
The data structure was rich, but the analysis did not exploit it with advanced models. Mixed-effects models or GEE (generalised estimating equations) would account for the repeated measures and provide more robust inference, especially for generalising to other herds or future years. Time-series models could better characterise the trend and seasonal patterns and offer formal tests for changes in pattern. The absence of these models is a limitation of the study’s methodology: not an error, but a missed opportunity for deeper insight. The results as presented are credible, but a reader might wonder whether a rigorous model would yield the same conclusions (most likely yes, but this should be demonstrated). Such models would increase confidence that the observed decline in foaling is genuinely due to vaccination rather than an artefact of unmodelled variability or correlation.
Optimality of Chosen Methods vs. Alternatives
The descriptive longitudinal approach effectively showed a large-scale effect but did not deliver causal estimates or measures of precision. Below, we detail how alternative statistical methods would lead to a more robust and informative analysis:
Generalised Linear Models (GLMs): Logistic or Poisson regression models would let researchers include treatment as a predictor variable and formally test its effect. For example, a logistic GLM could model annual foaling success (yes/no per mare-year) as a function of the number of vaccine doses received that year, mare age, and geographic location, yielding an estimate of vaccine effectiveness per dose. A Poisson or negative binomial GLM could instead model the count of foals per mare over the study period; because these counts are mostly 0 or 1, the binary logistic approach is the most straightforward choice. A GLM would supply p-values and confidence intervals for the main effect of interest, and it would allow testing for interaction effects and non-linear relationships (for example, diminishing returns from more than two doses). The original study did not use these procedures; a GLM would provide a more rigorous basis for measuring effect sizes and supporting cause-effect claims, although on observational data it still cannot rule out unmeasured confounding.
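A minimal, dependency-free sketch of such a logistic GLM is shown below, fitted by Newton-Raphson (which for the logistic model is equivalent to iteratively reweighted least squares). It is illustrative only: the data are hypothetical, and the single predictor is a treated/untreated indicator, although further predictors such as age or dose count can be added as extra columns in each row. In practice an established package (for example, R's `glm` or Python's statsmodels) would be used.

```python
import math

def _solve(a, rhs):
    """Solve a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_logistic(X, y, tol=1e-10, max_iter=50):
    """Fit a logistic GLM by Newton-Raphson.

    X: rows of predictors, each starting with 1.0 for the intercept.
    y: 0/1 outcomes, e.g. whether a mare foaled in a given year.
    Returns (coefficients, standard errors) on the log-odds scale.
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
        mu = [1.0 / (1.0 + math.exp(-e)) for e in eta]   # fitted probabilities
        w = [p * (1.0 - p) for p in mu]                    # binomial variance weights
        score = [sum(row[j] * (yi - p) for row, yi, p in zip(X, y, mu)) for j in range(k)]
        # Fisher information matrix X'WX
        info = [[sum(wi * row[j] * row[q] for row, wi in zip(X, w)) for q in range(k)]
                for j in range(k)]
        step = _solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Standard errors from the diagonal of the inverse information (last iteration)
    se = [math.sqrt(_solve(info, [float(i == j) for i in range(k)])[j]) for j in range(k)]
    return beta, se

# Hypothetical mare-years: second column is 1 if the mare was treated that year
X = [[1.0, 0.0]] * 50 + [[1.0, 1.0]] * 50
y = [1] * 30 + [0] * 20 + [1] * 10 + [0] * 40
beta, se = fit_logistic(X, y)
odds_ratio = math.exp(beta[1])
ci = (math.exp(beta[1] - 1.96 * se[1]), math.exp(beta[1] + 1.96 * se[1]))
print(f"treatment OR = {odds_ratio:.3f}, 95% CI {ci[0]:.3f} to {ci[1]:.3f}")
```

For these made-up counts the estimated odds ratio is 1/6 (about 0.167), identical to the crude 2×2 odds ratio, as expected when the only predictor is a single binary indicator; the benefit of the GLM appears once age, year, or dose count are added.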
Mixed-Effects Models: As noted above, a generalised linear mixed model (GLMM) would be an even better choice, because it handles repeated measurements and the hierarchical structure of the data. It could include time-varying covariates (such as the cumulative number of vaccine doses each mare has received) and random intercepts for individual mares. The result would be an estimate of the vaccine effect together with a measure of its uncertainty. For example, a mixed model might show that, after controlling for variation between mares and between years, each additional vaccination reduces the odds of foaling by X% (with a confidence interval), and that untreated mares are Y times more likely to foal. This would put the efficacy claims on a formal statistical footing. A mixed model could also explore predictors of treatment failure, such as whether failures are associated with missed booster shots or with particular areas. Such detail is difficult to extract from a basic descriptive summary.
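One way to write the mixed model described above, using notation introduced here purely for illustration, is shown below. Y_it indicates whether mare i foaled in year t, D_it is her cumulative number of PZP doses, A_it her age, γ_t a fixed effect for year, and u_i a random intercept capturing her individual baseline fertility:

```latex
\operatorname{logit} \Pr(Y_{it} = 1 \mid u_i)
  = \beta_0 + \beta_1 D_{it} + \beta_2 A_{it} + \gamma_t + u_i,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2)
```

Under this specification, exp(β1) is the odds ratio for foaling per additional dose for a given mare (conditional on u_i), so the "X% decreased odds" above would correspond to 100(1 − exp(β1))%. Because this is a mare-specific effect, a population-averaged odds ratio from a GEE fitted to the same data would typically lie somewhat closer to 1.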
Survival Analysis: Survival (time-to-event) methods could determine how long treatment suppresses fertility. Time to first foaling could be measured from each mare's entry into the program, with death or removal treated as censoring. If PZP is effective, the survival curve for “time to foaling” among treated mares would extend further to the right (a longer time to foaling) than a curve for untreated mares. Mares that, for logistical reasons, never received treatment could serve as the comparison group. A survival model with time-dependent covariates could incorporate the exact date on which each mare was vaccinated, to examine the immediate change in her foaling hazard. This approach is particularly well suited to evaluating waning efficacy, since it can test whether the foaling hazard rises beyond twelve months post-vaccination. The study did not include a survival analysis, which would have provided detailed information about the duration of vaccine effectiveness.
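The Kaplan-Meier estimator at the core of such an analysis is simple enough to sketch directly. The function below is a plain-Python illustration (not the study's code; the durations are hypothetical) that estimates the proportion of mares still foal-free at each observed foaling time, treating death, removal, or the end of the study as censoring.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of S(t) = P(no foal by time t).

    durations: months from entry (or last dose) to foaling or censoring.
    events: 1 if the mare foaled at that time, 0 if censored.
    Returns a list of (time, survival) pairs at each observed foaling time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        # Count foalings (d) and censorings (c) tied at time t
        d = c = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        # Mares censored at t count as at risk at t, then leave the risk set
        at_risk -= d + c
    return curve

def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below (None if never)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical months to foaling (event=1) or censoring (event=0)
curve = kaplan_meier([6, 12, 12, 18, 24, 24, 30, 36], [0, 1, 0, 1, 1, 0, 1, 0])
print(curve, median_survival(curve))
```

Censored mares contribute to the denominator up to the time they leave observation, which is what lets the estimator use mares that died or were removed without counting them as either foaling or foal-free.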
The study did not apply these methods. Using them would increase confidence that the PZP vaccine caused the decline, rather than merely being correlated with it.
Modelling Heterogeneity and Uncertainty: Other methods could also shed light on heterogeneity in efficacy. For example, older mares may be slightly more or less responsive to the vaccine; a model could test for an interaction between age and treatment. Alternatively, efficacy may increase after a mare has received multiple boosters (as the immune response builds); a longitudinal model could assess whether fertility dropped further in mares boosted in consecutive years than in those with gaps. The descriptive analysis offered suggestions (e.g., that one year's treatment is enough), but a model could support them formally by showing, for example, that mares who missed a year were significantly more likely to foal, thereby demonstrating the importance of not exceeding a twelve-month gap between doses. Quantifying uncertainty matters as well: a reported “10% conception rate” is more informative when stated as, for example, 10% ± 5%. With thousands of data points the uncertainty is likely small, but it should still be reported.
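The simplest version of the missed-year comparison is a risk ratio with a confidence interval. A minimal sketch follows, assuming hypothetical counts and the standard log-scale (Katz) Wald interval; unlike the models discussed above, it does not adjust for age, year, or repeated measures.

```python
import math

def risk_ratio(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio of group A vs group B with a log-scale (Katz) Wald interval.

    Returns (rr, lower, upper).
    """
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR)
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)

# Hypothetical: 30 of 100 mares that missed a year foaled,
# versus 10 of 200 mares boosted on schedule
rr, lo, hi = risk_ratio(30, 100, 10, 200)
print(f"RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An interval lying entirely above 1, as with these made-up counts (RR = 6, lower bound about 3.1), would indicate that missing a year is associated with a higher chance of foaling.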
Was the chosen method optimal? From a purist statistical standpoint, no: the methods were not optimal for inference. They were sufficient for description and probably sufficient to convince readers qualitatively (because the effect is large), but they do not meet the highest standards of analytical rigour. The optimal methodology would likely combine the alternatives above: perhaps a mixed-effects logistic regression for foaling outcomes (to estimate effect size and control for confounders), complemented by a survival analysis for duration of efficacy, and possibly a time-series analysis to confirm that there were no extraneous trend shifts. Together, these methods would give a comprehensive, robust picture: that the vaccine works, how strongly it suppresses fertility, how long the effect lasts, and that the observed decline is due to the intervention and not other factors.
Critiques and Recommendations for Improved Rigour
Critique Summary: The study was very effective in showing that a PZP darting program can cause a sharp decline in foal production in a wild horse population, but its statistical analysis relied on observed trends with little formal modelling. Because no inferential statistics were reported, the results, although persuasive, rest on the assumption that no other factors could explain the changes. Potential confounders (environment, mortality, uneven treatment application) were not fully controlled, and the powerful longitudinal structure of the data was underused. The approach to vaccine efficacy (using predefined duration scenarios) was informative but did not extract as much insight as a data-driven model could. In essence, the analysis provided evidence of efficacy but did not measure efficacy with estimates of precision or tests of significance.
Recommendations: To enhance methodological rigour in this and similar studies, we recommend the following:
Incorporate a Control or Comparison: If an outright control group (untreated horses) is not ethically or logistically feasible, use internal comparisons. These could include untreated periods or regions as quasi-controls (with appropriate caveats), or comparisons of mares before vs. after they receive treatment (within-subject comparison). Even data from the fringes of the study (e.g., the excluded areas) could be leveraged via causal inference techniques to strengthen the argument that the observed declines are due to PZP and not an overall herd phenomenon. For example, a difference-in-differences analysis using the south-of-Highway-50 horses as a reference group could control for year effects on foaling rates, provided both areas would otherwise have followed parallel trends.
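In its simplest 2×2 form, the difference-in-differences estimate is the change in foaling rate in the treated area minus the change in the reference area over the same period. The sketch below uses hypothetical counts and gives only a point estimate; in practice the same contrast would be fitted as an interaction term in a regression so that a standard error is available, and its validity rests on the parallel-trends assumption.

```python
def diff_in_diff(treated_pre, treated_post, reference_pre, reference_post):
    """2x2 difference-in-differences on foaling rates.

    Each argument is a (foals, mares) tuple for one area and period.
    """
    def rate(counts):
        foals, mares = counts
        return foals / mares

    change_treated = rate(treated_post) - rate(treated_pre)
    change_reference = rate(reference_post) - rate(reference_pre)
    # Subtracting the reference change removes year effects shared by both areas
    return change_treated - change_reference

# Hypothetical counts: treated area vs a reference area (e.g. south of Highway 50)
effect = diff_in_diff((60, 200), (20, 200), (30, 100), (25, 100))
print(f"estimated effect on foaling rate: {effect:+.2f}")
```

Here the treated area's foaling rate falls by 20 percentage points while the reference area's falls by 5, so 15 points are attributed to treatment rather than to a shared bad year.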
Use Generalised Linear Mixed Models: Re-analysing the data with a GLMM would likely be the most informative improvement. A mixed model could solve many of the above-mentioned problems: it can control for confounders (by including covariates such as year or age), handle the repeated measures (random effects for mares), and estimate the treatment effect with a significance test. It would provide outputs such as an odds ratio for the vaccination effect, which could be compared directly with other studies or used in meta-analyses. Such a model could also incorporate vaccine efficacy duration directly, for example by including terms for whether a mare is within 0–6 months post-shot, 6–12 months, and so on, to identify when fertility begins to recover. We strongly recommend that the authors or future researchers perform a mixed model analysis to quantify PZP's effect on individual fertility risk.
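Preparing the time-since-dose terms is mostly a data-handling step. The helper below is hypothetical: it assigns each mare-year to an interval based on the months between the mare's most recent dose and a reference date (for example, the start of that year's breeding season, which is an assumption here, not something the study specifies). The resulting labels could then enter the mixed model as a categorical predictor.

```python
from datetime import date

def months_between(start, end):
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def time_since_dose_band(last_dose, reference):
    """Hypothetical banding of time since a mare's most recent PZP dose.

    last_dose: date of the most recent dose on or before `reference`,
               or None if the mare was never treated.
    reference: reference date for the mare-year (an assumption, e.g. the
               start of that year's breeding season).
    """
    # Guard: a dose after the reference date did not yet apply
    if last_dose is None or last_dose > reference:
        return "untreated"
    months = months_between(last_dose, reference)
    if months < 6:
        return "0-6"
    if months < 12:
        return "6-12"
    return "12+"

# Hypothetical examples for a single reference date
ref = date(2020, 6, 1)
for dose in (date(2020, 3, 1), date(2019, 10, 1), date(2019, 1, 15), None):
    print(dose, time_since_dose_band(dose, ref))
```

Using bands rather than a single linear term lets the model show where fertility starts to recover without assuming the effect wanes at a constant rate.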
Conduct Survival Analysis for Efficacy Longevity: In addition, a focused survival (time-to-foaling) analysis should be carried out to estimate how long the vaccine protects a mare. Mares should be tracked from their last treatment to see when, if ever, they produce their next foal. A Kaplan-Meier curve would show the proportion of treated mares remaining foal-free over time, and a Cox proportional hazards model could test differences between groups (for example, mares that received boosters versus primers only) or estimate how the hazard changes as time since treatment increases. This would test the “6, 12, 18 months vs permanent” assumptions and could pinpoint a more precise duration (for example, finding that pregnancy hazard starts rising after 12–16 months). It also naturally handles censoring (mares that die or are removed). The results could then be translated into an estimated efficacy period with confidence intervals (for example, “PZP effectively prevented foaling for a median duration of X months in treated mares”).
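A Cox model is best fitted with an established package (for example, R's survival package or Python's lifelines). For a single two-group comparison, such as boosted versus primer-only mares, the closely related log-rank test can be sketched in plain Python. The function below is illustrative only, and the durations in the example are hypothetical.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test for time-to-foaling data.

    times_*: months from last treatment to foaling or censoring.
    events_*: 1 if the mare foaled, 0 if censored (died, removed, study end).
    Returns (chi_square, p_value) on 1 degree of freedom.
    """
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)]
            + [(t, e, 1) for t, e in zip(times_b, events_b)])
    o_minus_e = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n = sum(1 for ti, _, _ in data if ti >= t)                 # at risk, both groups
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)    # at risk, group A
        d = sum(1 for ti, e, _ in data if ti == t and e)           # foalings at t
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        # Observed minus expected foalings in group A at time t
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of chi-square with 1 df: P(Z**2 > chi2) = erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical months to next foal (event=1) or censoring (event=0)
boosted = ([14, 20, 26, 30, 36], [1, 0, 1, 0, 0])
primer_only = ([8, 10, 12, 13, 18], [1, 1, 1, 0, 1])
chi2, p = logrank_test(*boosted, *primer_only)
print(f"log-rank chi-square = {chi2:.2f}, p = {p:.3f}")
```

The test compares whole curves rather than foaling rates at a single time point, which is what makes it suited to questions about how long suppression lasts.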
Address Biases in Data Collection: The study noted incomplete foal documentation in the first year and high foal mortality later. Future analyses should consider adjusting for detection bias, perhaps using auxiliary data such as known predator kills or pregnancy observations. If foals are being missed, one might incorporate a correction factor or at least conduct a sensitivity analysis (e.g., “if X unseen foals existed, would the conclusions change?”). Initial pregnancy status should also be incorporated explicitly: one option is to begin the analysis of foaling rates in mid-2020 (once no mare could still be carrying a pre-treatment pregnancy) to isolate the treatment effect. Alternatively, a covariate for whether a mare foaled in 2019 (meaning treatment did not prevent a pregnancy that year) could be included when modelling 2020 outcomes, and so on. This would control for differences between mares that were initially pregnant and those that were not.
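A sensitivity analysis of this kind can be as simple as recomputing the headline reduction under increasingly pessimistic assumptions about unseen foals. The sketch below uses hypothetical counts chosen to give a 60% baseline reduction; it is not based on the study's data.

```python
def percent_reduction(base_foals, base_mares, post_foals, post_mares, unseen=0):
    """Percent reduction in foaling rate, assuming `unseen` foals were missed after treatment."""
    base_rate = base_foals / base_mares
    post_rate = (post_foals + unseen) / post_mares
    return 100 * (1 - post_rate / base_rate)

# Hypothetical counts chosen so the observed reduction is 60%
for unseen in (0, 25, 50, 100):
    print(f"{unseen:>3} unseen foals -> {percent_reduction(400, 1000, 160, 1000, unseen):.1f}% reduction")
```

With these made-up numbers the reduction would fall from 60% to 35% only if 100 foals had been missed; if the plausible number of missed foals is much smaller, the qualitative conclusion is robust.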
Use Multivariable Models to Adjust for Confounders: Even a straightforward multivariable logistic regression (not necessarily a mixed model, if the analysis is done year by year) could include year (or environmental indices) to adjust for annual conditions; mare age (fertility can decline in very old mares, and very young mares have lower fertility; the study treated all mares older than one year as equivalent, but a 2-year-old and a 15-year-old may differ); and location or band. One could then say, “controlling for year and age, treated mares had an X% lower probability of foaling.” This increases confidence that the effect is not due to those other factors. The authors appear to have recorded variables such as band and herd area, so using them in a model to account for spatial clustering or stallion effects would be feasible and advisable.
Provide Uncertainty Estimates: Wherever possible, future reports should include confidence intervals or similar measures for key outcomes. For example, the “58% reduction” could be accompanied by a 95% confidence interval (derived from a model or from a bootstrap of the data). This communicates statistical certainty. Given the large sample, the intervals might be narrow, but reporting them is good practice. Likewise, the “10% conception rate” could be given as 10% ± some margin. This would indicate formally how much these percentages could vary by chance (although, because the data cover the whole population, that variation arises mostly from which mares happened to be observed or missed).
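For a single proportion such as a conception rate, the Wilson score interval is one common choice that behaves well even near 0% or 100%. The sketch below is a minimal implementation; the counts (100 conceptions among 1,000 mares) are hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (95% by default)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

# Hypothetical: a 10% conception rate observed in 1,000 mares vs in 100 mares
lo, hi = wilson_interval(100, 1000)
lo_small, hi_small = wilson_interval(10, 100)
print(f"n=1000: {lo:.3f} to {hi:.3f}; n=100: {lo_small:.3f} to {hi_small:.3f}")
```

For these made-up counts the interval is roughly 8.3% to 12.0% with 1,000 mares, widening to roughly 5.5% to 17.4% with 100, which illustrates why a bare "10%" is less informative than a rate with its margin.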
Explore Alternative Outcome Metrics: The study focused on foaling and conception rates. A complementary metric is the population growth rate. By combining foaling rates with mortality rates, one can estimate annual population growth and how it has changed. The authors mention zero-population-growth targets and note that other studies took years to show a decline. A population projection model (even a simple exponential or matrix model) could estimate the growth rate with and without the observed fertility control, translating the findings into an aggregate outcome (herd growth slowed from x% to y% per year). This is a modelling exercise rather than a purely statistical method, but it could strengthen the argument that PZP moved the herd from a growing state to a near-stable one. Coupling such a model with uncertainty from the data (via simulation) would further enhance rigour.
The study employed basic methods that produced easy-to-understand descriptive results, but more advanced statistical methods should be used to verify and extend these findings. The analysis would be strengthened by GLMMs for the treatment effect, survival analysis for duration, and causal inference methods for unbiased effect estimation. These methods would likely support the same conclusion, that PZP darting is effective in lowering wild horse birth rates, but with stronger evidence: statistical significance, controlled comparisons, quantified effect sizes, and measured duration of effect. Rigorous methods matter because they improve scientific precision and give decision-makers quantified estimates to rely on (e.g., “The vaccine will reduce foaling probability by at least X% for up to Y months at a 95% confidence level”).
Future analyses of this dataset or similar field studies should use mixed-effects logistic models to estimate efficacy while accounting for repeated measures, apply survival analysis to determine how long the contraceptive effect of each treatment lasts, control carefully for confounding factors either by design or by statistical adjustment, and include measures of statistical uncertainty. With appropriate analysis, the statistical evidence from this excellent large-scale field effort could match the strength of its data collection, consolidating its findings and guiding best practices for wild horse population management with greater accuracy.
Summary of the Paper
The study investigates the immunocontraceptive efficacy of Porcine Zona Pellucida (PZP) treatment on the free-roaming horse population of the Virginia Range. Over four years (2019–2022), researchers monitored 2,817 mares, tracking their vaccination records and reproductive outcomes. The results indicated a substantial reduction in foaling rates, suggesting a nearly 60% decrease in pregnancies attributable to PZP treatment. However, the study's methodology was criticised for lacking rigorous statistical analysis and adequate control for confounding variables. Recommendations for future research highlight the need for mixed-effects models and survival analysis to improve the robustness of findings on vaccine efficacy and its duration of effect.
Recommendations for Future Research
- Engage Statistical Experts: Collaborate with a statistician with experience analysing ecological data, particularly involving wild horse populations. Their expertise can enhance the rigour of the statistical methods used.
- Local Research Collaboration: Involve researchers who are geographically closer to the Virginia Range horses. This can provide valuable insights into local environmental factors and horse behaviour that may influence reproductive outcomes.
- Mixed-Effects Models: Analyse the data with mixed-effects models. This approach can account for individual mare variation and repeated measures, providing a clearer understanding of the treatment effects.
- Survival Analysis: Conduct survival analysis to accurately assess the duration of PZP contraceptive effects on the mare population. This method can help determine how long the vaccine remains effective post-treatment.
- Control for Environmental Variables: Incorporate environmental factors such as forage availability and predator presence into the analysis to control for confounding influences on foaling rates.
- Longitudinal Tracking and Comparison: Implement a longitudinal design that allows for before-and-after comparisons within the same mares, offering more precise insights into the treatment’s effects over time.
- Community Engagement: To ensure the research aligns with community goals and conservation efforts, foster relationships with local stakeholders and horse management organisations.
By enhancing statistical rigour and incorporating localised expertise, future research can produce more reliable findings that support effective wild horse population management strategies.
Dr. Meredith Hudes-Lowder
Biostatistician
©April 2025

