Difference between revisions of "Variance Components"

From BIF Guidelines Wiki

Revision as of 12:30, 3 June 2019

Methods such as BLUP, Single-step Genomic BLUP, and Single-step Hybrid Marker Effects Models, which are used to predict Expected Progeny Difference (EPD), are based on models that include random effects. Associated with these random effects are parameters known as variance components. For example, a typical model to predict the EPD using BLUP would include random effects for the additive genetic merit and environmental effects. Each of these would have an associated variance component, in this case the additive genetic variance and the environmental variance, which quantify the amount of variability associated with the two random effects. As these variance components are unknown, they must be estimated.
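
As an illustration only, a model of this kind is often written in matrix notation as shown here. The symbols are assumptions introduced for this sketch and do not come from the text above: <math>y</math> is the vector of records, <math>b</math> the fixed effects, <math>u</math> the additive genetic effects, <math>e</math> the environmental (residual) effects, <math>X</math> and <math>Z</math> the incidence matrices, and <math>A</math> a relationship matrix.

<math>y = Xb + Zu + e, \qquad \operatorname{var}(u) = A\sigma^2_a, \qquad \operatorname{var}(e) = I\sigma^2_e</math>

In this notation the two variance components to be estimated are <math>\sigma^2_a</math> and <math>\sigma^2_e</math>.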

For methods based on linear mixed models, such as BLUP and Single-step Genomic BLUP, variance components can be estimated using Residual Maximum Likelihood (REML) [1], while for methods based on Bayesian models, such as Single-step Hybrid Marker Effects Models, variance components can be estimated by corresponding Bayesian methods [2].

REML

Both REML and the Bayesian methods work with the likelihood of the data. A likelihood is a measure of how likely a set of data is for different values of the parameters. REML uses the likelihood of the observed residuals, where the residual, <math>\hat r=y-\hat y</math>, is the difference between the observed data, <math>y</math>, and the estimated data, <math>\hat y</math>, given fixed effects such as contemporary group effects. The REML estimates are then the values of the variance components that maximize the likelihood of the observed residuals. Except in simple cases, it is not possible to maximize the residual likelihood directly, and a variety of iterative algorithms such as Expectation Maximization, Fisher Scoring, and Average Information have been used to find the estimates numerically. What these algorithms have in common is that they all use BLUP of the random effects obtained from the mixed model equations.
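
The iterative idea can be sketched with a minimal Expectation Maximization REML loop. The sketch assumes a deliberately simple one-way model (an overall mean plus one unrelated random group effect, so the relationship matrix is the identity). The function names `invert` and `em_reml_one_way`, the starting values, and the toy data are hypothetical and chosen only for illustration; production software uses sparse solvers and is far more general.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def em_reml_one_way(groups, var_u=1.0, var_e=1.0, n_iter=5000, tol=1e-10):
    """EM-REML for y_ij = mu + u_i + e_ij with unrelated random effects u_i.

    Returns the estimates (var_u, var_e).
    """
    q = len(groups)
    n_obs = sum(len(g) for g in groups)
    y_sum = sum(sum(g) for g in groups)
    group_sums = [sum(g) for g in groups]
    yty = sum(v * v for g in groups for v in g)
    for _ in range(n_iter):
        lam = var_e / var_u
        # Mixed model equations, unknowns ordered as [mu, u_1, ..., u_q].
        lhs = [[0.0] * (q + 1) for _ in range(q + 1)]
        lhs[0][0] = float(n_obs)
        for i, g in enumerate(groups):
            lhs[0][i + 1] = lhs[i + 1][0] = float(len(g))
            lhs[i + 1][i + 1] = len(g) + lam
        rhs = [y_sum] + group_sums
        c = invert(lhs)
        sol = [sum(c[r][k] * rhs[k] for k in range(q + 1)) for r in range(q + 1)]
        mu_hat, u_hat = sol[0], sol[1:]  # u_hat holds the BLUP of the random effects
        trace_c22 = sum(c[i][i] for i in range(1, q + 1))
        # EM updates; the model has one fixed effect, so rank(X) = 1.
        new_var_u = (sum(u * u for u in u_hat) + var_e * trace_c22) / q
        explained = mu_hat * y_sum + sum(u * s for u, s in zip(u_hat, group_sums))
        new_var_e = (yty - explained) / (n_obs - 1)
        converged = abs(new_var_u - var_u) + abs(new_var_e - var_e) < tol
        var_u, var_e = new_var_u, new_var_e
        if converged:
            break
    return var_u, var_e


toy_groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
var_u_hat, var_e_hat = em_reml_one_way(toy_groups)
print(var_u_hat, var_e_hat)
```

For this balanced toy data set the loop should settle on the same values as the classical analysis of variance estimates (within-group mean square 1 and a between-group component of 26/3), which is a convenient sanity check for the sketch.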

For a trait such as calving difficulty, genetic evaluation might use a threshold model [3] instead of a linear mixed model. In the case of a threshold model, it is no longer feasible to find either the likelihood or the residual likelihood. For threshold models, where REML is no longer an option, penalized quasi-likelihood based methods [4] can be used to obtain REML-like estimates of the variance components.
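
To make the idea of a threshold model concrete, the sketch below assumes an unobserved normally distributed liability with residual variance 1 and a set of fixed thresholds that cut the liability scale into ordered categories. The function names, the threshold values, and the unit residual variance are illustrative assumptions, not settings from any particular evaluation.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probabilities(linear_predictor, thresholds):
    """Probabilities of the ordered categories under a threshold model.

    liability = linear_predictor + e, with e ~ N(0, 1); `thresholds` must be
    in ascending order and splits the liability scale into
    len(thresholds) + 1 categories.
    """
    cumulative = [normal_cdf(t - linear_predictor) for t in thresholds] + [1.0]
    probabilities = []
    previous = 0.0
    for c in cumulative:
        probabilities.append(c - previous)
        previous = c
    return probabilities


# Hypothetical example: three calving difficulty scores, two thresholds.
print(category_probabilities(0.5, [0.0, 1.0]))
```

Because the observed scores depend on the fixed and random effects only through these normal probabilities, the random effects cannot be integrated out in closed form, which is one way to see why the likelihood is hard to work with directly.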

Bayesian methods

Bayesian methods make use of a prior distribution on the variance components in addition to the information coming from the likelihood to form the posterior distribution. In most cases, the prior is selected so that the information coming from the likelihood dominates the information coming from the prior. The two predominant types of Bayes estimators are the posterior mode and the posterior mean. Posterior mode estimates are the values of the variance components that maximize the posterior density. Posterior mean estimates are the average values of the variance components sampled from the posterior distribution.
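
The difference between the two estimators can be sketched in a deliberately simplified setting: a single variance component, zero-mean effects that are assumed to be observed directly, and a scaled inverse chi-square prior. All of these are assumptions made for illustration, as are the function names and toy numbers; real evaluations sample many unknowns jointly.

```python
import random


def posterior_scaled_inv_chi2(effects, prior_df, prior_scale):
    """Posterior degrees of freedom and scale for the variance of zero-mean
    normal effects under a scaled inverse chi-square prior."""
    post_df = prior_df + len(effects)
    post_scale = (prior_df * prior_scale + sum(e * e for e in effects)) / post_df
    return post_df, post_scale


def posterior_mode(post_df, post_scale):
    """Value of the variance that maximizes the posterior density."""
    return post_df * post_scale / (post_df + 2.0)


def posterior_mean_by_sampling(post_df, post_scale, n_samples=20000, seed=1):
    """Average of variance values sampled from the posterior distribution."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # A chi-square draw with post_df degrees of freedom is Gamma(df / 2, scale 2).
        chi2 = rng.gammavariate(post_df / 2.0, 2.0)
        total += post_df * post_scale / chi2
    return total / n_samples


toy_effects = [1.0, -2.0, 0.5, 1.5, -1.0, 2.0, -0.5, -1.5]
df, scale = posterior_scaled_inv_chi2(toy_effects, prior_df=4.0, prior_scale=1.0)
print(posterior_mode(df, scale), posterior_mean_by_sampling(df, scale))
```

Because the posterior of a variance is skewed to the right, the posterior mean in this sketch is larger than the posterior mode; the two move closer together as the amount of data grows.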

  1. Harville, D. A. 1977. Maximum likelihood approaches to variance component estimation and to related problems. Journal of the American Statistical Association 72(358):320-338.
  2. Gianola, D., and R. L. Fernando. 1986. Bayesian methods in animal breeding theory. Journal of Animal Science 63:217-244.
  3. Gianola, D., and J. L. Foulley. 1983. Sire evaluation for ordered categorical data with a threshold model. Génétique Sélection Évolution 15(2):201-224.
  4. Breslow, N. E., and D. G. Clayton. 1993. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88:9-25.