Difference between revisions of "Prediction Bias"

From BIF Guidelines Wiki

Revision as of 17:57, 11 June 2019

Bias

Let u be the true progeny difference (TPD) and u^ be our estimate of it (EPD). If both were known, we could measure the bias in our estimate as the difference between the mean of u and the mean of u^. However, we never observe the TPD. Instead we estimate it using pedigree, performance, and genomic data.

We can approximate the degree of bias and under- or over-dispersion of EPD by using regression techniques. One way to do this is to regress the EPD with more information (e.g., genomic EPD) on the EPD with less information (e.g., pedigree-based EPD). Our expectation is that the intercept from this regression is 0 (no bias) and the slope is 1 (no under- or over-dispersion). These expectations come from the theory of BLUP, where u^ is an unbiased estimator of u and

Cov(EPD, EPD) / Var(EPD) = Cov(1/2 a, 1/2 a) / Var(1/2 a) = Var(1/2 a) / Var(1/2 a) = 1

A fundamental assumption is that the ratio of variance components used to generate both sets of EPD is the same. If it is not, then the expectation that the regression coefficient equals 1 no longer holds.
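The regression of the EPD with more information on the EPD with less information can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not a genetic evaluation: each "EPD" is a single-trait shrinkage predictor built from one or two independent noisy records of the TPD, and the names `simulate_epd_pair`, `regress`, and `shrink_scale` are hypothetical. Setting `shrink_scale` away from 1 mimics computing the less-informed EPD with a different variance ratio.

```python
import numpy as np

def regress(y, x):
    """Return (intercept, slope) of the least-squares regression of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope

def simulate_epd_pair(n=200_000, var_u=1.0, var_e=3.0, shrink_scale=1.0, seed=1):
    """Simulate a toy 'more information' and 'less information' EPD.

    Assumption: u is the true progeny difference, and each information
    source is u plus independent noise with variance var_e.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, np.sqrt(var_u), n)
    y1 = u + rng.normal(0.0, np.sqrt(var_e), n)  # available to both predictors
    y2 = u + rng.normal(0.0, np.sqrt(var_e), n)  # extra information

    # Less information: shrink the single record toward zero.
    # shrink_scale != 1 stands in for a mismatched variance ratio.
    b_less = var_u / (var_u + var_e)
    epd_less = shrink_scale * b_less * y1

    # More information: shrink the mean of both records.
    b_more = var_u / (var_u + var_e / 2.0)
    epd_more = b_more * (y1 + y2) / 2.0
    return epd_more, epd_less

# Same variance ratio in both predictors: intercept near 0, slope near 1.
epd_more, epd_less = simulate_epd_pair()
print(regress(epd_more, epd_less))

# Less-informed EPD over-dispersed by 25%: slope drops to about 1 / 1.25.
epd_more, epd_less = simulate_epd_pair(shrink_scale=1.25)
print(regress(epd_more, epd_less))
```

In this toy setting, an over-dispersed less-informed EPD pulls the slope below 1, while an under-dispersed one would push it above 1.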

Another approach is to regress phenotypes, corrected for systematic effects, on EPD. Here the expectation of the regression coefficient is 2, assuming the residual e is uncorrelated with the additive genetic value a.

Cov(corrected phenotype, EPD) / Var(EPD) = Cov(a + e, 1/2 a) / Var(1/2 a) = (1/2) Var(a) / ((1/4) Var(a)) = 2

If EBV were used instead of EPD the expectation of the regression coefficient would be 1.
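The expected slopes of 2 (EPD) and 1 (EBV) can be checked numerically. The sketch below is a toy simulation under stated assumptions, not a real evaluation: the EBV is a shrinkage predictor built from a hypothetical information source `z` that is independent of the animal's own residual, and the EPD is taken as half the EBV. The function name `simulate_validation` and all variance values are illustrative.

```python
import numpy as np

def simulate_validation(n=200_000, var_a=1.0, var_e=2.0, var_noise=1.0, seed=7):
    """Return (slope on EPD, slope on EBV) for corrected phenotypes.

    Assumption: the corrected phenotype is y = a + e, and the EBV uses
    only z = a + noise, which does not contain the animal's own e.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, np.sqrt(var_a), n)           # additive genetic value
    y = a + rng.normal(0.0, np.sqrt(var_e), n)       # corrected phenotype
    z = a + rng.normal(0.0, np.sqrt(var_noise), n)   # information excluding y

    ebv = var_a / (var_a + var_noise) * z            # shrinkage predictor of a
    epd = ebv / 2.0                                  # EPD is half the EBV

    slope_epd = np.polyfit(epd, y, 1)[0]             # regress y on EPD
    slope_ebv = np.polyfit(ebv, y, 1)[0]             # regress y on EBV
    return slope_epd, slope_ebv

slope_epd, slope_ebv = simulate_validation()
print(slope_epd, slope_ebv)  # close to 2 and 1 under these assumptions
```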

A key assumption is that the phenotype of the individual is not included in the EPD of that individual. Consequently, this approach lends itself to cross-validation or forward-in-time validation strategies, in which one or more sets of animals have their phenotypes masked in the genetic evaluation.
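To see why masking matters, consider the extreme case in which the EPD is computed only from the animal's own corrected phenotype. The sketch below is a deliberately simplified, hypothetical illustration (the function `own_record_slope` and the variance values are assumptions): with EBV = h2 * y and EPD = EBV / 2, regressing the phenotype on the EPD gives a slope of 2 / h2 rather than 2, so the validation no longer measures what it is meant to.

```python
import numpy as np

def own_record_slope(n=50_000, var_a=1.0, var_e=2.0, seed=3):
    """Slope of corrected phenotype on an EPD built from that same phenotype."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, np.sqrt(var_a), n)
    y = a + rng.normal(0.0, np.sqrt(var_e), n)   # corrected phenotype

    h2 = var_a / (var_a + var_e)                 # heritability used for shrinkage
    epd = 0.5 * h2 * y                           # EPD that includes the own record

    # y is an exact multiple of epd, so the slope is 2 / h2 by construction.
    return np.polyfit(epd, y, 1)[0], 2.0 / h2

slope, expected = own_record_slope()
print(slope, expected)
```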