Explanations for Sign Changes in Correcting for Range Restriction
Janet D. Held
Navy Personnel Research and Development Center
San Diego, California 92152-7250
Ree, Carretta, Earles, and Albert (1994) recently discussed, with empirical findings, the potential for a corrected validity coefficient to change sign from its uncorrected value. This paper extends that discussion by examining the basis for a sign change, using conceptually simplified range restriction correction formulas (univariate and multivariate). The study data were scores on the Armed Services Vocational Aptitude Battery (ASVAB) obtained from a Navy applicant population. One of the ten ASVAB tests was designated as the criterion so that the population (unrestricted) validities were known.
The general restriction in range problem (Pearson, 1903; Lawley, 1943) in military personnel selection research is to find the predictor/criterion correlation (validity) for the applicant population of interest, for which only predictor (selection instrument) information is available. Complete predictor/criterion information is obtainable only for a restricted subset of the applicant population (students selected at some predetermined minimum aptitude level). From that subset, correction formulas can be used to estimate the validity in the unrestricted applicant population, from which future students will be selected for school. The accuracy of this estimated (corrected) validity depends on the degree to which certain data assumptions are met. For the bivariate case of one predictor and one criterion, these assumptions are (1) linearity of the regression of the criterion, y, on the predictor, x, (2) homoscedasticity of the y error variance for all values of x, and (3) selection having occurred solely on x. The bivariate formula (correction for explicit selection) commonly encountered in the literature is,
$$R_{XY} = \frac{r_{xy}\,(S_X / s_x)}{\sqrt{1 - r_{xy}^2 + r_{xy}^2\,(S_X^2 / s_x^2)}} \qquad (1)$$
(Case 1 from Guilford, 1965, p. 141; Case A from Thorndike, 1982, p. 210).
However, Equation 1 can be conceptually simplified in the more familiar form,
$$R_{XY} = \frac{S_{XY}}{S_X S_Y} \qquad (2)$$
where $R_{XY}$ is the corrected validity coefficient, $S_{XY}$ represents the unknown unrestricted population covariance, $S_Y$ represents the unknown unrestricted population criterion standard deviation, and $S_X$ is the known unrestricted population predictor standard deviation. $S_{XY}$ is derived from the linearity identity where unrestricted and restricted slopes are assumed equal. $S_Y$ is derived from the homoscedasticity identities where unrestricted and restricted standard errors of estimate across the entire range of the predictor are assumed equal. These slope and error identities are, respectively,
$$\frac{S_{XY}}{S_X^2} = \frac{s_{xy}}{s_x^2} \qquad (3)$$
and
$$S_Y \sqrt{1 - R_{XY}^2} = s_y \sqrt{1 - r_{xy}^2} \qquad (4)$$
Without solving the problem here (see Gulliksen, 1950, Chapter 13 for derivations and Held & Foley, 1994 for formula applications), the numerator in Equation 2, the corrected covariance, presents the only possible opportunity for a negative sign. Further, from the linearity assumption, the corrected covariance is derived by applying the restricted sample weight (unstandardized regression coefficient, or slope). Formally,
$$b = \frac{s_{xy}}{s_x^2} \qquad (5)$$
and
$$S_{XY} = b\,S_X^2 \qquad (6)$$
where the sign of b determines the sign of both the restricted and corrected covariance, and therefore, the restricted and corrected validity.
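The equivalence of the two routes to the corrected validity can be checked numerically. The following is a minimal Python sketch, not part of the original analysis. The function names are hypothetical, and the inputs are the Sample 1 VEAR and MC values reported later in this paper (restricted covariance 1.43, restricted values 3.31 and 29.53, unrestricted VEAR value 210.05), taken here to be variances and covariances. It assumes that Equation 1 has the standard explicit selection form and that Equations 2 through 6 are applied as described above.

```python
import math

def corrected_validity_formula(r_xy, var_x, VAR_X):
    """Explicit selection correction in the closed form of Equation 1.

    r_xy  : restricted validity
    var_x : restricted predictor variance
    VAR_X : unrestricted predictor variance
    """
    u2 = VAR_X / var_x  # ratio of unrestricted to restricted variance
    return r_xy * math.sqrt(u2) / math.sqrt(1.0 - r_xy ** 2 + r_xy ** 2 * u2)

def corrected_validity_covariance(cov_xy, var_x, var_y, VAR_X):
    """Same correction via the covariance route (Equations 2 through 6)."""
    b = cov_xy / var_x                      # restricted slope (Equation 5)
    COV_XY = b * VAR_X                      # corrected covariance (Equation 6)
    r2 = cov_xy ** 2 / (var_x * var_y)      # squared restricted validity
    # Equal standard errors of estimate (Equation 4), solved for the
    # unrestricted criterion variance.
    VAR_Y = var_y * (1.0 - r2) + b ** 2 * VAR_X
    return COV_XY / math.sqrt(VAR_X * VAR_Y)  # Equation 2

# Sample 1 values from this paper (assumed to be variances/covariances).
cov_xy, var_x, var_y, VAR_X = 1.43, 3.31, 29.53, 210.05
r_xy = cov_xy / math.sqrt(var_x * var_y)

by_formula = corrected_validity_formula(r_xy, var_x, VAR_X)
by_covariance = corrected_validity_covariance(cov_xy, var_x, var_y, VAR_X)
# Reversing the sign of the restricted covariance reverses the sign of b.
flipped = corrected_validity_covariance(-cov_xy, var_x, var_y, VAR_X)

print(round(r_xy, 3), round(by_formula, 3), round(by_covariance, 3), round(flipped, 3))
```

Under these assumptions both routes give about .759 from a restricted validity of about .145, in line with the Sample 1 univariate entries in Table 1. Reversing the sign of the restricted covariance only reverses the sign of the result, which is why no sign change can arise in the bivariate case.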
The bivariate correction just reviewed is a special case of the general multivariate correction (Gulliksen, 1950, Chapter 13; Lawley, 1943). The multivariate correction formulas are matrix algebra extensions of the univariate case. Military psychologists typically apply the multivariate correction, treating incidental selector variables as explicit, in an attempt to isolate all selection factors (Novick & Thayer, 1969) and because the procedure has been shown to be generally more accurate than the univariate correction (Booth-Kewley, 1985). With very large samples, the multivariate correction has also been shown to be more accurate under violations of the correction assumptions and at stringent selection ratios, where inaccurate corrections are typically found. This study, however, reveals the inadequacies of the multivariate correction with small samples, where sampling errors in regression weights are high (bouncing betas).
Taking Equation 2 as the conceptually simplified correction formula for the bivariate case, we need only generalize the covariance derivation to the multivariate case, parallel to $S_{XY}$. That parallel is, in matrix algebra notation,
$$C_{XY} = w'_{yx}\,C_{XX} \qquad (7)$$
where, for multiple predictors but only one criterion, $C_{XY}$ and $w'_{yx}$ are the covariance and full least squares regression weight vectors and $C_{XX}$ is a square matrix of predictor (selector) variances/covariances. Just as in the bivariate case, $C_{XY}$ is derived from linearity identities; however, the identities now apply to multiple selectors (all selector variables treated mathematically as explicit). Each covariance term, $C_{X_iY}$ in $C_{XY}$, is derived through matrix algebra (Horst, 1963, provides a clear presentation of matrix algebra for social scientists). This involves summing the products of corresponding terms in two vectors: the variance of the particular selector times that selector's weight, plus the covariance between that selector and every other selector times that other selector's weight. Given that all selector variables are positively correlated, a negative covariance in $C_{XY}$ can be obtained through the matrix multiplication only if at least one weight is negative, and the negative weights must be sufficient in magnitude or number to produce the negative corrected covariance term (and thus the negative corrected validity).
In order for a composite of tests to have a negative covariance, one negative test covariance or a combination of negative test covariances must be of sufficient magnitude to offset any positive test covariance terms. The positive to negative sign change, a result of inadequate data, is explained first, followed by the logically occurring negative to positive sign change. All predictors and the criterion are positively correlated in this study's unrestricted population.
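The sign argument for the matrix product in Equation 7 can be illustrated with a small Python sketch. The two-selector covariance matrix and the weights below are hypothetical values chosen for illustration, not study data, and the function name is an assumption.

```python
def corrected_covariances(C_XX, weights):
    """Corrected selector/criterion covariances in the sense of Equation 7.

    Each term is the selector's variance times its own weight plus its
    covariance with every other selector times that selector's weight.
    """
    return [sum(c_ij * w_j for c_ij, w_j in zip(row, weights)) for row in C_XX]

# Hypothetical unrestricted selector matrix: both selectors have variance 100
# and are positively correlated (covariance 70).
C_XX = [[100.0, 70.0],
        [70.0, 100.0]]

all_positive = corrected_covariances(C_XX, [0.5, 0.25])      # [67.5, 60.0]
small_negative = corrected_covariances(C_XX, [0.5, -0.125])  # [41.25, 22.5]
large_negative = corrected_covariances(C_XX, [0.5, -1.0])    # [-20.0, -65.0]

# Covariance of a unit-weighted composite of the two selectors with the
# criterion: the sum of the individual test covariance terms.
composite_terms = [sum(all_positive), sum(small_negative), sum(large_negative)]

print(all_positive, small_negative, large_negative, composite_terms)
```

In this sketch a single small negative weight leaves every corrected covariance positive. Only the large negative weight outweighs the positive terms and turns both the individual covariances and the composite covariance negative.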
The ASVAB selector composite, VEAR (Verbal + Arithmetic Reasoning), was used in this study to select five random samples of 50 cases each from a Navy applicant population at the .10 selection (acceptance) ratio.
Table 1 gives results of a univariate correction (VEAR is treated as the explicit selector variable) and results of a modified univariate/multivariate correction (VE and AR are treated separately as explicit selector variables). The criterion was the ASVAB test, Mechanical Comprehension (MC), which was used as a surrogate criterion in an earlier study of Navy mechanical school selection composites (Held & Foley, 1994). The modified two-variable correction gives virtually the same results as a univariate (bivariate) correction, and is reported here only as an aid in explaining the full multivariate model (all ASVAB predictor tests treated mathematically as explicit selector variables).
There are no negative weights in Table 1 and no corrected validity sign changes from positive to negative. The variability in correction accuracy is related to violations of the assumptions for performing the corrections, which are compounded by the small sample sizes (N = 50) and the resulting sampling error of the regression weights.
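The effect of small samples on the regression weight can be illustrated by simulation. The following Python sketch uses synthetic standardized data, not the ASVAB data of this study. The population size, the number of replications, and the seed are hypothetical settings, and only the population validity (.687), the selection ratio (.10), and the sample size (50) are taken from this paper.

```python
import random
import statistics

def slope(xs, ys):
    """Unstandardized least squares slope of y on x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(0)   # fixed seed so the sketch is repeatable
RHO = 0.687              # population validity reported in Table 1
N_POP, SR, N_SAMPLE, N_REPS = 20000, 0.10, 50, 200  # hypothetical settings

# Synthetic standardized population satisfying linearity and
# homoscedasticity: y = RHO * x + error.
population = []
for _ in range(N_POP):
    x = rng.gauss(0.0, 1.0)
    y = RHO * x + (1.0 - RHO ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
    population.append((x, y))

# Explicit selection on x: keep the top 10 percent of scorers.
population.sort(key=lambda pair: pair[0], reverse=True)
selected = population[: int(N_POP * SR)]

# Restricted slopes from repeated small samples of the selected group.
slopes = []
for _ in range(N_REPS):
    xs, ys = zip(*rng.sample(selected, N_SAMPLE))
    slopes.append(slope(xs, ys))

print("mean slope:", round(statistics.fmean(slopes), 3))
print("SD of slopes:", round(statistics.stdev(slopes), 3))
print("min, max:", round(min(slopes), 3), round(max(slopes), 3))
```

Even though the assumptions of the correction hold exactly in this synthetic population, the restricted slope varies widely from one sample of 50 to the next, because the selected group has very little predictor variance. That variability carries directly into the corrected covariance.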
Table 1
Uncorrected and Corrected Validities and Unstandardized Regression Weights Used in Univariate and Modified Multivariate Corrections

| Groups | Ru | Univariate Rc | Univariate weight, VEAR | Multivariate Rc | Multivariate weight, VE | Multivariate weight, AR |
|---|---|---|---|---|---|---|
| Unrestricted (a) | .687 | .687 | .451 | .687 | .425 | .472 |
| Restricted (b) | .184 | .838 | .601 | .838 | .549 | .649 |
| Sample 1 (N=50) | .145 | .759 | .432 | .753 | .631 | .274 |
| Sample 2 (N=50) | .092 | .559 | .301 | .528 | .483 | .118 |
| Sample 3 (N=50) | .251 | .917 | 1.004 | .913 | 1.079 | .912 |
| Sample 4 (N=50) | .121 | .682 | .332 | .682 | .395 | .283 |
| Sample 5 (N=50) | .315 | .938 | 1.031 | .933 | 1.386 | .765 |

Note. Ru and Rc are the uncorrected and corrected validities, respectively. VEAR (Verbal + Arithmetic Reasoning) is the selection composite.
(a) 147,288 Navy applicants; (b) 13,684 Navy applicants selected by VEAR at SR = .10.
Table 2 gives results for the eight variable multivariate correction (the 2 explicit selection variables, VE and AR; and the 6 remaining ASVAB incidental selection variables treated as explicit). Sample 4 of Table 2 shows a negative sign change in the corrected validity, which can be attributed to the large (erratic) negative AR weight and its influence in the matrix algebra derivation of the AR covariance term.
Table 2
Uncorrected and Corrected Validities and Unstandardized Regression Weights Used in Eight Variable Multivariate Corrections

| Groups | Ru | Rc | VE | AR | MK | AS | GS | EI | NO | CS |
|---|---|---|---|---|---|---|---|---|---|---|
| Unrestricted (a) | .687 | .687 | .065 | .224 | .173 | .319 | .137 | .173 | -.050 | .029 |
| Restricted (b) | .184 | .719 | -.070 | .226 | .264 | .247 | .168 | .207 | -.040 | -.008 |
| Sample 1 (N=50) | .145 | .822 | .392 | .411 | .354 | .164 | -.180 | -.133 | .280 | -.194 |
| Sample 2 (N=50) | .092 | .345 | -.363 | -.177 | .164 | .247 | .578 | .012 | .095 | -.058 |
| Sample 3 (N=50) | .251 | .870 | .383 | .622 | .066 | .266 | .363 | .121 | .112 | -.172 |
| Sample 4 (N=50) | .121 | -.316 | -.291 | -1.044 | .527 | .318 | .537 | .004 | -.057 | -.065 |
| Sample 5 (N=50) | .315 | .878 | .793 | .479 | .503 | .608 | -.237 | -.134 | -.063 | -.199 |

Note. Ru and Rc are the uncorrected and corrected validities, respectively. The columns VE through CS are regression weights. VEAR (Verbal + Arithmetic Reasoning) is the selection composite. The other ASVAB tests are Mathematics Knowledge (MK), Auto and Shop Information (AS), General Science (GS), Electronics Information (EI), Numerical Operations (NO), and Coding Speed (CS). Mechanical Comprehension (MC) is the criterion. Raw score weights were derived from a stepwise multiple regression procedure.
(a) 147,288 Navy applicants; (b) 13,684 Navy applicants selected by VEAR at the .10 selection ratio.
To explain the negative to positive sign change, the three variable correction case of one explicit selector variable (VEAR) and two incidental selector variables (the criterion, MC, and a candidate replacement composite, VENOCS) was examined. The three variable formula commonly encountered in the literature is,
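$$R_{ZY} = \frac{r_{zy} + r_{xz}\,r_{xy}\left(\dfrac{S_X^2}{s_x^2} - 1\right)}{\sqrt{\left[1 + r_{xz}^2\left(\dfrac{S_X^2}{s_x^2} - 1\right)\right]\left[1 + r_{xy}^2\left(\dfrac{S_X^2}{s_x^2} - 1\right)\right]}} \qquad (8)$$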
(Case 3 from Guilford, 1965, p. 343; Case C from Thorndike, 1982, p. 213), where z is designated as the incidental selector, and x and y are the explicit selector and criterion variables, respectively. As with the criterion, population values for z are unavailable (or at least are treated mathematically as such). As in the bivariate case, Equation 8 can be conceptually simplified to Equation 2, and further generalized to the multivariate case using matrix notation. However, $C_{XY}$, the corrected VENOCS/MC covariance term (individual composite test covariances summed for the composite covariance term), is taken from the $C_{YY}$ matrix of derived incidental variance/covariance terms, as is the criterion standard deviation term (square root of the diagonal variance term).
The potential for a negative to positive sign change for the incidental selection composite validity can be evaluated from the equation for $C_{YY}$,
$$C_{YY} = c_{yy} + w'_{yx}\left[C_{XY} - c_{xy}\right] \qquad (9)$$
which is derived from the homoscedasticity identities. However, it is conceptually simpler to demonstrate the situation graphically in the following Figure.


Figure
Bivariate Predictor/Criterion Plot for the Explicit and Incidental Selector Variables
The data in the two graphs are from Sample 1 of this study. (Composite scores are sums of standardized test scores: M = 50, SD = 10 in the norm population.) The restricted validities of the explicit selector, VEAR, and the incidental selector, VENOCS, are .145 and -.242, respectively. The restricted intercorrelation of the two selectors is .188. The unrestricted validities for VEAR and VENOCS are .687 and .420, respectively. The unrestricted intercorrelation of the two selectors is .700. The negative predictor/criterion relationship evident in the second graph stems from the rather low unrestricted validity of the incidental selector compared to the moderate validity of the explicit selector, and from the moderate intercorrelation of the two selectors. Complete truncation of the explicit selector at the stringent acceptance cutscore assures that at least a few high criterion outliers will exist among low scorers on the incidental selector. Conversely, at least a few low criterion outliers will exist among high scorers on the incidental selector. If the incidental selector were highly correlated with the explicit selector, the two graphs would be more similar. In fact, for this study, no negative restricted validities were found for other composites that were more highly correlated with VEAR.
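The negative to positive reversal for VENOCS can be checked against the three-variable formula. The following Python sketch is an illustration only. It assumes Equation 8 has the standard incidental selection form, the function name is hypothetical, and the values 3.31 and 210.05 for VEAR are borrowed from the Sample 1 worked example in this paper and used here as the restricted and unrestricted variances.

```python
import math

def corrected_incidental_validity(r_xy, r_xz, r_zy, var_x, VAR_X):
    """Three-variable correction for incidental selection (Equation 8 form).

    x is the explicit selector, z the incidental selector, y the criterion.
    var_x and VAR_X are the restricted and unrestricted variances of x.
    """
    k = VAR_X / var_x - 1.0
    numerator = r_zy + r_xz * r_xy * k
    denominator = math.sqrt((1.0 + r_xz ** 2 * k) * (1.0 + r_xy ** 2 * k))
    return numerator / denominator

# Restricted Sample 1 correlations reported above.
r_xy = 0.145    # VEAR with MC
r_xz = 0.188    # VEAR with VENOCS
r_zy = -0.242   # VENOCS with MC (negative in the restricted sample)

R_zy = corrected_incidental_validity(r_xy, r_xz, r_zy, var_x=3.31, VAR_X=210.05)
print(round(R_zy, 2))
```

Under these assumptions the corrected VENOCS validity is positive and in the neighborhood of .53 to .54. The second term of the numerator, which grows with the ratio of unrestricted to restricted variance on the explicit selector, outweighs the negative restricted correlation.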
The following is a simplified presentation of the multivariate correction for incidental range restriction for the three variable case graphed above using the data from Sample 1.
$C_{XX} = [210.05]$ (unrestricted VEAR variance); $c_{xx} = [3.31]$ (restricted VEAR variance)
$c_{xy} = [1.43 \;\; 3.78]$ (restricted VEAR covariances with MC and VENOCS)
$c_{yy} = \begin{bmatrix} 29.53 & -14.52 \\ -14.52 & 122.01 \end{bmatrix}$ (restricted MC/VENOCS incidental variable variance/covariance matrix)
$w_{xy} = c_{xx}^{-1} c_{xy} = [1/3.31]\,[1.43 \;\; 3.78] = [.43 \;\; 1.13]$
$C_{XY} = C_{XX}\,w_{xy} = w'_{yx}\,C_{XX} = [210.05]\,[.43 \;\; 1.13] = [90.32 \;\; 237.36]$
$C_{YY} = c_{yy} + w'_{yx}\left[C_{XY} - c_{xy}\right] = \begin{bmatrix} 67.75 & 85.92 \\ 85.92 & 385.96 \end{bmatrix}$ (corrected MC and VENOCS variance/covariance matrix)
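The matrix steps above can be sketched in Python for the case of one explicit selector and several incidental variables. This is an illustration under stated assumptions rather than the procedure used in the study: the function name is hypothetical, 210.05 and 3.31 are treated as variances, and the weights are carried at full precision, so the results differ slightly from the hand calculation, which uses the rounded weights .43 and 1.13.

```python
import math

def correct_incidental(C_XX, c_xx, c_xy, c_yy):
    """Multivariate correction with a single explicit selector.

    C_XX, c_xx : unrestricted and restricted variance of the explicit selector
    c_xy       : restricted covariances of the selector with each incidental variable
    c_yy       : restricted variance/covariance matrix of the incidental variables
    """
    n = len(c_xy)
    w = [c / c_xx for c in c_xy]          # w_xy = c_xx^{-1} c_xy
    C_XY = [C_XX * w_j for w_j in w]      # Equation 7
    # Equation 9: C_YY = c_yy + w'_yx [C_XY - c_xy], element by element.
    C_YY = [[c_yy[i][j] + w[i] * (C_XY[j] - c_xy[j]) for j in range(n)]
            for i in range(n)]
    return w, C_XY, C_YY

# Sample 1 values from the worked example (order: MC, VENOCS).
w, C_XY, C_YY = correct_incidental(
    C_XX=210.05,
    c_xx=3.31,
    c_xy=[1.43, 3.78],
    c_yy=[[29.53, -14.52],
          [-14.52, 122.01]],
)

# Corrected VENOCS/MC validity: off-diagonal covariance over the product of
# the square roots of the two diagonal variances.
R_venocs_mc = C_YY[0][1] / math.sqrt(C_YY[0][0] * C_YY[1][1])

print([round(v, 2) for v in w])
print([round(v, 2) for v in C_XY])
print([[round(v, 2) for v in row] for row in C_YY])
print(round(R_venocs_mc, 2))
```

The restricted MC/VENOCS covariance of -14.52 becomes positive after correction because the added term, the product of the two weights and the difference between the unrestricted and restricted selector variances, is much larger than the negative restricted covariance.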
$R_{XY} = 85.92 / (19.65 \times 8.23)$. The corrected validity of VENOCS is .530, which deviates from the actual value of .420 but is of the correct sign. The two denominator values, 19.65 and 8.23, were derived from $C_{YY}$ as, respectively, the square roots of the VENOCS and MC variance terms (in the diagonal).
This paper has described several conditions under which sign changes can occur when correcting validities for range restriction. In general, when all selector variables and the criterion are positively correlated in the population, the negative to positive sign change is a function of the intercorrelations of the selectors and criterion in the restricted data set and cannot be viewed as an abnormal outcome. However, the positive to negative sign change under the same conditions may be a function of a small and/or inadequate data set and should be viewed as an unrealistic outcome. The methodological issues revealed in this study should lead one to use caution when applying the multivariate correction formulas with small samples.
References
Booth-Kewley, S. (1985). An empirical comparison of the accuracy of univariate and multivariate corrections for range restriction (NPRDC-TR-85-19). San Diego: Navy Personnel Research and Development Center.
Guilford, J. P. (1965). Fundamental statistics in psychology and education. New York: McGraw-Hill.
Gulliksen, H. (1950). Theory of mental tests. New York: John Wiley & Sons.
Held, J. D., & Foley, P. P. (1994). Explanations for accuracy of the general multivariate formulas in correcting for range restriction. Applied Psychological Measurement, 18, 355-367.
Horst, P. (1963). Matrix algebra for social scientists. New York: Holt, Rinehart and Winston.
Lawley, D. (1943). A note on Karl Pearson's selection formula. Royal Society of Edinburgh, Proceedings, Section A, 62, 28-30.
Novick, M. R., & Thayer, D. T. (1969). An investigation of the accuracy of the Pearson selection formulas (ONR-RM-69-22). Princeton, NJ: Educational Testing Service.
Pearson, K. (1903). On the influence of natural selection on the variability and correlations of organs. Philosophical Transactions of the Royal Society, London, Series A, 200, 1-66.
Ree, M. J., Carretta, T. R., Earles, J. A., & Albert, W. (1994). Sign changes when correcting for range restriction: A note on Pearson's and Lawley's selection formulas. Journal of Applied Psychology, 79, 298-301.
Thorndike, R. L. (1982). Applied psychometrics. Boston: Houghton Mifflin Company.