Effect of Population Size on Navy Rating Examination Performance¹

Grover E. Diehl, Exam Branch
and
S. Darlene Barrow, Statistical Analysis Section
Navy Advancement Center
Pensacola, Florida

The Navy Advancement Center (NAC) uses a number of indices to monitor and control the performance of enlisted promotion examinations. These include specific indices for item analysis, differential analysis of sections within exams, and comparison of exams against the exam cohort and external criteria. There are two criteria of concern in this paper. The first involves identifying exam items (prior to scoring) that do not meet certain statistical parameters. These items appear on the Random Verification List (RVL). The second involves the ability of exams to produce individual items in broad performance categories. These categories are summarized in the Analysis of Categorization (AOC). The main independent variable in the analysis is sample size, defined as the number of examinees taking each advancement examination. Although small samples were expected to be associated with negative performance because of mathematical limitations, the tests used in the study were nondirectional.

Sample size of examinees was not significantly related to RVL incidence. It was significantly related to the proportion of items falling in each broad performance category, with smaller samples associated with a greater share of items in the less desirable categories. In the latter case, the proportion of variance accounted for by sample size was at most 28%. The data suggest that while sample size is related to negative performance, up to 79% of the variance remains unexplained. Examination developers should continue research to explain factors associated with the remaining variance.

The Navy Advancement Center (NAC) uses a number of indices to monitor and control the performance of enlisted promotion examinations. These include specific indices for item analysis, differential analysis of sections within exams, and comparison of exams against the exam cohort and external criteria. There are two criteria of concern in this paper. The first involves identifying exam items (prior to scoring) that do not meet certain statistical parameters. These items appear on the Random Verification List (RVL). The second involves the ability of exams to produce individual items in broad performance categories. These categories are summarized in the Analysis of Categorization (AOC).

There is a current assumption within the community of NAC instructional systems specialists (ISSs) that the size of the population being tested biases exams toward negative performance characteristics. Much of this opinion rests on the fact that performance standards are liberalized as exam population decreases. It is also consistent with the known restriction of range in correlation coefficients as population (or sample) size decreases. Finally, it fits the practical experience of the ISSs. For these reasons, many ISSs and subject matter experts, when presented with negative performance indices, immediately look to sample size for relief.

While intuitive wisdom and psychometric logic strongly suggest that sample size is a significant causative factor, there is no empirical evidence one way or the other. The purpose of this paper is to evaluate the subject by testing the following two general null hypotheses:

  1. There is no linear association between population (sample) size and number of exam items appearing on the RVL. Ho: r = 0.
  2. There is no linear association between population (sample) size and proportion of exam items falling in the three broad categories used to grade the general quality of items. Ho: r = 0 for each of three item categories.

____________________________
¹ The opinions expressed in this paper are those of the authors and do not necessarily represent those of the Department of the Navy or the U.S. Government.

Random Verification List

Method

Data. The RVL accounts for 80 to 90 percent of the total population of individuals in an exam cycle. There were 79 listed exam ratings for Cycle 150 (E7 candidates). Hits is defined as exam rate items selected for inclusion on the RVL (items that do not meet certain statistical parameters). Nsamp is defined as the number of individuals by exam rate when the RVL was produced. BR is defined as branch membership code. There were five branch membership codes.

Procedure. A multiple regression model was applied to the data. Hits was the dependent variable and Nsamp and BR were the independent variables. Since BR is a qualitative variable, four quantitative indicator variables were used. The probability for significance was 0.05 or less.
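The indicator coding described in the procedure can be sketched as follows. This is a minimal illustration only: the paper does not list the five branch codes or say which branch served as the reference category, so the codes 1 through 5 and the choice of code 5 as the reference are assumptions.

```python
# Hypothetical branch codes; the paper reports five codes but does not name them.
BRANCH_CODES = [1, 2, 3, 4, 5]
REFERENCE = 5  # assumed reference category: all four indicators are 0


def indicator_variables(branch_code):
    """Return [BR1, BR2, BR3, BR4] for one exam rating's branch code."""
    if branch_code not in BRANCH_CODES:
        raise ValueError(f"unknown branch code: {branch_code}")
    non_reference = [c for c in BRANCH_CODES if c != REFERENCE]
    return [1 if branch_code == c else 0 for c in non_reference]


for code in BRANCH_CODES:
    print(code, indicator_variables(code))
```

Five categories need only four indicators because membership in the reference branch is implied when all four are zero; a fifth indicator would be perfectly collinear with the intercept.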

Statistical routine. SPSS release 5 for DOS running on a Digital DECPC XL server 590 in the Statistical Analysis Section, Data Analysis Branch, Enlisted Advancement Division, NAC, was used to analyze the data with the following commands:

REGRESSION VARIABLES = BR1 TO BR4, HITS, NSAMP
  /DESCRIPTIVES = MEAN STDDEV CORR SIG N
  /STATISTICS = ALL
  /DEPENDENT = HITS
  /METHOD = ENTER BR1 TO BR4
  /METHOD = ENTER NSAMP
  /DESCRIPTIVES = MEAN STDDEV CORR SIG N
  /STATISTICS = ALL
  /DEPENDENT = HITS
  /METHOD = ENTER NSAMP
  /METHOD = ENTER BR1 TO BR4.

Results

There was an average of 10.3 (SD = 5.2) Hits for the 79 exam ratings reviewed. The average Nsamp was 529.8 (SD = 543.9). The simple zero order correlation of Hits with Nsamp was -0.1746, p> 0.05. Table 1 shows the analysis of variance decomposition for Hits with Nsamp after partialing out branch membership. Table 2 shows the analysis of variance decomposition for Hits with branch membership after partialing out Nsamp.
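One conventional way to check a zero order correlation against Ho: r = 0 is the t statistic with n - 2 degrees of freedom. The sketch below applies it to the reported r and number of exam ratings. It is an assumption that this matches the test SPSS reported for the authors, and the critical value of 1.99 is an approximate two-tailed 0.05 figure taken from standard t tables.

```python
import math


def t_for_correlation(r, n):
    """t statistic for H0: r = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


r, n = -0.1746, 79      # values reported in the Results paragraph
T_CRITICAL = 1.99       # approximate two-tailed 0.05 critical value for df = 77

t = t_for_correlation(r, n)
print(f"t = {t:.3f}, df = {n - 2}, reject H0: {abs(t) > T_CRITICAL}")
```

Under these assumptions the statistic falls short of the critical value, which agrees with the reported p > 0.05.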

Table 1

Analysis of Variance Decomposition of Likelihood of Inclusion on the NAC Random Verification Listing with Sample Size after Partialing out Branch Membership
(Exam Ratings = 79)

Source                                 df    SS            MS          F       prob.
Total Regression                        5    205.82266     41.16453    1.575   >0.05
Due to Branch Only in Model             4    162.23103     40.55776
Due to Nsamp Given Branch in Model      1    43.59163      43.59163    1.668   >0.05
Residual                               73    1907.62038    26.13179
Total                                  78    2113.44304

Table 2

Analysis of Variance Decomposition of Likelihood of Inclusion on the NAC Random Verification Listing with Branch Membership after Partialing out Sample Size
(Exam Ratings = 79)

Source                                 df    SS            MS          F       prob.
Total Regression                        5    205.82266     41.16453    1.575   >0.05
Due to Branch Given Nsamp in Model      4    141.3868      35.3467     1.353   >0.05
Due to Nsamp Only in Model              1    64.43586      64.43586
Residual                               73    1907.62038    26.13179
Total                                  78    2113.44304

Discussion

Since Nsamp was not significantly associated with Hits on either a zero order correlation or an analysis of variance basis, no linear association was observed in these data. Branch membership, in fact, accounted for a little more than three times the proportion of variance that Nsamp did. That is, the coefficient of partial determination (r2) for branches (after removing the effect of Nsamp) was approximately 0.069, while that for Nsamp (after removing the effect of branch membership) was approximately 0.022. The R2 (coefficient of multiple determination) for the zero order correlation of Hits with Nsamp was not much better, 0.030. Within the context of the present study, the finding related to branch membership is surprising.
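The F ratios and coefficients of partial determination quoted above can be recovered from the sums of squares in Tables 1 and 2. The sketch below assumes the usual definitions (F as effect mean square over residual mean square, partial r2 as effect SS over effect SS plus residual SS); the function names are illustrative, and the paper does not state its formulas explicitly.

```python
SS_RESIDUAL, DF_RESIDUAL = 1907.62038, 73   # Residual row, Tables 1 and 2
SS_TOTAL = 2113.44304                       # Total row, Tables 1 and 2


def f_ratio(ss_effect, df_effect):
    """Effect mean square divided by residual mean square."""
    return (ss_effect / df_effect) / (SS_RESIDUAL / DF_RESIDUAL)


def partial_r2(ss_effect):
    """Share of the variance left by the other predictor that this effect explains."""
    return ss_effect / (ss_effect + SS_RESIDUAL)


nsamp_given_branch = 43.59163   # Table 1, df = 1
branch_given_nsamp = 141.3868   # Table 2, df = 4
nsamp_only = 64.43586           # Table 2, df = 1

print(f"F, Nsamp given branch:  {f_ratio(nsamp_given_branch, 1):.3f}")
print(f"F, branch given Nsamp:  {f_ratio(branch_given_nsamp, 4):.3f}")
print(f"partial r2, Nsamp:      {partial_r2(nsamp_given_branch):.3f}")
print(f"partial r2, branch:     {partial_r2(branch_given_nsamp):.3f}")
print(f"zero order R2, Nsamp:   {nsamp_only / SS_TOTAL:.3f}")
```

These definitions reproduce the tabled F values and the 0.022, 0.069, and 0.030 figures cited in the discussion.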

Category 1 Items

Method

Data. The AOC accounts for 95 to 100 percent of the total population of individuals in an exam cycle. There were 80 listed exam ratings for Cycle 150 (E7 candidates). Category 1 items are items that do not meet minimum standards for further inclusion in exams. Cat1per is defined as the percent of category 1 items of the total number of test items (150). Samp is defined as the number of individuals by exam rate when the AOC was produced. BR is defined as branch membership code. There were five branch membership codes.

Procedure. A multiple regression model was applied to the data. Cat1per was the dependent variable and Samp and BR were the independent variables. Since BR is a qualitative variable, four quantitative indicator variables were used. The probability for significance was 0.05 or less.

Statistical routine. SPSS release 5 for DOS running on a Digital DECPC XL server 590 in the Statistical Analysis Section, Data Analysis Branch, Enlisted Advancement Division, NAC, was used to analyze the data with the following commands:

REGRESSION VARIABLES = BR1 TO BR4 SAMP CAT1PER
  /DESCRIPTIVES = MEAN STDDEV CORR SIG N
  /STATISTICS = ALL
  /DEPENDENT = CAT1PER
  /METHOD = ENTER BR1 TO BR4
  /METHOD = ENTER SAMP
  /DESCRIPTIVES = MEAN STDDEV CORR SIG N
  /STATISTICS = ALL
  /DEPENDENT = CAT1PER
  /METHOD = ENTER SAMP
  /METHOD = ENTER BR1 TO BR4.

Results

There was an average of 29.4 percent (SD = 10.6 percent) of Category 1 items for the 80 ratings reviewed. The average Samp was 490.4 (SD = 512). The simple zero order correlation of Cat1per with Samp was -0.4197, p < 0.05. Table 3 shows the analysis of variance decomposition for Cat1per with Samp after partialing out branch membership. Table 4 shows the analysis of variance decomposition for Cat1per with branch membership after partialing out Samp.

Table 3

Analysis of Variance Decomposition of Likelihood of Category 1 Items on Exams with Sample Size after Partialing out Branch Membership
(Exam Ratings = 80)

Source                                 df    SS            MS            F        prob.
Total Regression                        5    2843.77214    568.75443     6.920    <0.05
Due to Branch Only in Model             4    1232.76069    308.19017
Due to Sample Given Branch in Model     1    1611.01145    1611.01145    19.600   <0.05
Residual                               74    6082.47415    82.19560
Total                                  79    8926.24629

Table 4

Analysis of Variance Decomposition of Likelihood of Category 1 Items on Exams with Branch Membership after Partialing out Sample Size
(Exam Ratings = 80)

Source                                 df    SS            MS            F        prob.
Total Regression                        5    2843.77214    568.75443     6.920    <0.05
Due to Branch Given Sample in Model     4    1271.7571     317.93928     3.868    <0.05
Due to Sample Only in Model             1    1572.01504    1572.01504
Residual                               74    6082.47415    82.19560
Total                                  79    8926.24629

Discussion

Samp was significantly related to incidence of Cat1per, accounting for approximately 21 percent of the variance (r2 = 0.209) after removing the effect of branch membership. On a simple zero order correlation basis the R2 between the two variables was slightly less, 0.176. Branch membership accounted for approximately 17 percent of the variance after removing the effect of Samp (r2=0.173). This curious incidence of higher Samp/Cat1per correlation after extracting branch membership variance may be due to branch membership taking out relatively more error variance than true variance and "enhancing" the effect of Samp.
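The pattern described above shows up directly in the sums of squares of Tables 3 and 4. The sketch below checks that both orders of entry add up to the same regression sum of squares, and that Samp's sum of squares is larger when it is entered after branch membership than when it is entered alone. Variable names are illustrative; the numbers are taken from the tables.

```python
import math

# Table 3: branch entered first, then Samp
branch_only, samp_given_branch = 1232.76069, 1611.01145
# Table 4: Samp entered first, then branch
samp_only, branch_given_samp = 1572.01504, 1271.7571

ss_regression, ss_residual, ss_total = 2843.77214, 6082.47415, 8926.24629

# Either order of entry partitions the same regression sum of squares.
order_a = branch_only + samp_given_branch
order_b = samp_only + branch_given_samp
print(math.isclose(order_a, ss_regression, abs_tol=1e-4))
print(math.isclose(order_b, ss_regression, abs_tol=1e-4))
print(math.isclose(ss_regression + ss_residual, ss_total, abs_tol=1e-4))

# Samp explains more once branch is already in the model than it does alone.
print(f"Samp gain after branch: {samp_given_branch - samp_only:.5f}")
```

The gain is small relative to the total, which fits the authors' reading that branch membership mainly removes variance unrelated to Samp.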

Category 2 Items

Method

Data. The AOC accounts for 95 to 100 percent of the total population of individuals in an exam cycle. There were 80 listed exam ratings for Cycle 150 (E7 candidates). Category 2 items are items that if modified will meet minimum standards for further inclusion in exams. Cat2per is defined as the percent of category 2 items of the total number of test items (150). Samp is defined as the number of individuals by exam rate when the AOC was produced. BR is defined as branch membership code. There were five branch membership codes.

Procedure. A multiple regression model was applied to the data. Cat2per was the dependent variable and Samp and BR were the independent variables. Since BR is a qualitative variable, four quantitative indicator variables were used. The probability for significance was 0.05 or less.

Statistical routine. SPSS release 5 for DOS running on a Digital DECPC XL server 590 in the Statistical Analysis Section, Data Analysis Branch, Enlisted Advancement Division, NAC, was used to analyze the data with the following commands:

REGRESSION VARIABLES = BR1 TO BR4 SAMP CAT2PER
  /DESCRIPTIVES = MEAN STDDEV CORR SIG N
  /STATISTICS = ALL
  /DEPENDENT = CAT2PER
  /METHOD = ENTER BR1 TO BR4
  /METHOD = ENTER SAMP
  /DESCRIPTIVES = MEAN STDDEV CORR SIG N
  /STATISTICS = ALL
  /DEPENDENT = CAT2PER
  /METHOD = ENTER SAMP
  /METHOD = ENTER BR1 TO BR4.

Results

There was an average of 12.7 percent (SD = 5.2 percent) of Category 2 items for the 80 ratings reviewed. The average Samp was 490.4 (SD = 512). The simple zero order correlation of Cat2per with Samp was -0.4815, p<0.05. Table 5 shows the analysis of variance decomposition for Cat2per with Samp after partialing out branch membership. Table 6 shows the analysis of variance decomposition for Cat2per with branch membership after partialing out Samp.

Table 5

Analysis of Variance Decomposition of Likelihood of Category 2 Items on Exams with Sample Size after Partialing out Branch Membership
(Exam Ratings = 80)

Source                                 df    SS            MS           F        prob.
Total Regression                        5    543.34872     108.66974    5.099    <0.05
Due to Branch Only in Model             4    136.62598     34.15649
Due to Sample Given Branch in Model     1    406.72274     406.72274    19.083   <0.05
Residual                               74    1577.16293    21.31301
Total                                  79    2120.51165

Table 6

Analysis of Variance Decomposition of Likelihood of Category 2 Items on Exams with Branch Membership after Partialing out Sample Size
(Exam Ratings = 80)

Source                                 df    SS            MS           F        prob.
Total Regression                        5    543.34872     108.66974    5.099    <0.05
Due to Branch Given Sample in Model     4    51.81405      12.95351     0.608    >0.05
Due to Sample Only in Model             1    491.53467     491.53467
Residual                               74    1577.16293    21.31301
Total                                  79    2120.51165

Discussion

Samp was significantly related to incidence of Cat2per, accounting for approximately 21 percent of the variance (r2 = 0.205) after removing the effect of branch membership. On a simple zero order correlation basis the R2 between the two variables was slightly higher, 0.232. Branch membership accounted for approximately 3 percent of the variance after removing the effect of Samp (r2 = 0.032). This is the first instance of a nonsignificant branch membership effect observed in the present category study. Note too that Samp accounted for more than six times as much of the Cat2per variance as did branch membership (a factor of 6.41).
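As a worked check, the coefficients of partial determination and their ratio can be recomputed from Tables 5 and 6, assuming partial r2 is the effect sum of squares divided by the sum of the effect and residual sums of squares. The paper does not state the formula, but this one reproduces the reported values.

```latex
r^2_{\mathrm{Samp}\mid\mathrm{BR}}
  = \frac{SS_{\mathrm{Samp}\mid\mathrm{BR}}}{SS_{\mathrm{Samp}\mid\mathrm{BR}} + SS_{\mathrm{Residual}}}
  = \frac{406.72274}{406.72274 + 1577.16293} \approx 0.205

r^2_{\mathrm{BR}\mid\mathrm{Samp}}
  = \frac{SS_{\mathrm{BR}\mid\mathrm{Samp}}}{SS_{\mathrm{BR}\mid\mathrm{Samp}} + SS_{\mathrm{Residual}}}
  = \frac{51.81405}{51.81405 + 1577.16293} \approx 0.032

\frac{r^2_{\mathrm{Samp}\mid\mathrm{BR}}}{r^2_{\mathrm{BR}\mid\mathrm{Samp}}}
  \approx \frac{0.205}{0.032} \approx 6.41
```

The factor of 6.41 follows from the rounded coefficients; carrying the unrounded sums of squares through gives a ratio of about 6.45.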

Category 3 Items

Method

Data. The AOC accounts for 95 to 100 percent of the total population of individuals in an exam cycle. There were 80 listed exam ratings for Cycle 150 (E7 candidates). Category 3 items are items that meet minimum standards for further inclusion in exams. Cat3per is defined as the percent of category 3 items of the total number of test items (150). Samp is defined as the number of individuals by exam rate when the AOC was produced. BR is defined as branch membership code. There were five branch membership codes.

Procedure. A multiple regression model was applied to the data. Cat3per was the dependent variable and Samp and BR were the independent variables. Since BR is a qualitative variable, four quantitative indicator variables were used. The probability for significance was 0.05 or less.

Statistical routine. SPSS release 5 for DOS running on a Digital DECPC XL server 590 in the Statistical Analysis Section, Data Analysis Branch, Enlisted Advancement Division, NAC, was used to analyze the data with the following commands:

REGRESSION VARIABLES = BR1 TO BR4 SAMP CAT3PER
  /DESCRIPTIVES = MEAN STDDEV CORR SIG N
  /STATISTICS = ALL
  /DEPENDENT = CAT3PER
  /METHOD = ENTER BR1 TO BR4
  /METHOD = ENTER SAMP
  /DESCRIPTIVES = MEAN STDDEV CORR SIG N
  /STATISTICS = ALL
  /DEPENDENT = CAT3PER
  /METHOD = ENTER SAMP
  /METHOD = ENTER BR1 TO BR4.

Results

There was an average of 52.9 percent (SD = 14.2 percent) of Category 3 items for the 80 ratings reviewed. The average Samp was 490.4 (SD = 512). The simple zero order correlation of Cat3per with Samp was 0.5025, p < 0.05. Table 7 shows the analysis of variance decomposition for Cat3per with Samp after partialing out branch membership. Table 8 shows the analysis of variance decomposition for Cat3per with branch membership after partialing out Samp.

Table 7

Analysis of Variance Decomposition of Likelihood of Category 3 Items on Exams with Sample Size after Partialing out Branch Membership
(Exam Ratings = 80)

Source                                 df    SS             MS            F        prob.
Total Regression                        5    5860.02907     1172.00581    8.677    <0.05
Due to Branch Only in Model             4    2001.20613     500.30153
Due to Sample Given Branch in Model     1    3858.82294     3858.82294    28.568   <0.05
Residual                               74    9995.52721     135.07469
Total                                  79    15855.55628

Table 8

Analysis of Variance Decomposition of Likelihood of Category 3 Items on Exams with Branch Membership after Partialing out Sample Size
(Exam Ratings = 80)

Source                                 df    SS             MS            F        prob.
Total Regression                        5    5860.02907     1172.00581    8.677    <0.05
Due to Branch Given Sample in Model     4    1855.83934     463.95984     3.435    <0.05
Due to Sample Only in Model             1    4004.18973     4004.18973
Residual                               74    9995.52721     135.07469
Total                                  79    15855.55628

Discussion

Samp was significantly related to incidence of Cat3per, accounting for approximately 28 percent of the variance (r2 = 0.279) after removing the effect of branch membership. On a simple zero order correlation basis the R2 between the two variables was slightly less, 0.253. Branch membership accounted for approximately 16 percent of the variance after removing the effect of Samp (r2 = 0.157). In all three of these instances, the associations were statistically significant.

Conclusions

The finding that sample size was not significantly related to the number of exam items appearing on the random verification listing will surprise many. The conventional wisdom holds that sample size drives all negative performance indices. Not only was sample size not a significant contributor, but branch membership, although not significant either, accounted for a greater proportion of variance than did sample size. The data may suggest that random verification exam items are predominantly random variables. The observation that the correlation of sample size with random verification exam items was negative suggests that there is perhaps some scintilla of effect due to sample size, but that it can be effectively ignored. Exam items appearing on the random verification list should be evaluated as independent events.

On the other hand, sample size was found to be significantly related to the percentages of items in each category, negatively for Category 1 and 2 items and positively for Category 3 items. In one respect this is an expected outcome, since the point-biserial correlation (rpb) is used by the Navy Advancement Center as an index of discrimination, which is in turn a component of category. The range of rpb (normally -1.0 to 1.0) shrinks as sample size approaches 0. Thus, small sample sizes mathematically influence item category. That said, sample size remains a significant source of variance in the determination of item category. The next logical concern is how large the effect is. In these data, the largest effect was for the identification of Category 3 items, where the coefficient of partial determination (r2) was 0.279. More than one quarter of the total variance in Category 3 items was accounted for by sample size. Data for Category 1 and 2 items showed similar trends but to a lesser extent.
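For readers unfamiliar with the discrimination index mentioned above, the point-biserial correlation between a right/wrong item score and the total test score can be sketched as follows. The function name and the eight-examinee data are hypothetical and invented purely for illustration; they are not NAC data, and the paper does not describe how NAC computes the index.

```python
import math


def point_biserial(item_correct, total_scores):
    """Point-biserial correlation of a 0/1 item score with total test score."""
    n = len(item_correct)
    if n != len(total_scores) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 examinees")
    right = [s for c, s in zip(item_correct, total_scores) if c == 1]
    wrong = [s for c, s in zip(item_correct, total_scores) if c == 0]
    if not right or not wrong:
        raise ValueError("item needs both correct and incorrect responses")
    mean_all = sum(total_scores) / n
    # Population (divide-by-n) standard deviation of the total scores.
    sd = math.sqrt(sum((s - mean_all) ** 2 for s in total_scores) / n)
    if sd == 0:
        raise ValueError("total scores have no variance")
    p = len(right) / n  # proportion answering the item correctly
    mean_right = sum(right) / len(right)
    mean_wrong = sum(wrong) / len(wrong)
    return (mean_right - mean_wrong) / sd * math.sqrt(p * (1 - p))


# Hypothetical responses for eight examinees (illustration only).
item = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [120, 110, 105, 90, 85, 100, 95, 115]
print(f"rpb = {point_biserial(item, scores):.3f}")
```

With so few examinees, changing a single response moves the index noticeably, which is the kind of small-sample instability the paragraph above is concerned with.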

Finally, it should be noted that sample size failed to account for up to 79% of category variance in the examinations studied here. The Navy Advancement Center has separate decision-making tables for low sample size examinations, softening the use or non-use of Category 1 and 2 items. While these data do not contradict this policy, they do suggest that identification as Category 1 or 2 represents real performance deficits that should be addressed. Simply writing such items off as a "low sample size problem" is not sufficient.
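The share of variance left unexplained by sample size follows from the three coefficients of partial determination reported in the category discussions. The sketch below simply takes one minus each reported coefficient; the dictionary names are illustrative.

```python
# Coefficients of partial determination for Samp, as reported in the
# Category 1, 2, and 3 discussions (after removing branch membership).
partial_r2 = {"Category 1": 0.209, "Category 2": 0.205, "Category 3": 0.279}

unexplained = {name: 1 - r2 for name, r2 in partial_r2.items()}

for name, share in unexplained.items():
    print(f"{name}: {share:.1%} of variance not accounted for by sample size")
```

On these figures the unexplained share runs from roughly 72 percent for Category 3 to between 79 and 80 percent for Categories 1 and 2.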
