Shortening BARB
David Wright and Patrick Tapsfield
Human Assessment Laboratory
University of Plymouth
United Kingdom
1. Background and Introduction
Six of the component tests that comprise the British Army Recruit Battery (BARB) are combined to form a measure of psychometric G called the general trainability index (GTI). This is used as the sole measure of cognitive performance in soldier selection. Validity studies have shown that this composite is a valid predictor of success in training and in general it is accepted as a tool for selection by the Army. However, there are concerns about the absence of tests of literacy and numeracy and about the need for specialist attainment tests for selection for technical jobs.
The scope for reducing the test times needed to produce the GTI has been under investigation for the last two years, and the first two phases of this investigation were reported in Wright and Tapsfield (1995). In section 2 of this paper we summarise this work. In section 3 we report the results of the third and final phase. This involved the development of shortened versions of BARB and an applicant test/retest trial in which a representative sample of applicants took the shortened version of BARB followed one week later by the full version. Our conclusions are summarised in section 4.
2. Previous Studies
Psychometric Models
Wright and Tapsfield (1995) describe the application of psychometric models fitted to obtain test times from target reliabilities. The proposed times and modelled reliabilities for the current-length tests and the shortened tests are summarised in Table 1.
Table 1: Predicted Test/Immediate Retest Reliabilities
| Test length | Measure     | A2  | LC   | ND  | RF  | SI  | T2   | GTI |
| Full        | Time (min)  | 6   | 4    | 4   | 4   | 6   | 5    |     |
| Full        | Reliability | .80 | .79  | .85 | .89 | .75 | .74  |     |
| Reduced     | Time (min)  | -   | 4.25 | 2   | 2   | 2.5 | 3.75 |     |
| Reduced     | Reliability | -   | .80  | .74 | .80 | .80 | .70  |     |
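The relationship between test length and reliability summarised in Table 1 can be illustrated with the classical Spearman-Brown formula. This is only a sketch: the psychometric models actually fitted are described in Wright and Tapsfield (1995), not here, and the sketch assumes that test length is proportional to test time. The function names are illustrative.

```python
def spearman_brown(reliability: float, length_ratio: float) -> float:
    """Predicted reliability when test length is multiplied by length_ratio."""
    return (length_ratio * reliability) / (1.0 + (length_ratio - 1.0) * reliability)


def length_ratio_for_target(reliability: float, target: float) -> float:
    """Length multiplier needed to move from `reliability` to `target`."""
    return (target * (1.0 - reliability)) / (reliability * (1.0 - target))


# Full-length times (min) and reliabilities are taken from Table 1.
nd_reduced = spearman_brown(0.85, 2 / 4)  # ND: 4 min reduced to 2 min
rf_reduced = spearman_brown(0.89, 2 / 4)  # RF: 4 min reduced to 2 min

# Time (min) LC would need to reach a target reliability of .80 from .79.
lc_minutes = 4 * length_ratio_for_target(0.79, 0.80)

print(round(nd_reduced, 2), round(rf_reduced, 2), round(lc_minutes, 2))
```

Under these assumptions the halved ND and RF tests come out at about .74 and .80, and LC needs roughly 4.25 minutes to reach .80, which agrees with the corresponding entries in Table 1.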
Item Analysis
Retests carried out in the experimental window enabled us to construct test and immediate retest scores for shortened tests from item level data. The results confirmed the validity of the predictions given in Table 1. A sample of recruits who elected to take a retest after a period of at least one month provided a basis for the assessment of test/long term retest reliability and long term gains in test scores. As expected, the test/long term retest reliabilities were generally smaller than the test/immediate retest reliabilities. The GTI composite was found to have a reliability of .78 for the reduced tests compared with .81 for the full tests (see Wright and Tapsfield (1995) for further details). Retest gains as large as 10 T score points were observed in this test/long term retest sample.
3. Applicant Study
Two trial versions of the battery
Two shortened versions of the BARB battery were included in the study. One was the same as the existing battery but with the reduced test times. In the other version, the instruction, practice and test sequences were modified in an attempt to reduce practice effects. There was a consolidated sequence of instructions and practice for all five tests at the start of the testing session. This was followed by the sequence of five tests. Each test was preceded by a 30 second countdown and a brief message reminding the subject of the test to come. The object of this modified version was to give subjects time to familiarise themselves with the tests and reflect on them. In the remainder of this report the shortened battery will be referred to as short BARB whilst the battery with the modified instructions, practice and test sequences will be referred to as mini BARB. The software placed in the trial ACIOs was programmed so that each machine alternated between the two versions.
Study Population
Applicants entering the ACIO were told about the trial and it was explained to them that their selection would depend on the full version of BARB that would follow about one week after they had done the trial version. It was emphasised that their performance on the trial version would have no bearing on any decisions concerning their selection. It was, however, pointed out that the practice gained in the trial should be of benefit to them when they took the full BARB. Each subject who agreed to enter the trial took one of the two experimental versions followed approximately one week later by the full BARB.
T Scores For Shortened Tests
Adjusted scores for the shortened versions of BARB were calculated using the formula
Adjusted Score = No. Correct - No. Incorrect    (1)
These were then scaled to be equivalent to the adjusted scores from the full tests by multiplying them by the ratio of the full test lengths to the shortened test lengths. For example, if the full test was 5 minutes and the shortened test length was 2 minutes, the adjusted score from the shortened test was multiplied by 5/2 = 2.5. The result was then converted to a T score using the means and standard deviations for the full test length. This process of rescaling and then calculating T scores should produce scores that are equivalent to the T scores from the full version of BARB. The justification for shortening the tests depends on the extent to which these T scores have the same distribution as the full test scores and on the size of their correlations with the full test scores. A GTI composite for the shortened versions of BARB was produced by taking an equally weighted composite of the five test scores. This composite was scaled so that its mean was equal to the average of the five means for the individual tests and its standard deviation was 10. Comparisons of the mean GTI from the shortened tests with the hypothetical value of 50 and with the mean of the GTI from the full tests are thus meaningful.
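The rescaling, T score conversion and composite described above can be sketched as follows. This is an illustration under stated assumptions rather than the scoring software used in the trial: the function names, the norm values and the sample data are hypothetical, a T score is taken in its usual sense (mean 50, standard deviation 10), and the sample standard deviation is used for the composite.

```python
from statistics import mean, stdev


def rescale_adjusted(adjusted, full_minutes, short_minutes):
    """Scale a shortened-test adjusted score up to the full test length."""
    return adjusted * (full_minutes / short_minutes)


def t_score(score, norm_mean, norm_sd):
    """Convert a score to a T score (mean 50, SD 10) using full-test norms."""
    return 50.0 + 10.0 * (score - norm_mean) / norm_sd


def gti_composite(t_scores_by_person):
    """Equally weighted composite, rescaled to keep its mean and have SD 10."""
    raw = [mean(person) for person in t_scores_by_person]
    centre = mean(raw)   # equals the average of the individual test means
    spread = stdev(raw)  # sample SD; the choice of divisor is an assumption
    return [centre + 10.0 * (r - centre) / spread for r in raw]


# Worked example from the text: a 2-minute test scaled to a 5-minute test.
scaled = rescale_adjusted(12, full_minutes=5, short_minutes=2)  # 12 * 2.5
example_t = t_score(scaled, norm_mean=28.0, norm_sd=8.0)        # hypothetical norms

# Hypothetical T scores for four applicants on the five shortened tests.
sample = [
    [55, 48, 52, 50, 47],
    [42, 45, 40, 44, 46],
    [60, 58, 63, 57, 61],
    [50, 51, 49, 53, 50],
]
gti = gti_composite(sample)
```

Because the composite is equally weighted, the mean of the per-applicant averages is the same as the average of the five test means, so only the spread needs to be rescaled.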
Summary Statistics
Summary statistics for the T scores from the shortened tests and the full tests are given in Table 2. Confidence intervals for the mean T scores are shown in Figure 1. The confidence intervals for each test are shown in pairs. The interval on the left of each pair is for the mini BARB version whilst that on the right is for the short BARB version.
Referring to Table 2 and Figure 1, we remark that the trial means are all within 4 T score points of their hypothetical value of 50. Tests LC and RF show a noticeable departure from a mean of 50. With LC both versions have means higher than 50. With RF the results are somewhat contradictory, with mini BARB giving a low mean and short BARB a higher mean. Although these differences are significant with an overall error rate of 5%, they are small from a practical viewpoint. Moreover, they can be dealt with by re-norming the tests. There is good agreement between the trimmed means and the untrimmed means. This confirms that our conclusions are not affected by outliers in the data. All the standard deviations in the table are close to the hypothesised value of 10 and again the robust estimates, obtained from the mean absolute deviation as described by Hoaglin et al. (1983), show that this conclusion is robust to outliers.
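The role of the trimmed mean and the robust standard deviation can be seen in a small sketch. The trimming proportion used in the study is not stated, so 10% from each tail is an assumption here. The sketch also swaps in a common robust spread estimator, the median absolute deviation scaled by 1.4826, and whether that matches the estimator actually used is likewise an assumption. The data are hypothetical.

```python
from statistics import mean, median


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the values from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)


def robust_sd(values):
    """SD estimate from the median absolute deviation (normal-consistent)."""
    centre = median(values)
    deviations = [abs(v - centre) for v in values]
    return 1.4826 * median(deviations)


# Hypothetical T scores with one extreme low value, such as a score of 0.
scores = [48, 50, 52, 49, 51, 50, 47, 53, 50, 0]

plain = mean(scores)            # pulled down by the outlier
trimmed = trimmed_mean(scores)  # outlier removed by trimming
spread = robust_sd(scores)      # barely affected by the outlier
```

When the trimmed and untrimmed estimates agree closely, as they do in Table 2, outliers of this kind are not driving the results.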
Table 2: Summary Statistics (T Scores)
| Statistic               | Version   | LC   | ND   | RF   | SA   | T2   | GTI  |
| Sample Size             | Mini BARB | 135  | 135  | 135  | 135  | 135  | 135  |
| Mean                    | Mini BARB | 53.8 | 50.7 | 53.3 | 49.8 | 50.1 | 51.5 |
| Trimmed Mean            | Mini BARB | 54.0 | 50.8 | 53.4 | 49.9 | 50.2 | 51.6 |
| SD                      | Mini BARB | 11.7 | 12.3 | 12.3 | 11.2 | 11.4 | 10.0 |
| Robust SD               | Mini BARB | 11.4 | 11.6 | 12.7 | 9.9  | 9.8  | 9.8  |
| Minimum                 | Mini BARB | 0    | 4    | 17   | 20   | 11   | 25   |
| Maximum                 | Mini BARB | 77   | 77   | 73   | 75   | 77   | 71   |
| Proportion Guessing (%) | Mini BARB | 0%   | 3.7% | 2.2% | 0%   | 0%   | -    |
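A confidence interval for a mean T score can be approximated from the summary statistics in Table 2. The sketch below uses a normal approximation. How the overall 5% error rate was controlled is not stated, so the Bonferroni adjustment shown is one possible approach, included as an assumption. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_for(alpha=0.05, comparisons=1):
    """Two-sided normal critical value, Bonferroni-adjusted for `comparisons`."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * comparisons))


def mean_ci(mean_value, sd, n, z):
    """Normal-approximation confidence interval for a mean."""
    half_width = z * sd / sqrt(n)
    return mean_value - half_width, mean_value + half_width


# Mini BARB, test LC, from Table 2: mean 53.8, SD 11.7, n = 135.
lc_low, lc_high = mean_ci(53.8, 11.7, 135, z_for(0.05))

# The same interval widened to hold a 5% error rate over five tests.
lc_low_adj, lc_high_adj = mean_ci(53.8, 11.7, 135, z_for(0.05, comparisons=5))

print(round(lc_low, 1), round(lc_high, 1))
```

Both intervals for LC lie above 50, which is consistent with the departure from the hypothetical mean noted in the text.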
Figure 1: 95% Confidence Intervals For Test Means

Figure 2: 95% Confidence Intervals For Mean Score Increases

4. Conclusions and recommendations
We have examined the properties of scores from shortened versions of the BARB battery in terms of their distributional form, reliabilities and increases on retest. The results are very encouraging and suggest that a reliable GTI can be obtained from a revised set of tests that take about 35 minutes to deliver, compared with the 60 minutes for the current version of BARB. With the unbiased test/retest sample included in this study, we have found smaller increases in test scores than have been obtained previously.
Two shorter experimental versions of BARB were trialled. One was the same as the existing battery but with shorter test times. The other was a modified version with a different instruction, practice and test configuration. On the basis of our statistical analysis we have found very little to choose between the two versions.
REFERENCES
Hoaglin, D. C., Mosteller, F. and Tukey, J. W., editors (1983). Understanding Robust and Exploratory Data Analysis. Wiley, New York.
Huber, P. J. (1981). Robust Statistics. Wiley, New York.
Tapsfield, P. G. C. (1993). The British Army Recruit Battery. Test-retest reliability. HAL Technical Report 5-1993 (APRE).
Wright, D. E. and Tapsfield, P. G. C. (1995). Test Lengths and Reliability. Proceedings of the 37th Annual Conference of the International Military Testing Association. Toronto: Canadian Forces Personnel Applied Research Unit.