Development of a Technical Test Delivery Shell
John Humphreys and David Wright
1. Background and Introduction
The British Army requires a computer-delivered mathematics test to replace the paper-and-pencil test currently used for selection and allocation into technical jobs. We also anticipate future demand for other technical tests covering domains such as engineering and electrical knowledge. Against this background we have developed a technical test delivery shell, with the immediate aim of producing a test of mathematics knowledge and the longer-term aim of providing a shell that can deliver other technical tests. The software runs under MS Windows and can be used with a mouse or a touch screen interface. To date the test has been tried on 92 recruits to technical jobs at the Army regional selection centre in Lichfield. In section 2 of this paper we describe the principles adopted in the design of the delivery shell and give examples of some of the mathematics items within the shell. In section 3 we give a brief overview of the results of the trial at Lichfield.
2. Design Principles
Items for delivery could be produced in any of the following forms:
(i) a fixed file of items produced by subject specialists
(ii) items generated by computer algorithm
(iii) items selected adaptively from an item bank
(iv) items generated adaptively by computer algorithm
In designing the delivery shell, our primary concern was to produce a system that could deal effectively with the various types of item produced by any of the above means. To do this we developed a flexible system for representing a rich class of item types. The generic form of items from this class is illustrated in Figure 1.
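The idea that the shell should not depend on how items are produced can be illustrated with a minimal Python sketch. It is not the shell's actual implementation: the class names, the `next_item` method, the `deliver` loop, and the use of simple addition items are all assumptions made for illustration. Only forms (i) and (iv) from the list above are shown.

```python
import random
from typing import Callable, List, Optional

# Hypothetical: an item is represented here simply as a text string.
Item = str


class FixedFileSource:
    """Form (i): a fixed sequence of items prepared in advance."""

    def __init__(self, items: List[Item]) -> None:
        self._items = list(items)
        self._pos = 0

    def next_item(self, last_correct: Optional[bool] = None) -> Optional[Item]:
        # The previous response is ignored; items are served in order.
        if self._pos >= len(self._items):
            return None
        item = self._items[self._pos]
        self._pos += 1
        return item


class AdaptiveGeneratedSource:
    """Form (iv): items generated by algorithm, adapting to the last response."""

    def __init__(self, n_items: int, seed: int = 0) -> None:
        self._remaining = n_items
        self._level = 1  # hypothetical difficulty level (number of digits)
        self._rng = random.Random(seed)

    def next_item(self, last_correct: Optional[bool] = None) -> Optional[Item]:
        if self._remaining == 0:
            return None
        # Raise the level after a correct answer, lower it after an error.
        if last_correct is True:
            self._level += 1
        elif last_correct is False:
            self._level = max(1, self._level - 1)
        self._remaining -= 1
        upper = 10 ** self._level
        a = self._rng.randint(1, upper)
        b = self._rng.randint(1, upper)
        return f"{a} + {b}"


def deliver(source, answer_fn: Callable[[Item], bool]) -> List[Item]:
    """Delivery loop that relies only on next_item(), not on the item origin."""
    presented: List[Item] = []
    last: Optional[bool] = None
    while (item := source.next_item(last)) is not None:
        presented.append(item)
        last = answer_fn(item)  # True if the candidate answered correctly
    return presented
```

Because `deliver` uses only `next_item`, a bank-based or purely algorithmic source could be substituted without changing the delivery loop.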
Figure 1: Generic Form of Item

Items are defined by specifying the objects identified in Figure 1. These include the question stem, the display and the keypad. Examples of items are illustrated in Figure 2. The stimulus can be a bitmap image or text in various forms, including representations of fractions and other mathematical expressions. The response pad can be a calculator, as in Figure 2(a), a numeric keypad, as in Figure 2(b), a clock-setting device, or a multiple-choice selection. The display pad can be a numeric display, as in Figure 2(a), or a fraction display, as in Figure 2(b). Other solution displays are used for other forms of mathematical expression, such as that illustrated in Figure 2(d).
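One way to picture an item built from these objects is as a small record that names its stimulus, response pad and display pad. The Python sketch below is an assumption-laden illustration, not the representation used in the shell: the type names, field names, the `score` method and the exact-match scoring rule are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ResponsePad(Enum):
    """Response pad forms named in the text."""
    CALCULATOR = "calculator"
    NUMERIC_KEYPAD = "numeric keypad"
    CLOCK = "clock-setting device"
    MULTIPLE_CHOICE = "multiple choice"


class DisplayPad(Enum):
    """Display pad forms named in the text."""
    NUMERIC = "numeric"
    FRACTION = "fraction"
    EXPRESSION = "other mathematical expression"


@dataclass(frozen=True)
class Item:
    stimulus_text: Optional[str]   # text stimulus, if any
    stimulus_image: Optional[str]  # filename of a stored bitmap, if any
    response_pad: ResponsePad
    display_pad: DisplayPad
    answer: str                    # hypothetical: key held as display text

    def __post_init__(self) -> None:
        # An item needs at least one form of stimulus.
        if self.stimulus_text is None and self.stimulus_image is None:
            raise ValueError("item needs a text or image stimulus")

    def score(self, response: str) -> bool:
        # Hypothetical scoring rule: exact match after trimming whitespace.
        return response.strip() == self.answer


# Usage: a fraction item answered on a numeric keypad.
item = Item("What is 3/4 + 1/8?", None,
            ResponsePad.NUMERIC_KEYPAD, DisplayPad.FRACTION, "7/8")
```

Keeping the pad and display types as enumerations means a new item type is added by extending the enumeration rather than by changing the item record.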
Figure 3: Codified Representation of Item (a)
|
3 1 0 40 1 1 |

Items are represented by text sequences of the form shown in Figure 3, which represents item (a) of Figure 2. These sequences provide the details necessary to produce, present, and score the item, and they can be taken from banks or generated by computer. Where the item involves an image, the text sequence includes the filename of a stored image.
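Reading such a sequence might look like the following Python sketch. The meaning of the individual fields is not given here, so the sketch does not interpret them. It assumes only that fields are separated by whitespace, that `|` is a delimiter, and that a non-numeric token is an image filename. These are assumptions for illustration, as is the function name `parse_item_sequence`.

```python
from typing import List, Optional, Tuple


def parse_item_sequence(line: str) -> Tuple[List[int], Optional[str]]:
    """Split a codified item into integer codes and an optional image filename.

    Assumed layout (hypothetical): whitespace-separated tokens, with '|'
    treated as a delimiter and at most one non-numeric token naming an image.
    """
    codes: List[int] = []
    image: Optional[str] = None
    for token in line.replace("|", " ").split():
        if token.isascii() and token.isdigit():
            codes.append(int(token))
        elif image is None:
            image = token
        else:
            raise ValueError(f"unexpected second non-numeric token: {token!r}")
    return codes, image


# Usage: the sequence from Figure 3, followed by a made-up image item.
codes, image = parse_item_sequence("3 1 0 40 1 1 |")
codes_img, image_img = parse_item_sequence("2 0 5 clock.bmp")
```

A bank of items could then be read one sequence per line and each sequence passed to the same function.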
Figure 4: Section of Report From Mathematics Test

We envisage that output from the test will include a norm-referenced score as well as feedback on performance in different domain areas. In the current version, the report shown in Figure 4 is used to provide feedback. In this report, correct items are indicated by shading. By grouping items according to domain and including brief descriptions of item content, this simple mechanism provides an immediate and easily interpreted presentation of areas of strength and weakness.
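The grouping behind such a report can be sketched in a few lines of Python. This is an illustration only: the function name `build_report`, the domain names, the item descriptions and the use of `#` in place of shading are all assumptions, and the layout does not reproduce Figure 4.

```python
from collections import defaultdict
from typing import List, Tuple

# Each result is (domain, brief item description, answered correctly?).
Result = Tuple[str, str, bool]


def build_report(results: List[Result]) -> List[str]:
    """Group item results by domain and mark correct items."""
    by_domain = defaultdict(list)
    for domain, description, correct in results:
        by_domain[domain].append((description, correct))

    lines: List[str] = []
    for domain, rows in by_domain.items():  # domains in order of first appearance
        n_correct = sum(1 for _, correct in rows if correct)
        lines.append(f"{domain} ({n_correct}/{len(rows)} correct)")
        for description, correct in rows:
            mark = "#" if correct else " "  # '#' stands in for shading
            lines.append(f"  [{mark}] {description}")
    return lines


# Usage with made-up results.
results = [
    ("Fractions", "Add two fractions", True),
    ("Fractions", "Divide fractions", False),
    ("Algebra", "Solve linear equation", True),
]
report = build_report(results)
```

Printing `report` line by line gives a per-domain view in which marked and unmarked items show strengths and weaknesses at a glance.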
3. Results of Initial Trials
In the trial conducted at Lichfield, 92 recruits who had already taken the paper-and-pencil mathematics test were given the computer-delivered test with a touchscreen interface. Following the test they were given a questionnaire designed to collect feedback on the acceptability of the computer-delivered version. The responses were predominantly very positive. The correlation between the number-correct scores from the paper-and-pencil test and the computer-delivered test was 0.83. We note that this is very close to the reliability we would expect from such tests.
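For readers who wish to compute this kind of statistic on their own data, a product-moment correlation between two sets of number-correct scores can be calculated as in the Python sketch below. It assumes the reported figure is a Pearson correlation, which is not stated explicitly. The function name `pearson_r` is hypothetical, and the scores in the usage example are invented for illustration and are not the trial data.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one set of scores is constant")
    return sxy / math.sqrt(sxx * syy)


# Usage with invented scores for five candidates (not the trial data).
paper_scores = [12, 15, 9, 20, 17]
computer_scores = [11, 16, 10, 19, 18]
r = pearson_r(paper_scores, computer_scores)
```

Because the correlation between two tests is limited by the reliabilities of both, a value close to the expected reliability suggests that the two delivery modes rank candidates in much the same way.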