Cognitive Task Load And Test Performance
Martin J. Ippel
NRC Senior Research Associate
Armstrong Laboratory, Brooks AFB
Abstract
This paper presents a theoretical analysis of how increasing cognitive task load affects task performance as measured by accuracy scores. The analysis is based on a central assumption of human performance theory: the assumption of limited cognitive resources. It is shown that adding an individual differences postulate to this theory results in a class of mathematical models which allow precise predictions of accuracy decline as a function of cognitive load (task parameter) and available resources (subject parameter).
Introduction
The detrimental effects of environmental stressors on job and task performance have been well-documented (e.g., Spielberger, Sarason, Strelau, & Brebner, 1991). Methodical efforts to integrate this knowledge into the design of tests of cognitive performance have only recently begun. One attempt involves a test battery put together by the Aerospace Medical Panel Working Group 12 - the AGARD Standardized Tests for Research with Environmental Stressors (Reeves, Winter, LaCour, Raynsford, Vogel, & Grisset, 1991).
The study of how cognitive load affects human performance has its basis in cognitive modeling techniques originating from human performance theory (HPT). In a typical HPT study, a cognitive process model is used to relate experimental manipulations to performance measures. The most widely used measures of task performance are reaction time and accuracy. The prevalent methods of investigation in HPT studies are reaction time decomposition methods, such as S. Sternberg’s (1969) additive factor method. Process models involved in this type of research conceive of mental activities as a series of relatively independent processing stages or operations, each of which absorbs an estimable amount of time. A common limitation of these models is that they are defined only for correct responses; thus, only tasks with a relatively low likelihood of erroneous responses can be modeled by the additive factor method.
Accuracy scores have proven to be a convenient measure in many psychometric tests. In this paper I will investigate how manipulating the cognitive load affects task performance as measured by accuracy scores. The analysis will not be based on the HPT multistage view of information processing. Instead, the implications for cognitive testing of another HPT assumption, the hypothesis of limited processing resources, will be studied.
Limited Cognitive Resources
The notion of limited cognitive resources is a basic assumption of HPT. It implies that cognitive processes draw on a finite pool of cognitive resources and that the amount required to perform a task (i.e., the cognitive task load) increases with the complexity of processing (Draycott & Kline, 1996). Normally, performance on a task is positively related to the amount of resources available to it. When the task load increases, performance will eventually deteriorate. This deterioration often appears as a gradual decline in task performance rather than a calamitous breakdown (Norman & Bobrow, 1975).
In this conception, performance is related to cognitive resources by a function having two parameters. The first parameter characterizes the human cognitive system, that is, the amount of resources available to the subject. The second parameter characterizes the task at hand, that is, the amount of resources required to perform the task. Note that both the subject parameter and the task parameter are expressed on the same scale: the amount of processing resources. Task performance crucially depends on the relation between the two parameters. As long as the amount of resources consumed by the task is lower than, or equal to, the available amount, task performance will be adequate. However, task performance will gradually decline in proportion to the degree to which tasks impose cognitive loads that exceed the available amount of resources. Thus, if too few processing resources are applied (because of limitations to the availability of processing resources), performance failure is to be expected. As more and more resources are applied to the task, the likelihood of successful performance increases. Norman and Bobrow (1975) term performance on a task resource-limited whenever an increase in amount of processing resources results in improved performance. Whenever performance is independent of processing resources, task performance is called data-limited. Often some minimum threshold value of resource availability is required before differential application of resources makes a noticeable difference in task performance. Norman and Bobrow (1975) refer to this minimum threshold value as Rmin.
Norman and Bobrow (1975) propose the term performance-resource function to denote a monotonically nondecreasing function that relates performance to resource allocation. To determine the performance-resource function of a particular task, Norman and Bobrow suggest varying processing resources systematically while measuring some aspect of task performance. However, experimental control of subjects’ resource allotment on a single task appears impossible to accomplish. As an alternative method to control resource allocation, Norman and Bobrow (1975) suggest that subjects perform two tasks simultaneously. Thus, it was the impossibility of experimentally controlling resource allocation within subjects that led to the introduction of the dual task paradigm. The dual task paradigm, although a very fertile research program, has not led to the identification of a single, complete performance-resource function (see also Navon & Gopher, 1979).
A psychometric approach to determine performance-resource functions
A more promising alternative to determine performance-resource functions adopts a psychometric approach. According to the theory first proposed by Norman & Bobrow (1975) the cognitive system of subject i is completely characterized by the subject parameter xi, which denotes the amount of processing resources available to this subject. Therefore, assuming a population distribution function of xi over subjects with respect to task j is equivalent to experimental manipulation of resource allotment within a single subject.
Before taking the first step to determine the performance-resource function of task j, the reader should be alerted to a minor problem. The three variables in Norman & Bobrow’s theory are not all observable. Let the performance measure be an observable dichotomous variable p (1 = success, 0 = failure). This leaves the subject parameter xi and the task parameter bj essentially unobservable. For now it suffices to mention that the problem of estimating xi and bj is solvable (see next section), and to proceed as if these estimates are indeed available.
The relationship between xi and p at task j can be made transparent by ordering the xi values by magnitude from low to high. Let task j be a relatively complex task. Further, let xi be normally distributed in the population with respect to task j. As xi increases, the proportion of individuals that accomplish the task successfully should increase. In general, the performance-resource function will be S-shaped, like the curve of a two-parameter (location and scale) cumulative distribution function (e.g., the normal or the logistic). Thus, the curve which links the resources applied to task j and the resulting performance can be interpreted as including a section in which an increase in availability of processing resources does not affect the probability of successful task performance (compare Norman & Bobrow’s (1975) minimum threshold value), followed by a resource-limited section (up to a point where all the processing that can be done has been done), and a data-limited section from there on.
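The three sections of such an S-shaped curve can be illustrated numerically. The following is a minimal sketch, assuming a logistic curve with a hypothetical location of 0 and scale of 1 on a standardized resource scale; the function name `success_probability` and the chosen resource values are illustrative assumptions, not part of the theory.

```python
import math

def success_probability(x, location=0.0, scale=1.0):
    """Logistic cumulative distribution function, used here as an
    illustrative (assumed) performance-resource function."""
    return 1.0 / (1.0 + math.exp(-(x - location) / scale))

# Hypothetical resource levels on a standardized scale.
resources = [-6.0, -5.0, -0.5, 0.5, 5.0, 6.0]
probabilities = [success_probability(x) for x in resources]

# Gain in success probability for one extra unit of resources, evaluated
# in the lower tail, around the location, and in the upper tail.
gain_low = probabilities[1] - probabilities[0]
gain_mid = probabilities[3] - probabilities[2]
gain_high = probabilities[5] - probabilities[4]

for label, gain in [("sub-threshold", gain_low),
                    ("resource-limited", gain_mid),
                    ("data-limited", gain_high)]:
    print(f"{label:17s} gain per unit of resources: {gain:.4f}")
```

In this sketch an extra unit of resources changes the probability of success appreciably only in the middle section of the curve; in both tails the gain is close to zero.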
Whether or not one of these cumulative distribution functions indeed provides an adequate description of the performance-resource relationship remains an empirical question. Either of these functions can be fitted to the data. Many researchers prefer to work with the logistic distribution function, rather than with the normal, for reasons of mathematical simplicity (Hosmer & Lemeshow, 1989). The specific form of the logistic that I will use is the following:
$$\pi(x) = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)} \qquad [1]$$
where π(x) denotes a subject’s expected value on the dichotomous score (1 = pass, 0 = fail), which is always stated as the expected probability of the subject being successful on the task. Thus, π(x) = Prob(p = 1 | x). Or, to rephrase this model in the symbols I have been using:
$$\pi(x) = \frac{\exp(b_j + a_j x_i)}{1 + \exp(b_j + a_j x_i)} \qquad [2]$$
where bj and aj denote β0 and β1 for task j, respectively. Parameter bj is the location parameter. The logistic curve has its inflection point at X50 (Baker, 1965), the mean of the fitted (symmetric) function. This point determines the position of the curve along the scale of amount of processing resources. The curves of more difficult tasks are located more to the higher or right end of the scale. Relatively easy tasks are located more to the lower or left end of the scale. Parameter aj is the scale parameter (psychometricians refer to aj as a discrimination parameter). It is proportional to the slope of the curve at the inflection point. It defines the change in the dependent variable as a function of one unit change in xi. If the parameters bj and xi were observable variables, a logistic regression analysis program (or a program based on the generalized linear model) could fit the logistic function to the data. As I have pointed out, bj and xi are not observable. However, with a little algebra Equation [2] can be rewritten as:
$$\pi(x) = \frac{\exp[a_j(\theta_i - b_j^*)]}{1 + \exp[a_j(\theta_i - b_j^*)]} \qquad [3]$$
where θi = xi and bj* = -bj/aj. This equation is the so-called item characteristic curve of Birnbaum’s two-parameter Item-Response model. If the set of (item) discrimination parameters aj, j = 1, …, k items or tasks, is constrained such that all k items have identical discrimination parameters (i.e., aj = a), Equation [3] results in Rasch’s one-parameter Item-Response model (Hambleton, Swaminathan, & Rogers, 1991; Lord, 1980; Mellenbergh, 1994). Note that both models define the likelihood of success on a task as a function of the relationship between a subject parameter θi and a task parameter bj* in a way similar to the performance-resource function as conceptualized by Norman & Bobrow (1975).
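The equivalence of Equations [2] and [3] can be checked numerically. The sketch below is illustrative only: the function names and the parameter values are assumptions, chosen to show that the intercept-slope form and the location-discrimination form give the same probabilities.

```python
import math

def p_slope_intercept(x_i, b_j, a_j):
    """Equation [2]: intercept b_j and slope a_j on the resource scale."""
    z = b_j + a_j * x_i
    return math.exp(z) / (1.0 + math.exp(z))

def p_item_response(theta_i, b_star_j, a_j):
    """Equation [3]: discrimination a_j and location b_j* = -b_j / a_j."""
    z = a_j * (theta_i - b_star_j)
    return math.exp(z) / (1.0 + math.exp(z))

# Hypothetical parameter values, chosen only for illustration.
b_j, a_j = -3.0, 1.5
b_star_j = -b_j / a_j  # location of the curve on the resource scale

for x_i in (-1.0, 0.0, 2.0, 4.0):
    p2 = p_slope_intercept(x_i, b_j, a_j)
    p3 = p_item_response(x_i, b_star_j, a_j)
    assert math.isclose(p2, p3)
    print(f"x = {x_i:4.1f}  Eq.[2] = {p2:.4f}  Eq.[3] = {p3:.4f}")
```

The location bj* is the resource level at which the probability of success equals one half, which is why it can be read as the position of the task on the resource scale.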
Logistic Item-Response Models
In the previous section it was shown that adding an individual differences postulate to the HPT account of performance-resource dependency results in a class of logistic models with well-known properties, the so-called logistic Item-Response models (Hambleton, Swaminathan, & Rogers, 1991; Lord, 1980; Van der Linden & Hambleton, 1996). Some of these properties are particularly desirable in the context of identification of the performance-resource function of tasks.
First, logistic Item-Response models (IR-models) provide estimates of subject parameters (whether these be interpreted as cognitive abilities, or as cognitive resources) independent of the particular task j being performed. To make this possible a set J of tasks is required such that task j ∈ J. All tasks j ∈ J are assumed to draw from the same pool of processing resources. Various ways have been suggested to check this assumption (for a summary see Hambleton, Swaminathan, & Rogers, 1991).
Second, item or task characteristics can be obtained that are not sample dependent. Here the individual differences postulate plays an important role. For example, Bock (Bock & Lieberman, 1970; Bock & Aitkin, 1981) developed a method to remove the subject parameters from the estimation problem. The method assumes the subjects (i.e., the subject parameters) to be randomly selected from a population. By specifying the distribution of the population, the subject parameters can then be integrated out of the likelihood function, which is subsequently maximized with respect to the item parameters. This is the so-called marginal maximum likelihood estimation procedure, which provides asymptotically consistent estimates of task or item parameters.
Thus, IR-models provide estimates for the unobservable entities θi and bj. Once these subject and task (or item) parameters have been obtained, a performance-resource function (i.e., a set of expected proportions correct) for a particular task or item can be generated and fitted to the data. In the next section I will discuss some implications of logistic IR-models for the measurement of the effects of manipulation of cognitive load on task performance.
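The idea of integrating the subject parameters out of the likelihood can be sketched as follows. This is only an illustration of the marginal likelihood under stated assumptions (Rasch items with a common discrimination of 1, a standard normal population distribution, and a simple midpoint rule over a fixed grid). It is not the EM algorithm of Bock and Aitkin (1981), and all function names and data are hypothetical.

```python
import math

def rasch_probability(theta, location):
    """Probability of success for resource level theta on an item with
    the given location (one-parameter logistic model, scale fixed at 1)."""
    return 1.0 / (1.0 + math.exp(-(theta - location)))

def standard_normal_density(theta):
    return math.exp(-0.5 * theta * theta) / math.sqrt(2.0 * math.pi)

def marginal_pattern_probability(responses, locations,
                                 lower=-8.0, upper=8.0, steps=1600):
    """Probability of one response pattern (list of 0/1 scores) after
    integrating theta out over an assumed N(0, 1) population.
    The integral is approximated with the midpoint rule."""
    width = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        theta = lower + (k + 0.5) * width
        likelihood = 1.0
        for score, location in zip(responses, locations):
            p = rasch_probability(theta, location)
            likelihood *= p if score == 1 else 1.0 - p
        total += likelihood * standard_normal_density(theta) * width
    return total

def marginal_log_likelihood(patterns, locations):
    """Sum of log marginal probabilities over subjects. It depends on the
    item locations only, because theta has been integrated out."""
    return sum(math.log(marginal_pattern_probability(r, locations))
               for r in patterns)

# Hypothetical data: four subjects, three items.
patterns = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
print(marginal_log_likelihood(patterns, locations=[-1.0, 0.0, 1.0]))
```

A marginal maximum likelihood procedure would search for the item locations that maximize this function; the sketch stops at evaluating it for one assumed set of locations.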
The Postulate of Specific Objectivity
As π(x) denotes a probability, logistic IR-models transform this variable using a logit function:
$$g(x) = \ln\left\{\frac{\pi(x)}{1 - \pi(x)}\right\} \qquad [4]$$
From Equation [2] it follows that
$$g(x) = b_j + a\theta_i \qquad [5]$$
for the Rasch model, and g(x) = bj + ajθi for the Birnbaum model. Equation [5] describes the transformed probability as a linear function of a latent variable θ. For the Rasch model this transformation has two well-known implications. First, a comparison of two items j and j′ whose locations on the latent continuum differ by a magnitude δ is independent of the values of the latent variable θ of the subjects employed for the comparison. Writing the intercept of item j as bj′ + δ, this can be easily shown:
$$g(x_{ij}) - g(x_{ij'}) = ([b_{j'} + \delta] + a\theta_i) - (b_{j'} + a\theta_i) = \delta \qquad [6]$$
Second, in analogous fashion it may be shown that a comparison of two subjects i and i′ does not depend on the particular item used for the comparison. Together, this pair of implications has been heralded as the Postulate of Specific Objectivity (Fischer, 1987, 1996). In actuality they constitute the psychologically implausible results of a transformation of a variable (i.e., the expected probability π(x)) that has a mathematical, but not a psychological, justification. For example, Equation [6] entails that the effect of an increase in cognitive load on individual task performance is equal over all levels of available processing resources. This tenet contradicts the substantive theory discussed in this paper, as will be explained below.
It has been argued in a previous section that a logistic function might well be a plausible model for a performance-resource mapping for tasks of average difficulty. It comprises a sub-threshold section where resource differences do not affect performance, a resource-limited section where differences in applied resources do make a difference in task performance, and finally a data-limited section where, given the relative complexity of the task structure, further accretion of resources no longer improves task performance. Figure 1 shows two logistic functions whose locations differ by a magnitude δ on a latent continuum. Let these functions represent the performance-resource mappings of two tasks with a different cognitive load. The figure shows that subject i and subject i′, whose expected probabilities of success at task j do not differ much, are expected to be affected differently by an increase δ in cognitive load. This makes perfect sense if we assume that the position of subject i′ on the latent variable suggests that this subject can allot far more resources to the processing of task j′ than subject i can. The logit transformation, although mathematically convenient, does not preserve this essential feature. The adjustment of the model’s dependent variable (i.e., the expected probability π(x)) also transforms the logistic function, which relates the model parameters (i.e., θi and bj) to performance, into a linear function.
In summary, the logit transformation defines the performance effects of cognitive task load and the subject’s resource parameter as independent and additive. Thus, the logit transformation incorrectly suggests that the effects of increase of cognitive load are independent of the subject’s available resources. This effect is most clearly present in Rasch’s one-parameter logistic IR-model.
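The contrast between the two scales can be made concrete with a small numerical sketch. It assumes the Rasch form of Equation [2] with a common scale parameter of 1 and hypothetical values for the intercepts and subject parameters; the function names are illustrative.

```python
import math

def success_probability(theta, intercept, a=1.0):
    """Equation [2] with a common scale parameter a (Rasch form)."""
    return 1.0 / (1.0 + math.exp(-(intercept + a * theta)))

def logit(p):
    """Equation [4]: log odds of success."""
    return math.log(p / (1.0 - p))

# Hypothetical values: the second task requires delta more resources than
# the first, so its intercept is lower by delta.
intercept_first, delta = 0.0, 2.0
intercept_second = intercept_first - delta

for theta in (-3.0, 0.0, 1.0, 3.0, 6.0):
    p_first = success_probability(theta, intercept_first)
    p_second = success_probability(theta, intercept_second)
    logit_drop = logit(p_first) - logit(p_second)
    probability_drop = p_first - p_second
    print(f"theta = {theta:4.1f}  logit drop = {logit_drop:.3f}  "
          f"probability drop = {probability_drop:.3f}")
```

On the logit scale the drop equals the assumed difference in load at every subject parameter, whereas on the probability scale the drop is largest for subjects located between the two tasks and small in both tails.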
Some further explorations on effects of cognitive load manipulation
To further (informally) explore the effects of cognitive load manipulations on the untransformed dependent variable π(x), I generated three performance-resource functions satisfying the Rasch one-parameter IR-model (see Figure 2). The functions represent three tasks that draw from the same resource pool and that differ in the amount of processing resources required to perform the task. That is, their locations on the latent variable differ. The leftmost curve represents a task of average difficulty (b1 = 0), the curve in the middle represents a difficult task (b2 = 2), and the rightmost curve represents an extremely difficult task (b3 = 4). The values of bj denote the location of the tasks in terms of distance from the mean in the scaling units of the theoretical distribution.
Figure 2. Performance-resource functions of three tasks with a different cognitive load.
Figure 2 displays the differential impact of an increase in cognitive load for subjects who differ in amount of available processing resources. Figure 3 displays the same expected decline of performance in a way that makes it easier to assess the magnitude of the decline in task performance. The decline in expected probability of success is shown as a function of three factors: the subject parameter θi, the initial level of cognitive load (i.e., either 0 or 2) and the two different levels of increase of cognitive load (i.e., either 2 or 4). Figure 3 makes clear that performance at levels of applied processing resources under (or close to) the minimum threshold level is hardly affected by an increase in cognitive task load. The relationship between performance decline and the increase of cognitive load is curvilinear (and symmetric in the case of one-parameter model curves), as Figure 3 shows. At the other side of the distribution there are subjects whose processing has been data-limited at the simplest task, and whose performance declines only slightly even with tasks of extreme processing requirements. The largest declines in performance are to be expected in the areas of resource-limited processing.
Figure 3. Performance decline as function of cognitive load and available processing resources.
Note that Figure 2 and Figure 3 are based on the simplest logistic IR-model, the one-parameter Rasch model. More complex models (i.e., two-parameter or three-parameter models) would show less regularity (symmetry), but would not pose additional estimation problems.
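The kind of decline curve described here can be sketched in a few lines. The code below assumes the one-parameter model with the scale parameter fixed at 1 and a hypothetical grid of subject parameters; it is meant only to illustrate how such curves can be generated, not to reproduce the figures exactly.

```python
import math

def rasch_probability(theta, location):
    """One-parameter logistic model with the scale parameter fixed at 1."""
    return 1.0 / (1.0 + math.exp(-(theta - location)))

def performance_decline(theta, initial_load, increase):
    """Drop in expected probability of success when the task location
    moves from initial_load to initial_load + increase."""
    return (rasch_probability(theta, initial_load)
            - rasch_probability(theta, initial_load + increase))

# Subject parameters from -4 to 8 in steps of 0.5 (hypothetical grid).
thetas = [k / 2.0 for k in range(-8, 17)]

for initial_load, increase in [(0.0, 2.0), (0.0, 4.0), (2.0, 2.0)]:
    declines = [performance_decline(t, initial_load, increase) for t in thetas]
    peak = max(declines)
    peak_theta = thetas[declines.index(peak)]
    print(f"load {initial_load:.0f} -> {initial_load + increase:.0f}: "
          f"largest decline {peak:.3f} at theta = {peak_theta:.1f}")
```

Under these assumptions each decline curve is symmetric around the midpoint between the two task locations, where the decline is largest, and it approaches zero in both tails.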
Furthermore, the model suggests that subjects whose expected performances at one level of cognitive task load are approximately equal can differ considerably when the cognitive load increases. For example, in Figure 4 the expected values π(x) at b1 = 0 of the subjects s1, s2, and s3 are .99, .99, and .98, respectively. At b3 = 4 these values have dropped to .72, .62, and .50, respectively.

Figure 4. Diverging expected performances as cognitive task load increases.
An important point to make about the explorations in this section is that they are explorations of a mathematical model. If the validity of the one-parameter model can be assumed for a certain task and a subject population, all information presented can simply be inferred from the subject parameter θi and the task parameters bj for a set of tasks j ∈ J, all drawing from the same pool of processing resources.
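As an illustration of how such expected values follow from the parameters alone, the sketch below recomputes diverging performances of the kind shown in Figure 4. The subject parameters are not given in the text; the values used here are assumptions, chosen so that the one-parameter model (scale parameter fixed at 1) roughly reproduces the reported expected values.

```python
import math

def rasch_probability(theta, location):
    """One-parameter logistic model with the scale parameter fixed at 1."""
    return 1.0 / (1.0 + math.exp(-(theta - location)))

# Assumed subject parameters (not given in the text), chosen so that the
# expected values roughly match those reported for s1, s2, and s3.
subjects = {"s1": 4.94, "s2": 4.49, "s3": 4.00}

expected = {
    name: (rasch_probability(theta, 0.0), rasch_probability(theta, 4.0))
    for name, theta in subjects.items()
}

for name, (p_easy, p_hard) in expected.items():
    print(f"{name}: expected value at b1 = 0 is {p_easy:.2f}, "
          f"at b3 = 4 is {p_hard:.2f}")
```

Differences in subject parameters that are nearly invisible at the easy task, where all three subjects are in the data-limited section, become clearly visible once the task location moves into their resource-limited range.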
General discussion
In this paper I presented three elements for a theory for the measurement of the effects of information (over)load on task performance as measured by accuracy scores:
A (general) human performance theory which explains the differential impact on subjects’ task performance of enhancement of cognitive load.
A class of mathematical models for this theory. These models, if shown to fit the data, allow precise predictions of performance decline.
A collection of (admittedly specific) algorithms, which are available to estimate the unobservable model parameters.
The challenge for further research is to design tasks that draw from the same pool of resources and that can be systematically manipulated in a meaningful way. That is, it should be possible to impose an order on the tasks that is psychologically significant and resonates in the locations of the performance-resource functions (i.e., the bj parameters) on the latent continuum.
A promising methodology for this purpose is the so-called facet design technique. According to the facet design technique each individual item can be described as the result of a Cartesian product of several task facets. Each task facet consists of a set of elements, which are referred to as the measurement conditions of the task facet. By taking a Cartesian product of separate task facets, it is possible to define a structured collection of items. Accordingly, each item, being an element of the Cartesian set, actually is a profile composed by selecting an element of the set of measurement conditions of each task facet (Ippel, 1986). If this technique can be combined with a psychological theory, which predicts differences in processing requirements for the measurement conditions of at least one of the task facets, a progression of cognitive load can be defined over items (i.e., realizations of the task), and performance profiles of subjects can be studied.
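The construction of items as a Cartesian product of task facets can be sketched as follows. The facets and their measurement conditions are purely hypothetical examples, as is the assumption that one of them (here called `memory_set_size`) orders the items by cognitive load.

```python
from itertools import product

# Hypothetical task facets, each with its set of measurement conditions.
facets = {
    "memory_set_size": [2, 4, 6],
    "stimulus_type": ["digits", "letters"],
    "response_mode": ["yes_no", "forced_choice"],
}

# Each item is one profile: one measurement condition from every facet.
items = [dict(zip(facets, profile)) for profile in product(*facets.values())]

print(len(items))  # 3 * 2 * 2 = 12 items in the structured collection
print(items[0])

# If a psychological theory predicts that load increases with one facet
# (assumed here to be memory_set_size), the items can be ordered by it.
ordered = sorted(items, key=lambda item: item["memory_set_size"])
```

Under such an ordering, the estimated task locations can be compared with the progression of cognitive load that the theory predicts across the measurement conditions of the ordering facet.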
References
Advisory Group for Aerospace Research and Development, Aerospace Medical Panel Working Group 12. (1989). Human Performance Assessment Methods (AGARDograph, No. 308). Neuilly-sur-Seine, France: Author.
Baker, F.B. (1965). Origins of the item parameters X50 and b as a modern item analysis technique. Journal of Educational Measurement, 2, 167-180.
Bock, R.D., & Lieberman, M. (1970). Fitting a response model for dichotomously scored items. Psychometrika, 35, 179-198.
Bock, R.D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.
Draycott, S.G., & Kline, P. (1996). Validation of the AGARD STRES Battery of Performance Tests. Human Factors, 38, 2, 347-361.
Fischer, G.H. (1987). Applying the principles of specific objectivity and of generalizability to the measurement of change. Psychometrika, 52, 565-587.
Fischer, G.H. (1996). Models for "Objective" assessment of treatment effects based on item response data. In I. Dennis & P. Tapsfield (Eds.). Human Abilities. Hillsdale, N.J. : Erlbaum.
Halford, G.S. (1993). Children’s understanding. The development of mental models. Hillsdale, N.J.: Erlbaum.
Hambleton, R.K., Swaminathan, H., & Rogers, H.J. (1991). Fundamentals of Item Response Theory. Newbury Park, CA: Sage.
Hosmer, D.W., & Lemeshow, S. (1989). Applied Logistic Regression. New York: John Wiley & Sons.
Ippel, M.J. (1986). Component-testing. A theory of cognitive aptitude measurement. Amsterdam: Free University Press.
Kahneman, D. (1973). Attention and effort. Englewood Cliffs, N.J.: Prentice-Hall.
Lord, F.M. (1980). Applications of Item Response Theory to practical testing problems. Hillsdale, N.J.: Erlbaum.
Mellenbergh, G.J. (1994). Generalized Linear Item Response Theory. Psychological Bulletin, 115, 2, 300-307.
Navon, D., & Gopher, D. (1979). On the economy of the human processing system. Psychological Review, 86, 3, 214-255.
Norman, D.A., & Bobrow, D.G. (1975). On data-limited and resource-limited processes. Cognitive Psychology, 7, 44-64.
Reeves, D.L., Winter, K.P., LaCour, S.J., Raynsford, K.M., Vogel, K., & Grisset, J.D. (1991). UTC-PAB/AGARD STRES battery: User’s Manual and system documentation (Tech. Report NAMRL SR90-1). Pensacola, FL: Naval Aerospace Medical Research Laboratory.
Spielberger, C.D., Sarason, I.G., Strelau, J., & Brebner, J.M.T. (Eds.). (1991). Stress and Anxiety, Vol. 13. New York: Hemisphere.
Sternberg, S. (1969). Memory-scanning: Mental processes revealed by reaction time experiments. American Scientist, 57, 421-457.
Van der Linden, W.J., & Hambleton, R.K. (1996). Handbook of modern item response theory. New York: Springer.