Computerized Adaptive Rating Scales: A New Approach To Generating Performance Information
Walter C. Borman
Jerry W. Hedge
Mary Ann Hanson
Kristi K. Logan
Personnel Decisions Research Institutes, Inc.
Linda L. Sawin
Air Force Armstrong Laboratory Human Resources Directorate
The computerized adaptive rating scale (CARS) concept applies adaptive testing principles to the performance rating domain in an attempt to obtain more precise estimates of ratee performance than is possible with the usual rating scale. Briefly, adaptive ability testing is intended to provide precise estimates of a test taker's ability level. To accomplish this, he/she is first administered items of varying difficulty levels. When the general level of ability is determined (e.g., high, because the test taker is answering all of the easy and medium difficulty level items correctly and most of the more difficult items, as well), then he or she is administered items that vary in difficulty, but only in a relatively narrow band within the initially estimated range. The test taker's ability level can now be estimated quite precisely based on his/her pattern of correct and incorrect item responses within this narrow range of difficulty levels.
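The two-stage logic just described can be illustrated with a short Python sketch. It is offered only under simplifying assumptions: the function names (`rough_ability`, `items_in_band`), the 1-9 difficulty scale, and the midpoint rule used for the initial estimate are hypothetical choices made for illustration. Operational adaptive tests typically rely on formal psychometric models rather than this simple rule.

```python
from statistics import mean


def rough_ability(responses):
    """Coarse ability estimate from (difficulty, answered_correctly) pairs.

    Assumption: the estimate is the midpoint between the hardest item
    answered correctly and the easiest item answered incorrectly.
    """
    if not responses:
        raise ValueError("at least one response is required")
    passed = [d for d, ok in responses if ok]
    failed = [d for d, ok in responses if not ok]
    if not failed:          # everything correct: estimate sits at the top
        return max(passed)
    if not passed:          # everything incorrect: estimate sits at the bottom
        return min(failed)
    return mean([max(passed), min(failed)])


def items_in_band(item_difficulties, center, half_width):
    """Keep only the items whose difficulty lies within the narrow band."""
    return [d for d in item_difficulties if abs(d - center) <= half_width]


# Stage 1: items spread across the whole difficulty range (1 = easy, 9 = hard).
stage_one = [(1, True), (3, True), (5, True), (7, True), (9, False)]
center = rough_ability(stage_one)

# Stage 2: only items close to the coarse estimate are administered next.
stage_two = items_in_band([1, 2, 3, 4, 5, 6, 7, 8, 9], center, half_width=1)
```

In this example the test taker passes every item up to difficulty 7 and fails the item at difficulty 9, so the second stage concentrates on items between those two points.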
We have transported this general adaptive testing approach to estimating a ratee's performance level on a dimension. The notion is to present pairs of behavioral statements scaled according to effectiveness level and ask the rater to select which statement is more descriptive of the ratee. In this way, the general performance level of the ratee is determined, and then, analogous to adaptive testing, additional pairs of statements are presented within that level to more precisely identify the ratee’s performance score. CARS are being developed for three U.S. Air Force specialties: Ground Radar Specialists, Air Traffic Controllers, and Hydraulic Technicians. A second objective is to develop a CARS for measuring contextual performance.
The Notion of Contextual Performance
Borman and Motowidlo (1993) distinguished between task and contextual performance. Task performance refers to the core technical part of a job, the domain likely to emerge from a task-related job analysis. Contextual performance, rather than contributing directly to the core technical elements of the job, "supports the organizational, social, and psychological environment in which the technical core must function" (p. 73). Contextual performance includes such activities as (1) volunteering to carry out task activities even if not part of the job; (2) persisting with extra effort when necessary to complete own work; (3) helping and cooperating with others; (4) following organizational rules and procedures; and (5) endorsing and supporting organizational procedures.
Borman and Motowidlo argue that both task and contextual performance on the part of organization members are important contributors to organizational effectiveness. Further, Motowidlo and Van Scotter (1994) and Borman, White, and Dorsey (1995), among others, have demonstrated that experienced supervisors weight employee task and contextual performance about equally when making overall performance or effectiveness judgments of these employees.
Very important for this application, task activities vary from job to job. As Borman and Motowidlo point out, when one job is considered different from another job, it is typically because of those differences in tasks performed. However, contextual activities vary much less across jobs. Such activities as volunteering, persisting, cooperating, following rules, and endorsing organizational objectives are probably important for the vast majority of jobs and organizations.
If we accept the argument that contextual performance is important for organizational functioning, and acknowledge the observation that contextual dimensions are likely quite similar across jobs, then additional exciting opportunities present themselves for the CARS methodology. Specifically, it should be possible to build a CARS format to measure contextual performance dimensions such that few revisions or changes will be necessary for different jobs. Further, our experience suggests that these contextual dimensions are relevant for military and civilian environments. For example, Motowidlo's work on this topic has been done in the Air Force, Borman and Motowidlo studied contextual performance in the U.S. Army, and Organ (1988) has done all his work on organizational citizenship behavior (a highly related concept) in civilian populations. Thus, the consistency of contextual performance requirements across jobs and organizations will allow a focused effort toward developing a comprehensive, precise CARS measure of performance for each contextual dimension.
As mentioned, Borman and Motowidlo (1993) introduced the contextual performance concept and argued that this criterion domain is important to consider when measuring organization members' overall effectiveness or contribution to the organization. This summary criterion concept is related to three other concepts. They are described briefly below.
Organizational Citizenship Behavior. The concept of organizational citizenship behavior includes several elements of contextual performance. Smith, Organ, and Near (1983) define organizational citizenship behavior essentially as extra-role, discretionary behavior that helps other organization members perform their jobs or that shows support for and conscientiousness toward the organization. This builds on earlier notions introduced by Barnard (1938) and Katz and Kahn (1978). Barnard notes the importance of the "informal organization," cooperative efforts in organizations, and the need for organization members to be willing to contribute in these cooperative efforts. Katz emphasizes that spontaneous, cooperative, helpful, and altruistic behaviors beyond formal role prescriptions are important for organizational functioning. Similarly, Katz and Kahn distinguish prescribed role performance from "spontaneous behavior", which includes cooperative gestures, actions protecting the organization, and behavior that enhances the external image of the organization. Such actions go beyond prescribed role behavior.
Prosocial Organizational Behavior. Prosocial organizational behavior is closely related to organizational citizenship and also includes several elements of contextual performance. Brief and Motowidlo (1986) define prosocial organizational behavior as behavior that is "(a) performed by a member of an organization, (b) directed toward an individual, group, or organization with whom he or she interacts while carrying out his or her organizational role, and (c) performed with the intention of promoting the welfare of the individual, group, or organization to whom it is directed" (p. 711).
The prosocial organizational behaviors identified by Brief and Motowidlo (1986) include assisting co-workers with job-related or personal matters, providing services or products to consumers, helping consumers with personal matters (for example, giving road directions or supplying change for telephones), complying with organizational values and policies, suggesting organizational improvements, putting forth extra effort on the job, volunteering for additional assignments, staying with the organization during hard times, and representing the organization favorably to outsiders.
A Model of Soldier Effectiveness. Another perspective on contextual performance is provided by results of efforts to define the criterion domain for Project A (Campbell, 1990). One of the first steps toward defining the criteria for these jobs was to develop a conceptual model of soldier effectiveness (Borman, Motowidlo, & Hanser, 1983). The model sought to describe aspects of soldier effectiveness that cut across all the different kinds of jobs that soldiers may perform. It assumed that soldier effectiveness involves more than just performing assigned job duties effectively and that other elements contributing to soldier effectiveness are common to all or nearly all soldiering jobs in the army. The model also assumed that the common soldier performance elements have close ties to the constructs of organizational commitment, organizational socialization, and morale.
The actual model of soldier effectiveness (Borman et al., 1983) drew on these three concepts to build a framework of dimensions intended to map the entire extra-technical proficiency domain of soldier effectiveness. From the combination of morale and commitment emerges a general category of effectiveness that can be labeled "determination". It is a motivational and affective category that reflects the spirit, strength of character, or "will do" aspects of good soldiering. The combination of morale and socialization yields "teamwork", behaviors that have to do with effective relationships with peers and the unit. The combination of commitment and socialization yields "allegiance". This taps into acceptance of army norms with respect to authority, faithful adherence to orders, regulations, and the army lifestyle, being adjusted and socialized to the point of wanting to continue in the soldiering role and stay in the army.
Each of these three general categories of soldier effectiveness (determination, teamwork, and allegiance) subsumes five other dimensions that describe specific behavioral patterns of effectiveness.
The Borman and Motowidlo (1993) contextual performance taxonomy attempts to summarize all of the extra-technical proficiency concepts just reviewed. As mentioned, the five summary dimensions are: (1) Volunteering; (2) Extra Effort; (3) Helping and Cooperating; (4) Following Organizational Rules; and (5) Supporting the Organization.
The Computerized Adaptive Rating Scale (CARS) Prototype
The general notion of a computerized adaptive rating scale (CARS) was described previously. A prototype of this kind of rating system has been developed. The prototype CARS has as "raw material" two dimensions with three behavior summary statements representing each of four effectiveness levels (i.e., very ineffective, somewhat ineffective, somewhat effective, and very effective). The behavioral statements were generated in a study of the Navy recruiter job (Borman, Hough, & Dunnette, 1976). The dimensions represented are Gaining and Maintaining Rapport, and Salesmanship Skills. In the prototype, pairs of behavioral statements appear on a computer screen and the rater is asked to indicate which of the two statements is more descriptive of the ratee. That choice then triggers another screen with two other behavioral statements and so on through a preplanned branching sequence. That is, each choice, in turn, leads to a particular pair of behavioral statements, the next choice to another preprogrammed pair of statements, and so on. Table 1 lays out the various possible sequences of behavioral statement pair presentations according to the statements' effectiveness levels (1 = very ineffective, 2 = somewhat ineffective, 3 = somewhat effective, and 4 = very effective) and the choices made by the rater.
There are several features to this rating system. First, notice that we have constructed a 12-point scale, from 1 = least effective to 12 = most effective. The highest rating is obtained when the rater selects 4s to describe the ratee for the two 1-4 comparisons, the two 2-4 comparisons, and the two 3-4 comparisons. An 11 rating is awarded when the rater chooses all 4s except a 3 for one of the two 3-4 comparisons, and so on. Table 1 depicts how each of the ratings (1-12) is obtained. Second, there are many other possible response patterns in addition to these 12 if we want to consider those that are inconsistent or illogical. In fact, our system contains four such patterns (5, 6, 7, and 8). However, it is our belief that these four somewhat inconsistent patterns may be plausible if the actual ratee performance is very near the mid-point in effectiveness. Also, as the Note in Table 1 indicates, we will recommend that if a rater selects a 1-level and a 4-level statement for the first two screen presentations, a third 1-4 comparison will be shown. If the rater selects "1" on this choice, the normal sequence of two 1-3 comparisons will follow. If the rater responds "4", the usual sequence of two 2-4 comparisons will follow. The rationale for "allowing" this sequence is that, again, this confusion may be appropriate if the ratee's actual performance is very near the scale mid-point.
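The handling of the opening 1-4 comparisons described above can be sketched as a small routing rule. This is a minimal Python illustration, not the prototype's actual program; the function name `next_pair` and the list-based input are assumptions. Only the opening stage is covered, since the later branches and the resulting 1-12 scores depend on Table 1.

```python
def next_pair(opening_choices):
    """Return the effectiveness levels of the next statement pair to present.

    opening_choices lists the rater's picks (1 or 4) on the opening 1-4
    comparisons, in the order they were made. Hypothetical helper: the
    name and interface are illustrative only.
    """
    if len(opening_choices) > 3:
        raise ValueError("at most three opening 1-4 comparisons are given")
    if any(choice not in (1, 4) for choice in opening_choices):
        raise ValueError("each opening choice must be 1 or 4")

    # Fewer than two opening comparisons answered: show another 1-4 pair.
    if len(opening_choices) < 2:
        return (1, 4)

    highs = opening_choices.count(4)
    lows = opening_choices.count(1)

    # One "1" and one "4" on the first two screens: show a third 1-4 pair.
    if highs == lows:
        return (1, 4)

    # Otherwise proceed in the majority direction.
    return (2, 4) if highs > lows else (1, 3)
```

For example, `next_pair([4, 4])` routes the rater to the 2-4 comparisons, whereas `next_pair([1, 4])` requests the tie-breaking third 1-4 comparison, after which `next_pair([1, 4, 1])` routes to the 1-3 comparisons.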
At one level, it is an empirical question as to what proportion of rater responses will show inconsistencies. We would hope that very few of the more serious inconsistencies (beyond those already discussed) will occur.
Another potential advantage of CARS is that raters encounter no numbers during the rating task. Therefore, presumably they can focus on matching observed ratee behavior with behavior in the statements, without being distracted by numerical evaluation.
In sum, we currently have a working CARS prototype, with the screen sequences programmed to be compatible with our 12-point scoring system. The sequencing is designed to provide the rating scale analog of computerized adaptive testing. Initially, the rater's choices between behavioral statements to describe the ratee differentiate only broadly between effectiveness levels, but subsequent choices require more finely grained differentiations around the early estimates of effectiveness. Thus, the later choices provide more precise estimates of the ratee's performance level.
Table 1
Scoring System for Different Sequences of Choice Responses

Note: If the rater is inconsistent with the first two choices (i.e., selects a 1 and a 4) then we would administer a third 1-4 comparison and proceed in the majority direction (i.e., high or low) as above.
References
Barnard, C. I. (1938). The functions of the executive. Cambridge, MA: Harvard University Press.
Borman, W. C., Hough, L. M., & Dunnette, M. D. (1976). Development of behaviorally based rating scales for evaluating the performance of U. S. Navy recruiters (Institute Report #6). (NPRDC TR-76-31). Navy Personnel Research and Development Center.
Borman, W. C., & Motowidlo, S. J. (1993). Expanding the criterion domain to include elements of contextual performance. Chapter in N. Schmitt and W. C. Borman (Eds.), Personnel Selection. San Francisco: Jossey-Bass (pp. 71-98).
Borman, W. C., Motowidlo, S. J., & Hanser, L. M. (1983, August). A model of individual performance effectiveness: Thoughts about expanding the criterion space. Paper presented as part of symposium, Integrated Criterion Measurement for Large Scale Computerized Selection and Classification, 91st Annual American Psychological Association Convention.
Borman, W. C., White, L. A., & Dorsey, D. W. (1995). Effects of ratee task performance and interpersonal factors on supervisor and peer performance ratings. Journal of Applied Psychology, 80, 167-177.
Brief, A. P., & Motowidlo, S. J. (1986). Prosocial organizational behaviors. Academy of Management Review, 11, 710-725.
Campbell, J. P. (1990). An overview of the army selection and classification project (Project A). Personnel Psychology, 43, 313-333.
Katz, D., & Kahn, R. L. (1978). The social psychology of organizations. New York: Wiley.
Motowidlo, S. J., & Van Scotter, J. R. (1994). Evidence that task performance should be distinguished from contextual performance. Journal of Applied Psychology, 79, 475-480.
Organ, D. W. (1988). Organizational citizenship behavior: The good soldier syndrome. Lexington, MA: Lexington Books.
Smith, C. A., Organ, D. W., & Near, J. P. (1983). Organizational citizenship behavior: Its nature and antecedents. Journal of Applied Psychology, 68, 653-663.