The Impact of Presentation Order & Sequence on Survey Time: OA R&D via the Internet


Jimmy L. Mitchell, Institute for Job & Occupational Analysis, San Antonio, TX
Johnny J. Weissmuller, Metrica, Inc., San Antonio, TX
Winston Bennett, Jr., Air Force Research Laboratory, Mesa, AZ

Abstract

With the development of internet-enabled survey software (GenSurv), it becomes feasible to conduct research
and development aimed at improving survey data quality and reliability. In the GenSurv project, an internet form
of the Air Force Behavioral Scientist survey was created; the software randomly assigned respondents to one of
two different forms. One form used the traditional duty-order sequence, whereas the second form presented tasks
in order of rated duty importance (duties with nonzero ratings only). The hypotheses under study are that such
rated-importance order should control for fatigue with a long task list, because the most relevant tasks are
considered first, and that there should be a substantial time savings for the rated-order group but no
significant difference between the resulting group job descriptions.

Introduction

As part of the research and development of prototype internet survey software capable of handling typical occupational analysis (OA) task lists, research was conducted to explore the possible impact of the new survey medium (the internet) on data quality and reliability (Stanton, 1998). The objectives of this research were to find new ways to enhance data collection efficiency and to improve data quality. Such improved data collection methods have considerable promise for use in training evaluation and occupational analysis, and as a way to expedite data for emerging decision support systems (DSSs) used in making critical manpower, personnel, and training (MPT) decisions (Bennett, Sego, Teachout, & Phalen, 1994; Mitchell, Yadrick, & Bennett, 1993).

Since a Behavioral Scientist (AFS 3BSX1b) survey was recently completed by the Air Force Occupational Measurement Squadron using disk-based surveying technology (OASurv), this officer specialty was selected for an experimental study to test the equivalency of Internet surveying. In this experiment, all respondents first rated major duty areas in terms of which duties they perform; the tasks to be rated were then presented either in inventory order (the traditional way) or in descending order of rated duty importance (omitting duties rated as not performed). The latter presentation order should control for the effects of fatigue on long surveys (i.e., those with many tasks) and should ensure that the tasks in the most important duties are rated first. The tasks rated last would be those that few people perform or that involve trivial amounts of job time (Weissmuller & Mitchell, 1998). Theoretically, this tailored survey should develop more reliable information and should take considerably less time to complete, thus minimizing the amount of job time invested in survey data collection. Such an approach was used successfully in the recent U.S. Army Enlisted Common Soldiers Task survey to cope with a 900-item task list administered to about 20,000 soldiers.

Experimental Design

Key questions to be answered in this experiment were as follows:

  • Will presentation order affect the group job description?
  • Will the rated-order form take less time to complete?
  • Can the experiment be done quickly using an internet survey?

    To address these issues, the GenSurv version of the Behavioral Scientist survey was developed to include a random assignment of cases to two different survey forms, one in traditional "Lock Step" inventory order, and one based on descending order of rated importance of duties in the present job (i.e., "how much is this duty a part of your job?"). In both forms, the incumbent was asked to rate the tasks in terms of relative time spent (RTS), which is the usual scale used in the Air Force occupational survey program.
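The two presentation orders described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the GenSurv implementation: the function names, the dictionary layout, and the rule that tied duties keep their inventory order are all assumptions made for the example.

```python
import random

def assign_form(rng):
    """Randomly assign a respondent to 'T1' (inventory order) or 'T2' (rated duty order)."""
    return rng.choice(["T1", "T2"])

def task_sequence(form, duty_tasks, duty_ratings):
    """Return the tasks in the order they would be presented.

    duty_tasks:   dict of duty -> list of tasks, both in inventory order (hypothetical layout)
    duty_ratings: dict of duty -> rated importance, where 0 means "not performed"
    """
    duties = list(duty_tasks)
    if form == "T2":
        # Drop duties rated as not performed, then present the rest from most
        # to least important. sorted() is stable, so tied duties keep their
        # inventory order (an assumption; the source does not say how ties are handled).
        performed = [d for d in duties if duty_ratings.get(d, 0) > 0]
        duties = sorted(performed, key=lambda d: duty_ratings[d], reverse=True)
    return [task for d in duties for task in duty_tasks[d]]

# Hypothetical three-duty inventory and one respondent's duty ratings.
duty_tasks = {"A": ["A1", "A2"], "B": ["B1"], "C": ["C1", "C2"]}
duty_ratings = {"A": 2, "B": 0, "C": 5}

form = assign_form(random.Random())
print(form, task_sequence(form, duty_tasks, duty_ratings))
```

Under the rated-order form, this respondent would see the tasks of duty C first, then duty A, and would never see the tasks of duty B; under the inventory-order form all five tasks appear in their original sequence.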

    Additionally, some questions were included at the end of the normal survey to assess the attitudes of survey respondents toward the survey in terms of what percentage of their present job is covered by the tasks they rated and how long it took to complete the survey. Provision was also made for survey takers to write in any comments they wanted about the survey and the experimental process.

    Sample

    Since this was the first operational data collection using GenSurv, we felt that we should use the study as a pilot test of the software. Thus, the members of the Air Force Occupational Measurement Squadron were ideal subjects for the experiment; they are now the largest single concentration of Behavioral Scientists (military and civilian) in the Air Force. A side benefit would be to let AFOMS members become familiar with the software, which might be used in future Air Force internet surveys. Thus, we included provision for them to record any suggestions they might have about what should be in military internet surveys.

    An initial version of the Behavioral Scientist internet survey was developed in early February 1999, and a number of scientists were invited to assess and use the survey and to provide feedback and suggestions for its possible operational use. Many of their suggestions were incorporated in the final version of the survey, which was posted to the web on April 15th, 1999; AFOMS was then notified that it was ready to begin data collection. All AFOMS personnel were given a website address and password, and asked to complete the survey as quickly as possible. After some initial difficulty gaining access via the webpage that explained the purpose of the survey (our IJOA website was down the first weekend), survey administration proceeded quickly and was terminated about May 15, to permit data analysis to begin. The final sample included 52 military and civilian Behavioral Scientists, most from AFOMS. Thirty-one completed the survey in inventory order (Treatment 1, T1) and 21 completed it in duty importance order (Treatment 2, T2). The difference between these numbers reflects some additional cases (10 or more) in which individuals had begun the survey but not completed it, and a few who obviously were just playing with the software. The final sample did include some individuals in one-of-a-kind jobs as well as a few supervisors not involved in technical work.

    Data Analysis

    One analysis using the total sample was to examine their responses to questions about the time it took to complete the survey. In one question, survey respondents were asked their degree of agreement with the statement that the "Time was reasonable" to complete the survey.

    Table 1. - Time Reasonable Analysis
    (7 point scale)


    Table 4. GRPREL Results

                  T1 (n=10)    T2 (n=8)

    r1,1            .2731        .3621
    rk,k            .6200        .6958
    r30,30          .9185        .9445


    The average interrater agreement (r11) is well above the normal expected level of .20, but with groups of this size, the extrapolated level of agreement (rkk) cannot reach the optimum level of .90. With samples of this size, where the objective is to compare two groups, a more reasonable extrapolation is to calculate the value if the groups were both n = 30 (r30,30). These calculated values both exceed the optimum level of .90. Values for the T2 (rated importance) group are consistently higher, even though that group is smaller. This may be the result of one occupational analyst in T1 also performing supervisory tasks.
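The paper does not state which formula GRPREL uses for the n = 30 extrapolation, but the r30,30 values in Table 4 are consistent with the Spearman-Brown prophecy formula applied to the average interrater agreement r11. The following Python sketch rests on that assumption; the function name is ours, and it makes no attempt to reproduce the rkk row.

```python
def spearman_brown(r11, k):
    """Project single-rater agreement r11 to the reliability of a k-rater composite.

    Assumes the standard Spearman-Brown prophecy formula; the source does not
    confirm that GRPREL computes r30,30 this way.
    """
    return k * r11 / (1 + (k - 1) * r11)

# r11 values taken from Table 4, projected to groups of 30 raters.
for group, r11 in [("T1", 0.2731), ("T2", 0.3621)]:
    print(group, round(spearman_brown(r11, 30), 4))
# prints:
# T1 0.9185
# T2 0.9445
```

Both projected values match the r30,30 row of Table 4 to four decimal places, which is why the two groups can be compared on a common footing even though their actual sizes differ.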

    Actual Time for Survey

    In the initial analysis, participants' perceptions of the reasonableness of the time to complete the survey were examined, but it is also possible to examine the server log and determine how long each person actually spent on the system doing the survey. Again, the issue is to compare the two groups and assess their difference. Results of the analysis are shown below:

    Table 5. Actual Computer Time to Complete Survey (minutes)

                                Mean      Median     S.D.

    Inventory Order (T1)        39.00     39.0       12.35
    Rated Duty Order (T2)       35.12     36.5       14.36

    Difference                  -3.88
     
    The T2 (rated order) group took somewhat less time than the T1 (inventory order) group, as might be expected, but the difference of about 4 minutes per respondent is not a substantial amount. Thus the trend is in the right direction even if the magnitude of the difference is not great and, given the large standard deviations, would not reach statistical significance. Recall, however, that the job descriptions for both groups contained fewer than 100 tasks in this rather specialized occupation.
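The remark about statistical significance can be illustrated with a Welch (unequal-variance) t statistic computed from the summary values in Table 5. This is a rough sketch, not an analysis reported in the paper: Table 5 does not give the number of respondents behind each mean, so the group sizes below (the full-sample counts of 31 and 21) are an assumption, and the function name is ours.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, df) for Welch's unequal-variance t-test from summary statistics."""
    var_a = sd_a ** 2 / n_a  # squared standard error of mean A
    var_b = sd_b ** 2 / n_b  # squared standard error of mean B
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df

# Means and S.D.s from Table 5; the group sizes 21 and 31 are ASSUMED.
t, df = welch_t(35.12, 14.36, 21, 39.00, 12.35, 31)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Under these assumed group sizes the statistic comes out at roughly -1.0, well short of the magnitude of about 2 usually needed at the .05 level; smaller groups would only shrink it further.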

    The group difference, and the potential for time savings, would undoubtedly be greater for specialties with much longer task lists, which is where the "fatigue" issue usually arises. To assess such potential time savings, this study needs to be repeated with one or two specialties that have much longer task lists.

    Write In Comments

    As noted earlier, provision was made for survey respondents to comment on the survey and make suggestions on how to improve the internet survey process. Table 6 summarizes analysis of these comments.


    Table 6. Write-In Comments

      Generally positive - most like the idea of web surveys

      Unfamiliar with rating duties before rating tasks

      Several suggested a need to see the tasks comprising a duty before they would be comfortable rating

      Several caught one or more misspelled words…

      None commented on experimental study


    Overall, the comments were highly positive about the use of the internet to conduct occupational analysis surveys. Several individuals were very enthusiastic about the possibility of the Air Force (and military services in general) collecting needed information in this way. A number of AFOMS occupational analysts found the idea of rating the importance of duty areas to be an unfamiliar activity or, in the case of those who received the Inventory Order administration, could not understand its purpose. One suggestion, made by several respondents, was that "look up" tables of the tasks comprising a duty should be available to ensure better understanding of the sometimes generalized duty titles. Respondents also did not hesitate to point out minor errors, particularly misspelled words or obsolete acronyms.

    Conclusions

    This project was a successful employment of the new GenSurv internet survey technology to collect experimental data quickly (one month of administration) in an attempt to answer some questions about how surveys might better be administered. Answers to such questions were not as definitive or complete as we might like, but a trend of saving time with the tailored administration approach was visible. Findings tended to be in the expected direction, and were sufficient to raise the expectation that more substantial savings of job incumbent time to take surveys, reduction in project data collection time, and improvement of data quality are possible, particularly with longer, more complex task lists. We do believe that the data support our expectation that ordered presentation (by duty importance) is a reasonable approach to combat rater fatigue with long lists. As with most research projects, we believe that additional research is needed to pursue these issues, and we hope to have the opportunity for such research during on-going GenSurv projects. We anticipate that at least four operational GenSurv military projects will be on-line during the coming year.

    Perhaps the most dramatic result of this project is that the entire survey process began in February with a draft inventory and results are available to report in this late May forum - a four-month process. Data collection began on 15 April and closed 15 May - just one month. These timelines document some of the major strengths of GenSurv and the Internet: data can be collected quickly and efficiently while eliminating paper-and-pencil forms and computer diskettes as well as mailing costs (and time). While GenSurv is still a prototype system, it has clearly reached the point of operational field surveying capability for studies that need very rapid response times and where data are needed very quickly.


    References

    Bennett, W., Sego, D.J., Teachout, M. S., & Phalen, W. J. (1994). On-The-Job Training Time as a Criterion for Training: A Comparison of Several Task-Based Estimations Approaches. Proceedings of the 36th Annual Conference of the Military Testing Association, Rotterdam, The Netherlands.

    Mitchell, J.L., Yadrick, R.M., & Bennett, W. R. (1993). Estimating Training Requirements From Job and Training Models. Military Psychology, 5(1), 1-20.

    Stanton, J.M. (1998). An empirical assessment of data collection using the internet. Personnel Psychology, 51, 709-725.

    Weissmuller, J.J., & Mitchell, J.L. (1998, October). Self-prioritizing inventory administration to maximize validity. In the Symposium, J.L. Mitchell & J.S. Tartell, co-chairs, Evaluating Innovations in Training Assessment & Occupational Modeling Technology, Proceedings of the 40th Annual Conference of the International Military Testing Association. Pensacola Beach, FL: Naval Education and Training Professional Development and Technology Center (available at html://www.internationalmta.org ).
     
