MODELING THE COMBAT-DISPOSITION OF NON-COMMISSIONED OFFICER-APPLICANTS USING LOGLINEAR MODELS
François J. Lescreve
Belgian Armed Forces' Center for Recruitment and Selection
1. Introduction
The Belgian Armed Forces organize an annual recruitment of non-commissioned officers (NCOs). The Forces are currently undergoing a major restructuring, resulting in severe cuts in the number of applicants who can be enlisted. Given these circumstances, the Services, and in particular the Army, decided to emphasize the recruitment of candidates for combat arms. Once applicants know this, however, it is to be feared that a number of them will give false information, overstating their combat-disposition in order to be enlisted. It therefore appeared worthwhile to investigate whether a trait such as "combat-disposition" could be identified and whether this trait could be related to more objective measures than the expressed preferences of the candidates.
The data for this study were collected during the annual NCO recruitment of 1995. For these data, we can assume that the applicants were not aware that candidates showing a higher combat-disposition would have a much higher probability of enlistment. The expressed preferences and choices can therefore be considered truthful.
From the approximately 2400 original applicants, we kept the 1054 records of the acceptable applicants. These are the ones who were still in the running after the medical, physical, intellectual and scholastic tests and who passed the final interview successfully. These data have the advantage of being complete for each applicant.
Table 1 gives an overview of the variables considered.
| Code | Name | Description | Categ. 1 | Categ. 2 | Categ. 3 |
| G | GENDER | gender of applic. | male | female | - |
| E | ETHNIC | ethnic group | Flemish | Francophone | - |
| P | PARA | want to become | no | don't know | yes |
| A | ABROAD | interested in being | no | don't know | yes |
| K | MIL_ | has the applic. previous knowledge of the military ? | little | medium | high |
| Y | AGE | age of the applic. | <= 18 | 19 to 21 | >21 |
| I | INTELL | intellectual | low | high | - |
| S | PERS | personality score | low | medium | high |
| F | PHYS | physical fitness | low | medium | high |
| C | COMB | Preference for specialty | combat arms | support arms | - |
| J | COMB JOB | expressed interest for | low | high | - |
| M | MEDIC | overall medical condition | low | high | - |
Table 1.
2. Constructing a latent variable "combat-disposition"
During the selection procedure, the applicants were asked to express their preference for the different enlistment possibilities (infantry, armor, …, equipment supply, computer operator) on a 99-point scale. A two-dimensional scaling of the Euclidean distances between the preferences yielded a solution that showed a clear separation between the so-called combat arms and the support arms. Figure 1 shows that solution.
TWO-DIMENSIONAL SCALING OF THE EXPRESSED PREFERENCES

Figure 1.
Although in reality the "combat-ness" of the different specialties is less dichotomous, it is interesting to note that in the perception of the applicants it seems quite dichotomous. This finding suggests that a latent class model with a limited number of classes (probably 2) could prove useful to explain the preferences expressed by the applicants towards all specialties.
After reviewing the available data, we considered the variables PARA, ABROAD, COMB_JOB and COMB_ARM as possible indicators for the latent variable "combat-disposition". These data were obtained at different stages of the selection procedure and, although they seem highly related, they are not fully redundant. The first latent class model, represented in figure 2, was set up postulating 2 classes for the latent variable.

Figure 2.
As shown by the output¹, this model does not fit (p-value of L2 = 0.0000). Further trials will consequently consider either an increase in the number of latent classes or a modification of the set of indicators. Since, for identifiability reasons, the number of parameters to be estimated independently cannot be larger than the number of observed cell frequencies², the number of possibilities is limited. Table 2 gives an overview of some relevant data concerning those possible models.
| # lat. classes | indicators | # parameters | # cell freq. | # df | p-value L2 | E |
| 2 | C J P A | 14 | 36 | 22 | 0.0000 | 0.05 |
| 3 | C J P A | 21 | 36 | 15 | 0.1885 | 0.17 |
| 2 | C J P | 10 | 12 | 2 | 0.0018 | 0.06 |
| 2 | C J A | 10 | 12 | 2 | 0.2964 | 0.07 |
| 2 | C A P | 12 | 18 | 6 | 0.1134 | 0.10 |
| 2 | J P A | 12 | 18 | 6 | 0.0277 | 0.07 |
Table 2.
The p-values of the log-likelihood ratio chi-square statistics that fall below α = 0.05 (0.0000, 0.0018 and 0.0277) indicate the latent class models that do not fit the observed data. These models will no longer be considered. One should note that conditional tests cannot be used here, since we have no hierarchically nested models in which the unrestricted model can be considered valid in the population.
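The parameter, cell and df counts in table 2 can be reproduced with a short calculation. The sketch below is only an illustration: the function name `lca_counts` is hypothetical, and the extra "+1" parameter (overall effect) is an assumption inferred from the fact that it makes the counts agree with the table.

```python
from math import prod

# Number of categories per indicator, taken from table 1.
CATEGORIES = {"C": 2, "J": 2, "P": 3, "A": 3}

def lca_counts(indicators, n_classes):
    """Return (parameters, cells, df) for an unrestricted latent class model."""
    levels = [CATEGORIES[name] for name in indicators]
    cells = prod(levels)
    # Free latent class proportions plus free conditional probabilities.
    free = (n_classes - 1) + n_classes * sum(k - 1 for k in levels)
    # Assumption: one additional parameter for the overall effect,
    # which is what reproduces the "# parameters" column of table 2.
    params = free + 1
    return params, cells, cells - params

for classes, indicators in [(2, "CJPA"), (3, "CJPA"), (2, "CJP"),
                            (2, "CJA"), (2, "CAP"), (2, "JPA")]:
    print(classes, indicators, lca_counts(indicators, classes))
```

A non-negative df is only the necessary condition mentioned above; it does not by itself guarantee that a model is identified.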
To choose one of the three latent class models that fit the data reasonably well, we take into account the number of latent classes, the significance of L2 and the E values, together with the interpretability of the parameters. Although the model with 3 latent classes fits reasonably, we prefer a more parsimonious model, especially since the multi-dimensional scaling (MDS) solution presented in figure 1 so clearly suggests a dichotomous combat-disposition. For interpretability reasons, as will become clear shortly, and in view of the L2 and E values, we preferred the model with C J A as indicators. This will be referred to as model B. We now proceed with the interpretation of the model and reproduce the latent and conditional probabilities in table 3 for convenience.
_____________________________
¹ The program outputs can be made available. Requests should be addressed to the author, Center for Recruitment and Selection, BRUYNstraat, B-1120 BRUSSELS (N-O-H), BELGIUM.
² This condition is necessary but in itself not sufficient.
| Model B | X1 | X2 |
| | 0.368 | 0.632 |
| C 1 | 0.185 | 0.833 |
| J 1 | 0.816 | 0.048 |
| A 1 | 0.263 | 0.026 |
Table 3.
As shown by table 3, the model classifies 37% of the people in latent class 1 and the remaining 63% in class 2. People belonging to class 1 are characterized by a preference for support arms (C2>C1), a low interest in combat jobs (J1>J2) and a slight interest in being posted abroad (A3>A2>A1). They appear, however, to be less interested in a job abroad than the people in class 2 (for class 2: A3>A2>A1). The people in class 2 are characterized in quite the opposite way: they prefer combat arms (C1>C2), are interested in combat jobs (J2>J1) and are willing to be posted abroad (A3>A2>A1).
When looking at the odds ratios, one can state that the odds of preferring combat arms rather than support arms are 22.97 times higher for the people in the second class (X=2) than for those in the first class (X=1). The odds of preferring non-combat jobs rather than combat jobs are 87.95 times higher for the people in the first latent class (X=1) than for those in the second class (X=2). The odds of not volunteering for a job abroad are 22.13 times higher for the people in the first latent class (X=1) than for those in the second class (X=2). As one can see, the variable COMB_JOB has the strongest ties with the latent variable. On the other hand, it came as something of a surprise that the variable PARA did not appear to be necessary to make the model fit.

Figure 3.
Clearly these latent classes can be interpreted in terms of combat-disposition, where class 1 is defined as "low combat-disposition" and class 2 as "high combat-disposition". In the next section we will try to set up models that can estimate the latent trait from more objective data.
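To make the classification into latent classes concrete, the following sketch computes the posterior class probabilities for a response pattern on C and J, using the values of table 3 and the usual local-independence assumption of latent class models. It is an illustration rather than the program used in this study: the function name `posterior` is hypothetical, and ABROAD is left out because table 3 only shows its first category.

```python
# Latent class proportions and conditional probabilities from table 3 (model B).
CLASS_PROB = {1: 0.368, 2: 0.632}
P_C1 = {1: 0.185, 2: 0.833}  # P(C = 1 | X): preference for combat arms
P_J1 = {1: 0.816, 2: 0.048}  # P(J = 1 | X): low interest in combat jobs

def posterior(c, j):
    """Posterior P(X | C = c, J = j), with c and j coded 1 or 2."""
    joint = {}
    for x, prior in CLASS_PROB.items():
        p_c = P_C1[x] if c == 1 else 1.0 - P_C1[x]
        p_j = P_J1[x] if j == 1 else 1.0 - P_J1[x]
        # Local independence: the indicators are independent within a class.
        joint[x] = prior * p_c * p_j
    total = sum(joint.values())
    return {x: value / total for x, value in joint.items()}

# An applicant who prefers combat arms (C = 1) and shows a high
# interest in combat jobs (J = 2).
print(posterior(c=1, j=2))
```

Under these assumptions such an applicant is assigned to class 2 with a probability of roughly 0.98.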
3. Loglinear models with a latent variable
One possible starting point could consist of a model which replicates the latent class model and adds a saturated model for the external variables ({GEKYISFM,AX,CX,JX}) in order to check the parameters for the indicator variables and to obtain a first idea about the relative importance of the external variables. This however would yield a table showing too many zero-frequency cells, especially due to the variable GENDER where the number of females is relatively small.
In order to limit the number of external variables, we started by looking at the relation between the external variables and the indicator variables. Table 4 gives the p-values for the Pearson χ² statistic. P-values below α = 0.05 indicate significant departures from the independence hypothesis.
| | G | E | K | Y | I | S | F | M |
| C | .000 | .246 | .211 | .017 | .39 | .330 | .000 | .185 |
| J | .003 | .081 | .486 | .000 | .406 | .097 | .000 | .200 |
| A | .087 | .018 | .022 | .511 | .026 | .051 | .020 | .743 |
Table 4.
Although it would not be appropriate to eliminate external variables on the basis of this table alone, it makes some sense to develop our model starting from the external variables apparently related to the strongest indicator of the latent variable: in this case G (GENDER), Y (AGE) and F (PHYS), which are related to C (COMB_ARM). We will therefore first try to fit a model with G, Y and F as external variables and then see whether the other external variables (especially the ones related to A, as shown in table 4) can add something valuable to it.
3.1. Fitting a model with gender (G), age (Y) and physical condition (F)
When considering those three variables, one cannot but think of a causal model such as the one represented in figure 4. To check whether this 'amoral graph', in which the originators of the causal effects on PHYS are independent, holds, two models actually need to hold. The first one, {Y,G} based on table YG, checks the independence of AGE and GENDER (hypothesis 1). The second one, {YG,YF,GF} based on table YGF (hypothesis 2), then yields the parameters for PHYS. Table 5 provides the results.

Figure 4.
| Hypoth | 1 | 2 |
| Table | YG | YGF |
| Model | Y, G | YG, YF, GF |
| # param | 4 | 14 |
| # cells | 6 | 18 |
| # df | 2 | 4 |
| p-value (L2) | .0062 | .2350 |
Table 5.
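The parameter and df counts of table 5 follow from the model generators alone. The sketch below, with the hypothetical helper `loglinear_counts`, expands each generator of a hierarchical loglinear model into its lower-order terms and counts the independent parameters, assuming the category numbers of table 1 and one constant term.

```python
from itertools import combinations
from math import prod

LEVELS = {"Y": 3, "G": 2, "F": 3}  # categories per variable (table 1)

def loglinear_counts(generators, variables):
    """Return (parameters, cells, df) for a hierarchical loglinear model."""
    # A hierarchical model contains every non-empty subset of each generator.
    terms = set()
    for gen in generators:
        for size in range(1, len(gen) + 1):
            terms.update(frozenset(c) for c in combinations(gen, size))
    # Each term contributes prod(levels - 1) free parameters; +1 for the constant.
    params = 1 + sum(prod(LEVELS[v] - 1 for v in term) for term in terms)
    cells = prod(LEVELS[v] for v in variables)
    return params, cells, cells - params

print(loglinear_counts(["Y", "G"], "YG"))            # hypothesis 1
print(loglinear_counts(["YG", "YF", "GF"], "YGF"))   # hypothesis 2
```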
As one can see, the first hypothesis (GENDER and AGE are independent) does not hold (p-value of L2 = .0062 < 0.05). This means that the model presented in figure 4 must include an interaction between AGE and GENDER. The interaction between GENDER and AGE, however, cannot be estimated by the (fitting) model used for testing the second hypothesis. Since physical fitness is causally posterior to age and gender, physical fitness cannot influence the relationship between age and gender. Holding physical fitness constant when estimating the relation between age and gender makes no sense. In order to test the model depicted in figure 5, we shall use the structural models YG {YG} and YGF {YF,GF} simultaneously. The output is summarized in table 6.

Figure 5.
| Tables | YG, YGF |
| Model | {YG}, {YF, GF} |
| # cells | 18 |
| # param | 14 |
| # df | 4 |
| p-value (L2) | 0.2350 |
Table 6.
As indicated by the p-value of the L2 statistic, this model fits. We will not dwell on the interpretation of the parameters for the moment, as this is not the final result. We just take the opportunity to illustrate how the computed parameters are used to estimate the cell frequencies. We take the additive method for cell Y1G1F1, which has an observed frequency of 70. The formula is:
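\[ \ln \hat{F}_{111} = \alpha + \beta^{Y}_{1} + \beta^{G}_{1} + \beta^{YG}_{11} + \beta^{F}_{1} + \beta^{YF}_{11} + \beta^{GF}_{11} \]
\[ = 3.5697 + 0.1265 + 1.0602 - 0.2166 + 1.0756 + 0.0839 - 1.4466 = 4.2527 \]
which yields an estimated cell frequency \( \hat{F}_{111} = e^{4.2527} = 70.2949 \)⁶.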
This way of computing estimated cell frequencies can be derived from the design matrix for this model, which is given in table 7.
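The additive computation for cell Y1G1F1 is easy to check numerically. The short sketch below simply sums the parameter values quoted above and exponentiates the result; the dictionary keys are labels chosen for this illustration.

```python
from math import exp

# Parameter estimates quoted in the text for cell Y1 G1 F1.
terms = {
    "constant": 3.5697,
    "Y1": 0.1265,
    "G1": 1.0602,
    "Y1G1": -0.2166,
    "F1": 1.0756,
    "Y1F1": 0.0839,
    "G1F1": -1.4466,
}

# The log of the estimated frequency is the sum of the relevant parameters.
log_frequency = sum(terms.values())
estimated_frequency = exp(log_frequency)
print(round(log_frequency, 4), round(estimated_frequency, 4))
```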
_______________________________
⁶ This value departs slightly from the one given by the computer output (68.835) merely because of rounding errors.
| YGF | 0 | Y1 | Y2 | G1 | Y1 G1 | Y2 G1 | F1 | F2 | Y1 F1 | Y1 F2 | Y2 F1 | Y2 F2 | G1 F1 | G1 F2 |
| 111 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 112 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| 113 | 1 | 1 | 0 | 1 | 1 | 0 | -1 | -1 | -1 | -1 | 0 | 0 | -1 | -1 |
| 121 | 1 | 1 | 0 | -1 | -1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | -1 | 0 |
| 122 | 1 | 1 | 0 | -1 | -1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | -1 |
| 123 | 1 | 1 | 0 | -1 | -1 | 0 | -1 | -1 | -1 | -1 | 0 | 0 | 1 | 1 |
| 211 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 212 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| 213 | 1 | 0 | 1 | 1 | 0 | 1 | -1 | -1 | 0 | 0 | -1 | -1 | -1 | -1 |
| 221 | 1 | 0 | 1 | -1 | 0 | -1 | 1 | 0 | 0 | 0 | 1 | 0 | -1 | 0 |
| 222 | 1 | 0 | 1 | -1 | 0 | -1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | -1 |
| 223 | 1 | 0 | 1 | -1 | 0 | -1 | -1 | -1 | 0 | 0 | -1 | -1 | 1 | 1 |
| 311 | 1 | -1 | -1 | 1 | -1 | -1 | 1 | 0 | -1 | 0 | -1 | 0 | 1 | 0 |
| 312 | 1 | -1 | -1 | 1 | -1 | -1 | 0 | 1 | 0 | -1 | 0 | -1 | 0 | 1 |
| 313 | 1 | -1 | -1 | 1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | -1 | -1 |
| 321 | 1 | -1 | -1 | -1 | 1 | 1 | 1 | 0 | -1 | 0 | -1 | 0 | -1 | 0 |
| 322 | 1 | -1 | -1 | -1 | 1 | 1 | 0 | 1 | 0 | -1 | 0 | -1 | 0 | -1 |
| 323 | 1 | -1 | -1 | -1 | 1 | 1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 |
Table 7. Design matrix for model YG {YG}, YGF {GF,YF}
This matrix can be modified in order to impose certain relations among the variables. One could, for example, impose equality among the parameters related to YF and GF. This would mean that the different classes of AGE and of GENDER would have the same effect on PHYS. Within the context of our study, this does not make sense.
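The design matrix follows from effect coding: each category gets a 1 in its own column, the last category of a variable is coded -1 on all of that variable's columns, and interaction columns are products of main-effect columns. The sketch below generates the rows under these assumptions; the helper names `effect_code` and `design_row` are hypothetical and this is not the program used for the study.

```python
from itertools import product

def effect_code(level, n_levels):
    """Effect-code one category: the last category is -1 on every column."""
    if level == n_levels:
        return [-1] * (n_levels - 1)
    return [1 if level == k else 0 for k in range(1, n_levels)]

def design_row(y, g, f):
    """Design-matrix row for the model {YG, YF, GF}, columns ordered as in table 7."""
    ys, gs, fs = effect_code(y, 3), effect_code(g, 2), effect_code(f, 3)
    yg = [a * b for a in ys for b in gs]   # Y1G1, Y2G1
    yf = [a * b for a in ys for b in fs]   # Y1F1, Y1F2, Y2F1, Y2F2
    gf = [a * b for a in gs for b in fs]   # G1F1, G1F2
    return [1] + ys + gs + yg + fs + yf + gf

matrix = {(y, g, f): design_row(y, g, f)
          for y, g, f in product((1, 2, 3), (1, 2), (1, 2, 3))}

for cell, row in matrix.items():
    print("".join(map(str, cell)), row)
```

The row for cell 111 has a 1 exactly in the columns 0, Y1, G1, Y1G1, F1, Y1F1 and G1F1, which are the seven terms summed in the additive formula for that cell.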
The next step consists of relating the obtained model to the latent class model developed previously.
3.2. Building a modified LISREL model
We will now consider five different modified LISREL models that fit the data reasonably well. They all include GENDER, AGE, PHYS, COMB_ARM, COMB_JOB, ABROAD and the latent variable with two classes. Several attempts to include the other possible variables in a meaningful way failed and will not be discussed here. The following overview gives the model specification and the graphical representation of those models. Table 8 gives their principal data.

| Model | p-value L2 | BIC | param | df |
| G | 0.87 | 9325 | 44 | 172 |
| H | 0.43 | 9236 | 25 | 191 |
| I | 0.78 | 9230 | 27 | 189 |
| J | 0.55 | 9236 | 26 | 190 |
| K | 0.88 | 9228 | 28 | 188 |
Table 8. Comparative data for the considered models
As said before, all these models fit. To choose one of them, we looked at the fit, the Bayesian information criterion (BIC) and the number of independent parameters. We therefore chose model K, which seems a good compromise between fit and parsimony. This is the model we will now try to interpret.
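The figures in table 8 can be cross-checked with a few lines. The sketch below verifies that parameters and degrees of freedom add up to the number of cells of the table G x Y x F x C x J x A, and picks the model with the lowest BIC. Selecting on BIC alone is a simplification used here for illustration; the choice described above also weighed fit and the number of parameters.

```python
# Summary data from table 8: model -> (p-value of L2, BIC, parameters, df).
MODELS = {
    "G": (0.87, 9325, 44, 172),
    "H": (0.43, 9236, 25, 191),
    "I": (0.78, 9230, 27, 189),
    "J": (0.55, 9236, 26, 190),
    "K": (0.88, 9228, 28, 188),
}

# Cells of the observed table G x Y x F x C x J x A (categories from table 1).
N_CELLS = 2 * 3 * 3 * 2 * 2 * 3

for name, (p_value, bic, params, df) in MODELS.items():
    # Parameters and degrees of freedom should add up to the number of cells.
    assert params + df == N_CELLS, name

best_bic = min(MODELS, key=lambda name: MODELS[name][1])
print(N_CELLS, best_bic)
```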
First we will check whether the link between the indicators and the latent variable is consistent with the model including only the indicators and the latent variable (model B). Table 9 shows the latent and conditional probabilities for model B and for model K.
| | Model B | Model K |
| Cl | X1 | X2 | X1 | X2 |
Table 9. Latent and conditional probabilities for model B and K: comparison
As one can see, the two models yield very similar values. This means that the interpretation we gave for model B still holds for model K. So latent class X1 is still defined as "low combat disposition" and class X2 as "high combat disposition".
When considering table 10, where we give the remainder of the latent and conditional probabilities for Model K, we can develop the interpretation further.
| | X1 | X2 |
| | 0.350 | 0.650 |
| G1 | 0.824 | 0.924 |
| G2 | 0.176 | 0.076 |
| Y1 | 0.230 | 0.362 |
| Y2 | 0.441 | 0.396 |
| Y3 | 0.329 | 0.242 |
| F1 | 0.387 | 0.255 |
| F2 | 0.362 | 0.303 |
| F3 | 0.251 | 0.442 |
Table 10
Now we can see that the model classifies 35% of the applicants in latent class 1 and the remaining 65% in class 2. People belonging to class 1 are generally males (G1). However, the proportion of males is higher still in class 2. The interpretation of the proportions of the two classes for the variable AGE (Y) is less obvious for now (it will become clear when we look at the odds ratios). The people belonging to class X1 tend to have lower results for physical fitness (F).
When looking at the odds, one can make the following statements.
The odds of belonging to the class "high combat disposition" (X2) rather than the class "low combat disposition" (X1) are 2.59 times higher for male applicants than for female applicants (p < 0.0001, exact 95% CI 1.939 to 3.501).
For the variable AGE (Y), the odds ratio between Y2 and Y3 is not significantly different from 1 (OR = 1.221, p>0.07655). We therefore collapsed the categories Y2 and Y3 for further computation of the odds ratio. The odds of belonging to the class "high combat disposition" rather than the class "low combat disposition" are 1.900 times higher for the young applicants (Y1: ≤ 18 years) than for the older ones (Y2 or Y3: > 18 years) (p < 0.0001, exact 95% CI 1.555 to 2.322).
For the variable PHYS (F), the odds ratios based on the conditional probabilities can be summarized by table 11.
| | Odds Ratio | Exact 95% CI | p-value OR |
| F1 - F2 | 1.270 | 1.0196 to 1.5825 | 0.0328 |
| F2 - F3 | 2.1039 | 1.6826 to 2.6308 | 0.0000 |
| F1 - F3 | 2.6723 | 2.1285 to 3.3557 | 0.0000 |
Table 11.
When we interpret the last row of table 11, for instance, we can say that the odds of belonging to the class "high combat disposition" (X2) rather than "low combat disposition" (X1) are 2.67 times higher for applicants with good physical fitness (F3) than for those with poor physical fitness (F1). When looking at the values of the parameters, one notices the large (but not surprising) effect of gender on physical fitness (given by λ G1F1 = -1.4460), indicating that males tend not to belong to the low physical fitness class.
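The odds ratios reported for GENDER, AGE and PHYS can be recomputed from the conditional probabilities in table 10. In the sketch below, the function name `odds_ratio` and the pooling of categories by summing their conditional probabilities are choices made for this illustration; the latent class proportions cancel out of the ratio, so they are not needed. Confidence intervals and p-values are not reproduced.

```python
# Conditional probabilities P(category | latent class) from table 10 (model K),
# stored as (value for X1, value for X2).
P = {
    "G1": (0.824, 0.924), "G2": (0.176, 0.076),
    "Y1": (0.230, 0.362), "Y2": (0.441, 0.396), "Y3": (0.329, 0.242),
    "F1": (0.387, 0.255), "F2": (0.362, 0.303), "F3": (0.251, 0.442),
}

def odds_ratio(high, low):
    """Odds of class X2 rather than X1 for categories `high` versus `low`.

    Each argument is a list of category codes that are pooled together.
    """
    high_x1 = sum(P[c][0] for c in high)
    high_x2 = sum(P[c][1] for c in high)
    low_x1 = sum(P[c][0] for c in low)
    low_x2 = sum(P[c][1] for c in low)
    return (high_x2 / high_x1) / (low_x2 / low_x1)

print("gender  G1 vs G2   :", round(odds_ratio(["G1"], ["G2"]), 3))
print("age     Y1 vs Y2+Y3:", round(odds_ratio(["Y1"], ["Y2", "Y3"]), 3))
print("fitness F2 vs F1   :", round(odds_ratio(["F2"], ["F1"]), 3))
print("fitness F3 vs F2   :", round(odds_ratio(["F3"], ["F2"]), 3))
print("fitness F3 vs F1   :", round(odds_ratio(["F3"], ["F1"]), 3))
```

Because the table values are rounded to three decimals, the results agree with the reported odds ratios only up to rounding.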
To conclude, we can say that young males in good physical condition in particular tend to belong to the class of "high combat disposition". This latent disposition explains well the observed answers concerning preference for combat arms, interest in combat jobs and availability for jobs abroad. It does not appear to be necessary to include variables such as medical fitness, personality scores or intellectual potential to obtain a model that fits the observed data well. It seems to be possible to infer the combat disposition from the variables GENDER, AGE and PHYSICAL FITNESS only. This study should, however, be repeated on a new data set to confirm these conclusions. The question also arises as to the use of the particular results of this research, because AGE is by definition not a constant quality of an individual and GENDER is a variable whose use is forbidden by law. On the other hand, combat-disposition is something that is clearly perceptible within military units. So it might be that combat-disposition does exist but is not assessed well by the expression of preferences within the current selection procedure. This would mean that the latent variable we used does not reflect true combat-disposition but merely the intention of combat-disposition expressed by the applicants.