MODELING THE COMBAT-DISPOSITION OF NON-COMMISSIONED OFFICER-APPLICANTS USING LOGLINEAR MODELS

François J. Lescreve
Belgian Armed Forces' Center for Recruitment and Selection

1. Introduction

The Belgian Armed Forces organize an annual recruitment of non-commissioned officers (NCOs). The Forces are currently undergoing a major restructuring, resulting in severe cuts in the number of applicants who can be enlisted. Given these circumstances, the Services, and in particular the Army, decided to emphasize the recruitment of candidates for combat arms. Once applicants know this, however, it is to be feared that some of them will give false information, overstating their combat-disposition in order to be enlisted. It therefore appeared worthwhile to investigate whether a trait such as "combat-disposition" can be identified, and whether this trait can be related to more objective measures than the candidates' expressed preferences.

The data for this study were collected during the annual NCO recruitment of 1995. For these data, we can assume that the applicants were not aware that candidates showing a higher combat-disposition would have a much higher probability of enlistment. The expressed preferences and choices can therefore be considered truthful.

Of the approximately 2400 original applicants, we kept the 1054 records of the acceptable applicants. These are the ones who were still in the running after the medical, physical, intellectual and scholastic tests and who passed the final interview. These data have the advantage of being complete for each applicant.

Table 1 gives an overview of the variables considered.

Code  Name      Description                                       Categ. 1          Categ. 2          Categ. 3
G     GENDER    gender of applicant                               male              female            -
E     ETHNIC    ethnic group                                      Flemish           Francophone       -
P     PARA      wants to become a paratrooper?                    no                don't know        yes
A     ABROAD    interested in being posted abroad?                no                don't know        yes
K     MIL_KNOW  previous knowledge of the military?               little knowledge  medium knowledge  high knowledge
Y     AGE       age of applicant                                  <= 18             19 to 21          > 21
I     INTELL    intellectual capacity                             low               high              -
S     PERS      personality score                                 low               medium            high
F     PHYS      physical fitness                                  low               medium            high
C     COMB_ARM  preference for specialty                          combat arms       support arms      -
J     COMB_JOB  expressed interest in combat job                  low               high              -
M     MEDIC     overall medical condition                         low               high              -

Table 1.

2. Constructing a latent variable "combat-disposition"

During the selection procedure, the applicants were asked to express their preference for the different enlistment possibilities (infantry, armor, …, equipment supply, computer operator) on a 99-point scale. A two-dimensional scaling of the Euclidean distances between the preferences yielded a solution showing a clear split between the so-called combat arms and the support arms. Figure 1 shows that solution.

Figure 1. Two-dimensional scaling of the expressed preferences.

Although in reality the "combat-ness" of the different specialties is less dichotomous, it is interesting to note that applicants perceive it as quite dichotomous. This finding suggests that a latent class model with a limited number of classes (probably 2) could prove useful in explaining the preferences expressed by the applicants towards all specialties.

After reviewing the available data, we considered the variables PARA, ABROAD, COMB_JOB and COMB_ARM as candidate indicators for the latent variable "combat-disposition". These data were obtained at different stages of the selection procedure and, although they seem highly related, they are not fully redundant. The first latent class model, represented in figure 2, was set up postulating 2 classes for the latent variable.

Figure 2.

As shown by the output¹, this model does not fit (p-value of L² = 0.0000). Further trials will consequently consider either an increase in the number of latent classes or a modification of the set of indicators. Since, for identifiability reasons, the number of independently estimated parameters cannot be larger than the number of observed cell frequencies², the number of possibilities is limited. Table 2 gives an overview of some relevant data concerning those possible models.

# lat. classes  indicators  # parameters  # cell freq.  # df  p-value L²  E
2               C J P A     14            36            22    0.0000 *    0.05
3               C J P A     21            36            15    0.1885      0.17
2               C J P       10            12            2     0.0018 *    0.06
2               C J A       10            12            2     0.2964      0.07
2               C A P       12            18            6     0.1134      0.10
2               J P A       12            18            6     0.0277 *    0.07

* p-value below 0.05

Table 2.
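The parameter, cell and df columns of table 2 can be reproduced with a few lines of Python. This is only a sketch: the counting convention (one constant for the overall total, plus the free latent class proportions, plus the free conditional probabilities within each class) is our assumption, chosen because it matches the tabulated values, and the function name `lca_counts` is hypothetical.

```python
from math import prod

# Number of categories per indicator, taken from Table 1.
CATEGORIES = {"C": 2, "J": 2, "P": 3, "A": 3}


def lca_counts(indicators, n_classes):
    """Return (parameters, cells, df) for an unrestricted latent class model.

    Assumed counting convention: one constant for the overall total,
    (n_classes - 1) latent class proportions, and (k - 1) conditional
    probabilities per indicator within each class.
    """
    sizes = [CATEGORIES[name] for name in indicators]
    cells = prod(sizes)
    params = 1 + (n_classes - 1) + n_classes * sum(k - 1 for k in sizes)
    return params, cells, cells - params


# The six candidate models listed in Table 2.
for n_classes, indicators in [(2, "CJPA"), (3, "CJPA"), (2, "CJP"),
                              (2, "CJA"), (2, "CAP"), (2, "JPA")]:
    print(n_classes, indicators, lca_counts(indicators, n_classes))
```

A non-negative number of degrees of freedom is only the necessary condition mentioned in the text; it does not by itself guarantee that a model is identified.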

The p-values of the log-likelihood ratio chi-square statistic L² that fall below α = 0.05 indicate the latent class models that do not fit the observed data. These models will no longer be considered. Note that we cannot use conditional tests here, since we have no hierarchically nested models in which the unrestricted model can be considered valid in the population.

To choose among the three latent class models that fit the data reasonably well, we take into account the number of latent classes, the significance of L² and the E values, together with the interpretability of the parameters. Although the model with 3 latent classes fits reasonably, we prefer a more parsimonious model, especially since the multi-dimensional scaling (MDS) solution presented in figure 1 so clearly suggests a dichotomous combat-disposition. For reasons of interpretability, which will become clear shortly, together with the L² and E values, we preferred the model with C, J and A as indicators. This will be referred to as model B. We now proceed with the interpretation of this model; for convenience, table 3 reproduces its latent and conditional probabilities.

_____________________________
¹ The program outputs can be made available. Requests should be addressed to the author, Center for Recruitment and Selection, Bruynstraat, B-1120 BRUSSELS (N-O-H), BELGIUM.
² This condition is necessary but in itself not sufficient.

Model B   X1      X2
P(X)      0.368   0.632
C1        0.185   0.833
C2        0.815   0.167
J1        0.816   0.048
J2        0.184   0.952
A1        0.263   0.026
A2        0.342   0.110
A3        0.395   0.864

Table 3.

As shown by table 3, the model assigns 37% of the applicants to latent class 1 and the remaining 63% to class 2. People belonging to class 1 are characterized by a preference for support arms (C2 > C1), a low interest in combat jobs (J1 > J2) and a moderate interest in being posted abroad (A3 > A2 > A1). The same ordering holds in class 2, but there it is far more pronounced (A3 = 0.864 against 0.395 in class 1), so the people in class 1 are clearly less interested in a job abroad. In other respects the people in class 2 show the opposite profile: they prefer combat arms (C1 > C2), are interested in combat jobs (J2 > J1) and are willing to be posted abroad (A3 > A2 > A1).
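To make the classification concrete, the following sketch computes the posterior probability of each latent class for a given response pattern, using the model B estimates of table 3. It relies on the usual latent class assumption of local independence (the indicators are independent within a class); the function name `posterior` is ours and is not taken from the program used in the study.

```python
# Model B estimates from Table 3: latent class proportions and
# conditional response probabilities P(indicator = category | class).
CLASS_PROB = {1: 0.368, 2: 0.632}
COND_PROB = {
    "C": {1: {1: 0.185, 2: 0.815}, 2: {1: 0.833, 2: 0.167}},
    "J": {1: {1: 0.816, 2: 0.184}, 2: {1: 0.048, 2: 0.952}},
    "A": {1: {1: 0.263, 2: 0.342, 3: 0.395},
          2: {1: 0.026, 2: 0.110, 3: 0.864}},
}


def posterior(c, j, a):
    """Return {class: P(class | C=c, J=j, A=a)} assuming local independence."""
    joint = {
        x: CLASS_PROB[x]
        * COND_PROB["C"][x][c]
        * COND_PROB["J"][x][j]
        * COND_PROB["A"][x][a]
        for x in CLASS_PROB
    }
    total = sum(joint.values())
    return {x: value / total for x, value in joint.items()}


# An applicant preferring combat arms (C=1), with high interest in combat
# jobs (J=2) and willing to be posted abroad (A=3).
print(posterior(1, 2, 3))
# The opposite pattern: support arms, low interest, not interested abroad.
print(posterior(2, 1, 1))
```

Assigning each applicant to the class with the largest posterior probability is one common way of turning such a model into a classification rule.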

Looking at the odds ratios, the odds of preferring combat arms rather than support arms are 21.97 times higher for the people in the second class (X=2) than for those in the first class (X=1)³. The odds of preferring non-combat jobs rather than combat jobs are 87.95 times higher for the people in the first latent class (X=1) than for those in the second class (X=2)⁴. The odds of not volunteering for a job abroad are 22.13 times higher for the people in the first latent class (X=1) than for those in the second class (X=2)⁵. As one can see, the variable COMB_JOB has the strongest ties with the latent variable. On the other hand, it came as something of a surprise that the variable PARA did not appear to be necessary to make the model fit.

Figure 3.

Clearly these latent classes can be interpreted in terms of combat-disposition, where class 1 is defined as "low combat-disposition" and class 2 as "high combat-disposition". In the next section we will try to set up models that can estimate the latent trait from more objective data.

_______________________________________________
³ (.185 × .167) / (.833 × .815) = 0.0455 = 1/21.97
⁴ (.816 × .952) / (.048 × .184) = 87.95
⁵ Here the odds ratio was computed for category 1 versus category 3 of the variable ABROAD, yielding (.263 × .864) / (.026 × .395) = 22.13

3. Loglinear models with a latent variable

One possible starting point could consist of a model which replicates the latent class model and adds a saturated model for the external variables ({GEKYISFM,AX,CX,JX}) in order to check the parameters for the indicator variables and to obtain a first idea about the relative importance of the external variables. This however would yield a table showing too many zero-frequency cells, especially due to the variable GENDER where the number of females is relatively small.

In order to limit the number of external variables, we first looked at the relation between the external variables and the indicator variables. Table 4 gives the p-values of the Pearson χ² statistic. Departures from the independence hypothesis are significant where the p-value is below α = 0.05.

     G       E       K       Y       I       S       F       M
C    .000 *  .246    .211    .017 *  .39     .330    .000 *  .185
J    .003 *  .081    .486    .000 *  .406    .097    .000 *  .200
A    .087    .018 *  .022 *  .511    .026 *  .051    .020 *  .743

* p-value below 0.05

Table 4.

Although it would not be appropriate to eliminate external variables on the basis of this table alone, it makes sense to develop our model starting from the external variables apparently related to the strongest indicator of the latent variable (in this case G (GENDER), Y (AGE) and F (PHYS), which are related to C (COMB_ARM)). We will therefore first try to fit a model with G, Y and F as external variables, and then see whether the other external variables (especially the ones related to A, as shown in table 4) can add something valuable to it.

3.1. Fitting a model with gender (G), age (Y) and physical condition (F)

When considering those three variables, one cannot but think of a causal model such as the one represented in figure 4. To check whether this 'amoral graph', in which the originators of the causal effects on PHYS are independent, holds, two models actually need to hold. The first one, {Y,G}, based on table YG, checks the independence of AGE and GENDER (hypothesis 1). The second one, {YG,YF,GF}, based on table YGF (hypothesis 2), then yields the parameters for PHYS. Table 5 provides the results.

Figure 4.

Hypothesis   1        2
Table        YG       YGF
Model        {Y,G}    {YG,YF,GF}
# param      4        14
# cells      6        18
# df         2        4
p (L²)       .0062    .2350

Table 5.
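For readers who want to see how a test such as hypothesis 1 works, here is a minimal sketch of the likelihood-ratio statistic L² for independence in a two-way table. The counts below are invented for illustration only (the observed AGE by GENDER table is not reproduced in this paper), and the p-value helper uses the closed form exp(-L²/2), which is valid for 2 degrees of freedom only. Under that closed form, the reported p-value of .0062 on 2 df corresponds to an L² of roughly 10.17.

```python
from math import exp, log


def likelihood_ratio_independence(table):
    """Return (L2, df) for the independence model on a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    l2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:  # a zero cell contributes nothing to L2
                l2 += 2.0 * observed * log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return l2, df


def p_value_df2(l2):
    """Upper-tail chi-square probability, valid for 2 degrees of freedom only."""
    return exp(-l2 / 2.0)


# Hypothetical AGE (rows) by GENDER (columns) counts, NOT the study data.
hypothetical_yg = [[30, 10], [40, 20], [20, 30]]
l2, df = likelihood_ratio_independence(hypothetical_yg)
print(round(l2, 3), df, round(p_value_df2(l2), 4))
```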

As one can see, the first hypothesis, that GENDER and AGE are independent, does not hold (p-value of L² = .0062 < 0.05). This means that the model presented in figure 4 must include an interaction between AGE and GENDER. This interaction, however, cannot be estimated by the model used for testing the second hypothesis: since physical fitness is causally posterior to age and gender, it cannot influence the relationship between them, and holding physical fitness constant when estimating that relationship makes no sense. In order to test the model depicted in figure 5, we use the structural models {YG} for table YG and {YF,GF} for table YGF simultaneously. The output is summarized in table 6.

Figure 5.

Tables    YG, YGF
Model     {YG}, {YF,GF}
# cells   18
# param   14
# df      4
p (L²)    0.2350

Table 6.

As indicated by the p-value of the L² statistic, this model fits. We will not dwell on the interpretation of the parameters for the moment, as this is not the final result. We simply take the opportunity to illustrate how the computed parameters are used to obtain the estimated cell frequencies. We use the additive (log-linear) form for cell Y1G1F1, which has an observed frequency of 70. The formula is:

\[
\begin{aligned}
\ln \hat{F}_{Y_1 G_1 F_1} &= \alpha + \beta_{Y_1} + \beta_{G_1} + \beta_{Y_1 G_1} + \beta_{F_1} + \beta_{Y_1 F_1} + \beta_{G_1 F_1} \\
&= 3.5697 + 0.1265 + 1.0602 - 0.2166 + 1.0756 + 0.0839 - 1.4466 \\
&= 4.2527
\end{aligned}
\]

which yields an estimated cell frequency \(\hat{F} = e^{4.2527} \approx 70.295\)⁶.

This way of computing estimated cell frequencies can be derived from the design matrix for this model, which is given in table 7.
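The arithmetic of this worked example is easy to check. The sketch below simply adds the seven parameter values quoted above for cell Y1G1F1 and exponentiates the sum; the dictionary keys are our own labels for the terms.

```python
from math import exp

# Effect-coded parameter estimates quoted in the text for cell Y1 G1 F1.
terms = {
    "constant": 3.5697,
    "Y1": 0.1265,
    "G1": 1.0602,
    "Y1G1": -0.2166,
    "F1": 1.0756,
    "Y1F1": 0.0839,
    "G1F1": -1.4466,
}

# Additive form: the log of the estimated frequency is the sum of the terms.
log_frequency = sum(terms.values())
estimated_frequency = exp(log_frequency)
print(round(log_frequency, 4), round(estimated_frequency, 3))
```

For any other cell the same sum applies, with each parameter multiplied by the corresponding entry (1, 0 or -1) of that cell's row in the design matrix.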

_______________________________
⁶ This value departs slightly from the one given by the computer output (68.835) merely because of rounding errors.

YGF      0    Y1    Y2    G1  Y1G1  Y2G1    F1    F2  Y1F1  Y1F2  Y2F1  Y2F2  G1F1  G1F2
111      1     1     0     1     1     0     1     0     1     0     0     0     1     0
112      1     1     0     1     1     0     0     1     0     1     0     0     0     1
113      1     1     0     1     1     0    -1    -1    -1    -1     0     0    -1    -1
121      1     1     0    -1    -1     0     1     0     1     0     0     0    -1     0
122      1     1     0    -1    -1     0     0     1     0     1     0     0     0    -1
123      1     1     0    -1    -1     0    -1    -1    -1    -1     0     0     1     1
211      1     0     1     1     0     1     1     0     0     0     1     0     1     0
212      1     0     1     1     0     1     0     1     0     0     0     1     0     1
213      1     0     1     1     0     1    -1    -1     0     0    -1    -1    -1    -1
221      1     0     1    -1     0    -1     1     0     0     0     1     0    -1     0
222      1     0     1    -1     0    -1     0     1     0     0     0     1     0    -1
223      1     0     1    -1     0    -1    -1    -1     0     0    -1    -1     1     1
311      1    -1    -1     1    -1    -1     1     0    -1     0    -1     0     1     0
312      1    -1    -1     1    -1    -1     0     1     0    -1     0    -1     0     1
313      1    -1    -1     1    -1    -1    -1    -1     1     1     1     1    -1    -1
321      1    -1    -1    -1     1     1     1     0    -1     0    -1     0    -1     0
322      1    -1    -1    -1     1     1     0     1     0    -1     0    -1     0    -1
323      1    -1    -1    -1     1     1    -1    -1     1     1     1     1     1     1

Table 7. Design matrix for model YG {YG}, YGF {GF,YF}
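A design matrix of this kind does not have to be typed by hand. The sketch below generates one under effect (deviation) coding, where the last category of each variable is coded -1 and each interaction column is the product of the corresponding main-effect columns. The helper names (`effect_code`, `interaction`, `design_row`) are ours, and we assume the column order shown in the header of table 7.

```python
from itertools import product


def effect_code(level, n_levels):
    """Effect (deviation) coding: the last level is coded -1 on every column."""
    if level == n_levels:
        return [-1] * (n_levels - 1)
    return [1 if level == k else 0 for k in range(1, n_levels)]


def interaction(a, b):
    """Columns for a two-variable interaction: products of main-effect columns."""
    return [x * y for x in a for y in b]


def design_row(y, g, f):
    """Row of the design matrix for cell (AGE=y, GENDER=g, PHYS=f)."""
    ycode, gcode, fcode = effect_code(y, 3), effect_code(g, 2), effect_code(f, 3)
    return ([1] + ycode + gcode + interaction(ycode, gcode)
            + fcode + interaction(ycode, fcode) + interaction(gcode, fcode))


COLUMNS = ["0", "Y1", "Y2", "G1", "Y1G1", "Y2G1", "F1", "F2",
           "Y1F1", "Y1F2", "Y2F1", "Y2F2", "G1F1", "G1F2"]

# One row per cell of the 3 x 2 x 3 table YGF, keyed by the cell label.
design = {f"{y}{g}{f}": design_row(y, g, f)
          for y, g, f in product((1, 2, 3), (1, 2), (1, 2, 3))}

for label, row in design.items():
    print(label, row)
```

Multiplying such a row by the vector of parameter estimates gives the log of the estimated frequency for that cell.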

This matrix can be modified in order to impose certain relations among the variables. One could, for example, impose equality among the parameters related to YF and GF. This would mean that the different classes of AGE and of GENDER would have the same effect on PHYS. Within the context of our study, this does not make sense.

The next step consists of relating the obtained model to the latent class model developed previously.

3.2. Building a modified LISREL model

We will now consider five different modified LISREL models that fit the data reasonably well. They all include GENDER, AGE, PHYS, COMB_ARM, COMB_JOB, ABROAD and the latent variable with two classes. Several attempts to include the other possible variables in a meaningful way failed and will not be discussed here. The following overview gives the model specification and the graphical representation of those models. Table 8 gives their principal data.

Model   p-value L²   BIC    param   df
G       0.87         9325   44      172
H       0.43         9236   25      191
I       0.78         9230   27      189
J       0.55         9236   26      190
K       0.88         9228   28      188

Table 8. Comparative data for the considered models

As said before, all these models fit. To choose one of them, we looked at the fit, the Bayesian information criterion (BIC) and the number of independent parameters. We chose model K, which has the lowest BIC and the highest p-value and seems a good compromise between fit and parsimony. This is the model we will now interpret.
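The bookkeeping behind table 8 can be verified in a few lines. The sketch below assumes, as in section 2, that the degrees of freedom equal the number of cells of the observed table (G, Y, F, C, J and A cross-classified) minus the number of independent parameters, and then picks the model with the smallest BIC.

```python
from math import prod

# Categories of G, Y, F, C, J and A (Table 1).
N_CELLS = prod([2, 3, 3, 2, 2, 3])

# Model: (BIC, number of independent parameters), copied from Table 8.
MODELS = {"G": (9325, 44), "H": (9236, 25), "I": (9230, 27),
          "J": (9236, 26), "K": (9228, 28)}

# Degrees of freedom as cells minus parameters (assumed convention).
degrees_of_freedom = {name: N_CELLS - params
                      for name, (bic, params) in MODELS.items()}

# Smaller BIC is better.
best_by_bic = min(MODELS, key=lambda name: MODELS[name][0])
print(N_CELLS, degrees_of_freedom, best_by_bic)
```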

First we check whether the link between the indicators and the latent variable is consistent with the model including only the indicators and the latent variable (model B). Table 9 shows the latent and conditional probabilities for model B and for model K.

        Model B           Model K
        X1      X2        X1      X2
P(X)    0.368   0.632     0.350   0.650
C1      0.185   0.833     0.165   0.827
C2      0.815   0.167     0.835   0.173
J1      0.816   0.048     0.831   0.060
J2      0.184   0.952     0.169   0.940
A1      0.263   0.026     0.268   0.029
A2      0.342   0.110     0.343   0.116
A3      0.395   0.864     0.390   0.855

Table 9. Comparison of the latent and conditional probabilities for models B and K

As one can see, the two models yield very similar values. The interpretation we gave for model B therefore still holds for model K: latent class X1 is still defined as "low combat disposition" and class X2 as "high combat disposition".

When considering table 10, where we give the remainder of the latent and conditional probabilities for Model K, we can develop the interpretation further.

        X1      X2
P(X)    0.350   0.650
G1      0.824   0.924
G2      0.176   0.076
Y1      0.230   0.362
Y2      0.441   0.396
Y3      0.329   0.242
F1      0.387   0.255
F2      0.362   0.303
F3      0.251   0.442

Table 10.

We can now see that the model assigns 35% of the applicants to latent class 1 and the remaining 65% to class 2. People belonging to class 1 are mostly male (G1). Compared with class 1, however, the proportion of males is higher still in class 2. The interpretation of the proportions of the two classes for the variable AGE (Y) is less obvious for now (it will become clear when we look at the odds ratios). The people belonging to class X1 tend to have lower results for physical fitness (F).

Looking at the odds, one can make the following statements.

The odds of belonging to the class "high combat disposition" (X2) rather than the class "low combat disposition" (X1) are 2.59 times higher for male than for female applicants (p < 0.0001, exact 95% CI 1.939 to 3.501).

For the variable AGE (Y), the odds ratio between Y2 and Y3 is not significantly different from 1 (OR = 1.221, p = 0.07655). We therefore collapsed the categories Y2 and Y3 for the further computation of the odds ratio. The odds of belonging to the class "high combat disposition" rather than the class "low combat disposition" are 1.900 times higher for the young applicants (Y1: ≤ 18 years) than for the older ones (Y2 or Y3: > 18 years) (p < 0.0001, exact 95% CI 1.555 to 2.322).

For the variable PHYS (F), the odds ratios based on the conditional probabilities can be summarized by table 11.

Comparison   Odds Ratio   Exact 95% CI       p-value OR
F1 - F2      1.270        1.0196 to 1.5825   0.0328
F2 - F3      2.1039       1.6826 to 2.6308   0.0000
F1 - F3      2.6723       2.1285 to 3.3557   0.0000

Table 11.
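The point estimates of these odds ratios follow directly from the conditional probabilities of table 10, because the latent class proportions cancel out of the ratio. The sketch below shows the computation, including the collapsing of categories Y2 and Y3; the function name `odds_ratio` is ours. The confidence intervals and p-values are not reproduced here, since they depend on the underlying frequencies, which this sketch does not use.

```python
# P(category | latent class) for model K, copied from Table 10.
# Each pair is (class 1 = low, class 2 = high combat disposition).
COND = {
    "G": {1: (0.824, 0.924), 2: (0.176, 0.076)},
    "Y": {1: (0.230, 0.362), 2: (0.441, 0.396), 3: (0.329, 0.242)},
    "F": {1: (0.387, 0.255), 2: (0.362, 0.303), 3: (0.251, 0.442)},
}


def odds_ratio(variable, group, reference):
    """Odds of class 2 versus class 1 for `group`, relative to `reference`.

    `group` and `reference` are tuples of category numbers; listing several
    categories collapses them by adding their probabilities.
    """
    low_g = sum(COND[variable][k][0] for k in group)
    high_g = sum(COND[variable][k][1] for k in group)
    low_r = sum(COND[variable][k][0] for k in reference)
    high_r = sum(COND[variable][k][1] for k in reference)
    return (high_g / low_g) / (high_r / low_r)


print("male vs female:", round(odds_ratio("G", (1,), (2,)), 3))
print("Y2 vs Y3:", round(odds_ratio("Y", (2,), (3,)), 3))
print("Y1 vs Y2+Y3:", round(odds_ratio("Y", (1,), (2, 3)), 3))
print("F2 vs F1:", round(odds_ratio("F", (2,), (1,)), 3))
print("F3 vs F2:", round(odds_ratio("F", (3,), (2,)), 3))
print("F3 vs F1:", round(odds_ratio("F", (3,), (1,)), 3))
```

Small differences in the last decimals with respect to the published values are to be expected, since the probabilities in table 10 are rounded to three decimals.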

Interpreting the last row of table 11, for instance, we can say that the odds of belonging to the class "high combat disposition" (X2) rather than "low combat disposition" (X1) are 2.67 times higher for applicants with good physical fitness (F3) than for those with poor physical fitness (F1). Looking at the values of the parameters, one notices the large (but not surprising) effect of gender on physical fitness (λ G1F1 = -1.4460), indicating that males tend not to belong to the low physical fitness class.

To conclude, we can say that especially young males in good physical condition tend to belong to the class of "high combat disposition". This latent disposition explains well the observed answers concerning preference for combat arms, interest in combat jobs and availability for jobs abroad. It does not appear necessary to include variables such as medical fitness, personality scores or intellectual potential to obtain a model that fits the observed data well. It seems possible to infer combat disposition from the variables GENDER, AGE and PHYSICAL FITNESS alone. This study should, however, be repeated on a new data set to confirm these conclusions. The question also arises as to how the particular results of this research can be used, because AGE is by definition not a constant quality of an individual and GENDER is a variable whose use is forbidden by law. On the other hand, combat-disposition is something that is clearly perceptible within military units. It is therefore possible that combat-disposition does exist but is not assessed well by the expression of preferences within the current selection procedure. This would mean that the latent variable we used does not reflect true combat-disposition but merely the combat-disposition professed by applicants.
