Page 46 - Special Topic Session (STS) - Volume 4
STS560 Haniza Yon et al.
level; at the item level, it is called Differential Item Functioning (DIF). We tested
for bias by gender at both levels.
3. Results
Item Fit and Reliability. The results of the scaling analyses of the 15 factors
are summarised in Table 1. Acceptable item MS-Outfit statistics (i.e., Outfit < 1.40;
see Linacre, 2019) were obtained for all but five of the factors. For each of the
remaining five, the misfit was caused by a single misfitting item, and removing
this item did not materially affect the overall person measures (r > 0.96). Table 1
further shows that each of the factors has acceptable reliability (all values > 0.75).
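The Outfit criterion can be illustrated with a minimal sketch. It assumes a dichotomous Rasch model for simplicity (this section does not state which Rasch variant was used), and the person measures, item difficulty, and response patterns below are hypothetical. Outfit mean-square is computed as the mean of the squared standardized residuals for one item.

```python
import math

def rasch_prob(theta, b):
    """Probability of a positive response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def outfit_msq(responses, thetas, b):
    """Outfit mean-square for one item: mean squared standardized residual."""
    total = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        # Squared residual divided by the model variance p(1 - p).
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(responses)

# Hypothetical data: five persons and one item of difficulty 0.0 logits.
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
fitting_pattern = [0, 0, 1, 1, 1]     # responses follow the person ordering
misfitting_pattern = [1, 0, 0, 1, 0]  # the two extreme persons respond unexpectedly

OUTFIT_CUTOFF = 1.40  # threshold quoted in the text

fit_ok = outfit_msq(fitting_pattern, thetas, 0.0)
fit_bad = outfit_msq(misfitting_pattern, thetas, 0.0)
print(fit_ok < OUTFIT_CUTOFF, fit_bad < OUTFIT_CUTOFF)
```

Because the statistic is an unweighted mean, a few highly unexpected responses from persons far from the item's difficulty are enough to push it above the cutoff.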
Figure 1: Raw-to-Logit Transformation for Women and Men: Customer-Service Orientation
[Line plot; horizontal axis ticks 1 to 46, vertical axis ticks -6 to 6; series: Men, Women]
Gender Bias. Table 1 shows that three factors contained at least one item
with statistically significant DIF by gender (i.e., the item parameters differ
between men and women). However, given the large number of pairwise
comparisons made at p < 0.01, this number of significant results is within
what chance alone would produce (binomial distribution, p > 0.20).
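The chance argument can be sketched as a binomial tail probability. The number of comparisons and the number of flagged items are not reported in this section, so `N_TESTS` and `N_FLAGGED` below are hypothetical placeholders; only the 0.01 significance level comes from the text. The sketch also assumes the comparisons are independent.

```python
from math import comb

def binomial_tail(k, n, alpha):
    """P(X >= k) for X ~ Binomial(n, alpha)."""
    below = sum(comb(n, j) * alpha**j * (1 - alpha) ** (n - j) for j in range(k))
    return 1.0 - below

N_TESTS = 300   # hypothetical number of item-level DIF comparisons
N_FLAGGED = 4   # hypothetical number of items flagged as significant
ALPHA = 0.01    # per-comparison significance level from the text

# Probability of seeing at least N_FLAGGED false positives if no item has DIF.
p_chance = binomial_tail(N_FLAGGED, N_TESTS, ALPHA)
print(f"P(at least {N_FLAGGED} flagged by chance) = {p_chance:.3f}")
```

A tail probability well above conventional significance levels means the observed count of flagged items gives no evidence of systematic DIF beyond false positives.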
Most importantly, the gender-related DIF had little effect at the test level,
that is, on the estimation of respondents’ traits. When the raw-to-logit
transformations for men and women were computed separately, the differences
were negligible. For instance, “Customer-Service Orientation” had the largest
number of items showing DIF (two). Yet, as illustrated in Figure 1, the
test-level distortion is negligible; the raw-to-logit curves for men and women
essentially coincide. Very similar graphs were obtained for “Problem-Solving
& Resourcefulness” and “Problem Solving & Decision-Making”, the other two
factors for which items showed statistically significant DIF. Accordingly, we
conclude that the factor estimates were essentially unbiased by gender.
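The comparison of group-specific raw-to-logit transformations can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it uses a dichotomous Rasch model, a hypothetical 10-item scale, and hypothetical item difficulties in which two items are shifted by 0.4 logits in opposite directions for one group. Each raw score is mapped to the logit at which the expected score equals that raw score.

```python
import math

def expected_score(theta, difficulties):
    """Expected raw score at ability theta (sum of Rasch success probabilities)."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)

def raw_to_logit(raw, difficulties, lo=-10.0, hi=10.0, tol=1e-8):
    """Solve expected_score(theta) == raw by bisection.

    Valid for raw scores strictly between 0 and the number of items,
    because the expected score increases monotonically with theta.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_score(mid, difficulties) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical item difficulties (logits) estimated separately by group.
men = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 1.5, 2.0]
women = list(men)
women[2] += 0.4  # hypothetical DIF item, harder for women
women[7] -= 0.4  # hypothetical DIF item, easier for women

# Largest difference between the two curves over all non-extreme raw scores.
max_gap = max(
    abs(raw_to_logit(r, men) - raw_to_logit(r, women))
    for r in range(1, len(men))
)
print(f"largest raw-to-logit difference: {max_gap:.3f} logits")
```

In this toy setup the item-level shifts largely offset each other in the summed expected score, which is one way item-level DIF can leave the test-level transformation nearly unchanged.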
35 | ISI WSC 2019