Page 46 - Special Topic Session (STS) - Volume 4

STS560 Haniza Yon et al.
                  level; at the item level, it is called Differential Item Functioning (DIF). We tested
                  for bias by gender at both levels.

                  3.  Results
                      Item Fit and Reliability. The results of scaling analyses of 15 factors are
                  summarised in Table 1. Acceptable item MS-Outfit statistics (i.e., Outfit < 1.40,
                  see Linacre, 2019) were obtained for all but five of the factors. For each of the
                  remaining five, the misfit was caused by a single misfitting item, and removing
                  this item did not affect the overall person measures (r > 0.96). Table 1 further
                  shows that each of the factors has acceptable reliability (all values > 0.75).
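As a rough illustration of the Outfit criterion, the sketch below computes a mean-square Outfit statistic as the average squared standardized residual for one item. It assumes a dichotomous Rasch model for simplicity, which may not be the model used in the study. The function name and the toy responses, abilities and difficulty are hypothetical; only the 1.40 cut-off comes from the text.

```python
import math

OUTFIT_CUTOFF = 1.40  # threshold quoted in the text (Linacre, 2019)

def outfit_mean_square(responses, abilities, difficulty):
    """Mean of squared standardized residuals for one dichotomous item.

    Assumes a dichotomous Rasch model: P(x=1) = 1 / (1 + exp(-(theta - b))).
    """
    total = 0.0
    for x, theta in zip(responses, abilities):
        p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
        # Squared residual divided by the model variance p(1 - p).
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(responses)

# Hypothetical data: two persons located exactly at the item difficulty.
well_fitting = outfit_mean_square([1, 0], [0.0, 0.0], 0.0)

# Hypothetical data: an able person (theta = 3) unexpectedly fails the item.
misfitting = outfit_mean_square([1, 0, 0], [0.0, 0.0, 3.0], 0.0)

print(well_fitting <= OUTFIT_CUTOFF, misfitting > OUTFIT_CUTOFF)
```

Because each residual is divided by its model variance, a single highly unexpected response can dominate the statistic. That is consistent with misfit in a factor being traceable to one item.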

                   Figure 1: Raw-to-Logit Transformation for Women and Men: Customer-Service Orientation
                   [Figure: logit measures (vertical axis, -6 to 6) plotted against raw scores (horizontal axis, 1 to 46), with separate curves for Men and Women.]


                     Gender Bias. Table 1 shows that three factors contained at least one item
                  with statistically significant DIF by gender (i.e., the item parameters differ
                  between men and women). However, given the large number of pairwise
                  comparisons made at p < 0.01, this many significant results is to be expected
                  by chance alone (Binomial Distribution, p > .20).
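The chance argument can be sketched as a binomial upper-tail probability: if every comparison is tested at the 0.01 level and no DIF exists, the number of significant results follows a Binomial(n, 0.01) distribution. The counts below (250 comparisons, 4 flagged items) are hypothetical placeholders, since the exact figures are not given in this section. Only the 0.01 level is taken from the text.

```python
from math import comb

def binomial_upper_tail(k, n, alpha):
    """Return P(X >= k) for X ~ Binomial(n, alpha)."""
    lower = sum(comb(n, j) * alpha**j * (1 - alpha) ** (n - j) for j in range(k))
    return 1.0 - lower

ALPHA = 0.01          # per-comparison significance level (from the text)
N_COMPARISONS = 250   # hypothetical number of item-level DIF tests
N_FLAGGED = 4         # hypothetical number of significant DIF results

p_chance = binomial_upper_tail(N_FLAGGED, N_COMPARISONS, ALPHA)
print(f"P(at least {N_FLAGGED} significant by chance) = {p_chance:.3f}")
```

Under these assumed counts, the probability of seeing that many significant results by chance is well above conventional significance levels. That is the sense in which a handful of flagged items among many comparisons is unremarkable.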
                     Most importantly, the gender-related DIF had little effect at the test level,
                  that is, on the estimation of respondents’ traits. When the raw-to-logit
                  transformations for men and women were computed separately, the differences
                  were negligible. For instance, “Customer-Service Orientation” had the largest
                  number of items showing DIF (two). Yet, as illustrated in Figure 1, the
                  test-level distortion is negligible; the raw-to-logit curves for men and women
                  essentially coincide. Very similar graphs were obtained for “Problem-Solving
                  & Resourcefulness” and “Problem Solving & Decision-Making”, the other two
                  factors for which items showed statistically significant DIF. Accordingly, we
                  conclude that the factor estimates were essentially unbiased by gender.
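The comparison of group-specific raw-to-logit transformations can be sketched as follows. This is a minimal illustration, not the analysis reported above. It assumes a dichotomous Rasch model, and the 20 item difficulties and the 0.4-logit DIF shift on two items are hypothetical. For each raw score, the logit measure is the ability at which the expected score equals that raw score, found here by Newton-Raphson.

```python
import math

def raw_to_logit(raw_score, difficulties, tol=1e-8, max_iter=100):
    """Ability (in logits) whose expected score equals raw_score.

    Assumes a dichotomous Rasch model; extreme scores have no finite estimate.
    """
    n = len(difficulties)
    if not 0 < raw_score < n:
        raise ValueError("raw score must lie strictly between 0 and the item count")
    theta = math.log(raw_score / (n - raw_score))  # starting value
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        expected = sum(probs)
        information = sum(p * (1.0 - p) for p in probs)
        step = (raw_score - expected) / information
        step = max(-1.0, min(1.0, step))  # damp large steps for stability
        theta += step
        if abs(step) < tol:
            break
    return theta

# Hypothetical item difficulties: 20 items evenly spaced from -2 to 2 logits.
men = [-2.0 + 4.0 * i / 19 for i in range(20)]

# Hypothetical DIF: two items are 0.4 logits harder for women.
women = list(men)
women[8] += 0.4
women[12] += 0.4

max_gap = max(
    abs(raw_to_logit(r, women) - raw_to_logit(r, men)) for r in range(1, 20)
)
print(f"Largest difference between the two curves: {max_gap:.3f} logits")
```

In this toy setting, DIF on two of twenty items moves the score-to-measure curve only slightly, because each item contributes a small share of the total test information.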

                                                                      35 | I S I   W S C   2 0 1 9