Page 366 - Invited Paper Session (IPS) - Volume 2

IPS279 Rense Lange
                  Figure 2: PAIRS algorithm applied to 2500 simulated items and 25 simulated persons
                  (see text).

                     Illustration. Simulations indicate that PAIRS works surprisingly well for
                  binary items. For instance, 2500 item difficulty parameters Di were drawn from
                  a normal distribution with M = -0.5 and SD = 1.5 (i.e., N(-0.5, 1.5)). Also,
                  twenty-five person trait level parameters Tj were obtained by drawing from
                  N(0, 1). Equation 5 was then applied to the resulting 25 x 2500 matrix of 0s
                  and 1s to recover the Di. It can be seen in Figure 2 above that this unusually
                  small number of respondents sufficed to recover the 2500 Di with considerable
                  accuracy (r = 0.92, SE = 0.009), albeit with poor results for extreme values of
                  D. Bringing SE below 0.001 required just 50 simulated persons (r = 0.98).
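
                  The simulation design described above can be sketched in a few lines of
                  Python. This is only an illustrative sketch under stated assumptions: it
                  assumes a dichotomous Rasch model for generating the responses, it relies on
                  NumPy, and the function names are hypothetical. Because Equation 5 is not
                  reproduced here, the recovery step uses a crude stand-in (the corrected log-odds
                  of an incorrect response per item), not the PAIRS algorithm itself, so the
                  resulting correlation should not be compared with the values reported above.

```python
import numpy as np


def simulate_binary_responses(n_items=2500, n_persons=25, seed=0):
    """Draw item and person parameters and simulate a persons x items 0/1 matrix.

    Assumption: responses follow a dichotomous Rasch model,
    P(X = 1) = 1 / (1 + exp(-(T_j - D_i))).
    """
    rng = np.random.default_rng(seed)
    D = rng.normal(-0.5, 1.5, size=n_items)    # item difficulties, N(-0.5, 1.5)
    T = rng.normal(0.0, 1.0, size=n_persons)   # person trait levels, N(0, 1)
    p = 1.0 / (1.0 + np.exp(-(T[:, None] - D[None, :])))
    X = (rng.random(p.shape) < p).astype(int)  # 25 x 2500 matrix of 0s and 1s
    return D, T, X


def logit_stand_in(X):
    """Stand-in difficulty estimate (NOT Equation 5 / PAIRS).

    Uses the log-odds of an incorrect answer per item, with a 0.5
    continuity correction so all-0 or all-1 items stay finite.
    """
    n_persons = X.shape[0]
    n_correct = X.sum(axis=0)
    return np.log((n_persons - n_correct + 0.5) / (n_correct + 0.5))


D, T, X = simulate_binary_responses()
D_hat = logit_stand_in(X)
r = np.corrcoef(D, D_hat)[0, 1]  # agreement between true and estimated Di
print(f"response matrix: {X.shape}, r(D, D_hat) = {r:.2f}")
```

                  Replacing logit_stand_in with an implementation of Equation 5, and rerunning
                  with n_persons=50, would mirror the comparison made in the text.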

                  Figure 3: Frequency (F) tables of item x rating pairs (left) and rater raw sums (right).


                                Item 1    Item 2    Item 3                      Rater 1  Rater 2  Rater 3  Rater 4
                                0    1    0    1    0    1                      x-1      x-1      x-1      x-1
                  Item 1    1                            1    D11   Rater 1  x  0        13       5        7        S1
                            2                                 D12   Rater 2  x  3        0        3        2        S2
                  Item 2    1                                 D21   Rater 3  x  15       3        0        2        S3
                            2        1                   1    D22   Rater 4  x  8        14       0        0        S4
                  Item 3    1                                 D31
                            2                                 D32

                     Raters and Rating Scales. Equation 1 allows for the use of raters and rating
                  scale response formats with more than two categories. The following
                  simulations assume that three three-category rating scales are used and that
                  half of the respondents are rated by two raters and the others by a single
                  one. The only side condition (to be discussed later) is that the ratings are
                  connected in a graph-theoretical sense. Space limitations prevent deriving the
                  equations for the general case, analogous to Equations 2 through 5. Instead,
                  the algorithms below are supported by computer simulations rather than

                                                                     353 | ISI WSC 2019