Page 368 - Invited Paper Session (IPS) - Volume 2

IPS279 Rense Lange
                         different processors thereby decreasing computation time.
                        The parameters’ SEs can be obtained via bootstrapping (at least 25
                         samples are required [8]).
                        Updating the person and rater frequency matrices (see Figure 3) shares
                         no computational steps. To save execution time, these updates can
                         thus be done simultaneously using parallel processing.
                        The computation of the items’ difficulty parameters proceeds
                         analogously, and yields $D_{i1} = D_i + F_1$, $D_{i2} = D_i + F_2$ as row means (not
                         shown). The convention that $\sum_i D_i = 0$ identifies the $D_i$ and the $F_w$.
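
The bootstrap step in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: `bootstrap_se` and `estimator` are hypothetical names, the column-mean estimator is only a stand-in for the actual parameter estimation routine, and resampling whole rows (e.g., students) is an assumption. The default of 25 replicates follows the minimum quoted from [8].

```python
import numpy as np

def bootstrap_se(data, estimator, n_boot=25, seed=0):
    """Bootstrap standard errors of estimator(data).

    data      : 2-D array, one row per resampling unit (assumed: students)
    estimator : function mapping a data array to a 1-D parameter vector
    n_boot    : number of bootstrap samples (at least 25, per the text)
    """
    rng = np.random.default_rng(seed)
    n_rows = data.shape[0]
    replicates = []
    for _ in range(n_boot):
        # Draw n_rows row indices with replacement and re-estimate.
        idx = rng.integers(0, n_rows, size=n_rows)
        replicates.append(estimator(data[idx]))
    # The SE is the standard deviation of the estimates across replicates.
    return np.std(np.asarray(replicates), axis=0, ddof=1)

# Stand-in example: 400 simulated "students", 2 columns, column means as estimator.
demo_rng = np.random.default_rng(1)
demo_data = demo_rng.normal(size=(400, 2))
se = bootstrap_se(demo_data, lambda x: x.mean(axis=0))
```

Because the replicates are independent of one another, they could also be distributed across processors.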

                  Figure 4: $F^2$ (left), $R$ (middle) and $\log(R)$ (right) matrices for the rater data in the
                  right-hand side of Figure 3.







                     Zero Entries. Above we assumed that the frequency matrices in Figure 3
                  contain no zeros, but this is not actually the case. However, raising F to the
                  next power (i.e., $F^{p+1} = F^p \cdot F$) corrects this by linking indirectly connected
                  pairs of items or raters, thereby decreasing the number of zeros [8, 9]. It is
                  almost always necessary to raise F to the second power, but higher powers are
                  rarely needed. For instance, the hypothetical rater frequency matrix in the
                  right-hand side of Figure 3 contains structural zeros along the diagonal, as
                  well as an additional zero in row 3 and col 3. As is illustrated by the left table in
                  Figure 4, raising this matrix to the second power causes all zeros to disappear.
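
The effect of raising a frequency matrix to a higher power can be illustrated with a small sketch. The matrix below is hypothetical (it is not the matrix in Figure 3), and `raise_until_positive` is an illustrative helper name rather than anything defined in the paper. The matrix has zeros on the diagonal plus one pair of raters who were never compared directly.

```python
import numpy as np

def raise_until_positive(F, max_power=5):
    """Multiply by F repeatedly (F^(p+1) = F^p . F) until no zero entries remain.

    Returns the resulting matrix and the power p that was reached.
    """
    F = np.asarray(F, dtype=float)
    Fp, power = F.copy(), 1
    while (Fp == 0).any() and power < max_power:
        Fp = Fp @ F          # matrix product, not elementwise
        power += 1
    return Fp, power

# Hypothetical 4-rater frequency matrix: zero diagonal, and raters 2 and 3
# (1-based) share no direct comparisons.
F = np.array([
    [0, 3, 2, 1],
    [2, 0, 0, 4],
    [1, 0, 0, 2],
    [3, 1, 2, 0],
])

F_connected, power_used = raise_until_positive(F)
zeros_before = int((F == 0).sum())
zeros_after = int((F_connected == 0).sum())
```

In this example the second power already suffices, since every pair of raters is linked through at least one intermediate rater.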
                     Computational Effort For Parameter Updates. The computational effort
                  needed to obtain actual parameter estimates is small. In particular, creating
                  the R and log(R) matrices for items and raters requires computational effort
                  proportional to $(n_{items} \times n_{steps})^2$ and $n_{raters}^2$, respectively. In practice,
                  $30 > n_{raters} > n_{items} \times n_{steps}$, and processing matrices of this size requires
                  negligible effort.
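
To make the quadratic cost concrete, the following sketch builds R and log(R) from a strictly positive frequency matrix. It rests on an assumption that should be checked against the paper's own definition: R is taken here to be the elementwise ratio of each entry to its transposed counterpart, as is common in pairwise estimation, and the row means of log(R) are taken as the parameter estimates. The matrix values and variable names are hypothetical.

```python
import numpy as np

# Hypothetical strictly positive frequency matrix for 4 raters
# (e.g., a frequency matrix after it has been raised to the second power).
F2 = np.array([
    [11.0,  1.0,  2.0, 16.0],
    [12.0, 10.0, 12.0,  2.0],
    [ 6.0,  5.0,  6.0,  1.0],
    [ 4.0,  9.0,  6.0, 11.0],
])

# Assumed definition: R[i, j] = F2[i, j] / F2[j, i]. Each of the n^2 entries
# is visited once, so the work grows with the square of the matrix size.
R = F2 / F2.T
log_R = np.log(R)

# Row means of log(R). Under the assumed definition log(R) is antisymmetric,
# so these estimates sum to zero automatically.
row_means = log_R.mean(axis=1)
```

With fewer than 30 rows and columns, these elementwise operations involve at most a few hundred entries.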
                     PROX, JMLE [3], or Conditional Maximum Likelihood approaches [1] can
                  all be used to estimate students’ $T_j$. This can be implemented efficiently as a
                  “raw-sum to logit” lookup table for a neutral rater ($S_k = 0$), adjusting the table
                  entry by each rater’s estimated severity and averaging across raters.
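
The lookup-table idea can be sketched as follows. This is one possible reading, not the author's implementation. It assumes a rating scale model in which the step logit is T - D_i - F_w - S_k, so that a rater's severity is added back to the neutral-rater table entry. The step values, the function names, and the bisection used to invert the expected raw sum are all illustrative assumptions. Extreme raw sums (0 and the maximum) have no finite estimate and are left out of the table.

```python
import numpy as np

def expected_raw_sum(theta, difficulties, steps):
    """Expected raw sum across items for a neutral rater (S_k = 0)."""
    total = 0.0
    for d in difficulties:
        # Category logits 0, (theta-d-F1), (theta-d-F1)+(theta-d-F2), ...
        logits = np.concatenate(([0.0], np.cumsum(theta - d - np.asarray(steps))))
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        total += float(np.dot(np.arange(len(probs)), probs))
    return total

def build_lookup(difficulties, steps, lo=-10.0, hi=10.0):
    """Map each non-extreme raw sum to the logit whose expected raw sum matches it."""
    max_sum = len(difficulties) * len(steps)
    table = {}
    for raw in range(1, max_sum):
        a, b = lo, hi
        for _ in range(60):  # bisection; the expected raw sum increases with theta
            mid = 0.5 * (a + b)
            if expected_raw_sum(mid, difficulties, steps) < raw:
                a = mid
            else:
                b = mid
        table[raw] = 0.5 * (a + b)
    return table

def estimate_ability(raw_sums, severities, table):
    """Adjust each rater's table entry by that rater's severity, then average."""
    return float(np.mean([table[r] + s for r, s in zip(raw_sums, severities)]))

# Item difficulties from the simulation; the step values are hypothetical.
table = build_lookup(difficulties=[-0.5, 0.5], steps=[-1.0, 0.0, 1.0])
T_hat = estimate_ability(raw_sums=[3, 4], severities=[0.5, -0.2], table=table)
```

The table is built once per test form, after which each student estimate costs only a few lookups and an average.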

                  3.  Simulation Results
                      Jupyter iPython 3.6.5 was used to simulate the grading of student essays
                  written for tests of reading comprehension. Hypothetical essays were graded
                  for “completeness” and “style”, and these items had simulated difficulties $D_i = -0.5$
                  and $0.5$, respectively (see Equation 1). As in Figure 1, each item used a


                                                                     355 | I S I   W S C   2 0 1 9