Page 367 - Invited Paper Session (IPS) - Volume 2
IPS279 Rense Lange
formal proof or derivations. Those interested in a more complete presentation
should consult [8].
Rating scales are modelled analogously to the binary case. Figure 3 shows
that frequencies are collected in a matrix F with sides nitems · nsteps1, where
nitems is the number of items, nsteps denotes the number of answer
categories, and nsteps1 = nsteps – 1. Higher ratings (i.e., 1, ..) occur along
the rows, whereas lower values occur along the columns (…, nsteps1). Each
new rating has to be compared to all others, requiring a total of
nitems(nitems – 1) / 2 comparisons. Realistic applications rarely involve more
than five rating scales, which requires at most 10 numerical comparisons. As an
example, assume that the very first observations to be added to the zero
matrix F are 0, 2, and 1 for items 1, 2, and 3, respectively. In this case the
item table is updated as shown by the 1’s in the left table of Figure 3. Later
observations are added cumulatively as grading progresses.
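The pairwise comparison of one response vector can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `pairwise_comparisons` is hypothetical, tied pairs are skipped by assumption, and the sketch only lists which item received the higher rating in each pair. It does not reproduce the cell layout of F in Figure 3.

```python
from itertools import combinations

def pairwise_comparisons(ratings):
    """Enumerate every item pair in one response vector.

    ratings[k] is the category score of item k + 1.
    Returns (higher_item, lower_item, higher_rating, lower_rating) tuples.
    Tied pairs are skipped (an assumption of this sketch).
    """
    out = []
    for (i, ri), (j, rj) in combinations(enumerate(ratings, start=1), 2):
        if ri > rj:
            out.append((i, j, ri, rj))
        elif rj > ri:
            out.append((j, i, rj, ri))
    return out

# The example from the text: ratings 0, 2, 1 for items 1, 2, 3.
pairs = pairwise_comparisons([0, 2, 1])
for higher, lower, r_hi, r_lo in pairs:
    print(f"item {higher} ({r_hi}) rated above item {lower} ({r_lo})")
```

With three items the loop visits 3(3 – 1)/2 = 3 pairs, and with five items it visits 10.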
Computing the rater severity parameters requires updating a second F
matrix of size nrat × nrat (see Figure 3, right side), where nrat denotes the
total number of raters. Here the focus is on the sum of the ratings. In
particular, whenever the same person is rated by a second (or 3rd, 4th,
…) rater, the sum of the latest ratings is compared to the earlier summed
ratings. The table then simply tallies, by rater, the number of times one of
the sums exceeds the other by exactly one point.
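A minimal sketch of this tally is given below, under stated assumptions: the class `RaterTable` and its attribute names are hypothetical, raters are indexed from 0, and the row is taken to be the rater with the higher sum (mirroring the item table's convention, which the text does not confirm for raters). Earlier sums are kept in a dictionary keyed by person, so only that person's previous ratings are inspected.

```python
from collections import defaultdict

class RaterTable:
    """Running nrat x nrat tally of one-point differences in rating sums."""

    def __init__(self, nrat):
        self.F = [[0] * nrat for _ in range(nrat)]
        # person id -> list of (rater index, rating sum) seen so far
        self.seen = defaultdict(list)

    def add(self, person, rater, ratings):
        total = sum(ratings)
        # Compare the new sum with every earlier sum for the same person.
        for other, other_total in self.seen[person]:
            if total - other_total == 1:
                self.F[rater][other] += 1   # new rater is one point higher
            elif other_total - total == 1:
                self.F[other][rater] += 1   # earlier rater is one point higher
        self.seen[person].append((rater, total))

table = RaterTable(nrat=3)
table.add("p1", 0, [0, 2, 1])  # sum 3
table.add("p1", 1, [1, 2, 1])  # sum 4, exceeds rater 0 by exactly one
table.add("p1", 2, [2, 2, 2])  # sum 6, no one-point difference
print(table.F)
```

Differences other than exactly one point leave the table unchanged, as in the third call above.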
Item biases can be studied by computing group-specific rating x categories
F matrices (see below).
Computational Effort During Data Collection. Because the number of
test takers can be very large, updating the rater table is far more
computationally expensive overall than updating the item table. That is,
one has to keep track of doubly rated students, and their rating sums have
to be compared across raters. In the worst case, each new case may require
inspecting all previously processed test takers. Assuming that npers students
have already been graded, this could potentially require npers(npers – 1)/2
comparisons. The use of random access techniques can reduce this number.
Given the frequency matrices, the following steps are required to obtain actual
parameter estimates:
The ratios of the off-diagonal elements are required. For instance, the rater
matrix F in Figure 3 yields the ratios Rij = Fji / Fij shown in the centre
table of Figure 4. See below for dealing with zeros.
Taking the logarithms of each entry in R yields the matrix log(R) shown
on the right of Figure 4.
The row-means of log(R) represent the severity parameters Sk
(rightmost Sk column of Figure 4).
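The three steps above can be sketched in a few lines. This is an illustration under assumptions, not the paper's code: the function name `severity_estimates` is hypothetical, the ratio is taken as F[j][i] / F[i][j], row means are computed over off-diagonal entries only, and pairs with a zero count are simply skipped as a placeholder for the paper's own zero handling. The counts in `F` are made up for the example.

```python
import math

def severity_estimates(F):
    """Row means of log(F[j][i] / F[i][j]) over usable off-diagonal pairs."""
    n = len(F)
    S = []
    for i in range(n):
        logs = []
        for j in range(n):
            # Skip the diagonal and any pair with a zero count (placeholder).
            if i == j or F[i][j] == 0 or F[j][i] == 0:
                continue
            logs.append(math.log(F[j][i] / F[i][j]))
        S.append(sum(logs) / len(logs) if logs else float("nan"))
    return S

# Made-up tallies for three raters (not the values of Figure 3).
F = [[0, 4, 8],
     [2, 0, 6],
     [1, 3, 0]]
S = severity_estimates(F)
print([round(s, 3) for s in S])
```

Because log(Rij) = –log(Rji), the estimates sum to zero whenever every pair of raters has non-zero counts in both directions.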
Updating the D and S frequency matrices is computationally
independent. Thus, these updates can be performed in parallel using
354 | I S I W S C 2 0 1 9