Page 363 - Invited Paper Session (IPS)

Page 363 - Invited Paper Session (IPS) - Volume 2


IPS279 Rense Lange
service reviews [2]. Because essay writing and evaluation procedures are
familiar to nearly everyone, I use essay-grading terminology in what follows.
Excellent software already exists to estimate MFRS models and to assess
model fit [4, 5]. However, this software is batch-oriented, intended for use
after data collection is complete, whereas in practice it is often desirable to
identify and correct problems much earlier. For instance, in a project that
required timely grading of some 150,000 student essays [6], MFRS was used to
identify poorly performing graders. Yet model fitting could not be completed
before graders had already started their next working day, which greatly
limited its usefulness. Similar concerns play a role in other contexts [2].
This paper grew out of the OBJECTIVE project, which aims to fit MFRS models
online in "real time" (at least in an essay grading context) while providing
useful quality control information. The present paper focuses on parameter
estimation via the PAIRS approach [7]; issues of quality control and model fit
are outside its scope. The basics of the approach are first discussed in some
detail [7, 8], followed by the results of several computer simulations aimed
at uncovering the approach's strengths and weaknesses for applications like
OBJECTIVE.

2. Methodology
Essay grading is characterized by four sets of parameters [2]: (1) $D_i$, the
difficulty of item, question, or problem $i$; (2) $T_j$, the ability, capability,
or trait level of person $j$, who answers the questions; and (3) $S_k$, the
severity of rater or evaluator $k$, who assigns an (ordinal) grade to the answer
or product. Raters' judgments typically take the form of a rating scale, i.e.,
raters choose one option out of a set of ordered response categories.
Accordingly, a fourth set of "step" parameters $\{F_w\}$ represents the points
at which ratings $w$ and $w-1$ are equally probable. Together, these parameters
define the following probabilistic response model (see, e.g., [3]):

$$\log\left(\frac{P_{ijkw}}{P_{ijk(w-1)}}\right) = T_j - D_i - S_k + F_w \tag{1}$$

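where $P_{ijkw}$ is the probability that rater $k$ assigns rating $w$ to person
$j$'s answer to item $i$, so that the left side of Equation 1 is the log-odds of
observing rating $w$ rather than $w-1$. Ratings are scored $w = 0, 1, \ldots$,
and Equation 1 applies for $w \geq 1$. Note that $T$, $D$, $S$, and $F$ are all
expressed in the same metric, namely the log-odds (logits) defined by the left
side of Equation 1.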
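As a minimal sketch (not the author's code or the PAIRS estimator), Equation 1 can be turned into the probability of each rating category. Summing the adjacent-category log-odds gives $\log(P_{ijkw}/P_{ijk0}) = \sum_{h=1}^{w} (T_j - D_i - S_k + F_h)$, which is then normalized over $w = 0, \ldots, m$. The function names `category_probabilities` and `simulate_rating`, the storage of $F_1, \ldots, F_m$ as a Python list, and the assumption of $m + 1$ categories scored $0, \ldots, m$ are all hypothetical choices for illustration. The sign convention $+F_w$ is kept exactly as written in Equation 1.

```python
import math
import random
from typing import Sequence


def category_probabilities(T: float, D: float, S: float,
                           F: Sequence[float]) -> list[float]:
    """Return P(rating = w) for w = 0..len(F) under Equation 1.

    Hypothetical helper: F[h - 1] holds step parameter F_h for h = 1..m.
    """
    eta = T - D - S
    # Cumulative logits log(P_w / P_0): zero for w = 0, then add eta + F_h per step.
    cumulative = [0.0]
    for f in F:
        cumulative.append(cumulative[-1] + eta + f)
    # Subtract the maximum before exponentiating to avoid overflow.
    top = max(cumulative)
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_rating(T: float, D: float, S: float, F: Sequence[float],
                    rng: random.Random) -> int:
    """Draw one rating from the model, e.g. for a simulation study (hypothetical)."""
    probs = category_probabilities(T, D, S, F)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Illustrative values only: T - D - S = 0.3, three steps -> categories 0..3.
    probs = category_probabilities(T=1.0, D=0.5, S=0.2, F=[1.0, 0.0, -1.0])
    for w in range(1, len(probs)):
        # Each adjacent log-odds should equal T - D - S + F_w.
        print(w, math.log(probs[w] / probs[w - 1]))
    print(simulate_rating(1.0, 0.5, 0.2, [1.0, 0.0, -1.0], random.Random(0)))
```

With these illustrative values, the printed adjacent-category log-odds should reproduce $T_j - D_i - S_k + F_w$, that is, 1.3, 0.3 and -0.7 up to floating-point error. This confirms that the probabilities are consistent with Equation 1. Working with cumulative logits and shifting by their maximum is one common way to keep the exponentials numerically stable.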

350 | ISI WSC 2019
