Page 363 - Invited Paper Session (IPS) - Volume 2

IPS279 Rense Lange
            service reviews [2]. Because essay writing and evaluation procedures are
            familiar to nearly everyone, I use essay grading terminology throughout the
            following.
                There already exists excellent software to estimate MFRS models and to
            assess model fit [4, 5]. However, this software is batch-oriented, for use when
            data collection is completed. Yet, in practice it is often desirable to identify
            and correct problems much earlier. For instance, in a project that required
            timely grading of some 150,000 student essays [6], MFRS was used to identify
            poorly performing graders. However, model fitting could not be completed
            before graders had already started their next working day, which greatly
            limited its usefulness. Similar concerns play a role in other contexts [2].
                This paper grew out of the OBJECTIVE project, which aims to fit MFRS
            models online in “real time”, at least in an essay grading context, while
            providing useful quality control information. The current emphasis is on
            parameter estimation via the PAIRS approach [7]; issues related to quality
            control and model fit are outside the scope of this paper. The basics of the
            approach are discussed in some detail [7, 8], followed by a presentation of the
            results of various computer simulations aimed at discovering the approach’s
            strengths and weaknesses for applications like OBJECTIVE.

            2.  Methodology
                Essay grading is characterized by four sets of parameters [2]: (1) Di, the
            difficulty of item, question, or problem i; (2) Tj, the ability, capability, or
            trait level of person j answering the questions; and (3) Sk, the severity of
            rater / evaluator k, who assigns an (ordinal) grade to evaluate the answer or
            product. Raters’ judgments typically take the form of a rating scale, i.e.,
            raters choose one option out of a set of ordered response categories.
            Accordingly, a fourth set of “step” parameters {Fw} is used to represent the
            points at which ratings w and w-1 occur with equal probability. Together,
            these parameters define the following probabilistic response model (see,
            e.g., [3]):

                \log\left( P_{ijkw} / P_{ijk(w-1)} \right) = T_j - D_i - S_k + F_w \qquad (1)

            where Pijkw represents the probability that rater k assigns rating w to the
            answer of person j to item i, with w = 0, 1, ... . The left side of Equation 1
            is thus the log-odds of observing a rating equal to w rather than (w-1), and
            it is defined for w ≥ 1. Notice that T, D, S, and F are all expressed in the
            same metric, i.e., log-odds (or logits).
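
To make Equation 1 concrete, the sketch below turns the adjacent-category log-odds into a full set of rating probabilities. It relies only on the fact that, if each log-odds of w versus (w-1) equals Tj – Di – Sk + Fw, then the probability of rating w is proportional to the exponential of the sum of those terms up to w, with category 0 as the reference. The function name `category_probabilities`, its argument names, and the numeric values are illustrative assumptions and are not taken from the paper or from the PAIRS software. The sign of Fw follows Equation 1 as printed.

```python
import math


def category_probabilities(T_j, D_i, S_k, F):
    """Return [P_0, ..., P_m] implied by Equation 1 (illustrative sketch).

    T_j, D_i, S_k are in logits; F is the list of step parameters
    [F_1, ..., F_m]. Category 0 is the reference and has no step.
    """
    # log(P_w / P_0) is the running sum of (T_j - D_i - S_k + F_v) for v <= w.
    log_numerators = [0.0]
    for F_w in F:
        log_numerators.append(log_numerators[-1] + (T_j - D_i - S_k + F_w))

    # Normalise after subtracting the maximum to avoid overflow in exp().
    peak = max(log_numerators)
    weights = [math.exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values: one person, one item, one rater, three categories.
probs = category_probabilities(T_j=1.0, D_i=0.2, S_k=0.3, F=[0.5, -0.5])
```

Taking `math.log(probs[w] / probs[w - 1])` for any w ≥ 1 recovers Tj – Di – Sk + Fw, which is one way to check that the probabilities agree with the model.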





                                                               350 | I S I   W S C   2 0 1 9