Re: potentially dumb question re: psychometrics
- Subject: Re: potentially dumb question re: psychometrics
- From: William Cala <wcala@ROCHESTER.RR.COM>
- Date: Sun, 31 Mar 2002 21:11:38 -0500
- Reply-to: Assessment Reform Network Mailing List <ARN-L@LISTS.CUA.EDU>
- Sender: Assessment Reform Network Mailing List <ARN-L@LISTS.CUA.EDU>
George,
Remembering that I'm not a psychometrician, how does IRT detect bias?
BC
----- Original Message -----
From: "George Cunningham" <gkc@LOUISVILLE.EDU>
To: <ARN-L@listsrva.CUA.EDU>
Sent: Sunday, March 31, 2002 8:12 PM
Subject: Re: potentially dumb question re: psychometrics
> Teresa,
>
> See below.
>
> George Cunningham
> University of Louisville
> ----- Original Message -----
> From: Teresa or J Glenn <jtglenn@CAROLINA.NET>
> To: <ARN-L@LISTS.CUA.EDU>
> Sent: Sunday, March 31, 2002 6:45 PM
> Subject: potentially dumb question re: psychometrics
>
>
> > Don't ask why but I am reading the technical manual for the NC EOGs this
> > weekend... I really don't have anything else to do, what with *only* 30
> > take home essay tests to grade and an entire poetry unit to plan and all
> > that........
> >
> > At any rate, here I am, a minimally mathematically inclined English
> > teacher trying to remember what I can of college calculus to understand
> > this manual, and I have several questions.
> >
> > (1) What are the relative merits of "classic measurement analyses" and
> > "item response theory analyses"? I understand what they *are* (there's an
> > okay explanation in the manual after I got past the "point-biserial
> > correlation" part)-- but what can each one *do* and what *can't* they do?
>
> Item response theory is very complicated. Even at its simplest level it is
> more complex than classical scoring because, instead of having each correct
> item count as one point, as is usually done, with IRT items are given a
> range of values depending on their item characteristics. This is very
> useful because it can be used to equate tests horizontally, so that the
> scores from two different forms of a test mean the same thing, or so that
> different forms in succeeding years can be equated to have the same
> meaning. It also facilitates other tasks such as conducting DIF
> (differential item functioning) analyses to determine whether any items are
> biased. There are two distinct schools of thought about the correct way of
> conducting IRT. First, there is the one-parameter model or Rasch method,
> which is used for example by Harcourt-Brace; second, the three-parameter
> model favored by McGraw-Hill and ETS. If McGraw-Hill is doing your tests
> they are using the 3-parameter model, which I think is the better of the
> two. Riverside does not use IRT for the ITBS. Each model is based on a
> separate theory of test characteristics and the debate about the two can be
> as acrimonious as any other religious debate.
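
For readers who want to see what "one parameter" versus "three parameter" means in practice, here is a minimal sketch assuming the standard logistic forms of the two models. The function names and the example item are illustrative only and are not taken from the NC manual or from any publisher's software; the 1.7 scaling constant that some publishers include is left out.

```python
import math

def prob_correct_3pl(theta, a, b, c):
    """Three-parameter logistic model: chance of a correct answer.

    theta: examinee ability, a: item discrimination, b: item difficulty,
    c: lower asymptote (the "guessing" parameter).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def prob_correct_rasch(theta, b):
    """One-parameter (Rasch) model: items differ only in difficulty."""
    return prob_correct_3pl(theta, a=1.0, b=b, c=0.0)

# Hypothetical item of average difficulty with four answer choices (c = 0.25).
for theta in (-2.0, 0.0, 2.0):
    rasch = prob_correct_rasch(theta, b=0.0)
    three_pl = prob_correct_3pl(theta, a=1.0, b=0.0, c=0.25)
    print(theta, round(rasch, 3), round(three_pl, 3))
```

Under the Rasch form a very weak student's chance of a correct answer falls toward zero, while under the three-parameter form it levels off at c. An IRT-based DIF check, roughly speaking, asks whether an item's curve differs between groups of students who have the same ability.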
>
> > (2) Explain p-value again. I know we've done it before, but humor me. Is
> > a p-value of .83 good? How about .380? Those are the highest and lowest
> > p-values listed for the math/ reading EOGs.
>
> The p-value is the proportion of students correctly responding to an item.
> The goal of publishers of norm-referenced tests is high coefficient alpha
> reliability. This is achieved with high item-total correlations (a more
> correct term than point-biserial correlations. Publishers prefer the latter
> for some reason but they are algebraically equivalent). The magnitude of a
> correlation is a function of variability, and variability can be maximized
> by the correct p-value. There is a formula for determining this, but for a
> four-choice multiple choice test, the ideal is an average p-value of .625,
> halfway between the chance level of .25 and a perfect 1.0. As has been
> discussed here at length, this only applies to NRTs. This does not mean
> that every item has to have this p-value, but they need to average this and
> not be too much higher or lower.
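
The quantities in that answer are easy to compute by hand. Below is a minimal sketch with a made-up response matrix (six students, three items, 1 = correct). The data and function names are hypothetical; the "ideal" value uses the common rule of thumb of the midpoint between chance and a perfect score, which is an assumption about the formula being referred to.

```python
import math

def item_p_values(responses):
    """Proportion of students answering each item correctly."""
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_students
            for j in range(n_items)]

def ideal_p_value(n_choices):
    """Midpoint between the chance level (1 / n_choices) and 1.0."""
    return (1.0 + 1.0 / n_choices) / 2.0

def pearson(x, y):
    """Ordinary Pearson correlation, used here as the item-total correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def point_biserial(item, total):
    """Textbook point-biserial formula, using the population SD of totals."""
    n = len(item)
    p = sum(item) / n
    right = [t for i, t in zip(item, total) if i == 1]
    wrong = [t for i, t in zip(item, total) if i == 0]
    mean_total = sum(total) / n
    sd = math.sqrt(sum((t - mean_total) ** 2 for t in total) / n)
    return ((sum(right) / len(right) - sum(wrong) / len(wrong)) / sd
            * math.sqrt(p * (1.0 - p)))

# Hypothetical data: rows are students, columns are items.
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0],
             [1, 1, 0], [0, 0, 0], [1, 0, 1]]
totals = [sum(row) for row in responses]
item2 = [row[1] for row in responses]

print(item_p_values(responses))
print(ideal_p_value(4))
print(pearson(item2, totals), point_biserial(item2, totals))
```

The last line prints the same number twice (up to rounding), which is the algebraic equivalence mentioned above. Note that the total score here includes the item itself.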
>
> > (3) There are several pages purporting to be an "evaluation" of the
> > "validity" of the tests. These are ratings done by reviewers who,
> > according to this manual, first took the tests themselves, then commented
> > on them. They rated the tests on a scale of 1-4, 4 being a superior
> > degree. The parameters measured were: curricular validity, instructional
> > validity, item quality, test item bias, and one best answer. This isn't
> > any kind of objective test of validity, is it? On top of that, the
> > highest rating *any* test in grades 3-8 got on any of the parameters was
> > 3.3 and that was grade 3 curricular validity. Grade 4 item quality got a
> > 1.9 rating. But there's no "scientific" premise behind having groups of
> > reviewers rate the tests, is there?
>
> This is validity only in the fervid imagination of the publisher. Opinions
> would hardly be considered scientific evidence for validity.
>
> > (4) There are frequency charts, page after page of them, on the back of
> > the book. None of them look like a bell curve. The grade 8 reading EOG
> > ranges from scores of 138- 184. Just visually looking at the graph, most
> > of the scores fall at 160 and below-- with a huge spike at 164 (164 has a
> > frequency of 4500 (N=80833) and its neighbors 163/ freq= 3600, 165/ freq=
> > 2200 are much lower). What's up with this? The lowest scores have higher
> > frequencies than the upper scores-- 138/ freq=500, 140/ freq=1000, 142/
> > freq= 900 ---------- 180/ freq= 200, 182 & 184/ freq= less than 100.
> > What's up with this?
>
> It would appear that they did not achieve the goal of a bell shaped curve.
> The distribution is skewed either because the sample is skewed or there is
> something strange going on with the test. This would appear to be a
> difficult test for the sample.
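
One way to put a number on "skewed" is the usual moment-based skewness coefficient, computed straight from a score/frequency chart. This is a minimal sketch; the two small tables below are invented for illustration and are not the EOG frequencies, which are only partly quoted above.

```python
def skewness(freq):
    """Moment-based skewness of a {score: frequency} table.

    Zero for a symmetric distribution; positive when the long tail is on
    the high-score side, negative when it is on the low-score side.
    """
    n = sum(freq.values())
    mean = sum(score * f for score, f in freq.items()) / n
    m2 = sum(f * (score - mean) ** 2 for score, f in freq.items()) / n
    m3 = sum(f * (score - mean) ** 3 for score, f in freq.items()) / n
    return m3 / m2 ** 1.5

# Hypothetical frequency tables (score -> number of students).
symmetric = {1: 1, 2: 2, 3: 1}
piled_low = {1: 5, 2: 3, 3: 1, 4: 1}

print(skewness(symmetric))
print(skewness(piled_low))
```

Scores piled up at the low end with a thin upper tail give a positive value, which is the pattern a test that is hard for the sample tends to produce.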
>
> > TIA. I know it's like talking to a wall with me and complicated numbers
> > sometimes, so I appreciate any help any of you mathematically inclined
> > folks out there could give me. These items are bugging me, but they may
> > not be issues at all... they stood out to me as odd tidbits in the manual.
> > Just keep it simple for the simpleminded. ;)
> >
> > Teresa G.
> >
>
> --------------------------------------------------------------------------
> > To unsubscribe from the ARN-L list, send command SIGNOFF ARN-L
> > to LISTSERV@LISTS.CUA.EDU.
> >