
Re: potentially dumb question re: psychometrics


  • Subject: Re: potentially dumb question re: psychometrics
  • From: George Cunningham <gkc@LOUISVILLE.EDU>
  • Date: Sun, 31 Mar 2002 22:21:36 -0500
  • Reply-to: Assessment Reform Network Mailing List <ARN-L@LISTS.CUA.EDU>
  • Sender: Assessment Reform Network Mailing List <ARN-L@LISTS.CUA.EDU>

Bill,

There are two approaches to determining whether items are biased against
particular groups. Since being able to assert that a test is unbiased is
good for marketing, proposals from publishers tend to be replete with this
information. Publishers usually use both. First, they set up panels made up
of a diverse group of members to identify items that appear unfair or biased.
Second, they conduct DIF (differential item functioning) studies. Basically,
a DIF study determines whether an item is more difficult for a specific
group after the groups have been matched on overall performance. Because of
the detailed item statistics available with IRT, it is possible to conduct
more sensitive DIF analyses. There are also methods of detecting DIF without
using IRT. The problem with these two approaches is that they tend not to be
in sync. Items that panels identify as biased often do not show DIF, and
items that do show DIF often have no obvious characteristics that suggest
why they are biased. Publishers tend to eliminate any items about which
there is any doubt. Still, items slip through that don't seem legit.
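
To make the idea of comparing groups "after matching on overall performance"
concrete, here is a minimal sketch of one common non-IRT method, the
Mantel-Haenszel procedure. It is an illustration only, not a description of
what any particular publisher does. The function name, the record layout and
the "ref"/"focal" labels are assumptions made for this example. Students are
grouped by total score, a 2x2 table (group by right/wrong) is built for each
score level, and the tables are pooled into a common odds ratio. A ratio near
1 means the item behaves the same for both groups. The last value returned is
the ratio re-expressed on the ETS delta scale (-2.35 times its natural log),
where negative values mean the item is harder for the focal group.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(records):
    """Mantel-Haenszel DIF sketch for a single item.

    records: iterable of (group, total_score, correct) tuples, where
      group       is "ref" or "focal" (hypothetical labels for this example),
      total_score is the matching variable (overall test score),
      correct     is 1 if the student got the studied item right, else 0.

    Returns (odds_ratio, delta), or None if the ratio is undefined.
    """
    # One 2x2 table per total-score level:
    # [ref right, ref wrong, focal right, focal wrong]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, total, correct in records:
        idx = (0 if group == "ref" else 2) + (0 if correct else 1)
        tables[total][idx] += 1

    num = 0.0  # sum over levels of (ref right * focal wrong) / n
    den = 0.0  # sum over levels of (ref wrong * focal right) / n
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n

    if num == 0 or den == 0:
        return None  # not enough mixed right/wrong data to form a ratio

    odds_ratio = num / den
    delta = -2.35 * math.log(odds_ratio)  # ETS delta scale
    return odds_ratio, delta


def make_records(counts):
    """Expand {(group, total, correct): how_many} into a list of records."""
    return [key for key, n in counts.items() for _ in range(n)]


# Toy data, invented for illustration: two score levels, two groups.
toy = make_records({
    ("ref", 1, 1): 3, ("ref", 1, 0): 1,
    ("focal", 1, 1): 1, ("focal", 1, 0): 3,
    ("ref", 2, 1): 4, ("ref", 2, 0): 1,
    ("focal", 2, 1): 2, ("focal", 2, 0): 2,
})
result = mantel_haenszel_dif(toy)
```

With the toy data above, students with the same total score succeed on the
item more often in the reference group, so the odds ratio comes out above 1
and delta is negative. Real studies use many more students per score level
and add a significance test before flagging an item.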

George Cunningham
University of Louisville
----- Original Message -----
From: William Cala <wcala@ROCHESTER.RR.COM>
To: <ARN-L@LISTS.CUA.EDU>
Sent: Sunday, March 31, 2002 9:11 PM
Subject: Re: potentially dumb question re: psychometrics


> George,
>
> Remembering that I'm not a psychometrician, how does IRT detect bias?
>
> BC
> ----- Original Message -----
> From: "George Cunningham" <gkc@LOUISVILLE.EDU>
> To: <ARN-L@listsrva.CUA.EDU>
> Sent: Sunday, March 31, 2002 8:12 PM
> Subject: Re: potentially dumb question re: psychometrics
>
>
> > Teresa,
> >
> > See below.
> >
> > George Cunningham
> > University of Louisville
> > ----- Original Message -----
> > From: Teresa or J Glenn <jtglenn@CAROLINA.NET>
> > To: <ARN-L@LISTS.CUA.EDU>
> > Sent: Sunday, March 31, 2002 6:45 PM
> > Subject: potentially dumb question re: psychometrics
> >
> >
> > > Don't ask why but I am reading the technical manual for the NC EOGs this
> > > weekend... I really don't have anything else to do, what with *only* 30
> > > take home essay tests to grade and an entire poetry unit to plan and all
> > > that........
> > >
> > > At any rate, here I am, a minimally mathematically inclined English
> > > teacher trying to remember what I can of college calculus to understand
> > > this manual, and I have several questions.
> > >
> > > (1) What are the relative merits of "classic measurement analyses" and
> > > "item response theory analyses"? I understand what they *are* (there's an
> > > okay explanation in the manual after I got past the "point-biserial
> > > correlation" part)-- but what can each one *do* and what *can't* they do?
> >
> > Item response theory is very complicated, but at its simplest level it is
> > more complex than classical scoring because, instead of having each
> > correct item count as one point, as is usually done, with IRT items are
> > given a range of values depending on their item characteristics. This is
> > very useful because it can be used to equate tests horizontally, so that
> > the scores from two different forms of a test mean the same thing, or so
> > that different forms in succeeding years can be equated to have the same
> > meaning. It also facilitates other tasks, such as conducting DIF analyses
> > to determine whether any items are biased. There are two distinct schools
> > of thought about the correct way of conducting IRT. First, there is the
> > one-parameter model or Rasch method, which is used, for example, by
> > Harcourt-Brace; second, the three-parameter model favored by McGraw-Hill
> > and ETS. If McGraw-Hill is doing your tests they are using the
> > 3-parameter model, which I think is the better of the two. Riverside does
> > not use IRT for the ITBS. Each model is based on a separate theory of test
> > characteristics and the debate about the two can be as acrimonious as any
> > other religious debate.
> >
> > > (2) Explain p-value again. I know we've done it before, but humor me. Is
> > > a p-value of .83 good? How about .380? Those are the highest and lowest
> > > p-values listed for the math/ reading EOGs.
> >
> > The p-value is the proportion of students correctly responding to an item.
> > The goal of publishers of norm-referenced tests is high coefficient alpha
> > reliability. This is achieved with high item-total correlations (a more
> > correct term than point-biserial correlations; publishers prefer the
> > latter for some reason, but they are algebraically equivalent). The
> > magnitude of a correlation is a function of variability, and variability
> > can be maximized by the correct p-value. There is a formula for
> > determining this, but for a four-choice multiple-choice test, the ideal is
> > an average p-value of .625, halfway between the chance level of .25 and
> > 1.00. As has been discussed here at length, this only applies to NRTs.
> > This does not mean that every item has to have this p-value, but they need
> > to average this and not be too much higher or lower.
> >
> > > (3) There are several pages purporting to be an "evaluation" of the
> > > "validity" of the tests. These are ratings done by reviewers who,
> > > according to this manual, first took the tests themselves, then commented
> > > on them. They rated the tests on a scale of 1-4, 4 being a superior
> > > degree. The parameters measured were: curricular validity, instructional
> > > validity, item quality, test item bias, and one best answer. This isn't
> > > any kind of objective test of validity, is it? On top of that, the
> > > highest rating *any* test in grades 3-8 got on any of the parameters was
> > > 3.3 and that was grade 3 curricular validity. Grade 4 item quality got a
> > > 1.9 rating. But there's no "scientific" premise behind having groups of
> > > reviewers rate the tests, is there?
> >
> > This is validity only in the fervid imagination of the publisher. Opinions
> > would hardly be considered scientific evidence for validity.
> >
> > > (4) There are frequency charts, page after page of them, on the back of
> > > the book. None of them look like a bell curve. The grade 8 reading EOG
> > > ranges from scores of 138-184. Just visually looking at the graph, most
> > > of the scores fall at 160 and below-- with a huge spike at 164 (164 has a
> > > frequency of 4500 (N=80833) and its neighbors 163/ freq= 3600, 165/ freq=
> > > 2200 are much lower). What's up with this? The lowest scores have higher
> > > frequencies than the upper scores-- 138/ freq=500, 140/ freq=1000, 142/
> > > freq= 900 ---------- 180/ freq= 200, 182 & 184/ freq= less than 100.
> > > What's up with this?
> >
> > It would appear that they did not achieve the goal of a bell-shaped curve.
> > The distribution is skewed either because the sample is skewed or there is
> > something strange going on with the test. This would appear to be a
> > difficult test for the sample.
> >
> > > TIA. I know it's like talking to a wall with me and complicated numbers
> > > sometimes, so I appreciate any help any of you mathematically inclined
> > > folks out there could give me. These items are bugging me, but they may
> > > not be issues at all... they stood out to me as odd tidbits in the manual.
> > > Just keep it simple for the simpleminded. ;)
> > >
> > > Teresa G.
> > >

--------------------------------------------------------------------------
To unsubscribe from the ARN-L list, send command SIGNOFF ARN-L
to LISTSERV@LISTS.CUA.EDU.

