Re: Sensitivity (was Re: In response to Judi's statement to Deanna) -- longish
- Subject: Re: Sensitivity (was Re: In response to Judi's statement to Deanna) -- longish
- From: Judi Hirsch <judih@OUSD.K12.CA.US>
- Date: Thu, 18 Feb 1999 15:43:16 -0800
- Reply-to: Assessment Reform Network Mailing List <ARN-L@LISTS.CUA.EDU>
- Sender: Assessment Reform Network Mailing List <ARN-L@LISTS.CUA.EDU>
Regarding diversity of panels, it is clear from your answer that
all--or at least most--of the panelists come from within the system, NOT a
very fair way of deciding if there is bias. Sounds like sending foxes to
guard the chicken coop, which brings me back to one of my first points,
i.e., that if we look at who succeeds at these tests and we look at who
makes them and sells them and advocates for them, we are basically looking
at the same group!
At 01:34 AM 2/18/99 EST, you wrote:
>The comment that the "review panels [be] as diverse as possible" was
>made by Kristen Huff and perhaps she will respond directly. I elaborated on
>her comment going into detail about the difference between an item that has
>been flagged statistically as being possibly biased and an item that is
>actually biased. There is a difference and this point has confused many
>Regarding the composition of a sensitivity committee, in many cases there are
>actually at least two such committees--sometimes with the same members and at
>other times with different members--each with a slightly different focus. The
>first committee reviews items visually for possible bias BEFORE the items are
>piloted or field tested. The second committee reviews the items AFTER data
>has been collected on the items, specifically looking at those items that were
>flagged as possibly being biased. This committee reviews the data and makes a
>determination as to whether the item is acceptable as is, acceptable as
>revised or should be discarded.
>To keep the committee at a size that is not too cumbersome to complete this
>task, it is generally suggested that this committee not exceed 20 persons
>(10-12 is preferred). This committee includes individuals that are
>specifically trained to conduct sensitivity reviews (testing experts,
>measurement specialists/psychometricians, those familiar with multicultural
>issues and issues relating to persons with disabilities, etc.). These individuals
>have extensive training and experience -- thus it is not necessary to have
>EVERY religious or ethnic group represented on the committee but diversity is
>the goal. Thus the committee does include individuals who are as diverse as
>possible (with respect to gender, ethnicity, geographical location, etc.). If
>each of these was represented the committee would be huge! The actual
>composition will vary somewhat depending on whether the test is national,
>state or local. Some states require that this committee also include teachers
>(who often do not have the necessary training to do sensitivity reviews) so we
>suggest that the committee have no more than 20 percent of the members be
>teachers. The teachers are provided with basic training which is supplemented
>by the other 80 percent who have extensive training as indicated above. It is
>felt that the sensitivity and technical advisory committees require very
>specific training and experience and that while students, parents, and others
>should be involved in the test development process, this role can best be met
>through other committees (content committee and test specifications
>committees). There is also another committee which reviews specific issues
>related to disabilities and ESL. So it is not always the case that these
>constituencies are represented on a sensitivity committee.
>I hope this clarifies things a bit and I do encourage others from the testing
>industry to share their experiences and practices (especially those that
>have achieved the greatest degree of diversity). It may well be true that we
>still do not have a committee that is as diverse as it could be and I am open
>to any suggestions that Judi may wish to offer that recognize the need to
>ensure that the committee is not too large. I do think that if one is aware of
>the many committees involved, one would easily see that we have made the
>effort to ensure that committee membership is as diverse as possible.
>Deanna M. De'Liberto, President/Director of Assessment
>D Squared Assessments, Inc.
>(Specialists in Test Development/Validation and Test Administration)
>9 Bedle Road, Suite 250
>Hazlet, NJ 07730-1209
>Phone: (732) 888-9339
>Member of the Association of Test Publishers
>In a message dated 2/17/99 10:03:38 PM Eastern Standard Time,
>> Dear Deanna,
>> I wonder about the fact that you keep mentioning again and again,
>> about there being "review panels as diverse as possible." Do you mean they
>> actually have non-college graduates, perhaps even non-high school grads,
>> homeless people, disabled people (with a variety of disabilities), high
>> school students, people living in poverty, those who don't speak English as
>> their first language? Maybe I'm naive, but I don't believe this is the
>> however, I would like to read a description of what you (or they) consider
>> At 07:40 PM 2/17/99 EST, you wrote:
>> >In a message dated 2/17/99 6:34:18 PM Eastern Standard Time,
>> >> >Another
>> >> >comment: there are people who actually act as mediators rather than
>> >> lawyers
>> >> >in order to help people resolve their differences--including getting
>> >> >peaceable divorce--without getting rich themselves.
>> >> Yes, you're right. And there are test developers who are not rich white
>> >> whose main purpose is to put certain sectors of students at a
>> >> with misused tests.
>> >Great point Kristen!
>> >> >Finally, the bit about
>> >> >rich white men simply reflects the current reality. As a matter of
>> >> you
>> >> >might want to look at an old (say, 1960's) Stanford-Binet IQ test,
>> >> one
>> >> >of the questions was "which one is prettier," and the students had to
>> >> choose
>> >> >between a Caucasian and an African American.
>> >> An example from the 1960s is not "current reality".
>> >Another great point Kristen!
>> >> >There is a lot of cultural and
>> >> >class bias in standardized tests,
>> >> Let's talk more about this. Items go through so many reviews by panels
>> >> are as diverse as possible. Test developers are aware that an item
>> >> a city-dweller may be testing the city-knowledge of a rural student
>> >> a rural student may not know what a "city block" is), etc. I wouldn't
>> >> that there's "a lot of bias", and I know you'll disagree with me. But,
>> >> said-- let's exchange what we each know/feel about the subject and maybe
>> >> can each learn something.
>> >This is a topic near and dear to my heart after working at ETS on the SAT
>> >exam. So as not to in any way jeopardize test security by discussing the
>> >specific items that I had reviewed, I will make up my own examples of some
>> >these issues as a result of the experience I have gleaned over the years.
>> >Let's suppose we have a mathematics test item on topic X (X could be any
>> >in the math curriculum) and upon reviewing items statistically it is
>> >discovered that some group of equal ability as the reference group
>> >significantly worse (typically this would be greater than 1.5 standard
>> >deviations) than the reference group. Now at this point the test
>> >are ready to flag (and advocate eliminating) this item from future use.
>> >as G. Camilli and others point out the item may NOT really be biased. All
>> >know at this point is that statistically there is a difference (hence the
>> >DIF) in the performance of two groups we have reason to believe are equal
>> >ability for this item. If this difference is A RESULT of some cultural,
>> >ethnic or gender bias, then the item should be discarded. But the item is
>> >measuring some difference relating to the content being measured by the
>> >then by no means should this item be discarded. So how can we tell which
>> >it is? There are several ways---a sensitivity review is often conducted
>> >committee of individuals specifically trained to do sensitivity reviews (
>> >members represent various ethnic groups and geographical locations). In
>> >addition, we sometimes ASK the students themselves to explain how they
>> >the item. This is very revealing!
>> >When I was at ETS, they seemed to prefer just discarding any item with a
>> >of more than 1.5 standard deviations and it became apparent to me that
>> >item on that topic was being flagged and discarded. Well, this prompted
>> >attention because clearly by rejecting all of the items for this topic, we
>> >would in effect not be measuring this topic which was required by our
>> >specifications. I felt strongly as I do now that test specifications
>> >not be changed without serious discussion with the test development
>> >This was a real dilemma back then but I do think that we are taking
>> >precautions to ensure that test items are not biased. Of course some do
>> >by but as I have stated many times, EVERY TEST HAS SOME MEASUREMENT ERROR
>> >we are not ever going to eliminate this error. The best we can do is to
>> >reduce this error as much as possible.
>> >> >and the question for you is what are you
>> >> >going to do about it? You can support them or you can try to change
>> >> >It's up to you.
>> >> Thanks for the inspirational closing.
>> >> Cheers,
>> >> Kristen
>> >The closing was really good. My answer is to educate anyone and everyone
>> >will listen about the positives and negatives of testing and to bring an
>> >awareness of these issues so that we can do an even better job!
>> >Deanna M. De'Liberto, President/Director of Assessment
>> >D Squared Assessments, Inc.
>> >(Specialists in Test Development/Validation and Test Administration)
>> >9 Bedle Road, Suite 250
>> >Hazlet, NJ 07730-1209
>> >Phone: (732) 888-9339
>> >Email: Ddeliberto@aol.com
>> >Web: http://www.quikpage.com/D/dsquared
>> >Member of the Association of Test Publishers
>To unsubscribe from the ARN-L list, send command SIGNOFF ARN-L