Re: Norm-Referenced Tests
- Subject: Re: Norm-Referenced Tests
- From: "George K. Cunningham" <gkc@LOUISVILLE.EDU>
- Date: Sun, 7 Nov 1999 15:11:24 -0500
- Reply-to: Assessment Reform Network Mailing List <ARN-L@LISTS.CUA.EDU>
- Sender: Assessment Reform Network Mailing List <ARN-L@LISTS.CUA.EDU>
See comments below.
George K. Cunningham
University of Louisville
----- Original Message -----
From: Sheridan, George <gsheridan@BOMUSD.EDCOE.K12.CA.US>
Sent: Friday, November 05, 1999 1:48 PM
Subject: Re: Norm-Referenced Tests
> I remember these three points and have shared them with others on a number
> of occasions. In this post, I referred only to the second point: "[I]tems
> which actually do cover what teachers think is important are tossed out
> from the test. So this part of teaching effectiveness becomes invisible
> because the items that measure it never make it into the final, published
> test." I did not recall Dr. Popham having discussed the 60-40 or 70-30
> correct rates.
First of all, NRTs are not criterion-referenced tests, by definition. They
are therefore not intended to measure exactly what students know or exactly
what is taught; criterion-referenced tests claim to do this. NRTs are
intended to support inferences about how students compare to one another.
On NRTs, a large universe of items is assumed for every subject assessed. It
contains every item that could be asked and is called the domain of
observables. Items are assumed to be randomly selected from this domain, or
selected in such a way that they behave as though they were. A student's
score on this sample from the domain tells us how well the student would
have done if they had been asked every item in the entire domain of
observables.
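The domain-sampling idea can be sketched in a few lines of Python. Everything here is invented for illustration (the domain size, the student's true proportion, and the test length are my numbers, not the post's): a student's score on a random sample of items serves as an estimate of the score they would earn on the full domain.

```python
import random

random.seed(0)  # reproducible illustration

DOMAIN_SIZE = 10_000      # hypothetical "domain of observables"
TRUE_PROPORTION = 0.70    # fraction of the domain the student has mastered
TEST_LENGTH = 50          # items sampled onto the published test

# 1 = student would answer this domain item correctly, 0 = would not
domain = [1] * int(DOMAIN_SIZE * TRUE_PROPORTION)
domain += [0] * (DOMAIN_SIZE - len(domain))

test_items = random.sample(domain, TEST_LENGTH)   # random item selection
observed_score = sum(test_items) / TEST_LENGTH    # proportion correct

# The sample score estimates the (unobservable) whole-domain score.
print(f"domain score: {TRUE_PROPORTION:.2f}, test score: {observed_score:.2f}")
```

With a reasonably long test, the observed score lands close to the domain proportion, which is the sense in which the sample "stands in" for the whole domain.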
Previous discussions of difficulty have been misdirected because they
presume that items have an inherent level of difficulty. Multiple-choice
items don't have this quality. The difficulty of a multiple-choice item is
easily and intentionally manipulated by altering the distractors (the
incorrect choices). If the distractors are similar to the correct answer,
the item becomes difficult; if they are changed to be more different from
it, the item becomes easier. The previous discussions referred to items
that only 10 percent of students got correct as being too difficult and in
need of removal. Instead, such items would simply be made less difficult by
manipulating the distractors.
NRTs are the best assessment technology we have when the goal is to rank
students according to their ability in a given domain. They can tell us with
great fidelity which students know the most and which know the least about
the topic, but they tell us nothing about what any individual student
actually knows. If the test is intended to determine which schools,
districts, teachers, or students are doing best and which are doing poorly,
an NRT should be used.
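The kind of comparative report an NRT yields can be sketched as a percentile rank. The norm-group scores below are invented for illustration, and the function name is mine:

```python
def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring below the given raw score."""
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)

# Made-up raw scores from a hypothetical norm group of ten students.
norm_group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]

print(percentile_rank(26, norm_group))  # 60.0: above 6 of the 10 norm scores
```

Note what the report contains: a student's standing relative to other students, and nothing about which material the student has or has not mastered.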
The ideal difficulty for a test is not 50 percent, by the way; it is 62.5
percent for a multiple-choice test with four response options. This
difficulty produces the maximum variability of scores, which in turn leads
to the highest reliability.
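The 62.5 percent figure comes from taking the point halfway between the chance (guessing) score and a perfect score. A minimal sketch of the arithmetic (the function name is mine):

```python
def ideal_difficulty(num_options):
    """Halfway between the guessing rate (1/k) and a perfect score (1.0)."""
    chance = 1.0 / num_options
    return chance + (1.0 - chance) / 2.0

print(ideal_difficulty(4))  # 0.625 for a four-option multiple-choice item
print(ideal_difficulty(2))  # 0.75 for a two-option (true/false) item
```

For a four-option item the chance score is 0.25, and halfway between 0.25 and 1.0 is 0.625, i.e. 62.5 percent correct.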
In reference to James Popham: I know him and I cite his writing extensively
in my books. He is not a good source of information about NRTs. He is
famous, an expert, and in some ways a founder of criterion-referenced
testing. As such he is a sworn enemy of NRTs and has spent his career
fighting against them. I know there is the saying that the enemy of my
enemy is my friend. Since this list is full of individuals who despise
standardized, norm-referenced, and multiple-choice tests, you might
consider Popham an ally, but CRTs differ from NRTs in ways that should make
them even more inappropriate for members of this list. In other words, if
you dislike NRTs for the reasons so often cited here, you would dislike
CRTs even more, and you would find little in common with Popham.
You also wrote:
> >I don't know why you think an item that only 10% of the students answer
> >correctly is useful on an NRT.
> My statement was that I understood you to have said this, based upon the
> following passage from your previous post:
> >>...let's ask questions that we expect 90% of people to
> >>know; questions that we expect only 10% of people to know: and questions
> >>that 50% of the people know (the typical value on an NRT).
> As a Californian, one significant aspect of all the above is that
> norm-referenced tests like SAT-9 are a particularly poor way to measure
> achievement of state-adopted standards. They are, however, the sole method
> currently approved by our legislature and governor (with a high school
> exam, yet to be designed, scheduled to come on line in the immediate
> future). NRTs are currently being used to measure individual achievement,
> determine promotion, and assess school quality.
> George Sheridan
> Black Oak Mine Unified School District
> Northside School
> P.O. Box 217
> Cool, CA 95614
> (530) 333 4506
> Hope is ... not the conviction that something will turn out well, but the
> certainty that something makes sense, regardless of how it turns out.
> Vaclav Havel
> To unsubscribe from the ARN-L list, send command SIGNOFF ARN-L
> to LISTSERV@LISTS.CUA.EDU.