
Re: test misuse: Here we go again


  • Subject: Re: test misuse: Here we go again
  • From: Art Burke <aburke@VANSD.ORG>
  • Date: Fri, 7 Sep 2001 08:50:54 -0700
  • Reply-to: Assessment Reform Network Mailing List <ARN-L@LISTS.CUA.EDU>
  • Sender: Assessment Reform Network Mailing List <ARN-L@LISTS.CUA.EDU>

Mickey ... Certainly the cut scores are an essential feature of the testing program, but they're not what I was asking about. The original writer said that the change in cut scores was evidence the state misused the tests because the change violated the testing contract. That makes no sense to me - it's like saying the state misused the test because it ordered blue covers in the contract and then changed them to green.

Art

>>> PAVURSOL@AOL.COM 09/06 8:41 PM >>>
In a message dated 9/6/2001 7:19:11 PM Eastern Daylight Time,
aburke@VANSD.ORG writes:


> You seem to be saying that Virginia contracted with HEM to have HEM set
> cut scores but then Virginia decided to set different cut scores. What's
> wrong with that?
>
>

I have pasted our synopsis of the cut score setting process below.
(Roxie, can you believe I could lay hands on it so easily!)
Mickey
Parents Across Virginia United to Reform SOLs

SOL Test Cut Score Problems

The passing scores ("cut scores") for the SOL tests were not set according to
preferred professional practice given the standard (score) setting method
used. As a result, the scores are unrealistically high and there is great
danger of misclassifying the many students who score very near the passing
mark. (Lawrence Cross, "Are Virginia's Public Schools Failing? Assessing the
Assessments," Virginia Issues and Answers, Vol. 6, No. 1 (Spring 1999)). Dr.
Cross is a professor of educational research and evaluation at Virginia Tech
who has used the score-setting method in question himself.

The teachers and parents on the Standard Setting Committees for the SOL
tests, who were trained for the task, recommended passing scores for each
test based on their judgments about what "barely proficient" students should
know, as the "Modified Angoff" method provides. According to established
practice for this method, the committees should have been given information
about item difficulty, i.e. actual test results, to help them make reasonable
and realistic judgments. In addition, the committee members' individual
recommendations should have been averaged so that the adopted cut scores
would reflect their collective judgment, not that of an individual member or
two. [Cross et al. References available on request.] It is also crucial to
consider measurement error in setting scores and classifications (i.e.,
proficient, advanced, or failure) based on these scores. These steps were
not followed in setting the SOL test cut scores.
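
As a rough illustration of the averaging step described above, here is a minimal Python sketch. It assumes the usual textbook form of the Angoff approach, in which each judge estimates, item by item, the chance that a "barely proficient" student answers correctly, and the judge's recommended cut score is the sum of those estimates. The ratings, function names, and the optional measurement-error adjustment are hypothetical and are not taken from the SOL committees' actual data or procedures.

```python
from statistics import mean


def judge_recommendation(item_probabilities):
    """One judge's recommended cut score: the sum of that judge's
    per-item estimates for a barely proficient student."""
    return sum(item_probabilities)


def committee_cut_score(ratings_by_judge, sem=0.0):
    """Average the judges' individual recommendations, then optionally
    lower the result by `sem` (a hypothetical allowance for
    measurement error, in raw-score points)."""
    recommendations = [judge_recommendation(r) for r in ratings_by_judge]
    return mean(recommendations) - sem


# Hypothetical ratings: three judges, four test items.
ratings = [
    [0.5, 0.5, 0.75, 0.25],   # judge A recommends 2.0
    [0.75, 0.5, 0.75, 0.5],   # judge B recommends 2.5
    [1.0, 0.75, 0.75, 0.5],   # judge C recommends 3.0
]

averaged = committee_cut_score(ratings)            # collective judgment
adjusted = committee_cut_score(ratings, sem=0.5)   # a bit below the average
highest = max(judge_recommendation(r) for r in ratings)  # top of the range

print(averaged, adjusted, highest)
```

With these made-up numbers, the averaged score sits in the middle of the judges' range, while choosing the highest single recommendation reflects only one judge's view.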

Harcourt Brace had planned to have the committees provided with actual test
performance data, as shown in the test contract. However, the committees were
purposely not given any such information, as noted in several Department and
Board statements and informational materials, because the Board wanted
"honest" scores. (Several committee members noted on their evaluation forms
that they had felt the need for this, to have a sense of the questions'
difficulty for the students at the grade levels in question). The Board also
should have considered such information and the various consequences of
setting scores at particular levels, but it did not.

Further, preferred practice called for the Board to choose scores for each
test that represented the average of each committee's individual
recommendations, or scores a bit below that in order to take measurement
error into account. [Cross et al.] Several committee members and most of the
committee reports stressed the importance of the Board's choosing scores from
the mid-range of their proposed scores, i.e. scores representing the majority
judgment, not from the extremes.

However, the Board ignored the judgments of the vast majority of the members
of the eight committees, apparently believing that they knew better "what each
child should know" in each subject at each grade level. The Board chose
scores in almost every case that had been recommended by only one or two of
the 18 - 21 members of each committee. They chose scores at the top of the
range in all but two cases. In those two cases, the scores were higher than
anyone recommended. Only 9% of the 507 scores recommended by the judges were
as high as the cut score chosen by the Board. Yet the Board had said it would
give "great deference" to those "best qualified to make that determination:
the classroom teachers, principals, curriculum specialists and parents who
have spent this past summer working hard to make test score recommendations."
(Board of Education Statement: "Setting Passing Scores on the SOL Tests:
Principles and Considerations," Sept. 29, 1998).

As Professor Cross, who has studied the scores and who has used the same
standard-setting procedure for the National Teacher Exam, has said, "In their
zeal to have rigorous standards, the politically appointed....Board...set the
benchmark for 'proficient' at the upper score ranges suggested...for nearly
every SOL test. The...recommendations for 'advanced' standing were largely
ignored...It is not surprising, therefore, that the percentage...classified
as 'proficient' is low or that the percentage classified as 'advanced'
approaches zero for most tests." (Virginia Issues and Answers, Vol. 6, No. 1,
p. 5).

Finally, measurement error was not sufficiently taken into account in
setting the SOL test scores. In addition to the fact that the scores
themselves were set at levels that do not allow for measurement error, the
Department's report on validity and reliability for the 1998 tests "...does
not address the reliability of the classification decisions. Even if the
scores have respectable reliability and the benchmarks were divinely
inspired, the potential for misclassifying students scoring near the
benchmarks is great." (Va. Issues and Answers, p. 5). In other words, the
number of students misclassified is likely to be very large, because of
measurement error and the fact that the cut score for most of the tests is
near the middle of the pack, meaning many students' scores will fall very
close to but just miss it.
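
To make the misclassification point concrete, here is a minimal Python sketch. It assumes a simple model in which a student's observed score equals the true score plus normally distributed measurement error. The cut score, true scores, and standard error of measurement below are hypothetical and are not drawn from the SOL tests or the Department's report.

```python
from statistics import NormalDist


def chance_of_failing(true_score, cut_score, sem):
    """Probability that the observed score falls below the cut score,
    assuming observed = true + error with error ~ Normal(0, sem)."""
    return NormalDist(mu=true_score, sigma=sem).cdf(cut_score)


CUT = 30.0   # hypothetical passing score
SEM = 3.0    # hypothetical standard error of measurement

# Students whose true score is at or slightly above the cut score
# still face a substantial chance of being classified as failing.
for true_score in (30.0, 31.0, 33.0, 36.0, 39.0):
    p = chance_of_failing(true_score, CUT, SEM)
    print(f"true score {true_score:4.1f}: chance of failing = {p:.2f}")
```

Under this assumed model, the chance of a wrong classification is largest for students whose true scores lie close to the cut score, so placing the cut where many students cluster increases the number of students at risk.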


