Good Column on Flaws in Bush Testing Scheme

New York Times -- January 24, 2001
by Richard Rothstein

The big idea in President Bush's education plan, sent to Congress
yesterday, is to hold schools with poor children accountable, using an
annual test. Pupils at schools with two years of inadequate test-score
gains can transfer to another public school. After a third year of
little progress, they can use public money at private schools.
It seems reasonable to have tests identify schools that don't
improve. But the president may put too much faith in scores that are
less accurate than he thinks.
Under President Bush's proposal, commonplace measurement error could
cause states to identify as "failing" some schools that don't deserve
the label.
This column previously described the pitfalls of using one test to
evaluate a student. Because any one day's score might differ from a
student's true score, the average the student would earn if the test
were taken many times, one test should not decide promotion. Judging a
student by one test is like
judging a baseball player by one day's batting average.
It may seem that these problems don't apply schoolwide because when
some students have good days, others have bad ones. If these average
out, accountability using a test should work.
But even a school average can wobble around its true value, so
sanctions based on annual score changes run a risk of unfairness.
Thomas J. Kane, an economist at the Hoover Institution in Palo Alto,
Calif., and Douglas O. Staiger, an economics professor at Dartmouth
College, have studied the accuracy of North Carolina's tests. When
schools there have above-average score gains, teachers get bonuses.
But Dr. Kane says even tiny sampling errors can keep scores from
reflecting true performance. Consider how teachers think of their
classes as "good" or "bad" in any year. A less-able teacher could
produce greater score gains in a class with better pupils than a
more-able teacher could produce with worse students -- even if the
classes are demographically identical.
Dr. Kane says that in typical North Carolina elementary schools,
nearly one-third of the variance in school reading gains is a result of
this "luck of the draw," that is, whether this year's seemingly
identical students are easier to teach than last year's.
This affects small schools more, Dr. Kane says, because a few
extreme scores can more easily distort an average. Thus, small schools
with big gains may seem more effective. But small schools also more
often post tiny gains or even losses. A small school's results may have
more to do with sampling error than school quality.
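The small-school effect is ordinary sampling arithmetic, and a short simulation can illustrate it. The sketch below is not Dr. Kane's analysis. It assumes two hypothetical schools whose pupils' score gains come from the same bell curve (true gain of zero, spread of 10 points), so any movement in a school's average is pure luck of the draw. The function name, enrollments and all numbers are illustrative assumptions.

```python
import random
import statistics


def simulated_average_gains(n_students, n_years=500, true_gain=0.0,
                            student_sd=10.0, seed=0):
    """Return one school-average gain per simulated year.

    Assumption: every pupil's gain is an independent draw from a normal
    distribution with the same mean and spread; nothing about the school
    itself changes from year to year.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(true_gain, student_sd)
                          for _ in range(n_students)])
        for _ in range(n_years)
    ]


# Hypothetical enrollments: 25 tested pupils versus 400.
small_spread = statistics.pstdev(simulated_average_gains(25))
large_spread = statistics.pstdev(simulated_average_gains(400))

print(f"year-to-year spread, small school: {small_spread:.2f}")
print(f"year-to-year spread, large school: {large_spread:.2f}")
```

Under these assumptions the spread of a school's average shrinks with the square root of its enrollment, so the smaller school's average should wobble roughly four times as much as the larger one's even though the two are identical in quality.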
Also, even at large schools, a rainy day or other random events may
change children's dispositions. This can affect a school's rank: nearly
another third of the variance in score gains is linked to the fact that
a school's average can vary from one day to another.
North Carolina solved one problem while creating another. Like the
Bush plan, the state's program judges schools by annual gains, not
absolute scores, to avoid rewarding schools that test well only because
they have privileged students. But sampling and random events affect
both earlier and later scores, compounding the inaccuracy.
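Why gains are noisier than single scores can be sketched in a few lines. The example assumes, purely for illustration, that each year's school average carries an independent error of the same size; in that case the variances of the two errors add, and the error in the gain is about 1.4 times the error in either score. The function names and the 3-point error are hypothetical, not figures from North Carolina.

```python
import math
import random
import statistics


def gain_error_sd(score_error_sd):
    """Spread of the error in a gain (later score minus earlier score).

    Assumption: the two years' errors are independent and equally large,
    so their variances add.
    """
    return math.sqrt(2) * score_error_sd


def simulated_gain_error_sd(score_error_sd, n_trials=20000, seed=0):
    """Check the formula by drawing an error for each year and subtracting."""
    rng = random.Random(seed)
    gain_errors = [
        rng.gauss(0.0, score_error_sd) - rng.gauss(0.0, score_error_sd)
        for _ in range(n_trials)
    ]
    return statistics.pstdev(gain_errors)


print(f"formula:    {gain_error_sd(3.0):.2f}")
print(f"simulation: {simulated_gain_error_sd(3.0):.2f}")
```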
David Rogosa, a professor of education statistics at Stanford
University, is analyzing error rates in California, where teachers at
schools with rising scores will soon receive bonuses as high as $25,000.
Professor Rogosa estimated that if awards were based on school
averages alone, over one-fourth of schools with no gains would still
win awards.
But California avoids this problem by insisting that schools succeed
for low-income students and each minority group, as well as schoolwide.
This reduces the chances of undeserved awards, because simultaneous
false gains in each group are unlikely.
But the reverse is also true. Schools deserving rewards will be more
likely to lose them because if any group fails as a result of random
events or sampling error, the school will be disqualified. Diverse
schools will fail more often; they have more subgroups where false
declines can occur.
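The trade-off can be put in rough numbers. The sketch below is not Professor Rogosa's calculation; it assumes that each group's result is an independent coin flip with a fixed chance of a false gain or a false decline. Real subgroups overlap with the schoolwide average, so independence is a simplification, and the rates of 0.25 and 0.10 are illustrative only.

```python
def false_award_probability(p_false_gain, n_groups):
    """Chance that a school with no real gain clears every hurdle by luck.

    Assumption: each group independently shows a false gain with
    probability p_false_gain, and all groups must show a gain.
    """
    return p_false_gain ** n_groups


def false_failure_probability(p_false_decline, n_groups):
    """Chance that a deserving school is tripped up by at least one group.

    Assumption: each group independently shows a false decline with
    probability p_false_decline, and one decline disqualifies the school.
    """
    return 1 - (1 - p_false_decline) ** n_groups


# Hypothetical rates; n_groups counts every hurdle a school must clear.
for n_groups in (1, 2, 4):
    print(n_groups,
          round(false_award_probability(0.25, n_groups), 4),
          round(false_failure_probability(0.10, n_groups), 4))
```

Each added group pushes the first number down and the second number up, which is the pattern described above: fewer undeserved awards, more undeserved disqualifications.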
President Bush wants to hold schools accountable for gains
schoolwide as well as for disadvantaged students. So if either the
school or the disadvantaged group posts a false decline, the school
could be wrongly labeled failing. But the president's requirement that
sanctions follow two successive years of failure is a partial safeguard
against measurement error.
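How much protection the two-year rule offers can be explored with a simulation. The sketch below assumes hypothetical schools that truly improve by 2 points a year, a required gain of 1 point, and a random error of 1.5 points in each year's average. None of these numbers comes from the Bush proposal, which this column does not describe in that detail. Note that the two gains share the middle year's score, so their errors are not independent.

```python
import random


def false_failure_rates(true_gain=2.0, required_gain=1.0, noise_sd=1.5,
                        n_schools=20000, seed=1):
    """Return (share failing the first year, share failing both years).

    Assumption: every simulated school truly gains `true_gain` each year,
    so every failure counted here is caused by measurement error alone.
    """
    rng = random.Random(seed)
    one_year = 0
    two_years = 0
    for _ in range(n_schools):
        # Observed averages for three consecutive years: truth plus noise.
        scores = [year * true_gain + rng.gauss(0.0, noise_sd)
                  for year in range(3)]
        fail_first = scores[1] - scores[0] < required_gain
        fail_second = scores[2] - scores[1] < required_gain
        one_year += fail_first
        two_years += fail_first and fail_second
    return one_year / n_schools, two_years / n_schools


first_year_rate, both_years_rate = false_failure_rates()
print(f"wrongly failing in the first year:  {first_year_rate:.3f}")
print(f"wrongly failing two years running: {both_years_rate:.3f}")
```

In this setup, requiring two failures in a row sharply reduces false labels but does not eliminate them, consistent with calling the rule a partial safeguard.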
The administration's proposal is intended to spur achievement.
Schools showing adequate progress will be encouraged to continue their
practices; failing schools should change their ways.
But if the wrong schools are sanctioned, what message do we send?
Will unfairly sanctioned schools drop methods that work? Will
ineffective schools get rewards and continue poor teaching?
Surely we can find better ways to measure schools than relying on an
annual test.

