Fewer states are using student test scores to evaluate teachers, according to a report released Tuesday by the National Council on Teacher Quality. As of this year, 34 states require scores to be used in teacher evaluations, down from a high of 43 in 2015.
The decline illustrates the continued retreat of an idea that took education policy by storm during the first half of the decade, but proved divisive and difficult to implement.
The push to remake teacher evaluations was jump-started by the Obama administration’s Race to the Top competition, which offered a chance at federal dollars to states that enacted favored policies — including linking teacher evaluation to student test scores. This came on the heels of an influential report, “The Widget Effect,” which concluded that teacher evaluations in many districts were perfunctory and nearly always resulted in a satisfactory rating.
More accurately distinguishing between effective and ineffective teachers, many argued, was critical to making sure good teachers stayed in the profession and bad ones left.
Philanthropies — most notably the Bill and Melinda Gates Foundation — provided support for a constellation of groups pushing these ideas. (Gates is also a funder of Chalkbeat.)
The result? Between 2009 and 2013, the number of states requiring test scores to be used in teacher evaluations spiked from 15 to 41, including Washington, D.C.
“In order to comply, in order for the chance of funds, there it was,” said Grace Leavitt, the president of the Maine Education Association, referring to the inclusion of test scores in teacher evaluations. The union, she said, has been “trying to get rid of it ever since.”
States that complied with federal urging to overhaul their evaluation systems struggled with exactly how to measure teachers’ performance. Classroom observations were usually the biggest factor, with tests playing a key role. But since many teachers do not have a standardized test corresponding to their grade and subject, some districts created new tests or had teachers create their own, raising concerns about overtesting.
In other instances, teachers were evaluated in part by student performance in subjects they didn’t teach — the situation for half of New York City teachers in 2016.
In many states, the new evaluations debuted just as new academic standards and tests were being implemented, frustrating teachers and their unions, who felt they were being held accountable for unfamiliar material without adequate training. Teachers filed lawsuits against the new provisions with some success. Testing concerns caused a jump in the number of students who declined to take the state test, particularly in New York, where one in five students statewide opted out of the state exam in 2015.
The backlash culminated with the 2015 passage of the Every Student Succeeds Act, which explicitly bars future secretaries of education from doing what Obama’s Education Secretary Arne Duncan did — trying to influence how teachers are evaluated. On this, teachers unions on the left and skeptics of federal involvement in education on the right were in rare agreement.
Today, most states still require test scores in teacher evaluations. But some states have dropped test scores and rolled back the Obama-era policies in other ways. For example, the number of states requiring teachers to be evaluated annually has fallen by five since 2015, to 22.
“As swiftly as states moved to make these changes, many of them have made a hasty retreat,” the NCTQ report says.
Maine is one of them: A new law removes a requirement that tests be used in evaluation and says that evaluation criteria will be determined by a steering committee, largely composed of teachers appointed by the local union.
NCTQ has continued to advocate for using test scores, which it refers to as “objective measures of student growth.” But many have questioned how objective such measures are, especially when they’re based on tests in other subjects or of other students.
Even sophisticated statistical estimates of teachers’ contributions to student scores, known as “value-added,” have been shown to bounce around from year to year, leading some researchers to caution against their use.
Other researchers counter that “value-added” metrics offer valuable information about teacher quality, and that their flaws, while real, apply to other indicators of teacher performance too.
Research on whether the teacher evaluation changes have helped students has been mixed. A study of a Gates Foundation-funded project showed that tougher evaluations failed to yield any clear benefits in a number of states. Studies in Chicago, Cincinnati, and Washington, D.C., have come to more encouraging conclusions. Other research has shown that teacher evaluation has reshaped principals’ role in schools.
Notably, the vast majority of teachers continue to be rated effective or better under the revamped evaluations, although slightly more teachers do receive subpar ratings.
NCTQ argues that the reversion away from the Obama-era evaluation changes is bad news. “Former systems … generally failed to provide the information necessary for individual teachers to improve their practice and for policymakers to make strategic personnel decisions,” the report says.
But public sentiment has continued to shift away from firing low-performing teachers and toward supporting teachers who are calling for higher salaries. Recent polling shows increased support for teacher pay increases in the wake of strikes in a number of states.
“The political landscape has been ever changing,” said Leavitt of Maine. “As it has played out, I think it’s been shown that this really doesn’t show if I’ve been teaching effectively or not.”
Kalyn Belsha and Sarah Darville contributed reporting.