NYC's evals include scoring fix that districts lacked this year

The State Education Department is hoping to mend holes in its evaluation regulations, and it’s using the evaluation plan that Commissioner John King imposed on New York City as its model.

The changes are aimed primarily at eliminating the possibility that teachers could receive final ratings that do not reflect their performance.

One issue revolves around how scores on three subcomponents of evaluations turn into a single rating. Under the state’s scoring rules, there are some scenarios where a teacher could be rated ineffective overall despite scoring “developing” or higher on each subcomponent.

A teacher needs a composite score of at least 65 out of 100 points to be rated developing or higher. But when the state set scoring ranges for the student growth measures, it created a small number of scenarios in which a teacher could receive as few as six points out of 40 and still be rated developing on those subcomponents. Any score below 59 on the remaining 60 points would then leave that teacher short of the 65-point threshold, resulting in an overall “ineffective.”
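To see how that arithmetic plays out, here is a minimal illustrative sketch, assuming only the figures reported above (the 65-point threshold, 40 points for the growth measures, and a “developing” growth score as low as six points):

```python
# Hypothetical illustration of the scoring quirk described above, using
# only the figures reported in this article: a 100-point composite,
# 65 points needed to be rated "developing" or better, growth measures
# worth 40 points, and a "developing" growth score as low as 6 of those 40.

DEVELOPING_THRESHOLD = 65   # composite points needed to avoid "ineffective"

growth_points = 6    # lowest possible "developing" score on the 40-point growth measures
other_points = 58    # a strong score on the remaining 60 points

composite = growth_points + other_points      # 64
print(composite >= DEVELOPING_THRESHOLD)      # False -> "ineffective" overall
```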

“They never took the time to run through all the permutations,” said Carol Burris, principal of South Side High School on Long Island, who has written about versions of the scoring quirk since the state adopted new teacher evaluation requirements in 2012.

The parties involved in negotiating those requirements — teachers unions, Commissioner King and Gov. Andrew Cuomo — have never publicly conceded Burris’s point. But when King imposed an evaluation plan on New York City last month, he tacitly acknowledged it by using different scoring rules.

The new rules for New York City increase the range of points that teachers receive when they are rated “ineffective” on either of the two student growth components. Under the scoring system used throughout the state, teachers rated ineffective receive between 0 and 2 points. But in New York City, teachers who are rated ineffective can get up to 12 points. The result is that teachers who score “developing” in either category are unlikely to net overall “ineffective” ratings.
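A similarly rough sketch, under the assumption (not spelled out here) that the 0-to-2 and 0-to-12 “ineffective” ranges each apply to one of the two 20-point growth components, shows how the city’s version changes the math:

```python
# Hypothetical comparison of the two sets of growth-score ranges.
# Assumption (not stated outright in the article): the 0-2 state range and
# the 0-12 city range for an "ineffective" rating each apply to one of the
# two 20-point growth components, so "developing" begins one point higher.

state_developing_floor = (2 + 1) * 2    # 6 of 40 points, the scenario described above
nyc_developing_floor = (12 + 1) * 2     # 26 of 40 points under the city's ranges

# Points still needed from the remaining 60 to reach the 65-point threshold:
print(65 - state_developing_floor)   # 59 -> nearly a perfect score elsewhere
print(65 - nyc_developing_floor)     # 39 -> a far more attainable bar
```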

State officials say the changes won’t result in more teachers being rated ineffective. Instead, the changes will ensure that only teachers whose performance merits ineffective ratings will get them.

“The cut scores in force throughout the rest of the state are the problem,” UFT researcher Jackie Bennett wrote on the union’s Edwize blog. “The NYC cuts are actually the fix.”

The second set of changes involves the “subjective” evaluation measures, which for New York City next year are based entirely on observations. Under the scoring rules in place across the state, teachers who score higher than “ineffective” rack up at least 50 of the 60 points allotted to those measures. But in New York City, teachers who are rated ineffective get no more than 38 points.

Under the state’s ranges, the result is that teachers with a wide range of performance on their observations would all receive similar point totals, meaning that their scores on the objective measures would make the difference in their final ratings.
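For a rough sense of the band widths involved, assuming that “developing” begins just above the “ineffective” ceiling on the 60-point observation measure, the contrast looks like this:

```python
# Hypothetical band-width comparison for the 60-point observation measure.
# Assumption: "developing" through "highly effective" covers every point
# above the "ineffective" ceiling (50 and up statewide, 39 and up in NYC).

state_observation_band = 60 - 50   # about 10 points separate developing from highly effective
nyc_observation_band = 60 - 39     # about 21 points under the city's ranges
growth_points_available = 40       # points riding on the student growth measures

print(state_observation_band, nyc_observation_band, growth_points_available)
# With only ~10 observation points in play statewide, the 40 growth points
# dominate the composite; the city's wider band dilutes that effect.
```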

“When you create such a narrow band from developing to highly effective, it means that most of the variation is going to come from the measures of student learning,” said Shael Polakow-Suransky, the city’s chief academic officer, who argued for more proportional ranges during state arbitration. “In those versions, it actually created more weight than the law intended on measures of student learning.”

Polakow-Suransky said the city’s wider ranges would also give principals greater control over how teachers in their schools are evaluated.

“The measures of student learning should be potent,” he added, “but it shouldn’t be determinative of most of the evaluation.”

The scoring quirks that the state is addressing are products of the state’s approach to combining evaluation data into a final rating.

New York uses a “numerical” approach, which some consider easier to communicate to stakeholders. Evaluation systems in Washington, D.C., and Tennessee also use the approach.

“Many states and districts look at the numerical approach and consider it to be more transparent because the numbers and formulas are more clearly articulated,” said Lisa Lachlan-Haché, a researcher at the American Institutes for Research who studied the subject for a white paper she co-wrote last year. New York hired AIR to develop its student growth models for state assessments.

In two other widely used approaches for converting evaluation components into final ratings — “profile,” which is used in New Haven, and “holistic,” which is used in Massachusetts — there is no conversion to points. Instead, raters translate the information more impressionistically.

Lachlan-Haché said that New York State was taking advantage of the flexibility of the numerical approach, which allows policymakers to make changes when they see that something isn’t working.

“We should expect to see pioneers like New York adjust their cut points and summative rating approaches in the first few years of implementation,” Lachlan-Haché said. “States are generally going into evaluation design with the understanding that these systems will not be perfect.”

The New York City model could be on the table for other districts when they renegotiate their evaluation systems.

“That is a topic we can come back to in July,” King said last month, referring to the July 17 Board of Regents meeting.