Last year, the Evander Childs Campus got a new library, replete with rows of new computers and a mural depicting scholarly pursuits.
The library opened its doors for the first time last month — but not to students. Instead, it housed teachers from other high school campuses, who convened there to try out a new model for grading students’ final exams.
Regents exams, which students must pass to graduate from high school, have been scored by the teachers who administered them since the Regents exam program began in the nineteenth century. But mounting concerns about cheating — spurred on by the finding that students hit the minimum passing score at a disproportionately high rate — have prompted the city and state to make changes to how the exams are graded.
The state’s test security overhaul calls for schools to stop grading their own Regents exams by June 2013. The changes are meant to reduce opportunities and incentives for teachers to inflate their students’ scores, which under state law could factor into teachers’ evaluations in the future. The shift would bring Regents exam grading in line with how most states score high-stakes exams and with New York State’s requirements about elementary and middle schools’ exams.
Driven by its own concerns about cheating and softer forms of score inflation, the city has sped that timeline up. In January, a handful of schools tested out a system to ensure that teachers do not grade their own students’ exams.
Department of Education officials expanded that system, known as “distributed scoring,” to more than 160 schools this spring. Most of the schools deployed teachers to centralized locations such as Evander Childs, and teachers from 17 schools tested a system for grading exams online. In total, about 107,000 exams were graded under distributed scoring last month.
Teachers who participated in the pilot gave it mixed reviews. Some said the system made them better graders because they considered only the answers, not the students, when assigning scores. But others said the system of musical graders was complicated, time-consuming, and likely to lead to unfairly deflated scores. And a small number of missing tests highlight the potential cost of logistical mishaps.
Department of Education officials solicited feedback from teachers who piloted the new system and said they would use that information to improve it before the next round of exams in 2013. Shael Polakow-Suransky, the department’s chief academic officer, said the pilot included a wide range of schools from across different districts and networks to elicit as full a range of feedback as possible.
“It was just trying to get a right mix so that we could actually see where the challenges will be and where we need to make adjustments for next year,” he said.
Concerns about fairness
For Richard Mangone, a retired social studies teacher recruited to grade U.S. History exams at the Prospect Heights Campus, few changes are needed. He said the scoring process at his site was the most efficient he had seen in 30 years of grading exams.
He also said he found it easier to grade fairly. One finding that prompted the city’s February audit was that teachers issued a disproportionate number of 65s — the lowest passing score — on Regents exams, suggesting that they might be bumping up the scores of students on the verge of passing. That’s less likely to happen now that teachers are not grading their own students’ tests, Mangone said.
“It’s not that you’re less objective, but it’s easier,” Mangone said about distributed scoring. “You’re just looking at the response.”
Some have argued the bulge of 65s reflects not padded scores but concern for students most at risk of failing. At a panel last August, high school social studies teacher (and GothamSchools Community section contributor) Stephen Lazar said that when teachers are more invested in their students’ success, they are more attentive while grading, preventing careless scoring errors from costing a student the score he needs to graduate.
Monica Mazzocchi, who teaches at New Utrecht High School, which was a scoring center, said she prefers grading her own exams for the same reasons.
“Because it’s not their students, will they care as much as we care?” she said about other graders.
Other teachers said the lack of context could be problematic for other reasons. Peter Lapré, a social studies teacher at Park East High School, said he teaches his students extensively about the Venetian salt trade, even though the subject is not covered in standard course materials. This year, his students’ exams were graded online by teachers at other schools — some of whom might not be familiar with that topic.
“I’m concerned my students who chose to write about that were graded unfairly because the teacher didn’t know that information,” he said.
A teacher from Harry S. Truman High School said she worried that other graders wouldn’t be aware that students from her school take U.S. history in ninth grade rather than in their junior year, as students in most schools do, and would grade them according to the standards they would apply to high school juniors.
Fears of score deflation
The Truman teacher, who scored exams at DeWitt Clinton High School and asked not to be named because she feared repercussions, said she found that her concerns were warranted: Her school’s test scores in the subjects graded at the central location dropped significantly, even though they rose in subjects graded in the old model. And Lapré said his scores were stable from last year, even though his school had doubled the time devoted to global studies instruction in an effort to boost scores.
Both teachers said they thought the new system placed more pressure on teachers to grade harshly by exposing them to oversight from their colleagues and supervisors. The Truman teacher said that because no one at her site wanted to be seen as too lax, teachers debating between two scores tended to round down instead of up.
Plus, each grader was assigned a three-digit identification number and told to write it next to every response he or she scored.
“You’re well aware you’re being watched,” Lapré said.
Each exam was also marked with the student’s name and school. Arthur Goldstein, a social studies teacher at Francis Lewis High School, which did not participate in the pilot, said he was concerned that information could bias graders against students.
“I wonder if a bunch of papers go to a closing school [to be graded], if they won’t look at it and make my kids pay for it because we’re a good school,” he said.
Another teacher whose school did not participate in the pilot said he worried that bias could cut the other way, disadvantaging students whose names or schools suggested they were likely to be black or Hispanic because teachers would expect them to perform less well.
The city’s progress report system for evaluating schools judges high schools in large part by their Regents exam pass rates, and rates that fall from one year to the next would result in a lower grade. The system also weighs each school’s performance against that of other schools with similar students. When schools’ annual letter grades are announced this fall, some schools that used distributed scoring will have been compared to schools whose exams were scored under the old system.
Department officials said the distributed grading model actually shields students from unfairness. Before teachers even began grading tests at the centralized sites, they completed an exercise to make sure they shared an understanding about what makes an essay worth one score rather than another. First, each teacher graded the same essay, and then members of each grading committee discussed their rationales before comparing their assessments to the state’s guidelines.
The actual scoring happened in committees of four or six, with two teachers grading every essay. Discrepancies of more than one point between the teachers’ scores would trigger another reading by a third teacher, according to a department official who worked on the new system. That rarely happened, teachers who participated in the pilot said.
A large-scale logistical undertaking
Test scores weren’t the only things that moved as a result of the pilot: The physical tests also had to be transported around the city, posing a logistical challenge. In one extreme hiccup, 17 exams administered at Franklin D. Roosevelt High School in Brooklyn that were supposed to be sent to New Utrecht for grading were lost.
The Department of Education’s Office of Special Investigation is looking into what happened and whether distributed scoring played a role, according to Marge Feinberg, a department spokeswoman. She said the exams, which were taken by students in FDR’s evening school for students at risk of dropping out, are the only ones missing from June’s Regents period.
“We are working with school staff to find the exams,” Feinberg said.
Other logistical issues had graders concerned about distributed scoring’s efficiency. The teacher from Truman said she and her colleagues spent long stretches in DeWitt Clinton’s library, doing nothing while they waited for exams to be collected or distributed.
A Brooklyn teacher who commuted with 24 colleagues to score at New Utrecht said he found it inefficient to commute each day to a school other than his own. And he worried that the presence of 100 extra people in the building while New Utrecht’s students were taking final exams was disruptive.
But the teacher, who asked not to be named because he feared repercussions, said he saw a value in handing off his students’ exams for others to grade.
“What I would recommend personally is, give me Utrecht’s, I’ll give them my papers, and we can stay in our own buildings,” the teacher said.