Why Denver’s school rating system is coming under fire on multiple fronts

Denver Public Schools’ comprehensive and increasingly complex system for rating schools is facing criticism this year from leaders and advocates on different sides of the education policy debate.

Some say the system is making bad schools look good. Others say the opposite. Many complain that frequent changes to the School Performance Framework make excellence a moving target in a district that promotes school choice — and one in which parents use the color-coded ratings to decide where to send their kids.

A record number of schools earned one of the top two ratings on the framework this fall, putting the state’s largest school district closer to meeting goals for raising the quality of schools citywide.

With student enrollment tied to funding, and in an era where low performance puts Denver schools on a path toward closure or replacement, the ratings carry real consequences.

“Lots of things get mentioned or murmured in the hallways,” said Chantel Maybach, an educator at George Washington High School. “Instead of building up a school, that’s an easy way to start tearing it down from the inside, those fears and those concerns.”

Superintendent Tom Boasberg defended the ratings system, which put new emphasis this year on how well schools are educating traditionally underserved students. He also defended the academic gains schools have made and the high ratings they earned.

But he acknowledged that some measures weren’t as rigorous as they need to be, while others had the potential to be applied in a way that didn’t make sense. Correcting that will require more changes to the framework. While the fluidity of the system is one of the most persistent criticisms, he said making those changes is critical if Denver is going to get it right.

“Do you not make improvements that clearly need to be made in the interest of saying, ‘No change?’” Boasberg said. “I think our view is that over time, as we learn more and listen to folks, we want to make those improvements. … If we have data that’s not doing a good job helping schools focus on how and what to improve, that’s a reason we want to improve our tool.”

The concerns voiced by educators and advocates this year include that the framework too heavily weights the scores of less-rigorous early literacy tests taken by students in kindergarten through third grade, thereby inflating elementary school ratings.

Others complain that the new “academic gaps indicator” for all schools does the opposite, unfairly penalizing those that serve a diverse population at a time when the 92,000-student district, where two-thirds of students are living in poverty, is trying to increase school integration.

To understand the concerns, it’s helpful to first understand the framework.

What is the School Performance Framework?

The School Performance Framework was adopted by Denver Public Schools in 2008 under Boasberg’s predecessor, Michael Bennet, who is now a U.S. senator.

It awards schools points based on a long list of metrics. The number of points a school earns puts it in one of five color categories: blue (the highest), green, yellow, orange and red.

The system was meant to reward top-performers and identify low ones, which from the beginning received extra funding to help them improve. Bennet warned the ratings could have more dire consequences, too, including being used as a basis for school closure.

While the district has for many years closed schools due to poor performance, it solidified the framework’s role in those decisions in 2015 when the school board approved a policy setting consecutive low ratings as the first step toward school closure or restart.

So how are schools measured? State test scores have always been a big part of the metrics. But it’s more than just how many students score at grade-level or above, a factor the district calls status. In fact, the framework more heavily weights academic growth, or how much progress students make on the tests compared to peers who scored similarly to them in previous years.

When the framework debuted, Denver was among a first wave of large urban districts to emphasize growth over status. In 2008, growth accounted for about 60 percent of a school’s score, while status counted for about 30 percent, a ratio of 2-to-1.

As the district has added more growth metrics over the years, that ratio has stretched to 3-to-1 for elementary and middle schools. Growth accounted for 73 percent of an elementary school’s score this year, while status counted for 22 percent.
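For readers who want to check the math, the shift in weighting can be reproduced with a few lines of arithmetic. This is a minimal sketch that uses only the percentages cited above; the function name is illustrative and is not part of the district's framework.

```python
def growth_to_status_ratio(growth_pct: float, status_pct: float) -> float:
    """Return how many times more weight growth carries than status."""
    return growth_pct / status_pct


# Percentages as reported in this article (the 2008 figures are approximate).
weights = {
    "2008": (60, 30),
    "this year, elementary schools": (73, 22),
}

for label, (growth, status) in weights.items():
    ratio = growth_to_status_ratio(growth, status)
    print(f"{label}: about {ratio:.1f}-to-1")
```

The elementary school figure works out to a little more than 3-to-1.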

Boasberg is adamant that growth is more important than status. The latter, he said, is more a measure of where students start, which can depend on factors outside a school’s control. A school is not “good” because it serves more affluent kids, he said.

The traditional way of measuring schools based on how many students pass a test “plays to your worst biases around privilege,” Boasberg said. “The most important thing is for schools to make sure when kids come in, whatever level they’re at, that they grow.”

But the district has been criticized, including by candidates in this year’s heated school board election, for giving high ratings to schools that may have above-average growth but where, for example, just 10 percent of third-graders can read and write at grade-level.

The percentage of schools rated blue or green, the two highest ratings, has grown over the years. In 2010, 45 percent of schools were blue or green. This year, more than 60 percent were. The district’s goal is for 80 percent of schools in every neighborhood to be blue or green by 2020.

Sean Bradley, the president and CEO of the Urban League of Metropolitan Denver, is concerned that all that blue and green is misleading to parents.

“The district has a duty to tell the truth,” he said. “And the current calculations that the district is putting out there may not be as accurate as we assume they are.”

Early literacy concerns

Last year, just 9 percent of third-graders at Barnum Elementary in southwest Denver scored at grade-level or above on the PARCC literacy test, which the state requires be given to students in grades three through nine and which it considers the gold standard measure of what students should know.

But 57 percent of those same third-graders scored at grade-level or above on the iStation literacy test, another state-chosen test that’s given to students in kindergarten through third grade.

For the purposes of Denver’s school ratings, that 48-point gap and others like it are troubling to advocates like Van Schoales, CEO of the nonprofit education advocacy group A Plus Colorado.

“What’s happened this year on the elementary school front, primarily because of the early literacy scores, threatens undermining the whole system,” Schoales said. “Most importantly, it is saying to families that schools are good when they aren’t.”

This year, the district increased the number of points schools could earn for doing well on iStation and other early literacy tests by adding metrics measuring how groups of traditionally underserved students did, which district leaders consider key to closing achievement gaps.

That increase in the number of points came at the same time schools across Denver, including Barnum, saw big jumps in the number of young students scoring at grade-level on iStation and other tests, which leaders credit to an increased focus and investment in early literacy.

As a result, Barnum earned nearly every possible point on the framework for its early literacy scores, while earning far fewer points for its PARCC scores, including zeroes in several categories. The school, which serves a primarily low-income student population, was rated green this year after being rated yellow the year before.

In a statement provided to Chalkbeat, Principal Beth Vinson said Barnum is proud to have been rated green. She said its focus on early literacy “is starting to show good results” that she hopes will lead to higher achievement in its upper grades.

Barnum was not the only green school with a big chasm between its third-grade early literacy scores and its third-grade PARCC scores. One of the biggest was at Castro Elementary, where 73 percent of third-graders scored on grade-level on iStation but just 17 percent did on PARCC. Castro jumped all the way from a red rating, the lowest, to green this year.

Boasberg agrees that the misalignment between PARCC and tests like iStation is concerning. Because PARCC is relatively new, he said it was only recently that the district had enough data to confirm the mismatch. To remedy it, the district announced this fall that it will raise the early literacy test cut points, which were previously set by test makers and the state. Doing so will make it harder for schools to earn points, which Boasberg suspects will affect ratings.

The higher cut points will go into effect for 2019, giving schools time to get used to them. Boasberg rejected an idea floated by some critics to eliminate the early literacy tests from the framework altogether. While he acknowledged they’re an imperfect measure, he said the district added them in response to complaints that elementary school ratings long ignored progress being made in the lower grades because those students don’t take PARCC.

“We definitely agree the PARCC assessment is a stronger, higher quality assessment,” he said. But the early literacy tests are useful, too, he said, and the district is better off using them than nothing. “The question is,” he said, “‘Do you let the perfect be the enemy of the good?’”

The debate over academic gaps

Another pervasive complaint this year has been how the district’s focus on academic gaps between more-privileged and less-privileged students is dragging down some schools’ ratings.

Two years ago, the district launched a new part of the framework it called the “equity indicator.” Meant to shine a light on educational disparities, it measured how traditionally underserved students — low-income students, students of color, special education students and English language learners — were scoring on tests compared to set benchmarks, and how they were scoring compared to students not in those groups, so-called “reference students.”

The district warned schools that the following year, the equity indicator could count against them. If they didn’t score blue or green on the indicator, they couldn’t be blue or green overall.
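The rule the district described can be sketched roughly as follows. This is an illustration, not the district's actual calculation: the function name is hypothetical, and the assumption that a capped school lands on yellow (the next rating below green) is ours. It also does not model any exceptions the district may choose to make.

```python
# Ratings from highest to lowest, as described in the article.
RATINGS = ["blue", "green", "yellow", "orange", "red"]
TOP_TWO = {"blue", "green"}


def apply_indicator_cap(points_rating: str, indicator_rating: str) -> str:
    """Return the overall rating after applying the indicator rule.

    Assumption (not confirmed by the district): a school that would be
    blue or green on points, but is not blue or green on the indicator,
    is rated yellow overall.
    """
    for rating in (points_rating, indicator_rating):
        if rating not in RATINGS:
            raise ValueError(f"unknown rating: {rating}")
    if points_rating in TOP_TWO and indicator_rating not in TOP_TWO:
        return "yellow"
    # Schools already below green are unaffected by the cap.
    return points_rating


print(apply_indicator_cap("green", "orange"))
print(apply_indicator_cap("green", "green"))
```

Under these assumptions, the first call yields yellow and the second stays green.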

During that hold-harmless year, 33 blue or green schools scored poorly on equity. The hold-harmless period also provided a chance to highlight issues with the indicator. Some school leaders, for example, complained it was unfairly dinging them for having large gaps even though their traditionally underserved students were scoring better than average.

What sort of message was it sending low-income parents, they argued, when a school with a big gap between poor and affluent students but where poor students were doing above average was rated lower on equity than a school where all students were doing below average?

The district took those concerns into account and tweaked the indicator this year, Boasberg said. It still measures gaps within a school, but it awards twice as many points for whether traditionally underserved students are meeting the benchmarks, taking the emphasis off the comparisons and putting it on whether underserved kids are on grade-level.

The district also gave the indicator a more precise name: the “academic gaps indicator.”

But concerns persist.

The Downtown Denver Expeditionary School, a charter elementary school where about 40 percent of students are minorities and a quarter are low-income, scored red on the academic gaps indicator for the second year in a row and was rated orange overall.

School leaders acknowledge the school has work to do in closing its gaps. Last year, 61 percent of middle- and upper-income third-graders scored at grade-level on the state literacy tests, while just 23 percent of students who qualify for subsidized lunches did, for example.

But they said despite the district’s tweak, it continues to make little sense that schools with smaller gaps but 8 percent literacy proficiency are green, while their school is orange.

“This isn’t about not holding us accountable for our achievement gaps,” said principal Erin Sciscione. “We want to be held accountable to that. We just don’t think the current system of measuring that is doing what it says it’s doing.”

Chantel Maybach, a special education coordinator at George Washington High, was among a group of teachers, parents and students who spoke publicly about the indicator at a recent school board meeting. She said she was “discouraged and sickened” to learn from one of the school’s data specialists that if white students at George had just not answered every fifth question on the test, the school would have done better on the indicator and been green overall instead of yellow.

Senior Emily Ostrander said the lower rating was unfair for a school that serves “some of the highest-achievers in the district.” George is home to a rigorous International Baccalaureate program that for years fueled a divide among students, often along racial lines, that the school is working to erase. About 72 percent of George students last year were students of color, and about 55 percent qualified for free or reduced-price lunch.

“In a way, it dings the school for being as diverse as it is,” said student Yemi Kelani.

Nine schools were downgraded this year because they didn’t score high enough on the academic gaps indicator. George wasn’t among them, but Brown International Academy, an elementary school in northwest Denver, was. Kate Tynan-Ridgeway, a third-grade teacher at Brown, wrote an opinion piece in the Denver Post calling the ratings misleading.

Sixty-one other teachers signed on in support of the opinion piece.

If Brown were located a few blocks west and over the border of Jefferson County, where there is no academic gaps indicator, Tynan-Ridgeway said, it’d be green and not yellow.

“The achievement gap worries us all,” she said. “As educators, we’re differentiating all the time.”

But Tynan-Ridgeway said that with the indicator highlighting the performance of traditionally underserved students, “it feels to me that the district is saying those kids are far more important than what could potentially be the bulk of your student body.”

Boasberg responded with an opinion piece of his own explaining why the indicator exists. He wrote that it’s already showing promising results: The number of would-be green schools with poor indicator scores dropped by two-thirds from the hold-harmless year to this year.

The district is still fine-tuning the indicator, Boasberg said, and it’s possible more tweaks are coming. One issue, he said, is whether it should apply to schools where nearly all students belong to traditionally underserved groups. This year, the district decided not to downgrade the overall ratings of three high-poverty schools even though they did poorly on the indicator.

Looking ahead

With such high stakes as funding, enrollment and even possible closure attached to school ratings, there are plenty of theories about the reasons behind the frequent changes. Is the district embellishing the ratings to make its schools look better and insulate itself from criticism about closing low-performers? Or is it inventing new ways to drive traditional schools’ ratings down so it can justify replacing them with charter schools?

Boasberg insisted it’s neither. But he said he understands why people hold such passionate, and often conflicting, opinions about the way the district rates its schools.

“There’s no perfect way to do it,” he said. “At the end of the day, it’s enormously helpful for teachers, for parents and for school communities to have a school performance framework that takes data from many different sources and brings it together in a way that’s understandable.”

While the district debates what to do about the academic gaps indicator and gives schools another year to get used to higher early literacy cut points, there is one change that’s definitely happening for the 2018 framework. After the district lowered the bar in 2016 to essentially give schools a reprieve from the new and rigorous PARCC tests, all cut points for the literacy and math tests will go up next year, inching blue and green ratings a bit further out of reach.