Test performance

For a few weeks, Denver at center of PARCC testing world

The basement conference rooms of a Denver hotel are ground zero for setting the performance levels of students in Colorado and other states on last spring’s PARCC language arts and math tests.

Dozens of educators gathered Monday to begin setting the five performance levels – defined by boundaries sometimes called “cut scores” – for the two sets of tests given to Colorado students in grades 9-11.

“It’s exhausting,” said Marti Shirley, a high school math teacher in Mattoon, Ill. “But it’s invigorating in a way, too.”

Shirley and about 120 other teachers, administrators and college professors are meeting in Denver this week to set cut scores for high school language arts and math tests. Similar panels will gather in Denver during the last two weeks of August to set proficiency levels for elementary and middle school test results.

The educators do their work in panels of about 20 members each. Six groups are working in Denver this week.

Officials from PARCC and a group of panelists met with reporters Wednesday to explain the process and reflect on what it means.

Because the PARCC tests are designed to be harder than Colorado’s old TCAP exams and other states’ past tests, smaller percentages of students are expected to be ranked in the top proficiency levels. Panelists were asked repeatedly about that gap between how students actually perform and how they should perform.

They all came down on the side of setting high expectations.

“We’ve got to raise the standard if we want to do better. … The only way to do that is to keep raising the bar,” said Robin Helms, a math teacher at Wray High School on Colorado’s eastern plains. She’s serving on one of the panels.

“Students only give you what you ask them, so you have to push,” said Katherine Horodowich, an English teacher at Hot Springs High School in Truth or Consequences, N.M. “We have to set the bar higher.”

Shirley said there’s wide agreement among educators “that these standards are attainable. Are they attainable tomorrow? That’s not the case. … Trust us. Give us the benefit of the doubt that we know what we’re doing.”

The overall goal of the Common Core State Standards, on which the tests are based, and of the tests themselves is that high school graduates be ready for college or the workplace, and that younger students be prepared for the work of the next grade.

How performance setting works

The panels will be setting the scores needed for a student to be ranked in one of five performance levels.

“They are making recommendations about how good is good enough,” explained Mary Ann Snider, a Rhode Island education official who works with PARCC.

Each PARCC member state selected 20 educators to serve on the panels. The high school panels started this week with two days of intensive training and began setting levels on Wednesday.

The five levels
  • Level 5: Distinguished understanding of subject matter
  • Level 4: Strong understanding
  • Level 3: Adequate understanding
  • Level 2: Partial understanding
  • Level 1: Minimal understanding

A key tool for the panels is the set of detailed “performance level descriptors” that lay out the knowledge and skills students need to demonstrate to be rated at each performance level. (See an example of a descriptor at the bottom of this article.)

Here’s how the panels work:

  • Members work through test questions one by one.
  • Panelists individually decide what scores on a particular question should be assigned to each performance level.
  • Members then share their individual scores with each other, learn what the group’s median score was for each level and also learn the median score of all students in a particular grade on a test.
  • Based on that shared knowledge, individual panelists reconsider their individual decisions, and the whole process is repeated until the group reaches consensus.
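The share-and-reconsider loop described above can be sketched in code. This is a simplified illustration of the general idea – panelists see the group median and move their own judgment toward it until everyone is close – not PARCC's actual procedure; all numbers, names, and the convergence rule are invented for the example.

```python
import statistics

def run_panel(initial_judgments, tolerance=1.0, pull=0.5, max_rounds=20):
    """Repeat rounds of share-and-reconsider until judgments converge.

    Each panelist starts with a proposed cut score. Every round, the group
    median is shared; if all panelists are within `tolerance` points of it,
    that median is the consensus. Otherwise each panelist moves partway
    (`pull`) toward the median and the process repeats.
    """
    judgments = list(initial_judgments)
    for round_num in range(1, max_rounds + 1):
        group_median = statistics.median(judgments)
        if max(abs(j - group_median) for j in judgments) <= tolerance:
            return round(group_median, 1), round_num
        judgments = [j + pull * (group_median - j) for j in judgments]
    return round(statistics.median(judgments), 1), max_rounds

# Three panelists with differing views on a hypothetical cut score.
cut_score, rounds_taken = run_panel([40.0, 44.0, 48.0])
print(cut_score, rounds_taken)  # 44.0 3
```

In this toy run the panel converges on 44.0 after three rounds; the real panels reach consensus through discussion rather than a fixed damping rule, but the iterative structure is the same.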

The panelists who met with reporters had positive things to say about the process.

“None of us are shy. We have no problem telling people we disagree,” said Loretta Holloway, an English professor at Framingham State University in Massachusetts.

“It’s not like we all sit down and make one judgment. It’s a conversation,” said Helms. “We’re spending a whole week looking at this.”

What went on before

Before the panelists could begin work, the tests taken by 5 million children had to be scored.

The Pearson testing company used about 14,000 scorers at home or at more than a dozen centers around the country to score the tests, which took about a month per content area. Scoring was done by grade level, not by state. And individual scorers worked on individual questions, not entire tests.

Scorers assigned each answer up to six points, depending on the question. To be hired, scorers had to have a four-year degree in a relevant field and pass a scoring “test” after being trained. Samples of scorers’ work were double-checked by testing experts.
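The quality-control step – experts re-scoring a sample of each scorer's work – amounts to an agreement check. The sketch below is a hypothetical illustration of that idea, not Pearson's actual audit procedure; all scores and thresholds are invented.

```python
def audit_scorer(scorer_scores, expert_scores):
    """Fraction of sampled responses where scorer and expert agree exactly."""
    matches = sum(1 for s, e in zip(scorer_scores, expert_scores) if s == e)
    return matches / len(scorer_scores)

# Hypothetical example: a scorer's points (0-6 per answer) on ten sampled
# responses, alongside an expert's independent re-scores of the same answers.
scorer = [4, 6, 2, 0, 5, 3, 3, 6, 1, 4]
expert = [4, 6, 2, 1, 5, 3, 2, 6, 1, 4]

print(f"exact-agreement rate: {audit_scorer(scorer, expert):.0%}")  # 80%
```

A real operation would also track near-agreement (within one point) and flag scorers who fall below an agreement threshold for retraining.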

What’s next

After the high school panels finish their work, the education commissioners from the eight PARCC governing board states (including Colorado) will meet to review the recommended cut scores. The commissioners can make changes. Higher education executives from the states also will review the cut scores on high school tests.

The education commissioners will meet again Sept. 9 to review the middle and elementary school cut points.

Public release of scores, including parent reports similar to the one pictured above, will come in late fall or early winter, PARCC officials said Wednesday. In future years results should be available in June or July.

Colorado uses test scores, plus growth data based on multiple years of scores, as part of the system that rates schools and districts. A law passed by the 2015 legislature created a one-year timeout in the accreditation system, so PARCC scores from last spring won’t be used to rate schools and districts next year.

The state’s non-PARCC tests for science and social studies use four performance levels – distinguished, strong, moderate and limited. Students with distinguished or strong command are considered to be ready for college work, or for the next grade.

The State Board of Education will have to fine-tune the existing accreditation system in order to account for PARCC’s five performance levels.

Scores in

After a wild testing year, Tennessee student scores mostly dip — but there are a few bright spots


Student test scores were mostly flat or dipped this year in Tennessee, especially in middle school where performance declined in every subject, according to statewide data released on Thursday.

But there were a few bright spots, including improvement in elementary school English and high school math — both areas of emphasis as the state tries to lift its proficiency rates in literacy and math.

Also, performance gaps tightened in numerous subjects between students in historically underserved populations and their peers. And scores in the state’s lowest-performing “priority” schools, including the state-run Achievement School District, generally improved more than those in non-priority schools.

But in science, students across the board saw declines. This was not expected because Tennessee has not yet transitioned to new, more difficult standards and a new aligned test for that subject. Education Commissioner Candice McQueen said the drops reinforce the need to support science teachers in the shift to higher expectations beginning this fall.

The mixed results come in the third year of the state’s TNReady test, which measures learning based on academic standards that have undergone massive changes in the last five years. The 2017-18 school year was the first under new math and English standards that are based on the previous Common Core benchmarks but were revised to be Tennessee-specific. And in addition to new science standards that kick off this fall, new expectations for social studies will reach classrooms in the 2019-20 school year.

In an afternoon press call, McQueen said “stability matters” when you’re trying to move the needle on student achievement.

“It takes time to really align to the full depth and breadth of these expectations,” she said.

The three charts below illustrate, by subject, the percentage of students statewide who performed on track or better, both this year and last year, in elementary, middle, and high schools. The blue bars reflect the most recent scores.

McQueen acknowledged the good and bad from this year’s results.

“While we’ve focused extensively on early grade reading and are starting to see a shift in the right direction, we know middle school remains a statewide challenge across the board,” she said in a statement.

Tennessee’s data dump comes after a tumultuous spring of testing that was marred by technical problems in the return to statewide computerized exams. About half of the 650,000 students who took TNReady tested online, while the rest stuck with paper and pencil. Online testing snafus were so extensive that the Legislature — concerned about the scores’ reliability — rolled back their importance in students’ final grades, teachers’ evaluations, and the state’s accountability system for schools.

However, the results of a new independent analysis show that the online disruptions had minimal impact on scores. The analysis, conducted by a Virginia-based technical group called the Human Resources Research Organization, will be released in the coming weeks.

Even so, one variable that can’t be measured is the effect of the technical problems on student motivation, especially after the Legislature ordered — in the midst of testing — that the scores didn’t have to be included in final grades.

“The motivation of our students is an unknown we just can’t quantify. We can’t get in their minds on motivation,” McQueen told Chalkbeat on the eve of the scores’ release.

Thursday’s rollout marked the biggest single-day release of state scores since high school students took their first TNReady tests in 2016. (Grades 3-8 took their first in 2017.) The data dump included state- and district-level scores for math, English, science, and U.S. history for grades 3-12.

More scores will come later. School-by-school data will be released in the coming weeks. In addition, Tennessee will unveil the results of its new social studies test for grades 3-8 this fall after setting the thresholds for what constitutes passing scores at each grade level.

You can find the state-level results here and the district-level results here.

Chalkbeat illustrator Sam Park contributed to this story.

Surprising report

EXCLUSIVE: Did online snafus skew Tennessee test scores? Analysts say not much

Education Commissioner Candice McQueen will release the results of Tennessee's 2017-18 standardized test this week, but the reliability of those TNReady scores has been in question since this spring's problem-plagued administration of the online exam.

An independent analysis of technical problems that disrupted Tennessee’s online testing program this spring is challenging popular opinion that student scores were significantly tainted as a result.

Education Commissioner Candice McQueen said Wednesday that the disruptions to computerized testing had “small to no impact” on scores, based on a monthlong analysis by the Human Resources Research Organization, or HumRRO. The Virginia-based technical group has expertise in psychometrics, the science behind educational assessments.

“We do believe these are valid, reliable scores,” McQueen told Chalkbeat on the eve of releasing state- and district-level scores for TNReady, the state’s standardized test in its third year.


The state hired the research group to scrutinize several issues, including whether frequent online testing snafus made this year’s results unreliable. For instance, during at least seven days out of the three-week testing window, students statewide reported problems logging in, staying online, and submitting their tests — issues that eventually prompted the Legislature to roll back the importance of scores in students’ final grades, teacher evaluations, and school accountability systems.

But the analysis did not reveal a dramatic impact.

“For students who experienced the disruption, the analysis did not find any systematic effect on test scores that resulted from lapses in time between signing in and submitting their tests,” McQueen told Chalkbeat.

There was, however, a “small but consistent effect” if a student had to log on multiple times in order to complete the test, she said.

“When I say small, we’re talking about an impact that would be a handful of scale score points out of, say, a possible 200 or 250 points,” McQueen said.

Analysts found some differences in test score averages between 2017 and 2018 but concluded they were not due to the technical disruptions.

“Plausible explanations could be the students just didn’t know the (academic) standards as well and just didn’t do as well on the test,” McQueen said. “Or perhaps they were less motivated after learning that their scores would not count in their final grades after the legislation passed. … The motivation of our students is an unknown we just can’t quantify. We can’t get in their minds on motivation.”

About half of the 600,000 students who took TNReady this year tested with computers, and the other half used paper materials in the state’s transition to online exams. Those testing online included all high school students.

Out of about 502,000 end-of-course tests administered to high schoolers, educators filed about 7,600 irregularity reports – about 1.5 percent – related to problems with test administration, which automatically invalidated those results.

The state asked the analysts specifically to look at the irregularity reports for patterns that could be cause for concern, such as demographic shifts or excessive use of invalidations. They found none.

TNReady headaches started on April 16 – the first day of testing – when students struggled to log on. More problems emerged during the weeks that followed until technicians finally traced the issues to a combination of “bugs in the software” and the slowness of a computerized tool that helps students in need of audible instructions. At one point, officials with testing company Questar blamed a possible cyberattack for shutting down its online platform, but state investigators later dismissed that theory.

While this year’s scores carry little official weight, McQueen emphasized Wednesday that the results are still valuable for understanding student performance and growth and for analyzing the effectiveness of classroom instruction across Tennessee.

“TNReady scores should be looked at just like any data point in the scheme of multiple data points,” she said. “That’s how we talk about this every year. But it’s an important data point.”