Performing Well vs. Thinking Well: A Two-Stage Assessment Study

Maybe some residue lingers longer

The Beacon—by EA'S CTL and Justin Cerenzia

Jun 17, 2026

This is another window into our CTL Scholars program, where faculty use a Scholarship of Teaching and Learning framework to move beyond intuition and into evidence—bridging the gap between research and classroom practice.

This post is one small piece of that work: a look at what happened when I tried to test something I'd believed for years but never actually measured. I’m kind of obsessed with assessment. To be clear, I’m not obsessed with it because I want to rank or sort students. If I weren’t required to assign a grade, I’d be happy to simply not do that. Instead, I want to know what my students know and I want that knowledge to be durable and readily accessible for them far beyond the end of the class. That’s much harder to measure than a grade. What follows is one man’s attempt at closing that distance.

The Problem

My obsession with assessment likely stems from my strong belief that most (all?) forms of assessment are imperfect. In schools we have a tendency to design for efficiency, and in doing so our students optimize their own learning processes to navigate the maze of schooling in ways that allow them to perform well, but not necessarily to think well. Paul Kirschner, Carl Hendrick, and Jim Heal’s recent AFT article, The Illusion of Performance, does a masterful job capturing the distinction between performance and learning, and they offer a number of options to promote learning over performance.1 What I wanted, ultimately, wasn't a better way to check learning. I wanted the assessment to be the learning.

The Inspiration

I've been using a form of two-stage (or collaborative) assessment for close to a decade. Initially, I was inspired by Harvard Kennedy School's Teddy Svoronos. It also helped that I had a colleague doing a version of collaborative assessment in his Physics class across campus, so I could see the work in action. And I keep coming back to Stanford professor Matthew Rascoff's quote that "school is learning embedded in social experience." I wanted to leverage the social element, in this case the second stage of my assessment design, because I believed it would lead to more durable learning. I knew we'd likely see performance gains in the second stage, but I suspected something else was happening too: that students would remember things better given their interactions with one another. The assessment itself would act as a formative experience, helping students learn from the act of being assessed, not just demonstrate what they already knew.

The Intervention

While I've been using collaborative assessment for some time, I've never formally investigated whether it worked in the ways I intended. I often had good stories to tell about student interactions during the second stage, and given the performance gains, students seemed to appreciate the experience, but I never knew whether anything stuck.

The intervention itself was simple by design. I didn't want the test of the idea to be more complicated than the idea itself. First, students took a unit test individually. The tests were a mix of 25 multiple choice questions and some writing (typically multiple short answer questions or a brief essay). In the next class they retook the multiple choice section in mixed groups of three to four students working together. For the purposes of this investigation, I removed five questions from that group stage to act as a kind of control mechanism. The concepts behind those five questions would resurface on future tests, the midterm, and the final, which meant I could eventually compare two kinds of memory: concepts students had only ever seen once, alone, against concepts they'd also worked through together. The same test, but with two different histories.2

Does the second stage take up more class time? Absolutely. But I wouldn’t describe it as time lost. With stakes attached (in this case points), students wrestle: with the ideas and with each other. That kind of friction is hard to manufacture in a normal class.3

The Results

If, as one of our CTL favorites Dan Willingham says, "Memory is the residue of thought. The more you think about something, the more likely it is that you'll remember it later," we might also add that the residue seems to linger longer when the thinking is shared. This was an admittedly noisy investigation, but the data did hint at some cognitive benefits from the second stage of assessment. On the midterm, students correctly answered questions tied to concepts they'd engaged with twice 88% of the time, compared to 74% for concepts they'd only seen once. On the final exam, months removed from the original material, that gap held: 86% versus 72%.
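For anyone who wants to see the comparison laid out, here is a minimal sketch of the arithmetic behind those figures. The four percentages come from the paragraph above; the names `results` and `gap_in_points` are purely illustrative, and this is not the analysis that was actually run on the class data.

```python
# Percent correct reported in the post, by exam and by exposure history.
# "twice" = concept seen individually and again in the group stage
# "once"  = concept seen only on the individual test
results = {
    "midterm": {"twice": 88, "once": 74},
    "final": {"twice": 86, "once": 72},
}


def gap_in_points(twice, once):
    """Return the difference in percentage points between the two conditions."""
    return twice - once


# One gap per exam, keyed by exam name.
gaps = {exam: gap_in_points(**scores) for exam, scores in results.items()}

for exam, gap in gaps.items():
    print(f"{exam}: {gap} percentage points")
```

The gap comes out the same size on both exams, which is all "that gap held" means here. It says nothing about whether the difference would survive a proper significance test.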

We do not possess enough confidence to suggest that this gap is derived solely from the collaborative nature of the second stage of assessment. We do possess enough confidence to suggest that two-stage assessment was affectively positive for students in my US History class. Whatever else the data does or doesn't prove, they liked it. For some, it offered a chance at a higher grade, which they always appreciate. For most, however, the experience itself was the reward: the chance to argue, defend a position, and occasionally get talked out of a wrong answer by a peer.

What’s my verdict on two-stage assessment? I’m going to keep it for now. Students look forward to test days, which is a win in itself. And when you witness the conversations (which sometimes devolve into arguments…in a good way) during the second stage, you can’t help but appreciate the intensity of thinking on display. Sometimes that’s all of the data one needs.

1. I do quite literally every one of the things they reference. Apparently that wasn't enough for me, hence the rest of this post.

2. This is likely where you start to quibble with the study design. I get it. But this is SoTL work. We try to isolate things where we can and live with the rest.

3. In terms of points, I weight the first stage of the assessment at 90% and the second stage at 10%. This eliminates a potential free-rider problem. I’ll also note that students don’t need to come to a consensus on the second stage. They’re free to disagree and go with what they perceive to be the correct answer.
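To make the 90/10 weighting concrete, here is a rough sketch of how a combined score could be computed. The function name `combined_score` and the sample scores are hypothetical, and it assumes both stages are scored on the same 0 to 100 scale, which the post does not specify.

```python
def combined_score(individual, group, individual_weight=0.9):
    """Blend the two stages into one score.

    Assumes both scores are on the same 0-100 scale (an assumption,
    not something stated in the post). The group stage gets whatever
    weight is left over after the individual stage.
    """
    group_weight = 1 - individual_weight
    return individual_weight * individual + group_weight * group


# Hypothetical student: 80 alone, 100 with the group.
score = combined_score(80, 100)
print(f"combined: {score:.1f}")

# The same student with no individual preparation still scores low,
# which is why the small group weight discourages free riding.
unprepared = combined_score(40, 100)
print(f"unprepared: {unprepared:.1f}")
```

With the group stage worth only a tenth of the total, a perfect group round moves the hypothetical student from 80 to roughly 82, while a student who scored 40 alone lands near 46.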

A guest post by

Justin Cerenzia

Buckley Executive Director, Chair for Teaching and Learning at The Episcopal Academy. Academic DJ. #GoBirds
