Tom Schimmer is an author and a speaker with expertise in assessment, grading, leadership, and behavioral support. He is a former district-level leader, school administrator, and teacher.

Most Recent? Most Frequent? Most Accurate?

One of the fundamental tenets of standards-based grading is that greater (if not exclusive) emphasis is placed on the more recent evidence of learning. For years now, the consensus among both experts and practitioners has been that combining old and new evidence (most often by calculating a mean average) distorts the accuracy of reported achievement levels (O’Connor, 2010; Guskey, 2014; Reeves, 2015). As students move through their natural learning trajectories, they should be given full credit for their learning, regardless of how low or slow the start. Instead of fully reflecting that learning, grades produced by calculating the mean often reflect where the student used to be, since the student, at some point, was likely performing at the level the average represents; students who reach proficiency were quite likely developing or novice learners at an earlier point.

What we have collectively realized is that the speed at which a student achieves has inadvertently (or intentionally) become a significant factor in determining grades, especially within a traditional grading paradigm. When averaging is the main (or sole) method of grade determination, success is contingent upon early success; never forget that every 40 needs an 80 to reach an average of 60. The lower the initial level, the more dramatically students must outperform themselves just to average out to a pass. With grades based on proficiency against standards, the emphasis shifts to determining the most accurate level or grade, regardless of how low or slow the start. Sometimes that’s the most recent evidence, sometimes that’s the most frequent evidence, and sometimes it’s both.
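To see that arithmetic concretely, here is a minimal sketch in Python; the scores, the 0-100 scale, and the target average are hypothetical and meant only to show how a mean blends old and new evidence while the most recent score reflects where the learner is now.

```python
# Hypothetical scores on a 0-100 scale, listed in chronological order.
scores = [40, 55, 70, 85]

mean_grade = sum(scores) / len(scores)  # 62.5 -- dragged down by the early 40
most_recent = scores[-1]                # 85   -- where the learner is now

print(f"Mean of all evidence: {mean_grade}")
print(f"Most recent evidence: {most_recent}")

# The averaging penalty: a student who starts at 40 needs an 80
# on the next attempt just to average out to 60.
early_score, target_average = 40, 60
needed_next = 2 * target_average - early_score  # 80
print(f"A {early_score} needs a {needed_next} to average {target_average}.")
```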

Most Recent Evidence 
The most recent evidence is often the most accurate, especially when the standards, targets, or performance underpinnings being assessed represent foundational knowledge and skills. Foundational knowledge and skills are typically aspects of the curriculum with a fairly linear learning progression where slipping back is highly unlikely; once students truly know or can do something, it is unlikely that they will, even after a period of time, suddenly lose that knowledge or skill. That doesn’t mean mistakes don’t occur. Even the most proficient students make mistakes, but that doesn’t mean they’ve suddenly lost proficiency; errors are inevitable given enough attempts. Using the most recent evidence means that new evidence trumps old evidence, since in our standards-based, criterion-referenced instructional paradigm it is irrelevant how long it took a student to reach proficiency. That said, the most recent evidence is not always the most accurate determination.

Most Frequent Evidence
There are also times when the most frequent evidence is the most accurate. Generally speaking, the more complex the standard or the demonstration of proficiency, the more likely it is that the most frequent evidence will be the more accurate determination. Let’s use argumentative writing as an example. Writing is a process with a number of contributing variables. Most ELA teachers would agree that one writing sample is insufficient to make an accurate judgment about the writing proficiency of any given writer, since so many aspects of quality contribute to what would be deemed proficient. One sample could have a strong opening paragraph and thesis statement, lack cohesion in its presentation of the argument, and finish with a concluding paragraph that meanders without focus; the next sample could have a weak opening, exceptional body paragraphs with smooth transitions, and a strong conclusion. This pattern (or lack of pattern) could continue throughout a given period of time. One sample would not be enough for the teacher to make an accurate determination. Choosing the most recent evidence with complex, multifaceted standards could leave students’ grades vulnerable to a weak performance at the end, despite several strong performances beforehand.

Both?
In some cases, a teacher may want to consider both the most recent and the most frequent performances simultaneously. In doing so, teachers can honor the growth students are showing while ensuring that one strong or weak performance doesn’t overly influence the final determination. Accurate assessment is always about adequate sampling, which means teachers can find the sweet spot between using one task and using every task the students complete. Using one task would not be an adequate sample, while using every task would likely result in the combination of old and new evidence.

I was recently in a conversation with a high school ELA teacher, and I asked her how many argumentative writing samples she would need in order to accurately judge her students as argumentative writers; she said four. I then asked her if the rigor with which she assessed her students’ writing increased as the semester progressed; she said it did. As a result, we determined that she would restrict her grade book to four writing samples to determine student proficiency, but these would be the four most recent samples. This did not mean the students were only going to complete four writing samples; it meant she would only use four. Had she said three or five or any other number, then that’s what she would have used.

Obviously, the first four samples would populate the grade book. The maneuvering begins when sample five (and beyond) is collected. Since this teacher expressed that the rigor of assessment increased as the semester moved forward, she would simply drop the most dated score and replace it with the most recent one; this would continue throughout the semester. At the end of the semester, she would have the required four samples, which would be the four most recent, yet still a large enough sample to determine the most frequent or consistent performance. If the rigor of assessment did not increase throughout the semester, the teacher could have chosen to drop the lowest score regardless of when it occurred; this is at the teacher’s discretion. The goal is accuracy, and the art of grading demands that teachers use their judgment to ensure students earn full credit for their learning.
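As a rough sketch of that bookkeeping, the snippet below keeps a rolling window of the four most recent scores and then reports the most frequent level within that window. The four-point proficiency scale, the window size, and the tie-break toward more recent evidence are illustrative assumptions, not a procedure the teacher prescribed.

```python
from collections import Counter, deque

WINDOW = 4  # the teacher's chosen number of samples

def determine_grade(samples: deque) -> int:
    """Most frequent level in the window; a tie goes to the more recent evidence."""
    counts = Counter(samples)
    best = max(counts.values())
    # Walk from newest to oldest so ties favor the most recent score.
    for score in reversed(samples):
        if counts[score] == best:
            return score

# Seven writing samples over a semester, scored on a hypothetical
# four-point proficiency scale (1 = novice ... 4 = advanced).
grade_book = deque(maxlen=WINDOW)  # the oldest score drops out automatically
for score in [1, 2, 2, 3, 3, 3, 4]:
    grade_book.append(score)

print(list(grade_book))             # [3, 3, 3, 4] -- the four most recent samples
print(determine_grade(grade_book))  # 3 -- the most frequent of the recent evidence
```

If a teacher preferred to drop the lowest score rather than the oldest, the replacement rule would change, but the frequency-based determination at the end would work the same way.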

Some might be concerned with the level of subjectivity involved in most recent vs. most frequent decisions, but subjectivity is not a four-letter word; there is a level of subjectivity in all assessment. What often separates professions from jobs is the level of professional judgment required. Teachers are much more than data-entry clerks who enter numbers into spreadsheets; they are professionals who understand what quality work looks like, who know what is needed for students to continue to improve, and who know when the numbers don’t tell the full story. Accurate grade determination, not just calculation, is about teachers leveraging their training and experience to draw thoughtful conclusions about the proficiency levels of each of their learners.

References:

Guskey, T. (2014). On your mark: Challenging the conventions of grading and reporting. Bloomington, IN: Solution Tree Press.

O’Connor, K. (2010). A repair kit for grading: Fifteen fixes for broken grades (2nd ed.). Portland, OR: Pearson Assessment Training Institute.

Reeves, D. (2015). Elements of grading: A guide to effective practice (2nd ed.). Bloomington, IN: Solution Tree Press.


Comments

  1. Veronica Saretsky

    When we use summative assessments too early in the learning process, we inadvertently teach students that they cannot be successful, and many students give up.
    Recognizing teachers as professionals and not “data entry clerks” will be an important cultural shift for parents who want credit for their child learning faster than the others.
