Sarah Schuhl, a consultant specializing in mathematics, has been a secondary mathematics teacher, high school instructional coach, and K–12 mathematics specialist for nearly 20 years.

How Close Are We?

Several years ago, as an instructional coach in a district new to the work of collaborative teams in a professional learning community, I learned we should calibrate our grading of common assessments. I decided to ask our algebra teachers if they would be willing to take on this task. My request was met with resistance. The teachers explained that it was an unnecessary use of time: they had already worked together to write the test, had determined how many points each question was worth, and had even typed those points beside each item. I offered to bring donuts and a couple of student samples if they would agree to score a test together. They finally gave in.

At the next team meeting, each of the five teachers individually scored the same student assessment. When we shared results, the student could have earned an overall grade anywhere from D- to B+, depending on which teacher graded the assessment. The teachers were shocked at the range of grades.

What came next can be categorized as one of the worst collaborative team meetings I have been a part of. I learned that whether a teacher has been teaching one year or twenty-five (as was the case on this team), grading is personal and emotional. Teachers said things like, “I’ve always graded this way,” or “That is how I was graded,” as they argued that their own grading was more accurate. Some gave partial credit and others did not. Some graded on a scale of 1–5 and used ratios to scale to the points on the test. One teacher deducted points in increments as small as one-eighth of a point. While the team had agreed on how many points each question was worth, they had not discussed how students earned those points. Suddenly, a few teachers were shouting at one another, one was in tears, and it was clear consensus would not be reached.

After a two-day break, we met again, armed with examples and rationale for how to score assessments.  The teachers were able to make agreements and identified how students would earn the points on each question, much like the scoring of an AP exam.  They agreed to calibrate themselves on future assessments and continue to define their scoring agreements in advance of grading assessments.

It matters.

Students were being told they needed to participate in an intervention based on their assessment results. This meant that whether a student was assigned additional learning opportunities depended on which of the five teachers scored the assessment, even though the standards, assessment, and learning were consistent from class to class. The teachers quickly realized the problem this created.

Later, when working with ELA teachers to calibrate their scoring of student writing using the Six Traits Writing Guide, I learned that even with rubrics, teachers can interpret the language of categories differently and fail to determine consistently whether a student has demonstrated proficiency with a standard or category. As we worked to gain consensus on scoring a writing sample, one teacher finally shouted, “It might be a 4 on the rubric, but it is a 3 in my class!” After a stunned silence, another teacher said, “Then it’s a 4. Let’s not tell students it is a 3 to push them higher; let’s tell them it’s a 4 and challenge them to reach a 5. Let’s make sure they know the expectations.”

So, how do you know if you and your colleagues are calibrated? Suppose you are busy and your team needs to analyze and collectively respond to student data from an assessment. A colleague, knowing it will help you, offers to grade your assessments so the team has all of the information it needs. Is your first thought, “No! You may not understand how to score the work of my students”? Or is your first response, “Seriously? Thank you so much, that would be great!” With calibration, teachers trust one another to teach and assess consistently. They learn from individually scoring work and then reaching consensus on student proficiency.

Too often, teams work diligently to create a common assessment but forget to clarify common scoring agreements, or fail to calibrate their grading of student work to make sure they apply those agreements consistently. Whether your team uses a rubric, proficiency scales, or points to grade an assessment, and whether it is a proficiency-based assessment by standard or an assessment covering multiple standards, consistent and calibrated scoring helps teachers and students understand learning expectations and respond to the learning shown.

How will students know if they are learning standards? How will teachers know which students are learning a standard and which are not learning it yet? Answers come from common assessments. Consistent grading of these assessments and consistent interpretation of results will benefit teacher teams and students alike. It may initially be a difficult conversation to have with your team, but the benefits far outweigh the cost when the evidence of student learning is at stake.
