When it comes to measurement, four is a popular number; or rather, a range of 1 to 4 is a common scheme. Two powerful measurement systems use scores of 1 through 4 to clarify levels of quality. The Depth of Knowledge (DOK) framework (Webb et al., 2005) uses a 1 through 4 scale to rank the cognitive complexity of an assessment task. The Proficiency Scale framework (Marzano and Kendall, 2008) uses a 1 through 4 scale to rank students’ performance levels on individual standards.
Both assessment measurement systems have much in common:
- Both systems range from 1 to 4
- Each number in the range of both systems represents a degree of sophistication
- The 1 represents the lowest degree of sophistication and the 4 represents the highest degree, or the “best” number
But that’s where the similarities stop. The DOK scale is used to measure the rigor level of a task (the source), whereas a proficiency scale is used to measure the quality of student work (the result) on an academic standard. A side-by-side summary of both frameworks highlights the differences at each level:
|Level|Depth of Knowledge Levels (the measure of cognitive complexity in a task)|Proficiency Levels (the measure of proficiency found in student work)|
|---|---|---|
|4|Extended Thinking: Requires an investigation, time to think and process multiple conditions of the problem (Synthesize, Analyze, Prove, Connect, Design, Apply Concepts).|Advancing: The student has met standard expectations and advances the standard requirements with in-depth inferences and/or extensive sophisticated connections, etc.|
|3|Strategic Thinking: Requires reasoning, developing a plan or a sequence of steps, some complexity, more than one possible answer (Assess, Revise, Critique, Draw Conclusions, Differentiate, Formulate, Hypothesize, Cite Evidence).|Achieving: The student is independent and demonstrates accuracy (no major errors or omissions) on standard expectations.|
|2|Skill/Concept: Use information or conceptual knowledge, two or more steps, etc. (Infer, Identify Patterns, Modify, Predict, Distinguish, Compare).|Developing: The student is independent, but demonstrates only partial accuracy on standard expectations.|
|1|Recall: Recall of a fact, information, or procedure (Recite, Recall, Label, Name, Define, Identify, Match, List, Draw, Calculate).|Initiating: The student is dependent on scaffolding and support to demonstrate minimal or inaccurate/incomplete understandings of standard expectations.|
Even though one system measures the input (task) and one measures the output (result), it’s a very common and alluring misstep to treat the two systems as synonymous. The argument could be made, for example, that a student is advancing (proficiency level 4) if he or she can engage in extended thinking (DOK level 4) regarding the standard(s).
However, it’s important to remember that the two 4-point schemes serve very different purposes. All four levels of proficiency can apply to any task that is not binary (a binary task has no gradations of quality, so evaluation rests solely on right/wrong, yes/no, or present/absent responses), whereas a single performance task is assigned only one level of cognitive complexity. In other words, a task that requires strategic thinking (DOK level 3) can generate all four levels of proficiency in student responses.
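For readers who track scores in a gradebook or spreadsheet script, the independence of the two scales can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the sample task, and the student labels are hypothetical and do not come from either framework's authors. The point is that the DOK level belongs to the task, while the proficiency level belongs to each student's response.

```python
from dataclasses import dataclass
from enum import IntEnum


class DOK(IntEnum):
    """Cognitive complexity of a task (level names from the table above)."""
    RECALL = 1
    SKILL_CONCEPT = 2
    STRATEGIC_THINKING = 3
    EXTENDED_THINKING = 4


class Proficiency(IntEnum):
    """Quality of student work on a standard (level names from the table above)."""
    INITIATING = 1
    DEVELOPING = 2
    ACHIEVING = 3
    ADVANCING = 4


@dataclass(frozen=True)
class Task:
    name: str
    dok: DOK  # a task carries exactly one complexity level


@dataclass(frozen=True)
class Response:
    student: str
    task: Task
    proficiency: Proficiency  # scored per student, independent of task.dok


def proficiency_levels_seen(task, responses):
    """Return the set of proficiency levels awarded on one task."""
    return {r.proficiency for r in responses if r.task == task}


# Hypothetical example: one DOK 3 task, four students, four different results.
critique = Task("Critique an argument", DOK.STRATEGIC_THINKING)
responses = [
    Response("Student A", critique, Proficiency.INITIATING),
    Response("Student B", critique, Proficiency.DEVELOPING),
    Response("Student C", critique, Proficiency.ACHIEVING),
    Response("Student D", critique, Proficiency.ADVANCING),
]

levels = proficiency_levels_seen(critique, responses)
print(sorted(level.name for level in levels))
```

Because the two scores live on different objects, nothing in this sketch lets a student's proficiency score restrict which tasks that student may attempt, or lets a task's DOK level cap the proficiency a response can earn.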
The 3 Dangers
There are three dangers in tying the two measures together.
Trapping Students at the Lower Levels
The first involves the limitations such a combination would place on students: those who struggle to earn high levels of proficiency would become trapped in lower-level tasks. This is a significant concern. The joy and the big picture of learning often rest in DOK level 3 and 4 tasks, which many struggling learners need but would not be allowed to experience if they were deemed a 1 or a 2 in proficiency.
Forcing Students to Only Engage at Level 4
The second danger is the reverse of the first: Students wouldn’t be able to earn a proficiency level of 4 unless they were always engaged in extended thinking tasks.
Oversimplifying Two Sophisticated Systems
The third danger is the oversimplification of two different, sophisticated measurement systems. Turning complex evaluation processes into streamlined, simple algorithms compromises the accuracy of teacher decision-making and exempts educators from needing to think carefully about what constitutes deep learning.
So, what’s in a 4? It turns out there is a lot of important information behind any assigned score … but the information will vary based on the specific measurement system being employed and the items being measured. Let’s use each system as it was intended and avoid the temptation to merge and simplify them.
“I wouldn’t give a nickel for the simplicity on this side of complexity, but I would give my life for the simplicity on the other side of complexity.” —Einstein
Marzano, R., and Kendall, J. (2008). Designing and assessing educational objectives: Applying the new taxonomy. Thousand Oaks, CA: Corwin Press.
Webb, N. L., Alt, M., Ely, R., and Vesperman, B. (2005). Web alignment tool (WAT) training manual. Washington, DC: Council of Chief State School Officers.