(This is a series of comments that have been turned into a post.)
In the field of usability engineering, there are two kinds of usability evaluations: formative and summative.
Formative evaluations are done as early as possible. Not just “before the product is shipped”, but before it’s in beta, or in alpha, or in pre-alpha; before there’s any code—as soon as there’s anything at all that you can show to users (even paper prototypes), or apply heuristic analysis to, you start doing formative evaluations. Then you keep doing them, on each new prototype, on each new feature, continuously—and the results of these evaluations should inform design and implementation decisions at each step. Sometimes (indeed, often) a formative evaluation will reveal that you’re going down the wrong path, and need to throw out a bunch of work and start over; or the evaluation will reveal some deep conceptual or practical problem, which may require substantial re-thinking and re-planning. That’s the point of doing formative evaluations; you want to find out about these problems as soon as possible, not after you’ve invested a ton of development resources (which you’ll be understandably reluctant to scrap).
Summative evaluations are done at or near the end of the development process, when you’re evaluating what is essentially a finished product. You might uncover some last-minute bugs to be fixed; you might tweak some things here and there. (In theory, a summative evaluation may lead to a decision not to ship a product at all. In practice, this doesn’t really happen.)
It is an accepted truism among usability professionals that any company, org, or development team that only or mostly does summative evaluations, and neglects or disdains formative evaluations, is not serious about usability.
Summative evaluations are useless for correcting serious flaws. (That is not their purpose.) They can’t be used to steer your development process toward the optimal design—how could they? By the time you do your summative evaluation, it’s far too late to make any consequential design decisions. You’ve already got a finished design, a chosen and built architecture, and overall a mostly, or even entirely, finished product. You cannot simply “bolt usability onto” a poorly-designed piece of software or hardware or anything. It’s got to be designed with usability in mind from the ground up. And you need formative evaluation for that.
The same principles apply when evaluating, not products, but ideas.
The time for clarifications like “what did you mean by this word” or “can you give a real-world example” is immediately.
The time for pointing out problems with basic underlying assumptions or mistakes in motivating ideas is immediately.
The time for figuring out whether the ideas or claims in a post are even coherent, or falsifiable, or whether readers even agree on what the post is saying, is immediately.
Immediately—before an idea is absorbed into the local culture, before it becomes the foundation of a dozen more posts that build on it as an assumption, before it balloons into a whole “sequence”—when there’s still time to say “oops” with minimal cost, to course-correct, to notice important caveats or important implications, to avoid pitfalls of terminology, or (in some cases) to throw the whole thing out, shrug, and say “ah well, back to the drawing board”.
To only start doing all of this many months later is way, way too late.
Note that formative evaluations need not be (indeed, will rarely be) complete works by single contributors. In intellectual discussion as in usability engineering, evaluations will often be collaborative efforts, contributed to by multiple commenters.
And, accordingly, evaluations (whether formative or summative) are made of parts. If a commenter writes “what are some examples?”, or “what did you mean by that word?”, or any such thing, that’s not an evaluation. That’s a small contribution to a collaborative process of evaluation. It makes no sense at all to judge the value of such a comment by comparing it to some complete analysis. That is much like saying that a piston is of low value compared to an automobile. We’re not being presented with one of each and then asked to choose which one to take—the automobile or the piston. We’re here to make automobiles out of pistons (and many other parts besides).
Such parts, then, must be judged as parts—by how effectively they contribute to the process of constructing the overall evaluation. Questions like “what are examples of this concept you describe?” or “what does this word, which is central to your post, actually mean?” are contributions with an unusually high density of value; they offer a very high return on investment. They have the virtuous property of spending very few words to point to a critical lacuna in the discussion, thus efficiently selecting—out of the many, many things which could potentially be discussed in the comments under a post—a particular avenue of discussion which is among the most likely to clarify, correct, and otherwise improve the ideas in the post.
Of course, reviews serve a purpose as well. So do summative evaluations.
But if our only real evaluations are the summative ones, then we are not serious about wanting to be less wrong.