少点错误 (LessWrong), July 23, 01:42
Formative vs. summative evaluations

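This article examines two key evaluation methods in usability engineering: formative evaluation and summative evaluation. Formative evaluation should begin early in product development, even at the concept stage. It uses continuous user feedback and usability analysis to guide design decisions. Even when that means starting over, it finds and resolves deep problems early, before a large investment is wasted. Summative evaluation is done as the product nears completion and mainly serves to find and fix last-minute bugs or make minor adjustments. The article stresses that a team that neglects formative evaluation is not serious about usability. The same principles apply to ideas and arguments: clarifying them early, with timely feedback and correction, is far more effective and economical than fixing them after the fact.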

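✨ Formative evaluation is the cornerstone of usability engineering. It should start at the very earliest stage of product development (even the paper-prototype stage) and continue throughout the development cycle. Its core value is that continuous user feedback and usability analysis catch potential design problems and conceptual flaws early, so they can guide and refine design and implementation decisions. Even when this means scrapping part of the work or making major changes, those changes happen before substantial development resources have been committed, which greatly lowers their cost and risk.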

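🎯 Summative evaluation is usually done near the end of product development. Its main purpose is a final assessment of an essentially finished product, to catch remaining bugs or make small adjustments. However, the article states plainly that summative evaluation is useless for correcting fundamental design flaws. It comes too late, after the window for major design decisions and architectural changes has closed, so it cannot fundamentally improve a product's usability.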

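💡 The article argues that applying the principles of formative evaluation to ideas and arguments is just as important. A newly proposed idea should be clarified, discussed, and tested early, so that inconsistencies, problematic assumptions, or gaps between the author's meaning and readers' understanding surface quickly. This “immediate” feedback lets errors be corrected and the direction adjusted at minimal cost. It also keeps an idea from being widely accepted, and from becoming the foundation of later discussion, before it has been adequately tested, which prevents errors from accumulating and effort from being wasted.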

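💬 In comments and intellectual exchange, not every piece of feedback is a complete “evaluation.” Questions such as “please give an example” or “what do you mean by this sentence?” should be seen as components of an overall evaluation process, not as standalone evaluations to be judged on their own. These seemingly scattered questions play a crucial role in clarifying concepts and pointing out key omissions. They efficiently steer the discussion and contribute substantially to refining the core ideas, and their value lies in how much they advance the overall evaluation.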

Published on July 22, 2025 5:36 PM GMT

(This is a series of comments that have been turned into a post.)

In the field of usability engineering, there are two kinds of usability evaluations: formative and summative.

Formative evaluations are done as early as possible. Not just “before the product is shipped”, but before it’s in beta, or in alpha, or in pre-alpha; before there’s any code—as soon as there’s anything at all that you can show to users (even paper prototypes), or apply heuristic analysis to, you start doing formative evaluations. Then you keep doing them, on each new prototype, on each new feature, continuously—and the results of these evaluations should inform design and implementation decisions at each step. Sometimes (indeed, often) a formative evaluation will reveal that you’re going down the wrong path, and need to throw out a bunch of work and start over; or the evaluation will reveal some deep conceptual or practical problem, which may require substantial re-thinking and re-planning. That’s the point of doing formative evaluations; you want to find out about these problems as soon as possible, not after you’ve invested a ton of development resources (which you’ll be understandably reluctant to scrap).

Summative evaluations are done at or near the end of the development process, where you’re evaluating what is essentially a finished product. You might uncover some last-minute bugs to be fixed; you might tweak some things here and there. (In theory, a summative evaluation may lead to a decision not to ship a product at all. In practice, this doesn’t really happen.)

It is an accepted truism among usability professionals that any company, org, or development team that only or mostly does summative evaluations, and neglects or disdains formative evaluations, is not serious about usability.

Summative evaluations are useless for correcting serious flaws. (That is not their purpose.) They can’t be used to steer your development process toward the optimal design—how could they? By the time you do your summative evaluation, it’s far too late to make any consequential design decisions. You’ve already got a finished design, a chosen and built architecture, and overall a mostly, or even entirely, finished product. You cannot simply “bolt usability onto” a poorly-designed piece of software or hardware or anything. It’s got to be designed with usability in mind from the ground up. And you need formative evaluation for that.


The same principles apply when evaluating, not products, but ideas.

The time for clarifications like “what did you mean by this word” or “can you give a real-world example” is immediately.

The time for pointing out problems with basic underlying assumptions or mistakes in motivating ideas is immediately.

The time for figuring out whether the ideas or claims in a post are even coherent, or falsifiable, or whether readers even agree on what the post is saying, is immediately.

Immediately—before an idea is absorbed into the local culture, before it becomes the foundation of a dozen more posts that build on it as an assumption, before it balloons into a whole “sequence”—when there’s still time to say “oops” with minimal cost, to course-correct, to notice important caveats or important implications, to avoid pitfalls of terminology, or (in some cases) to throw the whole thing out, shrug, and say “ah well, back to the drawing board”.

To start doing all of this only many months later is way, way too late.


Note that formative evaluations need not be (indeed, will rarely be) complete works by single contributors. In intellectual discussion as in usability engineering, evaluations will often be collaborative efforts, contributed to by multiple commenters.

And, accordingly, evaluations (whether formative or summative) are made of parts. If a commenter writes “what are some examples”, or “what did you mean by that word?”, or any such thing, that’s not an evaluation. That’s a small contribution to a collaborative process of evaluation. It makes no sense at all to judge the value of such a comment by comparing it to some complete analysis. That is much like saying that a piston is of low value compared to an automobile. We’re not being presented with one of each and then asked to choose which one to take—the automobile or the piston. We’re here to make automobiles out of pistons (and many other parts besides).

Such parts, then, must be judged on their effectiveness as part of the whole: how effectively they contribute to the process of constructing the overall evaluation. Questions like “what are examples of this concept you describe?” or “what does this word, which is central to your post, actually mean?” are contributions with an unusually high density of value; they offer a very high return on investment. They have the virtuous property of spending very few words to point to a critical lacuna in the discussion. In doing so they efficiently select (out of the many, many things that might be discussed in the comments under a post) a particular avenue of discussion that is among the most likely to clarify, correct, and otherwise improve the ideas in the post.


Of course, reviews serve a purpose as well. So do summative evaluations.

But if our only real evaluations are the summative ones, then we are not serious about wanting to be less wrong.


