LessWrong 2024-08-14
[LDSL#6] When is quantification needed, and when is it hard?


Published on August 13, 2024 8:39 PM GMT

This post is also available on my Substack.

In the previous post, I discussed the possibility of doing ordinal comparison between different entities. Yet usually when we think about measurement, we think of it as involving putting numbers to things, not just ranking them.

There are quite a few methods to convert rankings into numbers, such as Elo scores or item response theory. Different such methods generally yield extremely similar results, and are often based on probabilities. For instance, in chess, if two players’ abilities differ by 400 Elo points, then the odds of the better player winning are 10:1.

Are these methods reasonable? Can they go wrong?

Quantified ordinals are ~log scales

If you look at the formula for Elo scores, it involves an exponential function, so each time the Elo changes by 400, the odds of winning change by a factor of 10 (by definition). A similar principle applies to the probability of solving a difficult task as a function of IQ, if one uses item response theory.
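The exponential relationship can be sketched in a couple of lines; the function name is mine, but the constants (base 10, divisor 400) are the standard Elo conventions described above:

```python
def elo_win_probability(elo_a, elo_b):
    """Expected score of player A under the standard Elo model,
    where a 400-point advantage corresponds to 10:1 odds."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400))

# A 400-point gap gives odds of 10:1, i.e. a win probability of 10/11.
p = elo_win_probability(2000, 1600)
```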

Another way to see it is to look at the practical outcomes. For instance, IQ appears to be exponentially related to income.

But I think the most fundamental way to understand it is that these methods assume that measurement error is independent of the measured quantity, such that there is the same amount of error in the ranking of the best as in the ranking of the worst. In linear scales, measurement error is usually proportional to the measured quantity, so in order for it to be independent, one must take the logarithm.
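A small simulation illustrates this, under the assumption that measurement noise is multiplicative (lognormal): on the linear scale the error grows in proportion to the measured quantity, while on the log scale the error is the same regardless of size.

```python
import math
import random

random.seed(0)

def measure(true_value):
    # Multiplicative noise: each measurement is off by a random factor.
    return true_value * math.exp(random.gauss(0, 0.1))

small = [measure(1.0) for _ in range(10_000)]
large = [measure(1000.0) for _ in range(10_000)]

def stdev(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

# Linear scale: the error on the large quantity is ~1000x larger.
lin_ratio = stdev(large) / stdev(small)
# Log scale: both spreads are ~0.1, independent of the quantity.
log_small = stdev([math.log(x) for x in small])
log_large = stdev([math.log(x) for x in large])
```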

Log scales need a base for addition

It might be tempting to think that a log scale is equivalent to a linear scale, since you can just take the exponential of it. However, there are many different exponential functions: 2^x, e^x, 10^x, ….

If n=a^x, and m=a^y, then nm=a^(x+y). So we can add log-scaled numbers, and this simply corresponds to multiplying their linearly-scaled values, even if we don’t know what base is most natural.
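A quick check that the identity above holds regardless of which base we pick:

```python
import math

n, m = 12.0, 3.5
for base in (2, math.e, 10):
    # The base does not matter for multiplication: adding the
    # log-scaled values always recovers the product n*m.
    x, y = math.log(n, base), math.log(m, base)
    assert math.isclose(base ** (x + y), n * m)
```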

However, there is no corresponding expression for n+m. We can at best give some limiting values, e.g. as (y-x)ln(a) goes to infinity, n+m approaches a^y. We can use this limit to approximate n+m as a^max(x, y), but this approximation fails badly when you start summing up tons of values of similar size. (In practical terms, a tank plus a speck of dust is equal to a tank, but an apple and an orange is more than just an apple or just an orange.)
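Both regimes can be seen with illustrative numbers, comparing the exact log-of-sum (a log-sum-exp) against the max approximation:

```python
import math

def log_sum(xs, base):
    """Exact log (in the given base) of sum(base**x for x in xs),
    computed stably by factoring out the largest term."""
    hi = max(xs)
    return hi + math.log(sum(base ** (x - hi) for x in xs), base)

# A tank (10^8) plus a speck of dust (10^-3): max is a fine approximation.
tank_plus_dust = log_sum([8.0, -3.0], 10)   # ~8.0, same as max
# A thousand similar-sized items: max says 2.0, but the true answer is 5.0.
many_apples = log_sum([2.0] * 1000, 10)
```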

Addition lets you infer large things from an enumeration of small things

You need addition to even know whether there seems to be a problem worth performing inference on in the first place. For instance, imagine that the prices of various goods have changed. If you have some idea of living standards that must be achieved, you can add up the prices of the goods needed to achieve these living standards, in order to see if the cost of living has changed.

Could you have done this purely multiplicatively, by e.g. taking the geometric mean of the relative changes in each type of good? (Or equivalently, by averaging the changes in the logarithms of the prices?) No, because the price of a few big things (e.g. housing) might have gone one way, while the price of many small things might have gone the other way. A geometric mean of relative changes would ignore the magnitude of the prices, and instead mainly consider the number of goods.
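A hypothetical basket makes the divergence concrete: one big item rises in price while many small items fall, so the additive total says costs went up while the geometric mean of relative changes says they went down.

```python
import math

# Hypothetical basket of (old_price, new_price) pairs:
# housing up 20%, plus twenty small goods each down 10%.
basket = [(1000.0, 1200.0)] + [(2.0, 1.8)] * 20

old_total = sum(old for old, _ in basket)
new_total = sum(new for _, new in basket)
additive_change = new_total / old_total   # > 1: cost of living rose

ratios = [new / old for old, new in basket]
# Geometric mean of relative changes ignores magnitudes,
# so the many small goods dominate.
geo_mean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))  # < 1
```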

Picking the right base might not be trivial

Picking the right base might seem trivial. For instance, the definition of Elo scores seems to imply that n=10^(x/400). Given this linearization, the linear scores are directly proportional to the probability of winning a match: in a game between a player with Elo x, and a player with Elo y, if we let n=10^(x/400) and m=10^(y/400), then the probability of the first player winning is n/(n+m).

This is relatively sensible, but the issue is that it is not compositional; for instance, the scores will not be proportional to the probability of winning two matches. That probability would instead be (n/(n+m))^2 = n^2/(n^2 + m^2 + 2nm). Similarly, the Elo score presumably does not divide neatly when considering things like the probability of making a good move.
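With illustrative numbers (n=4, m=1, i.e. an Elo gap of about 241 points), squaring the linearized scores does not reproduce the two-match probability:

```python
n, m = 4.0, 1.0                        # linearized scores: n = 10**(x/400), m = 10**(y/400)
p_one = n / (n + m)                    # single-match win probability: 0.8
p_two = p_one ** 2                     # winning two matches: 0.64
squared_scores = n**2 / (n**2 + m**2)  # what "proportional to n**2" would predict: 16/17
```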

This lack of compositionality undermines the point of picking a base, because we wanted to pick a base in order to sum different things together.


