We Should Try to Directly Measure the Value of Scientific Papers

Published on September 5, 2024 9:08 AM GMT

(Epistemic Status: I have never published an academic paper or been involved in grantmaking for academic work, so my perspective on current practices is limited. Still, I think the basic idea I am proposing is clear and straightforward enough to overcome that limitation.)

I spend a decent amount of time reading/listening to articles/podcasts from places like Astral Codex Ten, Ben Recht’s arg min, and The Studies Show, all of which explore problems with scientific research. In some cases, the issues discussed involve sexy topics like data fraud or p-hacking, but, much more often, papers fall into the category described in this great post, where the problem is not so much misconduct (though there is a lot of that) as research that is essentially worthless.

How can we recognize bad science?

When assessing the value of a scientific paper or study, there are often many different critiques you can levy. You can question the study's power, the generalizability of the results, whether there were unobserved confounders, the randomization methods, the theoretical structure, the measurement method, the mathematical representation of variables, the real-world interpretation of variables, the variability explained, and on and on. These critiques are vital to improving science, but, unfortunately, the work of making them is difficult, time-consuming, often unrewarded, and, as a result, severely out-scaled by the volume of bad papers.

From an epistemic perspective, this approach also creates a horrible situation where distinguishing useful and important work from trivialities is extremely demanding on anyone outside of a specific field or subfield. As a result, naive readers (and the media) end up believing the claims of bad papers, and skeptical readers end up disbelieving even potentially good ones.

Luckily, although comprehensive or systematic criticisms of papers are difficult, scientists (and others) have access to a high degree of tacit knowledge about their fields, the methods that work, and the plausibility of results, all of which they can and do leverage when evaluating new studies. In more Bayesian terms, they often have strong priors about whether papers are making meaningful contributions, which we could hopefully elicit directly, without needing specific, formal critiques.

The value of information

Scientific papers/studies are a form of societal information gathering, and their value comes from the same place as the value of any information: the ability to make better choices between options. 

We can then codify and measure the (Expected) Value of Information (VOI) for a paper/study with this standard formula from decision theory:

VOI = Expected Value of Actions Given Information - Expected Value of Actions Without Information

Looking at this formula, we can see a clear pragmatic definition of the worth of a scientific paper. If nobody will change how they act or react, regardless of the paper’s specific results, then it has no value. If people will change their actions in response to the paper’s specific results, then the value of the paper is precisely equal to the (expected) improvement in those actions.
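To make the formula concrete, here is a minimal Python sketch of the VOI of a study that is assumed to perfectly reveal which state of the world is true (the function and argument names are my own, purely for illustration):

```python
def value_of_perfect_information(priors, actions, payoff):
    """VOI of a study that perfectly reveals which state is true.

    priors:  dict mapping each possible state to its prior probability.
    actions: the options available to the decision-maker.
    payoff:  payoff(action, state) -> value of taking `action` when `state` holds.
    """
    # With the study: in each state, we get to pick the best action for that state.
    ev_with_info = sum(
        p * max(payoff(a, state) for a in actions)
        for state, p in priors.items()
    )
    # Without the study: we must commit to one action under the prior alone.
    ev_without_info = max(
        sum(p * payoff(a, state) for state, p in priors.items())
        for a in actions
    )
    return ev_with_info - ev_without_info
```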

Let’s run through an example. Suppose Alice is a regulator at a simplified FDA, deciding whether to approve a new drug called Canwalkinol that is designed to cure 100 people of a disease that makes them unable to walk. Currently, Alice thinks there is a 30% probability that Canwalkinol is deadly (i.e. too dangerous to approve) and a 70% chance that it is not dangerous. Alice’s current plan is to not approve the drug, an action with an expected value of ‘100 lives without the ability to walk.’ If a study comes along that can perfectly demonstrate whether Canwalkinol is dangerous, then Alice will be able to make a perfectly informed decision. From Alice’s perspective, that study would have a 70% chance of showing that Canwalkinol is safe, allowing her to approve the drug so that all 100 people can walk, and a 30% chance of showing that Canwalkinol is deadly, in which case she does not approve the drug. We can calculate the value of this study as follows:

Value of Study = Expected Value of Action Given Study - Expected Value of Action Without Study

If we let V_walk be the value of ‘one life with the ability to walk’ and V_not be the value of ‘one life without the ability to walk’, then we can derive the following:

Expected Value of Action Given Study = 70% × 100 × V_walk (if the drug is safe) + 30% × 100 × V_not (if the drug is not safe) = 70 V_walk + 30 V_not

Expected Value of Action Without Study = 100 × V_not

Value of Study = Expected Value of Action Given Study - Expected Value of Action Without Study = (70 V_walk + 30 V_not) - 100 V_not = 70 V_walk - 70 V_not = 70 × (V_walk - V_not)

= value of curing 70 people and giving them the ability to walk

So we can see that, to Alice, the value of this particular paper/study is equal to curing 70 people of the disease.
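As a sanity check, here is the same calculation as a few lines of self-contained Python, measured in units where V_not = 0 and V_walk = 1 (my choice of units, for illustration only):

```python
p_safe = 0.7               # Alice's prior that Canwalkinol is safe
n_patients = 100
v_walk, v_not = 1.0, 0.0   # value per life with / without the ability to walk

# With the study, Alice approves exactly when the drug is shown to be safe.
ev_with_study = p_safe * n_patients * v_walk + (1 - p_safe) * n_patients * v_not
# Without the study, Alice's plan is to not approve.
ev_without_study = n_patients * v_not

print(ev_with_study - ev_without_study)  # 70.0 = 70 * (v_walk - v_not)
```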

How would this actually work?

There are three components required to measure the value of a paper/study using the VOI method: (1) there needs to be a clear estimate of the prior probability for the outcomes of the paper, (2) there needs to be a decision that is plausibly affected by the results of the paper, and (3) outcomes from that decision need to be comparable through some metric. Each of these presents some level of challenge to applying this method in practice, but I think that all of the difficulties can, and should, be overcome.

(1) Having good priors for paper results

In order to measure the expected value of a paper, we will need some estimate of the probabilities associated with each of the paper’s possible outcomes. Since these estimates are just probabilities over the paper's possible results, they can be generated in any number of ways, including prediction markets, forecasting tournaments, surveys of forecasters, surveys of experts in the field, etc. There is nothing particularly novel about producing these assessments relative to any other forecasting exercise, but, as always, it is necessary to properly incentivize accuracy and induce participation.
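For instance, if a handful of forecasters each give a probability for a paper's headline result holding up, one simple way to pool them into a single prior is to average their log-odds (equivalent to taking the geometric mean of their odds). A minimal sketch, with made-up numbers:

```python
import math

def pool_probabilities(estimates):
    """Pool several probability estimates by averaging their log-odds."""
    log_odds = [math.log(p / (1 - p)) for p in estimates]
    mean_log_odds = sum(log_odds) / len(log_odds)
    return 1 / (1 + math.exp(-mean_log_odds))

# Three hypothetical experts' probabilities that the study will find an effect:
prior = pool_probabilities([0.6, 0.75, 0.8])
print(round(prior, 2))  # ~0.72
```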

(2) Finding relevant decisions

I think this issue is the most difficult to address, but the challenges associated with it are, to a substantial degree, a reflection of the problems with scientific papers themselves. As I said above, if there are truly no decisions that will depend on the outcome of a paper, then the paper does not have value. An inability to find relevant decisions for a paper is therefore often a strength of this method rather than a weakness, since it allows for clearly distinguishing between valuable and inconsequential contributions to science.

Still, what decisions might be acceptable? I think it is sensible to be agnostic on this question, for the basic reason that value calculations should be able to speak for themselves. It doesn’t really matter what a decision itself is, since we should be able to just compare the improvement in final outcomes (such as lives saved or study methodologies updated) from changing the decision.

(3) Making decision outcomes comparable

One benefit of an idealized version of the VOI framework would be the direct comparability of the value of different papers and the ability to prioritize both between papers/studies and between science and other uses of resources. Unfortunately, our utility detectors are stuck in the mail, so VOI estimates will need to rely on a diverse set of metrics based on the decisions affected. Still, I think this should be fine. People are pretty good at comparing the value of different outcomes, especially in cases where useful differences are likely to be large.

Final Thoughts

I wrote this post because I think that the VOI framework is the correct way to think about the value of scientific work theoretically, but that it is not measured or explained as explicitly as it can and should be. Many people complain about the quality of studies or have intuitions that some fields are unrigorous nonsense, but often these criticisms can seem ad hoc, specific to a given paper/study or methodology, or just difficult to evaluate. Explicitly using measures of VOI can put many of these assessments in a common language and make their contributions more interpretable to people outside the given field.



