Reinforcement Learning from Market Feedback, and other uses of information markets

Published on September 16, 2024 1:04 AM GMT

Markets for information are inefficient, in large part due to the Buyer’s Inspection Paradox: you can’t “inspect” information like you would any other good before buying — the moment you inspect the information, you have obtained it and cannot return it. More generally, the problem is an inability to reliably commit to forgetting.

A comment by John Wentworth mentioned that you can use amnestics to overcome this: make your purchase decision while under the influence of an amnestic that blocks your mind’s ability to write to long-term memory.

When you read this, it’s hard not to immediately think of LLMs, which can make purchase decisions without committing anything to long-term memory.[1] Apparently the authors of Language Models Can Reduce Asymmetry in Information Markets (2024) had the same idea, and propose the “Information Bazaar”: a digital marketplace in which LLM agents trade information on behalf of external human principals.

I think the implications of this idea are quite promising and underrated. In particular, it can help with several problems, which I walk through below: advertising markets for decision-relevant information, an alternative to RLHF, markets for ideas and IP, and deeper prediction markets.

If you are familiar with Yoram Barzel’s view that “the key friction in society is information”, or with how Paul Christiano’s HCH-style alignment proposals hinge on “verification is generally easy”, then you might see how this could be pretty big.

Google ads but for information

The main cause of most market failures is information asymmetry: many quality improvements to goods are never made because buyers cannot verify them, and many goods aren’t produced at all because buyers cannot verify their quality.

There might be specific pieces of information that would influence the buyer’s decision: e.g. “in a randomized test of 1000 of these appliances, only 5 were faulty!”. But crucially, the buyer cannot verify the quality of this information either, so there may in turn be information that would influence the buyer’s decision to buy that information: “The industry average for such faults is 1/1000”, “There is no such study; this is fake news”, “I am the author of that study and I approve this message; here’s my signature”.

Here’s a sketch of how you could implement a marketplace for such information, with each LLM agent recursively spinning off its own agents to consider such sub-information:

from __future__ import annotations

from dataclasses import dataclass


@dataclass
class InfoOffer:
    bid: float    # what the informer pays for the buyer's attention
    price: float  # what the buyer pays for the information itself
    info: str
    parent: Informer


@dataclass(eq=False)  # eq=False so Buyer instances can be used as dict keys
class Buyer:
    goal: str
    wealth: float
    info_processing_cost: float  # could be a function instead
    arena: Arena

    def __call__(self) -> tuple[list[str], bool]:
        # initialize info_collected
        info_collected: list[str] = []

        # tell information agents your goal and get info offers from them;
        # contractors not registered in the arena get no offers (base case)
        info_offers = self.arena.offer_info.get(self, [])
        if not info_offers:
            return info_collected, self.decide(info=info_collected)
        top_offer = max(info_offers, key=lambda offer: offer.bid)

        if top_offer.bid > self.info_processing_cost:
            # charge the winning advertiser for the cost of considering its info
            self.wealth += top_offer.bid
            top_offer.parent.wealth -= top_offer.bid

            # spin off a contractor agent to decide whether to buy the top offer
            contractor = Buyer(
                goal=DecideToBuy(top_offer),
                wealth=self.wealth,
                info_processing_cost=self.info_processing_cost,
                arena=self.arena,
            )
            info_collected, decision = contractor()

            if decision:
                # buy the info
                info_collected.append(top_offer.info)
                self.wealth -= top_offer.price
                top_offer.parent.wealth += top_offer.price

        return info_collected, self.decide(info=info_collected)

    def decide(self, info: list[str]) -> bool:
        ...  # some intelligent behaviour


def DecideToBuy(offer: InfoOffer) -> str:
    # the contractor's sub-goal: inspect the offered info (which the
    # principal never sees) and decide whether it is worth its price
    return f"Decide whether this info is worth {offer.price}: {offer.info}"


@dataclass
class Informer:
    wealth: float

    @property
    def offer_info(self) -> dict[Buyer, InfoOffer]:
        # some daemon that monitors the arena for places it could
        # be useful, and advertises its info there
        ...


@dataclass
class Arena:
    buyers: list[Buyer]
    informers: list[Informer]

    @property
    def offer_info(self) -> dict[Buyer, list[InfoOffer]]:
        info_offers: dict[Buyer, list[InfoOffer]] = {buyer: [] for buyer in self.buyers}
        for informer in self.informers:
            for buyer in info_offers:
                info_offers[buyer].append(informer.offer_info[buyer])
        return info_offers
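For concreteness, here is how one round might be wired up; the goal string and numbers below are made-up placeholders, and a real run would need Informers with an actual offer_info implementation:

arena = Arena(buyers=[], informers=[])
buyer = Buyer(
    goal="Decide whether to buy this appliance",
    wealth=100.0,               # made-up budget
    info_processing_cost=0.05,  # made-up cost of reading one offer
    arena=arena,
)
arena.buyers.append(buyer)
# ... Informers would register themselves with the arena here ...
info_collected, decision = buyer()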

(For simplicity I have pretended that the buyer can only process one piece of information at a time — of course, it could instead have multiple attention slots to auction off to advertisers; see the sketch below.)
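A minimal sketch of that multi-slot variant, reusing InfoOffer from above; the parameter k and the exact selection rule are my own illustrative choices:

def top_offers(
    info_offers: list[InfoOffer], k: int, info_processing_cost: float
) -> list[InfoOffer]:
    # auction k attention slots: take the k highest-bidding offers
    # whose bids at least cover the cost of processing them
    ranked = sorted(info_offers, key=lambda offer: offer.bid, reverse=True)
    return [offer for offer in ranked[:k] if offer.bid > info_processing_cost]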

You could imagine this being implemented on, e.g., Amazon. When you go to buy something, an “information ad” can pop up (its contents invisible to you). Your LLM agent looks at it and decides whether to buy the information (which would make it visible to you) — recursively, the LLM agent receives information ads informing its own decisions, and so on — and you make your final purchase decision based on all the information acquired.

It is also straightforward to see how this could be applied to, say, fact-checking/community notes, or to recommender algorithms.

Reinforcement learning from market feedback

Basically this exact protocol can be used as an RLHF alternative. The “buyers” are human raters (and the LLM agents they employ) who have to solve some problems; the “informer” is the LLM being trained. The informer is given the buyer’s goal as context/input and returns its output in the form of an InfoOffer; the wealth updates are used as rewards.
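A minimal sketch of that reward loop, reusing the classes from the sketch above; the episode structure, and reading the reward off the informer’s wealth delta, are my own rendering of “wealth updates as rewards”, not the paper’s exact protocol:

def rlmf_reward(informer: Informer, buyer: Buyer) -> float:
    # one training episode: the informer (the LLM being trained) advertises
    # InfoOffers against the buyer's goal, the market protocol runs, and the
    # informer's net wealth change (attention charges plus sales) is the reward
    wealth_before = informer.wealth
    buyer()  # runs the bid / inspect / buy protocol, updating wealths
    return informer.wealth - wealth_before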

This is basically a generalized form of Debate, except with proper incentives for the human rater.

This can also be used for benchmarking.

IP rights

Similarly, you can have an idea market. Your “goal” here might be something like “I want to find a well-defined and impactful problem I can write a solid research paper on in 3 months, given my CV and background” or “I want an AI start-up idea to work on”.
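In terms of the earlier sketch, only the goal string changes; the rest of the protocol is unchanged (the numbers are placeholders):

buyer = Buyer(
    goal="I want an AI start-up idea to work on",
    wealth=100.0,              # made-up budget
    info_processing_cost=0.5,  # made-up cost of reading one offer
    arena=arena,
)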

Prediction markets

The “subsidy parameter” in LMSR can be understood as the “price of information”, but that price is paid entirely as a positive externality: the subsidizer pays, and everyone gets the aggregated information for free. In particular, this prevents us from having “deep” prediction markets, in which intermediate agents would subsidize markets for subsidiary relevant information, and so on.
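For reference, a minimal sketch of the LMSR mechanics (these are the standard formulas, not something specific to this post): the market maker prices trades with the cost function C(q) = b·log Σᵢ exp(qᵢ/b), and the subsidy parameter b bounds its worst-case loss at b·log N for a market with N outcomes; that bounded loss is the price paid for the information the market aggregates.

import math

def lmsr_cost(q: list[float], b: float) -> float:
    # LMSR cost function C(q) = b * log(sum_i exp(q_i / b));
    # a trade moving outstanding shares from q_old to q_new costs
    # lmsr_cost(q_new, b) - lmsr_cost(q_old, b)
    return b * math.log(sum(math.exp(qi / b) for qi in q))

def max_subsidy(n_outcomes: int, b: float) -> float:
    # the market maker's worst-case loss, b * log(N): the maximum
    # "price of information" the subsidizer can end up paying
    return b * math.log(n_outcomes)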

(Perhaps this can be mitigated by Latent Variable Prediction Markets or Combinatorial Prediction Markets, but I’m not sufficiently familiar with those.)

But again, it is solved by our recursive information markets: the buyer’s “goal” is now simply his probability for some forecasting question.


Positive externalities

Not everything is solved.

Buyer’s inspection is one of two problems with information markets; the other is positive externalities (it’s hard to prevent information from “leaking” outside your property).

Even if you ensure (via legal enforcement, or by having the whole decision made by an LLM buyer) that people don’t leak the information they buy, the decision of whether to buy some piece of information will itself correlate with the information. For example: you are much more likely to buy information confirming that Bigfoot is real than information rejecting it.

I’m not sure how big of a problem this is. My initial impression is that (1) it can be mitigated for applications like Reinforcement Learning from Market Feedback, where we can just control who receives what information, and (2) it is much less of a problem for ideas than for answers and proofs, because the search space for the former is larger — from a framing I like:

It seems to me there are actually three sorts of information in the world:

    "Ideas": math/science theories and models, inventions, business ideas, solutions to open-ended problems :: search"Answers": math theorems, experimental observations, results of computations :: inference"Proofs": math proofs, arguments, evidence, digital signatures, certifications, reputations, signalling :: alignment

There’s also the fact that e.g. Barzel believes that all transaction costs and all incompleteness in the definition of property rights (and therefore all externalities) are fundamentally about information. I’m not sure how to evaluate this claim, though.


Footnotes


  1. More generally, I would say that something people miss about LLMs is that they aren’t just cheaper, more reliable humans: intelligence is now decoupled from the human “architecture”, i.e. from things like memory, train of thought, continual learning, and the inability to insulate oneself from outside information. This opens up new opportunities in epistemics — questions like counterfactuals and “What would I think if I didn’t know info X?” are now meaningful — and in institutions. ↩︎



