Reinforcement Learning from Market Feedback, and other uses of information markets

Published on September 16, 2024 1:04 AM GMT

Markets for information are inefficient, in large part due to the Buyer’s Inspection Paradox: you can’t “inspect” information like you would any other good before buying — the moment you inspect the information, you have obtained it and cannot return it. More generally, the problem is an inability to reliably commit to forgetting.

A comment by John Wentworth mentioned that you can use amnestics to overcome this: make your purchase decision while under the influence of an amnestic that blocks your mind’s ability to write to long-term memory.

When you read this, it’s hard not to immediately think of LLMs, which can make purchase decisions without committing anything to long-term memory.[1] Apparently the authors of Language Models Can Reduce Asymmetry in Information Markets (2024) had the same idea, and propose the “Information Bazaar”: a digital marketplace in which LLM agents trade information on behalf of external human principals.

I think the implications of this idea are quite promising and underrated. In particular, it can help with several problems, which I walk through below: advertising markets for decision-relevant information, an alternative to RLHF, markets for ideas and IP, and deeper prediction markets.

If you are familiar with Yoram Barzel’s view that “the key friction in society is information”, or with how Paul Christiano’s HCH-style alignment proposals hinge on “verification is generally easy”, then you might see how this could be pretty big.

Google ads but for information

The main cause of most market failures is information asymmetry: many quality improvements to goods are never made because buyers cannot verify them, and many goods aren’t produced at all because buyers cannot verify their quality.

There might be specific pieces of information that would influence the buyer’s decision: e.g. “in a randomized test of 1000 of these appliances, only 5 were faulty!”. But crucially, the buyer cannot verify the quality of this information either, so there may in turn be information that would influence the buyer’s decision to buy that information: “The industry average for such faults is 1/1000”, “There is no such study; this is fake news”, “I am the author of that study and I approve this message; here’s my signature”.

Here’s a sketch of how you could implement a marketplace for such information, with each LLM agent recursively spinning off its own agents to consider such sub-information:

from __future__ import annotations

from dataclasses import dataclass


@dataclass
class InfoOffer:
    bid: float    # what the informer pays for the buyer's attention
    price: float  # what the buyer pays for the information itself
    info: str
    parent: Informer


@dataclass(eq=False)  # eq=False so Buyer instances can be used as dict keys
class Buyer:
    goal: str
    wealth: float
    info_processing_cost: float  # could be a function instead
    arena: Arena

    def __call__(self) -> tuple[list[str], bool]:
        # initialize info_collected
        info_collected: list[str] = []

        # tell information agents your goal and get info offers from them;
        # contractors not registered in the arena get no offers (base case)
        info_offers = self.arena.offer_info.get(self, [])
        if not info_offers:
            return info_collected, self.decide(info=info_collected)
        top_offer = max(info_offers, key=lambda offer: offer.bid)

        if top_offer.bid > self.info_processing_cost:
            # charge the winning advertiser for the cost of considering its info
            self.wealth += top_offer.bid
            top_offer.parent.wealth -= top_offer.bid

            # spin off a contractor agent to decide whether to buy the top offer
            contractor = Buyer(
                goal=DecideToBuy(top_offer),
                wealth=self.wealth,
                info_processing_cost=self.info_processing_cost,
                arena=self.arena,
            )
            info_collected, decision = contractor()

            if decision:
                # buy the info
                info_collected.append(top_offer.info)
                self.wealth -= top_offer.price
                top_offer.parent.wealth += top_offer.price

        return info_collected, self.decide(info=info_collected)

    def decide(self, info: list[str]) -> bool:
        ...  # some intelligent behaviour


def DecideToBuy(offer: InfoOffer) -> str:
    # the contractor's sub-goal: inspect the offered info (which the
    # principal never sees) and decide whether it is worth its price
    return f"Decide whether this info is worth {offer.price}: {offer.info}"


@dataclass
class Informer:
    wealth: float

    @property
    def offer_info(self) -> dict[Buyer, InfoOffer]:
        # some daemon that monitors the arena for places it could
        # be useful, and advertises its info there
        ...


@dataclass
class Arena:
    buyers: list[Buyer]
    informers: list[Informer]

    @property
    def offer_info(self) -> dict[Buyer, list[InfoOffer]]:
        info_offers: dict[Buyer, list[InfoOffer]] = {buyer: [] for buyer in self.buyers}
        for informer in self.informers:
            for buyer in info_offers:
                info_offers[buyer].append(informer.offer_info[buyer])
        return info_offers
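For concreteness, here is how one round might be wired up; the goal string and numbers below are made-up placeholders, and a real run would need Informers with an actual offer_info implementation:

arena = Arena(buyers=[], informers=[])
buyer = Buyer(
    goal="Decide whether to buy this appliance",
    wealth=100.0,               # made-up budget
    info_processing_cost=0.05,  # made-up cost of reading one offer
    arena=arena,
)
arena.buyers.append(buyer)
# ... Informers would register themselves with the arena here ...
info_collected, decision = buyer()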

(For simplicity I have pretended that the buyer can only process one piece of information at a time — of course, it could instead have multiple attention slots to auction off to advertisers; see the sketch below.)
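A minimal sketch of that multi-slot variant, reusing InfoOffer from above; the parameter k and the exact selection rule are my own illustrative choices:

def top_offers(
    info_offers: list[InfoOffer], k: int, info_processing_cost: float
) -> list[InfoOffer]:
    # auction k attention slots: take the k highest-bidding offers
    # whose bids at least cover the cost of processing them
    ranked = sorted(info_offers, key=lambda offer: offer.bid, reverse=True)
    return [offer for offer in ranked[:k] if offer.bid > info_processing_cost]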

You could imagine this being implemented on, e.g., Amazon. When you go to buy something, an “information ad” can pop up (its contents invisible to you). Your LLM agent looks at it and decides whether to buy the information (which would make it visible to you) — recursively, the LLM agent receives information ads informing its own decisions, and so on — and you make your final purchase decision based on all the information acquired.

It is also straightforward to see how this could be applied to, say, fact-checking/community notes, or to recommender algorithms.

Reinforcement learning from market feedback

Basically this exact protocol can be used as an RLHF alternative. The “buyers” are human raters (and the LLM agents they employ) who have to solve some problems; the “informer” is the LLM being trained. The informer is given the buyer’s goal as context/input and returns its output in the form of an InfoOffer; the wealth updates are used as rewards.
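A minimal sketch of that reward loop, reusing the classes from the sketch above; the episode structure, and reading the reward off the informer’s wealth delta, are my own rendering of “wealth updates as rewards”, not the paper’s exact protocol:

def rlmf_reward(informer: Informer, buyer: Buyer) -> float:
    # one training episode: the informer (the LLM being trained) advertises
    # InfoOffers against the buyer's goal, the market protocol runs, and the
    # informer's net wealth change (attention charges plus sales) is the reward
    wealth_before = informer.wealth
    buyer()  # runs the bid / inspect / buy protocol, updating wealths
    return informer.wealth - wealth_before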

This is basically a generalized form of Debate, except with proper incentives for the human rater.

This can also be used for benchmarking.

IP rights

Similarly, you can have an idea market. Your “goal” here might be something like “I want to find a well-defined and impactful problem I can write a solid research paper on in 3 months, given my CV and background” or “I want an AI start-up idea to work on”.
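In terms of the earlier sketch, only the goal string changes; the rest of the protocol is unchanged (the numbers are placeholders):

buyer = Buyer(
    goal="I want an AI start-up idea to work on",
    wealth=100.0,              # made-up budget
    info_processing_cost=0.5,  # made-up cost of reading one offer
    arena=arena,
)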

Prediction markets

The “subsidy parameter” in LMSR can be understood as the “price of information”, but that price is paid entirely as a positive externality: the subsidizer pays, and everyone gets the aggregated information for free. In particular, this prevents us from having “deep” prediction markets, in which intermediate agents would subsidize markets for subsidiary relevant information, and so on.
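For reference, a minimal sketch of the LMSR mechanics (these are the standard formulas, not something specific to this post): the market maker prices trades with the cost function C(q) = b·log Σᵢ exp(qᵢ/b), and the subsidy parameter b bounds its worst-case loss at b·log N for a market with N outcomes; that bounded loss is the price paid for the information the market aggregates.

import math

def lmsr_cost(q: list[float], b: float) -> float:
    # LMSR cost function C(q) = b * log(sum_i exp(q_i / b));
    # a trade moving outstanding shares from q_old to q_new costs
    # lmsr_cost(q_new, b) - lmsr_cost(q_old, b)
    return b * math.log(sum(math.exp(qi / b) for qi in q))

def max_subsidy(n_outcomes: int, b: float) -> float:
    # the market maker's worst-case loss, b * log(N): the maximum
    # "price of information" the subsidizer can end up paying
    return b * math.log(n_outcomes)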

(Perhaps this can be mitigated by Latent Variable Prediction Markets or Combinatorial Prediction Markets, but I’m not sufficiently familiar with those.)

But again, it is solved by our recursive information markets: the buyer’s “goal” is now simply his probability for some forecasting question.


Positive externalities

Not everything is solved.

Buyer’s inspection is one of two problems with information markets; the other is positive externalities (it’s hard to prevent information from “leaking” outside your property).

Even if you ensure (via legal enforcement, or by having the whole decision made by an LLM buyer) that people don’t leak the information they buy, the decision of whether to buy some piece of information will itself correlate with the information. For example: you are much more likely to buy information confirming that Bigfoot is real than information rejecting it.

I’m not sure how big of a problem this is. My initial impression is that (1) it can be mitigated for applications like Reinforcement Learning from Market Feedback, where we can just control who receives what information, and (2) it is much less of a problem for ideas than for answers and proofs, because the search space for the former is larger — from a framing I like:

It seems to me there are actually three sorts of information in the world:

    "Ideas": math/science theories and models, inventions, business ideas, solutions to open-ended problems :: search"Answers": math theorems, experimental observations, results of computations :: inference"Proofs": math proofs, arguments, evidence, digital signatures, certifications, reputations, signalling :: alignment

There’s also the fact that e.g. Barzel believes that all transaction costs and all incompleteness in the definition of property rights (and therefore all externalities) are fundamentally about information. I’m not sure how to evaluate this claim, though.


Footnotes


  1. More generally, I would say that something people miss about LLMs is that they aren’t just cheaper, more reliable humans: intelligence is now decoupled from the human “architecture”, i.e. from things like memory, train of thought, continual learning, and the inability to insulate oneself from outside information. This opens up new opportunities in epistemics — questions like counterfactuals and “What would I think if I didn’t know info X?” are now meaningful — and in institutions. ↩︎



