Since that post, a number of changes have been made to the scaffold. The major ones are:
- Instead of using colored squares on the game screenshots, information is printed as text, e.g. "Impassable", "Explored", "Check Here".
  - Models seem to be helped by putting relevant information blatantly in the spot where they need to see it, rather than indirectly via a legend or instructions or whatever.
  - For some reason it helps if you write "CHECK HERE" on every unexplored tile.
- Automatically-updating ASCII collision map given to the LLM
  - Generated by code
  - Uses numbers indicating how many moves away each tile is
  - Behold, Pewter City. [Image: the generated ASCII collision map of Pewter City]
- Improved prompts for "Critique Claude"/"Guide Gemini"/"Oversight o3"
  - Prompt 1: Given a bunch of facts about the current game state and instructions on what is trustworthy and what's not, make a summary.
    - This is an attempt to get the model to grasp reality better by telling it which sources of information it should basically always trust (data from the game's RAM), mostly trust (its own knowledge of the game from training), not trust (map labels it made itself), and mostly distrust (its own vision).
  - Prompt 2: Look at the output from prompt 1 and try to remove inconsistencies.
  - Prompt 3: OK, now talk to the model you're critiquing.
- Models are encouraged to use a "mark_checkpoint" tool to maintain a running list of major checkpoints (Left House, Beat Misty, died to Brock, etc.)
- A "detailed_navigation" tool which, if called, calls an alternate model that basically rolls around trying to explore (DFS-style) but isn't told what the goal is (though it is told to talk to NPCs and exit maps)
- An autopathing tool that can travel to known coordinates on the map
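To make the "numbers indicating how many moves away each tile is" idea concrete, here is a minimal sketch of how such a map could be generated. It is not the scaffold's actual code: it assumes the collision data has already been read out of RAM into a grid of characters where `#` is impassable, `.` is walkable, and `@` is the player, and the function names are hypothetical. A breadth-first search from the player gives the move count to every reachable tile.

```python
from collections import deque

WALL, PLAYER = "#", "@"  # hypothetical tile encoding, not the game's own


def bfs_distances(grid, start):
    """Return {(row, col): moves} for every tile reachable from start."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != WALL and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


def render_distance_map(grid):
    """Render the grid with each walkable tile replaced by its move count."""
    start = next((r, c) for r, row in enumerate(grid)
                 for c, ch in enumerate(row) if ch == PLAYER)
    dist = bfs_distances(grid, start)
    lines = []
    for r, row in enumerate(grid):
        cells = []
        for c, ch in enumerate(row):
            if ch == WALL:
                cells.append(" ##")
            elif (r, c) == start:
                cells.append("  @")
            elif (r, c) in dist:
                cells.append(f"{dist[(r, c)]:3d}")  # fixed width keeps columns aligned
            else:
                cells.append("  ?")  # walkable but unreachable from here
        lines.append("".join(cells))
    return "\n".join(lines)


room = ["#####",
        "#@..#",
        "#.#.#",
        "#...#",
        "#####"]
print(render_distance_map(room))
```

Using a fixed three-character cell is one way to keep multi-digit distances aligned in a text map; the real scaffold may lay things out differently.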
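An autopathing tool of the kind listed above could work roughly like the following sketch, which finds a shortest route to a known coordinate and turns it into button presses. This is an illustration under assumptions, not the scaffold's implementation: the grid encoding (`#` impassable), the `(row, col)` coordinate convention, the `autopath` name, and the direction strings are all hypothetical. It runs a breadth-first search with parent pointers and walks back from the goal to recover the moves.

```python
from collections import deque

# Hypothetical mapping from grid offsets to D-pad presses.
MOVES = {(-1, 0): "up", (1, 0): "down", (0, -1): "left", (0, 1): "right"}


def autopath(grid, start, goal):
    """Return a shortest list of presses from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    if grid[goal[0]][goal[1]] == "#":
        return None
    parent = {start: None}  # tile -> (previous tile, press used to get here)
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            break
        for (dr, dc), press in MOVES.items():
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] != "#" and nxt not in parent):
                parent[nxt] = (cur, press)
                queue.append(nxt)
    if goal not in parent:
        return None
    presses = []
    node = goal
    while parent[node] is not None:  # walk back to the start
        node, press = parent[node]
        presses.append(press)
    presses.reverse()
    return presses


room = ["#####",
        "#@..#",
        "#.#.#",
        "#...#",
        "#####"]
print(autopath(room, (1, 1), (3, 3)))
```

Because BFS explores tiles in order of distance, the first time it reaches the goal it has found a path with the fewest moves, which is the same quantity the distance-annotated map shows.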
All of this helps somewhat but doesn't make LLMs amazing at Pokémon by any means.