On Paying Attention

Published on July 2, 2025 9:52 PM GMT

Problems often persist not because they are hard, but because nobody's paying attention to them. These eight fictitious studies are sketches of what paying attention can look like in practice.

The Well-Dressed Intern

Aaron wanted to make a strong impression in his first week as an intern, so he set out to optimize his appearance to be taken seriously. He aimed both to fit into the office culture and to subtly signal his status as an intern, which would make it significantly easier to ask so-called "stupid questions."

During his interviews, Aaron paid careful attention to the office's general dress code. He noted the typical button-downs and khakis, but also observed senior associates leaning towards slightly more formal attire.

Based on these observations, Aaron devised a specific strategy: a blue suit, two shades lighter than standard navy, for his first day, then the office's casual norm (button-down and khakis) from the second week. He'd still wear the blue jacket into the office each morning and leave it hanging on the back of his chair. The lighter blue was a deliberate signal, aiming for an impression of professional effort without seeming over-formal or burdensome.

The strategy worked pretty well. Colleagues modelled him as appropriately professional, yet didn't mind when he approached them. When he asked basic questions, they responded helpfully rather than dismissively. Within three weeks, he'd effectively used his attire to shape initial perceptions, establishing himself as competent but still learning.

The Family Planners

Bea and her spouse, parents to a two-year-old daughter, were trying to decide whether to have more children. There was no miscommunication and there were no hidden preferences; they'd discussed it extensively and understood each other's positions perfectly, having paid close attention to each other's arguments and feelings.

One felt a strong, non-negotiable desire for another child. They loved parenting, believed their daughter would benefit from a sibling, and imagined their family feeling incomplete with just one.

The other felt overwhelmed raising just one child. They loved their daughter intensely, but found the sleep deprivation, constant vigilance, and reduced autonomy exhausting. They worried about finances, their relationship, and their capacity to be a good parent to multiple children simultaneously, and were honestly concerned about their emotional and practical limits.

They explored every avenue: parenting books, conversations with friends, couples therapy, and compromises like waiting a number of years. They examined whether their feelings reflected deeper, unarticulated values.

But ultimately, after thoughtful discussion and attentive consideration, the question reduced to a binary decision. At least one of them would be disappointed. Their understanding only clarified the stakes; it did not, and could not, resolve the fundamental incompatibility.

The Tensor Wrangler

Claire, a machine learning researcher, repeatedly ran into frustrating indexing and broadcasting errors with high-dimensional NumPy tensors. She'd often spend twenty minutes debugging what should have been a simple matrix multiplication, only to find she'd summed across the wrong dimension.

After losing hours to these errors week after week, Claire finally admitted she needed help. She paid specific attention to the nature of the recurring issues, recognizing them as a fundamental difficulty in managing complex tensor operations, and then described her exact pain points to an ML-savvy friend.^[1]

"Oh, use einops. It lets you name your tensor dimensions explicitly instead of tracking indices. Seriously, try it. It just works."

Claire installed einops that afternoon. Instead of writing tensor.reshape(batch_size, -1, num_heads, head_dim).swapaxes(1, 2), she could now write rearrange(tensor, 'batch seq (heads dim) -> batch heads seq dim', heads=num_heads). The explicit dimension names clarified her intent, preventing errors before they could occur. This attentive reframing of her data structures dramatically reduced friction, and her research productivity doubled overnight.
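A minimal sketch of the two styles side by side, assuming an input array of shape (batch, seq, heads*dim). Because einops is a third-party install, the name-based version here uses NumPy's own np.einsum, whose subscript string labels each axis by letter, as a stand-in; the einops comparison at the end only runs if the package happens to be installed. The function names and array sizes are hypothetical, chosen for illustration.

```python
import numpy as np


def split_heads_by_index(tensor: np.ndarray, num_heads: int) -> np.ndarray:
    """(batch, seq, heads*dim) -> (batch, heads, seq, dim), tracking axes by position."""
    batch_size, _, width = tensor.shape
    head_dim = width // num_heads
    # Split the last axis into (heads, dim), then swap axis 1 (seq) with axis 2 (heads).
    # Note: NumPy's ndarray.transpose needs a full permutation, so swapaxes is used here.
    return tensor.reshape(batch_size, -1, num_heads, head_dim).swapaxes(1, 2)


def split_heads_by_name(tensor: np.ndarray, num_heads: int) -> np.ndarray:
    """Same transformation, with every axis named by a letter in np.einsum."""
    batch_size, seq_len, width = tensor.shape
    split = tensor.reshape(batch_size, seq_len, num_heads, width // num_heads)
    # b=batch, s=seq, h=heads, d=dim: the output order is spelled out by name.
    return np.einsum("bshd->bhsd", split)


# Hypothetical sizes: batch 2, sequence length 5, 4 heads of size 3.
x = np.arange(2 * 5 * 12).reshape(2, 5, 12)
a = split_heads_by_index(x, num_heads=4)
b = split_heads_by_name(x, num_heads=4)
print(a.shape, np.array_equal(a, b))  # (2, 4, 5, 3) True

try:
    from einops import rearrange  # third-party; this check runs only if installed

    c = rearrange(x, "batch seq (heads dim) -> batch heads seq dim", heads=4)
    print(np.array_equal(a, c))
except ImportError:
    print("einops not installed; skipping that comparison")
```

The positional version only works if you remember which axis is which; the named versions make the intended output order part of the code itself.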

The Resume Optimiser

Dave had been applying to software engineering jobs for three months, with frustrating results. He regularly advanced to final rounds but was frequently rejected, often without useful feedback. More puzzlingly, companies whose job requirements he clearly met often didn't respond at all.

After his twentieth rejection, Dave decided to analyze the pattern systematically. He created a spreadsheet tracking company types, response rates, and interview progression. He noticed the initial response rate varied significantly by company size and industry, even when technical requirements seemed identical, while his interview results seemed much more consistent.

Dave suspected his resume format might be the issue. He'd used various approaches: a visually striking design, a plain template optimized for automated scanning, or an academic-style CV. Each format sent different signals and potentially interacted differently with different hiring processes.

He decided to test this hypothesis systematically, reformatting his resume into different styles while keeping content identical, then running what he hoped was a controlled experiment across his next twenty applications.

The results were... inconclusive. Perhaps the response rate improved? And a few months later, he did secure an offer. But the job market is incredibly noisy—maybe those companies were just better fits, or he interviewed better due to increased confidence, or it was pure coincidence. One recruiter mentioned liking his resume format, but another claimed to ignore formatting entirely.

Dave took the job offer, but remained genuinely uncertain whether his resume optimization made any difference. The feedback loop in job searching was too delayed, too sparse, and too confounded by other variables to draw clear conclusions. He suspected the change helped, but he would never truly know.
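To see why twenty applications can't settle the question, here is a rough sketch using Fisher's exact test, a standard test for comparing two proportions (the story doesn't say Dave used it). The counts are hypothetical, not from the story: suppose one format got 2 responses from 10 applications and the other got 5 from 10.

```python
from math import comb


def fisher_exact_two_sided(a: int, n1: int, b: int, n2: int) -> float:
    """Two-sided Fisher exact p-value for a 2x2 table.

    Format 1 received `a` responses from `n1` applications; format 2
    received `b` responses from `n2`. With the row and column totals held
    fixed, sum the probability of every table no more likely than the
    observed one.
    """
    k = a + b      # total responses across both formats
    n = n1 + n2    # total applications
    denom = comb(n, k)

    def prob(x: int) -> float:
        # Hypergeometric probability that format 1 receives x of the k responses.
        return comb(n1, x) * comb(n2, k - x) / denom

    observed = prob(a)
    lo, hi = max(0, k - n2), min(n1, k)
    # The small tolerance keeps tables tied with the observed one despite float rounding.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical counts: 2/10 responses with one format, 5/10 with the other.
print(round(fisher_exact_two_sided(2, 10, 5, 10), 2))  # 0.35
```

Even a 2.5x difference in response rate gives p of roughly 0.35 at this sample size, far from any conventional threshold, before accounting for the confounders Dave worried about.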

The Arrogant Mathematician

Eric decided to tackle one of mathematics' most famous open problems:^[2] improving the exponential upper bound on Ramsey numbers, a bound that had remained essentially unchanged since 1935. He was confident that previous mathematicians had simply been too cautious in their approaches.

Eric spent six months mastering the existing literature, then set out to push beyond what he saw as overly incremental progress. He was certain that bold new techniques and a willingness to take mathematical risks would achieve the breakthrough that decades of careful work had missed.

He tried several ambitious approaches: probabilistic arguments that attempted to circumvent classical barriers, algebraic constructions designed to exploit symmetries others had overlooked, and extremal methods based on novel constructions.

Each major approach ultimately hit the same walls, though Eric did manage to squeeze out some results. His probabilistic work produced a couple of technical lemmas that improved bounds on obscure variations of related problems by tiny constants. The algebraic constructions gave alternative proofs of things people already knew. They were perhaps slightly more elegant, but elegance doesn't count for much when you're trying to solve a problem that's stumped everyone for ninety years.

His extremal methods looked promising at first. Eric thought he'd found genuine insights that previous researchers had missed. But after three years of trying to turn these insights into actual theorems, it became clear they were mostly just new ways of getting stuck at the same places everyone else got stuck.

Eric published his work in decent journals and earned a respectable reputation in extremal combinatorics. But "respectable" was not what he'd been aiming for. His papers were the kind that get cited in surveys and footnoted in textbooks, not the kind that make mathematicians stay up all night rethinking everything they thought they knew. The central problem remained exactly as hard as it had always been.

The Alignment Researcher

Florence chose to work in AI safety, convinced that aligning superintelligent systems was the most critical problem of her generation. She read extensively (papers on evals, sharp left turns, truthfulness research, sycophancy) and concluded that mechanistic interpretability was the most promising research direction.

Joining a top AI lab, she spent over a year intensely studying sparse autoencoders (SAEs), becoming a recognised expert in the field. She published papers, gave talks, and made genuine contributions.

However, Florence gradually noticed a critical problem. Despite her deep expertise, she was no closer to understanding how to align a superintelligent system. SAEs could, roughly, identify individual features^[3] in language models, but this felt akin to studying individual neurons when the real challenge was understanding the behaviour of the entire brain.

After eighteen months, Florence decided to pivot her research focus within mechanistic interpretability. Some colleagues made similar internal pivots, while others left mech interp entirely for governance, policy, agent foundations, or other safety research. Despite everyone's sincere efforts and genuine progress in their respective areas, aligning superintelligent AI remains as difficult as ever.

The Coordinator

Georgia hosted monthly board game nights for eight friends, but she'd noticed a recurring problem: the first thirty minutes were consistently consumed by increasingly tense negotiations over which game to play. Some friends wanted complex strategy, others lighter social games, and a few had strong aversions to specific mechanics. Everyone arrived tired from work, and the discussion often felt more like conflict resolution than fun.

Georgia decided to solve this by moving game selection to their group chat during the week prior. People could propose options, discuss preferences openly, and vote on the final choice. Everyone arrived knowing the game and having contributed to the decision.

The immediate problem vanished completely. No more awkward debates when people simply wanted to relax and socialize. The group chat discussions were even enjoyable, since people got excited about upcoming games and explained their enthusiasm for particular options.

But eight months later, Georgia noticed an unintended consequence that troubled her. They'd settled into playing essentially the same rotation of four, maybe five games. The democratic pre-selection optimized for consensus, systematically favoring familiar options everyone found acceptable over new games that might be excellent but carried a risk of disappointment.

They were missing out on discovery. Several times, Georgia wanted to suggest a new game she'd heard great things about, but knew it would lose in a group vote to something already enjoyed. The group had become risk-averse in a way that seemed sensible but now felt limiting.

Georgia realized she'd solved the coordination problem but created a new dilemma: how to balance the comfort of known good options against the potential upside of exploration. She wasn't sure how to encourage more adventurous choices without reintroducing the original chaos.

The Conversation Starter

Harry had always been nervous about talking to strangers, especially women he found attractive. He knew, intellectually, what confident social interaction looked like: maintain eye contact, be comfortable with silence, slow down, and don't rush to fill every pause with nervous chatter. He'd read about it, observed it in others, even given advice about it to friends. But he'd never actually tried to implement these behaviours himself.

The gap between knowing and doing felt enormous. Harry assumed that attractive, socially successful people possessed some innate charisma he lacked, or that confident eye contact required years of practice to master. The behaviours seemed at once obvious and impossibly difficult.

At a house party, Harry made an arbitrary decision to have a go. He would try the most basic version of what he already knew mattered: hold eye contact, slow down everything, don't feel the need to fill every conversational lull. No elaborate techniques, no pickup artist scripts. Just the fundamentals he'd always known but never understood.

He approached a woman he'd been wanting to talk to all evening, expecting awkwardness, fumbling, maybe even an immediate rejection. Instead, something remarkable happened. The sustained eye contact created immediate intimacy. The comfortable silences allowed natural chemistry to build instead of being interrupted by his usual nervous babbling. She seemed genuinely engaged, laughed at his jokes, touched his arm while talking. Harry was sure he was doing plenty wrong, but because he was at least attempting the basics, none of that seemed to matter.

Harry was stunned. He'd constructed an elaborate narrative about his social deficits, assuming improvement would require developing skills he didn't possess. Instead, the gap between his current behavior and effective behavior had been smaller than he'd imagined. He already knew what to do. He just hadn't been doing it.

Conclusion

A recurring pattern in these examples is that paying deliberate attention often makes things better. Through the straightforward act of noticing that a situation exists, and then choosing to engage with it, we can often make a surprising amount of progress.

Paying close attention isn’t always useful, but it is useful often enough to be worth trying by default. The failure mode is usually low-cost, and the upside is frequently high.

Many thinking tools can be seen as ways of formalising this move. Goal Factoring can be viewed as paying structured attention to one's values, Debugging as structured attention to failures, and Strategic Planning as structured attention to tradeoffs. But the core step, before any of these techniques can work, is noticing that something might be worth looking into.

Humans don't have infinite attention, and this, like all general advice, is vulnerable to Reverse Any Advice You Hear. Nonetheless, I claim that this step alone solves more than we expect, and that it’s easy to skip, easy to apply, and worth doing more of.

^[4]

^[1] ML-savvy, friendly large language models are also available.
^[2] Notably, this open problem is no longer open! Details for the mathematicians in the audience:
Let the Ramsey number $R(k)$ be the minimum $n \in \mathbb{N}$ such that every two-colouring of the edges of the complete graph $K_n$ on $n$ vertices contains a monochromatic copy of $K_k$. We aim to find the lowest constant $c$ such that we can prove $R(k) \leq c^k$.
From Erdős and Szekeres (1935), we can easily prove the result for $c = 4$.
Campos, Griffiths, Morris and Sahasrabudhe (2023) proved the result for $c = 4 - \epsilon$, where $\epsilon = 2^{-7}$. This has more recently been improved, using the same techniques, to $c = 3.78$.
The lower bound remains (up to sub-exponential factors) at $R(k) > \sqrt{2}^{\,k}$, which has had no exponential improvement since Erdős (1947).
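For anyone curious why $c = 4$ is the easy case, the classical Erdős–Szekeres argument can be sketched as follows. It uses the off-diagonal Ramsey number $R(s,t)$ (notation introduced here): the least $n$ such that every red/blue colouring of $K_n$ contains a red $K_s$ or a blue $K_t$. The recurrence holds because a vertex in $K_n$ with $n = R(s-1,t) + R(s,t-1)$ has $n - 1$ neighbours, so by pigeonhole it has at least $R(s-1,t)$ red neighbours or at least $R(s,t-1)$ blue ones; either way, that set of neighbours contains the required clique, possibly after adding the vertex itself.

```latex
\begin{aligned}
R(s,t) &\le R(s-1,t) + R(s,t-1), \qquad R(s,2) = s, \quad R(2,t) = t, \\
\implies R(s,t) &\le \binom{s+t-2}{s-1} \quad \text{(induction on } s+t \text{, using Pascal's rule)}, \\
\implies R(k) = R(k,k) &\le \binom{2k-2}{k-1} \le 2^{2k-2} = 4^{k-1} < 4^{k}.
\end{aligned}
```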
^[3] For certain meanings of 'feature'.
^[4] General citations, links and inspirations:

Eight Short Studies on Excuses (Scott Alexander)
The Thing and the Symbolic Representation of the Thing (Zvi Mowshowitz)
Humans are not automatically strategic (Anna Salamon)
Various talks and conversations at LessOnline 2025, covering both the object-level content that inspired this post and the encouragement that helped convert me from long-time lurker to poster.
Easy Mode/Hard Mode (Zvi Mowshowitz)
I don't like NumPy (Dynomight)
Should You Reverse Any Advice You Hear? (Scott Alexander)
