The trigger is the bottleneck

2026-05-19

A 60-minute case study in cross-domain research with Opus 4.7.

I had a one-line idea I'd been sitting on: use the CHERI capability extensions, which Arm has prototyped in its Morello hardware, to redesign Java's garbage collector and eliminate stop-the-world pauses. CHERI gives you hardware-checked, tagged pointers. ZGC and Shenandoah spend a meaningful slice of throughput on software load barriers. The cross-domain link writes itself.

I gave the idea to Claude Opus 4.7. Sixty minutes later, I had what would normally be a week of literature review, a delta analysis against six relevant papers, and a list of specific reasons my framing was wrong.

This post is about the shape of that collaboration, not the GC project.

What the model did in an hour

It immediately said the idea wasn't new. Within the first turn it named CHERIvoke (MICRO 2019) and Cornucopia (Oakland 2020), the existing line of work on using CHERI capability revocation for C/C++ temporal safety. I asked for a delta analysis, and it pulled together the current state of that work.

Then it decomposed my one-line idea into four sub-claims and showed me three were already taken:

  Claim                                                                           Taken by
  Use CHERI revocation to reclaim memory                                          CHERIvoke / Cornucopia
  Make revocation concurrent via hardware load barriers                           Cornucopia Reloaded
  Run a JVM GC on CHERI / Morello                                                 MOJO
  Use CHERI primitives as the load-barrier mechanism for a concurrent moving GC   Open

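The fourth one is open because of a semantic mismatch. Cornucopia-line work invalidates capabilities (clears the tag); Java's moving collectors forward references (redirect them to the object's new copy). CHERI capabilities are effectively immutable for this purpose: derivation is monotonic, so bounds can shrink but never move, and a capability to the new copy has to be re-derived from one with broader authority. That's an actual unsolved subproblem with three plausible paths (page-level remap, a new ISA op, sealed-cap trampoline), and the model laid out the trade-offs of each.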
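A toy model can make the invalidate-versus-forward distinction concrete. This is a sketch under loose assumptions, not CHERI's ISA and not Cornucopia's algorithm: `Cap`, `derive`, `revoke_sweep`, and `forward` are hypothetical names, and the model keeps only two properties, a tag that a sweep can clear and bounds that derivation can only narrow.

```python
# Hypothetical toy model: not CHERI's ISA, not Cornucopia's algorithm.
# It keeps only two properties: a clearable tag and monotonic bounds.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cap:
    """Toy capability covering [base, base + length), plus a validity tag."""
    base: int
    length: int
    tag: bool = True

    def derive(self, base: int, length: int) -> "Cap":
        # Monotonicity: a child capability must fit inside its parent's bounds,
        # and nothing can be derived from an untagged (revoked) capability.
        if not self.tag:
            raise PermissionError("cannot derive from an untagged capability")
        if base < self.base or base + length > self.base + self.length:
            raise PermissionError("derived bounds exceed parent bounds")
        return Cap(base, length)


def revoke_sweep(slots: list[Cap], freed_base: int, freed_len: int) -> list[Cap]:
    # Revocation-style reclaim: capabilities into the freed region keep
    # their bits but lose their tag, so they can no longer be used.
    def points_into_freed(c: Cap) -> bool:
        return freed_base <= c.base < freed_base + freed_len
    return [replace(c, tag=False) if points_into_freed(c) else c for c in slots]


def forward(stale: Cap, new_base: int, heap_root: Cap) -> Cap:
    # Moving-GC-style forwarding: the reference must point at the new copy.
    # stale.derive(new_base, ...) would fail whenever the new copy lies outside
    # the stale bounds, as it does after a move, so the forwarded capability
    # is minted from a broader root instead.
    return heap_root.derive(new_base, stale.length)


heap = Cap(0, 4096)            # root capability over a toy 4 KiB heap
obj = heap.derive(256, 64)     # reference to an object at [256, 320)

swept = revoke_sweep([obj], 256, 64)
print(swept[0].tag)            # False: invalidated, not redirected

try:
    obj.derive(1024, 64)       # attempt to "rewrite the target" in place
except PermissionError as err:
    print("retarget failed:", err)

moved = forward(obj, 1024, heap)
print(moved)                   # Cap(base=1024, length=64, tag=True)
```

The asymmetry is the point: invalidation only clears a tag, which a sweep or a load barrier can do without any new authority, while forwarding needs a fresh capability minted from somewhere more privileged. The three candidate paths are different ways of getting around that.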

Where it pushed back

This part matters more than the literature review.

I framed the project as "completely eliminate GC pauses." The model said no:

Drop any "completely eliminate STW" framing — ZGC is already sub-ms, and CHERI doesn't help root scanning (the remaining pause source). The claim doesn't survive close scrutiny.

Then it narrowed the novel slice for me. Of the four sub-claims above, only the fourth (CHERI primitives as the load-barrier mechanism for a moving GC) is actually open territory. The other three would be reinventing CHERIvoke / Cornucopia, Cornucopia Reloaded, or MOJO with a thinner story. None of this was flattering. All of it was correct.

A yes-and model is useless for this kind of work. The version that disagrees with the prompter — with reasons — is the one worth paying for.

What it couldn't do

The same hour didn't move the project forward in any way that matters for actually building it.

It can't get me Morello hardware. It doesn't know HotSpot's ZGC barrier slot internals as deeply as the ten people who maintain that code. It can't run SPECjbb2015 on a real CHERI machine. It can't do any of the actual engineering or measurement that proves the idea works on silicon.

The compression is entirely on the front half of a research project: scoping, prior art, positioning, kill-the-bad-claims. The back half — 12 to 18 months of systems engineering and measurement — is unchanged.

The collaboration shape

Old shape:

  1. Researcher reads ~30 papers across two fields (2 weeks)
  2. Realizes their idea overlaps with prior work, refines it (1 week)
  3. Writes up the novel slice — what you'd actually claim as new (1 week)
  4. Builds the thing (12–18 months)

New shape, with a sufficiently sharp human trigger:

  1. Human gives the model one cross-domain sentence (1 minute)
  2. Model returns prior art map + delta + the actual novel slice to write (1 hour)
  3. Human verifies, picks the framing, makes the bet (1 hour)
  4. Builds the thing (12–18 months, unchanged)

The first three steps compressed from a month to two hours. The last step is the same.

The asymmetric scarcity

Opus-4.7-class models now hold deep working knowledge across programming languages, computer architecture, distributed systems, ML, compilers, hardware security. The breadth is already there. What they cannot do is self-trigger — they don't naturally ask "what if I took capability hardware from field A and applied it to garbage collection in field B?" That kind of question lives outside the gradient they were trained on.

Most ideas given to LLMs aren't actually cross-domain. They're "make this code compile" or "explain this stack trace." Those are fine and useful, and at them the model performs roughly like a senior engineer. The interesting regime is when a human supplies a question the RL process never saw: a connection between two technical worlds that share no conference. There, the interaction is closer to a 1-on-1 with a polymath colleague who has read every paper.

The scarce input is the human who has enough taste across enough fields to know which connections are worth probing. That taste used to mostly help its owner do better in one domain. In 2026 it's a multiplier on a different resource entirely: the model's breadth.

Honest caveats

N=1. I had a specific cross-domain seed that happened to land near genuine open territory. Most "ideas" don't.

The literature review compression doesn't validate the idea. It located the open subspace and showed me which sub-claims to drop. The remaining novel slice, using CHERI primitives for a moving GC, might still be wrong on contact with real hardware. CHERI capabilities can't be retargeted in place, and "forward semantics on immutable capabilities" might turn out to require an ISA change that ARM's CHERI working group isn't going to ship.

Close

This isn't "AI replaces researchers." It's narrower and weirder: a specific kind of researcher — the one who already had the broad-vision, cross-domain question-asking habit — now has a roughly 10× amplifier on the front half of their workflow. The back half is exactly as hard as it was.

If you have that habit, the bottleneck has moved. It's not knowledge synthesis anymore. It's the rate at which you can ask sharp questions.