$ cat ./work/renderweb.mdx

RenderWeb

2026 · Exploration · Model-as-UI

RenderWeb is an experiment in what UI looks like when the model is the renderer. You type a topic, the screen becomes a page, and every click generates the next one — live. No HTML, no DOM, no layout engine. Just a stream of frames and a cursor.

// Type a topic — the page is generated, not composed.

## The lineage

The idea that an interface could be sampled from a model, not composed from primitives, starts with Ha and Schmidhuber's World Models (2018) — policies trained entirely inside a hallucinated dream. Hafner's Dreamer line hardened it. Google's GameNGen (2024) fine-tuned Stable Diffusion to predict the next DOOM frame from the last click. DeepMind's Genie 3 (2025) pushed it to 720p at 24 fps with promptable world events. Oasis (Decart + Etched) shipped the product-shape version. Zhuge and Schmidhuber's Neural Computers paper (April 2026) names what everyone is now chasing: a machine form that absorbs compute, memory, and I/O into a single learned runtime.

Generative UI splits into two camps. One generates markup: v0, Claude Artifacts, and Websim still hand a DOM to a browser. The other generates pixels: GameNGen, Genie, Oasis, and Flipbook hand a frame buffer to a video element. RenderWeb lives in the second. Mechanically different. Often conflated.

## The thesis

Karpathy framed Software 3.0 as "LLM prompts in English" — weights replacing algorithms. RenderWeb carries that one step further: if weights can replace algorithms, they can replace the rendering pipeline itself. Late binding taken to its limit — the UI only exists at display time.

Hassabis puts it plainly: "these are not games or videos, they're worlds." A page generated this way isn't a page. It's a frame in a world that happens to look like one.

// Every click is an action token. The next frame is the next page.

## Why an encyclopedia

Wikipedia rabbit holes are already an infinite-depth activity. Every entry has a predictable visual grammar — hero, infobox, sections, "see also" — which gives the model a shape to learn and hold. Topics are unbounded, so "infinite" is honest. The content matches the UX: clicks go deeper, the model renders deeper.

## What breaks when you remove the renderer

Memory. Models drift. Genie 3 holds a few minutes; Oasis forgets when you turn around; GameNGen uses three seconds of context. Without attention sinks, pose-indexed KV retrieval, or explicit session state, a return trip to a page renders a different page.
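Explicit session state is the cheapest of those three escape hatches. A minimal sketch, assuming the renderer can be conditioned on a seed; the world model itself is out of scope, and `SessionPageCache` is a hypothetical name, not anything RenderWeb ships. Pin a deterministic seed per (session, topic), and a return trip replays the same generation instead of sampling a different page.

```python
import hashlib


class SessionPageCache:
    """Hypothetical sketch: pin a page's generation seed so a revisit
    re-renders the same frames. Assumes the (out-of-scope) renderer
    accepts a seed to condition on."""

    def __init__(self, session_id: str):
        self.session_id = session_id
        self._seeds: dict[str, int] = {}

    def seed_for(self, topic: str) -> int:
        # First visit: derive a stable seed from session + topic.
        if topic not in self._seeds:
            digest = hashlib.sha256(
                f"{self.session_id}:{topic}".encode()
            ).digest()
            self._seeds[topic] = int.from_bytes(digest[:8], "big")
        # Return trip: same seed, hence the same page.
        return self._seeds[topic]


cache = SessionPageCache("sess-42")
first = cache.seed_for("axolotl")
again = cache.seed_for("axolotl")
assert first == again  # a revisit renders the same page
```

This only freezes the sampling; holding the page stable while the user scrolls within it still needs the attention-sink or KV-retrieval machinery named above.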

Economics. Per-session generation is six to eight orders of magnitude more expensive than serving a static HTML page. Sora 2 was shut down in March 2026 on roughly $15M/day in costs. Interactive pixel-gen only survives through distillation, purpose-built silicon, or a business model that captures more per session than ad-supported pages do.
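The six-to-eight-orders figure survives a back-of-envelope check. Every number below is an illustrative assumption, not a measurement: a cached static page at fractions of a microdollar, and interactive video generation at a cent per second.

```python
import math

# Illustrative assumptions, not measurements:
static_cost_per_page = 1e-7        # ~$0.0000001: a CDN hit, amortized
frames_per_session = 60 * 24       # one minute of interaction at 24 fps
cost_per_frame = 0.01 / 24         # assume ~$0.01 per second of generation

gen_cost_per_session = frames_per_session * cost_per_frame  # $0.60
ratio = gen_cost_per_session / static_cost_per_page         # 6,000,000x
orders = math.log10(ratio)                                  # ~6.8
```

Under these assumptions a one-minute session lands near seven orders of magnitude over a static page, inside the six-to-eight range; push the assumptions in either direction and you hit the endpoints.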

Semantics. A click is a pixel coordinate, not a DOM event. The gap between "user clicked the gills" and "model renders something plausible" has to close inside a 40 ms frame budget — Molmo for pointing, SAM 2 for segmentation, a serialized action embedding for the world model to attend over.
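That pipeline can be caricatured in a few lines. Everything here is a hypothetical stub: `point_label` stands in for the pointing model, and the 16-dim vector layout is invented for illustration. But it shows the shape of the problem: a click must become a fixed-size token the world model can attend over before the budget expires.

```python
import hashlib
import time

FRAME_BUDGET_MS = 40.0  # ~24 fps leaves ~40 ms per frame, end to end


def point_label(x: int, y: int) -> str:
    """Stand-in for a pointing model (the text names Molmo).
    Hypothetical stub: a real model maps the cursor to a region label."""
    return "gills"


def serialize_action(label: str, x: int, y: int, w: int, h: int) -> list[float]:
    """Pack one click into a fixed-size action vector: normalized
    coordinates plus a one-hot label bucket (layout invented here)."""
    vec = [0.0] * 16
    vec[0], vec[1] = x / w, y / h
    # Stable hash so the same label always lands in the same bucket.
    bucket = int.from_bytes(
        hashlib.sha256(label.encode()).digest()[:2], "big"
    ) % 14
    vec[2 + bucket] = 1.0
    return vec


start = time.perf_counter()
label = point_label(512, 300)
action = serialize_action(label, 512, 300, 1280, 720)
elapsed_ms = (time.perf_counter() - start) * 1000
# Trivially under budget for stubs; real pointing plus segmentation
# would consume most of the 40 ms.
assert elapsed_ms < FRAME_BUDGET_MS
```

The stubs are free; the budget fight is entirely inside `point_label`, which is why the text reaches for fast pointing and segmentation models rather than a full scene parse.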

Everything else. URLs don't round-trip. Pages can't be re-visited in the same form. Screen readers have no DOM to read. The entire contract the web has with its users — linkability, reproducibility, accessibility, citation, caching — evaporates. The bet is that the contract was never the point; the interface was.

## Close

Every previous interface revolution — Sketchpad, the GUI, the web, mobile — replaced a committed substrate with a more malleable one. RenderWeb is a small bet that weights are the next substrate, and that the interface can be grown from data instead of composed from primitives.

It probably won't replace Chrome. It might anticipate a class of interfaces whose elements don't exist until the moment they're needed.


// inspired by Zain Shah's Flipbook — the clearest instantiation yet of the model-is-the-computer thesis.
