Recursive Language Models (RLM)

Context Folding for Long-Horizon AI Agents
Source: Prime Intellect | Summary by Amber
TL;DR: LLM agents are getting powerful but struggle with long contexts due to cost and performance degradation. Recursive Language Models (RLMs) address this by letting models actively manage their own context through Python code and sub-LLM calls, rather than stuffing everything into one massive prompt. Instead of summarizing (which loses information), RLMs delegate work programmatically, like a good manager distributing tasks.

The Problem: Context is Expensive

Modern LLM agents can implement complex changes across dozens of files autonomously. But this requires vast numbers of tokens, creating two critical problems:

  • Linear cost scaling: Per-token costs rise linearly with context length
  • Context rot: Even the best models' performance drops as contexts grow longer

Current solutions like Claude Code and OpenAI's Codex use scaffolding—a succession of agents connected by prompts and file states, with LLM summarization to compress context. But this is just one approach.

Context Folding: A Different Approach

Instead of external files and summaries, context folding manages the context window itself to keep it short while maintaining a continual, growing rollout. It's compatible with file-based scaffolding—from the outside, it just looks like a normal LLM.

Existing Context Folding Methods:

1. Scaling Long-Horizon LLM Agent via Context-Folding

The agent can branch its rollout and return from branches. Within a branch, it retains full context; after returning, only a self-chosen summary remains.

2. AgentFold

Every action produces both a result and a summary of the action and reasoning. Summaries can be hierarchical, consolidating lessons from multiple actions.

3. Agentic Context Engineering

A three-agent system: a Generator (creates the rollout), a Reflector (distills lessons from it), and a Curator (updates the knowledge base).

The RLM Solution: Self-Managing Context

Prime Intellect believes the Recursive Language Model (RLM) is the simplest and most flexible of these methods. It was introduced by Alex Zhang in October 2025 and is now available as a full paper.

How RLM Works:

Rather than ingesting potentially huge input data directly, the RLM uses a persistent Python REPL to inspect and transform input, and call sub-LLMs from within Python.

❌ Traditional Approach

  • Stuff all data into context
  • Process everything sequentially
  • Summarize to compress
  • Lose information

✅ RLM Approach

  • Access data programmatically
  • Delegate to sub-LLMs
  • Search & filter with Python
  • Preserve all information
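A minimal sketch of this pattern in Python. The helper name `call_sub_llm` and the chunking/keyword logic are illustrative assumptions, not Prime Intellect's actual code; the stub stands in for spawning a fresh model instance.

```python
def call_sub_llm(prompt: str) -> str:
    """Stand-in for a sub-LLM call; a real version would invoke a fresh model."""
    return f"[sub-LLM finding from {len(prompt)} chars]"

def rlm_answer(question: str, document: str) -> str:
    # Access data programmatically: chunk the document instead of
    # loading all of it into the root model's context.
    chunks = [document[i:i + 1000] for i in range(0, len(document), 1000)]
    # Search & filter with Python: keep only chunks that mention a
    # keyword drawn from the question (toy heuristic for illustration).
    keyword = question.split()[-1].strip("?").lower()
    relevant = [c for c in chunks if keyword in c.lower()]
    # Delegate to sub-LLMs: pipe each relevant slice to a fresh
    # instance, so no single context window holds everything.
    findings = [call_sub_llm(f"{question}\n---\n{c}") for c in relevant]
    # No information loss: the raw chunks stay available in the REPL;
    # only the distilled findings surface to the root model.
    return "\n".join(findings)
```

The point of the sketch is the division of labor: Python does the cheap filtering, sub-LLMs do the reading, and the root model never ingests the full document.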

RLM Capabilities

  • No direct data loading: Huge inputs (PDFs, datasets, videos) don't clog the context—the model stays lean and avoids context rot
  • Python-powered filtering: Search, filter, and transform context using Python, avoiding redundant processing
  • Sub-LLM delegation: Spawn fresh instances of itself to perform work, piping specific data to them programmatically
  • Aligns with The Bitter Lesson: More in line with learned approaches than hand-crafted summarization strategies
  • No information loss: Never summarizes—delegates instead

Prime Intellect's Implementation

Available in their verifiers repository, with RLM-based environments on the Environments Hub.

Key Enhancements:

1. Tools Only for Sub-LLMs

The main RLM never sees tool output tokens; it delegates tool-using work to sub-LLMs. Since many tools produce large volumes of output, this keeps the main model's context lean.

2. Parallelized Sub-LLM Calls

An llm_batch function processes multiple prompts in parallel, speeding up complex workflows.
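The `llm_batch` name comes from the write-up; the implementation below is an illustrative sketch (the stubbed `llm_call` stands in for a real API call), not Prime Intellect's actual code.

```python
from concurrent.futures import ThreadPoolExecutor

def llm_call(prompt: str) -> str:
    """Stand-in for one sub-LLM call; a real version would hit an API."""
    return prompt.upper()

def llm_batch(prompts: list[str], max_workers: int = 8) -> list[str]:
    # Fan prompts out to concurrent sub-LLM calls; pool.map preserves
    # input order, so results line up with their prompts.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(llm_call, prompts))
```

Thread-based fan-out fits here because sub-LLM calls are I/O-bound: the workers spend their time waiting on network responses, not the GIL.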

3. Answer Via Environment Variable

The model provides its answer by writing to a dictionary variable inside its REPL environment:

  • answer["content"]: Can be edited/deleted over multiple turns
  • answer["ready"]: Only when set to True does the rollout end

This enables diffusion-style generation: the final answer is drafted and refined across turns alongside the reasoning chain.
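A hypothetical sketch of that protocol: the keys `answer["content"]` and `answer["ready"]` come from the source, while the `run_rollout` driver and its callable "turns" are stand-ins for executing model-written REPL code.

```python
# The environment keeps executing turns until the model sets
# answer["ready"] = True; answer["content"] may be revised on any turn.
answer = {"content": "", "ready": False}

def run_rollout(turns):
    """Run model turns (here: plain callables that edit the answer dict)
    until the model marks its answer as ready."""
    for turn in turns:
        turn(answer)
        if answer.get("ready"):
            break
    return answer["content"]

# Example rollout: draft an answer, refine it, then declare it final.
turns = [
    lambda a: a.update(content="draft"),
    lambda a: a.update(content="final answer", ready=True),
    lambda a: a.update(content="never reached"),
]
```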

4. Any pip Package

Install what you need (numpy, scipy, sympy, etc.). Code executes in isolated Sandboxes.

5. Limited REPL Output

Only 8192 characters of REPL output are shown to the RLM per turn (user-adjustable). This forces the model to use Python and sub-LLMs intelligently rather than dumping everything into its context.
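The 8192-character limit is from the source; the function below is a hypothetical sketch of how such truncation might look, with a marker telling the model how much was cut.

```python
def truncate_repl_output(output: str, limit: int = 8192) -> str:
    # Show at most `limit` characters of REPL output to the model.
    if len(output) <= limit:
        return output
    hidden = len(output) - limit
    # The marker nudges the model to filter in Python or delegate to
    # a sub-LLM instead of re-printing the full output.
    return output[:limit] + f"\n... [{hidden} more characters truncated]"
```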

Why This Matters

Prime Intellect believes teaching models to manage their own context end-to-end through reinforcement learning will be the next major breakthrough, enabling agents to solve long-horizon tasks spanning weeks to months.

Current work focuses on ablations with the RLM scaffolding on existing models called through APIs. Future work will scale RLM training on environments that reward effective very long-horizon reasoning.

The RLM is powerful, flexible, strong at tool use, and well suited to a world where context is a scarce resource.

The Big Picture

RLM represents a shift from "how do we compress context?" to "how do we teach models to actively manage context like a skilled developer?"

Instead of fighting context limits with bigger windows or lossy summaries, RLMs embrace the constraint and learn to work within it—delegating, filtering, and focusing programmatically. It's scaffolding that scales with learning, not just engineering.