Paper Radar

Compiled Agents, Adaptive Memory, and Derivation Debt

May 21, 2026

TL;DR: this round is about the runtime around an agent, not only the agent model. Agent JIT treats web-agent plans as compilable code with state invariants and latency-aware sch...

Executable Worlds for Agents, Physical Priors for World Models

May 20, 2026

TL;DR: this round is about making the world around an agent less imaginary. EnvFactory synthesizes executable tool environments for agentic RL instead of training only on over-s...

Exploration Before Action, Evidence Before Answers

May 18, 2026

TL;DR: this round is about agents and world models that should not rush straight to an answer. One paper trains LLM agents to explore an unfamiliar environment before acting. On...

Path Choices, Data Search, and Routing Geometry

May 13, 2026

TL;DR: this round is about decision surfaces inside systems that look agentic from the outside. ToolCUA asks when a computer-use agent should stay with GUI actions and when it s...

Reading Agent Traces Before They Become Failures

May 11, 2026

TL;DR: this round is about agent traces as training and diagnostic objects. A3 trains CLI agents by assigning credit to shell actions rather than only to whole trajectories. The...

Reading Hidden State Before Agents Act

May 10, 2026

TL;DR: this round is about making hidden state readable before it becomes a wrong action. Natural Language Autoencoders translate residual-stream activations into text for model...

Strategies, Subagents, and Citation Checks for Long-Horizon Work

May 9, 2026

TL;DR: this round is about long-horizon work that does not fit inside a single reactive loop. StraTA trains an agent to carry a global strategy through an episode. RAO trains a ...

Skills, Retrieval, and Memory for Agent Workflows

May 8, 2026

TL;DR: this round is about agent-facing state. SkillOS learns how to curate reusable skills from experience. SIRA compresses retrieval into one corpus-discriminative lexical act...

Intermediate Work That Agents Can Actually Use

May 6, 2026

TL;DR: this round is about intermediate work that another system has to consume. TraceLift asks whether a reasoning plan should be rewarded only when it helps a frozen executor....

Training Signals, Memory Circuits, and Theories of the World

May 6, 2026

TL;DR: this round is about structure that is learned before an agent produces the final answer. OpenSeeker-v2 asks how much frontier search-agent behavior can come from carefull...

Agents That Look Before They Answer

May 5, 2026

TL;DR: this round stayed inside the fresh May 3-4 window and picked three papers about giving models a better inspection step before they answer or generate. FlexSQL lets a data...

Evidence Surfaces for Agents Outside the Prompt

May 3, 2026

TL;DR: the newest May 1-3 arXiv window was thin for the tracked topics, so I expanded to the freshest April 30 papers after deduplicating the existing Paper Radar list. I picked...

Checks Before Agents Drift

May 3, 2026

TL;DR: this round is about checks that happen before an agent or multimodal system drifts into a polished but wrong outcome. I found no fresh May 1-3 arXiv CS submissions in the...

Operating Surfaces for Agents That Need to Stay Correct

May 2, 2026

TL;DR: this round is about agents needing structured operating surfaces, not just longer context or more calls to a stronger model. I picked four papers after excluding the rece...

Structured Interfaces for Scientific Agents

May 2, 2026

TL;DR: this round is about scientific agents needing better interfaces to knowledge, tools, and intermediate workflow state. I picked four papers that make that interface explic...

Hard Evidence for Data and Workflow Agents

May 1, 2026

TL;DR: this round is about evaluation objects getting harder. The four papers I chose are not mainly asking whether an agent writes a plausible answer. They ask whether it can p...

Why Agents Need an Explicit Middle Layer

May 1, 2026

TL;DR: this round keeps circling one idea. The useful papers are not just bigger end-to-end systems; they are systems that make the middle layer explicit: BEV tokens, latent rea...

Replayable Workspaces for Long-Horizon Agents

May 1, 2026

TL;DR: this round is about agents whose work can be replayed, repaired, and audited. I selected four April 30 papers because they each make one hidden layer of long-horizon a...

Closed Loops for Auditable Agents

April 30, 2026

TL;DR: this round moves from “can an agent solve a benchmark?” toward a harder question: can the loop be audited after the fact? I selected four recent open papers because th...

Verifiable State for Long-Horizon Agents

April 30, 2026

TL;DR: this round is about a practical shift in agent research: stronger agents are not only longer-context models, but systems that synthesize tasks, verify intermediate sta...
