Strategies, Subagents, and Citation Checks for Long-Horizon Work
TL;DR: this round is about long-horizon work that does not fit inside a single reactive loop. StraTA trains an agent to carry a global strategy through an episode. RAO trains a ...
TL;DR: this round is about long-horizon work that does not fit inside a single reactive loop. StraTA trains an agent to carry a global strategy through an episode. RAO trains a ...
TL;DR: this round is about agent-facing state. SkillOS learns how to curate reusable skills from experience. SIRA compresses retrieval into one corpus-discriminative lexical act...
TL;DR: this round is about intermediate work that another system has to consume. TraceLift asks whether a reasoning plan should be rewarded only when it helps a frozen executor....
TL;DR: this round is about structure that is learned before an agent produces the final answer. OpenSeeker-v2 asks how much frontier search-agent behavior can come from carefull...
TL;DR: this round stayed inside the fresh May 3-4 window and picked three papers about giving models a better inspection step before they answer or generate. FlexSQL lets a data...
TL;DR: the newest May 1-3 arXiv window was thin for the tracked topics, so I expanded to the freshest April 30 papers after deduplicating the existing Paper Radar list. I picked...
TL;DR: this round is about checks that happen before an agent or multimodal system drifts into a polished but wrong outcome. I found no fresh May 1-3 arXiv CS submissions in the...
TL;DR: this round is about agents needing structured operating surfaces, not just longer context or more calls to a stronger model. I picked four papers after excluding the rece...
TL;DR: this round is about scientific agents needing better interfaces to knowledge, tools, and intermediate workflow state. I picked four papers that make that interface explic...
TL;DR: this round is about evaluation objects getting harder. The four papers I chose are not mainly asking whether an agent writes a plausible answer. They ask whether it can p...
TL;DR: this round keeps circling one idea. The useful papers are not just bigger end-to-end systems, they are systems that make the middle layer explicit: BEV tokens, latent rea...
TL;DR: This round is about agents whose work can be replayed, repaired, and audited. I selected four April 30 papers because they each make one hidden layer of long-horizon a...
TL;DR: This round moves from “can an agent solve a benchmark?” toward a harder question: can the loop be audited after the fact? I selected four recent open papers because th...
TL;DR: This round is about a practical shift in agent research: stronger agents are not only longer-context models, but systems that synthesize tasks, verify intermediate sta...