TL;DR: This round is about a practical shift in agent research: stronger agents are not just longer-context models but systems that synthesize tasks, verify intermediate state, preserve evidence, and learn from world-state transitions. I selected four recent open papers because each makes state more inspectable: ClawGym builds verifiable computer-use tasks, World2VLM turns world-model imagination into training data, DataPRM verifies data-analysis steps inside the environment, and OCR-Memory stores long agent histories as optically retrievable evidence.