Project Updates
Agentic Value Functions
Can retrieval-augmented verifiers benefit from past trajectories for math reasoning?
DRO for LLM Reliability
Using distributionally robust optimization to make LLM responses consistently good, not just good on average
Feature Reference
Every feature available in project update pages, with copy-paste examples
Feedback Models
Using RL to train small LMs to give actionable natural-language feedback
Meta-Learned Memory
Automatically discovering how LLMs should manage their context window