Project Updates

Agentic Value Functions

Can retrieval-augmented verifiers benefit from past trajectories for math reasoning?

Using distributionally robust optimization to make LLM responses consistently good, not just good on average

Every feature available in project update pages, with copy-paste examples

Training small LMs to give actionable natural-language feedback via RL

Automatically discovering how LLMs should manage their context window