Heavy Thinking: A Test-Time Scaling Pattern for Hard Problems
Now we can have GPT5 Pro at home
A large-scale study on long-horizon document tasks.
An agentic framework for end-to-end game creation
Self-generated agent context files don't help.
Curated skills boost agent performance by 16 points; self-generated ones don't help at all.
A new paradigm for single-step generative modelling
Optimising computation at the token-level
aka Self-Confidence is All You Need
on John Carmack's Upperbound 25 Talk Notes
Using evolutionary algorithms with LLM-coding agents