LLMs Corrupt Your Documents When You Delegate
A large-scale study on long-horizon document tasks.
We probably need to do deliberate daily mental exercise
An agentic framework for end-to-end game creation
I guess there is only one hard thing left in Computer Science?
A Jupyter Notebook-style Obsidian plugin that runs code in your notes and stores the outputs directly in the Markdown file.
An approach to agentic software development that I use
Self-generated agent context files don't help.
Curated skills boost agent performance by 16 points; self-generated ones don't help at all.
A new paradigm for single-step generative modelling
Is verification the future of software engineering?