Software Factory
Software Factory 1 refers to the idea of abandoning not just the writing of code but even the reviewing of it, leaving engineers to manage the goal and validate the correctness of the system. Effectively, developers become system-level QA engineers, writing specs and unblocking agents.
StrongDM defines a set of principles for the Software Factory approach 2, built around "The Loop": the model starts with a seed, iterates, validates, receives feedback, and continues until all the "holdout scenarios" pass. In that paradigm, the engineer's question is how to structure the problem so that each success criterion can be validated (without return-true-in-tests cheating) and so that the agent receives meaningful feedback to guide its next iteration. StrongDM goes as far as building "Digital Twin Universes": entire replicas of the tools their software integrates with, such as Jira, Okta, and Google Sheets, which they use to test their scenarios exhaustively.
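To make "The Loop" concrete, here's a minimal sketch of what such a loop might look like. This is my own illustration, not code from StrongDM's write-up; the `generate` and `run_scenario` callables are hypothetical stand-ins for whatever agent harness and holdout suite you actually build.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScenarioResult:
    passed: bool
    detail: str


def run_factory_loop(
    seed_spec: str,
    generate: Callable[[str], str],                      # agent call: prompt -> candidate implementation
    run_scenario: Callable[[str, str], ScenarioResult],  # (candidate, scenario) -> pass/fail + detail
    holdout_scenarios: list[str],
    max_iterations: int = 50,
) -> str:
    """Generate a candidate, validate it against every holdout scenario,
    feed the failures back, and repeat until everything passes."""
    prompt = seed_spec
    for _ in range(max_iterations):
        candidate = generate(prompt)
        results = [run_scenario(candidate, s) for s in holdout_scenarios]
        failures = [r for r in results if not r.passed]
        if not failures:
            return candidate  # every holdout scenario passes: the loop is done
        # The holdout scenarios themselves stay outside the agent's reach;
        # only the observed failures go back in as guidance for the next pass.
        prompt = seed_spec + "\n\nPrevious attempt failed:\n" + "\n".join(f.detail for f in failures)
    raise RuntimeError("holdout scenarios still failing after max_iterations")
```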
For a software engineer who has made a living writing software by hand for many years, it's anxiety-inducing to hear people talk about developing software like this. Clearly, fewer people will be involved in achieving the same outcomes (although our ambition for those outcomes will likely increase significantly). But I can't help seeing some inevitability in this paradigm. Even before I knew it had a name, I'd been treating it as a natural way to solve problems in the Opus 4.5+ era of agentic software development.
For example, I wanted an MLX (basically, PyTorch for Apple Silicon) version of Demucs, one of the best audio stem-splitting models available. Without knowing there was a term for it (more just because I was lazy), I set up a software factory. I gave the model the reference implementation (the original PyTorch code) as context, plus a validation scenario: given a range of audio inputs I selected, the MLX outputs had to match the PyTorch outputs within a numerical tolerance. I gave the agent an IMPLEMENTATION_NOTES.md file (see Spec-First LLM Development) to serve as both the plan and the working memory, to be updated as it went. I reviewed the initial plan to check we agreed on the success criteria, then left it to work. And it did work. The results are here.
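The validation scenario boils down to a check like the sketch below. This is not the actual harness from the repo, just the shape of the check: `run_torch_demucs` and `run_mlx_demucs` are hypothetical stand-ins for the real inference code, and the tolerances here are arbitrary.

```python
import numpy as np


def stems_match(
    audio: np.ndarray,
    run_torch_demucs,   # audio -> stems, via the original PyTorch model
    run_mlx_demucs,     # audio -> stems, via the MLX port under test
    rtol: float = 1e-4,
    atol: float = 1e-5,
) -> bool:
    """Return True if the MLX port reproduces the PyTorch stems within tolerance."""
    reference = np.asarray(run_torch_demucs(audio))
    candidate = np.asarray(run_mlx_demucs(audio))
    return reference.shape == candidate.shape and np.allclose(
        reference, candidate, rtol=rtol, atol=atol
    )


# The holdout scenario: every selected input clip must match within tolerance.
# all(stems_match(clip, run_torch_demucs, run_mlx_demucs) for clip in selected_clips)
```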
In another project, I wanted to integrate Pocketsmith (my budgeting tool) into OpenClaw as a "skill". I pointed the agent at several existing OpenClaw skills as concrete exemplars (which StrongDM calls "Gene Transfusion"), gave it the Pocketsmith API documentation, provided a basic indication of how I wanted to use it, told it how to plan and track its tasks, and let it work. The verification was mainly just me testing it and making sure I was satisfied with the interface it had constructed. This skill is just for my own personal use, so I'm not so worried about exhaustive verification - the surface for things to go disastrously wrong is limited. The results are here. I also integrated my preferred share portfolio tool, Sharesight, using a similar approach. See sharesight-skill.
Dan Shapiro 3 describes five levels of software autonomy, with the Dark Software Factory at Level 5, where development becomes a "black box that turns specs into software". I might not quite be operating at Shapiro's Level 5 here - I still find myself butting in on the agent's work with my opinions about code quality - but I can certainly see the path towards it.
For my actual work at Canva, our users depend on us getting things exactly right, and engineers currently must own AI-generated code as if they had written every line themselves, so we won't be running a Dark Software Factory anytime soon. But agentic coding is a fact of life. Even when the code itself is carefully peer-reviewed, there are definite lessons to take away: how can I make sure the agent has all the context it needs? How can I enable it to validate its work at every stage of the implementation, and how can I provide feedback to guide it? How can I make the testing scenarios as exhaustive as possible, removing any means of cheating?
The present of engineering is becoming more about reviewing code than writing it. But the future of engineering might be more about exhaustive verification of your system's correctness, and not much about the actual code at all.
Cover by Homa Appliances on Unsplash
References
1. Moynihan, L. (2024, December). The Software Factory. LukePM.com. ↩
2. StrongDM. (n.d.). StrongDM Software Factory. ↩
3. Shapiro, D. (2026, January). The Five Levels: From Spicy Autocomplete to the Dark Factory. ↩