Wednesday, 25 February 2026

Mechanical Sympathy 2.0: From Software Tuning to Model-as-Silicon

A Toronto startup called Taalas hardwired an LLM into transistors and got 16,000 tokens per second. That number sounds like a benchmark. It's actually a paradigm shift hiding in plain sight.




Every decade or so, the computing industry finds itself solving the wrong problem. In the 1990s, the industry optimized instruction pipelines endlessly while memory latency quietly became the actual bottleneck — what architects called the "memory wall." The solution wasn't a faster CPU. It was a different architectural philosophy: caches, NUMA awareness, locality of reference.

We are building more powerful general-purpose accelerators for an increasingly specific workload, while the actual barriers — latency and cost per inference — remain stubbornly high. A startup called Taalas just walked through a door that everyone else assumed was locked.

Their idea sounds almost offensive in its simplicity. Instead of building a better computer to run AI models, they asked: what if the model itself became the computer? Not metaphorically. Literally. They etched the weights of Llama 3.1-8B directly into silicon — one weight, one multiply, one transistor. The result is a chip that does exactly one thing and does it at 16,000 tokens per second per user. That's not a 2× improvement. It's an order of magnitude beyond what Nvidia, Cerebras, and Groq can achieve on the same model.

The Abstraction Tax We Stopped Noticing

To understand why this matters, consider what happens every time a GPU runs inference. You have a general-purpose parallel compute engine. On top of that sits CUDA. On top of that, a deep learning framework. On top of that, a model serving system. On top of that, the model itself — with weights loaded from High Bandwidth Memory that sits physically separated from the compute units, connected by a bandwidth-constrained bus. Every layer of that stack has a cost: power, latency, engineering complexity. The HBM stacks on modern AI chips consume significant power just shuttling weights back and forth. The chip doesn't know it's running a transformer; it finds out at runtime, through software.




Taalas eliminated the entire middle of that stack. Their HC1 chip — built on TSMC's N6 process at 815mm² — stores model weights in the transistors themselves using a mask ROM fabric. Compute and memory collapse into the same physical location. The von Neumann bottleneck, the memory wall that has haunted computer architects for forty years, simply doesn't exist. There is no bus to saturate. There is no data to move. The multiply happens where the weight lives.
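The scale of that bottleneck is easy to sketch. For single-user decoding on a dense model, every generated token has to stream essentially all of the weights from HBM, so memory bandwidth sets a hard ceiling on tokens per second. A back-of-envelope calculation — using an assumed FP16 8B-parameter model and an assumed ~3.3 TB/s of HBM bandwidth, which are illustrative figures rather than any vendor's spec sheet — shows why 16,000 tokens per second is out of reach for this architecture:

```python
# Back-of-envelope roofline: single-user decode throughput is bounded by
# how fast the full weight set can stream out of HBM, because each
# generated token reads every weight once. All numbers are illustrative
# assumptions, not measured vendor specs.

def memory_bound_tokens_per_sec(params: float, bytes_per_param: float,
                                hbm_bandwidth_bytes_per_sec: float) -> float:
    """Upper bound on single-stream decode speed for a dense model."""
    bytes_per_token = params * bytes_per_param  # full weight read per token
    return hbm_bandwidth_bytes_per_sec / bytes_per_token

# Assumed: Llama 3.1-8B in FP16 (2 bytes/weight) on ~3.3 TB/s of HBM.
ceiling = memory_bound_tokens_per_sec(8e9, 2, 3.3e12)
print(f"memory-wall ceiling: ~{ceiling:.0f} tokens/s per user")
```

Under these assumptions the ceiling lands in the low hundreds of tokens per second — two orders of magnitude below the HC1 figure. A chip that keeps the weight inside the transistor that multiplies it has no weight traffic at all, so this particular ceiling simply does not apply to it.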

What's striking is not just the performance number, but the power consumption story. Ten HC1 chips running continuous inference consume 2.5 kilowatts. An equivalent GPU setup for the same throughput would demand significantly more power and require liquid cooling, custom packaging, and HBM stacks. Taalas runs in standard air-cooled racks. If this scales, it doesn't just change AI economics — it changes where AI can physically run.

Flexibility-Performance Corner Nobody Explored

The obvious objection is flexibility. An HC1 chip runs exactly one model: Llama 3.1-8B. Update the model, retape the chip. In a field where frontier models are replaced every few months, betting on dedicated silicon seems reckless. This is exactly why nobody went down this path before. The assumption — reasonable until recently — was that AI was changing so fast that any specialized hardware would be obsolete before it paid for itself.

"Nobody went into this corner because everybody felt AI was changing so rapidly that it would be a massively risky thing to do. But we wanted to see what's hiding in that corner."
— Ljubisa Bajic, CEO, Taalas

But Taalas found something in that corner. Two things changed that make their bet less reckless than it appears. First, a growing subset of model families — the Llamas, the DeepSeeks, the Qwens — are stabilizing into production workhorses. Enterprises aren't running the frontier model of the week. They're running fine-tuned versions of models that are already 6–12 months old, because that's what their workflows are validated against. Second, Taalas' retaping cycle is two months, not two years. They only customize two metal layers on an otherwise fixed chip — borrowing an idea from structured ASICs of the early 2000s. The base chip is permanent; only the weight layer changes. Order a chip for your deployment window, run it until the model evolves, retape. If the cost per inference drops by 1,600×, you can absorb a faster hardware refresh cycle and still come out far ahead.
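The economics of that tradeoff can be made concrete with a toy amortization model. Every input below is a hypothetical placeholder — chip prices, lifetimes, and throughputs are invented for illustration, not Taalas or GPU pricing — but the structure shows why a short hardware lifetime is survivable when per-token throughput is dramatically higher:

```python
# Toy amortization model: does a chip you must retape every few months
# still win on cost per token? All inputs are hypothetical placeholders,
# chosen only to illustrate the shape of the tradeoff.

def cost_per_million_tokens(hardware_cost: float, lifetime_months: float,
                            tokens_per_sec: float,
                            utilization: float = 0.5) -> float:
    """Hardware cost amortized over the tokens served in its lifetime."""
    active_seconds = lifetime_months * 30 * 24 * 3600 * utilization
    total_tokens = tokens_per_sec * active_seconds
    return hardware_cost / total_tokens * 1e6

# Hypothetical fixed-function chip, replaced every 4 months, vs a
# hypothetical GPU amortized over 36 months at far lower throughput.
asic = cost_per_million_tokens(20_000, lifetime_months=4, tokens_per_sec=16_000)
gpu = cost_per_million_tokens(30_000, lifetime_months=36, tokens_per_sec=200)
print(f"ASIC ${asic:.2f}/M tokens vs GPU ${gpu:.2f}/M tokens")
```

Even with a nine-times-shorter lifetime, the throughput advantage dominates the amortization: the short-lived chip comes out an order of magnitude cheaper per token under these made-up numbers.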




What Does Sub-Millisecond Inference Unlock?

Here is where it gets interesting — and where most analysis of Taalas misses the bigger story. The coverage tends to frame this as "Nvidia competitor" or "cheaper inference." Both are true but both are underselling it. The more important question is: what categories of software become possible when inference is effectively free and instantaneous?

Think about agentic AI systems the way we think about database transactions. Today, every call to an LLM is expensive enough that you architect your system to minimize them — a prompt here, a structured output there, careful chain design. It's the equivalent of designing around the cost of disk I/O in the 1980s. Every application decision was shaped by that constraint. When memory got cheap enough, the constraint dissolved, and entire new software paradigms emerged. In-memory databases. Real-time analytics. Applications that would have been unthinkable when you had to plan every memory access became trivial. Sub-millisecond, near-zero-cost inference does the same thing for AI-native applications.

A coding agent that can spawn 100 parallel reasoning threads to explore different implementation approaches — and complete all of them in the time a single GPU call takes today — is not just a faster version of Copilot. It's a different class of tool. Voice interfaces that feel genuinely instantaneous rather than simulated-fast-typing change the interaction model entirely. IoT devices that run inference locally, on-chip, without cloud round-trips, enable entirely new application categories: real-time translation in earbuds, continuous monitoring in industrial settings, robotic perception loops that don't wait for a network packet.
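The fan-out pattern that paragraph describes is architecturally trivial once a single call is cheap; the constraint was always economic, not structural. A minimal sketch, with `infer` as a stand-in stub for a fast local-inference call rather than any real API:

```python
# Sketch: when one inference call is near-free and sub-millisecond,
# "fan out N reasoning attempts and keep the best" becomes a default
# design rather than an extravagance. `infer` is a stand-in stub.
from concurrent.futures import ThreadPoolExecutor

def infer(prompt: str, approach_id: int) -> dict:
    # Placeholder for a fast inference call; returns a scored candidate.
    # A deterministic fake score stands in for model output here.
    return {"approach": approach_id, "score": (approach_id * 37) % 100}

def explore(prompt: str, n: int = 100) -> dict:
    # Launch n candidate solution attempts concurrently, keep the best.
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda i: infer(prompt, i), range(n)))
    return max(candidates, key=lambda c: c["score"])

best = explore("implement an LRU cache", n=100)
print("best approach:", best["approach"], "score:", best["score"])
```

Today this shape is priced out: 100 GPU-backed calls cost 100× one call and take just as long each. On hardware where each call is sub-millisecond and near-zero-cost, the same ten lines become a reasonable inner loop.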



The Architecture of Deployment Is About to Flip

There is a deeper structural implication here that I haven't seen discussed elsewhere. The current AI deployment model is highly centralized. You train at hyperscale data centers, you serve from hyperscale data centers, and latency is a tax you pay for accessing that centralized intelligence. This isn't a choice — it's physics. GPU clusters consume hundreds of kilowatts. You run them where power is cheap and cooling is achievable. Everything else connects via API.

Taalas' HC1 running 10 chips at 2.5 kilowatts fits in a standard rack. Not a special power-zone rack. Not a liquid-cooled custom installation. A standard rack. Scale this to their second-generation silicon and frontier models, and suddenly the economics of edge inference look very different. A hospital running inference on-premise. A factory running quality control loops locally. A telecom running inference at the edge of the network. None of these require a supercomputer. They require a box that draws a few kilowatts and delivers sub-millisecond responses.

The historical parallel that comes to mind is the minicomputer revolution. In the 1960s, computing was centralized by necessity — mainframes were expensive and power-hungry, and only institutions could afford them. The minicomputer didn't just make computing cheaper. It redistributed computing into departments, into labs, into engineering teams that previously had to submit batch jobs and wait. The same shift happened again with workstations, again with PCs, again with smartphones. Each wave moved intelligence closer to the point of use, and each wave unlocked applications that were inconceivable at the previous scale. Taalas, if their roadmap holds, is proposing that AI inference can make that same journey — from hyperscale data center to edge server to eventually embedded device.

What the Risk Profile Actually Looks Like

The hardwired approach carries genuine risks that deserve an honest look. The model specificity is not a minor caveat — it's the central bet. If the Llama family fades and a new architecture dominates, chips hardwired for the old model have limited residual value. The two-month retaping cycle is fast by traditional ASIC standards, but in an AI field where significant model releases happen monthly, it still represents a lag. There's also the question of whether Taalas' approach scales to frontier models. The HC1 runs an 8B parameter model — valuable for production workloads, but well below the frontier. Their second-generation silicon targets a mid-size reasoning model, and frontier capability is planned for later in 2026. That progression is the one to watch.

There's also a market dynamics question. Cloud providers don't necessarily want their customers achieving this kind of cost reduction on inference. Lower inference costs are great for consumers but threaten the economics of API businesses. Whether hyperscalers will adopt Taalas chips, build competing specialized silicon, or simply let GPU clusters continue to dominate through inertia — that's an open question with real strategic stakes.

And yet, the technical result is hard to argue with. 16,000 tokens per second per user. $0.0075 per million tokens. 250 watts per chip. These aren't paper benchmarks — the chip exists, developers can apply for access today. A Toronto company with 25 employees and $219 million raised has produced a benchmark that makes the GPU stack look architecturally mismatched for this workload. 

The Lesson Worth Carrying Forward

The lesson I take from Taalas isn't about AI chips specifically. It's about the value of inhabiting the corners of solution spaces that everyone else has deemed too risky to explore. The GPU path is rational. General-purpose compute is flexible, quickly amortized, and continuously improved by massive R&D budgets. The structured ASIC path looks irrational until you do the physics carefully enough to see that the entire software stack you're preserving with all that flexibility is itself the bottleneck. Taalas didn't find a new physics. They found a corner in the design space where the tradeoffs that seemed unacceptable from the outside look entirely acceptable from the inside — because the gains are large enough to absorb the rigidity.

For software engineers building AI-powered systems today, the practical implication is this: the inference cost model you're designing around right now is not a law of nature. It's an artifact of the current hardware generation. If Taalas' approach — or the pressure it creates on incumbent vendors — succeeds in driving inference costs down by one or two orders of magnitude, the right architectural choices for AI-native applications will look completely different in 18 months. The applications that seem economically impossible today — agents that think in parallel, voice interactions that feel truly instant, intelligence embedded in every device — are not science fiction. They're just waiting for the infrastructure to catch up with the ambition.

The model, it turns out, can become the machine. That changes more than inference costs.

Tuesday, 17 February 2026

No One Builds a Search Engine in a Weekend

A solo developer spent a weekend building an AI agent. Two million people used it within weeks. OpenAI and Meta immediately came knocking. Try imagining this story with Google Search. You can't. That's the entire problem with the AI lab business model.

Here is a thought experiment. Imagine a developer spends a weekend building a new search engine. It gets 196,000 GitHub stars. Two million people use it every week. Google sends an acquisition offer within the month. Impossible, right? The infrastructure alone — the crawlers, the index spanning hundreds of billions of pages, the query-serving infrastructure that returns results in under 200 milliseconds at global scale — takes years and billions of dollars to assemble. A weekend project cannot replicate it. The moat is structural, physical, and time-locked.

Now run the same thought experiment with the App Store. A developer can build an app that sits on top of the App Store. They cannot build a replacement App Store in a weekend. The payment rails, the developer trust relationships, the OS-level integration, the review infrastructure — none of this is replicable. Apple's moat is not the quality of any individual app. It is the platform that makes apps possible at all.

Peter Steinberger spent a weekend in November 2025 building OpenClaw — an AI agent framework that could control your computer, browse the web, run shell commands, manage your email, and post to social platforms autonomously. Within weeks it had 196,000 GitHub stars and 2 million weekly users. Both Meta and OpenAI sent acquisition offers. OpenAI won the acqui-hire. Steinberger is now inside Sam Altman's operation, tasked with building the next generation of personal agents.

The gap between those two thought experiments is the entire story of why AI labs, for all their astronomical valuations, are operating on sand rather than bedrock.



What Made Google and Apple Unassailable

Google's search moat has three layers that compound on each other. First is the index — years of crawling the web, storing and ranking hundreds of billions of documents, building the infrastructure that makes real-time query response possible at global scale. Second is the feedback loop — two decades of user query data that trained ranking algorithms no competitor can replicate from scratch. Third is distribution — default search agreements with browser makers and device manufacturers that cost Google approximately $26 billion in 2021 alone, just to maintain the default position. A weekend developer cannot replicate any of these three layers, let alone all of them simultaneously. The moat is not one wall; it is three walls reinforcing each other.

Apple's App Store moat is different but equally structural. It is not the quality of Apple's own apps — it is the OS-level trust relationship with the device. Every app on an iPhone exists inside Apple's permission system. Developers build on Apple's infrastructure, follow Apple's rules, pay Apple's cut, and cannot distribute outside Apple's channel without jailbreaking the device. The moat is not about any particular capability. It is about controlling the ground on which all capabilities are built.

Now look at what Steinberger actually built. OpenClaw is an interface layer — a framework for issuing instructions to AI models and executing the outputs. It required no proprietary infrastructure. It required no exclusive data. It required only OpenAI's and Anthropic's own API keys, which any developer can obtain in minutes. The entire product sat on top of infrastructure that the AI labs themselves made openly available, then immediately disrupted the market position those same labs were trying to establish. Steinberger did not build a moat. He exposed the absence of one.

Why Anthropic's Reaction Revealed Everything

When OpenClaw was still named ClawdBot — a name riffing on Claude, the Anthropic model that many developers were using to power it — Anthropic's response was to threaten legal action over the name. This forced Steinberger to rename the project twice, eventually landing on OpenClaw after checking with Sam Altman that the name was acceptable.

Read that sequence again carefully. A solo developer builds the most viral open-source AI agent framework of late 2025, powered substantially by Anthropic's own Claude model, and Anthropic's first move is to send a cease-and-desist letter about a name.

The name threat was not really about trademark law. It was about Claude Code. Anthropic had spent significant resources building Claude Code as its flagship agent-developer product — the agentic interface that would cement Claude's relationship with the engineering community. OpenClaw, running on Claude's API, was demonstrating better viral product dynamics than Claude Code's official launch. ClawdBot's very name threatened to create confusion in exactly the market segment Anthropic was trying to own: developers building with agentic AI. Anthropic looked at a solo developer capturing their intended market and reached for a lawyer instead of a product manager.

When the most viral agent experience is built on your model and you respond with a trademark letter, you have revealed that you believe your moat is your brand — not your technology, not your distribution, not your platform. That is a very thin moat.

Google does not threaten developers who build search-adjacent products. It doesn't need to. No search-adjacent product has ever threatened to replace Google Search because the infrastructure required to replace it doesn't fit in a weekend project. When your competitive position is genuinely structural, you don't respond to open-source alternatives with legal letters. You respond by noting that the alternative needs your infrastructure to function and cannot survive without it. Anthropic could not make that response. The agent ran fine without Anthropic's blessing — it just needed the API key.

The Specific Thing AI Labs Cannot Build

Every AI lab in 2026 will tell you their moat is their model. Benchmark performance, the training runs that cost hundreds of millions of dollars, the research teams producing capabilities no open-source alternative has yet matched. This argument has surface plausibility and a fatal flaw.

The flaw is that OpenClaw was explicitly model-agnostic. It ran on Claude, GPT-5, Gemini, Grok, and local models via Ollama. The most viral agent interface of early 2026 was architected from day one to treat every frontier model as a commodity interchangeable with every other. Steinberger himself committed to keeping it model-agnostic even after joining OpenAI. If the product that captured 2 million weekly users doesn't care which model it runs on, what is the model moat actually protecting?

Structural Comparison

Google built a search product that requires years, billions, and global infrastructure to replicate. Apple built a distribution platform that requires OS-level trust to compete with. OpenAI and Anthropic built frontier models, then watched a developer spend a weekend building the interface layer that users actually wanted — using their APIs — and had to acquire or threaten him.

The difference is not capability. It is whether the moat lives in the product or in the infrastructure beneath the product.

Google and Apple are not threatened by weekend projects because their moats are below the application layer. The search index is below any search interface. The App Store payment rail is below any app. Whatever you build on top cannot replace what is underneath. AI labs have the opposite problem: their most defensible asset — the frontier model — is exposed at the API level to anyone with a credit card. Everything built on top of that API, every interface layer, every agent framework, every product that users actually interact with, is up for grabs every weekend.

What a Real AI Moat Would Look Like

This is not an argument that AI labs are worthless or that the frontier model is irrelevant. It is an argument about what kind of moat is durable versus what kind evaporates the moment a motivated developer has a good weekend.

A durable AI moat would look like Google's: infrastructure that is physically impossible to replicate quickly. The Stargate project — OpenAI's $500 billion joint venture with Oracle and SoftBank to build dedicated AI infrastructure — is a bet in this direction.

If running capable agents at mass scale requires compute infrastructure only a handful of players can afford to build, then the compute becomes the moat the way the search index is Google's moat. But this is an infrastructure bet, not a model bet. OpenAI is effectively betting that the future of AI advantage looks more like owning a power grid than owning a better algorithm.

A durable AI moat would also look like Apple's: owning the OS-level relationship with the device, such that no agent framework can operate without your permission. Microsoft comes closest to this with Windows and the enterprise stack. Google has it with Android. Apple has it most completely with iOS. 

AI labs that sit inside these platforms — OpenAI's ChatGPT integration with Apple Intelligence, Anthropic's enterprise agreements — are paying for distribution access rather than building it. They are tenants in someone else's moat.

What is conspicuously absent from every major AI lab's current strategy is the thing that made Google and Apple truly unassailable: a proprietary feedback loop that improves with use and cannot be transferred to a competitor. 

Google's search gets better with every query because the query data belongs to Google. Apple's App Store gets stronger with every app because developer relationships belong to Apple's ecosystem. 

Every time someone uses ChatGPT or Claude, the interaction data could theoretically compound into better models — but the API-first distribution model means that a large portion of actual usage happens through third-party interfaces, with the data relationship owned ambiguously or not at all. 

Steinberger's 2 million weekly OpenClaw users were generating interaction data that told you something profound about how humans actually want to use agents. That data lived with OpenClaw, not with the model providers whose APIs were processing the requests.

Conclusion

The OpenClaw acquisition is not primarily a story about a talented developer getting a well-deserved outcome. It is a story about what happens when the product layer of a technology platform is structurally undefended.

Peter Steinberger could build OpenClaw in a weekend because the infrastructure he needed was all openly available, cheaply accessible, and deliberately designed to be used by anyone. 

The labs built it that way intentionally — API-first distribution was the fastest path to revenue and adoption. But API-first distribution is also moat-last distribution. Every interface you don't control is an OpenClaw waiting to happen.

Google has never had to acquire a weekend search project because no weekend search project could threaten Google Search. The index is not for sale. The feedback loop is not accessible. The distribution agreements are not replicable. The moat sits below the level where weekend projects operate.

AI labs have built their products at the level where weekend projects operate. That is, right now, their most significant strategic vulnerability — and no acquisition, however well-timed, changes the underlying architecture.

Steinberger asked Sam Altman whether naming the project "OpenClaw" was acceptable. Altman said yes. 

The most revealing detail in this entire story is not that OpenAI acquired the project. It is that the founder of the project felt he needed to ask the CEO of OpenAI for naming permission, and got it, and still had 2 million weekly users and full negotiating leverage with both Meta and OpenAI.

That is what the absence of a structural moat looks like in practice: you are powerful enough to threaten the biggest AI company in the world from a weekend project, and polite enough to check if the name is okay first.

Wednesday, 11 February 2026

Blind Spots in Anthropic's Agentic Coding Report

Anthropic's 2026 Agentic Coding Trends Report documents a real shift in how software gets built. The data from Rakuten, CRED, TELUS, and Zapier shows engineers increasingly orchestrating AI agents rather than writing code directly. The trend lines are clear: 60% of development work now involves AI, and output volume is rising.

But as someone building production systems with these tools, I found myself returning to what the report didn't address. Not because Anthropic's data is wrong—it isn't—but because the gaps reveal assumptions that deserve scrutiny. These aren't minor omissions. They're the difference between a marketing document and an honest assessment of where this technology actually stands.

Here are seven critical areas where the report's silence speaks louder than its claims.


The Cost Model Is Conspicuously Absent



The report asserts that "total cost of ownership decreases" as agents augment engineering capacity. There's a chart. The line goes in the right direction. What's missing is any actual cost analysis.

Running multi-agent systems at the scale Anthropic envisions requires substantial compute. A coordinated team of agents working across separate context windows, iterating over hours or days, generates significant API costs. For a well-funded enterprise, this might be absorbed easily. For smaller teams, especially those in markets with different economic realities, this is a first-order consideration.

The absence of cost modeling isn't accidental — it's strategic. Anthropic benefits when organizations focus on productivity gains rather than infrastructure costs. But builders need both sides of the equation to make informed decisions.

Without cost data, you can't calculate ROI. You can't compare agent-augmented workflows against traditional development. You can't determine which tasks justify agent delegation and which don't. The report gives you trend lines but no decision framework.

This matters particularly for the "long-running workflows" trend the report highlights. If tasks stretch across days with multiple agents maintaining state and coordinating actions, the compute bill scales accordingly. Organizations need to understand the economics before committing to these architectures.
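Even a crude estimator surfaces the question the report skips. The sketch below multiplies out agents, iterations, and per-call token volume; every number in it — team size, iteration count, token counts, per-million-token prices — is a hypothetical placeholder to be replaced with your provider's actual rates:

```python
# Rough estimator for the API bill of a long-running multi-agent
# workflow. All inputs are hypothetical placeholders, not any
# provider's actual pricing.

def workflow_cost(agents: int, iterations: int,
                  input_tokens: int, output_tokens: int,
                  price_in_per_m: float, price_out_per_m: float) -> float:
    """Total API cost in dollars for the whole workflow."""
    per_call = (input_tokens * price_in_per_m +
                output_tokens * price_out_per_m) / 1e6
    return agents * iterations * per_call

# Hypothetical: 5 agents, 200 iterations over a multi-day task,
# 20k tokens in / 2k out per call, at $3 / $15 per million tokens.
cost = workflow_cost(5, 200, 20_000, 2_000, 3.0, 15.0)
print(f"estimated workflow cost: ${cost:.2f}")
```

The point is not the specific total but the multiplicative structure: agents × iterations × context size compounds quickly, and long-running workflows with large contexts sit at the expensive corner of all three axes.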


The Junior Developer Paradox



The report positions role transformation optimistically: engineers evolve from implementers to orchestrators. This framing works for experienced developers who already possess deep systems knowledge. It sidesteps a harder question about how that knowledge gets built in the first place.

Consider what the report itself acknowledges through an Anthropic engineer's quote: "I'm primarily using AI in cases where I know what the answer should be or should look like. I developed that ability by doing software engineering 'the hard way.'"

This creates a structural problem. If agents handle the implementation work that traditionally builds developer intuition—debugging complex issues, understanding why certain patterns fail, developing architectural taste—where does the next generation of experienced engineers come from?

This isn't a philosophical concern about automation displacing jobs. It's a practical question about skill development pipelines. Organizations adopting the orchestrator model need engineers who can effectively direct agents. Those engineers need deep systems understanding. But if the path to developing that understanding increasingly involves reviewing agent output rather than building from scratch, the pipeline breaks.

The report assumes a steady supply of experienced engineers capable of orchestration. It doesn't address how to maintain that supply in a world where early-career development looks fundamentally different.


Failure Modes at Scale Aren't Examined



Rakuten's case study highlights "99.9% numerical accuracy" for a seven-hour autonomous coding task. This is impressive. It's also potentially misleading as a success metric.

In production systems, 99.9% accuracy can translate to hundreds or thousands of subtle bugs at scale. More importantly, agent-generated bugs differ qualitatively from human-generated ones. Traditional debugging assumes you can reconstruct the reasoning that produced the code. Agent-generated code breaks this assumption.

When code fails, the standard approach is to examine the implementation and understand what the author intended. With agent-generated code, there's no author to query and no reasoning to reconstruct. The agent followed patterns and produced output that satisfied its objectives. Understanding why the code works a certain way requires reverse-engineering rather than recall.

The report doesn't discuss what happens when agents produce code that passes tests but contains architectural flaws that only manifest under load. Or when multi-agent systems create emergent complexity that no single reviewer can fully evaluate. Or when errors compound over multi-day tasks because early decisions affect later implementation in ways the orchestrating engineer didn't anticipate.

As agent-generated code becomes a larger percentage of codebases, these failure modes need systematic study. The report treats increased output as an unqualified success. It should be examining what happens when that output fails in production.


Global Access Barriers Remain Invisible

Every case study features well-resourced organizations in developed markets: Rakuten (Japan), TELUS (Canada), CRED (venture-backed India), Zapier (US). The report's "democratization" trend discusses non-technical users gaining coding abilities but remains silent on geographic and economic access disparities.

Agentic coding at scale requires reliable infrastructure, API access with scalable billing, and often English language proficiency for optimal results. These requirements create structural barriers for developers in many markets.

The cost consideration from section one compounds this. If running agent workflows at meaningful scale requires substantial API spend, access becomes stratified by organizational resources. A developer at a startup in Lagos faces different constraints than one at Rakuten.

This matters because software development has been more democratized than many industries—you need a computer and internet access, not expensive capital equipment. If agentic coding raises the resource bar significantly, it doesn't democratize development. It concentrates it.

The report's vision of transformation reflects only the experience of well-funded organizations in specific markets. If this genuinely represents the future of software development, unequal access to these tools doesn't create a temporary gap. It creates stratification in who participates in that future.


Verification Doesn't Scale With Generation



The report celebrates increased output volume: more features shipped, more bugs fixed, more experiments run. It notes that 27% of AI-assisted work consists of tasks "that wouldn't have been done otherwise."

This creates a bottleneck the report doesn't examine. If output increases significantly while humans can only fully delegate 0-20% of tasks (per the report's own data), verification load increases proportionally. Someone must review the additional code. Someone must validate the architectural decisions. Someone must ensure the implementation is correct.

The report proposes "agentic quality control" as a solution — using AI to review AI-generated code. This doesn't resolve the problem; it relocates it. If you can't trust the agent to write code without review, the logical basis for trusting it to review code is unclear. You've created a verification loop that still requires human judgment at some point.

The fundamental constraint isn't code generation — agents demonstrably excel at that. The constraint is verification. Human reviewers can only evaluate so much code, especially code they didn't write and can't query about intent.

Organizations that scale output without proportionally scaling verification capacity aren't increasing velocity sustainably. They're accumulating technical debt and increasing the probability of errors reaching production.
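The dynamic is simple enough to model: if generation outpaces review capacity, unreviewed work accumulates linearly, and the gap never closes on its own. The rates below are illustrative placeholders, not measurements from any team:

```python
# Minimal model of the verification bottleneck: when agents generate
# code faster than humans can review it, the unreviewed backlog grows
# linearly with the rate gap. Rates are illustrative placeholders.

def review_backlog(gen_loc_per_day: float, review_loc_per_day: float,
                   days: int) -> float:
    """Lines of unreviewed code after `days`, floored at zero."""
    return max(0.0, (gen_loc_per_day - review_loc_per_day) * days)

# Hypothetical team: agents emit 4,000 LOC/day, reviewers absorb 1,500.
backlog = review_backlog(4_000, 1_500, days=30)
print(f"unreviewed code after 30 days: {backlog:,.0f} LOC")
```

The model is deliberately naive — real review throughput degrades as the backlog grows and reviewers skim — which means the linear estimate is a floor, not a ceiling.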


Legal and IP Questions Are Unaddressed



When agents autonomously generate code, questions arise that the report doesn't acknowledge: Who owns the intellectual property? If agent-generated code replicates patterns from training data, who bears copyright liability? When legal teams use agents to build self-service tools (as the report highlights), what's the liability framework if those tools produce incorrect guidance?

These aren't theoretical concerns. They're active legal questions that enterprises must resolve before scaling agentic workflows to the levels Anthropic envisions. The report mentions that Anthropic's legal team built tools to streamline processes but doesn't address what happens when automated legal work produces errors.

Enterprises adopt new technologies slowly not primarily due to technical limitations but due to legal and compliance uncertainty. A forward-looking report that ignores these questions optimizes for excitement over practical adoption guidance.

Organizations need frameworks for:

  • IP ownership when agents generate substantial code independently
  • Copyright compliance when agent output may reflect training data patterns
  • Professional liability when agents augment knowledge work in regulated fields
  • Responsibility allocation when multi-agent systems make decisions over extended periods

Absence of any discussion around these points suggests they're considered solved problems. They're not.


Vendor Lock-In Isn't Mentioned

Report positions Anthropic as the infrastructure for agentic coding's future. Every case study uses Claude. Every workflow assumes access to Anthropic's tools. There's no discussion of what organizations should do to maintain strategic flexibility.

What happens when your multi-agent architecture, your long-running workflows, your team's entire development process is built around one provider's models and tools? When that provider changes pricing, when model capabilities shift, when new competitors emerge with better offerings?

Building deep dependencies on any single vendor creates strategic risk. In a market where model capabilities evolve rapidly and pricing structures change frequently, organizations need abstraction strategies.

Report understandably doesn't highlight this—Anthropic benefits from deep integration. But readers evaluating long-term adoption should be thinking carefully about portability. Today's best model becomes tomorrow's commodity. The investment is in workflows and processes, not specific model endpoints.

Organizations need to consider:

  • How to abstract agent interactions so models can be swapped
  • What standards exist for agent framework portability
  • How to structure workflows to minimize provider-specific dependencies
  • What the exit costs look like if they need to migrate
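One concrete shape for that abstraction, sketched minimally — every class and function name below is hypothetical and illustrative, not any vendor's actual SDK:

```python
from typing import Protocol

class CodeAgent(Protocol):
    """Provider-agnostic interface: workflows depend on this,
    never on a vendor SDK directly, so the model can be swapped."""
    def complete_task(self, task: str) -> str: ...

class ClaudeAgent:
    def complete_task(self, task: str) -> str:
        # A real adapter would call the provider's API; stubbed for the sketch.
        return f"[claude] {task}"

class LocalLlamaAgent:
    def complete_task(self, task: str) -> str:
        # Same surface, different backend; the workflow never notices.
        return f"[llama] {task}"

def run_workflow(agent: CodeAgent, task: str) -> str:
    # Migration cost collapses to this single injection point.
    return agent.complete_task(task)
```

Swapping providers then means changing one constructor call, not rewriting every workflow.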

Report envisions a future built on Anthropic's infrastructure. Strategic planning requires thinking about that future without assuming permanent vendor relationships.


What's Actually Happening

Trends the Anthropic report documents are real. Agentic coding is changing software development in meaningful ways. Data showing 60% AI involvement with only 0-20% full delegation is honest and valuable—it describes actual practice rather than aspirational vision.

But this is a vendor report designed to drive adoption, and it accomplishes that goal effectively. What it doesn't do is provide the complete picture builders need to make strategic decisions.

Most important questions about agentic coding in 2026 aren't about capabilities—agents demonstrably work. The questions are about economics, skill development, failure modes, access equity, verification scalability, legal frameworks, and strategic flexibility.

Agentic capabilities are impressive but gaps in the analysis are equally significant.

Understanding both is necessary for making informed decisions about how deeply to integrate these tools into your development process.

Wednesday, 28 January 2026

Why Your AI Coworker Will Never Understand Your Code

The Uncomfortable Pattern Everyone's Ignoring

When Claude Code first started autocompleting my code with insane accuracy, I felt what every engineer feels: a flash of obsolescence. Here's a system that writes cleaner boilerplate than I do, recalls API signatures I've forgotten, and implements patterns faster than I can type them. Then I asked it to help with a custom thread-synchronization algorithm I was designing - something that had never been written before - and watched it confidently generate a total mess.

Pattern repeats everywhere. LLMs write SQL queries brilliantly until you need to optimize for your specific data distribution. They explain React patterns perfectly until you're debugging a novel state management approach. They're simultaneously geniuses and confused grade schoolers, and everyone's pretending this jaggedness is just a scaling problem waiting to be solved.

Andrej Karpathy finally said what we've been avoiding: we're not building animals that learn from reality. We're summoning ghosts - digital entities distilled from humanity's text corpus, optimized for mimicry rather than understanding. And if Yann LeCun is right, the architecture itself guarantees they can never become anything else.

This isn't an argument to stop using LLMs. It's a recognition that the tools transforming software engineering today have a ceiling we need to see clearly, because your career decisions should account for what AI will never do, not just what it's learning to automate.

What Karpathy Actually Said About Ghosts

Metaphor landed because it captured an asymmetry everyone building with LLMs feels but struggles to articulate. Karpathy's framing is stark: "today's frontier LLM research is not about building animals. It is about summoning ghosts."

Distinction is architectural. Animals learn through dynamic interaction with reality. "AGI machine" concept envisions systems that form hypotheses, test them against the world, experience consequences, and adapt. 

There's no massive pretraining stage of imitating internet webpages. There's no supervised finetuning where actions are teleoperated by other agents. 

Animals observe demonstrations but their actions emerge from consequences, not imitation.

Ghosts are different. They're "imperfect replicas, a kind of statistical distillation of humanity's documents with some sprinkle on top." 

Optimization pressure is fundamentally different: human neural nets evolved for tribal survival in jungle environments - optimization against physical reality with life-or-death consequences. 

LLM neural nets are optimized for imitating human text, collecting rewards on math puzzles, and getting upvotes on LMSYS Arena. These pressures produce different species of intelligence.

This creates what Karpathy calls "jagged intelligence" - expertise that doesn't follow biological patterns because it wasn't shaped by biological constraints. 

LLM can explain quantum field theory while failing basic common sense about physical objects. It writes elegant code for standard patterns while generating nonsense for novel architectures. 

Jaggedness isn't a bug - it's the signature of learning from corpus statistics rather than reality.

LeCun's Mathematical Doom Argument

While Karpathy's ghost metaphor describes the phenomenology, Yann LeCun argues the architecture itself is mathematically doomed. His position isn't "we need better training" - it's "autoregressive generation can't work for genuine intelligence."

The core argument is this: imagine the space of all possible token sequences as a tree. Every token you generate has options - branches in the tree. 

Within this massive tree exists a much smaller subtree corresponding to "correct" answers. 

Now imagine probability e that any given token takes you outside that correct subtree. Once you leave, you can't return - errors accumulate. Probability your sequence of length n remains correct is (1-e)^n.



This is exponential decay. Even if you make e small through training, you cannot eliminate it entirely. Over sufficiently long sequences, autoregressive generation inevitably diverges from correctness. You can delay the problem but you cannot solve it architecturally.
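The decay is easy to feel numerically. A toy calculation, using the same independence assumption LeCun's argument rests on:

```python
def p_correct(e: float, n: int) -> float:
    """Probability an n-token sequence never leaves the correct subtree,
    assuming an independent per-token error probability e."""
    return (1.0 - e) ** n

# Even a 1% per-token error rate collapses over length:
p_correct(0.01, 100)    # ~0.37: a 100-token answer usually survives
p_correct(0.01, 1000)   # ~0.00004: a 1000-token answer almost never does
```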

LeCun's critics point out the math assumes independent errors, which isn't true - modern LLMs use context to self-correct. They note that LLMs routinely generate coherent thousand-token responses, which seems impossible under exponential decay. Recent research shows errors concentrate at sparse "key tokens" (5-10% of total) representing critical semantic junctions, not uniformly across all tokens.

But LeCun's deeper point stands: the autoregressive constraint means sequential commitment without exploring alternatives before acting.

The Lookback vs Lookahead Distinction

To be precise about what "autoregressive" actually constrains: LLMs have full backward attention - at each token, the model attends to ALL previous tokens in the context. This is fundamental to how transformers work. They're constantly "looking back."

What they don't do is lookahead during generation:

Standard Autoregressive Generation:
Token 1: Generate → COMMIT (can't change later)
Token 2: Generate → COMMIT  
Token 3: Generate → COMMIT
...each decision is final upon generation

Compare this to search-based planning (like AlphaGo):

Consider move A → simulate outcome → score: 0.6
Consider move B → simulate outcome → score: 0.8  
Consider move C → simulate outcome → score: 0.4
Choose B (explored before committing)

Chess analogy: Standard LLM generation is like being forced to move immediately after seeing the board position, without considering "if I move here, opponent does this, then I..." Human planning involves internally simulating multiple futures before committing to action. Autoregressive generation commits token-by-token without exploring alternative continuations.
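The contrast can be sketched in a few lines. This is a toy illustration; the scoring numbers are invented to make the point, not drawn from any real model:

```python
from typing import Callable

def greedy_step(options: list[str], score: Callable[[str], float]) -> str:
    # Autoregressive style: take the locally best option and commit.
    return max(options, key=score)

def lookahead_step(options: list[str], score: Callable[[str], float],
                   simulate: Callable[[str], float]) -> str:
    # Search style: simulate each branch's downstream value before committing.
    return max(options, key=lambda o: score(o) + simulate(o))

# A toy position where the locally best move is globally worst:
local = {"A": 0.9, "B": 0.5}      # immediate appeal of each move
future = {"A": -1.0, "B": 0.5}    # simulated downstream consequences

pick_greedy = greedy_step(["A", "B"], lambda o: local[o])
pick_search = lookahead_step(["A", "B"], lambda o: local[o],
                             lambda o: future[o])
# pick_greedy is "A" (commits to the trap); pick_search is "B"
```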

What Will People Say to This Argument?

Modern techniques add planning on top of base generation. Chain-of-thought generates reasoning tokens first but still commits sequentially. 

Beam search keeps multiple candidates but is exponentially expensive for deep exploration. 

OpenAI's o1 reportedly uses tree search during inference, which IS genuine lookahead - a significant architectural addition beyond pure autoregressive generation.

LeCun's claim isn't that these improvements are impossible. It's that they're band-aids on an architecture that doesn't naturally support the kind of internal world simulation that characterizes animal intelligence.

Four Gaps That Can't Be Trained Away

LeCun identifies four characteristics of intelligent behavior that LLMs fundamentally lack: understanding the physical world, persistent memory, the ability to reason, and the ability to plan. But the deepest issue is what they're optimized for.

Consider what this means for scientific reasoning. Scientists don't generate hypotheses by pattern-matching previous hypotheses - they observe phenomena, form novel explanations, design experiments to falsify them, observe results that surprise them, and refine their models. 

Every step involves interaction with ground truth that can prove you wrong.

LLMs have no mechanism for this. 

- They can't run an experiment and be surprised. 

- They can't observe results that contradict their predictions and update based on physical evidence.

 Every token is inference from prior tokens in a corpus that only contains what was already discovered and written down. You cannot discover novel physics from a corpus that only contains known physics.

This explains why LLMs excel at code but struggle with physical reasoning. Code operates in "a universe that is limited, discrete, deterministic, and fully observable" - the state space is knowable and verification is programmatic. 

Physical reality is continuous, partially observable, probabilistic, and full of phenomena we haven't documented. 

Animals navigate this effortlessly because they learn directly from it. LLMs can only learn from our linguistic shadows of it.

When LeCun states "Auto-Regressive LLMs can't plan (and can't really reason)," he's not being provocative - he's describing an architectural constraint. 

Even chain-of-thought prompting doesn't fix this because it's "converting the planning task into a memory-based (approximate) retrieval." You're not teaching reasoning - you're teaching corpus-level pattern matching about what reasoning looks like. This is exactly why the prompt behaves like a hyperparameter, and why models are so sensitive to it.

LLM learns to generate text that resembles reasoning steps because that's what appears in the training data, not because it's internally simulating multiple future scenarios and choosing the best path.

Why Code Works But Reality Doesn't

I've noticed an interesting pattern with GitHub Copilot/Claude Code. When implementing a standard REST API or writing React components, the suggestions are good - often exactly what I was about to type. 

When debugging a distributed systems issue or architecting a novel state management approach, the suggestions become actively unhelpful, confidently wrong in ways that would break production.

The difference isn't random. Standard patterns exist extensively in training corpora - GitHub is full of REST APIs and React components.

LLM has seen thousands of implementations and learned the statistical regularities of how these patterns manifest in code. 

It's not understanding the requirements and generating a solution; it's recognizing "this looks like a REST endpoint" and retrieving an approximate match from distribution of REST endpoints in its training data.

For novel code that deviates from conventions, this breaks down. When I was building custom thread synchronization, the models repeatedly failed because they kept pattern-matching to standard practices - adding defensive try-catch statements, turning a focused implementation into a bloated production framework.

They couldn't understand my actual intent because they don't understand intent at all. They understand corpus statistics.

This is why code works better than general reasoning for LLMs: code is verifiable, domains are closed, and common patterns dominate the training data. You can build benchmarks with programmatic correct answers. 

You can use Reinforcement Learning from Verifiable Rewards (RLVR) because verification is automatic. But this success doesn't generalize to open-ended domains where ground truth isn't programmatically checkable.

Strategic question for engineers: which parts of your work are "standard patterns well-represented in training corpora" versus "novel architecture requiring genuine understanding?" 

First category is being rapidly automated at 100X speed. Second isn't just hard for current LLMs - it may be architecturally impossible for them.

What Animals Have That Ghosts Never Will

LeCun's alternative to LLMs is Joint Embedding Predictive Architecture (JEPA), which inverts the paradigm entirely. Instead of predicting next token in pixel/word space, JEPA learns to predict in latent representation space - building an internal world model that captures structural regularities while ignoring unpredictable details.

Key insight: most of reality's information is noise. When you watch a video of someone throwing a ball, the exact trajectory is predictable from physics but the precise pixel values (lighting, shadows, texture) contain high entropy. 

Generative models waste capacity modeling all this unpredictable detail. JEPA learns representations that "choose to ignore details of the inputs that are not easily predictable" and focus on "low-entropy, structural aspects" - like the parabolic arc, not the exact RGB values.

This mirrors biological learning. An infant knocking objects off a table learns gravity not by memorizing pixel sequences but by building an abstract model: "objects fall downward." 

The model ignores irrelevant details (color, texture, lighting) and captures the physical law. 

No books required, no 170,000 years of reading - just observation and interaction.
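A toy numerical version of that idea — an illustration of predicting in a latent space versus raw observation space, emphatically not V-JEPA itself:

```python
import random
random.seed(0)

T, PIXELS = 100, 64
# Low-entropy structure: a "ball" moving at constant velocity 0.5/frame.
position = [2.0 + 0.5 * t for t in range(T)]
# Raw "frames": the structure buried under heavy pixel-level noise.
frames = [[position[t] + random.gauss(0, 5.0) for _ in range(PIXELS)]
          for t in range(T)]

# Generative-style objective: predict every pixel of the next frame.
# The noise is unpredictable, so the error floor is huge.
pixel_err = sum((frames[t + 1][p] - frames[t][p]) ** 2
                for t in range(T - 1) for p in range(PIXELS)) / ((T - 1) * PIXELS)

# JEPA-style objective: predict the next *latent*. Here the latent is just
# the frame mean, which averages the noise away and exposes the structure.
latent = [sum(f) / PIXELS for f in frames]
latent_err = sum((latent[t + 1] - (latent[t] + 0.5)) ** 2
                 for t in range(T - 1)) / (T - 1)

# latent_err comes out far smaller than pixel_err: the predictable
# structure survives, the unpredictable detail is ignored.
```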

Meta's V-JEPA demonstrates this works. When tested on physics violations (objects floating mid-air, collisions with impossible outcomes), it showed higher surprise/prediction error than state-of-the-art generative models. 

It acquired common-sense physics from raw video by building an actual world model, not by memorizing corpus statistics about how people describe physics.

Architectural difference matters because it determines what's learnable. LLMs learn "what humans wrote about the world" - a tiny, biased, lossy compression. 



JEPA-style models can learn "how the world actually works" through observation. The first hits a data ceiling when you've processed all available text. The second has access to reality's infinite bandwidth.



Architecture Is The Constraint

LeCun's prediction is bold: "within three to five years, no one in their right mind would use" autoregressive LLMs. 

His position is that better systems will appear "but they will be based on different principles. They will not be auto-regressive LLMs."

This isn't incrementalism. It's claiming the entire paradigm is a dead end for genuine intelligence. The reason is architectural: 

LLMs and humans play by completely different rules. One is a master of compression, and the other is a master of adaptation.

Simply feeding more data to this compression beast "will only make it bigger and stronger, but it won't make it evolve into an adaptive hunter."

Consider what this means for the current race toward AGI via scaling. If LeCun is right, we're optimizing along a dimension that can't reach the target. Better compression of human text gets you better mimicry, not understanding. 

Larger context windows let you mimic longer documents, not think longer thoughts. 

RLVR on verifiable domains like code and math creates better pattern-matchers for those domains, not general reasoners.

Counterargument sounds convincing: LLMs keep surprising us with emergent capabilities. GPT-4 does things GPT-3 couldn't, and Claude Sonnet 4 does things GPT-4 struggled with. 

Maybe there's no architectural ceiling, just insufficient scale. Maybe chain-of-thought reasoning plus tool use plus larger context windows eventually produces something indistinguishable from genuine intelligence.

LeCun's response: show me the world model. Show me the system that can watch a video, form a novel hypothesis about what happens next, be surprised when it's wrong, and update its model of reality.

 Autoregressive text generation can't do this by construction - it has no mechanism for ground truth interaction, no ability to be surprised by reality rather than corpus statistics.

What This Actually Means For Your Career

Practical implication isn't abandoning LLMs - they're extraordinarily useful for what they actually are. It's recognizing their ceiling so your skill development accounts for what AI will never automate.

Here's the framework: 

LLMs excel at problems where 

(1) the solution space is well-represented in training corpora, 

(2) verification is possible through execution or programmatic checking, and 

(3) the domain is closed and discrete. 

They struggle where 

(1) the problem is genuinely novel, 

(2) correctness requires understanding beyond pattern-matching, or 

(3) the domain involves continuous physical reality or open-ended reasoning.

This creates a clear dividing line in engineering work:

Automatable (pattern-matching sufficient): Standard CRUD implementations, boilerplate reduction, API integration following documentation, test generation for known patterns, code explanation and documentation, refactoring for style consistency.

Not automatable (understanding required): Novel algorithm design, distributed systems debugging with emergent behavior, performance optimization for specific workload characteristics, architectural decisions balancing tradeoffs, security reasoning about attack surfaces, integration of fundamentally new technologies.

The difference isn't difficulty - it's whether success requires recognizing patterns in existing code versus forming and testing novel hypotheses about system behavior. 

One is corpus retrieval, the other is scientific method.

For career strategy, this suggests investing in skills that require building world models: understanding how systems actually behave under load, why certain architectural patterns create subtle failure modes, what tradeoffs matter for your specific context. 

These aren't pattern-matching problems. They're "I need to understand this system well enough to predict what happens in scenarios I haven't seen."

The engineers who thrive won't be those who resist AI tools. They'll be those who understand exactly which problems LLMs can solve (letting them automate aggressively) versus which problems require genuine understanding (where they need to think like scientists, not pattern-matchers). 

The tools are getting better at mimicry. But mimicry isn't understanding, and the architecture guarantees it never will be.

If Karpathy's right that we're summoning ghosts, and LeCun's right that ghosts can never become animals, then the question isn't "how do I compete with AI." 

It's "which problems require animal intelligence?" Those problems aren't going away. They might be the only ones that matter.

Friday, 23 January 2026

Intelligence has become Commodity

Why Apple's $1B Intelligence Rental Is Actually Brilliant

Apple recently announced they're paying Google approximately $1 billion annually to power Siri with Gemini. It might look like Apple admitting defeat in AI.

They got it exactly backwards.

Apple just confirmed what the smartest companies already know: If you have distribution and trust, renting intelligence is the power move. And Meta's desperate open-sourcing of Llama proves what happens when you lack the moat that makes renting viable.



Deal Everyone's Misreading

Apple looked at building a competitive foundation model in-house. The real cost:

  • $10-20 billion over 5 years
  • Hundreds of ML researchers (competing with Google, OpenAI, Anthropic for talent)
  • Building training infrastructure from scratch
  • 3-5 years to catch up to Google's decade-long DeepMind advantage

Apple's response: "We'll pay you a billion a year to skip all that."

This isn't weakness. It is refusing to fight a multi-billion dollar battle in territory where you have zero advantage.

For $1 billion annually - 0.25% of their revenue, less than they spend on store design - Apple bought:

Optionality without commitment. If Google's models fall behind, switch to Anthropic, OpenAI, or whoever wins next generation. The integration layer (Private Cloud Compute, iOS hooks) works with any sufficiently capable model.

Speed without technical debt. Ship AI features this year instead of 2028. No sunk costs, no research teams to maintain, no infrastructure to depreciate.

Competitive intelligence. By being Google's customer, Apple sees exactly what's possible with current models, what's improving, what's still broken. Learning the constraints without paying for research.

But what they did not rent matters more: the relationship with 2 billion iOS device owners, the App Store monopoly developers can't escape, the hardware-software integration that took twenty years to build, and the premium pricing power from brand trust.

Apple is renting the commodity and owning the moat.

What Meta's Strategy Actually Teaches

Meta spent billions building Llama, then gave it away for free. Open source. No licensing fees. Available to anyone, including direct competitors.

This isn't generosity. This is a calculated move to prevent moats from forming in the intelligence layer.

Meta absolutely COULD rent models like Apple does - nothing stops them from using Claude, GPT, or Gemini. They have the money, the distribution, 3 billion users. But at their scale, with their existing compute infrastructure and data, building is actually cost-effective.

Strategic choice isn't BUILD vs RENT. It's what they did AFTER building: They gave it away for free.

Here's why: Meta's actual business is advertising on social feeds. Nightmare scenario is Google or OpenAI building such a strong moat in AI that it becomes a competitive bottleneck. Imagine if accessing frontier AI required paying Google, and Google could prioritize their own products or charge Meta premium rates.

By open-sourcing Llama, Meta is trying to: Commoditize intelligence to prevent anyone from building a moat there.

If Llama is free and good enough, then intelligence can't become a competitive advantage. If it's free, Google can't charge Apple $1 billion for preferential access. If it's free, startups can't build AI-native products that threaten Meta's ad business without Meta having access to the same capabilities.

This is a deliberate choice about WHERE moats should form. Meta is saying: "We're fine with moats in distribution, in user relationships, in advertising infrastructure. But intelligence itself? We need that to stay commodity."

Uncomfortable Insight

Apple-Meta comparison reveals something uncomfortable: different companies want moats in different layers.

Apple can rent because their moat is in ecosystem and distribution:

  • 2 billion devices creating lock-in
  • An ecosystem developers can't afford to leave
  • Premium pricing power from brand trust
  • Hardware integration creating switching costs

For Apple, intelligence being commodity is PERFECT. It means they can access the best models without building research teams, and their real advantages (ecosystem control, user trust) remain defensible.

Meta builds and open sources because they want to PREVENT moats from forming in intelligence:

  • They have the scale and infrastructure to build cost-effectively
  • They need intelligence to stay commodity to protect their ad business
  • They can't afford Google or OpenAI controlling access to frontier AI

Apple pays $1 billion to rent what Meta gives away for free. But they're not in the same strategic position - they're pursuing opposite goals.

Apple wants intelligence to be rented infrastructure (keeps it commodity, lets them focus on ecosystem). Meta wants intelligence to be free infrastructure (keeps it commodity, prevents competitors from building moats there).

Both strategies treat intelligence as commodity. The difference is how they achieve that outcome.

Pattern Across Big Tech

Amazon is doing the Apple strategy at scale. AWS Bedrock hosts everyone's models - Claude, Llama, Cohere, their own Titan. They don't care who wins the model race because infrastructure is their moat.

Google is the only one playing both sides profitably. They sell to Apple ($1B/year), power their own products, AND offer Vertex AI to enterprises. But even Google's strategy depends on their search monopoly - the intelligence itself is just one revenue stream. To some extent, they are also thinking like Meta.

Companies that RENT to keep intelligence commodity:

  • Apple (ecosystem lock-in is the moat)
  • Amazon (infrastructure dominance is the moat)
  • Enterprise SaaS with strong moats (Salesforce, Adobe)

These companies WANT intelligence to be rented commodity infrastructure. It protects their actual moats.

Companies that BUILD then OPEN SOURCE to keep intelligence commodity:

  • Meta (prevent competitors from building intelligence moats)
  • Mistral (European AI sovereignty positioning)

These companies spend billions building, then give it away to prevent moats from forming in the intelligence layer.

Companies trying to BUILD moats IN intelligence:

  • OpenAI (model quality as primary differentiation)
  • Anthropic (model quality + safety positioning)


Ecosystem Defense Through Rental

Apple's strategy is more sophisticated than just "rent the AI." They're using rented intelligence to strengthen their ecosystem while avoiding the sunk cost trap that kills tech giants.

The playbook:

  1. Rent frontier models to ship competitive AI features fast
  2. Build the integration layer in-house (Private Cloud Compute, iOS hooks)
  3. Own the user relationship and trust (privacy positioning)
  4. Let model providers compete for their business
  5. Switch providers when someone gets better

Every AI feature makes iOS more valuable. Every AI integration makes it harder to leave Apple's ecosystem. But none of it requires winning the model training arms race.

Compare Meta's position. They spend billions on Llama to:

  1. Prevent Google/OpenAI from building intelligence moats
  2. Protect their ad business from AI disruption
  3. Maybe get PR credit for "open source leadership"

One company strengthens their moat. The other desperately tries to prevent competitors from building one.

Story from the Model Builders

The rent and open-source strategies aren't interchangeable - they're optimized for different strategic contexts.

Companies trying to build proprietary moats IN intelligence itself (OpenAI, Anthropic) are betting against both Apple AND Meta's preferred outcome.

If either Apple or Meta succeeds in keeping intelligence commodity - whether through competitive rental markets or open source proliferation - then intelligence itself can't be a sustainable moat.

The smartest strategic question isn't "should I build or rent intelligence?"

It's "where do I want moats to form, and does my intelligence strategy support or undermine that goal?"

Thursday, 15 January 2026

Audacious Plan to Train AI in Space

What happens when humanity's appetite for artificial intelligence outpaces our planet's ability to feed it? Someone decided the answer was "leave the planet."

In December 2025, something delightfully absurd happened 325 kilometers above Earth. A 60-kilogram satellite named Starcloud-1, carrying an NVIDIA H100 GPU, trained an AI model on the complete works of Shakespeare. Model learned to speak Shakespearean English while orbiting our planet at 7.8 kilometers per second.

"To compute, or not to compute"—apparently, in space, the answer is always "compute."

This wasn't a publicity stunt. It was a proof of concept for what might be the boldest infrastructure play in the history of computing: moving AI training off Earth entirely.

First reaction: "This is insane. I should buy more chip stocks!"

Second reaction, after reading the physics: "Wait, this might actually work."





Dirty Secret Nobody Wants to Talk About at AI Conferences

Here's something the AI industry prefers to whisper about over drinks rather than announce on keynote stages: we're running out of power. Not in some distant climate-apocalypse scenario. Now. Today. While you're reading this.

The numbers read like a horror story for grid operators:

Data centers consumed approximately 415 terawatt-hours of electricity globally in 2024—roughly 1.5% of all electricity generated on Earth. By 2030, that figure is projected to more than double to 945 TWh. That's Japan's entire annual electricity consumption. For computers. Training models to argue about whether a hot dog is a sandwich.

Virginia's data centers alone consume 26% of the state's electricity. Imagine explaining to your neighbors that their brownouts are because someone needed to train a chatbot to write better cover letters.

But here's where it gets truly uncomfortable: to train the next generation of frontier models—think GPT-6 or whatever Claude's grandchildren will be called—we'll need multi-gigawatt clusters. 

5 GW data center would exceed the capacity of the largest power plant in the United States. These clusters don't exist because they can't exist with current terrestrial infrastructure.

Breakthrough is not coming from terrestrial solar panels, and fusion reactors are 10 to 20+ years away.

It might come from the one place where solar works really, really well.

325 kilometers straight up.

The Physics of Space: Nature's Cheat Codes

Starcloud's white paper makes a case that initially sounds like venture capital science fiction. But then you check the physics, and... huh. It actually works. Let me break down, as per the paper, why space is basically running a different game engine than Earth.

Cheat Code #1: Infinite Solar Energy (Seriously)

Solar panels in Earth orbit receive unfiltered sunlight 24/7. No atmosphere absorbing photons. No weather. No pesky night cycle if you pick the right orbit. A dawn-dusk sun-synchronous orbit keeps a spacecraft perpetually riding the terminator line between day and night—eternal golden hour, but for electricity.

The capacity factor of space-based solar exceeds 95%, compared to a median of 24% for terrestrial installations in the US. The same solar array generates more than 5x the energy in orbit that it would on your roof. That's the kind of multiplier we usually only get from Claude Code :-)
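The "more than 5x" figure can be sanity-checked with a quick sketch. The assumptions here are mine, not quoted from the white paper: orbital panels see the full ~1361 W/m² solar constant, while terrestrial panels are rated against the ~1000 W/m² standard-test-condition irradiance, and that ratio stacks on top of the capacity-factor gap.

```python
# Sanity check of the "more than 5x" orbital solar claim.
# Assumptions (mine, not the paper's): AM0 irradiance vs the 1000 W/m^2
# standard used to rate panels on Earth, plus the stated capacity factors.
SOLAR_CONSTANT = 1361.0   # W/m^2 above the atmosphere (AM0)
STC_IRRADIANCE = 1000.0   # W/m^2 standard test conditions on Earth

cf_space = 0.95           # capacity factor, dawn-dusk sun-sync orbit
cf_ground = 0.24          # median US terrestrial capacity factor

hours_per_year = 8760

# Annual energy per kW of rated panel capacity (kWh per kW-year)
ground_kwh = cf_ground * hours_per_year
space_kwh = cf_space * hours_per_year * (SOLAR_CONSTANT / STC_IRRADIANCE)

print(f"ground: {ground_kwh:.0f} kWh/kW-yr")
print(f"space:  {space_kwh:.0f} kWh/kW-yr")
print(f"ratio:  {space_kwh / ground_kwh:.1f}x")
```

Under these assumptions the ratio comes out around 5.4x, consistent with the paper's "more than 5x" claim.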


Cheat Code #2: The Universe's Free Air Conditioning

Deep space is cold. Like, really cold. The cosmic microwave background sits at approximately -270°C. A simple black radiator plate held at room temperature will shed heat into that infinite cold at approximately 633 watts per square meter.

The cooling playbook is very different on Earth versus in space.

Earth Cooling:

Evaporative cooling towers consuming billions of gallons of water. Chillers running 24/7. Microsoft literally sinking servers in the ocean like some kind of tech burial at sea.

Space Cooling:

Point a black plate at the void. Wait. Physics does the rest. No water. No chillers. Just thermodynamics being thermodynamic.
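That per-square-meter figure is easy to check against the Stefan-Boltzmann law. A sketch, under idealized assumptions of my own (blackbody emissivity of 1, a single radiating face, the 2.7 K microwave background as the heat sink): a 20°C plate sheds about 420 W/m², and the quoted ~633 W/m² works out to a plate running around 52°C.

```python
# Back-of-envelope radiative cooling via the Stefan-Boltzmann law:
#   P = eps * sigma * (T^4 - T_space^4)   [W per m^2 of radiating face]
# Assumptions (mine): eps = 1 (ideal black plate), one radiating face,
# T_space = 2.7 K (cosmic microwave background).
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W/(m^2 K^4)
T_SPACE = 2.7           # K

def radiated_w_per_m2(t_kelvin: float, eps: float = 1.0) -> float:
    return eps * SIGMA * (t_kelvin**4 - T_SPACE**4)

room = radiated_w_per_m2(293.15)   # 20 C plate
warm = radiated_w_per_m2(325.0)    # ~52 C plate, close to the ~633 figure

# Radiator area needed to reject a 40 MW module's heat at that rate
area_m2 = 40e6 / warm

print(f"20 C plate: {room:.0f} W/m^2")
print(f"52 C plate: {warm:.0f} W/m^2")
print(f"40 MW needs ~{area_m2:,.0f} m^2 of radiator")
```

At ~633 W/m², rejecting a full 40 MW module's heat takes roughly 63,000 m² of radiator, about nine soccer fields of black plate pointed at the void.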


Cheat Code #3: No Land Law in Orbit

Perhaps the most underrated advantage. On Earth, large-scale energy and infrastructure projects routinely take a decade or more to complete due to environmental reviews, utility negotiations, zoning battles, and that one guy at every town hall meeting who's convinced 5G causes migraines.

In space? You dock another module and keep building. When xAI had to resort to natural gas generators for their Memphis cluster because the grid wasn't ready, they weren't just solving a technical problem—they were demonstrating the bureaucratic fragility of terrestrial infrastructure.

What Does the Cost Math Look Like?

Starcloud's white paper presents this comparison for a 40 MW data center operated over 10 years:

Terrestrial 10-Year Cost:
Energy: ~$140M (@ $0.04/kWh)
Land, permits, cooling infrastructure
Water, maintenance, grid upgrades
Total: ~$167 million+

Space 10-Year Cost:
Solar array: ~$2M
Launch: ~$5M (next-gen vehicles)
Radiation shielding: ~$1.2M
Total: ~$8 million

That's a 20x difference, driven almost entirely by energy costs.
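The headline numbers above are easy to reproduce. A sketch, assuming the module draws its full 40 MW around the clock for ten years; the per-item space costs are taken as given from the comparison:

```python
# Reproducing the terrestrial energy line item and the overall ratio.
# Assumption (mine): the 40 MW module runs at full draw continuously.
power_kw = 40_000           # 40 MW
hours = 10 * 8760           # 10 years of operation
price_per_kwh = 0.04        # $/kWh

energy_cost = power_kw * hours * price_per_kwh
print(f"terrestrial energy: ${energy_cost / 1e6:.0f}M")

# Space-side line items from the white paper's comparison
space_total = 2e6 + 5e6 + 1.2e6   # array + launch + shielding
print(f"space total: ${space_total / 1e6:.1f}M")
print(f"ratio vs ~$167M terrestrial total: ~{167e6 / space_total:.0f}x")
```

The energy line item alone comes to $140M, which is why the ratio is dominated almost entirely by electricity rather than by anything exotic about orbit.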

Now, before you start a space data center SPAC, let's be honest about what this analysis conveniently ignores: the actual compute hardware. 40 MW of GPU capacity costs somewhere in the neighborhood of $12-13 billion. That's... a lot of billions.

But here's the thing: you pay for that hardware whether it's sitting in a concrete bunker in Iowa or floating above the atmosphere. The operational cost delta remains. And as models get larger and training runs stretch from weeks to months, that delta compounds like the most patient venture capitalist in history.

What This Means for Everyone Betting Billions on Ground Data Centers

If orbital data centers become economically viable at scale, three things change.

Hyperscaler Dilemma

Microsoft, Google, Amazon, and Meta have collectively committed over $200 billion in capital expenditure on terrestrial data center infrastructure. These are sunk costs with multi-decade payback periods. Do they pivot to space and write off billions? Do they wait and risk being leapfrogged? The prisoner's dilemma dynamics here are brutal. Someone will defect first.

Sovereign AI Gets Complicated

Countries racing to build domestic AI capabilities have assumed the limiting factor is talent and chips. If it turns out the limiting factor is energy, and the solution is orbital infrastructure, the competitive landscape shifts dramatically. Quick: who controls orbital launch capacity? Who can deploy and maintain space-based infrastructure? These aren't questions most national AI strategies have seriously considered.

Environmental Narrative Flips

Right now, AI's carbon footprint is a vulnerability—a PR problem and increasingly a regulatory target. Orbital data centers, powered entirely by solar energy and requiring no water for cooling, transform AI infrastructure from environmental liability to potential climate solution. That's a narrative shift worth billions in avoided regulatory friction alone.


Design Principles That Actually Matter

What makes Starcloud's approach interesting isn't just "put computers in space"—it's how they're thinking about building something that can survive and scale in a hostile environment. 

There's some genuine distributed systems wisdom here:

Modularity: Everything is designed to be added, replaced, or abandoned independently. No single-point-of-failure architecture. This is microservices thinking applied to hardware, which is either brilliant or terrifying depending on your ops experience.

Incremental Scalability: You don't build a 5 GW space station and pray it works. You launch 40 MW modules, validate they function, scale up. It's the same philosophy that made AWS successful: don't bet everything on one deployment.

Failure Resiliency: In space, you can't send a technician. Components will fail. The system has to route around damage like the internet was originally designed to route around nuclear attacks. Graceful degradation isn't optional—it's existential.

Ease of Maintenance: Or rather, the complete absence of it. Everything has to be either radiation-hardened enough to outlast its usefulness, or cheap enough to abandon. There's no middle ground.