Thursday, 13 November 2025

Agentic Commerce War: A Car on Bumpy Roads

There's a lawsuit that should make every engineer building AI applications pause and think carefully about the world they're creating. Amazon is suing Perplexity AI, and while the legal complaint talks about "covert access" and "computer fraud," what's really happening is far more interesting: we're watching the first shots fired in a war over who gets to control the future of commerce.

"AI revolution" in shopping is probably going to make markets less competitive, not more. Let me explain why.




Pattern-Matching Disguised as Innovation



We've seen this movie before. In the 2000s, it was about who controls app distribution. Apple and Google built "open" platforms, welcomed developers, then extracted 30% rent from everyone. In the 2010s, it was about who controls attention. Facebook and Google became the gatekeepers to your customers, then jacked up ad prices once you were dependent on them.

Now we're in the 2020s, and the game is about who controls shopping intent. The technology changed—from apps to ads to AI agents—but the fundamental power dynamics remain depressingly familiar.

Agentic commerce is the fancy term for AI systems that can shop on your behalf. Tell ChatGPT you need running shoes under $100, and it searches stores, compares options, and completes the purchase. No browsing, no clicking through pages, no "adding to cart." The AI does it all.

McKinsey forecasts this could generate $1 trillion in global commerce by 2030. Traffic to U.S. retail sites from GenAI browsers has already jumped severalfold year over year in 2025.

Amazon vs. Perplexity: A Case Study in Platform Power

Here's what actually happened, stripped of the legal jargon:

November 2024: Amazon catches Perplexity using AI agents to make purchases through Amazon accounts. They tell Perplexity to stop. Perplexity agrees.

July 2025: Perplexity launches "Comet," their AI browser that can shop for you. Price tag: $200/month.

August 2025: Amazon detects Comet's agents are back, but this time they're disguised as Google Chrome browsers. Amazon implements security measures to block them.

Within 24 hours: Perplexity releases an update that evades Amazon's blocks.

November 2025: Amazon files a federal lawsuit accusing Perplexity of violating the Computer Fraud and Abuse Act. Perplexity publishes a blog post titled "Bullying is Not Innovation."

Now, you might think this is about security or customer protection. And sure, those are real concerns—when AI agents access customer accounts, make purchases, and handle payment data, security matters enormously.

But let's be honest about what's actually happening here: Amazon is defending its moat.




Amazon built a trillion-dollar business by owning the customer relationship. They know what you buy, when you buy it, how much you're willing to pay, and what you'll probably want next. This data advantage is what makes Amazon Rufus (their own shopping agent) dangerous to competitors—it already knows you better than any third-party agent ever could.

If Perplexity's agents can freely roam Amazon's platform, comparison-shop ruthlessly, and complete purchases without Amazon controlling the experience, then Amazon loses three critical things:

  1. The ability to show you ads for products they want you to buy
  2. The ability to promote their own private-label brands
  3. The data about what AI-assisted shopping actually looks like

This is Amazon's "app store moment." And they learned from Apple: if you're going to allow third parties to build on your platform, you need to control who gets access and extract rent from those you approve.

Architecture of Control: How This Actually Works

Let's talk about the technical stack for a moment, because this is where it gets interesting from an engineering perspective.

The Five-Layer Problem

Layer 1: Consumers delegate shopping tasks to AI agents, often paying $20-200/month for the privilege.

Layer 2: AI Agents (ChatGPT Operator, Perplexity Comet, Amazon Rufus) search, compare, and transact on your behalf.

Layer 3: Trust & Payment Infrastructure (Visa, Mastercard, Stripe) verify agent identity and process payments.

Layer 4: Platform Gatekeepers (Amazon, Google, Apple) control access to inventory and customer data.

Layer 5: Merchants & Brands fulfill orders and watch their margins compress.

Where does power concentrate? Not at the AI layer where everyone's focused, but at Layer 3 (payments) and Layer 4 (platform gatekeepers).




Why Payments Matter More Than You Think

Visa and Mastercard are quietly positioning themselves as the critical trust infrastructure for agentic commerce. They're partnering with Cloudflare to implement Web Bot Auth—a cryptographic authentication protocol that lets merchants verify which AI agents are legitimate.

Think about the implications: if every agentic transaction must flow through payment network authentication, then Visa and Mastercard become the gatekeepers of which agents can transact at all. They've turned themselves into the identity verification layer for AI agents, which means they can collect tolls on the entire ecosystem.
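To make this concrete, here is a minimal sketch of what merchant-side agent verification could look like, assuming agents register Ed25519 public keys and sign each request. The key registry and the signed content here are simplified stand-ins of my own; the actual Web Bot Auth proposal builds on HTTP Message Signatures and is considerably more involved.

java
import java.nio.charset.StandardCharsets;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;
import java.util.Map;

public class AgentSignatureVerifier {

    // Hypothetical registry: agent identifier -> registered public key.
    private final Map<String, PublicKey> trustedAgentKeys;

    public AgentSignatureVerifier(Map<String, PublicKey> trustedAgentKeys) {
        this.trustedAgentKeys = trustedAgentKeys;
    }

    // True only if the request was signed by the key registered for the
    // claimed agent. Unknown agents can be blocked or challenged.
    public boolean isLegitimateAgent(String agentId, String signedContent,
                                     String signatureBase64) throws Exception {
        PublicKey key = trustedAgentKeys.get(agentId);
        if (key == null) {
            return false; // unregistered agent
        }
        Signature verifier = Signature.getInstance("Ed25519"); // JDK 15+
        verifier.initVerify(key);
        verifier.update(signedContent.getBytes(StandardCharsets.UTF_8));
        return verifier.verify(Base64.getDecoder().decode(signatureBase64));
    }
}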

This is a brilliant infrastructure play. While everyone's fighting over the AI layer, the payment networks are becoming the new platform.

The Security Nightmare Nobody Wants to Talk About

Here's the thing that keeps security engineers up at night: traditional fraud detection assumes humans are making purchases. You can look at behavioral patterns, device fingerprinting, velocity checks—all the usual signals that distinguish legitimate users from attackers.

But what happens when the "legitimate" user is an AI agent that behaves like a bot because it is a bot?

The attack surface is enormous:

  • Agent manipulation: fake listings and manipulated reviews crafted to trick AI agents into bad purchases
  • Automated account takeover: AI can run credential stuffing attacks at scale, then use compromised accounts to make "legitimate" agent purchases
  • Synthetic identity fraud: Generate deepfakes and fake identities that pass agent verification
  • Phishing at industrial scale: AI-generated personalized phishing that tricks both humans and other agents

To implement agentic commerce successfully, you need to solve a nearly impossible problem: identify agents, distinguish legitimate from malicious ones, verify consumer intent, and do all of this in real time at massive scale.

This isn't just a "hard problem"—it requires fundamentally rethinking identity, authentication, and trust in ways our current infrastructure wasn't designed for.

Legal Black Hole

The most fascinating aspect of this entire situation is that nobody knows what the law actually says about AI agents making purchases on your behalf.

Consider this scenario: Your AI agent buys you running shoes. They don't fit. Who's responsible?

  • Is the AI agent your "employee" acting under your authority?
  • Is it a contractor working for the AI company?
  • Did you actually "agree" to the purchase, or did the AI misinterpret your intent?
  • Can you return them under standard return policies, or do different rules apply?

The Uniform Electronic Transactions Act (UETA) and E-SIGN Act validate electronic signatures and contracts, but they were written assuming humans click "I agree." They don't tell us how to handle situations where an AI system makes autonomous decisions based on high-level instructions like "buy me running shoes under $100."

And it gets worse. When things go wrong—the agent buys the wrong product, accesses the wrong account, or exposes payment data—who's liable?

The legal frameworks assume someone clicked a button and agreed to terms. But with agentic AI:

  • The consumer gave high-level intent ("I need shoes")
  • The AI developer built the agent with certain objectives
  • The platform (Amazon) sets rules about what's allowed
  • The payment processor enables the transaction

When something breaks, you've got four parties pointing at each other saying "not my fault."

This isn't edge case stuff—this is the fundamental contract law question that needs answering before any of this scales. And right now? It's a complete void.

Three Scenarios

Based on the current trajectory, a few scenarios could play out:

Scenario 1: Platform Dominance (Very High Probability)

Amazon wins the lawsuit. Google, Apple, and other major platforms watch carefully and implement similar policies. The outcome:

  • Platforms allow only "approved" agents
  • Approved agents must share 15-30% revenue with platforms
  • Platforms build superior first-party agents using proprietary data
  • Market concentration increases dramatically

This is the most likely outcome because platforms hold all the leverage. They control access to inventory, customer data, and the ability to transact. If you want your AI agent to work, you play by their rules or you don't play at all.

Winner: Existing platform giants. The "disruption" looks suspiciously like the old oligopoly, just with AI agents instead of apps.

Scenario 2: Payment Network Mediation (Medium Probability)

Visa and Mastercard successfully establish themselves as neutral trust brokers. Their authentication standards become mandatory. Multiple agents can compete, but all must register with payment networks and follow their protocols.

This creates a more open ecosystem than Scenario 1, but you've still got gatekeepers—just different ones. Every transaction generates payment network fees. The rails change hands, but someone still controls the rails.

Winner: Payment networks become infrastructure monopolies. Better than platform domination, but not exactly a free market.

Scenario 3: Regulatory Intervention (Very Low Probability)

Governments step in, mandate open access standards, require algorithmic transparency, and force interoperability. The EU tries this first with AI Act enforcement.

Winner: Consumers and smaller players benefit from enforced competition.

Reality check: Given current U.S. regulatory momentum and the fact that legal frameworks are years behind AI development, this seems highly unlikely. The platforms are moving too fast, and regulators are too slow.

Why This Probably Makes Markets Less Competitive

Here's the uncomfortable truth: despite all the talk about AI "democratizing" commerce and creating more efficient markets, the likely outcome is increased market concentration.

Why? 

Trust Concentrates Around Scale

When AI agents are making autonomous purchases with your money, you need to trust them completely. That trust is hard to build and easy to destroy. Large, established players like Amazon can credibly say "we've processed billions of transactions, here's our security track record."

A startup building a shopping agent? Much harder sell. The trust moat actually gets deeper, not shallower.

Data Moats Become Walls Too High to Jump

The best shopping agent needs to know:

  • Your purchase history
  • Your preferences and budget
  • Your calendar and schedule
  • Your payment methods and addresses
  • Context about why you're shopping

Amazon already has all of this. Google has most of it. A third-party agent has... whatever you manually tell it.

This isn't a gap you can close with "better AI." It's a fundamental data disadvantage that compounds over time.

Network Effects Intensify

Just as traditional commerce requires an ecosystem (platforms, payment processors, logistics, fraud prevention), agentic commerce needs an even more complex interconnected system. The platforms that can bundle these services—authentication, payments, fulfillment, customer service—win by default.

It's the AWS playbook: provide the full stack, make integration seamless, and competitors can't match the convenience.

Power to Block Is Power to Control

This is the key insight from the Amazon-Perplexity fight: if platforms can simply block agents they don't like, then innovation requires permission.

Want to build a revolutionary shopping agent? Great. But if Amazon, Google, and Walmart all block you, your revolutionary agent can't access any inventory. You've built a car with no roads to drive on.

The platforms learned from the app store wars: let a thousand flowers bloom, then harvest the ones that matter.

What This Means for Engineers Building AI Applications

If you're working on AI agents, here's what you need to understand:

Platform Risk Is Your Existential Risk

Don't build on platforms you don't control unless you have explicit agreements in place. The terms of service you're operating under were written before agentic AI existed, and platforms can change the rules whenever they want.

Perplexity is learning this the hard way. They built a business model that required access to Amazon's platform, then discovered Amazon could just say "no."

The Liability Problem Won't Solve Itself

Right now, there's massive ambiguity about who's responsible when AI agents screw up. This ambiguity is risk for everyone in the stack. You need to:

  • Get explicit terms in writing about agent behavior and limits
  • Build audit trails for every decision your agent makes (a minimal record sketch follows this list)
  • Have clear escalation paths when things go wrong
  • Understand you're probably liable for your agent's actions, even if that's not fair
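To illustrate the audit-trail point, a minimal per-decision record might look like the sketch below. The fields are my assumptions about what a later dispute would require, not any standard schema.

java
import java.time.Instant;

// One append-only entry per agent decision: enough to reconstruct who
// asked for what, what the agent did, and why it thought that was right.
public record AgentAuditEvent(
        String agentId,          // which agent acted
        Instant timestamp,       // when it acted
        String userInstruction,  // the high-level intent the user gave
        String actionTaken,      // e.g. "PURCHASE sku=... price=89.99"
        double confidence,       // the agent's own confidence score
        String rationale         // the agent's explanation, stored verbatim
) {}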

Security Can't Be an Afterthought

The threat model for agentic commerce is genuinely novel. You can't just apply traditional bot detection because legitimate agents are bots. You need:

  • Cryptographic agent authentication (like Web Bot Auth)
  • Behavioral anomaly detection that works for non-human actors
  • Multi-party verification for high-value transactions
  • Fallback to human-in-the-loop when confidence is low

This is hard, expensive, and essential. The first major security breach involving agent-based shopping will tank consumer trust in the entire category.

Think in Systems, Not Just Models

The failure mode for agentic commerce isn't "the AI makes a mistake." It's "the AI makes a reasonable-seeming decision based on incomplete data, which cascades into a mess of returns, chargebacks, and customer service nightmares."

Good agentic systems need:

  • Clear boundaries on what decisions they can make autonomously
  • Confidence thresholds that trigger human review (sketched in code below)
  • Graceful degradation when uncertain
  • Mechanisms for users to understand and override decisions
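Here is a minimal sketch of what those boundaries and thresholds can look like in code. The limits are invented for illustration; a real system would tune them per user and per product category.

java
// Gate every agent decision through explicit autonomy boundaries.
public class AutonomyGate {

    private final double minConfidence;   // e.g. 0.90
    private final double maxAutoSpendUsd; // e.g. 100.00

    public AutonomyGate(double minConfidence, double maxAutoSpendUsd) {
        this.minConfidence = minConfidence;
        this.maxAutoSpendUsd = maxAutoSpendUsd;
    }

    public enum Outcome { EXECUTE, ESCALATE_TO_HUMAN }

    public Outcome decide(double confidence, double priceUsd) {
        // Low confidence or high stakes: degrade gracefully to a human.
        if (confidence < minConfidence || priceUsd > maxAutoSpendUsd) {
            return Outcome.ESCALATE_TO_HUMAN;
        }
        return Outcome.EXECUTE; // within the agent's autonomous boundary
    }
}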

This is systems engineering, not just prompt engineering.

So Who Wins?

If you're asking "who wins the agentic commerce war," here's my read:

Tier 1: Platform Oligarchs (Amazon, Google, Apple) - They control access to inventory and customers. They can block competitors and extract rent from those they allow. Amazon's lawsuit against Perplexity is them establishing this reality.

Tier 2: Payment Networks (Visa, Mastercard) - Becoming the critical trust infrastructure. Every transaction flows through them, and they're setting authentication standards for the entire ecosystem.

Tier 3: AI Insurgents (OpenAI, Perplexity, Anthropic) - High risk, high reward. They have the AI capabilities and consumer mindshare, but they need platform access to deliver value. Many will get squeezed or forced into revenue-sharing deals.

The Losers: Traditional retailers and brands who get reduced to "background utilities" in agent-controlled marketplaces. TripAdvisor's traffic is already down 30%; AllRecipes has lost 15%. These are the canaries in the coal mine.

The uncomfortable parallel: this is the app store model all over again. Platforms create "open" ecosystems, welcome innovation, then monetize, control, and eventually squeeze everyone building on top.

In five years, we'll have agentic commerce. But it will likely be dominated by 3-5 massive platforms that control access, set standards, and extract rent. The "revolution" will look suspiciously like the old regime—just with better AI.

The Bottom Line

Agentic commerce is coming whether we're ready for it or not. The technology works, the market opportunity is massive, and the big platforms are already building it.

But let's not fool ourselves about what we're building. This isn't some perfect future where AI agents create perfect market efficiency and infinite consumer choice. It's a new battleground for the same old fight: who gets to control access to customers, and who gets to extract rent from transactions.

Amazon is suing Perplexity because they understand what's at stake. This isn't about "covert access" or "customer security"—those are the legal justifications. The real fight is about whether Amazon gets to control agentic commerce the same way Apple controlled app distribution and Google controlled digital advertising.

And based on history, platform power, and the economics of trust at scale, they probably will.

We've seen this movie before. The technology is new, but the plot is depressingly familiar.


Friday, 24 October 2025

Memorization Machine: Why AI Coding Agents Aren't Really Programming Yet

This post continues my exploration of coding-agent tools and shares more hands-on experience; you can read my older posts for background.



If you believe the internet, AI has essentially solved programming. Every day brings new viral videos of LLMs building complete applications in minutes, fixing complex bugs instantly, and dramatically boosting developer productivity. The narrative is clear: AI coding agents are revolutionizing software development.

But there's a fundamental truth being obscured by all the hype: current AI coding agents are sophisticated memorization machines, not genuine programmers. And understanding this distinction explains both their impressive capabilities and their critical limitations.

Programming as Crystallized History

Here's an insight that might seem obvious once stated but has profound implications: programming is fundamentally built on accumulated patterns and historical knowledge. Every framework, algorithm, and design pattern represents decades of collective problem-solving. When we write code, we're rarely inventing something truly novel—we're recombining established solutions in contextually appropriate ways.

This makes programming uniquely suited to pattern-matching systems like LLMs. Unlike fields requiring real-time sensory input or physical manipulation, programming creates an enormous corpus of documented solutions, discussions, and examples. Stack Overflow, GitHub, documentation sites, and millions of code repositories form a vast memory bank that LLMs can internalize during training.


The memorization works at multiple levels:

  • Syntactic patterns: How code is structured
  • Semantic patterns: What code means in context
  • Pragmatic patterns: How code is actually used
  • Meta-patterns: Common problem-solving approaches

Why LLMs Excel at Coding (Within Limits)

This memorization-based architecture explains why LLMs punch above their weight in programming tasks:

Pattern Density: Code has extraordinarily high pattern density. The same structures appear repeatedly across different contexts, creating clear patterns for memorization.

Explicit Structure: Programming languages have formal syntax and semantics, making patterns more distinct and recognizable than natural language.

Solution Reusability: Most programming problems are variations of previously solved problems. An LLM that has "memorized" solutions can adapt them to new contexts with surprising effectiveness.

Rich Training Data: The internet contains millions of code examples with explanations, making it possible for LLMs to learn not just syntax but usage patterns and common approaches.

This is genuinely impressive! Sophisticated pattern matching can solve a remarkable range of programming tasks. But it's not the same as genuine programming expertise.





The Wall: Where Memorization Breaks Down

Real programming requires capabilities that pattern matching simply cannot provide:

System-Level Thinking

Great programmers understand how code fits into larger systems, considering performance, maintainability, security, and business constraints simultaneously. They think architecturally, not in isolated snippets.

LLMs can generate architecturally-sound code patterns they've memorized, but they can't make real architectural trade-offs based on your specific traffic patterns, team structure, regulatory requirements, or budget limitations.

Long-Term Reasoning

Professional programming means writing code thinking about how it will be maintained, modified, and scaled over months or years. It requires understanding technical debt, anticipating future requirements, and building for evolution.

LLMs have no persistent understanding. Each interaction is essentially fresh—they can't build up knowledge of a codebase over time like human programmers do.

Context Beyond Code

Real programming involves understanding business requirements, user needs, team capabilities, and reading between the lines of incomplete specifications.

LLMs work with the text you give them. They can't interview stakeholders, understand implicit requirements, or navigate organizational politics that shape technical decisions.

Novel Problem Solving

When you encounter a problem that doesn't match memorized patterns—a genuinely novel requirement, an unusual constraint, or an emerging technology—memorization-based systems struggle.

They can combine patterns creatively, but they can't reason from first principles or develop entirely new approaches.





The Real Test: Maintenance Programming

The biggest gap becomes obvious when you move beyond initial implementation to maintenance programming:

A viral demo might show: "I built a complete e-commerce site in 30 minutes!"

What's not shown:

  • No authentication system worth deploying
  • No error handling for edge cases
  • No data validation or security considerations
  • No scalability planning
  • No testing strategy
  • Breaks on inputs the demo didn't consider
  • Requires extensive refactoring for production

This is the difference between code that runs initially and code that serves a business reliably for years.

Experienced programmers understand why code evolved the way it did, recognize technical debt patterns, assess risks of changes, and make judgment calls about when to refactor versus work around issues. These capabilities come from experiential learning that memorization cannot replicate.

So Why All the Success Stories?

If current AI coding agents are fundamentally limited, why is the internet overflowing with success stories? The disconnect is real and worth understanding:

The Demo Problem

Success stories showcase clean, isolated problems with clear specifications: "Build a todo app," "Implement quicksort," "Create a REST API endpoint."

They don't showcase real programming work: debugging memory leaks in 500K line codebases, integrating with undocumented legacy systems, refactoring critical infrastructure without breaking anything, or making architectural decisions under business pressure.

Cherry-Picking at Scale

Even if LLMs only work impressively 10% of the time, that's still thousands of viral examples from millions of attempts. Hundreds of failures disappear without a trace.



Economic Incentives

Companies promoting these tools have billions of dollars at stake. OpenAI, Microsoft, Google, and countless startups need to demonstrate transformative results to justify valuations and drive adoption.

"Our AI can replace junior developers" sells better than "Our AI can help with boilerplate code sometimes."

The Experience Gap

Beginners are amazed by any working code generation and often can't assess code quality deeply. Experts notice subtle issues, architectural problems, and maintenance nightmares—but most viral content comes from impressed beginners.

This creates a distortion where surface-level success gets amplified while deeper limitations remain hidden until you actually try to use AI-generated code in production.

Definition Games

What counts as "success"? Code that runs initially or code that's maintainable? Solving toy problems or real business challenges? Individual productivity or team productivity? Short-term output or long-term quality?

The goalposts keep moving. As AI gets better at basic tasks, "success" shifts to whatever AI can currently do.


The Uncanny Valley of AI Programming

This creates an uncanny valley effect. AI coding agents seem very capable on surface-level tasks but fail unpredictably on deeper challenges. They can write code that looks professional but may have subtle issues that only become apparent months later.




What This Means Practically

This isn't an argument against using AI coding tools—I use them regularly and find them valuable. But it's crucial to understand their nature and limitations:

Use AI coding agents as powerful assistants, not autonomous programmers. They excel at accelerating experienced developers but can't replace the deep thinking that programming requires.

Trust but verify. AI-generated code needs careful review by someone who understands the broader context. The code might work in isolation but introduce problems in your specific system.

Focus on the right problems. AI tools shine on well-defined, pattern-matching tasks. They struggle with ambiguous requirements, system design, and novel challenges.

Beware the productivity illusion. Writing code faster doesn't mean programming faster if that code requires extensive debugging and refactoring later.


Looking Forward




For AI to move beyond sophisticated memorization to genuine programming capability, systems would need the capacities this post has argued are missing:

  • Persistent, evolving knowledge of a codebase over months and years
  • The ability to make real architectural trade-offs under business constraints
  • Context beyond the text they are given: requirements, users, and team dynamics
  • First-principles reasoning for genuinely novel problems

We're not there yet. Current AI coding agents are impressive pattern-matching systems that happen to work well in a domain built on historical patterns. That's genuinely useful, but it's not the same as artificial programming intelligence.

The memorization foundation makes them brittle in exactly the situations where you need the most help—the complex, ambiguous, high-stakes decisions that define excellent programming.

Wednesday, 23 July 2025

Impossible Dream: How Facebook's $85 Server Conquered Half the Planet

A story of engineering excellence from the age before clouds, and the distributed computing lessons that changed everything




Chapter 1: The Dorm Room That Shook the World

It was February 4th, 2004, and Mark Zuckerberg had a problem. His $85-per-month server was melting.

Twenty-four hours earlier, he'd launched "The Facebook" from his Harvard dorm room—a simple PHP application running on a basic LAMP stack that any college student could understand. Now, 1,200 Harvard students were frantically refreshing pages, checking profiles, and poking each other with an enthusiasm that was literally breaking the internet.

"We need more servers," someone said, watching the CPU usage spike to 100% and stay there.

But Zuckerberg's roommate, a computer science student, had a different idea. "What if we don't need bigger servers? What if we need smarter architecture?"

And with that question, one of the greatest distributed computing adventures in history began.

The First Distributed Computing Lesson: When you can't scale up, scale out—but do it intelligently.


Chapter 2: The University Aha Moment

By March 2004, Facebook was expanding to other Ivy League schools. The obvious solution was to throw all users into one massive database and hope for the best. But the engineering team noticed something fascinating: Harvard students mostly talked to other Harvard students. Yale students connected with Yale students. Princeton was its own social universe.

"Why are we fighting human nature?" asked one engineer, staring at server logs. "Let's work with it instead."

They made a decision that would echo through distributed computing history: separate database instances for each university. Not because it was technically elegant, but because it matched how humans actually behaved.

The results were magical. Database queries that once crawled across massive datasets now zipped through smaller, focused collections. Server load distributed naturally. The architecture scaled not through brute force, but through understanding.

Months later, when Facebook was serving dozens of universities with blazing speed, that engineer would realize they'd stumbled upon something profound: the best distributed systems aren't the ones that fight reality—they're the ones that embrace it.

The Second Distributed Computing Lesson: Design your data architecture around user behavior patterns, not technical elegance.


Chapter 3: The Sharding Revolution

By 2005, Facebook faced its first existential crisis. Students were no longer staying within their university bubbles. They wanted to connect with high school friends at different colleges, summer camp buddies scattered across the country, and family members everywhere.

The beautiful university-based system was crumbling.

"We need to completely rethink databases," announced the head of engineering in a meeting that would last many hours. "If we can't keep users separated by school, we'll separate them by... randomness."

The room fell silent. Random database sharding? It sounded insane.

But they were desperate, and desperate times call for revolutionary thinking. They embarked on the most audacious database experiment of the early internet: random sharding across thousands of MySQL instances, with shard IDs embedded in every piece of content.

The catch? They had to eliminate cross-database JOINs entirely. Every query that once elegantly connected related data across tables now had to be redesigned from scratch.
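The core trick is easy to sketch. Below is a hedged illustration of shard-ID embedding with an invented shard count and encoding; Facebook's real scheme differed, but the principle is the same: any object ID tells you which shard owns it, with no directory lookup.

java
public final class ShardedId {

    private static final long NUM_SHARDS = 10_000; // illustrative constant

    // Pack shard + per-shard sequence number into one 64-bit ID.
    public static long encode(long shardId, long localId) {
        return localId * NUM_SHARDS + shardId;
    }

    // Any tier can recover the owning shard from the ID itself.
    public static long shardOf(long objectId) {
        return objectId % NUM_SHARDS;
    }
}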

"It's like rebuilding a skyscraper while people are living in it," muttered one engineer, refactoring the friends system for the hundredth time.

But it worked. By 2009, Facebook was processing 200 billion page views monthly using this revolutionary approach. They'd proven that with enough engineering creativity, traditional databases could support planet-scale applications.

The Third Distributed Computing Lesson: When existing technologies don't fit your scale, don't accept their limitations—reimagine them entirely.

Chapter 4: The Great Cache Stampede

2007 brought a new nightmare: the cache stampede.

Picture this: A popular piece of content expires from cache simultaneously across thousands of servers. Suddenly, thousands of database queries slam the backend all at once, creating a cascading failure that brings down the entire system.

"It's like everyone in a theater trying to exit through the same door,"  They needed traffic control.

Enter the "leases" system—one of the most elegant solutions in caching history. When cache data expired, only one server got permission (a "lease") to fetch fresh data from the database. Everyone else waited patiently for the result.

But that was just the beginning. Facebook's caching infrastructure became a marvel of distributed computing:

  • 1 billion cache requests per second flowing through custom-optimized memcached
  • 521 cache lookups for an average page load, orchestrated with surgical precision
  • UDP optimization that squeezed every microsecond of performance from the network
  • Regional cache pools that shared data across continents

One engineer summed it up perfectly: "We didn't just build a cache. We built the world's largest memory bank, and every byte in it had to be exactly where it needed to be, exactly when it needed to be there."

The Fourth Distributed Computing Lesson: At planet scale, caching isn't optimization—it's fundamental architecture.

Chapter 5: The PHP Impossibility

2008 brought the PHP crisis.

Facebook was serving hundreds of millions of users with a programming language that parsed and executed code from scratch on every single page request. It was like rebuilding a car engine every time you wanted to drive to the grocery store.

"PHP is killing us," said someone, staring at server utilization charts.

Traditional solutions like opcode caching helped, but Facebook needed something revolutionary. What they came up with sounded like science fiction: compile PHP to C++ and then to native machine code.

The HipHop project started as a weekend hackathon experiment. "Let's see if we can make PHP as fast as C++," said one engineer, probably not realizing they were about to rewrite the rules of web development.

The results defied belief:

  • 50% CPU reduction immediately upon deployment
  • 90% of Facebook's traffic running on their custom PHP compiler by 2010
  • 70% more traffic served on the same hardware

They had essentially created a new programming language that looked like PHP but performed like compiled code. It was engineering audacity at its finest.

The Fifth Distributed Computing Lesson: Don't let programming language limitations define your performance ceiling—rewrite the language if you have to.


Chapter 6: Impossible Geography

As Facebook exploded globally, they faced a puzzle that kept engineers awake at night: How do you serve users in Japan, Brazil, and Germany with the same millisecond responsiveness as users in California?

The solution was a masterpiece of distributed systems thinking: geographic replication with intelligent routing.

West Coast servers became the "source of truth"—all writes happened there. But reads could happen anywhere, served by mirror databases that synchronized with the masters through carefully orchestrated replication.

Here's where it got really clever: When you updated your status, Facebook set a special cookie that ensured you'd see your own changes immediately (served from the West Coast), while your friends around the world would see the update within seconds as it propagated through the global infrastructure.
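A minimal sketch of that cookie trick, with invented names and a made-up stickiness window: after a write, the user's reads are pinned to the master region until replication has had time to catch up.

java
import java.time.Duration;
import java.time.Instant;

public class ReplicaRouter {

    // How long a writer stays pinned to the master after a write.
    private static final Duration WRITE_STICKINESS = Duration.ofSeconds(20);

    public enum Region { MASTER_WEST_COAST, NEAREST_REPLICA }

    // Called after any write: the expiry to store in the user's cookie.
    public Instant onWrite(Instant now) {
        return now.plus(WRITE_STICKINESS);
    }

    // Called on every read: recent writers read from the master so they
    // always see their own changes; everyone else reads locally.
    public Region routeRead(Instant now, Instant cookieExpiry) {
        if (cookieExpiry != null && now.isBefore(cookieExpiry)) {
            return Region.MASTER_WEST_COAST; // read-your-writes guarantee
        }
        return Region.NEAREST_REPLICA; // fast, eventually consistent
    }
}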

"It's like having a conversation that happens simultaneously in multiple time zones," explained one engineer. "Everyone hears you speak in real-time, even though the sound waves take different amounts of time to reach each person."

By 2010, this system was handling users across six continents with response times that felt local everywhere.

The Sixth Distributed Computing Lesson: Global consistency is less important than local performance—design for eventual consistency with smart routing.

Chapter 7: The Security Paradox

2009 brought Facebook's most dangerous challenge yet: securing 300 million users with no blueprint to follow.

Modern authentication systems, OAuth, and sophisticated security frameworks simply didn't exist. Facebook had to build everything from scratch while hackers around the world tried to break in.

"We're writing the security playbook for the planet-scale internet," said the security chief, "and we're doing it while under attack."

Their solutions became legendary:

  • Distributed session management using their own cache infrastructure
  • Custom API authentication for the Facebook Platform launch
  • Geographic session routing that kept users secure across continents
  • Privacy controls that learned from early mistakes and influenced industry standards

The Facebook Platform launch in 2007 added another layer of complexity: How do you let thousands of third-party developers access user data without compromising security?

Their answer: revolutionary API design with granular permissions, rate limiting, and authentication systems that later influenced how the entire internet handles third-party integrations.

The Seventh Distributed Computing Lesson: Security at planet scale requires custom solutions that evolve with your architecture—you can't retrofit security onto distributed systems.


Chapter 8: The Hardware Symphony

By 2009, Facebook was orchestrating a symphony of silicon across multiple continents.

60,000 servers. Think about that number. Before cloud computing, before Infrastructure as a Service, a college website had assembled more computing power than most governments.

But the real magic wasn't in the quantity—it was in the orchestration:

  • Multi-tier load balancing with Layer 4 and Layer 7 routing intelligence
  • Custom flow control to handle their unique "incast" networking problems
  • Geographic distribution that automatically shifted traffic based on capacity and performance
  • Predictive scaling that bought and configured hardware months before it was needed

The efficiency was staggering: 1 million users per engineer—a ratio that remains impressive even by today's standards.

The Eighth Distributed Computing Lesson: Planet-scale infrastructure requires predictive thinking and symphonic coordination—you can't just add servers and hope for the best.


Chapter 9: The Culture Revolution

Perhaps the most important innovation wasn't technical—it was cultural.

"Move fast and break things" wasn't just a slogan; it was survival strategy. In a world where Facebook had to build everything from scratch, traditional software development practices would have been corporate suicide.

Facebook's engineers developed a culture of fearless innovation:

  • Experiment with radical solutions like compiling PHP to C++
  • Contribute innovations back to the open-source community
  • Measure everything and optimize based on data, not opinions
  • Question fundamental assumptions about how internet infrastructure should work

Thinking differently was an obligation for every engineer; normal thinking wouldn't get you to 500 million users.

This culture enabled a small team to repeatedly achieve the impossible, building custom solutions that often became industry standards.

The Ninth Distributed Computing Lesson: Technical innovation requires cultural innovation—create an environment where impossible solutions are just engineering challenges waiting to be solved.

The Ultimate Lesson

Facebook's journey from $85 server to half a billion users proves that the most important ingredient in any distributed system isn't the infrastructure—it's the engineering mindset that refuses to accept "impossible" as a final answer.

They didn't wait for someone else to solve planet-scale computing. They invented planet-scale computing.

Great engineering doesn't adapt to limitations. Great engineering eliminates limitations by building the impossible solutions that become tomorrow's standard infrastructure.

Sometimes the best way to solve an impossible problem is to prove it's not impossible.

Saturday, 12 July 2025

The Death and Resurrection of Test-Driven Development: How AI Agents Are Creating TDD 2.0


What happens when you give an army of AI agents the power to write, run, and evolve tests faster than any human ever could?


I've been writing tests for over a decade. I've lived through the religious wars between TDD purists and pragmatic developers. I've seen teams abandon TDD because it felt too slow, too rigid, too... human. But something extraordinary is happening in 2025 that's about to change everything we thought we knew about test-driven development.

AI agents aren't just helping us write tests. They're creating an entirely new species of TDD that operates at superhuman scale and speed.


The TDD We Knew Is Dead

Let's be honest about traditional TDD's limitations. Kent Beck gave us a beautiful philosophy: Red-Green-Refactor. Write a failing test, make it pass, clean up the code. Rinse and repeat. But in practice, TDD always hit the same human bottlenecks:

  • The Imagination Gap: How many edge cases can you really think of at 2 PM on a Thursday?
  • The Speed Trap: Writing comprehensive tests takes time. Lots of time.
  • The Maintenance Burden: Tests become another codebase to maintain, debug, and evolve.
  • The Context Switch: Constantly jumping between "what should this do?" and "how should this work?"

These weren't flaws in TDD's logic—they were constraints of human cognition. We can only think of so many test cases, work so fast, and maintain so much complexity before something breaks down.

But what if we could remove the human bottlenecks entirely?

Enter the AI Test Swarm

Imagine this: You type a single line of code, and instantly, an army of AI agents springs into action. One agent generates 47 different test scenarios you never would have considered. Another creates performance benchmarks. A third spins up security vulnerability tests. A fourth simulates realistic user interactions. All of this happens in the time it takes you to reach for your coffee.

This isn't science fiction. This is Agentic TDD—and it's fundamentally different from anything we've seen before.



Some Superpowers of Agentic TDD

1. Infinite Test Hypothesis Generation

Traditional TDD: "Hmm, what should I test here?"

Agentic TDD: generates many test scenarios in 2 seconds.

AI agents don't get tired. They don't get bored. They don't forget about that weird edge case where someone passes a negative array index. They systematically explore every possible branch, every boundary condition, every integration point you never thought to test.
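To ground this, here is the flavor of boundary sweep an agent might produce for a single function. The parseQuantity helper and its contract are invented for illustration; the test itself is ordinary JUnit 5.

java
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

class ParseQuantityEdgeCasesTest {

    // Negative, zero, overflow, whitespace, decimal, and empty inputs:
    // the kind of systematic sweep humans rarely write by hand.
    @ParameterizedTest
    @ValueSource(strings = {"-1", "0", "2147483648", " 7 ", "7.5", "", "NaN"})
    void boundaryInputsEitherParsePositiveOrAreRejected(String raw) {
        try {
            int quantity = ParseQuantity.parse(raw); // hypothetical helper
            assertTrue(quantity > 0, "accepted quantity must be positive: " + raw);
        } catch (IllegalArgumentException rejected) {
            // Rejection is also a valid outcome under the assumed contract.
        }
    }
}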

2. Real-Time Test Evolution

Your tests used to be static artifacts—written once, modified reluctantly. Now they're living entities that evolve with your code. Change a function signature? The AI agents instantly update dozens of related tests. Add a new feature? Tests for likely extension points appear automatically.

3. Multi-Dimensional Orchestration

Why test just functionality when you can test everything simultaneously? Agentic TDD orchestrates unit tests, integration tests, performance tests, security tests, accessibility tests, and cross-platform tests as a single, coordinated symphony. Every code change triggers a comprehensive validation matrix across all dimensions.

4. Predictive Testing

The most mind-bending capability: AI agents that predict what tests you'll need before you need them. They analyze your codebase patterns, identify likely evolution paths, and pre-generate tests for features you haven't even planned yet. It's like having a crystal ball for software quality.

5. Global Learning Network

Every bug becomes training data. Every test failure becomes institutional knowledge. Agentic TDD learns from patterns across entire organizations, entire industries, entire programming ecosystems. The collective intelligence of all software development feeds back into your local testing strategy.

The Architecture of Intelligence

[The diagram shows the transformation from traditional TDD to Agentic TDD, with AI agents orchestrating multi-dimensional testing around a central intelligence hub]

At the heart of Agentic TDD sits an AI Quality Orchestrator—a central intelligence that coordinates specialized AI agents, each focused on different aspects of software quality. These agents don't just run tests; they think about tests, learn from test results, and continuously evolve their testing strategies.

What This Actually Looks Like

Let me paint you a picture of development in this new world:

9:23 AM: You write a new authentication function. Before you can even save the file, AI agents have generated 73 test cases covering normal authentication, edge cases, security vulnerabilities, performance under load, and cross-browser compatibility.

9:24 AM: The agents notice your function is similar to OAuth implementations in three other projects. They automatically generate tests for common OAuth pitfalls and suggest security improvements based on global failure patterns.

9:25 AM: Your code fails 12 of the generated tests. But instead of cryptic error messages, you get intelligent explanations: "This function is vulnerable to timing attacks. Here's a test that demonstrates the issue and three potential solutions."

9:27 AM: You fix the issues. The AI agents instantly verify the fixes, update the related tests, and generate new tests for the code paths your fixes just created.

9:30 AM: You push to production with confidence that would have taken hours or days to achieve manually.  While this is somewhat exaggerated, it demonstrates how code can be deployed to production with a 7-minute cycle time :-)


The Philosophical Shift

This isn't just about faster testing. It's about a fundamental shift in how we think about software quality.

Traditional TDD: Quality is something we add through disciplined testing practices.

Agentic TDD: Quality is an emergent property of intelligent systems that continuously validate, learn, and evolve.

We're moving from human-driven quality assurance to AI-augmented quality emergence. The difference is like the gap between a craftsman making furniture by hand and a factory that not only manufactures furniture but continuously improves its own manufacturing processes.

The Objections (And Why They're Wrong)

"But AI-generated tests will be low quality!"

This assumes AI agents are just fancy autocomplete tools. They're not. They're learning systems that get better at testing the more they test. They learn from every failure, every edge case, every successful catch. After processing millions of test scenarios, they develop intuitions about software quality that surpass human experience.

"Developers won't understand the tests!"

The best AI systems are explainable. Agentic TDD doesn't just generate tests—it explains why each test matters, what it's protecting against, and how it fits into the broader quality strategy. You'll understand your own software better, not worse.

"This will make developers lazy!"

The opposite is true. By handling the mechanical aspects of testing, AI agents free developers to focus on creative problem-solving, architectural decisions, and user experience. It's like how calculators didn't make mathematicians lazy—they made them more capable.

The Practical Reality

We're not there yet. Today's AI coding assistants are impressive but limited. They can help write tests, but they can't orchestrate comprehensive quality assurance ecosystems. We're still in the early days of this transformation.

But the trajectory is clear. Every month, AI agents become more capable at understanding code, predicting failures, and generating meaningful tests. The building blocks are falling into place:

  • Advanced code analysis that understands program behavior at a deep level
  • Simulation engines that can model complex system interactions
  • Learning algorithms that improve with every codebase they encounter
  • Orchestration platforms that coordinate multiple AI agents effectively

The Future Is Already Here

Companies like Anthropic, OpenAI, Google, and others are building AI systems that can reason about code, understand requirements, and generate comprehensive test suites. Coding agents already help millions of developers write tests. The next logical step is systems that don't just help—they lead.

The question isn't whether Agentic TDD will happen. The question is whether you'll be ready when it does.

What This Means for You

If you're a developer, start thinking about how to work with AI agents rather than despite them. Learn to prompt AI systems effectively. Understand how to guide AI-generated tests toward your quality goals. Practice explaining your intent to AI systems in ways that produce better automated testing.

If you're a team lead, start experimenting with AI-assisted testing tools. Build processes that can scale with AI capabilities. Invest in team members who can bridge the gap between human intent and AI execution.

If you're a CTO, start planning for a world where software quality is limited not by human testing capacity but by the intelligence of your AI agents. The competitive advantage will belong to organizations that can deploy the most sophisticated AI quality assurance systems.

The Resurrection

TDD isn't dying—it's being reborn. The core principles remain the same: write tests first, get fast feedback, iterate toward quality. But the scale, speed, and sophistication are about to explode beyond anything we've imagined.

We're witnessing the evolution of TDD from a human practice to a hybrid human-AI ecosystem. The developers who embrace this transformation will build better software, faster, with fewer bugs and more confidence.

The age of Agentic TDD is beginning. The question is: are you ready to join the resurrection?


What aspects of Agentic TDD excite or concern you most? How do you think AI agents will change your development workflow? Let's discuss in the comments below.

Saturday, 5 July 2025

Applying SOLID Principles to LLM System Design

Remember when you first discovered Large Language Models? The excitement! The possibilities! You probably built something amazing in a weekend, shipped it, and felt like a genius.

Then reality hit. Your "simple" chatbot now handles customer service, generates code, moderates content, and somehow ended up managing your company's inventory. The once-elegant prompt has become a 500-line monster that breaks whenever Mercury is in retrograde.

Sound familiar? You're not alone. AI-powered software development is repeating every mistake we made in the early days of software development: monolithic systems, tight coupling, and code that nobody dares to touch or even understand.



Vibe coding also has a role to play in this.



But there's hope. Robert "Uncle Bob" Martin's SOLID principles, which revolutionized object-oriented programming, can transform how we build AI systems too.


Single Responsibility Principle: One AI, One Job

"A class should have one, and only one, reason to change."

Problem: Everything AI

We've all seen them—those monstrous services that try to do everything:

java
@Service
public class AIGodService {
    public SentimentResult analyzeSentiment(String text) { ... }
    public List<Product> generateRecommendations(Customer customer) { ... }
    public String handleCustomerInquiry(String inquiry) { ... }
    public void processReturn(ReturnRequest request) { ... }
    // ...and 20 other responsibilities
}

When your sentiment analysis needs tweaking, you risk breaking the recommendation engine. When you update customer service logic, inventory management might explode. It's a house of cards waiting to collapse.

Solution: Specialized Experts

Instead of one AI that does everything poorly, create focused services that excel at specific tasks:

java
@Service
public class SentimentAnalysisService {
    public SentimentResult analyze(String text) {
        // Just sentiment, nothing else
    }
}

@Service
public class ProductRecommendationService {
    public List<Product> recommend(Customer customer) {
        // Only recommendations
    }
}

Think of it like assembling a dream team instead of hiring one overworked intern. Each service becomes an expert in its domain, leading to better accuracy and easier maintenance.



Real-world win: A content moderation system with separate detectors for toxicity, spam, and misinformation. Each can be updated independently without breaking the others.


Open/Closed Principle: Built to Grow

"Software should be open for extension, closed for modification."

Problem: Model Lock-in

Your code probably looks like this disaster waiting to happen:

java
public String generateResponse(String prompt, ModelType modelType) {
    switch (modelType) {
        case GPT_4: return openAI.complete(prompt);
        case CLAUDE: return anthropic.generate(prompt);
        case LLAMA: return ollama.run(prompt);
        // Add new model? Modify this method!
    }
}

Every new model means touching existing code. Every provider change means hunting down hardcoded logic across your entire codebase.




Solution: Plugin Architecture

Design your system like a Swiss Army knife—ready for new tools without rebuilding the handle:

java
public interface LLMPlugin {
    boolean canHandle(LLMRequest request);
    LLMResponse execute(LLMRequest request);
}

@Component
public class CodeGenerationPlugin implements LLMPlugin {
    public boolean canHandle(LLMRequest request) {
        return request.getIntent() == RequestIntent.CODE_GENERATION;
    }
    
    public LLMResponse execute(LLMRequest request) {
        // Specialized code generation logic
    }
}

Want to add GPT-5 support? Create a new plugin. Need to handle a new type of request? Another plugin. The core system never changes.


Real-world win: Adding multimodal capabilities to a text-only system without touching existing code.


Liskov Substitution Principle: True Flexibility

"Objects should be replaceable with their subtypes without breaking things."

Problem: Fake Abstractions

You create interfaces, but they're just facades hiding provider-specific chaos:

java
// This looks good...
public interface LLMProvider {
    String generate(String prompt);
}

// But implementations leak details everywhere
public class OpenAIProvider implements LLMProvider {
    public String generate(String prompt) {
        // Returns JSON that only works with OpenAI parsing logic
        // Uses OpenAI-specific error codes
        // Expects OpenAI-formatted prompts
    }
}

Swapping providers breaks everything because they're not truly interchangeable.




Solution: Genuine Compatibility

Create abstractions that actually abstract:

java
public interface LLMProvider {
    LLMResponse generate(String prompt, GenerationConfig config);
    Set<LLMCapability> getCapabilities();
}

public class GenerationConfig {
    private final int maxTokens;
    private final double temperature;
    // Standard configuration that works everywhere
}

Now any provider can replace any other provider (within their capabilities) without your application knowing or caring.


Real-world win: Switching from expensive GPT-4 to cost-effective local models for development environments without changing a single line of business logic.


Interface Segregation Principle: Right-Sized Interfaces

"Don't force clients to depend on things they don't use."

Problem: Interface Bloat

One massive interface to rule them all:

java
public interface MegaAIService {
    String generateText(String prompt);
    String generateCode(String description);
    byte[] generateImage(String description);
    VideoResult analyzeVideo(byte[] video);
    // 47 more methods your chatbot will never use
}

Your simple chatbot shouldn't need to import video analysis capabilities.



Solution: Focused Contracts

Break interfaces into logical, cohesive groups:

java
public interface TextGenerator {
    String generate(String prompt);
}

public interface SentimentAnalyzer {
    SentimentResult analyze(String text);
}

public interface CodeGenerator {
    String generateCode(String description, Language language);
}

Now components only depend on what they actually need. Your chatbot imports text generation and sentiment analysis. Your developer assistant adds code generation. Your content moderator might use all three.


Real-world win: Different teams can work on different capabilities without stepping on each other's toes.


Dependency Inversion Principle: Abstractions Rule

"High-level modules shouldn't depend on low-level modules. Both should depend on abstractions."

Problem: Concrete Coupling

Your business logic knows way too much about AI implementation details:

java
@Service
public class OrderProcessor {
    private final OpenAIGPT4Client gpt4Client; // Locked to specific implementation
    
    public ProcessedOrder processOrder(OrderData order) {
        String openAIPrompt = formatForOpenAI(order); // Provider-specific formatting
        OpenAIResponse response = gpt4Client.complete(openAIPrompt);
        return parseOpenAIResponse(response); // Provider-specific parsing
    }
}

This code is married to OpenAI. Divorce is expensive and messy.



Solution: Abstract Dependencies

Your business logic should think in business terms, not AI implementation details:

java
@Service
public class OrderProcessor {
    private final LLMService llmService; // Abstract dependency
    
    public ProcessedOrder processOrder(OrderData order) {
        ProcessingRequest request = ProcessingRequest.builder()
                .data(order.toJson())
                .taskType(TaskType.ORDER_PROCESSING)
                .build();
        
        LLMResponse response = llmService.process(request);
        return parseResponse(response); // Generic parsing
    }
}

Now your order processor works with any LLM implementation. Today it's OpenAI, tomorrow it might be Claude, next week it could be your own fine-tuned model.


Real-world win: Testing with fast, cheap local models while running production on premium cloud models.


What Do You Gain From It?

Maintainability

Each component has a clear purpose. No more "change one thing, break everything" scenarios.

Testability

Single-responsibility components are easy to test:

java
@Test
void billingPluginShouldHandleBillingQueries() {
    // Simple, focused test
    assertTrue(billingPlugin.canHandle(billingIntent));
}

Flexibility

Want to swap GPT-4 for Claude? Easy. Need to add a new capability? Just create a new plugin. Requirements change? Your architecture adapts gracefully.

Team Productivity

Different developers can work on different components without conflicts. The frontend team can use mock AI interfaces while the AI team perfects the real implementations.

Common Pitfalls to Avoid

Prompt Soup

Don't create mega-prompts that try to handle every possible scenario. Split complex tasks into focused, manageable prompts.

Model God Object

Avoid services that take a dozen parameters and try to handle every possible AI task.

Tight Coupling Trap

Don't hardcode model names, API formats, or provider-specific logic in your business code.

Getting Started

  1. Audit your current code - Look for classes doing multiple things, hardcoded model logic, and monolithic interfaces.
  2. Extract abstractions - Define clean interfaces for your AI operations.
  3. Create adapters - Implement your interfaces for specific models/providers.
  4. Inject dependencies - Let an Inversion of Control framework or library wire everything together based on configuration (see the sketch after this list).
  5. Add plugins - Create extension points for new capabilities.
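For step 4, a minimal Spring Boot sketch of configuration-driven wiring might look like this. The adapter class names and the llm.provider property are illustrative choices, not an established convention.

java
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class LlmWiringConfig {

    // The llm.provider property selects the adapter; business code only
    // ever sees the LLMProvider abstraction defined earlier in this post.
    @Bean
    @ConditionalOnProperty(name = "llm.provider", havingValue = "openai")
    public LLMProvider openAiProvider() {
        return new OpenAiProvider(); // illustrative adapter class
    }

    @Bean
    @ConditionalOnProperty(name = "llm.provider", havingValue = "local")
    public LLMProvider localProvider() {
        return new LocalModelProvider(); // e.g. a dev/test adapter
    }
}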

The Bottom Line

Organizations building maintainable AI systems today will dominate tomorrow. While others struggle with technical debt from their "quick and dirty" LLM implementations, you'll be shipping new features at lightning speed.

SOLID principles aren't just academic theory—they're battle-tested guidelines that have saved countless projects from architectural nightmares. As AI becomes more central to software systems, these principles become more crucial, not less.

Start small. Pick one principle. Refactor one component. Your future self will thank you.

--------------