Friday, 30 May 2025

RAG vs Non-RAG Coding Agents

Every time a developer asks an AI coding assistant to generate code, they're initiating a search process. But the question isn't whether search happens—it's where and how that search occurs. The search can happen inside the model's own knowledge, or it can be delegated to external tools.

Code Is Different, but Why?

Searching for code is an interesting search problem with its own unique challenges.

When a human programmer approaches a codebase, they don't just look for similar examples. They build a mental model of how the system works:

  • How data flows through the application
  • What architectural patterns are being used
  • How different modules interact and depend on each other
  • What the implicit contracts and assumptions are

This mental model is what enables programmers to make changes without breaking the system, debug complex issues, and extend functionality in coherent ways.


What are the options for code search algorithms?


Retrieval-Augmented Generation (RAG)



RAG excels at finding relevant information and synthesizing it into coherent responses. This works brilliantly for answering questions about historical facts or summarizing documents. But code isn't documentation—it's a living system of interconnected logic that demands deep understanding.

The Precision Problem: When "Close Enough" Breaks Everything


RAG operates on surface-level similarity. It retrieves code snippets that look relevant but may operate under completely different assumptions about data structures, error-handling patterns, or architectural constraints.

In most applications, RAG's precision-recall trade-off is manageable. If a chatbot gives you 90% accurate information, that's often good enough. But code demands near-perfect precision. A single misplaced bracket, incorrect variable name, or wrong assumption about data types can crash an entire system, or degrade the user experience when the generated code is rejected. RAG optimizes for semantic similarity, not functional correctness. It might retrieve code that's conceptually similar but functionally incompatible:
  • A function that looks right but expects different parameter types
  • Error-handling patterns that don't match the codebase's conventions
  • Solutions that work in one context but fail in another due to different dependencies

This isn't just an inconvenience—it's a fundamental mismatch between what RAG provides and what coding requires.
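To make this concrete, here's a minimal toy sketch (all function names and snippets are invented for illustration, and the "embedding" is just a bag-of-words counter, not a real neural embedding): retrieval ranks candidates purely by word overlap, so a function with an incompatible signature can still win.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# A tiny index mapping signatures to natural-language descriptions.
index = {
    "send_email(recipient: str, body: str)": "send an email message to a recipient address",
    "send_email(msg: EmailMessage)": "send an email message object",  # similar words, different contract
    "parse_config(path: str)": "parse a configuration file into a dict",
}

query = "send an email to a recipient"
ranked = sorted(index, key=lambda sig: cosine(embed(query), embed(index[sig])), reverse=True)
print(ranked[0])  # chosen by word overlap, not by whether the signature fits your call site
```

Both send_email variants score high on similarity; nothing in the ranking knows which contract your caller actually satisfies.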

The Context Catastrophe

Code exists in rich, interconnected contexts that span multiple files, modules, and even repositories. A seemingly simple function might depend on:

  • Configuration files that define system behavior
  • Environment variables that change at runtime
  • Database schemas that constrain data operations
  • Architectural patterns that dictate how modules interact

RAG retrieves chunks of information based on similarity, but coding decisions often depend on distant context that's impossible to capture in isolated snippets. The system might retrieve the perfect function implementation, but it's designed for a completely different architectural context.

The Dynamic System Challenge

Perhaps most critically, effective coding requires real-time interaction with living systems. Coding is fundamentally about:

  • Writing code and seeing how it behaves
  • Running tests to validate assumptions
  • Using compiler errors as feedback
  • Debugging by tracing execution paths
  • Iterating based on runtime behavior

RAG provides static information about how someone else solved a similar problem. But what you need is dynamic interaction with your current, specific codebase.

Reasoning Retrieval Generation (RRG)

RRG is a new term that I am going to use for the reasoning-based approach.

Let's look at what happens in an RRG-based approach, which can also be called a reasoning-first approach.

In a reasoning-first approach, techniques such as chain of thought, self-reflection, and Tree of Thoughts become the primary tools. Let's look at how this works.




Build Mental Models in Real-Time

Instead of retrieving similar code, reasoning-based agents analyze the actual codebase to understand:

  • How the system is structured and why
  • What patterns and conventions are being followed
  • How data flows through different components
  • What the implicit contracts and assumptions are
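As a rough sketch of what "building a mental model" can mean in practice (a toy example using Python's ast module; "src" is a placeholder path, and real agents build far richer models), an agent might first map the project's import graph before touching any file:

```python
import ast
from pathlib import Path

def import_graph(root: str) -> dict[str, set[str]]:
    """Map each Python file under `root` to the modules it imports."""
    graph: dict[str, set[str]] = {}
    for path in Path(root).rglob("*.py"):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        deps: set[str] = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module)
        graph[str(path)] = deps
    return graph

# Print which modules each file depends on ("src" is a placeholder directory).
for module, deps in import_graph("src").items():
    print(module, "->", sorted(deps))
```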

Leverage Tool Integration

Rather than retrieving documentation, effective coding agents interact directly with development tools:

  • Compilers and interpreters for immediate feedback
  • Testing frameworks to validate solutions
  • Debuggers to trace execution and find issues
  • Static analysis tools to understand code structure
  • Version control systems to understand change history
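Here's a minimal sketch of that feedback loop, assuming a project with a pytest test suite (the command and workflow are illustrative, not any particular agent's implementation):

```python
import subprocess

def run_tests() -> tuple[bool, str]:
    """Run the project's test suite and return (passed, combined output)."""
    result = subprocess.run(
        ["python", "-m", "pytest", "-q"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr

passed, output = run_tests()
if not passed:
    # An agent would feed `output` into its next reasoning step,
    # e.g. to localize the failing test and propose a fix.
    print("Tests failed; feedback for the agent:\n" + output)
```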

Think Through Problems Step-by-Step

Chain of thought reasoning allows agents to:

  • Trace through code execution paths to understand behavior
  • Identify root causes of bugs through logical deduction
  • Reason about the implications of changes before making them
  • Build solutions from first principles rather than pattern matching
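One common way to elicit this behavior is a prompt that forces the model to reason before editing. A hypothetical template (the wording and step names are my own, not any product's actual prompt):

```python
def chain_of_thought_prompt(bug_report: str, code: str) -> str:
    """Build a prompt that forces step-by-step reasoning before any edit."""
    return (
        "You are debugging the code below.\n"
        f"Bug report: {bug_report}\n\n"
        f"Code:\n{code}\n\n"
        "Before proposing any change:\n"
        "1. Trace the execution path that triggers the bug.\n"
        "2. State the root cause in one sentence.\n"
        "3. List the implications of your intended fix.\n"
        "4. Only then output the minimal patch."
    )

print(chain_of_thought_prompt("crash on empty input", "def head(xs): return xs[0]"))
```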

Trade-Offs - Aspects That You Can't Ignore

Nothing comes for free, so let's look at the trade-offs of RRG.

Knowledge Boundaries

RRG agents are limited by their training data. They can't access:

  • Documentation for recently released libraries
  • Community solutions to novel problems
  • Project-specific conventions not captured in code
  • Specialized domain knowledge from external sources

But here's the key insight:

Understanding trumps information access.
A solid mental model of how systems work doesn't become outdated when new frameworks are released. The fundamentals of good design, debugging approaches, and architectural thinking remain stable across technology changes.


Context Window Constraints

Without retrieval, agents must work within their context limits. Large codebases can exceed what fits in memory. However, this constraint forces better architectural approaches:

  • Focus on understanding system structure and patterns
  • Use tool integration to navigate codebases systematically
  • Build summarization and abstraction capabilities
  • Develop better code analysis and navigation strategies
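As an illustration of the summarization idea (a toy sketch using Python's ast module; "src" is a placeholder directory, and tools like Aider build far richer repository maps), a whole codebase can be compressed to its top-level signatures so its structure fits in a small context window:

```python
import ast
from pathlib import Path

def repo_map(root: str) -> str:
    """Compress a codebase to one line per top-level signature."""
    lines = []
    for path in sorted(Path(root).rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                args = ", ".join(a.arg for a in node.args.args)
                lines.append(f"{path}: def {node.name}({args})")
            elif isinstance(node, ast.ClassDef):
                lines.append(f"{path}: class {node.name}")
    return "\n".join(lines)

# Thousands of lines of code often compress to a map that fits easily
# in a small context window.
print(repo_map("src"))
```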

Specialized Domain Gaps

RRG agents may struggle with highly specialized domains not well-represented in training data. But this is where tool integration shines—rather than retrieving domain knowledge, agents can interact with domain-specific tools and APIs directly.


Cost and Resource Challenges

  • Needs large-context models (100K+ or even 1M tokens)
  • High per-request cost due to massive context usage
  • Not cost-optimized
  • Slow inference due to processing the entire context
  • Weaker instruction following as the context approaches 50% fill

What is the solution - the best of both worlds




Fusion is the solution.

Mental Model Filtering Process
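Here's a minimal, entirely illustrative sketch of that filtering process (the helpers repeat the toy RAG sketch above, and the "mental model" is reduced to a single convention check; a real agent would verify types, error-handling patterns, and architectural constraints): retrieval proposes, reasoning disposes.

```python
from collections import Counter
from math import sqrt

# Same toy embed/cosine helpers as in the RAG sketch above.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

index = {
    "send_email(recipient: str, body: str)": "send an email to a recipient",
    "send_email(msg: EmailMessage)": "send an email message object",
}

# Step 1 (retrieval): rank candidates by similarity, as plain RAG would.
query = "send an email to a recipient"
ranked = sorted(index, key=lambda sig: cosine(embed(query), embed(index[sig])), reverse=True)

# Step 2 (mental-model filter): the agent's understanding of the codebase --
# here, the convention that callers pass plain strings -- vetoes candidates
# that similarity alone would have accepted.
mental_model = {"callers_pass": "str"}
compatible = [sig for sig in ranked if mental_model["callers_pass"] in sig]
print(compatible[0])  # the retrieved snippet that actually fits the codebase
```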




The battle for dominance in the coding agent landscape is heating up. Will the winner be IDE-integrated solutions like Cursor, Windsurf, VS Code, or IntelliJ? Perhaps it will be Claude Code, OpenAI Codex, or Google Jules? Or could no-code and low-code platforms like Bolt, Lovable, and Replit, or open-source tools like Aider and Cline, ultimately prevail?

But here's the twist: while these coding agents compete fiercely for market share, someone else is already winning this game—and the answer might be more obvious than you think.

Tuesday, 27 May 2025

Ages of Search: From Commands to Conversations

Imagine you need to get from Los Angeles to New York to visit your sister, but you're on a tight budget. How would this conversation play out across three decades of search technology?




1995 - The Command

You type: "cheap flights Los Angeles New York"

The computer returns 10,000 pages containing those exact words—including a poetry blog that mentions "cheap thrills," an article about Los Angeles architecture, and someone's travel diary from New York.

You spend the next two hours going through irrelevant results, gradually learning to speak the computer's rigid language.

2015 - The Interpretation

You search: "Best way to get from LA to NYC" Google understands you want travel options and shows flight comparison tools, bus routes, and train schedules. 

Much better! But you still need to do all the work—comparing prices, checking dates, figuring out the tradeoffs.

2025 - The Collaboration

You tell an AI: "I need to visit my sister in New York next month, but I'm on a tight budget" 

AI responds: "I'd be happy to help! What dates work for you, and what's your budget range? I can also check if there are better deals on nearby airports or alternative dates. Would you consider a bus or train if it saves significantly?"

This isn't just about better search results—it's about a fundamental shift in how humans and computers communicate. We've evolved from rigid commands to natural collaboration, mirroring the evolution of human conversation itself.


Age I: The Librarian Era (1990s-2000s)

When computers were very fast, very literal librarians

In the beginning, search engines were like that ultra-efficient but painfully literal librarian who would only help if you asked in exactly the right way. You wanted information about cars? You better not say "automobile" or "vehicle"—the computer knew what you typed, not what you meant.

How the Librarian Worked

The technical foundation was elegantly simple: computers built massive indexes of every word on every webpage, then used algorithms like TF-IDF and PageRank to rank results. Think of it as the world's largest, fastest card catalog system. When you searched for "red shoes," the computer found every document containing both "red" and "shoes" and ranked them by relevance signals like how often those words appeared and how many other sites linked to them.
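To ground this, here's a tiny sketch of that card catalog: an inverted index with TF-IDF scoring (the documents are invented, and real engines add many more ranking signals, PageRank among them):

```python
from collections import Counter, defaultdict
from math import log

docs = {
    1: "red shoes on sale",
    2: "running shoes for marathon training",
    3: "red carpet fashion photos",
}

# Build the inverted index: term -> {doc_id: term frequency}.
index: dict[str, dict[int, int]] = defaultdict(dict)
for doc_id, text in docs.items():
    for term, tf in Counter(text.split()).items():
        index[term][doc_id] = tf

def tfidf_search(query: str) -> list[tuple[int, float]]:
    """Score each document by summed TF-IDF over the query terms."""
    scores: dict[int, float] = defaultdict(float)
    for term in query.split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = log(len(docs) / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(tfidf_search("red shoes"))  # doc 1 ranks first: it matches both terms
```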



This approach had real strengths:

  • Lightning Speed: Results appeared in milliseconds
  • Perfect Precision: Great for exact technical lookups
  • Transparent Logic: You knew exactly why you got specific results
  • Predictable: The same query always returned the same results

When the Librarian Shined

Keyword search was perfect for anyone who spoke the system's language. Lawyers searching legal databases, developers hunting through code repositories, and researchers looking for specific technical terms all thrived in this era. If you knew the exact terminology and needed exact matches, nothing beat keyword search.





Breaking Point

But some critical failures exposed the limitations:

The Vocabulary Mismatch Crisis: Normal people think "heart attack," doctors write "myocardial infarction." Normal people say "car," auto websites say "vehicle" or "automobile." The computer couldn't bridge this gap.

Boolean Rigidity: Users had to think like programmers.

No Semantic Relationships: The system couldn't understand that "dog" and "puppy" are related.

Long-Tail Problem: By the 2000s, 70% of searches were unique, multi-word phrases. "Best pizza place near downtown with outdoor seating" simply couldn't be handled by exact keyword matching.

Mobile Revolution: Voice search made keyword precision impossible. Try saying "Boolean logic" to Siri or Alexa and see what happens.


Age II: The Translator Era (2000s-2020s)

Teaching computers to understand meaning, not just match letters

The breakthrough question shifted from "What did they type?" to "What did they mean?"

Suddenly, computers learned that "puppy" and "dog" were related, that "inexpensive" and "cheap" meant the same thing, and that someone searching for "apple" might want fruit recipes or stock information depending on the context.

Technical Revolution

The magic happened through vector embeddings—a way of representing concepts as coordinates in mathematical space. Words and phrases with similar meanings ended up close together in this multidimensional space. It's like teaching a computer that "Paris, France" and "City of Light" should be neighbors in concept-space, even though they share no letters.

The architecture evolved from simple index lookup to sophisticated understanding: Query → Intent Analysis → Vector Similarity → Contextual Ranking → Enhanced Results
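To see the core idea in code, here's a small sketch using the sentence-transformers library (assuming it is installed; "all-MiniLM-L6-v2" is one commonly used public checkpoint): phrases that share no letters still land close together in concept-space.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small public embedding model

phrases = ["cheap flights", "inexpensive airfare", "poetry about cheap thrills"]
vectors = model.encode(phrases)

# Cosine similarity: the first two phrases share no keywords but are neighbors
# in vector space; the third contains the word "cheap" yet sits farther away.
print(util.cos_sim(vectors[0], vectors[1]))  # high similarity
print(util.cos_sim(vectors[0], vectors[2]))  # lower similarity
```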








Real-World Transformations

Google's Knowledge Graph changed everything. Instead of just returning links, Google started understanding entities and relationships. Search for "Obama" and get direct answers about the former president, not just a list of web pages mentioning his name.

Amazon's Recommendations stopped being "people who bought X also bought Y" and became "people who like dark psychological thrillers might enjoy this new release"—even for books with completely different titles and authors.

Netflix's Discovery learned to understand that you enjoy "witty workplace comedies with strong female leads" without you ever typing those words.

Context Awareness Breakthrough

The same query now meant different things to different people:

  • "Apple" returns fruit recipes for food bloggers, stock information for investors
  • "Pizza" automatically means "pizza near me"
  • "Election results" means the current election, not historical data

Some of the major breakthroughs in this age include:

  • Google PageRank Evolution
  • Knowledge Graph: Direct answers instead of links
  • BERT: Understanding context and nuance in natural language
  • Personalisation at Scale: Different results for different users based on context
  • Mobile-First Search: Understanding voice queries and local intent


New Limitations Emerged

While semantic search solved the vocabulary mismatch problem, it created new challenges:

  • The Black Box Problem: Users couldn't understand why they got specific results
  • Computational Intensity: Required significant processing power compared to keyword search
  • Bias Amplification: Training-data prejudices got reflected in results
  • Still Reactive: The system waited for users to initiate searches


Age III: The Consultant Era (2020s-Present)

From search engine to research partner

The fundamental question evolved again: from "What information exists about X?" to "How can I solve problem X?"

Instead of just finding information, AI agents now break down complex problems, use multiple tools, maintain conversation context, synthesize insights from various sources, and proactively suggest next steps.

Superpowers of AI Agents

  • Multi-Step Reasoning: Breaking "plan my wedding" into venue research, catering options, budget optimization, and timeline coordination
  • Tool Integration: Using APIs, databases, calculators, and other services seamlessly
  • Conversational Memory: Remembering what you discussed three questions ago
  • Synthesis: Creating new insights by connecting information from multiple sources
  • Proactive Assistance: Anticipating needs and suggesting what to explore next
How are all these superpowers used during search?




Agentic Search in Action: Wedding Planning 


Key Capabilities 

  • Problem Decomposition: "Plan my ....." becomes n+ interconnected sub-tasks
  • Real-Time Integration: Live data feeds, current pricing, availability
  • Cross-Domain Synthesis: Connecting insights from domains like finance, market research, and user reviews simultaneously
  • Iterative Refinement: Learning from the user within the same conversation
  • Proactive Discovery: Prompts like "Have you considered ...?" or "You might also want to ..."
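Here's a deliberately simplified sketch of problem decomposition with tool calls (the tools return canned data and every name and number is invented; a real agent would call live APIs):

```python
# Each sub-task pairs a question with the tool that can answer it.
def search_venues(city: str) -> list[str]:
    return ["Garden Hall", "Riverside Loft"]  # placeholder results

def estimate_catering(guests: int) -> float:
    return guests * 55.0  # assumed per-plate cost

def plan_wedding(city: str, guests: int, budget: float) -> dict:
    """Decompose 'plan my wedding' into sub-tasks and synthesize the results."""
    venues = search_venues(city)              # sub-task 1: venue research
    catering = estimate_catering(guests)      # sub-task 2: catering options
    remaining = budget - catering             # sub-task 3: budget optimization
    return {
        "venues_to_visit": venues,
        "catering_estimate": catering,
        "left_for_venue_and_extras": remaining,
        "next_step": "Have you considered an off-season date to stretch the budget?",
    }

print(plan_wedding("Austin", guests=80, budget=15000))
```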

Current Limitations and Challenges

  • High computational cost: pennies vs $1+ per query
  • Latency: milliseconds vs minutes for complex tasks
  • Black-box reasoning: difficult to audit decision making
  • Inconsistency: the same query may yield different results or reasoning
  • Privacy: conversation history and deep context are required
  • Hallucination: I'll leave it to you whether this is a feature or a bug


Architecture Evolution: From Commands to Collaboration



What does the future look like?

The ROI progression is fascinating: keyword search provides immediate value, semantic search shows results within hours, while agentic search may take days or weeks to implement but can deliver transformative business impact.

I think the answer is "all of the above".

Modern search systems don't choose one approach—they intelligently route queries to the most appropriate method:

  • Simple lookups → Keyword search for speed
  • Natural language queries → Semantic search for relevance
  • Complex problems → Agentic search for comprehensive solutions

Google exemplifies this hybrid approach: it uses keyword matching for exact phrases, semantic understanding for intent, and agentic features for complex queries like "plan my trip to Japan in cherry blossom season."
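A minimal sketch of such a router (the heuristics are deliberately naive and purely illustrative, not how Google actually routes queries):

```python
def route_query(query: str) -> str:
    """Pick a search strategy with simple, illustrative heuristics."""
    words = query.lower().split()
    if query.startswith('"') or len(words) <= 2:
        return "keyword"   # exact lookups: speed wins
    if any(w in words for w in ("plan", "compare", "book", "organize")):
        return "agentic"   # multi-step problems: decompose and use tools
    return "semantic"      # natural-language questions: intent matters

for q in ('"TF-IDF"', "best way to get from LA to NYC", "plan my trip to Japan"):
    print(q, "->", route_query(q))
```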


Let me end this post with one more question: what type of search do coding agents like GitHub Copilot, Aider, Cline, Cursor, Windsurf, Claude Code, and ..... use?

They also use "all of the above". In the next post, I will share more about it.