
Friday, 30 May 2025

RAG vs Non-RAG Coding Agents

Every time a developer asks an AI coding assistant to generate code, they're initiating a search process. But the question isn't whether search happens—it's where and how that search occurs. The search can happen inside the model's own knowledge, or it can rely on external tools.

Code Is Different - But Why?

Searching for code is an interesting search problem, and it comes with its own unique challenges.

When a human programmer approaches a codebase, they don't just look for similar examples. They build a mental model of how the system works:
- How data flows through the application
- What architectural patterns are being used
- How different modules interact and depend on each other
- What the implicit contracts and assumptions are

This mental model is what enables programmers to make changes without breaking the system, debug complex issues, and extend functionality in coherent ways.


What are the options for code search algorithms?


Retrieval-Augmented Generation (RAG)



RAG excels at finding relevant information and synthesizing it into coherent responses. This works brilliantly for answering questions about historical facts or summarizing documents. But code isn't documentation—it's a living system of interconnected logic that demands deep understanding.

- The Precision Problem: When "Close Enough" Breaks Everything


RAG operates on surface-level similarity. It retrieves code snippets that look relevant but may operate under completely different assumptions about data structures, error handling patterns, or architectural constraints.
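
To make "surface-level similarity" concrete, here is a minimal sketch of the retrieval step in a RAG pipeline. The toy letter-frequency embedding is purely illustrative; a real system would plug in a trained embedding model, but the mechanism of ranking chunks by vector similarity is the same.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_context(query: str, chunks: list[str], embed, top_k: int = 3) -> list[str]:
    """Classic RAG retrieval: rank code chunks by embedding similarity to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]

# Toy embedding (letter frequencies) for illustration only; real pipelines use a model.
def toy_embed(text: str) -> np.ndarray:
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isascii() and ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    return vec

chunks = [
    "def save_user(db, user): ...",
    "def load_config(path): ...",
    "def send_email(recipient, body): ...",
]
print(retrieve_context("persist a user record to the database", chunks, toy_embed, top_k=1))
```

The retrieved chunks are then pasted into the prompt. Nothing in this step knows whether they are functionally compatible with the task at hand, which is exactly the precision problem described next.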

In most applications, RAG's precision-recall trade-off is manageable. If a chatbot gives you 90% accurate information, that's often good enough. But code demands near-perfect precision. A single misplaced bracket, incorrect variable name, or wrong assumption about data types can crash an entire system, or degrade the user experience when the generated code is rejected. RAG optimizes for semantic similarity, not functional correctness. It might retrieve code that's conceptually similar but functionally incompatible:
- A function that looks right but expects different parameter types
- Error handling patterns that don't match the codebase's conventions
- Solutions that work in one context but fail in another due to different dependencies

This isn't just an inconvenience—it's a fundamental mismatch between what RAG provides and what coding requires.

The Context Catastrophe

Code exists in rich, interconnected contexts that span multiple files, modules, and even repositories. A seemingly simple function might depend on:
- Configuration files that define system behavior
- Environment variables that change at runtime
- Database schemas that constrain data operations
- Architectural patterns that dictate how modules interact

RAG retrieves chunks of information based on similarity, but coding decisions often depend on distant context that's impossible to capture in isolated snippets. The system might retrieve the perfect function implementation, but it's designed for a completely different architectural context.

- The Dynamic System Challenge

Perhaps most critically, effective coding requires real-time interaction with living systems. Coding is fundamentally about:
- Writing code and seeing how it behaves
- Running tests to validate assumptions
- Using compiler errors as feedback
- Debugging by tracing execution paths
- Iterating based on runtime behavior

RAG provides static information about how someone else solved a similar problem. But what you need is dynamic interaction with your current, specific codebase.

Reasoning Retrieval Generation (RRG)

RRG is a new term that I am going to use for the reasoning-based approach.

Let's look into what happens in an RRG-based approach, which can also be called a reasoning-first approach.

In a reasoning-first approach, techniques like chain of thought, self-reflection, and Tree of Thought become the primary tools. Let's look at how this works.




- Build Mental Models in Real-Time

Instead of retrieving similar code, reasoning-based agents analyze the actual codebase to understand:
- How the system is structured and why
- What patterns and conventions are being followed
- How data flows through different components
- What the implicit contracts and assumptions are

- Leverage Tool Integration

Rather than retrieving documentation, effective coding agents interact directly with development tools:
- Compilers and interpreters for immediate feedback
- Testing frameworks to validate solutions
- Debuggers to trace execution and find issues
- Static analysis tools to understand code structure
- Version control systems to understand change history
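
As a hedged illustration of this kind of tool integration, here is a minimal sketch of a test-driven feedback loop. The propose_patch and apply_patch callbacks are hypothetical stand-ins for the agent's reasoning and editing steps; only the test-runner part uses a real command (pytest).

```python
import subprocess

def run_tests(path: str = ".") -> tuple[bool, str]:
    """Run the project's test suite and return (passed, combined output)."""
    result = subprocess.run(
        ["python", "-m", "pytest", path, "-q"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr

def fix_until_green(propose_patch, apply_patch, max_rounds: int = 5) -> bool:
    """Feedback loop: the agent patches the code, the tests judge the patch, repeat."""
    for _ in range(max_rounds):
        passed, output = run_tests()
        if passed:
            return True
        patch = propose_patch(output)  # hypothetical: the reasoning agent reads the failure output
        apply_patch(patch)             # hypothetical: applies the edit to the workspace
    return False
```

The important point is the direction of information flow: the agent gets feedback from the live system rather than from retrieved examples.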

- Think Through Problems Step-by-Step

Chain of thought reasoning allows agents to:
- Trace through code execution paths to understand behavior
- Identify root causes of bugs through logical deduction
- Reason about the implications of changes before making them
- Build solutions from first principles rather than pattern matching

Trade-Offs - Aspects You Can't Ignore

Nothing comes for free, so let's look at the trade-offs of RRG.

Knowledge Boundaries

RRG agents are limited by their training data. They can't access:
- Documentation for recently released libraries
- Community solutions to novel problems
- Project-specific conventions not captured in code
- Specialized domain knowledge from external sources

But here's the key insight:

Understanding trumps information access.
A solid mental model of how systems work doesn't become outdated when new frameworks are released. The fundamentals of good design, debugging approaches, and architectural thinking remain stable across technology changes.


Context Window Constraints

Without retrieval, agents must work within their context limits. Large codebases can exceed what fits in memory. However, this constraint forces better architectural approaches:
- Focus on understanding system structure and patterns
- Use tool integration to navigate codebases systematically
- Build summarization and abstraction capabilities
- Develop better code analysis and navigation strategies

Specialized Domain Gaps

RRG agents may struggle with highly specialized domains not well-represented in training data. But this is where tool integration shines—rather than retrieving domain knowledge, agents can interact with domain-specific tools and APIs directly.


Cost and Resource Challenges

- Needs large-context models (100K+ or 1M tokens)
- High per-request cost due to massive context usage
- Not cost-optimised
- Slow inference due to processing the entire context
- Instruction following by the LLM degrades as the context gets close to 50% full

What is the solution? The best of both worlds




Fusion is the solution.

Mental Model Filtering Process
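
As a purely illustrative sketch of the fusion idea (every function name below is hypothetical, not a real API): retrieval proposes candidate snippets cheaply, and the reasoning side filters them through the mental model it has built of the actual codebase before anything reaches the prompt.

```python
def fused_context(query: str, codebase, retriever, reasoner, top_k: int = 10) -> list[str]:
    """Hypothetical fusion pipeline: RAG proposes, reasoning disposes.

    retriever.search()  - cheap embedding-based candidate lookup (the RAG side)
    reasoner.model()    - builds a mental model of the real codebase (the RRG side)
    reasoner.accepts()  - keeps only candidates consistent with that model
    """
    candidates = retriever.search(query, top_k=top_k)   # broad, cheap recall
    mental_model = reasoner.model(codebase)             # conventions, contracts, data flow
    return [c for c in candidates if reasoner.accepts(c, mental_model)]
```

This way retrieval handles breadth and freshness, while reasoning handles precision and architectural fit.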




The battle for dominance in the coding agent landscape is heating up. Will the winner be IDE-integrated solutions like Cursor, Windsurf, VS Code, or IntelliJ? Perhaps it will be Claude Code, OpenAI Codex, or Google Jules? Or could no-code and low-code platforms like Bolt, Lovable, and Replit, or open-source tools like Aider and Cline, ultimately prevail?

But here's the twist: while these coding agents compete fiercely for market share, someone else is already winning this game—and the answer might be more obvious than you think.

Sunday, 4 February 2024

Demystifying Vector Databases: The Magic of Meaningful Search

What is a vector database?


The digital world is awash in unstructured data: text documents, social media posts, images, videos, audio recordings, and more. While traditional databases excel at storing and retrieving neatly organised data, they struggle with this messy, ever-growing sea of information. Enter vector databases, a new breed of database designed to unlock the hidden meaning within unstructured data.

While Large Language Models (LLMs) have brought vector databases to the forefront, their applications extend far beyond this exciting field. Recommendation systems use vector databases to suggest products you might like based on your past purchases and browsing history, even if you haven't explicitly searched for those items. Fraud detection systems leverage them to identify suspicious patterns in financial transactions, helping catch anomalies that might slip through traditional filters.


But how do these databases work their magic? It all starts with a clever trick: representing data as multi-dimensional vectors, essentially numerical lists. Imagine every data point as a location on a map. Nearby points on the map represent similar data, regardless of the original format (text, image, etc.). This is achieved through techniques like word embeddings, where words with similar meanings are mapped to close points in the vector space.

Traditional keyword-based searches often miss the mark. Imagine searching for "small, fleshy, and seedless" fruits. No exact match exists, leaving you frustrated. But a vector database understands the underlying meaning of your query.

It finds data points closest to the "small, fleshy, and seedless" vector, leading you to grapes or kiwis, even though those words weren't explicitly used. This semantic search capability unlocks a new level of data exploration and analysis.
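
Here is a hedged sketch of that kind of semantic lookup using the open-source sentence-transformers library; the model name and fruit descriptions are just illustrative choices.

```python
from sentence_transformers import SentenceTransformer, util

# Any sentence-embedding model works here; this one is small and freely available.
model = SentenceTransformer("all-MiniLM-L6-v2")

catalogue = {
    "grape":  "a small, juicy, seedless fruit that grows in bunches",
    "kiwi":   "a small fuzzy fruit with soft green flesh",
    "banana": "a long yellow fruit with soft sweet flesh",
}

query_vec = model.encode("small, fleshy, and seedless", convert_to_tensor=True)
item_vecs = model.encode(list(catalogue.values()), convert_to_tensor=True)

scores = util.cos_sim(query_vec, item_vecs)[0]
best = list(catalogue.keys())[int(scores.argmax())]
print(best)  # likely "grape", even though the query shares no exact keywords with it
```

No keyword in the query appears in the winning description; the match comes entirely from the geometry of the vectors.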


Search - Legacy vs Semantic



 

How are vectors created?

But how do these magical numbers come to life? Enter embeddings, numerical representations of data points created by deep learning models. Imagine feeding a vast collection of text documents to a sophisticated neural network. It analyses the relationships between words, their context, and their usage, eventually generating unique vector representations, or embeddings, for each word. These embeddings capture not just the literal meaning of the word, but also its nuances and semantic connections.




Generally, the last layer of deep learning models focuses on specific tasks like prediction or classification. But the true treasure trove of knowledge lies in the second-to-last layer, often called the bottleneck or hidden layer.

This layer holds a condensed representation of the input data, capturing the essential features and relationships learned during training. By strategically removing the last layer and accessing the information in this penultimate layer, we can extract vector embeddings that encapsulate the model's understanding of the data.
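
A minimal Keras sketch of that idea (the architecture is made up purely for illustration): build a classifier, then create a second model that stops at the penultimate layer and emits its activations as the embedding.

```python
import tensorflow as tf

# A toy classifier; the layer sizes here are illustrative only.
inputs = tf.keras.Input(shape=(100,))
x = tf.keras.layers.Dense(256, activation="relu")(inputs)
bottleneck = tf.keras.layers.Dense(64, activation="relu", name="bottleneck")(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(bottleneck)  # task-specific head
classifier = tf.keras.Model(inputs, outputs)

# ... train `classifier` on its prediction/classification task ...

# "Remove" the last layer by reading outputs from the penultimate (bottleneck) layer.
embedder = tf.keras.Model(inputs, classifier.get_layer("bottleneck").output)
embeddings = embedder(tf.random.normal((5, 100)))  # 5 embedding vectors of 64 dimensions each
print(embeddings.shape)                            # (5, 64)
```

The dimensionality of that bottleneck layer is exactly the size/detail trade-off discussed next.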

Higher dimensionality captures more information but requires more storage and computation, while lower dimensionality is space-efficient but might miss some nuances.

The key is to find the right balance between the dimensionality (size) of the embeddings and the desired level of detail.

Forget training your own model! The world after ChatGPT offers a wealth of ready-made embedding models.






How to get embeddings?
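
Two common routes, shown as a minimal sketch (the model names are examples, not recommendations): a hosted embeddings API, or a local open-source model.

```python
# Option 1: a hosted API such as OpenAI's embeddings endpoint.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Vector databases unlock meaning in unstructured data.",
)
vector = response.data[0].embedding   # a list of floats (1536 dimensions for this model)
print(len(vector))

# Option 2: a local open-source model via sentence-transformers.
from sentence_transformers import SentenceTransformer

local_model = SentenceTransformer("all-MiniLM-L6-v2")
local_vector = local_model.encode("Vector databases unlock meaning in unstructured data.")
print(local_vector.shape)             # (384,) for this particular model
```

Either way, the output is just a fixed-length list of numbers that a vector database can index and compare.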





What use cases do vectors solve?

Get ready to explore the diverse problems solvable with vector embeddings! These powerful representations go beyond text, unlocking:

1. Semantic Search: Dive deeper than keywords. Find images, videos, or audio similar to your intent, not just literal phrases. Imagine searching for "peaceful nature scene" and discovering breathtaking waterfalls instead of generic landscapes.

2. Data Similarity Search: Uncover hidden connections across non-text data. Quickly identify similar products, faces, or even medical scans, regardless of format.

3. Personalised Recommendations: Get suggestions that truly understand you. Vector embeddings power recommendation systems that learn your preferences and suggest items you'll genuinely love, not just similar purchases

4. Retrieval-Augmented Generation (RAG): Bridge the gap between information retrieval and generation. Leverage vector embeddings to create summaries, translate languages, or even write different creative text formats based on specific requests. This is the #1 application in LLM-powered apps.

5. Fraud and Anomaly Detection: Spot suspicious activity faster. Vector embeddings help identify unusual patterns in transactions, financial data, or even network traffic, leading to improved security and fraud prevention.

6. Search Result Ranking: Get the most relevant results first. Embeddings power search engines to understand your intent and rank results based on meaning, not just keyword matches.

7. Efficient Clustering: Group similar data points effortlessly. Vector embeddings enable efficient clustering of large datasets, revealing hidden patterns and facilitating further analysis.

And that's just the beginning! The potential of vector embeddings continues to expand, promising exciting solutions in areas like drug discovery, social network analysis, and more.


 

How does a vector database use vectors?

Let's explore their first superpower: semantic similarity. Unlike traditional keyword searches, vector databases understand meaning.

You can input a vector, and the database returns vectors representing the most similar meaning content, not just exact matches.

The classic example comes from the popular 2013 paper Efficient Estimation of Word Representations in Vector Space (word2vec): vector("king") - vector("man") + vector("woman") lands close to vector("queen").
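
You can reproduce that analogy with any pre-trained word-vector set; here is a small sketch using gensim's downloader (the GloVe model chosen is arbitrary and is fetched on first use).

```python
import gensim.downloader as api

# Downloads a small set of pre-trained 50-dimensional GloVe word vectors on first use.
vectors = api.load("glove-wiki-gigaword-50")

# The famous analogy: king - man + woman is closest to queen.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# e.g. [('queen', 0.85...)]
```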



Several algorithms can be used for calculating the distance or similarity between vectors, each with its advantages and limitations depending on the specific application and data characteristics. Here are some common ones:

Jaccard similarity
This compares the proportion of shared elements between two binary vectors (containing only 0s and 1s), often used for comparing sets or sparse data.



Hamming distance 

The Hamming distance between two strings or vectors of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of substitutions required to change one string into the other, or equivalently, the minimum number of errors that could have transformed one string into the other.



Euclidean distance: This is the most straightforward and intuitive method, calculating the straight-line distance between two points in multidimensional space. It's computationally efficient but sensitive to data scaling and dimensionality.

Manhattan distance: This measures the distance by summing the absolute differences between corresponding elements of the vectors. It's less sensitive to outliers than Euclidean distance but not as intuitive for representing geometric similarity.










Inner product: This is a mathematical operation that measures the degree of similarity or alignment between two vectors. It tells you how "close" two vectors are in the multidimensional space they inhabit.





Cosine similarity: This method measures the angle between two vectors, reflecting their directional similarity. It's independent of magnitude and useful when interpreting vectors as directions rather than exact positions.
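
For concreteness, here is how the metrics above can be computed with NumPy and SciPy. This is a minimal sketch on tiny hand-picked vectors; a real vector database computes these at scale with specialised indexes.

```python
import numpy as np
from scipy.spatial import distance

a = np.array([1.0, 0.0, 2.0, 3.0])
b = np.array([0.0, 1.0, 2.0, 5.0])

# Jaccard similarity on binary vectors: shared 1s divided by the union of 1s.
a_bin, b_bin = np.array([1, 0, 1, 1]), np.array([0, 1, 1, 1])
jaccard_sim = 1 - distance.jaccard(a_bin, b_bin)   # SciPy returns the dissimilarity

# Hamming distance: number of positions where the two vectors differ.
hamming = int(np.sum(a_bin != b_bin))

# Euclidean (straight-line) and Manhattan (sum of absolute differences) distances.
euclidean = np.linalg.norm(a - b)
manhattan = np.sum(np.abs(a - b))

# Inner product, and cosine similarity (which ignores magnitude and keeps only direction).
inner = np.dot(a, b)
cosine_sim = inner / (np.linalg.norm(a) * np.linalg.norm(b))

print(jaccard_sim, hamming, euclidean, manhattan, inner, cosine_sim)
```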



Conclusion

This post delves into the fascinating world of vector databases, equipping you with a solid understanding of their core concepts, vector creation methods, and similarity search algorithms.

In the next section, we'll dive into the unique storage and retrieval mechanisms employed by vector databases. Unlike traditional databases that rely on B-trees or hash indexes, vector databases utilize innovative approaches specifically designed for efficient vector searches. Get ready to explore a whole new level of data exploration!