Sunday, 7 July 2024

Top large language models to watch

The LLM landscape is exploding! With the immense potential of large language models, competition is fierce as companies race to develop the most powerful and innovative models. Training these models presents a lucrative business opportunity, attracting major players and startups alike.

Keeping track of the leaders is challenging. The LLM space is highly competitive, making it difficult to identify a single frontrunner. New versions are released constantly, pushing the boundaries of what's possible. While some might see this as a race to the bottom, it's more accurate to view it as rapid innovation that will ultimately benefit everyone.


Top companies as of July 2024





The diagram above is split into two groups: one for commercial models and one for hybrid (commercial/open weights) models.

Commercial

OpenAI

This is the poster child of LLMs, with its series of GPT* models. OpenAI was the first large-scale provider of consumer LLMs.



GPT-4o is the flagship model, and all the models are available via API. OpenAI is very well funded, with Microsoft as a major backer.

More details about the models can be found at Open AI Model 

The research paper describing the GPT-4 model is available at 

GPT-4 Technical Report 

 GPT 1.0

GPT 2.0

Language Models are Few-Shot Learners

Evaluating Large Language Models Trained on Code

Amazon

Amazon has a family of models called "Titan". The Amazon Titan family of models incorporates Amazon's 25 years of experience innovating with AI and machine learning across its business. Amazon Titan foundation models (FMs) provide customers with a breadth of high-performing image, multimodal, and text model choices, via a fully managed API.


More details about the models can be found at Amazon Models

No research papers are available about Amazon's LLM internals; it is all kept proprietary to maintain a competitive edge.


Anthropic

Anthropic was co-founded by former OpenAI employees.


Anthropic's latest offering, Claude 3.5 Sonnet, has generated significant buzz. This powerful language model builds upon their previous success with Claude 3 Opus and is claimed to outperform OpenAI's GPT-4o, particularly in coding tasks.
Anthropic is also very well funded, with Amazon and Google as major investors.

More details about the models can be found at Anthropic Models

Anthropic's models are based on an OpenAI-style architecture, but the company is organised around a few research principles, such as AI as systematic science, safety, and scaling.

One of the popular research papers from Anthropic is mapping-mind-language-model

MosaicML

MosaicML, co-founded by an MIT alumnus and a professor, made deep-learning models faster and more efficient. It was acquired by Databricks. 

Mosaic Pretrained Transformers (MPT) are GPT-style models with some special features -- Flash Attention for efficiency, ALiBi for context length extrapolation, and stability improvements to mitigate loss spikes.


More details about the models can be found at mosaic ml

Some popular research papers are Train Short, Test Long and Flash Attention


InflectionAI

Inflection AI focuses on developing a large language model (LLM) for personal use called Inflection.



Not many details are available about how the model was trained, but they claim it is the world's top empathetic Large Language Model (LLM).

More details about the model can be found at inflection-2-5


Hybrid/Open Source

Google

Google is the originator of the famous paper Attention Is All You Need, which became the kernel of all the LLMs we see today. 
Google had been releasing LLMs to the community before ChatGPT arrived; BERT was one of the first models built on the transformer encoder and became the foundation for many LLMs we see today.







Google offers large language models (LLMs) across a spectrum of availability: some models are fully commercial and accessible only via API, while others are released with open weights.

The Gemini family exemplifies the commercial side, with variants like Ultra, Pro, Flash (introduced in v1.5), and Nano catering to different needs in terms of size and processing power.

In contrast, Gemma is Google's open-source LLM family. It's designed for developers and researchers and comes in various sizes (e.g., Gemma 2B and 7B) for flexibility.


Lots of reading material is available from Google on the LLM and Gemma models; some of the popular ones are:


Meta

Meta builds the Llama series of models. These are open source, and Meta designed Llama to be efficient, achieving good performance while being trained on publicly available datasets.



Llama 3 is the most recent and state of the art. These models are trained by Meta and made available via various hosting platforms. Llama 3 is extended by other vendors like Gradient, Nvidia, Dolphin, etc.

Details about the model are available at llama3

Meta has published lots of papers since the first version of the model; some of the popular ones are:




Mistral

Mistral is a French company, and they release many of their model weights under Apache 2.0.
Mistral strives to create efficient models that require less computational power compared to some competitors. This makes them more accessible to a wider range of users.

Mistral's innovation is around Grouped-Query Attention (GQA). Some of the recent models are based on Mixture of Experts (MoE).




More details about the models are available at Mistral models



DataBricks

Databricks is building open-source models based on MoE. The most recent and state-of-the-art model is DBRX.





Details about the model are available at introducing-dbrx-new-state-art-open-llm


Some popular research papers are:

Cohere

Cohere is a Canadian company. They build a model called Command R, a state-of-the-art RAG-optimized model designed to tackle enterprise-grade workloads.



More details about the model can be found at Command-R

One popular research paper is RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs

 

Microsoft

While Microsoft leverages OpenAI's powerful GPT-4 language models for some functionalities, they've also made significant contributions to open-source AI with the Phi-3 family of models.

Phi-3 models are a type of small language model (SLM), specifically designed for efficiency and performance on mobile devices and other resource-constrained environments.



More details about the models can be found at phi-3

Some popular research papers related to the Phi series are Textbooks Are All You Need, Textbooks Are All You Need II, and the Phi-3 Technical Report


Conclusion

We are witnessing an interesting time where many large language models (LLMs) are available for building apps, accessible to both consumers and developers. Predicting the dominant player is difficult due to the rapidly changing landscape.

One key concept to grasp is that the GENAI stack is multifaceted. Foundation models are just one layer, and they can be quite expensive due to hardware requirements. Training a foundation model can easily cost millions of dollars, making it difficult for companies to maintain a competitive edge.

As software engineers, we need to leverage this technology by selecting the best model for each specific use case. Defining "best" can be subjective, and the answer often depends on various factors.

Here's a crucial consideration: while using the top-performing LLM might be tempting, it's vital to maintain a flexible architecture. This allows you to easily switch to newer LLMs, similar to how we switch between databases or other vendor-specific technologies.

In the next part of this blog, I'll explore the inference side of LLMs, a fascinating area that will ultimately determine the return on investment (ROI) for companies making significant investments in this technology.

Saturday, 23 March 2024

Say Goodbye to Boilerplate: Annotating Your Way to Powerful API Clients

Ever feel tired of writing boilerplate code for every single REST API call?

Constructing URLs, managing parameters, setting headers, crafting bodies - it's all time-consuming and repetitive. Wouldn't it be amazing if, like ORMs for databases, there was a way to simplify API interactions?

Well, you're in luck!

This post will introduce you to the concept of annotation-based RPC clients, a powerful tool that can dramatically reduce your API coding burden. We'll explore how annotations can automate the tedious parts of API calls, freeing you to focus on the real logic of your application.


I will use a search engine API as an example to demonstrate what it looks like.


public interface GoogleSearchService {

    @XGET("/customsearch/v1")
    @XHeaders({"Content-Type: application/json"})
    RpcReply<Map<String, Object>> search(@XQuery("key") String apiKey,
                                         @XQuery("cx") String context,
                                         @XQuery(value = "q", encoded = true) String searchTerm);
}


This approach uses annotations to declaratively specify everything about the API call. This declarative style offers several advantages:

  • Familiar Programming Experience: The syntax feels like you're interacting with a normal programming API, making it intuitive and easy to learn.
  • Abstraction from Implementation Details: You don't need to worry about the internal names of parameters or the specifics of the underlying protocol. The annotations handle those details for you.
  • Simplified Error Handling: The RpcReply abstraction encapsulates error handling, providing a clean and consistent way to manage potential issues.
  • Seamless Asynchronous Support: Adding asynchronous execution becomes straightforward, allowing you to make non-blocking API calls without extra effort.
  • Enhanced Testability: Everything related to the API interaction is defined through the interface, making unit testing a breeze. You can easily mock the interface behavior to isolate your application logic.

Let's have a look at the RpcReply abstraction.


public interface RpcReply<T> {

    T value();
    void execute();
    int statusCode();
    boolean isSuccess();
    Optional<String> reply();
    Optional<String> error();
    Optional<Exception> exception();
}


The RpcReply object plays a crucial role in this approach. It's a lazy object, meaning its execution is deferred until explicitly requested. This allows you to choose between synchronous or asynchronous execution depending on your needs.

More importantly, RpcReply encapsulates error handling. It takes care of any errors that might occur during the actual API call. You can access the response data or any potential errors through a clean and consistent interface provided by RpcReply. This eliminates the need for manual error handling code within your application logic.
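To make the lazy-execution idea concrete, here is a simplified, self-contained stand-in for RpcReply (a sketch, not the library's actual implementation; the names lazyReply and LazyReplyDemo are illustrative). Nothing runs until execute() is called, and failures are captured inside the object instead of being thrown at the call site.

```java
import java.util.Optional;
import java.util.function.Supplier;

public class LazyReplyDemo {

    // Simplified stand-in for the RpcReply interface shown above
    interface RpcReply<T> {
        T value();
        void execute();
        boolean isSuccess();
        Optional<Exception> exception();
    }

    // Defers the underlying call until execute(); captures failures instead of throwing
    static RpcReply<String> lazyReply(Supplier<String> call) {
        return new RpcReply<>() {
            private String result;
            private Exception failure;
            private boolean executed;

            public void execute() {
                executed = true;
                try {
                    result = call.get();
                } catch (Exception e) {
                    failure = e; // error is captured, not thrown at the call site
                }
            }

            public String value() {
                if (!executed) throw new IllegalStateException("execute() was not called");
                return result;
            }

            public boolean isSuccess() { return executed && failure == null; }

            public Optional<Exception> exception() { return Optional.ofNullable(failure); }
        };
    }

    public static void main(String[] args) {
        RpcReply<String> reply = lazyReply(() -> "pong");
        // Nothing has run yet; the caller decides when (and on which thread) to execute
        reply.execute();
        System.out.println(reply.isSuccess() + " -> " + reply.value()); // true -> pong
    }
}
```

Because execution is deferred, the same reply object could just as easily be handed to an executor for asynchronous execution.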


One more thing is missing: a reply based on a Map is not neat. It would be better as a strongly typed object like GoogleSearchResult.

In essence, using strongly typed objects provides a cleaner, more reliable, and more maintainable way to work with API responses.


Let's add strong typing to our API; it will look something like this.


public interface GoogleSearchService {

    @XGET("/customsearch/v1")
    @XHeaders({"Content-Type: application/json"})
    RpcReply<Map<String, Object>> search(@XQuery("key") String apiKey,
                                         @XQuery("cx") String context,
                                         @XQuery(value = "q", encoded = true) String searchTerm);

    @XGET("/customsearch/v1")
    @XHeaders({"Content-Type: application/json"})
    RpcReply<GoogleSearchResult> find(@XQuery("key") String apiKey,
                                      @XQuery("cx") String context,
                                      @XQuery(value = "q", encoded = true) String searchTerm);
}


Let's dive into how we can write code that interprets declarative definitions and translates them into actual HTTP API calls.

There are several approaches to generate the code behind this interface:

  • Build Plugin: This method utilizes a build plugin to generate actual code during the compilation process.
  • Dynamic Proxy: This approach employs a dynamic proxy that can leverage the provided metadata to generate real API calls at runtime.

Many frameworks leverage either of these options, or even a combination of both. In the solution I'll share, we'll be using Dynamic Proxy.
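To make the dynamic-proxy option concrete, here is a minimal, self-contained sketch of the idea. The XGET/XQuery annotations below are simplified stand-ins for the ones used in this post, and the handler returns the URL it would request instead of making a real HTTP call (and skips URL encoding), so this is an illustration of the mechanism, not the actual library.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Parameter;
import java.lang.reflect.Proxy;

public class ProxySketch {

    // Illustrative annotations, modeled on the XGET/XQuery used in this post
    @Retention(RetentionPolicy.RUNTIME)
    @interface XGET { String value(); }

    @Retention(RetentionPolicy.RUNTIME)
    @interface XQuery { String value(); }

    interface SearchService {
        @XGET("/customsearch/v1")
        String search(@XQuery("q") String term);
    }

    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> service, String baseUrl) {
        return (T) Proxy.newProxyInstance(
                service.getClassLoader(),
                new Class<?>[]{service},
                (proxy, method, args) -> {
                    // Read the path from the method-level annotation
                    StringBuilder url = new StringBuilder(baseUrl)
                            .append(method.getAnnotation(XGET.class).value());
                    // Read query parameter names from the parameter-level annotations
                    Parameter[] params = method.getParameters();
                    for (int i = 0; i < params.length; i++) {
                        url.append(i == 0 ? '?' : '&')
                           .append(params[i].getAnnotation(XQuery.class).value())
                           .append('=')
                           .append(args[i]);
                    }
                    // A real implementation would execute the HTTP call here;
                    // this sketch just returns the URL it would have requested.
                    return url.toString();
                });
    }

    public static void main(String[] args) {
        SearchService s = create(SearchService.class, "https://www.googleapis.com");
        System.out.println(s.search("hello"));
        // https://www.googleapis.com/customsearch/v1?q=hello
    }
}
```

The key point is that the interface itself carries all the metadata: java.lang.reflect.Proxy intercepts every call and hands the Method plus its arguments to one handler, which interprets the annotations at runtime.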


I have written about dynamic proxies in the past; you can read these if you are new to the concept or need a refresher:

dynamic-proxy

more-on-dynamic-proxy



High Level Sketch of implementation




In this step, we create a proxy object. This proxy will act as an intermediary between the client application and the real server. It can intercept all outgoing calls initiated by the client app.




During this stage, the client interacts with a remote proxy object. This proxy acts as a transparent intermediary, intercepting all outgoing calls.

Here's what the proxy does:

  • Builds metadata: The proxy can gather additional information about each call, such as timestamps, user IDs, or call identifiers. This metadata can be valuable for debugging, logging, or performance analysis.
  • Makes the server call: Once it has the necessary information, the proxy forwards the request to the actual server.
  • Handles errors: If the server call encounters any issues, the proxy can gracefully handle the error and provide a meaningful response back to the client.
  • Parses the response: The proxy can interpret the server's response and potentially transform it into a format that's easier for the client to understand. This can include type safety checks to ensure the returned data matches the expected format.


Code snippet that builds the RPC call stack:


HttpCallStack callStack = new HttpCallStack(builder.client());
_processMethodTags(method, callStack);
_processMethodParams(method, args, callStack);
callStack.returnType = returnTypes(method);
return callStack;


Full code of proxy is available at ServiceProxy.java


Let's look at a few examples of client service interfaces.


public interface DuckDuckGoSearch {
@XGET("/ac")
@XHeaders({"Content-Type: application/json"})
RpcReply<List<Map<String, Object>>> suggestions(@XQuery(value = "q", encoded = true) String searchTerm);

@XGET("/html")
@XHeaders({"Content-Type: application/json"})
RpcReply<String> search(@XQuery(value = "q", encoded = true) String searchTerm);

}


public interface DuckDuckGoService {
@XGET("/search.json")
@XHeaders({"Content-Type: application/json"})
RpcReply<Map<String, Object>> search(@XQuery("api_key") String apiKey, @XQuery("engine") String engine, @XQuery(value = "q", encoded = true) String searchTerm);

@XGET("/search.json")
@XHeaders({"Content-Type: application/json"})
RpcReply<DuckDuckGoSearchResult> query(@XQuery("api_key") String apiKey, @XQuery("engine") String engine, @XQuery(value = "q", encoded = true) String searchTerm);
}

public interface GoogleSearchService {

@XGET("/customsearch/v1")
@XHeaders({"Content-Type: application/json"})
RpcReply<Map<String, Object>> search(@XQuery("key") String apiKey, @XQuery("cx") String context, @XQuery(value = "q", encoded = true) String searchTerm);

@XGET("/customsearch/v1")
@XHeaders({"Content-Type: application/json"})
RpcReply<GoogleSearchResult> find(@XQuery("key") String apiKey, @XQuery("cx") String context, @XQuery(value = "q", encoded = true) String searchTerm);


}


Client code is very lean; it looks something like this:

RpcBuilder builder = new RpcBuilder().serviceUrl("https://www.googleapis.com");
GoogleSearchService service = builder.create(GoogleSearchService.class);

String key = System.getenv("google_search");

RpcReply<Map<String, Object>> r = service.search(key, "61368983a3efc4386", "large language model");
r.execute();
Map<String, Object> value = r.value();
List<Map<String, Object>> searchResult = (List<Map<String, Object>>) value.get("items");

searchResult.forEach(v -> {
System.out.println(v.get("title") + " -> " + v.get("link"));
});

RpcReply<GoogleSearchResult> searchResults = service.find(key, "61368983a3efc4386", "large language model");

searchResults.execute();

searchResults.value().items.forEach(System.out::println);


Full client code is available @ APIClient.java


Conclusion

Dynamic proxies offer a powerful approach to abstraction, providing several benefits:

  • Protocol Independence: The underlying communication protocol can switch from HTTP to something entirely new (e.g., gRPC, custom protocols) without requiring any changes to the client code. The dynamic proxy acts as an intermediary, insulating the client from the specifics of the protocol being used.
  • Enhanced Functionality: Dynamic proxies can add valuable features to client interactions. This can include:
    • Caching: The proxy can store responses to frequently accessed data, reducing load on the server and improving performance.
    • Throttling: The proxy can limit the rate of calls made to the server to prevent overloading or comply with usage quotas.
    • Telemetry: The proxy can collect data about client-server interactions, providing insights into system performance and user behavior.

By leveraging dynamic proxies, you can achieve a clean separation between the client's core logic and the communication details. This promotes loose coupling, making your code more adaptable, maintainable, and easier to test.

This approach leads us towards a concept similar to API Relational Mapping (ARM) (though this term isn't widely used). Think of it as a specialized layer that translates between API calls and the underlying functionalities they trigger.


Full client library is available @ rpcclient.

Sunday, 4 February 2024

Demystifying Vector Databases: The Magic of Meaningful Search

What is a vector database?


The digital world is awash in unstructured data: text documents, social media posts, images, videos, audio recordings, and more. While traditional databases excel at storing and retrieving neatly organised data, they struggle with this messy, ever-growing sea of information. Enter vector databases, a new breed of database designed to unlock the hidden meaning within unstructured data.

While Large Language Models (LLMs) have brought vector databases to the forefront, their applications extend far beyond this exciting field. Recommendation systems use vector databases to suggest products you might like based on your past purchases and browsing history, even if you haven't explicitly searched for those items. Fraud detection systems leverage them to identify suspicious patterns in financial transactions, helping catch anomalies that might slip through traditional filters.


But how do these databases work their magic? It all starts with a clever trick: representing data as multi-dimensional vectors, essentially numerical lists. Imagine every data point as a location on a map. Nearby points on the map represent similar data, regardless of the original format (text, image, etc.). This is achieved through techniques like word embeddings, where words with similar meanings are mapped to close points in the vector space.

Traditional keyword-based searches often miss the mark. Imagine searching for "small, fleshy, and seedless" fruits. No exact match exists, leaving you frustrated. But a vector database understands the underlying meaning of your query.

It finds data points closest to the "small, fleshy, and seedless" vector, leading you to grapes or kiwis, even though those words weren't explicitly used. This semantic search capability unlocks a new level of data exploration and analysis.


Search - Legacy vs Semantic



 

How are vectors created?

But how do these magical numbers come to life? Enter embeddings, numerical representations of data points created by deep learning models. Imagine feeding a vast collection of text documents to a sophisticated neural network. It analyses the relationships between words, their context, and their usage, eventually generating unique vector representations, or embeddings, for each word. These embeddings capture not just the literal meaning of the word, but also its nuances and semantic connections.




Generally, the last layer of deep learning models focuses on specific tasks like prediction or classification. But the true treasure trove of knowledge lies in the second-to-last layer, often called the bottleneck or hidden layer.

This layer holds a condensed representation of the input data, capturing the essential features and relationships learned during training. By strategically removing the last layer and accessing the information in this penultimate layer, we can extract vector embeddings that encapsulate the model's understanding of the data.

Higher dimensionality captures more information but requires more storage and computation, while lower dimensionality is space-efficient but might miss some nuances.

The key is to find the right balance between the dimensionality (size) of the embeddings and the desired level of detail.

Forget training your own model! The world after ChatGPT offers a wealth of ready-made embedding models.






How to get embeddings?





Use cases vectors solve

Get ready to explore the diverse problems solvable with vector embeddings! These powerful representations go beyond text, unlocking:

1. Semantic Search: Dive deeper than keywords. Find images, videos, or audio similar to your intent, not just literal phrases. Imagine searching for "peaceful nature scene" and discovering breathtaking waterfalls instead of generic landscapes.

2. Data Similarity Search: Uncover hidden connections across non-text data. Quickly identify similar products, faces, or even medical scans, regardless of format.

3. Personalised Recommendations: Get suggestions that truly understand you. Vector embeddings power recommendation systems that learn your preferences and suggest items you'll genuinely love, not just similar purchases

4. Retrieval-Augmented Generation (RAG): Bridge the gap between information retrieval and generation. Leverage vector embeddings to create summaries, translate languages, or even write different creative text formats based on specific requests. This is the #1 application of LLM-powered apps.

5. Fraud and Anomaly Detection: Spot suspicious activity faster. Vector embeddings help identify unusual patterns in transactions, financial data, or even network traffic, leading to improved security and fraud prevention.

6. Search Result Ranking: Get the most relevant results first. Embeddings power search engines to understand your intent and rank results based on meaning, not just keyword matches.

7. Efficient Clustering: Group similar data points effortlessly. Vector embeddings enable efficient clustering of large datasets, revealing hidden patterns and facilitating further analysis.

And that's just the beginning! The potential of vector embeddings continues to expand, promising exciting solutions in areas like drug discovery, social network analysis, and more.


 

How does a vector database use vectors?

Let's explore their first superpower: semantic similarity. Unlike traditional keyword searches, vector databases understand meaning.

You can input a vector, and the database returns vectors representing the most similar meaning content, not just exact matches.

This is the classic example from a popular paper written in 2013: Efficient Estimation of Word Representations in Vector Space



Several algorithms can be used for calculating vector difference, each with its advantages and limitations depending on the specific application and data characteristics. Here are some common ones:

Jaccard similarity
This compares the proportion of shared elements between two binary vectors (containing only 0s and 1s), often used for comparing sets or sparse data.
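As a sketch, Jaccard similarity for binary vectors is the count of positions where both bits are set, divided by the count of positions where at least one bit is set (class and method names here are illustrative):

```java
public class JaccardSimilarity {

    // Jaccard similarity for binary vectors: |A ∩ B| / |A ∪ B|
    public static double jaccard(int[] a, int[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("vectors must be the same length");
        int intersection = 0, union = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == 1 && b[i] == 1) intersection++;
            if (a[i] == 1 || b[i] == 1) union++;
        }
        return union == 0 ? 0.0 : (double) intersection / union;
    }

    public static void main(String[] args) {
        int[] a = {1, 1, 0, 1};
        int[] b = {1, 0, 0, 1};
        // 2 positions are set in both, 3 positions are set in at least one
        System.out.println(jaccard(a, b)); // 0.666...
    }
}
```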



Hamming distance

The Hamming distance between two strings or vectors of equal length is the number of positions at which the corresponding symbols differ. In other words, it measures the minimum number of substitutions required to change one string into the other, or equivalently, the minimum number of errors that could have transformed one string into the other.
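A minimal sketch of Hamming distance over equal-length integer vectors (names are illustrative):

```java
public class HammingDistance {

    // Hamming distance: number of positions where the two vectors differ
    public static int hamming(int[] a, int[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("vectors must be the same length");
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) distance++;
        }
        return distance;
    }

    public static void main(String[] args) {
        // The vectors differ at positions 1 and 2
        System.out.println(hamming(new int[]{1, 0, 1, 1}, new int[]{1, 1, 0, 1})); // 2
    }
}
```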



Euclidean distance: This is the most straightforward and intuitive method, calculating the straight-line distance between two points in multidimensional space. It's computationally efficient but sensitive to data scaling and dimensionality.

Manhattan distance: This measures the distance by summing the absolute differences between corresponding elements of the vectors. It's less sensitive to outliers than Euclidean distance but not as intuitive for representing geometric similarity.
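Both distances are straightforward to compute: Euclidean is the square root of the sum of squared differences, Manhattan the sum of absolute differences. A sketch (names illustrative):

```java
public class VectorDistances {

    // Euclidean distance: straight-line distance, sqrt of the sum of squared differences
    public static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Manhattan distance: sum of absolute differences along each dimension
    public static double manhattan(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {0, 0};
        double[] b = {3, 4};
        System.out.println(euclidean(a, b)); // 5.0 (the 3-4-5 triangle)
        System.out.println(manhattan(a, b)); // 7.0
    }
}
```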










Inner product: This is a mathematical operation that measures the degree of similarity or alignment between two vectors. It tells you how "close" two vectors are in the multidimensional space they inhabit.





Cosine similarity: This method measures the angle between two vectors, reflecting their directional similarity. It's independent of magnitude and useful when interpreting vectors as directions rather than exact positions.
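Cosine similarity is just the inner product normalized by the two vector magnitudes, which is why it ignores length and keeps only direction. A sketch covering both metrics (names illustrative):

```java
public class CosineSimilarity {

    // Inner (dot) product: sum of element-wise products
    public static double dot(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Cosine similarity: dot product divided by both magnitudes,
    // so only the angle between the vectors matters, not their length
    public static double cosine(double[] a, double[] b) {
        double normA = Math.sqrt(dot(a, a));
        double normB = Math.sqrt(dot(b, b));
        return dot(a, b) / (normA * normB);
    }

    public static void main(String[] args) {
        double[] a = {1, 0};
        double[] b = {0, 1};
        double[] c = {2, 0};
        System.out.println(cosine(a, b)); // 0.0: orthogonal vectors
        System.out.println(cosine(a, c)); // 1.0: same direction, magnitude ignored
    }
}
```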



Conclusion

This post delves into the fascinating world of vector databases, equipping you with a solid understanding of their core concepts, vector creation methods, and similarity search algorithms.

In the next section, we'll dive into the unique storage and retrieval mechanisms employed by vector databases. Unlike traditional databases that rely on B-trees or hash indexes, vector databases utilize innovative approaches specifically designed for efficient vector searches. Get ready to explore a whole new level of data exploration!