
Wednesday, 5 March 2025

Building a Universal Java Client for Large Language Models


In today's rapidly evolving AI landscape, developers often need to work with multiple Large Language Model (LLM) providers to find the best solution for their specific use case. Whether you're exploring OpenAI's GPT models, Anthropic's Claude, or running local models via Ollama, having a unified interface can significantly simplify development and make it easier to switch between providers.

The Java LLM Client project provides exactly this: a clean, consistent API for interacting with various LLM providers through a single library. Let's explore how this library works and how you can use it in your Java applications.

Core Features

The library offers several key features that make working with LLMs easier:

  1. Unified Interface: Interact with different LLM providers through a consistent API
  2. Multiple Provider Support: Currently supports OpenAI, Anthropic, Google, Groq, and Ollama
  3. Chat Completions: Send messages and receive responses from language models
  4. Embeddings: Generate vector representations of text where supported
  5. Factory Pattern: Easily create service instances for different providers

Architecture Overview

The library is built around a few key interfaces and classes:

  • GenerativeAIService: The main interface for interacting with LLMs
  • GenerativeAIFactory: Factory interface for creating service instances
  • GenerativeAIDriverManager: Registry that manages available services
  • Provider-specific implementations in separate packages

This design follows the classic factory pattern, allowing you to:

  1. Register service factories with the GenerativeAIDriverManager
  2. Create service instances through the manager
  3. Use a consistent API to interact with different providers
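
To make these abstractions concrete, here is a minimal sketch of what the core interfaces might look like, inferred from the usage examples later in this post; the actual library may declare different names and signatures:

java
// Minimal sketch of the core abstractions, inferred from the usage examples in
// this post. Method names match the calls shown below; the response type names
// (ChatResponse, EmbeddingResponse) are assumptions, not the library's exact API.
// Each public interface would live in its own source file.
public interface GenerativeAIService {
    ChatResponse chat(ChatRequest request);                 // send a conversation, receive a reply
    EmbeddingResponse embedding(EmbeddingRequest request);  // vectorize text, where supported
}

public interface GenerativeAIFactory {
    // Build a service bound to a base URL and provider-specific properties (e.g. apiKey)
    GenerativeAIService create(String url, java.util.Map<String, Object> properties);
}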

Getting Started

To use the library, first add it to your Maven project:

xml
<dependency>
    <groupId>org.llm</groupId>
    <artifactId>llmapi</artifactId>
    <version>1.0.0</version>
</dependency>


Basic Usage Example

Here's how to set up and use the library:

java
// Register service providers
GenerativeAIDriverManager.registerService(OpenAIFactory.NAME, new OpenAIFactory());
GenerativeAIDriverManager.registerService(AnthropicAIFactory.NAME, new AnthropicAIFactory());
// Register more providers as needed

// Create an OpenAI service
Map<String, Object> properties = Map.of("apiKey", System.getenv("gpt_key"));
var service = GenerativeAIDriverManager.create(
        OpenAIFactory.NAME, "https://api.openai.com/", properties);

// Create and send a chat request
var message = new ChatMessage("user", "Hello, how are you?");
var conversation = new ChatRequest("gpt-4o-mini", List.of(message));
var reply = service.chat(conversation);
System.out.println(reply.message());

// Generate embeddings
var vector = service.embedding(
        new EmbeddingRequest("text-embedding-3-small", "How are you"));
System.out.println(Arrays.toString(vector.embedding()));

Working with Different Providers

OpenAI

java
Map<String, Object> properties = Map.of("apiKey", System.getenv("gpt_key"));
var service = GenerativeAIDriverManager.create(
        OpenAIFactory.NAME, "https://api.openai.com/", properties);

// Chat with GPT-4o mini
var conversation = new ChatRequest("gpt-4o-mini",
        List.of(new ChatMessage("user", "Hello, how are you?")));
var reply = service.chat(conversation);

Anthropic

java
Map<String, Object> properties = Map.of("apiKey", System.getenv("ANTHROPIC_API_KEY"));
var service = GenerativeAIDriverManager.create(
        AnthropicAIFactory.NAME, "https://api.anthropic.com", properties);

// Chat with Claude
var conversation = new ChatRequest("claude-3-7-sonnet-20250219",
        List.of(new ChatMessage("user", "Hello, how are you?")));
var reply = service.chat(conversation);

Ollama (Local Models)

java
// No API key needed for local models
Map<String, Object> properties = Map.of();
var service = GenerativeAIDriverManager.create(
        OllamaFactory.NAME, "http://localhost:11434", properties);

// Chat with a locally hosted Llama model
var conversation = new ChatRequest("llama3.2",
        List.of(new ChatMessage("user", "Hello, how are you?")));
var reply = service.chat(conversation);

Under the Hood

The library uses an RPC (Remote Procedure Call) client to handle the HTTP communication with various APIs. Each provider's implementation:

  1. Creates appropriate request objects with the required format
  2. Sends requests to the corresponding API endpoints
  3. Parses responses into a consistent format
  4. Handles errors gracefully

The RpcBuilder creates proxy instances of service interfaces, handling the HTTP communication details so you don't have to.
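
The real RpcBuilder is internal to the library, but the general technique can be shown with Java's built-in dynamic proxies. The following is an illustrative sketch (the ChatApi interface and all names here are hypothetical), not the library's actual code:

java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public final class RpcSketch {

    // Hypothetical chat API; real provider interfaces map methods to HTTP endpoints.
    interface ChatApi {
        String complete(String prompt);
    }

    @SuppressWarnings("unchecked")
    static <T> T proxy(Class<T> api, String baseUrl) {
        InvocationHandler handler = (proxyObj, method, args) -> {
            // A real implementation would serialize args to JSON, POST them to the
            // endpoint mapped to this method, and deserialize the HTTP response.
            System.out.println("Would POST to " + baseUrl + " for method " + method.getName());
            return null;
        };
        return (T) Proxy.newProxyInstance(api.getClassLoader(), new Class<?>[] {api}, handler);
    }

    public static void main(String[] args) {
        ChatApi chat = proxy(ChatApi.class, "https://api.example.com/v1/chat");
        chat.complete("Hello"); // the proxy intercepts the call
    }
}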

Supported Models

The library currently supports several models across different providers:

  • OpenAI: all models
  • Anthropic: all models
  • Google: gemini-2.0-flash
  • Groq: all models
  • Ollama: any model you have pulled locally

Extending the Library

One of the strengths of this design is how easily it can be extended to support new providers or features:

  1. Create a new implementation of GenerativeAIFactory
  2. Implement GenerativeAIService for the new provider
  3. Create necessary request/response models
  4. Register the new factory with GenerativeAIDriverManager
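
As a rough illustration of these steps, a new provider plug-in might look like the sketch below; the class names MyProviderFactory and MyProviderService are hypothetical, and the factory signature is assumed from how GenerativeAIDriverManager.create is used earlier in this post:

java
// Hypothetical plug-in for a new provider; names and signatures are illustrative.
import java.util.Map;

public class MyProviderFactory implements GenerativeAIFactory {
    public static final String NAME = "MyProvider";

    @Override
    public GenerativeAIService create(String url, Map<String, Object> properties) {
        // Return your GenerativeAIService implementation for the new provider,
        // mapping chat/embedding calls onto its HTTP API
        return new MyProviderService(url, properties);
    }
}

// Registration then follows the same pattern as the built-in providers:
// GenerativeAIDriverManager.registerService(MyProviderFactory.NAME, new MyProviderFactory());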

Conclusion

The Java LLM Client provides a clean, consistent way to work with multiple LLM providers in Java applications. By abstracting away the differences between APIs, it allows developers to focus on their application logic rather than the details of each provider's implementation.

Whether you're building a chatbot, generating embeddings for semantic search, or experimenting with different LLM providers, this library offers a straightforward way to integrate these capabilities into your Java applications.

The project's use of standard Java patterns like factories and interfaces makes it easy to understand and extend, while its modular design allows you to use only the providers you need. As the LLM ecosystem continues to evolve, this type of abstraction layer will become increasingly valuable for developers looking to build flexible, future-proof applications.


Link to the GitHub project: llmapi


Sunday, 7 July 2024

Top large language models to watch

The LLM landscape is exploding! With the immense potential of large language models, competition is fierce as companies race to develop the most powerful and innovative models. Training these models presents a lucrative business opportunity, attracting major players and startups alike.

Keeping track of the leaders is challenging. The LLM space is highly competitive, making it difficult to identify a single frontrunner. New versions are released constantly, pushing the boundaries of what's possible. While some might see this as a race to the bottom, it's more accurate to view it as rapid innovation that will ultimately benefit everyone.


Top companies as of July 2024





The diagram above is split into two groups: one for commercial providers, the other for hybrid (commercial/open weights) providers.

Commercial

OpenAI

OpenAI is the poster child of LLMs, with its series of GPT models. It was the first provider of large-scale consumer LLMs.



GPT-4o is the flagship model, and all the models are available via API. OpenAI is very well funded, with Microsoft as its major backer.

More details about the models can be found at OpenAI Models

The research paper describing the GPT-4 model is the GPT-4 Technical Report. Other papers covering the GPT series include:

  • GPT 1.0
  • GPT 2.0
  • Language Models are Few-Shot Learners
  • Evaluating Large Language Models Trained on Code

Amazon

Amazon has a family of models called "Titan". The Amazon Titan family of models incorporates Amazon's 25 years of experience innovating with AI and machine learning across its business. Amazon Titan foundation models (FMs) provide customers with a breadth of high-performing image, multimodal, and text model choices, via a fully managed API.


More details about the models can be found at Amazon Models

No research papers are available about Amazon's LLM details; everything is kept proprietary to maintain a competitive edge.


Anthropic

Anthropic was co-founded by former OpenAI employees.


Anthropic's latest offering, Claude 3.5 Sonnet, has generated significant buzz. This powerful language model builds upon their previous success with Claude 3 Opus and is claimed to outperform OpenAI's GPT-4o, particularly in coding tasks.
Anthropic is also very well funded; Amazon and Google are major investors.

More details about the models can be found at Anthropic Models

Anthropic's models are based on an OpenAI-style architecture, but the company is focused on a few research principles such as AI as Systematic Science, safety, and scaling.

One popular research paper from Anthropic is mapping-mind-language-model

MosaicML

MosaicML, co-founded by an MIT alumnus and a professor, made deep-learning models faster and more efficient. It was acquired by Databricks. 

Mosaic Pretrained Transformers (MPT) are GPT-style models with some special features -- Flash Attention for efficiency, ALiBi for context length extrapolation, and stability improvements to mitigate loss spikes.


More details about the models can be found at mosaic ml

Some popular research papers are Train Short, Test Long and FlashAttention


InflectionAI

Inflection AI focuses on developing a large language model (LLM) for personal use, called Inflection.



Not many details are available about how the model was trained, but they claim it is the world's most empathetic Large Language Model (LLM).

More details about the model can be found at inflection-2-5


Hybrid/Open Source

Google

Google authored the famous paper Attention Is All You Need, which became the kernel of all the LLMs we see today.
Google was releasing LLMs to the community before ChatGPT arrived; BERT, one of the first transformer encoder models, became the foundation for many later LLMs.







Google offers large language models (LLMs) across a spectrum of availability: some models are fully commercial, accessible only through APIs, while others are released with open weights.

The Gemini family exemplifies this, with variants like Ultra, Pro (introduced in v1.5), Flash, and Nano catering to different needs in terms of size and processing power.

In contrast, Gemma is Google's open-source LLM family. It's designed for developers and researchers and comes in various sizes (e.g., Gemma 2B and 7B) for flexibility.


Lots of reading material is available from Google on its LLMs and the Gemma models.


Meta

Meta builds the Llama series of models. These are open source, and Meta designed Llama to be efficient, achieving good performance while being trained on publicly available datasets.



Llama 3 is the most recent and state of the art. These models are trained by Meta and made available via various hosting platforms. Llama 3 has been extended by other vendors such as Gradient, Nvidia, and Dolphin.

Details about the model are available at llama3

Meta has published many papers since the first version of the model.




Mistral

Mistral is a France-based company, and they release model weights under Apache 2.0.
Mistral strives to create efficient models that require less computational power than some competitors, making them accessible to a wider range of users.

Mistral's key innovation is Grouped Query Attention (GQA). Some of its recent models are based on a Mixture of Experts (MoE) architecture.




More details about the models are available at Mistral models



DataBricks

Databricks is building open-source models based on MoE. The most recent and state-of-the-art model is DBRX.





Details about the model are available at introducing-dbrx-new-state-art-open-llm



Cohere

Cohere is a Canada-based company. They build a model called Command R, a state-of-the-art RAG-optimized model designed to tackle enterprise-grade workloads.



More details about the model can be found at Command-R

One popular research paper is RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs.

Microsoft

While Microsoft leverages OpenAI's powerful GPT-4 language models for some functionalities, they've also made significant contributions to open-source AI with the Phi-3 family of models.

Phi-3 models are a type of small language model (SLM), specifically designed for efficiency and performance on mobile devices and other resource-constrained environments.



More details about the models can be found at phi-3

Some popular research papers related to the Phi series are Textbooks Are All You Need, Textbooks Are All You Need II, and the Phi-3 Technical Report.


Conclusion

We are witnessing an interesting time where many large language models (LLMs) are available for building apps, accessible to both consumers and developers. Predicting the dominant player is difficult due to the rapidly changing landscape.

One key concept to grasp is that the GenAI stack is multifaceted. Foundation models are just one layer, and they can be quite expensive due to hardware requirements. Training a foundation model can easily cost millions of dollars, making it difficult for companies to maintain a competitive edge.

As software engineers, we need to leverage this technology by selecting the best model for each specific use case. Defining "best" can be subjective, and the answer often depends on various factors.

Here's a crucial consideration: while using the top-performing LLM might be tempting, it's vital to maintain a flexible architecture. This allows you to easily switch to newer LLMs, similar to how we switch between databases or other vendor-specific technologies.

In the next part of this blog, I'll explore the inference side of LLMs, a fascinating area that will ultimately determine the return on investment (ROI) for companies making significant investments in this technology.