Thursday, 3 April 2025

Teaching LLMs to Reason: The Journey from Basic Prompting to Self-Generated Examples

In recent years, Large Language Models (LLMs) have made remarkable strides in their ability to reason—to break down complex problems, apply logic systematically, and arrive at well-justified conclusions. This post explores the fascinating evolution of reasoning mechanisms in LLMs, tracking the progression from basic pattern-matching to sophisticated reasoning techniques that approach human-like problem-solving abilities.




The evolution of reasoning in Large Language Models from pattern matching to advanced reasoning techniques

The Major Breakthroughs in LLM Reasoning

| Date | Research | Key Innovation | Impact |
| --- | --- | --- | --- |
| Jan 2022 | Chain-of-Thought Prompting (Wei et al.) | Breaking problems into explicit steps | Doubled performance on complex reasoning tasks |
| Mar 2022 | Self-Consistency (Wang et al.) | Multiple reasoning paths with majority voting | +10-18% improvement across reasoning tasks |
| Nov 2022 | LLMs as Prompt Engineers (Zhou et al.) | Models generating and optimizing their own prompts | Outperformed human-crafted prompts |
| Oct 2023 | Analogical Reasoning (ICLR 2024) | Self-generated examples for new problems | Eliminated need for human-created examples |




The Reasoning Challenge in LLMs

Early LLMs excelled at pattern recognition but struggled with multi-step reasoning. When faced with complex problems requiring logical deduction or mathematical calculation, these models would often:

  • Jump directly to incorrect conclusions
  • Fail to break down problems into manageable steps
  • Show inconsistent reasoning abilities
  • Struggle with problems requiring more than one or two logical steps
Gap between pattern matching in traditional LLMs and the requirements of multi-step reasoning tasks


This limitation wasn't surprising. Traditional training objectives didn't explicitly reward step-by-step reasoning—they simply encouraged models to predict the next token
based on patterns in their training data.

Chain-of-Thought: The Breakthrough

The introduction of Chain-of-Thought (CoT) prompting by Wei et al. in 2022 marked a pivotal moment in LLM reasoning capabilities.

This technique demonstrated that large language models could perform complex reasoning when prompted to show their work.

How Chain-of-Thought Works

CoT prompting exists in two primary forms:

Few-Shot CoT: Providing explicit examples that include intermediate reasoning steps

Zero-Shot CoT: Simply instructing the model to "think step by step"
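
To make the two forms concrete, here is a minimal sketch in Java, reusing the llmapi-style client described in the later posts on this blog (the model name, the questions, and the exact prompt wording are illustrative assumptions, not taken from the original paper):

```java
// Zero-Shot CoT: append the "think step by step" trigger to the question.
var zeroShotPrompt = """
        Q: A shop sells pens in packs of 12. If I buy 7 packs and give away 30 pens, how many pens do I have left?
        A: Let's think step by step.
        """;

// Few-Shot CoT: show a worked example with intermediate steps, then ask the new question.
var fewShotPrompt = """
        Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. How many tennis balls does he have now?
        A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.

        Q: A shop sells pens in packs of 12. If I buy 7 packs and give away 30 pens, how many pens do I have left?
        A:
        """;

// Assumes `service` was created via GenerativeAIDriverManager, as shown in the later posts.
var conversation = ChatRequest.create("gemini-2.0-flash",
        List.of(new ChatRequest.ChatMessage("user", zeroShotPrompt)));
System.out.println(service.chat(conversation).message());
```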

Key Findings About Chain-of-Thought

The research on Chain-of-Thought revealed several important insights:

Reasoning as an Emergent Ability
CoT reasoning is an emergent capability that appears only in sufficiently large models (typically ~100B+ parameters).

Dramatic Performance Improvements
On complex reasoning tasks like GSM8K (math word problems), performance more than doubled for large models using CoT prompting.

No Fine-tuning Required
This capability was achieved through prompting alone, without model modifications.

Enabling Multi-step Problem Solving
CoT allows models to break complex problems into manageable chunks.


Self-Consistency: Enhancing Chain-of-Thought

While CoT represented a breakthrough, it still had limitations. The follow-up research by Wang et al. (2022) on "Self-Consistency" addressed a
critical weakness: reliance on a single reasoning path.

The Self-Consistency Approach

Rather than generating a single chain of thought, Self-Consistency:
  1. Samples multiple diverse reasoning paths for the same problem
  2. Lets each path reach its own conclusion
  3. Takes the most consistent answer across all paths as the final answer
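
A minimal sketch of the idea in Java, again using the llmapi-style client from the later posts (the number of sampled paths, the temperature, and the `ANSWER:` extraction convention are illustrative assumptions):

```java
// Ask the model to end each reasoning path with a final "ANSWER: <value>" line,
// sample several paths at a non-zero temperature, then majority-vote on the answers.
var prompt = """
        A train travels 60 km in 45 minutes. What is its average speed in km/h?
        Think step by step, then end with a line of the form "ANSWER: <value>".
        """;

Map<String, Integer> votes = new HashMap<>();
for (int i = 0; i < 5; i++) {
    var request = new ChatRequest("gemini-2.0-flash", 0.7f,
            List.of(new ChatRequest.ChatMessage("user", prompt)));
    var reply = service.chat(request).message();

    // Extract the final answer from the last "ANSWER:" line of this reasoning path.
    var answer = reply.lines()
            .filter(l -> l.startsWith("ANSWER:"))
            .reduce((first, second) -> second)
            .orElse("")
            .trim();
    votes.merge(answer, 1, Integer::sum);
}

// The most consistent answer across all sampled paths wins.
var finalAnswer = votes.entrySet().stream()
        .max(Map.Entry.comparingByValue())
        .map(Map.Entry::getKey)
        .orElse("no answer");
System.out.println(finalAnswer);
```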




This approach mimics how humans gain confidence in solutions—when multiple different
approaches lead to the same answer, we trust that result more.


LLMs as Analogical Reasoners

The next evolution in LLM reasoning came from understanding these models as analogical reasoners, introduced in research presented at ICLR 2024.
This approach mirrors how humans tackle unfamiliar problems—by recalling similar challenges we've solved before.

The Analogical Prompting Method

Analogical prompting instructs LLMs to:

  1. Self-generate relevant examples related to the current problem
  2. Generate high-level conceptual knowledge about the problem domain
  3. Apply this knowledge to solve the original problem
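
In prompt form, all three steps fit into a single request. A hedged sketch (the instruction wording paraphrases the paper's recipe rather than quoting it):

```java
// Single prompt: the model first recalls its own exemplars, then solves the target problem.
var analogicalPrompt = """
        Problem: What is the area of the square with the four vertices (-2, 2), (2, -2), (-2, -6), and (-6, -2)?

        Instructions:
        1. Recall three relevant and distinct maths problems; for each, describe it and explain its solution.
        2. State any high-level concepts or formulas that are useful for this kind of problem.
        3. Finally, solve the original problem step by step, using what you recalled above.
        """;

var conversation = ChatRequest.create("gemini-2.0-flash",
        List.of(new ChatRequest.ChatMessage("user", analogicalPrompt)));
System.out.println(service.chat(conversation).message());
```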



Key Advantages of Self-Generated Examples

This approach offers several benefits:

No manual labeling needed: Unlike few-shot CoT, no human needs to create examples

Problem-specific relevance: The examples are tailored to each specific problem type

Adaptability across domains: The technique works across mathematics, coding, and other domains

Implementation simplicity: Everything happens in a single prompt


From Reasoning to Meta-Reasoning: LLMs as Prompt Engineers

The most fascinating development is the discovery that LLMs can function as their own prompt engineers. Research by Zhou et al. on the "Automatic Prompt Engineer" (APE)
demonstrates that LLMs can generate and optimize instructions for other LLMs to follow.




This creates a meta-reasoning capability where:

  1. One LLM generates candidate instructions based on examples
  2. These instructions are tested on their effectiveness
  3. The best-performing instructions are selected
  4. The process iterates toward optimal prompting strategies
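
A toy sketch of that loop with the same llmapi-style client (the naive exact-match scoring and the tiny evaluation set are simplifying assumptions; the real APE procedure searches larger candidate pools and scores them more carefully):

```java
// A tiny labelled evaluation set: question -> expected answer.
Map<String, String> evalSet = Map.of(
        "What is 17 + 25?", "42",
        "What is 9 * 8?", "72");

// Step 1: one LLM call proposes candidate instructions (split its reply into lines).
var proposeRequest = ChatRequest.create("gemini-2.0-flash",
        List.of(new ChatRequest.ChatMessage("user",
                "Propose 3 different one-line instructions that make a model answer arithmetic questions correctly. One per line.")));
var candidates = service.chat(proposeRequest).message().lines()
        .filter(l -> !l.isBlank())
        .toList();

// Steps 2-3: score each candidate instruction on the eval set and keep the best one.
String bestInstruction = null;
int bestScore = -1;
for (var instruction : candidates) {
    int score = 0;
    for (var entry : evalSet.entrySet()) {
        var request = ChatRequest.create("gemini-2.0-flash",
                List.of(new ChatRequest.ChatMessage("user", instruction + "\n\n" + entry.getKey())));
        if (service.chat(request).message().contains(entry.getValue())) {
            score++;
        }
    }
    if (score > bestScore) {
        bestScore = score;
        bestInstruction = instruction;
    }
}
System.out.println("Best instruction: " + bestInstruction);
```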

The Evolution of Reasoning Prompts

Through this research, we've seen a remarkable progression in the prompts used to elicit reasoning:

Basic CoT: "Let's think step by step"

Refined CoT: "Let's work this out in a step by step way to be sure we have the right answer"

Analogical CoT: Recall three relevant problems and their solutions followed by problem-solving

APE-generated prompts: Complex, automatically optimized instructions

Implications for AI Development

These advances in LLM reasoning have profound implications:

Emergent Capabilities: Reasoning appears to emerge at certain model scales, suggesting other cognitive abilities might similarly emerge with scale.

Human-Like Problem Solving: The success of analogical reasoning and self-consistency suggests LLMs might be modeling aspects of human cognition more
closely than previously thought.

Reduced Need for Fine-Tuning: Many reasoning improvements come from better prompting rather than model modifications, potentially reducing the computational
costs of improvement.

Meta-Learning Potential: LLMs' ability to generate effective prompts for themselves hints at meta-learning capabilities that could lead to more autonomous
AI systems.

Conclusion

The evolution of reasoning in LLMs—from simple pattern matching to chain-of-thought to analogical reasoning and beyond—represents one of the most exciting trajectories
in AI research. These advances have not only improved performance on benchmark tasks but have
also deepened our understanding of how these models function.

As research continues, we can expect further refinements in how we elicit reasoning from LLMs, potentially unlocking even more sophisticated
problem-solving capabilities.

The boundary between pattern recognition and true reasoning continues to blur, bringing us closer to AI systems that can tackle the full spectrum of human reasoning tasks.

What's particularly exciting is that many of these techniques are accessible to practitioners today through careful prompt engineering, making advanced reasoning capabilities
available without requiring specialized model training or massive computational resources.

Welcome to inference-time compute! This is a new market being created, and it should give you an idea of what's behind the DeepSeek moment :-)

Saturday, 29 March 2025

How AI Coding Assistants Reshape Productivity

 

The Jevons Paradox in Software Engineering: How AI Coding Assistants Reshape Productivity

Imagine this: You've just installed the latest AI coding assistant. The marketing promised to cut your coding time in half. Six months later, you're writing more code than ever before, tackling increasingly complex problems, and somehow still working the same hours. What happened?

Welcome to Jevons Paradox in the age of AI-assisted software development.

The Curious Case of Efficiency That Doesn't Save Time

In 1865, a British economist named William Stanley Jevons noticed something counterintuitive about coal consumption. When more efficient steam engines were introduced, logic suggested coal use would decrease. Instead, it skyrocketed. The more efficiently coal could be used, the more applications people found for it.

Fast forward to 2025: Your AI coding assistant is the modern-day steam engine, and your time and mental energy are the coal.



My Journey With AI Coding Assistants: A Personal Story

When I first integrated an AI coding assistant into my workflow last year, I had visions of shorter workdays and more time for strategic thinking. The reality? I found myself saying "now I can finally tackle that refactoring project I've been putting off" and "let's add those extra test cases we've been skipping."

Sound familiar?

The Numbers Don't Lie: The Productivity Paradox in Action

Recent industry surveys reveal a fascinating pattern:

| Metric | Without AI Assistant | With AI Assistant | Change |
| --- | --- | --- | --- |
| Lines of code written/week | 1,200 | 1,560 | +30% |
| Tickets closed/sprint | 8 | 10 | +25% |
| Languages/frameworks used regularly | 2-3 | 4-5 | +67% |


The metrics below are subjective; I loved writing more code.

| Metric | Without AI Assistant | With AI Assistant | Change |
| --- | --- | --- | --- |
| Average hours worked/week | 40 to 50 | 70 | +50%+ |

Why We Keep Consuming Our Efficiency Gains

The Expanding Possibility Frontier

As our tools improve, our concept of what's possible expands with them. It's human nature. When we suddenly have the capacity to do more, we don't pocket the difference—we expand our ambitions.



This cycle isn't unique to software development, but our field experiences it more intensely than most because of how quickly our tools evolve.

The Four Types of Productivity Consumers

In my observation, there are four ways engineers typically "spend" their AI-driven productivity gains:

  1. The Depth Diver – Uses efficiency to create more robust solutions with better error handling, edge case management, and performance optimization
  2. The Breadth Explorer – Leverages AI to work across more languages, frameworks, and systems than previously possible
  3. The Quality Enhancer – Invests saved time in better documentation, more comprehensive tests, and cleaner code
  4. The Volume Maximizer – Simply produces more features, closes more tickets, and ships more code

Which one are you? Most of us are a blend, shifting between these archetypes depending on project requirements and personal interests.

The Great Capability Expansion

What makes AI coding assistants particularly powerful is how they expand what individual developers can accomplish:

This expansion means junior developers can contribute to complex systems earlier in their careers, while senior developers can focus more on architecture and innovation.

Reimagining Productivity in the AI Era

From "Doing Things Faster" to "Doing Better Things"

The most successful teams I've observed aren't just using AI to speed up existing processes—they're rethinking what processes should exist in the first place.

Consider this reimagined development workflow:




The key shift: humans focus on the parts of the process where creativity, judgment, and contextual understanding matter most.

How to Thrive in the Age of AI-Assisted Development

1. Embrace Strategic Inefficiency


Not everything should be optimized for speed. Sometimes, diving deep into a problem without AI assistance builds fundamental understanding that pays dividends later.

2. Set Clear Boundaries

Establish team norms around when and how to use AI assistants. Some projects benefit from exploration and creative generation; others need careful, methodical human reasoning.

3. Measure What Matters

If you're still measuring productivity by lines of code or tickets closed, you're missing the true impact of AI assistance. Consider metrics like:

  • Time to validated solution (not just working code)
  • Reduction in production incidents
  • User-reported satisfaction with features
  • Knowledge dissemination across the team

4. Continuously Reskill

The skills that make developers valuable are evolving rapidly. The future belongs to those who can:

  • Clearly articulate problems for AI to solve
  • Evaluate and refine AI-generated solutions
  • Understand and communicate system-level concerns
  • Apply deep domain knowledge to technical decisions

Looking Ahead: The Co-Evolution of Engineers and Their Tools

As our relationship with AI coding assistants deepens, we're not just changing our tools—our tools are changing us. The software engineers of 2030 will approach problems differently than those of 2020, just as today's engineers think differently than those of the pre-internet era.

The most exciting part of this journey isn't just what we'll build—it's who we'll become as builders.


What's your experience with AI coding assistants? Are you saving time, doing more, or both? Share your thoughts in the comments below!

Thursday, 6 March 2025

Building a Strongly-Typed API for Large Language Models

 In this post, we'll explore a Java interface designed to interact with Large Language Models (LLMs)  in a type-safe manner. We'll break down the GenerativeAIService interface and its supporting classes to understand how it provides a structured approach to AI interactions.

The Problem: Unstructured LLM Responses

When working with LLMs, responses typically come as unstructured text. This presents challenges when you need to extract specific data or integrate AI capabilities into enterprise applications that expect structured data.

For example, if you want an LLM to generate JSON data for your application, you'd need to:

  1. Parse the response text
  2. Extract the JSON portion
  3. Deserialize it into your application objects
  4. Handle parsing errors appropriately

This process can be error-prone and verbose when implemented across multiple parts of your application.

Enter GenerativeAIService

The GenerativeAIService interface provides a clean solution to this problem by offering methods that not only communicate with LLM APIs but also handle the parsing of responses into Java objects.

Let's look at the core interface:

```java
public interface GenerativeAIService {

    ChatMessageReply chat(ChatRequest conversation);

    default ChatRequest prepareRequest(ChatRequest conversation, Map<String, Object> params) {
        return ParamPreparedRequest.prepare(conversation, params);
    }

    default <T> T chat(ChatRequest conversation, Class<T> returnType) {
        return chat(conversation, returnType, (jsonContent, e) -> {
            throw new RuntimeException("Failed to parse JSON: " + jsonContent, e);
        }).get();
    }

    default <T> Optional<T> chat(ChatRequest conversation, Class<T> returnType, BiConsumer<String, Exception> onFailedParsing) {
        var reply = chat(conversation);
        return ChatMessageJsonParser.parse(reply, returnType, onFailedParsing);
    }

    // Other methods
}
```

The interface provides three key capabilities:

  1. Basic Chat Functionality: The chat(ChatRequest) method handles direct communication with the LLM and returns raw responses.
  2. Type-Safe Responses: Overloaded chat() methods accept a Class<T> parameter to specify the expected return type, allowing the service to automatically parse the LLM response into the desired Java class.
  3. Robust Error Handling: Options to provide custom error handling logic when parsing fails.

How It Works

Behind the scenes, the ChatMessageJsonParser class does the heavy lifting:

```java
public static <T> Optional<T> parse(ChatMessageReply reply, Class<T> returnType, BiConsumer<String, Exception> onFailedParsing) {
    var message = reply.message().trim();
    var jsonContent = _extractMessage(message);
    return _cast(returnType, onFailedParsing, jsonContent);
}
```

It:

  1. Extracts JSON content from the LLM's response (which may be wrapped in markdown code blocks)
  2. Uses Gson to deserialize the JSON into the requested type
  3. Handles parsing errors according to the provided error handler
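
`_extractMessage` and `_cast` are internal helpers of the library; here is a rough sketch of what that extraction and casting could look like with Gson (an illustration under assumptions, not the library's actual implementation):

```java
// Strip an optional ```json ... ``` markdown fence, then deserialize with Gson.
static String extractJson(String message) {
    var trimmed = message.trim();
    if (trimmed.startsWith("```")) {
        int firstNewline = trimmed.indexOf('\n');
        int lastFence = trimmed.lastIndexOf("```");
        trimmed = trimmed.substring(firstNewline + 1, lastFence).trim();
    }
    return trimmed;
}

static <T> Optional<T> cast(Class<T> returnType, BiConsumer<String, Exception> onFailedParsing, String json) {
    try {
        return Optional.of(new com.google.gson.Gson().fromJson(json, returnType));
    } catch (Exception e) {
        // Delegate error handling to the caller-supplied handler.
        onFailedParsing.accept(json, e);
        return Optional.empty();
    }
}
```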

Parameterised Prompts

The interface also supports parameterised prompts through the ParamPreparedRequest class:

```java
default ChatRequest prepareRequest(ChatRequest conversation, Map<String, Object> params) {
    return ParamPreparedRequest.prepare(conversation, params);
}
```

This allows you to:

  1. Create template prompts with placeholders like {{parameter_name}}
  2. Fill those placeholders at runtime with a map of parameter values
  3. Validate that all required parameters are provided

Code Example: Using the Typed API

Here's how you might use this API in practice:

```java
// Define a data class for the structured response
public static class ProductSuggestion {
    public String productName;
    public String description;
    public double price;
    public List<String> features;
}

// Create a parameterised prompt
String prompt = """
        Suggest a {{product_type}} product with {{feature_count}} features. Reply in JSON format

        <example>
        {
          "productName": "product name",
          "description": "product description",
          "price": 100.0,
          "features": ["feature 1", "feature 2", "feature 3", "feature 4", "feature 5"]
        }
        </example>
        """;

var request = new ChatRequest(
        "gemini-2.0-flash",
        0.7f,
        List.of(new ChatRequest.ChatMessage("user", prompt))
);

// Prepare with parameters
Map<String, Object> params = Map.of(
        "product_type", "smart home",
        "feature_count", 5
);
request = service.prepareRequest(request, params);

// Get typed response
var suggestion = service.chat(request, ProductSuggestion.class);
System.out.println(suggestion);
```

Benefits of a Typed LLM API

  1. Type Safety: Catch type mismatches at compile time rather than runtime.
  2. Clean Integration: Seamlessly incorporate AI capabilities into existing Java applications.
  3. Reduced Boilerplate: Consolidate JSON parsing and error handling logic in one place.
  4. Parameter Validation: Ensure all required prompt parameters are provided before making API calls.
  5. Flexible Error Handling: Customize how parsing errors are handled based on your application's needs.

Implementation Considerations

When implementing this interface for different AI providers, consider:

  • JSON Mode/Structured Output: Most LLM providers now support a JSON or structured-output mode, which can be used instead of relying on prompt instructions alone.
  • Response formats: Ensure your parser can handle the specific output formats of each provider.

Conclusion

By creating a strongly-typed interface for LLM interactions, we bridge the gap between the unstructured world of AI and the structured requirements of enterprise applications. This approach enables developers to leverage the power of large language models while maintaining type safety and predictability.

The GenerativeAIService interface provides a foundation that can be extended to work with various AI providers while providing a consistent interface for application code. It represents a step toward making AI capabilities more accessible and manageable in traditional software development workflows.


Code for this post is available @ TypeSafety

LLM Patterns

 Design patterns are reusable solutions to common problems in software design. They represent best practices evolved over time by experienced software developers to solve recurring design challenges.

The concept was popularized by the "Gang of Four" (Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides) in their influential 1994 book "Design Patterns: Elements of Reusable Object-Oriented Software."


In this post, we are going to look at some design patterns for LLMs.


Simple Chat

This is the simplest pattern: text is sent as input to the LLM and text is returned. Everyone starts with this.



Let's look at a code example:

var service = GenerativeAIDriverManager.create(GoogleAIFactory.NAME, "https://generativelanguage.googleapis.com", properties);

var messages = new ChatRequest.ChatMessage("user", "Top 5 Countries by GDP");
var conversation = ChatRequest.create("gemini-2.0-flash", List.of(messages));
var reply = service.chat(conversation);
System.out.println(reply.message());

Output
Okay, here are the top 5 countries by GDP (Nominal) according to the latest estimates from the International Monetary Fund (IMF) as of October 2023:

1.  **United States:** $26.95 trillion
2.  **China:** $17.72 trillion
3.  **Germany:** $4.43 trillion
4.  **Japan:** $4.23 trillion
5.  **India:** $3.73 trillion

It's important to note:

*   **Source:**  I'm using the IMF's World Economic Outlook Database, October 2023 edition.  These are estimates and projections, and are subject to change.
*   **Nominal GDP:** This is GDP measured at current market prices, without adjusting for inflation.
*   **Data Availability:** The most current, definitive GDP figures are usually released with a bit of a lag.
*   **GDP (PPP):** It's also worth knowing that if you look at GDP based on Purchasing Power Parity (PPP), the rankings can shift somewhat, with China often being very close to, or even exceeding, the United States.

Simple Chat with Some Structure

You may wonder how to use LLM responses programmatically when looking at the output. This is precisely the problem we'll solve with this pattern. By making a small adjustment to your prompt, you can instruct the LLM to return JSON output. The revised prompt would look like: "List the top 5 countries by GDP. Reply in JSON format."


With just a small change to the prompt, the LLM can return structured output, making it function more like a proper API with a degree of type safety. Types are essential in programming—without them, code becomes difficult to maintain and debug as applications grow in complexity.
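
A minimal sketch of this pattern, reusing the `service` instance from the previous example (the exact figures in the reply will vary by model and date):

```java
var messages = new ChatRequest.ChatMessage("user",
        "List the top 5 countries by GDP. Reply in JSON format.");
var conversation = ChatRequest.create("gemini-2.0-flash", List.of(messages));
var reply = service.chat(conversation);
System.out.println(reply.message());
```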

Let's look at the output of the prompt:

```json
{
  "top_5_countries_by_gdp": [
    {
      "rank": 1,
      "country": "United States",
      "gdp_usd": "Approximately $25+ Trillion (USD)"
    },
    {
      "rank": 2,
      "country": "China",
      "gdp_usd": "Approximately $17+ Trillion (USD)"
    },
    {
      "rank": 3,
      "country": "Japan",
      "gdp_usd": "Approximately $4+ Trillion (USD)"
    },
    {
      "rank": 4,
      "country": "Germany",
      "gdp_usd": "Approximately $4+ Trillion (USD)"
    },
    {
      "rank": 5,
      "country": "India",
      "gdp_usd": "Approximately $3+ Trillion (USD)"
    }
  ],
  "note": "GDP figures are approximate and based on the most recent available data (typically from organizations like the World Bank and the IMF).  These values fluctuate and can vary slightly depending on the source and the date the data was collected."
}
```

Chat with My Custom Structure

Now you know where we are going. JSON output is good but you need more control and consistency over what is structure of output. One more thing to note without enforcing specific type structure LLM is free to return data in any structure and that will break your API contract. This is also achieved by changing prompt to 

```
Top 5 Countries by GDP. Reply in JSON format
Example:
{
"countries":[
{"name":"country 1","gdp":gpd , "unit":"trillion or billion etc","rank":rank of country},
{"name":"country 2","gdp":gpd , "unit":"trillion or billion etc","rank":rank of country},
{"name":"country 3","gdp":gpd , "unit":"trillion or billion etc","rank":rank of country},
{"name":"country 4","gdp":gpd , "unit":"trillion or billion etc","rank":rank of country},
{"name":"country 5","gdp":gpd , "unit":"trillion or billion etc","rank":rank of country}
]
}

```


Code Sample
var prompt = """
Top 5 Countries by GDP. Reply in JSON format
Example:
{
"countries":[
{"name":"country 1","gdp":gpd , "unit":"trillion or billion etc","rank":rank of country},
{"name":"country 2","gdp":gpd , "unit":"trillion or billion etc","rank":rank of country},
{"name":"country 3","gdp":gpd , "unit":"trillion or billion etc","rank":rank of country},
{"name":"country 4","gdp":gpd , "unit":"trillion or billion etc","rank":rank of country},
{"name":"country 5","gdp":gpd , "unit":"trillion or billion etc","rank":rank of country}
]
}
""";
var messages = new ChatRequest.ChatMessage("user", prompt);
var conversation = ChatRequest.create("gemini-2.0-flash", List.of(messages));
var reply = service.chat(conversation);
System.out.println(reply.message());

Output 

```json
{
"countries": [
    {
        "name": "United States",
        "gdp": 26.95,
        "unit": "trillion USD",
        "rank": 1
    },
    {
        "name": "China",
        "gdp": 17.73,
        "unit": "trillion USD",
        "rank": 2
    },
    {
        "name": "Japan",
        "gdp": 4.23,
        "unit": "trillion USD",
        "rank": 3
    },
    {
        "name": "Germany",
        "gdp": 4.07,
        "unit": "trillion USD",
        "rank": 4
    },
    {
        "name": "India",
        "gdp": 3.42,
        "unit": "trillion USD",
        "rank": 5
    }
]
}
```
 

Strongly Typesafe Chat

We've established a solid foundation to approach type safety, and now we reach the final step: converting the LLM's string output by passing it through a TypeConverter to create a strongly typed object. This completes our transformation from unstructured text to programmatically usable data.



The changes for type safety are done in the llmapi library:

```
<dependency>
    <groupId>org.llm</groupId>
    <artifactId>llmapi</artifactId>
    <version>1.2.1</version>
</dependency>
```

Sample Code

var prompt = """
Top 5 Countries by GDP. Reply in JSON format
Example:
{
"countries":[
{"name":"country 1","gdp":gpd , "unit":"trillion or billion etc","rank":rank of country},
{"name":"country 2","gdp":gpd , "unit":"trillion or billion etc","rank":rank of country},
{"name":"country 3","gdp":gpd , "unit":"trillion or billion etc","rank":rank of country},
{"name":"country 4","gdp":gpd , "unit":"trillion or billion etc","rank":rank of country},
{"name":"country 5","gdp":gpd , "unit":"trillion or billion etc","rank":rank of country}
]
}
""";
var messages = new ChatRequest.ChatMessage("user", prompt);
var conversation = ChatRequest.create("gemini-2.0-flash", List.of(messages));

var reply = service.chat(conversation, CountryGdp.class);
System.out.println(reply);


The only change in the code is using the typesafe chat function from llmapi:

public interface GenerativeAIService {

    ChatMessageReply chat(ChatRequest conversation);

    default EmbeddingReply embedding(EmbeddingRequest embedding) {
        throw new UnsupportedOperationException("Not Supported");
    }

    default <T> T chat(ChatRequest conversation, Class<T> returnType) {
        ....
    }

    default <T> Optional<T> chat(ChatRequest conversation, Class<T> returnType, BiConsumer<String, Exception> onFailedParsing) {
        ....
    }
}

The output is an instance of the CountryGdp class.
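
The CountryGdp class itself isn't shown in the post; a plausible shape that mirrors the JSON example in the prompt (the actual class in the repository may differ):

```java
// Field names mirror the keys in the JSON example so Gson can bind them directly.
public class CountryGdp {
    public List<Country> countries;

    public static class Country {
        public String name;
        public double gdp;
        public String unit;
        public int rank;
    }
}
```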

Parameter based chat

As your LLM applications grow in complexity, your prompts will inevitably become more sophisticated. One essential feature for managing this complexity is parameter support, similar to what you find in JDBC. The next pattern addresses this need, demonstrating how prompts can contain parameters that are dynamically replaced at runtime, allowing for more flexible and reusable prompt templates.



Code 

var prompt = """
Top {{no_of_country}} Countries by GDP. Reply in JSON format
Example:
{
"countries":[
{"name":"country 1","gdp":gpd , "unit":"trillion or billion etc","rank":rank of country},
{"name":"country 2","gdp":gpd , "unit":"trillion or billion etc","rank":rank of country},
{"name":"country 3","gdp":gpd , "unit":"trillion or billion etc","rank":rank of country},
{"name":"country 4","gdp":gpd , "unit":"trillion or billion etc","rank":rank of country},
{"name":"country 5","gdp":gpd , "unit":"trillion or billion etc","rank":rank of country}
]
}
""";
var messages = new ChatRequest.ChatMessage("user", prompt);
var conversation = ChatRequest.create("gemini-2.0-flash", List.of(messages));

var preparedConversion = service.prepareRequest(conversation, Map.of("no_of_country", "10"));
var reply = service.chat(preparedConversion, CountryGdp.class);

System.out.println(reply);



Conclusion

We have only scratched the surface of LLM patterns. In this post, I've covered some basic to intermediate concepts, but my next post will delve into more advanced patterns that build upon these fundamentals.

All the code used in this post is available @ llmpatterns git project



Wednesday, 5 March 2025

Building a Universal Java Client for Large Language Models

Building a Universal Java Client for Large Language Models

In today's rapidly evolving AI landscape, developers often need to work with multiple Large Language Model (LLM) providers to find the best solution for their specific use case. Whether you're exploring OpenAI's GPT models, Anthropic's Claude, or running local models via Ollama, having a unified interface can significantly simplify development and make it easier to switch between providers.

The Java LLM Client project provides exactly this: a clean, consistent API for interacting with various LLM providers through a single library. Let's explore how this library works and how you can use it in your Java applications.

Core Features

The library offers several key features that make working with LLMs easier:

  1. Unified Interface: Interact with different LLM providers through a consistent API
  2. Multiple Provider Support: Currently supports OpenAI, Anthropic, Google, Groq, and Ollama
  3. Chat Completions: Send messages and receive responses from language models
  4. Embeddings: Generate vector representations of text where supported
  5. Factory Pattern: Easily create service instances for different providers

Architecture Overview

The library is built around a few key interfaces and classes:

  • GenerativeAIService: The main interface for interacting with LLMs
  • GenerativeAIFactory: Factory interface for creating service instances
  • GenerativeAIDriverManager: Registry that manages available services
  • Provider-specific implementations in separate packages

This design follows the classic factory pattern, allowing you to:

  1. Register service factories with the GenerativeAIDriverManager
  2. Create service instances through the manager
  3. Use a consistent API to interact with different providers

Getting Started

To use the library, first add it to your Maven project:

```xml
<dependency>
    <groupId>org.llm</groupId>
    <artifactId>llmapi</artifactId>
    <version>1.0.0</version>
</dependency>
```


Basic Usage Example

Here's how to set up and use the library:

```java
// Register service providers
GenerativeAIDriverManager.registerService(OpenAIFactory.NAME, new OpenAIFactory());
GenerativeAIDriverManager.registerService(AnthropicAIFactory.NAME, new AnthropicAIFactory());
// Register more providers as needed

// Create an OpenAI service
Map<String, Object> properties = Map.of("apiKey", System.getenv("gpt_key"));
var service = GenerativeAIDriverManager.create(
        OpenAIFactory.NAME, "https://api.openai.com/", properties);

// Create and send a chat request
var message = new ChatMessage("user", "Hello, how are you?");
var conversation = new ChatRequest("gpt-4o-mini", List.of(message));
var reply = service.chat(conversation);
System.out.println(reply.message());

// Generate embeddings
var vector = service.embedding(
        new EmbeddingRequest("text-embedding-3-small", "How are you"));
System.out.println(Arrays.toString(vector.embedding()));
```

Working with Different Providers

OpenAI

```java
Map<String, Object> properties = Map.of("apiKey", System.getenv("gpt_key"));
var service = GenerativeAIDriverManager.create(
        OpenAIFactory.NAME, "https://api.openai.com/", properties);

// Chat with GPT-4o mini
var conversation = new ChatRequest("gpt-4o-mini",
        List.of(new ChatMessage("user", "Hello, how are you?")));
var reply = service.chat(conversation);
```

Anthropic

```java
Map<String, Object> properties = Map.of("apiKey", System.getenv("ANTHROPIC_API_KEY"));
var service = GenerativeAIDriverManager.create(
        AnthropicAIFactory.NAME, "https://api.anthropic.com", properties);

// Chat with Claude
var conversation = new ChatRequest("claude-3-7-sonnet-20250219",
        List.of(new ChatMessage("user", "Hello, how are you?")));
var reply = service.chat(conversation);
```

Ollama (Local Models)

```java
// No API key needed for local models
Map<String, Object> properties = Map.of();
var service = GenerativeAIDriverManager.create(
        OllamaFactory.NAME, "http://localhost:11434", properties);

// Chat with locally hosted Llama model
var conversation = new ChatRequest("llama3.2",
        List.of(new ChatMessage("user", "Hello, how are you?")));
var reply = service.chat(conversation);
```

Under the Hood

The library uses an RPC (Remote Procedure Call) client to handle the HTTP communication with various APIs. Each provider's implementation:

  1. Creates appropriate request objects with the required format
  2. Sends requests to the corresponding API endpoints
  3. Parses responses into a consistent format
  4. Handles errors gracefully

The RpcBuilder creates proxy instances of service interfaces, handling the HTTP communication details so you don't have to.

Supported Models

The library currently supports several models across different providers:

  • OpenAI: all
  • Anthropic: all
  • Google: gemini-2.0-flash
  • Groq: all
  • Ollama: any model you have available locally

Extending the Library

One of the strengths of this design is how easily it can be extended to support new providers or features:

  1. Create a new implementation of GenerativeAIFactory
  2. Implement GenerativeAIService for the new provider
  3. Create necessary request/response models
  4. Register the new factory with GenerativeAIDriverManager
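
As a rough illustration of those steps, here is a hypothetical factory for a new provider. The GenerativeAIFactory method names below are assumptions based on the pattern described in this post, not the library's actual signatures; check the llmapi sources for the real interface.

```java
// Hypothetical sketch only: method names and types are assumptions,
// not the library's real API.
public class MyProviderFactory implements GenerativeAIFactory {
    public static final String NAME = "myprovider";

    @Override
    public GenerativeAIService create(String baseUrl, Map<String, Object> properties) {
        // Build request/response models for the provider and wrap its HTTP API
        // in a GenerativeAIService implementation.
        return new MyProviderService(baseUrl, properties);
    }
}

// Registration mirrors the built-in providers:
GenerativeAIDriverManager.registerService(MyProviderFactory.NAME, new MyProviderFactory());
```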

Conclusion

The Java LLM Client provides a clean, consistent way to work with multiple LLM providers in Java applications. By abstracting away the differences between APIs, it allows developers to focus on their application logic rather than the details of each provider's implementation.

Whether you're building a chatbot, generating embeddings for semantic search, or experimenting with different LLM providers, this library offers a straightforward way to integrate these capabilities into your Java applications.

The project's use of standard Java patterns like factories and interfaces makes it easy to understand and extend, while its modular design allows you to use only the providers you need. As the LLM ecosystem continues to evolve, this type of abstraction layer will become increasingly valuable for developers looking to build flexible, future-proof applications.


Link to github project - llmapi