Thursday, 6 March 2025

Building a Strongly-Typed API for Large Language Models

In this post, we'll explore a Java interface designed to interact with Large Language Models (LLMs) in a type-safe manner. We'll break down the GenerativeAIService interface and its supporting classes to understand how it provides a structured approach to AI interactions.

The Problem: Unstructured LLM Responses

When working with LLMs, responses typically come as unstructured text. This presents challenges when you need to extract specific data or integrate AI capabilities into enterprise applications that expect structured data.

For example, if you want an LLM to generate JSON data for your application, you'd need to:

  1. Parse the response text
  2. Extract the JSON portion
  3. Deserialize it into your application objects
  4. Handle parsing errors appropriately

This process can be error-prone and verbose when implemented across multiple parts of your application.
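
To see why, here is a minimal sketch of that manual workflow, assuming Gson for deserialization and a hypothetical callLlm method that returns the raw response text (ProductSuggestion is the data class defined later in this post):

```java
import com.google.gson.Gson;
import com.google.gson.JsonSyntaxException;

// 'callLlm' is a hypothetical method returning the raw LLM response,
// which may be wrapped in a markdown code fence
String raw = callLlm("Suggest a product. Reply in JSON format");

// Steps 1-2: find and extract the JSON portion
String json = raw.trim();
if (json.startsWith("```")) {
    json = json.substring(json.indexOf('\n') + 1, json.lastIndexOf("```")).trim();
}

// Steps 3-4: deserialize and handle parsing errors, repeated at every call site
try {
    var suggestion = new Gson().fromJson(json, ProductSuggestion.class);
} catch (JsonSyntaxException e) {
    // application-specific recovery
}
```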

Enter GenerativeAIService

The GenerativeAIService interface provides a clean solution to this problem by offering methods that not only communicate with LLM APIs but also handle the parsing of responses into Java objects.

Let's look at the core interface:

```java
public interface GenerativeAIService {

    ChatMessageReply chat(ChatRequest conversation);

    default ChatRequest prepareRequest(ChatRequest conversation, Map<String, Object> params) {
        return ParamPreparedRequest.prepare(conversation, params);
    }

    default <T> T chat(ChatRequest conversation, Class<T> returnType) {
        return chat(conversation, returnType, (jsonContent, e) -> {
            throw new RuntimeException("Failed to parse JSON: " + jsonContent, e);
        }).get();
    }

    default <T> Optional<T> chat(ChatRequest conversation, Class<T> returnType, BiConsumer<String, Exception> onFailedParsing) {
        var reply = chat(conversation);
        return ChatMessageJsonParser.parse(reply, returnType, onFailedParsing);
    }

    // Other methods
}
```

The interface provides three key capabilities:

  1. Basic Chat Functionality: The chat(ChatRequest) method handles direct communication with the LLM and returns raw responses.
  2. Type-Safe Responses: Overloaded chat() methods accept a Class<T> parameter to specify the expected return type, allowing the service to automatically parse the LLM response into the desired Java class.
  3. Robust Error Handling: Options to provide custom error handling logic when parsing fails.

How It Works

Behind the scenes, the ChatMessageJsonParser class does the heavy lifting:

```java
public static <T> Optional<T> parse(ChatMessageReply reply, Class<T> returnType, BiConsumer<String, Exception> onFailedParsing) {
    var message = reply.message().trim();
    var jsonContent = _extractMessage(message);
    return _cast(returnType, onFailedParsing, jsonContent);
}
```

It:

  1. Extracts JSON content from the LLM's response (which may be wrapped in markdown code blocks)
  2. Uses Gson to deserialize the JSON into the requested type
  3. Handles parsing errors according to the provided error handler
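
The library's _extractMessage implementation isn't shown here; a simplified sketch of the fence-stripping step it performs could look like this:

```java
// Sketch only: strips an optional ```json ... ``` fence around the payload;
// the library's actual _extractMessage may differ.
private static String _extractMessage(String message) {
    if (message.startsWith("```")) {
        int start = message.indexOf('\n') + 1;   // skip the ```json line
        int end = message.lastIndexOf("```");    // drop the closing fence
        return message.substring(start, end).trim();
    }
    return message;
}
```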

Parameterised Prompts

The interface also supports parameterised prompts through the ParamPreparedRequest class:

```java
default ChatRequest prepareRequest(ChatRequest conversation, Map<String, Object> params) {
    return ParamPreparedRequest.prepare(conversation, params);
}
```

This allows you to:

  1. Create template prompts with placeholders like {{parameter_name}}
  2. Fill those placeholders at runtime with a map of parameter values
  3. Validate that all required parameters are provided
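
The library provides its own ParamPreparedRequest; as a rough sketch of the behavior described above, the substitution and validation logic can be pictured like this (not the library's actual code):

```java
import java.util.Map;

// Simplified sketch of {{placeholder}} substitution with validation;
// the library's ParamPreparedRequest may be implemented differently.
static String fill(String template, Map<String, Object> params) {
    var result = template;
    for (var entry : params.entrySet()) {
        result = result.replace("{{" + entry.getKey() + "}}", String.valueOf(entry.getValue()));
    }
    if (result.contains("{{")) {
        throw new IllegalArgumentException("Unfilled parameter in prompt: " + result);
    }
    return result;
}
```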

Code Example: Using the Typed API

Here's how you might use this API in practice:

```java
// Define a data class for the structured response
public static class ProductSuggestion {
    public String productName;
    public String description;
    public double price;
    public List<String> features;
}

// Create a parameterised prompt
String prompt = """
        Suggest a {{product_type}} product with {{feature_count}} features. Reply in JSON format

        <example>
        {
          "productName": "product name",
          "description": "product description",
          "price": 100.0,
          "features": ["feature 1", "feature 2", "feature 3", "feature 4", "feature 5"]
        }
        </example>
        """;

var request = new ChatRequest(
        "gemini-2.0-flash",
        0.7f,
        List.of(new ChatRequest.ChatMessage("user", prompt))
);

// Prepare with parameters
Map<String, Object> params = Map.of(
        "product_type", "smart home",
        "feature_count", 5
);
request = service.prepareRequest(request, params);

// Get typed response
var suggestion = service.chat(request, ProductSuggestion.class);
System.out.println(suggestion);
```

Benefits of a Typed LLM API

  1. Type Safety: Catch type mismatches at compile time rather than runtime.
  2. Clean Integration: Seamlessly incorporate AI capabilities into existing Java applications.
  3. Reduced Boilerplate: Consolidate JSON parsing and error handling logic in one place.
  4. Parameter Validation: Ensure all required prompt parameters are provided before making API calls.
  5. Flexible Error Handling: Customize how parsing errors are handled based on your application's needs.

Implementation Considerations

When implementing this interface for different AI providers, consider:

  • JSON Mode/Structured Output: Many LLMs now support a native JSON or structured-output mode, which is generally more reliable than prompt instructions alone; see the sketch after this list.
  • Response formats: Ensure your parser can handle the specific output formats of each provider.
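
As an illustration of the first point, the Gemini REST API accepts a responseMimeType hint in generationConfig; a raw request body using it might look like the following (shown for illustration; this library's ChatRequest does not necessarily expose this field):

```json
{
  "contents": [{ "parts": [{ "text": "Top 5 countries by GDP" }] }],
  "generationConfig": { "responseMimeType": "application/json" }
}
```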

Conclusion

By creating a strongly-typed interface for LLM interactions, we bridge the gap between the unstructured world of AI and the structured requirements of enterprise applications. This approach enables developers to leverage the power of large language models while maintaining type safety and predictability.

The GenerativeAIService interface provides a foundation that can be extended to work with various AI providers while providing a consistent interface for application code. It represents a step toward making AI capabilities more accessible and manageable in traditional software development workflows.


Code for this post is available @ TypeSafety

LLM Patterns

 Design patterns are reusable solutions to common problems in software design. They represent best practices evolved over time by experienced software developers to solve recurring design challenges.

The concept was popularized by the "Gang of Four" (Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides) in their influential 1994 book "Design Patterns: Elements of Reusable Object-Oriented Software."


In this post, we are going to look at some design patterns for LLMs.


Simple Chat

This is the simplest pattern: text is sent as input to the LLM and text is returned. Everyone starts with this.



Let's look at a code example:

```java
var service = GenerativeAIDriverManager.create(GoogleAIFactory.NAME, "https://generativelanguage.googleapis.com", properties);

var messages = new ChatRequest.ChatMessage("user", "Top 5 Countries by GDP");
var conversation = ChatRequest.create("gemini-2.0-flash", List.of(messages));
var reply = service.chat(conversation);
System.out.println(reply.message());
```

Output
Okay, here are the top 5 countries by GDP (Nominal) according to the latest estimates from the International Monetary Fund (IMF) as of October 2023:

1.  **United States:** $26.95 trillion
2.  **China:** $17.72 trillion
3.  **Germany:** $4.43 trillion
4.  **Japan:** $4.23 trillion
5.  **India:** $3.73 trillion

It's important to note:

*   **Source:**  I'm using the IMF's World Economic Outlook Database, October 2023 edition.  These are estimates and projections, and are subject to change.
*   **Nominal GDP:** This is GDP measured at current market prices, without adjusting for inflation.
*   **Data Availability:** The most current, definitive GDP figures are usually released with a bit of a lag.
*   **GDP (PPP):** It's also worth knowing that if you look at GDP based on Purchasing Power Parity (PPP), the rankings can shift somewhat, with China often being very close to, or even exceeding, the United States.

Simple Chat with Some Structure

You may wonder how to use LLM responses programmatically when looking at the output. This is precisely the problem we'll solve with this pattern. By making a small adjustment to your prompt, you can instruct the LLM to return JSON output. The revised prompt would look like: "List the top 5 countries by GDP. Reply in JSON format."


With just a small change to the prompt, the LLM can return structured output, making it function more like a proper API with a degree of type safety. Types are essential in programming—without them, code becomes difficult to maintain and debug as applications grow in complexity.
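
The code mirrors the earlier example; only the prompt text changes:

```java
var messages = new ChatRequest.ChatMessage("user", "List the top 5 countries by GDP. Reply in JSON format");
var conversation = ChatRequest.create("gemini-2.0-flash", List.of(messages));
var reply = service.chat(conversation);
System.out.println(reply.message());
```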

Let's look at the output of this prompt:

```json
{
  "top_5_countries_by_gdp": [
    {
      "rank": 1,
      "country": "United States",
      "gdp_usd": "Approximately $25+ Trillion (USD)"
    },
    {
      "rank": 2,
      "country": "China",
      "gdp_usd": "Approximately $17+ Trillion (USD)"
    },
    {
      "rank": 3,
      "country": "Japan",
      "gdp_usd": "Approximately $4+ Trillion (USD)"
    },
    {
      "rank": 4,
      "country": "Germany",
      "gdp_usd": "Approximately $4+ Trillion (USD)"
    },
    {
      "rank": 5,
      "country": "India",
      "gdp_usd": "Approximately $3+ Trillion (USD)"
    }
  ],
  "note": "GDP figures are approximate and based on the most recent available data (typically from organizations like the World Bank and the IMF).  These values fluctuate and can vary slightly depending on the source and the date the data was collected."
}
```

Chat with My Custom Structure

Now you know where we are going. JSON output is good, but you need more control and consistency over the structure of the output. Note that without enforcing a specific structure, the LLM is free to return data in any shape, and that will break your API contract. This is also achieved by changing the prompt to:

```
Top 5 Countries by GDP. Reply in JSON format
Example:
{
"countries":[
{"name":"country 1","gdp":gdp value, "unit":"trillion or billion etc","rank":rank of country},
{"name":"country 2","gdp":gdp value, "unit":"trillion or billion etc","rank":rank of country},
{"name":"country 3","gdp":gdp value, "unit":"trillion or billion etc","rank":rank of country},
{"name":"country 4","gdp":gdp value, "unit":"trillion or billion etc","rank":rank of country},
{"name":"country 5","gdp":gdp value, "unit":"trillion or billion etc","rank":rank of country}
]
}
```


Code Sample

```java
var prompt = """
        Top 5 Countries by GDP. Reply in JSON format
        Example:
        {
        "countries":[
        {"name":"country 1","gdp":gdp value, "unit":"trillion or billion etc","rank":rank of country},
        {"name":"country 2","gdp":gdp value, "unit":"trillion or billion etc","rank":rank of country},
        {"name":"country 3","gdp":gdp value, "unit":"trillion or billion etc","rank":rank of country},
        {"name":"country 4","gdp":gdp value, "unit":"trillion or billion etc","rank":rank of country},
        {"name":"country 5","gdp":gdp value, "unit":"trillion or billion etc","rank":rank of country}
        ]
        }
        """;
var messages = new ChatRequest.ChatMessage("user", prompt);
var conversation = ChatRequest.create("gemini-2.0-flash", List.of(messages));
var reply = service.chat(conversation);
System.out.println(reply.message());
```

Output 

```json
{
"countries": [
    {
        "name": "United States",
        "gdp": 26.95,
        "unit": "trillion USD",
        "rank": 1
    },
    {
        "name": "China",
        "gdp": 17.73,
        "unit": "trillion USD",
        "rank": 2
    },
    {
        "name": "Japan",
        "gdp": 4.23,
        "unit": "trillion USD",
        "rank": 3
    },
    {
        "name": "Germany",
        "gdp": 4.07,
        "unit": "trillion USD",
        "rank": 4
    },
    {
        "name": "India",
        "gdp": 3.42,
        "unit": "trillion USD",
        "rank": 5
    }
]
}
```
 

Strongly Typesafe Chat

We've established a solid foundation to approach type safety, and now we reach the final step: converting the LLM's string output by passing it through a TypeConverter to create a strongly typed object. This completes our transformation from unstructured text to programmatically usable data.



The changes for type safety are implemented in the llmapi library:

```
<dependency>
    <groupId>org.llm</groupId>
    <artifactId>llmapi</artifactId>
    <version>1.2.1</version>
</dependency>
```

Sample Code

```java
var prompt = """
        Top 5 Countries by GDP. Reply in JSON format
        Example:
        {
        "countries":[
        {"name":"country 1","gdp":gdp value, "unit":"trillion or billion etc","rank":rank of country},
        {"name":"country 2","gdp":gdp value, "unit":"trillion or billion etc","rank":rank of country},
        {"name":"country 3","gdp":gdp value, "unit":"trillion or billion etc","rank":rank of country},
        {"name":"country 4","gdp":gdp value, "unit":"trillion or billion etc","rank":rank of country},
        {"name":"country 5","gdp":gdp value, "unit":"trillion or billion etc","rank":rank of country}
        ]
        }
        """;
var messages = new ChatRequest.ChatMessage("user", prompt);
var conversation = ChatRequest.create("gemini-2.0-flash", List.of(messages));

var reply = service.chat(conversation, CountryGdp.class);
System.out.println(reply);
```


The only change in the code is using the type-safe chat function from llmapi:

```java
public interface GenerativeAIService {

    ChatMessageReply chat(ChatRequest conversation);

    default EmbeddingReply embedding(EmbeddingRequest embedding) {
        throw new UnsupportedOperationException("Not Supported");
    }

    default <T> T chat(ChatRequest conversation, Class<T> returnType) {
        // ...
    }

    default <T> Optional<T> chat(ChatRequest conversation, Class<T> returnType, BiConsumer<String, Exception> onFailedParsing) {
        // ...
    }
}
```

The output is an instance of the CountryGdp object.
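
The CountryGdp class itself isn't shown in the post; a plausible definition matching the prompt's example schema would be (field names are assumptions based on the JSON example):

```java
import java.util.List;

// Hypothetical data class mirroring the JSON schema in the prompt;
// Gson populates the public fields during deserialization.
public class CountryGdp {
    public List<Country> countries;

    public static class Country {
        public String name;
        public double gdp;
        public String unit;
        public int rank;
    }
}
```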

Parameter-Based Chat

As your LLM applications grow in complexity, your prompts will inevitably become more sophisticated. One essential feature for managing this complexity is parameter support, similar to what you find in JDBC. The next pattern addresses this need, demonstrating how prompts can contain parameters that are dynamically replaced at runtime, allowing for more flexible and reusable prompt templates.



Code 

```java
var prompt = """
        Top {{no_of_country}} Countries by GDP. Reply in JSON format
        Example:
        {
        "countries":[
        {"name":"country 1","gdp":gdp value, "unit":"trillion or billion etc","rank":rank of country},
        {"name":"country 2","gdp":gdp value, "unit":"trillion or billion etc","rank":rank of country},
        {"name":"country 3","gdp":gdp value, "unit":"trillion or billion etc","rank":rank of country},
        {"name":"country 4","gdp":gdp value, "unit":"trillion or billion etc","rank":rank of country},
        {"name":"country 5","gdp":gdp value, "unit":"trillion or billion etc","rank":rank of country}
        ]
        }
        """;
var messages = new ChatRequest.ChatMessage("user", prompt);
var conversation = ChatRequest.create("gemini-2.0-flash", List.of(messages));

var preparedConversion = service.prepareRequest(conversation, Map.of("no_of_country", "10"));
var reply = service.chat(preparedConversion, CountryGdp.class);

System.out.println(reply);
```



Conclusion

We have only scratched the surface of LLM patterns. In this post, I've covered some basic to intermediate concepts, but my next post will delve into more advanced patterns that build upon these fundamentals.

All the code used in this post is available @ llmpatterns git project



Wednesday, 5 March 2025

Building a Universal Java Client for Large Language Models


In today's rapidly evolving AI landscape, developers often need to work with multiple Large Language Model (LLM) providers to find the best solution for their specific use case. Whether you're exploring OpenAI's GPT models, Anthropic's Claude, or running local models via Ollama, having a unified interface can significantly simplify development and make it easier to switch between providers.

The Java LLM Client project provides exactly this: a clean, consistent API for interacting with various LLM providers through a single library. Let's explore how this library works and how you can use it in your Java applications.

Core Features

The library offers several key features that make working with LLMs easier:

  1. Unified Interface: Interact with different LLM providers through a consistent API
  2. Multiple Provider Support: Currently supports OpenAI, Anthropic, Google, Groq, and Ollama
  3. Chat Completions: Send messages and receive responses from language models
  4. Embeddings: Generate vector representations of text where supported
  5. Factory Pattern: Easily create service instances for different providers

Architecture Overview

The library is built around a few key interfaces and classes:

  • GenerativeAIService: The main interface for interacting with LLMs
  • GenerativeAIFactory: Factory interface for creating service instances
  • GenerativeAIDriverManager: Registry that manages available services
  • Provider-specific implementations in separate packages

This design follows the classic factory pattern, allowing you to:

  1. Register service factories with the GenerativeAIDriverManager
  2. Create service instances through the manager
  3. Use a consistent API to interact with different providers

Getting Started

To use the library, first add it to your Maven project:

```xml
<dependency>
    <groupId>org.llm</groupId>
    <artifactId>llmapi</artifactId>
    <version>1.0.0</version>
</dependency>
```


Basic Usage Example

Here's how to set up and use the library:

```java
// Register service providers
GenerativeAIDriverManager.registerService(OpenAIFactory.NAME, new OpenAIFactory());
GenerativeAIDriverManager.registerService(AnthropicAIFactory.NAME, new AnthropicAIFactory());
// Register more providers as needed

// Create an OpenAI service
Map<String, Object> properties = Map.of("apiKey", System.getenv("gpt_key"));
var service = GenerativeAIDriverManager.create(OpenAIFactory.NAME, "https://api.openai.com/", properties);

// Create and send a chat request
var message = new ChatMessage("user", "Hello, how are you?");
var conversation = new ChatRequest("gpt-4o-mini", List.of(message));
var reply = service.chat(conversation);
System.out.println(reply.message());

// Generate embeddings
var vector = service.embedding(new EmbeddingRequest("text-embedding-3-small", "How are you"));
System.out.println(Arrays.toString(vector.embedding()));
```

Working with Different Providers

OpenAI

```java
Map<String, Object> properties = Map.of("apiKey", System.getenv("gpt_key"));
var service = GenerativeAIDriverManager.create(OpenAIFactory.NAME, "https://api.openai.com/", properties);

// Chat with GPT-4o mini
var conversation = new ChatRequest("gpt-4o-mini", List.of(new ChatMessage("user", "Hello, how are you?")));
var reply = service.chat(conversation);
```

Anthropic

```java
Map<String, Object> properties = Map.of("apiKey", System.getenv("ANTHROPIC_API_KEY"));
var service = GenerativeAIDriverManager.create(AnthropicAIFactory.NAME, "https://api.anthropic.com", properties);

// Chat with Claude
var conversation = new ChatRequest("claude-3-7-sonnet-20250219", List.of(new ChatMessage("user", "Hello, how are you?")));
var reply = service.chat(conversation);
```

Ollama (Local Models)

```java
// No API key needed for local models
Map<String, Object> properties = Map.of();
var service = GenerativeAIDriverManager.create(OllamaFactory.NAME, "http://localhost:11434", properties);

// Chat with locally hosted Llama model
var conversation = new ChatRequest("llama3.2", List.of(new ChatMessage("user", "Hello, how are you?")));
var reply = service.chat(conversation);
```

Under the Hood

The library uses an RPC (Remote Procedure Call) client to handle the HTTP communication with various APIs. Each provider's implementation:

  1. Creates appropriate request objects with the required format
  2. Sends requests to the corresponding API endpoints
  3. Parses responses into a consistent format
  4. Handles errors gracefully

The RpcBuilder creates proxy instances of service interfaces, handling the HTTP communication details so you don't have to.
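
RpcBuilder's source isn't shown here, but the idea can be pictured as a java.lang.reflect.Proxy that turns interface calls into HTTP requests. A rough sketch, with hypothetical toJson/fromJson helpers standing in for the library's real serialization:

```java
import java.lang.reflect.Proxy;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Rough sketch of the dynamic-proxy idea; the real RpcBuilder also handles
// auth headers, provider-specific endpoints, and error mapping.
@SuppressWarnings("unchecked")
static <T> T createService(Class<T> api, String baseUrl) {
    var http = HttpClient.newHttpClient();
    return (T) Proxy.newProxyInstance(
            api.getClassLoader(),
            new Class<?>[]{api},
            (proxy, method, args) -> {
                var request = HttpRequest.newBuilder()
                        .uri(URI.create(baseUrl + "/" + method.getName()))
                        .POST(HttpRequest.BodyPublishers.ofString(toJson(args)))  // toJson: hypothetical
                        .build();
                var response = http.send(request, HttpResponse.BodyHandlers.ofString());
                return fromJson(response.body(), method.getReturnType());        // fromJson: hypothetical
            });
}
```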

Supported Models

The library currently supports several models across different providers:

  • OpenAI: all
  • Anthropic: all
  • Google: gemini-2.0-flash
  • Groq: all
  • Ollama: any model you have available locally

Extending the Library

One of the strengths of this design is how easily it can be extended to support new providers or features:

  1. Create a new implementation of GenerativeAIFactory
  2. Implement GenerativeAIService for the new provider
  3. Create necessary request/response models
  4. Register the new factory with GenerativeAIDriverManager
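
As a sketch, steps 1 and 4 might look like this (the factory method signature is an assumption based on how GenerativeAIDriverManager.create is used above; MyProviderService is a hypothetical implementation of GenerativeAIService):

```java
import java.util.Map;

// Illustrative skeleton for a hypothetical new provider
public class MyProviderFactory implements GenerativeAIFactory {
    public static final String NAME = "myprovider";

    @Override
    public GenerativeAIService create(String baseUrl, Map<String, Object> properties) {
        return new MyProviderService(baseUrl, (String) properties.get("apiKey"));
    }
}

// Register once at startup, then use the same GenerativeAIService API as any other provider
GenerativeAIDriverManager.registerService(MyProviderFactory.NAME, new MyProviderFactory());
```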

Conclusion

The Java LLM Client provides a clean, consistent way to work with multiple LLM providers in Java applications. By abstracting away the differences between APIs, it allows developers to focus on their application logic rather than the details of each provider's implementation.

Whether you're building a chatbot, generating embeddings for semantic search, or experimenting with different LLM providers, this library offers a straightforward way to integrate these capabilities into your Java applications.

The project's use of standard Java patterns like factories and interfaces makes it easy to understand and extend, while its modular design allows you to use only the providers you need. As the LLM ecosystem continues to evolve, this type of abstraction layer will become increasingly valuable for developers looking to build flexible, future-proof applications.


Link to github project - llmapi


Tuesday, 4 March 2025

Measuring Developer Productivity in the Age of GenAI

The GenAI Revolution: Two Years Later

November 30, 2022 marked a pivotal moment when ChatGPT was released, sparking excitement and optimism about increased efficiency across industries. Now, with over two years of GenAI integration, the industry has matured enough to properly evaluate the impact and value of these tools on various aspects of business. In this post, I'll focus specifically on measuring developer productivity.

Measuring Impact: Output vs. Outcome

The impact of any change—whether new tools, processes, or methodologies—can be measured in terms of both output and outcome.

As a product organization, outcomes are ultimately the metrics that deliver revenue or customer growth. However, this same model cannot be directly applied when measuring the impact of GenAI on developer efficiency.

A Framework for Measurement

In this post, I'll share several approaches to measure productivity with GenAI tools, focusing on a progression from:

Output → Outcome → Growth

This framework will help organizations better understand how GenAI affects developer productivity in ways that eventually translate to business value.




Developer productivity can be measured along multiple dimensions:

| | How Fast (Output) | Is Effective (Output) | Impact (Outcome) | Growth (Outcome) |
| --- | --- | --- | --- | --- |
| Primary Metrics | # PRs per Engineer<br># Test Coverage per PR | # Engineering Time Index<br># Non-Engineering Time Index | Failure Rate of Change<br>Usage of Feature | Time spent on new capabilities/products<br>Time spent on R&D |
| Secondary Metrics | Cycle Time for PRs<br>Deployment Frequency<br>Perceived rate of productivity | Time on PRs per sprint<br>Friction in delivery<br>Code Tech Debt Index<br>Code Security Debt Index | Last-minute changes<br>Operational & Security Health | ROI on new features<br>Revenue per Engineer<br>New Products/Segments |

Finding the Right Mix of Developer Productivity Metrics

The table above outlines four key dimensions for measuring developer productivity in the GenAI era. These dimensions incorporate both quantitative and qualitative metrics, collected through various methods:

Balanced Measurement Approach

Each dimension contains metrics that vary in nature:

  • Quantitative metrics provide objective, numerical data that can be tracked over time
  • Qualitative metrics capture subjective experiences and insights that numbers alone cannot reveal


Let's start with the categories of metrics.

How Fast (Output)

This metric provides a straightforward measure of how effectively development teams leverage generative AI tools to produce code and the rate at which they do so. It serves as an excellent starting point for analysis and can be fully automated for continuous monitoring.


Is Effective (Output)

This category assesses the quality of output by analyzing the ratio of time spent on engineering versus non-engineering tasks. It also incorporates lagging indicators such as sprint-level pull request review times, code technical debt indices, and security vulnerability indices. These metrics, largely automated, provide insights into both positive outcomes and potential side effects.

Impact (Outcome)

This category marks the initial phase of measuring the impact of generative AI-assisted work. It focuses on evaluating delivery quality, product usage, and overall product health.

Growth ( Outcome)

This final category focuses on quantifying the tangible value generated by new features, specifically in terms of return on investment (ROI) and revenue. While direct revenue impact may not be immediately apparent in short development cycles, the focus shifts to measuring the time freed up for new capability development and the potential for new product or market segment expansion.

Things to Watch While Measuring Developer Productivity

Measuring productivity can lead to misleading signals. Organizations should be wary of:

  • Spikes in Lines of Code (LOC) that don't mean better output.
  • High Commit/PR counts without real progress.
  • Long hours, which often signal burnout, not efficiency.
  • Burning through story points too fast, which can mean poor planning.
  • Focusing only on individual metrics, not team success.
  • Using gamification that hurts collaboration.
  • Too many unfinished POCs or WIP projects.
  • Thinking Generative AI fixes everything.
  • A pattern of implementing new Generative AI tools at an unsustainable frequency, such as weekly or more.

Conclusion

The metrics shared in this post sit between DORA and SPACE and give a holistic view of a team's productivity gains.

If you are early in the journey, refer to the Implementing-genai-in-engineering-teams post, which discusses how to implement the transformation.