Monday, 14 August 2023

Multi Indexing

Imagine you are the owner of a grocery store, which stocks a wide variety of products such as bakery items, dairy products, eggs, grains, vegetables, oils, frozen foods, and more.

Now, you want to create a hybrid index that allows for fast lookups based on several attributes of the products, including:

Product name or ID
Product expiry date
Discount percentage
Fast-moving/selling items

In essence, this index will enable both simple lookups and more advanced functionalities like ranking products.

One crucial distinction to note between these types of lookups is that:

A search by product name or ID will yield unique items, meaning each product is listed only once.
A search by expiry date, discounted products, fast-moving items, etc., will return results ordered by their respective values, and the values are not necessarily unique.

With this hybrid index in place, you'll be able to efficiently manage your inventory and cater to customer demands more effectively. Whether it's quickly finding a specific product or identifying the most popular items, this system will streamline your grocery store operations and enhance the overall shopping experience.

Which data structures ?

Databases can effectively manage these types of access patterns through the implementation of indexes on the query columns and the strategic use of the 'ORDER BY' clause. However, it's important to note that this approach comes with its own set of trade-offs, which we'll explore in a future post.

For now, let's direct our attention towards determining the most suitable data structure for the task at hand.

When faced with the challenge of efficiently handling two distinct access patterns - key-value lookup and ordered lookup by some attribute - traditional data structures like Maps, Binary Search Trees, or Priority Queues may not individually fulfil all the requirements. As a solution, we can create a new data structure that combines the strengths of these structures into one, calling it Treep (Tree with priority).

Treep will have the following characteristics:

Key-Value Lookup: Like Maps, Treep will allow fast key-value lookups, enabling quick retrieval of data using unique identifiers.
Ordered Lookup by Attribute: Similar to Binary Search Trees and Priority Queues, Treep will maintain the data in a sorted order based on a selected attribute. This will enable efficient access to elements in a specific order (e.g., ascending or descending) with respect to that attribute.

The interface of the Treep data structure may look like below

public interface Treep<K, V> {
    void add(V value);

    Set<String> orderBy();

    V get(K key);

    V top(String attributeName);

    V takeTop(String attributeName);

    V delete(K key);

    Iterator<K> keys();

    int size();
}

get - This method efficiently handles the simple key-based access pattern. By providing a key as input, you can swiftly retrieve the associated value from the Treep. This operation is ideal for quick lookups of specific elements using their unique identifiers.

top/taketop - This method caters to the access pattern of retrieving elements ordered by a specific attribute.

Code snippet using Treep

Treep<String, Product> stores = new MapPriorityQueue<>(Product::name, Map.of(
        "discount", Comparator.comparing(Product::discount).reversed(),
        "price", Comparator.comparing(Product::price).reversed()
));


Arrays.asList(
        Product.of("AXION Yellow", 2.12f, .10f),
        Product.of("Meji Fresh Milk 2L", 6.9f, 0.0f),
        Product.of("red Chilli 100 G", 1.14f, .05f),
        Product.of("Fresh Cucumber", 1.37f, .01f),
        Product.of("China Garlic", 1.93f, 0.0f),
        Product.of("Red Onion", 1.19f, 0.07f),
        Product.of("Fuji Apple", 3.14f, .11f),
        Product.of("Banana", 3.58f, .12f)
).forEach(stores::add);

assertEquals(Product.of("Banana", 3.58f, .12f), stores.top("discount"));
assertEquals(Product.of("Meji Fresh Milk 2L", 6.9f, 0.0f), stores.top("price"));

MapPriorityQueue is using Map and Priority queue to implement multi index feature.

Full working code is available @ github. There are other ways to implement such data structure by using Map + SortedSet , Map + SkipList

Saturday, 15 July 2023

More choice using ChoiceFormat

The conversion of numbers to text is a frequently encountered problem for engineers. It manifests in various scenarios, and some of the most prevalent examples include:

Converting a number to a day of the week
Converting a number to a month
Transforming a status into user-friendly text

Now, let's explore some solutions for addressing this problem, using the conversion of a number to a day of the week as an explanatory example.

- IF/Else/Switch

public static String toDayOfWeek(int value) {

    if (value == 1) {
        return "MON";
    } else if (value == 2) {
        return "TUE";
    } else if (value == 3) {
        return "WED";
    } else if (value == 4) {
        return "THU";
    } else if (value == 5) {
        return "FRI";
    } else if (value == 6) {
        return "SAT";
    } else if (value == 7) {
        return "WED";
    }
    return "You are on moon";

}

While this solution may serve as a decent starting point, it is worth noting that even your grandfather might not approve of it.

- Maps

public static String toDayOfWeek(int value) {

    Map<Integer, String> dayOfWeek = new HashMap<>() {
        {
            int index = 1;
            put(index++, "MON");
            put(index++, "TUE");
            put(index++, "WED");
            put(index++, "THU");
            put(index++, "FRI");
            put(index++, "SAT");
            put(index++, "SUN");
        }
    };

    return dayOfWeek
            .getOrDefault(value, "You are on moon");

}

This solution appears to be an improvement and can be considered a good option, especially when the key is dynamic. However, it is important to note that maintaining this solution can become challenging over time.

- Enums

public enum DayOfWeek {
    MONDAY,
    TUESDAY,
    WEDNESDAY,
    THURSDAY,
    FRIDAY,
    SATURDAY,
    SUNDAY;
    private static final DayOfWeek[] ENUMS = DayOfWeek.values();

    public static DayOfWeek of(int dayOfWeek) {
        return ENUMS[dayOfWeek - 1];
    }
}

This approach seems quite elegant and aligns with the way JDK handles similar scenarios. It offers several advantages, such as leveraging data types to ensure compile-time safety. This not only enhances maintainability but also provides the benefit of catching errors during compilation. However, one potential drawback of this approach is that it can pose challenges when it comes to extending functionality due to the strong type safety constraints.

- Choice Format

The choice format is a relatively new feature introduced in JDK 17+, and it offers intriguing solutions for tackling this problem. Before delving into the intricacies of how it works, let's examine some code examples to get a better understanding.

public static String toDayOfWeek(int value) {
    double[] limits = {1, 2, 3, 4, 5, 6, 7};
    String[] formats = {"Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"};

    ChoiceFormat form = new ChoiceFormat(limits, formats);
    return form.format(value);
}

This solution solves the problem by employing a straightforward approach: using pairs of arrays or vectors to map numbers to their corresponding strings.

At first glance, it may not seem particularly remarkable, resembling a Map structure where the keys are represented by one array and the values by another. However, the true test begins when we pass values that are not present in the array.

Now, let's speculate. What do you think this function would return if we were to pass 0 or 100? One possibility could be "Undefined," which would hold true if this were JavaScript. Another option could be null or a NullPointerException, which is a close guess given your familiarity with Java. However, this is where the solution gets interesting.

Lets look at the output for 1/2/0/100

1 -> Mon 2 -> Tue 0 -> Mon 100 -> Sun

The output obtained from this solution provides some valuable insights into how ChoiceFormat operates. For example, when we pass 0, it returns "Monday," and when we pass 100, it returns "Sunday."

Based on these results, you might start getting an idea of how ChoiceFormat functions. It relies on a few key elements:

Ascending limits array: The limits array is arranged in ascending order, defining the intervals.
Format array: This array has the same size as the limits array and contains the corresponding text representations for each interval.
Interval Behaviour: This values in limit array represent half-open interval, meaning that the lower limit is inclusive while upper limit is exclusive.

These factors play a crucial role in determining the appropriate text representation based on the input value within the defined intervals.

Lets look at half-open interval match with below function

public static String weekDayOrWeekend(int value) {
    double[] limits = {1, 6};
    String[] formats = {"WeekDay", "Weekend"};

    ChoiceFormat form = new ChoiceFormat(limits, formats);
    return form.format(value);
}

This particular implementation exhibits an interesting behavior: for values less than 5, the function returns "Weekday," while for values 6 and above, it returns "Weekend."

Isn't it fascinating how ChoiceFormat manages to accomplish this range-based search and deliver the appropriate result? It's remarkable how this small utility class can perform such a useful trick.

Let's consider one more simple example before delving into the greater capabilities of this utility class.

public static String workDays(int value) {
    double[] limits = {1, 2, 5, 6};
    String[] formats = {"Monday Blues", "WorkHard", "It is Friday!!", "Relax"};

    ChoiceFormat form = new ChoiceFormat(limits, formats);
    return form.format(value);
}

This should give fair bit of idea that this class will be useful in many places.

Lets look at some application

- Better log messages

public static String files(int value) {
    double[] limits = {0, 1, 2};
    String[] formats = {"No files", "One files", "Many files"};

    ChoiceFormat form = new ChoiceFormat(limits, formats);
    return form.format(value);
}

System.out.println("Found " + files(100)); //Found Many files
System.out.println("Found " + files(0));//Found No files
System.out.println("Found " + files(1));//Found One files

- Conditional Log message

public static String formatMessage(String format, int value) {
    ChoiceFormat form = new ChoiceFormat(format);
    return form.format(value);
}

String format = "0#no files | 1#one file |2# two files |3< more than 2 ";
System.out.println(formatMessage(format, 2)); //two files
System.out.println(formatMessage(format, 10)); //more than 2
System.out.println(formatMessage(format, 0)); //no files
System.out.println(formatMessage(format, 1)); //one file

This example showcases the power of advanced string interpolation by utilizing rules embedded within the format string.

By leveraging this technique, we can define rules directly within the format string itself, which provides a flexible and concise approach to handle various scenarios.

- Parameterised Conditional Log message

Multiple ways to do this, lets look at few example.

public static ChoiceFormat usingPair() {
    double[] priceLimits = {0.0, 10.0, 50.0, 100.0};
    String[] priceFormats = {
            "The item is not available",
            "The item is on sale for {0}",
            "The item is moderately priced at {0}",
            "The item is expensive at {0}"
    };
    return new ChoiceFormat(priceLimits, priceFormats);
}

public static ChoiceFormat usingStringLiteral() {
    return new ChoiceFormat(
            "0#The item is not available |10#The item is on sale for {0} |50#The item is moderately priced at {0} |100#The item is expensive at {0}");
}

public static ChoiceFormat usingStringRules() {

    String rules = String.join(" |",
            "0#The item is not available",
            "10#The item is on sale for {0}",
            "50#The item is moderately priced at {0}",
            "100#The item is expensive at {0}");

    return new ChoiceFormat(rules);
}

ChoiceFormat can be created using any of the methods mentioned above, each with its own advantages and considerations. However, some methods may be easier to maintain and less error-prone than others.

Among the options, if I were to choose one, I would prefer using the last method demonstrated, which involves using string rules. This method provides greater flexibility and simplicity in defining the rules for the ChoiceFormat. By using string rules, you can easily specify the mappings between input values and their corresponding text representations in a concise and readable manner. This approach often results in code that is easier to understand, modify, and maintain.

Above format can be used as below

ChoiceFormat priceFormat = usingStringRules();

double price = 120;
Object[] formatArguments = {price};
String formattedPrice = MessageFormat.format(priceFormat.format(price), formatArguments);
System.out.println(formattedPrice); // The item is expensive at 120

Just imaging how this capability can be used by logging framework !

ChoiceFormat empowers to amplify the range of choices available for message formatting. By incorporating ChoiceFormat into your code, you can introduce a multitude of options, enriching the formatting possibilities.

The versatility of ChoiceFormat allows you to define and customize a wide array of choices, each with its own designated format. This flexibility enables you to create dynamic and adaptive messages that cater to different input values.

With ChoiceFormat at your disposal, you can enhance your message formatting capabilities, opening up new avenues for crafting comprehensive and adaptable output.

Monday, 1 May 2023

Distributed Context Propgation

Why distributed context ?

Nowadays, modern applications are designed to be distributed across multiple data centers and regions, leveraging diverse technology stacks. As a result, a typical application architecture may span multiple geographical locations and incorporate various technologies, such as microservices, containers, and server less computing, to meet the complex demands of modern businesses. Application stack might look something like this.

Application Stack

In distributed systems, several challenges arise when it comes to maintaining certain aspects such as global state, telemetry recording, long-duration workflow pipeline checkpointing, and feature flags. These challenges are critical to ensuring the reliability, scalability, and fault tolerance of the system.

One of the primary challenges is managing global state, which refers to the data that needs to be shared and synchronized across multiple instances of the application. This can be a complex task, especially in highly distributed environments, where consistency and concurrency control are crucial.

Another key challenge is recording telemetry, which involves collecting, analyzing, and visualizing data about the system's performance, health, and usage. This helps teams monitor and troubleshoot the system, identify bottlenecks, and optimize its overall efficiency.

In addition, keeping track of long-duration workflow pipelines is essential for ensuring that complex tasks are completed reliably and efficiently. This requires checkpointing and resuming workflows from a particular point in the event of failures or other disruptions.

Finally, managing feature flags, which enable the selective enabling or disabling of certain features in the system, is critical for rapid iteration and experimentation. However, managing feature flags across multiple instances of the application can be challenging, as teams need to ensure that the correct flag configuration is propagated to all instances.

What is solution ?

Several solutions exist to tackle these challenges, such as distributed caching, message-oriented middleware, or specialized tools like Zookeeper for managing state. While these solutions can be effective, they often come with the complexity of managing a sophisticated infrastructure. Additionally, implementing a complex infrastructure for a relatively small use case can lead to more issues than solutions.

To address these challenges, OpenTelemetry offers a novel approach called "context and propagation" that operates at the transport layer. The idea behind this approach is to provide a simple and unified way to propagate information across different components of the distributed system.

Inspired by OpenTelemetry's context and propagation, I would like to introduce an idea that could help address these challenges more effectively. This approach involves utilizing a lightweight middleware layer that sits between the different components of the distributed system. The middleware layer would be responsible for managing the state, telemetry, and feature flags, and would provide a simple and standardized API for the application components to interact with.

By using this approach, teams can simplify the management of complex infrastructure while ensuring that their distributed system remains reliable, scalable, and fault-tolerant. Additionally, this approach could help teams experiment with new features and iterate quickly without compromising on the overall performance and stability of the system.

How does context API look ?

The proposed Context API would have several key features to help manage the state, telemetry, and feature flags of a distributed system more effectively. Some of these features include:

Context Name: This feature would enable the association of a user-friendly name with all the state, telemetry, and feature flags in the system. This name can be used to index and query the information easily.
Time to Live: Some context information is short-lived and should be automatically managed by the underlying implementation. This feature would enable the setting of a time to live for the context information, allowing for automatic expiration and cleanup when no longer needed.
Abstract Data Types: The Context API would support abstract data types such as counters, maps, and sets, allowing for flexible and efficient management of the state, telemetry, and feature flags.
Durability: The context information would be durable, meaning it would survive application restarts and remain available for a longer duration, enabling teams to understand the behavior of the distributed system over time.

By incorporating these features, the proposed Context API would provide a lightweight and standardized way to manage the complex state, telemetry, and feature flags of a distributed system, enabling teams to focus on building reliable and scalable applications without the burden of managing a complex infrastructure.

API -> Implementation

One of the key benefits of designing a system with a clear separation between the API and its implementation is the flexibility it offers. This approach allows teams to easily swap out and upgrade different components of the system without impacting the overall functionality of the API.

Moreover, a clear separation between the API and implementation enables the support of heterogeneous ecosystems, where different components of the system may use different technology stacks or operate in different environments. This allows teams to leverage the strengths of each technology stack and environment, without compromising the overall functionality and interoperability of the system.

In summary, a clear separation between the API and implementation provides teams with flexibility, interoperability, modularity, and maintainability, allowing them to build complex and reliable systems that can support the needs of diverse and dynamic ecosystems.

What operations are supported on Abstract data type

Let's take a closer look at the sample API and the operations it supports. One of the benefits of the proposed Context API is its ability to support a wide range of use cases, and new abstract data types (ADTs) can be added to address more advanced scenarios.

At its core, the Context API provides a simple and standardized way to manage state, telemetry, and feature flags in a distributed system.

The Context API can be extended with new ADTs to support more advanced use cases. For example, a new ADT could be added to support distributed locks, allowing different components of the system to coordinate access to a shared resource. Another ADT could be added to support distributed queues, enabling components to communicate asynchronously in a reliable and scalable manner.

A sample implementation that leverages the concepts discussed above is available at https://github.com/ashkrit/corejava/tree/master/playground/src/main/java/context. This implementation showcases how the proposed Context API can be used to manage state, telemetry, and feature flags in a distributed system.

The sample implementation uses an H2 database for the persistence of state, and this database is abstracted through the ContextProviderClient.java interface. This abstraction enables the use of any backend system to store states, providing flexibility and adaptability.

One of the key challenges of the Context API is state replication in a distributed system. To address this, various approaches can be taken, depending on the requirements of the system. One option could be to use a REST API that has access to a replicated database system, where all state management is done through the API. Alternatively, a multi-data center cache, such as Redis, could be used to store and replicate state across different regions. In some cases, a distributed and replicated file system could also be used to store and manage state.

Overall, the choice of backend system and replication strategy will depend on the specific needs of the distributed system and the desired level of fault tolerance, consistency, and scalability. The flexibility and abstraction provided by the Context API make it possible to use a variety of backend systems and replication strategies, enabling teams to choose the approach that best meets their requirements.

Overall, the sample implementation provides a practical example of how the Context API can be used to simplify the management of state, telemetry, and feature flags in a distributed system, while promoting flexibility, scalability, and reliability.