Monday 14 August 2023

Multi Indexing

Imagine you are the owner of a grocery store, which stocks a wide variety of products such as bakery items, dairy products, eggs, grains, vegetables, oils, frozen foods, and more.

Now, you want to create a hybrid index that allows for fast lookups based on several attributes of the products, including:

  • Product name or ID
  • Product expiry date
  • Discount percentage
  • Fast-moving/selling items

In essence, this index will enable both simple lookups and more advanced functionalities like ranking products.

One crucial distinction to note between these types of lookups is that:

  1. A search by product name or ID will yield unique items, meaning each product is listed only once.
  2. A search by expiry date, discounted products, fast-moving items, etc., will return results ordered by their respective values, and the values are not necessarily unique.

With this hybrid index in place, you'll be able to efficiently manage your inventory and cater to customer demands more effectively. Whether it's quickly finding a specific product or identifying the most popular items, this system will streamline your grocery store operations and enhance the overall shopping experience.

Which data structures ?

Databases can effectively manage these types of access patterns through the implementation of indexes on the query columns and the strategic use of the 'ORDER BY' clause. However, it's important to note that this approach comes with its own set of trade-offs, which we'll explore in a future post.

For now, let's direct our attention towards determining the most suitable data structure for the task at hand.

When faced with the challenge of efficiently handling two distinct access patterns - key-value lookup and ordered lookup by some attribute - traditional data structures like Maps, Binary Search Trees, or Priority Queues may not individually fulfil all the requirements. As a solution, we can create a new data structure that combines the strengths of these structures into one, calling it Treep (Tree with priority).

Treep will have the following characteristics:

  1. Key-Value Lookup: Like Maps, Treep will allow fast key-value lookups, enabling quick retrieval of data using unique identifiers.

  2. Ordered Lookup by Attribute: Similar to Binary Search Trees and Priority Queues, Treep will maintain the data in a sorted order based on a selected attribute. This will enable efficient access to elements in a specific order (e.g., ascending or descending) with respect to that attribute.

The interface of the Treep data structure may look like below

public interface Treep<K, V> {
void add(V value);

Set<String> orderBy();

V get(K key);

V top(String attributeName);

V takeTop(String attributeName);

V delete(K key);

Iterator<K> keys();

int size();

get - This method efficiently handles the simple key-based access pattern. By providing a key as input, you can swiftly retrieve the associated value from the Treep. This operation is ideal for quick lookups of specific elements using their unique identifiers.

top/taketop - This method caters to the access pattern of retrieving elements ordered by a specific attribute.

Code snippet using Treep

Treep<String, Product> stores = new MapPriorityQueue<>(Product::name, Map.of(
"discount", Comparator.comparing(Product::discount).reversed(),
"price", Comparator.comparing(Product::price).reversed()

Product.of("AXION Yellow", 2.12f, .10f),
Product.of("Meji Fresh Milk 2L", 6.9f, 0.0f),
Product.of("red Chilli 100 G", 1.14f, .05f),
Product.of("Fresh Cucumber", 1.37f, .01f),
Product.of("China Garlic", 1.93f, 0.0f),
Product.of("Red Onion", 1.19f, 0.07f),
Product.of("Fuji Apple", 3.14f, .11f),
Product.of("Banana", 3.58f, .12f)

assertEquals(Product.of("Banana", 3.58f, .12f),"discount"));
assertEquals(Product.of("Meji Fresh Milk 2L", 6.9f, 0.0f),"price"));

MapPriorityQueue is using Map and Priority queue to implement multi index feature.

Full working code is available @ github. There are other ways to implement such data structure by using Map + SortedSet , Map + SkipList