
Monday, 14 August 2023

Multi Indexing

Imagine you are the owner of a grocery store, which stocks a wide variety of products such as bakery items, dairy products, eggs, grains, vegetables, oils, frozen foods, and more.

Now, you want to create a hybrid index that allows for fast lookups based on several attributes of the products, including:

  • Product name or ID
  • Product expiry date
  • Discount percentage
  • Fast-moving/selling items

In essence, this index will enable both simple lookups and more advanced functionalities like ranking products.

One crucial distinction to note between these types of lookups is that:

  1. A search by product name or ID will yield unique items, meaning each product is listed only once.
  2. A search by expiry date, discounted products, fast-moving items, etc., will return results ordered by their respective values, and the values are not necessarily unique.

With this hybrid index in place, you'll be able to efficiently manage your inventory and cater to customer demands more effectively. Whether it's quickly finding a specific product or identifying the most popular items, this system will streamline your grocery store operations and enhance the overall shopping experience.

Which data structures?


Databases can effectively manage these types of access patterns through the implementation of indexes on the query columns and the strategic use of the 'ORDER BY' clause. However, it's important to note that this approach comes with its own set of trade-offs, which we'll explore in a future post.


For now, let's direct our attention towards determining the most suitable data structure for the task at hand.

When faced with the challenge of efficiently handling two distinct access patterns - key-value lookup and ordered lookup by some attribute - traditional data structures like Maps, Binary Search Trees, or Priority Queues may not individually fulfil all the requirements. As a solution, we can create a new data structure that combines the strengths of these structures into one, calling it Treep (Tree with priority).

Treep will have the following characteristics:

  1. Key-Value Lookup: Like Maps, Treep will allow fast key-value lookups, enabling quick retrieval of data using unique identifiers.

  2. Ordered Lookup by Attribute: Similar to Binary Search Trees and Priority Queues, Treep will maintain the data in a sorted order based on a selected attribute. This will enable efficient access to elements in a specific order (e.g., ascending or descending) with respect to that attribute.


The interface of the Treep data structure may look like below:

public interface Treep<K, V> {

    void add(V value);

    Set<String> orderBy();

    V get(K key);

    V top(String attributeName);

    V takeTop(String attributeName);

    V delete(K key);

    Iterator<K> keys();

    int size();
}

get - This method efficiently handles the simple key-based access pattern. By providing a key as input, you can swiftly retrieve the associated value from the Treep. This operation is ideal for quick lookups of specific elements using their unique identifiers.

top/takeTop - These methods cater to the access pattern of retrieving elements ordered by a specific attribute: top peeks at the highest-ranked element, while takeTop also removes it.



Code snippet using Treep

Treep<String, Product> stores = new MapPriorityQueue<>(Product::name, Map.of(
        "discount", Comparator.comparing(Product::discount).reversed(),
        "price", Comparator.comparing(Product::price).reversed()
));

Arrays.asList(
        Product.of("AXION Yellow", 2.12f, .10f),
        Product.of("Meji Fresh Milk 2L", 6.9f, 0.0f),
        Product.of("red Chilli 100 G", 1.14f, .05f),
        Product.of("Fresh Cucumber", 1.37f, .01f),
        Product.of("China Garlic", 1.93f, 0.0f),
        Product.of("Red Onion", 1.19f, 0.07f),
        Product.of("Fuji Apple", 3.14f, .11f),
        Product.of("Banana", 3.58f, .12f)
).forEach(stores::add);

assertEquals(Product.of("Banana", 3.58f, .12f), stores.top("discount"));
assertEquals(Product.of("Meji Fresh Milk 2L", 6.9f, 0.0f), stores.top("price"));


MapPriorityQueue uses a Map together with priority queues to implement the multi-index feature.
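Here is a minimal sketch of that idea, assuming the Treep interface above. It is not the actual github implementation, just one way to combine a HashMap (for key lookups) with one PriorityQueue per ordered attribute:

import java.util.*;
import java.util.function.Function;

// Sketch: a HashMap gives O(1) lookup by key, and one PriorityQueue per
// ordered attribute gives cheap access to the top element for that attribute.
public class MapPriorityQueue<K, V> implements Treep<K, V> {

    private final Function<V, K> keyExtractor;
    private final Map<K, V> byKey = new HashMap<>();
    private final Map<String, PriorityQueue<V>> byAttribute = new HashMap<>();

    public MapPriorityQueue(Function<V, K> keyExtractor, Map<String, Comparator<V>> comparators) {
        this.keyExtractor = keyExtractor;
        comparators.forEach((name, cmp) -> byAttribute.put(name, new PriorityQueue<>(cmp)));
    }

    @Override
    public void add(V value) {
        byKey.put(keyExtractor.apply(value), value);
        byAttribute.values().forEach(q -> q.add(value)); // index under every attribute
    }

    @Override
    public Set<String> orderBy() {
        return byAttribute.keySet(); // the attributes we can rank by
    }

    @Override
    public V get(K key) {
        return byKey.get(key);
    }

    @Override
    public V top(String attributeName) {
        return byAttribute.get(attributeName).peek(); // highest-ranked, not removed
    }

    @Override
    public V takeTop(String attributeName) {
        V top = byAttribute.get(attributeName).poll();
        if (top != null) {
            byKey.remove(keyExtractor.apply(top));
            byAttribute.values().forEach(q -> q.remove(top)); // keep other queues consistent
        }
        return top;
    }

    @Override
    public V delete(K key) {
        V removed = byKey.remove(key);
        if (removed != null) {
            byAttribute.values().forEach(q -> q.remove(removed));
        }
        return removed;
    }

    @Override
    public Iterator<K> keys() {
        return byKey.keySet().iterator();
    }

    @Override
    public int size() {
        return byKey.size();
    }
}

Note the cost this sketch makes visible: takeTop and delete must touch every queue to keep them consistent, and PriorityQueue.remove is O(n), which is part of the price of maintaining several orderings over the same data.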

Full working code is available @ github. There are other ways to implement such a data structure, for example Map + SortedSet or Map + SkipList.


Thursday, 24 December 2020

Ordered data structure using B+Tree and Skip List

This post is based on the RUM Conjecture paper for data systems. The paper talks about the read, update, and memory overheads of data structures, and gives some nice examples of how balancing two of these factors leaves the third one in bad shape.

 
Source: http://daslab.seas.harvard.edu/rum-conjecture/



Some of the popular read-optimized ordered key-value data structures are BST, Skip List, Prefix Tree, B+Tree, etc.

To make reads fast, they trade off some space.

In this post I will share a space (i.e. memory) efficient data structure that takes ideas from B+Tree and Skip List to create a new ordered data structure.

Quick recap before we build our own data structure. 

B+ Tree

B+Tree is a variant of BST in which every node can contain n children. The tree has root, internal, and leaf (i.e. external) nodes.

Leaf nodes contain all the entries; the root and internal nodes contain just keys and pointers to child nodes.

Every node has some capacity, and when it gets full a split process creates another node that takes half of the values from the original node. This splitting keeps allocation under control and also reduces rebalancing overhead compared to BST.


B+Tree. Source: Wikipedia

This is a very popular data structure for databases; all relational databases and many non-relational databases provide indexes backed by B+Trees.

Its popularity has only increased over time.

Nowadays some databases are built using log-structured merge trees, which solve some of the issues with B+Tree.

One of the trade-offs with B+Tree is that keys are stored in multiple places (leaf and internal nodes), and a node split can cause cascading splits.

Every value is wrapped in some Node container, which costs an extra 32 bits per value and adds an extra indirection to reach the value.

That extra indirection does not play well with modern CPUs and their large cache lines.

Skip List

A skip list takes ideas from the ordered linked list; to make reads and writes efficient it adds upper layers (or lanes) of sorted linked lists that hold fewer values and act like fast lanes to reach a value.

 

Skip List. Source: Wikipedia

This type of structure helps with concurrency, because installing a new value can be done with a lock-free operation like CAS.

One of the big trade-offs in skip lists is that many keys are duplicated, and the level of each key is decided randomly, so the skip levels can come out very different on every run even though the input is the same.

Skip lists also carry memory overhead from the extra container objects required to hold the values, just like B+Tree.

Let's look at a new data structure that takes some ideas from these two, reduces the memory overhead, and still maintains performance.

B+Skip List

I don't have a good name for this DS, so let's call it B+SkipList.

I will take a few ideas:

- Ordered memory block

B+Tree allocates nodes that can hold n values in an array. These values are stored in order using simple array manipulation. This gives the nice property that related values sit close together and have a good chance of landing in the same cache line.

- Split node when full

B+Tree splits a full node into two to keep the tree balanced.

- Skip Lanes

Although a skip list is just an ordered linked list, the skip-lane idea makes the linked list behave like an array: lower-level nodes can be reached quickly via the skip lanes.


Let's create something!

An ordered memory block is straightforward to implement. It will look something like below.

Ordered Array



Code snippet for the ordered collection. I will use int values for simplicity, but the same approach applies to any custom object. The min/max fields and the offSet/shiftFrom helpers are part of the fuller class; a sketch of the helpers follows the snippet.

public OrderedIntCollection(int maxElement) {
    this.maxElement = maxElement;
    this.members = new int[1 + maxElement];
    this.members[0] = Integer.MAX_VALUE; // sentinel marking the end of the values
    this.length = 0;
}

public void insert(int value) {
    this.min = Math.min(min, value);
    this.max = Math.max(max, value);

    int offSet = offSet(value, members, length); // position where value belongs

    if (members[offSet] == value) {
        return; // already present, skip the copy overhead
    }

    shiftFrom(offSet); // make room at the insertion point
    members[offSet] = value;
    length++;
}
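The offSet and shiftFrom helpers are not shown in the post. A hypothetical sketch of what they might look like (the real versions are in the github code; offSet could equally use binary search):

// Hypothetical helpers for the snippet above.
private int offSet(int value, int[] members, int length) {
    // First position whose element is >= value; the MAX_VALUE sentinel
    // at members[length] guarantees the scan stops.
    int i = 0;
    while (i < length && members[i] < value) {
        i++;
    }
    return i;
}

private void shiftFrom(int offSet) {
    // Shift members[offSet..length] one slot right, carrying the sentinel
    // along. Assumes the caller checked hasCapacity first.
    System.arraycopy(members, offSet, members, offSet + 1, length - offSet + 1);
}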


Our first building block is ready; let's move on to the split logic.

- Capacity check.

This is required to decide whether a split is needed.

- Split logic.

This does the actual split and returns a new node.

- Additional metadata.

Since nodes can now be split, we need to track some metadata, like the min and max values.

The split process might look like below.



Splitting is one way to handle a full node and it has some advantages, but other options are also available, like extending the current node, which can be a useful strategy in some cases to reduce memory fragmentation. We will talk about this later.


Code snippet for the split. It is simple array manipulation that makes sure each node stays independent and a value lives in exactly one node. The three-argument constructor it uses is sketched after the snippet.


public boolean hasCapacity(int count) {
    return length + count < members.length;
}

public OrderedIntCollection split() {
    int half = members.length / 2;

    // Split: allocate a new node holding the upper half of the values
    int[] copy = new int[1 + maxElement];
    int newLength = this.length - half;
    System.arraycopy(members, half, copy, 0, newLength);
    copy[newLength] = Integer.MAX_VALUE; // sentinel for the new node
    OrderedIntCollection collection = new OrderedIntCollection(maxElement, copy, newLength);

    // Update the current node so it keeps only the lower half
    this.members[half] = Integer.MAX_VALUE;
    this.length = half;
    this.max = members[half - 1];
    Arrays.fill(members, half + 1, members.length, 0);

    return collection;
}
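The three-argument constructor used by split is also not shown; presumably it wraps the already-sorted upper half, something like this hypothetical sketch:

// Hypothetical constructor taking ownership of an already-sorted array
// that has the MAX_VALUE sentinel at members[length].
private OrderedIntCollection(int maxElement, int[] members, int length) {
    this.maxElement = maxElement;
    this.members = members;
    this.length = length;
    this.min = length > 0 ? members[0] : Integer.MAX_VALUE;
    this.max = length > 0 ? members[length - 1] : Integer.MIN_VALUE;
}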


Let's move to the final piece: the skip lanes.

With the split process we can now end up with multiple nodes, so we have to keep track of all of them and maintain their order.

A few things are required for the lane data structure:

- A dynamic list that allows adding elements at any position.

- Ordering info for comparing nodes, to identify which node comes first.

- Some way of finding which node will contain a value.

- An efficient algorithm to search for values across all the available nodes.


Let's enhance OrderedIntCollection to add the ordering info:

public int max() {
    return max;
}

public int min() {
    return min;
}

@Override
public int compareTo(Integer value) {
    if (value < min) return 1;        // value lies before this node's range
    else if (value > max) return -1;  // value lies after this node's range
    return 0;                         // value falls inside [min, max]
}

The min/max and compareTo functions provide all the ordering info.

The compareTo function will later be used to find which node can contain a value.

With the help of compareTo we can now store all the nodes in an array-based data structure.

At a high level it might look something like below.

B+ Skip List




This layout has some nice properties:

- Keys are not duplicated.

This is a big gain and helps reduce memory requirements. The only additional memory allocation is for the list that maintains references to the nodes.

- Child nodes are just at the 2nd level.

Child nodes can always be reached in just 2 hops, compared to n (the tree depth) in B+Tree or SkipList.

- Fanout of a node is good.

It gets the same fanout as B+Tree without the overhead of additional internal nodes, and fanout plays an important role in reads/writes.

- Data locality within a single node.

Some of the discussion after the RUM paper was published suggested adding Cache as another dimension along which to evaluate data structures, making it CRUM.

Packing everything together helps with the "C" part, as it provides better cache locality; this is one of the big gains because it can benefit from the processor's cache lines.

Since node references are stored in a list-based structure and kept in order, binary search can be used to identify the node that is the target of a read or write.

Reads and writes can be done in O(log N) time. Once a node is identified, a sequential scan locates the value within it, which adds some cost, but binary search can also be used inside the node to reduce that overhead.

Trade off

Some of the trade-offs:

Binary search can't always be used for writing.

A read request can always be served by binary search, but for writes we need a hybrid approach: in the sample above, if a value is inserted that does not fall in any node's range, a sequential search is required to find the right node. Sample values that would trigger a sequential search are 1 (below the min), 300 (above the max), and 105 (falling between nodes 2 and 3).

Fragmentation

Due to the splitting rules we might see many nodes that are only half filled. That issue can be addressed by relaxing the splitting rule, e.g. choosing between split and extend based on the data distribution, or by compacting nodes through merging their contents, as sketched below.
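A hypothetical compaction step (not in the original code) could merge a neighbouring node into the current one whenever the combined contents fit:

// Hypothetical compaction: pull the next node's values into this one so
// two half-filled neighbours collapse into a single node.
public boolean tryMerge(OrderedIntCollection next) {
    if (!hasCapacity(next.length)) {
        return false; // combined contents would not fit
    }
    System.arraycopy(next.members, 0, members, length, next.length);
    length += next.length;
    members[length] = Integer.MAX_VALUE; // restore the sentinel
    max = next.max; // next holds the larger values, so its max wins
    return true;
}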

Code snippet of the skip lane. The splitIfRequired and sequentialSearch helpers are part of the full code; a sketch of binarySearch follows the snippet.

public class OrderedInts {
    private final List<OrderedIntCollection> values = new ArrayList<>();
    private final int blockSize;
    private final AtomicLong bsHit = new AtomicLong();  // lookups resolved via binary search
    private final AtomicLong seqHit = new AtomicLong(); // lookups that fell back to sequential scan

    public OrderedInts(int blockSize) {
        this.blockSize = blockSize;
    }

    public void insert(int value) {
        OrderedIntCollection e = search(value);
        e.insert(value);
    }

    private OrderedIntCollection search(int value) {
        if (values.isEmpty()) {
            OrderedIntCollection last = new OrderedIntCollection(blockSize);
            values.add(last);
            return last;
        } else {
            int position = binarySearch(values, value);
            if (position >= 0) {
                bsHit.incrementAndGet();
                return splitIfRequired(value, position, values.get(position));
            } else {
                seqHit.incrementAndGet();
                return sequentialSearch(value);
            }
        }
    }
}
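The binarySearch helper is part of the full code; here is a sketch of how it could use the compareTo defined earlier, returning -1 when no node's [min, max] range covers the value (which triggers the sequential fallback):

// Hypothetical sketch: classic binary search over nodes ordered by range.
private static int binarySearch(List<OrderedIntCollection> values, int value) {
    int low = 0, high = values.size() - 1;
    while (low <= high) {
        int mid = (low + high) >>> 1;
        int cmp = values.get(mid).compareTo(value);
        if (cmp < 0) low = mid + 1;        // node's range is below the value
        else if (cmp > 0) high = mid - 1;  // node's range is above the value
        else return mid;                   // value falls inside this node's range
    }
    return -1; // no node covers this value
}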

Conclusion

In computer science we only have trade-offs; nothing comes for free.
The RUM way of looking at data structures provides key insights into those trade-offs and also suggests ideas for possible new data structures.
 

Friday, 11 December 2015

Mr Cool - Bloom Filter


HashCode-based data structures are the basis of many algorithms; they are used for fast lookups, membership queries, group-by operations, and more.

Hash codes can also be used to build some interesting classes of data structures. In this blog I will share one such data structure that has many useful applications in the real world.

Take an example: you have millions or billions of items and you want to keep track of the distinct ones.
This is a very simple problem and can be solved using a HashMap, but that requires huge memory depending on the number of distinct items, because it has to store the actual elements as well.

To make this problem a little more interesting we can put a memory constraint on the solution, so effectively we need a space-efficient way to test the membership of an item.

Nothing comes for free in this world, so with less memory we have to trade off accuracy, and in some cases that is fine, e.g. unique visitors on a website, distinct pages visited, etc.

The first intuition for solving this problem is to allocate N buckets and mark a bucket based on the hash code; there is no need to store the actual value, just mark the slot index.


A quick question that comes to mind is what happens in case of a collision: our result will be wrong, but only by some percentage, and there are a couple of ways to improve it:

 - Allocate more buckets but use a single hash function.
 - Use multiple hash functions, each with its own dedicated buckets.

The multiple-hash-function option works very well; with it, results can be around 96% correct!

So if we use multiple hash functions, the data structure will look something like below.


This data structure is called a Bloom Filter: it maintains X bit vectors and applies X hash functions to the input value to mark bits.

To check whether a value already exists, just check whether all the corresponding bits are set, using the same hash functions.
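A minimal sketch of these two operations, assuming a single shared bit vector (the github implementation linked at the end uses one bit vector per hash function instead) and k synthetic hash functions derived from two base hashes:

import java.util.BitSet;

// Sketch of a Bloom filter: add() sets k bits, mightContain() tests them.
public class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int k; // number of hash functions

    public SimpleBloomFilter(int size, int k) {
        this.bits = new BitSet(size);
        this.size = size;
        this.k = k;
    }

    // i-th hash derived from two base hashes (double hashing).
    private int index(Object value, int i) {
        int h1 = value.hashCode();
        int h2 = Integer.reverse(h1) * 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, size);
    }

    public void add(Object value) {
        for (int i = 0; i < k; i++) bits.set(index(value, i));
    }

    public boolean mightContain(Object value) {
        for (int i = 0; i < k; i++)
            if (!bits.get(index(value, i))) return false; // definitely absent
        return true; // possibly present; false positives are possible
    }
}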

In terms of memory, a bit vector of length X is required, and to get more accurate results multiple hash-based bit vectors can be used. Some quick math gives a fair idea of the memory requirement.
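For illustration, here is the standard Bloom filter estimate (not from the original post): with m bits, k hash functions, and n inserted items, the false-positive rate is roughly (1 - e^(-kn/m))^k. A quick calculation:

// Back-of-the-envelope memory/accuracy math for a Bloom filter.
public class BloomMath {
    public static void main(String[] args) {
        long n = 1_000_000; // items inserted
        long m = 8 * n;     // 8 bits per item, ~1 MB in total
        int k = 5;          // hash functions
        double fpRate = Math.pow(1 - Math.exp(-(double) k * n / m), k);
        // ~2.2% false positives from roughly 1 MB of memory
        System.out.printf("fp rate: %.2f%%, memory: %d KB%n", fpRate * 100, m / 8 / 1024);
    }
}

So a million items can be tracked in about a megabyte, at the cost of roughly a 2% false-positive rate.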


A Bloom filter is very compact in terms of memory, and its size can be controlled depending on requirements.
Memory can be further reduced by using a compressed bit vector.

Trade Off
It is important to know the trade-offs made to achieve this efficiency.

 - It is a probabilistic data structure, meaning results are not 100% correct. The interesting thing, however, is that it can give a FALSE POSITIVE but never a FALSE NEGATIVE, which makes it a good fit for many practical applications.

 - Multiple hash functions are used, so read/write performance depends on those hash functions.

Application
- DB query filter
One of the most common use cases: useful for cutting down fruitless queries to the DB.

- Joins
Yes, Bloom filters provide an alternative way to do joins, especially in distributed systems. Build a Bloom filter on one of the datasets (the smaller one) and send it to the other nodes for joining; since it is very compact in memory, shipping it to a remote system does not add big overhead.
Spark, one of the disruptive big data processing systems, uses it for joins.

- Alternative to B-Tree
A B-Tree can also be used for membership queries, but if the B-Tree grows so large that it can't fit in memory, then all the benefit is lost.
Cassandra, another big data system, uses a Bloom-filter-style structure on top of its data blocks to serve read requests while avoiding IO operations.
Parquet, another open source project in the big data space, uses it for fast filters.

- Partition search
This is just another variation of the above use case: Apache druid.io, again in the big data space for fast analytics, partitions data by time on ingestion, and within each partition columns carry a Bloom-filter-type index for fast filters.

- Safe website filter
This is an interesting one: Google Chrome uses a Bloom filter to decide whether a site is safe.


A simple implementation of a Bloom Filter is available @ github.
In that implementation each hash function has an independent bit vector, but a single common vector can be used for all the hash functions.