Monday 16 November 2020

Rethinking key value store

In this post i will share how key-value stores can be used for building different types of application, but before we start deep dive lets do quick recap of database systems.

Database technology is made of 2 important components ( Storage + Query). Such type of architecture allow to replace part of system or only build storage or query part of it and reuse other from open source.

Many modern database are taking advantage of plugin architecture and focusing on only one part of system, for example storage could be again broken down in various format based on access pattern for reads/write


Each storage format ( hash,b-tree,lsm or log) is optimized for specific read/write access pattern. Many database uses multiples format also.

Once we come to storage format then Row vs Column layout also comes in play, so in nutshell it is hard to build storage system, so it makes sense to use off the self solution rather than building something from ground up.

Query area is also huge as this include different types of API, Query engine, Optimizer , transactions etc


With some high level understanding of databases now we can now focus on specific type of storage system and try to explore what types of application can be built on it.

Key-Value store is very popular storage system options. 

Key value store are like hash table that is designed to store/retrieve key & value tuple. Key-Value store can be memory or disk based. Some key value stores variations like B-Tree or LSM tree are ordered key value store.
Ordered KV stores provide efficient range scan and is very important if client interface is SQL like.

I will used ordered KV store as example to share type of application that can be build. I will use ecommerce Order data model as example for all the usecase

- Row Store
  This is the most straight forward use-case for any KV stores and also very good fit for building unique index for primary key. For order table KV store will contain below entry

Order/OrderId/1 -> {......}
Order/OrderId/2 -> {......}
Order/OrderId/3 -> {......}

Important thing to note in this format is that it contains fully qualified id of record that us made up of {table name}/{PK Column}/{PK Value}.

This type of key provides flexibility to use single KV store for multiple tables, another advantage that we get it since all the values are ordered by Key so query has to load less numbers of pages to get all the data for this table.

Ordered KV store can be extended to build non unique indexes also so that it can be used as efficient lookup table for underlying data.

Lets try to build couple of non unique index on order table.

Index on customer id 

Order/CustomerId/100/1 -> {......}
Order/CustomerId/101/1 -> {......}
Order/CustomerId/102/2 -> {......}

Index on Order Date

Order/OrderDate/20201001/1 -> {......}
Order/OrderDate/20201001/2 -> {......}
Order/OrderDate/20201010/3 -> {......}

Before i show other complex example lets try to understand format of key, it is contains "{table name}/{Index Col}/{Index Value}/{PK}" pattern, this pattern allow to handle duplicate values in index because primary key is suffix of every entry.

Lets keep going on this with little more complex example of index by Status & Order date

Order/StatusByOrderDate/CANCELLED_20201010/3 -> {......}
Order/StatusByOrderDate/CANCELLED_20201010/6 -> {......}
Order/StatusByOrderDate/PENDING_20201001/2 -> {......}
Order/StatusByOrderDate/PENDING_20201001/7 -> {......}
Order/StatusByOrderDate/PENDING_20201003/8 -> {......}
Order/StatusByOrderDate/SHIPPED_20201001/1 -> {......}

In above example Index key is made of 2 column.

- Range Scans

By this point it will be becoming clear how efficient range scan can be done with above index data.

Lets use some example query to verify this .

 - find all the orders that are cancelled between 20201001 to 20201015.
Above query can be resolved by looking for all the records between Order/StatusByOrderDate/CANCELLED_20201001 to Order/StatusByOrderDate/CANCELLED_20201031

find all the orders that are cancelled in Oct 2020.
All records matching Order/StatusByOrderDate/CANCELLED_202010 pattern

-find all the pending orders
All records matching Order/StatusByOrderDate/PENDING pattern

One thing to note that all the related keys are together and can be retrieved using less number of IO operations. 

- Partition Index
Distributed databases creates partitions to split big data into chunk of manageable data so that it can be distributed and replicated.
Partition can be done using hash key or by key range. Key range based partition are must of interface is SQL or allows range scanning. 

Lets take above data for key column as example set to distributed. 

One way to create partition is to create partition of size 5 and store start & end key of that partition with data node reference, so that query looking for data in that range can be quickly resolved.

Partition index might look something like below.

- Column Store
One more place where ordered KV stores shines is creating column store.

We need slight change in format to achieve column store, it will use {table name}/{column}/{pk} -> {column value} format

This type of key format will put all the column for given table together.
Any query that needs selective column can take advantage of this layout as all the column are next to each other.
Last one is building SQL interface on ordered map! SQL query need 2 basic operation to answer all the request
 - Point lookup
 - Range scan

With above examples of KV stores it is easy to build simple SQL using above mention access pattern.

Sample Application
As part of the research for this post, i build implementation that is based on all the ideas shared above. High level architecture of simple implementation looks something like below

In this application i have used 3 implementation of Ordered map to show underlying storage can be changed without given up on any functionality. 

- SkipList 
This is java "ConcurrentSkipListMap", which is in memory ordered map and is used in many real open source database to build memory store on top of LSM tree. Cassandra uses SkipList data structure for Memory table.

- B Tree
H2 database is based on B-Tree storage called "MVStore", it is possible to use MvStore as library. I used MVStore as B-Tree storage

- LSM Tree
Many implementation are available and popular ones are LevelDB, RocksDB etc. I used RocksDB as it is already used in many databases as storage engine. MyRocks, CassandraRocks, cockroachdb  etc

Implementation is based on 2 interface ( KeyValueStore & SSTable) to expose both key value and SQL interface on different storage backend.

public interface KeyValueStore {

<Row_Type> SSTable<Row_Type> createTable(String tableName, Class<Row_Type> type,
Map<String, Function<Row_Type, Object>> schema,
Map<String, Function<Row_Type, String>> indexes);

<Row_Type> SSTable<Row_Type> createTable(String tableName, Class<Row_Type> type,
Map<String, Function<Row_Type, Object>> schema);

<Row_Type> SSTable<Row_Type> createTable(TableInfo<Row_Type> tableInfo);

List<String> desc(String table);

void close();

<Row_Type> SSTable<Row_Type> table(String tableName);

default void execute(String sql, Consumer<RowValue> consumer) {
new SqlAPI(this).execute(sql, consumer);
public interface SSTable<T_TYPE> {
List<String> cols();

//Search functions
void scan(Consumer<T_TYPE> consumer, int limit);

void search(String indexName, String searchValue, Consumer<T_TYPE> consumer, int limit);

void search(String indexName, String searchValue, Collection<T_TYPE> container, int limit);

void rangeSearch(String index, String startKey, String endKey, Collection<T_TYPE> container, int limit);

T_TYPE get(String pk);

default Map<String, Function<T_TYPE, Object>> schema() {
return null;

default Map<String, Function<T_TYPE, String>> indexes() {
return null;

default Object columnValue(String col, Object row) {
return null;

//Mutation functions
void insert(T_TYPE row);

void update(T_TYPE record); // Secondary index needs rebuilding

All the code is available on github @ query repo.

Each implementation is unit tested using contract based test, so feel free to experiment with it.

While implementation first i build KV related functionality and at the end added simple SQL interface. Adding simple SQL was done with help of Calcite SQL parser. Once basic query primitive was available ( point and range scan) then simple SQL interface was easy to add and that is the reason why i have "sql" as default function on interface. 

Ordered KV store are powerful data structure and can be used for solving many interesting problems. Many commercial and open source database are using some of techniques mention in post. 

In this post i have intentionally not covered about how to handle different data types and how to optimized Index key. These are things that can be solved by better encoding that still maintains sort property of key.

I plan to cover about encoding rules in future post.

Saturday 7 November 2020

Private method or state testing in JVM

Unit testing private method is not recommended but some time we are in situation when unit test requires to inspect private state of object. As guideline we must avoid such type of design but some time especially when using some framework or library we are left with no options.

One of the such thing i found recently while writing some unit test around spark data frame. As part of one of the feature dataframe/dataset caching was required and no easy way to verify whether caching was done or not. I didn't want to add other layer of abstraction on Spark API to do this.

Code snippet that needs to be tested.
.cache // Cache this DF/DS

Java has excellent support for Metaprogramming from day 1 and reflection plays big role it that. Reflection can be used in such scenario.

sparkSession.sharedState.cacheManager maintain cached tables details. 

CacheManager is internal spark class and may change with new spark version , so test based on internal details has risk of breaking but it also gives good idea about internals of framework.

Lets try to access private state of cachemanager via java reflection.

Below code snippet will search the fields with specific pattern and make it accessible. 

def fieldValue[T](fieldName: String, obj: Any, cls: Class[T]): T = {
val field = {
val matchedField = classOf[CacheManager].getDeclaredFields()
.map(f => {


setAccessible(true) is the key thing, it is required for private or protected members.
One more to highlight that "endsWith" is used for matching rather than exact match because class under test is written in Scala and for Scala part of the field names are generated by compiler, so not possible to find exact match. Java reflection has getDeclaredField function that accept field name also. 

Once member is marked as accessible then read/write is possible for that field.

Sample call to this API will look something like below

fieldValue("cachedData", sqlSession.sharedState().cacheManager(), LinkedList.class)
Unit test code checking cache can be something like below 

LinkedList cachedData = fieldValue("cachedData", sparkSession().sharedState().cacheManager(), LinkedList.class);
int before = cachedData.size();
int after = cachedData.size();
Assert.isTrue(after == before + 1);

Reflection is very powerful and comes handy in such scenario but comes with tradeoff of relaxed type safety guarantee by treating method or variable as String but it is still better than not testing at all. 
One more thing that i discovered during this exercise is that JVM spends some time in checking field access and it causes some overhead when reflection is used.
Turning off access check will make code fast also.

With this trick now you can start testing some of the private function that you wished were public or left public only for testing.