
Friday, 10 July 2020

Data encoding and storage

Data encoding and storage formats are an evolving field; they have seen many changes, from naive text-based encodings to advanced, compact nested binary formats.

(Figure: encoder and decoder)

Selecting the correct encoding/storage format has a big impact on application performance and on how easily the application can evolve; in particular, the encoding largely determines whether an application is backward/forward compatible.
Selecting the right encoding format can be one of the most important factors in the agility of a data-driven application.

Application developers tend to default to text-based encodings (XML, CSV, or JSON) because they are human readable and language agnostic.
Text formats are not very efficient: they cost time and space, and they also struggle to evolve. If you care about efficiency, then a binary format is the way to go.

In this post I will compare text vs binary encoding and build a simple persistent storage that supports flexible encoding.

We will compare popular text/binary encodings: CSV, JSON, Avro, Chronicle, and SBE.



I will use the Trade object sketched below as the example for this comparison.
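Based on the fields used in the encoder snippets later in this post, the Trade object looks roughly like this (the field types are my assumptions):

public class Trade {
    private long tradeId;
    private long customerId;
    private long qty;
    private String symbol;       // e.g. "GOOG"
    private TradeType tradeType; // Buy/Sell enum
    private String exchange;     // e.g. "NYSE"
    // getters/setters omitted for brevity
}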

CSV

CSV is one of the most popular textual formats. It has no support for types and makes no distinction between different kinds of numbers. One major restriction is that it only supports scalar types; storing nested or complex objects requires custom encoding. Column and row values are separated by a delimiter, and special handling is required when the delimiter is part of a column value.

The reading application has to parse the text and convert it into the proper types at read time; this produces garbage and is also CPU intensive.

The best thing about CSV is that it can be edited in any text editor, and every programming language can read and write it.
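As a quick illustration of the delimiter problem, here is a minimal sketch of encoding one row as CSV, quoting values per the common RFC 4180 convention (the helper name is mine):

// Minimal sketch: join values with a comma, quoting any value that
// contains the delimiter or a quote (RFC 4180 style).
static String toCsvRow(String... values) {
    StringBuilder row = new StringBuilder();
    for (int i = 0; i < values.length; i++) {
        if (i > 0) row.append(',');
        String v = values[i];
        if (v.contains(",") || v.contains("\"")) {
            v = '"' + v.replace("\"", "\"\"") + '"'; // double any embedded quotes
        }
        row.append(v);
    }
    return row.toString();
}

// toCsvRow("1", "999", "100", "GOOG", "Buy") -> 1,999,100,GOOG,Buy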


JSON

JSON is what drives the web today. The majority of user-facing microservices use JSON for their REST APIs.
It addresses some of the issues with CSV by making a distinction between strings and numbers, and it also supports nested types like maps, arrays, and lists. It is possible to have a schema for JSON messages, but it is not common practice because it takes away the flexible schema. JSON is the new XML these days.
One major drawback is size: a JSON message is bigger because it has to carry key/attribute names as part of the message. I have heard that in some document-based databases attribute names take up more than 50% of the space, so be careful when you select attribute names in a JSON document.
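For illustration, the Trade record rendered as JSON might look like this; note that every single message carries the attribute names:

{"tradeId":1,"customerId":999,"qty":100,"symbol":"GOOG","tradeType":"Buy","exchange":"NYSE"}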

Both of these text formats are very popular in spite of all the inefficiency. If you need a frictionless data format to interface across teams, go for a text-based one.

Chronicle/Avro/SBE

These are very popular binary formats and are very efficient for distributed or trading systems.

SBE is very popular in the financial domain and is used as a replacement for the FIX protocol. I wrote about it in the post inside-simple-binary-encoding-sbe.

Avro is also very popular; it was built by taking many lessons from Protocol Buffers and Thrift. For row-based and nested storage it is a very good choice, and it supports multiple languages. Avro applies some cool encoding tricks to reduce message size; you can read about them in the post integer-encoding-magic.

Chronicle-Wire is picking up, and I came across it very recently. It has a nice abstraction over text and binary messages with a single unified interface, which lets you choose a different encoding based on the use case.
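A minimal sketch of that unified interface, assuming Chronicle-Wire's WireType factory; swapping WireType.RAW for WireType.TEXT or WireType.JSON changes the encoding without touching the read/write code:

import net.openhft.chronicle.bytes.Bytes;
import net.openhft.chronicle.wire.Wire;
import net.openhft.chronicle.wire.WireType;

Bytes<?> bytes = Bytes.elasticByteBuffer();
Wire wire = WireType.RAW.apply(bytes); // or WireType.TEXT / WireType.JSON

// Write and read with the same code regardless of encoding.
wire.write(() -> "symbol").text("GOOG");
wire.write(() -> "qty").int64(100);

String symbol = wire.read(() -> "symbol").text();
long qty = wire.read(() -> "qty").int64();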


Let's look at some numbers now. This is a very basic comparison, on the size aspect of the message only. Run your own benchmark before making any selection.





We will save the same 2 records in each format and compare the sizes.


Chronicle is the most efficient in this example. I used the RawWire format, which is the most compact option available in the library because it stores only data; no schema metadata is stored.

Next are Avro and SBE, very close in terms of size, but SBE is more efficient in terms of encoding/decoding operations.

CSV is not that bad; it took 57 bytes for a single row, but don't select CSV based on size. As expected, JSON takes the most bytes to represent the same message, around 2X more than Chronicle.

Let's look at some real applications of these encodings. They can be used for building logs, queues, block storage, RPC messages, etc.

To explore this further I created a simple storage library that is backed by a file and allows a choice of encoding format.

import java.io.Closeable;

public interface RecordContainer<T> extends Closeable {

    // Appends a message at the end of the container.
    boolean append(T message);

    // Reads all records from the beginning.
    void read(RecordConsumer<T> reader);

    // Reads records starting from the given message offset.
    void read(long offSet, RecordConsumer<T> reader);

    default void close() {
    }

    int size();

    String formatName();
}

This implementation allows appending records at the end of the buffer and reading the buffer from the start or randomly from a given message offset. It can be seen as an append-only unbounded message queue; it has some similarity with Kafka topic storage.
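A hypothetical usage sketch of the interface above; JsonRecordContainer is an illustrative implementation name (not one from the library), and RecordConsumer is assumed to be a functional interface over the record type.

// Hypothetical usage of RecordContainer with an illustrative
// JSON-backed implementation.
try (RecordContainer<Trade> container = new JsonRecordContainer<>("trades.dat")) {
    container.append(trade1);
    container.append(trade2);

    container.read(record -> System.out.println(record));    // scan from the start
    container.read(1, record -> System.out.println(record)); // start at message offset 1
}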

RandomAccessFile from Java allows mapping file content as an array-like buffer; after that, the file content can be managed like any array.
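A minimal sketch of that mapping using the standard FileChannel.map API; the mapping stays valid even after the file is closed.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Map the first 1 MB of the file; the buffer then behaves like an
// array backed by the OS page cache.
static MappedByteBuffer map(String path) throws IOException {
    try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
        return file.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, 1 << 20);
    }
}

// MappedByteBuffer buffer = map("records.dat");
// buffer.putInt(42);            // write like an array
// int first = buffer.getInt(0); // random read by offset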

All the code used in this post is available @ encoding github


 

Monday, 25 June 2018

Inside Simple Binary Encoding (SBE)


SBE is a very fast serialization library used in the financial industry. In this blog I will go through some of the design choices that make it blazing fast.

The whole purpose of serialization is to encode & decode messages, and there are many options available: XML, JSON, Protocol Buffers, Thrift, Avro, etc.

XML/JSON are text-based encodings. They are good in most cases, but when latency is important, text-based encoding/decoding becomes the bottleneck.

Protocol Buffers/Thrift/Avro are binary options and are used very widely.

SBE is also binary and was built based on mechanical sympathy, to take advantage of the underlying hardware (CPU cache, prefetcher, access patterns, instruction pipelining, etc.).

A small history of the CPU & memory revolution:
Our industry has seen processors grow from 8 bit to 16, 32, and 64 bit, and a normal desktop CPU can now execute close to billions of instructions per second, provided the programmer is capable of writing a program that generates that kind of load. Memory has also become cheap, and it is very easy to get a 512 GB server.
The way we program has to change to take advantage of all this; data structures & algorithms have to change.


Let's dive inside SBE.

Full stack approach

Most systems rely on run-time optimization, but SBE takes a full stack approach, and the first level of optimization is done by the compiler.


Schema - An XML file to define the layout & data types of a message.
Compiler - Takes the schema as input and generates IR and optimized code. A lot of magic happens in this layer, like using final/constants.
Message - The actual message is a wrapper over a buffer.

The full stack approach allows optimization to be done at every level, starting with the schema.
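As an illustration, a minimal sketch of what the Trade schema might look like in SBE's XML format; the ids, types, and names here are my assumptions (the enum and second string field are omitted for brevity).

<sbe:messageSchema xmlns:sbe="http://fixprotocol.io/2016/sbe"
                   package="trade" id="1" version="0" byteOrder="littleEndian">
    <types>
        <composite name="messageHeader">
            <type name="blockLength" primitiveType="uint16"/>
            <type name="templateId"  primitiveType="uint16"/>
            <type name="schemaId"    primitiveType="uint16"/>
            <type name="version"     primitiveType="uint16"/>
        </composite>
        <composite name="varStringEncoding">
            <type name="length"  primitiveType="uint32"/>
            <type name="varData" primitiveType="uint8" length="0" characterEncoding="UTF-8"/>
        </composite>
    </types>
    <sbe:message name="Trade" id="1">
        <field name="tradeId"    id="1" type="int64"/>
        <field name="customerId" id="2" type="int64"/>
        <field name="qty"        id="3" type="int64"/>
        <!-- variable-length fields must come last -->
        <data  name="symbol"     id="4" type="varStringEncoding"/>
    </sbe:message>
</sbe:messageSchema>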

No garbage or less garbage
This is very important for a low latency system; if it is not taken care of, the application can't use CPU caches properly and can run into GC pauses.

SBE is built around the flyweight pattern; it is all about reusing objects to reduce memory pressure on the JVM.

It has a notion of a buffer that can be reused: the encoder/decoder takes a buffer as input and works on it, doing no allocation, or very little (i.e. in the case of String).

SBE recommends using direct/off-heap buffers to take GC completely out of the picture; these buffers can be allocated per thread and used for decoding and encoding messages.

Code snippet for buffer usage:

final ByteBuffer byteBuffer = ByteBuffer.allocateDirect(4096);
final UnsafeBuffer directBuffer = new UnsafeBuffer(byteBuffer);

// The encoder is first wrapped over the reusable buffer (offset 0
// assumed here), then the fields are written in place.
tradeEncoder.wrap(directBuffer, 0)
        .tradeId(1)
        .customerId(999)
        .qty(100)
        .symbol("GOOG")
        .tradeType(TradeType.Buy);

Cache prefetching

CPUs have a built-in hardware prefetcher. Cache prefetching is a technique processors use to boost execution performance by fetching instructions or data from their original storage in slower memory into faster local memory before they are actually needed.
Accessing data from the fast CPU cache is orders of magnitude faster than accessing it from main memory.

The latency-number-that-you-should-know blog post has details on how fast the CPU cache can be.

Prefetching works very well if the algorithm is streaming and the underlying data is contiguous, like an array. Array access is very fast because it is sequential and predictable.

SBE uses an array as the underlying storage, and fields are packed into it.




Data moves in units of a cache line, which is usually 64 bytes, so if the application asks for 1 byte it will get a full cache line of data. Since the data is packed in an array, accessing a single byte prefetches nearby array content in advance, and that speeds up processing.

Think of the prefetcher as an index on a database table: the application benefits when reads follow that index.

Streaming access
SBE supports all the primitive types and also allows defining custom types of variable size. This allows the encoder and decoder to be streaming and sequential, which has the nice benefit of reading data along cache lines while the decoder needs to know very little metadata about the message (i.e. offset and size).

This comes with a trade-off: the read order must follow the layout order, especially when variable-length data is encoded.

For example, suppose the write is done in the order below:

tradeEncoder
        .tradeId(1)
        .customerId(999)
        .tradeType(TradeType.Buy)
        .qty(100)
        .symbol("GOOG")
        .exchange("NYSE");


For the String attributes (symbol & exchange), the read order must be symbol first and then exchange; if the application swaps the order, it will read the wrong field. Also, a variable-length attribute should be read only once, because this is a streaming access pattern.
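A sketch of the matching decode, assuming the generated decoder API (the wrap parameters use the constants SBE generates):

// Fixed-size fields can be read in any order, but variable-length
// fields must be read in schema order, and only once.
tradeDecoder.wrap(directBuffer, 0, TradeDecoder.BLOCK_LENGTH, TradeDecoder.SCHEMA_VERSION);
long tradeId = tradeDecoder.tradeId();
long customerId = tradeDecoder.customerId();
String symbol = tradeDecoder.symbol();     // must be read before exchange
String exchange = tradeDecoder.exchange(); // reading this first would return the wrong bytes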

Good things come at a cost!

Unsafe API
Array bounds checks can add overhead, but SBE uses the Unsafe API, which does not have that extra bounds-check overhead.
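For illustration, a small sketch with Agrona's UnsafeBuffer, the buffer class the generated code writes through; to my understanding Agrona's own optional bounds checks can be removed with the agrona.disable.bounds.checks system property.

import java.nio.ByteOrder;
import org.agrona.concurrent.UnsafeBuffer;

// Writes go through sun.misc.Unsafe; with
// -Dagrona.disable.bounds.checks=true the per-access bounds
// checks are removed entirely.
final UnsafeBuffer buffer = new UnsafeBuffer(new byte[64]);
buffer.putLong(16, 100L, ByteOrder.LITTLE_ENDIAN); // raw write at offset 16
final long qty = buffer.getLong(16, ByteOrder.LITTLE_ENDIAN);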

Use constants in generated code
When the compiler generates code, it precomputes things and uses constants. One example is the field offset: it is baked into the generated code, not computed at runtime.

Code snippet:

public static int qtyId()
{
    return 2;
}

public static int qtySinceVersion()
{
    return 0;
}

public static int qtyEncodingOffset()
{
    return 16;
}

public static int qtyEncodingLength()
{
    return 8;
}

This has a trade-off: it is good for performance but not for flexibility. You can't change the order of fields, and new fields must be added at the end.
Another good thing about these constants is that they live only in the generated code, not in the message, so the encoding stays very efficient.

Branch free code
Each core has multiple ports to do things in parallel, and a few instructions choke them, like branches, mod, and divide. The SBE compiler generates code that is free of these expensive instructions; it uses only basic pointer-bumping math.
Code that is free of expensive instructions is very fast and can take advantage of all the ports of the core.
Sample code from Java serialization:

public void writeFloat(float v) throws IOException {
    if (pos + 4 <= MAX_BLOCK_SIZE) {
        Bits.putFloat(buf, pos, v);
        pos += 4;
    } else {
        dout.writeFloat(v);
    }
}

public void writeLong(long v) throws IOException {
    if (pos + 8 <= MAX_BLOCK_SIZE) {
        Bits.putLong(buf, pos, v);
        pos += 8;
    } else {
        dout.writeLong(v);
    }
}

public void writeDouble(double v) throws IOException {
    if (pos + 8 <= MAX_BLOCK_SIZE) {
        Bits.putDouble(buf, pos, v);
        pos += 8;
    } else {
        dout.writeDouble(v);
    }
}

Sample code for SBE:

public TradeEncoder customerId(final long value)
{
    buffer.putLong(offset + 8, value, java.nio.ByteOrder.LITTLE_ENDIAN);
    return this;
}

public TradeEncoder tradeId(final long value)
{
    buffer.putLong(offset + 0, value, java.nio.ByteOrder.LITTLE_ENDIAN);
    return this;
}


Some numbers on message size.

Type class marshal.SerializableMarshal -> size 267
Type class marshal.ExternalizableMarshal -> size 75
Type class marshal.SBEMarshall -> size 49

SBE is the most compact and very fast; the authors of SBE claim it is around 20 to 50X faster than Google Protocol Buffers.

SBE code is available @ simple-binary-encoding
Sample code used in this blog is available @ sbeplayground