What is an efficient data structure to model an order book?
The specifics depend on whether you're implementing for equities (order-based) or futures (level-based). I recommend https://web.archive.org/web/20110219163448/http://howtohft.wordpress.com/2011/02/15/how-to-build-a-fast-limit-order-book/ for a general overview of a good architecture for the former.
Building off of that, though, I have found that array-based data structures outperform pointer-based ones. On modern pipelined, caching CPUs, branches and dereferences (especially from uncached memory) are the biggest performance killers because they introduce data dependencies that result in pipeline and memory stalls.
If you can, you should also optimize for the particular exchange. For instance, it turns out that, last I checked, Nasdaq generates order IDs incrementally starting from a small number, so you can store all the orders in a giant array instead of a hashtable. This is really cache- and TLB-friendly compared to a hashtable because most updates tend to happen to recently-dereferenced orders.
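As a sketch of that trick (the struct layout and names here are mine, not Nasdaq's; a hash table would present the same interface):

```cpp
#include <cstdint>
#include <vector>

struct Order {
    int32_t  price_ticks;
    uint32_t quantity;
    bool     is_bid;
    bool     live;     // false once fully executed or cancelled
};

struct OrderStore {
    // Index == order ID (minus whatever small base the exchange starts at),
    // so a lookup is one predictable load instead of a hash and probe.
    std::vector<Order> orders;

    explicit OrderStore(std::size_t expected_max_id)
        : orders(expected_max_id) {}

    Order&       at(uint64_t id)       { return orders[id]; }
    const Order& at(uint64_t id) const { return orders[id]; }
};
```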
Speaking from experience with equities (order-based), my preferred architecture is:
- A large associative array from order ids to order metadata (a `std::vector` if you can swing it, such as in the case of Nasdaq).
- The order metadata includes pointers to the order book (essentially consisting of the price levels on both sides) and the price level it belongs to, so after looking up the order, the order book and price-level data structures are a single dereference away. Having a pointer to the price level allows an `O(1)` decrement for an Order Reduce operation. If you want to keep track of time priority as well, you can keep pointers to the previous orders in the queue.
- Since most updates happen near the inside of the book, using a `vector` for the price levels of each book will result in the fastest average price lookup. Searching linearly from the end of the `vector` is on average faster than a binary search, too, because most of the time the desired price is only a few levels at most from the inside, and a linear search is easier on the branch predictor, optimizer and cache. Of course, there can be pathological orders far away from the inside of the book, and an attacker could conceivably send a lot of updates at the end of the book in order to slow your implementation down. In practice, though, most of the time this results in a cache-friendly, nearly `O(1)` `delete` (with a worst-case `O(N)` `memcpy`). Specifically for a Best Bid/Offer (BBO) update, you can get an implementation to calculate the update in just a few instructions (essentially appending or deleting an element at the end of the `vector`), or about three dereferences.
- This gives `O(1)` best-case behavior for insertion, lookup, delete and update with very low constants (I've clocked it at, on average, the cost of a single fetch from main memory, roughly 60 ns). Unfortunately you can get `O(N)` worst-case behavior, but with low probability and still with very good constants due to the cache-, TLB- and compiler-friendliness. It is also very fast, nearly optimally so, for BBO updates, which is what you are usually interested in anyway.
I have a reference implementation here for Nasdaq's ITCH protocol which illustrates these techniques and more. It clocks in at a mean of just over 61 ns/tick (about 16 million ticks/second) by keeping track only of the quantity at each price level, not the order queue, and uses almost no allocation besides the `std::vector`s.
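A minimal sketch of these structures, assuming integer tick prices; all names are mine, and the pooled levels are a simplification (a production book would also handle level removal and per-level order queues):

```cpp
#include <cstdint>
#include <deque>
#include <vector>

struct PriceLevel {
    int32_t  price_ticks;
    uint64_t total_quantity;   // aggregate size resting at this price
};

struct BookSide {
    // Sorted with the best (inside) price at the back, so a BBO change is
    // just a push_back/pop_back. The vector holds pointers so OrderMeta can
    // keep a stable PriceLevel*; a std::deque never moves existing elements.
    std::vector<PriceLevel*> levels;
    std::deque<PriceLevel>   pool;
    bool                     is_bid = true;   // false for the ask side

    PriceLevel* find_or_insert(int32_t price_ticks) {
        std::size_t i = levels.size();
        while (i > 0) {                        // linear scan from the inside
            PriceLevel* lvl = levels[i - 1];
            if (lvl->price_ticks == price_ticks) return lvl;
            bool lvl_better = is_bid ? lvl->price_ticks > price_ticks
                                     : lvl->price_ticks < price_ticks;
            if (!lvl_better) break;            // insertion point found
            --i;
        }
        pool.push_back({price_ticks, 0});
        // Worst case an O(N) shift, but usually right next to the back.
        levels.insert(levels.begin() + i, &pool.back());
        return levels[i];
    }
};

struct OrderMeta {
    PriceLevel* level;   // one dereference from order to its price level
    BookSide*   side;
    uint32_t    quantity;
};

// O(1) Order Reduce: no lookup in the book needed at all.
inline void reduce(OrderMeta& o, uint32_t qty) {
    o.quantity              -= qty;
    o.level->total_quantity -= qty;
}
```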
Thank you. Could you explain a bit more about the difference between level-based and order-based order books?
Otherwise, the setup of LOBSTER is supposed to be quick.
Nice. I was thinking of a simple array-based implementation. The best implementation in the Quant Cup does something similar.
However, note that it will be less optimal when price levels are sparse and we care about every price level.
The first link in the answer is dead. This is why we discourage link-only answers.
Here is a gist mirror of the "winning implementation": https://gist.github.com/druska/d6ce3f2bac74db08ee9007cdf98106ef
The simple array implementation does not take into account how far away the prices are from the best bid/ask. This might not be optimal.
I recently stumbled upon this question after needing to build an order book for some real-time microstructure analysis, and I have taken a look at the various possible implementations. Here are some of the pros/cons of each (I'll use C/C++ terminology):
Array: You have an array of structs ordered from top of book (index 0) to worst (index N). You have to pre-allocate the array to be big enough.
Insertion is O(log N): Perform a binary search to find the insertion point, then perform a memcpy to shift the tail of the array. Finally, overwrite the item at the insertion point.
Lookup is O(1): Your indices directly correspond to levels of market depth. Just hop to where you want.
Deletions are O(log N): You seek to the order you want to delete, then perform a memcpy shifting everything back by one index.
Updates are worst case a deletion + insertion. Best case, you check whether the updated order changes the ordering at all (see the sketch after the pros/cons below).
Pros:
- Extremely cache-friendly
- Easy for the compiler to optimize
- Not a lot of re-ordering
- Fastest lookups
- Copy operations are usually vectorized (typically if you use std::copy instead of memcpy)

Cons:
- Limited vectorization (SSE/AVX2) opportunities
- Lots of copy operations
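Here is a sketch of that array approach, assuming aggregated price levels on an integer tick grid (the names and the `is_bid` flag are illustrative):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Level {
    int64_t price_ticks;
    int64_t quantity;
};

struct ArrayBook {
    std::vector<Level> levels;   // index 0 = top of book
    bool               is_bid = true;

    // Binary search for the slot, then shift the tail: O(log N) to find,
    // O(N) worst case to move (the memcpy shifting discussed above).
    void insert(Level lvl) {
        auto pos = std::lower_bound(
            levels.begin(), levels.end(), lvl,
            [&](const Level& a, const Level& b) {
                return is_bid ? a.price_ticks > b.price_ticks
                              : a.price_ticks < b.price_ticks;
            });
        if (pos != levels.end() && pos->price_ticks == lvl.price_ticks)
            pos->quantity += lvl.quantity;   // level exists: no shift at all
        else
            levels.insert(pos, lvl);
    }

    // O(1) lookup: the index directly corresponds to market depth.
    const Level& depth(std::size_t i) const { return levels[i]; }
};
```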
Singly-linked list: Same concept as the array (the head of the list is the top of book, the tail is the bottom of book).
Insertion is O(N): You have to seek through the list, update your new order's "next pointer" to the previous order's "next pointer", update the previous order's "next pointer" to point to your new quote, and update the head/tail/length of the list if applicable.
Lookup is O(N): Top of book is trivial O(1) since it's the head of the list. Otherwise you'll need to seek through the list.
Deletion is O(N): You have to seek through the list to find the target spot. Then swap two pointers.
Updates are a deletion + insertion in the worst case. Best case is O(1), assuming you already know the right node (more on this below).
A few optimizations give it acceptable performance:
- Pre-allocate an array of linked-list node structs (each with a quote struct as a member) corresponding to the max depth of your book, and use those nodes in your linked list. Since the memory is contiguous, the indirection cost of the pointers is lessened because you won't get many cache misses (see the sketch below the cons).
- Maintain 25%, 50%, and 75% market depth pointers in addition to head and tail. This significantly reduces the "N" in O(N) for insertion, deletion, and update operations.
Pros:
- No repeated copying once you seek to the right place
- Updates/inserts/lookups are very fast if most operations are near the head of the list
Cons:
- Lookup of deeper levels incurs a high O(N) cost
- Bad cache performance if you're not careful
- Dynamic memory allocations if you don't pre-allocate an arena
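A sketch of the list with the pre-allocated arena from the first optimization above (the 25/50/75% depth pointers are left out for brevity; they would just give the seek a closer starting node; names are illustrative):

```cpp
#include <cstdint>
#include <vector>

struct Node {
    int64_t price_ticks;
    int64_t quantity;
    Node*   next;
};

struct ListBook {
    std::vector<Node> arena;   // contiguous storage: fewer cache misses
    Node*             free_list = nullptr;
    Node*             head      = nullptr;  // top of book
    bool              is_bid;

    explicit ListBook(std::size_t max_depth, bool bid)
        : arena(max_depth), is_bid(bid) {
        for (Node& n : arena) { n.next = free_list; free_list = &n; }
    }

    // O(N) insertion: seek to the spot, then splice two pointers.
    // Assumes the arena is deep enough that free_list is never exhausted.
    void insert(int64_t price, int64_t qty) {
        Node** link = &head;
        while (*link) {
            int64_t p = (*link)->price_ticks;
            if (p == price) { (*link)->quantity += qty; return; }
            bool better = is_bid ? p > price : p < price;
            if (!better) break;
            link = &(*link)->next;
        }
        Node* n   = free_list;        // pop a node from the arena
        free_list = free_list->next;
        *n    = {price, qty, *link};
        *link = n;
    }
};
```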
Ultimately, I found that the array version has the best average performance if you're OK with just price/qty priority. If you need the precision of price/qty/time priority, the arrays just get far too large.
If you're OK with price/qty priority: We can use the property that prices tend to cluster in a narrow range to limit the number of copy operations:
Create an array 4x the size of the expected possible price range for the day (you'll obviously need to re-allocate your array if something wild happens). Place the price level corresponding to the expected open in the middle of the array. Whenever you get a new order, the desired index is close to the difference, in cents, between the order's price and that opening price. You can then look forwards/backwards to see if the price/qty level exists. If it doesn't, you'll need to shift the array via a copy; if it does, you're all set. This "centered array" approach means that after a few minutes of quotes, you'll have the vast majority of your price/qty levels defined. This method only works if you're OK with just price/qty priority; if you require price/qty/time priority, then you should probably go with the linked list.
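Here is a sketch of the centered-array idea under a simplifying assumption of mine: every tick in the window gets its own slot, which makes the shifting case disappear entirely (all names are illustrative):

```cpp
#include <cstdint>
#include <vector>

struct CenteredBook {
    std::vector<int64_t> qty;           // quantity per tick slot, 0 = empty
    int64_t              center_ticks;  // expected open, in ticks (cents)
    std::size_t          mid;           // slot index of center_ticks

    CenteredBook(int64_t open_ticks, std::size_t expected_range_ticks)
        : qty(4 * expected_range_ticks, 0),
          center_ticks(open_ticks),
          mid(2 * expected_range_ticks) {}

    // price -> index: the offset in ticks from the expected open.
    std::size_t index(int64_t price_ticks) const {
        return static_cast<std::size_t>(
            static_cast<int64_t>(mid) + (price_ticks - center_ticks));
    }

    // After the first few minutes most levels already exist, so an update is
    // a single indexed store: no searching, no shifting. (Re-allocate around
    // a new center if the price leaves the window.)
    void add(int64_t price_ticks, int64_t quantity) {
        qty[index(price_ticks)] += quantity;
    }
};
```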
Your insertions and deletions for the array implementation are O(N), not O(log N), because of the memcpy shifting. Furthermore, since operations more often happen at the top of the book, you'll very often be shifting (i.e., loading into the cache) the entire array.
I do not know in which context you want to implement a matching engine (ME). But in my view, the two interesting challenges in this context are:
- implement one in an FPGA
- for simulation / fast replay purposes, design the most efficient in/out bus to put synthetic traders in front of the ME.
The last point is about simulating one day in a few seconds.
To answer your question more directly: the reference is Josh Levine's code for Island, written in FoxPro. Levine was one of the pioneers of the rise of electronic trading. It is not only an executable spec, but a piece of history.
If you assume that most updates will happen near the top of the book (a fair assumption), a linked list ordered by price level will be very efficient.
You can check CoralMD for an example of a very fast, garbage-free market data book implementation that provides not just the global view of the market (all exchanges) but also a per-exchange book view. It also provides the infrastructure to write exchange feeds and to distribute market data internally.
Disclaimer: I am one of the developers of CoralMD
I think it depends on the inner workings of the preferred language in the first place.
I suppose many order books are nowadays written in JS. So after considering many data structures (priority queues, etc.), I think the best turns out to be a simple array, and you don't even have to implement binary searches whatsoever.
My idea is to use a sparse array with a precision factor. If your Bitcoin price is represented like USD 33,112.06, then your precision is 100, but if it's USD 0.38092, like the current Ada price, then your precision is 100,000. You just multiply the price by your precision factor to calculate your index value. So for Bitcoin the price above sits at index 3311206, and for Ada at index 38092. That's it: you have a sparse array with some (many) indices missing, and who cares? You want a sorted dense list? Then just use `Object.keys()` or a `for...in` loop. If you need the first k items, just loop k times: O(k). Want to make your list descending? Then complement your price-to-index relationship.
It's lightning fast since it jumps over the missing keys (non-existent properties of the object). Access to `bitCoinSalesBook[index]` is O(1). If a particular price level gets emptied, simply delete that key with `delete bitCoinSalesBook[index]`: also O(1).
What's the problem..?
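For comparison with the rest of the thread, here is the same idea in C++ terms: scale the price by the precision factor to get an integer key. `std::map` stands in here for the JS sparse object (it also iterates keys in ascending order, though its complexities differ as noted in the comments); all names are illustrative:

```cpp
#include <cmath>
#include <cstdint>
#include <map>

// Scaled integer price -> total quantity at that level.
using Book = std::map<int64_t, double>;

int64_t key(double price, int64_t precision) {
    return std::llround(price * precision);  // 33112.06 * 100 -> 3311206
}

int main() {
    const int64_t precision = 100;   // two decimals, as for BTC/USD
    Book bitCoinSalesBook;

    bitCoinSalesBook[key(33112.06, precision)] += 0.5;  // O(log n) here;
    bitCoinSalesBook.erase(key(33112.06, precision));   // O(1) in the JS
                                                        // sparse array

    // First k levels in ascending order: walk k iterator steps, O(k).
    int k = 5;
    for (auto it = bitCoinSalesBook.begin();
         it != bitCoinSalesBook.end() && k > 0; ++it, --k) {
        // it->first / precision is the price; it->second is the quantity.
    }
}
```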
You can also utilize two red-black trees. Bids would be stored in descending order in one tree; asks would be stored in ascending order in the other.
The best bid and ask would always be the first node of their respective trees.
Your inserts and deletes would be in O(log n) time.
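A minimal sketch using std::map, which is typically implemented as a red-black tree; std::greater makes the bid tree iterate from the highest price, so begin() is the best quote on both sides:

```cpp
#include <cstdint>
#include <functional>
#include <map>

int main() {
    std::map<int64_t, int64_t, std::greater<int64_t>> bids;  // best = highest
    std::map<int64_t, int64_t>                        asks;  // best = lowest

    bids[10050] += 200;               // O(log n) insert
    asks[10060] += 100;

    auto best_bid = bids.begin();     // first node = best bid
    auto best_ask = asks.begin();     // first node = best ask
    (void)best_bid; (void)best_ask;

    bids.erase(10050);                // O(log n) delete
}
```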
I think it has been explained here why red-black trees are not much used in practice: https://quant.stackexchange.com/questions/63140/red-black-trees-for-limit-order-book
Curious how nobody in 9 years has provided an optimal answer yet?
You need a B-tree that has been extended with a doubly-linked list on each node.
The B-tree lets you find things (or discover they're missing) as fast as possible, and the list lets you walk along the neighbours to join gaps (such as when filling trades that span many orders).
Even with big books in slow, non-compiled languages like Python, you can easily process millions of trades/orders per second this way.
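A sketch of the access pattern only, with `std::map` standing in for the B-tree (a real B-tree packs many keys per node, so the neighbour walk touches far fewer cache lines; the fill logic here is illustrative):

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <map>

int main() {
    std::map<int64_t, int64_t> side;   // price -> resting quantity, ascending
    side[100] = 5; side[101] = 3; side[103] = 7;

    // Find (or fail to find) a price in O(log n).
    auto it = side.lower_bound(101);

    // Walk along the neighbours to fill across gaps, e.g. a trade that
    // spans several resting levels.
    int64_t to_fill = 8;
    while (it != side.end() && to_fill > 0) {
        int64_t take = std::min(to_fill, it->second);
        to_fill    -= take;
        it->second -= take;
        it = (it->second == 0) ? side.erase(it) : std::next(it);
    }
}
```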