How are HFT systems implemented on FPGA nowadays?
I have read about different implementations of HFT systems on FPGAs.
- Argon HFT system (http://trading-gurus.com/argon-design-an-fpga-based-hft-platform/)
- Hardware-only implementations or hybrid solutions (http://msrg.org/publications/pdf_files/2010/moTrading-Efficient_Event_Processing_through_.pdf)
- Domain-Specific Languages for FPGAs to implement feed handlers (http://msrg.org/publications/pdf_files/2010/moTrading-Efficient_Event_Processing_through_.pdf)
Vendors like Cisco claim they have achieved the same results with high performance NIC's (http://www.cisco.com/c/dam/en/us/products/collateral/switches/nexus-3000-series-switches/white_paper_c11-716030.pdf).
My question is: which parts of HFT systems are mostly implemented on FPGAs nowadays, and are FPGAs still popular? Is only the feed handler implemented on the FPGA? Some of the systems described above implement only the feed handler in hardware, because the strategy changes too often or is too hard to express on an FPGA. Others claim to have implemented trading strategies on FPGAs as well, or to use high-performance NICs instead of FPGAs to build HFT systems. I've read about these different approaches, but I find them hard to compare because most of the results are measured on different input sets.
There are more uses of FPGAs than the ones you mention. An FPGA board often sits where the NIC does in a CPU-only system. In some cases the FPGA is nearly stand-alone: receiving market data, calculating a strategy's theos, firing orders and hedging fills. At the other end of the scale it works as a smart NIC, pulling in the raw market data and order traffic and feeding the application running on the server with only the data it needs. An FPGA or NIC has similar transceiver latency to the network as it does over the PCIe bus, but roughly twice that total for anything that has to travel all the way to the application, since that data crosses both hops.
Here's a way to think about it: imagine you can do something in an ASIC (i.e. directly in hardware). However, the process of fabrication is in itself expensive, and you get a design that you cannot change afterwards. ASICs make sense for predefined tasks such as Bitcoin mining, well-known data processing algorithms, etc.
On the other hand we have ordinary CPUs (as well as coprocessor CPUs and GPUs) which are general-purpose, but process a small (in terms of concurrent instructions) set of instructions at a very high speed.
FPGAs are the middle ground. They are 'hardware emulators' and as such can be considered roughly 10x slower than actual (ASIC) hardware, but still far more performant for concurrent operations than CPUs, provided you can utilize the die to spread your logic accordingly.
Some uses of FPGAs are:
- Video transcoding (e.g. HD video decoding in TVs) as well as various data acquisition boards
- Fixed data-structure parsing (e.g. regex matching)
- Discrete system simulation (for example, simulating the outcome of a card game)
- Lots of 'properly embedded' applications such as e.g. in aerospace or scientific research
The problem with FPGAs for quant uses is that they are not so good at floating-point calculations, particularly since ordinary CPUs are already optimized for those with features like SIMD. However, for anything fixed-point, or for fixed-size data structures, FPGA design lets you configure the device to do a lot of processing in parallel.
Some things done in trading include using FPGAs for feed handlers (parsing directly from the network stream) as well as building certain parts of the trading infrastructure (e.g. order books) in hardware, so as to keep up with rapidly changing data structures without loading the CPU.
FPGAs mainly aim to address the concern of processing data quickly without paying propagation costs. This is in contrast with devices such as GPGPUs (or any PCI-dwelling card, such as Xeon Phi), which pay a performance penalty for getting data to and from the device. That said, DMA options are improving in this regard, too.