Phase 3: Months 13–18

Low-Latency Mastery

This is the HFT-specific phase: lock-free concurrency, hardware-level optimisation, kernel-bypass networking, and a full trading-system portfolio. By the end you should be able to perform well in on-site interviews and take-home challenges. The gap from generic SWE to quant SWE closes here.

01 Lock-Free & Advanced Concurrency
Atomics, memory ordering, lock-free structures (5 topics)
  • Learn atomics in depth: std::atomic<T>. compare_exchange_weak vs compare_exchange_strong (weak may fail spuriously, so use it inside a retry loop). The ABA problem. Why CAS is the foundation of lock-free algorithms.
  • Learn memory fences: std::atomic_thread_fence. Memory order semantics: relaxed, acquire, release, acq_rel, seq_cst. When do you need each? What can the compiler and the CPU reorder without them?
  • Learn lock-free patterns: Michael-Scott queue. Hazard pointers. RCU (read-copy-update). Seqlock. Know the theory before implementing.
  • Implement an SPSC ring buffer: A single-producer single-consumer ring buffer using atomics. Cache-line padding to avoid false sharing. Measure throughput against a mutex-based version.
  • Implement a lock-free queue (MPSC or SPSC): Multiple or single producers, single consumer. Multi-producer versions need compare-exchange to claim slots, while an SPSC queue needs only acquire/release loads and stores. Write stress tests. This is a standard HFT interview project.
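The SPSC ring buffer task above can be sketched roughly as follows. This is a minimal illustration, not a tuned implementation. It assumes a 64-byte cache line and a power-of-two capacity, and the class name `SpscRing` is hypothetical. Head and tail are monotonically increasing counters, and each one lives on its own cache line so the producer and the consumer do not falsely share a line.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical SPSC ring buffer: exactly one thread calls push(), exactly one calls pop().
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two so wrapping is a cheap mask");
    static constexpr std::size_t kMask = Capacity - 1;
    static constexpr std::size_t kCacheLine = 64;  // assumption: typical x86 line size

    alignas(kCacheLine) std::atomic<std::size_t> head_{0};  // next write index (producer-owned)
    alignas(kCacheLine) std::atomic<std::size_t> tail_{0};  // next read index (consumer-owned)
    alignas(kCacheLine) std::array<T, Capacity> slots_{};

public:
    bool push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        // Acquire pairs with the consumer's release store to tail_, so the slot
        // we may overwrite has really been read.
        if (head - tail_.load(std::memory_order_acquire) == Capacity) return false;  // full
        slots_[head & kMask] = value;
        // Release publishes the slot contents before the new head becomes visible.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    std::optional<T> pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        // Acquire pairs with the producer's release store to head_.
        if (tail == head_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T value = slots_[tail & kMask];
        tail_.store(tail + 1, std::memory_order_release);  // hand the slot back
        return value;
    }
};
```

For the throughput comparison, run the producer and the consumer on separate threads, ideally pinned to separate cores. A `std::deque` guarded by a `std::mutex` is a reasonable baseline.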
Lock-free data structures are the bread and butter of low-latency quant systems. Building one proves you understand memory ordering — something most candidates cannot do. This single project separates junior from mid-level in quant interviews.
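The compare-and-swap retry loop behind most lock-free code can be shown in isolation. The helper below is a hypothetical "atomic max" that raises a high-water mark (for example, peak position) without a lock. `compare_exchange_weak` may fail spuriously, which is harmless inside a loop, and on failure it reloads `observed` with the current value. ABA is not a concern here because only plain values are compared. It becomes a problem in pointer-based structures, where a node can be freed and reused between the load and the CAS.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: raise `target` to at least `candidate` with a CAS loop.
// Returns the value seen before the update (or the current value if no update was needed).
inline long atomic_fetch_max(std::atomic<long>& target, long candidate) {
    long observed = target.load(std::memory_order_relaxed);
    // Retry until either the stored value is already >= candidate or our CAS wins.
    while (observed < candidate &&
           !target.compare_exchange_weak(observed, candidate,
                                         std::memory_order_acq_rel,    // on success
                                         std::memory_order_relaxed)) { // on failure
        // `observed` now holds the latest value; the loop condition re-checks it.
    }
    return observed;
}
```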
02 Hardware-Level Optimisation
CPU architecture for performance engineers (4 topics)
  • Study the CPU cache hierarchy: L1/L2/L3 sizes and latencies. Cache line size (64 bytes on mainstream x86). Prefetching: manual __builtin_prefetch and hardware prefetcher behaviour. Cache thrashing. Designing for cache locality.
  • Study NUMA (Non-Uniform Memory Access): Multi-socket systems. Local vs remote memory access latency. numactl. Allocating memory on the correct NUMA node. HFT servers are often multi-socket.
  • Study SIMD / AVX intrinsics: SSE2, AVX, AVX2. _mm256_add_ps, _mm256_load_ps. Vectorising numerical loops. When the compiler auto-vectorises and when it doesn't.
  • Optimise numerical loops using intrinsics: Write a risk calculation loop (for example, a sum of weighted positions). Benchmark scalar vs auto-vectorised vs manual SIMD versions. For float data that fits in cache, the difference can be 4–8x.
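A weighted-sum loop in scalar and manual-AVX form might look like the sketch below. It assumes x86-64 with GCC or Clang. The `target("avx")` attribute lets one function use AVX without building the whole program with `-mavx`, and `__builtin_cpu_supports` picks the path at run time. Function names are hypothetical. Without flags that permit reassociation (such as `-ffast-math`), compilers generally will not auto-vectorise the scalar float reduction, because float addition is not associative. For the same reason, the SIMD result can differ from the scalar one in the last bits.

```cpp
#include <cassert>
#include <cstddef>
#include <immintrin.h>

// Scalar baseline: sum of pos[i] * px[i].
float weighted_sum_scalar(const float* pos, const float* px, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) acc += pos[i] * px[i];
    return acc;
}

// Manual AVX version (GCC/Clang, x86 only). Callers must check CPU support first.
__attribute__((target("avx")))
float weighted_sum_avx(const float* pos, const float* px, std::size_t n) {
    __m256 acc = _mm256_setzero_ps();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 a = _mm256_loadu_ps(pos + i);  // unaligned load: no alignment requirement
        __m256 b = _mm256_loadu_ps(px + i);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(a, b));  // 8 products per iteration
    }
    alignas(32) float lanes[8];
    _mm256_store_ps(lanes, acc);  // horizontal reduction of the 8 lanes
    float total = 0.0f;
    for (float v : lanes) total += v;
    for (; i < n; ++i) total += pos[i] * px[i];  // scalar tail for leftover elements
    return total;
}

// Runtime dispatch: use AVX only if this CPU supports it.
float weighted_sum(const float* pos, const float* px, std::size_t n) {
    return __builtin_cpu_supports("avx") ? weighted_sum_avx(pos, px, n)
                                         : weighted_sum_scalar(pos, px, n);
}
```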
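HFT engineers think in nanoseconds. An L3 hit costs tens of cycles, but a miss that falls through L3 to DRAM costs roughly 100ns. A remote NUMA access is slower again, commonly around 1.5–2x local DRAM latency. A single 256-bit AVX instruction processes 8 floats at once. Understanding hardware lets you write code that is fast by design, not by accident.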
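A simple way to see cache locality is to sum the same row-major matrix in two loop orders. This sketch assumes one contiguous `std::vector<double>` buffer, and the function names are hypothetical. Both functions return the same sum, but the column-order walk jumps `n * sizeof(double)` bytes per access. For large `n`, almost every load in that walk touches a new cache line.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major walk: stride 1, so each 64-byte line delivers 8 consecutive doubles.
double sum_row_major(const std::vector<double>& m, std::size_t n) {
    double s = 0.0;
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t c = 0; c < n; ++c) s += m[r * n + c];
    return s;
}

// Column-order walk over the same row-major data: stride n, poor locality for large n.
double sum_col_major(const std::vector<double>& m, std::size_t n) {
    double s = 0.0;
    for (std::size_t c = 0; c < n; ++c)
        for (std::size_t r = 0; r < n; ++r) s += m[r * n + c];
    return s;
}
```

Timing both versions on a matrix much larger than L3 (for example, a few thousand rows and columns) should make the gap visible. Measure it on your own machine rather than assuming a ratio.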
03 Low-Latency Networking
Kernel bypass, packet processing, feed handling (3 tasks)
  • Experiment with kernel bypass (DPDK) and io_uring: DPDK runs the NIC driver in user space with poll-mode drivers, bypassing the kernel network stack entirely. io_uring is not a bypass. It is the Linux async I/O interface that cuts syscall overhead by batching requests through shared submission and completion rings. Even understanding the concepts positions you well in interviews.
  • Experiment with packet processing: Capture raw packets with pcap. Parse a binary market data protocol by hand. Understand how latency is measured at the packet level (NIC hardware timestamps).
  • Build a high-speed UDP feed handler: Receive simulated market data ticks over UDP at a high rate. Parse them into your order book without dropping packets. Measure and report latency percentiles (p50, p95, p99, p99.9).
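Parsing a binary tick and detecting sequence gaps could look roughly like this. The 21-byte wire layout is a hypothetical format for your own simulator, not any exchange's protocol. The parser assumes a little-endian host such as x86-64, and `Tick`, `parse_tick` and `GapDetector` are illustrative names. Using `std::memcpy` instead of casting the buffer pointer avoids unaligned-access and strict-aliasing problems.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical wire format (little-endian, packed, 21 bytes):
//   [0..8) uint64 seq | [8..16) int64 price in ticks | [16..20) uint32 qty | [20] uint8 side
struct Tick {
    std::uint64_t seq;
    std::int64_t price;
    std::uint32_t qty;
    std::uint8_t side;  // 0 = bid, 1 = ask
};
constexpr std::size_t kTickWireSize = 21;

inline std::optional<Tick> parse_tick(const unsigned char* buf, std::size_t len) {
    if (len < kTickWireSize) return std::nullopt;  // truncated datagram
    Tick t;
    std::memcpy(&t.seq, buf + 0, 8);
    std::memcpy(&t.price, buf + 8, 8);
    std::memcpy(&t.qty, buf + 16, 4);
    t.side = buf[20];
    if (t.side > 1) return std::nullopt;  // reject malformed side byte
    return t;
}

// Tracks the next expected sequence number and counts missing messages.
class GapDetector {
    std::uint64_t expected_ = 0;
    bool started_ = false;
public:
    std::uint64_t missed = 0;
    // Returns false for duplicate or stale packets, which the caller should drop.
    bool on_seq(std::uint64_t seq) {
        if (started_ && seq < expected_) return false;
        if (started_ && seq > expected_) missed += seq - expected_;  // gap detected
        started_ = true;
        expected_ = seq + 1;
        return true;
    }
};
```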
HFT firms use kernel bypass (Solarflare/Onload, DPDK, RDMA) to get network-to-application latency below 1 microsecond. You don't need to be a kernel developer — but knowing why these techniques exist and how they work is a strong signal.
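Before reaching for kernel bypass, it helps to see the polling idea with ordinary sockets. The sketch below busy-polls a UDP socket with `MSG_DONTWAIT` instead of blocking in `recv`. That trades a fully occupied core for avoiding the sleep and wake-up path, which is the same trade poll-mode drivers make. Packets still traverse the kernel stack here, so this is not kernel bypass. It assumes a POSIX system (Linux or macOS), and the function names are hypothetical.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Opens a UDP socket bound to 127.0.0.1 on a kernel-chosen port; returns the fd or -1.
int open_udp_loopback(std::uint16_t& port_out) {
    int fd = ::socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // 0 = let the kernel pick a free port
    socklen_t len = sizeof addr;
    if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) < 0 ||
        ::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) < 0) {
        ::close(fd);
        return -1;
    }
    port_out = ntohs(addr.sin_port);
    return fd;
}

// Spins on non-blocking recv() until a datagram arrives or max_spins attempts fail.
// Returns bytes received, or -1 on error or timeout.
ssize_t busy_poll_recv(int fd, void* buf, std::size_t cap, long max_spins) {
    for (long i = 0; i < max_spins; ++i) {
        const ssize_t n = ::recv(fd, buf, cap, MSG_DONTWAIT);
        if (n >= 0) return n;
        if (errno != EAGAIN && errno != EWOULDBLOCK) return -1;  // a real error, not "no data yet"
    }
    return -1;
}
```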
04 Numerical Computing
Eigen, Boost, and optimised loops (2 tasks)
  • Use Eigen or Boost.Math: Eigen offers fast linear algebra through expression templates, and its fixed-size matrix types avoid dynamic allocation in hot paths. Boost.Math provides statistical distributions and special functions. Use them in your order book for any risk or pricing calculations.
  • Write and optimise risk calculation loops: Simulate a simple P&L or delta calculation over a portfolio of positions. Write the naive version. Profile it. Vectorise it. Measure the improvement.
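One common first step when optimising a P&L loop is changing the data layout. The sketch below contrasts an array-of-structs portfolio with a struct-of-arrays one. The field names and the P&L formula `qty * (mark - entry)` are illustrative assumptions. The struct-of-arrays loop streams three dense arrays with no unused bytes between elements. That suits the cache and the hardware prefetcher, and it is the layout manual SIMD code wants. In this small struct only `id` is wasted, but real position records usually carry many more fields.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive layout: array of structs. The loop drags `id` through the cache for nothing.
struct Position {
    long id;
    double qty;
    double entry_px;
    double mark_px;
};

double pnl_naive(const std::vector<Position>& book) {
    double pnl = 0.0;
    for (const Position& p : book) pnl += p.qty * (p.mark_px - p.entry_px);
    return pnl;
}

// Struct-of-arrays layout: each field is contiguous, so the hot loop reads only what it uses.
struct PositionsSoA {
    std::vector<double> qty, entry_px, mark_px;
};

double pnl_soa(const PositionsSoA& book) {
    const std::size_t n = book.qty.size();
    const double* q = book.qty.data();
    const double* e = book.entry_px.data();
    const double* m = book.mark_px.data();
    double pnl = 0.0;
    for (std::size_t i = 0; i < n; ++i) pnl += q[i] * (m[i] - e[i]);
    return pnl;
}
```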
Quant SWE roles are not quant research roles; you don't need to build pricing models. But understanding numerical stability, floating-point precision, and how to write tight arithmetic loops is expected at mid-level and above.
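A classic floating-point precision trap is long accumulations. The sketch below compares naive `float` summation with Kahan (compensated) summation, which carries each addition's rounding error forward. It assumes default IEEE behaviour. Flags such as `-ffast-math` may let the compiler simplify the compensation away.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive accumulation: once the total is large, small addends lose low-order bits.
float sum_naive(const std::vector<float>& xs) {
    float s = 0.0f;
    for (float x : xs) s += x;
    return s;
}

// Kahan summation: `c` holds the rounding error from the previous addition.
float sum_kahan(const std::vector<float>& xs) {
    float s = 0.0f, c = 0.0f;
    for (float x : xs) {
        float y = x - c;  // apply the correction from the previous step
        float t = s + y;  // low-order bits of y may be lost here...
        c = (t - s) - y;  // ...and this recovers them (algebraically zero)
        s = t;
    }
    return s;
}
```

Summing ten million copies of `0.1f` should land very close to 1,000,000 with Kahan summation, while the naive loop drifts far from it.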
05 Trading System Architecture
End-to-end HFT system understanding (2 deliverables)
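  • Design a full HFT system diagram: Market Data → Protocol Parsing → Order Book → Strategy → Risk Check → Execution → Exchange. Label the latency budget at each stage. Understand what co-location means and why firms pay for it.
  • Understand co-location and latency budgets: What is a latency budget? How do you allocate nanoseconds across components? What is the physical floor on latency between Amsterdam and Frankfurt? The straight-line distance is roughly 360 km, so a one-way trip takes about 1.2 ms at the speed of light in a vacuum and about 1.8 ms in fibre, and real fibre routes are longer still. Why co-locating next to the venue's matching engine (for example, in an Amsterdam data centre such as Equinix AM3) matters.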
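The propagation floor quoted above is simple arithmetic: distance divided by signal speed. The sketch assumes a 360 km straight-line path and light in fibre travelling at roughly two thirds of c. The constant and function names are illustrative.

```cpp
#include <cassert>

constexpr double kSpeedOfLightKmPerS = 299'792.458;  // in vacuum

// One-way propagation delay in milliseconds for a path of `km` at `fraction_of_c`.
constexpr double one_way_ms(double km, double fraction_of_c) {
    return km / (kSpeedOfLightKmPerS * fraction_of_c) * 1000.0;
}

// ~1.20 ms: straight line at c (the limit line-of-sight microwave links approach).
constexpr double kVacuumMs = one_way_ms(360.0, 1.0);
// ~1.80 ms: straight fibre at ~2/3 c; real routes are longer, so slower in practice.
constexpr double kFibreMs = one_way_ms(360.0, 2.0 / 3.0);
```

Against a tick-to-order budget measured in microseconds, these milliseconds show why strategies that react to a venue's data are run from inside that venue's data centre.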
System design interviews at quant firms look different from FAANG. They want to see how you think about latency budgets and physical constraints. Being able to draw and explain this diagram confidently is a strong mid-level signal.
06 Portfolio (Phase 2): Full Engine
Expand your order book into a complete system (6 deliverables)
  • Market data simulator (UDP ticks): Generate synthetic price ticks over UDP multicast. Configurable rate. Sequence numbers. Simulate occasional packet loss to test your handler's gap detection.
  • Full matching engine: Expand the Phase 2 order book. Price-time priority, partial fills, time-in-force handling (IOC/GTC), trade reporting.
  • Strategy module (simple): A market-making stub that posts bid and ask quotes at a configurable spread, cancels and replaces on fill, and enforces basic position limits. Not a real strategy, just a demonstration of the flow.
  • Lock-free queues between components: Connect market data → order book → strategy using your SPSC ring buffer. No mutexes in the hot path.
  • Performance benchmarks: Measure messages processed per second, order book update latency (p50/p95/p99), and end-to-end tick-to-order latency. Use Google Benchmark or manual CLOCK_MONOTONIC measurements.
  • Latency measurements with charts: Plot latency histograms. Show before/after results for every optimisation you made. This is what you show in interviews: not just "it's fast" but "here is the data."
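Turning raw timings into the p50/p95/p99 figures above takes only a few lines. This sketch uses the nearest-rank percentile definition (one of several in common use) and `std::chrono::steady_clock`, which is monotonic and, on Linux with libstdc++, backed by CLOCK_MONOTONIC. The helper names are hypothetical. For sub-microsecond work, the clock call's own overhead is significant, so timing batches or using a cycle counter may be more appropriate.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-rank percentile: smallest sample such that at least p% of samples are <= it.
// Takes the vector by value because it sorts a copy.
std::int64_t percentile(std::vector<std::int64_t> samples, double p) {
    assert(!samples.empty() && p > 0.0 && p <= 100.0);
    std::sort(samples.begin(), samples.end());
    const auto rank = static_cast<std::size_t>(std::ceil(p * samples.size() / 100.0));
    return samples[rank - 1];
}

// Times one call of `fn` in nanoseconds using a monotonic clock.
template <typename Fn>
std::int64_t time_ns(Fn&& fn) {
    const auto t0 = std::chrono::steady_clock::now();
    fn();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
}
```

In a benchmark, push each `time_ns` result into a preallocated vector inside the hot loop, then compute percentiles afterwards so the statistics code does not perturb the measurement.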
This portfolio piece is your primary interview differentiator. It shows domain knowledge, C++ mastery, performance thinking, and the ability to build a complete system. A well-documented, benchmarked order book is worth more than any certification.
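The matching-engine deliverable can start from a deliberately simple core like the one below. It implements price-time priority, partial fills, GTC/IOC time-in-force and fill reporting. The types and names are hypothetical. `std::map` plus `std::deque` keeps the sketch short, but each price level and resting order allocates. Low-latency books usually switch to preallocated, array-based price levels and intrusive order lists.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <map>
#include <vector>

enum class Side { Buy, Sell };
enum class Tif { GTC, IOC };  // GTC remainder rests on the book; IOC remainder is cancelled

struct Order { std::uint64_t id; Side side; std::int64_t px; std::int64_t qty; Tif tif; };
struct Fill { std::uint64_t resting_id, aggressor_id; std::int64_t px, qty; };

class MatchingEngine {
    std::map<std::int64_t, std::deque<Order>, std::greater<>> bids_;  // best (highest) first
    std::map<std::int64_t, std::deque<Order>> asks_;                   // best (lowest) first

    // Match against the best level while the price crosses; FIFO within each level.
    template <typename Book, typename Crosses>
    static void match(Order& in, Book& book, Crosses crosses, std::vector<Fill>& fills) {
        while (in.qty > 0 && !book.empty() && crosses(book.begin()->first)) {
            auto& [px, queue] = *book.begin();
            Order& rest = queue.front();
            const std::int64_t q = std::min(in.qty, rest.qty);  // partial fills allowed
            fills.push_back({rest.id, in.id, px, q});            // trade at the resting price
            in.qty -= q;
            rest.qty -= q;
            if (rest.qty == 0) queue.pop_front();
            if (queue.empty()) book.erase(book.begin());
        }
    }

    template <typename Book>
    static std::int64_t level_qty(const Book& book, std::int64_t px) {
        auto it = book.find(px);
        if (it == book.end()) return 0;
        std::int64_t total = 0;
        for (const Order& o : it->second) total += o.qty;
        return total;
    }

public:
    std::vector<Fill> submit(Order in) {
        std::vector<Fill> fills;
        if (in.side == Side::Buy)
            match(in, asks_, [&](std::int64_t ask) { return ask <= in.px; }, fills);
        else
            match(in, bids_, [&](std::int64_t bid) { return bid >= in.px; }, fills);
        if (in.qty > 0 && in.tif == Tif::GTC) {  // only GTC remainders rest
            if (in.side == Side::Buy) bids_[in.px].push_back(in);
            else asks_[in.px].push_back(in);
        }
        return fills;
    }

    std::int64_t qty_at(Side side, std::int64_t px) const {
        return side == Side::Buy ? level_qty(bids_, px) : level_qty(asks_, px);
    }
};
```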
07 DSA (300 → 400+ problems)
Hard problems, mock interviews, advanced patterns (focus: DP, advanced graphs, bits)
  • 2D dynamic programming: Longest common subsequence, edit distance, unique paths, coin change II. For many people, the jump from 1D to 2D DP is the hardest conceptual leap in the DSA curriculum.
  • Advanced graph algorithms: Dijkstra, Bellman-Ford, Floyd-Warshall, union-find, minimum spanning tree (Prim/Kruskal).
  • Tries: Word search, prefix matching. Understand the structure and when it beats a hash map.
  • Bit manipulation: XOR tricks, bit masks, counting set bits. HFT engineers use these for fast flag operations.
  • Weekly mock interviews from month 14: interviewing.io or Pramp. Talk through your solution. Time yourself. Get feedback.
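Edit distance is a good example of the 1D-to-2D jump in dynamic programming. The state is a pair of prefix lengths, and each cell depends on three neighbours. This is the standard Levenshtein recurrence with unit costs. The full table is kept for clarity, although a two-row version reduces memory to O(m).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// dp[i][j] = minimum edits turning the first i chars of a into the first j chars of b.
int edit_distance(const std::string& a, const std::string& b) {
    const std::size_t n = a.size(), m = b.size();
    std::vector<std::vector<int>> dp(n + 1, std::vector<int>(m + 1));
    for (std::size_t i = 0; i <= n; ++i) dp[i][0] = static_cast<int>(i);  // delete everything
    for (std::size_t j = 0; j <= m; ++j) dp[0][j] = static_cast<int>(j);  // insert everything
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            if (a[i - 1] == b[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1];  // characters match: no edit
            } else {
                dp[i][j] = 1 + std::min({dp[i - 1][j],        // delete a[i-1]
                                         dp[i][j - 1],        // insert b[j-1]
                                         dp[i - 1][j - 1]});  // substitute
            }
        }
    }
    return dp[n][m];
}
```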
08 Interview Preparation: Intensification
HFT-specific and system design prep (5 focus areas)
  • OS interview questions: What happens when you call malloc? What is a page fault? How does the kernel scheduler decide which thread runs next? What is the difference between a spinlock and a mutex?
  • C++ internals questions: What is the vtable? What is the difference between new and malloc? What is placement new? What does volatile do, and why is it not a substitute for atomics? What is a memory fence?
  • Networking questions: What is the difference between TCP and UDP? What is Nagle's algorithm? What is a socket buffer? What happens when your receive buffer fills up?
  • Probability brainteasers at interview pace: Practise verbalising your reasoning, not just reaching the answer. Optiver interviewers want to see your thought process under pressure.
  • System design for HFT components: Design a market data handler. Design a risk engine. Design an order management system. Focus on latency budgets and failure modes.
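For the spinlock-vs-mutex question, a small implementation makes the contrast concrete. The sketch below is a test-and-test-and-set spinlock. Waiters never ask the kernel to put them to sleep, as a contended `std::mutex` eventually does. Instead they burn CPU, which only pays off when critical sections are tiny and each thread has its own core. The class name is hypothetical, and a production version would typically add a pause instruction (for example, `_mm_pause` on x86) inside the spin.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

class SpinLock {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        for (;;) {
            // Acquire: the winner sees all writes made before the previous unlock().
            if (!locked_.exchange(true, std::memory_order_acquire)) return;
            // Spin on a read-only load so waiting cores share the cache line
            // instead of bouncing it with repeated writes.
            while (locked_.load(std::memory_order_relaxed)) {
            }
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }
};
```

Because it provides `lock()` and `unlock()`, this class works with `std::lock_guard<SpinLock>` in the same way `std::mutex` does.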
Milestone: End of Month 18

You should be able to:
  • Implement a lock-free queue from memory and explain your memory-ordering choices.
  • Profile C++ code and identify hotspots.
  • Explain the full path of a market order from wire to trade.
  • Present your order book with benchmark data.
  • Solve hard LeetCode problems within 35 minutes.
  • Have a strong chance of passing an Optiver/IMC on-site.
At this point you are genuinely mid-level quant SWE material.