How We Made a 128-Arm Pattern Match Run 10Γ Faster
The 128-branch dispatch problem, and the compile-time optimization system that solved it.
Pattern matching is coming to C++ β eventually. C++26 might bring it. But if you're writing C++17 today and need to match on variants, enums, or arbitrary values with guards, destructuring, and combinators, you're on your own.
I've been building Patternia, a C++17 header-only pattern matching library, for the past year. The API is simple enough:
#include <ptn/patternia.hpp>
using namespace ptn;
auto result = match(subject) | on(
val<Status::Ready> >> []{ return "ready"; },
val<Status::Pending> >> []{ return "pending"; },
is<Error>(ds(status)) >> [](int s){ return "error: " + std::to_string(s); },
_ >> []{ return "unknown"; }
);
Clean. But the API isn't the hard part. The hard part is making it fast.
The Problem
When you write a pattern match with 128 cases (not unusual for protocol parsers or command dispatchers), every on() call constructs those 128 case objects β guards, destructuring patterns, combinators β every single time the match is evaluated.
The naive approach: 10.79 nanoseconds per call.
A hand-written switch with 128 branches: 1.09 nanoseconds per call.
That's a 10Γ gap. If your library is slower than the code it replaces, nobody will use it.
The Solution: Automatic Dispatch Plan Selection
The core insight is that at compile time, the library knows everything about the patterns β which ones are compile-time constants, which ones match variant types, which ones have guards. So why not let the compiler pick the optimal dispatch strategy?
Patternia does this with a six-tier dispatch plan hierarchy, selected at compile time via dispatch_plan_selector:
1. static_literal_dense β O(1) array lookup for val<V> compile-time values
2. literal_runtime_dense β O(1) for runtime lit() values
3. literal_linear β Direct dispatch for integral/enum subjects
4. variant_simple β Precomputed case-index table by variant alt
5. variant_alt_bucketed β Binary search over alt-bucketed cases
6. sequential β Linear fallback
The selector analyzes the pattern set and picks the best strategy at compile time β no runtime overhead for the decision:
template <typename AnalysisResult>
struct dispatch_plan_selector {
static constexpr dispatch_plan_kind kind =
analysis_t::can_use_static_literal_dispatch
? dispatch_plan_kind::static_literal_dense
: (analysis_t::can_use_runtime_literal_dense_dispatch
? dispatch_plan_kind::literal_runtime_dense
: (analysis_t::can_use_simple_literal_dispatch
? dispatch_plan_kind::literal_linear
: (analysis_t::can_use_simple_variant_dispatch
? dispatch_plan_kind::variant_simple
: (analysis_t::can_use_variant_alt_dispatch
? dispatch_plan_kind::variant_alt_bucketed
: dispatch_plan_kind::sequential))));
};
Each tier applies density heuristics to decide whether to engage. For example, static_literal_dense only activates when the value span is at most 4Γ the number of literal cases β ensuring the lookup table isn't too sparse. Tuning parameters are baked in:
k_static_literal_dense_dispatch_max_cases = 256;
k_static_literal_dense_dispatch_max_span = 512;
k_static_literal_dense_dispatch_max_density = 4;
k_runtime_literal_dense_dispatch_max_cases = 256;
k_runtime_literal_dense_dispatch_max_span = 512;
k_runtime_literal_dense_dispatch_max_density = 8;
The [[no_unique_address]] Bug
Here's where it got weird. The optimization system checks whether a pattern type is "cacheable" β empty types can be skipped, reducing the case construction overhead. It used std::is_empty for this check.
One pattern type, static_literal_pattern, holds a comparator:
template <auto V, typename Cmp = std::equal_to<>>
struct static_literal_pattern : base::pattern_base<static_literal_pattern<V, Cmp>> {
[[no_unique_address]] Cmp cmp{};
// ...
};
std::equal_to<> is technically an empty type. We annotated it with [[no_unique_address]]. But GCC applied the attribute inconsistently β sometimes the member was optimized away, sometimes not. When GCC left it in, std::is_empty returned false for the pattern type, and the cache path was silently skipped.
The fix: replace std::is_empty with a sizeof <= 1 heuristic. Works consistently across GCC, Clang, and MSVC.
Finding this bug required reading the generated assembly for a dozen compiler versions. The lesson: [[no_unique_address]] is a hint, not a guarantee.
Results
With automatic dispatch plan selection, the 128-branch match drops from 10.79 ns to 1.13 ns β within 4% of a hand-written switch (1.09 ns):
| Approach | Latency |
|---|---|
Raw on() (naive) |
10.79 ns |
| PTN_ON macro (manual cache) | 1.16 ns |
| Auto-cache (this work) | 1.13 ns |
| Hand-written switch | 1.09 ns |
The key win: automatic optimization means users get the fast path without knowing dispatch plans exist. They just write match(subject) | on(...) and the compile-time analyzer does the rest.
Real-World Benchmarks
Beyond the 128-branch microbenchmark, Patternia holds its own in realistic scenarios:
| Scenario | Patternia | Baseline | Result |
|---|---|---|---|
| Command parser | 1.790 ns | Switch (2.258 ns) | +26% faster |
| Protocol router | 1.628 ns | IfElse (1.866 ns) | +14% faster |
| Variant mixed | 1.997 ns | std::visit (1.991 ns) | Tied |
How It Compares
Patternia isn't the only C++ pattern matching library. But it brings a few things the others don't:
| Feature | Patternia | matchit.cpp | Mach7 |
|---|---|---|---|
| Pipeline syntax | β | β | β |
| Compile-time dispatch optimization | β | Limited | β |
| Structural bindings | β | β | β |
| Guard expressions | β | β | β |
| Pattern combinators (any/all/neg) | β | β | β |
| C++ standard | C++17 | C++17 | C++14 |
| Header-only | β | β | β |
The automatic dispatch optimization is the differentiator. Other libraries can match β Patternia matches fast.
Try It
The library is a single header include away:
# CMake FetchContent, vcpkg, or just copy the headers
git clone https://github.com/sentomk/patternia
#include <ptn/patternia.hpp>
using namespace ptn;
// Your first pattern match, automatically optimized
auto result = match(value) | on(
val<1> >> []{ return "one"; },
val<2> >> []{ return "two"; },
is<std::string>(ds(s)) >> [](auto& s){ return s; },
_ >> []{ return "other"; }
);
The compile-time optimizer picks the right dispatch strategy. You don't have to think about it.
Patternia on GitHub β 179 stars, 438 commits, C++17 header-only.
United States
NORTH AMERICA
Related News
How Brazeβs CTO is rethinking engineering for the agentic area
10h ago
Amazon Employees Are 'Tokenmaxxing' Due To Pressure To Use AI Tools
21h ago

Implementing Multicloud Data Sharding with Hexagonal Storage Adapters
15h ago

DeepMindβs CEO Says AGI May Be ~4 Years Away. The Last Three Missing Pieces Are Not What Most People Think.
15h ago

CCSnapshot - A Claude Code Configs Transfer Tool
21h ago