
Table of Contents
Open Table of Contents
- The folklore
- Ground truth: what the Standard says
- “But my disassembler shows identical code!”
- The hidden foot-guns of
if (a & b) - Performance anatomy
- Measuring instead of guessing
- When does bitwise pay off?
- Debunking common myths
- Concurrency and atomics
- Style, intent, and code-review heuristics
- A quick guideline table
- Conclusion
The folklore
Branching is slow, so replace
&&with&(or||with|) and the optimizer will thank you.
That advice shows up in code reviews, on Reddit threads, and even in seasoned code bases. The intuition is simple:
&&may short-circuit, so the compiler must generate a conditional branch.&always evaluates both operands, so the resulting instruction stream can be branch-free.
Therefore, the bitwise form “must” be faster—right?
We are going to stress-test that claim from four angles:
- Language rules (what the Standard guarantees).
- What compilers actually emit in 2025.
- Micro-architecture (branch prediction, cache, ILP).
- Human factors—maintainability, bugs, and intent.
The result is a nuanced picture: sometimes & wins, often it makes no difference, and in many real programs it is a liability.
Ground truth: what the Standard says1
- Built-in
&&must evaluate left → right and must not evaluate the right operand when the left isfalse(or when the result is already known for||). This is sequenced, not negotiable.2 - Overloaded
operator&&/||are just function calls—no short-circuiting. - Bitwise
&/|always evaluate both operands and produce an integral value.
Key takeaway: Behavioural semantics are fixed. The optimizer can re-arrange side-effect-free instructions, but it may not “speculatively” evaluate an operand if doing so would create an observable difference in the abstract machine.
Cppreference summarises the rule crisply: “Built-in && and || perform short-circuit evaluation; bitwise logic operators do not.”3
“But my disassembler shows identical code!”
That is not a myth. For side-effect-free operands the optimiser is allowed to collapse:
bool fast = (x & y); // bitwise
bool safe = (x && y); // logical
into the same branch-free sequence because it can prove that:
xandyare scalars already materialised in registers,- reading them twice is free,
- the observable result (
fast/safe) is identical.
A nice real-world benchmark shows exactly that: Clang 15 with -O3 produced nearly identical inner loops for & and && scans over a 160 M-element array.4
What happened to short-circuiting? It is still honoured semantically, but the optimiser recognised that evaluating the second operand eagerly does not change program behaviour, so it emitted branch-free code that happens to read both variables. The standard’s “as-if” rule is satisfied.
The hidden foot-guns of if (a & b)
| Category | Risk when replacing && with & |
|---|---|
| Side effects | b++ now always increments, even when a is false. |
| Operator precedence | & binds weaker than ==, stronger than &&; many subtle bugs stem from forgotten parentheses. |
| Integral promotion | & works on numbers. If a/b are bool, the result is still int, not bool. A subsequent comparison with true becomes tautological. |
| Overloaded operators | Custom operator& might exist and change semantics silently. |
| Readability / intent | Future maintainers will assume “bit mask” semantics, not “logical test”. |
Performance anatomy
-
Branch prediction: Modern CPUs predict the
jnegenerated by&&with >95 % accuracy for stable data patterns. Misprediction costs ~14 cycles, but only when the data is adversarial. -
Instruction-level parallelism: Eager evaluation can sometimes reduce ILP because both operands must be ready before the fused compare can retire.
-
Cache traffic: If
bis a pointer dereference, forcing the read every time may hurt cache residency. Short-circuiting can be a win. -
Speculative loads: An out-of-order core may fetch
bearly anyway; if it turns out the load was unnecessary the cost is often hidden. -
GPU / SIMD kernels: Here branch divergence is lethal; replacing
&&with&(or*) makes sense—but you should express it explicitly in a data-parallel style (e.g.select(mask, …)), not with C++ control flow.
Measuring instead of guessing
Below is a minimal benchmark you can paste into Compiler Explorer or run with perf stat—modify the MODE macro to flip operators.
#include <vector>
#include <random>
#include <chrono>
#include <iostream>
#ifndef MODE // 0 = bitwise &, 1 = logical &&
#define MODE 1
#endif
int main() {
constexpr size_t N = 128 * 1024 * 1024;
std::vector<uint8_t> a(N), b(N);
std::mt19937 rng(0);
std::uniform_int_distribution<int> dist(0, 255);
for (size_t i = 0; i < N; ++i) {
a[i] = dist(rng);
b[i] = dist(rng);
}
size_t hits = 0;
auto t0 = std::chrono::steady_clock::now();
for (size_t i = 0; i < N; ++i) {
#if MODE
if ((a[i] > 200) && (b[i] > 200)) // logical
#else
if ((a[i] > 200) & (b[i] > 200)) // bitwise
#endif
++hits;
}
auto t1 = std::chrono::steady_clock::now();
std::cout << "hits=" << hits << " "
<< std::chrono::duration<double>(t1 - t0).count()
<< " s\n";
}
On an Ice Lake i7 at -O3 the difference is typically < 1 %, completely drowned by measurement noise unless you pin the benchmark and flush caches between runs.
When does bitwise pay off?
-
Inside tight, branch-averse GPU shaders or SIMD loops, where every warp must execute the same instruction stream.
-
When computing a compound predicate that you will reuse later:
uint32_t m = flags & READY & ENABLED & !ERROR; if (m) … -
For bit-mask idioms (
if (flags & FLAG_WRITE))—but that is not a boolean logic replacement; the intent is genuinely “test bit k”.
Even in those cases document your intent with a comment; the form is unusual enough that reviewers will ask.
Debunking common myths
| Myth | Reality |
|---|---|
“Compilers can ignore the short-circuit rule if b has no side effects.” | They can re-order or speculate as long as the abstract machine observes the same effects. Sequenced side effects (I/O, volatile access, atomic<>) still forbid premature evaluation. |
“& is branch-free, therefore faster.” | Only if the branch is unpredictable and the second operand is in cache. Otherwise && is at worst equal, sometimes better. |
| “Logical operators expand to bigger machine code.” | Modern optimisers fold simple boolean logic into flag registers; the size difference is mostly the conditional jump. |
“Using & avoids the register dependency chain.” | Not necessarily—both forms need both values before retirement unless the compiler transforms the predicate into a single compare-and-branch. |
Concurrency and atomics
For std::atomic<bool> the value read is a side effect visible to other threads (§6.9.2.1 ¶7). A conforming compiler must not merge or speculate away the short-circuit. Rely on && to avoid spurious loads/stores across threads; replacing it with & can lengthen the critical section or even change lock-free algorithms.
Style, intent, and code-review heuristics
- Default to
&&/||unless you have a measured reason. - Document any deliberate use of
&on booleans (// branch-free predicate for GPU). - Isolate tricks behind helper functions or constexpr lambdas so the intent is explicit.
- Write tests that capture side-effect expectations. A refactor that changes
++btobcan silently break logic when&is used.
A quick guideline table
| Situation | Recommended operator |
|---|---|
| Plain boolean predicate in CPU code | && / || |
Bit-mask test (flags & FLAG) | & / | |
| GPU kernel, shader, or wide SIMD | & / | (with a comment) |
| Mixed types or potential side effects | Stick to && / || |
| Overloaded user-defined types | Prefer functions (all_of, any_of) |
Conclusion
The performance delta between && and & has mostly evaporated on today’s optimising compilers and superscalar CPUs. What remains is a semantic delta that can—and regularly does—bite maintainers.
Unless you have a profiler trace, an ISA you know intimately, and operands guaranteed side-effect-free, treat if (a & b) for what it is: a micro-optimisation gamble that trades clarity and correctness for at best marginal speed gains.
Opt for the code that communicates intent first. Let the optimiser prove you wrong later.
TL;DR – Use && unless you can prove that & is faster and harmless. The compiler usually does the proving for you.
Footnotes
-
C++ Standard (ISO/IEC 14882:2023) §7.6.14/§7.6.15 – sequencing rules for logical operators. ↩
-
compilation - C++ compiler optimizations and short-circuit evaluation - Stack Overflow ↩