Aged Cheese (Parmesan) - Developing
Well-worked note with detailed content, still evolving.

Majority Vote Algorithm

Author: guiferviz

Imagine you have a stream of shapes:

A simple question: is there any shape that appears more than half the time? In other words, does any shape win by absolute majority?

The Obvious Solution

Easy. Build a counter — a dictionary that tracks how many times each shape appears:

counter = {
    circle:   7,  # 7 > 12/2 ✓
    square:   2,
    triangle: 2,
    star:     1,
}  # 12 elements, 4 unique keys
Majority found: Circle (7/12)

This works perfectly. Scan the stream once, increment a counter for each shape, and at the end check which (if any) exceeds N/2.

Time: O(N). Space: O(K), where K is the number of distinct shapes.
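As a concrete sketch, here is the counting approach in Python (the shape names are illustrative, not part of the original stream):

```python
from collections import Counter

def find_majority_naive(stream):
    """Return the element that appears more than half the time, or None."""
    counts = Counter(stream)  # O(K) space: one entry per distinct element
    if not counts:
        return None
    n = sum(counts.values())
    element, count = counts.most_common(1)[0]
    return element if count > n / 2 else None

shapes = ["circle"] * 7 + ["square"] * 2 + ["triangle"] * 2 + ["star"]
print(find_majority_naive(shapes))  # → circle
```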

For 4 shapes, this is trivial. But what if K is enormous?

When Counting Breaks

Imagine your shapes are not 4 types but millions — user IDs, IP addresses, DNA sequences. The dictionary grows to hold every unique key in memory. At some point, the counter itself doesn’t fit in RAM.

And there’s a second problem. What if your data is so large it’s split across hundreds of machines?

Each node could build its own local dictionary, but to find the global majority you’d need to send all the dictionaries to one place and merge them. That means moving potentially gigabytes of key-value pairs over the network — slow, expensive, and fragile.

We need something smaller. Something that compresses the answer into a constant-size summary that’s cheap to compute, cheap to send, and cheap to merge.

The Intuition: Cancellation

Before looking at any algorithm, let’s build an intuition for what “majority” really means — and why it’s structurally different from “most common”.

Consider a sequence of 10 shapes where squares appear exactly 5 times — half the sequence:

We can pair every square with a non-square:

[pairing demo: 5 pairs, each a square cancelled by a non-square]

This way of reordering and grouping elements clearly shows that half of the elements are squares, but it also reveals something deeper: every square can be cancelled out by a non-square. There is no “excess” square that stands alone.

Now change just one non-square into a square, so squares appear 6 out of 10 times:

Try the same pairing strategy:

[pairing demo: 4 pairs cancel; 2 squares are left unpaired]

Four pairs cancel, but two squares have no opponent. They cannot be cancelled out.

This leftover is unavoidable. It doesn’t depend on how you form the pairs of different elements. It’s a direct consequence of the pigeonhole principle: there simply aren’t enough “attackers” to destroy all copies of the majority.

What if we go in the other direction? Change a square into another shape:

[pairing demo: 5 pairs cancel; not every pair contains a square]

In this example, not all pairs contain a square, but the idea is the same: every square can be cancelled out by a non-square.

Does this mean that, when no shape is a majority, any way of pairing the elements will cancel everything out? Not really. Look at different ways of pairing the same sequence and see how the “survivor” shape changes:

[pairing demo: 4 pairs cancel]
Two circles cannot be paired up

Or this other way:

[pairing demo: the same elements paired differently; 4 pairs cancel]
Two squares cannot be paired up

When a majority exists, perfect cancellation is impossible — at least one copy must survive. When no majority exists, the outcome is ambiguous — any leftover is a coincidence, not a guarantee.

Now suppose the sequence is unknown. You only know there are 10 elements total, and that exactly two squares survived. What can we infer?

[pairing demo: 10 unknown elements; 4 pairs cancelled, 2 squares survive]
Two squares cannot be paired up

We know there must be at least 2 squares — the survivors themselves. But what about the maximum? Since 2 items were left unpaired, 8 items were paired off — forming 4 pairs. It is possible for every pair to consist of one square and one non-square. That means at most 4 squares could have been cancelled. Adding the survivors back in, the total count of squares cannot exceed 4 + 2 = 6.
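The same arithmetic can be checked directly. A tiny sketch with the numbers from this example (10 elements, 2 survivors):

```python
n, survivors = 10, 2
pairs = (n - survivors) // 2   # 4 pairs were cancelled
max_count = survivors + pairs  # at most one square in each cancelled pair
print(pairs, max_count)        # → 4 6
```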

[pairing demo: the extreme case where every cancelled pair contains one square, giving 4 + 2 = 6 squares in total]
Two squares cannot be paired up

When no majority exists, the pairing process can look arbitrary, but it is not meaningless. Cancellation still removes matched pairs, which constrains how often the final survivor could have appeared.

We cannot recover the order or the exact count, but we can place the survivor’s true frequency within a bounded range.

Turning Cancellation into a Streaming Algorithm

The cancellation idea is beautiful, but pairing up elements requires seeing them all at once. Can we do it one element at a time, using only constant space?

Yes. We just need two variables:

  • A candidate shape (our current best guess).
  • A strength counter (how many unmatched copies we’ve seen).

The rules are simple:

  1. If strength is 0, the incoming shape becomes the new candidate (strength = 1).
  2. If the incoming shape matches the candidate, strength goes up (+1).
  3. If it doesn’t match, they cancel each other out and strength goes down (−1).

Step 3 is the key — it’s the pairwise cancellation happening implicitly. Every decrement simulates throwing away one copy of the candidate and one of the incoming shape.

At the end, whatever candidate remains survived all cancellations. If a majority existed, it’s guaranteed to be this survivor.

Try stepping through the algorithm below. Watch how the candidate changes with each new element until the true majority stabilizes:

[interactive demo: step through a 12-element stream; the candidate starts at ? with strength 0 and updates with each element]

This algorithm is known as the Boyer–Moore majority vote algorithm. It is elegant, efficient, and surprisingly powerful.

Can We Trust the Result?

As discussed before, we cannot assume the final strength equals the candidate’s actual count. The algorithm guarantees that if a majority exists, it will be the surviving candidate. But if no majority exists, it will still output something — a leftover that happened to survive by luck.

For example, in a stream like

the algorithm will output the triangle with strength 1, even though none of the three shapes has a majority.

To be certain, you’d need a second pass to count the actual occurrences of the triangle. But that’s O(N) additional work. Note this is still better than the naive counting approach, which needs to keep track of all candidates from the start. This second pass only needs to count one type of shape, ignoring the rest.
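The second pass is a single targeted count. A minimal sketch:

```python
def verify_majority(stream, candidate):
    """Second pass: count only the candidate's occurrences, O(1) extra space."""
    total = 0
    count = 0
    for element in stream:
        total += 1
        if element == candidate:
            count += 1
    return count > total / 2

print(verify_majority(["a", "b", "a", "a"], "a"))  # → True
```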

Can we avoid this second pass? Not with certainty — but we can bound the true count using only the final strength s and total length N:

s ≤ C ≤ s + ⌊(N − s) / 2⌋

where C is the actual count of the candidate.

  • The minimum is s: each unit of strength is one net unmatched copy.
  • The maximum adds back the cancellations: each one consumed one candidate and one other shape, so at most ⌊(N − s) / 2⌋ candidates were cancelled.

If the lower bound already exceeds N/2, you have a guaranteed majority without any second pass. The demo above shows these bounds when the algorithm finishes.
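These bounds are cheap to compute. A sketch using the final strength s and stream length N:

```python
def candidate_count_bounds(strength, n):
    """Bounds on the true count of the surviving candidate."""
    low = strength                         # each unit of strength is one net unmatched copy
    high = strength + (n - strength) // 2  # at most this many copies were cancelled away
    return low, high

low, high = candidate_count_bounds(2, 12)
print(low, high)     # → 2 7
print(low > 12 / 2)  # → False: a second pass is still needed to confirm
```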

Going Distributed

Here’s where this gets powerful. The state of the algorithm is just two values: (candidate, strength). This tiny summary is mergeable.

Suppose you split the data into partitions and run the algorithm independently on each. To combine two summaries (C₁, S₁) and (C₂, S₂):

  • If C₁ = C₂: they agree — sum the strengths → (C₁, S₁ + S₂).
  • If C₁ ≠ C₂: they disagree — the stronger one survives with the difference → (C_w, |S₁ − S₂|).

This is exactly the cancellation principle again, applied to summaries instead of individual shapes. The merge itself is O(1).

Compare this to the naive approach: sending entire dictionaries and merging millions of key-value pairs. Here, you’re sending and merging two numbers.

Try it below. Even when different partitions produce different local winners, the global merge correctly surfaces the true majority:

[interactive demo: three partitions produce local summaries with strengths 2, 0, and 0; the global merge yields a candidate with strength 2. Actual count: 7/12 ✓ majority. Without a second pass, the bound gives count in [2, 7] out of 12.]

Implementation

The entire algorithm fits in a few lines:

def majority_vote(stream):
    candidate = None
    strength = 0

    for element in stream:
        if strength == 0:           # rule 1: adopt a new candidate
            candidate = element
            strength = 1
        elif element == candidate:  # rule 2: reinforce the candidate
            strength += 1
        else:                       # rule 3: cancel one copy of each
            strength -= 1

    return candidate, strength

And the merge:

def merge(summary_a, summary_b):
    c1, s1 = summary_a
    c2, s2 = summary_b

    if c1 == c2:
        return c1, s1 + s2  # same candidate: strengths add up
    elif s1 >= s2:
        return c1, s1 - s2  # different candidates: the stronger survives
    else:
        return c2, s2 - s1

Complexity

                   Time         Space
Naive (counting)   O(N)         O(K)
Sorting            O(N log N)   O(1)
Boyer–Moore        O(N)         O(1)

Linear time, constant space. You literally can’t do better.

And for distributed workloads, the merge cost per pair of partitions is O(1) instead of O(K). When K is in the millions, this is the difference between a query that takes hours and one that takes seconds.

Pushing the idea further

The algorithm keeps a single candidate and a single counter. That choice is deliberate — it is the smallest possible summary. But it naturally raises a question: what if we kept more than one candidate?

With two or more candidates, fewer shapes would be forced to compete for the same slot. Intuitively, this could preserve more information about the stream and improve the quality of the estimate when no strict majority exists. Whether that extra precision is worth the extra state is a separate question.
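The multi-candidate idea is known in the literature as the Misra–Gries summary: with k counters, any element appearing more than N/(k+1) times is guaranteed to survive as a candidate (k = 1 recovers Boyer–Moore). A minimal sketch:

```python
def misra_gries(stream, k):
    """Track up to k candidates; any element with frequency > N/(k+1)
    is guaranteed to end up among them."""
    counters = {}
    for element in stream:
        if element in counters:
            counters[element] += 1
        elif len(counters) < k:
            counters[element] = 1
        else:
            # all slots taken: cancel one copy of every candidate
            # against the incoming element
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```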

A similar question appears if we relax another assumption. So far, every shape contributes a unit count. But what if shapes carried weights instead? The problem then becomes identifying which shape contributes more than half of the total weight, not just half of the appearances.
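The weighted variant only changes how strength is updated. A sketch (an extension not spelled out above; it assumes the stream yields (shape, weight) pairs with positive weights):

```python
def weighted_majority_vote(stream):
    """Boyer-Moore over weighted elements: if one shape holds more than
    half of the total weight, it is the surviving candidate."""
    candidate, strength = None, 0
    for element, weight in stream:
        if element == candidate:
            strength += weight
        else:
            strength -= weight
            if strength < 0:
                # the incoming element overpowers the current candidate
                candidate, strength = element, -strength
    return candidate, strength

print(weighted_majority_vote([("a", 5), ("b", 2), ("c", 2)]))  # → ('a', 1)
```

With all weights equal to 1, this behaves exactly like the unit-count algorithm; as before, the result should be confirmed with a second pass when no weighted majority is guaranteed.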

Both ideas point in the same direction: Boyer–Moore is not an isolated trick, but the simplest instance of a broader family of cancellation-based summaries. What changes is not the core logic, but how much state we are willing to keep — and what kind of dominance we want to detect.
