Imagine this: you’re planning your next getaway, and the last thing you want is to vacation in an area teeming with COVID infections. Being the data-driven person you are, you decide to download an open dataset of COVID infections across your country—each infected person marked as a latitude-longitude point (yeah, I know this dataset wouldn’t be GDPR compliant, but let’s roll with it for this example).

When you plot all these points on a map, you end up with a chaotic scatter of dots:

At first glance, it’s nearly impossible to spot any clusters or patterns among the noise. So, you think, “How can I make sense of this mess?” The smart move is to aggregate the data. You overlay a grid onto the map and break the country into larger zones. Then, using a choropleth with a well-chosen color scale, you paint areas with fewer infections in calming green and those with more infections in alarming red.

Now the choropleth below is practically yelling, “Avoid the east-central region, COVID is going wild there!” With one quick glance you can pinpoint the areas with the fewest infections, which look like the perfect places for a safe, relaxing holiday.

Well, here’s where things get tricky. Notice that these squares aren’t set in stone; they exist purely by arbitrary choice. Nothing in the data dictates where the grid sits. Now imagine we shift the grid just a little to one side: the way the data aggregates could change completely.

→

The underlying data is exactly the same; the dots have not been modified. However, the east-central part of the map no longer looks dangerous. In general, all areas look pretty much safe except for one reddish square on the right. Same data, different results, and they lead us to completely different decisions.

That is what the Modifiable Areal Unit Problem (MAUP) is all about. It is a fancy name for something that boils down to: the way you group your data can massively affect the results you get.

Why Does This Happen?

It all comes down to aggregation. When the map’s dots (each representing an infected individual) are grouped into zones, the size, shape, and position of those zones can change the numbers you see. A tiny shift in the grid could lump a few extra dots into one cell, making it look like a hotspot. Conversely, a different configuration might spread out those dots more evenly, making the area appear safer than it really is.
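To make the grid-shift effect concrete, here is a minimal sketch in plain Python. The `bin_points` helper and the eight sample points are made up for illustration (they are not the data behind the maps above); the only thing that changes between the two runs is the grid origin.

```python
from collections import Counter
import math


def bin_points(points, cell_size, origin=(0.0, 0.0)):
    """Count points per square grid cell.

    Each cell is identified by its integer (column, row) index,
    measured from `origin`.
    """
    ox, oy = origin
    counts = Counter()
    for x, y in points:
        col = math.floor((x - ox) / cell_size)
        row = math.floor((y - oy) / cell_size)
        counts[(col, row)] += 1
    return counts


# Hypothetical data: eight cases clustered around the point (1.0, 1.0).
points = [
    (0.9, 0.9), (0.8, 0.8),
    (0.9, 1.1), (0.8, 1.2),
    (1.1, 0.9), (1.2, 0.8),
    (1.1, 1.1), (1.2, 1.2),
]

# Same points, same cell size; only the grid origin moves by half a cell.
aligned = bin_points(points, cell_size=1.0, origin=(0.0, 0.0))
shifted = bin_points(points, cell_size=1.0, origin=(0.5, 0.5))

print("origin (0.0, 0.0): busiest cell has", max(aligned.values()), "cases")
print("origin (0.5, 0.5): busiest cell has", max(shifted.values()), "cases")
```

With the first origin, a grid line runs straight through the cluster and splits it into four cells of two cases each. Shift the grid by half a cell and all eight cases land in a single cell, which would light up as a hotspot on the map.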

Arbitrary Boundaries: Whether you’re using squares, hexagons, or even administrative regions, the lines that divide the map are, in many cases, arbitrary. A small nudge of these boundaries might push a few infection dots from one cell into another, changing the count in both cells.
Scale/Resolution: The cell size matters too. Large cells smooth out the data, potentially hiding small clusters of infections. On the other hand, smaller cells can highlight micro-clusters that may not be significant when viewed in a broader context.
Shape Differences: Different shapes capture data differently. In a hexagonal grid, every neighboring cell shares an edge and sits at the same center-to-center distance, whereas a square grid mixes edge neighbors with more distant corner neighbors. But again, that doesn’t mean one shape is inherently “better”; it just means your conclusions might vary depending on your choice.
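The scale effect can be sketched the same way. The snippet below is an illustration under made-up assumptions: the `count_per_cell` helper and the sample points are hypothetical, and the grid is always anchored at (0, 0) so that only the cell size varies.

```python
from collections import Counter
import math


def count_per_cell(points, cell_size):
    """Count points per square cell of side `cell_size`, grid anchored at (0, 0)."""
    return Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )


# Hypothetical data inside a 4 x 4 area.
points = [
    (0.2, 0.2), (0.7, 0.3), (0.3, 0.8),   # loose cluster A
    (3.6, 3.7), (3.8, 3.6), (3.7, 3.9),   # tight cluster B
    (1.5, 2.5), (2.5, 1.5),               # two isolated cases
]

# For each cell size, record (number of occupied cells, largest cell count).
summary = {}
for cell_size in (0.5, 2.0, 4.0):
    counts = count_per_cell(points, cell_size)
    summary[cell_size] = (len(counts), max(counts.values()))
    print(f"cell size {cell_size}: {summary[cell_size][0]} occupied cells, "
          f"largest count {summary[cell_size][1]}")
```

At cell size 0.5, only the tight cluster B stands out (one cell with three cases), while cluster A is scattered across three cells. At 2.0, A and B look identical, with three cases each. At 4.0, everything collapses into one cell of eight cases and the spatial pattern disappears.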

Play With It

In the interactive version of this notebook, you can tweak parameters such as grid type, orientation, cell size, and number of dots, and even apply a Gaussian filter to smooth the data. Unfortunately, those controls aren’t available here, but feel free to experiment by changing the props passed to the <MaupSimulator /> component!

Solutions to MAUP

That’s content for a different note!

Conclusion

In the end, MAUP reminds us that there’s no one “correct” way to see the world—just different perspectives. A slight tweak in your grid can change the story entirely, so never settle for just one view. Explore multiple configurations, question the boundaries, and let curiosity guide you to truly informed insights.