The CAP Theorem - the Theory and the Practice
In short, out of C, A, and P, you can’t sacrifice partition tolerance.
👋 Hi, this is Thomas, with a new issue of “Beyond Code: System Design and More”, where I geek out on all things system design, software architecture, distributed systems and… well, more.
As organizations increasingly depend on complex and distributed software systems, the importance of effective system analysis and design (SAD) grows. This discipline combines principles from computer science, information technology, and business management to develop and enhance software systems.
Key aspects of SAD include adopting a lean, iterative approach, leveraging appropriate tools, and engaging stakeholders. One crucial concept in the SAD toolkit is the CAP theorem.
This theorem helps critique distributed system designs, highlighting necessary trade-offs and providing a framework to create robust, scalable systems aligned with long-term strategic goals.
The Cap Theorem
The CAP theorem, also known as Brewer’s theorem, is a cornerstone principle in distributed system design, addressing three critical characteristics: consistency, availability, and partition tolerance.
Consistency: Ensures that all clients see the same data simultaneously, regardless of the node they connect to. Every data write is immediately propagated across all nodes, meaning you always read the latest data or newer.
Availability: Guarantees the system continues to operate and respond to client requests, even if some nodes are down. Every request receives a valid response, whether it's a read or write operation.
Partition Tolerance: Allows the system to function despite network “partitions” (i.e. communication breakdowns between nodes).
When designing distributed systems, teams often aim to achieve both consistency and availability. In *theory* it should be possible.
Especially when visualized using a Venn diagram where “C” and “A” overlap, giving the impression that this is indeed a feasible option.
In practice, achieving both availability and consistency simultaneously is only possible in synchronous models on a perfect network, which is unrealistic. Thus, partition tolerance (P) is a constant.
The CAP theorem states that given P, you must choose either A (availability) or C (consistency). This means during network partitions, you either keep accepting writes (and, potentially, corrupt data) or stop accepting writes to maintain data integrity.
This is a critical insight of the CAP theorem: partition tolerance is non-negotiable in any distributed system—since network failures are inevitable.
Software developers must strategically choose between consistency and availability when partitions occur, carefully evaluating the tradeoffs based on their specific use case.
For example, a banking app prioritizes consistency to avoid account discrepancies, even if it means temporary service denial during a partition. Conversely, a social network like X (formerly Twitter) prioritizes availability to ensure continuous user interaction, even if some data might be outdated.
Understanding and applying the CAP theorem involves practical decision-making aligned with business goals. System architects must evaluate their system’s requirements and constraints to strike the right balance between these properties, fundamentally impacting system architecture and reliability.
Despite numerous discussions and mathematical proofs validating the CAP theorem, it can be rigid in practice. Not all network partitions and consistency levels are equal. Two common models adding flexibility to the CAP theorem are the Yield/Harvest models and PACELC.
I originally wrote about this topic in this article:
I explored these topics:
Foundations of system analysis and design
System analysis phases and steps
System analysis and design patterns
Architecture evaluation
The CAP theorem
Best practices in system analysis
📚 Interesting Articles & Resources
“An Illustrated Proof of the CAP Theorem” - Michael Whittaker
A summary of Gilbert and Lynch's specification and proof of the CAP Theorem with pictures.
“You Can’t Sacrifice Partition Tolerance” - Coda Hale
Any distributed databases that describe themselves as being “CA” means that they do not understand the CAP theorem and its implications.
“A Distributed Systems Reading List” - Fred Hebert
A collection of useful resources and definitions about distributed systems, to serve as a quick reference to understand this space and the possibilities around it.