Recently I looked through Lamport’s Paxos papers again and in retrospective, I feel Paxos as an algorithm is very elegant and easy to understand. In this post I’ll try to summary the core concepts and key steps of the single decree Paxos protocol. In future posts, I’ll write more about multi-paxos, and compare multi-paxos with similar protocols such as ZAB and Raft.
Terminology
This post will use the usual terminologies for the protocol entities from the Paxos Made Simple paper, such as Proposer, Acceptor, Learner, Leader, etc.
The Problem
Paxos is used to solve consensus problem. A consensus is about reaching agreement on a single value, among a set of values. Each value in the set is proposed by a proposer.
Assumptions / Use Case Constraints
Paxos can solve consensus problem under a set of constraints, and these constraints happen to reflect real world use case constraints for distributed systems, thus it’s very practical and applicable. In particular:
- Fail-Recovery: each participant can fail. Failed participants can recover.
- Asynchronous: participants communicate with each other through async messages.
- None Byzantine Faults: These messages can be delayed, or lost, but will never be forged.
It’s also worth noting that to reach consensus, a set of additional protocol level constraints have to be exposed on participants. One example is each participant can’t just insist on the value from its own, because if everyone does that, it’s possible no consensus could ever be made. In other words, participants are more interested on reach consensus on something, but they are not interested on what exactly the value of that agreement is.
Safety and Liveness
Because of FLP impossibility, it’s not possible to reach consensus with the constraints Paxos protocol is exposed to, yet maintain both safety and liveness properties. Paxos chooses safety over liveness.
The safety properties Paxos guarantees:
- Only a value that has been proposed may be chosen,
- Only a single value is chosen, and
- A process never learns that a value has been chosen unless it actually has been.
Paxos does not guarantee liveness, in other words it does not guarantee making progress, or terminate.
For making progress, it’s possible to have live lock (which can be fixed easily through backoffs and randomization).
For termination, it could never terminate if not enough number of participants (the quorum) are live.
Protocol Constraints
I feel it’s easier to understand the protocol by understanding the protocol constraints exposed to participants. There are two key constraints, one on the proposer, one on the acceptor.
Proposer: a proposer must be willing to change the value it proposes, by using the value it acquires from the prepare phase. A proposer can propose any value though, if in prepare phase the queries on a quorum of acceptors returns null value (meaning those acceptors haven’t accepted anything.).
Acceptor: an acceptor must be willing to reject the proposals it receives, unless the proposal it receives has a proposal number that’s equal or greater than the maximum of all proposal numbers that the acceptor ever see during the prepare phase. In other words, acceptor has to make a promise upon receiving a proposal with a bigger proposal number than it ever sees, and the promise is to not accept any future proposals with smaller proposal number.
Phases
Paxos is a two phased protocol. There are prepare phase, and accept phase. Prepare phase is used for proposer to get an updated view of the quorum and decide what to propose (e.g. update the proposal value from the response of the acceptor in prepare request if the proposal was rejected due to a smaller proposer number). Accept phase is to finish the protocol by having acceptors record the consensus and responds to proposer.
To Be Continued…