Distributed transactions — two-phase commit and sagas explained
distributed transaction problem, two-phase commit, 2PC coordinator failure, saga pattern, choreography vs orchestration, compensating transactions
The Distributed Transaction Problem
In a microservices system, an operation that spans multiple services (for example, placing an order: charge payment, reserve inventory, schedule delivery) must either fully succeed or fully fail. Traditional database transactions don't span service boundaries: each service typically owns its own database, so no single local transaction can cover every step.
Two-Phase Commit (2PC)
- Phase 1 (Prepare): coordinator asks all participants if they can commit. Each participant locks resources and responds yes/no.
- Phase 2 (Commit): if all said yes, coordinator sends commit. Otherwise, sends rollback.
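Problem: 2PC is a blocking protocol. If the coordinator crashes after Phase 1, participants that voted yes cannot decide the outcome on their own and keep holding their locks until the coordinator recovers. It also adds latency, because every transaction needs at least two round trips to each participant while locks are held. For these reasons it is rarely used in modern microservices.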
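The two phases can be sketched in a few lines of Python. This is only an in-memory illustration under simplifying assumptions: the `Participant` class, its `state` field, and the `two_phase_commit` function are hypothetical names, there is no network, no timeouts, and no durable log, all of which a real coordinator would need.

```python
from dataclasses import dataclass
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


@dataclass
class Participant:
    """Hypothetical in-memory participant; a real one would persist its state."""
    name: str
    can_commit: bool = True
    state: str = "idle"  # idle -> prepared -> committed / rolled_back

    def prepare(self) -> Vote:
        if self.can_commit:
            self.state = "prepared"  # resources stay locked from here on
            return Vote.YES
        return Vote.NO

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


def two_phase_commit(participants: list[Participant]) -> bool:
    # Phase 1 (Prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = all(v is Vote.YES for v in votes)
    # A coordinator crash at this point would leave every "prepared"
    # participant blocked, holding its locks with no decision.
    # Phase 2 (Commit): broadcast the decision to everyone.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.rollback()
    return decision
```

In this sketch, a single participant created with `can_commit=False` causes every participant to end in the `rolled_back` state, which is the all-or-nothing behavior the protocol is meant to provide.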
Saga Pattern
Break the transaction into a sequence of local transactions, each committed within a single service. Each step publishes an event that triggers the next. If a step fails, execute compensating transactions for the steps that already completed, in reverse order.
# Saga for order placement
# Step 1: charge payment
payment_service.charge(order) # publishes: PaymentCharged
# Step 2: reserve inventory (triggered by PaymentCharged)
inventory_service.reserve(order) # publishes: InventoryReserved
# On failure at step 2: compensating transaction
def on_inventory_failed(order):
    payment_service.refund(order) # compensates step 1

Sagas are eventually consistent: there's a window where payment is charged but inventory isn't reserved yet. Design compensating transactions carefully.
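The run-then-compensate logic can also be shown as a small runnable sketch. Note the simplification: instead of services reacting to each other's events, a single in-process function (`run_saga`) drives the steps, and the "events" are just strings appended to the order. `SagaStep`, `run_saga`, `ORDER_SAGA`, and the `qty`/`stock` fields are hypothetical names chosen for illustration, not part of any library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SagaStep:
    name: str
    action: Callable[[dict], None]      # the local transaction
    compensate: Callable[[dict], None]  # undoes the action if a later step fails


def run_saga(steps: list[SagaStep], order: dict) -> bool:
    completed: list[SagaStep] = []
    for step in steps:
        try:
            step.action(order)
            completed.append(step)
        except Exception:
            # Undo the steps that already committed, newest first.
            for done in reversed(completed):
                done.compensate(order)
            return False
    return True


def charge(order: dict) -> None:
    order["events"].append("PaymentCharged")


def refund(order: dict) -> None:
    order["events"].append("PaymentRefunded")


def reserve(order: dict) -> None:
    if order["qty"] > order["stock"]:
        raise RuntimeError("out of stock")
    order["events"].append("InventoryReserved")


def release(order: dict) -> None:
    order["events"].append("InventoryReleased")


ORDER_SAGA = [
    SagaStep("charge payment", charge, refund),
    SagaStep("reserve inventory", reserve, release),
]

order = {"qty": 2, "stock": 1, "events": []}
succeeded = run_saga(ORDER_SAGA, order)
# succeeded is False; order["events"] is ["PaymentCharged", "PaymentRefunded"]
```

Between the `PaymentCharged` and `PaymentRefunded` entries the order is in exactly the inconsistent window described above. In a real system the compensations themselves can fail or be delivered twice, so they would likely need to be retried and made idempotent.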
