nervico-team · software-development · 9 min read
Event-Driven Architecture: Patterns and Anti-Patterns
Practical guide to event-driven architecture: fundamental patterns, common anti-patterns, when to use events vs synchronous calls, and how to implement EDA without losing control.
LinkedIn processes more than 5 trillion events per day through its messaging platform. Netflix uses event-driven architecture to coordinate video encoding, recommendations, and content delivery to over 260 million subscribers. Uber processes millions of events per second to coordinate drivers, riders, payments, and maps in real time.
These numbers are impressive, but also misleading. Most projects do not need to process trillions of events. However, event-driven architecture (EDA) solves real problems that appear in systems of any size: decoupling between components, asynchronous processing, and the ability to react to state changes flexibly.
The problem is that poorly implemented EDA can be worse than not having it: lost events, duplicate processing, impossible-to-debug flows, and operational complexity that far exceeds the benefits.
This guide covers the patterns that work, the anti-patterns to avoid, and a decision framework for knowing when EDA makes sense in your project.
What Event-Driven Architecture Is
Fundamental Concepts
In an event-driven architecture, system components communicate by emitting and reacting to events. An event is a notification that something has happened: a user registered, an order was placed, a payment was processed.
Three basic components:
Producer: The component that emits the event. It publishes the event and does not care about what happens next.
Broker: The infrastructure that transports events from the producer to consumers. Common examples are Kafka, RabbitMQ, Amazon SNS/SQS, and Google Pub/Sub.
Consumer: The component that receives and processes the event. It reacts to the event by executing its own logic.
Key difference from synchronous communication: In a synchronous call (REST, gRPC), the sender waits for a response. In event-based communication, the producer waits for nothing. It emits the event and moves on. The consumer will process it when it can.
Types of Events
Notification events: They inform that something happened but do not contain all the data needed to process it. The consumer needs to make an additional call to get details.
Example: OrderPlaced { orderId: "123" } — The consumer needs to call the orders service to get details.
Event-carried state transfer events: They contain all the information needed for the consumer to process the event without additional calls.
Example: OrderPlaced { orderId: "123", items: [...], total: 299.00, shippingAddress: {...} }
Domain events: They represent significant business facts. They are part of the domain’s ubiquitous language.
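The general recommendation: use event-carried state transfer when possible, limited to data that naturally belongs to the event. It reduces runtime coupling and latency because consumers do not need to call back to the producer.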
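As a rough illustration of the difference between the first two event types, the sketch below models both as Python dataclasses. The names `OrderPlacedNotification`, `OrderPlacedWithState`, `ORDERS`, and `fetch_order` are hypothetical and only mirror the examples above; `fetch_order` stands in for a real call to the orders service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlacedNotification:
    """Notification event: carries only the identifier."""
    order_id: str


@dataclass(frozen=True)
class OrderPlacedWithState:
    """Event-carried state transfer: carries the data consumers need."""
    order_id: str
    items: tuple
    total: float
    shipping_address: str


# Hypothetical stand-in for the orders service's own data.
ORDERS = {"123": {"items": ("book",), "total": 299.00, "shipping_address": "Example St 1"}}


def fetch_order(order_id: str) -> dict:
    """Stands in for the extra call a notification consumer must make."""
    return ORDERS[order_id]


def total_from_notification(event: OrderPlacedNotification) -> float:
    # Extra round trip: the event alone is not enough.
    return fetch_order(event.order_id)["total"]


def total_from_state(event: OrderPlacedWithState) -> float:
    # No extra call: the event already carries the data.
    return event.total
```

Both functions return the same total, but only the first depends on the orders service being reachable when the event is consumed.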
Fundamental Patterns
Publish-Subscribe
The most basic pattern. A producer publishes an event to a topic, and all consumers subscribed to that topic receive it.
How it works:
- The orders service publishes OrderPlaced to the “orders” topic
- The notifications service (subscribed to the topic) sends an email to the customer
- The inventory service (subscribed to the topic) reserves the products
- The analytics service (subscribed to the topic) records the metric
Advantages:
- Total decoupling. The producer does not know the consumers
- Extensibility: adding a new consumer does not require changing the producer
- Each consumer processes the event at its own pace
Disadvantages:
- No guarantee of processing order between consumers
- Difficult to know if all consumers processed the event correctly
- Can create hidden dependencies that are difficult to trace
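The flow described above can be sketched with a toy in-process broker. This is only an illustration of the pattern: `InMemoryBroker` is a hypothetical class that delivers synchronously and keeps nothing durable, unlike a real broker.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class InMemoryBroker:
    """Toy broker: delivers synchronously, in process. Not a real broker."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # The producer never sees who receives the event.
        for handler in self._subscribers[topic]:
            handler(event)


broker = InMemoryBroker()
log: List[str] = []

# Three independent consumers subscribed to the same topic.
broker.subscribe("orders", lambda e: log.append(f"email for order {e['orderId']}"))
broker.subscribe("orders", lambda e: log.append(f"reserve stock for order {e['orderId']}"))
broker.subscribe("orders", lambda e: log.append(f"record metric for order {e['orderId']}"))

# The orders service publishes once; every subscriber reacts.
broker.publish("orders", {"type": "OrderPlaced", "orderId": "123"})
```

Adding a fourth consumer is one more `subscribe` call; the publishing code does not change.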
Event Sourcing
Instead of storing the current state of an entity, you store the sequence of events that brought it to that state.
Example: Instead of storing “the account balance is 1,500 euros,” you store:
- AccountCreated { balance: 0 }
- DepositMade { amount: 2,000 }
- WithdrawalMade { amount: 500 }
The current state is calculated by replaying all events.
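A minimal sketch of that replay, using the three events from the example. The dictionary layout and the `apply` and `replay` function names are assumptions made for illustration, not a prescribed event store format.

```python
from functools import reduce

events = [
    {"type": "AccountCreated", "balance": 0},
    {"type": "DepositMade", "amount": 2000},
    {"type": "WithdrawalMade", "amount": 500},
]


def apply(balance: int, event: dict) -> int:
    """Apply one event to the current balance and return the new balance."""
    if event["type"] == "AccountCreated":
        return event["balance"]
    if event["type"] == "DepositMade":
        return balance + event["amount"]
    if event["type"] == "WithdrawalMade":
        return balance - event["amount"]
    raise ValueError(f"unknown event type: {event['type']}")


def replay(history: list) -> int:
    """Fold the event history into the current state."""
    return reduce(apply, history, 0)


print(replay(events))  # 1500
```

Replaying a prefix of the history, for example `replay(events[:2])`, gives the balance as it was before the withdrawal, which is what makes point-in-time reconstruction possible.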
When it makes sense:
- Financial systems where you need a complete audit trail
- Systems where you need to reconstruct state at any point in time
- Collaboration systems where multiple users modify the same data concurrently
When it does not make sense:
- Simple CRUDs where current state is all you need
- Systems with complex query requirements (queries on event stores are difficult)
- Teams without event sourcing experience (the learning curve is significant)
Real complexity: Event sourcing adds considerable complexity. You need to handle event versioning, projections for queries, snapshots for performance, and a mental model different from traditional development. Do not implement it unless you have clear reasons.
CQRS (Command Query Responsibility Segregation)
Separates write operations (commands) from read operations (queries) into different models.
How it works:
- Write model: optimized for processing commands and maintaining consistency
- Read model: optimized for answering queries quickly
- Changes in the write model propagate to the read model through events
Practical example:
- When a user places an order, the command is processed in a normalized database (PostgreSQL)
- An OrderPlaced event updates a materialized view in Elasticsearch, optimized for searches and listings
- Frontend queries go directly to Elasticsearch
Advantages:
- Each model is optimized for its purpose
- You can scale reads and writes independently
- Allows multiple representations of the same data
Disadvantages:
- Eventual consistency between write and read models
- Data duplication
- Significant operational complexity
- More infrastructure to maintain
Golden rule: CQRS is appropriate when read and write patterns are fundamentally different in volume, complexity, or performance requirements. If your reads and writes are similar, CQRS is over-engineering.
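To make the separation concrete, here is a small in-memory sketch under simplifying assumptions: plain dictionaries stand in for PostgreSQL and Elasticsearch, a list stands in for the broker, and all function names are hypothetical.

```python
from typing import Dict, List

# Write side: hypothetical normalized store, keyed by order ID.
write_store: Dict[str, dict] = {}
# Read side: hypothetical denormalized view, keyed by customer ID.
orders_by_customer: Dict[str, List[dict]] = {}
# Events waiting to be projected (a broker would sit here in practice).
pending_events: List[dict] = []


def place_order(order_id: str, customer_id: str, total: float) -> None:
    """Command handler: validates, writes, and emits an event."""
    if order_id in write_store:
        raise ValueError("order already exists")
    write_store[order_id] = {"customer_id": customer_id, "total": total}
    pending_events.append(
        {"type": "OrderPlaced", "order_id": order_id,
         "customer_id": customer_id, "total": total}
    )


def project_pending() -> None:
    """Projector: applies pending events to the read model."""
    while pending_events:
        event = pending_events.pop(0)
        if event["type"] == "OrderPlaced":
            orders_by_customer.setdefault(event["customer_id"], []).append(
                {"order_id": event["order_id"], "total": event["total"]}
            )


def list_orders(customer_id: str) -> List[dict]:
    """Query handler: reads only from the read model."""
    return orders_by_customer.get(customer_id, [])


place_order("123", "alice", 299.00)
stale = list_orders("alice")   # read model not updated yet
project_pending()
fresh = list_orders("alice")   # now reflects the order
```

The gap between `stale` and `fresh` is the eventual consistency listed among the disadvantages: the write has succeeded, but queries do not see it until the projector has run.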
Saga Pattern
Sagas coordinate distributed transactions across multiple services using events.
Example: A purchase process involves:
- Reserve inventory
- Process payment
- Confirm shipping
If payment fails after reserving inventory, you need to compensate: release the inventory reservation.
Two approaches:
Choreography: Each service listens for events and decides what to do. There is no central coordinator. The inventory service emits InventoryReserved, the payment service listens and emits PaymentProcessed or PaymentFailed, and the inventory service listens for PaymentFailed to release the reservation.
Orchestration: A coordinator service (saga orchestrator) directs the flow. It sends commands to each service and decides what to do based on responses.
Choreography vs orchestration:
| Aspect | Choreography | Orchestration |
|---|---|---|
| Complexity | Distributed, hard to follow | Centralized, easier to understand |
| Coupling | Low | Medium (orchestrator knows everyone) |
| Failure points | Multiple, spread across services | Single (the orchestrator) |
| Best for | Simple flows (2-3 steps) | Complex flows (4+ steps) |
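The orchestration approach can be sketched as a loop that runs each step and, on failure, compensates the completed steps in reverse order. This is a simplified in-process illustration: `run_saga` and the step tuples are hypothetical, and the lambdas stand in for commands that a real orchestrator would send to other services.

```python
from typing import Callable, List, Tuple

# (name, action, compensation)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        name, action, _compensate = step
        try:
            action()
            log.append(f"done: {name}")
            completed.append(step)
        except Exception as exc:
            log.append(f"failed: {name} ({exc})")
            for done_name, _action, compensate in reversed(completed):
                compensate()
                log.append(f"compensated: {done_name}")
            return False
    return True


def fail_payment() -> None:
    raise RuntimeError("card declined")


log: List[str] = []
ok = run_saga(
    [
        ("reserve inventory", lambda: None, lambda: None),
        ("process payment", fail_payment, lambda: None),
        ("confirm shipping", lambda: None, lambda: None),
    ],
    log,
)
```

Here the payment step fails, so the inventory reservation is compensated and shipping is never attempted. A production orchestrator would also need to persist its progress so it can resume after a crash, which this sketch leaves out.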
Outbox Pattern
Guarantees that an event is published, at least once, if and only if the database transaction commits. It solves the dual write problem: how to update the database and publish an event atomically.
The problem: If you update the database and then publish the event, the database can be updated while the event is never published (for example, if the broker is down). The reverse order fails too: if you publish first and the database transaction then fails, you have announced a change that never happened.
The solution:
- In the same database transaction, write the data AND a record in an “outbox” table
- A separate process (CDC with Debezium, or a poller) reads the outbox table and publishes events
- Once published, it marks the record as processed
When to use it: Whenever you need to guarantee that a state change in the database is reliably communicated to other services.
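The steps of the solution can be sketched with SQLite from the Python standard library and a simple poller. The table layout and the `place_order` and `relay_once` names are assumptions for illustration; a real setup would typically use the service's own database and a CDC tool or a scheduled poller.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL)")
conn.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "payload TEXT NOT NULL, "
    "processed INTEGER NOT NULL DEFAULT 0)"
)


def place_order(order_id: str, total: float) -> None:
    """Write the order and its outbox record in one transaction."""
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
        payload = json.dumps({"type": "OrderPlaced", "orderId": order_id, "total": total})
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))


def relay_once(publish) -> int:
    """Poller: publish unprocessed records, then mark them processed."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE processed = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))  # if this raises, the row stays unprocessed
        with conn:
            conn.execute("UPDATE outbox SET processed = 1 WHERE id = ?", (row_id,))
    return len(rows)


published = []  # stands in for the broker
place_order("123", 299.00)
relay_once(published.append)
```

If the relay crashes after publishing but before marking the record, the event is published again on the next run. The pattern therefore gives at-least-once publication, and consumers still need to be idempotent.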
Anti-Patterns You Must Avoid
Event Soup
Publishing events for everything without a clear design. The result is a system where hundreds of events flow in all directions and nobody understands the relationship between them.
Warning signs:
- You cannot draw the event flow between services in 5 minutes
- Event names do not follow a clear convention
- Multiple services react to the same event in contradictory ways
- Nobody knows what would happen if an event were removed
Solution: Document each event: who produces it, who consumes it, what data it carries, and why it exists. If you cannot justify an event’s existence, eliminate it.
Event-Driven CRUD
Using events for operations that would be simpler as direct synchronous calls. If service A needs to create a record in service B and wait for confirmation, a REST call is simpler and more reliable than publishing a CreateRecord event and waiting for a RecordCreated event.
Practical rule: If the producer needs the consumer’s response to continue, do not use events. Use a synchronous call.
Events as API
Designing events thinking about specific consumers rather than the domain. This creates subtle coupling: if a consumer needs a new field, it gets added to the event, which forces all other consumers to handle the change.
Solution: Events represent domain facts, not consumer needs. If a consumer needs additional data, it can combine multiple events or make a direct call to obtain it.
Lack of Idempotency
Not handling the possibility of receiving the same event twice. In distributed systems, exactly-once delivery is practically impossible. You need idempotent design: processing the same event twice must produce the same result as processing it once.
Practical implementation:
- Each event has a unique ID
- The consumer stores IDs of processed events
- Before processing an event, it checks if it was already processed
- If already processed, it silently discards it
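These four steps can be sketched as follows. `InventoryConsumer` and the event fields are hypothetical, and the in-memory set stands in for a durable store of processed IDs.

```python
from typing import Set


class InventoryConsumer:
    """Hypothetical consumer that reserves stock once per event ID."""

    def __init__(self, stock: int) -> None:
        self.stock = stock
        self._processed_ids: Set[str] = set()  # a durable store in practice

    def handle(self, event: dict) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        if event["eventId"] in self._processed_ids:
            return False  # already processed: discard silently
        self.stock -= event["quantity"]
        self._processed_ids.add(event["eventId"])
        return True


consumer = InventoryConsumer(stock=10)
event = {"eventId": "evt-1", "type": "OrderPlaced", "quantity": 2}
consumer.handle(event)
consumer.handle(event)  # redelivery of the same event
print(consumer.stock)  # 8
```

In a real consumer, recording the event ID and applying the state change should happen in the same database transaction. Otherwise a crash between the two can still cause a double or a missed update.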
Excessively Large Events
Including all possible data in every event “just in case.” The result is events of 50 KB or more carrying data that most consumers do not need.
Problem: More bandwidth, more storage, more serialization/deserialization time, and frequent event schema changes whenever any data changes.
Solution: Include only data that is natural for the event. Data that represents the fact that occurred, not data that some consumer might need.
Infrastructure: Which Technology to Use
Apache Kafka
Ideal for: High event volume, need for replay, multiple consumers processing at different rates.
Not ideal for: Small projects, simple point-to-point messaging, when sub-millisecond latency is critical.
Operational complexity: High. Kafka requires specialized knowledge to operate correctly.
RabbitMQ
Ideal for: Traditional messaging with complex routing patterns, work queues, point-to-point communication.
Not ideal for: High event volume with replay needs, when multiple consumers need to process the same events.
Operational complexity: Medium. Simpler than Kafka but still requires attention.
Amazon SNS/SQS
Ideal for: AWS projects that need simple pub/sub without managing infrastructure.
Not ideal for: When you need event replay or stream processing.
Operational complexity: Low. Managed service.
Decision Table
| Requirement | Kafka | RabbitMQ | SNS/SQS |
|---|---|---|---|
| High volume | Excellent | Good | Good |
| Event replay | Yes | Not with classic queues; streams only | No |
| Multiple consumers | Excellent | Good | Good |
| Ease of operation | Complex | Medium | Simple |
| Latency | Low (ms) | Very low (sub-ms) | Medium |
| Operational cost | High | Medium | Low |
When to Use (and When Not to) Event-Driven Architecture
Use EDA When
- You need to decouple services that evolve at different rates
- You have processes involving multiple services that can run asynchronously
- You need replay capability (reconstructing state from past events)
- Multiple consumers need to react to the same business fact
- You need to scale producers and consumers independently
Do Not Use EDA When
- Communication is inherently synchronous (the user is waiting for a response)
- The flow is simple and linear (A calls B, B responds)
- Your team has no experience with distributed systems
- Event volume is low and there is no need for decoupling
- Strong consistency is a non-negotiable requirement
Observability in Event-Driven Systems
The Traceability Problem
In a synchronous system, you can follow a request from start to finish with a trace ID. In an event-driven system, an event can trigger a chain of events that branches into multiple flows.
Solution: Correlation IDs. Each event carries a correlation ID that propagates to all derived events. Tools like Jaeger, Zipkin, or AWS X-Ray can visualize the complete chain.
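A minimal sketch of how a correlation ID might propagate through a chain of events. The `new_event` helper and the field names are assumptions for illustration; the `causationId` field is an addition not mentioned above, included because it records the direct parent of each event.

```python
import uuid
from typing import Optional


def new_event(event_type: str, data: dict, caused_by: Optional[dict] = None) -> dict:
    """Build an event, inheriting the correlation ID from its cause if any."""
    event_id = str(uuid.uuid4())
    return {
        "eventId": event_id,
        "type": event_type,
        # A root event starts a new chain; derived events reuse the chain's ID.
        "correlationId": caused_by["correlationId"] if caused_by else event_id,
        # causationId points at the direct parent, which helps rebuild the tree.
        "causationId": caused_by["eventId"] if caused_by else None,
        "data": data,
    }


order_placed = new_event("OrderPlaced", {"orderId": "123"})
inventory_reserved = new_event(
    "InventoryReserved", {"orderId": "123"}, caused_by=order_placed
)
payment_processed = new_event(
    "PaymentProcessed", {"orderId": "123"}, caused_by=inventory_reserved
)
```

All three events share one `correlationId`, so filtering logs or traces by that value returns the whole chain, regardless of which service emitted each event.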
Critical Monitoring
- Consumer lag: How many events are pending processing. If it grows, the consumer cannot keep up.
- Failed events: Events that could not be processed. They need dead letter queues and alerts.
- End-to-end latency: Time from when an event is produced until all consumers process it.
- Throughput: Events processed per second. You need to know the system’s maximum capacity.
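As an illustration of the failed-events item, the sketch below retries each event a fixed number of times and then moves it to a dead letter list. `consume_with_dlq` is a hypothetical helper; managed brokers usually provide dead letter queues as a configuration option rather than application code.

```python
from typing import Callable, List


def consume_with_dlq(
    events: List[dict],
    handler: Callable[[dict], None],
    dead_letters: List[dict],
    max_attempts: int = 3,
) -> int:
    """Process events; after max_attempts failures, move the event to the DLQ.

    Returns the number of events processed successfully.
    """
    processed = 0
    for event in events:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(event)
                processed += 1
                break
            except Exception as exc:
                if attempt == max_attempts:
                    # Keep the error and attempt count for later inspection.
                    dead_letters.append(
                        {"event": event, "error": str(exc), "attempts": attempt}
                    )
    return processed


def handler(event: dict) -> None:
    if "orderId" not in event:
        raise ValueError("missing orderId")


dlq: List[dict] = []
count = consume_with_dlq([{"orderId": "123"}, {"oops": True}], handler, dlq)
```

The malformed event ends up in `dlq` with its error message instead of blocking the events behind it. The size of that list is the number an alert would watch.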
Conclusion
Event-driven architecture is a powerful tool for building decoupled, scalable, and flexible systems. But its complexity is not trivial, and applying it where it is not necessary adds cost without benefit.
Three principles for using EDA practically:
- Start synchronous, evolve to asynchronous. Direct calls are simpler to implement, debug, and monitor. Introduce events when you have clear reasons.
- Design for failures. Events can be lost, duplicated, or arrive out of order. Your system must handle all these cases.
- Observability from day one. Without traceability, an event-driven system becomes an impossible-to-debug black box.
If you need help designing the architecture for your custom software projects, at NERVICO we have experience implementing event-driven systems that work in production, not just in diagrams.