Event-Driven Architecture on AWS, Part I: The Basics

The abundance of services provided by AWS often makes it possible to implement the same functionality in different ways. In the case of messaging systems, AWS offers services such as Simple Notification Service (SNS), Simple Queue Service (SQS), EventBridge, Kinesis, and Managed Streaming for Apache Kafka (MSK). It may seem that at least some of these services duplicate one another. In this post, I want to describe my go-to architecture and explain why, in my opinion, it is the simplest, most cost-effective, and most robust solution for the majority of cases.

Event-Driven Architecture

Components of an event-driven system communicate by publishing and subscribing to events. This asynchronous integration offers significant non-functional advantages. For example, it decouples the lifecycles of the integrated components: to a certain degree, the system can keep functioning even when some of its services are unavailable. It also reduces the coordination overhead needed to deploy updated components and to evolve the system.

A basic event-driven integration consists of two parts: a component of the system can publish events describing important occurrences in its lifecycle, and it can react (subscribe) to events published by other parts of the system. Let’s see which managed services we can leverage to implement this.

Publishing: SNS

Amazon Simple Notification Service (SNS) is a fully managed service that allows you to send notifications to be consumed by other components of the system. Its serverless nature and low pricing make SNS a great candidate for publishing events. Figure 1 illustrates a service called “Producer” publishing its events through an SNS topic.

Figure 1: The “Producer” service publishes events to an SNS topic.
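To make the publishing side concrete, here is a minimal sketch in Python. It assumes the Producer uses the boto3 SNS client, whose `publish` call accepts `TopicArn`, `Message`, and `MessageAttributes`. The topic ARN, the event shape, and the `event_type` attribute name are hypothetical and chosen only for illustration.

```python
import json
from typing import Any


def build_publish_request(topic_arn: str, event_type: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Build the keyword arguments for an SNS publish call."""
    return {
        "TopicArn": topic_arn,
        # SNS message bodies are strings, so the event is serialized to JSON.
        "Message": json.dumps({"type": event_type, "data": payload}),
        # Attributes travel alongside the body; subscriptions can filter on them.
        "MessageAttributes": {
            "event_type": {"DataType": "String", "StringValue": event_type},
        },
    }


def publish_event(sns_client: Any, topic_arn: str, event_type: str, payload: dict[str, Any]) -> str:
    """Publish one event and return the message ID assigned by SNS."""
    response = sns_client.publish(**build_publish_request(topic_arn, event_type, payload))
    return response["MessageId"]


# Hypothetical topic ARN and event, for illustration only.
request = build_publish_request(
    "arn:aws:sns:us-east-1:123456789012:producer-events",
    "order_placed",
    {"order_id": "42"},
)
```

With boto3 installed and credentials configured, the Producer would call `publish_event(boto3.client("sns"), ...)`. Keeping the request builder separate from the network call makes the event format easy to test without touching AWS.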

But what about the “Consumer” service on the right-hand side of the figure? How can it subscribe to the messages published by the Producer?

SNS can deliver messages through multiple protocols, such as HTTP endpoints, AWS Lambda functions, SMS, and others. On paper, it may seem plausible for the Consumer to expose an HTTP endpoint and use it as a destination for the SNS topic, as illustrated in Figure 2.

Figure 2: SNS topic can forward messages to HTTP endpoints. But is it the optimal solution?

Is it an optimal solution, though? Assuming that the two services, Producer and Consumer, belong to different teams, who is in charge of the SNS topic? The Producer team needs to ensure the topic is configured correctly to receive its messages, while the teams in charge of the consuming services need to ensure that the destinations are always correct. Such shared ownership is a recipe for friction. Let’s consider a different option.

Subscribing: SQS

Amazon Simple Queue Service (SQS) is a fully managed message queue that temporarily holds messages (events) generated by producers until they are processed by consumers. It allows for distributed event handling with load balancing across multiple consumers, which is crucial for maintaining fault tolerance in event-driven workflows. In my opinion, it also provides much greater visibility into the queues than SNS topics do, as well as convenient control over how messages are delivered to consumers.
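The "hold until processed" behavior can be sketched as a small polling routine. This is an illustration under assumptions, not a production consumer: it assumes the boto3 SQS client (`receive_message` and `delete_message`), and the `FakeSqs` class, the `handler` function, and the queue URL are hypothetical stand-ins so the example runs without AWS.

```python
from typing import Any, Callable


def drain_once(sqs_client: Any, queue_url: str, handle: Callable[[str], None]) -> int:
    """Receive one batch, handle each message, and delete it on success."""
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # the API maximum per call
        WaitTimeSeconds=20,      # long polling: wait up to 20 seconds for messages
    )
    handled = 0
    for message in response.get("Messages", []):
        try:
            handle(message["Body"])
        except Exception:
            # Not deleted: the message becomes visible again after the
            # visibility timeout and will be retried.
            continue
        sqs_client.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        handled += 1
    return handled


class FakeSqs:
    """In-memory stand-in for the SQS client, used only to exercise the loop."""

    def __init__(self, bodies: list[str]) -> None:
        self.messages = [{"Body": b, "ReceiptHandle": f"rh-{i}"} for i, b in enumerate(bodies)]
        self.deleted: list[str] = []

    def receive_message(self, **kwargs: Any) -> dict[str, Any]:
        return {"Messages": list(self.messages)} if self.messages else {}

    def delete_message(self, QueueUrl: str, ReceiptHandle: str) -> None:
        self.deleted.append(ReceiptHandle)
        self.messages = [m for m in self.messages if m["ReceiptHandle"] != ReceiptHandle]


def handler(body: str) -> None:
    """Hypothetical handler that fails on one specific message."""
    if body == "bad":
        raise ValueError("cannot process")


fake = FakeSqs(["ok-1", "bad", "ok-2"])
handled_count = drain_once(fake, "https://example.invalid/queue", handler)
```

A message is deleted only after its handler succeeds, so a failed message stays in the queue for a later retry. That is the part of the fault tolerance that the consumer controls.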

Since SQS is one of the destinations supported by SNS topics, let’s set up a queue for messages to be processed by the “Consumer” service, as illustrated in Figure 3.

Figure 3: Using SQS as an event-consuming mechanism.

With this setup, you have clear ownership boundaries:

  • The SNS topic used for publishing messages belongs to the team in charge of the originating service (Producer).
  • The SQS queue used for consuming messages belongs to the team in charge of the subscribing service (Consumer).

The separation of concerns between the two services has to be reflected in the system’s architecture.
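One place where this ownership shows up in practice is the queue's access policy: SNS can deliver to an SQS queue only if the queue's policy allows it, and that policy is attached to the queue, which belongs to the Consumer team. The following sketch builds such a policy document in Python. The ARNs are hypothetical, and the helper name `queue_policy_for_topic` is mine.

```python
import json


def queue_policy_for_topic(queue_arn: str, topic_arn: str) -> dict:
    """Policy that lets one specific SNS topic send messages to one queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Restrict delivery to the Producer's topic only.
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
        ],
    }


# Hypothetical ARNs, for illustration only.
policy = queue_policy_for_topic(
    "arn:aws:sqs:us-east-1:123456789012:consumer-inbox",
    "arn:aws:sns:us-east-1:123456789012:producer-events",
)
policy_json = json.dumps(policy, indent=2)
```

The resulting JSON is what would go into the queue's policy, for example through the `Policy` queue attribute.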

SNS + SQS: Simple EDA

It is widely accepted that a microservice should enable access to its data through a well-defined public interface, while its database is considered an implementation detail and should be hidden from consumers. Such strict encapsulation of the persistence mechanism enables clearer ownership boundaries, more flexibility to evolve microservices, and much greater control over public interfaces.

The message bus used by the system is just another persistence mechanism — albeit a much more limited one — and should be treated as such. As a result, in addition to a database, each (micro)service needs an SNS topic for publishing events and an SQS queue for consuming events, as illustrated in Figure 4:

Figure 4: An EDA requires defining clear ownership boundaries not only for databases but also for their messaging mechanisms (SNS and SQS).

The arrows between the services — the subscriptions from topics to queues — belong to a higher architectural level of abstraction than the services themselves. For example, there might be a CloudFormation template for each individual service and a higher-level CloudFormation template for the resultant system. The latter one is in charge of defining the subscriptions.
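As a rough sketch of what such a system-level template might contain, the Python snippet below assembles a CloudFormation template (in its JSON form) that holds nothing but a subscription. It assumes the topic and queue ARNs are passed in as parameters. The parameter names and the resource name are hypothetical, and a real setup might use stack exports instead.

```python
import json

# System-level template: it owns the wiring between services and nothing else.
system_template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "System-level wiring: subscriptions between services",
    "Parameters": {
        # ARNs of resources defined in the per-service templates.
        "ProducerTopicArn": {"Type": "String"},
        "ConsumerQueueArn": {"Type": "String"},
    },
    "Resources": {
        "ProducerToConsumer": {
            "Type": "AWS::SNS::Subscription",
            "Properties": {
                "TopicArn": {"Ref": "ProducerTopicArn"},
                "Protocol": "sqs",
                "Endpoint": {"Ref": "ConsumerQueueArn"},
            },
        }
    },
}

template_json = json.dumps(system_template, indent=2)
```

The per-service templates would define the topic and the queue themselves. This template only connects them, which keeps the arrows out of either team's stack.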

It’s worth mentioning that a subscription doesn’t mean that all of the published events are blindly dumped on the consumers; a subscription can specify a filter to forward only the events relevant to each consumer.
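To illustrate the idea, here is a filter policy together with a deliberately simplified local matcher. The matcher handles only exact string matches on message attributes. Real SNS filter policies support more operators (prefix, numeric ranges, and others), so treat this as a model of the semantics, not a reimplementation. The `event_type` attribute and its values are hypothetical.

```python
# A filter policy as it could be attached to a subscription: every key must
# match (AND), and any one of the listed values is enough (OR).
filter_policy = {"event_type": ["order_placed", "order_cancelled"]}


def matches(policy: dict[str, list[str]], attributes: dict[str, str]) -> bool:
    """Simplified check: exact string matching on message attributes only."""
    return all(attributes.get(key) in allowed for key, allowed in policy.items())


delivered = matches(filter_policy, {"event_type": "order_placed"})
skipped = matches(filter_policy, {"event_type": "customer_registered"})
missing_attribute = matches(filter_policy, {})
```

In this sketch, a Consumer subscribed with `filter_policy` would receive the first event but neither of the other two. A message that lacks the attribute altogether does not match.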

This approach aligns with the smart endpoints, dumb pipes principle, which is essential for the simplicity and flexibility of distributed systems. According to the principle, the intelligence — the logic — should reside in the services themselves (endpoints), not in the infrastructural components used for integration (pipes). The pipes — messaging infrastructure and communication channels — should only be in charge of reliably transporting data between services. The goal is to reduce dependencies between services, allowing for easier scaling, debugging, and faster development while avoiding the bottlenecks and complexities typically associated with sophisticated middleware.

Now it is time to talk about the other messaging options available on AWS.

Alternative Message Delivery Services

As I mentioned in the introduction, there are many other AWS-managed solutions related to messaging. Here I want to briefly address the other options and explain why I think the solution described above is a better fit in the majority of cases.

EventBridge

The first “S” in SNS and SQS stands for “simple,” and it’s there for a reason. SNS and SQS are simple services. EventBridge is much more flexible in message filtering and routing rules. In my opinion, it is closer to the concept of an enterprise message bus from the SOA days. Instead of dumb pipes, you get a central point for receiving and routing events across all components of the system, and even across multiple systems. EventBridge, of course, has its use cases — for example, if you need to integrate with third-party systems.

Kinesis and MSK (Managed Kafka)

Both Kinesis and MSK are services for working with streaming data. Stream processing can be seen as a special case of event-driven architecture: in both, messages are published and consumed asynchronously. However, the usage pattern is different. While traditional EDA focuses on individual events or messages, working with streaming data entails processing continuous flows of related events, which traditional message bus solutions may not handle as efficiently. Hence, tools like Kinesis and MSK exist. If you do not need to process continuous streams of messages, simpler tools like SNS and SQS will result in a more straightforward system.

Summary

This blog post discussed implementing event-driven architecture (EDA) on AWS using SNS and SQS for publishing and consuming events. You learned how SNS and SQS together form a simple, cost-effective solution with clear ownership boundaries, providing flexibility and fault tolerance.

The next post in the series discusses the event publishing aspect of EDA in greater detail and introduces practices that will help you avoid common pitfalls in this seemingly simple process.

Posts In The Series

  1. Event-Driven Architecture on AWS, Part I: The Basics (Current Post)
  2. Event-Driven Architecture on AWS, Part II: The Advanced Basics
  3. Event-Driven Architecture on AWS, Part III: The Hard Basics
