Event-Driven Architecture on AWS, Part III: The Hard Basics

My previous post in this series discussed the reliability issues many messaging-based systems suffer from and how to address them by implementing the outbox pattern. This post switches the focus from publishing to processing messages published by other components of the system.

As before, I will start by defining the problem we have to tackle.

The Problem

Let’s revisit the outbox pattern once more. It addresses the common problem of publishing messages after the underlying business transaction has already been committed: the events might not get published at all. By implementing the pattern, you ensure that all the events will be published—at least once. Let me explain the last part.

The outbox pattern Figure 1: The outbox pattern

[Read More]

Event-Driven Architecture on AWS, Part II: The Advanced Basics

In my previous post in the series, I discussed the basic building blocks for implementing event-driven architecture (EDA) using AWS managed services. This post is about the advanced basics. It’s advanced because, based on my consulting experience, very few companies apply the practices I want to discuss. However, I still consider this as basics because these practices are not optional but essential for implementing reliable messaging-based systems.

Although the examples in this post use AWS services, the material applies to any system, regardless of the infrastructure it runs on. The implementation might differ, but the underlying principles remain the same.

Since this post is about a pattern—and patterns, by definition, are repeatable solutions to commonly occurring problems—I want to begin by discussing a commonly overlooked problem in EDA-based systems.

The Problem

In the previous post, I suggested using AWS SNS for publishing messages in an event-driven system. Here’s a common example of how such publishing is carried out:

...

sns.publish(
    TopicArn=users_topic_arn,
    Message=json.dumps({
        'event_type': 'user_registered',
        'event_id': str(uuid.uuid4()),
        'user_id': 'USER12345',
        'name': 'John Doe',
        'email': 'john.doe@example.com',
        'source': 'mobile_app',
        'registration_date': '2024-10-11T20:01:00Z'
    })
)

...

In the above example, an event of type user_registered is published to an SNS topic. But did that event come from thin air? Is that all services do, just publish messages? Of course not. The event is part of a larger business process that typically involves updating some state in an operational database before notifying external components about it. A more accurate representation of this process would look like this:

[Read More]

Event-Driven Architecture on AWS, Part I: The Basics

The abundance of services provided by AWS often makes it possible to implement the same functionality in different ways. In the case of messaging systems, AWS offers services such as Simple Notification Service (SNS), Simple Queue Service (SQS), EventBridge, Kinesis, and Managed Streaming for Apache Kafka (MSK). It may seem that at least a subset of these services duplicates the same functionality. In this post, I want to describe my go-to architecture and explain why, in my opinion, it is the simplest, most cost-effective, and robust solution for the majority of cases.

Event-Driven Architecture

Components of an event-driven system communicate by publishing and subscribing to events. The asynchronous integration offers significant non-functional advantages. For example, it decouples the integrated components’ lifecycles. To a certain degree, the system can keep functioning even in the face of unavailability of some of its services, thus also reducing the coordination overhead needed to deploy updated components, and to evolve the system.

A basic event-driven integration is comprised of two parts: a component of the system can publish events describing important occurrences in its lifecycle, and it can react (subscribe) to events published by other parts of the system. Let’s see what managed services we can leverage for implementing this.

[Read More]