Streaming journal data from Amazon QLDB

Amazon QLDB uses an immutable transactional log, known as a journal, for data storage. The journal tracks every change to your committed data and maintains a complete and verifiable history of changes over time.

You can create a stream in QLDB that captures every document revision that is committed to your journal and delivers this data to Amazon Kinesis Data Streams in near-real time. A QLDB stream is a continuous flow of data from your ledger's journal to a Kinesis data stream resource.

You can then use the Kinesis streaming platform or the Kinesis Client Library to consume your stream, process the data records, and analyze the data contents. A QLDB stream writes your data to Kinesis Data Streams in three types of records: control, block summary, and revision details. For more information, see QLDB stream records in Kinesis.
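The three record types can be routed to type-specific handlers in a consumer. The following is a minimal sketch, assuming each Kinesis record has already been deserialized from Amazon Ion into a Python dict carrying the `recordType` and `payload` fields that QLDB stream records contain (verify the exact field names against your stream's actual payloads):

```python
def process_record(record):
    """Dispatch one deserialized QLDB stream record by its type.

    Assumes `record` is a dict with "recordType" and "payload" fields,
    as carried by QLDB stream records after Ion deserialization.
    """
    record_type = record["recordType"]
    payload = record["payload"]
    if record_type == "CONTROL":
        # Stream lifecycle events, such as stream created or completed.
        return ("control", payload)
    if record_type == "BLOCK_SUMMARY":
        # One record per committed journal block.
        return ("block", payload)
    if record_type == "REVISION_DETAILS":
        # One record per committed document revision.
        return ("revision", payload)
    raise ValueError(f"unexpected record type: {record_type}")
```

A dispatcher like this keeps the deserialization, routing, and per-type business logic decoupled, which simplifies testing each piece on its own.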

Common use cases

Streaming lets you use QLDB as a single, verifiable source of truth while integrating your journal data with other services. The following are some of the common use cases supported by QLDB journal streams:

  • Event-driven architecture – Build applications in an event-driven architectural style with decoupled components. For example, a bank can use AWS Lambda functions to implement a notification system that alerts customers when their account balance drops below a threshold. In such a system, the account balances are maintained in a QLDB ledger, and any balance changes are recorded in the journal. The AWS Lambda function can trigger the notification logic upon consuming a balance update event that is committed to the journal and sent to a Kinesis data stream.

  • Real-time analytics – Build Kinesis consumer applications that run real-time analytics on event data. With this capability, you can gain insights in near-real time and respond quickly to a changing business environment. For example, an ecommerce website can analyze product sales data and stop advertisements for a discounted product as soon as sales reach a limit.

  • Historical analytics – Take advantage of the journal-oriented architecture of Amazon QLDB by replaying historical event data. You can choose to start a QLDB stream as of any point in time in the past, in which all revisions since that time are delivered to Kinesis Data Streams. Using this feature, you can build Kinesis consumer applications that run analytics jobs on historical data. For example, an ecommerce website can run analytics as needed to generate past sales metrics that were not previously captured.

  • Replication to purpose-built databases – Connect QLDB ledgers to other purpose-built data stores using QLDB journal streams. For example, use the Kinesis streaming data platform to integrate with Amazon OpenSearch Service, which can provide full text search capabilities for QLDB documents. You can also build custom Kinesis consumer applications to replicate your journal data to other purpose-built databases that provide different materialized views. For example, replicate to Amazon Aurora for relational data or to Amazon Neptune for graph-based data.

Consuming your stream

Use Kinesis Data Streams to continuously consume, process, and analyze large streams of data records. In addition to Kinesis Data Streams, the Kinesis streaming data platform includes Amazon Data Firehose and Amazon Managed Service for Apache Flink. You can use this platform to send data records directly to services such as Amazon OpenSearch Service, Amazon Redshift, Amazon S3, or Splunk. For more information, see Kinesis Data Streams consumers in the Amazon Kinesis Data Streams Developer Guide.

You can also use the Kinesis Client Library (KCL) to build a stream consumer application to process data records in a custom way. The KCL simplifies coding by providing useful abstractions above the low-level Kinesis Data Streams API. To learn more about the KCL, see Using the Kinesis Client Library in the Amazon Kinesis Data Streams Developer Guide.
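For quick experiments, you can also poll a shard directly with the low-level Kinesis Data Streams API. The following sketch uses boto3's `get_shard_iterator` and `get_records` operations; the dependency-injected `client` parameter is an assumption added here so the loop can be exercised with a stub. For production consumers, prefer the KCL, which handles resharding, checkpointing, and failover for you:

```python
import time

def consume(stream_name, shard_id, handle, client=None, poll_delay=1.0):
    """Poll one shard of a Kinesis data stream, passing each record's
    raw bytes to `handle`. Returns the number of records processed.

    If `client` is None, a boto3 Kinesis client is created; passing a
    stub client makes the loop testable offline.
    """
    if client is None:
        import boto3  # imported lazily so the module loads without the SDK
        client = boto3.client("kinesis")
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # read from the oldest record
    )["ShardIterator"]
    count = 0
    while iterator:  # open shards keep returning a NextShardIterator
        resp = client.get_records(ShardIterator=iterator, Limit=100)
        for record in resp["Records"]:
            handle(record["Data"])  # boto3 delivers Data as raw bytes
            count += 1
        iterator = resp.get("NextShardIterator")
        if iterator:
            time.sleep(poll_delay)  # stay under per-shard read limits
    return count
```

Note that this single-shard loop does none of what the KCL provides: it will not follow shard splits or merges, track its position across restarts, or coordinate multiple workers.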

Delivery guarantee

QLDB streams provide an at-least-once delivery guarantee. Each data record that is produced by a QLDB stream is delivered to Kinesis Data Streams at least once, so the same records can appear in a Kinesis data stream multiple times. If your use case requires it, you must implement deduplication logic in the consumer application layer.

There are also no ordering guarantees. In some circumstances, QLDB blocks and revisions can be produced in a Kinesis data stream out of order. For more information, see Handling duplicate and out-of-order records.
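One common consumer-side tactic handles both issues at once: track the highest revision version processed per document, and skip any revision that is not newer. The sketch below assumes each revision carries `id` and `version` fields in its metadata, per the QLDB revision format (verify the field names against your stream's actual payloads):

```python
class RevisionDeduplicator:
    """Drop duplicate and out-of-order revisions at the consumer layer.

    Keyed on the document id and version assumed to be carried in each
    revision's metadata. A revision is processed only if its version is
    newer than the latest version already seen for that document.
    """

    def __init__(self):
        self._latest = {}  # document id -> highest version processed

    def should_process(self, revision):
        meta = revision["metadata"]
        doc_id, version = meta["id"], meta["version"]
        if version <= self._latest.get(doc_id, -1):
            return False  # duplicate or stale (out-of-order) revision
        self._latest[doc_id] = version
        return True
```

This approach suits consumers that only need the newest state of each document; consumers that must process every intermediate revision in order would instead need to buffer and reorder by version.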

Delivery latency considerations

QLDB streams typically deliver updates to Kinesis Data Streams in near-real time. However, the following scenarios might create additional latency before newly committed QLDB data is emitted to a Kinesis data stream:

  • Kinesis can throttle data that is streamed from QLDB, depending on your Kinesis Data Streams provisioning. For example, this might occur if you have multiple QLDB streams that write to a single Kinesis data stream, and the request rate of QLDB exceeds the capacity of the Kinesis stream resource. Throttling in Kinesis can also occur when using on-demand provisioning if the throughput grows to more than double the previous peak in less than 15 minutes.

    You can measure this exceeded throughput by monitoring the Kinesis metric WriteProvisionedThroughputExceeded. For more information and possible solutions, see How do I troubleshoot throttling errors in Kinesis Data Streams?.

  • With QLDB streams, you can create an indefinite stream with a start date and time in the past and with no end date and time. By design, QLDB starts emitting newly committed data to Kinesis Data Streams only after all prior data from the specified start date and time is delivered successfully. If you observe additional latency in this scenario, you might need to wait for the prior data to be delivered, or you can start the stream from a later start date and time.
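You can check for the throttling described in the first bullet programmatically with the CloudWatch `GetMetricStatistics` API. The sketch below sums the WriteProvisionedThroughputExceeded metric for a stream over a recent window; the helper function names and the dependency-injected `client` parameter are assumptions added here for offline testability:

```python
from datetime import datetime, timedelta, timezone

def throttling_query(stream_name, minutes=60, period=300):
    """Build get_metric_statistics parameters for the Kinesis
    WriteProvisionedThroughputExceeded metric over a recent window."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Kinesis",
        "MetricName": "WriteProvisionedThroughputExceeded",
        "Dimensions": [{"Name": "StreamName", "Value": stream_name}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,
        "Statistics": ["Sum"],
    }

def throttled_writes(stream_name, client=None):
    """Sum the throttled-write datapoints; a nonzero result means the
    request rate exceeded the stream's capacity during the window."""
    if client is None:
        import boto3  # imported lazily so the module loads without the SDK
        client = boto3.client("cloudwatch")
    resp = client.get_metric_statistics(**throttling_query(stream_name))
    return sum(dp["Sum"] for dp in resp.get("Datapoints", []))
```

A persistently nonzero sum suggests adding shards (or letting on-demand capacity catch up), or splitting QLDB streams across separate Kinesis data streams.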

Getting started with streams

The following is a high-level overview of the steps that are required to get started with streaming journal data to Kinesis Data Streams:

  1. Create a Kinesis Data Streams resource. For instructions, see Creating and updating data streams in the Amazon Kinesis Data Streams Developer Guide.

  2. Create an IAM role that QLDB can assume to write to the Kinesis data stream. For instructions, see Stream permissions in QLDB.

  3. Create a QLDB journal stream. For instructions, see Creating and managing streams in QLDB.

  4. Consume the Kinesis data stream, as described in the previous section Consuming your stream. For code examples that show how to use the Kinesis Client Library or AWS Lambda, see Developing with streams in QLDB.
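Step 3 can be sketched with boto3's `stream_journal_to_kinesis` operation. This assumes the Kinesis data stream (step 1) and the IAM role granting QLDB write access to it (step 2) already exist; the names and ARNs passed in are illustrative, and the dependency-injected `client` parameter is an assumption added for offline testability:

```python
from datetime import datetime, timezone

def start_journal_stream(ledger_name, role_arn, kinesis_stream_arn,
                         stream_name, start_time=None, client=None):
    """Start an indefinite QLDB journal stream to Kinesis Data Streams.

    Omitting ExclusiveEndTime makes the stream run indefinitely; all
    revisions since `start_time` are delivered before newly committed
    data is emitted.
    """
    if client is None:
        import boto3  # imported lazily so the module loads without the SDK
        client = boto3.client("qldb")
    if start_time is None:
        start_time = datetime.now(timezone.utc)
    return client.stream_journal_to_kinesis(
        LedgerName=ledger_name,
        RoleArn=role_arn,
        InclusiveStartTime=start_time,
        # No ExclusiveEndTime: stream indefinitely.
        KinesisConfiguration={
            "StreamArn": kinesis_stream_arn,
            "AggregationEnabled": True,  # aggregate records for throughput
        },
        StreamName=stream_name,
    )
```

Setting `InclusiveStartTime` to a point in the past replays all revisions committed since then, which is how the historical-analytics use case above is implemented.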