Denormalize and Sync Aurora PG CDC to OpenSearch


I have set up a CDC replication process that takes several tables from my Postgres database and synchronises them to OpenSearch for efficient querying. The pipeline currently looks like this:

AWS Aurora Postgres -> AWS DMS -> Kinesis DataStream -> Lambda -> OpenSearch.

The indexes are all in OpenSearch.

The problem with this approach is that my queries need joins. I'd love to avoid that and instead build a denormalized, searchable document before the data reaches OpenSearch. However, I am working with streams and receive changes from different tables in different events, so I am not sure how to achieve that in real time.

What's the right way to solve this? I've used ksqlDB in the past, and there I was able to join data streams and build the data I needed using the middleware DB.

Thanks.


There is 1 answer

Hugo Marcelo Del Negro

I was finally able to solve this by changing the flow slightly:

AWS Aurora Postgres -> AWS DMS -> Kinesis DataStream -> Kinesis Data Analytics (Flink SQL) -> Lambda -> OpenSearch.

This flow is very similar (almost identical) to the one I used with ksqlDB.
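
To make the idea concrete, here is a minimal in-memory sketch in Python of what a stateful streaming join does conceptually. It is not Flink SQL and not the actual job I run. The `orders` and `customers` tables, their columns, and the `OrderDenormalizer` class are hypothetical names chosen only for illustration. The join step keeps the latest row per key from each table and re-emits the joined document whenever either side changes.

```python
from collections import defaultdict


class OrderDenormalizer:
    """In-memory stand-in for the state a streaming join keeps (illustrative only)."""

    def __init__(self):
        self.customers = {}                         # customer id -> latest row
        self.orders = {}                            # order id -> latest row
        self.orders_by_customer = defaultdict(set)  # customer id -> order ids

    def _document(self, order):
        customer = self.customers.get(order["customer_id"])
        if customer is None:
            return None  # inner-join semantics: wait until the other side arrives
        return {
            "order_id": order["id"],
            "total": order["total"],
            "customer": {"id": customer["id"], "name": customer["name"]},
        }

    def on_change(self, table, row):
        """Apply one insert/update event and return the documents to (re)index."""
        if table == "orders":
            previous = self.orders.get(row["id"])
            if previous is not None:
                # The order may have moved to another customer; drop the old link.
                self.orders_by_customer[previous["customer_id"]].discard(row["id"])
            self.orders[row["id"]] = row
            self.orders_by_customer[row["customer_id"]].add(row["id"])
            doc = self._document(row)
            return [doc] if doc is not None else []
        if table == "customers":
            self.customers[row["id"]] = row
            # A customer change fans out to every order that embeds that customer.
            order_ids = sorted(self.orders_by_customer[row["id"]])
            return [self._document(self.orders[oid]) for oid in order_ids]
        return []  # tables that are not part of the join are ignored
```

An order that arrives before its customer produces nothing; once the customer row shows up, the joined document is emitted, and a later customer update re-emits every affected order. A managed engine keeps this state durable and fault tolerant, which a plain dictionary does not.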

Another option is:

AWS Aurora Postgres -> AWS DMS -> DynamoDB -> DynamoDB Stream -> Lambda -> Build Searchable Document -> OpenSearch.
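
A rough sketch of the "Lambda -> Build Searchable Document" step in this flow, in Python. I have not run this exact code in production. It assumes a hypothetical `orders` table keyed by `id` with a `customer_id` attribute, a stream view type that includes the new image, and a `lookup_customer` callable (also hypothetical) that fetches the related row, for example from another DynamoDB table. The function only builds the action and source lines in the shape the OpenSearch bulk API expects; sending them is left out.

```python
def plain(attr):
    """Convert one DynamoDB typed attribute value, e.g. {"N": "7"}, to a Python value."""
    (kind, value), = attr.items()
    if kind in ("S", "BOOL"):
        return value
    if kind == "N":
        try:
            return int(value)
        except ValueError:
            return float(value)
    if kind == "NULL":
        return None
    if kind == "M":
        return {k: plain(v) for k, v in value.items()}
    if kind == "L":
        return [plain(v) for v in value]
    raise ValueError(f"unsupported attribute type: {kind}")


def build_actions(event, lookup_customer, index="orders"):
    """Turn a DynamoDB Streams event into bulk action/source lines for OpenSearch.

    `lookup_customer` and the `orders` index name are assumptions for this sketch.
    """
    actions = []
    for record in event["Records"]:
        keys = {k: plain(v) for k, v in record["dynamodb"]["Keys"].items()}
        doc_id = str(keys["id"])
        if record["eventName"] == "REMOVE":
            actions.append({"delete": {"_index": index, "_id": doc_id}})
            continue
        # INSERT / MODIFY: NewImage is present only if the stream view type includes it.
        order = {k: plain(v) for k, v in record["dynamodb"]["NewImage"].items()}
        order["customer"] = lookup_customer(order["customer_id"])  # the denormalizing step
        actions.append({"index": {"_index": index, "_id": doc_id}})
        actions.append(order)
    return actions
```

Note that this handles changes to the order side only. A change to a customer would also need to trigger re-indexing of that customer's orders, which means a stream on the customer table and a way to find the affected orders.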

If anyone has a better idea, please add your comments.