
Real-time Data Processing Pipeline With MongoDB, Kafka, Debezium And RisingWave

Posted on Jul 19 • Originally published at Medium

Today, the demand for real-time data processing and analytics is higher than ever. The modern data ecosystem requires tools and technologies that can not only capture, store, and process vast amounts of data but also deliver insights in real time. This article covers the powerful combination of MongoDB, Kafka, Debezium, and RisingWave for analyzing real-time data: how the pieces work together and the benefits of using these open-source technologies.

Before we dive into the implementation details, it's important to understand what these tools are and what they do. MongoDB is a document-oriented database that stores data as flexible, JSON-like documents. Kafka is a distributed event streaming platform for publishing and consuming streams of records. Debezium is an open-source change data capture (CDC) platform that watches databases and streams row-level changes as events. RisingWave is an open-source streaming database that lets you process event streams with PostgreSQL-compatible SQL and incrementally maintained materialized views.

Once we know what each tool does, let's discuss how MongoDB, Kafka, Debezium, and RisingWave can work together to create an efficient real-time data analysis pipeline. These technologies are free to use and easy to integrate with each other. The pipeline's key strengths are its ability to handle large volumes of data, process events in real time, and produce insights with minimal latency. For example, this solution can be used to build a global hotel search platform with real-time updates on hotel rates and availability: when rates or availability change in one of the platform's primary databases, Debezium captures the change and streams it to Kafka, and RisingWave can run trend analysis on top of the stream. This ensures that users always see the most current information when they search for hotels.

This guide shows you how to configure the Debezium MongoDB connector to send data from MongoDB to Kafka topics and how to ingest that data into RisingWave. After completing it, you should understand how to use these tools to create a real-time data processing pipeline, and how to create a data source and a materialized view in RisingWave to analyze data with SQL queries.

To complete the steps in this guide, you need to download or clone an existing sample project on GitHub. The project uses Docker for convenience and consistency: it provides a containerized development environment that includes the services you need to build the sample data pipeline. To run the project in your local environment, you need Docker and Docker Compose installed, plus a PostgreSQL interactive terminal such as psql to connect to RisingWave. The docker-compose file starts the services of the pipeline in Docker containers: MongoDB, Redpanda (a Kafka-API-compatible streaming platform), the Debezium MongoDB connector, RisingWave, and a small Python app that generates data.

To start the project, simply run docker compose up from the tutorial directory. When you start the project, Docker downloads any images it needs to run. You can see the full list of services in the docker-compose.yaml file.

app.py generates random user data (a name, an address, and an email) and inserts it into the MongoDB users collection. Because the Debezium MongoDB connector is configured to point at the MongoDB database and the collection we want to monitor, it captures the changes in real time and sinks them into Redpanda, into a Kafka topic called dbserver1.random_data.users. In the next steps, we consume the Kafka events and create a materialized view using RisingWave.

To consume the Kafka topic with RisingWave, we first need to set up a data source; in the demo project, Kafka is defined as the data source. Open a new terminal window and connect to RisingWave, which with its default settings listens on port 4566: psql -h localhost -p 4566 -d dev -U root. Because RisingWave is a database, you can directly create a table for the Kafka topic (first sketch below). To normalize the user data, we then create a materialized view on top of that table (second sketch below). The main benefit of materialized views is that they save the computation needed to perform complex joins, aggregations, or calculations: instead of running these operations each time the data is queried, the results are calculated in advance and stored.
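A minimal sketch of the table definition, assuming a recent RisingWave release (with the FORMAT ... ENCODE syntax) and that the Redpanda broker is reachable as message_queue:29092 on the Docker network; check docker-compose.yaml for the actual broker address and adjust accordingly:

```sql
-- Table over the Debezium CDC topic for the MongoDB users collection.
-- The DEBEZIUM_MONGO format expects exactly these two columns:
-- the document id (_id) and the full document (payload) as JSONB.
CREATE TABLE users (
    _id JSONB PRIMARY KEY,
    payload JSONB
) WITH (
    connector = 'kafka',
    topic = 'dbserver1.random_data.users',
    properties.bootstrap.server = 'message_queue:29092',
    scan.startup.mode = 'earliest'
) FORMAT DEBEZIUM_MONGO ENCODE JSON;
```

With scan.startup.mode = 'earliest', RisingWave replays the topic from the beginning, so documents inserted before the table was created are ingested as well.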
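The materialized view then flattens the JSON payload into ordinary columns; the field names below (name, email, address) follow the data that the generator inserts:

```sql
-- Normalize the raw Debezium payload into flat columns.
-- The ->> operator extracts a JSON field as text.
CREATE MATERIALIZED VIEW normalized_users AS
SELECT
    payload ->> 'name'    AS name,
    payload ->> 'email'   AS email,
    payload ->> 'address' AS address
FROM users;
```

RisingWave maintains this view incrementally: as new change events arrive on the topic, the stored results are updated without recomputing the view from scratch.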
Use the SELECT command to query data in the materialized view and inspect the latest results of normalized_users. Since the generator produces random data, the result set will look different on every run.
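A final sketch of the query; the LIMIT clause is optional and only keeps the output short:

```sql
-- Query the incrementally maintained view; the result reflects the
-- latest change events consumed from the Kafka topic.
SELECT * FROM normalized_users LIMIT 10;
```

Each returned row contains the flattened name, email, and address values extracted from the Debezium payload.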



This is a basic setup for using MongoDB, Kafka, Debezium, and RisingWave in a real-time data processing pipeline. The setup can be adjusted to your specific needs, such as adding more Kafka topics, tracking changes in multiple MongoDB collections, implementing more complex data processing logic, or combining multiple streams in RisingWave.

🙋 Join the RisingWave Community
