Customise Consent Preferences

We use cookies to help you navigate efficiently and perform certain functions. You will find detailed information about all cookies under each consent category below.

The cookies that are categorised as "Necessary" are stored on your browser as they are essential for enabling the basic functionalities of the site. ... 

Always Active

Necessary cookies are required to enable the basic features of this site, such as providing secure log-in or adjusting your consent preferences. These cookies do not store any personally identifiable data.

No cookies to display.

Functional cookies help perform certain functionalities like sharing the content of the website on social media platforms, collecting feedback, and other third-party features.

No cookies to display.

Analytical cookies are used to understand how visitors interact with the website. These cookies help provide information on metrics such as the number of visitors, bounce rate, traffic source, etc.

No cookies to display.

Performance cookies are used to understand and analyse the key performance indexes of the website which helps in delivering a better user experience for the visitors.

No cookies to display.

Advertisement cookies are used to provide visitors with customised advertisements based on the pages you visited previously and to analyse the effectiveness of the ad campaigns.

No cookies to display.

Svenska

Kafka Connect: From Postgres to Amazon S3 – Part 1

A pipeline that moves data from a source to a sink can be created using Kafka Connect, Postgres (Source), and Amazon S3 (Sink).

Introduction

I am going to demonstrate how to use Kafka connect to build an E2E pipeline using Postgres as the source connector and S3 Bucket as the sink connector

Part 1(Current Page) : Producer Postgres records to Kafka

Part 2 : Consumer Kafka topic records to s3 bucket

Pipeline Architecture

  • Stream Flow

What is Kafka Connect?

Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka® and other data systems. It makes it simple to quickly define connectors that move large data sets in and out of Kafka.

Kafka Connect can ingest entire databases or collect metrics from all your application servers into Kafka topics, making the data available for stream processing with low latency.

  • Stream Flow Using Kafka Connect
  1. Source Connector: Jdbc source plugin to fetch data from Postgres database and send it into Kafka topic
  2. Kafka Brokers /Topics: Hold messages in topics
  3. Sink Connector: S3 Sink plugin to fetch data from topic

Technical Implementation

  1. Install Confluent: we are going to use a confluent local instance
  2. Setup Postgres Source
  3. Setup S3 Sink

Step 1: Install Confluent

Install confluent community edition on Linux WSL (windows)

. Download Confluent

curl -O https://packages.confluent.io/archive/7.5/confluent-community-7.5.1.tar.gz
Download Confluent Comunity
.File Downloaded

. Extract download content

Extract the contents of the archive
  • Configure CONFLUENT_HOME
nano ~/.bashrc
# you must run this command on terminal to update env variable
$ source ~/.bashrc
# add those lines to file ~/.bashrc
export CONFLUENT_HOME=your path/confluent/confluent-7.5.1
export PATH=$PATH:$CONFLUENT_HOME/bin

. Start Confluent Locally

confluent local services start
  • install confluent-hub client command
wget https://client.hub.confluent.io/confluent-hub-client-latest.tar.gz
export PATH=$PATH:$CONFLUENT_HOME/bin:your_confluent_hub_path/confluent-hub/bin
confluent-hub

Step 2: Setup Postgres Source plugin

Step 3: Download and Install the JDBC source plugin

confluent-hub install confluentinc/kafka-connect-jdbc:10.7.4

Step 4: Configure Postgres source: create a file with the name source-quickstart-postgres.properties and add below content to it.

name=source-postgres-jdbc
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:postgresql://localhost:5432/dvdrental??currentSchema=public

mode=bulk


topic.prefix=postgres-jdbc-source-

table.whitelist=city

poll.interval.ms=120000
connection.user=postgres
connection.password=postgres

Step 5: Run the connector

confluent local services connect connector load source-postgres-jdbc --config source-quickstart-postgres.properties

Step 6: Validate Output

validate by checking topic creation

.validate by consuming messages with the local consumer to verify data is loaded

kafka-avro-console-consumer \
--bootstrap-server localhost:9092 \
--property schema.registry.url="http://localhost:8081" \
--topic postgres-jdbc-source-city

Conclusion

Part 1 — Deploy Postgres as source connector

We have successfully completed phase one (1) of our pipeline, setting up and executing the Postgres source connector. However, we have only completed part 1 of the process. To finish the pipeline, please proceed to part 2, which involves setting up and configuring the S3 sink.

Please feel free to share your feedback or ask any questions you may have. I am happy to assist you.

References

Kafka Connect | Confluent Documentation
Kafka Connect, an open source component of Apache Kafka, is a framework for connecting Kafka with external systems such…docs.confluent.io

https://docs.confluent.io/confluent-cli/current/command-reference/local/services/connect/connector/index.html#confluent-local-services-connect-connector

Share to

Latest Topic

Authors

Arda Cetinkaya

Wael Abdullah

Islam Ibrahim

Sasha Zezulinsky

Essam Ammar

Moemen Elzeiny

Wageeh Mankaryos

Blog stats

Loading

Follow SwedQ