Kafka Connect Flashcards
What is Kafka Connect used for?
Kafka Connect pulls data from an external store into Kafka or pushes data from Kafka to an external store.
It provides a scalable and reliable way to move data between Kafka and other datastores.
How many types of Kafka Connect connectors are there?
Two types: source connectors and sink connectors.
What is a source connector?
A connector that reads data from an external store and writes it to Kafka.
What is a sink connector?
A connector that reads data from Kafka and writes it to an external store.
Features of Kafka Connect
1) Distributed and standalone modes
2) Common framework for Kafka connectors
3) Distributed and scalable by default
4) REST interface
5) Streaming and batch integration
6) Automatic offset management
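As an illustration of the REST interface, here is a minimal Python sketch that builds the request a client would send to create a connector. It assumes a worker listening on localhost:8083 and the FileStreamSource example connector being available on that worker. The connector name, file path, and topic are hypothetical. The request is only built, not sent.

```python
import json
import urllib.request


def build_create_connector_request(base_url, name, config):
    """Build (but do not send) a POST /connectors request for a Connect worker."""
    payload = {"name": name, "config": config}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_connector_request(
    "http://localhost:8083",       # assumed worker address
    "local-file-source",           # hypothetical connector name
    {
        "connector.class": "FileStreamSource",
        "tasks.max": "1",
        "file": "/tmp/input.txt",  # hypothetical input file
        "topic": "connect-test",   # hypothetical target topic
    },
)

# To submit it against a running worker:
# urllib.request.urlopen(request)
```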
Which are some of the already available Kafka Connect source connectors?
1) JDBC Source
2) Syslog Source
3) MongoDB Source
4) Cassandra Source
etc.
Which are some of the already available Kafka Connect sink connectors?
1) HDFS Sink
2) HBase Sink
3) S3 Sink
4) Elasticsearch Sink
5) Cassandra Sink
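Command to start a Connect worker? (run from the Kafka installation directory)
For distributed: bin/connect-distributed.sh config/connect-distributed.properties
For standalone: bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties
(Standalone mode usually takes one or more connector properties files after the worker properties file; connect-file-source.properties is the sample shipped with Kafka.)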
Which are the mandatory Connect worker properties that need to be provided?
1) bootstrap.servers
2) group.id (distributed mode only)
3) key.converter
4) value.converter
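A rough Python sketch of checking a worker configuration against these properties. The mode-specific entries (the three storage topics for distributed mode and offset.storage.file.filename for standalone mode) are not on the card above. They are added from the Kafka Connect worker configuration and should be verified against your Kafka version, since some properties may have defaults. The helper name and sample values are hypothetical.

```python
# Properties from the card that apply to both modes.
COMMON = ("bootstrap.servers", "key.converter", "value.converter")

# Assumed mode-specific additions (verify against your Kafka version).
MODE_SPECIFIC = {
    "distributed": (
        "group.id",
        "config.storage.topic",
        "offset.storage.topic",
        "status.storage.topic",
    ),
    "standalone": ("offset.storage.file.filename",),
}


def missing_worker_properties(props, mode):
    """Return the expected keys that are absent or empty in props."""
    required = COMMON + MODE_SPECIFIC[mode]
    return [key for key in required if not props.get(key)]


# Hypothetical worker configuration containing only the four card entries.
worker = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "connect-cluster",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
}

print(missing_worker_properties(worker, "distributed"))
print(missing_worker_properties(worker, "standalone"))
```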
Which are the two ways to build a data pipeline?
1) ETL - Extract Transform Load
The data pipeline is responsible for modifying the data as it flows through.
Saves time and storage because the data does not need to be stored, modified, and stored again.
However, it shifts the burden of computation and storage onto the data pipeline itself.
2) ELT - Extract Load Transform
The data pipeline does only minimal transformation (also called a high-fidelity pipeline or data lake architecture).
Gives users maximum flexibility since they have access to all the raw data.
The drawback is that transformations consume CPU and storage resources at the target system.
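The difference between the two approaches can be sketched in a few lines of Python. This is a toy illustration with in-memory lists standing in for the target system. The record fields and function names are hypothetical and do not come from Kafka Connect.

```python
def extract():
    """Pretend to read raw records from a source system."""
    return [
        {"user": " Alice ", "amount": "10.5"},
        {"user": "BOB", "amount": "3"},
    ]


def transform(record):
    """Normalize the user name and parse the amount."""
    return {
        "user": record["user"].strip().lower(),
        "amount": float(record["amount"]),
    }


def run_etl(target):
    """ETL: the pipeline transforms each record before loading it."""
    target.extend(transform(r) for r in extract())


def run_elt(target):
    """ELT: the pipeline loads raw records; the target transforms later."""
    target.extend(extract())
    return [transform(r) for r in target]  # CPU cost paid at the target


etl_store, elt_store = [], []
run_etl(etl_store)
elt_view = run_elt(elt_store)
```

After both runs, etl_store holds only cleaned records, while elt_store still holds the raw records and elt_view is derived from them on demand.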