Kafka Connect Flashcards

Question 1

Q

What is the use of Kafka Connect?

Answer

A

Kafka Connect is used to pull data from an external store into Kafka, or to push data from Kafka to an external store.
It provides a scalable and reliable way to move data between Kafka and other datastores.

Question 2

Q

How many types of Kafka Connect connectors are there?

Answer

A

Two types: source connectors and sink connectors.

Question 3

Q

What is a Kafka Connect source connector?

Answer

A

A connector that takes input from an external store and pushes it to Kafka is called a source connector.

Question 4

Q

What is a Kafka Connect sink connector?

Answer

A

A connector that pulls data from Kafka and pushes it to an external store is called a sink connector.
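
The two directions can be sketched with the file connectors that ship with Apache Kafka. This is a minimal sketch, not a recommended setup: the connector names, file paths, and topic name are hypothetical, and to_properties is a helper written only for this card.

```python
# Hypothetical names, paths, and topic, chosen only for illustration.
source_config = {
    "name": "local-file-source",
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/input.txt",   # external store the connector reads from
    "topic": "connect-demo",    # Kafka topic the records are written to
}

sink_config = {
    "name": "local-file-sink",
    "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
    "tasks.max": "1",
    "topics": "connect-demo",   # Kafka topic(s) the connector reads from
    "file": "/tmp/output.txt",  # external store the records are written to
}


def to_properties(config):
    """Render a config dict as the key=value lines of a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in config.items())


print(to_properties(source_config))
print()
print(to_properties(sink_config))
```

Note the asymmetry: the source connector writes to a single "topic", while the sink connector reads from "topics".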

Question 5

Q

What are the features of Kafka Connect?

Answer

A

1) Distributed and standalone modes
2) Common framework for Kafka connectors
3) Distributed and scalable by default
4) REST interface
5) Streaming and batch integration
6) Automatic offset management
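
As an illustration of the REST interface, the sketch below builds (but does not send) a request that would create a connector on a running worker. It assumes a worker listening on localhost:8083, the usual default REST port; the connector name, file path, topic, and the helper build_create_connector_request are hypothetical.

```python
import json
import urllib.request

# Assumed worker address; adjust to wherever the Connect worker listens.
CONNECT_URL = "http://localhost:8083"


def build_create_connector_request(name, config, base_url=CONNECT_URL):
    """Build (but do not send) a POST /connectors request."""
    payload = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/connectors",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_connector_request(
    "local-file-source",  # hypothetical connector name
    {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/tmp/input.txt",   # hypothetical path
        "topic": "connect-demo",    # hypothetical topic
    },
)
print(request.get_method(), request.full_url)

# To submit it against a running worker:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```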

Question 6

Q

Which Kafka source connectors are already available?

Answer

A

1) JDBC Source
2) Syslog Source
3) MongoDB Source
4) Cassandra Source
etc.

Question 7

Q

Which Kafka sink connectors are already available?

Answer

A

1) HDFS Sink
2) HBase Sink
3) S3 Sink
4) Elasticsearch Sink
5) Cassandra Sink

Question 8

Q

What is the command to start a Connect worker?

Answer

A

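For distributed: sh connect-distributed.sh config/connect-distributed.properties

For standalone: sh connect-standalone.sh config/connect-standalone.properties connector1.properties
(In standalone mode the worker properties file is typically followed by one or more connector properties files; connector1.properties is a placeholder name.)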

Question 9

Q

Which Connect worker properties are mandatory?

Answer

A

1) bootstrap.servers (the list of Kafka brokers to connect to)
2) group.id (distributed mode only)
3) key.converter
4) value.converter
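
A minimal sketch of checking a worker properties file for these keys before starting a worker. It uses bootstrap.servers, the property name the Connect worker reads for the broker list. The set of keys treated as required simply mirrors this card and is not an authoritative schema; the broker address, group name, and both helper functions are hypothetical.

```python
# Keys treated as required here follow the card above, not an official schema.
REQUIRED_COMMON = ("bootstrap.servers", "key.converter", "value.converter")
REQUIRED_DISTRIBUTED = ("group.id",)  # only meaningful in distributed mode


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_keys(props, distributed=True):
    """Return the required keys that are absent from props."""
    required = REQUIRED_COMMON + (REQUIRED_DISTRIBUTED if distributed else ())
    return [key for key in required if key not in props]


worker_properties = """
# Hypothetical broker address and group name
bootstrap.servers=localhost:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
"""

props = parse_properties(worker_properties)
print(missing_keys(props))
```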

Question 10

Q

What are the two ways to build a data pipeline?

Answer

A

1) ETL - Extract Transform Load
The data pipeline is responsible for modifying the data as it flows through.
This saves time and storage because the data does not have to be stored, modified, and stored again.
However, it sometimes shifts the burden of computation and storage onto the data pipeline itself.

2) ELT - Extract Load Transform
The data pipeline does only minimal transformation (also called a high-fidelity pipeline or data lake architecture).

This gives users maximum flexibility, since they have access to all the data.

The drawback is that transformations take CPU and storage resources on the target system.
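
The difference can be sketched with in-memory lists standing in for the source and target systems. This is a toy illustration, not a Kafka Connect API: the event fields and the transform, etl, and elt functions are all hypothetical.

```python
# Hypothetical raw events from a source system.
raw_events = [
    {"user": "alice", "amount": "12.50", "debug": "x1"},
    {"user": "bob", "amount": "7.25", "debug": "x2"},
]


def transform(event):
    """Example transformation: drop a field and parse the amount."""
    return {"user": event["user"], "amount": float(event["amount"])}


def etl(events):
    """ETL: the pipeline transforms each record before loading it."""
    target = []
    for event in events:
        target.append(transform(event))  # only the modified record is stored
    return target


def elt(events):
    """ELT: the pipeline loads records unchanged; the target transforms later."""
    target = list(events)  # data is stored as-is
    view = [transform(event) for event in target]  # runs on the target system
    return target, view


etl_target = etl(raw_events)
elt_target, elt_view = elt(raw_events)
print(etl_target)
print(elt_target == raw_events, elt_view == etl_target)
```

In the ETL case the "debug" field is gone from the target for good; in the ELT case it is still available to any user who later needs it, at the cost of doing the transformation on the target.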

(10 cards)