Distributed Transport and Streaming Flashcards
What is Kafka?
Kafka is a distributed, event-based data streaming platform used to publish, store, and process streams of records.
What is Sqoop and what is it used for?
Sqoop is an ingestion tool for bulk importing and exporting structured data, for example between relational databases and HDFS.
What is Flume and what is its use case?
Flume is an event-based ingestion service used for streaming data. It was originally designed to collect logs and move them into HDFS.
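What are the four main components of Flume?
Source: The data origin; receives events from an external system
Sink: The data destination; takes events from the channel and delivers them onward
Channel: Acts as a buffer and bridge between the source and the sink
Agent: An independent daemon process that hosts sources, channels, and sinks, and receives and forwards events to sinks and/or other agents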
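To make the flow between these components concrete, here is a toy Python sketch of a single agent: a source turns raw log lines into events, puts them on a bounded channel, and a sink drains the channel into a destination. This is a sketch under stated assumptions, not Flume's API (real Flume agents are configured through properties files); `MemoryChannel`, `ToyAgent`, and the list used as a destination are hypothetical stand-ins.

```python
from collections import deque


class MemoryChannel:
    """Hypothetical bounded in-memory buffer standing in for a Flume channel."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()

    def put(self, event):
        # Refuse new events when the channel is full.
        if len(self.events) >= self.capacity:
            raise OverflowError("channel full")
        self.events.append(event)

    def take(self):
        # Return the oldest event, or None if the channel is empty.
        return self.events.popleft() if self.events else None


class ToyAgent:
    """Hypothetical agent wiring one source to one sink through a channel."""

    def __init__(self, source_lines, destination, capacity=100):
        self.source_lines = source_lines  # data origin, e.g. lines of a log file
        self.channel = MemoryChannel(capacity)
        self.destination = destination    # data destination, here a plain list

    def run_once(self):
        # Source stage: wrap each raw line in an event and put it on the channel.
        for line in self.source_lines:
            self.channel.put({"headers": {}, "body": line})
        # Sink stage: drain the channel and deliver each event body.
        while (event := self.channel.take()) is not None:
            self.destination.append(event["body"])


destination = []
agent = ToyAgent(["GET /index.html 200", "GET /missing 404"], destination)
agent.run_once()
print(destination)  # ['GET /index.html 200', 'GET /missing 404']
```

The channel decouples the two stages: the source only needs room in the buffer, and the sink can run at its own pace.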
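Explain the four main components of Kafka
Producers: Publish data to a topic in Kafka
Topics: A named stream of data, split into a number of partitions and copied across brokers according to a replication factor
Consumers: Subscribe to a topic in Kafka and read its data
Consumer group: A collection of consumers reading from the same topic; each partition is read by only one consumer in the group, so the group shares the work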
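The four components can be illustrated with a toy, in-memory Python model. It is not the Kafka client API: `ToyTopic`, `produce`, `assign`, and `consume` are hypothetical names, CRC32 stands in for the murmur2 hash that Kafka's Java producer uses to pick a partition from a key, and round-robin is just one simple assignment strategy. What it shows is that records with the same key land in the same partition, and that within a consumer group each partition goes to exactly one consumer.

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """Hypothetical in-memory topic: a fixed number of append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Same key -> same partition. Assumption: CRC32 replaces Kafka's murmur2.
        return zlib.crc32(key.encode()) % len(self.partitions)


def produce(topic, key, value):
    """Producer: append a record to the key's partition; return (partition, offset)."""
    p = topic.partition_for(key)
    topic.partitions[p].append((key, value))
    return p, len(topic.partitions[p]) - 1


def assign(topic, consumers):
    """Consumer group: give each partition to exactly one consumer, round-robin."""
    assignment = defaultdict(list)
    for p in range(len(topic.partitions)):
        assignment[consumers[p % len(consumers)]].append(p)
    return dict(assignment)


def consume(topic, partitions):
    """Consumer: read every record from its assigned partitions, in offset order."""
    return [record for p in partitions for record in topic.partitions[p]]


orders = ToyTopic("orders", num_partitions=3)
for key, value in [("alice", "book"), ("bob", "pen"), ("alice", "lamp")]:
    produce(orders, key, value)

group = assign(orders, ["consumer-1", "consumer-2"])
print(group)  # {'consumer-1': [0, 2], 'consumer-2': [1]}
```

Because ordering is only kept within a partition, keying by a field such as the customer name keeps all of one customer's records in order.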
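How does Kafka ensure high availability?
Kafka runs as a cluster of multiple instances, called brokers, that distribute reads and writes between them. Each partition is also replicated across several brokers, so its data stays accessible if one broker fails. Replicas are divided into leaders and followers: by default the leader handles reads and writes for its partition, while followers copy the leader's data, and one of them can take over as leader if the leader's broker fails.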
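Leader/follower failover can be sketched with a single replicated partition in Python. This is a toy under stated assumptions: `ToyPartition` is hypothetical, followers are updated synchronously so every live replica is always in sync (real Kafka followers fetch from the leader and only in-sync replicas are eligible to lead), and the new leader is simply the first surviving broker by name.

```python
class ToyPartition:
    """Hypothetical replicated partition: one leader replica plus followers."""

    def __init__(self, brokers):
        # Each broker holds its own copy (replica) of the partition's log.
        self.replicas = {b: [] for b in brokers}
        self.alive = set(brokers)
        self.leader = brokers[0]

    def write(self, record):
        # Simplification: the write reaches the leader and every live follower at once.
        for broker in self.alive:
            self.replicas[broker].append(record)

    def read(self):
        # By default, reads are served by the leader.
        return list(self.replicas[self.leader])

    def fail(self, broker):
        # Mark a broker as down; if it led the partition, promote a surviving follower.
        self.alive.discard(broker)
        if broker == self.leader:
            if not self.alive:
                raise RuntimeError("no replicas left")
            self.leader = sorted(self.alive)[0]


p = ToyPartition(["broker-1", "broker-2", "broker-3"])
p.write("event-1")
p.write("event-2")
p.fail("broker-1")
print(p.leader)  # broker-2
print(p.read())  # ['event-1', 'event-2']
```

With three replicas, the data survives the loss of any two brokers in this model.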
What is ZooKeeper?
ZooKeeper is a distributed coordination service for distributed applications. For example, Kafka has used it to store cluster metadata and to track which brokers are alive.
What is ksqlDB?
A specialized database optimized for stream processing, which exposes an SQL-like interface for working with data in Kafka.
ksqlDB runs as its own fault-tolerant cluster.
What is Kafka Connect?
Kafka Connect is a framework for connecting external systems to Kafka. It can be used as a replacement for Sqoop and Flume.
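As an illustration of Kafka Connect taking over a Flume-style job (shipping a log file into Kafka), the Python snippet below builds the JSON body one would submit to a Connect worker's REST API (`POST /connectors`) to start the `FileStreamSource` example connector that ships with Apache Kafka. The connector name, file path, and topic are hypothetical, and the snippet only builds the payload; it does not contact a worker.

```python
import json


def file_source_payload(name, path, topic, tasks_max=1):
    """Build the JSON body for creating a FileStreamSource connector via the REST API."""
    return json.dumps({
        "name": name,
        "config": {
            "connector.class": "FileStreamSource",  # source connector: file -> Kafka
            "tasks.max": str(tasks_max),            # upper bound on parallel tasks
            "file": path,                           # file to read lines from (hypothetical)
            "topic": topic,                         # Kafka topic the lines are written to
        },
    })


body = file_source_payload("app-log-source", "/var/log/app.log", "app-logs")
print(body)
```

Compared with Flume, the same job is described declaratively and the worker handles running, scaling, and restarting the tasks.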
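Explain the six main parts of Kafka Connect
Connectors: A connector connects external systems to Kafka. There are both source connectors (data into Kafka) and sink connectors (data out of Kafka)
Tasks: The implementation of how data is copied to or from Kafka; a connector splits its work into one or more tasks
Workers: The processes that execute connectors and tasks
Converters: Translate data between the format used inside Connect and the serialized format stored in Kafka
Transforms: Simple logic that alters each message produced or sent by a connector
Dead Letter Queue: How Connect handles connector errors; for sink connectors, records that fail conversion or transformation can be routed to a separate topic instead of stopping the task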
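Transforms and the dead letter queue can be sketched together as a toy sink pipeline in Python. This is not the Connect API (real Connect plugins are Java classes configured through properties such as `transforms` and `errors.deadletterqueue.topic.name`); `parse_json`, `mask_email`, `add_source`, `run_sink_task`, and the list-based destination and queue are hypothetical. Each message is deserialized, passed through a chain of single-message transforms, and routed to the dead letter queue if any step fails, roughly what Connect does when `errors.tolerance` is set to `all`.

```python
import json


def parse_json(raw):
    """Deserialization step (the job a converter does in Connect); raises on bad input."""
    return json.loads(raw)


def mask_email(record):
    """Hypothetical single message transform: hide the email field."""
    if "email" in record:
        record = {**record, "email": "***"}
    return record


def add_source(record):
    """Hypothetical single message transform: tag each record with its origin."""
    return {**record, "source": "crm"}


def run_sink_task(messages, transforms, destination, dead_letter_queue):
    """Process each message; send failures to the DLQ instead of stopping the task."""
    for raw in messages:
        try:
            record = parse_json(raw)
            for transform in transforms:
                record = transform(record)
            destination.append(record)
        except (ValueError, TypeError) as err:
            # Keep the original message plus the error type for later inspection.
            dead_letter_queue.append({"message": raw, "error": type(err).__name__})


messages = ['{"id": 1, "email": "a@example.com"}', "not json", '{"id": 2}']
sink, dlq = [], []
run_sink_task(messages, [mask_email, add_source], sink, dlq)
print(sink)  # [{'id': 1, 'email': '***', 'source': 'crm'}, {'id': 2, 'source': 'crm'}]
print(dlq)   # [{'message': 'not json', 'error': 'JSONDecodeError'}]
```

The bad message does not block the two good ones; it is kept, with its error, for someone to inspect and replay later.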