Practice Exam 1 Flashcards
What client protocol(s) is/are supported for the schema registry?
HTTP and HTTPS.
Clients can interact with the schema registry using the HTTP or HTTPS interface.
What is the risk of increasing max.in.flight.requests.per.connection while also enabling retries in a producer?
Message order is not preserved
Some messages may require multiple retries. If more than one request is in flight at a time, a retried batch can land after a later batch that succeeded on its first attempt, so messages may be stored out of order.
** NOTE **
An exception to this rule is if you enable the producer setting “enable.idempotence=true”, which preserves ordering on its own and also prevents duplicates.
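As a rough sketch, the relevant settings might look like this in a producer.properties file (values here are illustrative, not recommendations):

    # Retries plus multiple in-flight requests can reorder messages...
    retries=2147483647
    max.in.flight.requests.per.connection=5
    # ...unless idempotence is enabled, which preserves ordering and prevents
    # duplicates (it requires max.in.flight <= 5, acks=all, and retries > 0).
    enable.idempotence=true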
We would like to be in an “at-most once” consuming scenario.
Which offset commit strategy would you recommend?
Commit the offsets in Kafka before processing the data.
Explanation: Here, we must commit the offsets right after receiving a batch from a call to “.poll()” and before processing the data. If the consumer then crashes mid-processing, the committed offset has already moved past those records, so they are never re-read: at most once.
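A minimal Java sketch of this commit-before-process pattern (the topic name and the process() helper are placeholders):

    // Assumes an already-configured KafkaConsumer<String, String> named consumer.
    consumer.subscribe(Collections.singletonList("my-topic"));
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        consumer.commitSync();               // commit before processing: at-most-once
        for (ConsumerRecord<String, String> record : records) {
            process(record);                 // a crash here loses these records, but never reprocesses them
        }
    }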
In Avro, what type of schema evolution would it be if we are adding a field to a record without default?
Forward Schema Evolution.
Explanation: Clients with the old schema will be able to read records saved with the new schema (the unknown field is simply ignored). The change is not backward compatible, because a client with the new schema has no default value to fall back on when reading old records that lack the field.
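For illustration, suppose a hypothetical “User” record gains an “email” field with no default:

    {
      "type": "record",
      "name": "User",
      "fields": [
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string"}
      ]
    }

Old readers decode new records by skipping the unknown “email” field (forward compatible), but a reader using this new schema fails on old records, since there is no default to substitute.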
Is “min.insync.replicas” a producer setting? (T/F)
False. “min.insync.replicas” is a topic or broker setting, and it is only effective when acks=all.
Is the “acks” (aka acknowledgements) setting a producer setting? (T/F)
True. The acks setting is a producer setting / client configuration. It denotes the number of brokers that must receive the record before we consider the “write” successful. It supports three values - 0, 1, and all.
“acks=0” - No acknowledgement is needed; the producer will not wait for a response from the broker.
“acks=1” - The producer will consider the “write” successful when the leader receives the record. The leader broker responds the moment it receives the record, without waiting for the followers.
“acks=all” - The producer will consider the “write” successful when all of the in-sync replicas receive the record. This is achieved by the leader broker sending a response back only once all the in-sync replicas have received the record themselves.
And, when “acks=all”, the topic’s “min.insync.replicas” setting determines how many in-sync replicas must have the record before the leader can acknowledge the write.
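On the producer side this is a single property:

    # producer.properties
    acks=all   # or 0, or 1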
I am producing Avro data on my Kafka cluster that is integrated with the Confluent Schema Registry. After a schema change that is incompatible, I know my data will be rejected. Which component will reject the data?
The Confluent Schema Registry is the component that will reject the data. It is your safeguard against incompatible schema changes and is the component that ensures no breaking schema evolution is possible.
Note: Kafka brokers do not inspect your payload or its schema, so if this is one of the choices, you can quickly negate it as a potentially correct answer.
Which of the following errors are retriable from a producer perspective?
Choose two of the following:
- TOPIC_AUTHORIZATION_FAILED
- NOT_LEADER_FOR_PARTITION
- INVALID_REQUIRED_ACKS
- NOT_ENOUGH_REPLICAS
- MESSAGE_TOO_LARGE
NOT_LEADER_FOR_PARTITION and NOT_ENOUGH_REPLICAS are both retriable errors. The others are non-retriable errors.
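For the retriable errors, retry behavior is governed by producer settings such as the following (the values shown are the common defaults in recent clients, used here for illustration):

    retries=2147483647           # retry retriable errors until the delivery timeout expires
    retry.backoff.ms=100         # pause between retry attempts
    delivery.timeout.ms=120000   # upper bound on the total time to deliver a record

Non-retriable errors such as MESSAGE_TOO_LARGE fail the send immediately regardless of these settings.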
The “exactly once” guarantee in Kafka Streams is for which flow of data?
- External ==> Kafka
- Kafka ==> Kafka
- Kafka ==> External
Kafka ==> Kafka
Kafka Streams can only guarantee exactly once processing if you have a Kafka to Kafka topology.
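Enabling it is a single Kafka Streams property (the “exactly_once_v2” value assumes a recent Kafka release, roughly 3.0 and later; older releases use “exactly_once”):

    processing.guarantee=exactly_once_v2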
How will you read all the messages from a topic in your KSQL query?
(Select one)
1) Use KSQL CLI to set “auto.offset.reset” property to “earliest”.
2) KSQL reads from the end of a topic. This cannot be changed.
3) KSQL reads from the beginning of a topic by default.
Choice 1 is the correct answer.
Consumers can set the “auto.offset.reset” property to “earliest” to start consuming from the beginning of a topic. In KSQL: SET 'auto.offset.reset'='earliest';
A Kafka topic has a replication factor of 3 and “min.insync.replicas” setting of 1.
What is the maximum number of brokers that can be down so that a producer with “acks=all” can still produce to the topic?
Two brokers can go down, and the single remaining replica will still be able to receive and serve data, since “min.insync.replicas=1” requires only one in-sync replica for writes to succeed.
What info is NOT stored inside of Zookeeper?
- Consumer offset
- Broker Registration info
- Schema Registry schemas
- Controller registration
- ACL info
Consumer offsets are stored in a Kafka topic “__consumer_offsets” and the Schema Registry schemas are stored in the “_schemas” topic.
The Broker registration info, Controller registration, and ACL info are all stored inside of Zookeeper.
Using the Confluent Schema Registry, where are Avro schema stored?
The Schema Registry stores all the schemas in the _schemas Kafka topic.
A client connects to a broker in the cluster and sends a fetch request for a partition in a topic. It gets an exception NotLeaderForPartitionException in the response.
How does the client handle this situation?
Answer: It sends a metadata request to the same broker for the topic and then selects the broker hosting the leader replica.
Explanation: In case the consumer has the wrong leader of a partition, it will issue a metadata request. The metadata request can be handled by any node, so clients know afterwards which broker is the designated leader for the topic partitions.
Produce and consume requests can only be sent to the node hosting the partition leader.
If you want to have extremely high confidence that leaders and replicas have the correct data, what should the settings be for “acks”, replication factor, and “min.insync.replicas”?
- acks=1, rep factor=3, min ISR=2
- acks=all, rep factor=2, min ISR=1
- acks=all, rep factor= 3, min ISR=2
- acks=all, rep factor=3, min ISR=1
Answer: acks=all, replication factor = 3, min.insync.replicas=2
Explanation:
acks=all means that the leader will wait for all in-sync replicas to acknowledge the record, and the “min.insync.replicas” setting specifies the minimum number of replicas that must be in sync for the partition to accept writes. With a replication factor of 3 and a minimum of 2 in-sync replicas, every acknowledged write exists on at least two brokers, while one broker can still fail without blocking writes.
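As a sketch, such a topic could be created like this (the topic name and partition count are arbitrary):

    bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
      --topic orders --partitions 3 --replication-factor 3 \
      --config min.insync.replicas=2

With acks=all on the producer, every acknowledged write is then guaranteed to be on at least two brokers.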
There are 2 consumers, C1 and C2, belonging to the same group, G. Group G is subscribed to topics T1 and T2. Each of the topics has 3 partitions. How will the partitions be assigned to consumers with “Partition Assignor” being “RoundRobinAssignor”?
C1 will be assigned partitions 0 and 2 from T1 and partition 1 from T2.
C2 will have partition 1 from T1 and partitions 0 and 2 from T2.
No matter how the choices are worded, the correct option is the one where the two consumers share an equal number of partitions across the two three-partition topics, alternating partition by partition.
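Choosing this assignor is itself just consumer configuration; a minimal Java sketch (servers and group id are placeholders):

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.RoundRobinAssignor;
    import java.util.Properties;

    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "G");
    props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
              RoundRobinAssignor.class.getName());   // instead of the range-based default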
You want to perform table lookups against a KTable every time a new record is received from the KStream. What is the output of a KStream-KTable join?
A KStream.
Here, a KStream is being processed to create another KStream: each stream record is joined against the current table value for its key.
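A minimal Kafka Streams sketch of such an enrichment join (topic names and the join logic are assumptions):

    StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> clicks = builder.stream("clicks");
    KTable<String, String> users = builder.table("users");
    // Each incoming click triggers a lookup of the user with the same key;
    // the result is a new KStream, not a KTable.
    KStream<String, String> enriched =
        clicks.join(users, (click, user) -> user + " : " + click);
    enriched.to("enriched-clicks");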
To import data from external databases, which of the following should you use?
- Confluent REST Proxy
- Kafka Streams
- Kafka Connect Sink
- Kafka Connect Source
Kafka Connect Source.
Explanation:
The Kafka Connect Sink is used to export data from Kafka into an external DB, and the Kafka Connect Source is used to import data from an external DB into Kafka.
The Confluent REST Proxy provides a RESTful interface to an Apache Kafka cluster, making it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients.
Kafka Streams is a library for building streaming applications, specifically apps that transform input Kafka topics into output Kafka topics (or calls to external services, or updates to DBs, or whatever…). It lets you do this with concise code in a way that is distributed and fault-tolerant.
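As a sketch, a source connector configuration for a relational database might look like the following (this assumes the Confluent JDBC source connector; the name, URL, and column are placeholders):

    name=jdbc-source
    connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
    connection.url=jdbc:postgresql://dbhost:5432/shop
    mode=incrementing
    incrementing.column.name=id
    topic.prefix=db-

Each imported table lands in a Kafka topic named with the given prefix (here, db-<table>).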
You are running a Kafka Streams application in a Docker container managed by Kubernetes, and upon application restart, it takes a long time for the container to restore the state and get back to processing the data.
How can you dramatically improve the application restart time?
- Increase number of Streams threads
- Increase the number of partitions in your input topic
- Reduce the Streams caching property
- Mount a persistent volume for your RocksDB
Mount a persistent volume for your RocksDB.
A Kafka Streams application’s local state is backed up to changelog topics in Kafka, so it can always be rebuilt, but recovering it from Kafka takes a lot of time and resources. To speed up recovery, it is advised to store the Kafka Streams state on a persistent volume, so that only the missing part of the state needs to be recovered.
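In the application, the location of the local RocksDB state is controlled by a single Streams property; pointing it at the mounted volume is all that is needed (the path is an assumed mount point):

    state.dir=/mnt/kafka-streams-state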
A consumer wants to read messages from a specific partition of a topic. How can this be achieved?
- Call “assign()” passing a Collection of TopicPartitions as the arg
- Call “subscribe()” passing TopicPartition as the arg
- Call “subscribe(String topic, int partition)” passing the topic and partition number as the arguments.
Call “assign()” passing a Collection of TopicPartitions as the argument.
Explanation: assign() can be used for manual assignment of a partition to a consumer, in which case subscribe() must not (and cannot) be used; the two can never be combined. The assign() method takes a collection of TopicPartition objects as an argument.
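A minimal Java sketch of manual assignment (the topic name and partition number are placeholders):

    // Assumes an already-configured KafkaConsumer<String, String> named consumer.
    TopicPartition partition = new TopicPartition("my-topic", 0);
    consumer.assign(Collections.singletonList(partition));
    consumer.seek(partition, 0L);   // optional: start reading from a chosen offset
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));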