Practice Exam 1 Flashcards by Gary Hagen

What client protocol(s) is/are supported for the schema registry?

HTTP and HTTPS.

Clients can interact with the schema registry using the HTTP or HTTPS interface.

How well did you know this?

Not at all

Perfectly

What is the risk of increasing max.in.flight.requests.per.connection while also enabling retries in a producer?

Message order is not preserved

Some messages may require multiple retries. If more than 1 request is in flight, it may result in messages being received out of order.
** NOTE **
An exception to this rule is if you enable the producer setting, “enable.idempotence=true”, which takes care of the out of order case on its own, as well as prevents duplicates.

How well did you know this?

Not at all

Perfectly

We would like to be in an “at-most once” consuming scenario.

Which offset commit strategy would you recommend?

Commit the offsets in Kafka before processing the data.

Explanation: Here, we must commit the offsets right after receiving a batch from a call to “.poll()” and before processing the data.

How well did you know this?

Not at all

Perfectly

In Avro, what type of schema evolution would it be if we are adding a field to a record without default?

Forward Schema Evolution.

Explanation: Clients with old schema will be able to read records saved with new schema.

How well did you know this?

Not at all

Perfectly

Is “min.insync.replicas” a producer setting? (T/F)

No, “min.insync.replicas” is a topic or broker setting, and is only effective when acks=all

How well did you know this?

Not at all

Perfectly

Is the “acks” (aka acknowledgements) setting a producer setting? (T/F)

Yes, the acks setting is a producer setting / client configuration. It denotes the number of brokers that ust receive the record before we consider the “write” as successful. It supports three values - 0, 1, and all.

“acks=0” no acknowledgements needed. So, the producer will not wait for a response from broker.

“acks=1” - The producer will consider the “write” successful when the leader receives the record. The leader broker will know to immediately respond the moment it receives the record and not wait any longer.

“acks=all” - The producer will consider the “write” successful when all of the
In-sync replicas receive the record. This is achieved by the leader broker sending a response back once all the in-sync replicas receive the record themselves.

And, when “acks=all”,

How well did you know this?

Not at all

Perfectly

I am producing Avro data on my Kafka cluster that is integrated with the Confluent Schema Registry. After a schema change that is incompatible, I know my data will be rejected. Which component will reject the data?

The Confluent Schema Registry is the component that will reject the data. It is your safeguard against incompatible schema changes and will be the component that ensure no breaking schema evolution will be possible.

Note: Kafka brokers do not look at your payload and your payload schema, so if this is one of the choices, you can quickly negate it as a potentially correct answer.

How well did you know this?

Not at all

Perfectly

Which of the following errors are retriable from a producer perspective?
Choose two of the following:

TOPIC_AUTHORIZATION_FAILED
NOT_LEADER_FOR_PARTITION
INVALID_REQUIRED_ACKS
NOT_ENOUGH_REPLICAS
MESSAGE_TOO_LARGE

NOT_LEADER_FOR_PARTITION and NOT_ENOUGH_REPLICAS are both retriable errors. The others are non-retriable errors.

How well did you know this?

Not at all

Perfectly

The “exactly once” guarantee in the Kafka Streams is for which flow of data?

External ==> Kafka
Kafka ==> Kafka
Kafka ==> External

Kafka ==> Kafka

Kafka Streams can only guarantee exactly once processing if you have a Kafka to Kafka topology.

How well did you know this?

Not at all

Perfectly

How will you read all the messages from a topic in your KSQL query?
(Select one)
1) Use KSQL CLI to set “auto.offset.reset” property to “earliest”.
2) KSQL reads from the end of a topic. This cannot be changed.
3) KSQL reads from the beginning of a topic by default.

Choice 1 is the correct answer.

Consumers can set auto.offset.reset property to earliest to start consuming from the beginning. For KSQL, SET ‘auto.offset.reset’=‘earliest’;

How well did you know this?

Not at all

Perfectly

A Kafka topic has a replication factor of 3 and “min.insync.replicas” setting of 1.
What is the maximum number of brokers that can be down so that a producer with “acks=all” can still produce to the topic?

2 Brokers can go down and one replica will still be able to receive and serve data.

How well did you know this?

Not at all

Perfectly

What info is NOT stored inside of Zookeeper?

Consumer offset
Broker Registration info
Schema Registry schemas
Controller registration
ACL info

Consumer offsets are stored in a Kafka topic “__consumer_offsets” and the Schema Registry schemas are stored in the “_schemas” topic.

The Broker registration info, Controller registration, and ACL info are all stored inside of Zookeeper.

How well did you know this?

Not at all

Perfectly

Using the Confluent Schema Registry, where are Avro schema stored?

The Schema Registry stores all the schemas in the _schemas Kafka topic.

How well did you know this?

Not at all

Perfectly

A client connects to a broker in the cluster and sends a fetch request for a partition in a topic. It gets an exception NotLeaderForPartitionException in the response.
How does the client handle this situation?

Answer: It sends a metadata request to the same broker for the topic and select the broker hosting the leader replica.

Explanation: In case the consumer has the wrong leader of a partition, it will issue a metadata request. The metadata request can be handled by any node, so clients know afterwards which broker is the designated leader for the topic partitions.
Produce and consume requests can only be sent to the node hosting the partition leader.

How well did you know this?

Not at all

Perfectly

If you want to have an extremely high confidence that leaders and replicas have the correct data, what should the settings be for “acks”, replication factor, and “min.insync.replicas”??

acks=1, rep factor=3, min ISR=2
acks=all, rep factor=2, min ISR=1
acks=all, rep factor= 3, min ISR=2
acks=all, rep factor=3, min ISR=1

Answer: acks=all, replication factor = 3, min.insync.replicas=2

Explanation:
acks=all means that the leader will wait for all in-sync replicas to acknowledge the record. Also, the “min.insync.replicas” setting specifies the minimum number of replicas that need to be in-sync for the partition to remain available for writes.

How well did you know this?

Not at all

Perfectly

There are 2 consumers, C1 and C2, belonging to the same group, G. Group G is subscribed to topics T1 and T2. Each of the topics has 3 partitions. How will the partitions be assigned to consumers with “Partition Assignor” being “RoundRobinAssignor”??

C1 will be assigned partitions 0 and 2 from T1 and partition 1 from T2.
C2 will have partition 1 from T1 and partitions 0 and 2 from T2.

No matter what the choices are, the correct option is the one where the two consumers share an equal number of partitions amongst the two topics of three partitions.

How well did you know this?

Not at all

Perfectly

You want to perform table lookups against a KTable every time a new record is received from the KStream. What is the out put of a KStream-KTable join?

A KStream.

Here, a KStream is being processed to create another KStream.

How well did you know this?

Not at all

Perfectly

To import data from external databases, which of the following should you use?

Confluent REST Proxy
Kafka Streams
Kafka Connect Sink
Kafka Connect Source

Kafka Connect Source.

Explanation:
The Kafka Connect Sink is used to export data from Kafka into an external DB, and the Kafka Connect Source is used to import data from an external DB into Kafka.
The Confluent REST Proxy provides a RESTful interface to an Apache Kafka cluster, making it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients.
Kafka Streams is a library for building streaming applications, specifically apps that transform input Kafka topics into output Kafka topics (or calls to external services, or updates to DBs, or whatever…). It lets you do this with concise code in a way that is distributed and fault-tolerant.

How well did you know this?

Not at all

Perfectly

You are running a Kafka Streams application in a Docker container managed by Kubernetes, and upon application restart, it takes a long time for the docker container to replicate the state and get back to processing the data.
How can you dramatically improve the application restart time?

Increase number of Streams threads
increase num of partitions in your inputs topic
Reduce the Streams caching property
Mount a persistent volume for your RocksDB

Mount a persistent volume for your RocksDB.

Although any Kafka Streams app is stateless, since the state is stored in Kafka, it can take a lot of time and resources to recover the state from Kafka itself. In order to speed up recover, it is advised to store the Kafka Streams state on a persistent volume, so that only the missing part of the state needs to be recovered.

How well did you know this?

Not at all

Perfectly

A consumer wants to read messages from a specific partition of a topic. How can this be achieved?

Call “assign()” passing a Collection of TopicPartitions as the arg
Call “subscribe()” passing TopicPartition as the arg
Call “subscribe(String topic, int partition) passing the topic and partition number as the arguments.

Call “assign()” passing a Collection of TopicPartitions as the argument.

Explanation: assign() can be used for manual assignment of a partition to a consumer, in which case subscribe() must not (and cannot) be used. They can never be used together. The assign() method takes a collection of TopicPartion objects as an argument.

How well did you know this?

Not at all

Perfectly

Where are the dynamic configurations for a topic stored?

On the Kafka broker file system
In server.properties
In an internal Kafka topic called “__topic_configurations”
In Zookeeper

Study These Flashcards

In Zookeeper!

Dynamic topic configurations are maintained in Zookeeper.

Is KSQL ANSI SQL compliant?

Study These Flashcards

No. KSQL is not ANSI SQL compliant. For now there are no defined standards on streaming SQL languages.

A customer has man consumer apps that process messages from a Kafka topic. Each consumer app can only process 50 MB/s. Your customer wants to achieve a target throughput of 1 GB/s. What is the minimum number of partitions that you should suggest to the customer for that particular topic?

Study These Flashcards

20 partitions.

Each consumer can process only 50 MB/s, so we need at least 20 consumers that are consuming one partition so that the 1GB target is achieved.
Multiply the consumer processing speed by the number of consumers consuming one partition, and this will give you the target throughput.
In our case, 50 MB/s * X partitions = 1GB/s
Since 1GB = 1000MB, just divide the target throughput of 1000MB/s by processing speed of 50MB/s and we get the answer of 20 partitions.

A Kafka producer application wants to send log messages to a topic that does not include any key. What are the properties that are mandatory to configure for the producer configuration? (Select three)

partition
bootstrap.servers
value.serializer
key
key.serializer
value

Study These Flashcards

Both the key serializer and value serializer, as well as the bootstrap.servers (broker) are required/mandatory.

You have a Kafka cluster and all the topics have a replication factor of 3. One intern at your company stopped a broker, and accidentally deleted all the data from that broker on the disk. What will happen if the broker is restarted? - The broker will start and other topics will also be deleted as the broker data on the disk got deleted. - The broker will start and won’t have any data. If the broker becomes leader, we have a data loss. - The broker will start and won’t be online until all the data it needs to have is replicated from other leaders. - The broker will crash.

The broker will start and won’t be online until all the data it needs to have is replicated from other leaders. Explanation: Kafka replication mechanism makes it resilient to the scenarios where the broker loses data on disk, but can recover from replication from other brokers.

Select all the ways for one consumer to subscribe simultaneously to the following topics: topic.history, topic.sports, topic.politics (Select 2) - consumer.subscribe(“topic.history”); consumer. subscribe(“topic.sports”); consumer. subscribe(“topic.politics”); - consumer.subscribe(Pattern.compile(“topic\..*”)); - consumer.subscribePrefix(“topic.”); - consumer.subscribe(Arrays.asList(“topic.history”, “topic.sports”, “topic.politics”));

consumer.subscribe(Pattern.compile(“topic\..*”)); and consumer.subscribe(Arrays.asList(“topic.history”, “topic.sports”, “topic.politics”)); Explanation: Multiple topics can be passed as a list or regents pattern.

Which of the following event processing application is stateless? (Select 2) - Find the minimum and maximum stock prices for each day of trading. - Publish the top 10 stocks each day. - Read log messages from a stream and write ERROR events into a high-priority stream, and the rest of the events into a low-priority stream. - Read events from a stream and modifies them from JSON to Avro.

Answer: Read log messages from a stream and write ERROR events into a high-priority stream, and the rest of the events into a low-priority stream. And - Read events from a stream and modifies them from JSON to Avro. Explanation: Stateless means processing of each message depends only on the message, so converting from JSON to Avro or filtering a stream are both stateless operations.

Which of the following Kafka Streams operators are stateful? - flatmap - reduce - aggregate - count - peek - joining

Reduce, aggregate, count, joining, and windowing are all stateful Kafka streams operators.

If I want to send binary data through the REST Proxy to topic “test_binary”, it needs to be base64 encoded. A consumer connecting directly into the Kafka topic “test_binary” will receive which of the following? - Avro data - Binary data - JSON data - Base64 encoded data, it will need to decode it

Answer: Binary Data. Explanation: On the producer side, after receiving base 64 data, the REST Proxy will convert it into bytes and then send that bytes payload to Kafka. Therefore, consumers reading DIRECTLY from Kafka will receive binary data.

How will you find out which partitions have one or more replicas that are not in-sync with the leader? Using the CLI command “Kafka-topics.sh” as the prefix to the following choice: - —broker-list localhost:9092 —describe —under-replicated-partitions - —bootstrap-server localhost:9092 —describe —unavailable-partitions - —zookeeper localhost:2181 —describe —unavailable-partitions - —zookeeper localhost:2181 —describe —under-replicated-partitions

Kafka-topics.sh —zookeeper localhost:2181 —describe —under-replicated-partitions

You want to sink data from a Kafka topic to S3 using Kafka Connect. There are 10 brokers in the cluster, the topic has 2 partitions with replication factor of 3. How many tasks will you configure for the S3 connector? - 10 - 6 - 3 - 2

Answer: 2 Explanation: You cannot have more sink tasks (consumers) than the number of partitions. So, the answer is 2 tasks.

A consumer starts and has “auto.offset.reset=latest”, and the topic partition currently has data for offsets going from 45 to 2311. The consumer group has committed the offset 643 for the topic before. Where will the consumer read from? - offset 45 - offset 2311 - it will crash - offset 643

Offset 643 The offsets are already committed for this consumer group and topic partition, so the property “auto.offset.reset” is ignored.

Where are the ACLs stored in a Kafka cluster by default? - Inside the broker’s data directory - Inside the Zookeeper’s data directory - Under Zookeeper node /Kafka-acl/ - In Kafka topic “__kafka_acls”

ACLs are stored in Zookeeper node /kafka-acls/ by default.

If I produce to a topic that does not exist, and the broker setting “auto.create.topic.enable=true”, what will happen? - Kafka will automatically create the topic with num.partitions equal to the number of brokers and replication.factor=3 - Kafka will automatically create the topic with the broker settings num.partitions and default.replication.factor - Kafka will automatically create the topic with the indicated producer settings num.partitions and default.replication.factor - Kafka will automatically create the topic with 1 partition and replication factor.

- Kafka will automatically create the topic with the broker settings num.partitions and default.replication.factor The broker settings come into play when a topic is auto created.

Which KSQL queries write to Kafka? - CREATE STREAM WITH and CREATE TABLE WITH - CREATE STREAM AS SELECT and CREATE TABLE AS SELECT - COUNT and JOIN - SHOW STREAMS and EXPLAIN statements

- CREATE STREAM WITH and CREATE TABLE WITH - CREATE STREAM AS SELECT and CREATE TABLE AS SELECT Explanation: SHOW STREAMS and EXPLAIN statements run against the KSQL server that the KSQL client is connected to. The do not communicate directly with Kafka. CREATE STREAM WITH and CREATE TABLE WITH write metadata to the KSQL command topic. Persistent queries based on “CREATE STREAMS AS SELECT” and “CREATE TABLE AS SELECT” read and write to Kafka topics. Non-persistent queries based on SELECT that are stateless only read from Kafka topics: for example, “SELECT FROM foo WHERE ”. Non-persistent queries that are stateful read and write to Kafka. For example, COUNT and JOIN. The data in Kafka is deleted automatically when you terminate the query with CTRL-C.

What does producing with a key do?

Producing with a key allows you to influence partitioning of the producer messages. Keys are necessary if you require strong ordering or grouping for messages that share the same key. If you require that messages with the same key are always seen in the correct order, attaching a key to messages will ensure messages with the same key always go to the same partition in a topic. Kafka guarantees order within a partition, but not across partitions in a topic. So, alternatively, not providing a key will not maintain such order - instead, it results in a round-robin distribution across partitions.

What isn’t a feature of the Confluent schema registry? - Store schemas - Enforce compatibility rules - Store Avro data

Answer: Store Avro data Explanation: The Confluent schema registry has features to store schemas and enforce compatibility rules. However, data is stored on brokers, so the Confluent schema registry does not have a feature to store Avro data.

There are 3 producers writing to a topic with 5 partitions. There are 5 consumers consuming from the topic. How many Controllers will be present in the cluster?

1 There is only one controller in a cluster at all times.

Which of the following settings increase the chance of batching for a Kafka Producer? - Increase linger.ms - Increase batch.size - Increase the number of producer threads - Increase message.max.bytes

- Increase Linger.ms Linger.ms forces the producer to wait to send messages, and therefore increases the chance of creating batches.

If a topic has a replication factor of 3… - 3 replicas of the same data will live on 1 broker - Each partition will live on 2 different brokers - Each partition will live on 3 different brokers - Each partition will live on 4 different brokers

- Each partition will live on 3 different brokers Replicas are spread across available brokers, and each replica = 1 broker. Replication Factor, RF=3 ==> 3 brokers

A Zookeeper ensemble contains 3 servers. Over which ports are the members of the ensemble able to communicate in default configurations?

Client Port = 2181, Peer Port / Election Port = 2888, and Leader Port = 3888

What can you do to enhance compression? (By increasing the chances of batching and using what?)

The linger.ms setting, which forces the producer to wait before sending messages. Hence, increasing the chance of creating batches that can be heavily compressed.

A consumer application is using KafkaAvroDeserializer to deserialize Avro messages. What happens if the message schema is not present in AvroDeserializer local cache? - Throws DeserializationException - Throws SerializationException - Fails silently - Fetches schema from Schema Registry

First, local cache is checked for the message schema. In the case of it being missing from the cache, the schema is pulled from the schema registry. An exception will be thrown if the Schema Registry does not have the schema (which should never happen if you set it up correctly).

What isn’t an internal Kafka Connect topic? - connect-offsets - connect-status - connect-configs - connect-jars

Connect-jars is not an internal Kafka Connect topic. Connect-configs stores configurations, connect-status helps to elect leaders for connect, and connect-offsets store source offsets for source connectors.

A topic receives all the orders for the products that are available on a commerce site. Two applications want to process all the messages independently - order fulfillment and monitoring. The topic has 4 partitions, how would you organize the consumers for optimal performance and resource usage? How many consumer groups, consumers per group? Would they be for the number of topics, partitions, or for each of the applications?

Create two consumer groups for two applications with 4 consumers in each. Two partition groups - one for each app, so that all messages are delivered to both apps. 4 consumers in each, as there are 4 partitions of the topic and you cannot have more consumers per groups than the number of partitions. Otherwise they will be inactive and wasting resources.

Your producer is producing at a very high rate and the batches are completely full each time. How can you improve the producer throughput? (Select two) - increase batch.size - decrease linger.ms - disable compression - decrease batch.size - enable compression - increase linger.ms

Increase the batch.size and enable compression. The batch.size controls how many bytes of data to collect before sending messages to the Kafka broker. Set this as high as possible without exceeding the available memory. Enabling compression can also help make more compact batches and increase the throughput of your producer. The setting of linger.ms will have no effect, since the batches are already full.

You are using JDBC source connector to copy data from 2 tables to 2 Kafka topics. There is one connector created with “max.tasks” equal to 2 deployed on a cluster of 3 workers. How many tasks are launched?

2 tasks are launched. We have two tables, so the max number of tasks is 2.

A bank uses a Kafka cluster for credit card payments. What should be the value of the property “unclean.leader.election.enable”?

The property “unclean.leader.election.enable” should be set to FALSE. Setting it to true means we allow out-of-sync replicas to become leaders. This means we will lose messages when this occurs, effectively losing credit card payments, which would not be acceptable for the customers.

What field is optional in an Avro record? - namespace - doc - fields - name

Doc represents an optional description of a message.

How do Kafka brokers ensure great performance between the producers and consumers? (Select two) - it buffers the messages on disk, and sends messages from the disk reads. - it dose not transform the messages. - it compresses the messages as it writes to the disk. - it leverages zero-copy optimizations to send data straight from the page-cache - it transforms the message into a binary format

- it dose not transform the messages. - it leverages zero-copy optimizations to send data straight from the page-cache. Kafka transfers data with zero-copy and sends the raw bytes it receives from the producer straight to the consumer, leveraging the RAM available as page cache.

Practice Exam 1 Flashcards

(50 cards)