Test 1 Flashcards

1
Q

In Avro, adding a field to a record without a default is a __ schema evolution

A

Forward

Explanation
Clients with the old schema will be able to read records saved with the new schema (they simply ignore the unknown new field).

2
Q

A consumer wants to read messages from a specific partition of a topic. How can this be achieved?

A

Call assign() passing a collection of TopicPartitions as the argument

assign() can be used for manual assignment of a partition to a consumer, in which case subscribe() must not be used. assign() takes a collection of TopicPartition objects as an argument (https://kafka.apache.org/23/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#assign-java.util.Collection-)
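A minimal Java sketch of manual assignment (topic name, partition, and the props setup are illustrative):

    import java.time.Duration;
    import java.util.Collections;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    // props is assumed to hold bootstrap.servers and the deserializer settings
    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    // manual assignment: no subscribe(), no group rebalancing
    consumer.assign(Collections.singletonList(new TopicPartition("my-topic", 0)));
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));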

3
Q

We would like to be in an at-most-once consuming scenario. Which offset commit strategy would you recommend?

A

Commit the offsets in Kafka, before processing the data

Explanation
Here, we must commit the offsets right after receiving a batch from a call to .poll(), and before processing the records.
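A hedged Java sketch of the commit-before-process pattern (process() is a hypothetical application method):

    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    consumer.commitSync(); // commit first: a crash during processing loses records (at-most-once)
    for (ConsumerRecord<String, String> record : records) {
        process(record); // hypothetical business logic
    }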

4
Q

The exactly-once guarantee in Kafka Streams is for which flow of data?

A

Kafka => Kafka

Explanation
Kafka Streams can only guarantee exactly-once processing if you have a Kafka-to-Kafka topology.

5
Q

To import data from external databases, I should use

A

Kafka Connect Source

Explanation
Kafka Connect Sink is used to export data from Kafka to external databases, and Kafka Connect Source is used to import data from external databases into Kafka.

6
Q

You want to sink data from a Kafka topic to S3 using Kafka Connect. There are 10 brokers in the cluster, the topic has 2 partitions with replication factor of 3. How many tasks will you configure for the S3 connector?

A

2

Explanation
You cannot have more sink tasks (= consumers) than the number of partitions, so 2.

7
Q

What is the risk of increasing max.in.flight.requests.per.connection while also enabling retries in a producer?

A

Message order is not preserved

Explanation
Some messages may require multiple retries. If there is more than one request in flight, messages may be received out of order. Note that an exception to this rule is the producer setting enable.idempotence=true, which takes care of the out-of-order case on its own. See: https://issues.apache.org/jira/browse/KAFKA-5494
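A sketch of the relevant producer settings (values are illustrative):

    Properties props = new Properties();
    props.put("enable.idempotence", "true");                  // preserves ordering despite retries
    props.put("max.in.flight.requests.per.connection", "5");  // safe up to 5 with idempotence enabled
    props.put("retries", Integer.toString(Integer.MAX_VALUE));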

8
Q

What client protocol is supported for the schema registry? (select two)

A

HTTP
HTTPS

Explanation
Clients can interact with the Schema Registry using the HTTP or HTTPS interface.

9
Q

Where are the dynamic configurations for a topic stored?

A

In Zookeeper

Explanation
Dynamic topic configurations are maintained in Zookeeper.

10
Q

Which of the following errors are retriable from a producer perspective? (select two)

A

NOT_LEADER_FOR_PARTITION
NOT_ENOUGH_REPLICAS

Explanation

Both of these are retriable errors; the others are non-retriable.

See the full list of errors and their “retriable” status here: https://kafka.apache.org/protocol#protocol_error_codes

11
Q

There are two consumers C1 and C2 belonging to the same group G subscribed to topics T1 and T2. Each of the topics has 3 partitions. How will the partitions be assigned to consumers with PartitionAssignor being RoundRobinAssignor?

A

C1 will be assigned partitions 0 and 2 from T1 and partition 1 from T2. C2 will have partition 1 from T1 and partitions 0 and 2 from T2.

Explanation
The correct option is the only one where the two consumers share an equal number of partitions amongst the two topics of three partitions. An interesting article to read is: https://medium.com/@anyili0928/what-i-have-learned-from-kafka-partition-assignment-strategy-799fdf15d3ab

12
Q

Is KSQL ANSI SQL compliant?

A

No

Explanation
KSQL is not ANSI SQL compliant; for now there are no defined standards for streaming SQL languages.

13
Q

A bank uses a Kafka cluster for credit card payments. What should be the value of the property unclean.leader.election.enable?

A

False

Explanation
Setting unclean.leader.election.enable to true means we allow out-of-sync replicas to become leaders. When this occurs, we will lose messages, effectively losing credit card payments and making our customers very angry.

14
Q

You want to perform table lookups against a KTable every time a new record is received from the KStream. What is the output of a KStream-KTable join?

A

KStream

Explanation
Here, the KStream is being processed to create another KStream.
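A Java sketch of such a lookup join (topic names and the ValueJoiner are illustrative):

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;

    StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> orders = builder.stream("orders");      // the event stream
    KTable<String, String> customers = builder.table("customers");  // the lookup table
    // every incoming order is joined with the latest customer value for its key
    KStream<String, String> enriched =
        orders.join(customers, (order, customer) -> order + " enriched with " + customer);
    enriched.to("enriched-orders"); // the result is again a KStream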

15
Q

What isn’t a feature of the Confluent schema registry?

A

Store Avro data

Explanation
Data is stored on the brokers; the Schema Registry only stores and serves schemas.

16
Q

If I want to send binary data through the REST proxy to topic “test_binary”, it needs to be base64 encoded. A consumer connecting directly into the Kafka topic “test_binary” will receive

A

Binary data

Explanation
On the producer side, after receiving base64 data, the REST Proxy will convert it into bytes and then send that bytes payload to Kafka. Therefore consumers reading directly from Kafka will receive binary data.

17
Q

A topic receives all the orders for the products that are available on a commerce site. Two applications want to process all the messages independently - order fulfilment and monitoring. The topic has 4 partitions, how would you organise the consumers for optimal performance and resource usage?

A

Create 2 consumer groups for 2 applications with 4 consumers each

Explanation
Two consumer groups, one for each application, so that all messages are delivered to both applications; 4 consumers in each, as there are 4 partitions in the topic and you cannot have more consumers per group than the number of partitions (otherwise the extra consumers will be inactive and waste resources).

18
Q

If I want to have an extremely high confidence that leaders and replicas have my data, I should use

A

acks=all, replication factor=3, min.insync.replicas=2

Explanation
acks=all means the leader will wait for all in-sync replicas to acknowledge the record. The min.insync.replicas setting specifies the minimum number of replicas that need to be in-sync for the partition to remain available for writes.

19
Q

Which is an optional field in an Avro record?

A

doc

Explanation
doc represents an optional description of the message.

20
Q

Which of the following settings increases the chance of batching for a Kafka producer?

A

increase linger.ms

Explanation
linger.ms forces the producer to wait before sending messages, hence increasing the chance of creating batches.

21
Q

Where are the ACLs stored in a Kafka cluster by default?

A

Under Zookeeper node /kafka-acl/

Explanation
ACLs are stored in the Zookeeper node /kafka-acl/ by default.

22
Q

You are using a JDBC source connector to copy data from 2 tables to two Kafka topics. There is one connector created with tasks.max equal to 2, deployed on a cluster of 3 workers. How many tasks are launched?

A

2

Explanation
We have two tables, so the maximum number of tasks is 2.

23
Q

What isn’t an internal Kafka Connect topic?

A

connect-jars

Explanation
connect-configs stores connector configurations, connect-status stores the current status of connectors and tasks, and connect-offsets stores source offsets for source connectors.

24
Q

A Kafka topic has a replication factor of 3 and a min.insync.replicas setting of 1. What is the maximum number of brokers that can be down so that a producer with acks=all can still produce to the topic?

A

2

Explanation
Two brokers can go down, and one replica will still be able to receive and serve data

25
Q

I am producing Avro data on my Kafka cluster that is integrated with the Confluent Schema Registry. After a schema change that is incompatible, I know my data will be rejected. Which component will reject the data?

A

The Confluent Schema Registry

Explanation
The Confluent Schema Registry is your safeguard against incompatible schema changes and will be the component that ensures no breaking schema evolution is possible. Kafka brokers do not look at your payload or your payload schema, and therefore will not reject data.

26
Q

Which of the following event processing applications are stateless? (select two)

A

Read events from a stream and modify them from JSON to Avro
Read Log messages from a stream and write error events to a high priority stream and the rest to a low priority stream

Explanation
Stateless means processing of each message depends only on the message, so converting from JSON to Avro or filtering a stream are both stateless operations

27
Q

A customer has many consumer applications that process messages from a Kafka topic. Each consumer application can only process 50 MB/s. Your customer wants to achieve a target throughput of 1 GB/s. What is the minimum number of partitions you will suggest to the customer for that particular topic?

A

20

Explanation
Each consumer can process only 50 MB/s, so we need at least 20 consumers, each consuming one partition, so that the 20 * 50 MB/s = 1000 MB/s target is achieved.

28
Q

If I produce to a topic that does not exist, and the broker setting auto.create.topics.enable=true, what will happen?

A

Kafka will automatically create the topic with the broker settings num.partitions and default.replication.factor

Explanation
These broker settings come into play when a topic is auto-created.

29
Q

A Kafka producer application wants to send log messages, without any key, to a topic. Which properties are mandatory in the producer configuration? (select three)

A

key.serializer
value.serializer
bootstrap.servers

Explanation
bootstrap.servers, key.serializer, and value.serializer are all mandatory; even for keyless messages, a key serializer must be configured.
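A minimal Java sketch with only the mandatory settings (the serializer classes are illustrative choices):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    KafkaProducer<String, String> producer = new KafkaProducer<>(props);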

30
Q

Producing with a key allows you to…

A

Influence partitioning of the producer messages

Explanation
Keys are necessary if you require strong ordering or grouping for messages that share the same key. Attaching a key ensures that messages with the same key always go to the same partition of a topic, and Kafka guarantees order within a partition (but not across partitions). Not providing a key results in round-robin distribution across partitions, which does not maintain such order.
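A short Java sketch (topic, key, and value are illustrative; producer is a configured KafkaProducer):

    // all records with key "customer-42" land on the same partition, so their relative order is kept
    producer.send(new ProducerRecord<>("orders", "customer-42", "order-payload"));
    // without a key, records are distributed round-robin: new ProducerRecord<>("orders", "order-payload")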

31
Q

How do Kafka brokers ensure great performance between the producers and consumers? (select two)

A

It does not transform the messages
It leverages zero-copy optimizations to send data straight from the page-cache

Explanation
Kafka transfers data with zero-copy and sends the raw bytes it receives from the producer straight to the consumer, leveraging the RAM available as page cache

32
Q

If a topic has a replication factor of 3…

A

Each partition will live on 3 different brokers

Explanation
Replicas are spread across available brokers, and each replica = one broker. RF 3 = 3 brokers

33
Q

To enhance compression, I can increase the chances of batching by using

A

linger.ms = 20

Explanation
linger.ms forces the producer to wait before sending messages, hence increasing the chance of creating batches that can be heavily compressed.

34
Q

To get an acknowledgement of writes from the partition leader only, we need to use the config…

A

acks = 1

Explanation
Producers can set acks=1 to get acknowledgement from partition leader only.

35
Q

A client connects to a broker in the cluster and sends a fetch request for a partition in a topic. It gets a NotLeaderForPartitionException in the response. How does the client handle this situation?

A

Send Metadata request to the same broker for the topic and select the broker hosting the leader replica

Explanation
In case the consumer has the wrong leader of a partition, it will issue a metadata request. The metadata request can be handled by any node, so clients know afterwards which brokers are the designated leaders for the topic's partitions. Produce and consume requests can only be sent to the node hosting the partition leader.

36
Q

What information isn’t stored inside of Zookeeper? (select two)

A

Consumer Offset
Schema Registry Schemas

Explanation
Consumer offsets are stored in the Kafka topic __consumer_offsets, and the Schema Registry stores schemas in the _schemas topic.

37
Q

How will you find out all the partitions where one or more of the replicas for the partition are not in-sync with the leader?

A

kafka-topics.sh --zookeeper localhost:2181 --describe --under-replicated-partitions

38
Q

A consumer application is using KafkaAvroDeserializer to deserialize Avro messages. What happens if the message schema is not present in the AvroDeserializer's local cache?

A

Fetches schema from Schema Registry

Explanation
First, the local cache is checked for the message schema. In case of a cache miss, the schema is pulled from the Schema Registry. An exception will be thrown if the Schema Registry does not have the schema (which should never happen if you set it up properly).
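A hedged sketch of the relevant consumer settings (the registry URL is illustrative):

    Properties props = new Properties();
    props.put("key.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
    props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
    props.put("schema.registry.url", "http://localhost:8081"); // consulted on a local cache miss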

39
Q

You are running a Kafka Streams application in a Docker container managed by Kubernetes, and upon application restart, it takes a long time for the container to rebuild the state and get back to processing the data. How can you dramatically improve the application restart time?

A

Mount a persistent volume for your RocksDB

Explanation
Although any Kafka Streams application can be run stateless (the state is backed up in Kafka), it can take a while and a lot of resources to recover the state from Kafka. In order to speed up recovery, it is advised to store the Kafka Streams state on a persistent volume, so that only the missing part of the state needs to be recovered.

40
Q

Which of the following Kafka Streams operators are stateful? (select all that apply)

A

joining
aggregate
count
reduce

Explanation
See: https://kafka.apache.org/20/documentation/streams/developer-guide/dsl-api.html#stateful-transformations
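For example, count() is stateful because it must keep a running total per key; a minimal Java sketch (topic names are illustrative):

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Produced;

    StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> events = builder.stream("events");
    KTable<String, Long> counts = events.groupByKey().count(); // stateful: backed by a state store
    counts.toStream().to("event-counts", Produced.with(Serdes.String(), Serdes.Long()));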

41
Q

Which KSQL queries write to Kafka?

A

COUNT and JOIN
CREATE STREAM AS SELECT and CREATE TABLE AS SELECT
CREATE STREAM WITH SELECT and CREATE TABLE WITH

Explanation
SHOW STREAMS and EXPLAIN statements run against the KSQL server that the KSQL client is connected to. They don’t communicate directly with Kafka. CREATE STREAM WITH and CREATE TABLE WITH write metadata to the KSQL command topic. Persistent queries based on CREATE STREAM AS SELECT and CREATE TABLE AS SELECT read and write to Kafka topics. Non-persistent queries based on SELECT that are stateless only read from Kafka topics, for example SELECT … FROM foo WHERE …. Non-persistent queries that are stateful read and write to Kafka, for example, COUNT and JOIN. The data in Kafka is deleted automatically when you terminate the query with CTRL-C.

42
Q

A Zookeeper ensemble contains 3 servers. Over which ports should the members of the ensemble be able to communicate, in the default configuration? (select three)

A

2888
3888
2181

Explanation
2181 is the client port, 2888 the peer port, and 3888 the leader-election port.

43
Q

Using the Confluent Schema Registry, where are Avro schema stored?

A

In the _schemas topic

Explanation
The Schema Registry stores all the schemas in the _schemas Kafka topic

44
Q

There are 3 producers writing to a topic with 5 partitions. There are 5 consumers consuming from the topic. How many Controllers will be present in the cluster?

A

1

Explanation
There is only one controller in a cluster at all times.

45
Q

You have a Kafka cluster and all the topics have a replication factor of 3. One intern at your company stopped a broker, and accidentally deleted all the data of that broker on the disk. What will happen if the broker is restarted?

A

The broker will start, and won't be fully online until it has replicated all the data it needs from the other leaders.

Explanation
Kafka's replication mechanism makes it resilient to scenarios where a broker loses its data on disk, since it can recover by replicating from other brokers. This makes Kafka amazing!

46
Q

Select all that apply

A

acks is a producer setting
min.insync.replicas is a topic setting
min.insync.replicas only matters if acks=all

Explanation
acks is a producer setting; min.insync.replicas is a topic (or broker) setting, and it is only effective when acks=all.

47
Q

How will you read all the messages from a topic in your KSQL query?

A

Use KSQL CLI to set auto.offset.reset property to earliest

Explanation
Consumers can set the auto.offset.reset property to earliest to start consuming from the beginning. In KSQL: SET 'auto.offset.reset'='earliest';

48
Q

A consumer starts and has auto.offset.reset=latest, and the topic partition currently has data for offsets going from 45 to 2311. The consumer group has committed the offset 643 for the topic before. Where will the consumer read from?

A

643

Explanation
The offsets are already committed for this consumer group and topic partition, so the property auto.offset.reset is ignored
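A sketch of the relevant consumer settings (group id is illustrative):

    props.put("group.id", "my-group");
    props.put("auto.offset.reset", "latest"); // only consulted when the group has no committed offset
    // this group already committed offset 643, so consumption resumes at 643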

49
Q

Your producer is producing at a very high rate and the batches are completely full each time. How can you improve the producer throughput? (select two)

A

Enable compression
Increase Batch Size

Explanation
batch.size controls how many bytes of data to collect before sending messages to the Kafka broker. Set this as high as possible without exceeding available memory. Enabling compression can also help create more compact batches and increase the throughput of your producer. linger.ms will have no effect, as the batches are already full.
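A hedged sketch of the two knobs (values are illustrative):

    props.put("batch.size", "65536");        // default is 16384 bytes; bigger batches, fewer requests
    props.put("compression.type", "snappy"); // compressed records make each batch more compact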

50
Q

Select all the ways for one consumer to subscribe simultaneously to the following topics: topic.history, topic.sports, topic.politics (select two)

A

consumer.subscribe(Arrays.asList("topic.history", "topic.sports", "topic.politics"));
consumer.subscribe(Pattern.compile("topic..*"));

Explanation
Multiple topics can be passed as a list or regex pattern.