Cloud Guru Practice Questions Flashcards
The default topic retention period for your cluster is 64000 ms, but you have one topic which needs a longer retention period of 120000 ms. Which technique should you use to set the longer retention period for that topic?
- Add a configuration override to the topic, setting the retention period to 120000
- Create a new topic with a retention period of 120000
- Change the cluster default retention period to 120000
- Locate the broker that is functioning as the leader for the topic and set its retention period to 120000.
- Add a configuration override to the topic, setting the retention period to 120000
You can add a configuration override to change the retention period for the specific topic that needs it.
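As an illustration, here is a minimal sketch of adding such an override with the Java AdminClient (the topic name "my-topic" is hypothetical; the same override can also be applied with the kafka-configs CLI tool):

// Requires org.apache.kafka.clients.admin.* and org.apache.kafka.common.config.ConfigResource
Properties props = new Properties();
props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
try (Admin admin = Admin.create(props)) {
    // Override retention.ms for just this topic, leaving the cluster default untouched
    ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
    AlterConfigOp setRetention = new AlterConfigOp(
            new ConfigEntry("retention.ms", "120000"), AlterConfigOp.OpType.SET);
    admin.incrementalAlterConfigs(Map.of(topic, List.of(setRetention))).all().get();
}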
What is the purpose of a Serde when using the Kafka Streams API in Java?
- Combine data from multiple streams.
- Determine the priority for stream processing operations that require a relatively large amount of resources.
- Specify a serializer/deserializer to translate Kafka data to and from typed Java data.
- Determine which topics to read from and write to.
- Specify a serializer/deserializer to translate Kafka data to and from typed Java data.
Serde is short for “serializer/deserializer”. Serdes are used with Kafka Streams to convert Kafka data into typed Java data.
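A minimal sketch of how Serdes appear in a Streams topology (the topic names and the Long value type are assumptions for illustration):

StreamsBuilder builder = new StreamsBuilder();
// Consumed.with(...) tells Streams how to deserialize keys and values from the input topic
KStream<String, Long> stream = builder.stream(
        "input-topic", Consumed.with(Serdes.String(), Serdes.Long()));
// Produced.with(...) tells Streams how to serialize them back when writing out
stream.to("output-topic", Produced.with(Serdes.String(), Serdes.Long()));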
A partition has no ISRs, but there are some replicas available. Assuming that unclean.leader.election.enable is set to false, what will happen?
- Any messages sent by the publisher will be lost.
- The topic will not accept new messages and producers will have to wait.
- Kafka will crash.
- An out-of-sync replica will become the new leader.
- The topic will not accept new messages and producers will have to wait.
Since unclean leader election is not enabled, the topic will not accept new messages until an ISR (in-sync replica) becomes available for leader election.
You have two streams, and you want to combine them into one stream that contains all of the records of the input streams as separate records. Which stateless transformation would you use?
- Map
- Merge
- Join
- Combine
Merge - it combines two streams into one new stream.
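A minimal sketch, assuming two existing KStreams named stream1 and stream2 with the same key and value types:

// merge() interleaves all records from both input streams into one new stream
KStream<String, String> merged = stream1.merge(stream2);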
You are using the Kafka Streams API for Java. You have a KStream called “stream”. Which of the following lines of code would ensure that the output is sent to a topic called “output-topic”?
- KStream.setOutputTopic(stream, "output-topic");
- stream.output("output-topic");
- stream.to("output-topic");
- stream.send("output-topic");
- stream.to("output-topic");
The “.to()” method sends the output to the specified topic.
Which of the following commands could you use to list all topics in a cluster?
- ./bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
- ./bin/list-topics.sh --bootstrap-server localhost:9092 --list
- ./bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic
- ./bin/kafka-topics.sh --bootstrap-server localhost:9092 --all
- ./bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
This command would list all topics in the cluster.
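For reference, a rough Java AdminClient equivalent of listing topics is sketched below (connection details are assumptions):

Properties props = new Properties();
props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
try (Admin admin = Admin.create(props)) {
    // names() resolves to the set of topic names in the cluster
    Set<String> topics = admin.listTopics().names().get();
    topics.forEach(System.out::println);
}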
There are two topics with the following data:
Topic A (Names)   | Topic B (Emails)
0034353: J Doe    | 0034353: jdoe@co.com
0017654: J Smith  | 0023466: bsimpson@co.com
Select the join type which could be used to produce the following output: “0034353: Jane Doe, jdoe@company.com”
- Left Join
- Outer Join
- Special Join
- Inner Join
Inner Join.
An inner join contains only records that are present in both source streams. Therefore, an inner join would be able to output Jane’s name and email address, and would not contain any record for John, since he is missing from the emails topic (Topic B).
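A minimal sketch of such an inner join, assuming the two topics are read as KTables keyed by the ID (topic names are hypothetical):

StreamsBuilder builder = new StreamsBuilder();
KTable<String, String> names = builder.table("names", Consumed.with(Serdes.String(), Serdes.String()));
KTable<String, String> emails = builder.table("emails", Consumed.with(Serdes.String(), Serdes.String()));
// Inner join: only keys present in both tables produce an output record
KTable<String, String> joined = names.join(emails, (name, email) -> name + ", " + email);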
Which of the following scenarios would allow you to successfully join two streams reading from two input topics? (Choose two)
- Topic 1 has 4 partitions, topic 2 has 3 partitions, and your stream app is using a GlobalKTable.
- Topic 1 has 4 partitions, topic 2 has 8 partitions, and your stream app is using a KTable.
- Topic 1 has 4 partitions, topic 2 has 4 partitions, and your stream app is using a KTable.
- Topic 1 has 1 partition, topic 2 has 3 partitions, and your stream app is using a KTable.
- Topic 1 has 4 partitions, topic 2 has 3 partitions, and your stream app is using a GlobalKTable.
This scenario allows a join because even though the topics are not co-partitioned, a GlobalKTable is being used.
- Topic 1 has 4 partitions, topic 2 has 4 partitions, and your stream app is using a KTable.
This scenario allows a join because the topics are co-partitioned.
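A minimal sketch of the GlobalKTable case, assuming a StreamsBuilder named builder and a KStream named stream (topic name and types are hypothetical):

// A GlobalKTable is fully replicated to every instance, so co-partitioning is not required
GlobalKTable<String, String> table = builder.globalTable("topic-2");
KStream<String, String> joined = stream.join(
        table,
        (key, value) -> key,                              // map each stream record to its lookup key in the table
        (value, tableValue) -> value + "," + tableValue);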
You have a consumer group with 6 consumers consuming from a topic with 5 partitions. What will happen to the extra consumer?
It will remain idle and not process messages.
If there are more consumers than partitions, any extra consumers will remain idle and only process messages if another consumer goes down.
What Kafka Streams transformation would you use to print the value of each record to the console without modifying the stream, assuming that you still want to output the data to an output topic?
Peek - it allows you to perform arbitrary operations, such as printing to the console, while still allowing further processing, such as writing to an output topic.
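A minimal sketch, assuming an existing KStream named stream and a hypothetical output topic:

// peek() performs a side effect and passes the stream through unchanged,
// so it can still be written to an output topic afterwards
stream.peek((key, value) -> System.out.println(value))
      .to("output-topic");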
You have a stream containing some records, and you do not need to output the stream to a topic or do any further processing. Which transformation would you use to print the value of each record to the console?
- Stop
- Foreach
- Map
- Peek
Foreach.
Foreach would allow you to print the values to the console. It is the best choice here since it is a terminal operation that stops any further processing.
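A minimal sketch, assuming an existing KStream named stream:

// foreach() is terminal: it returns void, so no further processing or output follows
stream.foreach((key, value) -> System.out.println(value));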
You are performing a join, and you only want to join two records if their time stamps are within five minutes of one another. Which windowing strategy should you use?
Sliding Time Windows - because they are used for joins and are tied to the timestamps of records.
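A minimal sketch, assuming two co-partitioned KStreams named left and right (newer Kafka Streams versions use JoinWindows.ofTimeDifferenceWithNoGrace; older ones use JoinWindows.of):

// Records join only when their timestamps are within five minutes of one another
KStream<String, String> joined = left.join(
        right,
        (leftValue, rightValue) -> leftValue + "," + rightValue,
        JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5)));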
You have two topics, one with employee names and another with employee email addresses. Both topics use an employee ID number as the key, which is unique to each employee. What kind of transformation would you use to combine these two topics into one stream of records where the keys are the employee ID numbers and the values contain both the employee name and email address?
- Merge
- Combine
- flatMap
- Join
A Join would work in this scenario because there is a shared key between the two topics.
Under normal circumstances, how many leaders are there in a topic that has 3 partitions and a replication factor of 2?
3.
There is one leader per partition.
You have a stream of records. Which type of window would you use to perform a count aggregation that counts the number of records for each key that appears during each hour of the day?
Tumbling Time Windows
Since you are counting records by each hour of the day, you should use Tumbling Time Windows to divide the records into non-overlapping, gapless buckets for each hour.
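A minimal sketch, assuming an existing KStream named stream (newer Kafka Streams versions use TimeWindows.ofSizeWithNoGrace; older ones use TimeWindows.of):

// Count records per key in non-overlapping, gapless one-hour windows
KTable<Windowed<String>, Long> counts = stream
        .groupByKey()
        .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofHours(1)))
        .count();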
Kafka Broker configurations such as “background.threads” can be updated in such a way that will automatically roll out the change to the entire cluster, without requiring broker restarts. Which dynamic update mode applies to these configurations?
- per-broker
- read-only
- Auto-updating
- cluster-wide
Cluster-wide configurations can be updated dynamically across the whole cluster.
Sarah has been asked to retrieve some data from a Kafka topic. She decides to use the Confluent REST Proxy. Assuming REST Proxy has not been used with this cluster before, what is the first thing she should do?
- Subscribe the consumer to the topic.
- Enable the topic to serve data via REST Proxy.
- Make a GET request to retrieve the records.
- Create a consumer and consumer instance.
- Create a consumer and consumer instance.
Before proceeding, she will need to create the consumer and consumer instance.
You have been asked to build a Kafka producer in Java. Which class can you use to handle interactions between your code and the cluster?
- KafkaProducer
- KafkaConsumer
- MockProducer
- KafkaPublisher
KafkaProducer handles interactions with the Kafka cluster.
Consider the following piece of code:
Producer producer = new KafkaProducer<>(props);
ProducerRecord record = new ProducerRecord<>("output_topic", key, value);
producer.send(record, (RecordMetadata metadata, Exception e) -> {
    if (e != null) {
        System.out.println("Error publishing message: " + e.getMessage());
    } else {
        System.out.println("Published message: key=" + record.key()
                + ", value=" + record.value()
                + ", topic=" + metadata.topic()
                + ", partition=" + metadata.partition()
                + ", offset=" + metadata.offset());
    }
});
In the context of this code, what will be printed to the console as a result of the expression “metadata.offset()” in the “System.out.println” statement?
The offset of the record after it is published to the Kafka topic.
This statement is part of a callback that is called after the record is published, and “metadata.offset()” refers to the record’s offset.
When using the “.poll()” method on a consumer, what will happen if you then execute “consumer.commitSync();”?
consumer.commitSync() provides the ability to perform manual offset commits. So, it will commit the consumer’s offsets to the cluster.
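A minimal sketch, assuming an existing KafkaConsumer named consumer with enable.auto.commit set to false and a hypothetical topic "my-topic":

consumer.subscribe(Collections.singletonList("my-topic"));
while (true) {
    // poll() fetches a batch of records from the assigned partitions
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.println(record.key() + ": " + record.value());
    }
    // commitSync() blocks until the consumer's current offsets are committed to the cluster
    consumer.commitSync();
}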
Which of the following statements about Schema Registry compatibility checking is true?
- It will automatically determine which compatibility mode you need based on the changes you want to make.
- Compatibility checking allows you to decide what aspects of your schema can be changed.
- Compatibility checking merely warns you if you are making a change that is not allowed.
- It will guarantee that there are no problems as you update schemas.
- Compatibility checking allows you to decide what aspects of your schema can be changed.
Compatibility checking allows you to select a compatibility type to determine what can and cannot be changed.
What does consumer.subscribe( … ) do?
It determines which topic(s) the consumer will read from: consumer.subscribe() sets the list of topics from which the consumer will consume records.
You have a cluster with 5 brokers, a topic with 3 partitions, and a replication factor of 2. How many replicas, total, exist for this topic?
6.
Every topic partition in Kafka is replicated n times, where n is the replication factor of the topic. There are 3 partitions, each with 2 replicas, which totals 6 replicas.
Which of the following data sets would be best modeled as a table? (Choose 2)
- Records of transactions at an airport restaurant.
- Real-time records that are created whenever a plane departs.
- The current status of which passengers have checked in for a flight.
- Passengers on a plane and which seat they are assigned to.
- The current status of which passengers have checked in for a flight.
Since this data represents a state that can be updated (i.e. when passengers check in), it would be best represented as a table.
- Passengers on a plane and which seat they are assigned to.
Since this data represents a state that can be updated (i.e. a new passenger buys a ticket, or a passenger changes their seat), it would be best represented as a table.
What limits the number of records that can be processed in parallel with Kafka Streams?
- The number of records in the log.
- The number of instances of the streams application.
- The number of partitions in the topic.
- The topic replication factor.
- The number of partitions in the topic.
A Streams application consumes records in the same way a consumer does: each partition is assigned to a single stream thread, which processes one record from that partition at a time.
A single instance can process multiple records in parallel, so the number of instances is not the limit.