Introduction Flashcards

1
Q

What paradigm shift has been happening in recent years (according to Confluent)?

A

A shift from state-based systems to event-driven systems.

2
Q

What’s the basic analogy for what Kafka is?

A

It’s a distributed log.

3
Q

How widely is Kafka used among large companies?

A

35% of the Fortune 500 companies use it.

4
Q

What is the basic use case for Kafka?

A

Processing large streams of events in event-driven systems.

5
Q

What is a producer in Kafka?

A

An application that writes (publishes) data into the cluster.

6
Q

What is a broker in Kafka?

A

A broker is an individual node (server) in a Kafka cluster.

7
Q

What is a consumer in Kafka?

A

An application that reads and processes data from the cluster.

8
Q

Can a consumer also be a producer?

A

Yes. An application can be both, consuming events and producing new ones.

10
Q

What is the ZooKeeper ensemble?

A

A small cluster responsible for maintaining consensus and storing cluster status data for the Kafka cluster.

11
Q

Will ZooKeeper be used in the future?

A

It’s being phased out. Since version 2.8, Kafka can optionally manage its own cluster metadata without ZooKeeper (KRaft mode).

12
Q

What is a topic in Kafka?

A

A store of related events, organized like a log.

13
Q

Is there a limit on the number of topics?

A

There’s no theoretical limit on the number of topics, but there is a practical limit on the number of partitions.

14
Q

What is a partition in Kafka?

A

Partitions are the chunks of data that make up a topic. They can be stored on different brokers, and each partition can be replicated across brokers for durability.

15
Q

How is data written to topics?

A

Data is always appended to the end of the “log” (each partition of the topic) and is immutable.

16
Q

What are the semantics of writing data to Kafka?

A
  • You always write at the end of the log
  • Data is immutable
  • Data can expire (retention policy)
  • Each event gets a sequential offset number (per partition)
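
A toy Java sketch of these write semantics, assuming an in-memory list stands in for one partition (the PartitionLog class is hypothetical, not Kafka's storage code): appends always go to the end, offsets are sequential, and existing events cannot be changed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model of one partition: an append-only list of events.
final class PartitionLog {
    private final List<String> events = new ArrayList<>();

    // New events always go to the end; the returned offset is the event's
    // position, so offsets are sequential: 0, 1, 2, ...
    long append(String event) {
        events.add(event);
        return events.size() - 1;
    }

    // Reading by offset leaves the log unchanged.
    String read(long offset) {
        return events.get(Math.toIntExact(offset));
    }

    // Callers only get a read-only view, so stored events cannot be modified.
    List<String> view() {
        return Collections.unmodifiableList(events);
    }
}
```

Expiration is not modeled here; in Kafka it is governed by the retention settings.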
17
Q

What are the semantics of reading data from Kafka?

A
  • Reading doesn’t remove or destroy the data.
  • Consumers read independently, each from its own offset.
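
A minimal in-memory sketch of these reading semantics (class and method names are hypothetical): two consumers walk the same log at their own pace, each tracking its own position, and nothing is removed by reading.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model: each consumer keeps its own position in a shared log.
final class IndependentReaders {
    private final List<String> log;
    private final Map<String, Integer> positions = new HashMap<>();

    IndependentReaders(List<String> events) {
        this.log = List.copyOf(events); // reads never modify this list
    }

    // Returns the next event for this consumer, or null once it has caught up.
    String poll(String consumer) {
        int pos = positions.getOrDefault(consumer, 0);
        if (pos >= log.size()) {
            return null;
        }
        positions.put(consumer, pos + 1); // only this consumer's position moves
        return log.get(pos);
    }

    int position(String consumer) {
        return positions.getOrDefault(consumer, 0);
    }

    int size() {
        return log.size();
    }
}
```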

18
Q

What is the structure of a Kafka record?

A
  • key
  • value
  • timestamp
  • optional headers
19
Q

Which language has native client support for producers/consumers?

A

Java.

20
Q

What is the role of the key in a Kafka record?

A

If a record has a key, the key is hashed to pick the partition, so all records with the same key land in the same partition. Kafka only guarantees order within a partition, so this keeps records with the same key in order.
Use keys when the ordering of the data is important.
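
A sketch of key-based partition selection. Kafka's default partitioner hashes the serialized key with murmur2; to keep this self-contained, Arrays.hashCode is swapped in, so the partition numbers will not match a real cluster. The KeyPartitioner class is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: same key -> same hash -> same partition.
final class KeyPartitioner {
    static int partitionFor(String key, int numPartitions) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8); // stand-in for a key serializer
        int hash = Arrays.hashCode(bytes);                   // Kafka uses murmur2 instead
        // Clear the sign bit so the result is non-negative, then map onto a partition.
        return (hash & 0x7fffffff) % numPartitions;
    }
}
```

Because the mapping is hash modulo the partition count, adding partitions to a topic later changes where existing keys land, which can break per-key ordering across that change.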

21
Q

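What happens when you don’t specify a key for a record?

A

It’s spread across partitions in a round-robin fashion (newer clients instead use a “sticky” strategy that fills a batch for one partition before moving to the next).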
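
A toy round-robin partitioner for keyless records, following the card's description (the class name is hypothetical): each new record goes to the next partition in turn.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical toy partitioner for records without a key.
final class RoundRobinPartitioner {
    private final AtomicInteger counter = new AtomicInteger();
    private final int numPartitions;

    RoundRobinPartitioner(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    int next() {
        // floorMod keeps the result non-negative even after the counter overflows.
        return Math.floorMod(counter.getAndIncrement(), numPartitions);
    }
}
```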

22
Q

What is the consumer offsets topic?

A

A special internal topic (__consumer_offsets) that stores the offsets each consumer group has committed, so its consumers can resume where they left off.
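
A toy stand-in for the offsets topic (names are hypothetical): committed offsets are keyed by group, topic and partition, and the latest commit for a key wins. This sketch resumes from 0 when nothing was committed; a real consumer falls back to its auto.offset.reset setting instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of committed offsets per (group, topic, partition).
final class OffsetStore {
    record Key(String group, String topic, int partition) {}

    private final Map<Key, Long> committed = new HashMap<>();

    // A later commit for the same key overwrites the earlier one.
    void commit(String group, String topic, int partition, long nextOffset) {
        committed.put(new Key(group, topic, partition), nextOffset);
    }

    // Where a restarted consumer of this group resumes (0 if nothing committed, in this toy).
    long resumeFrom(String group, String topic, int partition) {
        return committed.getOrDefault(new Key(group, topic, partition), 0L);
    }
}
```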

23
Q

What is a consumer group?

A

A way to scale out and group consumers that do the same job: a topic’s partitions are divided among the consumers in the group.

A consumer on its own forms a consumer group of one.
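
A sketch of how a group spreads a topic's partitions over its members, using simple round-robin. Real Kafka uses pluggable assignors (such as range, round-robin and sticky); the GroupAssignment class here is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical round-robin assignment of partitions to the consumers of one group.
final class GroupAssignment {
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> result = new TreeMap<>();
        for (String c : consumers) {
            result.put(c, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            // Each partition goes to exactly one consumer in the group.
            String owner = consumers.get(p % consumers.size());
            result.get(owner).add(p);
        }
        return result;
    }
}
```

With more consumers than partitions, the extra consumers get nothing and sit idle, so the partition count caps a group's parallelism.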

24
Q

How do you subscribe to multiple topics at once?

A

Either specify a list of names or a regular expression to match the names.
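
The Java KafkaConsumer's subscribe method accepts either a Collection of topic names or a java.util.regex.Pattern. The sketch below shows only the matching step over a hypothetical list of topic names; the TopicMatcher class is made up for illustration, and a real consumer matches against the cluster's topics.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: keep every topic whose full name matches the pattern.
final class TopicMatcher {
    static List<String> matching(List<String> topics, Pattern pattern) {
        return topics.stream()
                .filter(t -> pattern.matcher(t).matches())
                .toList();
    }
}
```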

25
Q

What is the code architecture of a producer?

A
  • KafkaProducer class
  • Server list (bootstrap.servers)
  • Serializers for keys/values
  • ProducerRecord(topic, key, value)
  • producer.send(record)
26
Q

What is the code architecture of a consumer?

A
  • Server list (bootstrap.servers)
  • Group id (group.id)
  • Deserializers for keys/values
  • OnMessage()
  • OnError()
  • OnConsumerError()
  • consumer.subscribe("topic")
  • while (true) { consumer.poll() }
27
Q

How long is topic data kept by default?

A

1 week (7 days, i.e. log.retention.hours=168)

28
Q

How can you set the data retention policy?

A
  • Globally (broker-wide default, e.g. log.retention.hours)
  • Per topic (overriding the default, e.g. retention.ms)
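
A toy sketch of time-based retention with the 7-day default (class and record names are hypothetical). Real Kafka deletes whole log segments once they are past retention rather than filtering individual records, so this only shows the cutoff arithmetic: 7 days = 168 hours = 604,800,000 ms.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical toy model of time-based retention.
final class Retention {
    static final Duration DEFAULT = Duration.ofDays(7); // the 1-week default

    record Event(String value, Instant timestamp) {}

    // Keep only events newer than (now - retention); a per-topic override
    // would simply pass a different Duration.
    static List<Event> retained(List<Event> events, Duration retention, Instant now) {
        Instant cutoff = now.minus(retention);
        return events.stream()
                .filter(e -> e.timestamp().isAfter(cutoff))
                .toList();
    }
}
```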

29
Q

What delivery guarantees can Kafka provide?

A
  • At most once
  • At least once
  • Exactly once
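
A toy simulation (hypothetical names) of how the order of "commit offset" and "process message" produces the first two guarantees when a consumer crashes once and restarts from its last committed offset. Exactly once requires transactions and is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation: the consumer crashes once while handling message
// `crashAt`, then restarts from its last committed offset.
final class DeliverySim {
    enum Mode { AT_MOST_ONCE, AT_LEAST_ONCE }

    static List<String> run(List<String> messages, Mode mode, int crashAt) {
        List<String> processed = new ArrayList<>();
        int committed = 0; // offset a restarted consumer resumes from
        int offset = 0;
        boolean crashed = false;
        while (offset < messages.size()) {
            boolean crashNow = !crashed && offset == crashAt;
            if (mode == Mode.AT_MOST_ONCE) {
                committed = offset + 1;              // commit BEFORE processing
                if (crashNow) {                      // crash: this message is lost
                    crashed = true;
                    offset = committed;
                    continue;
                }
                processed.add(messages.get(offset));
            } else {
                processed.add(messages.get(offset)); // process BEFORE committing
                if (crashNow) {                      // crash: this message is redelivered
                    crashed = true;
                    offset = committed;
                    continue;
                }
                committed = offset + 1;
            }
            offset++;
        }
        return processed;
    }
}
```

With messages a, b, c and a crash on b, at least once processes a, b, b, c (a duplicate), while at most once processes a, c (b is lost).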
30
Q

What’s a good candidate for “at most once” processing?

A

Data you can afford to lose, where low latency matters more. Example: non-critical logs.

31
Q

What’s a good candidate for “at least once” processing?

A

Idempotent producers/consumers, where an occasional duplicate slipping through does no harm.
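
A minimal sketch of an idempotent consumer, assuming every message carries a unique ID (an assumption of this example, not something Kafka adds for you): already-seen IDs are skipped, so a redelivered duplicate changes nothing. The class name is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical idempotent consumer keyed on an assumed per-message ID.
final class IdempotentBalance {
    private final Set<String> seen = new HashSet<>();
    private long balance = 0;

    void apply(String messageId, long amount) {
        if (!seen.add(messageId)) {
            return; // duplicate delivery: already applied, ignore it
        }
        balance += amount;
    }

    long balance() {
        return balance;
    }
}
```

The seen-ID set lives in memory here; a real service would need to store it durably so duplicates are still detected after a restart.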

32
Q

What are the semantics of “exactly once” processing?

A
  • Strong transactional guarantees
  • A guarantee that each message is processed a single time

33
Q

What is a compacted topic?

A

A topic configured (with cleanup.policy=compact) to keep only the latest event for each key, for when you don’t care about earlier values.
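
A toy sketch of what compaction leaves behind (names are hypothetical): only the latest entry per key survives, in log order. Real Kafka compacts in the background and keeps the original offsets of surviving records, which this list-based model ignores.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of log compaction.
final class Compaction {
    record Entry(String key, String value) {}

    static List<Entry> compact(List<Entry> log) {
        // First pass: remember the position of the last entry for every key.
        Map<String, Integer> lastIndex = new HashMap<>();
        for (int i = 0; i < log.size(); i++) {
            lastIndex.put(log.get(i).key(), i);
        }
        // Second pass: keep only those last entries, in their original order.
        List<Entry> out = new ArrayList<>();
        for (int i = 0; i < log.size(); i++) {
            if (lastIndex.get(log.get(i).key()) == i) {
                out.add(log.get(i));
            }
        }
        return out;
    }
}
```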

34
Q

What is Kafka Connect?

A

It’s a pluggable framework for integrating Kafka with external systems such as Elasticsearch, Cassandra, MySQL, etc., using ready-made connectors instead of custom producer/consumer code.

35
Q

How does Kafka Connect run?

A

It runs as its own service, separate from the Kafka cluster (roughly the way Kibana runs separately from Elasticsearch).

36
Q

Does Kafka support an HTTP REST API?

A

Yes, via a separate REST Proxy service (for example, Confluent REST Proxy); the brokers themselves don’t speak HTTP.

37
Q

What is the Confluent Schema Registry?

A

It’s Confluent’s solution for versioning data schemas: it stores the schemas that producers and consumers use and manages schema evolution (migrations) so everyone stays in sync.

38
Q

How is the Schema Registry run?

A

It runs as a separate application, like Kafka Connect.

39
Q

What is Avro?

A

Apache Avro is a data serialization system: schemas are defined in JSON, and the data is encoded in a compact binary format.

40
Q

What is ksqlDB?

A

It’s a streaming database with an SQL-like language for operating on streams of data (for example, aggregations), so you don’t have to write a consumer for simple tasks.

41
Q

How do ksqlDB queries work?

A

They run continuously as persistent queries, writing each updated result to an output topic.
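
A toy Java model of a persistent query such as a running COUNT(*) ... GROUP BY (not how ksqlDB is implemented; names are hypothetical): every incoming event updates the aggregate and emits the new value to a list standing in for the result topic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of a continuously running aggregation.
final class RunningCount {
    private final Map<String, Long> counts = new HashMap<>();
    final List<String> outputTopic = new ArrayList<>(); // stand-in for the result topic

    void onEvent(String key) {
        long updated = counts.merge(key, 1L, Long::sum); // update the aggregate
        outputTopic.add(key + "=" + updated);            // emit one result per update
    }
}
```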

42
Q

What is Kafka Streams?

A

It’s a Java library for higher-level processing of topics, with features such as exactly-once semantics, aggregations, filtering, and support for building microservices.