Avro Flashcards

1
Q

Why Avro for Kafka?

A

Avro relies on a schema, so every field is explicitly described and can be documented.
The Avro data format is a compact binary format, so it takes less space both on the wire and on disk.
It has support for a variety of programming languages.
Avro data is always written with a schema, and the plain approach is to ship that schema with every message. When you read a message you therefore always know how to deserialize it, even if the schema has changed since.
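
To make the "compact binary format" point concrete, here is a minimal sketch of how Avro's binary encoding lays out a record with a string field and an int field: values are written in schema order, with no field names, ints as zigzag varints and strings as a length followed by UTF-8 bytes. The helper names (zigzag_varint, encode_string, encode_user) are made up for this card, and the code covers only these two types.

```python
import json

def zigzag_varint(n: int) -> bytes:
    """Encode an int the way Avro encodes int/long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small negatives become small unsigned values
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_string(s: str) -> bytes:
    """A string is its byte length (as a zigzag varint) followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data

def encode_user(name: str, age: int) -> bytes:
    """Fields are written in schema order; names and delimiters are not stored."""
    return encode_string(name) + zigzag_varint(age)

avro_bytes = encode_user("Alice", 25)
json_bytes = json.dumps({"name": "Alice", "age": 25}).encode("utf-8")
print(len(avro_bytes), "bytes as Avro binary vs", len(json_bytes), "bytes as JSON")
```

The size difference comes from leaving out field names and punctuation, which is only possible because the reader has the schema.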

2
Q

Disadvantages of Avro for Kafka

A

Every Avro message contains the schema used to serialize it, so the same schema is repeated in every record and inflates the message size.

3
Q

Separating Schema from Message

A

That's where a Schema Registry comes into play. Schema Registry is developed by Confluent, the company founded by the original creators of Apache Kafka, and it provides a RESTful interface for storing and retrieving Avro schemas.
Instead of sending the schema inside a Kafka record, a producer starts by checking whether the schema already exists in the Schema Registry. If not, it writes the schema there (step 1 below). The producer then obtains the id of the schema (step 2) and sends that id inside the record (step 3), which saves a lot of space. The consumer reads the message (step 4), contacts the Schema Registry with the schema id from the record to get the full schema (step 5), and caches it locally.

https://i0.wp.com/codingharbour.com/wp-content/uploads/2020/03/schema_registry.jpg?resize=650%2C324&ssl=1
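
The five steps can be sketched without a real registry. Everything below is an assumption for illustration: InMemoryRegistry is a hypothetical stand-in for the REST service, and frame/unframe follow the framing commonly documented for Confluent's serializers (a zero magic byte, a 4-byte big-endian schema id, then the Avro payload).

```python
import json
import struct

class InMemoryRegistry:
    """Hypothetical stand-in for a schema registry: schema -> id and id -> schema."""

    def __init__(self):
        self._ids = {}
        self._schemas = {}

    def register(self, schema: dict) -> int:
        # Steps 1-2: store the schema if it is new, then return its id.
        key = json.dumps(schema, sort_keys=True)
        if key not in self._ids:
            new_id = len(self._ids) + 1
            self._ids[key] = new_id
            self._schemas[new_id] = schema
        return self._ids[key]

    def get_schema(self, schema_id: int) -> dict:
        # Step 5: look the schema up by the id found in the record.
        return self._schemas[schema_id]

def frame(schema_id: int, payload: bytes) -> bytes:
    """Step 3: prefix the payload with a magic byte and the schema id."""
    return struct.pack(">bI", 0, schema_id) + payload

def unframe(record: bytes):
    """Step 4: split a record back into schema id and payload."""
    magic, schema_id = struct.unpack(">bI", record[:5])
    if magic != 0:
        raise ValueError("unexpected magic byte")
    return schema_id, record[5:]

registry = InMemoryRegistry()
user_schema = {"type": "record", "name": "User",
               "fields": [{"name": "name", "type": "string"}]}

schema_id = registry.register(user_schema)      # producer side
record = frame(schema_id, b"\x0aAlice")          # 5 bytes of header, not a full schema
read_id, payload = unframe(record)               # consumer side
reader_schema = registry.get_schema(read_id)
```

A real consumer would cache the result of get_schema per id, so the registry is contacted once per schema and not once per message.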

4
Q

Schema Creation

A

Avro describes its schema in JSON. A record schema has four main attributes:

Type - the kind of schema, either a complex type (such as record) or a primitive type
Namespace - the namespace the schema belongs to
Name - the name of the schema
Fields - the fields that make up the schema. Each field can be of a primitive or a complex type.
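
A small sketch of the four attributes in use: the schema is plain JSON, so it can be parsed with a standard JSON library, and namespace plus name together give the schema's full name. The namespace com.example.school and the helper full_name are made up for this card.

```python
import json

schema_text = """
{
  "type": "record",
  "namespace": "com.example.school",
  "name": "Student",
  "fields": [
    { "name": "studentName", "type": "string" },
    { "name": "age", "type": "int" }
  ]
}
"""

def full_name(schema: dict) -> str:
    """Join namespace and name with a dot; the namespace may be absent."""
    namespace = schema.get("namespace")
    return f"{namespace}.{schema['name']}" if namespace else schema["name"]

schema = json.loads(schema_text)
field_names = [field["name"] for field in schema["fields"]]
print(schema["type"], full_name(schema), field_names)
```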

5
Q

Sample avsc file

A

{
  "namespace": "com.mailshine.springboot.kafka.avro.model",
  "type": "record",
  "name": "Student",
  "fields": [
    {
      "name": "studentName",
      "type": "string"
    },
    {
      "name": "studentId",
      "type": "string"
    },
    {
      "name": "age",
      "type": "int"
    }
  ]
}

6
Q

Avro Example 2

A

{
  "type": "record",
  "name": "thecodebuzz_schema",
  "namespace": "thecodebuzz.avro",
  "fields": [
    {
      "name": "username",
      "type": "string",
      "doc": "Name of the user account on Thecodebuzz.com"
    },
    {
      "name": "email",
      "type": "string",
      "doc": "The email of the user logging message on the blog"
    },
    {
      "name": "timestamp",
      "type": "long",
      "doc": "time in seconds"
    }
  ],
  "doc": "A basic schema for storing thecodebuzz blogs messages"
}

7
Q

Schema Evolution with example

A

Initial Schema
{
  "type": "record",
  "name": "User",
  "fields": [
    { "name": "name", "type": "string" }
  ]
}

Updated Schema
{
  "type": "record",
  "name": "User",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "age", "type": "int", "default": 0 }
  ]
}

Older data without age can still be read with the updated schema: age takes its default value of 0.
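
The default-filling rule can be sketched in a few lines. This is a simplified illustration, not Avro's full schema resolution: it works on already-decoded dicts, ignores type promotion and aliases, and read_with_schema is a name made up for this card.

```python
initial_schema = {
    "type": "record", "name": "User",
    "fields": [{"name": "name", "type": "string"}],
}
updated_schema = {
    "type": "record", "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int", "default": 0},
    ],
}

def read_with_schema(record: dict, reader_schema: dict) -> dict:
    """Return the record as the reader schema sees it, filling gaps from defaults."""
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            result[name] = record[name]
        elif "default" in field:
            result[name] = field["default"]  # field added later: use its default
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return result

old_record = {"name": "Alice"}  # written with the initial schema
print(read_with_schema(old_record, updated_schema))
```

Without the "default" on age, the same call would fail, which is why new fields are normally added with a default.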

8
Q

Complex Types

A

record - Collection of named fields
enum - A set of predefined symbols
array - A list of values of specified type
map - A collection of key-value pairs with string keys and values of a specified type
union - A field that can hold one of multiple types
fixed - A fixed size byte array

9
Q

Logical Types

A

Logical types add semantic meaning to primitive types (e.g., date, time, decimal).

10
Q

Union Example

A

{
  "type": "record",
  "name": "User",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "age", "type": "int" },
    { "name": "email", "type": ["null", "string"], "default": null }
  ]
}

Fields:
name: Mandatory string.
age: Mandatory integer.
email: Optional string with a default value of null (union type).

Example (email left unset)
{ "name": "Alice", "age": 25 }
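
A union accepts a value if any one of its branches accepts it. The sketch below checks that rule for the three primitive types used in this card only; matches is a name made up for illustration and is not part of any Avro library.

```python
def matches(value, avro_type) -> bool:
    """Return True if value fits avro_type (null, string, int, or a union of them)."""
    if isinstance(avro_type, list):  # a JSON array in a schema denotes a union
        return any(matches(value, branch) for branch in avro_type)
    if avro_type == "null":
        return value is None
    if avro_type == "string":
        return isinstance(value, str)
    if avro_type == "int":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, int) and not isinstance(value, bool)
    raise ValueError(f"type not covered by this sketch: {avro_type!r}")

email_type = ["null", "string"]
print(matches(None, email_type), matches("alice@example.com", email_type), matches(42, email_type))
```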

11
Q

Enum Example

A

{
  "type": "record",
  "name": "Order",
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "status", "type": { "type": "enum", "name": "Status", "symbols": ["PENDING", "SHIPPED", "DELIVERED"] } }
  ]
}

Example
{ "id": 101, "status": "PENDING" }
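
In Avro's binary encoding an enum value is stored as the zero-based position of its symbol in the schema, not as the symbol text. A minimal sketch of that lookup, with enum_index as a made-up helper name:

```python
STATUS_SYMBOLS = ["PENDING", "SHIPPED", "DELIVERED"]

def enum_index(symbol: str, symbols: list) -> int:
    """Return the zero-based position of symbol, rejecting unknown symbols."""
    try:
        return symbols.index(symbol)
    except ValueError:
        raise ValueError(f"{symbol!r} is not one of {symbols}") from None

print(enum_index("PENDING", STATUS_SYMBOLS), enum_index("DELIVERED", STATUS_SYMBOLS))
```

Because the position is what gets written, reordering the symbols changes how existing data is interpreted.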

12
Q

Arrays Example

A

{
  "type": "record",
  "name": "ShoppingCart",
  "fields": [
    { "name": "items", "type": { "type": "array", "items": "string" } }
  ]
}

Example
{ "items": ["book", "pen", "notebook"] }

13
Q

Map Example

A

{
  "type": "record",
  "name": "Metadata",
  "fields": [
    { "name": "properties", "type": { "type": "map", "values": "string" } }
  ]
}

Example
{ "properties": { "color": "red", "size": "medium" } }

14
Q

Logical Type

A

Logical types extend primitive types with additional semantics.

{
  "type": "record",
  "name": "Event",
  "fields": [
    { "name": "timestamp", "type": { "type": "long", "logicalType": "timestamp-millis" } }
  ]
}

Example
{ "timestamp": 1625841723000 }
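
On the wire the field is still a plain long; the logical type only says how to interpret it, here as milliseconds since the Unix epoch in UTC. A minimal conversion sketch, with helper names made up for this card:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def from_timestamp_millis(value: int) -> datetime:
    """Interpret a timestamp-millis long as a UTC datetime."""
    return EPOCH + timedelta(milliseconds=value)

def to_timestamp_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime back to milliseconds since the epoch."""
    return (moment - EPOCH) // timedelta(milliseconds=1)

event_time = from_timestamp_millis(1625841723000)
print(event_time.isoformat())  # 2021-07-09T14:42:03+00:00
```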

15
Q

Nested Record

A

{
  "type": "record",
  "name": "Employee",
  "fields": [
    {
      "name": "personalDetails",
      "type": {
        "type": "record",
        "name": "PersonalDetails",
        "fields": [
          { "name": "name", "type": "string" },
          { "name": "age", "type": "int" },
          { "name": "email", "type": ["null", "string"], "default": null }
        ]
      }
    },
    {
      "name": "jobDetails",
      "type": {
        "type": "record",
        "name": "JobDetails",
        "fields": [
          { "name": "designation", "type": "string" },
          { "name": "salary", "type": "double" },
          { "name": "department", "type": "string" }
        ]
      }
    }
  ]
}

16
Q

Nested records with referencing

A

[
  {
    "type": "record",
    "name": "Address",
    "fields": [
      { "name": "street", "type": "string" },
      { "name": "city", "type": "string" },
      { "name": "state", "type": "string" },
      { "name": "zipCode", "type": "string" }
    ]
  },
  {
    "type": "record",
    "name": "Person",
    "fields": [
      { "name": "name", "type": "string" },
      { "name": "age", "type": "int" },
      { "name": "address", "type": "Address" }
    ]
  }
]

The two records are wrapped in a JSON array so the file is valid JSON. Address is defined first, so Person can refer to it by name.

Example
{
  "name": "John Doe",
  "age": 35,
  "address": {
    "street": "123 Main St",
    "city": "Springfield",
    "state": "IL",
    "zipCode": "62701"
  }
}
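
How a by-name reference such as "Address" gets resolved can be sketched as a single pass that remembers every record it has seen and substitutes the definition when the name appears later. This is an illustration only: resolve is a made-up helper, it handles records, unions and plain type names, and it ignores namespaces.

```python
def resolve(schema, named=None):
    """Replace references to previously defined records with their definitions."""
    named = {} if named is None else named
    if isinstance(schema, str):
        return named.get(schema, schema)  # known record name, else a primitive
    if isinstance(schema, list):
        return [resolve(branch, named) for branch in schema]
    if schema.get("type") == "record":
        resolved = {"type": "record", "name": schema["name"], "fields": []}
        named[schema["name"]] = resolved  # register before visiting the fields
        for field in schema["fields"]:
            resolved["fields"].append({**field, "type": resolve(field["type"], named)})
        return resolved
    return schema  # other complex types are left untouched in this sketch

schemas = [
    {"type": "record", "name": "Address",
     "fields": [{"name": "street", "type": "string"},
                {"name": "city", "type": "string"}]},
    {"type": "record", "name": "Person",
     "fields": [{"name": "name", "type": "string"},
                {"name": "address", "type": "Address"}]},
]

address_schema, person_schema = resolve(schemas)
print(person_schema["fields"][1]["type"]["name"])  # Address
```

Order matters here: if Person came before Address, the name would not be known yet and would be left as an unresolved string.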