Avro Flashcards
Why Avro for Kafka?
Avro relies on a schema, so every field is explicitly typed and can be documented.
Avro is a compact binary format, so it takes less space both on the wire and on disk.
It has support for a variety of programming languages.
Avro data is always accompanied by the schema used to serialize it (in the simplest setup, the schema travels with every message). That means that when you're reading messages, you always know how to deserialize them, even if the schema has changed.
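To make the compactness point concrete, here is a minimal sketch of how Avro's binary encoding lays out a record, assuming a record with the fields studentName, studentId and age (the Student sample used later in these notes). The helper names are hypothetical, and a real application would use an Avro library rather than hand-rolled encoding.

```python
import json


def encode_long(n: int) -> bytes:
    """Avro int/long: zig-zag the value, then write 7 bits per byte."""
    z = (n << 1) ^ (n >> 63)  # zig-zag maps small negatives to small positives
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_student(student: dict) -> bytes:
    """Record: field values in schema order, with no names or separators."""
    return (
        encode_string(student["studentName"])
        + encode_string(student["studentId"])
        + encode_long(student["age"])
    )


student = {"studentName": "Alice", "studentId": "S1", "age": 21}
binary = encode_student(student)
as_json = json.dumps(student).encode("utf-8")
print(len(binary), "bytes as Avro binary vs", len(as_json), "bytes as JSON")
```

The binary form carries no field names, which is why a reader cannot make sense of it without the schema.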
Disadvantages of Avro for Kafka
When every Avro message carries the schema used to serialize it, the schema is repeated in each record and can easily be larger than the data itself.
Separating Schema from Message
That's where a Schema Registry comes into play. Schema Registry is developed by Confluent, the company founded by the original creators of Apache Kafka, and it provides a RESTful interface for storing and retrieving Avro schemas.
Instead of sending the schema inside a Kafka record, a producer starts by checking whether the schema already exists in the Schema Registry. If not, it writes the schema there (step 1 in the diagram below). The producer then obtains the id of the schema (step 2) and sends that id inside the record (step 3), which saves a lot of space. The consumer reads the message (step 4), contacts the Schema Registry with the schema id from the record to get the full schema (step 5), and caches it locally.
https://i0.wp.com/codingharbour.com/wp-content/uploads/2020/03/schema_registry.jpg?resize=650%2C324&ssl=1
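The five steps can be sketched as follows. This is only an illustration: InMemoryRegistry is a hypothetical stand-in that makes no network calls, and the function names are made up for this example. The framing itself (one zero "magic" byte followed by a 4-byte big-endian schema id, then the Avro payload) is the layout used by Confluent's Avro serializers.

```python
import json
import struct


class InMemoryRegistry:
    """Hypothetical stand-in for a Schema Registry (no REST calls)."""

    def __init__(self):
        self._ids = {}      # schema text -> id
        self._schemas = {}  # id -> schema text

    def register(self, schema: dict) -> int:
        # Steps 1 and 2: store the schema if it is new, return its id.
        text = json.dumps(schema, sort_keys=True)
        if text not in self._ids:
            new_id = len(self._ids) + 1
            self._ids[text] = new_id
            self._schemas[new_id] = text
        return self._ids[text]

    def get(self, schema_id: int) -> dict:
        # Step 5: look up the full schema by id.
        return json.loads(self._schemas[schema_id])


MAGIC_BYTE = 0


def frame(schema_id: int, payload: bytes) -> bytes:
    """Step 3: prefix the payload with the magic byte and the schema id."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def unframe(record: bytes):
    """Step 4: split a record into schema id and payload."""
    magic, schema_id = struct.unpack(">bI", record[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unknown magic byte")
    return schema_id, record[5:]


schema_cache = {}  # consumer-side cache: id -> schema


def consume(record: bytes, registry: InMemoryRegistry):
    schema_id, payload = unframe(record)
    if schema_id not in schema_cache:
        schema_cache[schema_id] = registry.get(schema_id)
    return schema_cache[schema_id], payload


registry = InMemoryRegistry()
user_schema = {"type": "record", "name": "User",
               "fields": [{"name": "name", "type": "string"}]}
record = frame(registry.register(user_schema), b"\x0aAlice")
schema, payload = consume(record, registry)
```

Each record now spends 5 bytes on schema identification instead of carrying the whole schema.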
Schema Creation
Avro describes its schema in JSON. A record schema has four main attributes:
Type - the kind of schema, either a complex type (such as record) or a primitive type
Namespace - the namespace the schema belongs to
Name - the name of the schema
Fields - the fields of the record. Each field can be of a primitive or a complex type.
Sample avsc file
{
  "namespace": "com.mailshine.springboot.kafka.avro.model",
  "type": "record",
  "name": "Student",
  "fields": [
    {
      "name": "studentName",
      "type": "string"
    },
    {
      "name": "studentId",
      "type": "string"
    },
    {
      "name": "age",
      "type": "int"
    }
  ]
}
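Because an avsc file is plain JSON, the four attributes can be inspected with any JSON parser. The snippet below is a small sketch using only Python's standard library. It does not validate the schema the way an Avro library would, and the variable names are illustrative.

```python
import json

AVSC = """
{
  "namespace": "com.mailshine.springboot.kafka.avro.model",
  "type": "record",
  "name": "Student",
  "fields": [
    {"name": "studentName", "type": "string"},
    {"name": "studentId", "type": "string"},
    {"name": "age", "type": "int"}
  ]
}
"""

schema = json.loads(AVSC)

# The full name of a named type is its namespace plus its name.
full_name = f'{schema["namespace"]}.{schema["name"]}'

# Map each field name to its declared type.
field_types = {field["name"]: field["type"] for field in schema["fields"]}

print(schema["type"], full_name, field_types)
```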
Avro Example 2
{
  "type": "record",
  "name": "thecodebuzz_schema",
  "namespace": "thecodebuzz.avro",
  "fields": [
    {
      "name": "username",
      "type": "string",
      "doc": "Name of the user account on Thecodebuzz.com"
    },
    {
      "name": "email",
      "type": "string",
      "doc": "The email of the user logging message on the blog"
    },
    {
      "name": "timestamp",
      "type": "long",
      "doc": "time in seconds"
    }
  ],
  "doc": "A basic schema for storing thecodebuzz blogs messages"
}
Schema Evolution with example
Initial Schema
{
  "type": "record",
  "name": "User",
  "fields": [
    { "name": "name", "type": "string" }
  ]
}
Updated Schema
{
  "type": "record",
  "name": "User",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "age", "type": "int", "default": 0 }
  ]
}
Older data written without age can still be read with the updated schema: the reader fills in the default value of 0.
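The default-filling rule can be sketched in a few lines. This is a simplified illustration of how a reader schema is applied to an already-decoded record, not the full Avro schema resolution algorithm, and resolve is a hypothetical helper name.

```python
def resolve(record: dict, reader_schema: dict) -> dict:
    """Project a decoded record onto the reader's schema.

    Fields missing from the data take the reader's default; fields the
    reader does not know about are dropped.
    """
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return resolved


updated_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int", "default": 0},
    ],
}

old_record = {"name": "Alice"}  # written with the initial schema
print(resolve(old_record, updated_schema))
```

Without the default on age, the same call would raise, which is why new fields are normally added with a default.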
Complex Types
record - Collection of named fields
enum - A set of predefined symbols
array - A list of values of specified type
map - A collection of key-value pairs with string keys and values of a specified type
union - A field that can hold one of multiple types
fixed - A fixed size byte array
Logical Types
Logical types add semantic meaning to primitive types (e.g., date, time, decimal).
Union Example
{
  "type": "record",
  "name": "User",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "age", "type": "int" },
    { "name": "email", "type": ["null", "string"], "default": null }
  ]
}
Fields:
name: Mandatory string.
age: Mandatory integer.
email: Optional string with a default value of null (union type).
{ "name": "Alice", "age": 25 }
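A union value has to match exactly one of the listed branches, and Avro's binary encoding records which one by writing the zero-based position of the branch before the value. The sketch below picks that branch for the ["null", "string"] union above. It covers only the three primitive types used in this example, and union_branch is a hypothetical helper.

```python
def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool)


CHECKS = {
    "null": lambda value: value is None,
    "string": lambda value: isinstance(value, str),
    "int": _is_int,
}


def union_branch(value, branches: list) -> int:
    """Return the position of the first union branch the value matches."""
    for index, branch in enumerate(branches):
        if CHECKS[branch](value):
            return index
    raise TypeError(f"{value!r} matches none of {branches}")


EMAIL_TYPE = ["null", "string"]
print(union_branch(None, EMAIL_TYPE))                 # branch of a missing email
print(union_branch("alice@example.com", EMAIL_TYPE))  # branch of a present email
```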
Enum Example
{
  "type": "record",
  "name": "Order",
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "status", "type": { "type": "enum", "name": "Status", "symbols": ["PENDING", "SHIPPED", "DELIVERED"] } }
  ]
}
{ "id": 101, "status": "PENDING" }
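In the binary encoding an enum value is stored as the zero-based position of its symbol rather than as text. A minimal sketch of that mapping for the Status enum above, with hypothetical helper names:

```python
STATUS = {
    "type": "enum",
    "name": "Status",
    "symbols": ["PENDING", "SHIPPED", "DELIVERED"],
}


def enum_index(symbol: str, enum_schema: dict) -> int:
    """Position of the symbol in the schema; rejects unknown symbols."""
    symbols = enum_schema["symbols"]
    if symbol not in symbols:
        raise ValueError(f"{symbol!r} is not a symbol of {enum_schema['name']}")
    return symbols.index(symbol)


def enum_symbol(index: int, enum_schema: dict) -> str:
    """Inverse lookup, as a reader would do with the writer's schema."""
    return enum_schema["symbols"][index]


order = {"id": 101, "status": "PENDING"}
position = enum_index(order["status"], STATUS)
print(position, enum_symbol(position, STATUS))
```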
Arrays Example
{
  "type": "record",
  "name": "ShoppingCart",
  "fields": [
    { "name": "items", "type": { "type": "array", "items": "string" } }
  ]
}
{ "items": ["book", "pen", "notebook"] }
Map Example
{
  "type": "record",
  "name": "Metadata",
  "fields": [
    { "name": "properties", "type": { "type": "map", "values": "string" } }
  ]
}
{ "properties": { "color": "red", "size": "medium" } }
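Arrays and maps both constrain every element to a single declared type: "items" for arrays, "values" for maps (map keys are always strings). The following sketch checks a Python value against such a schema. It handles only string, array and map, which is enough for the two examples above, and conforms is a hypothetical helper rather than a library function.

```python
def conforms(value, schema) -> bool:
    """Check a value against a string, array, or map schema."""
    if schema == "string":
        return isinstance(value, str)
    if isinstance(schema, dict) and schema.get("type") == "array":
        return isinstance(value, list) and all(
            conforms(item, schema["items"]) for item in value
        )
    if isinstance(schema, dict) and schema.get("type") == "map":
        return isinstance(value, dict) and all(
            isinstance(key, str) and conforms(item, schema["values"])
            for key, item in value.items()
        )
    raise NotImplementedError(f"unsupported schema: {schema!r}")


ITEMS = {"type": "array", "items": "string"}
PROPERTIES = {"type": "map", "values": "string"}

print(conforms(["book", "pen", "notebook"], ITEMS))
print(conforms({"color": "red", "size": "medium"}, PROPERTIES))
```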
Logical Type
Logical types extend primitive types with additional semantics.
{
  "type": "record",
  "name": "Event",
  "fields": [
    { "name": "timestamp", "type": { "type": "long", "logicalType": "timestamp-millis" } }
  ]
}
{ "timestamp": 1625841723000 }
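On the wire the value is still a plain long. The logical type only tells the reader how to interpret it: timestamp-millis counts milliseconds since the Unix epoch (1970-01-01 UTC). A small sketch of the conversion in both directions, with hypothetical helper names:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_timestamp_millis(millis: int) -> datetime:
    """Interpret a timestamp-millis long as a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


def to_timestamp_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime back to the stored long."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


event = {"timestamp": 1625841723000}
when = from_timestamp_millis(event["timestamp"])
print(when.isoformat())
```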
Nested Record
{
  "type": "record",
  "name": "Employee",
  "fields": [
    {
      "name": "personalDetails",
      "type": {
        "type": "record",
        "name": "PersonalDetails",
        "fields": [
          { "name": "name", "type": "string" },
          { "name": "age", "type": "int" },
          { "name": "email", "type": ["null", "string"], "default": null }
        ]
      }
    },
    {
      "name": "jobDetails",
      "type": {
        "type": "record",
        "name": "JobDetails",
        "fields": [
          { "name": "designation", "type": "string" },
          { "name": "salary", "type": "double" },
          { "name": "department", "type": "string" }
        ]
      }
    }
  ]
}
Nested records with referencing
Address is defined first, then Person refers to it by name.
[
  {
    "type": "record",
    "name": "Address",
    "fields": [
      { "name": "street", "type": "string" },
      { "name": "city", "type": "string" },
      { "name": "state", "type": "string" },
      { "name": "zipCode", "type": "string" }
    ]
  },
  {
    "type": "record",
    "name": "Person",
    "fields": [
      { "name": "name", "type": "string" },
      { "name": "age", "type": "int" },
      { "name": "address", "type": "Address" }
    ]
  }
]
Example
{
  "name": "John Doe",
  "age": 35,
  "address": {
    "street": "123 Main St",
    "city": "Springfield",
    "state": "IL",
    "zipCode": "62701"
  }
}
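Referencing a record by name only works if the parser has already seen its definition. The sketch below walks a list of record schemas in order and checks that every non-primitive type name was defined earlier. It is a deliberately simplified illustration (flat names only, no namespaces, no nested inline definitions), and link_named_types is a hypothetical helper.

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def link_named_types(schemas: list) -> dict:
    """Return {name: schema}, failing on references to undefined names."""
    known = {}
    for schema in schemas:
        known[schema["name"]] = schema  # register first, so self-references work
        for field in schema["fields"]:
            field_type = field["type"]
            if isinstance(field_type, str) and field_type not in PRIMITIVES:
                if field_type not in known:
                    raise KeyError(
                        f"{schema['name']}.{field['name']} refers to "
                        f"undefined type {field_type!r}"
                    )
    return known


address = {"type": "record", "name": "Address", "fields": [
    {"name": "street", "type": "string"},
    {"name": "city", "type": "string"},
    {"name": "state", "type": "string"},
    {"name": "zipCode", "type": "string"},
]}
person = {"type": "record", "name": "Person", "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "address", "type": "Address"},
]}

known = link_named_types([address, person])
address_fields = [field["name"] for field in known["Address"]["fields"]]
print(address_fields)
```

Passing the schemas in the opposite order raises, because Person would mention Address before it is defined.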