Which security protocols are there?
- PLAINTEXT
- SSL
- SASL_PLAINTEXT
- SASL_SSL
broker.id
- Broker config
- General broker parameter
- integer
listeners
- Broker config
- General broker parameter
- comma-separated list of URIs
- URI look like:
<protocol>://<hostname>:<port> e.g. SSL://localhost:9091
What happens if a broker’s listener port is lower than 1024?
Kafka must be started as root
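As a minimal sketch of the listener URI format above, the following parses a `listeners` value and flags ports below 1024. The function names and the sample value are illustrative assumptions, not part of Kafka.

```python
def parse_listeners(value):
    """Split a comma-separated listeners string into (name, host, port) tuples."""
    parsed = []
    for uri in value.split(","):
        name, rest = uri.strip().split("://", 1)
        host, port = rest.rsplit(":", 1)
        parsed.append((name, host, int(port)))
    return parsed


def privileged(listeners):
    """Return the listeners whose port is below 1024 (these need root)."""
    return [entry for entry in listeners if entry[2] < 1024]


# Sample value chosen for illustration only.
listeners = parse_listeners("PLAINTEXT://localhost:9092,SSL://localhost:443")
print(listeners)
print(privileged(listeners))
```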
listener.security.protocol.map
- General broker parameter
- configured if a listener name is not a common security protocol
zookeeper.connect
- Broker config
- General broker parameter
- comma-separated list of hostname:port entries, optionally followed by /path
- path is an optional chroot path
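A small sketch of how such a connection string breaks down, assuming the comma-separated form `host1:port1,host2:port2/chroot` used in the Kafka documentation. The helper name and host names are hypothetical.

```python
def parse_zookeeper_connect(value):
    """Return (list of (host, port), chroot path or None)."""
    hosts_part, slash, path = value.partition("/")
    chroot = "/" + path if slash else None
    hosts = []
    for entry in hosts_part.split(","):
        host, port = entry.strip().rsplit(":", 1)
        hosts.append((host, int(port)))
    return hosts, chroot


# Hypothetical ensemble with a chroot path for one Kafka cluster.
print(parse_zookeeper_connect("zk1:2181,zk2:2181,zk3:2181/kafka-cluster"))
```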
log.dirs
- Broker config
- General broker parameter
- the directories where log segments are stored
- one partition’s log segments are stored within the same path
- the broker stores a new partition in the “least used” path, i.e. the one holding the fewest partitions (not the one with the least disk space used)
- defaults to log.dir (singular) if missing
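The “least used” placement can be sketched as picking the path with the fewest partitions. This is a simplified model; the directory names and counts are made up.

```python
def pick_log_dir(partitions_per_dir):
    """Return the directory currently holding the fewest partitions."""
    return min(partitions_per_dir, key=partitions_per_dir.get)


# Hypothetical partition counts for three log.dirs entries.
counts = {"/data/kafka1": 12, "/data/kafka2": 9, "/data/kafka3": 11}
print(pick_log_dir(counts))
```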
num.recovery.threads.per.data.dir
- Broker config
- General broker parameter
- num threads per log dir
- threads are used to:
- open log segment files
- close log segment files
- check and truncate log segment files after failure
- safe to increase their number
auto.create.topics.enable
- Broker config
- General broker parameter
- the broker will automatically create topic when:
- producer starts writing
- consumer starts reading
- any client requests metadata
auto.leader.rebalance.enable
- Broker config
- General broker parameter
- enables background thread checking distribution of partitions
- seeks to avoid having topic leadership concentrated in one or few brokers
leader.imbalance.check.interval.seconds
- Broker config
- General broker parameter
- every how many seconds the broker will check for partition leader imbalances
leader.imbalance.per.broker.percentage
- Broker config
- General broker parameter
- if leadership imbalance exceeds this value, then a rebalance is initiated
delete.topic.enable
- General broker parameter
- dangerous
num.partitions
- Broker config
- topic default
- defaults to 1
- primarily used when auto topic creation is enabled
- partitions can never be decreased
default.replication.factor
- Broker config
- topic default
- if auto-topic creation enabled, this value sets the replication factor
- should be at least 1 over the min.insync.replicas (RF+)
- even better is RF++ to allow maintenance and prevent outages
log.retention.ms
- Broker config
- topic default
- takes precedence over log.retention.minutes and log.retention.hours
- how long kafka will retain messages
- retention is performed by examining the last modified time of each log segment file on disk, which is normally the time the log segment was closed
- this retention is on topic level
- if log.retention.bytes has also been configured, messages may be removed when either criteria is met
log.retention.minutes
- Broker config
- topic default
- takes precedence over log.retention.hours
log.retention.hours
- Broker config
- topic default
- see log.retention.ms
log.retention.bytes
- Broker config
- topic default
- applied per partition (bytes per partition, so adding partitions increases total topic retention size)
- if log.retention.ms is also set, messages may be removed when either criterion is met
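A rough sketch of the “either criterion” rule, under simplifying assumptions: segments are modelled as (last_modified_ms, size_bytes) pairs, the newest one is the active segment and is never removed, and both limits are set. It is not Kafka's actual cleaner code.

```python
def expired_segments(segments, now_ms, retention_ms, retention_bytes):
    """Return indexes of closed segments eligible for deletion.

    segments: list of (last_modified_ms, size_bytes), oldest first; the last
    entry is treated as the active segment and is never selected.
    """
    closed = segments[:-1]
    expired = set()
    # Time criterion: last-modified time is older than the retention window.
    for i, (modified_ms, _) in enumerate(closed):
        if now_ms - modified_ms > retention_ms:
            expired.add(i)
    # Size criterion: drop oldest segments while the partition would still
    # hold at least retention_bytes afterwards.
    excess = sum(size for _, size in segments) - retention_bytes
    for i, (_, size) in enumerate(closed):
        if excess - size < 0:
            break
        expired.add(i)
        excess -= size
    return sorted(expired)


HOUR = 3_600_000
# Illustrative partition: three closed segments plus the active one.
segments = [(0, 400), (5 * HOUR, 400), (9 * HOUR, 400), (10 * HOUR, 100)]
print(expired_segments(segments, 10 * HOUR, 8 * HOUR, 400))
```

In this example only the oldest segment is past the time limit, but the size limit also pushes out the second one.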
log.segment.bytes
- Broker config
- topic default
- defaults to 1GB
- once a segment reaches the size specified in log.segment.bytes, it is closed and can then be considered for expiration
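Because a segment only becomes eligible for expiration after it is closed, a low-volume partition can hold messages longer than the retention setting. A worked example, assuming no log.roll.ms is set and using illustrative numbers (1 GB segments, 100 MB written per day, 7 days of retention):

```python
def worst_case_retention_days(segment_bytes, bytes_per_day, retention_days):
    """Days a message may stay on disk: time to fill the segment plus retention."""
    days_to_fill = segment_bytes / bytes_per_day
    return days_to_fill + retention_days


# 10 days to fill the segment, then 7 more days before it expires.
print(worst_case_retention_days(1_000_000_000, 100_000_000, 7))  # 17.0
```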
log.roll.ms
- Broker config
- topic default
- the amount of time after which a log segment should be closed
- not mutually exclusive with log.segment.bytes
- consider that multiple log segments will be closed at the same time (impact on disk performance) for low volume partitions
min.insync.replicas
- Broker config
- topic default
- defaults to 1
- minimum number of in-sync replicas that must be available for a write to succeed (enforced when the producer uses acks=all)
- setting it to 2 ensures at least 2 replicas are caught up and in sync with the producer
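A minimal model of how this setting interacts with the producer's acks and the replication factor. The function names are hypothetical, and the check is only enforced for acks=all.

```python
def write_accepted(in_sync_replicas, min_insync_replicas, acks="all"):
    """Return True if the broker would accept the produce request."""
    if acks not in ("all", "-1"):
        return True  # min.insync.replicas is only enforced for acks=all
    return in_sync_replicas >= min_insync_replicas


def tolerated_failures(replication_factor, min_insync_replicas):
    """Replicas that can be lost while acks=all writes still succeed."""
    return replication_factor - min_insync_replicas


print(write_accepted(in_sync_replicas=1, min_insync_replicas=2))
print(tolerated_failures(replication_factor=3, min_insync_replicas=2))
```

With a replication factor of 3 and min.insync.replicas of 2, one replica can be down (for maintenance or failure) without blocking producers.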
message.max.bytes
- Broker config
- topic default
- defaults to 1MB
- messages larger than this value will not be accepted and producer will get error message
- this value is the max size of a compressed message
- must be coordinated with the configs:
- fetch.message.max.bytes
- replica.fetch.max.bytes
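One way to picture the coordination is a consistency check over the three values. This is a sketch that assumes the fetch sizes should be at least as large as message.max.bytes; the helper and the sample numbers are illustrative.

```python
def size_config_problems(config):
    """List fetch settings that are smaller than message.max.bytes."""
    limit = config["message.max.bytes"]
    problems = []
    for name in ("fetch.message.max.bytes", "replica.fetch.max.bytes"):
        if config[name] < limit:
            problems.append(f"{name}={config[name]} is below message.max.bytes={limit}")
    return problems


# Hypothetical settings: the replica fetch size was not raised with the limit.
config = {
    "message.max.bytes": 2_000_000,
    "fetch.message.max.bytes": 2_000_000,
    "replica.fetch.max.bytes": 1_048_576,
}
print(size_config_problems(config))
```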
Major factors for performance bottlenecks
- disk throughput
- disk capacity
- memory
- CPU
- networking
Faster disk writes =
lower produce latency
What part of memory is more important for Kafka?
The page cache. The heap is only for the JVM itself; a 5GB heap is enough for a broker handling 150k messages / second at a data rate of 200 megabits per second
Why is there a networking imbalance?
Outbound traffic is higher than inbound (many consumers for one producer). Recommended: 10Gb NICs
Does Kafka need an extremely performant CPU?
No. Kafka uses the CPU to decompress message batches, validate their checksums, and then recompress the batches; that’s all
Kafka size recommendations
- < 14K partition replicas per broker
- < 1M replicas per cluster
Broker configuration requirements
- all brokers must have the same `zookeeper.connect`
- all brokers must have a unique `broker.id`
OS Tuning - Virtual Memory
- set vm.swappiness = 1 (i.e. do not swap unless there is an out-of-memory condition)
- vm.dirty_background_ratio = 5 (default is 10; it’s a % of total system memory)
- vm.dirty_ratio = 60 to 80 (default is 20; % of total system memory in dirty pages before a synchronous flush to disk is forced)
- vm.max_map_count = 400k to 600k (maximum number of memory-mapped areas per process; it must grow with the number of log segments)
- vm.overcommit_memory = 0 (it’s the default)
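The recommendations above can be turned into a quick check. This sketch compares a dictionary of current values against the ranges from this card; it uses the kernel's `vm.overcommit_memory` key, and the sample "current" values are illustrative, not read from a live system.

```python
# (low, high) bounds taken from the recommendations in this card.
RECOMMENDED = {
    "vm.swappiness": (1, 1),
    "vm.dirty_background_ratio": (5, 5),
    "vm.dirty_ratio": (60, 80),
    "vm.max_map_count": (400_000, 600_000),
    "vm.overcommit_memory": (0, 0),
}


def out_of_range(current):
    """Return the settings whose current value falls outside the recommended range."""
    return {
        name: value
        for name, value in current.items()
        if name in RECOMMENDED
        and not RECOMMENDED[name][0] <= value <= RECOMMENDED[name][1]
    }


# Illustrative values, as might be found on an untuned host.
current = {
    "vm.swappiness": 60,
    "vm.dirty_background_ratio": 10,
    "vm.dirty_ratio": 20,
    "vm.max_map_count": 65530,
    "vm.overcommit_memory": 0,
}
print(out_of_range(current))
```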
vm.swappiness
- OS virtual memory setting
- set vm.swappiness = 1 (i.e. do not swap unless there is an out-of-memory condition)
vm.dirty_background_ratio
- OS virtual memory setting
- set vm.dirty_background_ratio = 5 (default is 10; it’s a % of total system memory allowed in dirty pages before the background process that flushes them to disk starts)
vm.dirty_ratio
- OS virtual memory setting
- set vm.dirty_ratio = 60 to 80 (default is 20; % of total system memory in dirty pages before a synchronous flush to disk is forced)
vm.max_map_count
- OS virtual memory setting
- vm.max_map_count = 400k to 600k (maximum number of memory-mapped areas per process; it must grow with the number of log segments)
vm.overcommit_memory
- OS virtual memory setting
- vm.overcommit_memory = 0 (it’s the default). Setting it to 0 means the kernel uses a heuristic to decide how much memory an application may overcommit
OS tuning - Disk
- XFS filesystem is better than Ext4
- set the `noatime` mount option (i.e. no access-time writes; disabling access-time writes is safe)
- set `largeio`, which improves efficiency for larger disk writes
OS tuning - networking
- increase socket buffer sizes
1. net.core.wmem_default = 131072 (128KiB)
2. net.core.rmem_default = 131072 (128KiB)
3. net.core.wmem_max = 2097152 (2MiB)
4. net.core.rmem_max = 2097152 (2MiB)
- increase TCP socket buffer sizes
1. net.ipv4.tcp_wmem=<min> <default> <max>
2. net.ipv4.tcp_rmem=<min> <default> <max>
e.g. 4096 65536 2048000 (4KiB, 64KiB, about 2MB)
- net.ipv4.tcp_window_scaling=1 allows more efficient data transfers
- net.ipv4.tcp_max_syn_backlog above 1024 allows more simultaneous connections
- net.core.netdev_max_backlog above 1000 helps with bursts of network traffic
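The buffer settings can be collected into sysctl-style lines. A small sketch using the kernel's underscore key spelling (`net.core.wmem_default` and so on) and the example values from this card; the helper names are hypothetical.

```python
def tcp_triple(minimum, default, maximum):
    """Format a <min> <default> <max> value, checking the ordering."""
    if not minimum <= default <= maximum:
        raise ValueError("expected min <= default <= max")
    return f"{minimum} {default} {maximum}"


def render(settings):
    """Render settings as 'key = value' lines, one per setting."""
    return "\n".join(f"{key} = {value}" for key, value in settings.items())


settings = {
    "net.core.wmem_default": 128 * 1024,
    "net.core.rmem_default": 128 * 1024,
    "net.core.wmem_max": 2 * 1024 * 1024,
    "net.core.rmem_max": 2 * 1024 * 1024,
    "net.ipv4.tcp_wmem": tcp_triple(4096, 65536, 2048000),
    "net.ipv4.tcp_rmem": tcp_triple(4096, 65536, 2048000),
    "net.ipv4.tcp_window_scaling": 1,
}
print(render(settings))
```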
Kafka producer - mandatory properties
- bootstrap.servers
- key.serializer
- value.serializer
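A tiny sketch that checks a producer configuration for the three mandatory properties. The helper is hypothetical; the broker addresses are placeholders, and the serializer values are the string serializer class shipped with the Java client.

```python
MANDATORY = ("bootstrap.servers", "key.serializer", "value.serializer")


def missing_properties(props):
    """Return the mandatory producer properties that are absent or empty."""
    return [name for name in MANDATORY if not props.get(name)]


props = {
    "bootstrap.servers": "broker1:9092,broker2:9092",  # placeholder hosts
    "key.serializer": "org.apache.kafka.common.serialization.StringSerializer",
}
print(missing_properties(props))
```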
Kafka producer - primary send methods
- fire-and-forget
- synchronous send
- asynchronous send
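The three send styles differ only in what the caller does with the result of `send`. The sketch below uses a hypothetical `FakeProducer` built on a thread pool as a stand-in; it is not a real Kafka client, and the topic name and return value are made up.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeProducer:
    """Hypothetical stand-in for a producer; not a real Kafka client."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._offset = 0

    def send(self, topic, value):
        # Returns a future that resolves to (topic, offset).
        return self._pool.submit(self._deliver, topic, value)

    def _deliver(self, topic, value):
        offset = self._offset
        self._offset += 1
        return (topic, offset)

    def close(self):
        self._pool.shutdown(wait=True)


producer = FakeProducer()

# Fire-and-forget: send and ignore the result.
producer.send("events", "a")

# Synchronous send: block until the result (or an error) is available.
metadata = producer.send("events", "b").result(timeout=5)

# Asynchronous send: register a callback and carry on.
acks = []
producer.send("events", "c").add_done_callback(lambda f: acks.append(f.result()))

producer.close()
print(metadata, acks)
```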