Cassandra Flashcards
What are the peculiarities of Cassandra which differentiate it from other databases?
- No joins (perform joins in application code if required)
- Data is partitioned by the partition key and sorted on disk by the clustering columns (both are part of the primary key)
- Leaderless (no ZooKeeper, no leader election)
- Built-in sharding
- No downtime while adding or removing nodes from the cluster
- Tunable consistency per query (R + W > RF gives strong consistency, where RF is the replication factor)
- Schema-based (unlike MongoDB)
What is “Hinted Handoff” in Cassandra?
When a replica node is offline and a write is destined for it, the coordinator node acts as a proxy and stores the write as a hint. Hints are collected only while the node has been down for less than the hint window (3 hours by default). If the node comes back online, the saved hints are handed off (replayed) to it.
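The hint window can be illustrated with a toy simulation. This is only a sketch of the idea, not how Cassandra stores hints internally; the `Replica` and `Coordinator` classes and the `write`/`node_up` methods are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Optional

HINT_WINDOW_SECONDS = 3 * 60 * 60  # the 3-hour window mentioned above


@dataclass
class Replica:
    name: str
    down_since: Optional[float] = None  # None means the node is online
    data: dict = field(default_factory=dict)


class Coordinator:
    def __init__(self):
        self.hints = {}  # replica name -> list of (key, value) waiting for handoff

    def write(self, replica, key, value, now):
        if replica.down_since is None:
            replica.data[key] = value
            return "written"
        if now - replica.down_since <= HINT_WINDOW_SECONDS:
            # Node is down but still inside the window: keep a hint for it.
            self.hints.setdefault(replica.name, []).append((key, value))
            return "hinted"
        # Node has been down too long: no hint is kept for this write.
        return "dropped"

    def node_up(self, replica):
        """Mark the replica online and hand off every hint stored for it."""
        replica.down_since = None
        for key, value in self.hints.pop(replica.name, []):
            replica.data[key] = value


node = Replica("node-b", down_since=0)
coordinator = Coordinator()
first = coordinator.write(node, "k1", "v1", now=60)            # inside the window
second = coordinator.write(node, "k2", "v2", now=4 * 60 * 60)  # outside the window
coordinator.node_up(node)
```

In this sketch the write that arrived after the window closed never reaches the node through hints, so it would have to be recovered some other way, such as read repair.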
What is “Read Repair” in Cassandra?
Replicas drift out of sync from time to time, e.g. when a write was done with CL < RF and some replicas missed it.
When a read request arrives, the coordinator requests the data from multiple replicas, picks the version with the latest write timestamp, returns it, and writes it back to any replica that returned stale data.
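The "latest version wins" step can be sketched in a few lines. This is a simplified model that assumes each replica is a plain dict mapping a key to a `(value, write_timestamp)` pair; the function name `read_with_repair` and the sample keys are made up for illustration.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and bring stale replicas up to date."""
    responses = [(replica, replica.get(key)) for replica in replicas]
    present = [cell for _, cell in responses if cell is not None]
    if not present:
        return None
    # The cell with the highest write timestamp is treated as the truth.
    latest = max(present, key=lambda cell: cell[1])
    for replica, cell in responses:
        if cell != latest:
            replica[key] = latest  # repair the stale or missing copy
    return latest[0]


replica_a = {"user:1": ("alice@old.example", 100)}
replica_b = {"user:1": ("alice@new.example", 200)}
replica_c = {}  # this replica missed the write entirely
value = read_with_repair([replica_a, replica_b, replica_c], "user:1")
```

After the call, all three replicas in the sketch hold the value written at timestamp 200.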
How to configure Cassandra for Strong Consistency?
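With replication factor RF, choosing R + W > RF ensures strong consistency, where R and W are the number of replicas that must acknowledge a read and a write. RF is the number of replicas per row, not the number of nodes in the cluster.
R = 1, W = RF: slow writes
R = RF, W = 1: slow reads
The R and W ack counts are specified per query through the consistency level (CL).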
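The overlap rule is easy to check mechanically. The helpers below are a minimal sketch with hypothetical names (`quorum`, `is_strongly_consistent`); they only do the arithmetic and do not talk to a cluster.

```python
def quorum(rf: int) -> int:
    """Smallest majority of rf replicas."""
    return rf // 2 + 1


def is_strongly_consistent(r: int, w: int, rf: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    if not (1 <= r <= rf and 1 <= w <= rf):
        raise ValueError("R and W must be between 1 and RF")
    return r + w > rf


rf = 3
checks = {
    "ONE/ONE": is_strongly_consistent(1, 1, rf),
    "ONE/ALL": is_strongly_consistent(1, rf, rf),
    "QUORUM/QUORUM": is_strongly_consistent(quorum(rf), quorum(rf), rf),
}
```

With RF = 3, a quorum is 2 replicas, so quorum reads plus quorum writes satisfy the rule without requiring every replica to answer either operation.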
Which node acts as the coordinator node in Cassandra?
Any node can act as the coordinator. Every node knows how the data is partitioned and can route reads or writes to the replicas that own it.
How do we define the replication factor in Cassandra? What does RF = 2 mean?
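While creating the keyspace, as part of its replication settings (it is not set per table).
RF = 2 means each row is stored on 2 nodes. Both replicas are equal; neither is the "original".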
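To make "stored on 2 nodes" concrete, here is a toy token ring. It is a sketch under stated assumptions: it hashes with MD5 purely for convenience (not Cassandra's default partitioner), gives each node a single token, and picks replicas by walking the ring clockwise, which mirrors the simplest placement strategy. The `Ring` class, `token` function, and node names are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (toy hash, first 8 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes):
        # Each node owns the token range that ends at its own token.
        self.entries = sorted((token(name), name) for name in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str, rf: int):
        if rf > len(self.entries):
            raise ValueError("RF cannot exceed the number of nodes")
        # First node whose token is >= the key's token, wrapping around the ring.
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.entries)
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(rf)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("user-42", rf=2)
```

Any node can run the same calculation, which is why any node can coordinate a request and forward it to the right replicas.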
What is the maximum number of read and write acks (consistency level) per query?
Equal to the replication factor (consistency level ALL). Typically, we choose lower values to improve application performance.
What is the difference between the primary key and the partition key in Cassandra?
Primary Key = ((Partition Key), Clustering Columns)
The partition key is equivalent to a sharding key: it decides which partition, and therefore which nodes, a row lives on.
The clustering columns define the sort order of rows within a partition.
The primary key as a whole must be unique in a table.
In SQL databases, by contrast, a primary key is simply a column or set of columns that uniquely identifies a row.
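The two roles of the primary key can be modelled with a small in-memory table. This is an illustrative sketch only: `ToyTable` is a hypothetical class, and the example assumes a table partitioned by a user id and clustered by an integer `added_at` value, neither of which comes from a real schema.

```python
import bisect


class ToyTable:
    def __init__(self):
        # partition key -> list of (clustering key, row), kept sorted by clustering key
        self.partitions = {}

    def upsert(self, partition_key, clustering_key, row):
        rows = self.partitions.setdefault(partition_key, [])
        keys = [ck for ck, _ in rows]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(rows) and rows[i][0] == clustering_key:
            rows[i] = (clustering_key, row)  # same primary key: overwrite the row
        else:
            rows.insert(i, (clustering_key, row))  # new row, inserted in sorted position

    def read_partition(self, partition_key):
        """Rows of one partition, already in clustering order."""
        return [row for _, row in self.partitions.get(partition_key, [])]


table = ToyTable()
table.upsert("user-1", 20, "video B")
table.upsert("user-1", 10, "video A")
table.upsert("user-1", 20, "video B (renamed)")  # same partition key + clustering key
table.upsert("user-2", 5, "video C")
user_1_rows = table.read_partition("user-1")
```

Writing twice with the same full primary key leaves one row, which is how uniqueness shows up in practice: the second write replaces the first rather than failing.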
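Cassandra does not provide referential integrity across tables (videos, video_by_user). How can it be achieved, and what are the limitations?
Use a logged batch. It offers atomicity: the batch is first saved to a batch log, so if the coordinator fails partway through, the remaining writes are replayed and all of them are eventually applied. There is no rollback, and the batch is not isolated: other clients can read some of the updated rows before the rest are written. Hence, it is not an ACID transaction.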