Cassandra Flashcards
KeySpace
Keyspace is the outermost container for data in Cassandra. The basic attributes of a Keyspace in Cassandra are −
Replication factor − It is the number of machines in the cluster that will receive copies of the same data.
Replica placement strategy − The strategy that determines which nodes in the ring hold the replicas, for example SimpleStrategy or NetworkTopologyStrategy.
Column families − Keyspace is a container for a list of one or more column families. A column family, in turn, is a container of a collection of rows. Each row contains ordered columns. Column families represent the structure of your data. Each keyspace has at least one and often many column families.
https://www.tutorialspoint.com/cassandra/images/keyspace.jpg
Link
https://www.tutorialspoint.com/cassandra/cassandra_data_model.htm
KeySpace creation
CREATE KEYSPACE keyspace_name
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
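As a rough illustration of what replication_factor 3 means under SimpleStrategy, the sketch below walks a token ring clockwise from the node that owns a token and takes the next nodes as replicas. The ring layout, node names, and small integer tokens are made up for the example, and it assumes one token per node (no virtual nodes); it is not how Cassandra is implemented internally.

```python
from bisect import bisect_left

def replicas(token, ring, replication_factor):
    """Return the nodes holding a token: the owner, then the next nodes clockwise."""
    tokens = sorted(ring)
    # The owner is the first node whose token is >= the key's token, wrapping around.
    start = bisect_left(tokens, token) % len(tokens)
    count = min(replication_factor, len(tokens))
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(count)]

# Hypothetical four-node ring: token -> node name.
ring = {0: "node_a", 25: "node_b", 50: "node_c", 75: "node_d"}
print(replicas(30, ring, 3))  # ['node_c', 'node_d', 'node_a']
```

With four nodes and a replication factor of 3, every piece of data lives on three of the four machines, so any single node can fail without losing data.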
Column Family
A column family is a container for an ordered collection of rows. Each row, in turn, is an ordered collection of columns.
In Cassandra, although the column families are defined, the columns are not. You can freely add any column to any column family at any time.
Unlike relational tables, where the schema is fixed and every row has every column, Cassandra does not force individual rows to have all the columns.
https://www.tutorialspoint.com/cassandra/images/cassandra_column_family.jpg
Column Family (or Table)
Tables store data in rows and columns, but unlike relational databases, each row can have different columns.
Cassandra does not enforce foreign keys or joins.
Each row must have a PRIMARY KEY (Partition Key + Optional Clustering Columns).
CREATE TABLE users (
user_id UUID PRIMARY KEY,
name TEXT,
email TEXT,
created_at TIMESTAMP
);
Partitioning Key or Clustering Key
Partition Key (Required) → Determines which node stores the data.
Clustering Columns (Optional) → Define the sort order of rows within a partition.
CREATE TABLE orders (
user_id UUID, -- Partition Key (distributes data across nodes)
order_id UUID, -- Clustering Column (orders sorted by order_id)
item TEXT,
price DECIMAL,
order_date TIMESTAMP,
PRIMARY KEY (user_id, order_id) -- Compound Primary Key
);
Partition Key (user_id) ensures all orders of a user are stored together.
Clustering Column (order_id) sorts orders within each user’s partition.
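The two roles can be modeled in a few lines of Python: rows are grouped by the partition key and kept sorted by the clustering column inside each group. This is only a conceptual sketch; the helper name build_partitions is hypothetical, and short strings and integers stand in for the UUIDs to keep the example readable.

```python
from collections import defaultdict

def build_partitions(rows, partition_key, clustering_key):
    """Group rows by partition key, then sort each group by the clustering key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    for rows_in_partition in partitions.values():
        rows_in_partition.sort(key=lambda r: r[clustering_key])
    return dict(partitions)

orders = [
    {"user_id": "u1", "order_id": 3, "item": "lamp"},
    {"user_id": "u2", "order_id": 1, "item": "desk"},
    {"user_id": "u1", "order_id": 1, "item": "chair"},
]
partitions = build_partitions(orders, "user_id", "order_id")
print([r["item"] for r in partitions["u1"]])  # ['chair', 'lamp']
```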
Denormalization and Data Duplication
Cassandra denormalizes data instead of using joins. Data is modeled for queries, not for normalization.
CREATE TABLE user_orders (
user_id UUID,
order_id UUID,
product_id UUID,
product_name TEXT,
quantity INT,
price DECIMAL,
order_date TIMESTAMP,
PRIMARY KEY ((user_id), order_id, product_id) -- Partitioned by user_id
);
Orders are partitioned by user_id, so all of a user's orders are stored together.
Data redundancy helps eliminate joins, improving query speed.
Primary Key
Simple Primary Key - PRIMARY KEY (user_id)
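Compound Primary Key - PRIMARY KEY (user_id, order_id) - Uses a Partition Key + Clustering Column, allowing multiple rows per partition.
Compound Primary Key with multiple Clustering Columns - PRIMARY KEY ((user_id), order_id, product_id) - Uses Partition Key (user_id) + Multiple Clustering Columns (order_id, product_id) to organize data within partitions.
Composite Partition Key - PRIMARY KEY ((hotel_id, start_date), room_number) - The Partition Key itself consists of several columns (hotel_id, start_date), grouped by the inner parentheses.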
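To make the parenthesis rules concrete, here is a small sketch that splits the body of a PRIMARY KEY clause into partition key columns and clustering columns. The function split_primary_key is a hypothetical helper written for these notes, not part of any Cassandra driver, and it only handles the simple forms shown on this card.

```python
def split_primary_key(definition):
    """Return (partition_key_columns, clustering_columns) for a PRIMARY KEY body."""
    inner = definition.strip()[1:-1].strip()  # drop the outer parentheses
    if inner.startswith("("):
        # Inner parentheses group the partition key columns.
        close = inner.index(")")
        partition = [c.strip() for c in inner[1:close].split(",")]
        rest = inner[close + 1:]
    else:
        # Without inner parentheses, the first column is the partition key.
        first, _, rest = inner.partition(",")
        partition = [first.strip()]
    clustering = [c.strip() for c in rest.split(",") if c.strip()]
    return partition, clustering

print(split_primary_key("((hotel_id, start_date), room_number)"))
# (['hotel_id', 'start_date'], ['room_number'])
```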
Super Column
Super Columns are now deprecated, and the preferred way to model hierarchical data in Cassandra is by using tables with composite primary keys.
Query Patterns
Use a query-first approach: start designing the data model from the queries the application must support.
Q1. Find hotels near a given point of interest.
Q2. Find information about a given hotel, such as its name and location.
Q3. Find points of interest near a given hotel.
To name each table, identify the primary entity type you are querying for and use that to start the table name. If you are querying by attributes of other related entities, append those to the table name, separated with _by_. For example, hotels_by_poi.
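The naming rule is mechanical enough to write down as a function. The helper table_name below is hypothetical and only illustrates the convention; the outputs match table names used elsewhere in these notes.

```python
def table_name(entity, *query_attributes):
    """Build '<entity>_by_<attr>_<attr>' from the entity and the attributes queried by."""
    if not query_attributes:
        return entity
    return entity + "_by_" + "_".join(query_attributes)

print(table_name("hotels", "poi"))                  # hotels_by_poi
print(table_name("reservations", "hotel", "date"))  # reservations_by_hotel_date
```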
Artem Chebotko
Chebotko diagrams, named after Artem Chebotko, are a way to represent a Cassandra data model visually.
Wide Partition Pattern / Wide Row Pattern
The essence of the pattern is to group multiple related rows in a partition in order to support fast access to multiple rows within the partition in a single query.
In other words, all the rows that a query needs share one partition key, so they are stored together and can be read from a single partition.
Now consider how to support query Q4, which helps the user find available rooms at a selected hotel for the nights they want to stay. This query involves both a start date and an end date. Because you are querying over a range instead of a single date, the date must be a clustering key. Use hotel_id as the partition key to group room data for each hotel on a single partition, which should keep searches fast. Call this table available_rooms_by_hotel_date.
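The reason the date has to be a clustering key can be sketched in Python: because rows inside a partition are stored in clustering order, a date range is one contiguous slice. The row layout (date, room_number, is_available) and the choice of an inclusive start and exclusive end are assumptions made for this example; the notes above do not list the table's columns.

```python
from bisect import bisect_left
from datetime import date

def rooms_in_range(partition, start, end):
    """Return available rows with start <= date < end from one hotel's partition.

    partition: list of (date, room_number, is_available), sorted in clustering order.
    """
    lo = bisect_left(partition, (start,))  # first row on or after the start date
    hi = bisect_left(partition, (end,))    # first row on or after the end date
    return [row for row in partition[lo:hi] if row[2]]

# One hotel's partition, kept sorted by (date, room_number).
partition = sorted([
    (date(2024, 1, 5), 101, True),
    (date(2024, 1, 5), 102, False),
    (date(2024, 1, 6), 101, True),
    (date(2024, 1, 7), 101, False),
    (date(2024, 1, 8), 101, True),
])
result = rooms_in_range(partition, date(2024, 1, 5), date(2024, 1, 7))
print([(d.isoformat(), room) for d, room, _ in result])
# [('2024-01-05', 101), ('2024-01-06', 101)]
```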
Time Series Pattern
The time series pattern is an extension of the wide partition pattern. In this pattern, a series of measurements at specific time intervals are stored in a wide partition, where the measurement time is used as part of the partition key. This pattern is frequently used in domains including business analysis, sensor data management, and scientific experiments.
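One common way to put the measurement time into the partition key is to bucket it, for example one partition per sensor per day, with the exact timestamp as the clustering column. The sketch below assumes that daily bucket and a hypothetical sensor_id column; neither is specified in the notes above.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime

def partition_key(sensor_id, measured_at):
    # Assumption: one partition per sensor per calendar day.
    return (sensor_id, measured_at.date())

def record(table, sensor_id, measured_at, value):
    # insort keeps each partition ordered by measurement time (the clustering column).
    insort(table[partition_key(sensor_id, measured_at)], (measured_at, value))

table = defaultdict(list)
record(table, "s1", datetime(2024, 1, 1, 12, 0), 21.5)
record(table, "s1", datetime(2024, 1, 1, 9, 0), 20.1)
record(table, "s1", datetime(2024, 1, 2, 9, 0), 19.8)
print(len(table))  # 2
```

Bucketing bounds how large any single partition can grow, while readings within a bucket stay in time order.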
Use Cases
Cassandra is well-suited for use cases requiring high availability, fault tolerance, and linear scalability, such as real-time analytics, IoT data management, and content delivery systems. Its decentralized architecture and support for multi-data center replication make it ideal for applications requiring continuous availability and resilience to hardware failures.
Cassandra Schema
CREATE KEYSPACE reservation
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
CREATE TYPE reservation.address (
street text,
city text,
state_or_province text,
postal_code text,
country text );
CREATE TABLE reservation.reservations_by_confirmation (
confirm_number text,
hotel_id text,
start_date date,
end_date date,
room_number smallint,
guest_id uuid,
PRIMARY KEY (confirm_number) )
WITH comment = 'Q6. Find reservations by confirmation number';
CREATE TABLE reservation.reservations_by_hotel_date (
hotel_id text,
start_date date,
end_date date,
room_number smallint,
confirm_number text,
guest_id uuid,
PRIMARY KEY ((hotel_id, start_date), room_number) )
WITH comment = 'Q7. Find reservations by hotel and date';
CREATE TABLE reservation.reservations_by_guest (
guest_last_name text,
hotel_id text,
start_date date,
end_date date,
room_number smallint,
confirm_number text,
guest_id uuid,
PRIMARY KEY ((guest_last_name), hotel_id) )
WITH comment = 'Q8. Find reservations by guest name';
CREATE TABLE reservation.guests (
guest_id uuid PRIMARY KEY,
first_name text,
last_name text,
title text,
emails set<text>,
phone_numbers list<text>,
addresses map<text, frozen<address>>,
confirm_number text )
WITH comment = 'Q9. Find guest by ID';
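Because each reservations_by_* table answers a different query, one logical reservation is written several times. The sketch below models that fan-out with nested dictionaries keyed the same way as the three reservation tables. The function write_reservation and all sample values are invented for illustration; a real application would issue one INSERT per table instead.

```python
def write_reservation(db, r):
    """Write one reservation to every query table (denormalized copies)."""
    # Q6: keyed by confirm_number only.
    db.setdefault("reservations_by_confirmation", {})[r["confirm_number"]] = r
    # Q7: partition (hotel_id, start_date), clustered by room_number.
    by_hotel_date = db.setdefault("reservations_by_hotel_date", {})
    by_hotel_date.setdefault((r["hotel_id"], r["start_date"]), {})[r["room_number"]] = r
    # Q8: partition guest_last_name, clustered by hotel_id.
    by_guest = db.setdefault("reservations_by_guest", {})
    by_guest.setdefault(r["guest_last_name"], {})[r["hotel_id"]] = r

db = {}
write_reservation(db, {
    "confirm_number": "RS2G0Z", "hotel_id": "NY456", "start_date": "2024-06-08",
    "end_date": "2024-06-10", "room_number": 111, "guest_last_name": "Smith",
})
print(sorted(db))
# ['reservations_by_confirmation', 'reservations_by_guest', 'reservations_by_hotel_date']
```

Each read then touches exactly one table and one partition, at the cost of extra writes and duplicated data.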