NoSQL Flashcards
Strengths of RDBMS (4)
- Consistency (ACID)
- Integration of data with schema normalization
- SQL language well known
- Robust (40+ years in organizations)
Weaknesses of RDBMS (3)
- Bad scaling
- Prioritizes consistency over latency
- Schema rigidity (no evolution)
NoSQL common features (4)
- Not just rows and tables
- Freedom from joins
- Schemaless or soft-schema
- Distributed architecture (cluster)
Can NoSQL systems be used for OLAP?
Possibly, but through analytical tools like Spark
4 main data models seen in NoSQL
- Key-value
- Document
- Wide column
- Graph
Graph data model
Focuses on the relationships between data elements: vertices represent entities, arcs represent relationships between entities, and properties describe the vertices
2 examples of graph database queries
- Find friends of friends
(user)-[:KNOWS]-(friend)-[:KNOWS]-(foaf)
- Find shortest path between A and B
shortestPath(:KNOWS*…5]-(userB))
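The two queries above can be imitated in plain Python to show what the graph database does under the hood. This is only a sketch: the `KNOWS` adjacency mapping, the sample names, and the default hop limit of 5 are assumptions made for illustration, and a real graph database would use its own storage and traversal engine. Excluding direct friends from the friends-of-friends result is a design choice of this sketch.

```python
from collections import deque

# Hypothetical undirected KNOWS graph stored as an adjacency mapping.
KNOWS = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}

def friends_of_friends(graph, user):
    """People two KNOWS hops away who are neither the user nor a direct friend."""
    direct = graph.get(user, set())
    result = set()
    for friend in direct:
        result |= graph.get(friend, set())
    return result - direct - {user}

def shortest_path(graph, start, goal, max_hops=5):
    """Breadth-first search; returns a shortest path within max_hops, else None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        if len(path) - 1 >= max_hops:
            continue  # do not extend paths that already use the hop budget
        for neighbour in sorted(graph.get(node, set())):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None
```

Calling `friends_of_friends(KNOWS, "alice")` walks two hops from alice, and `shortest_path(KNOWS, "alice", "erin")` returns the list of vertices on one shortest route.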
What is the opposite of Graph Modeling?
Aggregate modeling
(key-value, document, wide-column)
What do we call tables in the document data model?
Collections, which hold a list of documents (often JSON format)
What does each document need to contain in the document data model? (2)
A set of fields (key-value pairs) and a mandatory ID
{
  "_id": 1,
  "name": "Martin",
  "adrs": [
    {"street": "Adam", "city": "Chicago", "state": "illinois", "code": 60007},
    {"street": "9th", "city": "NewYork", "state": "NewYork", "code": 10001}
  ],
  "orders": [{
    "orderpayments": [
      {"card": 477, "billadrs": {"street": "Adam", "city": "Chicago", "state": "illinois", "code": 60007}},
      {"card": 457, "billadrs": {"street": "9th", "city": "NewYork", "state": "NewYork", "code": 10001}}
    ],
    "products": [
      {"id": 1, "name": "Cola", "price": 12.4},
      {"id": 2, "name": "Fanta", "price": 14.4}
    ],
    "shipAdrs": {"street": "9th", "city": "NewYork", "state": "NewYork", "code": 10001}
  }]
}
We can query this by section, for example as a Product collection or an Order collection, so that the returned documents contain only the relevant fields
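A rough Python sketch of returning only the relevant fields from a document. The `customers` list, `project`, and `products_of` are hypothetical names for illustration, and the document is a trimmed copy of the one in the card; a real document database would express this with its own query and projection syntax.

```python
# Hypothetical in-memory "collection": a list of dict documents, each with a mandatory _id.
customers = [
    {
        "_id": 1,
        "name": "Martin",
        "orders": [
            {"products": [{"id": 1, "name": "Cola", "price": 12.4},
                          {"id": 2, "name": "Fanta", "price": 14.4}]}
        ],
    }
]

def project(document, fields):
    """Return a copy of the document with only the requested top-level fields (plus _id)."""
    return {k: v for k, v in document.items() if k == "_id" or k in fields}

def products_of(document):
    """Flatten the products embedded in every order of one customer document."""
    return [p for order in document.get("orders", []) for p in order.get("products", [])]
```

Here `project(customers[0], {"name"})` keeps just the ID and the name, while `products_of(customers[0])` gives a product-only view of the same aggregate.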
What does a key contain and what does a value contain in the key-value data model?
Key: a unique string (path, queries, REST calls, ID)
Value: a BLOB (binary large object) - HTML, PDF, etc.
Why is the value considered a black box when querying key-values in NoSQL?
There are no indexes on the values, no “where” clauses allowed. Schema information is often indicated in the key
Ex:
Key               Value
user:1234:name    Enrico
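To make the "black box" idea concrete, here is a toy key-value store in Python. The class name, the extra sample entries, and the prefix scan are assumptions for illustration. Not every key-value store offers prefix scans, and none of this reflects a specific product's API.

```python
class KeyValueStore:
    """Toy key-value store: values are opaque bytes, lookups are by key only."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str):
        # The store never looks inside the value; it just hands the bytes back.
        return self._data.get(key)

    def keys_with_prefix(self, prefix: str):
        # Structure lives in the key, so a prefix scan is the closest thing to a "query".
        return sorted(k for k in self._data if k.startswith(prefix))

store = KeyValueStore()
store.put("user:1234:name", b"Enrico")
store.put("user:1234:city", b"Milan")   # hypothetical extra entry
store.put("user:5678:name", b"Anna")    # hypothetical extra entry
```

There is no way to ask for "all users named Enrico" here. The only handles are the full key or, in this sketch, a key prefix such as `"user:1234:"`.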
Wide-column data model
In an RDBMS, data is stored in tables whose rows all span the same set of columns. If a particular record/row needs another column, you have to add it to the entire table. In a wide-column store, you don't
Key-value vs wide-column
Key-Value databases are the simplest model and can be thought of as a configuration file or a two-column table of keys with an associated value. Wide-column databases expand that key-value store concept across multiple columns, but only the columns that are needed for that record.
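One way to picture the difference is a mapping from row key to a per-row mapping of columns. The sketch below is a simplification with made-up row keys and column names, and it ignores column families, timestamps, and on-disk layout.

```python
# Row key -> {column name -> value}; each row stores only the columns it needs.
table = {}

def put(row_key, column, value):
    table.setdefault(row_key, {})[column] = value

def columns_of(row_key):
    """Columns actually stored for one row (no NULL padding for the others)."""
    return sorted(table.get(row_key, {}))

put("user:1", "name", "Martin")
put("user:1", "city", "Chicago")
put("user:2", "name", "Enrico")
put("user:2", "newsletter", True)   # a column only this row has
```

Adding `newsletter` to `user:2` changes nothing for `user:1`, which is the contrast with adding a column to a relational table.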
How to query wide-column data
A SQL-like language works, since the model is similar to the relational model
Is it easier to scale aggregate data or graph?
Aggregate, because splitting graph data across a cluster often means arcs are “cut”, which creates many cross-machine links
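A small sketch of what a "cut" arc means. The arcs and the placement of vertices on nodes 0 and 1 are invented for illustration.

```python
def cut_arcs(arcs, placement):
    """Arcs whose two endpoints are stored on different nodes of the cluster."""
    return [(a, b) for a, b in arcs if placement[a] != placement[b]]

# Hypothetical graph of four vertices split across two machines.
arcs = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
placement = {"A": 0, "B": 0, "C": 1, "D": 1}
```

Every arc returned by `cut_arcs(arcs, placement)` is a traversal step that has to cross the network. An aggregate, by contrast, is stored whole on one node.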
sharding
distributing data across different nodes
replication
creating copies of the data on several nodes
3 good practices for sharding
- Data locality (Italian customer data in a European data center)
- Balance (same amount of data on each node)
- Related data accessed together (orders for each client stored on same node)
Hash data partition strategy
equal distribution across nodes, but range queries become inefficient
range data partition strategy
distribute based on value ranges; can lead to heavy data redistribution
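The hash and range strategies can be compared side by side with a minimal sketch. The cluster size of 4, the split points in `BOUNDARIES`, and the function names are assumptions, not taken from any particular database.

```python
import bisect
import hashlib

NODES = 4  # assumed cluster size

def hash_partition(key: str, nodes: int = NODES) -> int:
    """Stable hash of the key modulo the node count.

    hashlib is used because Python's built-in hash() for strings is salted per process.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % nodes

# Assumed split points: keys below "g" go to node 0, "g" up to "n" to node 1, and so on.
BOUNDARIES = ["g", "n", "t"]

def range_partition(key: str) -> int:
    """Node index = number of boundaries that are <= key."""
    return bisect.bisect_right(BOUNDARIES, key)

def nodes_for_range(lo: str, hi: str):
    """With range partitioning, a range query only touches the nodes between lo and hi."""
    return list(range(range_partition(lo), range_partition(hi) + 1))
```

With `hash_partition`, neighbouring keys land on unrelated nodes, so a range query has to ask every node. With `range_partition`, `nodes_for_range` narrows the query to a few nodes, but a skewed key distribution forces the boundaries (and the data) to be moved.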
Master's job in NoSQL
manage data and handle write operations
Slaves' job in NoSQL
Enable read operations and become master if the master fails
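A toy model of the two roles, assuming synchronous replication and a "promote the first live slave" rule. Both are simplifications chosen for this sketch (real systems usually replicate asynchronously and elect a new master), and all class and method names are hypothetical.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True

class MasterSlaveCluster:
    """Toy model: the master takes writes and copies them to slaves; slaves serve reads."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.master = self.nodes[0]

    def slaves(self):
        return [n for n in self.nodes if n is not self.master and n.alive]

    def write(self, key, value):
        if not self.master.alive:
            self.failover()
        self.master.data[key] = value
        for slave in self.slaves():   # synchronous replication, for simplicity
            slave.data[key] = value

    def read(self, key):
        # Prefer a slave; fall back to the master if no slave is available.
        replicas = self.slaves() or [self.master]
        return replicas[0].data.get(key)

    def failover(self):
        # Promote the first live slave; real systems typically run an election instead.
        candidates = self.slaves()
        if not candidates:
            raise RuntimeError("no live slave to promote")
        self.master = candidates[0]
```

Marking `cluster.master.alive = False` and then writing again shows the promotion: the next write is accepted by the former slave.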
peer-to-peer replication
Unlike the master-slave model, each node has the same importance and can handle write operations, but two users may update the same value from different replicas…
Write conflict mitigation methods (3)
Last write wins
conflict prevention - verify that value hasn’t changed since last read
conflict detection - preserve history, merge results and let user decide
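The three methods can be sketched in a few lines of Python. The `Versioned` record, the version counter, and the explicit timestamps are assumptions for illustration. Real systems often use vector clocks or similar mechanisms rather than a single counter.

```python
from dataclasses import dataclass

@dataclass
class Versioned:
    value: str
    version: int = 0        # incremented on every successful write
    timestamp: float = 0.0  # time of the write, supplied by the caller

class ConflictError(Exception):
    pass

def last_write_wins(a: Versioned, b: Versioned) -> Versioned:
    """Keep whichever replica value carries the later timestamp."""
    return a if a.timestamp >= b.timestamp else b

def write_if_unchanged(current: Versioned, new_value: str, version_read: int) -> Versioned:
    """Conflict prevention: reject the write if the value changed since it was read."""
    if current.version != version_read:
        raise ConflictError(f"expected version {version_read}, found {current.version}")
    return Versioned(new_value, current.version + 1, current.timestamp)

def detect_conflict(a: Versioned, b: Versioned):
    """Conflict detection: keep both siblings so the user (or application) can merge them."""
    return [a, b] if a.value != b.value else [a]
```

Last write wins silently discards one update, `write_if_unchanged` makes the second writer retry, and `detect_conflict` keeps both values for a later merge.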
Consistency in RDBMS (ACID)
Atomicity (no partial transactions)
Consistency
Isolation
Durability
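Atomicity is the easiest of the four to demonstrate. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `account` table, its CHECK constraint, and the `transfer` helper are hypothetical and exist only to force a failure halfway through a transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money in one transaction; any failure rolls back both updates."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected a negative balance

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))
```

When the debit violates the constraint, the credit that already ran is undone as well, so no partial transaction is ever visible.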
Consistency in NoSQL (CAP)
A distributed system can only guarantee 2 out of the following 3 properties:
- Consistency (C)
- Availability (A)
- Partition Tolerance (P)
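PACELC model
Refinement of the CAP theorem adding another dimension: if there is a Partition (P), choose between Availability (A) and Consistency (C); Else (E), choose between Latency (L) and Consistency (C)
PA/EL = “Prioritize Availability”
PA/EC = “Sacrifice consistency only for partitioning”
PC/EL = “Enforce consistency during partitioning”
PC/EC = “Strong consistency at all times”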