NoSQL - Week 9 Flashcards
Can a NoSQL database support SQL?
Yes, some support languages that look a bit like SQL
What does NoSQL revisit about relational databases?
- The provision of a declared schema
- Strict transactions
What is Data Integrity?
The correctness and consistency of stored data, as enforced by a schema and relied upon by applications
What are strict transactional semantics?
A guarantee that concurrently running programs do not lead to inconsistencies
What is important in enterprise applications, but may not be prioritised in all cases over scale and flexibility?
Data integrity and strict transactional semantics
What needs are NoSQL solutions typically associated with?
Elastic scaling, particularly the ability to grow rapidly for web-scale applications
Simple Operations, in particular accommodating data that tends to be accessed / updated in isolation
Examples: shopping carts, user profiles, blog posts, calendar data, product stock data, customer data, hotel availability, …
The queries don’t refer to all shopping carts or all product data, just to individual chunks
What kind of database is useful if you’re expecting 100,000 users but may get 100 million?
NoSQL
What are the 6 abilities that distinguish NoSQL databases?
- To horizontally scale the throughput of simple-operation workloads over many servers.
- To replicate and distribute data (through partitioning) over (thousands of) servers
- To expose a simple call-level interface or protocol
- To offer less strict transactional guarantees
- To use distributed indexes efficiently for replication-rich, elastic provision of data storage
- To cope with variations in the structure of objects stored.
What are the three types into which NoSQL databases are classified?
Key-Value
Document
Wide Column
What is a key-value database?
Access or update a value given a key; the database doesn’t necessarily provide much functionality for the value e.g. queries
Examples: Redis, Oracle NoSQL, Amazon DynamoDB
What is a document database?
Access or update a document using a key; the database will provide functionality for accessing the value (e.g. queries)
Examples: Couchbase, MongoDB
What is a wide column database?
Access or update a collection of column families associated with a key (e.g. CustomerID links to contact details, sales details, marketing details)
Examples: Apache Cassandra, HBase
What is a Key-Value store?
Associate a key with some data.
It is the responsibility of the application to create and operate on the data.
What is the API for a key-value store?
Put(key, value)
Get(key)
Delete(key)
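A minimal sketch of the three-call API above, assuming an in-memory Python dict as the backing storage. The class name `KeyValueStore` and the `cart:42` key are hypothetical, chosen only for illustration. The store treats the value as opaque bytes, so the application does the encoding and decoding itself:

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store; values are opaque to the store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Overwrites any existing value for the key.
        self._data[key] = value

    def get(self, key):
        # Returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key):
        # Deleting a missing key is a no-op.
        self._data.pop(key, None)


store = KeyValueStore()

# The application, not the store, decides how the value is encoded.
cart = {"items": ["book", "pen"]}
store.put("cart:42", json.dumps(cart).encode("utf-8"))

raw = store.get("cart:42")
restored = json.loads(raw.decode("utf-8"))

store.delete("cart:42")
```

The store never looks inside the bytes, which is why it cannot offer queries over the value.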
What is a BLOB?
Binary Large OBject
What is a document store?
Associate a key with some data.
Format the data with some recognized format (e.g. JSON)
What is the API for a document store?
Put(key, document)
Get(key)
Find(key, filter)
Delete(key)
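A minimal sketch of the document store API above, again backed by an in-memory dict. The card does not say what the filter in Find(key, filter) does; this sketch assumes it is a set of field/value pairs that the document must match. The names `DocumentStore` and `criteria` are hypothetical:

```python
import json


class DocumentStore:
    """Toy document store: values are JSON documents the store can inspect."""

    def __init__(self):
        self._docs = {}

    def put(self, key, document):
        # Round-trip through JSON so only JSON-serialisable documents are
        # accepted and the store keeps its own copy.
        self._docs[key] = json.loads(json.dumps(document))

    def get(self, key):
        return self._docs.get(key)

    def find(self, key, criteria):
        # Assumed semantics: return the document only if every field in
        # `criteria` is present with the wanted value.
        doc = self._docs.get(key)
        if doc is None:
            return None
        matches = all(doc.get(field) == wanted for field, wanted in criteria.items())
        return doc if matches else None

    def delete(self, key):
        self._docs.pop(key, None)


store = DocumentStore()
store.put("user:7", {"name": "Ada", "city": "Leeds"})

hit = store.find("user:7", {"city": "Leeds"})
miss = store.find("user:7", {"city": "York"})
```

Unlike the key-value case, the store understands the format of the value, so it can evaluate the filter itself.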
What is a wide column store?
Associate a key with some data structured using column families.
API:
- Put(key, {column family})
- Get(key)
- Delete(key)
- Find(key, filter)
- Update(key, expression)
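A minimal sketch of the wide column API above, assuming each row is a dict of column families and each family is a dict of columns. The card leaves the filter and the update expression unspecified. Here the filter is assumed to be a list of family names to return, and the expression is assumed to be a (family, column, value) triple. `WideColumnStore` is a hypothetical name:

```python
class WideColumnStore:
    """Toy wide column store: key -> {family name -> {column -> value}}."""

    def __init__(self):
        self._rows = {}

    def put(self, key, families):
        # Merge the given column families into the row for this key.
        row = self._rows.setdefault(key, {})
        for name, columns in families.items():
            row.setdefault(name, {}).update(columns)

    def get(self, key):
        return self._rows.get(key)

    def delete(self, key):
        self._rows.pop(key, None)

    def find(self, key, family_names):
        # Assumed semantics: return only the requested column families.
        row = self._rows.get(key, {})
        return {name: row[name] for name in family_names if name in row}

    def update(self, key, expression):
        # Assumed semantics: expression is a (family, column, value) triple.
        family, column, value = expression
        self._rows.setdefault(key, {}).setdefault(family, {})[column] = value


store = WideColumnStore()
store.put("cust:1", {
    "contact": {"email": "a@example.com"},
    "sales": {"total": 120},
})
store.update("cust:1", ("sales", "total", 150))
contact_only = store.find("cust:1", ["contact"])
```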
How do you partition data in NoSQL databases vs Relational databases?
Partitioning involves allocating data to different nodes.
In relational databases, partitioning can be vertical (put the data of selected columns together) or horizontal (put collections of rows together)
In NoSQL all the data associated with a key is stored on a single node, so data is horizontally partitioned. This is also known as sharding.
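A minimal sketch of sharding by key, assuming each node is modelled as a Python dict and the node is chosen by hashing the key modulo the number of nodes. The names `node_for` and `ShardedStore` are hypothetical. Modulo hashing is used here only because it is simple. Real systems often use schemes such as consistent hashing so that adding a node moves less data:

```python
import hashlib


def node_for(key, num_nodes):
    """Map a key to a node index with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class ShardedStore:
    """Toy horizontally partitioned store: every key lives on exactly one node."""

    def __init__(self, num_nodes):
        self.nodes = [{} for _ in range(num_nodes)]

    def put(self, key, value):
        self.nodes[node_for(key, len(self.nodes))][key] = value

    def get(self, key):
        # Only the node that owns the key is consulted.
        return self.nodes[node_for(key, len(self.nodes))].get(key)


store = ShardedStore(num_nodes=4)
store.put("cart:42", ["book", "pen"])
owners = [i for i, node in enumerate(store.nodes) if "cart:42" in node]
```

Because all the data for a key sits on one node, a get or put touches a single partition.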
Is there a standard language or model for NoSQL?
No
Is there a standard language or model for relational databases?
Yes, the SQL standard
What is Amazon Dynamo?
A key-value NoSQL database developed to support Amazon services and Amazon-hosted applications
It has a different internal model from DynamoDB
Designed for web-scale applications: handling user profiles, shopping carts, game states, leader boards, …
Each service that uses Dynamo has its own instances
What are the Stonebraker-Cattell Rules?
R1 Look for shared-nothing scalability
R2 High-level languages are good and need not hurt performance
R3 Plan to carefully leverage main memory databases
R4 High availability and automatic recovery are essential for simple-operation scalability
R5 Online everything
R6 Avoid multi-node operations
R7 Don’t try to build ACID consistency yourself
R8 Look for administrative simplicity
R9 Pay attention to node performance
R10 Open source gives you more control over your future
What is a shared-memory design, and what is its problem?
Shared-memory designs (e.g. multicore, single-node DBMSs over shared primary and secondary memory) suffer from contention that starves the cores, forcing designers into sharding.
Can only scale to tens of nodes.
What is a shared-disk design, and what is its problem?
Shared-disk designs (e.g. a multi-core, single-node DBMS with private memory per CPU but shared secondary memory) suffer from complex buffer and lock management needs, which limit scalability.
Can only scale to tens of nodes.
What is a shared-nothing design, and what makes it scalable?
Each node has its own private primary and secondary memory.
Such designs are scalable if partitioning is load-balancing (does not lead to hotspots) and if operations touch as few partitions as possible (ideally one)
Can scale to hundreds or thousands of nodes.
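A minimal sketch of what a hotspot looks like, assuming 1,000 made-up keys of the form `user:N` spread over 4 nodes. It compares two hypothetical partitioning functions: `by_hash`, which hashes the whole key, and `by_first_char`, which looks only at the first character. Both function names and the key format are illustrative assumptions:

```python
import hashlib
from collections import Counter

NUM_NODES = 4
keys = [f"user:{i}" for i in range(1000)]


def by_hash(key):
    # Hash the whole key so similar keys still spread across nodes.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


def by_first_char(key):
    # Poor choice: every key here starts with "u", so all land on one node.
    return ord(key[0]) % NUM_NODES


def load_per_node(partitioner):
    """Count how many keys each node would hold under a partitioner."""
    counts = Counter(partitioner(k) for k in keys)
    return [counts.get(node, 0) for node in range(NUM_NODES)]


hash_load = load_per_node(by_hash)
prefix_load = load_per_node(by_first_char)

print("hash partitioning:  ", hash_load)
print("prefix partitioning:", prefix_load)
```

Under the prefix scheme one node holds every key, which is the hotspot the card warns about. The hash scheme spreads the keys roughly evenly.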
Does Amazon Dynamo follow R1?
Yes. It runs on normal data center nodes and is shared-nothing.
It supports get(key), put(key, value) and delete(key) operations that act on a single partition.
It scales to hundreds or thousands of nodes.