NoSQL Flashcards
4 points addressed by NoSQL databases
non-relational
distributed
open-source
horizontally scalable (we increase computing power by adding more nodes to the system rather than upgrading an individual node)
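A minimal sketch of horizontal scaling, assuming keys are assigned to nodes by simple hash-modulo partitioning. The node names and the helpers node_for_key and distribute are hypothetical, and real systems often use other placement schemes such as consistent hashing.

```python
import hashlib


def node_for_key(key, nodes):
    """Pick a node for a key by hashing the key (assumed modulo partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def distribute(keys, nodes):
    """Return a mapping of node name -> list of keys placed on that node."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for_key(key, nodes)].append(key)
    return placement


keys = [f"user:{i}" for i in range(1000)]

# Scaling out: the same data set spread over three nodes, then over four.
three_nodes = distribute(keys, ["node-a", "node-b", "node-c"])
four_nodes = distribute(keys, ["node-a", "node-b", "node-c", "node-d"])

for name, placement in (("3 nodes", three_nodes), ("4 nodes", four_nodes)):
    print(name, {node: len(held) for node, held in placement.items()})
```

Adding a node lowers the share of keys each node holds, which is the point of scaling out instead of upgrading a single machine.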
6 defining characteristics of NoSQL databases
- schema-free
- easy replication support
- simple API
- eventually consistent
- BASE principles (not ACID principles)
- huge data amounts
6 examples of NoSQL DBs
Hadoop/HBase
Cassandra
Amazon SimpleDB
MongoDB
CouchDB
Google Bigtable
What does BASE stand for?
Basically Available
Soft state
Eventually Consistent
6 Characteristics of BASE database
- weak consistency (stale data ok)
- availability first
- best effort
- approximate answers ok
- aggressive (optimistic)
- simpler and faster
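A toy sketch of "stale data ok" and eventual consistency, assuming a write is applied to one replica immediately and reaches the others later. The classes Replica and EventuallyConsistentStore and the propagate method are hypothetical names for illustration, not any real database's API.

```python
class Replica:
    """One copy of the data, held by a single server."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # queued (replica_index, key, value) updates

    def put(self, key, value):
        # The write is acknowledged after reaching replica 0 only;
        # updates for the other replicas are queued for later.
        self.replicas[0].data[key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def get(self, key, replica_index):
        # Any replica may answer, so a read can return stale data.
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # Deliver the queued updates; afterwards all replicas agree.
        for index, key, value in self.pending:
            self.replicas[index].data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.put("cart", ["book"])
stale = store.get("cart", replica_index=2)  # replica 2 has not seen the write yet
store.propagate()
fresh = store.get("cart", replica_index=2)  # replicas have now converged
print(stale, fresh)
```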
What is the CAP theorem and how does it apply to NoSQL databases?
the idea that a distributed system cannot guarantee all 3 of consistency, availability, and partition tolerance at the same time
You can have at most 2 of these. The NoSQL database you choose will depend mostly on which two of these characteristics you need the most
CAP Theorem: Consistency
all servers in the system will have the same data so anyone using the system will get the same copy regardless of which server answers their request
CAP Theorem: Availability
the system will always respond to a request (even if the response is not the latest data, is inconsistent, or is just a message saying the system isn’t working)
CAP Theorem: Partition Tolerance
the system continues to operate as a whole even if individual servers fail or cannot be reached
4 NoSQL database types
column store
document store
key-value store
graph database
4 basic key-value function calls
- Get(key) - return the value associated with the key
- Put(key, value) - add a key-value pair
- Multi-get(key1, ..., keyN) - return the list of values associated with the list of keys
- Delete(key) - remove the key-value pair from the data store
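The four calls above can be sketched as a small in-memory store. This is an illustration backed by a Python dict; the class KeyValueStore and its method names are hypothetical, not a real product's API.

```python
class KeyValueStore:
    """In-memory sketch of the four basic key-value calls."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Add a key-value pair, overwriting any existing value for the key.
        self._data[key] = value

    def get(self, key):
        # Return the value for the key, or None if the key is absent.
        return self._data.get(key)

    def multi_get(self, *keys):
        # Return the values for the given keys, in the same order.
        return [self._data.get(key) for key in keys]

    def delete(self, key):
        # Remove the pair if present; deleting a missing key is a no-op.
        self._data.pop(key, None)


kv = KeyValueStore()
kv.put("user:1", "Ada")
kv.put("user:2", "Lin")
print(kv.get("user:1"))
print(kv.multi_get("user:1", "user:2", "user:3"))
kv.delete("user:1")
print(kv.get("user:1"))
```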
2 main issues with Key-value stores
- this model does not provide any traditional database capabilities such as atomicity
- maintaining unique values for keys may become more difficult as the volume of data increases
What is a document DB?
expands on key-value store idea but keys refer to “documents” which can contain more complex data
documents hold “semi-structured” data
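A minimal sketch of the document idea, assuming documents are JSON-like dictionaries whose fields may differ from one document to the next. DocumentStore and its insert, get, and find methods are hypothetical names, not the API of any particular document database.

```python
import json


class DocumentStore:
    """Keys refer to semi-structured documents instead of opaque values."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, document):
        # Round-trip through JSON to copy the document and to check
        # that it is JSON-serializable.
        self._docs[doc_id] = json.loads(json.dumps(document))

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def find(self, field, value):
        # Unlike a plain key-value store, the store can look inside documents.
        return [doc_id for doc_id, doc in self._docs.items()
                if doc.get(field) == value]


docs = DocumentStore()
# No fixed schema: each document carries its own set of fields.
docs.insert("u1", {"name": "Ada", "city": "London", "langs": ["en", "fr"]})
docs.insert("u2", {"name": "Lin", "city": "London"})
docs.insert("u3", {"name": "Sam", "address": {"city": "Oslo"}})

london_ids = docs.find("city", "London")
print(london_ids)
```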
Example of a key-value store
Amazon DynamoDB
2 examples of document DBs
CouchDB
MongoDB
What is a column store DB?
they store the cells of each column as a contiguous disk entry
relational databases, by contrast, store individual rows as contiguous disk entries
What is the benefit of a column store DB?
accessing a single attribute, searching through a single attribute, and aggregating over it only require reading that one column's contiguous entry instead of every row
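A small sketch contrasting the two layouts, with Python lists standing in for contiguous disk entries. The sample records and function names are made up for illustration.

```python
# Row layout: each record is stored together.
rows = [
    {"id": 1, "name": "Ada", "age": 36},
    {"id": 2, "name": "Lin", "age": 41},
    {"id": 3, "name": "Sam", "age": 29},
]

# Column layout: all values of one attribute are stored together.
columns = {field: [row[field] for row in rows] for field in rows[0]}


def average_age_row_layout(rows):
    # Has to visit every full record to pick out a single attribute.
    return sum(row["age"] for row in rows) / len(rows)


def average_age_column_layout(columns):
    # Reads only the one column that the aggregation needs.
    ages = columns["age"]
    return sum(ages) / len(ages)


print(columns["age"])
print(average_age_row_layout(rows) == average_age_column_layout(columns))
```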
What is a column family?
a logical grouping of columns. The column entries have IDs that allow columns in the column family to be joined to produce a full picture of the data
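A rough sketch of a column family, assuming each column maps a row ID to a cell value so that a full record can be rebuilt by joining the columns on that ID. The name profile_family, the row IDs, and reconstruct_row are hypothetical.

```python
# One column family: a logical group of columns, each keyed by row ID.
profile_family = {
    "name": {"row1": "Ada", "row2": "Lin"},
    "city": {"row1": "London"},  # row2 has no entry in this column
}


def reconstruct_row(family, row_id):
    """Join the columns on the row ID to rebuild one record."""
    return {column: cells[row_id]
            for column, cells in family.items()
            if row_id in cells}


print(reconstruct_row(profile_family, "row1"))
print(reconstruct_row(profile_family, "row2"))
```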
What is a graph DB?
a database based on a graph where data is represented by vertices and the relationships between the data are represented by edges
What is the benefit of a graph DB?
it’s ideal for representing complex relationships
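A minimal sketch of the vertex-and-edge model, assuming directed edges labeled with a relationship name. The Graph class, its methods, and the sample data are hypothetical and do not reflect any specific graph database's query language.

```python
from collections import deque


class Graph:
    def __init__(self):
        # vertex -> list of (relationship, target vertex) edges
        self.edges = {}

    def add_edge(self, source, relationship, target):
        self.edges.setdefault(source, []).append((relationship, target))

    def neighbors(self, vertex, relationship):
        return [target for rel, target in self.edges.get(vertex, [])
                if rel == relationship]

    def reachable(self, start, relationship):
        """Breadth-first search along edges of one relationship type."""
        seen = set()
        queue = deque([start])
        while queue:
            vertex = queue.popleft()
            for neighbor in self.neighbors(vertex, relationship):
                if neighbor != start and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return seen


g = Graph()
g.add_edge("alice", "follows", "bob")
g.add_edge("bob", "follows", "carol")
g.add_edge("carol", "works_at", "acme")

print(g.neighbors("alice", "follows"))
print(sorted(g.reachable("alice", "follows")))
```

Following relationships is a walk along edges, whereas in a relational model the same question would typically need repeated joins.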
What is MapReduce?
a programming paradigm for processing large data volumes in parallel, used for tasks such as indexing and searching
The two phases of the MapReduce paradigm
the Map phase:
- extracting sets of key-value pairs from the underlying data, potentially in parallel on different machines
the Reduce phase:
- merging and sorting the sets of key-value pairs
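The two phases can be sketched with a word count, run here in a single process. The function names and the explicit group_by_key step between the phases are assumptions for illustration; a real framework would distribute the map calls across machines.

```python
from collections import defaultdict


def map_phase(document):
    # Extract (key, value) pairs from the raw data: one (word, 1) per word.
    return [(word.lower(), 1) for word in document.split()]


def group_by_key(pairs):
    # Collect all values emitted for the same key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # Merge the values for each key and return the results sorted by key.
    return {key: sum(values) for key, values in sorted(grouped.items())}


documents = ["the quick brown fox", "the lazy dog", "The fox"]

# Each document could be mapped on a different machine; here it is a loop.
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(group_by_key(mapped))
print(counts)
```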