The NoSQL Ecosystem Flashcards
NoSQL: definition, according to the community
Not Only a SQL interface, referring to providing an alternative rather than a wholesale replacement for SQL
SQL’s expressiveness makes it challenging to ___
reason about the cost of each query, and thus the cost of a workload
Application developers may find using relational data models to be challenging because ___
it may not be perfect for modeling every kind of data (i.e. lists, queues, sets, etc)
If relational data grows past the capacity of one server, then ___
the tables in the database will have to be partitioned across computers, leading to denormalization
Complex query logic is typically left to the application, resulting in ___
a data store with more predictable query performance because of lack of variability in queries
Two characteristics of Google’s BigTable
hierarchical range-based partitioning scheme
strict consistency
Two characteristics of Amazon’s Dynamo
Maps keys to application-specific blobs of data
Loose consistency makes the partitioning model resilient to failure
Considerations regarding NoSQL systems (SPACTSDD)
Scalability Partitioning Analytical workloads Consistency Transactional semantics Single-server performance Data and query model Durability
The simplest form of a NoSQL store is a ___
key-value store
key-value store
each key is mapped to a value containing arbitrary data
store popularized by Redis
key-data structure store
key-data structure store
assigns each value a type (i.e. integer, string, list, set, sorted set, etc)
store common to CouchDB, MongoDB, Riak
key-document store
key-document store
map a key to some document that contains structured information in a JSON or JSON-like format
key-document stores grant a lot of freedom in document modeling, however ___
application-based query logic can become exceedingly complex
store common to HBase, Cassandra
BigTable column family store
column family store
- complex key identifies a row containing data stored in one or more Column Families
- each row can contain multiple columns with a CF
- values within each column are timestamped
store common to HyperGraphDB, Neo4J
graph store
exception to key-only lookup: MongoDB
allows indexing of data based on any number of properties and has a relatively high-level language for specifying which data to retrieve
exception to key-only lookup: BigTable-based systems
support scanners to iterate over a column family and select particular items by a filter on a column
exception to key-only lookup: CouchDB
allows creation of different views of the data and running MapReduce tasks across the table to facilitate more complex lookups and updates
ACID
Atomic
Consistency
Isolation
Durability
Most NoSQL systems choose performance over ___
full ACID guarantees
Redis is an exception to NoSQL’s no-transaction trend, in that ___
it provides a MULTI command to combine multiple operations atomically and a WATCH command to allow isolation
benefits of schema-free storage
supports less structured data requirements and requires less structured data requirements
after a few iterations of a project relying on sloppy-schema NoSQL systems, ___
data and schema versioning is usually present in application-level code
single-server durability
ensures that any data modification will survive a server restart or power loss
the OS may not immediately write data to an on-disk file, instead ___
buffering the write to group several writes together in a single operation
typical hard drives can perform ___ random accesses (seeks) per second
100 - 200
typical hard drives are limited to ___ of sequential writes
30 - 100 MB/s
ensuring efficient single-server durability means ___
limiting the number of random writes the system incurs and increasing the number of sequential writes per hard drive
Ideally, one wants to minimize ___ and maximize ___, all while ___
the number of writes between fsync calls
the number of those writes that are sequential
never telling the user their data has written until the write has been fsynced
Techniques for improving performance of single-server durability guarantees
Control fsync frequency
Increase sequential writes by logging
Increase throughput by grouping writes
Memcached offers ___ in exchange for ___
no on-disk durability
extremely fast in-memory operations