The NoSQL Ecosystem Flashcards

Question

benefits of schema-free storage

Answer 1

supports less structured data requirements and requires less performance expense when modifying schemas on the fly

Answer 2

data and schema versioning is usually present in application-level code

Answer 3

ensures that any data modification will survive a server restart or power loss

Answer 4

buffering the write to group several writes together in a single operation

Answer 5

30 - 100 MB/s

Answer 6

limiting the number of random writes the system incurs and increasing the number of sequential writes per hard drive

Answer 7

the number of writes between fsync calls; the number of those writes that are sequential; never telling the user their data has been written until the write has been fsynced

Answer 8

control fsync frequency; increase sequential writes by logging; increase throughput by grouping writes

Answer 9

no on-disk durability; extremely fast in-memory operations

Answer 10

when to call fsync

Answer 11

append update operations to a sequentially written file (a log)
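
A minimal Python sketch of the append-only log idea. It assumes newline-delimited JSON records and uses the hypothetical helper names `append_update` and `replay`; real stores use binary formats and usually batch fsyncs instead of calling `os.fsync` once per update.

```python
import json
import os


def append_update(path: str, key: str, value: str) -> None:
    """Append one update record to the end of the log and force it to disk."""
    record = json.dumps({"key": key, "value": value}) + "\n"
    # Append mode places every write at the end of the file, so the disk
    # sees sequential writes instead of random in-place updates.
    with open(path, "a", encoding="utf-8") as f:
        f.write(record)
        f.flush()             # push Python's buffer into the OS page cache
        os.fsync(f.fileno())  # ask the OS to persist the page cache to disk


def replay(path: str) -> dict:
    """Rebuild the current key/value state by replaying the log in order."""
    state = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            state[record["key"]] = record["value"]  # later updates win
    return state
```

Reads in this sketch have to replay the whole file, which is why log-based stores pair the log with a lookup structure.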

Answer 12

combining logs and lookup data structures into one

Answer 13

result in improved write throughput, but require periodic log compaction
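
Compaction can be sketched as replaying the log and keeping only the newest value per key. This is an in-memory illustration with a hypothetical `compact` function; it assumes the log is a list of `(key, value)` pairs in which `None` marks a deletion, a convention chosen here rather than taken from any particular system.

```python
from typing import List, Optional, Tuple


def compact(log: List[Tuple[str, Optional[str]]]) -> List[Tuple[str, str]]:
    """Keep only the newest value for each key; a value of None marks a deletion."""
    latest = {}
    for key, value in log:      # replay in append order
        latest[key] = value     # a later append overwrites an earlier one
    # Dicts keep first-insertion order, so surviving keys stay in a stable order.
    return [(key, value) for key, value in latest.items() if value is not None]
```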

Answer 14

grouping multiple concurrent updates within a short window into a single fsync call
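
A single-threaded sketch of group commit, assuming a hypothetical `GroupCommitLog` class: updates submitted within one window are only buffered, and `commit` writes them all and issues one fsync for the batch. A real implementation would close the window on a short timer and acknowledge concurrent clients, which this sketch does not model.

```python
import os


class GroupCommitLog:
    """Buffer appends and make a whole batch durable with a single fsync."""

    def __init__(self, path: str):
        self.path = path
        self.pending = []      # records submitted but not yet durable
        self.fsync_calls = 0   # counts fsyncs so the batching is observable

    def submit(self, record: str) -> None:
        # The caller is NOT acknowledged yet; the record only joins the batch.
        self.pending.append(record)

    def commit(self) -> list:
        """Write every pending record, fsync once, and return the acknowledged records."""
        if not self.pending:
            return []
        with open(self.path, "a", encoding="utf-8") as f:
            for record in self.pending:
                f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())  # one fsync covers the whole batch
        self.fsync_calls += 1
        acked, self.pending = self.pending, []
        return acked  # only now may callers be told their writes are durable
```

The trade-off in the cards below shows up directly: three updates cost one fsync, but the first submitter waits until the last one joins the batch.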

Answer 15

increase in throughput, as multiple log appends can happen in a single fsync

Answer 16

higher latency per update, as users must wait on several concurrent updates for acknowledgement of their own update

Answer 17

traditional primary-replica replication, where multiple servers store copies of the data

Answer 18

adding more RAM and disks to handle load on one machine

Answer 19

replicate data and spread requests across multiple machines

Answer 20

linear scalability

Answer 21

doubling the number of machines in your storage system doubles the query capacity of the system

Answer 22

the act of splitting your read and write workloads across multiple machines to scale out your storage system

Answer 23

has to handle the write workload on the entire dataset and answer queries about the entire dataset

Answer 24

system complexity

Answer 25

read replicas; caching

Answer 26

make copies of the data on multiple machines; read requests can be served by any copy, while write requests go to a primary node
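
A toy sketch of read-replica routing, using a hypothetical `ReplicatedStore` class with in-memory dicts standing in for servers. It copies writes synchronously for simplicity and rotates reads round-robin; both choices are assumptions, and real replication is often asynchronous, so replicas can briefly serve stale data.

```python
import itertools


class ReplicatedStore:
    """Writes hit the primary and are copied out; reads rotate over replicas."""

    def __init__(self, num_replicas: int):
        self.primary = {}
        self.replicas = [{} for _ in range(num_replicas)]
        self._next = itertools.cycle(range(num_replicas))  # round-robin read routing

    def write(self, key, value):
        self.primary[key] = value
        # Synchronous copy keeps the sketch simple.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        """Return (replica index, value) so the routing is visible."""
        idx = next(self._next)
        return idx, self.replicas[idx].get(key)
```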

Answer 27

use read replicas to improve read-only query performance

Answer 28

read replicas; caching

Answer 29

just add another Memcached host

Answer 30

a coordinator distributes requests to individual CouchDB instances based on the key of the requested document

Answer 31

takes standalone data stores and arranges them in trees of any depth to partition keys by key range

Answer 32

Voldemort; Riak; Cassandra

Answer 33

a kind of hashing such that when a hash table is resized, only K/n keys need to be remapped on average, where K is the number of keys, and n is the number of slots
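
A minimal consistent-hashing ring, assuming a hypothetical `HashRing` class, MD5 as the position function, and one ring position per server (no virtual nodes). Each key belongs to the first server clockwise from its position, so adding a server only takes over keys from its clockwise neighbor.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    # MD5 only spreads values around the ring here; it is not used for security.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hashing with one ring position per server."""

    def __init__(self, nodes):
        self._points = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        bisect.insort(self._points, (ring_position(node), node))

    def node_for(self, key: str) -> str:
        positions = [pos for pos, _ in self._points]
        # First server clockwise from the key's position, wrapping past the end.
        idx = bisect.bisect(positions, ring_position(key)) % len(self._points)
        return self._points[idx][1]
```

With a single position per server the load can be quite uneven; Dynamo-style systems place each server at many ring positions to smooth it out.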

Answer 34

two keys that are next to each other in the key's sort order are likely to appear in the same partition
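
An order-preserving (range) partitioner can be sketched with sorted split points. The `RangePartitioner` class, split points, and server names are hypothetical; the point is that a range scan only touches the few servers whose ranges overlap it.

```python
import bisect


class RangePartitioner:
    """Each server owns one contiguous range of the key space."""

    def __init__(self, split_points, servers):
        # split_points[i] is the first key owned by servers[i + 1].
        assert len(servers) == len(split_points) + 1
        self.split_points = list(split_points)
        self.servers = list(servers)

    def server_for(self, key):
        return self.servers[bisect.bisect_right(self.split_points, key)]

    def servers_for_range(self, start, end):
        """Servers touched by a scan over keys in [start, end]."""
        lo = bisect.bisect_right(self.split_points, start)
        hi = bisect.bisect_right(self.split_points, end)
        return self.servers[lo:hi + 1]
```

For example, with split points `["g", "p"]` over servers `s1`, `s2`, `s3`, a scan from `"h"` to `"q"` visits only `s2` and `s3`.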

Answer 35

having a load manager that can reduce the size of a range on an overloaded server

Answer 36

stores a range of row keys and values within a column family, maintaining all necessary logs and data structures to answer queries

Answer 37

two small tablets may merge or a big tablet splits in two

Answer 38

tablet size, load, and availability

Answer 39

a distributed locking system for managing server membership and liveness

Answer 40

manage secondary leader servers and tablet server reassignment

Answer 41

maintaining tablet assignment in a metadata table, which is also sharded into tablets

Answer 42

using HDFS to handle data storage, replication, and consistency, leaving the rest to servers

Answer 43

using config nodes to specify key ranges, staying in sync with a two-phase commit protocol

Answer 44

preserving order in its partitioning, mapping data to the server directly managing its key range

Answer 45

form routing hierarchies of any depth, assigning ranges of keys to servers below them in the hierarchy

Answer 46

one will frequently be performing range scans over the keys of the data, which avoids random jumps between nodes over the network

Answer 47

maintaining routing and configuration nodes

Answer 48

in small chunks which can be re-assigned in high-load situations

Answer 49

machines will crash and get out of sync; machines will crash and never come back; the network will partition two sets of replicas from each other; messages between machines will get delayed or lost

Answer 50

strong consistency; eventual consistency

Answer 51

ensure that the replicas of a data item will always be able to come to consensus on the value of a key

Answer 52

R + W = N + 1
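
Why R + W = N + 1 is enough can be checked by brute force: if every possible read set shares at least one replica with every possible write set, a read always reaches some replica holding the latest write. The function name `quorums_always_overlap` is hypothetical; overlap is guaranteed exactly when R + W > N, and N + 1 is the smallest such sum.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Does every R-replica read set intersect every W-replica write set?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)  # an empty intersection is falsy
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )
```

With N = 3, `quorums_always_overlap(3, 2, 2)` is `True`, while `quorums_always_overlap(3, 1, 2)` is `False` because a single-replica read can miss both replicas that took the write.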

Answer 53

writes succeed only once they have been replicated to all N servers (W=N); reads are satisfied by a single replica (R=1)

Answer 54

vector clocks
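
A small sketch of vector-clock comparison, assuming clocks are dicts from node id to counter (missing entries count as 0) and the hypothetical helpers `increment` and `compare`. Two versions conflict when neither clock is less than or equal to the other in every entry.

```python
from typing import Dict


def increment(clock: Dict[str, int], node: str) -> Dict[str, int]:
    """Return a copy of the clock with this node's counter bumped by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a: Dict[str, int], b: Dict[str, int]) -> str:
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # b supersedes a
    if b_le_a:
        return "after"       # a supersedes b
    return "concurrent"      # neither dominates: a conflict to resolve
```

For example, if node A writes a base version and then A and B each update it independently, the two results compare as `"concurrent"`.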

Answer 55

returning multiple copies of the key to the requesting client application

Answer 56

using the most recently timestamped version of the data
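
Last-write-wins resolution reduces to picking the version with the largest timestamp. The `Version` type and `last_write_wins` function are hypothetical, and the sketch assumes writers attach wall-clock timestamps.

```python
from typing import List, NamedTuple


class Version(NamedTuple):
    value: str
    timestamp: float  # wall-clock time attached by the writer (assumption)


def last_write_wins(versions: List[Version]) -> Version:
    """Resolve conflicting replicas by keeping the most recently timestamped version."""
    return max(versions, key=lambda v: v.timestamp)
```

The rule is simple but lossy: every other concurrent version is silently discarded, and clock skew between machines can make an older write look newer.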

Answer 57

it identifies a conflict and allows users to query for conflicted keys for manual repair, but deterministically picks a version to return to users until conflicts are repaired

Answer 58

repairing out-of-sync replicas of the data in the background while returning the non-conflicting data to the requestor
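
A read-repair sketch under simplifying assumptions: each replica is a dict mapping a key to `(version_number, value)`, versions are totally ordered integers rather than vector clocks, and the hypothetical `read_with_repair` patches stale replicas inline rather than in the background.

```python
def read_with_repair(replicas, key):
    """Read a key from every replica, return the newest value, and fix stale copies."""
    responses = [replica.get(key) for replica in replicas]
    present = [resp for resp in responses if resp is not None]
    if not present:
        return None
    newest = max(present, key=lambda resp: resp[0])  # highest version number
    for replica, resp in zip(replicas, responses):
        if resp != newest:
            replica[key] = newest  # repair an out-of-date or missing copy
    return newest[1]
```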

Answer 59

assigning a node to temporarily take over an unavailable node's write workload, forwarding all those writes when the node is available again
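
A toy hinted-handoff sketch using a hypothetical `Cluster` class. The caller names the stand-in node explicitly; in a Dynamo-style system it would typically be the next healthy node on the ring, which this sketch does not model.

```python
from collections import defaultdict


class Cluster:
    """Each node holds a dict of data; writes for a down node become hints."""

    def __init__(self, nodes):
        self.data = {n: {} for n in nodes}
        self.up = set(nodes)
        self.hints = defaultdict(list)  # stand-in node -> [(target, key, value)]

    def fail(self, node):
        self.up.discard(node)

    def write(self, target, key, value, stand_in):
        if target in self.up:
            self.data[target][key] = value
        else:
            # The target is unreachable, so the stand-in keeps the write as a hint.
            self.hints[stand_in].append((target, key, value))

    def recover(self, node):
        """Bring a node back and forward every hint addressed to it."""
        self.up.add(node)
        for holder in list(self.hints):
            keep = []
            for target, key, value in self.hints[holder]:
                if target == node:
                    self.data[node][key] = value  # hand the write back
                else:
                    keep.append((target, key, value))
            self.hints[holder] = keep
```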

Answer 60

Merkle trees
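
A minimal Merkle root computation, assuming SHA-256 and the hypothetical `merkle_root` function. Two replicas that hash the same key range can compare a single root: equal roots mean the range is in sync, and unequal roots tell them to descend into child hashes to locate the differing keys (that descent is not shown here).

```python
import hashlib
from typing import List


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(items: List[bytes]) -> bytes:
    """Hash the leaves, then repeatedly hash adjacent pairs until one root remains."""
    if not items:
        return _h(b"")
    level = [_h(item) for item in items]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```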

Answer 61

periodically (~1s) a node will communicate with a random node to exchange knowledge of other nodes' health
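
A gossip sketch, assuming each node's knowledge is a dict of heartbeat counters and using the hypothetical `gossip_round` function. In one round every node bumps its own heartbeat, picks a random peer, and both keep the highest counter either has seen for each node. The roughly one-second period is not modeled; a real node would run this exchange on a timer and suspect any peer whose heartbeat stops increasing.

```python
import random


def gossip_round(views, rng):
    """Run one gossip round over every node.

    views maps node id -> {node id: latest heartbeat counter this node has seen}.
    """
    nodes = list(views)
    for node in nodes:
        # A live node bumps its own heartbeat before talking to a peer.
        views[node][node] = views[node].get(node, 0) + 1
        peer = rng.choice([n for n in nodes if n != node])
        # Both sides keep the freshest heartbeat they know of for every node.
        merged = {
            n: max(views[node].get(n, 0), views[peer].get(n, 0))
            for n in set(views[node]) | set(views[peer])
        }
        views[node] = dict(merged)
        views[peer] = dict(merged)
```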