Chapter 7 Flashcards
On-disk storage
Uses low-cost hard-disk drives for long-term storage
On-disk storage can be implemented with a distributed file system or a database
A distributed file system storage device is suitable
when large datasets of raw data are to be stored or when archiving of datasets is required
Relational database management systems (RDBMSs)
Good for handling transactional workloads involving small amounts of data with random read/write properties
RDBMSs are ACID-compliant, and, to honor this compliance, they are generally restricted to a
single node. For this reason, RDBMSs do not provide out-of-the-box redundancy and fault
tolerance.
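The ACID behavior on a single node can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the flashcard source: the account table, the transfer helper and the balances helper are hypothetical names, and SQLite stands in for a full RDBMS.

```python
import sqlite3

# Hypothetical single-node database; SQLite stands in for a full RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint on an overdraft,
            # which also undoes the credit above (atomicity).
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

A transfer that would overdraw the source account leaves both balances unchanged, which is the all-or-nothing guarantee the card refers to.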
Is an RDBMS good for large-scale storage? Why or why not?
To handle large volumes of data arriving at a fast pace, relational databases generally need
to scale. RDBMSs employ vertical scaling rather than horizontal scaling, and vertical scaling
is the more costly and disruptive strategy. This makes RDBMSs less than ideal for long-term
storage of data that accumulates over time.
shortcomings of RDBMS
Relational databases require data to adhere to a schema,
so storage of semi-structured and unstructured data whose schemas are non-relational is not directly supported.
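Schema conformance is also checked whenever data is inserted or updated, which adds overhead and latency. This latency makes relational databases a less than ideal choice for storing high-velocity data that needs a highly available database storage device with fast data write capability.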
As a result of its shortcomings, a traditional RDBMS is generally not useful as the primary storage device in a Big Data solution environment
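The schema requirement described in these cards can be illustrated with a small sketch, again using sqlite3 as a stand-in. The customer table and the try_insert helper are hypothetical; column names are interpolated into the SQL for brevity only, which would not be safe with untrusted input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def try_insert(conn, record):
    """Insert a dict as a row; return None on success or the error text."""
    columns = ", ".join(record)
    placeholders = ", ".join("?" for _ in record)
    sql = f"INSERT INTO customer ({columns}) VALUES ({placeholders})"
    try:
        with conn:  # the database checks the row against the schema here
            conn.execute(sql, tuple(record.values()))
        return None
    except sqlite3.Error as exc:
        return str(exc)


def row_count(conn):
    """Number of rows that passed schema validation."""
    return conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

A record with an extra field (for example a "tags" key) or a missing required value is rejected at write time, so semi-structured records cannot be stored without first changing the schema.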
NoSQL storage devices can mainly be divided into four types based on the way they store
data
- key-value
- document
- column-family
- graph
Key-value storage devices
store data as key-value pairs and act like hash tables. The table
is a list of values where each value is identified by a key.
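The hash-table analogy can be sketched as follows. KeyValueStore is a hypothetical toy class, not the API of any real product; it assumes values are opaque bytes that the store never inspects.

```python
class KeyValueStore:
    """Toy key-value store backed by a dict, Python's built-in hash table."""

    def __init__(self):
        self._table = {}

    def put(self, key, value: bytes):
        """Insert or overwrite the value stored under key."""
        self._table[key] = value

    def get(self, key):
        """Return the value for key, or None if the key is absent."""
        return self._table.get(key)

    def delete(self, key):
        """Remove key if present; do nothing otherwise."""
        self._table.pop(key, None)
```

Lookups go through the key only. The store offers no way to search by the contents of a value.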
Document storage devices
store data as key-value pairs. However, unlike key-value
storage devices, the stored value is a document that can be queried by the database. These documents can have a complex nested structure, such as an invoice
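The difference from a key-value store, namely that the store can look inside the value, might be sketched like this. DocumentStore and its dotted-path find method are hypothetical and only illustrate the idea of querying nested fields.

```python
_MISSING = object()  # sentinel for "path not present in this document"


def _lookup(document, path):
    """Follow a dotted path such as 'customer.name' through nested dicts."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


class DocumentStore:
    """Toy document store: ids map to nested dict documents."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, document):
        self._docs[doc_id] = document

    def find(self, path, expected):
        """Return ids of documents whose value at path equals expected."""
        return [
            doc_id
            for doc_id, doc in self._docs.items()
            if _lookup(doc, path) == expected
        ]
```

An invoice document can hold a nested customer object and a list of line items, and a query such as find("customer.name", "Ada") matches on the nested field without a predefined schema.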
Column-family storage devices
store data much like a traditional RDBMS but group
related columns together in a row, resulting in column-families
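One way to picture the grouping is a row key that maps to named column families, each holding its own set of columns. ColumnFamilyStore below is a hypothetical sketch of that layout, not a real product's interface.

```python
from collections import defaultdict


class ColumnFamilyStore:
    """Toy layout: row key -> column family -> column -> value."""

    def __init__(self, families):
        self._families = set(families)  # families are fixed up front
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        """Store a value; columns within a family may differ per row."""
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        self._rows[row_key][family][column] = value

    def get_family(self, row_key, family):
        """Return a copy of all columns in one family of one row."""
        return dict(self._rows.get(row_key, {}).get(family, {}))
```

Reading one family of a row returns only the related columns, for example the "contact" columns of a student without touching the "grades" columns.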
Graph storage devices
used to persist inter-connected entities. Unlike other NoSQL
storage devices, where the emphasis is on the structure of the entities, graph storage
devices place emphasis on storing the linkages between entities
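The emphasis on linkages can be sketched with nodes plus labeled, directed edges. GraphStore is a hypothetical toy class; the label names used with it are arbitrary.

```python
from collections import defaultdict


class GraphStore:
    """Toy graph store: nodes with properties, labeled directed edges."""

    def __init__(self):
        self._nodes = {}
        self._edges = defaultdict(list)  # source id -> [(label, target id)]

    def add_node(self, node_id, **properties):
        self._nodes[node_id] = properties

    def add_edge(self, source, label, target):
        """Link two existing nodes; the edge itself is stored data."""
        if source not in self._nodes or target not in self._nodes:
            raise KeyError("both endpoints must exist before linking")
        self._edges[source].append((label, target))

    def neighbors(self, source, label):
        """Return targets reachable from source over edges with this label."""
        return [t for edge_label, t in self._edges.get(source, []) if edge_label == label]
```

Queries start from an entity and follow its edges, for example neighbors("alice", "FRIEND_OF"), rather than scanning entity records.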
NoSQL storage devices: strengths and shortcomings
Highly scalable, available, fault-tolerant and fast for read/write operations.
However, they do not provide the same transaction and consistency support as exhibited by ACID compliant RDBMSs. Following the BASE model, NoSQL storage
devices provide eventual consistency rather than immediate consistency. They therefore will be in a soft state while reaching the state of eventual consistency. As a result, they are
not appropriate for use when implementing large scale transactional systems.
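Eventual consistency and the intermediate soft state can be illustrated with a simplified simulation. EventuallyConsistentStore is hypothetical: it assumes writes land on one replica first and reach the others only when propagate() is called, which stands in for asynchronous replication.

```python
class EventuallyConsistentStore:
    """Toy model of replicas that converge only after propagation."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self._pending = []  # queued (replica index, key, value) updates

    def write(self, key, value):
        """Apply the write to replica 0 and queue it for the others."""
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self._pending.append((index, key, value))

    def read(self, key, replica=0):
        """Read from one replica; the result may be stale."""
        return self.replicas[replica].get(key)

    def propagate(self):
        """Deliver all queued updates in order, so replicas converge."""
        for index, key, value in self._pending:
            self.replicas[index][key] = value
        self._pending.clear()
```

Between write() and propagate(), a read from another replica returns the old value. That window is the soft state, and it is why such a store is a poor fit for large-scale transactional systems.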
NewSQL storage devices
combine the ACID properties of RDBMS with the scalability
and fault tolerance offered by NoSQL storage devices.
NewSQL databases can be used for developing OLTP systems with very high volumes of transactions, for example a banking system
As compared to a NoSQL storage device, a NewSQL storage device provides an easier
transition from a traditional RDBMS to a highly scalable database due to its support for
SQL.
An in-memory storage device
utilizes RAM, the main memory of a computer, as
its storage medium to provide fast data access.
In-Memory Data Grids (IMDGs)
store data in memory as key-value pairs across multiple nodes where the keys and
values can be any business object or application data in serialized form. This supports schema-less data storage through storage of semi/unstructured data.
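The IMDG idea of serialized key-value pairs spread across nodes might be sketched as below. InMemoryDataGrid is a hypothetical class: each "node" is just a dict in one process, keys are assigned to nodes by hashing, and pickle is used as the serialization format purely for illustration.

```python
import hashlib
import pickle


class InMemoryDataGrid:
    """Toy data grid: serialized key-value pairs partitioned over nodes."""

    def __init__(self, node_count=3):
        self.nodes = [{} for _ in range(node_count)]  # one dict per node

    def _node_for(self, raw_key: bytes):
        """Pick a node from a stable hash of the serialized key."""
        digest = hashlib.sha256(raw_key).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key, value):
        """Serialize key and value, then store them on the owning node."""
        raw_key = pickle.dumps(key)
        self._node_for(raw_key)[raw_key] = pickle.dumps(value)

    def get(self, key):
        """Fetch and deserialize the value, or return None if absent."""
        raw_key = pickle.dumps(key)
        raw_value = self._node_for(raw_key).get(raw_key)
        return None if raw_value is None else pickle.loads(raw_value)
```

Because values are stored as serialized bytes, any picklable object can be stored without a schema. The sketch assumes simple keys such as strings or integers, whose pickled form is the same on every call.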
In-Memory Databases (IMDBs)
in-memory storage devices that employ database technology and leverage the performance of RAM to overcome runtime latency issues that plague on-disk storage devices.
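As a rough illustration of database technology running entirely in RAM, SQLite can be opened with the special ":memory:" name. This is only a stand-in for a dedicated IMDB product; the reading table and the average_by_sensor helper are hypothetical.

```python
import sqlite3

# ":memory:" keeps the whole database in RAM; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (sensor TEXT NOT NULL, value REAL NOT NULL)")
conn.executemany(
    "INSERT INTO reading VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0)],
)
conn.commit()


def average_by_sensor(conn):
    """Run an ordinary SQL aggregate against the in-memory table."""
    query = "SELECT sensor, AVG(value) FROM reading GROUP BY sensor ORDER BY sensor"
    return dict(conn.execute(query))
```

The data is lost when the connection closes, which reflects the volatility of RAM as a storage medium.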