Chapter 7 Flashcards

1
Q

On-disk storage

A

Uses low-cost hard-disk drives for long-term storage.

On-disk storage can be implemented with a distributed file system or a database.

2
Q

When is a distributed file system storage device suitable?

A

When large datasets of raw data are to be stored or when archiving of datasets is required.

3
Q

Relational database management systems (RDBMSs)

A

Good for handling transactional workloads involving small amounts of data with random read/write properties.
RDBMSs are ACID-compliant and, to honor this compliance, are generally restricted to a single node. For this reason, RDBMSs do not provide out-of-the-box redundancy and fault tolerance.

4
Q

Is an RDBMS good for large-scale data? Why or why not?

A

To handle large volumes of data arriving at a fast pace, relational databases generally need to scale. RDBMSs employ vertical scaling rather than horizontal scaling, and vertical scaling is the more costly and disruptive strategy. This makes RDBMSs less than ideal for long-term storage of data that accumulates over time.

5
Q

shortcomings of RDBMS

A

Relational databases require data to adhere to a schema, so storage of semi-structured and unstructured data whose schemas are non-relational is not directly supported.
Schema conformance is also validated on every insert or update, which adds overhead and latency. This latency makes relational databases a less than ideal choice for storing high-velocity data that needs a highly available database storage device with fast data write capability.
As a result of these shortcomings, a traditional RDBMS is generally not useful as the primary storage device in a Big Data solution environment.

6
Q

NoSQL storage devices can mainly be divided into four types based on the way they store data:

A
  • key-value
  • document
  • column-family
  • graph
7
Q

Key-value storage devices

A

Store data as key-value pairs and act like hash tables. The table is a list of values where each value is identified by a key.
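
As a rough illustration of the hash-table analogy, the sketch below uses a plain Python dict as a stand-in for a key-value storage device. The class name `KeyValueStore` and its methods are hypothetical, not any product's API, and values are treated as opaque bytes that the store never inspects.

```python
class KeyValueStore:
    """Toy stand-in for a key-value storage device (hypothetical API)."""

    def __init__(self):
        self._table = {}  # hash table: key -> opaque value

    def put(self, key, value: bytes):
        self._table[key] = value

    def get(self, key):
        # The only lookup path is by key; None is returned when it is absent.
        return self._table.get(key)

    def delete(self, key):
        self._table.pop(key, None)


store = KeyValueStore()
store.put("user:42", b'{"name": "Ada"}')
value = store.get("user:42")
missing = store.get("user:99")
```

Because the value is opaque, the only way to find a record here is to know its key.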

8
Q

Document storage devices

A

Store data as key-value pairs. However, unlike key-value storage devices, the stored value is a document that can be queried by the database. These documents can have a complex nested structure, such as an invoice.
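
To make the contrast with key-value storage concrete, here is a minimal sketch in which invoices are stored as nested dicts and queried by a field inside the document. The `find` helper, the dotted-path syntax and the sample invoices are all assumptions made for illustration, not the query language of any real document database.

```python
documents = {
    "inv-1": {"customer": {"name": "Ada"}, "items": [{"sku": "A1", "qty": 2}], "total": 40.0},
    "inv-2": {"customer": {"name": "Bob"}, "items": [{"sku": "B7", "qty": 1}], "total": 15.0},
}


def find(docs, path, expected):
    """Return keys of documents whose nested field at `path` equals `expected`."""
    matches = []
    for key, doc in docs.items():
        node = doc
        for part in path.split("."):
            # Walk one level deeper; stop if the path does not exist.
            node = node.get(part) if isinstance(node, dict) else None
            if node is None:
                break
        if node == expected:
            matches.append(key)
    return matches


result = find(documents, "customer.name", "Ada")
```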

9
Q

Column-family storage devices

A

Store data much like a traditional RDBMS but group related columns together in a row, resulting in column-families.
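
One way to picture this layout is a nested mapping from row key to column-family to columns. The sketch below is only a mental model under that assumption; the table contents and the `read_family` helper are hypothetical, and real column-family stores add on-disk layout details that are not shown.

```python
rows = {
    "student-1": {
        "personal": {"name": "Ada", "dob": "1990-01-01"},
        "address": {"city": "London"},
    },
    "student-2": {
        # In this sketch, rows are not required to carry the same families or columns.
        "personal": {"name": "Bob"},
        "history": {"2023": "enrolled"},
    },
}


def read_family(table, row_key, family):
    """Fetch one column-family of one row; empty dict if either is absent."""
    return table.get(row_key, {}).get(family, {})


personal = read_family(rows, "student-2", "personal")
no_address = read_family(rows, "student-2", "address")
```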

10
Q

Graph storage devices

A

used to persist inter-connected entities. Unlike other NoSQL
storage devices, where the emphasis is on the structure of the entities, graph storage
devices place emphasis on storing the linkages between entities
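
The emphasis on linkages can be sketched with an adjacency list, where the stored data is mostly the edges themselves. The node names, relationship labels and the `link` and `neighbours` helpers below are invented for illustration and do not correspond to any graph database's API.

```python
from collections import defaultdict

edges = defaultdict(list)  # node -> list of (relationship, other node)


def link(source, relationship, target):
    edges[source].append((relationship, target))


def neighbours(node, relationship):
    """Follow one kind of linkage outward from a node."""
    return [target for rel, target in edges[node] if rel == relationship]


link("alice", "FRIEND_OF", "bob")
link("bob", "FRIEND_OF", "carol")
link("alice", "WORKS_AT", "acme")

# A two-hop traversal: friends of alice's friends.
friends_of_friends = [
    fof
    for friend in neighbours("alice", "FRIEND_OF")
    for fof in neighbours(friend, "FRIEND_OF")
]
```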

11
Q

NoSQL storage devices: strengths and shortcomings

A

NoSQL storage devices are highly scalable, available, fault-tolerant and fast for read/write operations.
However, they do not provide the same transaction and consistency support as ACID-compliant RDBMSs. Following the BASE model, NoSQL storage devices provide eventual consistency rather than immediate consistency, so they remain in a soft state while reaching eventual consistency. As a result, they are not appropriate for implementing large-scale transactional systems.

12
Q

NewSQL storage devices

A

Combine the ACID properties of an RDBMS with the scalability and fault tolerance offered by NoSQL storage devices.
NewSQL databases can be used for developing OLTP systems with very high volumes of transactions, for example a banking system.

Compared to a NoSQL storage device, a NewSQL storage device provides an easier transition from a traditional RDBMS to a highly scalable database due to its support for SQL.

13
Q

An in-memory storage device

A

Utilizes RAM, the main memory of a computer, as its storage medium to provide fast data access.

14
Q

In-Memory Data Grids (IMDGs)

A

Store data in memory as key-value pairs across multiple nodes, where the keys and values can be any business object or application data in serialized form. This supports schema-less storage of semi-structured and unstructured data.

15
Q

In-Memory Databases (IMDBs)

A

in-memory storage devices that employ database technology and leverage the performance of RAM to overcome runtime latency issues that plague on-disk storage devices.

16
Q

IMDGs are often deployed together with on-disk storage devices that act as the backend storage. This is achieved via the following approaches:

A
  • read-through
  • write-through
  • write-behind
  • refresh-ahead
17
Q

read-through

A

If a requested value for a key is not found in the IMDG, then it is synchronously read from
the backend on-disk storage device, such as a database. Upon a successful read from the
backend on-disk storage device, the key-value pair is inserted into the IMDG, and the
requested value is returned to the client. Any subsequent requests for the same key are
then served by the IMDG directly, instead of the backend storage.
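
The read-through flow can be sketched as follows, with plain dicts standing in for both the IMDG and the backend on-disk store. The class `ReadThroughCache` and the `backend_reads` counter are hypothetical names introduced only to make the behavior observable.

```python
class ReadThroughCache:
    """Toy read-through cache; dicts stand in for the IMDG and the backend."""

    def __init__(self, backend):
        self._backend = backend  # stand-in for the on-disk storage device
        self._grid = {}          # stand-in for the IMDG
        self.backend_reads = 0   # counts how often the backend was hit

    def get(self, key):
        if key in self._grid:
            return self._grid[key]       # served by the IMDG directly
        self.backend_reads += 1
        value = self._backend[key]       # synchronous read; KeyError if absent
        self._grid[key] = value          # insert the pair into the IMDG
        return value


backend = {"k1": "v1"}
cache = ReadThroughCache(backend)
first = cache.get("k1")   # miss: read from the backend
second = cache.get("k1")  # hit: served from the IMDG
```

After both calls, `cache.backend_reads` is 1, showing that only the first request reached the backend.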

18
Q

write-through

A

Any write (insert/update/delete) to the IMDG is written synchronously in a transactional
manner to the backend on-disk storage device, such as a database. If the write to the
backend on-disk storage device fails, the IMDG’s update is rolled back. Due to this
transactional nature, data consistency is achieved immediately between the two data
stores.
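
A minimal sketch of the rollback behavior, assuming the backend write is a callable that raises on failure. `WriteThroughCache`, `write_to_disk` and the simulated failure condition are hypothetical; a real IMDG would use a proper transaction rather than this manual undo.

```python
_MISSING = object()  # sentinel: key had no previous value in the grid


class WriteThroughCache:
    def __init__(self, backend_write):
        self._backend_write = backend_write  # callable(key, value); raises on failure
        self.grid = {}                       # stand-in for the IMDG

    def put(self, key, value):
        previous = self.grid.get(key, _MISSING)
        self.grid[key] = value
        try:
            self._backend_write(key, value)  # synchronous write to the backend
        except Exception:
            # Backend write failed: roll the IMDG back to its prior state.
            if previous is _MISSING:
                del self.grid[key]
            else:
                self.grid[key] = previous
            raise


disk = {}


def write_to_disk(key, value):
    if value is None:  # simulated backend failure, for illustration only
        raise IOError("simulated backend failure")
    disk[key] = value


cache = WriteThroughCache(write_to_disk)
cache.put("k1", "v1")
try:
    cache.put("k1", None)  # fails in the backend, so the IMDG keeps "v1"
except IOError:
    pass
```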

19
Q

write-behind

A

Any write to the IMDG is written asynchronously in a batch manner to the backend on-disk storage device, such as a database.
A queue is generally placed between the IMDG and the backend storage for keeping track
of the required changes to the backend storage.
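
The queue between the IMDG and the backend can be sketched like this. `WriteBehindCache` and its explicit `flush` method are assumptions for illustration: in a real deployment the flush would typically run in the background on a timer or size threshold, whereas here it is called by hand so the example stays deterministic.

```python
from collections import deque


class WriteBehindCache:
    def __init__(self, backend):
        self._backend = backend   # dict standing in for the on-disk store
        self.grid = {}            # stand-in for the IMDG
        self._pending = deque()   # queue of changes not yet written to the backend

    def put(self, key, value):
        self.grid[key] = value
        self._pending.append((key, value))  # backend is not touched yet

    def flush(self):
        """Apply all queued changes to the backend as one batch."""
        batch = {}
        while self._pending:
            key, value = self._pending.popleft()
            batch[key] = value  # a later write to the same key wins
        self._backend.update(batch)
        return len(batch)


disk = {}
cache = WriteBehindCache(disk)
cache.put("a", 1)
cache.put("a", 2)
cache.put("b", 3)
before_flush = dict(disk)  # still empty: writes are only queued
written = cache.flush()
```

Until `flush` runs, the IMDG and the backend disagree, which is the consistency trade-off this approach accepts in exchange for faster writes.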

20
Q

refresh-ahead

A

Refresh-ahead is a proactive approach where any frequently accessed values are
automatically, asynchronously refreshed in the IMDG, provided that the value is accessed
before its expiry time as configured in the IMDG. If a value is accessed after its expiry
time, the value, like in the read-through approach, is synchronously read from the backend
storage and updated in the IMDG before being returned to the client.
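
A simplified sketch of refresh-ahead, under several assumptions: the clock is injected so the example is deterministic, every access before expiry queues a refresh, and the "asynchronous" refresh is modeled by an explicit `run_refreshes` call. `RefreshAheadCache`, `loader` and these method names are hypothetical.

```python
class RefreshAheadCache:
    def __init__(self, loader, ttl, clock):
        self._loader = loader      # callable(key) that reads the backend
        self._ttl = ttl
        self._clock = clock        # callable returning the current time
        self._grid = {}            # key -> (value, expires_at)
        self.pending_refresh = []  # keys awaiting a background refresh

    def get(self, key):
        entry = self._grid.get(key)
        if entry is not None and self._clock() < entry[1]:
            # Accessed before expiry: serve the cached value, refresh later.
            self.pending_refresh.append(key)
            return entry[0]
        return self._load(key)  # missing or expired: synchronous read

    def _load(self, key):
        value = self._loader(key)
        self._grid[key] = (value, self._clock() + self._ttl)
        return value

    def run_refreshes(self):
        """Stand-in for the asynchronous refresh a real IMDG would perform."""
        while self.pending_refresh:
            self._load(self.pending_refresh.pop())


now = [0]  # mutable fake clock


def loader(key):
    return f"{key}@{now[0]}"  # the value records when it was loaded


cache = RefreshAheadCache(loader, ttl=10, clock=lambda: now[0])
a = cache.get("k")     # miss: synchronous load at t=0
now[0] = 5
b = cache.get("k")     # hit before expiry: cached value, refresh queued
cache.run_refreshes()  # reloaded at t=5, new expiry t=15
now[0] = 12
c = cache.get("k")     # still valid thanks to the refresh
cache.run_refreshes()
now[0] = 40
d = cache.get("k")     # expired: synchronous load, as in read-through
```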