Big Data Flashcards
What is Big Data?
Extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.
What are the three Vs of Big Data?
Volume, Variety, Velocity
What is RAID?
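Redundant Array of Inexpensive (or Independent) Disks: a way of storing data across multiple hard disks, using mirroring or parity, so that the data survives a drive failure. RAID 0, which only stripes data, is the exception and offers no such protection.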
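The parity idea behind levels such as RAID 5 can be sketched in a few lines. This is only an illustration, not how a real controller is implemented: the block contents and the helper name `xor_blocks` are made up for the example, and each "disk" is just a short byte string.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per data disk.
data_disks = [b"BIGD", b"ATA!", b"1234"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data_disks)

# Simulate losing disk 1, then rebuild it from the surviving disks plus parity.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_disks[1])
```

XOR-ing the surviving blocks with the parity block cancels everything except the missing block, which is why a single failed drive can be reconstructed.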
What is the BitTorrent Storage Architecture?
A protocol for distributed file sharing that segments files into smaller pieces and spreads them across network nodes (peers), so that data can be distributed and accessed efficiently without a central file server.
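A rough sketch of the piece-based idea, assuming a toy piece size and an in-memory "file". The names `split_into_pieces` and `piece_hashes` are hypothetical helpers for this example, not part of any BitTorrent library, and no networking is involved. SHA-1 is used because the original BitTorrent protocol hashes each piece with it.

```python
import hashlib

PIECE_SIZE = 4  # bytes; tiny on purpose so the example stays readable

def split_into_pieces(data, piece_size=PIECE_SIZE):
    """Cut the data into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]

def piece_hashes(pieces):
    """One SHA-1 digest per piece, used later to verify each download."""
    return [hashlib.sha1(piece).digest() for piece in pieces]

original = b"big data flashcards"
pieces = split_into_pieces(original)
expected = piece_hashes(pieces)

# Pieces may arrive from different peers in any order; here, in reverse.
received = {}
for index in reversed(range(len(pieces))):
    piece = pieces[index]
    # Keep a piece only if its hash matches the published one.
    if hashlib.sha1(piece).digest() == expected[index]:
        received[index] = piece

reassembled = b"".join(received[i] for i in range(len(pieces)))
print(reassembled == original)
```

Because every piece is verified independently, a downloader can fetch different pieces from different peers and still detect a corrupted one.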
DDA-RM
Designed around the structured schema of relational databases, which support ACID properties (atomicity, consistency, isolation, durability) that ensure reliable, consistent transaction processing across distributed networks.
Struggles with the high demands of Big Data’s volume and velocity because of its rigid schema.
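To make the ACID idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It runs on a single in-memory database rather than a distributed one, and the `accounts` table and `transfer` function are invented for the illustration. It shows atomicity and consistency: a transfer that would break the schema's `CHECK` rule is rolled back as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, source, target, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)       # valid transfer
failed = transfer(conn, "alice", "bob", 500)  # would make alice's balance negative
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer the credit to bob is applied first, yet it disappears again when the debit violates the constraint. The same fixed schema that enforces the rule is also what makes this model harder to adapt to fast-changing data.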
DDA-NRM
Uses NoSQL databases, which trade the rigid relational schema for more flexible data models and easier horizontal scaling.
MapReduce
A programming model for processing large data sets with a distributed algorithm on a cluster. It processes data in two phases (Map and Reduce), making it suitable for processing large volumes of data in parallel.
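The two phases can be sketched with the classic word-count example. This is a single-process imitation, not Hadoop code: the function names are made up for the illustration, and the grouping step between the phases (usually called the shuffle) is written out explicitly, whereas a real framework would do it across the cluster.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group all emitted values by key so each key reaches one reducer."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

# Each string stands in for an input split handled by a separate mapper.
documents = ["big data big clusters", "data moves fast"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Each call to `map_phase` depends only on its own document and each call to `reduce_phase` only on its own key, which is what lets a cluster run them in parallel.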
NoSQL
A category of database management systems that do not follow the traditional relational (RDBMS) model of fixed tables and schemas and are well suited to storing unstructured or semi-structured data. NoSQL systems support a variety of data models, including document, graph, key-value, and wide-column.
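The four data models can be pictured with plain Python structures. This is only a mental model with invented sample records; real NoSQL systems add storage, indexing, and distribution on top, and none of the names below come from an actual database API.

```python
# Key-value: an opaque value looked up by a unique key.
key_value = {"user:42": b'{"name": "Ada"}'}

# Document: nested, self-describing records; fields may differ per record.
documents = [
    {"_id": 42, "name": "Ada", "tags": ["admin"]},
    {"_id": 43, "name": "Lin", "address": {"city": "Oslo"}},
]

# Wide-column: row key -> column family -> columns, which may be sparse per row.
wide_column = {
    "42": {"profile": {"name": "Ada"}, "activity": {"last_login": "2024-01-01"}},
    "43": {"profile": {"name": "Lin"}},
}

# Graph: nodes plus labeled edges between them.
graph = {"nodes": {42: "Ada", 43: "Lin"}, "edges": [(42, "FOLLOWS", 43)]}

# No shared schema is needed: query only the documents that have the field.
names_with_city = [doc["name"] for doc in documents if "address" in doc]

# A graph query follows edges instead of joining tables.
followers_of_lin = [
    graph["nodes"][source]
    for source, relation, target in graph["edges"]
    if relation == "FOLLOWS" and target == 43
]
print(names_with_city, followers_of_lin)
```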
Hadoop
An open-source software framework used for distributed storage and processing of Big Data using the MapReduce programming model. It includes HDFS (Hadoop Distributed File System) and is designed to scale up from single servers to thousands of machines.