Week 12 Flashcards
Big Data - Name this member of the Five Vs:
the vast amount of data that is generated every
second/minute/hour/day in the digitized world
Volume
Examples: online transactions (banking), sensors such as GPS and accelerometers, Facebook and Twitter
Big Data - Name this member of the Five Vs:
refers to the speed at which data is being
generated and the pace at which data moves from
one point to the next
Velocity
Big Data - Name this member of the Five Vs:
refers to the ever-increasing different forms of data
that can come in.
Brings challenges in terms of data integration,
transformation, processing and storage
Variety
Big Data - Name this member of the Five Vs: refers to the quality of the data, which can vary
greatly. Lack of this could mean there is noise that needs to be removed
Veracity
Noise means meaningless/corrupt/distorted data
Big Data - Name this member of the Five Vs: Refers to the usefulness of data for
an enterprise
Value
The longer it takes data to be turned into meaningful info, the less value it has for the business. This means _____ and ____ are inversely related
value and time
This is a tightly coupled collection of servers or nodes. These servers usually have the same hardware and are connected together on a network, and act as a single unit.
Cluster
T/F - Each node in the cluster shares its resources
False. Each node has its own dedicated resources (memory, processor, HDD)
T/F - A cluster can execute a task by splitting it into small pieces and distributing those pieces to different computers in the cluster
True
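Worked example for the card above: a minimal sketch of splitting one task into pieces and combining the partial results. It assumes local worker threads stand in for cluster nodes; the names split and distributed_sum are hypothetical and do not come from any cluster framework.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, pieces):
    """Cut a list into at most `pieces` contiguous chunks."""
    if not data:
        return []
    size = -(-len(data) // pieces)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def distributed_sum(data, nodes=4):
    """Sum a list by giving each simulated node one chunk.

    Assumption: threads play the role of cluster nodes here. In a real
    cluster each chunk would be sent to a separate machine.
    """
    chunks = split(data, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(sum, chunks))  # one partial sum per chunk
    return sum(partials)  # combine the partial results


if __name__ == "__main__":
    print(distributed_sum(list(range(1, 101))))
```

Each chunk is processed independently, which is what allows the pieces to be handed to different computers.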
T/F - A file system provides a logical view of data, sorting it into a tree structure
True
T/F - Distributed file systems can appear local to the client
True (logically; physically, the files are not local)
Distributed file systems store large _____ spread across nodes of a ______
files; cluster
This is the process of horizontally partitioning a large dataset into a collection of smaller, more manageable datasets
Sharding
Shards are distributed across multiple _____, which in this context are ______s or ________s
nodes; servers, machines
T/F - In sharding, each shard is stored on the same node
False. Each shard is stored on a separate node
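Worked example for the sharding cards: a minimal sketch of horizontal partitioning, where whole rows are assigned to nodes. It assumes a hash of a key field picks the node, which is only one possible strategy and is not stated in the cards. The names shard_for and shard_rows, the node names, and the sample rows are all hypothetical.

```python
import zlib


def shard_for(key, node_count):
    """Map a key to a shard index using a stable hash (CRC32)."""
    return zlib.crc32(str(key).encode("utf-8")) % node_count


def shard_rows(rows, key_field, node_names):
    """Horizontally partition rows: each whole row goes to exactly one node."""
    shards = {name: [] for name in node_names}
    for row in rows:
        index = shard_for(row[key_field], len(node_names))
        shards[node_names[index]].append(row)
    return shards


if __name__ == "__main__":
    # Illustrative data only
    rows = [{"id": i, "name": f"user{i}"} for i in range(10)]
    for node, shard in shard_rows(rows, "id", ["node-a", "node-b", "node-c"]).items():
        print(node, [row["id"] for row in shard])
```

Every row keeps all of its fields, so the split is by rows (horizontal) rather than by columns, and the same key always maps to the same node.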