Introduction to Big Data Flashcards

1
Q

What is Big Data?

A

Data sets so large, fast-growing, or varied that traditional data-processing tools cannot store or process them efficiently.

2
Q

What are the three factors to consider when designating something as Big Data?

A

Its volume, variety, and velocity (the rate at which data is generated and grows).

3
Q

What is variety?

A

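The range of data types and formats involved. Big Data often includes data that is considered unstructured, usually derived from digital content such as text, images, and video, alongside structured data.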

4
Q

What are the problems with Big Data?

A

Storage, computational efficiency, data loss, and cost.

5
Q

What are the traditional solutions for Big Data?

A

Relational database management systems (RDBMS), grid computing, and RAID systems.

6
Q

What is RDBMS and what issues does it have?

A

A relational database management system (for example MySQL or Oracle Database). RDBMS have scalability issues: as the data gets bigger, so does the computation time. They are also designed to handle structured data, not unstructured data.

**RDBMS are not easily horizontally scalable (you cannot readily improve performance by adding more machines).

7
Q

What is grid computing and what are some of its drawbacks?

A

Connecting many computers in parallel and running a program that performs computations on each portion of the data. It works well for compute-intensive tasks on low volumes of data, but it is not well suited to high volumes of data.

**It also requires experience with low-level programming languages (not suitable for mainstream use).

8
Q

What is RAID and what are some of the drawbacks?

A

Redundant Array of Independent Disks (RAID): a way of combining multiple disks into one logical storage unit for redundancy or performance. RAID systems were not designed to scale. As volume increases, so does cost, and although they have been marketed as scalable systems, those efforts have largely failed.

9
Q

What is Hadoop?

A

An open-source framework for distributed storage and distributed computing across clusters of machines.

10
Q

What are the two main components of Hadoop?

A

Hadoop Distributed File System (HDFS) -> storage solution
MapReduce -> computation solution

11
Q

What does the Hadoop Distributed File System (HDFS) do?

A

Takes care of all your distributed storage complexities.

  • Splitting your data into blocks
  • Replicating each block to more than one node
  • Keeping track of which block is stored on which node
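
The three responsibilities above can be sketched in a few lines of Python. This is a toy illustration only, not how HDFS is implemented: the function names, the tiny block size, the node names, and the round-robin placement are all assumptions made for the example (real HDFS uses far larger blocks and its own placement policy).

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    The returned dict plays the role of the bookkeeping that records
    which block is stored on which node (hypothetical structure).
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


data = b"abcdefghij"                              # 10 bytes of example data
blocks = split_into_blocks(data, block_size=4)    # 3 blocks: 4 + 4 + 2 bytes
block_map = place_replicas(len(blocks), ["node1", "node2", "node3"], replication=2)

for block_id, holders in block_map.items():
    print(block_id, blocks[block_id], holders)
```

Because every block lives on two nodes in this sketch, losing any single node still leaves one copy of each block available.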
12
Q

What is MapReduce?

A

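A programming model, implemented by Hadoop, that takes care of the complexities of distributed computation: a map step processes each piece of the data in parallel, and a reduce step combines the intermediate results.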
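
As a rough illustration of the model, here is a single-process word count in plain Python. It does not use Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample lines are assumptions for the sketch. In a real cluster the map and reduce calls would run on many nodes and the framework would handle the grouping step.

```python
from collections import defaultdict


def map_phase(line: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


lines = ["big data is big", "hadoop handles big data"]
mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())
print(counts)
```

Each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, which is what lets the work be spread across machines.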

13
Q

What was Hadoop built to work on?

A

Commodity hardware.

**It only needs ordinary machines, each with a processor, a hard disk, and RAM.

15
Q

What do you need to deal with Big Data?

A

A distributed computing platform.
