00 intro: General info Flashcards

Question 1

Q

Random access

Answer

A

Random access refers to the ability to access any particular piece of data from a storage device directly, without the need to sequentially read through the entire storage. It allows for immediate retrieval of data from any location within the storage, regardless of the order in which the data is stored.
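As a small illustration of the definition above, the following sketch reads fixed-width records from a seekable byte stream by jumping straight to an offset. The record size, the `read_record` helper, and the sample data are assumptions made up for this example, not part of the card.

```python
import io

RECORD_SIZE = 8  # hypothetical fixed record width in bytes


def read_record(f, index):
    """Jump straight to record `index` without reading the records before it."""
    f.seek(index * RECORD_SIZE)  # move the read position directly to the record
    return f.read(RECORD_SIZE)


# Ten fixed-width records: b"rec00000", b"rec00001", ...
storage = io.BytesIO(b"".join(f"rec{i:05d}".encode() for i in range(10)))

# Records can be fetched in any order, regardless of how they are stored.
print(read_record(storage, 7))
print(read_record(storage, 2))
```

A sequential-access device (a tape, for example) would instead have to pass over records 0 to 6 before reaching record 7.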

Question 2

Q

What does “Big Data” look like?

Answer

A

CSV, TSV, and JSON files, web pages, graphs, tweets, server access logs

Question 3

Q

The 4 Big V’s of “Big Data”

Answer

A

Volume: Lots of data
Velocity: Changing / growing data
Variety: Heterogeneity of data
Veracity: Is the data correct / true or not?

Question 4

Q

Scale-Out vs Scale-Up

Answer

A

Scale-Out:
use of hundreds or thousands of small machines

Scale-Up:
use of a single, rather powerful server

Question 5

Q

If P is the probability that a single machine fails during a certain period of time, what is the probability that at least one of N machines fails during that period?

Answer

A

P_N = 1 - (1 - P)^N (the probability that at least one of N machines fails, assuming the machines fail independently)
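To see how quickly this probability grows with cluster size, here is a minimal sketch of the formula. It assumes machines fail independently; the function name `prob_any_failure` and the 1% per-machine failure probability are illustrative choices, not values from the card.

```python
def prob_any_failure(p, n):
    """Probability that at least one of n machines fails,
    given per-machine failure probability p and independent failures."""
    # (1 - p) ** n is the probability that all n machines survive.
    return 1 - (1 - p) ** n


# Hypothetical per-machine failure probability of 1% over the period.
for n in (1, 10, 100, 1000):
    print(n, prob_any_failure(0.01, n))
```

Even a small per-machine probability makes a failure somewhere in a large cluster very likely, which is why scale-out systems have to be designed to tolerate failures.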

Question 6

Q

Fallacies of Distributed Computing

Answer

A

The network is reliable
Latency is zero
Bandwidth is infinite
The network is secure
Topology does not change
There is only one administrator
Transport cost is zero
The network is homogeneous

Question 7

Q

Name a few cloud computing platforms

Answer

A

Amazon Elastic Compute Cloud (EC2)
Microsoft Azure
Google Cloud Platform (GCP)

Question 8

Q

What is MapReduce?

Answer

A

Map Phase:
the input data is divided into chunks, which are processed in memory in a parallel and distributed manner. A mapping function is applied to each chunk, creating
key-value pairs

Reduce Phase:
the key-value pairs are grouped by key, and a reduce function is applied to each group to compute aggregates, summaries, or other results.
The output of the reduce tasks is typically written to a file or storage system.
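The two phases can be sketched with a word count, the usual introductory example. This is a single-process illustration of the model only: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample chunks are assumptions for this sketch, and a real framework would run the map and reduce tasks on many machines in parallel.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) key-value pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all values by key, as happens between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values that share a key (here, by summing)."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk could be mapped on a different machine; here it runs sequentially.
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


chunks = ["big data is big", "data is everywhere"]
print(word_count(chunks))
```

Because each chunk is mapped independently and each key is reduced independently, both phases can be spread across machines without changing the result.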

Question 9

Q

Apache Hadoop

Answer

A

A popular open-source implementation of the MapReduce model, providing a scalable and reliable framework for distributed data processing.

Question 10

Q

What is Spark?

Answer

A

In addition to simple MapReduce operations, Spark supports SQL
queries, streaming data, and complex analytics such as machine
learning and graph algorithms out-of-the-box.
