Quiz 3 Flashcards

1
Q

What is Cloud Storage?

A

Storing data on remote infrastructure run by a cloud provider and accessing it over a network

2
Q

What are the three things cloud providers support?

A

Scalability, Elasticity and Pay as you go

3
Q

What are the three models of cloud storage?

A

File System
Blob/Object Storage
Databases

4
Q

What is a cloud file system?

A

A system that organizes data into files and directories

5
Q

What is a file/directory?

A

A file is a logical unit of data on a storage device: an array of bytes that can be created, read, written, and deleted
A directory is a container that groups files (and other directories) into a hierarchy
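A minimal Python sketch of the "array of bytes" view of a file, using only the standard library: a file is created, written, partially overwritten at a byte offset, read back, and deleted. The file name `example.bin` is an arbitrary placeholder.

```python
import os
import tempfile

def file_lifecycle_demo() -> bytes:
    """Create, write, overwrite, read, and delete a file; return its final bytes."""
    directory = tempfile.mkdtemp()                 # scratch directory for the demo
    path = os.path.join(directory, "example.bin")

    with open(path, "wb") as f:                    # create + write
        f.write(b"hello, cloud")

    with open(path, "r+b") as f:                   # random access by byte offset
        f.seek(7)
        f.write(b"CLOUD")                          # overwrite bytes 7..11 in place

    with open(path, "rb") as f:                    # read
        data = f.read()

    os.remove(path)                                # delete
    os.rmdir(directory)
    return data

print(file_lifecycle_demo())  # b'hello, CLOUD'
```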

6
Q

What type of architecture do cloud file systems have?

A

A tree (hierarchical) architecture of directories and files

7
Q

What is the AWS Elastic Block Store good at?

A

Data too large for a VM’s memory, data-processing frameworks that rely on local storage, and databases such as MySQL and MS SQL Server

8
Q

What is the AWS Elastic Block Store bad at?

A

Usable only with EC2 instances; no seamless scalability

9
Q

What is the AWS Elastic File System good at?

A

It's a good replacement for NFS (clients mount it over the NFS protocol)

10
Q

What is the AWS Elastic File System bad at?

A

It's slow

11
Q

What are the two storage types that Google Compute Engine has?

A

Persistent Disks
Local SSD

12
Q

Advantages of a cloud file system?

A

Familiarity
Many applications support file systems (without much modification)

13
Q

Disadvantages of a cloud file system?

A

Limited scalability
Limited support for concurrent access

14
Q

What does BLOB stand for?

A

Binary Large Object

15
Q

What is BLOB or object storage?

A

A flat object model for storing data: no directory hierarchy, each object is addressed by a unique key
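To make the flat object model concrete, here is a toy in-memory sketch in Python. The class `FlatObjectStore` and its methods are hypothetical and do not mirror any provider's SDK; the sketch only shows that all keys live in one flat namespace and that "folders" are emulated with key prefixes.

```python
from typing import Dict, List

class FlatObjectStore:
    """Toy object store: a single flat map from key to bytes (hypothetical API)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # Whole-object write: an existing object is replaced, not edited in place.
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def delete(self, key: str) -> None:
        del self._objects[key]

    def list_keys(self, prefix: str = "") -> List[str]:
        # No real directories exist; "folders" are just shared key prefixes.
        return sorted(k for k in self._objects if k.startswith(prefix))

store = FlatObjectStore()
store.put("photos/2024/cat.jpg", b"...")
store.put("photos/2024/dog.jpg", b"...")
store.put("docs/report.pdf", b"...")
print(store.list_keys("photos/"))  # ['photos/2024/cat.jpg', 'photos/2024/dog.jpg']
```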

16
Q

What are the features of BLOB storage?

A

Stores unstructured data
Highly scalable
Automatic backup/replica management

17
Q

Blob/Object Storage Pros?

A

Simple, Performs well, Reliable, No modification needed, No file-level synchronization

18
Q

Disadvantages of Blob/Object Storage?

A

Little support to organize data
No support for search by file content
Requires a separate indexing mechanism
No mechanism to work with structured data
Cannot be mounted as a file system directly

19
Q

If you wanted to use Blob/Object Storage as a file system, how would you do it?

A

By using open-source projects that mount a bucket as a file system (e.g., s3fs for Amazon S3)

20
Q

What are the two types of databases?

A

Relational databases and NoSQL databases

21
Q

What are some features of relational databases?

A

Designed for structured data
Tables and SQL
Indexing and join operations
Supports ACID semantics
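The relational features above can be tried locally with Python's built-in `sqlite3` module, used here only as a convenient stand-in for a managed relational service. The `users`/`orders` schema is a made-up example; the sketch shows a fixed schema, an index, a join, and atomic rollback of a failed transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Structured data: every row must fit the declared schema.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
             "user_id INTEGER NOT NULL REFERENCES users(id), amount REAL NOT NULL)")
# Secondary index to speed up lookups and joins on orders.user_id.
conn.execute("CREATE INDEX idx_orders_user ON orders(user_id)")

conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO orders (user_id, amount) VALUES (?, ?)",
                 [(1, 9.5), (1, 20.0), (2, 5.0)])
conn.commit()

# Join: total spend per user.
rows = conn.execute(
    "SELECT u.name, SUM(o.amount) FROM users u JOIN orders o ON o.user_id = u.id "
    "GROUP BY u.name ORDER BY u.name"
).fetchall()
print(rows)  # [('alice', 29.5), ('bob', 5.0)]

# Atomicity (the "A" in ACID): the whole transaction is rolled back on error.
try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO orders (user_id, amount) VALUES (?, ?)", (2, 100.0))
        conn.execute("INSERT INTO orders (user_id, amount) VALUES (?, ?)", (2, None))  # NOT NULL violation
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)  # 3: the first insert of the failed transaction was undone too
```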

22
Q

What are some features of NoSQL databases?

A

Scales to cloud size by giving up full ACID semantics
Designed around the trade-offs of the CAP theorem
Eventual consistency
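A minimal sketch of eventual consistency, assuming two replicas, timestamped writes, and last-writer-wins conflict resolution (one common strategy among several; real NoSQL systems differ). `Replica` and `anti_entropy` are hypothetical names. A read from the second replica is stale until a background sync runs, after which both replicas agree.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class Replica:
    # key -> (timestamp, value); the higher timestamp wins (last-writer-wins)
    data: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def write(self, key: str, value: str, ts: int) -> None:
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return entry[1] if entry else None

def anti_entropy(a: Replica, b: Replica) -> None:
    """Exchange all entries in both directions so the replicas converge."""
    for key, (ts, value) in list(a.data.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.write(key, value, ts)

r1, r2 = Replica(), Replica()
r1.write("cart:42", "book", ts=1)   # write accepted by replica 1 only
print(r2.read("cart:42"))           # None: stale read, not yet propagated
anti_entropy(r1, r2)                # background synchronization
print(r2.read("cart:42"))           # book: the replicas have converged
```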

23
Q

What are some relational databases?

A

AWS RDS
Azure databases
Google Cloud SQL

24
Q

What are some types of NoSQL databases?

A

Key/Value Store
Document DB
Graph DB
In-Memory DB
Time-Series DB

25
Q

What is big data?

A

A collection of data sets so large and complex that it becomes difficult to process using traditional relational database management systems

26
Q

What are the three types of big data sets?

A

Structured Data
Semi-structured Data
Unstructured Data

27
Q

What is Structured Data?

A

Data that can be represented in a table with a schema

28
Q

What is Semi-Structured Data?

A

Data that cannot be stored in an RDBMS but still has some organizational properties (e.g., JSON or XML documents)
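A small illustration, assuming JSON as the semi-structured format (the records are made up): the records do not share one fixed schema, yet their self-describing keys still let us inspect and query them.

```python
import json

# Two records from the same hypothetical feed: similar shape, different fields,
# so they do not fit one fixed relational schema.
raw = """
[
  {"id": 1, "name": "alice", "tags": ["admin", "dev"]},
  {"id": 2, "name": "bob", "address": {"city": "Paris"}}
]
"""
records = json.loads(raw)

# The keys travel with the data, giving it an organization we can discover.
all_keys = sorted({key for record in records for key in record})
print(all_keys)  # ['address', 'id', 'name', 'tags']

# Nested, optional fields must be handled record by record.
cities = [r.get("address", {}).get("city") for r in records]
print(cities)  # [None, 'Paris']
```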

29
Q

What is Unstructured Data?

A

Data that is not organized in a pre-defined manner or does not have a pre-defined data model

30
Q

What are the 4 V’s of Big Data?

A

Volume
Variety
Velocity
Veracity

31
Q

What is the major challenge of Big Data?

A

Processing

32
Q

An iPhone 15 has how many times more computing power than the Beowulf-1?

A

20,000 times more

33
Q

What is the magic infrastructure that allows MapReduce to work?

A

The Google File System

34
Q

What are the disadvantages of divide and conquer with many machines?

A

Merging all of the results can be difficult
Machine or disk failures can disrupt the computation

35
Q

What are the cons of Map-Reduce?

A

Needs magic to address the failures
Performance may still be an issue

36
Q

What is the Google File System?

A

A scalable, fault-tolerant distributed file system that stores hundreds of terabytes of data to support MapReduce

37
Q

What is the workload for the GFS?

A

Large streaming reads
Small random reads
Many large sequential appends
No random write that overwrites (updates) data

38
Q

What is the GFS Architecture?

A

A single master, multiple chunkservers, and multiple clients

39
Q

What does the master maintain?

A

All metadata

40
Q

What does the master’s metadata hold?

A

Namespace in GFS, Access control, Current location of chunks

41
Q

Why does the master periodically communicate with the chunkservers?

A

To perform a health check
To determine chunk locations and evaluate the state of the overall system

42
Q

What do GFS chunkservers do?

A

Store chunks on their local disks and serve reads and writes of chunk data

43
Q

How can chunkservers identify chunks?

A

Through immutable, globally unique 64-bit chunk handles assigned by the master

44
Q

What are the two types of requests sent by a GFS client?

A

Control requests to the master and data requests directly to the chunkservers
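The split between control and data requests can be sketched as a simplified read path in Python. `Master`, `ChunkServer`, and `client_read` are hypothetical names; the sketch assumes the read fits within one chunk and always uses the first listed replica, whereas the GFS paper describes clients caching metadata and reading from a nearby replica.

```python
from typing import Dict, List, Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the GFS default chunk size

class Master:
    """Holds metadata only: (file name, chunk index) -> (chunk handle, replica locations)."""
    def __init__(self) -> None:
        self.chunk_table: Dict[Tuple[str, int], Tuple[int, List[str]]] = {}

    def lookup(self, filename: str, chunk_index: int) -> Tuple[int, List[str]]:
        # Control request: answered from metadata; no file data flows through the master.
        return self.chunk_table[(filename, chunk_index)]

class ChunkServer:
    """Stores chunk contents keyed by chunk handle."""
    def __init__(self) -> None:
        self.chunks: Dict[int, bytes] = {}

    def read(self, handle: int, offset_in_chunk: int, length: int) -> bytes:
        # Data request: served directly to the client.
        return self.chunks[handle][offset_in_chunk:offset_in_chunk + length]

def client_read(master: Master, servers: Dict[str, ChunkServer],
                filename: str, offset: int, length: int) -> bytes:
    chunk_index = offset // CHUNK_SIZE        # which chunk holds this byte offset
    offset_in_chunk = offset % CHUNK_SIZE
    handle, locations = master.lookup(filename, chunk_index)            # control request
    return servers[locations[0]].read(handle, offset_in_chunk, length)  # data request

master = Master()
servers = {"cs1": ChunkServer(), "cs2": ChunkServer()}
master.chunk_table[("/logs/web.log", 1)] = (0xABC, ["cs2", "cs1"])
servers["cs2"].chunks[0xABC] = b"GET /index.html"
data = client_read(master, servers, "/logs/web.log", CHUNK_SIZE + 4, 11)
print(data)  # b'/index.html'
```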

45
Q

What is the default chunk size in GFS?

A

64 MB

46
Q

What is the default chunk (block) size in Linux file systems?

A

4 KB to 256 KB

47
Q

What are the cons of a 64 MB chunk size?

A

Wasted storage space due to internal fragmentation
High overhead from many small files

48
Q

What are the pros of a 64 MB chunk size?

A

A larger chunk size means fewer chunks, so the master stores less metadata and clients contact the master less often
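The trade-off behind the pros and cons of a 64 MB chunk size is simple arithmetic. This hypothetical helper counts chunks and the unused tail space for one file, assuming every chunk is allocated at full size. (The GFS paper notes that GFS itself uses lazy space allocation to limit this waste.)

```python
MB = 1024 * 1024

def chunk_stats(file_size: int, chunk_size: int) -> tuple:
    """Return (number of chunks, unused bytes in the last chunk) for one file.

    Assumes each chunk occupies its full size on disk, i.e. the naive case
    behind the internal-fragmentation concern.
    """
    n_chunks = -(-file_size // chunk_size)   # integer ceiling division
    wasted = n_chunks * chunk_size - file_size
    return n_chunks, wasted

one_tb = 1024 * 1024 * MB
print(chunk_stats(one_tb, 64 * MB))   # (16384, 0): chunks the master must track
print(chunk_stats(one_tb, 4096))      # (268435456, 0) with 4 KB blocks
print(chunk_stats(1 * MB, 64 * MB))   # (1, 66060288): about 63 MB unused
```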

49
Q

What is a Borg Cell?

A

A set of machines managed by Borg as one unit

50
Q

What is a Borg Job?

A

The unit in which users submit work; each job consists of one or more tasks

51
Q

What is a Borg task?

A

One of the units a job is made of; each task runs as a set of Linux processes on one machine

52
Q

What is a Borg Alloc?

A

A reserved set of resources on a machine in which one or more tasks can run

53
Q

What is a Borg instance?

A

Instances having jobs

54
Q

What is the Borgmaster?

A

The central brain of the system
Holds the cluster state
Uses Paxos for leader election and log replication
Uses shared-state scheduling

55
Q

What is a Borglet?

A

An agent on every machine in a cell that starts, stops, and monitors tasks and manages local resources

56
Q

What is a Borglet called in Kubernetes?

A

The kubelet

57
Q

What is the MapReduce Data Flow?

A

Read data from GFS -> mappers -> intermediate local files -> reducers -> write output to GFS
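The data flow above can be simulated in a single Python process with the classic word-count example. This is a sketch under stated assumptions: in-memory lists stand in for reading from and writing to GFS, and one dictionary per reducer stands in for the intermediate local files; `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1

def reduce_fn(word: str, counts: List[int]) -> Tuple[str, int]:
    # Reducer: combine all values that share a key.
    return word, sum(counts)

def run_mapreduce(lines: Iterable[str], num_reducers: int = 2) -> Dict[str, int]:
    # Map phase: pairs are partitioned by hash(key) % num_reducers, standing in
    # for the intermediate files written to the mappers' local disks.
    partitions: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:
        for key, value in map_fn(line):
            partitions[hash(key) % num_reducers][key].append(value)

    # Reduce phase: each reducer handles its own partition in sorted key order.
    output: Dict[str, int] = {}
    for partition in partitions:
        for key in sorted(partition):
            word, total = reduce_fn(key, partition[key])
            output[word] = total
    return output

counts = run_mapreduce(["the cat sat", "the dog sat", "The end"])
print(sorted(counts.items()))
# [('cat', 1), ('dog', 1), ('end', 1), ('sat', 2), ('the', 3)]
```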

58
Q

What is the role of the Job Tracker in Hadoop?

A

Coordinates the execution of jobs

59
Q

What is the role of the Task Tracker in Hadoop?

A

Controls the execution of map and reduce tasks in slave machines

60
Q

What is the Name Node in Hadoop?

A

Manages the file system namespace and keeps the metadata

61
Q

What is the Data Node in Hadoop?

A

Follows instructions from the Name Node; stores and retrieves data blocks

62
Q

What happens if a task fails in Hadoop?

A

Task tracker detects the failure
Sends message to Job Tracker
Job Tracker reschedules the task

63
Q

What happens if a data node fails in Hadoop?

A

Both the Name Node and the Job Tracker detect the failure
All tasks on the failed node are rescheduled
The Name Node has the lost data blocks re-replicated to other Data Nodes
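A sketch of the Name Node's bookkeeping after a Data Node failure, assuming the HDFS default replication factor of 3 and a naive target-selection rule (the real placement policy is rack-aware and more involved). `handle_datanode_failure` is a hypothetical name, and the function only plans which node should receive a new copy.

```python
from typing import Dict, List, Set

REPLICATION = 3  # HDFS default replication factor

def handle_datanode_failure(block_locations: Dict[str, Set[str]],
                            live_nodes: List[str], dead: str) -> Dict[str, str]:
    """Drop a dead node and plan re-replication of under-replicated blocks.

    Returns {block_id: target_node}. Simplifications: at most one new copy per
    block, and the target is the first live node that lacks a copy.
    """
    plan: Dict[str, str] = {}
    for block, nodes in block_locations.items():
        nodes.discard(dead)
        if 0 < len(nodes) < REPLICATION:   # under-replicated but still recoverable
            candidates = [n for n in live_nodes if n != dead and n not in nodes]
            if candidates:
                plan[block] = candidates[0]
                nodes.add(candidates[0])   # record the planned replica
    return plan

blocks = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn2", "dn4", "dn5"},
    "blk_3": {"dn1", "dn4", "dn5"},
}
plan = handle_datanode_failure(blocks, ["dn1", "dn3", "dn4", "dn5"], dead="dn2")
print(plan)  # {'blk_1': 'dn4', 'blk_2': 'dn1'}
```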

64
Q

What are some benefits of Hadoop?

A

Highly Scalable
Fault Tolerant
Simple Programming Model

65
Q

What are some limitations of Hadoop?

A

64MB block size
Batch processing only
Data Locality

66
Q

What are some reasons database users do not like map reduce?

A

It's a giant leap backwards
Suboptimal implementation
Not novel
Missing most of the features in current databases
Incompatible with all of the tools database users depend on

67
Q

What are two add-ons for Hadoop?

68
Q

What is the general workflow for Hive?

A

The user sends a HiveQL query to the Hive system -> Hive parses the query and plans its execution -> the query is converted to MapReduce jobs and executed on HDFS
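To illustrate the last step of that workflow, here is how a simple HiveQL aggregation maps onto map and reduce functions. This is a conceptual, single-stage sketch with a hypothetical `access_log` table; the plans Hive actually generates can contain several stages and optimizations.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# HiveQL being "compiled" (illustrative; table and columns are hypothetical):
#   SELECT page, COUNT(*) FROM access_log GROUP BY page;

access_log = [
    {"user": "u1", "page": "/home"},
    {"user": "u2", "page": "/cart"},
    {"user": "u1", "page": "/home"},
]

def mapper(row: Dict[str, str]) -> Iterator[Tuple[str, int]]:
    # The GROUP BY column becomes the map output key; COUNT(*) emits 1 per row.
    yield row["page"], 1

def reducer(page: str, ones: List[int]) -> Tuple[str, int]:
    # One reduce call per group computes the aggregate.
    return page, sum(ones)

groups: Dict[str, List[int]] = defaultdict(list)   # shuffle: group values by key
for row in access_log:
    for key, value in mapper(row):
        groups[key].append(value)

result = sorted(reducer(page, ones) for page, ones in groups.items())
print(result)  # [('/cart', 1), ('/home', 2)]
```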

69
Q

What are the pros of Hive?

A

Built on top of Hadoop
SQL-like batch jobs over large data sets
Supports an SQL-like language (HiveQL)
Similar to an RDBMS
Can handle much larger data sets than an RDBMS

70
Q

What are the cons of Hive?

A

It is designed for OLAP, not OLTP
No real-time queries; high latency
Batch jobs only
It is not an RDBMS

71
Q

What are the two categories of large-scale data?

A

Web search data
Web access data

72
Q

What type of large-scale data is important for understanding users’ behavior?

A

Web access logs

73
Q

What is BigTable or HBase?

A

A sparse, distributed, persistent, multidimensional sorted map
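The phrase "sparse, distributed, persistent, multidimensional sorted map" can be illustrated with a toy single-process model in Python. It covers only the sorted, sparse, and multidimensional parts: keys are (row, column, timestamp), rows sort lexicographically so a row-range scan is a contiguous slice, and newer versions of a cell sort first. `SortedMapTable` is hypothetical, and distribution and persistence are not modeled. The sample row and column names follow the webtable example from the BigTable paper.

```python
import bisect
from typing import List, Optional, Tuple

# (row key, "family:qualifier", negated timestamp)
Key = Tuple[str, str, int]

class SortedMapTable:
    """Toy model of the BigTable/HBase data model: (row, column, ts) -> value."""

    def __init__(self) -> None:
        self._keys: List[Key] = []       # kept sorted at all times
        self._values: List[bytes] = []

    def put(self, row: str, column: str, ts: int, value: bytes) -> None:
        key = (row, column, -ts)         # negate so the newest version sorts first
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value      # same cell and timestamp: overwrite
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, row: str, column: str) -> Optional[bytes]:
        """Return the newest version of a cell, or None if the cell is absent."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[i]
        return None                      # sparse: absent cells take no space

    def scan(self, start_row: str, end_row: str) -> List[Tuple[str, str, int, bytes]]:
        """Return all cells with start_row <= row < end_row, in sorted order."""
        lo = bisect.bisect_left(self._keys, (start_row,))
        hi = bisect.bisect_left(self._keys, (end_row,))
        return [(r, c, -t, self._values[j])
                for j, (r, c, t) in enumerate(self._keys[lo:hi], start=lo)]

t = SortedMapTable()
t.put("com.cnn.www", "contents:", 5, b"<html>v5")
t.put("com.cnn.www", "contents:", 6, b"<html>v6")
t.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
t.put("com.example", "contents:", 1, b"<html>ex")
print(t.get("com.cnn.www", "contents:"))   # b'<html>v6' (newest version)
print(t.get("com.example", "anchor:x"))    # None
print([(r, c, ts) for r, c, ts, _ in t.scan("com.cnn", "com.d")])
```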

74
Q

What are the three components in HBase Architecture?

A

Master
Region Server
ZooKeeper