Quiz 3 Flashcards

1
Q

What is Cloud Storage?

A

Data storage provided by a cloud provider and accessed over the network

2
Q

What are the three things cloud providers support?

A

Scalability, Elasticity and Pay as you go

3
Q

What are the three models of cloud storage?

A

File System
Blob/Object Storage
Databases

4
Q

What is the cloud file system?

A

A system that organizes data into files and directories

5
Q

What is a file/directory?

A

A file is a logical unit of data on a storage device
An array of bytes which can be created, read, written and deleted

6
Q

What type of architecture do cloud file systems have?

A

Tree architecture
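As a minimal sketch of the tree idea (the directory names and the `resolve` helper are made up for illustration, not part of any cloud API), a path is resolved by walking one level at a time from the root:

```python
# Toy file system tree: directories are dicts, files are bytes.
# All names here are hypothetical examples, not a real cloud API.
root = {
    "home": {
        "alice": {"notes.txt": b"quiz 3"},
    },
    "var": {"log": {}},
}

def resolve(tree, path):
    """Walk the tree one path component at a time, starting at the root."""
    node = tree
    for part in path.strip("/").split("/"):
        if part:  # skip empty components, e.g. for the path "/"
            node = node[part]  # raises KeyError if the component is missing
    return node

print(resolve(root, "/home/alice/notes.txt"))
```

Each lookup depends on every ancestor directory, which is what makes the structure a tree rather than a flat namespace.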

7
Q

What is the AWS Elastic Block Store good at?

A

Managing data that is too big for a VM’s memory; data processing frameworks that rely on local storage; databases such as MySQL and MS SQL Server

8
Q

What is the AWS Elastic Block Store bad at?

A

Works with EC2 only; no seamless scalability

9
Q

What is the AWS Elastic File System good at?

A

It's a good replacement for NFS

10
Q

What is the AWS Elastic File System bad at?

A

It's slow

11
Q

What are the two storage types that Google Compute Engine has?

A

Persistent Disks
Local SSD

12
Q

Advantages of a cloud file system?

A

Familiarity
Many applications support file systems (without much modification)

13
Q

Disadvantages of a cloud file system?

A

Scalability
Generally limited support for concurrency

14
Q

What does BLOB stand for?

A

Binary Large Object

15
Q

What is BLOB or object storage?

A

A flat object model for storing data
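A rough sketch of what "flat" means here, assuming a hypothetical in-memory bucket and a made-up `list_prefix` helper: every object lives under one key, and anything that looks like a folder is only a shared key prefix.

```python
# Toy flat object store: one dict from key to bytes, no real directories.
# The bucket contents and the list_prefix helper are hypothetical.
bucket = {
    "photos/2023/cat.jpg": b"...",
    "photos/2024/dog.jpg": b"...",
    "readme.txt": b"hello",
}

def list_prefix(store, prefix):
    """Return the keys that start with prefix, in sorted order."""
    return sorted(key for key in store if key.startswith(prefix))

print(list_prefix(bucket, "photos/"))
```

The "/" characters are ordinary characters in the key; there is no directory object to create, rename, or lock.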

16
Q

What are the features of BLOB storage?

A

Stores unstructured data
Highly scalable
Automatic backup/replica management

17
Q

Blob/Object Storage Pros?

A

Simple, Performs well, Reliable, No modification needed, No file-level synchronization

18
Q

Disadvantages of Blob/Object Storage?

A

Little support to organize data
No support for search by file content
Requires index mechanism
No mechanism to work with structured data
Cannot be mounted as a file system directly

19
Q

If you wanted to use a Blob/Object Storage for a file system how would you do this?

A

By using open-source projects (for example, FUSE-based tools such as s3fs)

20
Q

What are the two types of databases?

A

Relational databases and NoSQL databases

21
Q

What are some features of relational databases?

A

Designed for structured data
Tables and SQL
Indexing and join operations
Supports ACID semantics

22
Q

What are some features of NoSQL databases?

A

Cloud-scale databases that give up ACID semantics
Trade-offs governed by the CAP theorem
Eventual consistency

23
Q

What are some relational databases?

A

AWS RDS
Azure databases
Google Cloud SQL

24
Q

What are some NoSQL databases?

A

Key/Value Store
Document DB
Graph DB
In-Memory DB
Time-Series DB

25
What is big data?
A collection of data sets so large and complex that it becomes difficult to process using traditional relational database management systems
26
What are the three types of big data sets?
Structured data, semi-structured data, unstructured data
27
What is Structured Data?
Data that can be represented in a table with a schema
28
What is Semi-Structured Data?
Data that cannot be stored in RDBMS but has organizational properties
29
What is Unstructured Data?
Data that is not organized in a pre-defined manner or does not have a pre-defined data model
30
What are the four V's of Big Data?
Volume, Variety, Velocity, Veracity
31
What is the major challenge of Big Data?
Processing
32
An iPhone 15 has how many times more computing power than the Beowulf-1?
20,000 times more
33
What is the magic infrastructure that allows map-reduce to work?
The Google File System
34
What are the disadvantages of divide and conquer with many machines?
Merging all of the results can be difficult; if the machines or disks fail, there can be an issue
35
What are the cons of Map-Reduce?
Needs magic to address the failures; performance may still be an issue
36
What is the Google File System?
A scalable, fault-tolerant distributed file system that stores hundreds of terabytes of data to support MapReduce
37
What is the workload for the GFS?
Large streaming reads; small random reads; many large sequential appends; no random writes that overwrite (update) data
38
What is the GFS Architecture?
A single master with multiple chunkservers, and multiple clients
39
What does the master maintain?
All metadata
40
What does the master's metadata hold?
The GFS namespace, access control information, and the current location of chunks
41
Why does the master periodically communicate with the chunkservers?
To perform a health check; to determine chunk locations and evaluate the state of the overall system
42
What do GFS chunkservers do?
Manage chunks
43
How can chunkservers identify chunks?
Through immutable and globally unique chunk handles
44
What are the two types of requests sent by the GFS client?
Control requests to the master and data requests directly to chunkservers
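The split between control and data requests can be sketched as follows. This is a toy, single-process illustration: the `Master`, `ChunkServer`, and `client_read` names, the handle string, and the server name are all assumptions, and reads that cross a chunk boundary are not handled.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB default chunk size

class Master:
    """Holds metadata only: (file name, chunk index) -> (handle, server locations)."""
    def __init__(self):
        self.chunk_table = {}

    def lookup(self, filename, chunk_index):
        return self.chunk_table[(filename, chunk_index)]

class ChunkServer:
    """Stores chunk contents keyed by chunk handle."""
    def __init__(self):
        self.chunks = {}

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]

def client_read(master, servers, filename, byte_offset, length):
    # Control request: ask the master which chunk holds this offset and where it lives.
    chunk_index = byte_offset // CHUNK_SIZE
    handle, locations = master.lookup(filename, chunk_index)
    # Data request: read the bytes directly from one of the chunkservers.
    server = servers[locations[0]]
    return server.read(handle, byte_offset % CHUNK_SIZE, length)

# Hypothetical setup: one file, one chunk, one chunkserver.
master = Master()
servers = {"cs1": ChunkServer()}
master.chunk_table[("/logs/a", 0)] = ("handle-0001", ["cs1"])
servers["cs1"].chunks["handle-0001"] = b"hello gfs"
print(client_read(master, servers, "/logs/a", 6, 3))
```

No file data passes through the master in this sketch, which is the point of keeping the two request types separate.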
45
What is the default chunk size?
64 MB
46
What is the default chunk (block) size in Linux?
4 KB to 256 KB
47
What are the cons of a 64 MB chunk size?
Wasted storage space due to internal fragmentation; high overhead from many small files
48
What are the pros of a 64 MB chunk size?
A larger chunk size means a smaller number of chunks (and therefore less metadata for the master to keep)
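To make the chunk-size trade-off concrete, here is a small arithmetic sketch. The helper names are made up, and `wasted_bytes` assumes every chunk is fully allocated on disk, which is a simplification.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

def chunk_count(file_size, chunk_size):
    """Number of fixed-size chunks needed to hold file_size bytes (integer ceiling)."""
    return (file_size + chunk_size - 1) // chunk_size

def wasted_bytes(file_size, chunk_size):
    """Unused space in the last chunk, assuming every chunk is fully allocated."""
    return chunk_count(file_size, chunk_size) * chunk_size - file_size

print(chunk_count(1 * GB, 64 * MB))   # 16 chunks to track
print(chunk_count(1 * GB, 4 * KB))    # 262144 chunks to track
print(wasted_bytes(1 * KB, 64 * MB))  # a 1 KB file leaves almost a whole chunk unused
```

A 1 GB file needs 16 entries with 64 MB chunks versus 262,144 entries with 4 KB chunks, while a tiny file shows the fragmentation cost on the other side.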
49
What is a Borg Cell?
A set of machines managed by Borg as one unit
50
What is a Borg Job?
The form in which users submit work
51
What is a task?
A unit of work within a job; each job consists of one or more tasks
52
What is a Borg Alloc?
A reserved set of resources on a machine in which one or more tasks can run
53
What is a Borg instance?
Instances having jobs
54
What is a Borgmaster?
The central brain of the system; holds the cluster state; uses Paxos for leader election and log replication; uses shared-state scheduling
55
What is a Borglet?
A unit that manages and monitors tasks and resources
56
What is a Borglet called in Kubernetes?
A Kubelet
57
What is the MapReduce Data Flow?
Read data from GFS -> mappers -> intermediate local files -> reducers -> write data to GFS
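The same flow can be sketched as a single-process word count. In-memory lists stand in for the GFS input, the intermediate local files, and the GFS output; all function names are illustrative and this is not Hadoop's or Google's API.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)

def word_count(lines):
    # Map every input line, then group by key, then reduce each group.
    intermediate = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())

print(word_count(["big data big", "data flow"]))
# {'big': 2, 'data': 2, 'flow': 1}
```

In a real cluster the map and reduce calls run on different machines, and the shuffle step moves the intermediate files between them.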
58
What is the role of the Job Tracker in Hadoop?
Coordinates the execution of jobs
59
What is the role of the Task Tracker in Hadoop?
Controls the execution of map and reduce tasks in slave machines
60
What is the Name Node in Hadoop?
Manages the file system, keeps metadata
61
What is the Data Node in Hadoop?
Follows the instructions from the name node, stores, retrieves data
62
What happens if a task fails in Hadoop?
The Task Tracker detects the failure and sends a message to the Job Tracker; the Job Tracker reschedules the task
63
What happens if a data node fails in Hadoop?
Both the Name Node and the Job Tracker detect the failure; all tasks on the failed node are rescheduled; the Name Node replicates the affected data chunks to another Data Node
64
What are some benefits of Hadoop?
Highly scalable, fault tolerant, simple programming model
65
What are some limitations of Hadoop?
64 MB block size, batch processing only, data locality
66
What are some reasons database users do not like map reduce?
It's a giant leap backwards; suboptimal implementation; not novel; missing most of the features of current databases; incompatible with all of the tools
67
What are two add-ons for Hadoop?
Hive and HBase
68
What is the general workflow for Hive?
User sends a HiveQL query to the Hive system -> Hive parses the query and plans its execution -> the query is converted to MapReduce jobs and executed on HDFS
69
What are the pros of Hive?
Built on top of Hadoop; SQL-like batch jobs over large data sets; supports a SQL-like language; similar to an RDBMS; can handle much larger datasets than an RDBMS
70
What are the cons of Hive?
Designed for OLAP, not OLTP; no real-time queries (high latency, batch jobs); it is not an RDBMS
71
What are the two categories of large-scale data?
Web search data, web access data
72
What type of large-scale data is important for understanding user behavior?
Web access logs
73
What is BigTable or HBase?
A sparse, distributed, persistent, multidimensional sorted map
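A toy sketch of the "sparse, multidimensional sorted map" part of that definition, assuming keys of the form (row, column, timestamp). The `SortedMap` class and the sample rows are hypothetical; distribution, persistence, and the real systems' timestamp ordering are left out.

```python
import bisect

class SortedMap:
    """Toy sorted map keyed by (row, column, timestamp); only written cells are stored."""
    def __init__(self):
        self._keys = []    # kept sorted so row-range scans are cheap
        self._values = {}

    def put(self, row, column, timestamp, value):
        key = (row, column, timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start_row, end_row):
        """Yield (key, value) pairs for rows in [start_row, end_row)."""
        lo = bisect.bisect_left(self._keys, (start_row,))
        hi = bisect.bisect_left(self._keys, (end_row,))
        for key in self._keys[lo:hi]:
            yield key, self._values[key]

table = SortedMap()
table.put("com.example/b", "contents:html", 2, "<html>v2</html>")
table.put("com.example/a", "anchor:home", 1, "Home")
table.put("org.sample/x", "contents:html", 1, "<html>x</html>")
print([key[0] for key, _ in table.scan("com.", "org.")])
# ['com.example/a', 'com.example/b']
```

Because keys are kept in sorted order, rows with a common prefix sit next to each other and a range scan touches only that slice; cells that were never written take no space.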
74
What are the three components in HBase Architecture?
Master, Region Server, ZooKeeper
75