Big Data Management Flashcards
1
Q
Characterization (3Vs)
A
Variety: Different forms of data
Volume: Petabytes of data
Velocity: Real-time data
2
Q
Big Data Analysis Pipeline
A
- Data acquisition: Select important data to be stored
- Information extraction & cleaning: Pull out required information from underlying sources
- Data integration & aggregation & representation: Full integration is not always possible. The origin of derived data often cannot be tracked, and selecting a suitable storage design is complex
- Modeling & analysis: Big data reveals hidden patterns and knowledge. Even simple models can show the big picture
- Interpretation: Annotate base data and discuss interpretation of metadata
3
Q
Data Lake requirements
A
- Secure
- Scalable
- Reliable
- Throughput
- Low Latency
- Store details
- Store native format
- All sources
4
Q
Advantages Cloud
A
- Cost
- Extensibility
- Reliability
- Workload
- Sharing
5
Q
Disadvantages Cloud
A
- Custom software
- Networking
- Maintenance
- Security
- Parallelization not always possible
6
Q
Three-Tier Server
A
Presentation ➔ Logic ➔ Data
7
Q
Design Cloud
A
- Transparent
- Flexible
- Reliable
- Performant
- Scalable
8
Q
Fallacies of cloud
A
- Network is reliable
- Latency is zero
- Bandwidth is infinite
- Network is secure
- Topology doesn’t change
- There is one administrator
- Transport cost is zero
- Network is homogeneous
9
Q
Cloud characteristics
A
- Dynamic
- Massively scalable
- Multi-tenant
- Self-service
- Usage-based pricing model
- IP-based architecture
10
Q
Google File System
A
Files are split into chunks stored across chunk servers; each chunk is replicated; the master node holds the metadata and controls access
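The chunking and replication idea can be illustrated with a small Python sketch. It is a toy model, not the real system: the function names `split_into_chunks` and `place_replicas`, the server names, the 4-byte chunk size, and the round-robin placement are all assumptions made for illustration (GFS itself uses 64 MB chunks and a more involved placement policy).

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut a file's bytes into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_replicas(num_chunks: int, servers: list[str], replicas: int = 3) -> dict[int, list[str]]:
    """Toy master-node metadata: map each chunk index to the servers holding a copy.

    Round-robin placement is an assumption for this sketch only.
    """
    if replicas > len(servers):
        raise ValueError("need at least as many servers as replicas")
    return {
        idx: [servers[(idx + r) % len(servers)] for r in range(replicas)]
        for idx in range(num_chunks)
    }


# Hypothetical example data: a 10-byte "file" and four chunk servers.
chunks = split_into_chunks(b"0123456789", chunk_size=4)
placement = place_replicas(len(chunks), ["cs0", "cs1", "cs2", "cs3"])
```

Here `placement` plays the role of the master's metadata table: a client would ask it where chunk 1 lives and then read the bytes directly from one of the listed chunk servers.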
11
Q
MapReduce
A
- Split the input data and run the map step on the parts in parallel
- Map: extract data as key/value pairs
- Group by key
- Reduce each group
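The steps on this card can be sketched as a word count in plain Python. This is a minimal single-machine illustration, not a distributed framework: the function names are hypothetical, and a thread pool stands in for the parallel workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(chunk: str) -> list[tuple[str, int]]:
    """Map: emit one (word, 1) pair per word in a chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def group_by_key(pairs):
    """Group (shuffle): collect all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse one group into a single result."""
    return key, sum(values)


def word_count(chunks: list[str]) -> dict[str, int]:
    # The input is already split into chunks; map them in parallel.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_words, chunks))
    pairs = [pair for part in mapped for pair in part]
    groups = group_by_key(pairs)
    return dict(reduce_counts(key, values) for key, values in groups.items())


# Hypothetical input: two chunks of one document.
counts = word_count(["big data big", "data lake"])
```

With this input, `counts` maps `big` and `data` to 2 and `lake` to 1. The map function never sees more than one chunk, which is what makes the step easy to parallelize.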
12
Q
ACID
A
- Atomicity
- Consistency
- Isolation
- Durability
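Atomicity and consistency can be seen in a short example using Python's built-in `sqlite3` module. The `accounts` table, the names, and the `transfer` helper are made up for this sketch; the `CHECK` constraint stands in for a consistency rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # consistency rule: no negative balance
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # valid transfer
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the second call the credit to `bob` runs first, but the debit violates the constraint, so the whole transaction is rolled back and the credit is undone as well.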
13
Q
CAP
A
- Consistency
- Availability
- Partition-tolerance
➔ Not all three can be guaranteed at the same time
14
Q
BASE
A
- Basically Available
- Soft state
- Eventual Consistency
15
Q
Types of NoSQL storage
A
- Key/Value
- Wide-column
- Document database
- Graph database
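As a rough memory aid, the four models can be mimicked with plain Python data structures. This only illustrates the shape of the data, not any particular database product; all keys, names, and values are hypothetical.

```python
# Key/value: an opaque value looked up by a single key.
key_value = {"user:1": '{"name": "Ada", "city": "London"}'}

# Wide-column: row key -> column family -> columns; rows may have different columns.
wide_column = {
    "user:1": {"profile": {"name": "Ada"}, "location": {"city": "London"}},
    "user:2": {"profile": {"name": "Grace"}},
}

# Document: a nested structure whose fields can be queried individually.
document = {"_id": 1, "name": "Ada", "address": {"city": "London"}}

# Graph: nodes plus labeled edges; queries follow relationships.
nodes = {1: {"name": "Ada"}, 2: {"name": "Grace"}}
edges = [(1, "FOLLOWS", 2)]

# Example traversal: whom does node 1 follow?
followed = [nodes[dst]["name"] for src, rel, dst in edges if src == 1 and rel == "FOLLOWS"]
```

The same user appears in each model, which shows the trade-off: the key/value store can only fetch the whole value, while the document and graph forms expose fields and relationships to queries.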