Hierarchical, federated system Delegation of requests Distributed world-wide Caching at multiple levels Trusting cached result through signatures Based on UDP

Integrity constraints Transactional guarantees, i.e., ACID Powerful query languages Materialized views

Client library Master Metadata operations Load balancing Tablet server Data operations (read / write)

Assigns tablets to tablet servers Detects addition and expiration of tablet servers Balance tablet server load Etc.

Manages a set of tablets (up to a thousand) Handles read and write requests for the tablets it manages Splits tablets that have grown too large

Chapter 4: Distributed Systems in Context Flashcards by Andreas Hein

Why does DNS work?

Hierarchical, federated system
- Delegation of requests
Distributed world-wide
Caching at multiple levels
- Trusting cached result through signatures
Based on UDP

How well did you know this?

Not at all

Perfectly

DNS Attacks? How to mitigate?

Type of denial-of-service attack
- Attackers use EDNS0, an extension of the DNS protocol
- http://www.rfc-editor.org/rfc/rfc2671.txt
In certain cases a 36 byte long requests can trigger a 3000 bytes long reply => amplification of ~100x
Last attack March 19, 2013 Spamhaus
- 75Gbps DDOS
- Mitigated by load balancing requests
- http://blog.cloudflare.com/the-ddos-that-knocked- spamhaus-offline-and-ho

Mirigation:

DNSSEC – DNS security extensions
Resource records digitally signed
Recursive resolver validates data integrity and origin authentication
Deployed in 2010, not everywhere available

How well did you know this?

Not at all

Perfectly

What are key-value stores?

Container for key-value pairs
Distributed multi-component systems
Category of NOSQL-databases (not only SQL)
Databases that break with philosophy of traditional (relational) DBMS to overcome their idiosyncrasies (weaknesses ?)

How well did you know this?

Not at all

Perfectly

Key value store compared to DBMS - SQL

DBMS

Relational data schema
Normalized data model
Datatypes
Relationsbetween tables (e.g., foreign key)
Row-orientation

KVS

No data scheme or modifiable data schema
De-normalizedmodel
Raw byte access
No relations
Column-orientation
Distributed data

How well did you know this?

Not at all

Perfectly

Why are key-value stores needed?

Today’s internet applications
- Huge amounts of stored data (PB (1015 bytes)+)
- Huge number of users (e.g., 1.11 billion)
- Frequent updates
- Fast retrieval of information
- Rapidly changing data definitions
Ever more users, ever more data
Incremental scalability
- User growth, traffic patterns
- Adapt to number of requests & data size
Flexibility
- Adapt to changing data definitions
Reliability
- Thousands of components at play
- Provide recovery in presence of failure
Availability
- Users are worldwide
- Guarantee fast access

How well did you know this?

Not at all

Perfectly

Key-Value store client interface

Main operations
- Write/update - put(key, value)
- Read - get(key)
- Delete - delete(key)
Usually no aggregation, no table joins

How well did you know this?

Not at all

Perfectly

Common features of key-value stores

Flexible data model
Horizontal partitioning
Store big chunks of data as raw bytes
Fast column-oriented access
Memory store & write ahead log (WAL)
- Keep data in memory for fast access ( Bath / Fasw had access)
- Keep a commit log as ground truth ( Basis )
Versioning
- Store different versions of data
Replication
- Store multiple copies of data
Failure detection & failure recovery

How well did you know this?

Not at all

Perfectly

Common non-features

Integrity constraints
Transactional guarantees, i.e., ACID
Powerful query languages
Materialized views

How well did you know this?

Not at all

Perfectly

Bigtable components

Client library
Master
- Metadata operations
- Load balancing
Tablet server
- Data operations (read / write)

How well did you know this?

Not at all

Perfectly

Master

Assigns tablets to tablet servers
Detects addition and expiration of tablet servers
Balance tablet server load
Etc.

How well did you know this?

Not at all

Perfectly

Tablet server

Manages a set of tablets (up to a thousand)
Handles read and write requests for the tablets it manages
Splits tablets that have grown too large

How well did you know this?

Not at all

Perfectly

Bigtable building blocks

Chubby
- Metadata storage
GFS (Google file system)
- Data log, storage
- Repiclation
Scheduler
- Monitoring
- Failover

How well did you know this?

Not at all

Perfectly

Chubby “& ZooKeeper lock service

Highly-available, persistent, distributed lock service
Use in Bigtable
- Ensure at most one active master at any time
- Store bootstrap location of data (root tablet)
- Discover tablet servers (manage their lifetime)
- Store schema information

How well did you know this?

Not at all

Perfectly

Representation & use of configuration information with lock service

Managed via directories, small files Directories, files can serve as locks Reads, writes are atomic.

Clients maintain sessions If session lease expires and can’t be renewed, locks are released.

How well did you know this?

Not at all

Perfectly

Lock service in context

Comprised of five active replicas
- Consistently replicate writes
One replica is designated as master
- We need to elect the master (leader)
Service is life when:
- majority of replicas are running and
- can communicate with one another
- A quorum needs to be established

How well did you know this?

Not at all

Perfectly

Core mechanisms Big TAble

Keep replicas consistent in face of failures
- Paxos algorithm based on replicated state machines
- Atomic broadcast
Ensure one active Bigtable master at any time
- Mutual exclusion, but in a distributed setting
- Relevant lectures
  – Cf. lect. on Paxos & RSM

– Cf. lect. on coordination – Cf. lect. on replication

Chubby example: leader election

Electing a primary (leader) process: emulate by acquiring an exclusive lock on a file
- Clients open a lock file and attempt to acquire the lock in write mode
- One client succeeds (i.e., becomes the leader) and writes its name to the file } Successor
- Other clients fail (i.e., become replicas) and discover the name of the leader by reading the file
- Primary obtains sequencer and passes it to servers (servers know that primary is still valid)

HBase architecture overview

Client
- Issues put, get, delete operations
Zookeeper (Chubby)
- Distributed lock service for HBase components
HRegion (tablet)
- Tables are split into multiple key-regions
HRegionServer (tablet server)
- Stores actual data
- Can host multiple key-regions (tablets)
- Answers client requests
HMaster (master)
- Coordinates components
  - Startup, shutdown, failure of region servers
  - Opens, closes, assigns, moves regions
- Not on read or write path
Write Ahead Log (WAL)
MemStore
- Keeps hot data in main memory
HDFS
- Underlying distributed file system
- Table data is stored as HFile
- Replicates data over multiple data nodes

HBase global read-path

HBase write path

HBase adding of components

Components can be added on-the-fly
Add multiple backup master servers
- Avoid single point of failure
- In case of crash, backup master takes over
Add multiple region servers
- Additional capacity can be added
- Master takes care of load balancing

Core mechanisms HBase nn

Heavy involvement of ZooKeeper for coordination tasks
- Cf. lect. on Paxos & RSM (i.e., a look inside)
HBase reliability relies on HDFS (replication)
- Cf. lect. on replication

Hbase storage unit failure

Cassandra architecture overview

Cassandra global read-path

Cassandra global write-path

Incremental scaling in Cassandra

Storage unit failure cassandra

Core mechanisms cassandra

* **Incremental scalability** * Cf. Lect. on consistent hashing * **Reliability** * Cf. Lect. on replication (e.g., **quorum** protocols) * **Membership management** * Cf. Lect. on replication (e.g., **gossip** protocols) * **Uses leader election as part of replication** * Cf. Lect. on coordination