Lecture 8 Flashcards
3 V’s describing Big data
Volume - Increased volume of data
Velocity - Increased speed at which data arrives and must be processed to produce results
Variety - Diversity of data and data types
Describe the Storage and data models
Storage model – describes the layout of a data structure in a physical storage
Data model – captures the most important logical aspects of a data structure in a database
Describe the Two abstract models of storage
- Cell storage – Storage consists of cells of the same size, and each object fits exactly into one cell. The model reflects physical media: primary memory is organised as an array of memory cells, and secondary storage is organised in sectors or blocks that are read + written as a unit.
- Journal storage – System that records the changes to be made in a journal before applying them to the main file system, so a history of changes is kept rather than only the current value.
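A minimal sketch of the two models, assuming a toy in-memory store. The class names `CellStorage` and `JournalStorage` and their methods are hypothetical and chosen only for illustration; real systems write to persistent media.

```python
class CellStorage:
    """Fixed-size cells; each object must fit in exactly one cell."""

    def __init__(self, num_cells, cell_size):
        self.cell_size = cell_size
        self.cells = [bytes(cell_size) for _ in range(num_cells)]

    def write(self, index, data):
        if len(data) > self.cell_size:
            raise ValueError("object does not fit in one cell")
        # Pad to the cell size so the whole cell is written as a unit.
        self.cells[index] = data.ljust(self.cell_size, b"\x00")

    def read(self, index):
        return self.cells[index]


class JournalStorage:
    """Records each change in a journal before applying it to cell storage."""

    def __init__(self, cells):
        self.cells = cells
        self.journal = []  # append-only list of (index, data) entries

    def write(self, index, data):
        if len(data) > self.cells.cell_size:
            raise ValueError("object does not fit in one cell")
        self.journal.append((index, data))  # 1. log the change first
        self.cells.write(index, data)       # 2. then update the main store

    def history(self, index):
        # Every value ever written to this cell, oldest first.
        return [data for i, data in self.journal if i == index]


cells = CellStorage(num_cells=4, cell_size=8)
store = JournalStorage(cells)
store.write(1, b"abc")
store.write(1, b"xyz")
```

After the two writes, cell 1 holds only the latest value, while `store.history(1)` still lists both.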
Describe database, DBMS, Query Language + Database models
- Database – A collection of records
- Database Management System – Software that controls the access of the database
- Query language – A programming language to develop database applications
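- Database models – Reflect the limitations of the hardware available at the time and the requirements of the most popular applications of each period (e.g. navigational, relational, object-oriented, NoSQL)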
What are the requirements of Cloud Applications?
- Rapid application development + short time to market
- Low latency
- Scalability
- High availability
- Consistent view of the data
Describe the file, file pointer, logical + physical organisation of a file
- File – A linear array of cells stored on a persistent storage device. An application views the file as a collection of logical records
- File pointer – Identifies the cell used as the starting point for a read or write operation
- The logical organisation of a file – Reflects the data model and views of the data in an application
- The physical organisation of a file – Reflects the storage model and describes how the file is stored on a given storage medium
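The file pointer can be illustrated with Python's standard `io.BytesIO`, which behaves like a file held in memory. This is only a sketch: `CELL_SIZE` and the sample contents are assumptions made for the example.

```python
import io

CELL_SIZE = 4  # hypothetical cell (record) size in bytes

# A file viewed as a linear array of fixed-size cells.
f = io.BytesIO(b"AAAABBBBCCCCDDDD")

f.seek(2 * CELL_SIZE)          # move the file pointer to cell 2
third = f.read(CELL_SIZE)      # read one cell; the pointer advances to cell 3
pointer_after_read = f.tell()  # current position of the file pointer

f.seek(1 * CELL_SIZE)          # reposition the pointer to cell 1
f.write(b"bbbb")               # overwrite that cell in place
contents = f.getvalue()
```

The application works in terms of cells (the logical view), while `seek` and `tell` operate on byte offsets into the underlying storage.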
Describe File Systems, Distributed File Systems and the two other file systems
- File system – A collection of directories, each of which provides information about a set of files
- Distributed file systems – Address the need to share files among a number of clients interconnected by a LAN:
- Network File Systems (NFS) – client server architecture
- Parallel File Systems (PFS) – Scalable, capable of distributing files across many nodes
What is Unix File System (UFS)?
A tree-structured file system for organising and storing large amounts of data, making it easy to manage
- Uses inodes, basic structures that hold information about each file and directory
- Stores metadata for files and directories: (file owner, access rights, creation time, last modification, file size etc.)
- Separates the physical file structure from the logical
- Flexible for allocation
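On a Unix-like system the inode metadata listed above can be inspected with Python's standard `os.stat`. A small sketch, assuming a throwaway temporary file created just for the example:

```python
import os
import stat
import tempfile
import time

# Create a small temporary file so there is something to inspect.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello inode")
    path = tmp.name

info = os.stat(path)  # metadata the file system keeps for this file
metadata = {
    "inode": info.st_ino,                          # inode number
    "owner_uid": info.st_uid,                      # file owner
    "access_rights": stat.filemode(info.st_mode),  # e.g. '-rw-------'
    "size_bytes": info.st_size,                    # file size
    "last_modified": time.ctime(info.st_mtime),    # last modification time
}
os.remove(path)  # clean up the temporary file
```

None of these fields contains the file's data; the metadata is held separately from the contents, which is how the logical and physical structures stay apart.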
What is Network File System (NFS)?
- A client/server application that lets a computer user view + store + update files on a remote computer as though they were on the user’s own computer
- Clients interact with the server through Remote Procedure Calls (RPC)
- Ensures compatibility with existing applications
What is Parallel File System (PFS)?
- Allows multiple clients to read + write from the same file
- Concurrently executes multiple input/output operations
- Supports parallel I/O, which is essential for the performance of many applications
What is GPFS (General Parallel File System) distributed locking and what are its techniques?
- Distributed locking mechanism
A central lock manager grants lock tokens to the local lock managers running in each I/O node; lock tokens are also used by the cache management system
Techniques:
- Byte-range tokens: used for read + write operations on data files
Node 1: the first node to write to a file obtains a token covering the whole file and can then read + write without asking for further permission
UNTIL…
Node 2 attempts to write to the same file
Then the range of Node 1's token is restricted
- Data shipping: an alternative to byte-range locking that allows fine-grained sharing
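The byte-range token behaviour can be sketched as follows. This is a toy model, not GPFS code: the `TokenManager` class and its `acquire` method are hypothetical, and a conflicting holder is simply truncated where a real system would negotiate the split.

```python
import math


class TokenManager:
    """Toy central lock manager handing out byte-range write tokens."""

    def __init__(self):
        self.tokens = {}  # node -> (start, end), end exclusive

    def acquire(self, node, start, end=math.inf):
        held = self.tokens.get(node)
        if held and held[0] <= start and end <= held[1]:
            return held  # already covered: no further permission needed
        if not self.tokens:
            # First writer: the token covers the whole file, [0, inf).
            self.tokens[node] = (0, math.inf)
            return self.tokens[node]
        for other, (s, e) in list(self.tokens.items()):
            if other == node or e <= start or end <= s:
                continue  # same node, or ranges do not overlap
            if s < start:
                # Restrict the holder to the bytes before the new range.
                self.tokens[other] = (s, start)
            else:
                # Simplification: revoke a holder that starts inside the range.
                del self.tokens[other]
        self.tokens[node] = (start, end)
        return self.tokens[node]


mgr = TokenManager()
first = mgr.acquire("node1", 0, 100)     # node1 gets the whole file
again = mgr.acquire("node1", 500, 600)   # still covered by its token
second = mgr.acquire("node2", 1000)      # node2 writes from byte 1000 onwards
```

Once `node2` asks for the range starting at byte 1000, `node1`'s token shrinks to bytes 0 to 999, which is the restriction step described above.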
Google File System (GFS) and design considerations
- Uses many storage systems built from inexpensive commodity components to provide storage to a large user community with diverse needs
- Scalability + reliability
- File sizes range from GB to TB
Design considerations:
- File is divided into several chunks of predefined size
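- Implement an atomic file append operation, allowing multiple applications running concurrently to append to the same file
- Build the cluster around a high-bandwidth rather than a low-latency interconnection network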
- Eliminate caching at the client site
- Minimise the involvement of the master in file access operations
- Support efficient checkpointing + fast recovery mechanisms
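As an illustration of the chunking idea, the sketch below maps a byte range of a file onto fixed-size chunks. The 64 MB value is the chunk size used by GFS; the function names `chunk_index` and `chunks_for_range` are hypothetical helpers written for this example.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size used by GFS


def chunk_index(offset, chunk_size=CHUNK_SIZE):
    """Index of the chunk that holds the byte at `offset`."""
    return offset // chunk_size


def chunks_for_range(offset, length, chunk_size=CHUNK_SIZE):
    """Return (chunk index, offset within chunk, byte count) per chunk touched."""
    pieces = []
    end = offset + length
    while offset < end:
        idx = offset // chunk_size
        within = offset % chunk_size
        # Stop at the chunk boundary or at the end of the request.
        count = min(chunk_size - within, end - offset)
        pieces.append((idx, within, count))
        offset += count
    return pieces


# A 30-byte read that starts 10 bytes before the end of chunk 0.
pieces = chunks_for_range(CHUNK_SIZE - 10, 30)
```

A client can compute the chunk index locally and ask the master only for that chunk's location, which is one way the master's involvement is kept small.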
What is the Paxos algorithm and what are its three phases?
Used by a set of nodes to reach agreement (consensus) on a single value
Phases:
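- Elect a node to be the master/coordinator. To ensure the result of the election is unique, each candidate generates a sequence number in the range (1,r), where r is the number of replicas, larger than any it has seen, and broadcasts it in a propose (prepare) message. A candidate that receives promises from a majority becomes master
- The master selects the value and sends an accept message containing it to all nodes. Acceptors can reply with acknowledge or reject.
- Consensus is reached when a majority of the nodes acknowledge; the master then broadcasts a commit message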
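The three phases can be sketched as a single-process simulation. This is a heavily simplified illustration, not a full Paxos implementation: it assumes no message loss or crashes, and it omits the rule that a new master must adopt a value already accepted in an earlier round. The names `Replica` and `run_round` are hypothetical.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.highest_seen = 0  # highest sequence number promised so far
        self.committed = None

    def on_propose(self, seq):
        """Phase 1: promise only if seq is higher than any seen before."""
        if seq > self.highest_seen:
            self.highest_seen = seq
            return True
        return False

    def on_accept(self, seq, value):
        """Phase 2: acknowledge unless a higher number was promised."""
        return seq >= self.highest_seen

    def on_commit(self, value):
        """Phase 3: record the agreed value."""
        self.committed = value


def run_round(replicas, seq, value):
    """Run one round; return the committed value, or None if it fails."""
    majority = len(replicas) // 2 + 1
    promises = sum(r.on_propose(seq) for r in replicas)
    if promises < majority:
        return None  # not elected master
    acks = sum(r.on_accept(seq, value) for r in replicas)
    if acks < majority:
        return None  # value rejected by too many replicas
    for r in replicas:
        r.on_commit(value)
    return value


replicas = [Replica(f"r{i}") for i in range(5)]
first = run_round(replicas, seq=2, value="A")  # succeeds: all promise
stale = run_round(replicas, seq=1, value="B")  # fails: number too low
```

The second round fails in phase 1 because every replica has already promised a higher sequence number, so the committed value stays "A".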