Module 5: Block, File & Object Based Storage Systems (File Based Storage Systems) Flashcards
How do applications access data?
in the form of files
What is metadata?
additional data that describes the raw data in a file (e.g., whether a picture file is a JPG or a PNG)
What is a file system?
logical representation of how an OS manages where and how data is stored
How are files stored?
typically in folders; folders are organized in a hierarchical tree structure so files can be accessed directly by path or found by searching the tree sequentially
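A minimal sketch of the two access styles, using Python's standard `os.walk` on a small temporary tree. The folder and file names (`projects`, `archive`, `report.txt`) are hypothetical. Direct access opens a known path, and sequential search walks the tree until it finds matching names.

```python
import os
import tempfile

def find_sequential(root: str, name: str) -> list[str]:
    """Walk the directory tree top-down and collect every path whose
    final component equals `name` (a sequential search)."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so traversal order is deterministic
        for fname in sorted(filenames):
            if fname == name:
                matches.append(os.path.join(dirpath, fname))
    return matches

# Build a small hypothetical tree: root/projects/report.txt, root/archive/report.txt
root = tempfile.mkdtemp()
for sub in ("projects", "archive"):
    os.makedirs(os.path.join(root, sub))
    with open(os.path.join(root, sub, "report.txt"), "w") as f:
        f.write(sub)

# Direct access: the full path is already known, so no search is needed.
with open(os.path.join(root, "projects", "report.txt")) as f:
    direct = f.read()

# Sequential search: visit folders one by one looking for the name.
found = find_sequential(root, "report.txt")
```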
What is included in file metadata?
information that tells an application how to access the raw data in the correct format, such as file type, size, owner, permissions, and timestamps
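Python's standard `os.stat` call reads the metadata a file system keeps about a file (size, type, permissions, timestamps) without reading the file's contents. In this sketch the temporary file, its `.png` suffix, and the 8 PNG signature bytes are illustrative assumptions, not a real image.

```python
import os
import stat
import tempfile
import time

# Create a small file so there is something to inspect.
fd, path = tempfile.mkstemp(suffix=".png")
os.write(fd, b"\x89PNG\r\n\x1a\n")  # PNG signature bytes only, not a full image
os.close(fd)

info = os.stat(path)  # metadata lookup; the file's data is not read
metadata = {
    "size_bytes": info.st_size,
    "is_regular_file": stat.S_ISREG(info.st_mode),
    "permissions": stat.filemode(info.st_mode),  # e.g. "-rw-------"
    "modified": time.ctime(info.st_mtime),
    "extension": os.path.splitext(path)[1],  # format hint for applications
}
os.remove(path)  # clean up the temporary file
```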
What is the relationship between hosts and file systems?
each host has its own file system, and without file sharing that file system is accessible only to that host
What is file sharing?
allows access to different file systems across different hosts
What is the relationship between servers and file sharing in general purpose file sharing?
the benefits of file sharing diminish as more general-purpose servers are added to the share pool
What are the major issues with network file sharing?
lack of scalability
file system incompatibilities across OSs
complex admin and data maintenance
What are the two main OS environments for file sharing?
Windows and Linux/UNIX
each relies on a different set of file-sharing protocols (typically SMB/CIFS for Windows and NFS for Linux/UNIX), so sharing files between the two is complicated
What is NAS (network-attached storage)?
purpose-built file storage systems that replace general-purpose servers for file sharing and storage
What are the benefits of NAS over general purpose servers?
centralizes file share operations
uses specialized and optimized file IO
enables Linux/UNIX and Windows users to share data more efficiently
What is clustering in a NAS system?
enables multiple NAS controllers (nodes) to function as a single entity, allowing workloads to be distributed across them
What are the two components of a NAS system?
NAS Controller
File Storage
What is a NAS controller?
compute system that contains network, memory, and CPU resources for NAS
houses specialized file OS
responsible for managing RAID, creating LUNs, installing file systems and exporting file shares
What is file data storage?
block-based storage used to hold the raw NAS file data and its metadata
What is scale-up NAS?
provides the ability to independently grow capacity and performance
done by adding or upgrading components of a single NAS system, e.g., more drives for capacity or a more powerful controller for performance
What happens when a NAS system begins approaching its capacity limits?
performance of system starts degrading
What is scale-out NAS?
ability to increase storage and compute simultaneously by adding nodes to a cluster
What are the benefits of scale out NAS?
pools multiple nodes to work as a single device
scales performance/capacity simultaneously
clients connected to any node can access any file system on the cluster
stripes data across nodes with mirror or parity protection
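Mirror protection keeps a full copy of each stripe on another node. The sketch below illustrates the parity alternative under the assumption of simple single-parity XOR (the RAID 5 idea) over equal-sized stripes. Real scale-out NAS products use their own protection schemes, and the node count and stripe contents here are hypothetical.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length stripes."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(stripes: list[bytes]) -> bytes:
    """Parity stripe = XOR of all data stripes."""
    return reduce(xor_bytes, stripes)

def rebuild(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the single lost data stripe by XOR-ing parity with the survivors."""
    return reduce(xor_bytes, surviving, parity)

# Hypothetical layout: 3 data nodes + 1 parity node, 4-byte stripes.
data = [b"NAS-", b"scal", b"eout"]
parity = make_parity(data)

# Suppose the node holding data[1] fails.
recovered = rebuild([data[0], data[2]], parity)
```

Because XOR is its own inverse, combining the parity with every surviving stripe cancels them out and leaves only the missing one.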
How does scale out NAS networking work?
internal network provides intra-cluster communication; each node connects to the internal network
external network connection enables clients to access and share file data
What are the features of the internal network for scale-out NAS?
offers high throughput and low latency
high-speed networking such as InfiniBand or Gigabit Ethernet
How do clients access the nodes in scale out NAS?
nodes must be connected to external Ethernet network
What is CIFS?
CIFS = Common Internet File System; enables clients to make requests for files on remote computers over TCP/IP
What is the difference between CIFS and SMB?
CIFS is a public (open) variation of SMB (Server Message Block), a protocol created by Microsoft
How does CIFS enable file sharing?
through special locking mechanisms that let multiple clients share files without corrupting each other's changes
What are the features specific to CIFS?
uses file and record locking to prevent users from overwriting each other's changes
supports fault tolerance and can automatically restore connections and reopen files after an interruption
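A minimal sketch of the record-locking idea, assuming a simple in-memory table of exclusive byte-range locks kept by the server. This is not the SMB/CIFS wire protocol; the `LockTable` class, its methods, the file name, and the user names are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class LockTable:
    """Hypothetical server-side table of exclusive byte-range (record) locks."""
    # file path -> list of (start, end, owner); ranges are half-open [start, end)
    locks: dict[str, list[tuple[int, int, str]]] = field(default_factory=dict)

    def acquire(self, path: str, start: int, length: int, owner: str) -> bool:
        end = start + length
        for s, e, o in self.locks.get(path, []):
            # Refuse if the ranges overlap and the lock belongs to someone else.
            if o != owner and start < e and s < end:
                return False
        self.locks.setdefault(path, []).append((start, end, owner))
        return True

    def release(self, path: str, owner: str) -> None:
        """Drop every lock `owner` holds on `path`."""
        self.locks[path] = [entry for entry in self.locks.get(path, []) if entry[2] != owner]

table = LockTable()
alice_ok = table.acquire("finance/q3.xlsx", 0, 100, "alice")
bob_blocked = not table.acquire("finance/q3.xlsx", 50, 10, "bob")  # overlaps alice
bob_other_record = table.acquire("finance/q3.xlsx", 200, 10, "bob")  # different record
table.release("finance/q3.xlsx", "alice")
bob_after_release = table.acquire("finance/q3.xlsx", 50, 10, "bob")
```

Locking byte ranges rather than whole files lets two users edit different records of the same file at once, while still blocking conflicting writes to the same record.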
What is the naming scheme for remote file systems?
\\server\share or \\servername.domain.suffix\share
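The UNC naming scheme can be taken apart with Python's standard `ntpath` module, which applies Windows path rules on any platform. In this sketch the `parse_unc` helper and the server, share, and file names are hypothetical.

```python
import ntpath  # Windows path rules, available on every platform

def parse_unc(path: str) -> dict:
    """Split a UNC path into server, share, and the path within the share."""
    drive, rest = ntpath.splitdrive(path)  # drive = "\\server\share" for UNC paths
    if not drive.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {path!r}")
    server, share = drive[2:].split("\\", 1)
    return {"server": server, "share": share, "path": rest}

short = parse_unc(r"\\fileserver\projects\plans\q3.docx")
fqdn = parse_unc(r"\\fileserver.example.com\projects\plans\q3.docx")
```

Both forms resolve to the same share; the second simply names the server by its fully qualified domain name.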
What is NFS?
Network File System; a common file-sharing protocol for UNIX systems
uses a machine-independent model (XDR, External Data Representation) to represent data
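A sketch of what "machine-independent" means in practice, encoding two data types following the XDR rules in RFC 4506: integers are 4-byte big-endian, and strings are a 4-byte length followed by the bytes, zero-padded to a multiple of 4. The helper names are hypothetical, and ASCII encoding is a simplifying assumption (XDR strings are plain byte sequences).

```python
import struct

def xdr_int(value: int) -> bytes:
    # XDR signed int: 4 bytes, big-endian, two's complement.
    return struct.pack(">i", value)

def xdr_string(text: str) -> bytes:
    # XDR string: 4-byte unsigned length, then the bytes,
    # zero-padded so the total is a multiple of 4.
    raw = text.encode("ascii")
    padding = (4 - len(raw) % 4) % 4
    return struct.pack(">I", len(raw)) + raw + b"\x00" * padding

encoded = xdr_int(1) + xdr_string("home")
odd = xdr_string("abc")  # 3 bytes of text need 1 byte of padding
```

Because byte order and alignment are fixed by the format rather than by the CPU, a little-endian client and a big-endian server decode these bytes identically.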
What is used for inter-process communication between two computers running NFS?
Remote Procedure Call (RPC)
What is HDFS?
Hadoop Distributed File System - supported by many of the major NAS vendors
What is required to run HDFS?
requires programmatic access because the file system can’t be mounted
all HDFS communication is layered on top of TCP/IP protocol
What type of architecture does HDFS run on?
primary and secondary
cluster consists of single Name Node that acts as management server
What is an HDFS cluster made up of?
a single Name Node plus multiple Data Nodes; the Name Node keeps in-memory maps of every file, its location, and all blocks within each file along with the Data Nodes on which those blocks reside
What is the Name Node responsible for in HDFS?
manages file system namespace and controls access to the files by clients
What are Data Nodes responsible for in HDFS?
serve read/write requests from clients and perform block creation, deletion, and replication
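A minimal sketch of the Name Node's lookup role, assuming plain dictionaries for the in-memory maps. The 128 MiB block size and three replicas match common HDFS defaults but are assumptions here; the file path, block IDs, and Data Node names are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MiB per block (assumed default)

# Hypothetical Name Node metadata, held in memory.
namespace = {  # file path -> ordered list of block IDs
    "/logs/2024-01-01.log": ["blk_1", "blk_2"],
}
block_map = {  # block ID -> Data Nodes holding a replica (3 replicas assumed)
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn2", "dn3", "dn4"],
}

def locate(path: str, offset: int) -> tuple[str, list[str]]:
    """Name Node lookup: which block holds `offset`, and where are its replicas?"""
    blocks = namespace[path]
    index = offset // BLOCK_SIZE  # block number containing this byte
    block_id = blocks[index]
    return block_id, block_map[block_id]

# A client reading at the 200 MiB mark asks the Name Node where to go,
# then would read the block directly from one of the listed Data Nodes.
block, replicas = locate("/logs/2024-01-01.log", 200 * 1024 * 1024)
```

The Name Node only hands out locations; the data itself flows between the client and the Data Nodes.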
What are the features of an HDFS file system?
spans multiple nodes and enables user data to be stored in files
uses a traditional hierarchical file organization
presents a streaming interface for running applications with the MapReduce framework
What is FTP?
protocol that enables file transfer between computers over an IP network
uses TCP as the transport protocol
What is the read/write process of a scale up NAS?
client packages IO into TCP/IP and forwards it to network
NAS receives the request from the network and translates the file I/O request into the block-level I/O required by physical storage
the operation is then performed on physical storage
when NAS receives data from physical storage it packages it into correct file protocol
NAS packages into TCP/IP again and sends back over network
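The translation step in this flow (file I/O in, block I/O out) can be sketched as follows, assuming the NAS file system keeps a per-file map from file block index to logical block address (LBA) on the back-end LUN. The 4 KiB block size, the extent map, and the function name are hypothetical, and the TCP/IP packaging steps are not modeled.

```python
BLOCK_SIZE = 4096  # bytes per block (assumption)

# Hypothetical extent map kept by the NAS file system:
# file block index -> logical block address (LBA) on the back-end LUN.
extent_map = {"/share/report.pdf": [1000, 1001, 1002, 5000, 5001]}

def file_io_to_block_io(path: str, offset: int, length: int) -> list[int]:
    """Translate a file-level read (offset, length) into the LBAs to read."""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE  # block holding the final byte
    return [extent_map[path][i] for i in range(first, last + 1)]

# Client asks for 5000 bytes starting at byte 8000 of the file.
lbas = file_io_to_block_io("/share/report.pdf", 8000, 5000)
```

Note that the file's blocks need not be contiguous on disk (here block 3 jumps to LBA 5000); the client never sees this because it only deals in file offsets.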
How is a write operation performed in a scale out NAS?
client sends file to NAS
the node to which the client is connected receives the file
file is striped across the nodes
How is a read operation performed in a scale out NAS?
client requests file
the node to which the client is connected receives the request
that node retrieves the file's stripes from the other nodes, rebuilds the file, and returns it to the client
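The write and read workflows above can be sketched as a toy model, assuming round-robin striping with a tiny 4-byte stripe unit and no mirror or parity protection. The `Node` and `ScaleOutCluster` classes, node names, and file name are hypothetical; real systems exchange stripes over the internal cluster network rather than through in-process dictionaries.

```python
STRIPE_UNIT = 4  # bytes per stripe unit (tiny, for illustration only)

class Node:
    def __init__(self, name: str):
        self.name = name
        self.units: dict[tuple[str, int], bytes] = {}  # (file, unit index) -> data

class ScaleOutCluster:
    """Hypothetical model: any node accepts a client request and stripes
    or gathers data across all nodes of the cluster."""

    def __init__(self, node_names: list[str]):
        self.nodes = [Node(n) for n in node_names]

    def write(self, entry: int, name: str, data: bytes) -> None:
        # `entry` is the node the client is connected to; it splits the file
        # into stripe units and places them round-robin across every node.
        for i in range(0, len(data), STRIPE_UNIT):
            unit = i // STRIPE_UNIT
            target = self.nodes[(entry + unit) % len(self.nodes)]
            target.units[(name, unit)] = data[i:i + STRIPE_UNIT]

    def read(self, entry: int, name: str) -> bytes:
        # The entry node gathers the units from all nodes and reassembles
        # them in order; in this model the result is the same whichever
        # node the client happens to be connected to.
        parts = {}
        for node in self.nodes:
            for (fname, unit), chunk in node.units.items():
                if fname == name:
                    parts[unit] = chunk
        return b"".join(parts[u] for u in sorted(parts))

cluster = ScaleOutCluster(["node1", "node2", "node3"])
cluster.write(entry=0, name="video.mp4", data=b"0123456789ABCDEF")
per_node = [len(n.units) for n in cluster.nodes]  # stripe units held by each node
restored = cluster.read(entry=2, name="video.mp4")  # read through a different node
```

Writing through one node and reading through another still returns the whole file, which is the property the scale-out design relies on.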
What is true about scale-out NAS architecture?
even though a client is connected to only one node at a time, every read or write operation through that node is striped across the whole cluster
How does the connected node rebuild a file that’s been striped across multiple nodes in a read request?
uses the back-end InfiniBand network to gather the stripes from the other nodes
What is a data lake?
hub for data ingestion and consumption systems
allows customers to bring analytics to their data and avoid high cost of multiple systems