Mass Storage (11) Flashcards

1
Q

HDD

A
  • HDDs spin platters of magnetically coated material under a moving read-write head
    • rotation speed: 60 to 250 rotations per second (3,600 to 15,000 RPM)
    • transfer rate: rate of data flow between drive and computer (around 1 Gb/sec)
    • positioning time (random access time): time to move the disk arm to the desired cylinder (seek time, from 3 ms to 12 ms) plus time for the desired sector to rotate under the disk head (rotational latency; worst case is one full rotation, 60/RPM seconds, and the average is half of that)
    • Access latency = average access time = average seek time + average rotational latency
    • Average I/O time = average access time + (amount to transfer / transfer rate) + controller overhead
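The average I/O time formula above can be evaluated with a short sketch. The function name and the sample values (4 KB block, 7200 RPM, 5 ms average seek, 1 Gb/s transfer rate, 0.1 ms controller overhead) are assumptions chosen for illustration, not figures from the card.

```python
def average_io_time_ms(seek_ms, rpm, transfer_bytes, rate_bits_per_s, overhead_ms):
    """Average I/O time in milliseconds for a single request."""
    # Average rotational latency is half of one full rotation (60/RPM seconds).
    rotational_latency_ms = 0.5 * (60.0 / rpm) * 1000.0
    # Time needed to move the data itself at the given transfer rate.
    transfer_ms = (transfer_bytes * 8) / rate_bits_per_s * 1000.0
    return seek_ms + rotational_latency_ms + transfer_ms + overhead_ms


# Assumed example values; the result is a little under 9.3 ms.
print(round(average_io_time_ms(5.0, 7200, 4096, 1e9, 0.1), 3))
```

With these numbers the positioning time (seek plus rotation) dominates; the transfer itself takes only about 0.03 ms.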
2
Q
A
3
Q

Nonvolatile memory devices

A
  • Can be drive-like (SSDs); also include USB drives
  • Can be more reliable than HDDs
  • More expensive
  • May have a shorter life span, so they need careful management
  • Less capacity
  • Much faster
  • Standard buses can be too slow for them -> connect directly to PCI
  • No moving parts, so no seek time or rotational latency
  • Read and written in page increments (the equivalent of sectors)
  • Can’t be overwritten in place
    • must be erased first, and erases happen in larger block increments
    • Limited number of erases before wearing out (about 100k)
    • Life span measured in drive writes per day (DWPD)
      • a 1 TB NAND drive rated at 5 DWPD is expected to sustain 5 TB written per day throughout the warranty period without failing.
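The DWPD arithmetic can be sketched as follows. The function names and the 5-year warranty length are assumptions for illustration; the card only gives the capacity and the rating.

```python
def daily_write_budget_tb(capacity_tb, dwpd):
    """Terabytes that may be written per day under the DWPD rating."""
    return capacity_tb * dwpd


def warranty_write_budget_tb(capacity_tb, dwpd, warranty_years):
    """Total terabytes the rating allows over the whole warranty period."""
    return daily_write_budget_tb(capacity_tb, dwpd) * 365 * warranty_years


print(daily_write_budget_tb(1, 5))        # 5 TB per day for the 1 TB, 5 DWPD drive
print(warranty_write_budget_tb(1, 5, 5))  # total over an assumed 5-year warranty
```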
4
Q

NAND Flash Controller Algorithms

A
  • Because pages cannot be overwritten in place, blocks often hold a mix of valid and invalid pages, and some blocks stop functioning.
  • Tracking valid logical blocks: the controller maintains a flash translation layer (FTL) table
  • The controller also implements garbage collection (GC) to free invalid page space
  • It allocates over-provisioning to provide working space for GC
  • Each cell has a limited lifespan, so wear leveling tries to spread erases evenly across all cells.
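A toy model can make the interplay of these mechanisms concrete. The class below is a heavily simplified sketch, not how any real controller works: `ToyFTL` and all its fields are hypothetical names, garbage collection only erases blocks that hold no valid pages (a real controller would first copy valid pages elsewhere), and wear leveling is reduced to keeping per-block erase counts.

```python
class ToyFTL:
    """Toy flash translation layer: logical page -> physical page, no overwrite in place."""

    def __init__(self, num_blocks, pages_per_block):
        self.pages_per_block = pages_per_block
        self.mapping = {}                                        # logical -> physical page
        self.state = ["free"] * (num_blocks * pages_per_block)   # free / valid / invalid
        self.erase_counts = [0] * num_blocks                     # wear per block

    def write(self, logical_page):
        # Out-of-place update: the old copy is marked invalid, not overwritten.
        old = self.mapping.get(logical_page)
        if old is not None:
            self.state[old] = "invalid"
        new = self.state.index("free")   # raises ValueError when no free page is left
        self.state[new] = "valid"
        self.mapping[logical_page] = new
        return new

    def garbage_collect(self):
        """Erase blocks that contain invalid pages and no valid ones."""
        reclaimed = 0
        for block in range(len(self.erase_counts)):
            start = block * self.pages_per_block
            pages = self.state[start:start + self.pages_per_block]
            if "invalid" in pages and "valid" not in pages:
                self.state[start:start + self.pages_per_block] = ["free"] * self.pages_per_block
                self.erase_counts[block] += 1   # erases are what wear the block out
                reclaimed += pages.count("invalid")
        return reclaimed


ftl = ToyFTL(num_blocks=2, pages_per_block=2)
for _ in range(3):
    ftl.write(0)                 # rewriting logical page 0 leaves stale copies behind
print(ftl.garbage_collect())     # number of invalid pages reclaimed
```

A wear-leveling policy would consult `erase_counts` when choosing where to place new writes, so that no block is erased far more often than the others.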
5
Q

Volatile memory

A
  • DRAM is often used as a mass storage device, even if technically it is not secondary storage
  • RAM drives present as raw block devices, commonly formatted with a file system.
  • Used as high-speed temporary storage
  • Computers already have buffering and caching via RAM, so why RAM drives?
    • Caches and buffers are managed by the programmer, OS, or hardware
    • RAM drives are under user control
      • available in all major OSes
        • Linux /dev/ram, macOS diskutil
6
Q

Address mapping

A

Disks are addressed as large 1-dimensional arrays of logical blocks:

  • logical block: smallest unit of transfer
  • Low-level formatting creates logical blocks on the physical media

The 1-dimensional array is mapped onto the sectors of the disk sequentially:

  • sector 0 is the first sector of the first track on the outermost cylinder
  • Mapping proceeds in this order...
  • Logical to physical address translation is easy
    • Except for bad sectors
    • Except for a non-constant number of sectors per track on drives that use constant angular velocity
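For the idealized case (no bad sectors, the same number of sectors on every track), the sequential mapping reduces to integer division. This sketch assumes the usual order of sectors within a track, then tracks within a cylinder, then cylinders from outermost (0) inward; `lba_to_chs` and the geometry values are hypothetical.

```python
def lba_to_chs(lba, heads, sectors_per_track):
    """Map a logical block address to (cylinder, head, sector), all zero-based.

    Assumes a constant number of sectors per track and no remapped bad sectors.
    """
    cylinder, rest = divmod(lba, heads * sectors_per_track)
    head, sector = divmod(rest, sectors_per_track)
    return cylinder, head, sector


# Hypothetical geometry: 4 heads (tracks per cylinder), 63 sectors per track.
print(lba_to_chs(0, 4, 63))    # first sector of the first track, outermost cylinder
print(lba_to_chs(63, 4, 63))   # first sector of the next track in the same cylinder
print(lba_to_chs(252, 4, 63))  # first sector of the next cylinder
```

Real drives break both assumptions, which is why the translation is left to the drive controller.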
7
Q

HDD Scheduling

A

The OS is responsible for using the hardware efficiently; for disk drives this means fast access time and high bandwidth.

There are many sources of I/O requests:

  • OS
  • System processes
  • User processes

I/O requests are made of:

  • mode: input or output
  • disk address
  • memory address
  • number of sectors to transfer

The OS maintains a queue of requests per disk or device; an idle device can start working on a request immediately, while a busy device leaves new requests waiting in the queue.

  • Optimization is only possible when a queue exists

In the past the operating system was also responsible for disk head scheduling; now it is built into the storage device controllers.

There are several possible algorithms to use with a request queue.

8
Q

Disk scheduling algo: FCFS

A

First-come, first-served: requests are serviced in arrival order (simple first in, first out).
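A minimal sketch of how total head movement is counted under FCFS. The request queue and the starting cylinder (53) are a commonly used textbook example and are an assumption here; the card itself gives no queue.

```python
def fcfs_head_movement(start, requests):
    """Total number of cylinders the head travels when serving requests in arrival order."""
    total, position = 0, start
    for cylinder in requests:
        total += abs(cylinder - position)
        position = cylinder
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]  # assumed example queue
print(fcfs_head_movement(53, queue))          # 640
```

The large swings (for example 183 to 37, then 122 to 14) are what the other algorithms try to avoid.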

9
Q

Disk scheduling algo: SCAN

A
  • AKA elevator algorithm.
  • The disk arm starts at one end of the disk and moves toward the other end, servicing requests as it goes; at the end of the disk the head movement is reversed and servicing continues.
  • Note that when the head reverses, the cylinders just behind it were serviced recently, so the densest set of pending requests is at the other end of the disk, and those requests wait the longest.
  • The illustration shows a total head movement of 208 cylinders for the given queue
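The queue from the illustration is not reproduced on the card, so the sketch below assumes a commonly used example (head at cylinder 53, moving toward 0, cylinders numbered 0 to 199). The function names and the `go_to_end` switch are hypothetical. Under these assumptions strict SCAN travels 236 cylinders, while reversing at the last pending request instead of at the disk edge (the LOOK variant) gives 208, which may be where the figure on the card comes from.

```python
def scan_path(start, requests, direction="down", first=0, last=199, go_to_end=True):
    """Cylinders visited by the head, in order, under SCAN."""
    lower = sorted((c for c in requests if c < start), reverse=True)
    upper = sorted(c for c in requests if c >= start)
    if direction == "down":
        turn = [first] if go_to_end else []   # strict SCAN travels to the disk edge
        return lower + turn + upper
    turn = [last] if go_to_end else []
    return upper + turn + lower


def head_movement(start, path):
    """Total number of cylinders traveled along the given path."""
    total, position = 0, start
    for cylinder in path:
        total += abs(cylinder - position)
        position = cylinder
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]  # assumed example queue
print(head_movement(53, scan_path(53, queue)))                   # 236
print(head_movement(53, scan_path(53, queue, go_to_end=False)))  # 208
```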
10
Q

Disk scheduling algo: C-SCAN

A
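  • Circular SCAN
  • Provides a more uniform wait time than SCAN: the head services requests in one direction only, then returns to the start without servicing any requests on the return trip.
  • Treats the cylinders as a circular list that wraps around from the last cylinder to the first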
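The wrap-around can be sketched as below, again assuming a commonly used example queue, a head at cylinder 53 moving upward, and cylinders 0 to 199. The function names are hypothetical, and the return sweep is counted as head movement here, which not every text does.

```python
def cscan_path(start, requests, first=0, last=199):
    """Cylinders visited under C-SCAN with the head moving toward higher cylinders."""
    upper = sorted(c for c in requests if c >= start)
    lower = sorted(c for c in requests if c < start)
    # Go to the last cylinder, jump back to the first, then continue upward.
    wrap = [last, first] if lower else []
    return upper + wrap + lower


def head_movement(start, path):
    """Total number of cylinders traveled along the given path."""
    total, position = 0, start
    for cylinder in path:
        total += abs(cylinder - position)
        position = cylinder
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]  # assumed example queue
path = cscan_path(53, queue)
print(path)
print(head_movement(53, path))  # 382, including the return sweep
```

Requests at low cylinders (14 and 37) are served right after the wrap instead of waiting for a full reverse sweep, which is what evens out the wait times.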
11
Q

Choosing disk scheduling algo

A
  • Shortest seek time first (SSTF, greedy) is common
  • SCAN and C-SCAN perform better for systems that place a heavy load on the disks
  • Linux implements a deadline scheduler:
    • Maintains separate queues for reads and writes
      • reads have priority
    • Implements four queues: 2 x read, 2 x write
      • 1 read and 1 write queue sorted in LBA order (basically C-SCAN)
      • 1 read and 1 write queue sorted in FCFS order
      • All I/O requests are sent in batches, sorted in the order of the selected queue
      • After each batch, it checks whether any request in the FCFS queues is older than the maximum age (usually 500 ms)
        • if so, the LBA queue containing that request is selected for the next batch of I/O
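The four-queue idea can be illustrated with a toy model. This is a loose sketch of the behavior described above, not the Linux implementation: `ToyDeadlineScheduler`, its methods, and the batch size are hypothetical, and time is passed in explicitly as milliseconds.

```python
import bisect
from collections import deque


class ToyDeadlineScheduler:
    """Per direction: one FCFS queue and one LBA-sorted queue."""

    def __init__(self, max_age_ms=500, batch_size=2):
        self.max_age_ms = max_age_ms
        self.batch_size = batch_size
        self.fifo = {"read": deque(), "write": deque()}   # (arrival_ms, lba)
        self.by_lba = {"read": [], "write": []}           # sorted LBAs

    def submit(self, kind, lba, now_ms):
        self.fifo[kind].append((now_ms, lba))
        bisect.insort(self.by_lba[kind], lba)

    def _expired(self, kind, now_ms):
        queue = self.fifo[kind]
        return bool(queue) and now_ms - queue[0][0] > self.max_age_ms

    def next_batch(self, now_ms):
        # A direction whose oldest request is past the max age wins;
        # otherwise reads have priority over writes.
        kind = next((k for k in ("read", "write") if self._expired(k, now_ms)), None)
        start = 0
        if kind is not None:
            # Start the batch at the expired request's position in LBA order.
            start = self.by_lba[kind].index(self.fifo[kind][0][1])
        else:
            kind = "read" if self.by_lba["read"] else "write"
        batch = self.by_lba[kind][start:start + self.batch_size]
        del self.by_lba[kind][start:start + self.batch_size]
        # Remove the dispatched requests from the FCFS queue as well.
        pending, kept = list(batch), deque()
        for arrival, lba in self.fifo[kind]:
            if lba in pending:
                pending.remove(lba)
            else:
                kept.append((arrival, lba))
        self.fifo[kind] = kept
        return kind, batch


s = ToyDeadlineScheduler()
s.submit("write", 900, now_ms=0)
for lba in (50, 10, 30):
    s.submit("read", lba, now_ms=100)
print(s.next_batch(now_ms=200))  # reads go first, in LBA order
print(s.next_batch(now_ms=600))  # the write is now older than 500 ms
```

The FCFS queues exist only to detect starvation; the actual dispatch order always comes from the LBA-sorted queues.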
12
Q

NVM scheduling

A
  • No disk heads or rotational latency, but there is still room for optimization
  • NOOP scheduling (no reordering) is used, but adjacent LBA requests are combined
    • NVM is best at random I/O, HDD is best at sequential I/O
    • Sequential throughput can be similar
    • Many more I/O operations per second (IOPS), roughly 1000x
    • Write amplification: one write can cause many reads and writes because of garbage collection
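Combining adjacent LBA requests can be sketched as follows. The `(start_lba, length)` request format and the function name are assumptions, and the requests are sorted first for simplicity, whereas a real NOOP scheduler keeps arrival order and only merges neighbors.

```python
def merge_adjacent(requests):
    """Merge requests whose LBA ranges are exactly contiguous.

    requests: list of (start_lba, length) tuples; returns a merged, sorted list.
    """
    merged = []
    for start, length in sorted(requests):
        if merged and merged[-1][0] + merged[-1][1] == start:
            prev_start, prev_length = merged[-1]
            merged[-1] = (prev_start, prev_length + length)  # extend previous request
        else:
            merged.append((start, length))
    return merged


print(merge_adjacent([(8, 8), (0, 8), (32, 4)]))  # [(0, 16), (32, 4)]
```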
13
Q

Storage device management: low level formatting

A

Low-level formatting (physical formatting) is the process of dividing the disk into sectors that the disk controller can read and write.

Each sector holds a header, a data area, and an error-correcting code (ECC).

The data area is usually 512 bytes.

14
Q

Storage device management: partitions, logical formatting

A

The OS uses the disk to hold files and its own data structures.

A partition is a group of cylinders; each partition is treated as a separate logical disk.

Logical formatting means creating a file system.

File systems usually group blocks into clusters to increase efficiency:

  • Disk I/O: block level
  • File I/O: cluster level
15
Q

Storage device management: boot block

A
  • The root partition contains the OS; other partitions can hold other OSes, other file systems, or be raw
    • Mounted at boot time
  • At mount time, file system consistency is checked
    • if there is an error -> try to fix it
  • Boot block initializes the system:
    • A small bootstrap is stored in ROM
    • The bootstrap loader is stored in the boot blocks of the boot partition
  • The boot block can point to
    • Boot volume
    • Boot loader: set of blocks with code to load the kernel
    • Boot management program: for multi-OS support
16
Q

Storage device management: sector sparing, raw disk access

A

Sector sparing is a method to manage bad blocks: the controller logically replaces a bad sector with one from a set of spare sectors.

Raw access to the disk may be desirable for applications that want to manage their own blocks (e.g. databases).

17
Q

Swap-space management

A
  • Used for moving entire processes or individual pages from DRAM to secondary storage when DRAM is too full.
  • The OS provides swap-space management:
    • secondary storage is slower than DRAM -> optimizing performance is important
    • swap space can be on a raw partition or in a file within a file system.
18
Q

Storage attachment

A
  • Computers access storage in 3 ways:
    • host-attached:
      • through local I/O ports, using technologies such as USB or Fibre Channel (FC).
    • network-attached storage (NAS):
      • made available over a network
      • NFS and CIFS are the common protocols
      • implemented through remote procedure calls (RPCs) over TCP or UDP
    • cloud
      • API based: programs use the APIs to access the storage
19
Q

Storage array

A
  • Can attach disks or arrays of disks
  • Has controller(s) that provide services to the attached hosts:
    • ports
    • memory and controlling software
    • RAID
    • shared storage
    • extra features: snapshots, clones, replication
  • Faster than NAS, because NAS traffic competes for network bandwidth
20
Q

Storage area network

A

Multiple hosts are connected to multiple storage arrays via Fibre Channel.

Hosts can attach to switches.

Storage is easy to add and remove, and easy to allocate to a new machine on the network.

21
Q

RAID

A
  • redundant array of inexpensive disks
    • reliability provided by redundancy
  • increases the mean time to failure
  • Mean time to repair: exposure time during which another failure could cause data loss
  • Mean time to data loss: combines the two factors above
  • Example:
    • Mean time to failure (MTTF) of a single disk = 100k h
    • Mean time to repair (MTTR) = 10 h
    • Mirrored RAID: 2 disks
    • Mean time to the first failure among the 2 disks = MTTF/2 = 50k h
    • Probability of the second disk failing during repair = MTTR/MTTF = 10/100k = 10^-4
    • Mean time to data loss (MTTDL) = (MTTF/2) / (MTTR/MTTF) = MTTF^2/(2*MTTR) = 500 * 10^6 h (about 57,000 years)
  • can be combined with NVRAM to improve performance
  • striping uses a group of disks as one storage unit
  • mirroring: duplicates data
  • Other features:
    • Snapshot
    • Hot spare disk
  • Extension
    • RAID doesn’t prevent or detect data corruption, only disk failures
    • Solaris ZFS extends it with checksums
      • can detect and correct data/metadata corruption
    • ZFS doesn’t use separate volumes or partitions; disks are allocated in pools
      • file systems take and release space from the pool, like malloc and free
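The mirrored-disk MTTDL example from this card can be checked numerically. The function name is hypothetical, and the formula assumes the two disks fail independently of each other.

```python
def mttdl_mirrored_hours(mttf_hours, mttr_hours):
    """Mean time to data loss for a 2-disk mirror, assuming independent failures."""
    # First failure among two disks happens after MTTF/2 on average; data is lost
    # only if the second disk also fails during the repair window (MTTR/MTTF).
    return mttf_hours ** 2 / (2 * mttr_hours)


hours = mttdl_mirrored_hours(100_000, 10)
print(hours)               # 500000000.0 hours
print(hours / (24 * 365))  # roughly 57,000 years
```

In practice failures are often correlated (same batch, same power event), so the real figure tends to be much lower than this estimate.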