Physical Storage (2020) Flashcards

1
Q

Data Storage:

Major Topics

A
  • Levels of Storage
  • Evaluating Storage
  • Magnetic Disk Physical Components
  • Data Organization
  • RAID
  • Techniques
  • Issues
2
Q

Physical Storage:

Storage Levels

A
  • Primary
    • Cache
    • Main Memory
  • Secondary
    • Flash Memory
    • Magnetic Disk
  • Tertiary
    • Optical Disk
    • Magnetic Tapes
3
Q

Primary Storage

Devices

A
  • Cache
  • Main Memory
4
Q

Secondary Storage

Devices

A
  • Flash Memory (SSD)
  • Magnetic Disk (Hard Drive)
5
Q

Tertiary Storage

Devices

A
  • Optical Disk
  • Magnetic Tapes

(Basically any external, sturdy storage)

6
Q

Storage Devices:

Cache Overview

A
  • Primary Storage Level
  • Fastest form of storage
  • Volatile - only used temporarily
  • Managed by the computer system hardware
  • Typically multiple levels of cache
7
Q

Storage Devices:

Main Memory Overview

A
  • Primary Storage Level
  • Fast Access
    • 10s to 100s of nanoseconds
  • Generally too small/expensive to store entire databases
  • Typically RAM
  • Volatile
    • Usually lost if power is lost
8
Q

Storage Devices:

Flash Memory Overview

A
  • Secondary Storage Level
  • Reads are roughly as fast as main memory
  • Non-volatile
  • Limited number of write/erase cycles (10k - 1M)
    • When erasing, has to wipe entire block of memory
  • Write is SLOW (microseconds)
    • Erase is slower
  • USB sticks, cameras, phones, etc
9
Q

Storage Devices:

Magnetic Disk Overview

A
  • Secondary Storage Level
  • Non-volatile
    • But disk failure can still destroy data
  • Stored on spinning disk
  • Reads/writes magnetically
  • Primary means of long term storage for databases
  • Data must be moved to main memory for read/write (VERY SLOW compared to memory access)
  • Direct access: data can be read in any order
  • Rather cheap and large amounts of storage
10
Q

Storage Devices:

Optical Storage Overview

A
  • Tertiary Storage Level
  • Non-volatile
  • Read from physical disk using a laser
  • CD, DVD and Blu-Ray most popular forms
    • CD has the smallest capacity
  • Some are write once, read many - (CD-R)
  • Some are many writes, many reads - (CD-RW)
  • Slower than magnetic disk
  • “Jukebox” systems were used to store and swap large numbers of disks
11
Q

Storage Devices:

Tape Storage Overview

A
  • Tertiary Storage Level
  • Non-volatile
    • Backup and archival data
  • Sequential access
    • Extremely slow
  • Very High capacity
  • Tape jukeboxes can store petabytes of data
12
Q

Magnetic Disk:

Components

A
  • Platter (disks)
    • Divided into circular “Tracks”
    • Tracks broken into “Sectors”
  • Spindle
  • Read-Write Head
  • Arm Assembly
13
Q

Magnetic Disks:

Read/Write Head

A
  • Very close to the platter, almost touching
  • Reads and writes data
14
Q

Magnetic Disk:

Platter

A
  • Disk is split into multiple “Platters”
  • Each platter is divided into circular Tracks, like lanes
    • About 50K-100K tracks per platter
  • Tracks are broken into Sectors, chunks of lanes
    • Smallest unit of data that can be read or written
    • Typically 512 bytes
    • Outer tracks hold more sectors than inner tracks
15
Q

Magnetic Disk:

Reading and Writing

A
  • Reads/Writes accomplished via the Read/Write Head
  • A checksum is stored with each sector when it is written
  • After a write, the sector is read back and checked against its checksum
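
The write-then-verify idea can be sketched as follows. This is only an illustration: the dictionary standing in for a disk, the function names, and the choice of CRC-32 as the checksum are all assumptions, not details from the cards.

```python
import zlib

def read_sector(disk, sector_no):
    """Return the sector's data, or raise if it no longer matches its checksum."""
    data, stored = disk[sector_no]
    if zlib.crc32(data) != stored:
        raise IOError(f"checksum mismatch in sector {sector_no}")
    return data

def write_sector(disk, sector_no, data):
    """Store the data with its checksum, then read it back to verify the write."""
    disk[sector_no] = (data, zlib.crc32(data))
    return read_sector(disk, sector_no) == data

disk = {}  # hypothetical in-memory stand-in for a platter
ok = write_sector(disk, 0, b"hello")
```

If the stored bytes later change without the checksum changing, `read_sector` raises instead of returning bad data.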
16
Q

Magnetic Disk:

Disk Subsystem Overview

A
  • Multiple Disks are connected to a computer through a main controller
    • Controller manages the “big picture”
    • Individual disks usually handle checksums, etc
17
Q

Magnetic Disk:

Types of Disk Subsystems

A
  • SAN - Storage Area Networks
    • Connected via high speed network to servers
  • NAS - Network Attached Storage
    • Uses network file system protocol
    • Allows for storage like a file system
18
Q

Evaluating Storage:

Concerns/Factors

A
  • Access Time
  • Data Transfer Rate
  • MTTF (Mean Time To Failure)
19
Q

Storage Evaluation:

Access Time

A

The time it takes from issuing read/write command to when data transfer actually begins.

Factors:

  • Seek Time
    • Time to position arm over correct track
    • ~4-10 ms
  • Rotational Latency
    • Time it takes for sector to appear under head
    • ~4-11 ms, depending on how fast disk spins
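
A rough way to check the latency figures: one full rotation takes 60,000 / RPM milliseconds, and on average the sector is half a rotation away. The function names and the example seek time below are illustrative assumptions.

```python
def rotation_time_ms(rpm):
    """Time for one full rotation, in milliseconds (worst-case rotational latency)."""
    return 60_000 / rpm

def average_access_time_ms(avg_seek_ms, rpm):
    """Average access time = average seek time + half a rotation."""
    return avg_seek_ms + rotation_time_ms(rpm) / 2

fast = rotation_time_ms(15_000)   # 4.0 ms per rotation
slow = rotation_time_ms(5_400)    # about 11.1 ms per rotation
typical = average_access_time_ms(avg_seek_ms=4, rpm=15_000)  # 6.0 ms
```

The 4 ms and 11 ms ends of the range correspond to 15,000 RPM and 5,400 RPM disks.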
20
Q

Storage Evaluation:

Data Transfer Rate

A

The rate at which data can be stored or retrieved

  • Depends on the controller rate
    • The connection type (SATA vs Fibre Channel) may limit the rate if multiple disks share one controller
21
Q

Storage Evaluation:

MTTF

A

Mean Time To Failure

  • Average time that a disk is expected to run continuously without any failure
  • Typically 3-5 years
  • Decreases as the disk ages
  • If the MTTF is 1,200,000 hours:
    • Given 1000 new disks, on average one will fail every 1200 hours
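
The arithmetic behind that example, as a small sketch. It assumes the disks fail independently and at a constant rate, which is the simplification the MTTF figure itself makes; the function name is illustrative.

```python
def hours_between_failures(mttf_hours, num_disks):
    """Expected time between failures across a population of identical disks."""
    return mttf_hours / num_disks

gap_hours = hours_between_failures(1_200_000, 1000)  # 1200.0 hours
gap_days = gap_hours / 24                            # 50.0 days
```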
22
Q

Block:

Definition and overview

A

A Block is a contiguous sequence of sectors from a single track

  • Data is transferred from disk to memory in blocks
  • Smaller blocks = more reads from disk
  • Larger blocks = more wasted space
  • The Elevator Algorithm is used to schedule reads and writes
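
A minimal sketch of elevator scheduling, assuming the usual formulation: the arm keeps moving in one direction, servicing every pending request it passes, then reverses. The function name and the track numbers are made up for illustration.

```python
def elevator_order(head, requests, moving_up=True):
    """Return the order in which pending track requests are serviced."""
    if moving_up:
        # Sweep toward higher tracks first, then come back down.
        first = sorted(t for t in requests if t >= head)
        second = sorted((t for t in requests if t < head), reverse=True)
    else:
        # Sweep toward lower tracks first, then go back up.
        first = sorted((t for t in requests if t <= head), reverse=True)
        second = sorted(t for t in requests if t > head)
    return first + second

order_up = elevator_order(50, [10, 95, 52, 30, 70])
order_down = elevator_order(50, [10, 95, 52, 30, 70], moving_up=False)
```

Compared with serving requests in arrival order, this avoids moving the arm back and forth across the platter.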
23
Q

File Organization

Overview

A
  • Related information is stored nearby
  • Files may get fragmented over time:
    • Parts of file deleted
    • Free blocks are scattered, a new file is scattered
    • Increases seek time
  • Defragmenting a hard drive can improve speeds
24
Q

Non-volatile Write Buffers

A
  • Basic Idea:
    • Write blocks to battery backed up RAM or flash memory BEFORE writing to disk
  • Controller can write to disk when it has nothing else to do, or a task has been in RAM for a while
  • Database operations can continue without waiting for data to be written to disk
  • Write orders can be optimized before going to disk
25
Q

Storage Strategies

A
  • RAID
    • Redundant Arrays of Independent Disks
  • Mirroring
    • Duplicate every disk
  • Parallelism
    • Improve transfer rate by “striping” data across multiple disks
26
Q

RAID

Overview

A
  • Redundant Arrays of Independent Disks
  • Manage a large number of disks, but provide the view of a single disk
    • High speed and capacity by utilizing multiple disks in parallel
    • High reliability by storing data redundantly
  • Originally a cost effective method, as opposed to large, expensive disks
    • The “I” originally stood for “inexpensive”
  • Today, used for higher reliability and bandwidth
27
Q

Storage Strategies:

Mirroring

A
  • Duplicate every disk
  • Writes occur to both disks
  • Reads can use either disk
  • If one disk fails, the system is still operational
    • Only considered to fail if both go down simultaneously
    • Can repair or replace the one that failed
    • Probability of both going offline at same time is very low
    • This assumes the failures are independent; outside factors can take out both disks at once
      • Fires
      • Collapsed buildings
      • Disasters
28
Q

Storage Strategies:

Parallelism

A
  • Load balance multiple small accesses to increase throughput
  • Parallelize large accesses to reduce response time
  • We can improve the transfer rate by striping data across multiple disks
  • Types of Striping:
    • Bit Striping
    • Block Striping
29
Q

Storage Strategies:

Parallelism:

Bit Striping

A
  • Idea:
    • Split the bits of a byte between multiple disks
  • Suppose 8 disks:
    • Write bit i of byte to disk i
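    • Each access can transfer data at 8X the rate of a single disk
    • Seek/access time is no better (slightly worse) than a single disk, because every disk must take part in every access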
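
A small sketch of the 8-disk case. The lists standing in for disks, the function names, and the choice of disk 0 for the least significant bit are assumptions made for illustration.

```python
NUM_DISKS = 8

def stripe_bits(data):
    """Write bit i of every byte to disk i (disk 0 holds the least significant bit)."""
    disks = [[] for _ in range(NUM_DISKS)]
    for byte in data:
        for i in range(NUM_DISKS):
            disks[i].append((byte >> i) & 1)
    return disks

def read_bits(disks):
    """Reassemble the bytes by reading one bit from each of the 8 disks."""
    length = len(disks[0])
    return bytes(
        sum(disks[i][pos] << i for i in range(NUM_DISKS))
        for pos in range(length)
    )

disks = stripe_bits(b"RAID")
restored = read_bits(disks)
```

Every byte read touches all 8 disks, which is why the transfer rate goes up but the number of independent accesses does not.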
30
Q

Storage Strategies:

Parallelism:

Block Striping

A
  • Idea:
    • Write blocks to individual disks, instead of bits
  • Requests for different blocks can run in parallel
  • A request for a long sequence of blocks can be run in parallel
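
One common way to place blocks is round-robin, sketched below. The cards do not give a placement formula, so the mapping, the 0-based disk numbering, and the function names are assumptions.

```python
def locate_block(logical_block, num_disks):
    """Map a logical block to (disk index, block offset on that disk)."""
    return logical_block % num_disks, logical_block // num_disks

def disks_for_range(first_block, count, num_disks):
    """Disks touched by a request for `count` consecutive logical blocks."""
    return sorted({
        locate_block(b, num_disks)[0]
        for b in range(first_block, first_block + count)
    })

where = locate_block(5, 4)           # (1, 1): second disk, second block on it
all_busy = disks_for_range(0, 4, 4)  # [0, 1, 2, 3]: all four disks work in parallel
```

A single-block request uses one disk, leaving the others free for other requests; a long sequential request spreads across all of them.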
31
Q

Storage Strategies:

RAID:

RAID Levels Overview

A
  • Combines mirroring and striping
  • Adds Parity
  • Each level has different performance and reliability:

Levels:

  • RAID 0
    • Block Striping, no redundancy
  • RAID 1+0
    • Block Striping, Mirrored Disks
  • RAID 2
    • Bit Striping, uses error-correcting codes (several parity bits)
  • RAID 3
    • Bit Striping, uses parity bit
  • RAID 4
    • Block Striping, uses parity block
  • RAID 5
    • Block Striping, distributed parity block
  • RAID 6
    • Block Striping, distributed redundant bits
32
Q

RAID:

RAID 0

A
  • Block Striping
  • No redundancy
  • Used in high performance, where data loss is not critical
33
Q

RAID:

RAID 1 + 0

A
  • Block Striping
  • Uses Mirrored disks for redundancy
  • Best Write performance
  • Often used for log files or database systems
34
Q

RAID:

RAID 2

A
  • Bit Striping
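  • Uses error-correcting codes (several parity bits), as in memory systems
  • Each parity bit counts the set bits it covers: Parity=0 if the count is even
  • Often used for systems where you need to ensure the data is not corrupt
  • Needs 3 extra disks for every 4 data disks
  • Faster transfer rate, but fewer I/Os per second, since every disk takes part in every I/O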
35
Q

RAID:

RAID 3

A
  • Bit Striping
  • Uses Parity Bit
  • Unlike RAID 2, only needs a single disk for parity
    • Cheaper than RAID 2
  • If a bit is lost, then the parity of the remaining bits is calculated
    • If it matches parity bit, the lost bit is 0
    • Otherwise it is 1
  • Like RAID 2, faster transfer rate, but fewer I/Os per second, since every disk takes part in every I/O
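
The recovery rule can be sketched with even parity as follows. The function names and the sample bits are illustrative.

```python
def parity(bits):
    """Even parity: 0 if the number of set bits is even, 1 if it is odd."""
    return sum(bits) % 2

def recover_lost_bit(surviving_bits, parity_bit):
    """If the survivors' parity matches the stored parity bit, the lost bit was 0."""
    return 0 if parity(surviving_bits) == parity_bit else 1

data = [1, 0, 1, 1]
p = parity(data)  # three set bits, so the parity bit is 1

# Pretend the disk holding data[1] failed and rebuild its bit.
rebuilt = recover_lost_bit([1, 1, 1], p)  # 0
```

This only works when the system knows which disk failed, which is why a single parity disk is enough.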
36
Q

RAID:

RAID 4

A
  • Block Striping
  • Uses Parity Block
  • Only needs a single disk for parity block
  • Faster for large reads and writes
  • Slower for very small reads and writes
37
Q

RAID:

RAID 5

A
  • Block Striping
  • Distributed Parity Block
    • Stores the parity block for the i-th set of blocks on:
      • disk (i mod n) + 1,
      • where n is the number of disks
  • Subsumes RAID level 4
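
A sketch of the placement rule, reading i as the index of a stripe (one set of blocks across the disks) and numbering disks from 1 to n. That reading and the function names are assumptions.

```python
def parity_disk(stripe, num_disks):
    """Disk (numbered 1..n) that holds the parity block for the given stripe."""
    return (stripe % num_disks) + 1

def stripe_layout(stripe, num_disks):
    """Mark each disk as holding parity ("P") or data ("D") for the stripe."""
    p = parity_disk(stripe, num_disks)
    return ["P" if d == p else "D" for d in range(1, num_disks + 1)]

layout = stripe_layout(1, 5)                         # ['D', 'P', 'D', 'D', 'D']
rotation = [parity_disk(s, 5) for s in range(6)]     # [1, 2, 3, 4, 5, 1]
```

Because the parity block rotates across the disks, no single disk becomes the bottleneck for writes, unlike the dedicated parity disk in RAID 4.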
38
Q

RAID:

RAID 6

A
  • Block Striping
  • Distributed Redundant Bits
    • For every 4 bits of data, stores 2 bits of redundant information, so it can survive more than one disk failure
  • Better reliability than RAID 5, but costs more due to extra space
  • Not widely used
39
Q

RAID:

Considerations when choosing a RAID Level

A
  • Monetary Costs
  • Performance
    • Number of I/O operations per second
  • Performance during failure
  • Performance during rebuild
40
Q

RAID:

Choosing a RAID Level:

General Guidelines

A
  • RAID 0 is used when data safety isn’t important
  • RAID 2 and 4 are never used
    • Replaced by RAID 3 and 5
  • RAID 3 is not used much, because bit striping forces all disks to operate for a single block of data
  • RAID 6 is rarely used because RAID 1 and RAID 5 offer adequate protection against data loss
41
Q

RAID:

Choosing a RAID Level:

Comparison of RAID 1 and 5

A
  • RAID 1 provides better write performance than 5
    • RAID 5 requires at least 2 block reads and 2 block writes to write a single block
    • RAID 1 only requires 2 block writes
    • RAID 1 preferred for frequently updated environments
  • RAID 1 used to have a higher storage cost than RAID 5
    • Disk capacity is increasing rapidly
    • Once there are enough disks to satisfy the I/O rate, they often have spare capacity that can hold the mirror copies
  • RAID 5 preferred for low update rates and large amounts of data
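
Why a RAID 5 small write costs 2 reads and 2 writes can be sketched with XOR parity. This assumes parity is the byte-wise XOR of the data blocks in a stripe; the helper names and block contents are illustrative.

```python
from functools import reduce

def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def raid5_small_write(old_data, old_parity, new_data):
    """Return the new parity block after overwriting one data block.

    Cost: read old data + read old parity (2 reads), then
    write new data + write new parity (2 writes).
    """
    return xor_blocks(old_parity, old_data, new_data)

a, b, c = b"\x0f\x10", b"\xf0\x01", b"\x33\x44"
old_parity = xor_blocks(a, b, c)

new_a = b"\xaa\xbb"
new_parity = raid5_small_write(a, old_parity, new_a)
```

The other data blocks (`b` and `c` here) are never read, yet the updated parity is the same as recomputing it from scratch. RAID 1 skips all of this and simply writes the block to both disks.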
42
Q

Data Storage:

Latent Failures

A
  • Data that was written successfully is later found to be corrupted or unreadable
  • Data Scrubbing:
    • Continuously scan for latent failures
  • Many systems keep spare disks online to swap out failed disks quickly
  • Redundant power supplies, multiple controllers, etc