DBMS - Persistence Flashcards

1
Q

HDD

A
  • Offer data persistence
  • Inexpensive but fragile
  • Offer direct and sequential access, but with slow access speeds and limited bandwidth
  • Organization is block-based

2
Q

High level components of disk drive

A

Controller, Memory, Recording Channel, Actuator VCM (voice coil motor) control, Spindle motor control

3
Q

Multi Zoning

A

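Outer tracks on an HDD platter hold more sectors than inner tracks because their circumference is larger. Tracks are grouped into zones, each with a fixed number of sectors per track.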

4
Q

Sector

A

Smallest unit of data that can be read or written to a disk

5
Q

Cluster

A

Smallest unit of data that a file system can allocate to a file. Each cluster has a fixed size that is a multiple of the sector size.

A file is stored optimally as a series of contiguous clusters.

When a file is split into multiple non-contiguous fragments, we have external fragmentation.

6
Q

Track

A

Concentric ring of sectors on a platter. The R/W head can read all the data on a track by moving to one position and letting the platter rotate.

7
Q

Cylinder

A

Set of tracks at the same radius on all platter surfaces, i.e. the tracks stacked on top of each other.

8
Q

Rotational latency

A

On average, half the time of one complete rotation of the platter.

9
Q

Hard disk IO timings

A

Time of IO: T_IO = T_seek + T_rotation + T_transfer
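
As a worked example of this sum, the sketch below adds up the three components for a single small read. The drive figures (4 ms seek, 15000 RPM, 125 MB/s) and the function name `io_time_ms` are illustrative assumptions, not values from these cards; the rotation term is taken as half a turn on average.

```python
def io_time_ms(seek_ms: float, rpm: float, size_kb: float, transfer_mb_s: float) -> float:
    """Total IO time in ms: seek + average rotational delay + transfer."""
    rotation_ms = 0.5 * 60_000.0 / rpm  # average wait: half of one rotation
    transfer_ms = size_kb / 1024.0 / transfer_mb_s * 1000.0  # KB -> MB -> s -> ms
    return seek_ms + rotation_ms + transfer_ms


# Hypothetical drive: 4 ms seek, 15000 RPM, 125 MB/s sustained transfer.
t_io = io_time_ms(seek_ms=4.0, rpm=15000, size_kb=4.0, transfer_mb_s=125.0)
print(f"T_IO for one 4 KB read: {t_io:.2f} ms")
```

For a small read like this one, the seek and rotation terms dominate; the transfer term contributes only a few hundredths of a millisecond.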

10
Q

Rate of IO

A

Rate of IO is the size of the data transferred divided by the time the IO takes.
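
To make the ratio concrete, here is a minimal sketch comparing the effective rate of one small direct (random) read with that of a long sequential stream. The positioning delay, transfer rate, and sizes are hypothetical figures chosen for illustration, as is the name `io_rate_mb_s`.

```python
def io_rate_mb_s(size_mb: float, io_time_s: float) -> float:
    """Rate of IO: amount of data moved divided by the total time it took."""
    return size_mb / io_time_s


# Hypothetical drive: about 6 ms of seek + rotation per access, 125 MB/s transfer.
POSITIONING_S = 0.006
TRANSFER_MB_S = 125.0

# Direct (random) access: one 4 KB block per positioning delay.
block_mb = 4 / 1024
random_rate = io_rate_mb_s(block_mb, POSITIONING_S + block_mb / TRANSFER_MB_S)

# Sequential stream: one positioning delay, then 100 MB transferred in a row.
stream_mb = 100.0
sequential_rate = io_rate_mb_s(stream_mb, POSITIONING_S + stream_mb / TRANSFER_MB_S)

print(f"random: {random_rate:.2f} MB/s, sequential: {sequential_rate:.1f} MB/s")
```

Under these assumptions the sequential stream runs close to the raw transfer rate, while the random read achieves well under 1 MB/s because almost all of its time goes to positioning.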

11
Q

Cluster typical size

A

4096 bytes

12
Q

Why is the track important

A

It holds all the sectors on one platter surface that can be read without moving the actuator.

13
Q

Why is the cylinder important

A

Total storage accessible for R/W without moving the actuator, i.e. only one seek is required. There are as many cylinders as there are tracks on a single surface.

14
Q

Fastest way to read blocks

A

In a sequential stream as opposed to direct mode.

15
Q

Track skew

A

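The angular offset between the first sectors of adjacent tracks should be just greater than the rotation that takes place during the seek to the next track. Sequential scans that cross track or cylinder boundaries then avoid an extra rotational delay.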
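
A rough way to size the skew, assuming the time to switch to the next track is known: count how many sectors pass under the head during the switch and add one. The function name `track_skew_sectors` and the drive figures are hypothetical, and "floor plus one" is just one reading of "just greater than".

```python
import math


def track_skew_sectors(switch_ms: float, rpm: int, sectors_per_track: int) -> int:
    """Smallest whole-sector offset strictly larger than the rotation
    that happens while the head switches to the next track."""
    # Sectors passing under the head during the switch:
    # switch time (ms) * rotations per ms * sectors per rotation.
    sectors_passed = switch_ms * rpm * sectors_per_track / 60_000
    return math.floor(sectors_passed) + 1


# Hypothetical drive: 1 ms track-to-track switch, 7200 RPM, 500 sectors per track.
print(track_skew_sectors(1.0, 7200, 500))  # 61
```

With a skew of 61 sectors, the first sector of the next track arrives under the head just after the switch completes, instead of the drive waiting almost a full rotation for it.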

16
Q

Interleaving

A

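Consecutive logical sectors are spaced apart on the track. The jump between them should be just greater than the time required to transfer one sector, so the next sector arrives under the head just as the controller is ready for it.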

17
Q

Average disk seek time

A

1/3 of full seek time
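
The 1/3 figure can be checked with a small exact calculation. The sketch assumes that the start and target tracks are chosen independently and uniformly, and that seek time is proportional to the distance travelled; real drives only approximate both. The function name is made up for this example.

```python
from fractions import Fraction


def avg_seek_fraction(num_tracks: int) -> Fraction:
    """Average seek distance between two uniformly chosen tracks,
    expressed as a fraction of the full-stroke distance."""
    n = num_tracks
    total = sum(abs(i - j) for i in range(n) for j in range(n))
    avg_distance = Fraction(total, n * n)
    return avg_distance / (n - 1)  # full stroke spans n - 1 tracks


ratio = avg_seek_fraction(500)
print(ratio, float(ratio))  # 167/500 0.334
```

The ratio works out to (N + 1) / (3N) for N tracks, which approaches 1/3 as the number of tracks grows.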

18
Q

Generic Disk Requirements (Data servers)

A

High RPM, low seek time, high transfer bandwidth

19
Q

Generic Disk Requirements (PC)

A

Capacity and low cost

20
Q

Generic Disk Requirements (Laptop)

A

Sturdy and low power consumption

21
Q

How to read HDD numbers

A
  • Read disk parameters like Transfer size, seek time, RPM, Transfer Rate, Cache
  • Controller overhead is 2ms
  • If disk is idle it has no queue delay
  • Avg. disk access time for a sector = avg. seek + avg. rotational delay + transfer time + controller overhead
  • Advertised seek time assumes no locality, actual typically 1/4 advertised time
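
The bullets above can be turned into a short calculation. This is a sketch for an idle disk (no queue delay); the 5 ms advertised seek, 10000 RPM, 512-byte sector, and 50 MB/s figures are hypothetical, while the 2 ms controller overhead and the 1/4 locality factor come from the card.

```python
def avg_access_time_ms(seek_ms: float, rpm: float, sector_bytes: int,
                       transfer_mb_s: float, controller_ms: float = 2.0) -> float:
    """Average access time for one sector on an idle disk, in ms."""
    rotation_ms = 0.5 * 60_000.0 / rpm  # average rotational delay
    transfer_ms = sector_bytes / (transfer_mb_s * 1_000_000) * 1000.0
    return seek_ms + rotation_ms + transfer_ms + controller_ms


ADVERTISED_SEEK_MS = 5.0  # hypothetical data-sheet value

advertised = avg_access_time_ms(ADVERTISED_SEEK_MS, 10_000, 512, 50.0)
with_locality = avg_access_time_ms(ADVERTISED_SEEK_MS / 4, 10_000, 512, 50.0)

print(f"advertised seek: {advertised:.2f} ms, with locality: {with_locality:.2f} ms")
```
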
22
Q

Comparing Rate of Growth in Capacity with rate of progress in seek

A

Continued advances in capacity and bandwidth, but only slow improvement in seek and rotation times.

23
Q

Possibilities for HDDs

A

Double and independent actuators (two RW heads on each platter). Very difficult as heads must be aligned together

24
Q

ATA Controller

A

Uses multiple buses

25
Q

SATA

A

Separate connection for each drive

26
Q

SCSI

A

Uses one bus for different drives

27
Q

Journey of the Byte

A
  1. Program asks the OS to write the byte to the next available position in the file TEXT.
  2. OS passes the job to the file manager.
  3. File manager looks up TEXT in its table and checks the access rights and the corresponding physical file.
  4. File manager searches the file allocation table for the physical location of the sector.
  5. File manager makes sure the last sector of the file is stored in a system IO buffer in RAM.
  6. File manager tells the IO processor where the byte is in RAM and where its destination on disk is.
  7. IO processor waits until the drive is available and puts the data in the proper format for the disk.
  8. IO processor sends the data to the disk controller.
  9. Controller instructs the drive to move the RW head to the right track and sends the byte to be deposited.
28
Q

External Fragmentation

A

The blocks that a file spans may not be contiguous on disk.

29
Q

How does External Fragmentation affect sequential reads?

A

Interrupts flow by introducing seek access

30
Q

How does External Fragmentation affect direct access?

A

Superficially none but can break locality advantage

31
Q

How does External Fragmentation affect allocation?

A

Although space is available it’s not contiguous

32
Q

Internal Fragmentation

A

Any data file has to be spread over a list of whole data blocks, i.e. sectors. Data files can't share a sector, so the unused part of a file's last block is wasted.
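
A minimal sketch of the resulting waste, assuming allocation in fixed-size blocks of 4096 bytes (the typical cluster size from an earlier card). The function name and the 10,000-byte file are hypothetical.

```python
def allocation(file_bytes: int, block_bytes: int = 4096) -> tuple[int, int]:
    """Return (blocks allocated, bytes wasted in the last block)."""
    blocks = -(-file_bytes // block_bytes)  # integer ceiling division
    wasted = blocks * block_bytes - file_bytes
    return blocks, wasted


blocks, wasted = allocation(10_000)
print(blocks, wasted)  # 3 2288
```

The file occupies three blocks (12,288 bytes), and the 2,288 bytes left over in the last block cannot be given to any other file.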

33
Q

Issues of Read/Write

A

Ordering of disk-bound operations. Even though the OS takes care of this:
1. The DBMS still needs a model for read/write sequencing
2. If the DBMS has raw (direct) access to an HDD, the OS is out of the picture

34
Q

Shortest Seek Time First

A

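Approach to ordering pending reads and writes: always serve the request whose track is closest to the current head position. Risk of starvation for requests far from the head.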
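
A minimal sketch of the policy, assuming requests are identified by track number and all of them are known up front. The function name and the request queue are illustrative, not from the card.

```python
def sstf_order(head: int, requests: list[int]) -> list[int]:
    """Serve requests in Shortest Seek Time First order.

    At each step, pick the pending track closest to the current head position.
    """
    pending = list(requests)
    order = []
    while pending:
        nearest = min(pending, key=lambda track: abs(track - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest  # the head is now parked on the track just served
    return order


# Hypothetical queue of track requests, head currently at track 53.
print(sstf_order(53, [98, 183, 37, 122, 14, 124, 65, 67]))
# [65, 67, 37, 14, 98, 122, 124, 183]
```

In this run track 183 is served last. If new requests near the head kept arriving, it could be postponed indefinitely, which is the starvation risk.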

35
Q

SCSI Drive reliability

A

Typically rated at about 1.2 million hours (MTBF).

36
Q

What affects a unit’s reliability

A

Number of platters, Seek Usage Pattern, Temperature, Power Consumption

37
Q

Disk Shadowing

A

Making 2+ copies of data written to a disk drive

38
Q

Disk Duplexing

A

Method of storing data where the data from one HDD is duplicated onto another, each using its own HDD controller

39
Q

Disk Mirroring

A

Method of storing data where one HDD is duplicated onto another, with both drives sharing the same HDD controller

40
Q

MTBF

A

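Mean time between failures. Measured by averaging the time spans over which a unit is continuously functional.

The drive failure-rate graph follows a U shape (the bathtub curve): high early in life, low in mid-life, and rising again with age.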
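
As a sketch of how the number is used: MTBF is the mean of the observed uptimes, and it can be converted to an approximate annualized failure rate if one assumes a constant failure rate (the flat bottom of the U curve). The function names and the sample uptimes are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760


def mtbf_hours(uptimes_hours: list[float]) -> float:
    """MTBF as the average of the spans a unit ran without failing."""
    return sum(uptimes_hours) / len(uptimes_hours)


def annualized_failure_rate(mtbf: float) -> float:
    """Probability of failing within one year, assuming a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf)


print(mtbf_hours([1000.0, 2000.0, 3000.0]))  # 2000.0
print(f"{annualized_failure_rate(1_200_000):.2%}")
```

Under that assumption a rating of 1.2 million hours corresponds to well under 1% of drives failing per year; it does not mean a single drive is expected to run for over a century.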

41
Q

Failure Rates

A

Correlate with drive model and age. Failure rates can be annualized over nine months. Intensive usage doesn’t play a role in disks under 5 years old.

42
Q

RAID

A

Redundant Array of Independent Disks

Aggregated unit built from a number of simpler disks, CPUs and RAM, for faster and larger setups.

43
Q

RAID 6

A

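RAID 5 with a second, independent parity block per stripe (striping with dual distributed parity). Tolerates two simultaneous drive failures; minimum 4 drives.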

44
Q

RAID Evaluation based on the following

A
  • Capacity: With N drives under full redundancy (mirroring), only N/2 of the aggregate capacity is usable for storage
  • Reliability: What kinds of faults, and how many, can the system withstand
  • Performance: Different workloads are expected to have different measures
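
The capacity and reliability criteria can be tabulated for the common levels. This is a simplified sketch: it assumes equal-sized drives, treats RAID 1 as mirrored pairs, and reports only the number of failures that is always survivable. The function name and the 4 x 2 TB example are hypothetical.

```python
def raid_summary(level: str, n_drives: int, drive_tb: float) -> tuple[float, int]:
    """Return (usable capacity in TB, drive failures always survivable)."""
    if level == "0":
        data_drives, faults = n_drives, 0        # striping only, no redundancy
    elif level in ("1", "10"):
        data_drives, faults = n_drives / 2, 1    # every block stored twice
    elif level == "5":
        data_drives, faults = n_drives - 1, 1    # one drive's worth of parity
    elif level == "6":
        data_drives, faults = n_drives - 2, 2    # two drives' worth of parity
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    return data_drives * drive_tb, faults


for level in ("0", "5", "6", "10"):
    capacity, faults = raid_summary(level, n_drives=4, drive_tb=2.0)
    print(f"RAID {level}: {capacity} TB usable, survives {faults} failure(s)")
```
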
45
Q

RAID 0

A
  • Data divided into chunks and written across an array (Striping)
  • Need at least 2 drives
  • No recovery in case disk fails
  • Enables high performance, as parallel access improves retrieval speed
  • Typical ex. in graphic image processing
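
Striping can be sketched as a mapping from a logical byte offset to a drive and a position on that drive. The round-robin layout, the 64 KiB chunk size, and the function name below are assumptions for illustration; real controllers may lay data out differently.

```python
def locate_byte(offset: int, chunk_bytes: int, n_disks: int) -> tuple[int, int, int]:
    """Map a logical byte offset to (disk index, stripe number, offset in chunk)."""
    chunk = offset // chunk_bytes      # which logical chunk holds the byte
    disk = chunk % n_disks             # chunks go to the disks round-robin
    stripe = chunk // n_disks          # row of chunks across all disks
    return disk, stripe, offset % chunk_bytes


CHUNK = 64 * 1024  # hypothetical 64 KiB chunk size

# The first eight chunks of a 4-disk array land on disks 0, 1, 2, 3, 0, 1, 2, 3.
print([locate_byte(i * CHUNK, CHUNK, 4)[0] for i in range(8)])
print(locate_byte(300_000, CHUNK, 4))  # (0, 1, 37856)
```

A large sequential read touches consecutive chunks and therefore all disks in parallel, which is where the speedup comes from.
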
46
Q

Chunk Size

A

Somewhat related to performance. The smaller the chunk, the more pieces a file is split into and the more widely it is spread over the disks. The parallelism is good, but the extra seek time on each drive is a negative.

Larger chunks reduce the number of pieces and the spread, meaning lower seek times but less parallelism.

47
Q

Workload

A

mix of direct and sequential writes

48
Q

RAID 1

A
  • Uses mirroring to copy data on 2 drives simultaneously.
  • Provides failure tolerance, if one disk fails we have the other
  • Double storage cost as we duplicate data
    Application: High availability requirement
49
Q

RAID 0+1

A
  • Mirrored array whose segments are RAID 0 arrays.
  • Both data duplication and improved access speeds are possible. High IO rate due to multiple stripe segments.
  • Minimum 4 drives
  • Single drive failure will cause whole setup to become RAID0. Expensive + high overhead
    Application: File Server
50
Q

RAID 5

A
  • Uses a technique that avoids concentrating IO on a dedicated parity disk by distributing the parity across multiple disks
  • Minimum 3 drives
  • Write penalty occurs as existing data must be read before update and parity data has to be updated after data is written
  • Enables multiple disk writes to be implemented concurrently
  • Medium write data transaction rate
  • Difficult to rebuild once a unit fails compared to RAID 1.
    Applicable: DB and Web Servers
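
The write penalty and the rebuild can be illustrated with XOR parity, the usual parity scheme for RAID 5. This is a toy sketch over one stripe of three data blocks; the block contents and the helper name `xor_blocks` are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus their parity block.
data = [b"DBMS", b"disk", b"RAID"]
parity = xor_blocks(data)

# Rebuild: if the drive holding data[1] fails, XOR the survivors with the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Small write: the old data and old parity must be read before the update.
new_block = b"HDDs"
new_parity = xor_blocks([parity, data[1], new_block])
assert new_parity == xor_blocks([data[0], new_block, data[2]])
```

The small-write path shows where the penalty comes from: one logical write turns into two reads (old data, old parity) and two writes (new data, new parity).
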
51
Q

RAID 10

A
  • Mirror then stripe, Minimum 4 drives. Striped array whose segments are RAID 1 arrays.
  • Fault tolerance same as RAID 1. Same overhead.
  • High IO by striping RAID 1 segments, can sustain multiple failures under right circumstances.
  • RAID 1 with performance boost
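
The "right circumstances" can be made precise with a small check: the array survives as long as no mirror pair loses both of its drives. The pairing of drives 2k and 2k+1 and the function name are assumptions for this sketch.

```python
def raid10_survives(failed: set[int], n_drives: int) -> bool:
    """True if every mirror pair still has at least one working drive.

    Assumed layout: drives 2k and 2k+1 form mirror pair k.
    """
    for pair in range(n_drives // 2):
        if 2 * pair in failed and 2 * pair + 1 in failed:
            return False  # both copies of this pair's data are gone
    return True


print(raid10_survives({0, 2}, 4))  # True: one drive lost in each pair
print(raid10_survives({0, 1}, 4))  # False: a whole mirror pair is lost
```
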
52
Q

NVRAM

A

Non-Volatile RAM. Retains data without power. Read latency 100ns-1000ns

53
Q

Types of NVRAM

A
  • Uses SRAM connected to a power source ex. battery
  • Uses EEPROM to save data when power not available. Has a combination of SRAM+EEPROM semiconductors in one chip
54
Q

NVRAM Advantages

A
  • Support high speed data R/W ops for parallel processing and DBMS cache
  • Can act as in-unit cache for HDD and SSD
  • Semiconductors light on power consumption + backup power exhaustion unlikely for long time
55
Q

NVRAM Disadvantages

A
  • Write to read speed ratio is an issue for performance
  • Iffy production wise
  • Chips fail
56
Q

SAN

A

Storage Area Network. Connection uses SCSI over fibre optic. Mounting of device allows storage to appear local.

Provides access to consolidated block level data storage. SANs used to enhance storage devices such as disk arrays. Appears like device is locally attached to OS

57
Q

NAS

A

Network Attached Storage. Connection uses NFS/CIFS over TCP/IP. Uses software for logon and direct access facilities.

File-level data store connected to network providing data access to group of clients. Contain 1+ HDDs and are often arrayed into logical, redundant storage containers or RAID.