CSC 675 Review 3 - Introduction to System Implementation Techniques Flashcards

File Systems & Storage Structures - Ramakrishnan Chapter 8, 9, 10, 11 - E & N Ch. 13, 14

1
Q

Explain in your own words how data (bits) are stored on standard magnetic disk hardware.

A

The most basic unit of data on the disk is a single bit of information. A magnetic orientation in one direction on the disk could represent a “1”, an orientation in the opposite direction could represent a “0”.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is the minimum unit of data transfer to/from the disk?

A

The sector is the smallest unit that can be transferred to/from the disk, disk blocks are the smallest unit of transfer requested by the operating system.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Why is it usually more efficient to transfer a larger amount of data to/from the disk?

A

One large request requires less processing requests. Less disk I/O.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What are the major components of the time required to access a sector for data stored on disk?

A
  • seek time, the r/w heads are moved to the proper cylinder, mechanical movement is slow and inaccurate
  • the proper track is selected, fast electronic switch

• latency time,
- the head monitors the track until the proper sector rotates under the head, disks usually rotate 3600 rpm and about half a track will need to be scanned on average

• transfer time
- the data is transferred, typically very fast compared with seek time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Describe two ways in which the sector access time can be reduced (without modifying the disk hardware).

A
  1. Reduce the seek/latency for each I/O operation by maximizing sequentiality in data placement.
    • Hard to realize in a multi-user environment!
  2. Reduce the total number of disk I/O
    • This may or may not require less I/O time in single user mode, but will probably require less I/O time in multi-user mode (when the disk is shared and every disk I/O operation is effectively random).
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Why is sequential disk I/O much faster than random disk I/O?

A

Sequential disk I/O is faster than random disk I/O because it requires less processing between transfers. The parts of the hard drive have to move around less to find the needed data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is a file?

A

ordered sequence of file blocks

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are two standard techniques for associating disk blocks into files?

A

(1) sequential: file blocks are stored in contiguous disk blocks.
(2) indexed: file blocks are indexed and stored in random disk blocks.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Discuss the advantages and disadvantages of using sequential allocation for file blocks.

A

Fast sequential read/ write, difficult to allocate and deallocate disk space efficiently.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Discuss the advantages and disadvantages of using indexed allocation for file blocks.

A

Slow read/write access, easy to allocate and deallocate disk space.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Under what circumstances will sequential file I/O correspond to sequential disk I/O?

A

Sequential access within a file may or
may not correspond to sequential
access on the disk!!
–  Most mainframe operating systems and high-end
unix systems support high performance access.!
–  Most low-end unix systems and PC based
systems do not!
–  Most special purpose file systems implemented for use with DBS support sequential access.!

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Why is it important to store type information with records?

A

?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What are some standard ways of storing type information.

A

Type information can be stored in a
variety of ways:

once per record set,

once per record (named record type),

once per field (named field type).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is blocking factor?

A

number of records per block

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

How can blocking factor be used with the number of tuples in the file to calculate the file size?

A

records / blocking factor

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is the record at a time interface?

A

Records have an order defined by the underlying file organization.

Interface provides means to filter

  • current
  • find
  • findNext
  • Delete
17
Q

How is it used by the DBS?

A

Basic file organization

18
Q

What is a storage structure?

A

a way to organize records on disk

19
Q

What is an access method?

A

organize information on the disk & retrieve individual records when requested by the run-time environment

20
Q

How are storage structures and access methods related?

A

access methods use storage structures to organize information

21
Q

What are the advantages and disadvantages of using an unordered file?

A

low insertion costs, high random & key sequential access costs.

22
Q

What are the advantages and disadvantages of using an ordered file?

A

high insertion costs, moderate random, low key sequential costs!

23
Q

What are the advantages and disadvantages of using a static hash file?

A

moderate insertion costs, low random, high key sequential costs

24
Q

What are the advantages and disadvantages of using a dynamic hash file?

A

index by high order

bits, no overflow blocks, exponential directory growth

25
Q

What are the advantages and disadvantages of using a B+ tree index?

A

moderate insertion, random & key sequential costs.

26
Q

What is the difference between a primary index and a clustering index?

A

primary index - built on primary key

physically ordered on non-key field

27
Q

What is a secondary index?

A

Secondary Index: Indexing field

Key field: dense index. Index ordered,
block pointers in index.

Non-key field: dense index, repeating field for duplicates, indirect index level (index points to block of pointers with same values).

28
Q

How does a secondary index differ from a primary index?

A

?

29
Q

Why is it possible to have any number of secondary indexes on a file, but at most one primary index?

A

?

30
Q

What is a B tree?

A

a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children.

31
Q

How are random and key sequential access performed using a B-tree?

A

Random access: partial root-leaf traversal

Sequential access: extended in-order
traversal

32
Q

what is a B+ tree?

A

make tree shorter by storing records in leaves and index entries only in internal nodes. This introduces some redundancy, but increases performance, especially for key sequential access.

33
Q

How are random and key sequential access performed using a B+ tree?

A

Random access: root-leaf traversal

Sequential access: scan data blocks
sequence set pointers

34
Q

Describe the structure of an index node in a B+ tree. How is the number of entries per B+ tree index node determined?

A

2d key + record value slots, 2d+1 pointer slots (order = 2d+1, or d)

35
Q

How is the number of entries per B+ tree index node determined?

A

?

36
Q

What are some of the different strategies for implementing record pointers( i.e. for secondary index construction)?

A

Offset in file

RID: record ID (block number + slot number)

TID: tuple ID (unique system generated value stored with each tuple)

Primary Key (unique attribute data value stored with each tuple)

37
Q

What are the advantages for each?

A

access time vs. Data Independence