CSC 675 Review 3 - Introduction to System Implementation Techniques Flashcards
File Systems & Storage Structures - Ramakrishnan Chapter 8, 9, 10, 11 - E & N Ch. 13, 14
Explain in your own words how data (bits) are stored on standard magnetic disk hardware.
The most basic unit of data on the disk is a single bit of information. A magnetic orientation in one direction on the disk could represent a “1”, an orientation in the opposite direction could represent a “0”.
What is the minimum unit of data transfer to/from the disk?
The sector is the smallest unit that can be transferred to/from the disk, disk blocks are the smallest unit of transfer requested by the operating system.
Why is it usually more efficient to transfer a larger amount of data to/from the disk?
One large request requires less processing requests. Less disk I/O.
What are the major components of the time required to access a sector for data stored on disk?
- seek time, the r/w heads are moved to the proper cylinder, mechanical movement is slow and inaccurate
- the proper track is selected, fast electronic switch
• latency time,
- the head monitors the track until the proper sector rotates under the head, disks usually rotate 3600 rpm and about half a track will need to be scanned on average
• transfer time
- the data is transferred, typically very fast compared with seek time
Describe two ways in which the sector access time can be reduced (without modifying the disk hardware).
- Reduce the seek/latency for each I/O operation by maximizing sequentiality in data placement.
• Hard to realize in a multi-user environment! - Reduce the total number of disk I/O
• This may or may not require less I/O time in single user mode, but will probably require less I/O time in multi-user mode (when the disk is shared and every disk I/O operation is effectively random).
Why is sequential disk I/O much faster than random disk I/O?
Sequential disk I/O is faster than random disk I/O because it requires less processing between transfers. The parts of the hard drive have to move around less to find the needed data.
What is a file?
ordered sequence of file blocks
What are two standard techniques for associating disk blocks into files?
(1) sequential: file blocks are stored in contiguous disk blocks.
(2) indexed: file blocks are indexed and stored in random disk blocks.
Discuss the advantages and disadvantages of using sequential allocation for file blocks.
Fast sequential read/ write, difficult to allocate and deallocate disk space efficiently.
Discuss the advantages and disadvantages of using indexed allocation for file blocks.
Slow read/write access, easy to allocate and deallocate disk space.
Under what circumstances will sequential file I/O correspond to sequential disk I/O?
Sequential access within a file may or
may not correspond to sequential
access on the disk!!
– Most mainframe operating systems and high-end
unix systems support high performance access.!
– Most low-end unix systems and PC based
systems do not!
– Most special purpose file systems implemented for use with DBS support sequential access.!
Why is it important to store type information with records?
?
What are some standard ways of storing type information.
Type information can be stored in a
variety of ways:
once per record set,
once per record (named record type),
once per field (named field type).
What is blocking factor?
number of records per block
How can blocking factor be used with the number of tuples in the file to calculate the file size?
records / blocking factor
What is the record at a time interface?
Records have an order defined by the underlying file organization.
Interface provides means to filter
- current
- find
- findNext
- Delete
How is it used by the DBS?
Basic file organization
What is a storage structure?
a way to organize records on disk
What is an access method?
organize information on the disk & retrieve individual records when requested by the run-time environment
How are storage structures and access methods related?
access methods use storage structures to organize information
What are the advantages and disadvantages of using an unordered file?
low insertion costs, high random & key sequential access costs.
What are the advantages and disadvantages of using an ordered file?
high insertion costs, moderate random, low key sequential costs!
What are the advantages and disadvantages of using a static hash file?
moderate insertion costs, low random, high key sequential costs
What are the advantages and disadvantages of using a dynamic hash file?
index by high order
bits, no overflow blocks, exponential directory growth
What are the advantages and disadvantages of using a B+ tree index?
moderate insertion, random & key sequential costs.
What is the difference between a primary index and a clustering index?
primary index - built on primary key
physically ordered on non-key field
What is a secondary index?
Secondary Index: Indexing field
Key field: dense index. Index ordered,
block pointers in index.
Non-key field: dense index, repeating field for duplicates, indirect index level (index points to block of pointers with same values).
How does a secondary index differ from a primary index?
?
Why is it possible to have any number of secondary indexes on a file, but at most one primary index?
?
What is a B tree?
a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children.
How are random and key sequential access performed using a B-tree?
Random access: partial root-leaf traversal
Sequential access: extended in-order
traversal
what is a B+ tree?
make tree shorter by storing records in leaves and index entries only in internal nodes. This introduces some redundancy, but increases performance, especially for key sequential access.
How are random and key sequential access performed using a B+ tree?
Random access: root-leaf traversal
Sequential access: scan data blocks
sequence set pointers
Describe the structure of an index node in a B+ tree. How is the number of entries per B+ tree index node determined?
2d key + record value slots, 2d+1 pointer slots (order = 2d+1, or d)
How is the number of entries per B+ tree index node determined?
?
What are some of the different strategies for implementing record pointers( i.e. for secondary index construction)?
Offset in file
RID: record ID (block number + slot number)
TID: tuple ID (unique system generated value stored with each tuple)
Primary Key (unique attribute data value stored with each tuple)
What are the advantages for each?
access time vs. Data Independence