Week 6 Flashcards
Where do most DB data resides on?
Secondary storage (HDD/SSD)
Why do most DB data resides on secondary storage?
- It is may be too large to reside entirely in main memory (e.g. H-Unique core datasets > 250GB and will grow to several TB)
- Secondary storage costs orders of magnitude lower than main memory, but slower to access
- Often not all database data required frequently - access from disk as required
- Secondary storage offers persistence
- Trade off between storage capacity, robustness and speed of access
Do main memory DB exist?
Yes,
1. Entire database held in main memory (along with OS, DBMS and possibly other applications)
2. Suited for real-time applications requiring extremely fast response times (e.g. Telephone network routing, high-frequency trading)
3. Extremely expensive for large datasets
How data is organised on disk
Data is organised on disk into file of records
What is a Record?
Record is a collection of related data items
- e.g. Personnel record may contain forename, surname, DOB, NI number etc
- Each item consists of one or more bytes of data
- Each data item corresponds to a particular field of the record
What is a record type?
A collection of field names and their corresponding data-types
Data types of fields are standard data types such as integer, float, character strings, date etc
Number of bytes required to store data items of each particular type if fixed for a given computer system
What is a file?
A sequence of n records.
What are the two types of records?
- Fixed length records - every record in the file is the exact same size (in bytes)
- Variable length records - file contains records of differing lengths
Consider a personnel record, if forename and surname have a maximum defined length of 15 characters each with a fixed number of bytes, what is the type of the record?
Fixed length records
On a fixed-length record, how does the position of records identified?
Starting byte position of each field can be identified relative to the start of the record, similarly, the start position of the next record can be identified relative to position of the end of the record
What is the reason different records in a file may have different sizes in bytes?
- Records of the same record type but have one or more varying length fields (e.g. name fields of employee record)
- Records of the same type but a field may have multiple values for a record (e.g. multiple contact phone numbers)
- Records of same type but one or more fields may be optional
In variable-length records, what is the purpose of different types clustered together?
For performance and retrieval
To prevent a waste of space on disk, what do variable length records use to terminate the end of the record
Special separator character
What is the other methods for encoding variable length fields may include?
- Fields as name/value pairs <field name, field value>
- Field length (in bytes) can be stored preceding field value
Records of a file must be allocated to disk blocks, what is a block?
Block is data transfer unit between disk and main memory
What are the three possibilities for block sizing?
- Block size > record size (block may contain several records)
- Block size < record size (Record is stored across multiple blocks)
- Block size = record size (Exactly one record per block)
Suppose block size is B bytes and a file contains fixed length records of size R bytes, if B > R, how many blocking factor (bfr) can we fit?
[B/R] records per block, [(x)] is a floor function that rounds the number x down to an integer
What is a bfr
Blocking factor
What would happen if R does not divide B exactly?
There will be unused space in each block equal to B-(bfr *R) bytes.
If a block does not have enough remaining space to store the complete records, the situation may be handled in one of two ways which is?
- Spanned records
- Unspanned records
Explain the Spanned records
Spanned records may be spread across two blocks, the first block has a pointer to the block containing the rest of the record
Explain the Unspanned records
In this case records are not allowed to cross block boundaries
Why do we need to allocate file blocks on disk?!
- Database files typically have an initial allocation of blocks on disk
- As data growth occurs (and it nearly always does!) files need to be able to grow to accommodate the enlarged data.
- Block allocation routines need to balance performance, flexibility and space efficiency
- Disk space is shared with any OS and other application files that are using the same disk
What are the 4 ways of allocating file blocks to disk blocks
- Continuous allocation
- Linked allocation
- Cluster allocation
- Indexed allocation
What is the Continuous allocation?
File blocks allocated to consecutive disk blocks
+ve: Very fast reading of the whole file as blocks are contiguous
-ve: File expansion difficult due to used block on disk
What is the Linked allocation?
Each file block contains a pointer to the next file block
+ve: Very easy to expand the file, just allocate the next free block and add a pointer
-ve: File reads are slower, especially with magnetic disks
What is the Cluster allocation?
- Combines continuous and linked allocations
- Allocate clusters of blocks and link them with pointers
- Achieve a balance between ease of expansion and read performance
What is the Indexed allocation?
- One or more index blocks contain pointer to actual file blocks
- Read index blocks to find pointers to blocks containing desired data
- Analogous to using a library card index
Explain a File Headers
- Also known as file descriptor
- Contains information needed by programs accessing records in the file
2.1. Information to determine disk address of file block
2.2. Record format description
2.2.1. For both fixed and variable length records
2.2.2. Order of fields
2.2.3. Type codes
2.2.4. Field & record separators
What are the two groups of operations on a file?
- Retrieval Operations
1.1. Do not change data in the file
1.1.1 Locate records so field values can be read and processed - Update Operations
2.1. Modify the data file
2.1.1 Insertion or deletion of records
2.1.2 Alteration of field values within a record or records
These are the underlying file operations for which SQL SELECT/INSERT/UPDATE/DELETE statements provide an abstraction
Explain the Open File Operations
- Prepares file for reading/writing
- Allocates buffers (usually minimum of 2) to hold file blocks from disk
- Retrieves file header
- Set file pointer to beginning of file
Explain the Reset File Operation
- Set file pointer back to beginning of file