Week 6 Flashcards

Question 1

Q

Where does most database data reside?

Answer

A

Secondary storage (HDD/SSD)

Question 2

Q

Why does most database data reside on secondary storage?

Answer

A

It may be too large to reside entirely in main memory (e.g. H-Unique core datasets are > 250GB and will grow to several TB)
Secondary storage costs orders of magnitude less than main memory, but is slower to access
Often not all database data is required frequently, so data is read from disk as required
Secondary storage offers persistence
Trade-off between storage capacity, robustness and speed of access

Question 3

Q

Do main-memory databases exist?

Answer

A

Yes,
1. Entire database held in main memory (along with OS, DBMS and possibly other applications)
2. Suited for real-time applications requiring extremely fast response times (e.g. Telephone network routing, high-frequency trading)
3. Extremely expensive for large datasets

Question 4

Q

How is data organised on disk?

Answer

A

Data is organised on disk into files of records

Question 5

Q

What is a Record?

Answer

A

A record is a collection of related data items
- e.g. A personnel record may contain forename, surname, DOB, NI number etc
- Each item consists of one or more bytes of data
- Each data item corresponds to a particular field of the record

Question 6

Q

What is a record type?

Answer

A

A collection of field names and their corresponding data-types

Data types of fields are standard data types such as integer, float, character strings, date etc

The number of bytes required to store data items of each particular type is fixed for a given computer system

Question 7

Q

What is a file?

Answer

A

A sequence of n records.

Question 8

Q

What are the two types of records?

Answer

A

Fixed length records - every record in the file is the exact same size (in bytes)
Variable length records - file contains records of differing lengths

Question 9

Q

Consider a personnel record. If forename and surname each have a maximum defined length of 15 characters and every field is stored in a fixed number of bytes, what type of record is it?

Answer

A

Fixed length records

Question 10

Q

In a file of fixed-length records, how are the positions of fields and records identified?

Answer

A

The starting byte position of each field can be identified relative to the start of the record; similarly, the start position of the next record can be identified relative to the end of the current record
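Because every field has a fixed width, offsets are simple sums, as the answer above says. A minimal Python sketch, assuming a hypothetical record type (15-byte forename, 15-byte surname, 8-byte DOB string, 4-byte salary) packed with the standard struct module:

```python
import struct

# Hypothetical personnel record type. "<" means little-endian with no padding,
# so each field offset is exactly the sum of the preceding field sizes.
RECORD_FORMAT = "<15s15s8sI"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 15 + 15 + 8 + 4 = 42 bytes

FIELD_SIZES = {"forename": 15, "surname": 15, "dob": 8, "salary": 4}

def field_offsets(sizes):
    """Starting byte of each field, relative to the start of the record."""
    offsets, pos = {}, 0
    for name, size in sizes.items():
        offsets[name] = pos
        pos += size
    return offsets

def record_start(i):
    """Starting byte of record i (0-based), relative to the start of the data."""
    return i * RECORD_SIZE

def pack(forename, surname, dob, salary):
    # struct pads short byte strings with NUL bytes up to the fixed width
    return struct.pack(RECORD_FORMAT, forename.encode(), surname.encode(),
                       dob.encode(), salary)

data = pack("Ada", "Lovelace", "18151210", 30000) + pack("Alan", "Turing", "19120623", 32000)
offs = field_offsets(FIELD_SIZES)
start = record_start(1) + offs["surname"]
print(data[start:start + 15].rstrip(b"\x00"))  # b'Turing'
```

No separators or length fields are needed: the position of any field in any record is computed, not searched for.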

Question 11

Q

What is the reason different records in a file may have different sizes in bytes?

Answer

A

Records of the same record type have one or more variable-length fields (e.g. name fields of employee record)
Records of the same type have a field that may hold multiple values for a record (e.g. multiple contact phone numbers)
Records of the same type have one or more optional fields

Question 12

Q

Why might records of different types be clustered together in one file (giving variable-length records)?

Answer

A

For retrieval performance (related records can be read together)

Question 13

Q

To avoid wasting disk space, what do variable-length records use to mark the end of the record?

Answer

A

Special separator character
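A minimal Python sketch of separator-based encoding. The choice of "$" between fields and "%" at the end of each record is an assumption for illustration; it only works if neither character can appear in a field value.

```python
# Hypothetical separators: "$" separates fields within a record,
# "%" terminates each record. Neither may appear inside a field value.
FIELD_SEP, RECORD_SEP = "$", "%"

def encode(records):
    """Encode a list of records (each a list of strings) into one string."""
    return "".join(FIELD_SEP.join(fields) + RECORD_SEP for fields in records)

def decode(text):
    """Every record ends with RECORD_SEP, so the last split piece is empty."""
    return [chunk.split(FIELD_SEP) for chunk in text.split(RECORD_SEP)[:-1]]

rows = [["Ada", "Lovelace"], ["Alan", "Turing"]]
encoded = encode(rows)
print(encoded)          # Ada$Lovelace%Alan$Turing%
print(decode(encoded))  # [['Ada', 'Lovelace'], ['Alan', 'Turing']]
```

Each record occupies only as many bytes as its values need, at the cost of scanning for separators to find a field.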

Question 14

Q

What other methods may be used to encode variable-length fields?

Answer

A

Fields stored as name/value pairs <field name, field value>
Field length (in bytes) stored preceding the field value
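A minimal Python sketch of both methods. The 2-byte little-endian length prefix is an assumed width (it limits a field to 65535 bytes); the name/value variant simply length-prefixes the name and the value of each pair.

```python
import struct

def encode_length_prefixed(values):
    """Store each field as a 2-byte length (assumed width) followed by its bytes."""
    out = bytearray()
    for v in values:
        b = v.encode("utf-8")
        out += struct.pack("<H", len(b)) + b
    return bytes(out)

def decode_length_prefixed(data):
    values, pos = [], 0
    while pos < len(data):
        (n,) = struct.unpack_from("<H", data, pos)  # read the length prefix
        pos += 2
        values.append(data[pos:pos + n].decode("utf-8"))
        pos += n
    return values

def encode_pairs(pairs):
    """<field name, field value> pairs: only fields the record has are stored."""
    return encode_length_prefixed([item for pair in pairs for item in pair])

def decode_pairs(data):
    flat = decode_length_prefixed(data)
    return list(zip(flat[0::2], flat[1::2]))

data = encode_length_prefixed(["Ada", "Lovelace"])
print(data)  # b'\x03\x00Ada\x08\x00Lovelace'

# Name/value pairs handle optional and repeating fields (e.g. two phone numbers)
record = [("forename", "Ada"), ("phone", "0123"), ("phone", "0456")]
print(decode_pairs(encode_pairs(record)))
```

Unlike a separator character, a length prefix places no restriction on which characters a value may contain.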

Question 15

Q

Records of a file must be allocated to disk blocks, what is a block?

Answer

A

A block is the unit of data transfer between disk and main memory

Question 16

Q

What are the three possibilities for block sizing?

Answer

A

Block size > record size (block may contain several records)
Block size < record size (Record is stored across multiple blocks)
Block size = record size (Exactly one record per block)

Question 17

Q

Suppose the block size is B bytes and a file contains fixed-length records of size R bytes. If B ≥ R, how many records fit in a block (the blocking factor, bfr)?

Answer

A

bfr = ⌊B/R⌋ records per block, where ⌊x⌋ is the floor function, which rounds the number x down to an integer

Question 18

Q

What does bfr stand for?

Answer

A

Blocking factor

Question 19

Q

What happens if R does not divide B exactly?

Answer

A

There will be unused space in each block equal to B-(bfr *R) bytes.
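The two formulas, bfr = ⌊B/R⌋ and unused space = B - (bfr * R), can be sketched in Python as follows. The block and record sizes in the usage lines are illustrative values, not figures from the notes.

```python
def blocking_factor(B, R):
    """bfr = floor(B / R): whole fixed-length records per block (unspanned)."""
    return B // R

def unused_bytes_per_block(B, R):
    """Space left over in each block: B - (bfr * R)."""
    return B - blocking_factor(B, R) * R

# Illustrative sizes: 4096-byte blocks, 100-byte records
print(blocking_factor(4096, 100))         # 40
print(unused_bytes_per_block(4096, 100))  # 96
```

Integer division gives the floor directly; the unused space is also equal to B % R.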

Question 20

Q

If a block does not have enough remaining space to store a complete record, what are the two ways of handling the situation?

Answer

A

Spanned records
Unspanned records

Question 21

Q

Explain the Spanned records

Answer

A

A record may be spread across two blocks; the first block holds a pointer to the block containing the rest of the record

Question 22

Q

Explain the Unspanned records

Answer

A

In this case records are not allowed to cross block boundaries
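To compare the two approaches, here is a Python sketch that counts the blocks needed for a sequence of records. The record and block sizes are hypothetical, and the spanned count deliberately ignores the space taken by the pointer to the next block.

```python
import math

def blocks_unspanned(record_sizes, B):
    """Fill blocks in order; a record that does not fit starts a new block."""
    blocks, free = 0, 0
    for r in record_sizes:
        if r > B:
            raise ValueError("unspanned storage needs every record to fit in one block")
        if r > free:  # not enough room left in this block: open a new one
            blocks += 1
            free = B
        free -= r
    return blocks

def blocks_spanned(record_sizes, B):
    """Records may cross block boundaries, so only the total size matters
    (this sketch ignores the bytes used by next-block pointers)."""
    return math.ceil(sum(record_sizes) / B)

sizes = [300, 300, 300, 300]          # hypothetical record sizes in bytes
print(blocks_unspanned(sizes, 512))   # 4
print(blocks_spanned(sizes, 512))     # 3
```

Unspanned storage wastes 212 bytes per block here, but each record can be read with a single block access; spanned storage is also the only option when a record is larger than a block.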

Question 23

Q

Why does the allocation of file blocks on disk need to be managed?

Answer

A

Database files typically have an initial allocation of blocks on disk
As data growth occurs (and it nearly always does!) files need to be able to grow to accommodate the enlarged data.
Block allocation routines need to balance performance, flexibility and space efficiency
Disk space is shared with the OS and any other application files on the same disk

Question 24

Q

What are the 4 ways of allocating file blocks to disk blocks?

Answer

A

Contiguous allocation
Linked allocation
Cluster allocation
Indexed allocation

Question 25

Q

What is contiguous allocation?

Answer

A

File blocks allocated to consecutive disk blocks
+ve: Very fast reading of the whole file as blocks are contiguous
-ve: File expansion is difficult because the disk blocks following the file may already be in use

Question 26

Q

What is the Linked allocation?

Answer

A

Each file block contains a pointer to the next file block
+ve: Very easy to expand the file, just allocate the next free block and add a pointer
-ve: File reads are slower, especially with magnetic disks, as file blocks may be scattered across the disk
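A minimal Python sketch of linked allocation, with the disk modelled as a hypothetical dictionary from block number to (data, next block number). It shows why expansion is cheap (one pointer update) and why reading is sequential (each block must be read to find the next).

```python
# Hypothetical disk: block number -> (data, number of the next file block or None)
disk = {
    7: ("block A", 2),
    2: ("block B", 9),
    9: ("block C", None),
}

def read_linked_file(disk, first_block):
    """Follow the next-block pointers from the first file block to the last."""
    contents, current = [], first_block
    while current is not None:
        data, nxt = disk[current]  # one block access per file block
        contents.append(data)
        current = nxt
    return contents

def append_block(disk, last_block, free_block, data):
    """Expand the file: link any free disk block from the current last block."""
    old_data, _ = disk[last_block]
    disk[last_block] = (old_data, free_block)
    disk[free_block] = (data, None)

print(read_linked_file(disk, 7))  # ['block A', 'block B', 'block C']
append_block(disk, 9, 4, "block D")
print(read_linked_file(disk, 7))  # ['block A', 'block B', 'block C', 'block D']
```

The block numbers 7, 2, 9, 4 are deliberately scattered: on a magnetic disk each hop could mean a separate seek.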

Question 27

Q

What is the Cluster allocation?

Answer

A

Combines contiguous and linked allocation
Allocate clusters of blocks and link them with pointers
- Achieve a balance between ease of expansion and read performance

Question 28

Q

What is the Indexed allocation?

Answer

A

One or more index blocks contain pointer to actual file blocks
Read index blocks to find pointers to blocks containing desired data
Analogous to using a library card index
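A minimal Python sketch of indexed allocation, again with a hypothetical dictionary standing in for the disk. The index block holds the file's block pointers in order, so any file block can be reached directly.

```python
# Hypothetical disk: the index block (5) lists the file's data blocks in order.
disk = {
    5: [12, 3, 40],  # index block: pointers to file blocks
    12: "block A",
    3: "block B",
    40: "block C",
}

def read_file_block(disk, index_block, i):
    """Read the i-th file block: one access for the index, one for the data."""
    pointers = disk[index_block]
    return disk[pointers[i]]

print(read_file_block(disk, 5, 2))  # block C
```

Unlike linked allocation, reaching the last block does not require reading all the blocks before it.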

Question 29

Q

Explain file headers

Answer

A

1. Also known as the file descriptor
2. Contains information needed by programs accessing records in the file
2.1. Information to determine the disk addresses of file blocks
2.2. Record format description
2.2.1. For both fixed- and variable-length records
2.2.2. Order of fields
2.2.3. Type codes
2.2.4. Field & record separators

Question 30

Q

What are the two groups of operations on a file?

Answer

A

1. Retrieval operations
1.1. Do not change data in the file
1.1.1. Locate records so field values can be read and processed
2. Update operations
2.1. Modify the data file
2.1.1. Insertion or deletion of records
2.1.2. Alteration of field values within a record or records

These are the underlying file operations for which SQL SELECT/INSERT/UPDATE/DELETE statements provide an abstraction

Question 31

Q

Explain the Open file operation

Answer

A

Prepares file for reading/writing
Allocates buffers (usually minimum of 2) to hold file blocks from disk
Retrieves file header
Sets file pointer to beginning of file

Question 32

Q

Explain the Reset File Operation

Answer

A

Set file pointer back to beginning of file

Question 33

Q

Explain the Find/Locate file operation

Answer

A

Searches for the first record satisfying the search condition
Transfers the block containing the matched record into a main memory buffer (if not already in memory)
Sets the file pointer to the located record in the file

Question 34

Q

Explain the FindNext file operation

Answer

A

Same as find, but gets the NEXT matching record in file

Question 35

Q

Explain the Read/Get file operation

Answer

A

Copies current record from buffer to program variable
May also advance file pointer to next record in file
a. May cause next block to be read from disk

Question 36

Q

Explain the Delete file operation

Answer

A

Deletes the current record
Updates file on disk to reflect deletion

Question 37

Q

Explain the Modify file operation

Answer

A

Modifies some fields for the current record
Updates file on disk to reflect changes

Question 38

Q

Explain the Insert file operation

Answer

A

Locates the block where the record is to be inserted
Transfers block to main memory buffer
Writes record into buffer
Writes buffer to disk

Question 39

Q

Explain the Close file operation

Answer

A

Releases all buffers
Performs any other cleanup operations required

Question 40

Q

What are the 3 set-at-a-time operations?

Answer

A

FindAll
* Locates all records that satisfy a search condition
FindOrdered
* Retrieves all records in a file in a specified order
Reorganise
* E.g. reorders file records based on a specified field value

Question 41

Q

What are the 2 types of file organisation?

Answer

A

Primary file organisation
Secondary file organisation

Question 42

Q

What are the two types of primary organisation?

Answer

A

unordered records
ordered records

Question 43

Q

Explain briefly primary organisation: unordered records

Answer

A

Also known as heap files
Simplest, most basic type of organisation
Records placed in file in insertion order (new records placed at end of file)

Question 44

Q

What are the characteristics of primary organisation: unordered records?

Answer

A

Inserting a new record is very efficient
Searching for a record is very expensive
Deleting a record is expensive and inefficient

Question 45

Q

Why is inserting a record into a primary organisation of unordered records very efficient?

Answer

A

Address of last block kept in file header
Last file block copied to buffer
New record added and written to disk

Question 46

Q

Why is searching for a record in a primary organisation of unordered records very expensive?

Answer

A

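A linear search is required: on average half of the file blocks are read to find a record that exists, and all of them if it does not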

Question 47

Q

Why is deleting a record from a primary organisation of unordered records expensive and inefficient?

Answer

A

Linear search to find block with record to be deleted
Copy block to buffer
Delete record from buffer
Rewrite block to disk
Leaves unused space on disk
A large number of deletions leads to much wasted space

Question 48

Q

What is an alternative deletion approach for unordered records?

Answer

A

Use deletion marker
Extra bit set to mark records as deleted
Search operations ignore deleted records
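The heap-file behaviour from the last few cards (append at the end, linear search, deletion marker) can be sketched as a small Python class. The block capacity and the [deleted_flag, record] slot layout are assumptions for illustration, not a real storage format.

```python
class HeapFile:
    """Sketch of an unordered (heap) file made of blocks holding at most bfr slots.
    Each slot is [deleted_flag, record]; the layout is hypothetical."""

    def __init__(self, bfr):
        self.bfr = bfr
        self.blocks = [[]]  # list of blocks, each a list of slots

    def insert(self, record):
        last = self.blocks[-1]        # header keeps track of the last block
        if len(last) == self.bfr:     # last block full: start a new one
            last = []
            self.blocks.append(last)
        last.append([False, record])  # new record goes at the end of the file

    def find(self, predicate):
        """Linear search over every block; skips records marked as deleted."""
        for block in self.blocks:
            for deleted, record in block:
                if not deleted and predicate(record):
                    return record
        return None

    def delete(self, predicate):
        """Set the deletion marker instead of removing the record."""
        for block in self.blocks:
            for slot in block:
                if not slot[0] and predicate(slot[1]):
                    slot[0] = True
                    return True
        return False

f = HeapFile(bfr=2)
for name in ["Ada", "Alan", "Grace"]:
    f.insert({"name": name})
f.delete(lambda r: r["name"] == "Alan")
print(f.find(lambda r: r["name"] == "Alan"))  # None (marked as deleted)
print(len(f.blocks))                          # 2
```

Marked slots still occupy space; a periodic reorganisation would be needed to reclaim it.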

Question 49

Q

Explain briefly Primary organisation: ordered records

Answer

A

Physically order records in a file based on the values of one of their fields (called the ordering field)
- The ordering field is called the ordering key if it is a key field of the file, i.e. it has a unique value for each record

Question 50

Q

What are the advantages of primary organisation: ordered records?

Answer

A

Reading records in the order defined by the ordering key is extremely efficient (no sorting required)
Finding the next record in order requires no additional block access unless the current record is the last in its block
Search conditions based on the ordering key allow faster access (e.g. binary search)

Question 51

Q

What are the disadvantages of primary organisation: ordered records?

Answer

A

No advantage for access based on values of non-ordering fields
Insertion and deletion of records very expensive (Records must remain physically ordered)
E.g. Insert:
- Find correct position in file
- Make space and insert record
- Very expensive operation for large files
Modification of a record's ordering field value
- The record may change position in the file, equivalent to a deletion plus an insertion

Question 52

Q

What is a secondary file?

Answer

A

It describes supplementary access structures used to speed up the retrieval of records that satisfy specific search conditions.

Question 53

Q

What are the types of secondary files?

Answer

A

Single Level
Multi-level

Question 54

Q

What are the 3 types of single-level index?

Answer

A

Primary indexes
Clustering indexes
Secondary indexes

Question 55

Q

What are the characteristics of secondary files?

Answer

A

Exist in addition to primary file organisations (similar to indexes in textbooks)
Provide an alternative way of locating record blocks on disk without affecting the physical placement of records on disk
Any field of a record in a file can be used to create an index (Multiple indexes on different fields can be constructed for the file)

Question 56

Q

What are the fields used to create an index called?

Answer

A

Indexing fields

Question 57

Q

Why are the values in an index ordered?

Answer

A

Allows binary (chop) search to be carried out on the index
Binary search is highly efficient with ordered lists
Once located, index item points to one or more blocks in file where required records are located

Question 58

Q

Explain single-level indexes

Answer

A

Based on ordered files
Usually defined on a single field of a file called an indexing field
Index typically stores entries of the form <field value, list of pointers to all disk blocks containing records with that field value>
If both the data file and the index file are ordered, searching the index is more efficient, as the index file will usually be much smaller than the data file

Question 59

Q

How do primary indexes work?

Answer

A

Specified on the ordering key of an ordered file (the ordering key has a unique value for each record)
The ordering key field is used to physically order file records on the disk
The index is an ordered file of fixed-length records (each record has two fields: the first is of the same datatype as the ordering key field, the second is a pointer to a disk block)

Question 60

Q

How many index entries are there in the index file for each block in the data file?

Answer

A

One index entry per block
1. Not one entry for each record in the data file
2. Each index entry contains the primary key of the first record in the block along with a pointer to that block

Question 61

Q

What is the anchor record of a block (the block anchor)?

Answer

A

First record in each block in the data file
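A minimal Python sketch of a primary index: one entry per block holding the key of the block anchor, searched with a binary search from the standard bisect module. The block contents and a bfr of 3 are hypothetical.

```python
from bisect import bisect_right

# Data file ordered on a unique key; each block is a list of (key, record data).
data_blocks = [
    [(101, "Ada"), (104, "Alan"), (110, "Grace")],
    [(115, "Edsger"), (120, "Barbara"), (133, "Donald")],
    [(140, "Ken"), (152, "Dennis")],
]

# Primary index: one entry per block, <key of the block anchor, block number>.
index = [(block[0][0], i) for i, block in enumerate(data_blocks)]
anchor_keys = [k for k, _ in index]

def lookup(key):
    """Binary-search the index for the last anchor <= key, then scan that one block."""
    pos = bisect_right(anchor_keys, key) - 1
    if pos < 0:
        return None                     # key is smaller than every anchor
    block = data_blocks[index[pos][1]]  # a single data-block access
    for k, value in block:
        if k == key:
            return value
    return None

print(lookup(120))  # Barbara
```

The index has 3 entries for 8 records: that size difference is what makes searching the index cheaper than searching the data file.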

Question 62

Q

PRIMARY INDEX EXAMPLE

Question 63

Q

What are the advantages of Primary indexes?

Answer

A

Searching for the location of relevant record blocks is very efficient
The index is much smaller than the main file, with fewer entries

Question 64

Q

What are the disadvantages of Primary indexes?

Answer

A

Same as with ordered files
Insertion and deletion of records with some extra complexities

Answer 64

A

To insert at correct position we need to create space for new record, which will most likely involve moving existing records
Moving records will change anchor records of some blocks
If anchor records change or new blocks created, we will have to change some index entries
a. As the index is an ordered file, it has similar overheads to modifying records in an ordered data file

Answer 65

A

Used if the ordering field of the data file is a non-key field
a. Multiple records in the file can have the same value for the ordering field
Index is an ordered file of fixed length records
a. Each record has two fields

Answer 66

A

In the situation where multiple records in the file can have the same value for the ordering field

Answer 67

A

There is one index entry in the index file for each distinct value of the clustering field
a. Contains the value of the clustering field and a pointer to the first block in the data file with that value for its clustering field
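A minimal Python sketch of a clustering index: one entry per distinct value of the clustering field, pointing to the first block that contains it. The department numbers and block contents are hypothetical.

```python
from bisect import bisect_left

# Data file physically ordered on a non-key field (hypothetical dept numbers);
# several records share a value, and a value's records may span blocks.
data_blocks = [
    [(1, "Ada"), (1, "Alan"), (2, "Grace")],
    [(2, "Edsger"), (2, "Barbara"), (3, "Donald")],
    [(4, "Ken"), (4, "Dennis")],
]

# Clustering index: one entry per DISTINCT value -> first block containing it.
first_block = {}
for block_no, block in enumerate(data_blocks):
    for value, _ in block:
        first_block.setdefault(value, block_no)
index_entries = sorted(first_block.items())  # ordered, so it can be binary-searched

def find_all(value):
    """Start at the first block for value and read forward while it continues."""
    keys = [v for v, _ in index_entries]
    pos = bisect_left(keys, value)
    if pos == len(keys) or keys[pos] != value:
        return []
    matches = []
    for block in data_blocks[index_entries[pos][1]:]:
        for v, name in block:
            if v == value:
                matches.append(name)
            elif v > value:
                return matches  # past the cluster: the file is ordered
    return matches

print(find_all(2))  # ['Grace', 'Edsger', 'Barbara']
```

Because the data file is ordered on the clustering field, all matching records are contiguous, so reading stops at the first larger value.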

Answer 68

A

Insertion and deletion are still a problem
a. Data records are physically ordered
b. Can be simplified by reserving the whole block or cluster of contiguous blocks for each value of clustering field
c. All records with that value are placed in block or block cluster reserved for it.

Answer 69

A

A file can have at most one physical ordering field
a. Thus, there can be at most one primary index or one clustering index, but not both
A secondary index can be specified on any non-ordering field of a file
a. A file can have multiple secondary indexes in addition to its primary access method (primary or clustering index)

Answer 70

A

Such an index is an ordered file whose records are of fixed length
Each record has two fields
a. First field is the same datatype as the non-ordering field which is to be the indexing field
b. Second field is either a block pointer or a record pointer

Answer 71

A

Consider a secondary index based on a secondary key
One index entry for each record in the data file, which contains:
a. Value of the secondary key for the record
b. Pointer to either the block in which the record is stored or to the record itself
Records are not physically ordered by values of the secondary key ∴
a. One index entry is created for each record in the data file
The secondary index is ordered
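A minimal Python sketch of a dense secondary index on a key field: one entry per record, each holding a record pointer modelled as a hypothetical (block number, slot number) pair. The email field and the data are illustrative.

```python
from bisect import bisect_left

# Hypothetical heap-organised data file: records are NOT ordered on email.
data_blocks = [
    [{"id": 7, "email": "zed@x.org"}, {"id": 2, "email": "amy@x.org"}],
    [{"id": 9, "email": "kim@x.org"}, {"id": 4, "email": "bob@x.org"}],
]

# Dense secondary index: one entry per record, <email, (block no, slot no)>,
# sorted on the indexing field so it can be binary-searched.
sec_index = sorted(
    (rec["email"], (b, s))
    for b, block in enumerate(data_blocks)
    for s, rec in enumerate(block)
)
keys = [k for k, _ in sec_index]

def find_by_email(email):
    pos = bisect_left(keys, email)
    if pos < len(keys) and keys[pos] == email:
        b, s = sec_index[pos][1]  # follow the record pointer
        return data_blocks[b][s]
    return None

print(find_by_email("kim@x.org"))  # {'id': 9, 'email': 'kim@x.org'}
```

Without the index, finding a record by email would need a linear search of the data file.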

Answer 72

A

Because search field is a key, there will be one index entry per record
Usually needs more storage and longer search time than a primary index because of its large number of entries
Offers significant improvement in search time for an arbitrary record
a. We would need to do a linear search on data file if secondary index did not exist

Answer 73

A

Solution one:
a. Include several entries with same value for indexing field
i. Same as implementation for secondary index based on secondary key
ii. One entry per record in file

Solution two:
a. Have variable length records for index entries
i. Keep list of pointers in index entry for each value of indexing field
ii. One pointer to each block that contains a record with that indexing field value
iii. Can have an extra level of indirection to handle multiple pointers and keep index file records fixed-length
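Solution two can be sketched in Python as one variable-length entry per distinct value, holding the list of blocks that contain it. The department data is hypothetical.

```python
# Hypothetical data file, not ordered on dept (a non-key field).
data_blocks = [
    [("Ada", "Sales"), ("Alan", "IT")],
    [("Grace", "IT"), ("Ken", "HR")],
    [("Edsger", "IT"), ("Barbara", "Sales")],
]

def build_index(blocks):
    """One entry per distinct value, holding the list of blocks that
    contain at least one record with that value."""
    entries = {}
    for block_no, block in enumerate(blocks):
        for _, dept in block:
            pointers = entries.setdefault(dept, [])
            if not pointers or pointers[-1] != block_no:  # one pointer per block
                pointers.append(block_no)
    return sorted(entries.items())  # ordered on the indexing field

print(build_index(data_blocks))
# [('HR', [1]), ('IT', [0, 1, 2]), ('Sales', [0, 2])]
```

The pointer lists have different lengths, which is why this solution needs variable-length index records (or an extra level of indirection to keep them fixed-length).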

Answer 74

A

Indexes can have indexes

Answer 75

A

Help to reduce search space
a. Create indexes on indexes
i. Search a smaller index to find where to look in a larger index
b. Useful if first level needs > 1 storage block
c. We can have 3rd, 4th, … nth level
Multi-level scheme can be used on any type of index, primary, clustering or secondary
a. The first-level index must have fixed-length entries and distinct values for the indexing field
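A minimal Python sketch of how many blocks each level of a multi-level index needs. It assumes a fixed fan-out (index entries per block) and builds each new level with one entry per block of the level below, stopping once the top level fits in a single block. The numbers are illustrative.

```python
import math

def index_levels(first_level_entries, fan_out):
    """Number of index blocks at each level. fan_out = index entries per block.
    A new level is added over the previous one until the top fits in one block."""
    levels = [math.ceil(first_level_entries / fan_out)]
    while levels[-1] > 1:
        # next level: one entry per block of the level below
        levels.append(math.ceil(levels[-1] / fan_out))
    return levels

# Illustrative: 30,000 first-level entries, 68 entries per index block
print(index_levels(30000, 68))  # [442, 7, 1]
```

With three levels, a lookup reads one block per level (3 index blocks) plus the data block, instead of binary-searching all 442 first-level blocks.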