File Organisation Flashcards
what are heap, ordered and hash files
Heap:records are on disk in no order.
Ordered:records are ordered by value of specific field.
Hash: records are placed in hash function
Data stored on disk-files of records- collection of data values as facts. Records are stored to locate them efficiently.
What is primary, secondary and tertiary storage ?
Primary:Fastest with cache and main. Data accessed by cpu. Limited capacity and power supply
Secondary:Media is on magnetic disks. Cost less and greater capacity. Slower access. Data loaded to prim before operation. No power supply.
Tertiary:used for backup, tapes DVD.
What is access method and how is it applied ?
Storing and retrieving. Each tuple-map to record in OS. Each field in record value- one attribute value.
User-requests tuple- DBMS maps logical onto physical and retrieves and buffers from 1 storage.
What are physical records
Unit of Transfer between disk and primary storage. Has more than one tuple, single logical record which corresponds to one/more physical ones. Called BLOCKS/ PAGES
How are heap files inserted and what is it’s advantage ?
Simplest and most basic type. Records are stored in the ORDER in which they are INSERTED. New records- last page of file. No space then new page is added. INSERTION IS VERY EFFICIENT- O(1) COMPLEXITY. Advantage is that they are very efficient at BULK LOADING data into table. No overhead of calculating what page the record should go on.
How are heap files searched for ?
SEARCHING IS INEFFICIENT. Since no order to find a value, a LINEAR search must be performed. A Linear search- reading pages from file until the required one is found.
How are heap files deleted ?
Physical deletion leaves unused space in block. A large amount of space is wasted if lot of deletion. File size & disk space will increase and performance progressively deteriorate
What are ordered files ?
Records in a file- physically ordered based on value of one or more of fields. The field that file is sorted on is called ORDERING FIELD.
How are ordered files searched for ?
We use the ORDER BY statement to search. There is also binary search which has following steps:
Retrieve the mid-page of the file.
Check if the required record is between the first and last record of this page.
If yes, the record is found on this page, and no more pages need to be retrieved.
If no, proceed to the next steps.
If the value of the key field in the first record on the page is greater than the required value:
Repeat the process using the top half of the file as the new search area.
If the value of the key field in the last record on the page is less than the required value:
Repeat the process using the bottom half of the file as the new search area.
With each iteration, half of the search space is eliminated, optimizing the search for the required record.
More efficient than linear o(log(n))
How are ordered files inserted and deleted ?
it is a problem because order has to be maintained.
Insert :-find the correct position in file and find space to insert record. If space is there then page is REORDERED if not then MOVE records to NEXT PAGE which causes CASCADING EFFECT.
A solution is-OVERFLOW/TRANSACTION FILE. Insertions added periodically and merged.
Deletion- reorganise the records to remove the free slot.
What are hash files ?
A hash function is used to calculate address of page in which record is stored based on one/more fields-O(1) COMPLEXITY.
Base field- HASH FIELD
if hash field is a key field-HASH KEY
Records will appear randomly distributed across space- RANDOM FILES.
What are the hash files examples and explain them
FOLDING-applies an arithmetic function to diff parts of has field. Strings can convert to numeric value. Result of arithmetic determines ADDRESS OF DISK PAGE.
DIVISION-REMAINDER-More popular and uses the MODULO function where it / pre determined integer and takes REMAINDER AS DISK ADDRESS.
What are the problems that occur with hash files ?
Do not guarantee a unique address because of values a hash field can have.
Each address corresponds to a page(Bucket) with slots for multiple records. Within bucket -record are placed in order of arrival.
When same address is generated for two or more a COLLISION is occured and records are called SYNONYMNS.
Collision management degrades overall performance.
4 techniques used for collision management
- OPEN ADDRESSING(linear probing)- if required position is occupied then program checks subsequent positions in turn until an available space is located.
2.UNCHAINED OVERFLOW- area is maintained for collisions. Record that collides with another will be placed here.
3.CHAINED OVERFLOW-each bucket has an additional field(synonym pointer) that indicated if collision has occurred and if yes then points to overflow page.
linked list for each hash address of overflow records is maintained.
4.MULTIPLE HASHING-when a second hash function is applied.
THESE ARE ALL STATIC
What is dynamic hashing ?
When the space is full, it is said to be saturated. This is when the file size changes dynamically to accommodate shrinking and growth of DB