Lecture 17-18 Flashcards
What is buffering?
Data in secondary or tertiary storage must be transferred to main memory before the CPU can process it. Because this transfer takes much longer than a memory access, several buffers are kept in main memory: while one buffer is being read or written, the CPU can process data in the others.
What are the characteristics of disks?
Physically, disks are made up of tracks, cylinders, sectors, and blocks. The hardware address of a block consists of a surface number, a track number, and a block or sector number.
How are file records formatted on the disk?
A record is a collection of related data values or items, usually corresponding to a tuple or row.
A record type is a collection of field names and their data types.
A file contains a sequence of records; typically all of them have the same record type.
In fixed-length records every record is exactly the same size; in variable-length records different records can have different sizes.
How do variable length records work?
A separator character that isn’t used in any field value separates the fields. A second special character can also be used to separate the field name from the field value.
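The separator scheme above can be sketched in a few lines of Python. The separator characters `|` and `=` and the function names are assumptions chosen for illustration; any character that never occurs in the field data would do.

```python
FIELD_SEP = "|"  # assumed separator between fields
NAME_SEP = "="   # assumed separator between a field name and its value


def encode_record(fields: dict) -> str:
    """Join name=value pairs into one variable-length record string."""
    for name, value in fields.items():
        for sep in (FIELD_SEP, NAME_SEP):
            if sep in name or sep in value:
                raise ValueError("separator characters must not appear in field data")
    return FIELD_SEP.join(f"{name}{NAME_SEP}{value}" for name, value in fields.items())


def decode_record(text: str) -> dict:
    """Split a stored record back into its named fields."""
    pairs = (item.split(NAME_SEP, 1) for item in text.split(FIELD_SEP))
    return {name: value for name, value in pairs}


record = encode_record({"name": "Smith", "dept": "Computing"})
print(record)                 # name=Smith|dept=Computing
print(decode_record(record))  # {'name': 'Smith', 'dept': 'Computing'}
```

Because each field carries its own name, fields that are absent from a record can simply be left out, which is one reason record lengths vary.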
What are spanned and unspanned records?
Records must be allocated to disk blocks. A spanned record can span more than one block; an unspanned record cannot.
What is a block? What are the important things to consider for fixed length and variable length records?
A block is the unit of data transfer between disk and memory. The blocking factor is the number of records stored in a block. For fixed-length records this is the block size / the record size (rounded down); the unused space per block is the block size - (blocking factor * record size).
For variable-length records the blocking factor is an average. The number of blocks required for r records is r / blocking factor, rounded up.
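A short worked example of these formulas, assuming unspanned fixed-length records. The block size of 512 bytes and record size of 100 bytes are made-up values for illustration.

```python
import math


def blocking_factor(block_size: int, record_size: int) -> int:
    """Records per block: block size / record size, rounded down."""
    return block_size // record_size


def unused_space(block_size: int, record_size: int) -> int:
    """Bytes left over in each block after the records are stored."""
    return block_size - blocking_factor(block_size, record_size) * record_size


def blocks_needed(num_records: int, bfr: int) -> int:
    """Blocks required for r records: r / blocking factor, rounded up."""
    return math.ceil(num_records / bfr)


bfr = blocking_factor(512, 100)
print(bfr)                       # 5
print(unused_space(512, 100))    # 12
print(blocks_needed(1001, bfr))  # 201
```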
What are the methods for block allocation?
As needed: data scattered all over disk.
Contiguous: file blocks allocated to consecutive disk blocks, fast read, but expansion is problematic.
Linked: each block contains pointer to next block, slower read, easy expansion.
Clusters: clusters of consecutive disk blocks are allocated and the clusters are linked together; clusters are sometimes called segments or extents.
Indexed: index blocks have pointers to actual data blocks.
How are files of unorganised records dealt with?
They use heap files: records are placed in the order they are inserted, with new records added at the end of the file. This makes insertion simple, but searching is linear. Deletion wastes storage space, because it just removes the record from its block without reorganising the block. The file could be sorted, but this is expensive for a large disk file.
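A minimal in-memory sketch of a heap file, under the assumption that each block is a Python list and that a deleted record simply leaves an empty slot (`None`). The class and method names are hypothetical.

```python
class HeapFile:
    """Toy heap file: records are appended in insertion order."""

    def __init__(self, blocking_factor: int):
        self.blocking_factor = blocking_factor
        self.blocks = [[]]  # simulated disk blocks

    def insert(self, record):
        """Append to the last block, starting a new block when it is full."""
        if len(self.blocks[-1]) == self.blocking_factor:
            self.blocks.append([])
        self.blocks[-1].append(record)

    def search(self, predicate):
        """Linear search; returns (block number, slot) or None."""
        for b, block in enumerate(self.blocks):
            for s, record in enumerate(block):
                if record is not None and predicate(record):
                    return b, s
        return None

    def delete(self, predicate) -> bool:
        """Remove the first match, leaving its slot unused."""
        found = self.search(predicate)
        if found is None:
            return False
        b, s = found
        self.blocks[b][s] = None  # space is wasted until the file is reorganised
        return True

    def wasted_slots(self) -> int:
        return sum(record is None for block in self.blocks for record in block)
```

For example, after inserting `"a"`, `"b"`, `"c"` with a blocking factor of 2 and then deleting `"a"`, the first block still occupies two slots but holds only one record.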
How are files of ordered records dealt with?
They are sorted on an ordering field, the key field. This makes reading records in order efficient, finding the next record often doesn’t need a disk access because it is in the same block, search on the ordering field can be binary, and a search involving > or < is efficient. However, inserting and deleting records is expensive, modifying the ordering field requires a delete and an insert, and there is no help for searches on a non-ordering field.
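The search behaviour of an ordered file can be sketched with Python's standard `bisect` module. The list `keys` is a hypothetical stand-in for the ordering field values of the file, held in memory rather than on disk.

```python
from bisect import bisect_left, bisect_right

# Hypothetical ordered file: ordering-field values kept in sorted order.
keys = [3, 8, 15, 21, 34, 55]


def find(key):
    """Binary search for an exact key; returns its position or None."""
    pos = bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return pos
    return None


def greater_than(key):
    """All keys > key: one binary search, then a sequential read."""
    return keys[bisect_right(keys, key):]


print(find(21))          # 3
print(find(4))           # None
print(greater_than(15))  # [21, 34, 55]
```

A search on any other field gets no benefit from this ordering and would fall back to a linear scan.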
How are files of hashed records dealt with?
Hash files make access very fast on the hash field. They need a hash function, typically K mod M, where K is the value of the hash field and M is typically a prime. Insertion and deletion are relatively simple, and equality search is efficient.
Collision resolution is needed. This could be open addressing (try subsequent positions), chaining (place the record in an unused overflow location and add a pointer to it in the occupied location), or multiple hashing (apply a second hash function).
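One of the options above, open addressing, can be sketched as follows with the K mod M hash function. The table size M = 7 and the class name are assumptions for illustration, and deletion is deliberately left out because it needs extra bookkeeping in this scheme.

```python
class OpenAddressTable:
    """Toy hash table using K mod M with open addressing (linear probing)."""

    def __init__(self, m: int = 7):  # M = 7: a small prime, assumed for the example
        self.m = m
        self.slots = [None] * m

    def insert(self, key: int) -> int:
        """Try K mod M, then each subsequent position, wrapping around."""
        start = key % self.m
        for step in range(self.m):
            pos = (start + step) % self.m
            if self.slots[pos] is None or self.slots[pos] == key:
                self.slots[pos] = key
                return pos
        raise RuntimeError("hash table is full")

    def search(self, key: int):
        """Follow the same probe sequence; an empty slot means 'not present'."""
        start = key % self.m
        for step in range(self.m):
            pos = (start + step) % self.m
            if self.slots[pos] is None:
                return None
            if self.slots[pos] == key:
                return pos
        return None


t = OpenAddressTable()
print(t.insert(10))  # 3  (10 mod 7)
print(t.insert(17))  # 4  (17 mod 7 = 3 is taken, so the next position is used)
print(t.search(17))  # 4
```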
How does hashing work for disk files?
Disk space is divided into buckets. The hash function maps a key to a relative bucket number, and the file header maintains a table that maps each bucket number to a disk block address. In static hashing a fixed number of buckets is allocated, which is a drawback for dynamic files.
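A rough model of static hashing for a disk file, with dictionaries standing in for the file header and the disk. The block addresses (starting at 100), the K mod M hash function, and all names are assumptions made for the sketch.

```python
class StaticHashFile:
    """Toy statically hashed file; dictionaries simulate header and disk."""

    def __init__(self, num_buckets: int, first_block: int = 100):
        self.num_buckets = num_buckets  # fixed at creation: static hashing
        # File header: relative bucket number -> disk block address.
        self.header = {b: first_block + b for b in range(num_buckets)}
        # Simulated disk: block address -> records stored in that bucket.
        self.disk = {addr: [] for addr in self.header.values()}

    def block_address(self, key: int) -> int:
        bucket = key % self.num_buckets  # hash to a relative bucket number
        return self.header[bucket]       # header maps it to a block address

    def insert(self, key: int, value) -> None:
        self.disk[self.block_address(key)].append((key, value))

    def lookup(self, key: int):
        """Read only the one bucket the key hashes to."""
        for k, v in self.disk[self.block_address(key)]:
            if k == key:
                return v
        return None


f = StaticHashFile(num_buckets=4)
f.insert(10, "a")
f.insert(14, "b")           # 14 mod 4 = 2, same bucket as 10
print(f.block_address(10))  # 102
print(f.lookup(14))         # b
```

Since `num_buckets` never changes, a growing file packs more and more records into the same buckets, which illustrates the drawback for dynamic files.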
What are indexes?
Indexes are additional files on the disk that allow records to be accessed without affecting their physical placement on disk. They consist of an indexing field and a list of pointers to disk blocks.
Every index record has a unique index key field, which is used to physically order the index records on disk. Binary search on the index is much faster than on the data file, as the index file is much smaller.
What are single-level ordered indexes?
A file can have at most one physical ordering field, so it can have only one of the following: a primary index (specified on the ordering key field of an ordered file, so each value is unique) or a clustering index (specified on an ordering field that is not a key, so more than one record may have the same value). A file can have a primary index or a clustering index, but not both. A secondary index is specified on any non-ordering field, and several of these can exist.
What are some important things about primary indexes?
A primary index is an ordered file of fixed-length records with two fields: the value of the ordering key field and a pointer to a data block. The number of index entries = the number of disk blocks used by the data file, with only the block anchor of each block being referenced.
A dense index has an index entry for every record; a sparse (nondense) index has index entries for only some records. A primary index is sparse.
Moving records may change anchor records, meaning reorganisation can be required.
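A small sketch of a primary index lookup, where each index entry holds the key of a block's anchor record (the first record of the block) and the block number. The sample data and names are hypothetical, and Python lists stand in for disk blocks.

```python
from bisect import bisect_right

# Hypothetical ordered data file: each block holds records sorted by key.
blocks = [
    [(1, "Adams"), (4, "Baker")],
    [(7, "Clark"), (9, "Davis")],
    [(12, "Evans"), (15, "Frank")],
]

# Primary index: one (anchor key, block number) entry per data block.
index = [(block[0][0], n) for n, block in enumerate(blocks)]
anchor_keys = [key for key, _ in index]


def lookup(key):
    """Binary search the index, then scan the single block it points to."""
    pos = bisect_right(anchor_keys, key) - 1  # last anchor key <= search key
    if pos < 0:
        return None  # key is smaller than every anchor
    block = blocks[index[pos][1]]
    for k, value in block:
        if k == key:
            return value
    return None


print(len(index))  # 3 (one entry per block, not per record)
print(lookup(9))   # Davis
print(lookup(5))   # None
```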
What is an anchor record/block anchor?
The first record of a block.