Organisation and structure of data flashcards
Paper 2
fixed length record 5 facts
- easier to program and search, as every record has the same allocation of memory and space
- easier to process, as the start and end locations are known: the position of record n can be calculated directly, as shown in the sketch after this list
- binary search can be used to locate records
- wastes space if the data is smaller than the field
- truncates fields if the data is too large
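A minimal Python sketch of why fixed-length records allow direct access, assuming each record is a 4-byte integer ID plus a 20-byte name field (the file name and layout here are hypothetical):

```python
import struct

RECORD_FORMAT = "<i20s"                        # 4-byte int + 20-byte name field
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)   # every record is the same size

def write_record(f, index, rec_id, name):
    f.seek(index * RECORD_SIZE)        # start location is directly calculable
    # struct pads short names with null bytes (wasted space) and
    # truncates names longer than 20 bytes (truncated fields)
    f.write(struct.pack(RECORD_FORMAT, rec_id, name.encode()))

def read_record(f, index):
    f.seek(index * RECORD_SIZE)
    rec_id, name = struct.unpack(RECORD_FORMAT, f.read(RECORD_SIZE))
    return rec_id, name.rstrip(b"\x00").decode()

with open("students.dat", "w+b") as f:
    write_record(f, 0, 1, "Ada")
    write_record(f, 1, 2, "Grace")
    print(read_record(f, 1))           # jump straight to record 1, no scanning
```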
variable length record 5 facts
- a different number of bytes for each record
- slow to search, as the marker at the end of each record must be found before the next record can be located
- if the file is updated, it all needs to be rebuilt
- can only use linear search
- better if records are different sizes, as no space is wasted (see the sketch after this list)
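A minimal Python sketch of variable-length records, assuming a newline acts as the end-of-record marker (the file name and field layout are hypothetical):

```python
def append_record(path, fields):
    with open(path, "a", encoding="utf-8") as f:
        f.write(",".join(fields) + "\n")   # record is only as long as its data

def find_record(path, key):
    # there is no way to jump to record n: scan marker by marker (linear search)
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split(",")
            if fields[0] == key:
                return fields
    return None

append_record("people.txt", ["1", "Ada"])
append_record("people.txt", ["2", "Grace Hopper"])  # different sizes, no waste
print(find_record("people.txt", "2"))
```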
master file 5 facts
- large file that is sorted into key order
- data is updated from the transaction file
- data is searched regularly and used for information by staff
- data is not always completely up to date
- holds permanent data
transaction file 3 facts
- serial file, stored in the order the data is submitted
- used as a temporary file before the data is added to the master file, usually for a short amount of time
- serially searched (slow)
updating the master file process
usually occurs overnight (see the sketch after these steps)
1. the transaction file is sorted into the same order as the master file
2. data is copied from the old master to the new master until the point where data from the transaction file is needed
3. that data is copied from the transaction file to the new master file
4. this is repeated until all data from the transaction file is in its correct location
5. an error log is generated at the end of the process
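A minimal Python sketch of the overnight update, assuming one record per line with the primary key first, and treating every transaction as an insertion (all file names are hypothetical; real systems also handle updates and deletions):

```python
def update_master(old_path, trans_path, new_path, log_path):
    # 1. sort the transaction file into the same order as the master file
    with open(trans_path) as f:
        trans = sorted(line.rstrip("\n") for line in f if line.strip())
    with open(old_path) as f:
        master = [line.rstrip("\n") for line in f]

    errors = []          # validation is omitted in this sketch, so this stays empty
    with open(new_path, "w") as new:
        t = 0
        for record in master:
            # 2. copy from the old master until a transaction record is due
            while t < len(trans) and trans[t] < record:
                new.write(trans[t] + "\n")   # 3. copy from the transaction file
                t += 1
            new.write(record + "\n")
        # 4. repeat until every transaction is in its correct location
        for record in trans[t:]:
            new.write(record + "\n")

    # 5. the error log is generated at the end of the process
    with open(log_path, "w") as log:
        log.write("\n".join(errors))
```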
serial files
records stored in chronological order (the order they were added)
must be linearly searched (slow)
adding records is fast, as the new record is just appended after the last record (see the sketch below)
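A minimal Python sketch of a serial file: appending is fast, but lookup must scan every record (the file name is hypothetical):

```python
def add(path, record):
    with open(path, "a") as f:       # fast: appended straight after the last record
        f.write(record + "\n")

def search(path, target):
    with open(path) as f:            # slow: records are in chronological order,
        for record in f:             # so every record may have to be checked
            if record.rstrip("\n") == target:
                return True
    return False
```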
sequential files
stores records in order of primary key
fast to locate a specific record - binary search can be used
when a new record is added, a new file is made and records are copied across, with the new record inserted when the appropriate location is reached (see the sketch below)
the same applies when a record is deleted (the deleted record is simply not copied across)
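A minimal Python sketch of inserting into a sequential file by rewriting it, assuming one record per line sorted on a numeric primary key (file names and layout are hypothetical):

```python
import os

def insert_record(path, new_record, key=lambda r: int(r.split(",")[0])):
    inserted = False
    with open(path) as old, open(path + ".tmp", "w") as new:
        for record in old:
            record = record.rstrip("\n")
            # copy across until the appropriate key position is reached
            if not inserted and key(new_record) < key(record):
                new.write(new_record + "\n")
                inserted = True
            new.write(record + "\n")
        if not inserted:                     # new record belongs at the end
            new.write(new_record + "\n")
    os.replace(path + ".tmp", path)          # the new file replaces the old one

# deletion works the same way: copy every record except the one being deleted
```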
indexed sequential files
records split into 2 components: the index (primary key and pointer) and the bulk of the record
primary keys are stored in order, each linked to a pointer which identifies where on the disk the rest of the record can be found
keys are added and deleted in the same manner as in a sequential file, but it is faster as only the index needs to be copied - the pointers remain the same
speeds up searching, as less of the file needs to be searched (see the sketch below)
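A minimal Python sketch of an indexed sequential file: the index pairs each primary key with a pointer (here a byte offset) to the rest of the record (the file layout is hypothetical):

```python
def build_index(path):
    index = {}                       # primary key -> pointer (byte offset)
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            key = line.split(b",")[0].decode()
            index[key] = offset
    return index

def fetch(path, index, key):
    with open(path, "rb") as f:
        f.seek(index[key])           # follow the pointer: only the small index
        return f.readline().decode().rstrip("\n")   # is searched, not the data
```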
multilevel index
same as an index, but the top-level index doesn't contain pointers to the records: it points to second-level indexes, which contain the pointers to the records (or to a 3rd level, etc.)
useful when different clusters of records are stored in different physical locations (see the sketch below)
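A minimal Python sketch of a two-level index: the top level maps key ranges to second-level indexes, which hold the actual record pointers (all keys and offsets here are hypothetical):

```python
second_level_a = {"ADAMS": 0, "BAKER": 64}       # key -> disk offset
second_level_b = {"NEWTON": 128, "TURING": 192}

top_level = [                          # highest key held by each second-level index
    ("M", second_level_a),             # keys up to "M" live in the first index
    ("Z", second_level_b),             # keys up to "Z" live in the second index
]

def lookup(key):
    for highest_key, second_level in top_level:   # search only the small top level,
        if key <= highest_key:
            return second_level.get(key)          # then one second-level index
    return None

print(lookup("TURING"))   # -> 192
```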
direct file access
records are stored and retrieved according to either their disk address or their relative position within the file
this means the program which stores and retrieves the records must specify the address within the file - a hashing algorithm is applied to the key to generate the start position of the block, and the block can then be serially searched to find the required record (see the sketch below)
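A minimal Python sketch of direct file access, assuming fixed 64-byte blocks addressed by hashing the key (the block size, toy hash, and record layout are all hypothetical):

```python
BLOCK_SIZE = 64
NUM_BLOCKS = 100

def block_start(key):
    # the hashing algorithm applied to the key gives the block's start position
    return (sum(key.encode()) % NUM_BLOCKS) * BLOCK_SIZE

def find(path, key):
    with open(path, "rb") as f:
        f.seek(block_start(key))                 # jump straight to the block
        block = f.read(BLOCK_SIZE)
    for record in block.split(b";"):             # then search the block serially
        if record.startswith(key.encode() + b","):
            return record.decode()
    return None
```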
properties of a good hashing function
deterministic
uniformity
data normalisation
continuity
non-invertible
deterministic hash function
when given the same key, it should always produce the same result
uniformity hash function
keys should be spread evenly over the available block range, each block with the same probability, to reduce the number of records hashing to the same block
data normalisation hash function
keys should be normalised before being hashed (e.g. making all characters lower case), so that equivalent keys hash to the same value (see the sketch below)
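A minimal Python sketch of normalising a key before hashing, so keys differing only in case or spacing hash deterministically to the same block (the hash itself is a hypothetical toy example):

```python
NUM_BLOCKS = 100

def normalise(key):
    return key.strip().lower()       # one canonical form for equivalent keys

def hash_key(key):
    key = normalise(key)
    return sum(key.encode()) % NUM_BLOCKS   # deterministic: same key, same block

print(hash_key("Smith") == hash_key("  SMITH "))   # -> True
```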
continuity hash function
keys that differ by a small amount should result in hash values that also differ only by a small amount (useful when similar keys should be stored close together)
non-invertible hash function
it should not be possible to reconstruct the original key from its hash value