Indexing Flashcards
true or false: each index is associated with a search key
true
true or false: search keys can consist of a set rather than a single value
true
true or false: multiple records cannot have the same search key value
false
true or false: you can have multiple indexes on a file, each with its own search key
true
an index will have index entries, data entries and data records
what are the differences
the index entries will filter and provide data entries that meet the index’s search key. You will be given a collection of data entries that hold/points to the data records depending on which of the 3 alternatives you have for keeping data entries
Data entries have 3 alternative structures, what are they and describe them?
- <k, data record> (by value)
- no need to follow pointers
- you have the full data stored in the data entry
- at most one index on a given collection of data records can use this method, otherwise data records are duplicated and there will be redundant storage and potential inconsistencies - <k, rid> (by reference)
- data entries are typically smaller than data records, therefore better than alt1 with large records - <k, list of rids> (by list of references)
- more compact than alt2 but leads to variable sized data entries even if search keys are fixed length
indexed files can be sorted or hashed, how does that affect the way k* (data entries) are stored
k* are sorted by k for sorted indexed files and hashed by k for hashed indexed files
what does it mean to be a unique index
Search key contains a candidate key (a key that can uniquely identify a record)
ie. we only return one record with this index
sorted index files: what is a clustered vs unclustered index, define the differences
For sorted index, data entries k* are always sorted by search key k
If data records are also sorted by k it is a clustered index, otherwise, unclustered
index
Note
- A file has at most one clustered index (a table can have only one because the physical order of the data can only be in one way)
- Alternative 1 leads to clustered index
between clustered and unclustered indexes, which will have the lowest I/O
clustered, because data records having same k are likely found on the same page
what is the difference between dense and sparce
dense: there is at least one data entry for every record
sparce: every record does not have its own data entry. One index per page
why does sparce indexed always need to be clustered
because you don’t have a data entry point to everything, you need to be able to find records without pointers, so their position needs to be implied, which can only be done if its sorted
true or false: alternative 1 leads to a dense index
true
can a file have more than one sparce index, why or why not
no, because you need to be sorted for a sparce indexes to work, but you can’t be sorted in 2 ways
what is a composite search key
a search key that contains more than one field
for a sorted indexed file why does binary search not work when doing a range search
because we don’t have sequential storage
how do you do a range search with a sorted indexed file, what is the point of the values and pointers
create an index file to search indexes,there will be values and pointers, depending on if you are greater or less than the value you wil follow a specifc pointer.
in a ISAM tree index file, what is an overflow page
if all leafs are full you will add new info in the overflow pages
these pages will not be sorted