Week 7 Flashcards

Question

Example on Hashing functions

Answer 1

* One common hash function is: h(K) = K mod M
* Returns the remainder of the integer hash field value K after division by M
* Remember M is the number of slots in the hash table, so the operation ensures h(K) returns a result between 0 and M-1
* The value h(K) is used as the address for the record
* Requires an integer value to calculate
* All binary data can be represented as an integer, e.g. character strings can use the ASCII codes of their characters
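A minimal sketch of the division method, to make the range 0 to M-1 concrete. The helper name key_to_int and the choice to read a string's ASCII bytes as one large integer are assumptions for illustration; the notes only say that character codes can be used.

```python
def key_to_int(key):
    """Turn a hash field value into an integer.

    Integers are used as-is. Anything else is converted to a string and
    its ASCII bytes are read as one big base-256 number (an assumed scheme).
    """
    if isinstance(key, int):
        return key
    return int.from_bytes(str(key).encode("ascii"), "big")


def h(key, M):
    """Division-method hash: h(K) = K mod M, a slot number in 0 .. M-1."""
    return key_to_int(key) % M


print(h(1234, 10))  # 1234 mod 10 -> 4
print(h("AB", 7))   # (65 * 256 + 66) mod 7 -> 4
```

Both calls happen to land on slot 4 for different M, which is a coincidence of the sample values.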

Answer 2

Folding:
* Apply arithmetic functions such as addition, or logical functions such as exclusive OR, to different portions of the hash field value to calculate the hash address
* Or simply pick some digits of the hash field value, e.g. the 2nd, 4th and 6th digits, to form a hash address
* The hash address should be computationally cheap to calculate
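A rough sketch of the three variants above on a decimal key. The two-digit chunk size, the final mod M reduction and the function names are assumptions made for the example, not part of the notes.

```python
def _chunks(key, size):
    """Split the decimal digits of key into pieces of `size` digits."""
    digits = str(key)
    return [int(digits[i:i + size]) for i in range(0, len(digits), size)]


def fold_add(key, M, size=2):
    """Fold by addition: add the pieces, then reduce to 0 .. M-1."""
    return sum(_chunks(key, size)) % M


def fold_xor(key, M, size=2):
    """Fold by exclusive OR: XOR the pieces, then reduce to 0 .. M-1."""
    result = 0
    for part in _chunks(key, size):
        result ^= part
    return result % M


def pick_digits(key, positions=(2, 4, 6)):
    """Form an address from selected digits (positions are 1-based)."""
    digits = str(key)
    return int("".join(digits[p - 1] for p in positions))


print(fold_add(123456, 100))  # (12 + 34 + 56) mod 100 -> 2
print(fold_xor(123456, 100))  # (12 ^ 34 ^ 56) mod 100 -> 22
print(pick_digits(123456))    # digits 2, 4, 6 -> 246
```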

Answer 3

* Occur when multiple values of the hashing field result in the same output from the hashing function
* More likely to occur when the number of possible slots M is small compared to the possible number of hash field values
* An ideal hashing function will minimise collisions whilst still being cheap to calculate
* A strategy is needed to handle (resolve) collisions

Answer 4

* Open addressing
  * Proceed from the already occupied position specified by the hash
  * Check subsequent positions in order
  * Until an unused position is found
  * Logically simple to implement
  * But if the number of collisions is large, hash table performance begins to degrade
* Multiple hashing
  * Apply a second hash function if the first results in a collision
  * If another collision occurs, use open addressing, or apply a third hash function and then use open addressing
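Open addressing can be sketched as below, using h(K) = K mod M as the hash. Wrapping around from the last slot to slot 0 and the function names are assumptions for the example; the notes only say to check subsequent positions in order.

```python
def insert_open_addressing(table, key):
    """Place key in the first unused slot at or after h(key); return the slot."""
    M = len(table)
    start = key % M
    for step in range(M):
        pos = (start + step) % M  # wrap around at the end (assumed)
        if table[pos] is None:
            table[pos] = key
            return pos
    raise RuntimeError("hash table is full")


def search_open_addressing(table, key):
    """Follow the same probe sequence; an unused slot means the key is absent."""
    M = len(table)
    start = key % M
    for step in range(M):
        pos = (start + step) % M
        if table[pos] is None:
            return None
        if table[pos] == key:
            return pos
    return None


table = [None] * 7
print(insert_open_addressing(table, 10))  # 10 mod 7 = 3, free -> 3
print(insert_open_addressing(table, 17))  # 17 mod 7 = 3, occupied -> 4
print(insert_open_addressing(table, 24))  # 24 mod 7 = 3, then 4, then -> 5
```

Three keys with the same hash already need three probes for the last one, which is the degradation the card warns about.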

Answer 5

* Chaining
  * In addition to our main locations 0 … M-1
  * Extend the array with a number of overflow positions
  * Additionally, a pointer field is added to each record location
  * A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address to the address of the overflow location
  * Thus a linked list of overflow locations is maintained
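A small sketch of chaining with an overflow area, assuming h(K) = K mod M and that new records are linked onto the end of the chain. The class name, the [record, pointer] layout and the simple next_free counter are illustrative choices, not taken from the notes.

```python
class ChainedTable:
    def __init__(self, M, overflow_size):
        self.M = M
        # Each location is [record, pointer]; the pointer is the index of
        # the next location in the chain, or None at the end of the chain.
        self.slots = [[None, None] for _ in range(M + overflow_size)]
        self.next_free = M  # first unused overflow position

    def insert(self, key):
        pos = key % self.M
        if self.slots[pos][0] is None:
            self.slots[pos][0] = key
            return pos
        if self.next_free >= len(self.slots):
            raise RuntimeError("overflow area is full")
        # Walk to the last location in this chain.
        while self.slots[pos][1] is not None:
            pos = self.slots[pos][1]
        new_pos = self.next_free
        self.next_free += 1
        self.slots[new_pos][0] = key
        self.slots[pos][1] = new_pos  # link the overflow location in
        return new_pos

    def search(self, key):
        pos = key % self.M
        while pos is not None:
            if self.slots[pos][0] == key:
                return pos
            pos = self.slots[pos][1]
        return None


t = ChainedTable(M=5, overflow_size=3)
print(t.insert(3))   # main location 3
print(t.insert(8))   # collides at 3 -> overflow location 5
print(t.insert(13))  # collides at 3 -> overflow location 6
print(t.slots[3])    # [3, 5]: record 3 points at overflow location 5
```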

Answer 6

* Gives us a very rapid lookup mechanism
  * Where we are using equality for search
* Doesn’t work for range conditions, e.g. SELECT * WHERE X > Value

Answer 7

* This is hashing for disk files
* Target address space is made of ‘buckets’
  * A bucket is one disk block or a cluster of contiguous disk blocks
  * Each bucket holds multiple records
* Hashing function maps a key into a relative bucket number
  * It does not assign an absolute block address
* A bucket table maintained in the file header converts the bucket number into the corresponding disk block address
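The two-step lookup (key to relative bucket number, bucket number to block address) might look like this. The block addresses in BUCKET_TABLE are made-up values and K mod M is an assumed hash function.

```python
# Hypothetical bucket table from the file header:
# position = relative bucket number, value = disk block address (made up).
BUCKET_TABLE = [4096, 8192, 1024, 12288]


def bucket_number(key, M):
    """The hash function only yields a relative bucket number in 0 .. M-1."""
    return key % M


def block_address(key, bucket_table):
    """Translate the relative bucket number through the bucket table."""
    return bucket_table[bucket_number(key, len(bucket_table))]


print(bucket_number(10, len(BUCKET_TABLE)))  # 10 mod 4 -> bucket 2
print(block_address(10, BUCKET_TABLE))       # bucket 2 -> block 1024
```

Because records only store the relative number, a bucket can be moved on disk by updating one bucket table entry.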

Answer 8

* Less severe due to use of buckets
  * As many records as will fit in the bucket can all hash to the same bucket
* It is possible that a bucket will fill to capacity and a new record will hash to a full bucket
* Collision resolution: use a variation of chaining

Answer 9

* Remove the record from the bucket
* If the bucket has an overflow chain, can move one of the overflow records into the main bucket
* If the record to be deleted is already in an overflow bucket, simply remove it from the linked list
* Requires keeping track of empty overflow locations

Answer 10

* Searching for a record using a non-hash field is as expensive as for an unordered file
* Modifying the hash field
  * May require moving the record to a different bucket
  * This requires deletion of the old record plus the creation of the modified record

Answer 11

* External hash scheme described so far is static
* Fixed number of buckets M allocated
  * M = number of buckets for the address space
  * n = maximum number of records that can fit in one bucket
* Maximum records possible in the allocated space
  * M × n
  * Assuming records are distributed equally across all buckets
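A quick check of the capacity arithmetic, and of why the equal-distribution assumption matters. The function names and the sample figures are invented for the example.

```python
def max_records(M, n):
    """Capacity of a static hash file: M buckets of n records each."""
    return M * n


def overflow_records(bucket_counts, n):
    """Records that do not fit in their own bucket of capacity n."""
    return sum(max(0, count - n) for count in bucket_counts)


print(max_records(M=100, n=20))            # 2000
print(overflow_records([3, 3, 3], n=3))    # evenly spread, file full -> 0
print(overflow_records([5, 2, 2], n=3))    # same total, skewed -> 2
```

The last two calls store nine records in a file with capacity nine, but the skewed case already overflows.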

Answer 12

Left with a lot of unused space

Answer 13

There will be a large number of collisions (long linked lists for overflow)

Answer 14

It ties us to a fixed and static size of M buckets
1. Doesn't cope well with dynamic files
   a. Not a problem if we know in advance, with some confidence, how many records we will need to store
   b. In many (most?) real-world scenarios we do not know in advance
2. We may need to change M
   a. Implies we need a new hashing function
   b. Implies record redistribution
   c. THIS IS EXPENSIVE!

Answer 15

* Easier to expand or shrink a file (compared with static hashing)
* Takes advantage of the fact that:
  * The result of applying a hash function is always a non-negative integer
  * It can be represented as a binary number

Answer 16

* A directory consisting of an array of 2^d bucket addresses is maintained
* Does not require a distinct bucket for each of the 2^d directory entries
* Several directory entries with the same first d’ bits may contain the same bucket address
  * Records that hash to these locations fit in a single bucket

Answer 17

* d is called the global depth of the directory
* The integer value corresponding to the first (high-order) d bits of the hash value is used as an index into the array to determine the directory entry
* The address in the entry determines in which bucket the corresponding records are stored
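Taking the first d bits of a hash value can be sketched with a right shift. Here b is the number of bits in a hash value; the 8-bit hash, the sample directory and the bucket labels are assumptions for illustration.

```python
def directory_index(hash_value, d, b):
    """Integer formed by the first (high-order) d bits of a b-bit hash value."""
    return hash_value >> (b - d)


# Global depth d = 2 gives 2^2 = 4 directory entries.
# Entries 00 and 01 share bucket "A" (its local depth is 1).
directory = ["A", "A", "B", "C"]

hv = 0b01101100                              # example 8-bit hash value
print(directory_index(hv, d=2, b=8))         # first two bits 01 -> 1
print(directory[directory_index(hv, 2, 8)])  # -> A
print(directory_index(0b10110100, d=3, b=8)) # first three bits 101 -> 5
```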

Answer 18

- Called the local depth
- Stored with each bucket
- Number of bits on which bucket contents are based

Answer 19

* Value of d can be increased or decreased by 1 at a time
  * In this event, the number of entries in the directory array is doubled or halved
* Doubling is needed if a bucket with d’ = d overflows
* Halving is needed if (possibly after some deletions) d > d’ for all buckets
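A compact sketch of insertion with bucket splitting and directory doubling. It assumes integer keys, a hash of key mod 2^b, and a bucket capacity of 2; the class and method names are made up, and halving after deletions is left out.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # d' for this bucket
        self.keys = []


class ExtendibleHash:
    def __init__(self, capacity=2, bits=4):
        self.capacity = capacity      # records per bucket
        self.bits = bits              # b: bits in a hash value
        self.global_depth = 0         # d
        self.directory = [Bucket(0)]  # 2^0 = 1 entry

    def _index(self, key):
        hash_value = key % (1 << self.bits)  # assumed hash function
        return hash_value >> (self.bits - self.global_depth)

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        if bucket.local_depth == self.global_depth:
            if self.global_depth == self.bits:
                raise RuntimeError("no hash bits left to split on")
            # Double the directory: old entry i becomes entries 2i and 2i+1.
            self.directory = [b for b in self.directory for _ in range(2)]
            self.global_depth += 1
        self._split(bucket)
        self.insert(key)  # retry now that there is more room

    def _split(self, bucket):
        depth = bucket.local_depth + 1
        low, high = Bucket(depth), Bucket(depth)
        for i, b in enumerate(self.directory):
            if b is bucket:
                # Bit number `depth` (from the left) of the directory index.
                bit = (i >> (self.global_depth - depth)) & 1
                self.directory[i] = high if bit else low
        # Only the records of the one overflowing bucket are redistributed.
        for k in bucket.keys:
            self.directory[self._index(k)].keys.append(k)


eh = ExtendibleHash()
for k in (0b0001, 0b1001, 0b1100, 0b1110):
    eh.insert(k)
print(eh.global_depth)                      # 2
print([b.keys for b in eh.directory])       # [[1], [1], [9], [12, 14]]
print(eh.directory[0] is eh.directory[1])   # True: entries 00 and 01 share a bucket
```

The shared bucket keeps local depth 1 while the global depth is 2, so it can later split without doubling the directory again.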

Answer 20

1. Performance does not degrade as the file grows
   a. In static hashing, collisions increase and overflow chains become large
2. No space is allocated for future growth
   a. It's all handled dynamically with additional buckets
3. Space overhead for the directory is negligible
   a. Maximum directory size is 2^b, where b is the number of bits in the hash value
4. Splitting causes minor reorganisation in most cases
   a. Only records in one bucket are redistributed to two new buckets

Answer 21

1. Reorganisation is expensive when the directory needs to be doubled or halved
2. Two block accesses are required
   a. Directory must be searched before accessing buckets
Overall performance penalty is minor (desirable choice for dynamic files)

Answer 22

WEEK 7 LECTURE 2 SLIDE 42-62

(47 cards)