DBMS - External Hashing Flashcards

1
Q

Basic/Old Idea of Hashing

A

Decide on hash key, size of hash file, hash function. Hash the source data and place it in the file of RRN location indexed by the hash function.

To search for a record, hash its key and directly access record in hash file.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Hash address collision

A

hash addresses can collide

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Hash key space

A

shouldnt be much larger than hash file address space

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Hashing allows us to retrieve

A

a record on the basis of key value throughout use of a hashing function.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Hashing Uses and Issues

A
  • Routine converts key value to int that corresponds to an address location in a hash file where record’s key values and pointers are kept

-Locations generated appear to be random

  • With hashing, no of keys could be mapped into same address/location (collision)
  • Another issue is exhaustion of pre-booked space
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

External hashing

A

Deals with hashing as a file indexing structure

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Internal Hashing

A

Deals with main memory structures

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

If keyi > keyj

A

It is not necessary that h(keyi)>(keyj) unless h() is monotonic

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Perfect hash function

A

Has no duplicate hashes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Minimal perfect hash function

A

has as many keys as hashes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Hash Function Examples

A
  • Represent key in numerical form
  • Prime Division
  • Truncation
  • Folding
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Truncation

A

If key is 123456789 then truncation (of last 3 digits) will return an address of 789

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Folding

A

Works by splitting the key into two and operating on these, so 1234 and 456 will add to an address of 1690

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Prime Division

A

Say n is our file address size, divide key by a prime just greater than n (skewed dist)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Represent Key in numerical form

A

convert alphanumerics to numbers and apply function to a key to produce address

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Hashing Collision Management

A

Caused because actual hash key distribution is not uniform. Size of hash index address space range i.e size of file.

17
Q

Conflict Resolution

A

Overflow Area and Placement Management. If cells in a home address are taken then remaining synonyms are spanned over overflow area.

We can also use double hashing or a linked list to use multiple keys in a hash.

18
Q

Home address

A

Has fixed size. Promising solution is to address extendable address space and hashing function

19
Q

Operations over an external hashed file structure

A
  • Create hash file (incl. overflow area) with initial no. of buckets, loading ratios etc,
  • Insert records in bucket
  • Compute address (key value) in memory
  • Get from address into bucket
  • Put an address into bucket

All these ops are very fast, unless overflow occurs, they are direct. 70-90% of loading can still operate with low level of collisions.

20
Q

Hash structure characteristics

A
  • Space: Pre-booking of space seems counter productive. Optimal system performance at 66% capacity.
  • Operation’s Time: Searching, inserts. Sequential traversal not implicitly supported.
  • Parameters: Bucket space, hash function distribution, hash key dist. in indexed dataset.
21
Q

Advantages of Hashing

A

Very Fast

22
Q

Disadvantages of Hashing

A

Hash files cannot be read out in record-sequence since they are stored in address-sequence.
Hash files grow substantially in size without redesign of hashing routine or address size sequence increase

23
Q

Improvements

A
  • Extensible hashing with better overflow management
  • Multidimensional hashing
24
Q

Making Hashing Disk Friendly

A
  • Address space is to represent disk addressing, use RRN
  • Space at address is called the bucket, can be a single disk block or a bunch of blocks
  • Hash address collisions possible even with large buckets and low loading
  • If h() is monotonic, we can sequential scan
  • With static addressing we cant have a scalable index, but with extendible addressing we can.
25
Q

RRN

A

Relative Record Number

26
Q

Making Hashing Dynamic

A
  • h() that grows with key address space
  • hash file that grows and shrinks according to demand.
27
Q

Solution for hashing

A

Extendible Hashing

28
Q

Extendible Hashing, Basic Technique

A

Introduce search structure that is
1) additional to hash file
2) search structure entries are all based on (or part) of key entry’s hash

29
Q

How to define extendible hashing function

A
  • Select a hashing function ex. prime, folding
  • Start with a bucket b0
  • Eventually b0 overflows
  • We may need more space
30
Q

What if we need more space in extensible hashing

A

Bucket that overflows must be split in 2 by increasing local bucket depth by 1. Directory must be adjusted, first double its entries, then adjust global depth.

31
Q

What to look for in hashing function

A

Ensure it can return a huge range. Compute hashing on key value and return a left string of the hash address generated.

32
Q

What happens when the bucket eventually overflows in extendible hashing

A

Create new bucket b1 and share index entries with b0. Create a directory structure with two bit pattern entries 0 and 1, with each directory entry having a pointer to the respective bucket.

Also mark depth of directory as the max of all local depths. This is called the global depth.

33
Q

Hash indexes can only handle

A

Simple equality comparisons.

34
Q

Query Planner

A

consider using a hash index whenever an indexed column is involved in a comparison using the = operator

35
Q

Command to create a hash index in postgres

A

CREATE INDEX name ON table USING hash(column);

36
Q

Cases to use external hashing

A

group by queries, joins, distinct keywords, indexing on attributes.

37
Q

Where is internal hashing used (ie. DBMS)

A

Intermediate IO ops, General use, Function name look up, Expressions look-up