Hashtables Flashcards
a bad example of a way to hash phone numbers
using the first three digits
a good example of a way to hash phone numbers
last three deigits
what does the hashCode() method return
an int
what is an equality requirement of hash function
x.equals(y)
then x.hashCode() == y.hashCode()
what is highly desirable when it comes to equality and the hashCode() function
if !x.equals(y)
then
x.hashCode != y.hashCode()
what is the deafault implementation of hashCode
using memory address of the object
if a null object is sent into the hashCode function what should be returned
0
if an array is sent into the hashCode function how is it hashed
each array entry is hashed
the int returned must be between
-2^31 and 2^31 -1
how to convert if the key is not in the range
get abs value
what is the uniform hashing assumption
each key is equally likely to has to an integer between 0 and M-1
after how many insertions do we expect two keys in the same index
sq root (pi*M /2)
how many keys do we need to put in for all the indexes to be used
MlgM insertions
what do we expect the height of the most loaded index to ve
logM / loglogM
two methods for dealing with collisions
linear probing
separate chaining
what is a collision
when two keys hash at the same index
what is the separate chaining approach to dealing with collisions
have a linked list in every index of the array
when keys are inserted in the linked list implementation, where are they inserted, at the head or tail
head
what is the cost of inserting at the head of the linked list
constant
how to search for key in separate chaining approach
go to hash index
search through linked list
worst case search using linked list approach
length of linked list
searches every node in the list
what will the length of the longest linked list be
logM / loglogM
if the number of indexes is too large in comparison to the number of items we want to hash, what is the problem
too many empty chains
if the number of indexes is too small in comparison to the number of items we want to hash, what is the problem
chains are too long
what is the best ratio of items to indexes for best run time
items/indexes = 4
when should the hash array be resized in separate chaining
doubled when n/m >= 8
halved when n/m <= 2
remember to rehash all keys when resizinf
how do delete a key in separate chaining hash table
delete key from linked list
what is linear probing also known as
open addressing
how does linear probing work
when a key collides, find the next empty slot and put it in there
how does search work in linear probing
search where the hash index is, if not there try the next and so on
why is clustering common in linear probing
new keys are likely to be added in clusters as it will continuosly try the next space until a free one is found
what is a requirement to use the linear probing algorithms
must always be one emmpty slot in the table
which is better
many small clusters
few larger clustesr
many small clusters as searching will take less time if a blank space is found sooner
on average how many indexes must you move to find the next empty space when the array is half full
M/2 spaces
on average how many indexes must you move to find the next empty space when the array is full
M spaces
if the number of indexes in the array in linear probing is too large why is this bad
takes up too much space
resizing array parameters for linear probing
double when n/m > 1/2
halve when n.m < 1/4
how to delete keys in linear probing hash table
delete and rehash everything after
why must everything after be rehashed when deleting from a linear probing array
it will break the search function if there is an empty space
what is the worst case scenario of linear probing
all the values are in one big cluster
why is security often talked about with hash tables
values can be chosen in a way that keys will be added to the same index of the array, or same cluster, grinding performance to a halt
why is hashing used
to distribute entries uniformly across an array
by using the key to search for an element, what is the runtime of such a sprt
constant run time
what are the values returned from a hash function called
hash values
hash codes
hashes
what three things are we looking for from a hash function
easy to compute (should not be a huge algorithm in itself)
uniform distribution across hash table
as little collisions as possible
uses of hash tables
database indexing