Hash Table Flashcards
what are the properties of a hash function?
it’s typically one-way and u lose some data when u hash something , it’s repeatable , and deterministic which means that every time i put the same thing in the hash function i expect the outcome to be the same each time
what are hashing rules?
1 - hashing must be deterministic under the same context
2- Two objects that are equal should return the same hash
3- the same hash might be produced by two different objects
what could go wrong in while using hashing?
hash collision
what are the advantages of having a hash table?
it’s extremely fast in retrieving data also in inserting and deleting
what is the name of an entry in a hash table?
bucket
do u use whatever number the hash function produces to be the index?
No, u run it through another equation might be like % the size of the array to make it fit in the array allocated
how to manage collisions?
1- separate chaining:
where each bucket in the collision is actually another array or a linked list of values
Hashing custom class
the default equality behavior of two classes that u check the address pointing to these objects if they are the same but if u want to override the equality method to check for the internal state then u must redefine the hashing mechanism for that object as to hash two of the same objects must return the same hash value
what is the difference between hash table and hash map in java?
hash table is used in multi-threading but with performance cost while hash-map is used in a single threaded application
what is a common problem to solve with hash tables?
1 -phone contact problem where u want to map each name to a phone number and each phone number to a name
2- finding two words that are the same in a document
3- spelling correction to search in a dictionary
4- compilers and interpreters
5- substring search
6- string commanalities
7- file & directory syncronization
9- cryptography
how does hash tables get linked to dropbox or google drive?
as they are distributed storage systems they use hash tables if 3 users upload the same video they store it as one physical entity and just have the 3 users point to that
how does distributed storage system work with the naive way?
u first upload the files then go through each file in the system and compare it bye by byte to see if it matches any other files
the draw backs of this of course that u already uploaded the file and it will be O(N^2)
how does distributed storage system work with the rabkin-karp’s algorithm?
if there is a hash already in the system then the files are the same upload and compare but if the files are different then it doesn’t exist then just upload
and the draw back of this that u will have to upload to compare it to the files already in the system and collisions
how does distributed storage system work with several hashes?
each file before upload is hashed like 5 times then u will actually be sure that the file already exists in the system or not and we only now search with the hash not with the file
what is the solution to instant upload and storage optimization?
hash table with multiple hash functions
what is still the problem with online storage even though we solved it with a hash table?
that we r using only one hash table as millions of user request on the same hash table
what is the problem with hash tables and big data?
very large amount of memory is needed
what are the main operations of a hash tables?
insert , delete , search
what are the obstacles in hash tables?
1 - they keys are not integers
2 - gigantic memory hog
what is the solution to the keys that are not integers?
prehashing
what is prehashing?
mapping keys to non negative integers