Hashing with Chaining
https://www.geeksforgeeks.org/separate-chaining-collision-handling-technique-in-hashing/
Simplest way to resolve collisions: Each slot stores a bucket containing 0 or more KVPs.
- A bucket could be implemented by any dictionary realization (even another hash table).
- The simplest approach is to use unsorted linked lists for buckets. This is called collision resolution by chaining.
- search(k): Look for key k in the list at T[h(k)]; apply MTF.
- insert(k, v): Add (k, v) to the front of the list at T[h(k)].
- delete(k): Perform a search, then delete from the linked list.
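The three operations above can be sketched as follows. This is a minimal illustration, not the course's reference implementation; the slot count M and the hash function `hash(k) % M` are assumed placeholder choices.

```python
class ChainingHashTable:
    """Hashing with chaining: each slot holds an unsorted list of (k, v) pairs."""

    def __init__(self, M=11):
        self.M = M
        self.table = [[] for _ in range(M)]  # M buckets, each an unsorted list

    def _bucket(self, k):
        # hash(k) % M stands in for the hash function h(k)
        return self.table[hash(k) % self.M]

    def search(self, k):
        bucket = self._bucket(k)
        for i, (key, value) in enumerate(bucket):
            if key == k:
                # MTF heuristic: move the found pair to the front of its bucket
                bucket.insert(0, bucket.pop(i))
                return value
        return None

    def insert(self, k, v):
        # Add (k, v) to the front of the list at T[h(k)] -- O(1)
        self._bucket(k).insert(0, (k, v))

    def delete(self, k):
        bucket = self._bucket(k)
        for i, (key, _) in enumerate(bucket):
            if key == k:
                bucket.pop(i)
                return True
        return False
```

Note that insert never fails: a slot that is already occupied just grows its list.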
Hashing with chaining does not care about the load factor α, that is, the ratio n/M: each slot contains a list, and we just add at the front of the list even if the slot is already occupied. We never need to rehash, so there is no insertion failure.
Example in lecture notes CS240 - Data Structure and Data Management.
Run-times:
- insert takes O(1) time; O(1) is the expected cost for hashing with chaining for lookup, insert and delete (according to piazza).
- search and delete have run-time O(1 + size of bucket T[h(k)]) → what is this? The worst-case runtime? (Waiting on piazza response.)
- O(1 + α) is the average-case runtime, since the average bucket size is α.
The average bucket size is α = n/M (α is called the load factor).
In separate chaining, search and delete have average-case cost O(1 + n/M), where n is the number of keys and M is the number of buckets, which is O(1 + α). This holds only under the uniform hashing assumption!!!
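A quick sanity check on the load-factor definition (the numbers n = 20, M = 8 are hypothetical, and `k % M` stands in for a hash function): the average bucket size is n/M = α no matter how the keys are distributed.

```python
n, M = 20, 8
buckets = [0] * M
for k in range(n):
    buckets[k % M] += 1          # k % M stands in for the hash function h(k)
alpha = n / M                    # load factor: 20 / 8 = 2.5
average_bucket_size = sum(buckets) / M
assert average_bucket_size == alpha
```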
Note
This does not mean the average-case cost of search and delete is O(1 + α)!
(If all keys hash to the same slot, the average bucket size is still α, but operations will take Θ(n).)
Does that mean, under the uniform hashing assumption, that the expected running time of search and delete is O(1)???? Yes, because α is in the order of a constant (since we maintain α ∈ O(1)), so you'll never have search or delete be faster than constant runtime. It is 1 + α, not just α, because α can be less than 1; if α < 1, then O(α) alone would suggest sub-constant time. But the average case of search is obviously not sub-constant, so we add the 1 to ensure that we're correctly representing the idea of an average-case constant runtime.
Under uniform hashing, each key lands in a given slot with probability 1/M, and there are n such keys that could possibly collide with that slot, so the expected bucket size is n/M = α.
Why does rehashing cost Θ(n + M)????? Is it because initializing the new hash table is Θ(M), and copying every item over to the new hash table is Θ(n), since each insert has expected runtime O(1)? And when we combine them together we get Θ(n + M)??
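That reasoning can be made concrete with a sketch (assuming front-of-list O(1) inserts as above, with `hash(k) % new_M` as a stand-in hash function): allocating the new table costs Θ(new_M), and re-inserting all n stored pairs costs Θ(n), for Θ(n + new_M) total.

```python
def rehash(table, new_M):
    """Rebuild a chaining hash table with new_M buckets."""
    new_table = [[] for _ in range(new_M)]   # Theta(new_M) initialization
    for bucket in table:                     # visits each of the n stored pairs once
        for (k, v) in bucket:
            new_table[hash(k) % new_M].insert(0, (k, v))  # O(1) per pair
    return new_table
```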
Summary: If we maintain α ∈ Θ(1), then (under the uniform hashing assumption) the expected cost for hashing with chaining is O(1) for lookup, insert and delete, and the space is Θ(n).
If α ∉ Θ(1), then the expected cost is not O(1).
Questions
What is the worst-case number of comparisons we make when we search for the same key k twice in a row in a hash table that uses chaining with a linked list and the MTF heuristic?
- My reasoning is that in the worst case, we find the slot in the table in O(1) time, and we have to traverse and compare against the whole linked list since the key is at the end. That costs n comparisons. After the first search, MTF moves the key to the front, so when we search again it's already at the front of the list: 1 comparison. So n + 1 comparisons in total.
Do we require a load factor α ≤ 1?
- No, not in separate chaining, since this collision technique uses linked lists to handle multiple elements that hash to the same index. As a result, the hash table can handle a load factor greater than 1, that is, the ratio of the number of elements stored to the size of the table.
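To see this, here is a toy check (the counts n = 10 and M = 4 are made up, and `k % M` again stands in for h(k)): with more keys than slots, every key is still retrievable.

```python
M = 4
table = [[] for _ in range(M)]
for k in range(10):
    table[k % M].insert(0, (k, k * k))   # k % M stands in for h(k)
alpha = 10 / M                           # load factor 2.5 > 1, and chaining is fine
assert alpha > 1
assert all(any(key == k for key, _ in table[k % M]) for k in range(10))
```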