Difficulty assessment:

(*) Basic. You should be able to easily do this.

(**) Standard. Core material to test your understanding.

(***) Advanced. Possibly slightly above the level of the assignments.

**Exercise 1 ( ** ):**

(a) You have a universe of 30 numbers $\{0, 1, \ldots, 29\}$ and a hash-table of size $10$. Which hash-function is better, $h_1(k) = k \mod 10$, or $h_2(k) = \lfloor k/10\rfloor$?

**Solution:** The first hash-function selects the last digit of a number as the hash value, and the second selects the first digit. The first hash function is better because its image covers the whole hash-table, whereas the image of the second hash function, $\{0, 1, 2\}$, is only a subset of
the hash-table. Moreover, the first hash-function distributes the hash-values evenly over
the hash table, so if the keys drawn from the universe are random, there will be few
collisions.
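This can be checked directly by tabulating both functions over the universe; a small sketch (the variable names are illustrative):

```python
# Tabulate h1(k) = k % 10 and h2(k) = k // 10 over the universe {0, ..., 29}
# to compare how evenly each spreads the keys over a table of size 10.
from collections import Counter

universe = range(30)
h1_counts = Counter(k % 10 for k in universe)
h2_counts = Counter(k // 10 for k in universe)

print(sorted(h1_counts.items()))  # every slot 0..9 receives exactly 3 keys
print(sorted(h2_counts.items()))  # only slots 0, 1 and 2 are hit, 10 keys each
```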

(b) How many different hash functions are there from a universe of size $n$ to the integers $1, \ldots, m$?

**Solution:** The number of functions that map from a universe of size $n$ to the integers
$1, \ldots, m$ is $m^n$, as each of the $n$ elements can independently map to any one of the $m$ integers.
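For small values, this count can be verified by brute force; a sketch with the hypothetical sizes $n = 3$, $m = 4$:

```python
# Enumerate every function from an n-element universe to {1, ..., m}:
# a function is just one independent m-way choice per universe element.
from itertools import product

n, m = 3, 4
functions = list(product(range(1, m + 1), repeat=n))
print(len(functions))  # 64, which equals m ** n = 4 ** 3
```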

**Exercise 2 ( ** ):**

(a) Draw the hash table of length $m = 16$ resulting from hashing the keys $8, 12, 40, 13, 88, 45, 29, 20, 23$, and $77$, using the hash function $h(k) = (3k + 1) \mod 16$ and assuming collisions are handled by chaining. Is $m = 16$ a good choice for the size of a table? Why or why not?

**Solution:** The table below shows the keys inserted into a hash-table of size $16$; all slots not listed are empty (chains shown in insertion order):

| Slot | Chain |
| --- | --- |
| 5 | 12 |
| 6 | 23 |
| 8 | 13, 45, 29, 77 |
| 9 | 8, 40, 88 |
| 13 | 20 |

For the given hash function, $m = 16$ is a bad choice for the size of a hash table because it is a power of $2$: since $(3k + 1) \mod 16$ depends only on $k \mod 16$, the hash value is determined entirely by the lower $4$ bits of the key. In general, when the table size is $m = 2^p$ and the hash function is based on the division method, the hash values depend only on the lower $p$ bits, which may lead to more collisions.

(b) Repeat the exercise with a hash table of size $m = 13$, using the hash function $h(k) = (3k + 1) \mod 13$. Is $m = 13$ a better choice?

**Solution:** The table below shows the keys inserted into a hash-table of size $13$; all slots not listed are empty (chains shown in insertion order):

| Slot | Chain |
| --- | --- |
| 1 | 13 |
| 4 | 40 |
| 5 | 88, 23 |
| 6 | 45 |
| 9 | 20 |
| 10 | 29 |
| 11 | 12, 77 |
| 12 | 8 |

$m = 13$ is a better choice for the size of a hash table when using the division method to create a hash function, because it is prime and far enough from a power of two: the hash values then depend on all bits of the key, not only the lower ones.
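Both bucketings can be reproduced with a few lines of Python; a small sketch (the printed statistics are just one way to compare the two table sizes):

```python
# Bucket the keys from the exercise under h(k) = (3k + 1) % m for both table
# sizes and compare how many slots are used and how long the chains get.
keys = [8, 12, 40, 13, 88, 45, 29, 20, 23, 77]

stats = {}
for m in (16, 13):
    table = [[] for _ in range(m)]
    for k in keys:
        table[(3 * k + 1) % m].append(k)
    occupied = sum(1 for chain in table if chain)
    longest = max(len(chain) for chain in table)
    stats[m] = (occupied, longest)
    print(f"m={m}: {occupied} slots used, longest chain {longest}")
```

The size-13 table spreads the ten keys over more slots with shorter chains than the size-16 table, matching the discussion above.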

(c) Now insert the keys in the same order into a hash table of size $m = 13$ using open addressing with linear probing, $h(k, i) = (h'(k) + i) \mod 13$, where $h'(k) = (3k + 1) \mod 13$. How many unsuccessful probes are made in total?

**Solution:**

The keys $\{8, 12, 40, 13, 88, 45, 29, 20\}$ hash to pairwise distinct slots, so they cause $0$ unsuccessful probes. Key $23$ probes the occupied slots $5$ and $6$ before landing in slot $7$, and key $77$ probes the occupied slots $11$ and $12$ before landing in slot $0$: $2$ unsuccessful probes each. The total number of unsuccessful probes is $4$.

(d) Repeat (c) with double hashing, $h(k, i) = (h'(k) + i \cdot h_2(k)) \mod 13$, where $h_2(k) = 1 + (k \mod 11)$.

**Solution:**

Again, the keys $\{8, 12, 40, 13, 88, 45, 29, 20\}$ cause $0$ unsuccessful probes. Key $23$ (with $h_2(23) = 2$) probes the occupied slot $5$ before landing in slot $7$: $1$ unsuccessful probe. Key $77$ (with $h_2(77) = 1$) probes the occupied slots $11$ and $12$ before landing in slot $0$: $2$ unsuccessful probes. The total number of unsuccessful probes is $3$.

**Exercise 3:**
Describe algorithms for Hash-Chain-Insert, Hash-Chain-Delete, and Hash-Chain-Search for a hash table with chaining, where the (doubly-linked) lists are stored in a sorted order. Analyze the running time of the algorithms.

**Solution:**

(i) Hash-Chain-Insert: instead of inserting $x$ at the head of $T[h(x.key)]$, we need to find the proper position of $x$ in the list. So, we compare the elements of $T[h(x.key)]$ one by one with $x$ until we reach the end of the list or until we find an element $y$ such that $y.key > x.key$. Then, we insert $x$ in front of $y$, or at the end of the list if no such $y$ was found.

*Running time analysis:* The worst-case running time of Hash-Chain-Insert is
$\Theta(n)$: if all $n$ elements hash into the same slot and $x$ is the maximum element,
then the whole list needs to be traversed before $x$ can be inserted. On average, if the
load factor of the hash table is $\alpha$, the running time of Hash-Chain-Insert is
$\Theta(1 + \alpha)$.

(ii) Hash-Chain-Search: search for an element with key $k$ in the list $T[h(k)]$. We can use the same algorithm as for the non-sorted lists. We can improve the algorithm slightly by stopping the search as soon as we find a $y$ with $y.key > k$, since in a sorted list no later element can have key $k$. The running time of both variants is $\Theta(1 + \alpha)$.

(iii) Hash-Chain-Delete: delete element $x$ from the list $T[h(x.key)]$. The algorithm is the same as for the non-sorted lists: deleting $x$ from a sorted list preserves the sorted order of the remaining elements. The running time is $O(1)$ if the lists are doubly linked.
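The three operations can be sketched as follows. This is a simplified illustration (class and method names are ours) that uses Python lists in place of doubly-linked lists, so the delete here costs time proportional to the chain length rather than the $O(1)$ of a true linked node:

```python
# Sorted-chaining sketch: each slot holds a chain kept in sorted key order.
# Python lists stand in for the doubly-linked lists of the exercise.
class SortedChainHashTable:
    def __init__(self, size):
        self.size = size
        self.table = [[] for _ in range(size)]  # one independent chain per slot

    def h(self, key):
        return key % self.size  # division method, for illustration only

    def insert(self, key):
        chain = self.table[self.h(key)]
        i = 0
        # Walk the chain until an element larger than key is found.
        while i < len(chain) and chain[i] <= key:
            i += 1
        chain.insert(i, key)  # in front of the larger element, or at the end

    def search(self, key):
        for y in self.table[self.h(key)]:
            if y == key:
                return y
            if y > key:       # sorted order: key cannot appear further on
                break
        return None

    def delete(self, key):
        chain = self.table[self.h(key)]
        if key in chain:
            chain.remove(key)  # order of the remaining elements is preserved
```

For example, inserting $25, 5, 15$ into a table of size $10$ places all three in slot $5$, and the chain there is kept as $[5, 15, 25]$ regardless of insertion order.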

**Exercise 4:** Consider two sets of integers, $S = \{s_1, s_2, \ldots, s_m\}$ and $T = \{t_1, t_2, \ldots, t_n\}$, $m \leq n$. Describe an algorithm that uses a hash table of size $m$ to test whether $S$ is a subset of $T$, and that runs in $O(n)$ time. You may assume the existence of a suitable hash function that satisfies the assumption of simple uniform hashing.

**Solution:**

Let $H$ be the hash table of size $m$ and $h()$ be a corresponding uniform hash function. The algorithm is the following: First, insert all the elements of $T$ into $H$. Next, for each element $s_i$ in $S$ search for $s_i$ in $H$. If for some $s_i$ the search was unsuccessful (i.e. returns None), then $s_i \not \in T$, and thus $S \not \subset T$. Otherwise, report that $S \subset T$.

*Note: You could, but don't need to, add Python code here to clarify the algorithm.* Python code for this algorithm is given below.

*Correctness:*
If $S \subset T$, then for all elements of $S$ the search will return the corresponding
element of $T$ from the hash table. Therefore all searches are successful, and the algorithm will correctly report that $S \subset T$.

If $S \not \subset T$, then there will be an element $s_i$ from $S$ such that $s_i \not \in T$. Then, searching for $s_i$ in the hash table will return None, and the algorithm correctly reports that $S \not \subset T$.

*Running Time:*
Inserting the $n$ elements of $T$ into the hash table (with chaining) takes $n \cdot O(1) = O(n)$ time. Searching takes $O(1+\alpha)$ expected time per element, where $\alpha = n/m$ is the load factor. Thus, searching for the $m$ elements of $S$ takes $m \cdot O(1+n/m) = O(m + n)$ time. Since $m \leq n$, the algorithm runs in $O(m + n) = O(n)$ time overall.

In [1]:

```
# Solution code:
# A simplistic hash table for the purpose of this exercise. It stores bare keys
# (instead of elements with keys).
class HashTableChaining:
    def __init__(self, size):
        self.size = size
        # One independent list per slot. Note that [[]] * size would create
        # `size` references to the *same* list.
        self.hashTable = [[] for _ in range(size)]

    def h(self, key):
        # Division method. Note that depending on the table size, this might
        # not be a good hash function, but it serves as an example here.
        return key % self.size

    def insert(self, key):
        self.hashTable[self.h(key)].append(key)

    def search(self, key):
        # Since we store bare keys, this search routine simply returns the key
        # if it finds it, and None otherwise.
        if key in self.hashTable[self.h(key)]:
            return key
        return None

def isSubset(S, T):
    m = len(S)
    n = len(T)
    H = HashTableChaining(m)
    for t in T:
        H.insert(t)
    for s in S:
        # Compare against None explicitly: a key of 0 is falsy but present.
        if H.search(s) is None:
            return False
    return True

isSubset([1, 2, 3], [1, 4, 2, 3])
```

Out[1]:

```
True
```
