In these notes we will have a closer look at the algorithms from the lecture. We started off with *factorial*.

The factorial of a natural number $n$ can be defined recursively by

$$n! = n \cdot (n-1)!\;.$$

It seems easy to directly translate this into code:

In [25]:

```
def factorial(n):
    return n * factorial(n-1)
```

It should come as no surprise that this attempt fails miserably:

In [26]:

```
factorial(5)
```

We forgot the base case. The complete recursive definition is

$$n! = \begin{cases} 1 &\mbox{if } n = 1 \\
n \cdot (n-1)! & \mbox{if } n > 1\end{cases}$$

So the corresponding code reads

In [27]:

```
def factorial(n):
    if (n==1): return 1
    else: return n * factorial(n-1)
```

In [28]:

```
factorial(5)
```

Out[28]:

```
120
```

The key point is that each recursive call must *reduce* the problem (of computing $n!$) to a simpler problem (namely computing $(n-1)!$), or more specifically, we need to make progress towards the base case. Now computing factorials might not yet seem impressive, since factorial is simple to compute with a loop. A non-recursive solution to the *Towers of Hanoi* is not so obvious. But the recursive solution is.
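As an aside (not part of the lecture code), the loop-based version mentioned above could look as follows:

```python
def factorial_iterative(n):
    # Compute n! with a simple loop; assumes n >= 1.
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

print(factorial_iterative(5))  # 120
```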

In [29]:

```
def Hanoi(src, tmp, dst, n):
    # move n disks from src (source) to dst (destination) using tmp as temporary peg
    if (n>0):
        Hanoi(src, dst, tmp, n-1)
        moveDisk(src, dst)
        Hanoi(tmp, src, dst, n-1)
```

We model the pegs as lists (stacks): we take the top disk off a peg by calling *pop*, and place a disk on a peg by calling *append*.

In [30]:

```
def moveDisk(src, dst):
    dst.append(src.pop())
```

In [31]:

```
A = [4,3,2,1]
B = []
C = []
Hanoi(A, B, C, 4)
print("A =", A, " B =", B, ", C =", C)
```
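As a small aside (not from the lecture): the recursive structure also tells us how many moves are made. Solving $n$ disks requires two subproblems of size $n-1$ plus one direct move, so the number of moves satisfies $M(n) = 2M(n-1) + 1$, which solves to $2^n - 1$. A quick sketch to check this:

```python
def hanoi_moves(n):
    # Moves made by the recursive solution: two subproblems of
    # size n-1 plus one direct move of the largest disk.
    if n == 0:
        return 0
    return 2 * hanoi_moves(n - 1) + 1

print(hanoi_moves(4))  # 15, matching 2**4 - 1
```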

In [32]:

```
def binary_search(A, v, x, y):
    if (x<y):
        h = (x+y)//2
        if (A[h]<v): return binary_search(A, v, h+1, y)
        else: return binary_search(A, v, x, h)
    else:
        if (A[x] == v): return x
        else: return -1
```

In [33]:

```
binary_search([1,4,5], 4, 0, 3)
```

Out[33]:

```
1
```

In [34]:

```
def binary_search(A, v, x=0, y=None):
    if (y is None): y = len(A)-1
    if (x<y):
        h = (x+y)//2
        if (A[h]<v): return binary_search(A, v, h+1, y)
        else: return binary_search(A, v, x, h)
    else:
        if (A[x] == v): return x
        else: return -1
```

In [35]:

```
binary_search([1,4,5], 4)
```

Out[35]:

```
1
```

In [36]:

```
def sum_slice(A):
    if (len(A)==1): return A[0]
    else:
        return A[0] + sum_slice(A[1:])
```

In [37]:

```
def sum1(A, first = 0):
    if (first==len(A)-1): return A[first]
    else:
        return A[first] + sum1(A, first+1)
```

Instead of slicing, *sum1* passes the whole array along with an index *first* where it starts. Let us check that these algorithms do the same, and then time them.

In [38]:

```
sum_slice(range(6)) # 0+1+2+3+4+5 = 5*6/2 = 15
```

Out[38]:

```
15
```

In [39]:

```
sum1(range(6))
```

Out[39]:

```
15
```

In [40]:

```
A = list(range(900))
%timeit sum_slice(A)
```

In [41]:

```
%timeit sum1(A)
```
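One reason for the difference (an aside, not from the lecture): slicing a Python list creates a new copy of the elements, so each call of *sum_slice* does linear work on top of its recursive call, giving quadratic total time, whereas *sum1* only passes an index. A small demonstration that a slice is a copy:

```python
# Modifying a slice leaves the original untouched, since xs[1:] copies.
xs = [1, 2, 3]
ys = xs[1:]
ys[0] = 99
print(xs, ys)  # [1, 2, 3] [99, 3]
```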

In [42]:

```
def sum2(A, first=0, last=None):
    if (last is None): last = len(A)-1
    if (first == last): return A[first]
    else:
        mid = (first + last)//2
        return sum2(A, first, mid) + sum2(A, mid+1, last)
```

In [43]:

```
sum2(range(6))
```

Out[43]:

```
15
```

In [44]:

```
%timeit sum2(A)
```

To prove the correctness of a recursive algorithm we use mathematical induction. In a mathematical induction we want to prove a statement $P(n)$ for all natural numbers $n$ (possibly starting at some $n_0$, but let's assume for simplicity that we are proving the statement for all $n \ge 1$).

In an induction we will typically prove this by

1.) proving $P(n)$ for a base case (sometimes several base cases), i.e., proving that $P(1)$ holds, and then

2.) proving that if $P(n)$ holds for some $n$ (this is the induction hypothesis), then $P$ also holds for the next natural number, i.e., that $P(n+1)$ holds.

For a recursive algorithm the statement $P(n)$ commonly is something like: the algorithm provides correct output for all inputs of size $n$. But we might need to be more careful. Let's briefly consider the algorithm *sum_slice*. In this case the size of the input indeed decreases with every recursive call. But for *sum1* the situation is already more intricate. It is not the actual input size that decreases, but the size of the subarray *A[first..len(A)-1]*. Thus for *sum1* we can take the size of this subarray as induction variable $n = len(A)-first$. Now a correctness proof for *sum1* could go as follows.

The algorithm *sum1* computes the sum of numbers in the array *A[first..len(A)-1]*.
We prove the correctness of the algorithm *sum1* by induction on $n = len(A)-first$.

Base case ($n=1$): If $n=1$ then $first == len(A)-1$. Thus the array contains one element, $A[first]$, and the sum of element(s) is also simply $A[first]$. This is exactly what the algorithm returns in this case (in line 2). Therefore the algorithm works correctly for $n=1$.

Induction hypothesis ($n$): The algorithm correctly computes the sum of numbers for $n = len(A)-first$.

Induction step ($n \rightarrow n+1$): Assume the induction hypothesis holds for $n$. We need to prove that it then also holds for $n+1$, that is, the algorithm works correctly if $len(A)-first = n+1$. Suppose we have an input (A, first) with $len(A)-first = n+1>1$. The algorithm returns A[first] + sum1(A, first+1). The recursive call is on a range of length $len(A)-(first+1) = (len(A)-first) -1 = (n+1) -1 =n$. So by the induction hypothesis $sum1(A, first+1)$ correctly computes the sum of the numbers in the array $A[first+1..len(A)-1]$. Thus the algorithm returns this sum plus A[first], which is the sum of the numbers in $A[first..len(A)-1]$. Thus, the induction hypothesis again holds (for $n+1$).

We can conclude that the algorithm is correct, and that it computes the sum of numbers in array A when called with $first =0$.

Induction worked nicely in the case above, but often the size of the problem we recurse on does not reduce by exactly 1. In this case we need to set up the induction slightly differently. In the previous induction we assumed that the statement $P(n)$ was true for some $n \ge 1$ and from there proved that it is true for $n+1$, i.e., that $P(n+1)$ is true. As a side note: we could instead of course have assumed that the statement $P(n-1)$ is true for some $n > 1$ and then have proved that $P(n)$ is true. But at the point where we want to prove that $P(n)$ holds, we can actually assume that it already holds for all $m < n$. This gives the following proof scheme:

1.) proving $P(n)$ for a base case (sometimes several base cases), i.e., to prove that P(1) holds, and then

2.) proving that if $P(m)$ holds for all $m < n$ (this is the induction hypothesis), then also $P(n)$ holds.

This type of induction proof is also called *strong induction*. With this we can proceed to prove the correctness of the algorithms *sum2* and *binary_search*.

The algorithm *sum2* computes the sum of numbers in the array *A[first..last]*.
We prove the correctness of the algorithm *sum2* by induction on $n = last-first + 1$. *(Note: I added 1 here to have $n$ correspond to the number of elements in the array)*.

Base case ($n=1$): If $n=1$ then $first == last$. Thus the array contains one element, $A[first]$, and the sum of element(s) is also simply $A[first]$. This is exactly what the algorithm returns in this case (in line 3). Therefore the algorithm works correctly for $n = 1$.

Induction hypothesis ($m$): The algorithm correctly computes the sum of numbers for inputs (A, first, last) with $m = last-first+1$.

Induction step ($m < n \rightarrow n$): Assume the induction hypothesis holds for all $1 \le m < n$. We need to prove that it then also holds for $n$, that is, that the algorithm works correctly if $last-first+1 = n$. Suppose we have an input (A, first, last) with $last-first + 1 = n>1$. The algorithm returns $sum2(A, first, mid) + sum2(A, mid+1, last)$ with $mid=(first + last)//2$. Since $n>1$ we have $first \le mid < last$, so both recursive calls are on strictly smaller ranges: $mid-first+1 < n$ and $last-(mid+1)+1 = last-mid < n$. By the induction hypothesis both recursive calls correctly compute the sums of the numbers in $A[first..mid]$ and $A[mid+1..last]$, and the algorithm returns their sum, which is the sum of the numbers in $A[first..last]$.

Thus, the induction hypothesis again holds (for $n$).

We can conclude that the algorithm is correct, and that it computes the sum of numbers in array A when called with $first = 0$ and $last = len(A)-1$.

The algorithm *binary_search* returns an index $i$ with $x \le i \le y$ and $A[i] == v$ if such an index exists. Otherwise it returns -1.
We prove the correctness of the algorithm *binary_search* by induction on $n = y-x + 1$. *(Note: $n = y-x$ would work just as well, then the base case would be $n=0$. But to stay consistent with the proof for *sum2* I use $n = y-x + 1$, which is also the number of elements in the subarray.)*

Base case ($n=1$): If $n=1$ then $x == y$. Thus the array contains one element, $A[x]$, and the algorithm returns $x$ if $A[x] == v$ (and -1 otherwise). Thus the algorithm works correctly for $n=1$.

Induction hypothesis ($m$): For inputs with $m=y-x+1$ the algorithm correctly returns an index $i$ with $x \le i \le y$ and $A[i] == v$ if such an index exists (Otherwise it returns -1).

Induction step ($m < n \rightarrow n$): Assume the induction hypothesis holds for all $1 \le m < n$. We need to prove that it then also holds for $n$, that is, that the algorithm works correctly if $y-x+1 = n$. Suppose we have an input (A, v, x, y) with $y-x + 1 = n > 1$, and as in the algorithm define $h = (x+y)//2$. There are two cases.

$A[h] < v$. Since the array is sorted, we know that $v$ cannot be in the subarray $A[x..h]$. Thus, if it is in the array $A[x..y]$, it has to be in the array $A[h+1..y]$. This is exactly what the recursive call in this case checks, if it works correctly. Since $h \ge x$ holds, we have $y-(h+1)+1 < y-x+1 =n$, so by the induction hypothesis, the recursive call is indeed correct.

$A[h] \ge v$. The case is analogous to the first case. If $v$ is in $A[x..y]$ it has to be in the array $A[x..h]$. This is checked in the recursive call, which works correctly by induction hypothesis since $h-x+1 < y-x+1 =n$.

Thus, in both cases the induction hypothesis again holds (for $n$). We can conclude that the algorithm is correct.

Analyzing the running time of a recursive algorithm often consists of the following two steps:

1.) Finding the *recurrence* for the running time

2.) Solving the *recurrence*

To find the recurrence, we need to analyze how long the algorithm takes when not accounting for the recursive calls, and then we need to analyze the cost/size of recursive calls. We consider the examples from above.

The algorithm *factorial* takes constant time plus the time for the recursive call, in which the problem is reduced from $n$ to $n-1$. The running time will be:

$$T(n) = \begin{cases} O(1) &\mbox{if } n = 1 \\
T(n-1) + O(1) & \mbox{if } n > 1\end{cases}$$

This is the *recurrence* for the running time of *factorial*. Now we would need to solve the recurrence. In this case it is not difficult to guess that the recurrence solves to $T(n) = O(n)$, which one could prove formally by induction. Often you will see the base case omitted in recurrences for running times. So the recurrence above would simply be written as

$$T(n) = T(n-1) + O(1)$$

The reason for this is that as soon as we have a problem of 'constant size', we can assume that the algorithm takes constant time on it. Specifically, in this case, the time to compute $1!$ does not depend on $n$, no matter what the algorithm does.
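To make the guess $T(n) = O(n)$ plausible, one can unroll the recurrence, writing $c$ for the constant hidden in the $O(1)$ (a sketch, not a formal proof):

$$T(n) = T(n-1) + c = T(n-2) + 2c = \dots = T(1) + (n-1)\,c = O(n)\;.$$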

The recurrence for the running time of *sum1* is actually the same as for *factorial*, because we spend constant time, and reduce from $n$ to $n-1$. In *sum1* it is less obvious what '$n$' actually is, but that is something we already resolved when thinking about the correctness proof.

Now for binary search, the problem size actually reduces from $n$ to $n/2$. Again we spend constant time, so the recurrence is (we can ignore rounding issues here):

$$T(n) = T(n/2) + O(1)$$

This solves to $T(n) = O(\log n)$: the recursion depth corresponds to how often we can divide $n$ by $2$ until we hit 1, which is $\log n$ (we use $\log$ to denote the logarithm base 2). We only have one recursive call, so we execute the code $\log n$ times, always doing $O(1)$ elementary operations. Therefore $T(n) = O(\log n)$.

Finally, *sum2* has as recurrence

$$T(n) = 2 T(n/2) + O(1)\;.$$

The recursion depth again is bounded by $\log (n)$, but we have more subproblems to solve: first 2, then 4, and so on. Overall this results in a linear number of subproblems, and in $T(n) = O(n)$.
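To see why (a sketch): at recursion depth $i$ there are $2^i$ subproblems, and the depth is at most $\log n$, so the total number of subproblems is

$$\sum_{i=0}^{\log n} 2^i = 2^{\log n + 1} - 1 = 2n - 1\;,$$

each contributing $O(1)$ work besides its recursive calls, hence $T(n) = O(n)$.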

Since we haven't discussed solving recurrences yet, for now I am expecting you to be able to find recurrences and try to guess the running time based on the recurrence.

In [45]:

```
import csv
with open('weather-eindhoven.csv') as csvfile:
    reader = csv.reader(csvfile)
    next(reader) # skip header row
    datatype = int
    date = []
    max_temperature = []
    for row in reader:
        date.append(datatype(row[1])) # column 1 (that is, the second) contains the date
        max_temperature.append(datatype(row[14])/10) # column 14 contains the max temperature times 10
```

In [46]:

```
temperature_and_date = zip(max_temperature, date)
temp_date_sorted = sorted(temperature_and_date)
print("5 coldest daily temperatures (with corresponding date):")
print(temp_date_sorted[0:5])
```

In [47]:

```
temp_sorted = [x for x, y in temp_date_sorted]
date_sorted_by_temp = [y for x, y in temp_date_sorted]
print("5 warmest daily temperatures:")
print(temp_sorted[-5:])
print("The corresponding dates:")
print(date_sorted_by_temp[-5:])
```

In [48]:

```
i = binary_search(temp_sorted, 30)
print(i)
```

In [49]:

```
print("On %s the maximum temperature was %s degrees (Celsius)." % (date_sorted_by_temp[i], temp_sorted[i]))
```

So let's be bold and ask for exactly 40 degrees:

In [50]:

```
binary_search(temp_sorted, 40)
```

Out[50]:

```
-1
```

A few suggestions to explore the data and algorithms further:

(a) Compare the running time of binary search and linear search on this data. Is it as expected?

(b) Only being able to ask for an exact temperature doesn't seem very useful. Adapt the binary search algorithm to answer more interesting questions. For instance: On how many days in the data was the temperature between 25 and 30 degrees? Can you report all of these days?

But now what if we want to know whether in 2011 there was a day with at least 25 degrees? Or if we want to know on which days there was a high temperature and heavy rain?

These queries are not easily answered using binary search, because we are searching for ranges on several attributes (such queries are called *range queries*). Of course linear search (suitably adapted) can still answer these queries, but not very efficiently. Later in the course we will see a data structure which allows us to handle this type of query efficiently.
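For comparison (a sketch, not from the lecture; the function name and the small data set are illustrative), a linear search adapted to a range query over one attribute could look like this:

```python
def linear_range_search(values, dates, lo, hi):
    # Scan every entry and report the dates whose value lies in [lo, hi].
    # This always takes O(n) time per query, which is what the data
    # structures later in the course improve upon.
    return [d for v, d in zip(values, dates) if lo <= v <= hi]

# Small illustrative data set (not the Eindhoven file):
temps = [24.0, 26.5, 30.0, 19.5, 27.0]
days = [20110601, 20110702, 20110715, 20110820, 20110901]
print(linear_range_search(temps, days, 25, 30))  # [20110702, 20110715, 20110901]
```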
