## Exploring weather data: Iterative and recursive algorithms by the example of CountInInterval¶

In this notebook we want to explore weather data of the weather in Eindhoven. Specifically, we will develop a routine CountInInterval(A, p, q) that computes for an array A how many values are between p and q.

In [1]:
import csv

with open('weather-eindhoven.csv') as csvfile:
datatype = int
date = []
max_temperature = []
for row in reader:
date.append(datatype(row[1])) #column 1 (that is, the second) contains the date
max_temperature.append(datatype(row[14])/10) # column 14 contains the max temperature times 10

In [2]:
def countInInterval(A, p, q):
for i in range(0, len(A)):
if A[i]>= p and A[i]<= q: answer = answer+1

In [3]:
print(countInInterval(max_temperature, -5, 5))
print(countInInterval(max_temperature, 20, 30))

1220
2963

In [4]:
%timeit countInInterval(max_temperature, -5, 5)

100 loops, best of 3: 29.6 ms per loop


Assume we want to use the countInInterval method frequently. Then it seems a good idea to invest some time into extra preprocessing, if in turn we can count faster. A simple approach is to sort the temperatures.

In [5]:
max_temperature.sort()


Now lets try to count using a recursive divide&conquer algorithm:

In [6]:
def countInInterval2(A, p, q, x=0, y=None):
if (y == None): y = len(A)-1
if (x<y):
h = (x+y)//2
if (A[h]<p): return countInInterval2(A, p, q, h+1, y)
else: return countInInterval2(A, p, q, x, h)
else:
count = 0
while(A[x]>= p and A[x]<=q and x<len(A)):
count = count+1
x = x+1
return count

In [9]:
countInInterval2(max_temperature, -5, 5)

Out[9]:
1220
In [10]:
%timeit countInInterval2(max_temperature, -5, 5)

100 loops, best of 3: 6.3 ms per loop


This this is faster than the iterative algorithm on the unsorted data, the algorithm above sometimes requires two recursive calls. This can be avoided:

In [11]:
def countInInterval3(A, p, q, x=0, y=None):
if (y == None): y = len(A)-1
if (x<y):
h = (x+y)//2
if (A[h]<p):
return countInInterval3(A, p, q, h+1, y)
else:
if (A[h]>q): return countInInterval3(A, p, q, x, h)
else: return countInInterval3(A, p, q, h+1, y) + countInInterval3(A, p, q, x, h)
else:
if(A[x]>= p and A[x]<=q): return 1
else: return 0

In [12]:
countInInterval3(max_temperature, -5, 5)

Out[12]:
1220
In [13]:
%timeit countInInterval3(max_temperature, -5, 5)

100 loops, best of 3: 18.7 ms per loop


This didn't work out that well. And there is also something unsatisfactory: If many values are counted the algorithm will also take long. We can avoid this. Essentially we just need two binary searches.

In [19]:
def binary_search_lower(A, v, x, y):
if (x<y):
h = (x+y)//2
if (A[h]<v): return binary_search_lower(A, v, h+1, y)
else: return binary_search_lower(A, v, x, h)
else:
return x

def binary_search_upper(A, v, x, y):
if (x<y):
h = (x+y)//2
if (A[h]<=v): return binary_search_upper(A, v, h+1, y)
else: return binary_search_upper(A, v, x, h)
else:
return x

def countInInterval4(A, p, q):
return binary_search_upper(A, q, 0, len(A)-1) - binary_search_lower(A, p, 0, len(A)-1)

In [20]:
countInInterval4(max_temperature, -5, 5)

Out[20]:
1220
In [21]:
%timeit countInInterval4(max_temperature, -5, 5)

10000 loops, best of 3: 85.8 µs per loop

In [ ]: