This week I decided to doing some explorations around dynamic programming and conduct some timing tests to figure out when the run times of the different methods start to diverge drastically.
Quick intro for anyone walking in off the street not familiar with dynamic programming. Dynamic programming is an algorithmic paradigm which breaks down complex problems into smaller, easier to calculate ones. There are two approaches to DP.
Overlapping subproblems. This is similar to a Divide and Conquer paradigm where solutions to smaller subproblems are combined to provide an overall answer.
Optimal Substructure. The basis for using this is if optimal solution of the given problem can be obtained by using optimal solutions of its subproblems.
For my purposes (and the problems I decided to time) I’m just focused on the overlapping subproblems.
In my mind the most intuitive approach is to break the problems recursively.
A recursive algorithm is an algorithm which calls itself with “smaller (or simpler)” input values, and which obtains the result for the current input by applying simple operations to the returned value for the smaller (or simpler) input. source
Memoization takes over where recursion leaves off. Memoization involves caching results so that work isn’t repeated. Memoization is a top down approach.
Tabulation on the other hand is a bottom up approach where the smaller problems are solved first and their solutions are combined.
The two algorithms I decided to time are the Fibonacci sequence and the longest common substring problem.
Recursive code
def fib_rec(n): if n <= 1: return n else: return fib_rec(n-1)+fib_rec(n-2) |
Memoization code
L = [None] * 30 L[0] = 1 L[1] = 1 def fib_memo(n): global L if n == 0 or n == 1: return 1 if L[n-1] is not None: return L[n-1] else: L[n-1] = fib_memo(n-1) + fib_memo(n-2) return L[n-1] |
Iterative approach
def fib_loop(n): i = 1 a=1 b=1 for i in range(n-1): a,b=b,a+b return a |
I first decided to try and find the Fibonacci number for 30. Here are the numbers for that, respectively.
0.491546154022
4.60147857666e-05
1.50203704834e-05
It takes almost half a second to calculate 30 numbers for the recursive solution. Let’s try 33 next.
2.093075037
2.69412994385e-05
1.19209289551e-05
The recursive algorithm has more than quadrupled in timing. Let’s see what the affect is just bumping up by 2 to 35 this time.
5.6231648922
4.29153442383e-05
2.50339508057e-05
Over 5 seconds to calculate fib_rec(35)! Amazing how quickly those calculations add up. Keep in mind the O(n) is 2^n due to the repetitive works. The memoized version and iterative versions have barely budged. There is clearly some more overhead with memoization but once it’s done a calculation, that’s it, it’s linear in efficiency. One final bump to 36 to see the affect of recursion at this threshold. and then predicting what 40 will be and running it to test it.
8.92951416969
5.29289245605e-05
1.71661376953e-05
Based off this i’m guessing 45 seconds to calculate the Fibonacci of 40 recursively.
61.1721940041
0.000101089477539
0.000174999237061
Looks like I underestimated how slow it would be. Really goes to show how the simplest approach isn’t necessarily the best and how adding a simple cache or switching the approach altogether can yield great results. This actually wound up being a longer post than I anticipated so I’ll discuss the longest common substring on a subsequent post.