As a follow up to Dynamic Programming Timing Pt1., I’ve timed the different solutions of another algorithmic problem, the Longest common subsequence. Again the point I’m really curious in is what point do the timings of the different methods diverge and I as a user/programmer/human start to notice.
To start with, what’s the longest common subsequence? LCS Problem Statement: Given two sequences, find the length of longest subsequence present in both of them. A subsequence is a sequence that appears in the same relative order, but not necessarily contiguous. For example, “abc”, “abg”, “bdf”, “aeg”, ‘”acefg”, .. etc are subsequences of “abcdefg”. So a string of length n has 2^n different possible subsequences.
The naive solution for this problem is to generate all subsequences of both given sequences and find the longest matching subsequence. This solution is exponential in term of time complexity.
I will again use the three methods and will adjust the strings
Naive Recursive Method
def LCS(X,Y,m,n): print str(m) + " " + str(n) if m == 0 or n == 0: return 0 elif X[m-1] == Y[n-1]: return 1 + LCS(X,Y,m-1,n-1) else: return max(LCS(X,Y,m-1,n),LCS(X,Y,m,n-1)) |
One minor additional Python exploration will be to test the memoization of using a dictionary vs a list and see whether one is quicker than the other and then hypothesizing why that is.
Memoization with lists
L = [[None] * (len(X)+1) for i in range(len(Y)+1)] longest = "" def LCS_memo(X,Y,m,n): global L global longest if m == 0 or n == 0: return 0 if L[m-1][n-1] is not None: return L[m-1][n-1] else: if X[m-1] == Y[n-1]: L[m-1][n-1] = 1 + LCS_memo(X,Y,m-1,n-1) else: L[m-1][n-1] = max(LCS_memo(X,Y,m-1,n),LCS_memo(X,Y,m,n-1)) return L[m-1][n-1] |
Memoization with Dictionaries
cache = {} def lcs_dict(s1, s2): global cache if len(s1) == 0 or len(s2) == 0: return 0 if (s1, s2) in cache: return cache[(s1, s2)] else: if s1[-1] == s2[-1]: cache[(s1, s2)] = 1 + lcs_dict(s1[:-1], s2[:-1]) else: cache[(s1, s2)] = max(lcs_dict(s1[:-1], s2), lcs_dict(s1, s2[:-1])) return cache[(s1, s2)] |
Iterative Method
def lcs_iter(X , Y): # find the length of the strings m = len(X) n = len(Y) # declaring the array for storing the dp values L = [[None]*(n+1) for i in xrange(m+1)] for i in range(m+1): for j in range(n+1): if i == 0 or j == 0 : L[i][j] = 0 elif X[i-1] == Y[j-1]: L[i][j] = L[i-1][j-1]+1 else: L[i][j] = max(L[i-1][j] , L[i][j-1]) # L[m][n] contains the length of LCS of X[0..n-1] & Y[0..m-1] return L[m][n] |
First test case , which I’m borrowing from geeks for geeks, along with the iterative code.
X = “AGGTAB”
Y = “GXTXAYB”
Recusive solution is 0:00:00.001194
Memoized 2d array solution is 0:00:00.000044
Memoized dict solution is 0:00:00.000059
Iterative dict solution is 0:00:00.000097
Starting to increase the strings in length gradually.
X = AGGTABSMF
Y = GXTXAYBTIW
The LCS is 4
Recusive solution is 0:00:00.019347
Memoized 2d array solution is 0:00:00.000161
Memoized dict solution is 0:00:00.000258
Iterative dict solution is 0:00:00.000148
Trying to see where I can trip up the algos.
X = AGGTABSMF
Y = GXTXAYBTM
The LCS is 5
Recusive solution is 0:00:00.007018
Memoized 2d array solution is 0:00:00.000132
Memoized dict solution is 0:00:00.000229
Iterative dict solution is 0:00:00.000154
And we’ve reached that point fairly quickly.
X = AGGTABAGGTRYQPS
Y = GXTXAYBAYBSDFMT
The LCS is 7
Recusive solution is 0:00:16.634588
Memoized 2d array solution is 0:00:00.000319
Memoized dict solution is 0:00:00.000429
Iterative dict solution is 0:00:00.000243
For this next set the recursive solution took longer than 10 minutes so I gave up the timing and just ran the three quick solutions, which all turned to be sub millisecond. This really shows the power of avoiding recalculations.
X = SDLFSDHFSDFNJZOILKSD
Y = SKWEKJLWERJKLCJKLDSF
The LCS is 6
Memoized 2d array solution is 0:00:00.002030
Memoized dict solution is 0:00:00.000944
Iterative dict solution is 0:00:00.000386
I can’t even imagine how long it would take to calculate a 22 LCS through the reclusive method
X = SDLFSDHFSDFNJZOILKSDFKJDOIFWCSDFWLKWEFJKLSFWE
Y = LKJWRJJKSDKJSDFMNSDFKJSKWEKJLWERJKLCJKLDSFJKA
The LCS is 22
Memoized 2d array solution is 0:00:00.005035
Memoized dict solution is 0:00:00.004494
Iterative dict solution is 0:00:00.002438