Why is this code taking longer time than expected?

164 views Asked by At

Question: Given a list of unsorted elements, we have to find the length of longest consecutive elements sequence.

Expected time complexity: O(N)

for Ex: [4,7,1,100,28,2,3]

output: 4 since the longest consecutive elements sequence is [1,2,3,4] from the above list

View Problem Statement

I've written the code in Python. I've used set and map (defaultdict) to solve this problem.

from collections import defaultdict as maps
class Solution:
    def longestConsecutive(self, nums: List[int]) -> int:
        nums = set(nums); # to remove duplicates
        n = len(nums);
        if n == 0: return 0;
        if n == 1: return 1;
        present = maps(int);
        # Inserting values into map (defaultdict) as key
        for i in nums:
            present[i] = 1; # to check for presence of element
        i = 0; curr = 0; cnt, maxx = 0, 0; visset = {i for i in nums}
        while n > 0:
              #I had to use this while loop since the for-loop below will get interrupted if we make any changes to the iterating set.
            
            cnt = 1;
            for ind in visset:
                curr = ind; n -= 1; 
                visset.remove(ind); break; #remove is O(1) complexity for set
            # values less than curr::
            curr, temp = curr, curr;
            while n > 0:
                if present[curr - 1]:
                    curr -= 1; n -= 1; 
                    visset.remove(curr); cnt += 1; #remove is O(1) complexity for set
                else:
                    break;
            # values greater than curr::
            while n > 0:
                if present[temp + 1]:
                    temp += 1; n -= 1; 
                    visset.remove(temp); cnt += 1; #remove is O(1) complexity for set
                else:
                    break;
            maxx = max(cnt, maxx);
        maxx = max(cnt, maxx);
        return maxx

for more reference and runtime

Can anybody explain why this code's runtime is higher than intended ?

UPDATE: the runtime of this program is around 7000ms and if I replace set with a dictionary and use del dictt[i] instead of remove() in case of set, the runtime comes down to 2000ms

3

There are 3 answers

1
Kenny Ostrom On BEST ANSWER

Yours passes unmodified on leetcode, but it was on the slower end. Here's a simplification of the same code that does a little better than average:

class Solution:
    def longestConsecutive(self, nums: List[int]) -> int:
        present = set(nums)
        curr = 0
        cnt, maxx = 0, 0
        while present:
            cnt = 1
            start = curr = present.pop()
            while start - 1 in present:
                start -= 1
                cnt += 1
                present.remove(start)
            while curr + 1 in present:
                curr += 1
                cnt += 1
                present.remove(curr)
            maxx = max(cnt, maxx)
        return maxx

You did have a lot of unnecessary data structures, and dumplicate maps where the value wasn't even used. I replaced them all with one set.

3
Alain T. On

The problem may stem from for ind in visset. I observed this before (for dictionaries) but never bothered to measure it. There seems to be some overhead involved in iterating over a set after removing elements from it.

from time import time

s = set(range(1_000_000))
x = set(range(1_000_000))
start = time()
for i in range(100_000):
    x.remove(i)
    for i in s: break
print(time()-start)  # 0.017419099807739258


start = time()
for i in range(100_000):
    s.remove(i)
    for i in s: break
print(time()-start)  # 2.6430912017822266

Perhaps a solution without a for-loop would run faster

# using pop to get a random number out of the set
# instead of the for-loop-break
curr = 1
if visset:
   curr = visset.pop()
   n   -= 1

Or a different algorithm altogether to avoid removing items completely:

def longestConsec(nums):
    forward  = dict(zip(nums,nums))
    backward = forward.copy()
    for n in forward:
        f = forward.get(n+1,n)
        b = backward.get(n-1,n)
        forward[b]  = f
        backward[f] = b            
    return max(f-b+1 for b,f in forward.items())
4
JonSG On

It's slow on inputs like list(range(0, 200000, 2)). You do this 100000 times then:

for ind in visset:
    curr = ind; n -= 1; 
    visset.remove(ind); break; #remove is O(1) complexity for set

It's not the removal that's slow but the for loop iterating to find ind, as explained here. Takes quadratic time overall.

Use ind = visset.pop() instead, that's optimized for repeated removal of arbitrary elements. Or you could use a collections.OrderedDict instead of your set. Thanks to its additional linked list, it's fast for your style of find-one-and-remove-it.

Here is an alternate approach you might consider as well.

Building on the fact that we can test if a prior number and next number is in the list, we can build and keep track of runs of items.

def longest_consecutive(numbers):
    numbers = set(numbers)
    best_length = 0
    best_start = -1

    for number in numbers:
        ## ----------------
        ## this run is already/will be accounted for by some real 'start'
        ## ----------------
        if number - 1 in numbers:
            continue
        ## ----------------

        ## ----------------
        ## find consecutive numbers
        ## ----------------
        i = number
        while i in numbers:
            i += 1
        ## ----------------

        ## ----------------
        ## record the streak if better than what we have seen
        ## ----------------
        if i - number > best_length:
            best_length = i - number
            best_start = number
        ## ----------------

    return list(range(best_start, best_start + best_length))

based on timeit this will do lists of a million random integers in about a 1/4 of a second.