Dynamic programming knapsack solution

  • I wrote a solution to the Knapsack problem in Python, using a bottom-up dynamic programming algorithm. It correctly computes the optimal value, given a list of items with values and weights, and a maximum allowed weight.



    Any critique on code style, comment style, readability, and best-practice would be greatly appreciated. I'm not sure how Pythonic the code is; filling in a matrix seems like a natural way of implementing dynamic programming, but it doesn't "feel" Pythonic (and is that a bad thing, in this case?).



    Note that the comments are a little on the verbose side; as I was writing the algorithm, I tried to be as explicit about each step as possible, since I was trying to understand it to the fullest extent possible. If they are excessive (or just plain incorrect), feel free to comment (hah!) on it.



    import sys

    def knapsack(items, maxweight):
    # Create an (N+1) by (W+1) 2-d list to contain the running values
    # which are to be filled by the dynamic programming routine.
    #
    # There are N+1 rows because we need to account for the possibility
    # of choosing from 0 up to and including N possible items.
    # There are W+1 columns because we need to account for possible
    # "running capacities" from 0 up to and including the maximum weight W.
    bestvalues = [[0] * (maxweight + 1)
    for i in xrange(len(items) + 1)]

    # Enumerate through the items and fill in the best-value table
    for i, (value, weight) in enumerate(items):
    # Increment i, because the first row (0) is the case where no items
    # are chosen, and is already initialized as 0, so we're skipping it
    i += 1
    for capacity in xrange(maxweight + 1):
    # Handle the case where the weight of the current item is greater
    # than the "running capacity" - we can't add it to the knapsack
    if weight > capacity:
    bestvalues[i][capacity] = bestvalues[i - 1][capacity]
    else:
    # Otherwise, we must choose between two possible candidate values:
    # 1) the value of "running capacity" as it stands with the last item
    # that was computed; if this is larger, then we skip the current item
    # 2) the value of the current item plus the value of a previously computed
    # set of items, constrained by the amount of capacity that would be left
    # in the knapsack (running capacity - item's weight)
    candidate1 = bestvalues[i - 1][capacity]
    candidate2 = bestvalues[i - 1][capacity - weight] + value

    # Just take the maximum of the two candidates; by doing this, we are
    # in effect "setting in stone" the best value so far for a particular
    # prefix of the items, and for a particular "prefix" of knapsack capacities
    bestvalues[i][capacity] = max(candidate1, candidate2)

    # Reconstruction
    # Iterate through the values table, and check
    # to see which of the two candidates were chosen. We can do this by simply
    # checking if the value is the same as the value of the previous row. If so, then
    # we say that the item was not included in the knapsack (this is how we arbitrarily
    # break ties) and simply move the pointer to the previous row. Otherwise, we add
    # the item to the reconstruction list and subtract the item's weight from the
    # remaining capacity of the knapsack. Once we reach row 0, we're done
    reconstruction = []
    i = len(items)
    j = maxweight
    while i > 0:
    if bestvalues[i][j] != bestvalues[i - 1][j]:
    reconstruction.append(items[i - 1])
    j -= items[i - 1][1]
    i -= 1

    # Reverse the reconstruction list, so that it is presented
    # in the order that it was given
    reconstruction.reverse()

    # Return the best value, and the reconstruction list
    return bestvalues[len(items)][maxweight], reconstruction


    if __name__ == '__main__':
    if len(sys.argv) != 2:
    print('usage: knapsack.py [file]')
    sys.exit(1)

    filename = sys.argv[1]
    with open(filename) as f:
    lines = f.readlines()

    maxweight = int(lines[0])
    items = [map(int, line.split()) for line in lines[1:]]

    bestvalue, reconstruction = knapsack(items, maxweight)

    print('Best possible value: {0}'.format(bestvalue))
    print('Items:')
    for value, weight in reconstruction:
    print('V: {0}, W: {1}'.format(value, weight))


    The input file that it expects is as follows:



    165
    92 23
    57 31
    49 29
    68 44
    60 53
    43 38
    67 63
    84 85
    87 89
    72 82


    The first line contains the maximum weight allowed, and subsequent lines contain the items, represented by value-weight pairs.


    An old post, I know, but if you haven't run into it already `enumerate` allows you to specify the starting index (e.g. `for i,item in enumerate(items,1):` would have `i` begin at 1.

    @SimonT: Fantastic! I've programmed in Python for 2+ years now, and I had never seen `enumerate` used that way. I'll definitely keep it in mind.

  • Gareth Rees

    Gareth Rees Correct answer

    8 years ago

    1. Review




    1. The function knapsack lacks a docstring that would explain what arguments the function takes (what kind of things are in items? must items be a sequence, or can it be an iterable?) and what it returns.


    2. This kind of function is ideal for doctests.


    3. The comments say things like "Create an (N+1) by (W+1) 2-d list". But what is N and what is W? Presumably N is len(items) and W is maxweight, so it would be a good idea to use the same names in the comments and the code:



      N = len(items)
      W = maxweight

    4. The comment above bestvalues fails to explain what the values in this table mean. I would write something like this:



      # bestvalues[i][j] is the best sum of values for any
      # subsequence of the first i items, whose weights sum
      # to no more than j.


      This makes it obvious why \$0 ≤ i ≤ N\$ and why \$0 ≤ j ≤ W\$.


    5. In a loop like:



      bestvalues = [[0] * (maxweight + 1)
      for i in xrange(len(items) + 1)]


      where the loop variable (here i) is unused, it's conventional to name it _.


    6. These lines could be omitted:



      # Increment i, because the first row (0) is the case where no items
      # are chosen, and is already initialized as 0, so we're skipping it
      i += 1


      and then in the rest of the code, use i + 1 instead of i and i instead of i - 1.


    7. The reconstruction loop:



      i = N
      while i > 0:
      # code
      i -= 1


      can be written like this:



      for i in reversed(range(1, N + 1)):
      # code

    8. The code prints an error message like this:



      print('usage: knapsack.py [file]')


      Error messages ought to go to standard error (not standard output). And it's a good idea not to hard-code the name of the program, because it would be easy to rename the program but forget to update the code. So write instead:



      sys.stderr.write("usage: {0} [file]\n".format(sys.argv[0]))

    9. The block of code that reads the problem description and prints the result only runs when __name__ == '__main__'. This makes it hard to test, for example from the interactive interpreter. It's usually best to put this kind of code in its own function, like this:



      def main(filename):
      with open(filename) as f:
      # etc.

      if __name__ == '__main__':
      if len(sys.argv) != 2:
      print('usage: knapsack.py [file]')
      sys.exit(1)
      main(sys.argv[1])


      and now you can run main('problem.txt') from the interpreter to test it.


    10. The code reads the whole of the file into memory as a list of lines:



      lines = f.readlines()


      this is harmless here because the file is small, but it's usually better to process a file one line at a time, like this:



      with open(filename) as f:
      maxweight = int(next(f))
      items = [[int(word) for word in line.split()] for line in f]



    2. Recursive approach



    Any dynamic programming algorithm can be implemented in two ways: by building a table of partial results from the bottom up (as in the code in the post), or by recursively computing the result from the top down, using memoization to avoid computing any partial result more than once.



    The top-down approach often results in slightly simpler and clearer code, and it only computes the partial results that are needed for the particular problem instance (whereas in the bottom-up approach it's often hard to avoid computing all partial results even if some of them go unused).



    So we could use @functools.lru_cache to implement a top-down solution, like this:



    from functools import lru_cache

    def knapsack(items, maxweight):
    """Solve the knapsack problem by finding the most valuable subsequence
    of items that weighs no more than maxweight.

    items must be a sequence of pairs (value, weight), where value is a
    number and weight is a non-negative integer.

    maxweight is a non-negative integer.

    Return a pair whose first element is the sum of values in the most
    valuable subsequence, and whose second element is the subsequence.

    >>> items = [(4, 12), (2, 1), (6, 4), (1, 1), (2, 2)]
    >>> knapsack(items, 15)
    (11, [(2, 1), (6, 4), (1, 1), (2, 2)])

    """
    @lru_cache(maxsize=None)
    def bestvalue(i, j):
    # Return the value of the most valuable subsequence of the first
    # i elements in items whose weights sum to no more than j.
    if j < 0:
    return float('-inf')
    if i == 0:
    return 0
    value, weight = items[i - 1]
    return max(bestvalue(i - 1, j), bestvalue(i - 1, j - weight) + value)

    j = maxweight
    result = []
    for i in reversed(range(len(items))):
    if bestvalue(i + 1, j) != bestvalue(i, j):
    result.append(items[i])
    j -= items[i][1]
    result.reverse()
    return bestvalue(len(items), maxweight), result


    To see how many partial solutions this needs to compute, print bestvalue.cache_info() just before returning the result. When solving the example problem in the docstring, this outputs:



    CacheInfo(hits=17, misses=42, maxsize=None, currsize=42)


    The 42 entries in the cache are fewer than the 96 partial solutions computed by the bottom up approach.


    Thanks for your response (especially points 3 and 9)! I can't believe I forgot to consider the memoized recursive method. I'll apply some of your suggestions later today, hopefully. Thanks again.

    Beware of recursion depth limit issues with this proposed solution.

    @Frank: yes, this solution uses `len(items)` levels of stack.

    @GarethRees Do you need to define `bestvalue()` inside `knapsack()`? I would tend to avoid defining a function inside a function to make things clearer.

    @greendiod: I defined `bestvalue` inside `knapsack` because (i) it's only used there; (ii) it uses `items` from the outer scope (it would be a pain to have to pass this down through all the recursive calls); (iii) it's memoized so it has an associated cache that we don't want to keep around when `knapsack` has returned.

    ```value, weight = items[i - 1]``` present above the return value of ```bestvalue``` function returns an error: ```ValueError: not enough values to unpack (expected 2, got 0)```

    @Saurabh: Check your `items` argument: is it a list of pairs as required?

    @GarethRees: Sorry I got the error because I didn't fully implement your review in the original post.

License under CC-BY-SA with attribution


Content dated before 7/24/2021 11:53 AM