Filtering Values

There will be times when you have a list and you want to remove unwanted values. For example, you might want to remove all the None values from a list.

The Brute Force Method

You probably know that lists have a built-in remove method. It is a good choice when you know there are only a few values to remove:

while None in values:
    values.remove(None)

With a try block you can eliminate the in test:

while True:
    try:
        values.remove(None)  # raises ValueError if the value is not found
    except ValueError:
        break

This is slow because remove searches the list from the beginning each time we call it. Can we do it in a single pass?
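
A rough way to see the cost is to count how many items remove has to examine. The sketch below is only an illustration: the counter `scanned` and the sample data are made up for this example, and the count covers the remove calls alone (the in test repeats the same scan).

```python
values = [1, None, 2, None, 3, None]

scanned = 0  # hypothetical counter, not part of the original recipe
while None in values:
    # remove() starts at index 0 and stops at the first None it finds,
    # so it examines index(None) + 1 items on every call.
    scanned += values.index(None) + 1
    values.remove(None)

print(values)   # [1, 2, 3]
print(scanned)  # 9 items examined to clean a 6-item list
```

The later a None sits in the list, the more items each call has to step over again, which is why the total grows much faster than the list itself.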

Here's how NOT to do it

You might think, "I'll loop through the list just once and delete the None values."

for index, v in enumerate(values):
    if v is None:
        # delete the value the for loop is 'sitting on'
        del values[index]

Bad idea! Deleting an item shifts every later item one position to the left, so the for loop skips the item that follows each deletion and loses track of where it is.

The correct way to filter a list is:
  1. use a list comprehension to create a temporary list containing all the values that are not None
  2. use the slice assignment feature to replace the original values

values[:] = [v for v in values if v is not None]

A comprehension usually runs faster than an explicit for loop and requires fewer lines of code. Most importantly, the comprehension will skip over values without getting confused.
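
To see the difference, here is a small sketch that runs the delete-in-the-loop version and the comprehension on the same data. The names `broken` and `fixed` and the sample values are made up for this example; adjacent None values are used because they expose the skipped items.

```python
# The delete-while-iterating loop, run on adjacent None values.
broken = [1, None, None, 2, None, None, 3]
for index, v in enumerate(broken):
    if v is None:
        # Everything after this position shifts left by one,
        # so the loop never looks at the item that moved into this slot.
        del broken[index]
print(broken)  # [1, None, 2, None, 3]

# The comprehension plus slice assignment on the same data.
fixed = [1, None, None, 2, None, None, 3]
fixed[:] = [v for v in fixed if v is not None]
print(fixed)   # [1, 2, 3]
```

The loop raises no error; it simply leaves some None values behind, which makes the bug easy to miss.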

Generator expressions

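If the list is large, a generator expression may look more efficient. It is similar to a comprehension but yields the values one at a time instead of building a list itself. Be aware, though, that in CPython slice assignment first collects the values from any iterable that is not already a list or tuple, so a temporary sequence is still created and the saving is usually small.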

values[:] = (v for v in values if v is not None)

The filter and ifilter functions

The built-in filter function can also be used, but you need to create a function to test the values.

values[:] = filter(lambda x: x is not None, values)

If the list is very large, Python 2 lets you use itertools.ifilter instead of the built-in filter function. It has the advantage of returning the filtered values one at a time instead of building a list itself. In Python 3 the built-in filter already behaves this way, and itertools.ifilter no longer exists.

import itertools
values[:] = itertools.ifilter(lambda x: x is not None, values)
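
For readers on Python 3, the same idea might look like the sketch below. It assumes Python 3, where filter is lazy; the names `alias` and `more` and the sample data are made up for this example.

```python
from itertools import filterfalse  # named ifilterfalse in Python 2

values = [0, None, "a", None, ""]
alias = values  # a second name for the same list object

# Python 3: filter() returns a lazy iterator, so no ifilter is needed.
values[:] = filter(lambda x: x is not None, values)
print(values)           # [0, 'a', '']
print(alias is values)  # True

# filterfalse keeps the items for which the test is false.
more = [None, 1, None, 2]
more[:] = filterfalse(lambda x: x is None, more)
print(more)             # [1, 2]

# Caution: filter(None, ...) drops every falsy value, not just None.
print(list(filter(None, [0, None, "a", None, ""])))  # ['a']
```

The last line is why the lambda is spelled out: passing None as the test would also throw away 0 and the empty string.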

Why is slice assignment important?

This might be the first time you've seen slice assignment, so you might be wondering why you can't use regular assignment.

It's because a list comprehension returns a new list object. This fact is important if you have multiple variables referring to the same list object.

With regular assignment, the variables "values" and "x" start out referring to the same object but end up referring to different objects.
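
A minimal sketch of the regular assignment case, with sample data made up for this example:

```python
values = [1, None, 2]
x = values  # x and values refer to the same list object

# Regular assignment binds the name "values" to a brand new list.
values = [v for v in values if v is not None]

print(values)       # [1, 2]
print(x)            # [1, None, 2]  (the original list is untouched)
print(values is x)  # False
```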


But with slice assignment they still refer to the same object.
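
The same sketch with slice assignment, again using made-up sample data:

```python
values = [1, None, 2]
x = values  # x and values refer to the same list object

# Slice assignment replaces the contents of the existing list in place.
values[:] = [v for v in values if v is not None]

print(values)       # [1, 2]
print(x)            # [1, 2]  (x sees the change too)
print(values is x)  # True
```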


