In Python, we can use the numpy.where() function to select elements from a numpy array, based on a condition.
Not only that, but we can perform some operations on those elements if the condition is satisfied.
Let’s look at how we can use this function, using some illustrative examples!
This function accepts a condition, which is an array-like object of boolean values (ex. a NumPy array of booleans, or of integers that are interpreted as booleans).
It returns a new numpy array after selecting elements based on that condition.
For example, condition can take the value array([[True, True, True]]), which is a boolean array. (A numeric array works too: NumPy casts it to bool, treating non-zero values as True.)
For example, if condition is np.array([[True, True, False]]) and our array is a = np.array([[1, 2, 3]]), then applying the condition to the array (a[condition]) gives the 1-D array array([1, 2]).
import numpy as np
a = np.arange(10)
print(a[a <= 2]) # Will only capture elements <= 2 and ignore others
Output
[0 1 2]
NOTE: The same condition can also be written as a <= 2. This is the recommended format for the condition array, as writing it out by hand as a boolean array is very tedious.
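To make the note above concrete, here is a minimal sketch comparing a hand-written boolean mask with the comparison expression. The names manual_mask and auto_mask are illustrative only and do not appear in the original example.

```python
import numpy as np

a = np.arange(10)

# Hand-written boolean mask: True for the first three positions only
manual_mask = np.array([True, True, True] + [False] * 7)

# The comparison builds the same mask for us
auto_mask = a <= 2

print(np.array_equal(manual_mask, auto_mask))  # True
print(a[manual_mask])                          # [0 1 2]
```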
But what if we want to preserve the dimension of the result, and not lose out on elements from our original array? We can use numpy.where() for this.
numpy.where(condition [, x, y])
We have two more parameters, x and y. What are those?
Basically, what this says is that if condition holds true for some element in our array, the new array will choose the element from x.
Otherwise, if it’s false, the element from y will be taken.
With that, our final output array will be an array with elements from x wherever condition is True, and elements from y wherever condition is False.
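As a small illustration of this selection rule, the sketch below uses three hand-picked arrays of the same length. The values are made up for the example.

```python
import numpy as np

condition = np.array([True, False, True, False])
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

# Pick from x where condition is True, otherwise from y
result = np.where(condition, x, y)
print(result)  # [ 1 20  3 40]
```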
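Note that although x and y are optional, if you specify x, you MUST also specify y: NumPy accepts either both or neither. When both are given, the output array has the broadcast shape of condition, x, and y, which is simply the shape of the input array when all three share it.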
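The following sketch shows what happens when only x is passed. The exact exception type and message may vary between NumPy versions, so the code catches both ValueError and TypeError as an assumption rather than relying on one of them.

```python
import numpy as np

condition = np.array([True, False, True])
x = np.array([1, 2, 3])

try:
    np.where(condition, x)  # y is missing
    raised = False
except (ValueError, TypeError) as err:
    raised = True
    print(type(err).__name__, err)

print(raised)  # True: NumPy rejects the call
```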
NOTE: The same logic applies to both single and multi-dimensional arrays. In both cases, we filter based on the condition. Also remember that x, y and condition are broadcast together.
Now, let us look at some examples, to understand this function properly.
Suppose we want to keep only the positive elements of a numpy array and set all negative elements to 0. Let’s write the code using numpy.where().
We’ll use a 2-dimensional random array here, and only output the positive elements.
import numpy as np
# Random initialization of a (2D array)
a = np.random.randn(2, 3)
print(a)
# b will be all elements of a whenever the condition holds true (i.e only positive elements)
# Otherwise, set it as 0
b = np.where(a > 0, a, 0)
print(b)
Output
[[-1.06455975 0.94589166 -1.94987123]
[-1.72083344 -0.69813711 1.05448464]]
[[0. 0.94589166 0. ]
[0. 0. 1.05448464]]
As you can see, only the positive elements are now retained!
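Because the example above uses random values, its output changes on every run. The sketch below repeats the same idea with fixed, made-up values so the result can be checked, and contrasts it with plain boolean masking, which does not keep the shape.

```python
import numpy as np

# Fixed values instead of random ones, so the result is reproducible
a = np.array([[-1.5, 0.5, -2.0],
              [-0.25, 3.0, 1.0]])

b = np.where(a > 0, a, 0)

print(b.shape == a.shape)  # True: the result keeps the shape of a
print((b >= 0).all())      # True: no negative values remain

# Plain masking keeps the same values but flattens them into a 1-D array
positives = a[a > 0]
print(positives.shape)     # (3,)
```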
There may be some confusion regarding the above code, as some of you may think that the more intuitive way would be to simply write the condition like this:
import numpy as np
a = np.random.randn(2, 3)
b = np.where(a > 0)
print(b)
If you now try running the above code, with this change, you’ll get an output like this:
(array([0, 1]), array([2, 1]))
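If you observe closely, b is now a tuple of numpy arrays, one per dimension of a. The first array holds the row indices of the positive elements and the second holds their column indices, so here the positive elements are at positions (0, 2) and (1, 1). What does this mean?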
Whenever we provide just a condition, this function is actually equivalent to np.asarray(condition).nonzero().
In our example, np.asarray(a > 0) will return a boolean array after applying the condition, and np.nonzero(arr_like) will return the indices of the non-zero elements of arr_like, that is, the positions where the condition is True. (Refer to the NumPy documentation for numpy.nonzero.)
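Here is a minimal sketch of the condition-only form, using a fixed array with made-up values so the indices are predictable. The names rows, cols, same and coords are illustrative.

```python
import numpy as np

a = np.array([[-1.0, 2.0, -3.0],
              [-4.0, -5.0, 6.0]])

# One index array per dimension: row indices first, then column indices
rows, cols = np.where(a > 0)
print(rows, cols)  # [0 1] [1 2]

# Same result as calling nonzero() on the boolean array
same = all(np.array_equal(w, n)
           for w, n in zip(np.where(a > 0), np.nonzero(a > 0)))
print(same)  # True

# The index arrays can be used to pull the matching elements back out
print(a[rows, cols])  # [2. 6.]

# Pair up the indices to get (row, column) coordinates
coords = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(coords)  # [(0, 1), (1, 2)]
```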
So, we’ll now look at a simpler example, that shows us how flexible we can be with numpy!
import numpy as np
a = np.arange(10)
b = np.where(a < 5, a, a * 10)
print(a)
print(b)
Output
[0 1 2 3 4 5 6 7 8 9]
[ 0 1 2 3 4 50 60 70 80 90]
Here, the condition is a < 5, which will be the boolean array [True True True True True False False False False False], x is the array a, and y is the array a * 10. So, we choose from a only if a < 5, and from a * 10 if a >= 5.
So, this transforms all elements >= 5, by multiplication with 10. This is what we get indeed!
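One way to read this call is as an element-by-element if/else. The sketch below writes that loop out by hand purely for comparison; the names vectorized and looped are illustrative, and the loop version is not how NumPy implements it.

```python
import numpy as np

a = np.arange(10)
vectorized = np.where(a < 5, a, a * 10)

# Element-by-element equivalent, written out for clarity
# (expected to be slower than np.where on large arrays)
looped = np.array([v if v < 5 else v * 10 for v in a])

print(np.array_equal(vectorized, looped))  # True
```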
If we provide all of the condition, x, and y arrays, numpy will broadcast them together.
import numpy as np
a = np.arange(12).reshape(3, 4)
b = np.arange(4).reshape(1, 4)
print(a)
print(b)
# Broadcasts (a < 5, a, and b * 10)
# of shape (3, 4), (3, 4) and (1, 4)
c = np.where(a < 5, a, b * 10)
print(c)
Output
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[0 1 2 3]]
[[ 0 1 2 3]
[ 4 10 20 30]
[ 0 10 20 30]]
Again, the output is selected based on the condition, but here b * 10 is broadcast to the shape of a. (Its first dimension has only one element, so there will be no errors during broadcasting.)
So, b will now behave as [[0 1 2 3] [0 1 2 3] [0 1 2 3]], and b * 10 as [[0 10 20 30] [0 10 20 30] [0 10 20 30]], and now we can select elements even from this broadcast array.
So the shape of the output is the same as the shape of a.
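To see the broadcasting step on its own, the sketch below expands b * 10 by hand with np.broadcast_to and checks that passing the expanded array gives the same result. The names y_full, implicit, explicit and filled are illustrative, and the scalar case at the end is an extra illustration that goes beyond the example above.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b = np.arange(4).reshape(1, 4)

# Expand b * 10 to the shape of a by hand (a read-only view, no copy)
y_full = np.broadcast_to(b * 10, a.shape)
print(y_full.shape)  # (3, 4)

implicit = np.where(a < 5, a, b * 10)   # NumPy broadcasts for us
explicit = np.where(a < 5, a, y_full)   # already expanded
print(np.array_equal(implicit, explicit))  # True

# A scalar is broadcast the same way
filled = np.where(a < 5, a, -1)
print(filled.shape)  # (3, 4)
```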
In this article, we learned how to use the Python numpy.where() function to select elements from arrays based on a condition array.