I have a function x, shown below, that takes two numpy arrays as input, and I want it to return a boolean value after some computation.
import numpy as np
def x(a,b):
    print(a)
    print(b)
    # Some computation...
    return boolean_value
wrappedFunc = np.frompyfunc(x,nin=2,nout=1)
arg_a = np.arange(8).reshape(2,4)
# arg_b is a numpy array having shape (2,1)
arg_b = np.array((np.array([[0, 1, 0],
                            [0, 0, 0],
                            [1, 0, 0],
                            [1, 1, 0]]),
                  np.array([[0., 1., 0.],
                            [0., 0., 0.],
                            [1., 0., 0.],
                            [1., 1., 0.],
                            [0.5, 0.5, 0.]])), dtype=object).reshape(2, 1)
wrappedFunc(arg_a, arg_b)
Executing the code above results in the following output.
# Output of a is:
0
1
2
3
4
5
6
7
# output of b is:
[[0 1 0]
[0 0 0]
[1 0 0]
[1 1 0]]
[[0 1 0]
[0 0 0]
[1 0 0]
[1 1 0]]
[[0 1 0]
[0 0 0]
[1 0 0]
[1 1 0]]
[[0 1 0]
[0 0 0]
[1 0 0]
[1 1 0]]
[[0. 1. 0. ]
[0. 0. 0. ]
[1. 0. 0. ]
[1. 1. 0. ]
[0.5 0.5 0. ]]
[[0. 1. 0. ]
[0. 0. 0. ]
[1. 0. 0. ]
[1. 1. 0. ]
[0.5 0.5 0. ]]
[[0. 1. 0. ]
[0. 0. 0. ]
[1. 0. 0. ]
[1. 1. 0. ]
[0.5 0.5 0. ]]
[[0. 1. 0. ]
[0. 0. 0. ]
[1. 0. 0. ]
[1. 1. 0. ]
[0.5 0.5 0. ]]
As you can see, a and b are each printed 8 times. This is not the intended behaviour: I expected the print statements for a and b to run twice each. The expected output of print(a) and print(b) is shown below:
On first call:
a needs to be:[0,1,2,3]
b needs to be:[[0 1 0]
[0 0 0]
[1 0 0]
[1 1 0]]
On second call:
a needs to be:[4,5,6,7]
b needs to be:[[0. 1. 0. ]
[0. 0. 0. ]
[1. 0. 0. ]
[1. 1. 0. ]
[0.5 0.5 0. ]]
What am I doing wrong here?
Let's look at frompyfunc with a simpler b, and compare it to straightforward numpy addition.

The addition of a (2,4) with a (2,1) yields a (2,4). By the rules of broadcasting, the size 1 dimension is 'replicated' to match the 4 of a.

Define a function that simply adds two 'scalars'. As written it works with arrays, including a and b, but imagine having some if lines that only work with scalars.

Using frompyfunc makes a ufunc that can broadcast its arguments, passing scalar values to x. That is why your prints ran 8 times: once per element of the broadcast (2,4) result.

What you seem to want is a zip of the arrays on their first dimension.

With zip, x gets a (4,) and a (1,) shaped array, which, again by broadcasting, yield a (4,) result. Those 2 output arrays can be joined to make the same (2,4) as before.
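The steps above can be sketched as follows. This is a minimal illustration, assuming a simpler b with made-up values (10 and 20) in place of the object array from the question; the calls list is a hypothetical helper that only records what x receives.

```python
import numpy as np

a = np.arange(8).reshape(2, 4)
b = np.array([[10], [20]])  # illustrative (2,1) values, not from the question

# Plain addition: (2,4) + (2,1) broadcasts to (2,4).
plain = a + b

calls = []  # hypothetical helper: records the shapes x receives on each call

def x(a, b):
    calls.append((np.shape(a), np.shape(b)))
    # imagine `if` logic here that only works with scalars
    return a + b

# frompyfunc broadcasts first, then calls x once per scalar pair.
wrapped = np.frompyfunc(x, 2, 1)
per_element = wrapped(a, b)      # object-dtype array, shape (2, 4)
n_scalar_calls = len(calls)      # one call per element of the (2,4) result

# zip iterates on the first dimension: each call sees a (4,) and a (1,) array.
calls.clear()
per_row = [x(i, j) for i, j in zip(a, b)]
joined = np.array(per_row)       # the two (4,) results stacked into (2, 4)
```

With the arg_b from the question, each zipped b would be a (1,) object array, so x would need b[0] to reach the actual coordinate array.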
A related function, vectorize, takes a signature that allows us to specify iteration on the first axis. Getting that right can take some practice (though I got it right on the first try!).

vectorize has a performance disclaimer, and that applies doubly so to the signature version. frompyfunc generally performs better (when it does what we want).

For small arrays, a list comprehension usually does better; for large arrays, however, vectorize seems to scale better and ends up with a modest speed advantage. But to get the best numpy performance it's best to work with the whole arrays (true vectorization), without any of this 'iteration'.
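For the signature version of vectorize mentioned above, a minimal sketch might look like this. The signature string "(n),(m)->(n)" and the values in b are my assumptions for illustration, and shapes_seen is a hypothetical helper that records what x receives.

```python
import numpy as np

a = np.arange(8).reshape(2, 4)
b = np.array([[10], [20]])  # illustrative (2,1) values, not from the question

shapes_seen = []  # hypothetical helper: records the shapes x receives

def x(a, b):
    # called once per row: a has shape (4,), b has shape (1,)
    shapes_seen.append((a.shape, b.shape))
    return a + b

# Assumed signature: the core dimensions (n) and (m) are passed to x whole,
# so numpy only loops over the leading axis.
row_wise = np.vectorize(x, signature="(n),(m)->(n)")
result = row_wise(a, b)  # shape (2, 4)
```

Here x runs twice, once per row, which matches the two calls the question expected.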