Friday, January 28, 2022

Arithmetic with NumPy Arrays

Arrays are important because they enable you to express batch operations on data without writing any for loops. NumPy users call this vectorization. Any arithmetic operations between equal-size arrays applies the operation element-wise:

In [51]: arr = np.array([[1., 2., 3.], [4., 5., 6.]])

In [52]: arr

Out[52]:

array([[ 1., 2., 3.],

[ 4., 5., 6.]])

In [53]: arr * arr

Out[53]:

array([[ 1., 4., 9.],

[ 16., 25., 36.]])

In [54]: arr - arr

Out[54]:

array([[ 0., 0., 0.],

[ 0., 0., 0.]])

Arithmetic operations with scalars propagate the scalar argument to each element in the array:

In [55]: 1 / arr

Out[55]:

array([[ 1. , 0.5 , 0.3333],

[ 0.25 , 0.2 , 0.1667]])

In [56]: arr ** 0.5

Out[56]:

array([[ 1. , 1.4142, 1.7321],

[ 2. , 2.2361, 2.4495]])

Comparisons between arrays of the same size yield boolean arrays:

In [57]: arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])

In [58]: arr2

Out[58]:

array([[ 0., 4., 1.],

[ 7., 2., 12.]])

In [59]: arr2 > arr

Out[59]:

array([[False, True, False],

[ True, False, True]], dtype=bool)

Operations between differently sized arrays is called broadcasting. Broadcasting describes how arithmetic works between arrays of different shapes. It can be a powerful feature, but one that can cause confusion, even for experienced users. The simplest example of broadcasting occurs when combining a scalar value with an array:

In [79]: arr = np.arange(5)

In [80]: arr

Out[80]: array([0, 1, 2, 3, 4])

In [81]: arr * 4

Out[81]: array([ 0, 4, 8, 12, 16])

Here we say that the scalar value 4 has been broadcast to all of the other elements in the multiplication operation. For example, we can demean each column of an array by subtracting the column means. In this case, it is very simple:

In [82]: arr = np.random.randn(4, 3)

In [83]: arr.mean(0)

Out[83]: array([-0.3928, -0.3824, -0.8768])

In [84]: demeaned = arr - arr.mean(0)

In [85]: demeaned

Out[85]:

array([[ 0.3937, 1.7263, 0.1633],

[-0.4384, -1.9878, -0.9839],

[-0.468 , 0.9426, -0.3891],

[ 0.5126, -0.6811, 1.2097]])

In [86]: demeaned.mean(0)

Out[86]: array([-0., 0., -0.])

Demeaning the rows as a broadcast operation requires a bit more care. Fortunately, broadcasting potentially lower dimensional values across any dimension of an array (like subtracting the row means from each column of a two-dimensional array) is possible as long as you follow the rules.

This brings us to: The Broadcasting Rule

Two arrays are compatible for broadcasting if for each trailing dimension (i.e., starting from the end) the axis lengths match or if either of the lengths is 1. Broadcasting is then performed over the missing or length 1 dimensions.


Even as an experienced NumPy user, I often find myself having to pause and draw a diagram as I think about the broadcasting rule. Consider the last example and suppose we wished instead to subtract the mean value from each row. Since arr.mean(0) has length 3, it is compatible for broadcasting across axis 0 because the trailing dimension in arr is 3 and therefore matches. According to the rules, to subtract over axis 1 (i.e., subtract the row mean from each row), the smaller array must have shape (4, 1):

In [87]: arr
Out[87]:
array([[ 0.0009, 1.3438, -0.7135],
[-0.8312, -2.3702, -1.8608],
[-0.8608, 0.5601, -1.2659],
[ 0.1198, -1.0635, 0.3329]])
In [88]: row_means = arr.mean(1)
In [89]: row_means.shape
Out[89]: (4,)
In [90]: row_means.reshape((4, 1))
Out[90]:
array([[ 0.2104],
[-1.6874],
[-0.5222],
[-0.2036]])
In [91]: demeaned = arr - row_means.reshape((4, 1))
In [92]: demeaned.mean(1)
Out[92]: array([ 0., -0., 0., 0.])

See the Figure below for an illustration of this operation.


The next topic is a very interesting and highly imperative. It is related to basic Indexing and Slicing. See ya then !






Share:

0 comments:

Post a Comment