Sunday, March 3, 2019

NumPy library - 7 (Vectorization and broadcasting)

Vectorization means writing code without explicit loops. The loops are not actually eliminated: they are implemented internally, in NumPy's compiled code, and in your own code they are replaced by operators and functions that act on whole arrays at once. As a result, vectorized code is more concise and readable.
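As a minimal sketch (the function names below are hypothetical and chosen only for illustration), here is the same element-wise product computed once with explicit Python loops and once in vectorized form. Both return the same array; the vectorized version simply leaves the looping to NumPy.

```python
import numpy as np

def multiply_with_loop(a, b):
    """Element-wise product of two 2-D arrays using explicit Python loops."""
    result = np.empty_like(a)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            result[i, j] = a[i, j] * b[i, j]
    return result

def multiply_vectorized(a, b):
    """Element-wise product with no explicit loop: NumPy loops internally."""
    return a * b

arr1 = np.arange(6).reshape(2, 3)      # [[0 1 2], [3 4 5]]
arr2 = np.arange(6, 12).reshape(2, 3)  # [[6 7 8], [9 10 11]]

print(multiply_with_loop(arr1, arr2))
print(multiply_vectorized(arr1, arr2))
```

Besides being shorter, the vectorized form is typically much faster on large arrays, because the loop runs in compiled code instead of the Python interpreter.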

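Thanks to vectorization, many operations take on a more mathematical form. For example, two arrays arr1 and arr2 can be multiplied as arr1 * arr2, just like ordinary variables. Two matrices A and B can likewise be written as A * B, but for NumPy arrays this is an element-wise product; the matrix product is written A @ B or np.dot(A, B).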
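The difference between the two kinds of multiplication is easy to see on a small example (the matrix values are arbitrary):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# '*' multiplies corresponding elements
print(A * B)    # [[ 5 12]
                #  [21 32]]

# '@' (equivalently np.dot for 2-D arrays) computes the matrix product
print(A @ B)    # [[19 22]
                #  [43 50]]
```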

Vectorization, together with broadcasting, is the basis of NumPy's internal implementation. Arithmetic operations on arrays are normally performed on corresponding elements, which requires the arrays to have the same shape. Broadcasting is NumPy's ability to let an operator or a function act on two or more arrays even when their shapes differ.
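The simplest case of broadcasting combines an array with a single number, or with a smaller array whose length matches the last dimension. A minimal sketch with arbitrary values:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])          # shape (2, 3)

# The scalar 10 is stretched to the shape (2, 3) and added to every element
print(arr + 10)                      # [[11 12 13]
                                     #  [14 15 16]]

# A 1-D array of shape (3,) is reused for every row of arr
row = np.array([100, 200, 300])
print(arr + row)                     # [[101 202 303]
                                     #  [104 205 306]]
```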



Two arrays can be broadcast together when all their dimensions are compatible. Dimensions are compared starting from the last (trailing) one, and two lengths are compatible if they are equal or if one of them is 1. If neither condition is met, NumPy raises a ValueError stating that the operands could not be broadcast together. Strictly speaking, element-by-element operations require arrays of the same shape.
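A short sketch of both outcomes, using arrays of ones so that only the shapes matter:

```python
import numpy as np

a = np.ones((2, 3))
b = np.ones((3,))     # trailing lengths 3 and 3 are equal -> compatible
c = np.ones((2,))     # trailing lengths 3 and 2 differ and neither is 1

print((a + b).shape)  # (2, 3)

try:
    a + c
except ValueError as err:
    # NumPy refuses to combine shapes (2, 3) and (2,)
    print('ValueError:', err)
```

Note that c has the same length as the first dimension of a, but that does not help: the comparison starts from the trailing dimension.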

Operations on arrays of different shapes are nevertheless possible in NumPy, and this is where broadcasting comes into the picture: the smaller array is broadcast to the size of the larger array so that the two have compatible shapes.

Broadcasting is possible if the following rules are satisfied:

1. The array with the smaller ndim has 1s prepended to its shape until both arrays have the same number of dimensions.

2. The size of each dimension of the output shape is the maximum of the input sizes in that dimension.

3. An input can be used in the calculation if its size in a particular dimension matches the output size or is exactly 1.

4. If an input has a dimension size of 1, the first data entry in that dimension is used for all calculations along that dimension.
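The first three rules can be turned into a small function that predicts the shape of the result. broadcast_shape below is a hypothetical helper written for illustration, not part of NumPy; its answers are cross-checked against np.broadcast, whose .shape attribute reports the broadcast shape of its arguments.

```python
import numpy as np

def broadcast_shape(shape1, shape2):
    """Hypothetical helper: apply broadcasting rules 1-3 to two shapes."""
    # Rule 1: prepend 1s to the shorter shape so both have the same ndim
    ndim = max(len(shape1), len(shape2))
    s1 = (1,) * (ndim - len(shape1)) + tuple(shape1)
    s2 = (1,) * (ndim - len(shape2)) + tuple(shape2)

    result = []
    for d1, d2 in zip(s1, s2):
        # Rule 3: each input size must match the output size or be exactly 1
        if d1 != d2 and d1 != 1 and d2 != 1:
            raise ValueError(f'incompatible dimensions {d1} and {d2}')
        # Rule 2: the output takes the larger size, stretching a length-1 input
        # (written this way so a 0-length dimension is handled as NumPy does)
        result.append(d1 if d2 == 1 else d2)
    return tuple(result)

print(broadcast_shape((3, 3), (3,)))          # (3, 3)
print(broadcast_shape((3, 1, 2), (3, 2, 1)))  # (3, 2, 2)

# Cross-check against NumPy's own computation of the broadcast shape
print(np.broadcast(np.empty((3, 1, 2)), np.empty((3, 2, 1))).shape)  # (3, 2, 2)
```

Rule 4 is not needed to compute the shape; it describes which values fill the stretched dimensions.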

Thus a set of arrays is said to be broadcastable if the above rules can be applied and one of the following criteria is met:

1. Arrays have exactly the same shape.

2. Arrays have the same number of dimensions and the length of each dimension is either a common length or 1.

3. An array with too few dimensions can have its shape prepended with dimensions of length 1, so that property 2 above holds.
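Each of the three criteria can be illustrated in a few lines (values chosen arbitrarily). np.broadcast_to shows explicitly how a shape (3,) array is reused, as rule 4 describes, when it is stretched to (2, 3):

```python
import numpy as np

# Criterion 1: identical shapes -> plain element-wise operation
print((np.ones((2, 3)) + np.ones((2, 3))).shape)   # (2, 3)

# Criterion 2: same ndim, each length either equal or 1
col = np.array([[1], [2]])        # shape (2, 1)
row = np.array([[10, 20, 30]])    # shape (1, 3)
print(col + row)                  # [[11 21 31]
                                  #  [12 22 32]]

# Criterion 3: fewer dimensions -> shape (3,) is treated as (1, 3)
vec = np.array([10, 20, 30])      # shape (3,)
print(np.broadcast_to(vec, (2, 3)))
# [[10 20 30]
#  [10 20 30]]
```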

Let's understand this with the help of an example:

import numpy as np

arr1 = np.arange(9).reshape(3, 3)   # 3 x 3 array, shape (3, 3)
arr2 = np.arange(3)                 # 1-D array, shape (3,)

print('The original array arr1:\n')
print(arr1)

print('\nThe original array arr2:\n')
print(arr2)

print('\nThe sum arr1 + arr2:\n')
print(arr1 + arr2)   # arr2 is broadcast over every row of arr1


In this example one array is smaller than the other: arr1 is a 3 x 3 array, while arr2 is a 1-D array of 3 elements. By the first rule of broadcasting, a 1 is prepended to the shape of the array with fewer dimensions, so arr2 is treated as a 1 x 3 array. The shapes to compare are therefore 3 x 3 and 1 x 3, and in each dimension the lengths are either equal or one of them is 1.

The compatibility rule is met, so we can move on to the next step, which explains how the smaller array is extended to the size of the bigger one so that the element-wise operator can be applied.

Along the dimension of length 1, the missing elements are filled with copies of the existing values: the single row [0 1 2] of arr2 is replicated three times to form a 3 x 3 array. Once the two arrays have the same shape, their corresponding elements are added together, as in print(arr1 + arr2).

The output of the program is shown below:

The original array arr1:

[[0 1 2]
 [3 4 5]
 [6 7 8]]

The original array arr2:

[0 1 2]

The sum arr1 + arr2:

[[ 0  2  4]
 [ 3  5  7]
 [ 6  8 10]]
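To see the stretched version of arr2 directly, np.broadcast_to can expand it to the shape of arr1. The sketch below also contrasts it with a genuine 3 x 1 column, which broadcasts along the other axis and gives a different result:

```python
import numpy as np

arr1 = np.arange(9).reshape(3, 3)
arr2 = np.arange(3)

# arr2 (shape (3,)) is treated as (1, 3) and its single row is repeated
stretched = np.broadcast_to(arr2, arr1.shape)
print(stretched)
# [[0 1 2]
#  [0 1 2]
#  [0 1 2]]

print(np.array_equal(arr1 + stretched, arr1 + arr2))   # True

# A real column of shape (3, 1) is repeated across the columns instead
print(arr1 + arr2.reshape(3, 1))
# [[ 0  1  2]
#  [ 4  5  6]
#  [ 8  9 10]]
```

np.broadcast_to returns a read-only view rather than a copy, which reflects how broadcasting avoids actually duplicating the data.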




There may be more complex cases in which the two arrays have different shapes and each is smaller than the other only in certain dimensions. See the program below:

import numpy as np

arr1 = np.arange(6).reshape(3, 1, 2)   # shape (3, 1, 2)
arr2 = np.arange(6).reshape(3, 2, 1)   # shape (3, 2, 1)

print('The original array arr1:\n')
print(arr1)

print('\nThe original array arr2:\n')
print(arr2)

print('\nThe sum arr1 + arr2:\n')
print(arr1 + arr2)


The two arrays have the shapes 3 x 1 x 2 and 3 x 2 x 1. In every dimension the lengths are either equal (3 and 3) or one of them is 1 (1 and 2, then 2 and 1), so the arrays are compatible and the rules of broadcasting can be applied. Both arrays are stretched to the common shape 3 x 2 x 2: arr1 repeats each of its rows along the second axis, and arr2 repeats each of its columns along the third axis. After broadcasting, the arrays become:

arr1 = [[[0, 1],
         [0, 1]],
        [[2, 3],
         [2, 3]],
        [[4, 5],
         [4, 5]]]

arr2 = [[[0, 0],
         [1, 1]],
        [[2, 2],
         [3, 3]],
        [[4, 4],
         [5, 5]]]

The addition operator is then applied element by element to the broadcast arrays. The output of the program is shown below:

The original array arr1:

[[[0 1]]

 [[2 3]]

 [[4 5]]]

The original array arr2:

[[[0]
  [1]]

 [[2]
  [3]]

 [[4]
  [5]]]

The sum arr1 + arr2:

[[[ 0  1]
  [ 1  2]]

 [[ 4  5]
  [ 5  6]]

 [[ 8  9]
  [ 9 10]]]
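The broadcast arrays listed above do not have to be worked out by hand: np.broadcast_arrays returns both inputs already stretched to their common shape, as views on the original data. A minimal sketch:

```python
import numpy as np

arr1 = np.arange(6).reshape(3, 1, 2)
arr2 = np.arange(6).reshape(3, 2, 1)

# Stretch both arrays to their common shape without copying data
b1, b2 = np.broadcast_arrays(arr1, arr2)
print(b1.shape, b2.shape)   # (3, 2, 2) (3, 2, 2)
print(b1)
print(b2)

# Adding the stretched arrays gives the same result as arr1 + arr2
print(np.array_equal(b1 + b2, arr1 + arr2))   # True
```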


Here I am ending today's post. In the next post we shall explore the NumPy library further and discuss Structured Arrays. Until we meet next, keep practicing and learning Python, as Python is easy to learn!
