Wednesday, August 19, 2020

Vector Operations

Vectorization and parallelization in Python with numpy and pandas ...

Vectors of the Same Length

If you perform an operation between two vectors of the same length, the result is an element-by-element calculation on the two vectors.

print(ages + ages)

 

print(ages * ages)
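To make the element-by-element behavior concrete, here is a minimal self-contained sketch. The values in `ages` are made up for illustration and only stand in for the Series used in this post.

```python
import pandas as pd

# hypothetical stand-in for the ages Series used in this post
ages = pd.Series([37, 61, 90, 66])

# same length: each element is combined with the element at the same position
doubled = ages + ages
squared = ages * ages

print(doubled.tolist())  # [74, 122, 180, 132]
print(squared.tolist())  # [1369, 3721, 8100, 4356]
```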

 

Vectors With Integers (Scalars)

When you perform an operation on a vector using a scalar, the scalar will be recycled across all the elements in the vector.

print(ages + 100) 

 

print(ages * 2)
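A short sketch of the scalar case, again assuming a small made-up `ages` Series rather than the post's actual data:

```python
import pandas as pd

# hypothetical stand-in for the ages Series used in this post
ages = pd.Series([37, 61, 90, 66])

# the scalar is applied to every element of the vector
plus_100 = ages + 100
times_2 = ages * 2

print(plus_100.tolist())  # [137, 161, 190, 166]
print(times_2.tolist())   # [74, 122, 180, 132]
```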

 

Vectors With Different Lengths 

When you are working with vectors of different lengths, the behavior depends on the type of the vectors. With a Series, the operation is matched by index label. Any label that appears in only one of the vectors gets a “missing” value in the result, denoted NaN (“not a number”).
How operations are calculated between arrays with different shapes is called broadcasting, and the rules differ between languages.

print(ages + pd.Series([1, 100]))
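Here is a small sketch of what that looks like, assuming a made-up four-element `ages` Series with the default 0..3 index. Only labels 0 and 1 exist in both vectors, so the rest come back as NaN.

```python
import pandas as pd

# hypothetical stand-in for the ages Series; default index is 0, 1, 2, 3
ages = pd.Series([37, 61, 90, 66])
short = pd.Series([1, 100])  # index is 0, 1

# values are matched by index label; labels 2 and 3 have no partner
result = ages + short
print(result)

# which positions ended up missing
print(result.isna().tolist())  # [False, False, True, True]
```

Note that the result is stored as floats, because NaN is a floating-point value.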

 

With other types, such as a NumPy array, the shapes must match.

import numpy as np
# this will cause an error
print(ages + np.array([1, 100]))
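If you want to see the error without stopping your script, one option is to catch it. This is only a sketch with a made-up `ages` Series; the exact error message depends on your pandas and NumPy versions.

```python
import numpy as np
import pandas as pd

# hypothetical stand-in for the ages Series used in this post
ages = pd.Series([37, 61, 90, 66])

# a 2-element array cannot be combined with a 4-element Series
try:
    ages + np.array([1, 100])
    raised = False
except ValueError as err:
    raised = True
    print("shape mismatch:", err)

# an array of matching length works, element by element
same_len = ages + np.array([1, 100, 1, 100])
print(same_len.tolist())  # [38, 161, 91, 166]
```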

 

Vectors With Common Index Labels (Automatic Alignment) 

What’s cool about Pandas is that data alignment is almost always automatic. Whenever possible, values are aligned by index label before an operation is performed.

# ages as they appear in the data
print(ages)

 

rev_ages = ages.sort_index(ascending=False)
print(rev_ages)

 

If we perform an operation using ages and rev_ages, it will still be conducted on an element-by-element basis, but the vectors will be aligned by index label before the operation is carried out.

# reference output to show index label alignment
print(ages * 2)

 

# note how we get the same values
# even though the vector is reversed
print(ages + rev_ages)
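Putting the alignment steps together in one runnable sketch, with a made-up `ages` Series standing in for the real data:

```python
import pandas as pd

# hypothetical stand-in for the ages Series used in this post
ages = pd.Series([37, 61, 90, 66])

# same values, but with the index labels in descending order
rev_ages = ages.sort_index(ascending=False)
print(rev_ages.index.tolist())  # [3, 2, 1, 0]

# values are paired by index label, not by position
total = ages + rev_ages
print(total.tolist())  # [74, 122, 180, 132]
```

If the addition were done by position instead, the first element would be 37 + 66 = 103 rather than 74.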

