Thursday, February 21, 2019

NumPy library -1 (Introduction)

NumPy is the foundation library for scientific computing in Python: it provides data structures and high-performance functions that the core Python language and its standard library do not. Its central data structure is an N-dimensional array called ndarray. Knowing NumPy well is essential for numerical work, since using it correctly can greatly affect the performance of your computations.

This package adds several features to standard Python:

• Ndarray: A multidimensional array much faster and more efficient than the sequence types built into Python.

• Element-wise computation: A set of functions for performing this type of calculation with arrays and mathematical operations between arrays.

• Reading-writing datasets: A set of tools for reading and writing data stored on disk.

• Integration with other languages such as C, C++, and FORTRAN: A set of tools to integrate code developed with these programming languages.
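As a rough illustration of the element-wise and reading-writing features listed above, here is a minimal sketch. The file name example.npy is made up for the example, and the file is written to a temporary directory so nothing is left behind.

```python
import os
import tempfile

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# Element-wise arithmetic: no explicit Python loop is needed.
total = a + b
roots = np.sqrt(a)  # square root applied to every element

# Round-trip an array through a .npy file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npy")  # hypothetical file name
    np.save(path, total)
    restored = np.load(path)

print(total)
print(roots)
print(np.array_equal(total, restored))
```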

NumPy is easy to install with pip:

pip install numpy

On Windows with Anaconda, we can use:

conda install numpy

NumPy is important for numerical computations in Python because it is designed for efficiency on large arrays of data. There are a number of reasons for this:

1. NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects. NumPy’s library of algorithms written in the C language can operate on this memory without any type checking or other overhead. NumPy arrays also use much less memory than built-in Python sequences.

2. NumPy operations perform complex computations on entire arrays without the need for Python for loops.

3. NumPy-based algorithms are generally 10 to 100 times faster (or more) than their pure Python counterparts and use significantly less memory.
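The memory claim above can be checked roughly with the sketch below. The accounting for the list is only an approximation (CPython shares some small integer objects, and the sizes assume a typical 64-bit build), so treat the numbers as indicative rather than exact.

```python
import sys

import numpy as np

n = 1000
arr = np.arange(n, dtype=np.int64)
lst = list(range(n))

# The array stores 8 bytes per element in one contiguous block.
array_bytes = arr.nbytes

# The list stores pointers, and every element is a full Python int object.
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

print("ndarray data:", array_bytes, "bytes")
print("list + int objects (approx.):", list_bytes, "bytes")
```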



Ndarray

The main object of the NumPy library is the ndarray (which stands for N-dimensional array), a multidimensional homogeneous array with a predetermined number of items. It is homogeneous because all of its items have the same type and the same size. The data type is specified by another NumPy object called dtype (data-type); each ndarray is associated with exactly one dtype.

The number of dimensions and items in an array is defined by its shape, a tuple of N non-negative integers that specifies the size of each dimension. The dimensions are called axes, and the number of axes is the rank. The size of a NumPy array is fixed: once set at creation time, it does not change. This differs from Python lists, which can grow or shrink in size.
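A small sketch of the difference from lists: np.append does not grow the original array, it builds and returns a new one, while list.append changes the list in place.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)  # returns a NEW array; a itself is unchanged

print(a.shape)  # (3,)
print(b.shape)  # (4,)

lst = [1, 2, 3]
lst.append(4)   # the list grows in place
print(len(lst)) # 4
```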

We can define an ndarray using the array() function, which takes a Python list containing the elements as an argument.

Our first program aims to show that NumPy-based algorithms are generally 10 to 100 times faster (or more) than their pure Python counterparts. I'll be using IPython often in future posts, so it's better to install IPython as shown below:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\Python>pip install ipython
Collecting ipython
  Downloading https://files.pythonhosted.org/packages/f0/b4/a9ea018c73a84ee6280b
2e94a1a6af8d63e45903eac2da0640fa63bca4db/ipython-7.2.0-py3-none-any.whl (765kB)
    100% |████████████████████████████████| 768kB 588kB/s
Collecting colorama; sys_platform == "win32" (from ipython)
  Downloading https://files.pythonhosted.org/packages/4f/a6/728666f39bfff1719fc9
4c481890b2106837da9318031f71a8424b662e12/colorama-0.4.1-py2.py3-none-any.whl
Collecting prompt-toolkit<2.1.0,>=2.0.0 (from ipython)
  Downloading https://files.pythonhosted.org/packages/65/c2/e676da701cda11b32ff4
2eceb44aa7d8934b597d604bb5e94c0283def064/prompt_toolkit-2.0.8-py3-none-any.whl (
342kB)
    100% |████████████████████████████████| 348kB 320kB/s
Collecting backcall (from ipython)
  Downloading https://files.pythonhosted.org/packages/84/71/c8ca4f5bb1e08401b916
c68003acf0a0655df935d74d93bf3f3364b310e0/backcall-0.1.0.tar.gz
Collecting pickleshare (from ipython)
  Downloading https://files.pythonhosted.org/packages/9a/41/220f49aaea88bc6fa6cb
a8d05ecf24676326156c23b991e80b3f2fc24c77/pickleshare-0.7.5-py2.py3-none-any.whl
Collecting pygments (from ipython)
  Downloading https://files.pythonhosted.org/packages/13/e5/6d710c9cf96c31ac8265
7bcfb441df328b22df8564d58d0c4cd62612674c/Pygments-2.3.1-py2.py3-none-any.whl (84
9kB)
    100% |████████████████████████████████| 849kB 522kB/s
Collecting jedi>=0.10 (from ipython)
  Downloading https://files.pythonhosted.org/packages/c2/bc/54d53f5bc4658380d0ec
a9055d72be4df45e5bfd91a4bac97da224a92553/jedi-0.13.2-py2.py3-none-any.whl (177kB
)
    100% |████████████████████████████████| 184kB 386kB/s
Requirement already satisfied: setuptools>=18.5 in c:\users\python\appdata\local
\programs\python\python36\lib\site-packages (from ipython) (28.8.0)
Collecting traitlets>=4.2 (from ipython)
  Downloading https://files.pythonhosted.org/packages/93/d6/abcb22de61d78e2fc395
9c964628a5771e47e7cc60d53e9342e21ed6cc9a/traitlets-4.3.2-py2.py3-none-any.whl (7
4kB)
    100% |████████████████████████████████| 81kB 239kB/s
Collecting decorator (from ipython)
  Downloading https://files.pythonhosted.org/packages/f1/cd/7c8240007e9716b14679
bc217a1baefa4432aa30394f7e2ec40a52b1a708/decorator-4.3.2-py2.py3-none-any.whl
Requirement already satisfied: six>=1.9.0 in c:\users\python\appdata\roaming\pyt
hon\python36\site-packages (from prompt-toolkit<2.1.0,>=2.0.0->ipython) (1.11.0)

Collecting wcwidth (from prompt-toolkit<2.1.0,>=2.0.0->ipython)
  Downloading https://files.pythonhosted.org/packages/7e/9f/526a6947247599b084ee
5232e4f9190a38f398d7300d866af3ab571a5bfe/wcwidth-0.1.7-py2.py3-none-any.whl
Collecting parso>=0.3.0 (from jedi>=0.10->ipython)
  Downloading https://files.pythonhosted.org/packages/19/b1/522b2671cc6d134c9d3f
5dfc0d02fee07cab848e908d03d2bffea78cca8f/parso-0.3.4-py2.py3-none-any.whl (93kB)

    100% |████████████████████████████████| 102kB 652kB/s
Collecting ipython-genutils (from traitlets>=4.2->ipython)
  Downloading https://files.pythonhosted.org/packages/fa/bc/9bd3b5c2b4774d5f33b2
d544f1460be9df7df2fe42f352135381c347c69a/ipython_genutils-0.2.0-py2.py3-none-any
.whl
Building wheels for collected packages: backcall
  Running setup.py bdist_wheel for backcall ... done
  Stored in directory: C:\Users\Python\AppData\Local\pip\Cache\wheels\98\b0\dd\2
9e28ff615af3dda4c67cab719dd51357597eabff926976b45
Successfully built backcall
Installing collected packages: colorama, wcwidth, prompt-toolkit, backcall, pick
leshare, pygments, parso, jedi, ipython-genutils, decorator, traitlets, ipython
Successfully installed backcall-0.1.0 colorama-0.4.1 decorator-4.3.2 ipython-7.2
.0 ipython-genutils-0.2.0 jedi-0.13.2 parso-0.3.4 pickleshare-0.7.5 prompt-toolk
it-2.0.8 pygments-2.3.1 traitlets-4.3.2 wcwidth-0.1.7

Now IPython can be started from the command window as shown below:



C:\Users\Python>ipython
Python 3.6.5rc1 (v3.6.5rc1:f03c5148cf, Mar 14 2018, 03:12:11) [MSC v.1913 64 bit
 (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.2.0 -- An enhanced Interactive Python. Type '?' for help.





In [1]: import numpy as np

In [2]: a = np.arange(1000000)

In [3]: l = list(range(1000000))

In [4]: %time for _ in range(10): a2 = a * 2
Wall time: 20 ms

In [5]: %time for _ in range(10): l2 = [x * 2 for x in l]
Wall time: 1.09 s

The above program creates an array of one million integers and the equivalent Python list 'l', then doubles every element ten times. Comparing the wall times (20 ms versus 1.09 s), the NumPy version is roughly 50 times faster here, in line with the claim that NumPy-based algorithms are generally 10 to 100 times faster (or more) than their pure Python counterparts (lists).

Another program that illustrates the same point is shown below:

In [6]: arr = np.arange(1e7)

In [7]: larr = arr.tolist()

In [8]: def list_times(alist, scalar):
   ...:     for i, val in enumerate(alist):
   ...:         alist[i] = val * scalar
   ...:     return alist
   ...:

In [9]: timeit arr * 1.1
36.2 ms ± 173 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [10]: timeit list_times(larr, 1.1)
1.08 s ± 20.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

As you can see, the ndarray operation is about 30 times faster than the Python loop in this example (1.08 s versus 36.2 ms).
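If you are not using IPython, a similar comparison can be sketched with the standard timeit module. The array size and repeat count below are arbitrary choices, and the measured ratio will vary from machine to machine.

```python
import timeit

import numpy as np

n = 100_000
arr = np.arange(n)
lst = list(range(n))

# Time 20 repetitions of doubling every element, once per approach.
t_numpy = timeit.timeit(lambda: arr * 2, number=20)
t_list = timeit.timeit(lambda: [x * 2 for x in lst], number=20)

print(f"NumPy: {t_numpy:.4f} s, list comprehension: {t_list:.4f} s")
print(f"ratio: {t_list / t_numpy:.1f}x")
```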


Now let's focus on ndarrays and see array creation and data typing fundamentals. There are many ways to create an array in NumPy, and here we will discuss the ones that are most useful.

1. Creating array from lists

First we create a list and then wrap it with the np.array() function as shown in the code below:

alist = [1, 2, 3]
arr = np.array(alist)

We can also pass the list as an argument to the np.array() as shown below:

a = np.array([1, 2, 3])

The following program creates both arrays and prints them:

import numpy as np

alist = [1, 2, 3]
arr = np.array(alist)
a = np.array([1, 2, 3])
print(arr)
print(a) 


The output of the program is shown below:

[1 2 3]
[1 2 3]
------------------
(program exited with code: 0)

Press any key to continue . . .


As we can see, both arrays are the same and contain the list elements. We can check that a newly created object is an ndarray by passing the variable to the type() function:

print(type(arr))
print(type(a)) 


The output of the above program using type() is shown below:

[1 2 3]
[1 2 3]
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
------------------
(program exited with code: 0)

Press any key to continue . . . 


An ndarray is a generic multidimensional container for homogeneous data; that is, all of the elements must be the same type. Every array has a shape, a tuple indicating the size of each dimension, and a dtype, an object describing the data type of the array. To find the dtype associated with a newly created ndarray, we use the dtype attribute as shown below:

print(arr.dtype)
print(a.dtype)


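The output of the above program using the dtype attribute is shown below (int32 is the default integer type on this Windows setup; on other platforms it is typically int64):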

[1 2 3]
[1 2 3]
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
int32
int32
------------------
(program exited with code: 0)

Press any key to continue . . .
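Because an ndarray holds a single dtype, mixing value types in the input makes NumPy pick one type that can represent all of them. A minimal sketch of this behaviour:

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])  # a single float promotes every element

print(ints.dtype)   # an integer type; the exact width depends on the platform
print(mixed.dtype)  # float64
print(mixed)
```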

To get the shape of the newly created ndarray we use the shape attribute; the ndim attribute gives the number of axes and the size attribute gives the total number of elements, as shown below:

import numpy as np

a = np.array([1, 2, 3])
print(a.shape)
print(a.ndim)
print(a.size)

The output of the above program using the attributes is shown below:

(3,)
1
3
------------------
(program exited with code: 0)

Press any key to continue . . .
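The three attributes are related: ndim is the length of the shape tuple and size is the product of its entries. A quick sketch with an arbitrary three-dimensional shape:

```python
import numpy as np

arr = np.zeros((2, 3, 4))

print(arr.shape)  # (2, 3, 4)
print(arr.ndim)   # 3
print(arr.size)   # 24

# ndim counts the axes; size multiplies their lengths.
same_ndim = arr.ndim == len(arr.shape)
same_size = arr.size == int(np.prod(arr.shape))
print(same_ndim, same_size)
```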

In order to create a two dimensional array we can use the np.array() function as shown below:

arr = np.array([[1.5, 2.8],[0.5, 4.5]])

See the following program, which creates the array and prints it along with its attributes:

import numpy as np

arr = np.array([[1.5, 2.8],[0.5, 4.5]])
print(arr)
print(arr.dtype)
print(arr.shape)
print(arr.ndim)
print(arr.size)


The output of the above program using the attributes is shown below:

[[1.5 2.8]
 [0.5 4.5]]
float64
(2, 2)
2
4
------------------
(program exited with code: 0)

Press any key to continue . . .

The arr array has rank 2, as shown by arr.ndim: it has two axes, each of length 2. To determine the size of each item in the array we use the itemsize attribute, which gives the size in bytes of each array item. See the following program:

import numpy as np

arr = np.array([[1.5, 2.8],[0.5, 4.5]])

print(arr.itemsize)

The output of the above program is shown below:

8
------------------
(program exited with code: 0)

Press any key to continue . . . 
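The total memory used by the array data is itemsize multiplied by size, which NumPy also exposes as the nbytes attribute. A short sketch, using a float32 copy for comparison:

```python
import numpy as np

arr = np.array([[1.5, 2.8], [0.5, 4.5]])
small = arr.astype(np.float32)  # same values stored in 4 bytes each

print(arr.itemsize, arr.nbytes)      # 8 32
print(small.itemsize, small.nbytes)  # 4 16
```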


The array() function, in addition to lists, can accept tuples, sequences of tuples, and sequences that mix lists and tuples. See the program below:

import numpy as np

arr = np.array(((1, 2, 3),(4, 5, 6)))

print(arr)
print('\n')

arr = np.array([(1, 2, 3), [4, 5, 6], (7, 8, 9)])

print(arr)



The output of the above program is shown below:

[[1 2 3]
 [4 5 6]]


[[1 2 3]
 [4 5 6]
 [7 8 9]]


------------------
(program exited with code: 0)

Press any key to continue . . .
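Whichever kind of nested sequence is passed in, the resulting array is the same as long as the nesting is regular. A small check, given here only as a sketch:

```python
import numpy as np

from_lists = np.array([[1, 2, 3], [4, 5, 6]])
from_tuples = np.array(((1, 2, 3), (4, 5, 6)))
from_mixed = np.array([(1, 2, 3), [4, 5, 6]])

print(np.array_equal(from_lists, from_tuples))  # True
print(np.array_equal(from_lists, from_mixed))   # True
print(from_mixed.shape)                         # (2, 3)
```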


In the following program we can see some more ways to create an array in NumPy:

import numpy as np

# Creating an array of zeros with five elements
arr = np.zeros(5)
print("Creating an array of zeros with five elements\n")
print(arr)
print('\n')

# Create an array going from 0 to 99 (the stop value 100 is excluded)
arr = np.arange(100)
print("create an array going from 0 to 100\n")
print(arr)
print('\n')

# Or 10 to 99?
arr = np.arange(10,100)
print("Or 10 to 100\n")
print(arr)
print('\n')

# If you want 100 steps from 0 to 1...
arr = np.linspace(0, 1, 100)
print("100 steps from 0 to 1\n")
print(arr)
print('\n')

# Or if you want to generate an array from 1 to 10
# in log10 space in 100 steps...
arr = np.logspace(0, 1, 100, base=10.0)
print("generate an array from 1 to 10 in log10 space in 100 steps\n")
print(arr)
print('\n')

# Creating a 5x5 array of zeros (an image)
image = np.zeros((5,5))
print("Creating a 5x5 array of zeros (an image)\n")
print(image)
print('\n')

# Creating a 5x5x5 cube of 1's
# The astype() method sets the array with integer elements.
cube = np.zeros((5,5,5)).astype(int) + 1
print("Creating a 5x5x5 cube of 1's\n")
print(cube)
print('\n')

# Or even simpler with 16-bit floating-point precision...
cube = np.ones((5, 5, 5)).astype(np.float16)
print("with 16-bit floating-point precision\n")
print(cube)
print('\n')


The output of the above program is shown below:

Creating an array of zeros with five elements

[0. 0. 0. 0. 0.]


create an array going from 0 to 100

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
 96 97 98 99]


Or 10 to 100

[10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57
 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81
 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99]


100 steps from 0 to 1

[0.         0.01010101 0.02020202 0.03030303 0.04040404 0.05050505
 0.06060606 0.07070707 0.08080808 0.09090909 0.1010101  0.11111111
 0.12121212 0.13131313 0.14141414 0.15151515 0.16161616 0.17171717
 0.18181818 0.19191919 0.2020202  0.21212121 0.22222222 0.23232323
 0.24242424 0.25252525 0.26262626 0.27272727 0.28282828 0.29292929
 0.3030303  0.31313131 0.32323232 0.33333333 0.34343434 0.35353535
 0.36363636 0.37373737 0.38383838 0.39393939 0.4040404  0.41414141
 0.42424242 0.43434343 0.44444444 0.45454545 0.46464646 0.47474747
 0.48484848 0.49494949 0.50505051 0.51515152 0.52525253 0.53535354
 0.54545455 0.55555556 0.56565657 0.57575758 0.58585859 0.5959596
 0.60606061 0.61616162 0.62626263 0.63636364 0.64646465 0.65656566
 0.66666667 0.67676768 0.68686869 0.6969697  0.70707071 0.71717172
 0.72727273 0.73737374 0.74747475 0.75757576 0.76767677 0.77777778
 0.78787879 0.7979798  0.80808081 0.81818182 0.82828283 0.83838384
 0.84848485 0.85858586 0.86868687 0.87878788 0.88888889 0.8989899
 0.90909091 0.91919192 0.92929293 0.93939394 0.94949495 0.95959596
 0.96969697 0.97979798 0.98989899 1.        ]


generate an array from 1 to 10 in log10 space in 100 steps

[ 1.          1.02353102  1.04761575  1.07226722  1.09749877  1.12332403
  1.149757    1.17681195  1.20450354  1.23284674  1.26185688  1.29154967
  1.32194115  1.35304777  1.38488637  1.41747416  1.45082878  1.48496826
  1.51991108  1.55567614  1.59228279  1.62975083  1.66810054  1.70735265
  1.7475284   1.78864953  1.83073828  1.87381742  1.91791026  1.96304065
  2.009233    2.05651231  2.10490414  2.15443469  2.20513074  2.25701972
  2.3101297   2.36448941  2.42012826  2.47707636  2.53536449  2.59502421
  2.65608778  2.71858824  2.7825594   2.84803587  2.91505306  2.98364724
  3.05385551  3.12571585  3.19926714  3.27454916  3.35160265  3.43046929
  3.51119173  3.59381366  3.67837977  3.76493581  3.85352859  3.94420606
  4.03701726  4.1320124   4.22924287  4.32876128  4.43062146  4.53487851
  4.64158883  4.75081016  4.86260158  4.97702356  5.09413801  5.21400829
  5.33669923  5.46227722  5.59081018  5.72236766  5.85702082  5.9948425
  6.13590727  6.28029144  6.42807312  6.57933225  6.73415066  6.8926121
  7.05480231  7.22080902  7.39072203  7.56463328  7.74263683  7.92482898
  8.11130831  8.30217568  8.49753436  8.69749003  8.90215085  9.11162756
  9.32603347  9.54548457  9.77009957 10.        ]


Creating a 5x5 array of zeros (an image)

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]


Creating a 5x5x5 cube of 1's

[[[1 1 1 1 1]
  [1 1 1 1 1]
  [1 1 1 1 1]
  [1 1 1 1 1]
  [1 1 1 1 1]]

 [[1 1 1 1 1]
  [1 1 1 1 1]
  [1 1 1 1 1]
  [1 1 1 1 1]
  [1 1 1 1 1]]

 [[1 1 1 1 1]
  [1 1 1 1 1]
  [1 1 1 1 1]
  [1 1 1 1 1]
  [1 1 1 1 1]]

 [[1 1 1 1 1]
  [1 1 1 1 1]
  [1 1 1 1 1]
  [1 1 1 1 1]
  [1 1 1 1 1]]

 [[1 1 1 1 1]
  [1 1 1 1 1]
  [1 1 1 1 1]
  [1 1 1 1 1]
  [1 1 1 1 1]]]


with 16-bit floating-point precision

[[[1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]]

 [[1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]]

 [[1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]]

 [[1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]]

 [[1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]]]


------------------
(program exited with code: 0)

Press any key to continue . . .
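A few properties of these creation functions can be verified directly. The sketch below checks that arange excludes its stop value, that linspace includes its endpoint, that logspace with base 10 matches 10 raised to the corresponding linspace values, and that passing dtype to np.ones gives the same integer cube as the zeros-plus-one approach.

```python
import numpy as np

# arange stops before the stop value; linspace includes the endpoint.
print(np.arange(10, 100)[-1])      # 99
lin = np.linspace(0, 1, 100)
print(lin[-1])                     # 1.0

# logspace(0, 1, ...) in base 10 is 10 raised to the linspace values.
log = np.logspace(0, 1, 100, base=10.0)
print(np.allclose(log, 10.0 ** lin))

# The dtype argument avoids the separate astype() step.
ones_int = np.ones((5, 5, 5), dtype=int)
same = np.zeros((5, 5, 5)).astype(int) + 1
print(np.array_equal(ones_int, same))
```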



NumPy arrays are designed to contain a wide variety of data types. Let us create an array of string type:

import numpy as np

arr = np.array([['a', 'b'],['c', 'd']])
print(arr)

print(arr.dtype)
print(arr.dtype.name)





The above program creates an array of strings and prints it. Next we check the array's type using the dtype attribute and print the name of the string type using the dtype.name attribute. The output of the above program is shown below:

[['a' 'b']
 ['c' 'd']]
<U1
str32
------------------
(program exited with code: 0)

Press any key to continue . . .
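The number in a dtype such as <U1 is the maximum string length, taken from the longest string in the input. The sketch below shows this, and also that assigning a longer string into an existing array is silently truncated to that length.

```python
import numpy as np

short = np.array(['a', 'b'])
longer = np.array(['a', 'hello'])

print(short.dtype, longer.dtype)
print(short.itemsize, longer.itemsize)  # 4 bytes per character: 4 and 20

# The width is fixed at creation, so a longer value is cut to fit.
short[0] = 'xyz'
print(short[0])  # x
```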

The data types supported by NumPy are:

bool_        Boolean (True or False) stored as a byte
int_         Default integer type (same as C long; normally either int64 or int32)
intc         Identical to C int (normally int32 or int64)
intp         Integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8         Byte (–128 to 127)
int16        Integer (–32768 to 32767)
int32        Integer (–2147483648 to 2147483647)
int64        Integer (–9223372036854775808 to 9223372036854775807)
uint8        Unsigned integer (0 to 255)
uint16       Unsigned integer (0 to 65535)
uint32       Unsigned integer (0 to 4294967295)
uint64       Unsigned integer (0 to 18446744073709551615)
float_       Shorthand for float64 (alias removed in NumPy 2.0)
float16      Half precision float: sign bit, 5-bit exponent, 10-bit mantissa
float32      Single precision float: sign bit, 8-bit exponent, 23-bit mantissa
float64      Double precision float: sign bit, 11-bit exponent, 52-bit mantissa
complex_     Shorthand for complex128 (alias removed in NumPy 2.0)
complex64    Complex number, represented by two 32-bit floats (real and imaginary components)
complex128   Complex number, represented by two 64-bit floats (real and imaginary components)
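The ranges and bit layouts in the table above do not need to be memorised: np.iinfo and np.finfo report them for any integer or floating-point type. A brief sketch:

```python
import numpy as np

# Integer limits
i8 = np.iinfo(np.int8)
u16 = np.iinfo(np.uint16)
print(i8.min, i8.max)  # -128 127
print(u16.max)         # 65535

# Floating-point layout: exponent bits and mantissa bits
for ftype in (np.float16, np.float32, np.float64):
    info = np.finfo(ftype)
    print(ftype.__name__, info.bits, info.nexp, info.nmant)
```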

The array() function picks the most suitable type for the values contained in the sequence of lists or tuples, but it is also possible to set the type explicitly with the dtype argument. In the following program we first define an array of integers and then create the same array with complex values using dtype=complex:

import numpy as np

arr = np.array([[1, 2, 3],[4, 5, 6]])
print(arr)

print("\nArray with complex values\n")

arr = np.array([[1, 2, 3],[4, 5, 6]], dtype=complex)
print(arr)
print('\n')
print(arr.dtype)

The output of the above program is shown below:

[[1 2 3]
 [4 5 6]]

Array with complex values

[[1.+0.j 2.+0.j 3.+0.j]
 [4.+0.j 5.+0.j 6.+0.j]]


complex128
------------------
(program exited with code: 0)

Press any key to continue . . .
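An existing array can also be converted with the astype() method, which returns a new array and leaves the original untouched. The sketch below is an alternative to rebuilding the array with dtype=complex; note that converting floats to integers drops the fractional part rather than rounding.

```python
import numpy as np

ints = np.array([[1, 2, 3], [4, 5, 6]])

as_complex = ints.astype(complex)  # new array; ints is not modified
print(as_complex.dtype)            # complex128
print(ints.dtype.kind)             # i (still an integer array)

# Float to int conversion truncates toward zero.
truncated = np.array([1.7, -2.9]).astype(int)
print(truncated)                   # [ 1 -2]
```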

Here I am ending today's post. In the next post we shall further explore the NumPy library. Till we meet next, keep practicing and learning Python, as Python is easy to learn!








