Tuesday, March 26, 2019

Pandas -2 (Series I)

The Series object of the pandas library represents one-dimensional data, similar to an array but with some additional features. It is composed of two arrays associated with each other: the main array holds the data (of any NumPy type), and each of its elements is associated with a label held in the other array, called the index.

Creating a series

Let's create a series as shown in the figure below. 



To create a series we call the Series() constructor and pass as an argument an array containing the values to be included in it. The following program creates and prints the series:

import pandas as pd

s = pd.Series([12,-4,7,9])
print(s)


As the import statement shows, the pandas module must be imported first. We then call the Series() constructor and pass the list [12,-4,7,9] as its argument. The output of the program is as follows:

0    12
1    -4
2     7
3     9
dtype: int64
------------------
(program exited with code: 0)

Press any key to continue . . . 


In the output, the left column shows the index, which is the sequence of labels, and the right column shows the corresponding values.

If we do not specify any index during the definition of the series, by default, pandas will assign numerical values increasing from 0 as labels. In this case, the labels correspond to the indexes (position in the array) of the elements in the series object.
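
The default labels can be inspected directly. The following is a minimal sketch, assuming a reasonably recent pandas version, showing that the automatically generated index is a range starting at 0 and that, for such a series, label 2 and position 2 refer to the same element:

```python
import pandas as pd

s = pd.Series([12, -4, 7, 9])

# With no index argument, pandas builds a RangeIndex starting at 0.
print(s.index)
print(list(s.index))

# Here the label 2 and the position 2 point at the same element.
same_element = s[2] == s.iloc[2]
print(same_element)
```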

However, it is often preferable to create a series with meaningful labels, so that each item can be identified regardless of the order in which it was inserted. To do so, pass the index option to the constructor and assign it a list of strings containing the labels. Now let's create the series shown in the following figure:

import pandas as pd

s = pd.Series([5,6,12,-5,6.7], index=['A','B','C','D','E'])
print(s)


The output of the program is shown as follows:

A     5.0
B     6.0
C    12.0
D    -5.0
E     6.7
dtype: float64


It is also possible to individually see the two arrays that make up this data structure. This can be done by calling the two attributes of the series as follows: index and values. See the following program:

import pandas as pd

s = pd.Series([5,6,12,-5,6.7], index=['A','B','C','D','E'])
print(s.values)
print('\n')
print(s.index)


The output of the program is shown as follows:

[ 5.   6.  12.  -5.   6.7]


Index(['A', 'B', 'C', 'D', 'E'], dtype='object')


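
Because the two arrays are aligned element by element, they can also be walked together. The sketch below is only an illustration of that pairing; it uses the items() method of the series, and the variable name pairs is chosen here for the example:

```python
import pandas as pd

s = pd.Series([5, 6, 12, -5, 6.7], index=['A', 'B', 'C', 'D', 'E'])

# items() yields one (label, value) pair per element of the series.
for label, value in s.items():
    print(label, value)

# The same pairing, built by hand from the two underlying arrays.
pairs = list(zip(s.index, s.values))
print(len(pairs))
```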


Selecting the Internal Elements of series

Individual elements of a series can be selected much like elements of an ordinary NumPy array, by giving their position, or by giving the corresponding label from the index. Multiple items can be selected with a slice, as in a NumPy array, or by passing a list of labels. In recent pandas versions, using a bare integer such as s[2] for positional access on a series with string labels is deprecated, so the program below uses .iloc for positions:

import pandas as pd

s = pd.Series([5,6,12,-5,6.7], index=['A','B','C','D','E'])
print('Accessing individual elements\n')
# .iloc selects by position (0-based)
print(s.iloc[2])
print('\nAccessing individual elements using label\n')
print(s['B'])
print('\nSelect multiple items\n')
# positions 0 and 1; the end of the slice is excluded
print(s.iloc[0:2])
print('\nSpecify the list of labels\n')
print(s[['B','C']])


The output of the program is shown as follows:

Accessing individual elements

12.0

Accessing individual elements using label

6.0

Select multiple items

A    5.0
B    6.0
dtype: float64

Specify the list of labels

B     6.0
C    12.0
dtype: float64
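
The difference between selecting by position and selecting by label shows up most clearly when slicing. The following sketch, assuming the same series as above, uses the explicit .iloc and .loc accessors; note that a position slice excludes its end while a label slice includes it:

```python
import pandas as pd

s = pd.Series([5, 6, 12, -5, 6.7], index=['A', 'B', 'C', 'D', 'E'])

# Position-based slice: positions 0 and 1 only (end excluded).
by_position = s.iloc[0:2]
print(by_position)

# Label-based slice: labels A, B and C (end label included).
by_label = s.loc['A':'C']
print(by_label)
```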


Assigning Values to the Elements of series

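New values can be assigned to the elements of a series by selecting them by position or by label. As with selection, recent pandas versions deprecate a bare integer such as s[1] for positional assignment on a series with string labels, so the program below uses .iloc for the position:

import pandas as pd

s = pd.Series([5,6,12,-5,6.7], index=['A','B','C','D','E'])
# assign by position (second element)
s.iloc[1]=0
print('Assigning value to element by selecting value\n')
print(s)
# assign by label
s['B']=1
print('\nAssigning value to element by selecting label\n')
print(s)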


The output of the program is shown as follows:

Assigning value to element by selecting value

A     5.0
B     0.0
C    12.0
D    -5.0
E     6.7
dtype: float64

Assigning value to element by selecting label

A     5.0
B     1.0
C    12.0
D    -5.0
E     6.7
dtype: float64
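
Assignment is not limited to one element at a time. As a small sketch (the particular labels and the condition are chosen here only for illustration), several elements can be updated at once through a list of labels or through a condition:

```python
import pandas as pd

s = pd.Series([5, 6, 12, -5, 6.7], index=['A', 'B', 'C', 'D', 'E'])

# Set the elements labelled A and C in one statement.
s.loc[['A', 'C']] = 0

# Replace every negative value with 0.
s[s < 0] = 0

print(s)
```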


Defining a Series

We can define a new series starting from a NumPy array or from an existing series. The following program defines a new series using both approaches:

import pandas as pd
import numpy as np

arr = np.array([1,2,3,4])

s = pd.Series(arr)
print('define a new series starting with NumPy arrays\n')
print(s)

s1 = pd.Series([5,6,12,-5,6.7], index=['A','B','C','D','E'])

s2 = pd.Series(s1)
print('\ndefine a new series with an existing series\n')
print(s2)


The output of the program is shown as follows:

define a new series starting with NumPy arrays

0    1
1    2
2    3
3    4
dtype: int32

define a new series with an existing series

A     5.0
B     6.0
C    12.0
D    -5.0
E     6.7
dtype: float64


In the pandas version used for this post, the values contained in the NumPy array or in the original series are not copied but passed by reference: the new series shares the same underlying data. If the original object changes, for example one of its elements takes a new value, the change also appears in the new series. (Newer pandas releases that enable Copy-on-Write copy the data instead, so this behavior depends on the version.) The following program demonstrates this:

import pandas as pd
import numpy as np

arr = np.array([1,2,3,4])

s = pd.Series(arr)
print('define a new series starting with NumPy arrays\n')
print(s)

arr[2] = -2

print('\nThe new series after changing the element\n')
print(s) 


The output of the program is shown as follows:

define a new series starting with NumPy arrays

0    1
1    2
2    3
3    4
dtype: int32

The new series after changing the element

0    1
1    2
2   -2
3    4
dtype: int32


As we can see, by changing the third element of the arr array, we also modified the corresponding element in the s series.
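
When this sharing is not wanted, the series can be given its own copy of the data. The sketch below is one way to do it, using the copy option of the Series() constructor; calling arr.copy() before passing the array would be an equivalent alternative:

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3, 4])

# copy=True gives the series its own data, independent of arr.
s = pd.Series(arr, copy=True)

arr[2] = -2

# The array has changed, but the series still holds the original values.
print(arr)
print(s)
```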

Since pandas is built on top of NumPy, many operations that apply to NumPy arrays also extend to the series. One of these is filtering the values contained in the data structure through conditions. For example, to find which elements in the series are greater than 8, we can proceed in the following way:

import pandas as pd
import numpy as np

arr = np.array([1,12,3,34,2,16,7])

s = pd.Series(arr)
print('define a new series starting with NumPy arrays\n')
print(s)

print('\nThe filtered series\n')
print(s[s>8])

The output of the program is shown as follows:

define a new series starting with NumPy arrays

0     1
1    12
2     3
3    34
4     2
5    16
6     7
dtype: int32

The filtered series

1    12
3    34
5    16
dtype: int32
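
The expression s>8 is itself a series of True/False values, and several such conditions can be combined. The following sketch, with thresholds chosen only for illustration, uses & (and) and | (or); each condition needs its own parentheses:

```python
import pandas as pd

s = pd.Series([1, 12, 3, 34, 2, 16, 7])

# The condition alone produces a boolean series, one flag per element.
mask = (s > 8) & (s < 20)
print(mask)

# Elements greater than 8 and less than 20.
between = s[mask]
print(between)

# Elements less than 3 or greater than 30.
extremes = s[(s < 3) | (s > 30)]
print(extremes)
```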

Here I am ending today's post. In the next post we'll continue to explore the series data structure. Until we meet again keep practicing and learning Python, as Python is easy to learn!