Tuesday, April 30, 2019

Pandas - 26 (Permutation and Random Sampling)

A permutation is a random reordering of a Series or of the rows of a dataframe. It is easy to do with the numpy.random.permutation() function, which returns the integers 0 to n-1 in a shuffled order; passing that array to take() rearranges the rows to match. The following program shows how a permutation is performed:

import pandas as pd
import numpy as np

# Build a 5x5 dataframe holding the integers 0 to 24.
mydataframe = pd.DataFrame(np.arange(25).reshape(5,5))

print('\nThe original dataframe\n')
print(mydataframe)

# A random ordering of the row positions 0 to 4.
new_order = np.random.permutation(5)
print('\nThe new order in which to set the values of a row of the dataframe.\n')
print(new_order)

print('\nApply new order on all lines of the dataframe\n')

# take() returns the rows at the given positions, in the given order.
print(mydataframe.take(new_order))

print('\nSubmitting a portion of the entire dataframe to a permutation\n')
# A hand-picked order over a subset of the rows; nothing here is random.
new_order = [3,4,2]

print(mydataframe.take(new_order))


The output of the program is shown below:

The original dataframe

    0   1   2   3   4
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14
3  15  16  17  18  19
4  20  21  22  23  24

The new order in which to set the values of a row of the dataframe.

[3 4 0 2 1]

Apply new order on all lines of the dataframe

    0   1   2   3   4
3  15  16  17  18  19
4  20  21  22  23  24
0   0   1   2   3   4
2  10  11  12  13  14
1   5   6   7   8   9

Submitting a portion of the entire dataframe to a permutation

    0   1   2   3   4
3  15  16  17  18  19
4  20  21  22  23  24
2  10  11  12  13  14
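The same take() approach also works on a Series, which the introduction mentions but the program above does not show. The following is a minimal sketch and not part of the original program: the names myseries and rng and the labels 'a' to 'e' are chosen here for illustration, and a seeded np.random.default_rng() generator is assumed so that the shuffle can be repeated.

```python
import pandas as pd
import numpy as np

# Hypothetical example series; the labels 'a'..'e' are made up for illustration.
myseries = pd.Series([10, 20, 30, 40, 50], index=list('abcde'))

# A seeded generator makes the permutation repeatable between runs.
rng = np.random.default_rng(42)
new_order = rng.permutation(len(myseries))

# take() selects by position, so each index label travels with its value.
shuffled = myseries.take(new_order)

print(shuffled)
```

Because take() works on positions rather than labels, the shuffled Series keeps every label attached to its original value; only the order changes.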


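If we have a huge dataframe, we might need to sample it randomly. A quick way to do this is to draw random row positions with the np.random.randint() function. Because each position is drawn independently, the same row can appear more than once in the sample. The following program performs random sampling: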

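import pandas as pd
import numpy as np

# Build a 5x5 dataframe holding the integers 0 to 24.
mydataframe = pd.DataFrame(np.arange(25).reshape(5,5))

print('\nThe original dataframe\n')
print(mydataframe)

# Draw 3 row positions from 0 to 4; the upper bound passed to randint() is exclusive.
mysample = np.random.randint(0, len(mydataframe), size=3)

print('\nThe created sample \n')
print(mysample)


print('\nRandom sample\n')
# take() returns the rows at the sampled positions, repeats included.
print(mydataframe.take(mysample))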


The output of the program is shown below:


The original dataframe

    0   1   2   3   4
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14
3  15  16  17  18  19
4  20  21  22  23  24

The created sample

[2 2 1]

Random sample

    0   1   2   3   4
2  10  11  12  13  14
2  10  11  12  13  14
1   5   6   7   8   9
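The sample above contains row 2 twice because np.random.randint() draws each position independently, that is, with replacement. When every row should appear at most once, one option is to keep the first few entries of a permutation; pandas also provides DataFrame.sample() for the same purpose. The sketch below is an addition to the original program, and the seed value 0 and the names rng, positions, sample_a and sample_b are assumptions made for illustration.

```python
import pandas as pd
import numpy as np

mydataframe = pd.DataFrame(np.arange(25).reshape(5, 5))

# Option 1: shuffle all row positions, then keep the first three.
# A permutation never repeats a value, so no row is picked twice.
rng = np.random.default_rng(0)
positions = rng.permutation(len(mydataframe))[:3]
sample_a = mydataframe.take(positions)

# Option 2: let pandas do the sampling; replace=False is its default.
sample_b = mydataframe.sample(n=3, random_state=0)

print(sample_a)
print(sample_b)
```

Passing replace=True to sample() would bring back the behaviour of the randint() version, where a row may be drawn more than once.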


Here I am ending today’s post. Until we meet again, keep practicing and learning Python, as Python is easy to learn!