Concatenation is another way to combine data, and NumPy provides the concatenate() function for joining arrays along a chosen axis. See the following program:
import pandas as pd
import numpy as np
array1 = np.arange(9).reshape((3,3))
print('Array 1\n')
print(array1)
array2 = np.arange(9).reshape((3,3))+6
print('\nArray 2\n')
print(array2)
print('\nConcatenated array axis=1\n')
print(np.concatenate([array1,array2],axis=1))
print('\nConcatenated array axis=0\n')
print(np.concatenate([array1,array2],axis=0))
The output of the program is shown below:
Array 1
[[0 1 2]
[3 4 5]
[6 7 8]]
Array 2
[[ 6 7 8]
[ 9 10 11]
[12 13 14]]
Concatenated array axis=1
[[ 0 1 2 6 7 8]
[ 3 4 5 9 10 11]
[ 6 7 8 12 13 14]]
Concatenated array axis=0
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 6 7 8]
[ 9 10 11]
[12 13 14]]
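To make the role of the axis argument concrete, here is a minimal sketch that checks the shapes produced by the two calls above. The comparison with np.vstack() and np.hstack() is an addition for illustration, not part of the original program; for 2-D arrays they should give the same results.

```python
import numpy as np

array1 = np.arange(9).reshape((3, 3))
array2 = np.arange(9).reshape((3, 3)) + 6

# axis=0 appends rows: (3, 3) and (3, 3) give (6, 3)
rows = np.concatenate([array1, array2], axis=0)
# axis=1 appends columns: (3, 3) and (3, 3) give (3, 6)
cols = np.concatenate([array1, array2], axis=1)

print(rows.shape)  # (6, 3)
print(cols.shape)  # (3, 6)

# For 2-D arrays, vstack and hstack are shorthand for the same operations
print(np.array_equal(rows, np.vstack([array1, array2])))  # True
print(np.array_equal(cols, np.hstack([array1, array2])))  # True
```

The arrays must agree in size along every axis except the one being concatenated, otherwise NumPy raises a ValueError.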
The pandas library, whose data structures such as Series and DataFrame have labeled axes, lets you generalize the concatenation of arrays further. pandas provides the concat() function for this kind of operation. See the following program:
import pandas as pd
import numpy as np
ser1 = pd.Series(np.random.rand(4), index=[1,2,3,4])
print('Series 1\n')
print(ser1)
ser2 = pd.Series(np.random.rand(4), index=[5,6,7,8])
print('\nSeries 2\n')
print(ser2)
print('\nConcatenated series axis=1\n')
print(pd.concat([ser1,ser2], axis=1))
print('\nConcatenated series axis=0\n')
print(pd.concat([ser1,ser2]))
By default, the concat() function works along axis=0 and returns a Series. If you set axis=1, the result is a DataFrame. Because the two series share no index labels, that DataFrame holds NaN wherever a series has no value for a label. The output of the program is shown below:
Series 1
1 0.936029
2 0.194529
3 0.448288
4 0.952875
dtype: float64
Series 2
5 0.392544
6 0.978594
7 0.453258
8 0.661619
dtype: float64
Concatenated series axis=1
          0         1
1  0.936029       NaN
2  0.194529       NaN
3  0.448288       NaN
4  0.952875       NaN
5       NaN  0.392544
6       NaN  0.978594
7       NaN  0.453258
8       NaN  0.661619
Concatenated series axis=0
1 0.936029
2 0.194529
3 0.448288
4 0.952875
5 0.392544
6 0.978594
7 0.453258
8 0.661619
dtype: float64
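As a rough check of the return types described above, the following sketch uses fixed values instead of random ones so the results are reproducible. The join='inner' call is an extra illustration that is not in the original program: it keeps only the labels present in every input, which here is none.

```python
import pandas as pd

ser1 = pd.Series([0.1, 0.2, 0.3, 0.4], index=[1, 2, 3, 4])
ser2 = pd.Series([0.5, 0.6, 0.7, 0.8], index=[5, 6, 7, 8])

stacked = pd.concat([ser1, ser2])                # axis=0 is the default
side_by_side = pd.concat([ser1, ser2], axis=1)   # one column per series

print(type(stacked).__name__)       # Series
print(type(side_by_side).__name__)  # DataFrame
print(side_by_side.shape)           # (8, 2)

# The default join='outer' keeps the union of labels and fills gaps with NaN.
# join='inner' keeps only labels shared by all inputs (none in this case).
inner = pd.concat([ser1, ser2], axis=1, join='inner')
print(inner.shape)                  # (0, 2)
```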
When we concatenate the series along axis=0, the concatenated parts are no longer identifiable in the result. Let's say we want to create a hierarchical index on the axis of concatenation so that each part stays labeled. To do this, we use the keys option, as shown in the following program:
import pandas as pd
import numpy as np
ser1 = pd.Series(np.random.rand(4), index=[1,2,3,4])
print('Series 1\n')
print(ser1)
ser2 = pd.Series(np.random.rand(4), index=[5,6,7,8])
print('\nSeries 2\n')
print(ser2)
print('\nConcatenated series using the keys option\n')
print(pd.concat([ser1,ser2], keys=[1,2]))
print('\nConcatenated series using the keys option along axis=1\n')
print(pd.concat([ser1,ser2], axis=1, keys=[1,2]))
The output of the program is shown below:
Series 1
1 0.034474
2 0.984395
3 0.912107
4 0.543064
dtype: float64
Series 2
5 0.864616
6 0.231658
7 0.875177
8 0.400951
dtype: float64
Concatenated series using the keys option
1  1    0.034474
   2    0.984395
   3    0.912107
   4    0.543064
2  5    0.864616
   6    0.231658
   7    0.875177
   8    0.400951
dtype: float64
Concatenated series using the keys option along axis=1
          1         2
1  0.034474       NaN
2  0.984395       NaN
3  0.912107       NaN
4  0.543064       NaN
5       NaN  0.864616
6       NaN  0.231658
7       NaN  0.875177
8       NaN  0.400951
As you may have noticed, when series are combined along axis=1, the keys become the column headers of the DataFrame.
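The point of the hierarchical index is that each original piece can be selected again by its key. A small sketch of that idea follows; the string keys 'first' and 'second' are hypothetical labels chosen for readability, whereas the program above uses the integers 1 and 2.

```python
import pandas as pd

ser1 = pd.Series([0.1, 0.2, 0.3, 0.4], index=[1, 2, 3, 4])
ser2 = pd.Series([0.5, 0.6, 0.7, 0.8], index=[5, 6, 7, 8])

# 'first' and 'second' are illustrative key names, not from the original post
labeled = pd.concat([ser1, ser2], keys=['first', 'second'])

print(labeled.index.nlevels)           # 2
# Selecting on the outer level returns the values of the original piece
print(labeled.loc['second'].tolist())  # [0.5, 0.6, 0.7, 0.8]

# Along axis=1 the same keys become the column headers
wide = pd.concat([ser1, ser2], axis=1, keys=['first', 'second'])
print(list(wide.columns))              # ['first', 'second']
```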
Concatenation applies to DataFrames in the same way as to Series. The following program shows the concatenation of two DataFrames:
import pandas as pd
import numpy as np
frame1 = pd.DataFrame(np.random.rand(9).reshape(3,3), index=[1,2,3], columns=['A','B','C'])
print('Frame 1\n')
print(frame1)
frame2 = pd.DataFrame(np.random.rand(9).reshape(3,3), index=[4,5,6], columns=['A','B','C'])
print('\nFrame 2\n')
print(frame2)
print('\nConcatenated frames\n')
print(pd.concat([frame1, frame2]))
print('\nConcatenated frames along axis=1\n')
print(pd.concat([frame1, frame2], axis=1))
The output of the program is shown below:
Frame 1
A B C
1 0.216094 0.206833 0.565031
2 0.278919 0.311937 0.410026
3 0.262882 0.487224 0.489479
Frame 2
A B C
4 0.660482 0.491644 0.411970
5 0.511529 0.394583 0.475184
6 0.638702 0.849363 0.190679
Concatenated frames
A B C
1 0.216094 0.206833 0.565031
2 0.278919 0.311937 0.410026
3 0.262882 0.487224 0.489479
4 0.660482 0.491644 0.411970
5 0.511529 0.394583 0.475184
6 0.638702 0.849363 0.190679
Concatenated frames along axis=1
          A         B         C         A         B         C
1  0.216094  0.206833  0.565031       NaN       NaN       NaN
2  0.278919  0.311937  0.410026       NaN       NaN       NaN
3  0.262882  0.487224  0.489479       NaN       NaN       NaN
4       NaN       NaN       NaN  0.660482  0.491644  0.411970
5       NaN       NaN       NaN  0.511529  0.394583  0.475184
6       NaN       NaN       NaN  0.638702  0.849363  0.190679
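The NaN blocks in the axis=1 result come from the two frames having different index labels. As a contrasting sketch, assuming two small hypothetical frames that share the same index but have different columns, the same call produces no missing values. The ignore_index option shown at the end is also an addition for illustration; it replaces the row labels with a fresh 0..n-1 range.

```python
import pandas as pd

# Hypothetical frames: same index labels, different column names
frame1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=[1, 2, 3])
frame2 = pd.DataFrame({'C': [7, 8, 9], 'D': [10, 11, 12]}, index=[1, 2, 3])

# Rows align on the shared index, so no NaN values appear
side = pd.concat([frame1, frame2], axis=1)
print(side.shape)                    # (3, 4)
print(int(side.isna().sum().sum()))  # 0

# Stacking a frame on itself would repeat labels 1, 2, 3;
# ignore_index=True renumbers the rows instead
stacked = pd.concat([frame1, frame1], ignore_index=True)
print(list(stacked.index))           # [0, 1, 2, 3, 4, 5]
```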
Let's consider a scenario in which the two datasets have indexes that overlap entirely or at least partially, and we want to combine them into one. This combination cannot be obtained with either merging or concatenation. For Series, the combine_first() function performs this kind of operation along with data alignment: values from the calling series take precedence, and the argument supplies values only where the caller has none. See the following program:
import pandas as pd
import numpy as np
ser1 = pd.Series(np.random.rand(5),index=[1,2,3,4,5])
print('Series 1\n')
print(ser1)
ser2 = pd.Series(np.random.rand(4),index=[2,4,5,6])
print('\nSeries 2\n')
print(ser2)
print('\nCombined series with ser2 as an argument\n')
print(ser1.combine_first(ser2))
print('\nCombined series with ser1 as an argument\n')
print(ser2.combine_first(ser1))
print('\nCombined series with partial overlap\n')
print(ser1[:3].combine_first(ser2[:3]))
The output of the program is shown below:
Series 1
1 0.546086
2 0.855131
3 0.975251
4 0.159282
5 0.778717
dtype: float64
Series 2
2 0.420990
4 0.883285
5 0.483201
6 0.848290
dtype: float64
Combined series with ser2 as an argument
1 0.546086
2 0.855131
3 0.975251
4 0.159282
5 0.778717
6 0.848290
dtype: float64
Combined series with ser1 as an argument
1 0.546086
2 0.420990
3 0.975251
4 0.883285
5 0.483201
6 0.848290
dtype: float64
Combined series with partial overlap
1 0.546086
2 0.855131
3 0.975251
4 0.883285
5 0.483201
dtype: float64
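Besides adding labels the caller lacks, combine_first() also fills NaN values in the calling series with the argument's value for the same label. The following is a minimal sketch with fixed, hypothetical values so the result is easy to verify by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ser1 has a missing value at label 2
ser1 = pd.Series([1.0, np.nan, 3.0], index=[1, 2, 3])
ser2 = pd.Series([20.0, 30.0, 40.0], index=[2, 3, 4])

combined = ser1.combine_first(ser2)

# Label 1: only in ser1          -> 1.0
# Label 2: NaN in ser1           -> filled from ser2 (20.0)
# Label 3: present in both       -> ser1 wins (3.0)
# Label 4: only in ser2          -> 40.0
print(list(combined.index))  # [1, 2, 3, 4]
print(combined.tolist())     # [1.0, 20.0, 3.0, 40.0]
```

Swapping the roles (ser2.combine_first(ser1)) gives ser2's values priority instead, which is why the two calls in the program above produce different results for the overlapping labels.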
Here I am ending today’s post. Until we meet again, keep practicing and learning Python, as Python is easy to learn!