Thursday, April 11, 2019

Pandas - 13 (Writing Data in CSV)

We often also need to write out a data file produced by a calculation, or more generally the data held in a data structure. In the following program we'll write the data contained in a dataframe to a CSV
file. To do this we use the to_csv() method of the dataframe, which accepts as an argument the name of the file to generate:

import pandas as pd
import numpy as np


frame1 = pd.DataFrame(np.arange(16).reshape((4,4)),
             index = ['red', 'blue', 'yellow', 'white'],
             columns = ['ball', 'pen', 'pencil', 'paper'])
frame1.to_csv('mydata7.csv')
frame2 = pd.read_csv('mydata7.csv')
print('\nThe dataframe\n')
print(frame2)


The output of the program is shown below: 

The dataframe

  Unnamed: 0  ball  pen  pencil  paper
0        red     0    1       2      3
1       blue     4    5       6      7
2     yellow     8    9      10     11
3      white    12   13      14     15


If we open the new file called mydata7.csv generated by the pandas library, we will see the following data:

,ball,pen,pencil,paper
red,0,1,2,3
blue,4,5,6,7
yellow,8,9,10,11
white,12,13,14,15
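The first header field in the file is empty because the index has no name, which is why reading the file back produced a column called Unnamed: 0. A minimal sketch of one way to get the original row labels back is to pass index_col=0 to read_csv(). The temporary directory and the file name roundtrip_demo.csv are assumptions made for this illustration only:

```python
import os
import tempfile

import numpy as np
import pandas as pd

frame1 = pd.DataFrame(np.arange(16).reshape((4, 4)),
                      index=['red', 'blue', 'yellow', 'white'],
                      columns=['ball', 'pen', 'pencil', 'paper'])

# Write into a temporary directory so the example leaves no file behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'roundtrip_demo.csv')  # hypothetical file name
    frame1.to_csv(path)
    # index_col=0 tells read_csv to use the first column as the row labels
    restored = pd.read_csv(path, index_col=0)

print(restored)
print(list(restored.index))
```

With index_col=0 the colors become the index again instead of an extra data column.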

When we write a dataframe to a file, the index and column labels are written to the file by default. To change this default behavior, set the two options index and header to False in to_csv(), as shown in the following program:

import pandas as pd
import numpy as np


frame1 = pd.DataFrame(np.arange(16).reshape((4,4)),
             index = ['red', 'blue', 'yellow', 'white'],
             columns = ['ball', 'pen', 'pencil', 'paper'])
frame1.to_csv('mydata7.csv',index=False, header=False)
frame2 = pd.read_csv('mydata7.csv')
print('\nThe dataframe\n')
print(frame2)


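The output of the program is shown below. Note that read_csv() still treats the first line of the file as a header, so the first data row (0, 1, 2, 3) is used as the column labels and only three rows of data remain: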

The dataframe

    0   1   2   3
0   4   5   6   7
1   8   9  10  11
2  12  13  14  15

If we open the file mydata7.csv again, we will see the following data:

0,1,2,3
4,5,6,7
8,9,10,11
12,13,14,15
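Because this file has no header line, it has to be read with header=None if every row should count as data. The sketch below is only an illustration: it keeps the file content in an in-memory string (csv_text, a name introduced here) instead of reading mydata7.csv from disk, and the names passed in the second call are simply the column labels of the original dataframe:

```python
import io

import pandas as pd

# The header-less file content shown above, held in memory for the demo.
csv_text = "0,1,2,3\n4,5,6,7\n8,9,10,11\n12,13,14,15\n"

# header=None: treat every line as data and number the columns 0..3
plain = pd.read_csv(io.StringIO(csv_text), header=None)

# names=...: supply the column labels ourselves
named = pd.read_csv(io.StringIO(csv_text), header=None,
                    names=['ball', 'pen', 'pencil', 'paper'])

print(plain.shape)
print(named)
```

Both calls keep all four rows. The row labels (red, blue, yellow, white) cannot be recovered, because index=False left them out of the file.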

The next program shows what happens when we write a data structure containing NaN values to a file:

import pandas as pd
import numpy as np

frame = pd.DataFrame([[6,np.nan,np.nan,6,np.nan],
            [np.nan,np.nan,np.nan,np.nan,np.nan],
            [np.nan,np.nan,np.nan,np.nan,np.nan],
            [20,np.nan,np.nan,20.0,np.nan],
            [19,np.nan,np.nan,19.0,np.nan]
            ],
           
            index=['blue','green','red','white','yellow'],
            columns=['ball','mug','paper','pen','pencil'])
           
print('\nThe dataframe\n')
print(frame)           

frame.to_csv('mydata8.csv')
frame2 = pd.read_csv('mydata8.csv')
print('\nThe content from csv created\n')
print(frame2)


The output of the program is shown below: 

The dataframe

        ball  mug  paper   pen  pencil
blue     6.0  NaN    NaN   6.0     NaN
green    NaN  NaN    NaN   NaN     NaN
red      NaN  NaN    NaN   NaN     NaN
white   20.0  NaN    NaN  20.0     NaN
yellow  19.0  NaN    NaN  19.0     NaN

The content from csv created

  Unnamed: 0  ball  mug  paper   pen  pencil
0       blue   6.0  NaN    NaN   6.0     NaN
1      green   NaN  NaN    NaN   NaN     NaN
2        red   NaN  NaN    NaN   NaN     NaN
3      white  20.0  NaN    NaN  20.0     NaN
4     yellow  19.0  NaN    NaN  19.0     NaN


If we open the new file called mydata8.csv generated by the pandas library, we will see the following data. The NaN values are written as empty fields:

,ball,mug,paper,pen,pencil
blue,6.0,,,6.0,
green,,,,,
red,,,,,
white,20.0,,,20.0,
yellow,19.0,,,19.0,

To replace these empty fields with a value of our choice, we can use the na_rep option of to_csv(). Common choices are NULL, 0, or NaN. See the following program:

import pandas as pd
import numpy as np

frame = pd.DataFrame([[6,np.nan,np.nan,6,np.nan],
            [np.nan,np.nan,np.nan,np.nan,np.nan],
            [np.nan,np.nan,np.nan,np.nan,np.nan],
            [20,np.nan,np.nan,20.0,np.nan],
            [19,np.nan,np.nan,19.0,np.nan]
            ],
           
            index=['blue','green','red','white','yellow'],
            columns=['ball','mug','paper','pen','pencil'])
           
print('\nThe dataframe\n')
print(frame)           

frame.to_csv('mydata9.csv',na_rep='NULL')
frame2 = pd.read_csv('mydata9.csv')
print('\nThe content from csv created\n')
print(frame2)


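The output of the program is shown below. The dataframe read back from the file looks the same as before, because read_csv() recognizes NULL as a missing-value marker by default and converts it back to NaN: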

The dataframe

        ball  mug  paper   pen  pencil
blue     6.0  NaN    NaN   6.0     NaN
green    NaN  NaN    NaN   NaN     NaN
red      NaN  NaN    NaN   NaN     NaN
white   20.0  NaN    NaN  20.0     NaN
yellow  19.0  NaN    NaN  19.0     NaN

The content from csv created

  Unnamed: 0  ball  mug  paper   pen  pencil
0       blue   6.0  NaN    NaN   6.0     NaN
1      green   NaN  NaN    NaN   NaN     NaN
2        red   NaN  NaN    NaN   NaN     NaN
3      white  20.0  NaN    NaN  20.0     NaN
4     yellow  19.0  NaN    NaN  19.0     NaN


If we open the new file called mydata9.csv generated by the pandas library, we will see the following data:

,ball,mug,paper,pen,pencil
blue,6.0,NULL,NULL,6.0,NULL
green,NULL,NULL,NULL,NULL,NULL
red,NULL,NULL,NULL,NULL,NULL
white,20.0,NULL,NULL,20.0,NULL
yellow,19.0,NULL,NULL,19.0,NULL
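To see the effect of na_rep without opening a file, to_csv() can be called without a file name, in which case it returns the CSV text as a string. The following is a minimal sketch on a smaller, made-up dataframe (two rows and two columns chosen for this illustration). It also shows the keep_default_na=False option of read_csv(), which keeps NULL as literal text instead of converting it to NaN:

```python
import io

import numpy as np
import pandas as pd

# A reduced version of the dataframe above, just for the demo.
frame = pd.DataFrame({'ball': [6.0, np.nan], 'mug': [np.nan, np.nan]},
                     index=['blue', 'green'])

# With no path argument, to_csv() returns the CSV text as a string.
csv_text = frame.to_csv(na_rep='NULL')
print(csv_text)

# Default behavior: 'NULL' is read as a missing value and becomes NaN.
default_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

# keep_default_na=False: 'NULL' stays in the data as an ordinary string.
literal_read = pd.read_csv(io.StringIO(csv_text), index_col=0,
                           keep_default_na=False)

print(default_read)
print(literal_read)
```

Which reading mode is appropriate depends on whether the program that consumes the file should see missing values or the literal text.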



Here I am ending today’s post. Until we meet again keep practicing and learning Python, as Python is easy to learn!