Tuesday, April 23, 2019

Pandas - 20 (Interacting with Databases)

Data are usually stored in databases, both SQL-based relational databases and NoSQL databases. To load data from a SQL database into a dataframe, pandas provides some functions that simplify the process.

The pandas.io.sql module relies on sqlalchemy to provide a unified interface that is independent of the DB. This interface simplifies the connection mode, since regardless of the DB, the commands will always be the same.

To make a connection we use the create_engine() function (from sqlalchemy import create_engine). With this function we can configure all the properties necessary to use the driver, such as the user, password, port, and database instance. Below is a list of examples for the various types of databases:

For PostgreSQL:
>>> engine = create_engine('postgresql://scott:tiger@localhost:5432/mydatabase')
For MySQL
>>> engine = create_engine('mysql+mysqldb://scott:tiger@localhost/foo')
For Oracle
>>> engine = create_engine('oracle://scott:tiger@127.0.0.1:1521/sidname')
For MSSQL
>>> engine = create_engine('mssql+pyodbc://mydsn')
For SQLite
>>> engine = create_engine('sqlite:///foo.db')
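To see an engine in action end to end, here is a minimal sketch. It uses an in-memory SQLite URL so it runs without any server or credentials; the table name people and the sample data are just for illustration:

```python
import pandas as pd
from sqlalchemy import create_engine

# 'sqlite://' with no path creates a temporary in-memory database
engine = create_engine('sqlite://')

# The same engine object is what to_sql() and read_sql() expect
df = pd.DataFrame({'name': ['Ann', 'Bob'], 'age': [34, 27]})
df.to_sql('people', engine, index=False)

# read_sql() also accepts a full SQL query, not just a table name
print(pd.read_sql('SELECT * FROM people WHERE age > 30', engine))
```

The engine is lazy: the actual connection is opened only when a query is executed, so creating it costs nothing.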

Loading and Writing Data with SQLite3

Let's see how to interact with SQLite3, one of the databases frequently used, especially in development environments. We will access a SQLite database using Python's built-in sqlite3 driver. SQLite3 is a tool that implements a SQL DBMS in a very simple and lightweight way, so it can be embedded in any application implemented with the Python language. It is ideal for those who want the functions of a database without having to install a real one. SQLite3 can be the right choice for anyone who wants to practice before moving on to a real database, or who needs the functions of a database to collect data while remaining within a single program, without having to interface with an external database. See the following program:

import pandas as pd
import numpy as np
from sqlalchemy import create_engine

frame = pd.DataFrame(np.arange(20).reshape(4,5),
                    columns=['white','red','blue','black','green'])

print('The frame\n')
print(frame)

# Connect to (or create) the SQLite database file mydata.db
engine = create_engine('sqlite:///mydata.db')

# Write the dataframe into a new table named 'colors'
frame.to_sql('colors',engine)

print('\nReading the database\n')
print(pd.read_sql('colors',engine))



Make sure you have installed the sqlalchemy module. First we create a dataframe that we will use to create a new table in the SQLite3 database. Next we establish the connection to the SQLite3 database with create_engine('sqlite:///mydata.db'). Then we convert the dataframe into a table within the database using frame.to_sql('colors',engine). To read from the database, we use the read_sql() function with the name of the table and the engine (pd.read_sql('colors',engine)).

The output of the program is shown below:

The frame



   white  red  blue  black  green
0      0    1     2      3      4
1      5    6     7      8      9
2     10   11    12     13     14
3     15   16    17     18     19

Reading the database


   index  white  red  blue  black  green
0      0      0    1     2      3      4
1      1      5    6     7      8      9
2      2     10   11    12     13     14
3      3     15   16    17     18     19
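Notice that the dataframe index was written into the table as an extra column. to_sql() accepts optional parameters that control this and other behaviors; here is a short sketch (the in-memory URL and the table names are just for illustration):

```python
import pandas as pd
import numpy as np
from sqlalchemy import create_engine

frame = pd.DataFrame(np.arange(20).reshape(4, 5),
                     columns=['white', 'red', 'blue', 'black', 'green'])

engine = create_engine('sqlite://')   # in-memory database, for illustration

frame.to_sql('colors', engine)        # index stored as an extra 'index' column

# Running to_sql() again on an existing table raises an error unless
# if_exists says what to do: 'fail' (default), 'replace', or 'append'
frame.to_sql('colors', engine, if_exists='replace')

# index=False skips writing the dataframe index into the table
frame.to_sql('plain_colors', engine, index=False)
print(pd.read_sql('plain_colors', engine))
```

With index=False, reading the table back gives exactly the original five columns.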

Using the I/O APIs available in the pandas library, the writing operation on the database becomes very simple. To appreciate just how simple, let's perform the same operation without the pandas I/O APIs, using the sqlite3 module directly. See the following program:

import pandas as pd
import sqlite3

# Create the table, with column types matching the data to be loaded
query = """
         CREATE TABLE test
         (a VARCHAR(20), b VARCHAR(20),
          c REAL,        d INTEGER
         );"""
con = sqlite3.connect(':memory:')
con.execute(query)
con.commit()

# Insert the records with a parameterized INSERT statement
data = [('white','up',1,3),
        ('black','down',2,8),
        ('green','up',4,4),
        ('red','down',5,5)]
stmt = "INSERT INTO test VALUES(?,?,?,?)"
con.executemany(stmt, data)
con.commit()

# Query the data back and build the dataframe by hand
cursor = con.execute('select * from test')
rows = cursor.fetchall()
print('\nRecorded data\n')
print(rows)
print('\nFinding column names\n')
print(cursor.description)
print('\nThe table\n')
# The first element of each cursor.description tuple is the column name
print(pd.DataFrame(rows, columns=list(zip(*cursor.description))[0]))


First we establish a connection to the DB and create a table, defining the correct data types so as to accommodate the data to be loaded. Next we enter the data using the SQL INSERT statement. After loading the data into the table, we query the database to get back the data we just recorded, using a SQL SELECT statement. We then find the names of the columns in the description attribute of the cursor. Finally we pass the list of tuples to the constructor of the dataframe. The output of the program is shown below:

Recorded data

[('white', 'up', 1.0, 3), ('black', 'down', 2.0, 8), ('green', 'up', 4.0, 4), ('
red', 'down', 5.0, 5)]

Finding column names

(('a', None, None, None, None, None, None), ('b', None, None, None, None, None,
None), ('c', None, None, None, None, None, None), ('d', None, None, None, None,
None, None))

The table

       a     b    c  d
0  white    up  1.0  3
1  black  down  2.0  8
2  green    up  4.0  4
3    red  down  5.0  5



As seen from the program, we can easily conclude that this approach is quite laborious, and that the I/O APIs available in the pandas library make the writing operation on the database much easier. As I have only a SQLite DB, I am not discussing interaction with other databases. Whenever I start using other databases, I'll create a post on how to interact with them using the I/O APIs available in the pandas library.
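In fact, pandas can take over the reading side of the laborious program above too: read_sql_query() works directly with a plain sqlite3 connection, replacing the manual fetchall() and cursor.description bookkeeping. A minimal sketch, reusing the same table layout:

```python
import sqlite3
import pandas as pd

con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE test (a VARCHAR(20), b VARCHAR(20), c REAL, d INTEGER)")
con.executemany("INSERT INTO test VALUES (?,?,?,?)",
                [('white', 'up', 1, 3), ('black', 'down', 2, 8)])
con.commit()

# One call fetches the rows and picks up the column names for us
df = pd.read_sql_query('SELECT * FROM test', con)
print(df)
```

The resulting dataframe already has columns a, b, c, d, with no need to unpack cursor.description by hand.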


Here I am ending today’s post. Until we meet again keep practicing and learning Python, as Python is easy to learn!