Friday, December 6, 2019

Obtaining Data from SQL Databases

Before you begin, check to ensure you have a connection with the Python library in question. Once the connection is established, you can then push a query to Pandas. You need SQLite to establish a connection with your database, from where you will then create a DataFrame using the SELECT query as follows:

import sqlite3
con = sqlite3.connect("database.db")

Using our car dealership example from the previous posts, the SQL database will have a table denoted as sales , and the index. We can read from the database using the command below:

df = pd.read_sql_query("SELECT * FROM sales", con)
df

You will have the following output:






Just as we did with the CSV files, you can also bypass the index as follows:
df = df.set_index('index')
df

You will have the output below:



Once you are done with your data, you need to save it in a file system that is relevant to your needs. In Pandas, you can convert files to and from any of the file formats discussed above in the same way that you read the data files, when storing them as shown below:

df.to_csv('new_sales.csv')
df.to_sql('new_sales', con)
df.to_json('new_sales.json')


In data analysis, there are lots of methods that you can employ when using DataFrames, all of which are important to your analysis. Some operations are useful in performing simple data transformations, while others are necessary for complex statistical approaches.

In the examples below, we will use an example of a dataset from the English Premier League below:

squad_df = pd.read_csv("EPL-Data.csv", index_col="Teams")

As we load this dataset from the CSV file, we will use teams as our index. To view the data, you must first open a new dataset by printing out rows as follows:

squad_df.head()

You will have the following Output:






.head() will by default print the first five rows of your DataFrame. However, if you need more rows displayed, you can input a specific number to be printed as follows:


squad_df.head(7)

This will output the top seven rows as shown below:



In case you need to display only the last rows, use the .tail() syntax. You can also input a specific number. Assuming we want to determine the last three teams, we will use the syntax below:

squad_df.tail(3)

Our output will be as follows:



Generally, whenever you access any dataset, you will often access the first five rows to determine whether you are looking at the correct data set. From the display, you can see the index, column names, and the preset values. You will notice from the example above that the index for our DataFrame is the Teams column.
Share:

0 comments:

Post a Comment