Friday, December 27, 2019

Scatter Plots

A scatter plot shows the relationship between two variables in a coordinate system. Each data point is positioned according to its values for the two variables. From the plot, you can tell whether the variables are related or not.

When studying a scatter plot, the direction of the trend tells you the nature of the correlation. A positive correlation, for example, appears as an upward pattern. A scatter plot can also be extended into a bubble chart, which introduces a third variable beyond the two shown in the scatter plot. The size of the bubble around each data point represents the value of the third variable.
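As a rough numerical check on the direction of a trend, the sign of the Pearson correlation coefficient can be computed with NumPy. The sketch below uses made-up data, and the helper name correlation_direction is hypothetical, chosen only for this illustration.

```python
import numpy as np


def correlation_direction(x, y):
    """Return the Pearson coefficient r and a label for the trend direction."""
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r for (x, y)
    r = float(np.corrcoef(x, y)[0, 1])
    if r > 0:
        label = "positive (upward pattern)"
    elif r < 0:
        label = "negative (downward pattern)"
    else:
        label = "no linear correlation"
    return r, label


# Made-up data for illustration only
x = np.arange(10, dtype=float)
y_up = 2.0 * x + 1.0      # rises with x
y_down = -3.0 * x + 5.0   # falls as x grows

r_up, label_up = correlation_direction(x, y_up)
r_down, label_down = correlation_direction(x, y_down)
print(label_up)
print(label_down)
```

A coefficient close to zero would suggest no clear linear trend, although the points may still be related in a non-linear way.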

In matplotlib, scatter plots are drawn with the scatter() function. The following commands display the function's documentation:

$ ipython --pylab
In [1]: help(scatter)

In the example below, we use three parameters: s sets the size of the bubbles, alpha sets their transparency when plotted on the chart, and c sets their colors. The alpha value ranges from 0 (completely transparent) to 1 (completely opaque). The scatter call looks like this:

plt.scatter(years, cnt_log, c=200 * years,
            s=20 + 200 * gpu_counts / gpu_counts.max(), alpha=0.5)

The complete script is as follows:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Average the CPU transistor counts per year
df = pd.read_csv('transcount.csv')
df = df.groupby('year').mean()
# Average the GPU transistor counts per year
gpu = pd.read_csv('gpu_transcount.csv')
gpu = gpu.groupby('year').mean()
# Outer join on the year index; years missing from one file become 0
df = pd.merge(df, gpu, how='outer', left_index=True, right_index=True)
df = df.replace(np.nan, 0)
print(df)

years = df.index.values
counts = df['trans_count'].values
gpu_counts = df['gpu_trans_count'].values
# Plot the natural logarithm of the transistor counts
cnt_log = np.log(counts)
# Color by year, size by relative GPU transistor count
plt.scatter(years, cnt_log, c=200 * years,
            s=20 + 200 * gpu_counts / gpu_counts.max(), alpha=0.5)
plt.show()
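If the two CSV files are not at hand, the data-preparation steps can be tried on small hand-made frames. The values below are made up and only stand in for transcount.csv and gpu_transcount.csv; the column names follow the script above.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the two CSV files
cpu = pd.DataFrame({"year": [1971, 1971, 1972, 1999],
                    "trans_count": [2000.0, 2600.0, 3500.0, 1.0e7]})
gpu = pd.DataFrame({"year": [1999, 2000],
                    "gpu_trans_count": [1.5e7, 2.5e7]})

# One row per year, holding the mean count for that year
cpu_mean = cpu.groupby("year").mean()
gpu_mean = gpu.groupby("year").mean()

# The outer join keeps every year found in either frame;
# years missing from one side get NaN, which is then set to 0
merged = pd.merge(cpu_mean, gpu_mean, how="outer",
                  left_index=True, right_index=True)
merged = merged.fillna(0)
print(merged)
```

Note that a year with a trans_count of 0, such as 2000 in this made-up frame, would give negative infinity under np.log, so such rows may need to be dropped or masked before plotting.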


The script prints the following DataFrame:

       trans_count  gpu_trans_count
year
1971  2.300000e+03     0.000000e+00
1972  3.500000e+03     0.000000e+00
1974  4.533333e+03     0.000000e+00
1975  3.510000e+03     0.000000e+00
1976  7.500000e+03     0.000000e+00
1978  1.900000e+04     0.000000e+00
1979  4.850000e+04     0.000000e+00
1982  9.450000e+04     0.000000e+00
1983  8.500000e+03     0.000000e+00
1984  2.000000e+05     0.000000e+00
1985  1.053333e+05     0.000000e+00
1986  2.500000e+04     0.000000e+00
1988  2.500000e+05     0.000000e+00
1989  7.401175e+05     0.000000e+00
1991  6.900000e+05     0.000000e+00
1993  3.100000e+06     0.000000e+00
1994  5.789770e+05     0.000000e+00
1995  5.500000e+06     0.000000e+00
1996  4.300000e+06     0.000000e+00
1997  8.150000e+06     3.500000e+06
1998  7.500000e+06     0.000000e+00
1999  1.760000e+07     1.350000e+07
2000  3.150000e+07     2.500000e+07
2001  4.500000e+07     5.850000e+07
2002  1.375000e+08     8.500000e+07
2003  1.900667e+08     1.260000e+08
2004  3.520000e+08     1.910000e+08
2005  1.690000e+08     3.120000e+08
2006  6.040000e+08     5.325000e+08
2007  3.716000e+08     7.270000e+08
2008  9.032000e+08     1.179500e+09
2009  3.450000e+09     2.154000e+09
2010  1.511667e+09     2.946667e+09
2011  1.733500e+09     4.312712e+09
2012  2.014826e+09     5.310000e+09
2013  5.000000e+09     6.300000e+09
2014  4.310000e+09     0.000000e+00

 

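The script then displays the plot: each bubble marks a year, its height is the logarithm of the transistor count, and its size grows with the GPU transistor count.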

