Tuesday, October 8, 2019

Visualizing Repositories Using Plotly

In this post we'll use repository data from the GitHub API to visualize the relative popularity of Python projects on GitHub. We'll make an interactive bar chart: the height of each bar will represent the number of stars the project has acquired, and you can click the bar's label to go to that project's home on GitHub. The following program builds a first version of the chart:

import requests
from plotly.graph_objs import Bar
from plotly import offline

# Make an API call and store the response.
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
headers = {'Accept': 'application/vnd.github.v3+json'}
r = requests.get(url, headers=headers)
print(f"Status code: {r.status_code}")

# Process results.
response_dict = r.json()
repo_dicts = response_dict['items']
repo_names, stars = [], []
for repo_dict in repo_dicts:
    repo_names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])
   
# Make visualization.
data = [{
    'type': 'bar',
    'x': repo_names,
    'y': stars,
}]

my_layout = {
    'title': 'Most-Starred Python Projects on GitHub',
    'xaxis': {'title': 'Repository'},
    'yaxis': {'title': 'Stars'},
}

fig = {'data': data, 'layout': my_layout}
offline.plot(fig, filename='python_repos.html')


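We import the Bar class and the offline module from plotly. (Because this version of the chart is defined with plain dictionaries, Bar is not actually used in the code above.) Next, we print the status code of the API response so we'll know if there is a problem.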
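Printing the status code leaves it to the reader to notice a failure. As a minimal sketch of a stricter approach, the helper below stops the program when the call did not succeed. The function name check_status and the error message are assumptions made for illustration; they are not part of the original program.

```python
def check_status(status_code):
    """Raise an error unless the API call returned 200 (OK)."""
    if status_code != 200:
        raise RuntimeError(f"API call failed with status code {status_code}")
    return status_code


# In the program above this would be called as check_status(r.status_code).
print(f"Status code: {check_status(200)}")
```

The requests library also provides r.raise_for_status(), which raises an exception for 4xx and 5xx responses and may be the simpler choice in practice.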


We then create two empty lists to store the data we’ll include in the initial chart. We’ll need the name of each project to label the bars, and the number of stars to determine the height of the bars. In the loop, we append the name of each project and the number of stars it has to these lists.
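To make the loop concrete without calling the API, here is a minimal sketch that runs the same extraction on a small hand-written response. The sample_response dictionary and its values are made up for illustration; only the 'items', 'name', and 'stargazers_count' keys come from the program above.

```python
# Hypothetical stand-in for r.json(); the repository names and star counts
# are invented for illustration and are not real GitHub data.
sample_response = {
    'items': [
        {'name': 'project-a', 'stargazers_count': 300},
        {'name': 'project-b', 'stargazers_count': 120},
    ]
}

repo_names, stars = [], []
for repo_dict in sample_response['items']:
    repo_names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])

print(repo_names)
print(stars)
```

The two lists stay aligned by position, which is what lets each project name be paired with its bar height.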

Next, we define the data list containing a dictionary, which defines the type of the plot and provides the data for the x- and y-values. The x-values are the names of the projects, and the y-values are the number of stars each project has been given.

Finally we define the layout for this chart using the dictionary approach. Instead of making an instance of the Layout class, we build a dictionary with the layout specifications we want to use. We set a title for the overall chart, and we define a label for each axis.

The figure below shows the resulting chart.



Now we'll refine the chart’s styling. We can include all the styling directives as key-value pairs in the data and my_layout dictionaries. Changes to the data object affect the bars. The following code shows the modified version of the data object for our chart that gives us a specific color and a clear border for each bar:

data = [{
    'type': 'bar',
    'x': repo_names,
    'y': stars,
    'marker': {
        'color': 'rgb(60, 100, 150)',
        'line': {'width': 1.5, 'color': 'rgb(25, 25, 25)'}
    },
    'opacity': 0.6,
}]

The marker settings shown here affect the design of the bars. We set a custom blue color for the bars and specify that they’ll be outlined with a dark gray line that’s 1.5 pixels wide. We also set the opacity of the bars to 0.6 to soften the appearance of the chart a little. Now, we'll modify my_layout:

my_layout = {
    'title': 'Most-Starred Python Projects on GitHub',
    'titlefont': {'size': 28},
    'xaxis': {
        'title': 'Repository',
        'titlefont': {'size': 24},
        'tickfont': {'size': 14},
    },
    'yaxis': {
        'title': 'Stars',
        'titlefont': {'size': 24},
        'tickfont': {'size': 14},
    },
}

We use the 'titlefont' key to define the font size of the overall chart title. Within the 'xaxis' dictionary, we add settings to control the font size of the x-axis title ('titlefont') and of the tick labels ('tickfont'). Because these are individual nested dictionaries, you can also include keys for the color and font family of the axis titles and tick labels. We define the same settings for the y-axis. The figure below shows the restyled chart.
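As a rough sketch of how color and font family could be added to those nested font dictionaries, the helper below builds a font dictionary and uses it for the x-axis. The function name make_font and the specific color and font family are assumptions chosen for illustration; 'size', 'color', and 'family' are the font keys being set.

```python
def make_font(size, color=None, family=None):
    """Build a font dictionary, adding color and family only when given."""
    font = {'size': size}
    if color is not None:
        font['color'] = color
    if family is not None:
        font['family'] = family
    return font


# Example values only; pick whatever suits your chart.
xaxis = {
    'title': 'Repository',
    'titlefont': make_font(24, color='rgb(25, 25, 25)', family='Arial'),
    'tickfont': make_font(14),
}
print(xaxis)
```

The same helper could be reused for the 'yaxis' dictionary so both axes stay consistent.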



In Plotly, we can hover the cursor over an individual bar to show the information that the bar represents. This is commonly called a tooltip, and in this case, it currently shows the number of stars a project has. Let’s create a custom tooltip to show each project’s description as well as the project’s owner. We need to pull some additional data to generate the tooltips and modify the data object as shown by the code below:

# Process results.
response_dict = r.json()
repo_dicts = response_dict['items']
repo_names, stars, labels = [], [], []
for repo_dict in repo_dicts:
    repo_names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])
   
    owner = repo_dict['owner']['login']
    description = repo_dict['description']
    label = f"{owner}<br />{description}"
    labels.append(label)

   
# Make visualization.
data = [{
    'type': 'bar',
    'x': repo_names,
    'y': stars,
    'hovertext': labels,
    'marker': {
        'color': 'rgb(60, 100, 150)',
        'line': {'width': 1.5, 'color': 'rgb(25, 25, 25)'}
    },
    'opacity': 0.6,
}]

In the code shown above, we first define a new empty list, labels, to hold the text we want to display for each project. In the loop where we process the data, we pull the owner and the description for each project. Plotly allows you to use HTML code within text elements, so we generate a string for the label with a line break (<br />) between the project owner’s username and the description. We then store this label in the list labels.
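One thing to watch for: a repository may have no description, in which case the value comes back as None and the f-string would put the text "None" in the tooltip. The following is a minimal sketch of a fallback; the helper name make_label and the placeholder text are assumptions, not part of the original program.

```python
def make_label(owner, description):
    """Return tooltip text, using a placeholder when the description is missing."""
    if not description:
        description = 'No description provided.'
    return f"{owner}<br />{description}"


# Made-up values for illustration.
print(make_label('some-owner', 'A sample project'))
print(make_label('some-owner', None))
```

Inside the loop, this would replace the label line with label = make_label(owner, description).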

In the data dictionary, we add an entry with the key 'hovertext' and assign it the list we just created. As Plotly creates each bar, it will pull labels from this list and only display them when the viewer hovers over a bar. The new output is shown below:




Because Plotly allows HTML in text elements, we can easily add links to a chart. Let’s use the x-axis labels as a way to let the viewer visit any project’s home page on GitHub. We need to pull the URLs from the data and use them when generating the x-axis labels, as shown in the code below:

repo_links, stars, labels = [], [], []
for repo_dict in repo_dicts:
    repo_name = repo_dict['name']
    repo_url = repo_dict['html_url']
    repo_link = f"<a href='{repo_url}'>{repo_name}</a>"
    repo_links.append(repo_link)

    stars.append(repo_dict['stargazers_count'])

    owner = repo_dict['owner']['login']
    description = repo_dict['description']
    label = f"{owner}<br />{description}"
    labels.append(label)

# Make visualization.
data = [{
    'type': 'bar',
    'x': repo_links,
    'y': stars,
    'hovertext': labels,
    'marker': {
        'color': 'rgb(60, 100, 150)',
        'line': {'width': 1.5, 'color': 'rgb(25, 25, 25)'}
    },
    'opacity': 0.6,
}]

In the code shown above, we update the name of the list we’re creating from repo_names to repo_links to more accurately communicate the kind of information we’re putting together for the chart. We then pull the URL for the project from repo_dict and assign it to the temporary variable repo_url. Next, we generate a link to the project. We use the HTML anchor tag, which has the form <a href='URL'>link text</a>, to generate the link. We then append this link to the list repo_links. We use this list for the x-values in the chart.
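The link-building step can also be wrapped in a small function, which makes it easy to try on its own. This is only a sketch: the function name make_link and the sample repository name and URL are hypothetical, while the anchor-tag format is the one used in the loop above.

```python
def make_link(repo_name, repo_url):
    """Return an HTML anchor tag that links the repo name to its URL."""
    return f"<a href='{repo_url}'>{repo_name}</a>"


# Made-up values for illustration.
link = make_link('example-repo', 'https://github.com/example-owner/example-repo')
print(link)
```

In the loop, repo_links.append(make_link(repo_dict['name'], repo_dict['html_url'])) would then build each x-axis label.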

The result looks the same as before, but now the viewer can click any of the project names at the bottom of the chart to visit that project’s home page on GitHub. Now we have an interactive, informative visualization of data retrieved through an API!



That's all for this post. I'll be back with a new topic soon!