Generally speaking, data may come from many different sources, including texts, videos, images, and device sensors, among others. From the standpoint of Python scripts that you’ll write, however, the most common data sources are:
An application programming interface (API)
A web page
A database
A file
This list isn’t intended to be comprehensive or restrictive; there are many other sources of data. Technically, all of the options listed here require you to use a corresponding Python library. For example, before you can obtain data from an API, you’ll need to install a Python wrapper for the API or use the Requests Python library to make HTTP requests to the API directly.
Likewise, in order to access data from a database, you’ll need to install a connector that enables your Python code to access databases of that particular type.
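As a rough sketch of what working with a connector looks like, the example below uses sqlite3, which happens to ship with Python, so there is nothing to install. Connectors for other database systems are separate packages, but many follow a similar connect, execute, fetch pattern. The table name, the rows, and the load_rows helper are all made up for illustration.

```python
import sqlite3


def load_rows(conn, min_price):
    """Return (symbol, price) rows priced at or above min_price."""
    cursor = conn.execute(
        "SELECT symbol, price FROM stocks WHERE price >= ? ORDER BY symbol",
        (min_price,),
    )
    return cursor.fetchall()


# An in-memory database stands in for a real one; the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stocks (symbol TEXT, price REAL)")
conn.executemany(
    "INSERT INTO stocks VALUES (?, ?)",
    [("AAA", 10.5), ("BBB", 99.0), ("CCC", 42.0)],
)
conn.commit()

rows = load_rows(conn, 40)
print(rows)
conn.close()
```

The query uses a ? placeholder rather than string formatting, so the connector handles quoting of the value.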
While many of these libraries must be downloaded and installed, some libraries used to load data are distributed with Python by default. For example, to load data from a JSON file, you can take advantage of Python’s built-in json package.
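Here is a minimal sketch of loading a JSON file with the json package. The sample data and file name are hypothetical, and the file is written to a temporary directory first so that the example is self-contained.

```python
import json
import os
import tempfile

# Hypothetical sample data, written to a temporary file for the demo.
sample = {"ticker": "AAA", "prices": [10.5, 11.0, 10.8]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f)

    # json.load parses the file into ordinary Python objects (dict, list, float).
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

print(data["ticker"], max(data["prices"]))
```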
Each of the common source types in the preceding list deserves a brief look. This post covers the first of them: APIs.
APIs
Perhaps the most common way of acquiring data today is via an API (a software intermediary that enables two applications to interact with each other). As mentioned, to take advantage of an API in Python, you may need to install a wrapper for that API in the form of a Python library. The most common way to do this nowadays is via the pip command.
Not all APIs have their own Python wrapper, but this doesn’t necessarily mean you can’t make calls to them from Python. If an API serves HTTP requests, you can interact with that API from Python using the Requests library. This opens you up to thousands of APIs that you can use in your Python code to request datasets for further processing.
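As a sketch of what calling an HTTP API with Requests might look like, the helper below sends a GET request and decodes a JSON response. The fetch_json name, the optional session argument, and the endpoint in the usage comment are assumptions made for illustration; Requests itself must be installed first (for example with pip install requests).

```python
import requests


def fetch_json(url, params=None, session=None, timeout=10):
    """Send a GET request and return the decoded JSON body.

    `session` is optional; pass a requests.Session to reuse connections.
    """
    get = session.get if session is not None else requests.get
    response = get(url, params=params, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.json()


# Usage (hypothetical endpoint and parameters; replace with a real API):
# articles = fetch_json(
#     "https://api.example.com/v1/articles",
#     params={"q": "python", "from": "2024-01-01"},
# )
```

Passing params as a dictionary lets Requests build and encode the query string, and setting a timeout keeps the script from hanging if the API doesn’t respond.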
When choosing an API for a particular task, you should take the following into account:
Functionality: Many APIs provide similar functionality, so you need to understand your precise requirements. For example, many APIs let you conduct a web search from within your Python script, but only some allow you to narrow down your search results by date of publication.
Cost: Many APIs allow you to use a so-called developer key, which is usually provided for free but with certain limitations, such as a limited number of calls per day.
Stability: Thanks to the Python Package Index (PyPI) repository (https://pypi.org), anyone can pack an API into a pip package and make it publicly available. As a result, there’s an API (or several) for virtually any task you can imagine, but not all of these are completely reliable. Fortunately, usage statistics for PyPI packages, such as download counts, are publicly available and can help you judge how widely a package is used.
Documentation: Popular APIs usually have a corresponding documentation website, allowing you to see all of the API commands with sample usages. As a good model, look at the documentation page for the Nasdaq Data Link (aka Quandl) API (https://docs.data.nasdaq.com/docs/python-time-series), where you’ll find examples of making different time series calls.
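On the cost consideration above, a script that works under a daily call limit may want to keep its own count of calls. The following is only a sketch: DailyCallBudget is a hypothetical helper, the limit of 2 is made up, and the count lives in memory, so the API provider’s own accounting remains the authority.

```python
import datetime


class DailyCallBudget:
    """Track how many API calls were made today against a daily limit."""

    def __init__(self, limit):
        self.limit = limit
        self._day = None
        self._used = 0

    def try_spend(self, today=None):
        """Record one call and return True, or return False if the limit is reached."""
        today = today or datetime.date.today()
        if today != self._day:  # a new day resets the counter
            self._day = today
            self._used = 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True


budget = DailyCallBudget(limit=2)  # hypothetical limit
day = datetime.date(2024, 1, 1)
results = [budget.try_spend(day) for _ in range(3)]
next_day = budget.try_spend(day + datetime.timedelta(days=1))
print(results, next_day)
```

A script would call try_spend() before each request and skip or postpone the request when it returns False.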
Many APIs return results in one of the following three formats: JSON, XML, or CSV. Data in any of these formats can easily be translated into data structures that are either built into or commonly used with Python. For example, the Yahoo Finance API retrieves and analyzes stock data, then returns the information already translated into a pandas DataFrame.
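To illustrate how each of the three formats maps onto ordinary Python data structures, the sketch below parses the same two records from JSON, XML, and CSV using only the standard library. The response bodies, field names, and helper functions are hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical response bodies carrying the same two records in each format.
json_body = '[{"symbol": "AAA", "price": 10.5}, {"symbol": "BBB", "price": 99.0}]'
xml_body = (
    "<quotes>"
    '<quote symbol="AAA" price="10.5"/>'
    '<quote symbol="BBB" price="99.0"/>'
    "</quotes>"
)
csv_body = "symbol,price\nAAA,10.5\nBBB,99.0\n"


def from_json(text):
    # JSON arrays and objects become Python lists and dicts directly.
    return json.loads(text)


def from_xml(text):
    # XML attributes arrive as strings, so the price is converted explicitly.
    root = ET.fromstring(text)
    return [
        {"symbol": q.get("symbol"), "price": float(q.get("price"))}
        for q in root.iter("quote")
    ]


def from_csv(text):
    # DictReader uses the header row as keys; values are strings until converted.
    reader = csv.DictReader(io.StringIO(text))
    return [{"symbol": r["symbol"], "price": float(r["price"])} for r in reader]


records = [from_json(json_body), from_xml(xml_body), from_csv(csv_body)]
print(records[0] == records[1] == records[2])
```

A list of dictionaries like this is also a shape that pandas accepts when constructing a DataFrame, if pandas is installed.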