Web pages can be static or generated on the fly in response to a user’s interaction, in which case they may contain information from many different sources. In either case, a program can read a web page and extract parts of it. Called web scraping, this is quite legal as long as the page is publicly available. A typical scraping scenario in Python involves two libraries: Requests and BeautifulSoup....
Wednesday, June 29, 2022
Monday, June 27, 2022
Sources of Data
Generally speaking, data may come from many different sources, including texts, videos, images, and device sensors, among others. From the standpoint of Python scripts that you’ll write, however, the most common data sources are: an application programming interface (API), a web page, a database, and a file. This list isn’t intended to be comprehensive or restrictive; there are many other sources of data. Technically,...
Friday, June 24, 2022
Time Series Data
A time series is a set of data points indexed or listed in time order. Many financial datasets are stored as time series because financial data typically consists of observations made at specific times. Time series data can be either structured or semi-structured. Imagine you’re receiving location data in records from a taxi’s GPS tracking device at regular time intervals. The data might...
Wednesday, June 22, 2022
Semistructured Data
In cases where the structural identity of the information doesn’t conform to stringent formatting requirements, we may need to process semistructured data formats, which let us have records of different structures within the same container (database table or document). Like unstructured data, semistructured data isn’t tied to a predefined organizational schema; unlike unstructured data, however, samples...
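As a rough illustration of records with different structures sharing one container, here is a minimal sketch that uses JSON. The sample documents and their field names (`email`, `phones`) are made up for this example and do not come from the original post.

```python
import json

# Hypothetical sample: two records in the same container (a JSON array)
# that do not share the same set of fields.
JSON_TEXT = """
[
  {"id": 1, "name": "Ana", "email": "ana@example.com"},
  {"id": 2, "name": "Ben", "phones": ["555-0100", "555-0101"]}
]
"""

documents = json.loads(JSON_TEXT)

# Each record carries its own field names, so the structure can be
# inspected record by record instead of relying on a fixed schema.
field_sets = [sorted(doc.keys()) for doc in documents]

# dict.get() returns None when a record lacks the requested field.
emails = [doc.get("email") for doc in documents]

for doc, fields in zip(documents, field_sets):
    print(doc["id"], fields)
```

Because no schema is enforced, code that reads such data usually has to tolerate missing fields, which is why the sketch uses `dict.get()` rather than direct indexing.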
Monday, June 20, 2022
Structured Data
Structured data has a predefined format that specifies how the data is organized. Such data is usually stored in a repository like a relational database or just a .csv (comma-separated values) file. The data fed into such a repository is called a record, and the information in it is organized in fields that must arrive in a sequence matching the expected structure. Within a database, records of the...
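To make the idea of records and fields concrete, the following sketch reads a small piece of CSV text with Python's standard `csv` module. The column names and values are invented for illustration; a real file would be opened with `open()` instead of being wrapped in `io.StringIO`.

```python
import csv
import io

# Hypothetical sample data; the header row defines the expected fields.
CSV_TEXT = """id,name,city
1,Ana,Lisbon
2,Ben,Oslo
"""


def read_records(text):
    """Parse CSV text into a list of dicts, one dict per record."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)


records = read_records(CSV_TEXT)

# Every record exposes the same fields, in the order given by the header.
for record in records:
    print(record["id"], record["name"], record["city"])
```

Note that `csv.DictReader` returns every value as a string, so numeric fields such as `id` would need an explicit conversion before any arithmetic.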
Friday, June 17, 2022
Categories of Data
Programmers divide data into three main categories: unstructured, structured, and semi-structured. In a data processing pipeline, the source data is typically unstructured; from this, you form structured or semi-structured datasets for further processing. Some pipelines, however, use structured data from the start. For example, an application processing geographical locations might receive structured...
Wednesday, June 15, 2022
Real-World Reinforcement Learning
Often in reinforcement learning research, “real-life” tasks are linked to robotics and self-driving cars. However, there is a much broader range of problems that are yet to be fully solved and require less investment in special hardware than, for example, robotics would. We argue that whether a task is “real-life” or not is a spectrum rather than a binary classification. A toy problem, such as an abstract...
Monday, June 13, 2022
Logic Control
The main part of programming is learning how to make your code do something, primarily through a variety of logic controllers. These controllers handle if-then conditions, reiterative processing through loops, and dealing with errors. While there are other ways of working with code, these are the most...
Friday, June 10, 2022
Modules as scripts
An important thing to know about Python is that modules, as written, are pretty much only useful as imported objects for other programs. However, a module can be written to be imported or function as a standalone program. When a module is imported into another program, only certain objects are imported over. Not everything is imported, which is what allows a module to perform dual duty. To make a module...
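One common way to let a module do this dual duty is the `if __name__ == "__main__":` guard. The sketch below assumes a hypothetical module with a `greet` function; the names are illustrative and not taken from the original post.

```python
def greet(name):
    """Return a greeting; available to any program that imports this module."""
    return f"Hello, {name}!"


def main():
    """Entry point used only when the file is run as a standalone program."""
    print(greet("world"))


# __name__ is "__main__" only when the file is executed directly,
# so main() is skipped when the module is imported elsewhere.
if __name__ == "__main__":
    main()
```

Running the file directly calls `main()`, while `import`ing it from another program only makes `greet` and `main` available without printing anything.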
Wednesday, June 8, 2022
Dot nomenclature and type of imports
When a module is imported, after the local and global checks within the current program, the imported module will be examined for the called object as well (prior to checking the built-ins). The problem with this is that, unless the called object is explicitly identified with the dot nomenclature, an...
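The difference between the two import styles can be sketched with the standard `math` module. The variable names here are illustrative assumptions; the point is only that a dotted name stays unambiguous while a bare imported name can be rebound by later code.

```python
import math            # names are reached through the module: math.sqrt, math.pi
from math import pi    # pi is copied into the current namespace

# Dot nomenclature: the call explicitly names the module it comes from.
qualified = math.sqrt(16)

# Rebinding the bare name silently shadows the imported value.
pi = 3
unqualified = pi

# The dotted form is unaffected by the local rebinding.
still_exact = math.pi

print(qualified, unqualified, still_exact)
```

This is one reason `import module` followed by `module.name` is often preferred over `from module import name` when name clashes are a risk.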
Monday, June 6, 2022
Importing modules
Modules are also called libraries or packages. Modules are modular, often self-contained Python programs that are commonly utilized in other programs, hence the need to import them for access. Modules are used to separate code to make a program easier to work with, as each module can be designed to do...