Wednesday, June 29, 2022

Web Pages

Web pages can be static or generated on the fly in response to a user’s interaction, in which case they may contain information from many different sources. In either case, a program can read a web page and extract parts of it. This practice, called web scraping, is generally legal as long as the page is publicly available.

A typical scraping scenario in Python involves two libraries: Requests and BeautifulSoup. Requests fetches the source code of the page, and then BeautifulSoup creates a parse tree for the page, which is a hierarchical representation of the page’s content. You can search the parse tree and extract data from it using Pythonic idioms. For example, the following fragment of a parse tree:

[<td title="03/01/2020 00:00:00"><a href="Download.aspx?ID=630751" id="lnkDownload630751" target="_blank">03/01/2020</a></td>,
 <td title="03/01/2020 00:00:00"><a href="Download.aspx?ID=630753" id="lnkDownload630753" target="_blank">03/01/2020</a></td>,
 <td title="03/01/2020 00:00:00"><a href="Download.aspx?ID=630755" id="lnkDownload630755" target="_blank">03/01/2020</a></td>]

can be easily transformed into the following list of items within a for loop in your Python script:

[
 {'Document_Reference': '630751', 'Document_Date': '03/01/2020',
  'link': 'http://www.dummy.com/Download.aspx?ID=630751'},
 {'Document_Reference': '630753', 'Document_Date': '03/01/2020',
  'link': 'http://www.dummy.com/Download.aspx?ID=630753'},
 {'Document_Reference': '630755', 'Document_Date': '03/01/2020',
  'link': 'http://www.dummy.com/Download.aspx?ID=630755'}
]

This is an example of transforming semistructured data into structured data. In the next post we'll cover Databases.
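Such a loop might be sketched as follows. To keep the sketch self-contained and runnable without third-party installs, it parses the `<td>` fragments with the standard library's ElementTree rather than BeautifulSoup; with BeautifulSoup, the same attribute lookups work on the Tag objects returned by a search of the parse tree. The base URL is the www.dummy.com placeholder from the example above.

```python
import xml.etree.ElementTree as ET

# The <td> fragments from the parse tree shown above, as raw markup
cells = [
    '<td title="03/01/2020 00:00:00"><a href="Download.aspx?ID=630751" '
    'id="lnkDownload630751" target="_blank">03/01/2020</a></td>',
    '<td title="03/01/2020 00:00:00"><a href="Download.aspx?ID=630753" '
    'id="lnkDownload630753" target="_blank">03/01/2020</a></td>',
]

base_url = 'http://www.dummy.com/'   # placeholder domain from the example
records = []
for cell in cells:
    td = ET.fromstring(cell)         # parse one <td> fragment
    a = td.find('a')                 # the <a> element inside it
    href = a.get('href')             # e.g. 'Download.aspx?ID=630751'
    records.append({
        'Document_Reference': href.split('ID=')[1],
        'Document_Date': a.text,     # the link text, '03/01/2020'
        'link': base_url + href,
    })

print(records)
```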


Monday, June 27, 2022

Sources of Data

Generally speaking, data may come from many different sources, including texts, videos, images, and device sensors, among others. From the standpoint of Python scripts that you’ll write, however, the most common data sources are:

An application programming interface (API)

A web page

A database

A file

This list isn’t intended to be comprehensive or restrictive; there are many other sources of data. Technically, all of the options listed here require you to use a corresponding Python library. For example, before you can obtain data from an API, you’ll need to install a Python wrapper for the API or use the Requests Python library to make HTTP requests to the API directly.

Likewise, in order to access data from a database, you’ll need to install a database connector that enables your Python code to access databases of that particular type.

While many of these libraries must be downloaded and installed, some libraries used to load data are distributed with Python by default. For example, to load data from a JSON file, you can take advantage of Python’s built-in json package.
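For instance, here is a minimal sketch of loading a JSON file with the built-in json package (the filename and contents are made up for illustration, so the sketch first writes the sample file it then reads back):

```python
import json

# Write a small sample file so the sketch is self-contained
sample = {"Company": "GoodComp", "Date": "2021-01-07", "Stock": 8.2}
with open("sample.json", "w") as f:
    json.dump(sample, f)

# json.load turns the file's contents into native Python objects:
# dictionaries, lists, strings, and numbers
with open("sample.json") as f:
    data = json.load(f)

print(data["Company"])
```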

For now, we’ll take a brief look at each of the common source types mentioned in the preceding list. In this post, we’ll start with APIs.

APIs

Perhaps the most common way of acquiring data today is via an API (a software intermediary that enables two applications to interact with each other). As mentioned, to take advantage of an API in Python, you may need to install a wrapper for that API in the form of a Python library. The most common way to do this nowadays is via the pip command.

Not all APIs have their own Python wrapper, but this doesn’t necessarily mean you can’t make calls to them from Python. If an API serves HTTP requests, you can interact with that API from Python using the Requests library. This opens you up to thousands of APIs that you can use in your Python code to request datasets for further processing.

When choosing an API for a particular task, you should take the following into account:

Functionality: Many APIs provide similar functionalities, so you need to understand your precise requirements. For example, many APIs let you conduct a web search from within your Python script, but only some allow you to narrow down your search results by date of publication.

Cost: Many APIs allow you to use a so-called developer key, which is usually provided for free but with certain limitations, such as a limited number of calls per day.

Stability: Thanks to the Python Package Index (PyPI) repository (https://pypi.org), anyone can package an API wrapper into a pip package and make it publicly available. As a result, there’s an API (or several) for virtually any task you can imagine, but not all of these are completely reliable. Fortunately, the PyPI repository tracks the performance and usage of packages.

Documentation: Popular APIs usually have a corresponding documentation website, allowing you to see all of the API commands with sample usages. As a good model, look at the documentation page for the Nasdaq Data Link (aka Quandl) API (https://docs.data.nasdaq.com/docs/python-time-series), where you’ll find examples of making different time series calls.

Many APIs return results in one of the following three formats: JSON, XML, or CSV. Data in any of these formats can easily be translated into data structures that are either built into or commonly used with Python. For example, the Yahoo Finance API retrieves and analyzes stock data, then returns the information already translated into a pandas DataFrame.
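As a sketch of that last point, here is how small JSON, CSV, and XML payloads (invented for illustration) map onto Python structures using only the standard library:

```python
import json, csv, io
import xml.etree.ElementTree as ET

# JSON: becomes dictionaries and lists directly
json_payload = '{"symbol": "GOODCOMP", "close": 8.2}'
record = json.loads(json_payload)

# CSV: each row becomes a dict keyed by the header line
csv_payload = "symbol,close\nGOODCOMP,8.2"
rows = list(csv.DictReader(io.StringIO(csv_payload)))

# XML: becomes a tree of elements you can walk
xml_payload = '<quote symbol="GOODCOMP"><close>8.2</close></quote>'
root = ET.fromstring(xml_payload)

print(record["close"], rows[0]["close"], root.find("close").text)
```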


Friday, June 24, 2022

Time Series Data

A time series is a set of data points indexed or listed in time order. Many financial datasets are stored as time series because financial data typically consists of observations at specific points in time.

Time series data can be either structured or semi-structured. Imagine you’re receiving location data in records from a taxi’s GPS tracking device at regular time intervals. The data might arrive in the following format:

[
  {
    "cab": "cab_238",
    "coord": (43.602508, 39.715685),
    "tm": "14:47",
    "state": "available"
  },
  {
    "cab": "cab_238",
    "coord": (43.613744, 39.705718),
    "tm": "14:48",
    "state": "available"
  }
  ...
]

A new data record arrives every minute that includes the latest location coordinates (latitude/longitude) from cab_238. Each record has the same sequence of fields, and each field has a consistent structure from one record to the next, allowing you to store this time series data in a relational database table as regular structured data.
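As a sketch of that point, the records can be inserted into a relational table with Python's built-in sqlite3 module. The table layout here is an assumption: the coordinate pair is split into latitude and longitude columns so every field is a simple scalar.

```python
import sqlite3

records = [
    {"cab": "cab_238", "coord": (43.602508, 39.715685), "tm": "14:47", "state": "available"},
    {"cab": "cab_238", "coord": (43.613744, 39.705718), "tm": "14:48", "state": "available"},
]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE tracking (cab TEXT, lat REAL, lon REAL, tm TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO tracking VALUES (?, ?, ?, ?, ?)",
    [(r["cab"], r["coord"][0], r["coord"][1], r["tm"], r["state"]) for r in records],
)

for row in conn.execute("SELECT cab, tm FROM tracking ORDER BY tm"):
    print(row)
```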

Now suppose the data comes at unequal intervals, which is often the case in practice, and that you receive more than one set of coordinates in one minute. The incoming structure might look like this:

[
  {
    "cab": "cab_238",
    "coord": [(43.602508, 39.715685), (43.602402, 39.709672)],
    "tm": "14:47",
    "state": "available"
  },
  {
    "cab": "cab_238",
    "coord": (43.613744, 39.705718),
    "tm": "14:48",
    "state": "available"
  }
]

Note that the first coord field includes two sets of coordinates and is thus not consistent with the second coord field. This data is semi-structured.
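One common way to make such data structured again is to normalize it: emit one flat row per coordinate reading, so the variable-length coord field disappears. A minimal sketch, assuming the two records shown above:

```python
records = [
    {"cab": "cab_238", "coord": [(43.602508, 39.715685), (43.602402, 39.709672)],
     "tm": "14:47", "state": "available"},
    {"cab": "cab_238", "coord": (43.613744, 39.705718),
     "tm": "14:48", "state": "available"},
]

rows = []
for rec in records:
    coords = rec["coord"]
    if isinstance(coords, tuple):   # a single reading: wrap it in a list
        coords = [coords]
    for lat, lon in coords:         # one flat row per reading
        rows.append((rec["cab"], lat, lon, rec["tm"], rec["state"]))

print(rows)
```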


Wednesday, June 22, 2022

Semistructured Data

In cases where the structural identity of the information doesn’t conform to stringent formatting requirements, we may need to process semistructured data formats, which let us have records of different structures within the same container (database table or document). Like unstructured data, semistructured data isn’t tied to a predefined organizational schema; unlike unstructured data, however, samples of semistructured data do exhibit some degree of structure, usually in the form of self-describing tags or other markers.

The most common semistructured data formats include XML and JSON. This is what our financial statement might look like in JSON format:

{
  "Company": "GoodComp",
  "Date": "2021-01-07",
  "Stock": 8.2,
  "Details": "the company announced positive early-stage trial results for its vaccine."
}

Here you can recognize the key information that we previously extracted from the statement. Each piece of information is paired with a descriptive tag, such as "Company" or "Date". Thanks to the tags, the information is organized similarly to how it appeared in the previous section, but now we have a fourth tag, "Details", paired with an entire fragment of the original statement, which looks unstructured. This example shows how semistructured data formats can accommodate both structured and unstructured pieces of data within a single record.

Moreover, you can put multiple records of unequal structure into the same container. Here, we store the two different records derived from our example financial statement in the same JSON document:

[
  {
    "Company": "GoodComp",
    "Date": "2021-01-07",
    "Stock": 8.2
  },
  {
    "Company": "GoodComp",
    "Date": "2021-01-07",
    "Product": "vaccine",
    "Stage": "early-stage trial"
  }
]
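When reading such a document back, dict.get() is a convenient way to handle fields that only some records have. A short sketch, using the document above:

```python
import json

doc = '''[
  {"Company": "GoodComp", "Date": "2021-01-07", "Stock": 8.2},
  {"Company": "GoodComp", "Date": "2021-01-07", "Product": "vaccine", "Stage": "early-stage trial"}
]'''

for rec in json.loads(doc):
    # .get() returns None instead of raising KeyError for a missing field
    stock = rec.get("Stock")
    product = rec.get("Product")
    print(rec["Company"], stock, product)
```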

Recall from the discussion in the previous post that a relational database, being a rigidly structured data repository, cannot accommodate records of varying structures in the same table.


Monday, June 20, 2022

Structured Data

Structured data has a predefined format that specifies how the data is organized. Such data is usually stored in a repository like a relational database or just a .csv (comma-separated values) file. The data fed into such a repository is called a record, and the information in it is organized in fields that must arrive in a sequence matching the expected structure. Within a database, records of the same structure are logically grouped in a container called a table. A database may contain various tables, with each table having a set structure of fields.

There are two basic types of structured data: numerical and categorical. Categorical data is that which can be categorized on the basis of similar characteristics; cars, for example, might be categorized by make and model. Numerical data, on the other hand, expresses information in numerical form, allowing you to perform mathematical operations on it.

Keep in mind that categorical data can sometimes take on numerical values. For example, consider ZIP codes or phone numbers. Although they are expressed with numbers, it wouldn’t make any sense to perform math operations on them, such as finding the median ZIP code or average phone number.

How can we organize the text sample introduced in the previous section into structured data? We’re interested in specific information in this text, such as company names, dates, and stock prices. We want to present that information in fields in the following format, ready for insertion into a database:

Company: ABC
Date: yyyy-mm-dd
Stock: nnnnn

Using techniques of natural language processing (NLP), a discipline that trains machines to understand human-readable text, we can extract information appropriate for these fields. For example, we look for a company name by recognizing a categorical data variable that can take only one of a set of preset values, such as Google, Apple, or GoodComp.

Likewise, we can recognize a date by matching its explicit ordering to one of a set of explicit ordering formats, such as yyyy-mm-dd. In our example, we recognize, extract, and present our data in the predefined format like this:

Company: GoodComp
Date: 2021-01-07
Stock: +8.2%
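Full NLP is beyond the scope of a short example, but for a statement with this fixed shape, a regular-expression sketch captures the same fields (the list of preset company names and the verb list are assumptions for illustration):

```python
import re

statement = ("GoodComp shares soared as much as 8.2% on 2021-01-07 after the company "
             "announced positive early-stage trial results for its vaccine.")

companies = ("Google", "Apple", "GoodComp")   # preset categorical values
company = next(name for name in companies if name in statement)

# A date in the explicit yyyy-mm-dd ordering
date = re.search(r"\d{4}-\d{2}-\d{2}", statement).group()

# A direction verb followed by a percentage figure
stock = re.search(r"(soared|rose|fell).*?([\d.]+)%", statement)
sign = "+" if stock.group(1) in ("soared", "rose") else "-"

record = {"Company": company, "Date": date, "Stock": sign + stock.group(2) + "%"}
print(record)
```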

To store this record in a database, it’s better to present it as a row-like sequence of fields. We therefore might reorganize the record as a rectangular data object, or a 2D matrix:

Company  | Date       | Stock
---------|------------|------
GoodComp | 2021-01-07 | +8.2%

The information you choose to extract from the same unstructured data source depends on your requirements. Our example statement not only contains the change in GoodComp’s stock value for a certain date but also indicates the reason for that change, in the phrase “the company announced positive early-stage trial results for its vaccine.” Taking the statement from this angle, you might create a record with these fields:

Company: GoodComp
Date: 2021-01-07
Product: vaccine
Stage: early-stage trial

Compare this to the first record we extracted:

Company: GoodComp
Date: 2021-01-07
Stock: +8.2%

Notice that these two records contain different fields and therefore have different structures. As a result, they must be stored in two different database tables.


Friday, June 17, 2022

Categories of Data

Programmers divide data into three main categories: unstructured, structured, and semi-structured. In a data processing pipeline, the source data is typically unstructured; from this, you form structured or semi-structured datasets for further processing. Some pipelines, however, use structured data from the start. For example, an application processing geographical locations might receive structured data directly from GPS sensors. Let's explore the three main categories of data as well as time series data, a special type of data that can be structured or semi-structured. In this post, we'll focus on unstructured data.

Unstructured Data

Unstructured data is data with no predefined organizational system, or schema. This is the most widespread form of data, with common examples including images, videos, audio, and natural language text. To illustrate, consider the following financial statement from a pharmaceutical company:

GoodComp shares soared as much as 8.2% on 2021-01-07 after the company announced positive early-stage trial results for its vaccine.

This text is considered unstructured data because the information found in it isn’t organized with a predefined schema. Instead, the information is randomly scattered within the statement. You could rewrite this statement in any number of ways while still conveying the same information. For example:

Following the January 7, 2021, release of positive results from its vaccine trial, which is still in its early stages, shares in GoodComp rose by 8.2%.

Despite its lack of structure, unstructured data may contain important information, which you can extract and convert to structured or semi-structured data through appropriate transformation and analysis steps.

For example, image recognition tools first convert the collection of pixels within an image into a dataset of a predefined format and then analyze this data to identify content in the image. Similarly, the  following section will show a few ways in which the data extracted from our financial statement could be structured.



Wednesday, June 15, 2022

Real-World Reinforcement Learning

Often in reinforcement learning research, “real-life” tasks are equated with robotics and self-driving cars. However, there is a much broader range of problems that are yet to be fully solved and that require less investment in special hardware than, for example, robotics does.

We argue that whether a task is “real-life” or not is a spectrum rather than a binary classification.

A toy problem, such as an abstract grid world with a simple agent, can be very useful for understanding reinforcement learning, but could not be directly used in any problem that someone outside of computer science would be interested in.

A simulated problem can encompass a wider range of tasks. This could be a simple simulator of a pole-balancing system that a researcher constructed themselves based on simple physical equations. Or it could be a sophisticated simulation that mimics a real-world task, such as using a data-driven approach to model an oil well. When thinking about how “real” such a task is, one may consider whether the simulation is based on data, how well it mimics the true system, or whether the simulation may have been designed to highlight a reinforcement learning algorithm’s success (rather than the other way around).

A virtual problem is the setting when a virtual task is the true task. For instance, when competing in video game tournaments or trading stocks, there is no physical implementation. Instead, the true problem is fully virtual, but actions do have real-life consequences.

Finally, a physical problem is one where the agent takes actions that have physical consequences. On the simpler side, this could be maneuvering a robot in a controlled lab setting. On the more complex side, this could involve coordinating multiple self-driving cars through a busy intersection.

A high-fidelity simulation, a virtual problem that has real-life impacts, or a method to optimize a chemical plant over time are all examples that are on the “real-world tasks” end of the spectrum. Similarly, a toy problem or a simple robot arm stacking blocks in a lab are less “real world.” We would also argue that the task could either be control or evaluation (e.g., reinforcement learning can be used to judge the quality of a laser weld, which is also a sequential decision task).

In some cases, reinforcement learning may only be a small component in a bigger system. For example, it could be used to tune the parameters for control engineering within a prediction problem or to control a single thermostat. Alternatively, it can be used as an end-to-end replacement for a problem that either naturally suits reinforcement learning or can be adapted so that doing so has advantages (e.g., automation). In both cases, it is incredibly important to understand both the limitations and advantages of introducing reinforcement learning to known problems, which will be decided by both the data available and real-world politics (e.g., ensuring ethical automation practices).


Monday, June 13, 2022

Logic Control

The main part of programming is learning how to make your code do something, primarily through a variety of logic controllers. These controllers handle if-then conditions, reiterative processing through loops, and dealing with errors. While there are other ways of working with code, these are the most important ones for new programmers to learn.

When dealing with logic control, a developer needs to be aware of how data is being transferred, particularly when working with user input, network connections, or filesystem access. Python has three standard data streams for input/output (I/O). sys.stdout is the standard output stream; it handles the output of print() and Python expressions. sys.stdin is the standard input stream; it is used for all interactive input. sys.stderr is the standard error stream; it receives the program's error messages and also handles the interpreter's own prompts.

One thing to recognize is that, depending on the OS environment, information that you might expect to go to stdout is actually sent to stderr. Because stderr normally goes to the same destination as stdout by default, you'll usually see it on the screen as well, so you won't know that a response is actually going to stderr without testing. If the particular environment routes stderr to another location, such as a log file, you won't find out until you need to troubleshoot. This is important to note because, sometimes, you may not see the information you expect simply because it isn't a normal stdout message. These data streams are treated as regular text files and can be accessed and interacted with just like normal files.
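A short sketch of the streams in action; both lines usually appear on the same screen, even though they travel through different streams:

```python
import sys

print("normal output")                        # goes to sys.stdout
print("diagnostic message", file=sys.stderr)  # goes to sys.stderr
sys.stdout.write("streams are file-like objects\n")  # same destination as print()
```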

One of the most common control structures you'll use, and run into in other programs, is the if...else conditional block. Simply put, the program asks a yes or no question; depending on the answer, different things happen.

If you've programmed in other languages, the if...else statement works the same way. The key difference is that, in Python, the elseif statement is written as elif for checking multiple conditions, as shown in the following screenshot:
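A sketch consistent with the screenshot's description (the function name preference and the answer variable come from the text; the exact prompt and replies are assumptions):

```python
def preference():
    # input() prints the prompt and reads the user's reply
    answer = input("Do you like spam? ")
    if answer in ("yes", "y"):
        print("Excellent choice!")
    elif answer in ("no", "n"):
        print("More for the rest of us, then.")
    else:
        print("Unexpected answer:", answer)
```

Calling preference() runs the prompt-and-branch logic once.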


In the preceding example, the preference() function is used to hold the main code logic. The input() function prints the string within parentheses to the user (normally a question) and accepts the user's input; that input is assigned to the answer variable.

When checking for a yes or no condition, the only required part is the if statement. The elif (else/if) and the else statement aren't necessary. Having the else statement as a catch-all, default case is useful, especially if used with a print() command to indicate when an unexpected condition is received.

An if statement can be standalone, as shown here:

x = True
y = False
if x == True:
    y = True
    print(y)

More common is an if...else block, to have two different options depending on the condition, as shown here:

x = True
y = False
if x == True:
    y = True
    print(y)
else:
    print("'x' is not True")

Python doesn't have a switch or case statement, unlike other languages. A switch statement is a type of control device that allows a single variable to determine the rest of the program execution based on the
variable's value. An example of the switch statement from the C language is shown in the following example:

switch (grade) {
    case 'A':
        printf("Outstanding!");
        break;
    case 'B':
        printf("Good job!");
        break;
    case 'C':
        printf("Satisfactory performance.");
        break;
    case 'D':
        printf("You should try harder.");
        break;
    case 'F':
        printf("You failed.");
        break;
    default:
        printf("Invalid grade");
}

While this is somewhat simplistic, you can probably see that a more complicated example could provide different branches to the rest of the program, if desired. The key point is that a single variable is
tested, and the results of that test are compared to a variety of options; the option that matches dictates how the program continues.

You can get the same functionality of switch statements by using if...elif tests, searching within lists, or indexing dictionaries. Since lists and dictionaries are built at runtime, they can be more flexible. The following screenshot demonstrates how a dictionary can be used to perform the same functionality as a switch statement:
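A sketch along the lines the screenshot describes: the whole dictionary is built and indexed in a single expression, mimicking a switch on the value of pick (the menu items are assumptions for illustration):

```python
pick = "spam"   # stands in for user input

# The dictionary is constructed inline and immediately indexed;
# .get() supplies the default case, like a switch's default branch
result = {
    "spam": "One order of spam, coming up!",
    "eggs": "One order of eggs, coming up!",
    "bacon": "One order of bacon, coming up!",
}.get(pick, "Invalid choice")

print(result)
```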



Obviously, this isn't the most intuitive way to write this program. A better way would be to create the dictionary as a separate object, and then use a membership test such as key in dict to find the value corresponding to your choice. In this case, you could use "spam" in choice.

However, it's more common to use if...else statements to perform this operation, as it looks similar to the normal switch choices and is the easiest way to deal with choices. The benefit to using a dictionary is that dynamic programs can create these data structures relatively easily. With if...else statements, they have to be written by the programmer prior to running the program, whereas dictionaries can be populated and tested programmatically during runtime.
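For instance, a dispatch dictionary can be populated while the program runs, which a hard-coded if...elif chain cannot (the handler names here are made up for illustration):

```python
def handle_add(x, y):
    return x + y

def handle_sub(x, y):
    return x - y

# The table is built (and could be extended) at runtime
dispatch = {}
for name, func in (("add", handle_add), ("sub", handle_sub)):
    dispatch[name] = func

print(dispatch["add"](2, 3))   # behaves like: case "add": ...
```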


Friday, June 10, 2022

Modules as scripts

An important thing to know about Python is that modules, as written, are often only useful as imported objects for other programs. However, a module can be written to be imported or to function as a standalone program.

When a module is imported into another program, only certain objects are imported over. Not everything is imported, which is what allows a module to perform dual duty. To make a module operate by itself, a special line has to be inserted near the end of the program.

The following program is a simple dice rolling simulator, broken up into separate parts:

# random_dice_roller.py (part 1)

1  import random  # randint
2
3  def randomNumGen(choice):
4      """Get a random number to simulate a d6, d10, or d100 roll."""
5
6      if choice == 1:    # d6 roll
7          die = random.randint(1, 6)
8      elif choice == 2:  # d10 roll
9          die = random.randint(1, 10)
10     elif choice == 3:  # d100 roll
11         die = random.randint(1, 100)
12     elif choice == 4:  # d4 roll
13         die = random.randint(1, 4)
14     elif choice == 5:  # d8 roll
15         die = random.randint(1, 8)

The preceding code imports the random library from the built-in Python modules. Next, we define the function that will actually perform the dice simulation in line 3:

# random_dice_roller.py (part 2)

1      elif choice == 6:  # d12 roll
2          die = random.randint(1, 12)
3      elif choice == 7:  # d20 roll
4          die = random.randint(1, 20)
5      else:  # simple error message
6          return "Shouldn't be here. Invalid choice"
7      return die
8
9  def multiDie(dice_number, die_type):
10     """Add die rolls together, e.g. 2d6, 4d10, etc."""
11
12     # ---Initialize variables
13     final_roll = 0
14     val = 0

In the preceding code, we continue the different dice rolls and then define another function (line 9) that combines multiple dice together, as frequently used in games:

# random_dice_roller.py (part 3)

1      while val < dice_number:
2          final_roll += randomNumGen(die_type)
3          val += 1
4      return final_roll
5
6  if __name__ == "__main__":  # run tests if called as a standalone program
7      """Test criteria to show script works."""
8
9      _1d6 = multiDie(1, 1)  # 1d6
10     print("1d6 = ", _1d6, end=' ')
11     _2d6 = multiDie(2, 1)  # 2d6
12     print("\n2d6 = ", _2d6, end=' ')
13     _3d6 = multiDie(3, 1)  # 3d6
14     print("\n3d6 = ", _3d6, end=' ')

In the preceding code, we finish with the summation of dice. The key part of the entire program is line 6. This line determines whether the module can run by itself or can only be imported into other programs.

Line 6 states that, if the namespace seen by the interpreter is the main one (that is, if random_dice_roller.py is the main program being run and not something else), then the interpreter will process any operations that are specified below line 6. In this case, these operations are simply tests to confirm that the main logic (preceding line 6) works as expected.

If this program were to be imported into another program, then everything before line 6 would be imported while everything following it would be ignored. Thus, you can make a program that functions as a standalone program or can be imported; the only difference is what code is written below if __name__ == "__main__": 

# random_dice_roller.py (part 4)

1      _4d6 = multiDie(4, 1)   # 4d6
2      print("\n4d6 = ", _4d6, end=' ')
3      _1d10 = multiDie(1, 2)  # 1d10
4      print("\n1d10 = ", _1d10, end=' ')
5      _2d10 = multiDie(2, 2)  # 2d10
6      print("\n2d10 = ", _2d10, end=' ')
7      _3d10 = multiDie(3, 2)  # 3d10
8      print("\n3d10 = ", _3d10, end=' ')
9      _d100 = multiDie(1, 3)  # d100
10     print("\n1d100 = ", _d100)

This finishes the self-tests for the dice rolling program.
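The same import-or-run pattern in miniature, as a self-contained sketch:

```python
def greet(name):
    """Importable logic lives above the guard."""
    return "Hello, " + name + "!"

if __name__ == "__main__":
    # Runs only when this file is executed directly,
    # not when it is imported by another program
    print(greet("world"))
```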


Wednesday, June 8, 2022

Dot nomenclature and type of imports

When a module is imported, its objects are not merged into the current program's namespace. Unless a called object is explicitly identified with the dot nomenclature, an error will still be generated even though the module has been imported.

In the following screenshot, we see how the dot nomenclature works:
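A sketch of the session the screenshot shows: the bare call fails, and the qualified call succeeds after the import.

```python
# sqrt(16)            # NameError: name 'sqrt' is not defined
import math
print(math.sqrt(16))  # 4.0
```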


In this case, we attempt to calculate the square root of a number. Since this operation is unknown to the default Python interpreter, an error is generated. However, if we import the math library in line 2, and then attempt to perform the calculation again, we get an answer.

Note that we explicitly told Python that the square root function is to be found in the math library by using the math.sqrt() command. This is the dot nomenclature that we talked about earlier; the dot indicates that the sqrt() function can be found in the math library. We will see many other examples of this as we discuss programming further, so while it may not make sense right now, hopefully more examples will help.

In the previous screenshot, we performed a basic module import. This just means that we imported the module, and then referenced something within it through the dot nomenclature. With this type of import, the main program and the imported module maintain their separate namespaces, hence the need to explicitly identify a function through the dot nomenclature.

There are other ways to import modules. One of the most common is to use the from version of import to get only specified objects from a module, rather than the entire module, as shown in the following screenshot.
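A sketch of the session the screenshot shows: only the explicitly imported name is usable.

```python
from random import randint

print(randint(1, 6))      # works: randint was imported explicitly
# randrange(1, 6)         # NameError: name 'randrange' is not defined
# random.randrange(1, 6)  # NameError: name 'random' is not defined
```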


In this case, we are importing just randint() from the random library in line 4. Because we have explicitly imported this function, it is now part of the overall program's namespace, rather than being separated into the random namespace and requiring the dot nomenclature to call it. This allows us to call it in line 5 without any special conditions.

If we try to do the same for randrange() in line 6, we get an error because we never imported randrange() explicitly. Even if we try to use the dot nomenclature in line 7, we still get an error because the entire random library was not imported.

One way around this is to use the from <module> import * command, which imports nearly all objects from the specified module. The problem with this is the possibility of name shadowing because of all the imported objects, especially if multiple modules are imported this way.

In general, if only a handful of objects are needed, explicitly importing them is the safest way to work with them. If you need most or all of a library, you can use the import * command (it's easier to work with but not as safe) or the dot nomenclature (which is safer but requires more typing).



Monday, June 6, 2022

Importing modules

Modules are also called libraries or packages. Modules are modular, often self-contained Python programs that are commonly utilized in other programs, hence the need to import them for access.

Modules are used to separate code to make a program easier to work with, as each module can be designed to do one thing well, rather than having to make a single program that is responsible for all logic.

The Python Package Index (PyPI) website (https://pypi.org) is the official repository of third-party Python libraries. There are more than 150,000 packages available for download from PyPI. Most of these  packages are designed to be imported into a Python project to provide additional, or easier, functionality than can be achieved with the default Python libraries.

Another benefit of modules is that they create additional namespaces for code. Namespaces (also called scopes) are the hierarchy of control that a module has. Normally, objects outside of a module aren't visible to code within the module; thus, they can't be accessed or utilized within the current module.

The benefit of this segregation is that variable shadowing is less likely. Variable shadowing is the creation of duplicate variable names in different blocks of code, such that one variable is hidden (shadowed) by an identical variable and cannot be accessed, or the Python interpreter may call the incorrect variable.

Using a module allows the same variable name to be used in multiple locations without requiring shadowing to occur, as a specific variable is identified by the module it resides in. Of course, there is nothing to stop a programmer from using the same variable within a module, with the potential of shadowing occurring, but the namespace hierarchy makes that unlikely.

Global variables are an option, but aren't recommended. Global variables allow a programmer to define a variable that can be accessed within any namespace; they are commonly used to contain data that is used in multiple locations, such as a counter. Of course, this leaves open the possibility that a global variable will be overwritten without the programmer realizing, causing a problem later on in the program.

Program scope works inside-out. As a module is typically made with multiple functions or methods, when an object is called, the Python interpreter will look for the correct reference within the current function/method. If the object isn't defined there, the interpreter will move to the enclosing container, if one exists (such as another function or a method's class). If the variable can't be found there, the interpreter looks for a global variable. Not finding one, the interpreter will look in the built-in libraries. If still not found, Python will generate an error. The flow looks like this: local container|enclosing container|global scope|built-in module|error.

The following is a simple program that should help explain this idea a little better. We haven't directly talked about functions or if...else statements yet, but hopefully this won't be too confusing: 

# scope_example.py (part 1)

1  var1 = 1  # global variable
2
3  if var1 == 1:
4      var2 = 0  # also a global variable
5      print("Unmodified var2: {}".format(var2))
6
7  def my_funct():
8      var3 = 3   # local variable
9      var1 = 42  # shadows global var1 variable
10     global var2
11     var2 = 80
12     print("Inside function, var1: {}".format(var1))
13     print("Inside function, var2: {}".format(var2))
14     print("Inside function, var3: {}".format(var3))

In line 1, a global variable is created. Line 3 is a test to see whether var1 is equal to 1; if so, then a new global variable (var2) is created, and its value is printed.

Line 7 is the start of a Python function. Within this function, a variable that is only accessible within the function is created (var3). In addition, a new variable (var1) is created; this variable hides the previously made global variable var1, so when the value of this local var1 is printed in line 12, the local value is printed, rather than the value of the global variable.

Line 10 explicitly calls the global variable var2; this allows the function to manipulate the global variable in line 11 without attempting to make a local variable that would shadow it. 

Lines 12-14 print the values of the variables as seen within the scope of the function:

# scope_example.py (part 2)

1  my_funct()
2
3  print("Outside function, var1: {}".format(var1))
4  print("Outside function, var2: {}".format(var2))
5  print("Outside function, var3: {}".format(var3))

In the second part, line 1 is the call to the function to actually run it. Lines 3-5 print the values of the variables as seen outside the function.

The output of the previous program looks like this:

Unmodified var2: 0
Inside function, var1: 42
Inside function, var2: 80
Inside function, var3: 3
Outside function, var1: 1
Outside function, var2: 80
NameError: name 'var3' is not defined

The print() calls show the value of each variable at its respective location within the program. Initially, var1 and var2 are the values of the globally defined variables. Once the function has been called and performs its operations, the local var1 and var3 variables are printed, along with the global var2 variable, whose value has been replaced.

When the function is complete and we are back outside the function, we see that the globally defined variable var1 is back to its original value, and is no longer hidden by the local function variable of the same name. However, because var2 was explicitly called to reference the global variable, rather than shadow it, var2 retains the value it was assigned while within the function.

Finally, because var3 doesn't exist outside the function, the interpreter doesn't know what to do with it. Since we are no longer within the function, there is no local reference to it. Moving up the namespace, there is no encapsulating function or other object, and there is no global reference to var3. Since Python doesn't have a var3 object in any of its built-in libraries, the only thing Python can do with the call is to give up and throw an error.
