Wednesday, May 4, 2022

Indexing and slicing strings

Python strings support the same indexing and slicing operations as Python lists (see the Lists section), which behave much like arrays in C. Unlike C arrays, however, the characters in a string can be accessed both forward and backward. Counting forward, a string starts at position 0, and any character is found by its offset (how far it is from the beginning of the string). You can also find a character by using a negative offset, counted back from the end of the string.

The following screenshot briefly demonstrates this:


Line 30 creates a string variable, and then line 31 requests the character at position 0 (the very first character of the string), as well as the second character from the end of the string.
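The screenshot's exact string isn't visible, so here is a minimal sketch assuming the string "Spam" (consistent with the letters mentioned in the slicing discussion below). It shows one forward and one backward index:

```python
S = "Spam"  # assumed example string; the screenshot's actual string isn't shown

print(S[0])   # S  (offset 0 from the start: the first character)
print(S[-2])  # a  (the second character from the end)
```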

Indexing simply tells Python where a character can be found within the string. Like many other languages, Python starts counting at 0 instead of 1, so the first character's index is 0, the second character's index is 1, and so on. Counting backward through the string works the same way, except that the last character's index is -1 instead of 0 (since 0 already refers to the first character). Therefore, the final character is at index -1, the second-to-last is at -2, and so on. Knowing a character's index is essential for slicing.
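To see both numbering schemes side by side, the short loop below prints every character of an assumed example string with its positive and negative index. For a string of length n, the negative index of position i is simply i - n:

```python
word = "Spam"  # assumed example string

# Print each character with its forward (positive) and backward (negative) index.
for i, ch in enumerate(word):
    negative = i - len(word)  # the same position, counted from the end
    print(f"{ch}: index {i} or {negative}")
    assert word[i] == word[negative]  # both indexes name the same character
```

This prints `S: index 0 or -4`, `p: index 1 or -3`, `a: index 2 or -2`, and `m: index 3 or -1`.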

Slicing a string is what it sounds like: by giving lower and upper index values, we can cut the string into sections and pull out just the characters we want. A common example is processing an input file in which each line ends with a newline character: just slice off the last character of each line and process the rest.
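Here is a minimal sketch of that idea. It uses `io.StringIO` as a stand-in for a real file so the example runs on its own; with an actual file, you would loop over the object returned by `open()` the same way. The file contents are made up for illustration:

```python
import io

# Stand-in for a real input file (hypothetical contents).
fake_file = io.StringIO("alpha\nbeta\ngamma\n")

cleaned = []
for line in fake_file:
    text = line[:-1]  # slice off the trailing newline character
    cleaned.append(text)

print(cleaned)  # ['alpha', 'beta', 'gamma']
```

One caveat: if a file's last line doesn't end with a newline, `[:-1]` removes a real character from it. `line.rstrip("\n")` removes the newline only when one is present.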

The following screenshot demonstrates how string slicing works in more detail.


You'll notice in the previous screenshot that a colon is used to indicate the slice. The colon separates the lower and upper index values. If the lower value is omitted, Python starts the slice at the beginning of the string; if the upper value is omitted, the slice runs to the end of the string. In the preceding example, the first slice runs from index 1 (the second letter, inclusive) to index 3 (the fourth letter, exclusive). It helps to think of each index as the space just before a letter; that's why the letter m isn't included in the first slice but the letter p is.
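The "space before each letter" view can be checked directly. This sketch, using an assumed example string, confirms two consequences of it: a slice `S[i:j]` always holds `j - i` characters, and cutting the string at any gap and rejoining the pieces gives back the original:

```python
S = "Spam"  # assumed example string

# Picture each index as the gap *before* a character:
#    S   p   a   m
#  0   1   2   3   4
# S[i:j] returns the characters between gap i and gap j,
# so it contains j - i characters (when 0 <= i <= j <= len(S)).
for i in range(len(S) + 1):
    for j in range(i, len(S) + 1):
        assert len(S[i:j]) == j - i

# Cutting at any gap k and rejoining the two pieces rebuilds the string.
for k in range(len(S) + 1):
    assert S[:k] + S[k:] == S

print("gap model holds for", repr(S))
```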

The second slice runs from index 1 (the second letter) to the end of the string. The third slice starts at the beginning of the string and includes everything except the last character.
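Assuming the screenshot's string is "Spam" (consistent with the letters p and m mentioned above, though the original string isn't visible), the three slices would look like this, plus one extra slice with the lower bound omitted:

```python
S = "Spam"  # assumed example string

print(S[1:3])  # pa   (index 1 up to, but not including, index 3)
print(S[1:])   # pam  (index 1 to the end)
print(S[:-1])  # Spa  (the start up to, but not including, the last character)
print(S[:2])   # Sp   (lower bound omitted: the slice starts at index 0)
```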

One neat feature of the [:-1] slice: it removes the last character whatever that character is, not just letters or numbers. So if a line ends with a newline character (\n), you can put [:-1] in your code to slice off that character, leaving you with just the text you care about.
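Although `\n` is written as two symbols in source code, it is a single character in the string, so `[:-1]` removes exactly that one character. The strings below are arbitrary examples:

```python
line = "hello world\n"
print(len(line))        # 12: the "\n" escape counts as one character
print(repr(line[:-1]))  # 'hello world'

# [:-1] doesn't care what the last character is.
print(repr("tab\t"[:-1]), repr("end!"[:-1]))  # 'tab' 'end'
```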

Using -1 as the ending value saves you from having to work out where the string ends. You could instead use len(S) to get the length of the string and slice with S[:len(S) - 1], but why bother when [:-1] does the same thing?
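A quick comparison of the two approaches on an assumed example string shows they produce identical results, both for dropping the last character and for reading it:

```python
S = "Spam"  # assumed example string

# Two ways to drop the last character; they give the same result.
with_len = S[:len(S) - 1]
with_negative = S[:-1]
print(with_len, with_negative)  # Spa Spa

# Likewise for reading the last character itself.
print(S[len(S) - 1], S[-1])     # m m
```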

Because slicing works on lists as well as strings, you can also use it to process command-line arguments by filtering out the program name. When you run a script with the Python interpreter, the command-line arguments are stored in the list sys.argv, and its first element, sys.argv[0], is the name of the script itself. By slicing off that first element, we capture only the real arguments for processing.

For example, the following code uses slicing to strip the script's name from the list of command-line arguments:

capture_arguments.py

```python
import sys

if len(sys.argv) > 1:  # Check whether any arguments were provided
    entered_value = sys.argv[1:]  # Capture all arguments except the script name
    print(entered_value)  # Show the captured arguments
```
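To try the same slice without typing command-line arguments, here is a sketch that wraps it in a small function. The helper name `real_arguments` and the sample argument list are hypothetical, chosen only for illustration:

```python
import sys


def real_arguments(argv):
    """Hypothetical helper: return every argument except the script name (argv[0])."""
    return argv[1:]


# Simulated argument list; in a real run you would pass sys.argv instead.
example_argv = ["capture_arguments.py", "input.txt", "--verbose"]
print(real_arguments(example_argv))  # ['input.txt', '--verbose']

# Slicing past the end of a list never raises an error,
# so a lone script name simply yields an empty list.
print(real_arguments(["capture_arguments.py"]))  # []

if __name__ == "__main__":
    print(real_arguments(sys.argv))  # the arguments this run actually received
```

Because an out-of-range slice returns an empty list instead of raising `IndexError`, the `len(sys.argv) > 1` check above isn't needed for the slice itself; it just lets the script skip work when nothing was passed.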

Next, we'll discuss string formatting.
