#|eval: false
#answer = input("What's your name?")
#answer = answer.strip().title()
#print(f"Hello, {answer}")
# Abbreviated version
#answer = input("What's your name?").strip().title()
#print(f"hello, {answer}")
Strings
Basics
- Strings are a sequence of characters
- Think of a string as behaving like a list or a tuple of characters
Input
- I’ll start here because anything the user types at an input prompt is stored in a variable as a string, regardless of what is entered
- In other words, if we ask the user to input their name, they might enter numbers instead, so for now let’s just treat the input as a variable assigned whatever the user types in response to the prompt
- The first chunk asks the user to enter a name
- Then we strip any spaces from either end of the answer, chained to title(), which capitalizes the first letter of each word they input
- Then we print the response using an f-string
Indexing
- Each element of the sequence can be accessed using an index with the first character on the left being 0
- Let’s pretend we have Name = "Michael Jackson", so:
Name[0] -> M
Name[6] -> l
Negative Indexing
- This is when we want to access the string in reverse order: the last character is -1, and we count down as we move towards the beginning of the string
Name[-1] -> n
Name[-6] -> a
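The positive and negative indexing examples above can be checked together with a short sketch. The variable names other than Name are illustrative only.

```python
# Positive and negative indexing on the example string
Name = "Michael Jackson"

first_char = Name[0]       # index 0 is the leftmost character
seventh_char = Name[6]     # index 6 is the seventh character
last_char = Name[-1]       # -1 is the last character
sixth_from_end = Name[-6]  # -6 counts six characters back from the end

print(first_char, seventh_char, last_char, sixth_from_end)  # M l n a
```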
Slice to Var
- You can bind a string to another var, even part of a string
- Here we slice part of Name and bind it to Mich
Name[0:4] -> Mich
Name[8:12] -> Jack
IMPORTANT: the end index of a slice is exclusive, so 0:4 -> Mich, which is only 4 characters (indices 0 through 3). In R, 0:4 is an inclusive range of 5 values, so do not carry that habit over to Python. The same applies to negative indexing and mid-string slicing: [8:12] -> Jack covers indices 8 through 11 only.
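A minimal sketch of the exclusive end index, using the same Name string. The point to notice is that the length of a slice equals stop minus start.

```python
Name = "Michael Jackson"

Mich = Name[0:4]   # indices 0, 1, 2, 3; index 4 is excluded
Jack = Name[8:12]  # indices 8, 9, 10, 11; index 12 is excluded

print(Mich, len(Mich))  # Mich 4
print(Jack, len(Jack))  # Jack 4
```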
Stride
- We can extract characters at a fixed stride, such as every second character, with ::2 as such:
Name[::2] -> "McalJcsn"
- We can incorporate striding and slicing:
Name[0:5:2] -> "Mca"
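To see what the stride does, the sketch below rebuilds Name[::2] by hand from the indices 0, 2, 4, and so on. The names every_second and manual are illustrative.

```python
Name = "Michael Jackson"

every_second = Name[::2]  # start to end, taking every second character
# The same result built by hand from the indices 0, 2, 4, ...
manual = "".join(Name[i] for i in range(0, len(Name), 2))

sliced_stride = Name[0:5:2]  # indices 0, 2, 4 only

print(every_second)   # McalJcsn
print(manual)         # McalJcsn
print(sliced_stride)  # Mca
```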
Len
- We can use len() to obtain the length of the string
- len(Name) -> 15
# Measure some strings:
words = ['cat', 'window', 'defenestrate']
for w in words:
    print(w, len(w))
cat 3
window 6
defenestrate 12
Concatenate
- Combine strings end to end using +
- + does not add a separator between the concatenated strings
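A small sketch of concatenation. The variables first and last are made up for illustration; the separator has to be added explicitly, either with + or with str.join().

```python
first = "Michael"
last = "Jackson"

no_separator = first + last        # + inserts nothing between the strings
with_space = first + " " + last    # add the separator yourself
joined = " ".join([first, last])   # join() places the separator between items

print(no_separator)  # MichaelJackson
print(with_space)    # Michael Jackson
print(joined)        # Michael Jackson
```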
Split
- Splits string into a list based on a delimiter
- Default separator is white space
- For example: if we split "$123,456,789" using "," we end up with "$123" "456" "789"
my_string = " Hello, there. What's up, goose "
split_text = my_string.split(",")
split_text
[' Hello', " there. What's up", ' goose ']
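The difference between the default whitespace split and an explicit delimiter can be sketched as follows, reusing the two example strings from above. With no argument, split() also discards leading and trailing whitespace.

```python
price = "$123,456,789"
parts = price.split(",")  # split on an explicit delimiter

my_string = " Hello, there. What's up, goose "
default_split = my_string.split()  # no argument: split on runs of whitespace

print(parts)          # ['$123', '456', '789']
print(default_split)  # ['Hello,', 'there.', "What's", 'up,', 'goose']
```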
We can expand on the input and split:
- Let’s suppose the input we want is a first and last name
- So the answer from the input above will have two words, first and last, separated by a space
- So if the user inputs Santa Claus, that is what gets assigned to answer
- Now we can split the answer on a separator and assign the first string to first and the second string to last using
- first, last = answer.split(" ")
- Now we can choose one or both variables to print
#|eval: false
# remember answer = input("What's your name?").strip().title()
# first, last = answer.split(" ")
# print(f"Hello, {first}")
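Here is a non-interactive sketch of the same idea. The variable raw is a stand-in for whatever input() would return, and the two-name unpacking assumes the user typed exactly two words separated by a single space.

```python
raw = "  santa claus  "  # stand-in for the value returned by input()

answer = raw.strip().title()     # clean up the ends, then capitalize each word
first, last = answer.split(" ")  # assumes exactly two words

greeting = f"Hello, {first}"
print(greeting)  # Hello, Santa
```

If the user types one word or three, the unpacking raises a ValueError, so real code would need to check the number of parts first.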
Strip
- Removes leading and trailing whitespaces ONLY
- Does not remove any spaces inside the string
stripped = my_string.strip()
print(stripped)
Hello, there. What's up, goose
- Here is another example
blah = " this is crazy "
blah.strip()
'this is crazy'
# If we want title case as well, we can chain the methods
blah.strip().title()
'This Is Crazy'
Replicate
- We can replicate a string by using the multiplication like this
Name * 3 -> "Michael JacksonMichael JacksonMichael Jackson" (no separator is added between the copies)
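A quick sketch of replication. Multiplication repeats the string back to back; if a separator is wanted, one option (shown here as an illustration) is to join a list of copies.

```python
Name = "Michael Jackson"

tripled = Name * 3               # copies are placed directly after one another
spaced = " ".join([Name] * 3)    # join three copies with a space between them

print(tripled)  # Michael JacksonMichael JacksonMichael Jackson
print(spaced)   # Michael Jackson Michael Jackson Michael Jackson
```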
Immutable
- Strings are immutable; in other words, you cannot change the characters of an existing string in place. Methods such as strip() or replace() return a new string instead
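The sketch below shows what immutability means in practice: assigning to an index raises a TypeError, while building a new string from slices works. The names error_message and new_name are illustrative.

```python
Name = "Michael Jackson"

error_message = None
try:
    Name[0] = "J"  # in-place assignment is not allowed on strings
except TypeError as err:
    error_message = str(err)

# Build a new string instead; the original is left untouched
new_name = "J" + Name[1:]

print(error_message)
print(new_name)  # Jichael Jackson
print(Name)      # Michael Jackson
```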
Escape Sequences
- \ is meant to precede escape sequences
- Escape sequences represent characters that are difficult to type or interpret on their own, so they are preceded by the \
print("Michael Jackson \n is over there")
- \n here means a new line
- \t represents a tab
- To use a \ in a string use \\
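A short sketch of the three escape sequences mentioned above. Each escape sequence is two characters in the source code but a single character in the resulting string.

```python
print("Michael Jackson \n is over there")  # the output continues on a new line

tabbed = "Michael\tJackson"  # \t inserts a tab character
backslash = "C:\\temp"       # \\ produces one literal backslash

print(tabbed)
print(backslash)   # C:\temp
print(len("\n"))   # 1
print(len("\\"))   # 1
```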
Raw Strings
Raw strings are a powerful tool for handling textual data, especially when dealing with escape characters.
By prefixing a string literal with the letter ‘r’, Python treats the string as raw, meaning it interprets backslashes as literal characters rather than escape sequences.
Here is an example without the use of r:
In the regular string regular_string, the backslash sequences (\n and \f) are interpreted as escape sequences. Therefore, \n represents a newline character and \f a form feed, which leads to an incorrect file path representation.
regular_string = "C:\new_folder\file.txt"
print("Regular String:", regular_string)
Regular String: C:
ew_folderile.txt
- Here is the same using the raw string method
- In the raw string raw_string, the backslashes are treated as literal characters.
- This means that \n is not interpreted as a newline character, but rather as two separate characters, '\' and 'n'. Consequently, the file path is represented exactly as it appears.
raw_string = r"C:\new_folder\file.txt"
print("Raw String:", raw_string)
Raw String: C:\new_folder\file.txt
Method
Upper
- A method is an operation performed on a string, such as upper()
- new_name = Name.upper()
Lower
- new_name = Name.lower()
- Here is an example; ignore the meaning of the function __init__
# Constructor method
def __init__(self, text):
    lowertext = text.lower()
    cleantext = lowertext.replace('.','').replace('!','').replace('?','').replace(',','')
    self.fmtText = cleantext
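The constructor above is a method taken from a class that is not shown here. As a self-contained sketch of the same lower-then-replace steps, the hypothetical function clean_text below does the work without a class.

```python
def clean_text(text):
    """Lowercase the text and remove '.', '!', '?' and ',' characters."""
    lowertext = text.lower()
    cleantext = (
        lowertext.replace(".", "")
        .replace("!", "")
        .replace("?", "")
        .replace(",", "")
    )
    return cleantext


cleaned = clean_text("Hello, there. What's up, goose?")
print(cleaned)  # hello there what's up goose
```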
Title
- If we want to capitalize the first letter of each word, we can use title()
line = "hello gringo. did you see santa"
line.title()
'Hello Gringo. Did You See Santa'
Replace
- replace(old, new, count) returns a copy of the string with old replaced by new
- count is how many occurrences of the old value you want to replace; if you omit it, all occurrences are replaced
- new_name = Name.replace('Michael', 'Janet') -> new_name: "Janet Jackson"
Name = 'Michael Jackson'
new_name = Name.replace('Michael', 'Janet')
new_name
'Janet Jackson'
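The effect of the optional count argument can be sketched with a made-up string that contains the old value several times.

```python
text = "la la la"

all_replaced = text.replace("la", "na")    # count omitted: every occurrence
first_only = text.replace("la", "na", 1)   # count=1: only the first occurrence

print(all_replaced)  # na na na
print(first_only)    # na la la
print(text)          # la la la (the original string is unchanged)
```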
Replace Multiple
- If you have multiple sequences to replace you can use one cumulative statement
- Here you can see that we can chain .replace() calls together
cleantext = texttoedit.replace('.','').replace('!','').replace('?','').replace(',','')
Find
- find(substring) searches the given string for the substring and returns the index of the first character of the first match
- Name.find('el') -> 5
- e in el was encountered at index 5; remember the first character index is 0, so index 5 makes it the sixth character in Name
- Name.find('Jack') -> 8, which means J is the ninth character in Name
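A brief sketch of find() on the same Name string, including the case where the substring is absent: find() then returns -1 rather than raising an error.

```python
Name = "Michael Jackson"

el_index = Name.find("el")      # index of the 'e' in 'el'
jack_index = Name.find("Jack")  # index of the 'J'
missing = Name.find("Janet")    # substring not present

print(el_index)    # 5
print(jack_index)  # 8
print(missing)     # -1
```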
Count Occurrence
- If you want to count the occurrences of a specific string, or
- count the unique words that appear in a text,
- please refer to this example project, which is found on the Projects page.
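As a minimal sketch of both ideas (not the approach used in the referenced project, which is not shown here), str.count() counts a specific substring and collections.Counter tallies each word. The sample sentence is made up.

```python
from collections import Counter

text = "the cat and the hat and the bat"

the_count = text.count("the")        # occurrences of one specific substring
word_counts = Counter(text.split())  # tally of every whitespace-separated word
unique_words = len(word_counts)      # number of distinct words

print(the_count)           # 3
print(word_counts["and"])  # 2
print(unique_words)        # 5
```

Note that count() matches substrings, so counting "the" would also count the "the" inside a word like "other".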
RegEx
In Python, RegEx (short for Regular Expression) is a tool for matching and handling strings.
Python provides a built-in module called re, which allows you to work with regular expressions.
The re module provides several functions for working with regular expressions, including search, split, findall, and sub.
First, import the re module
import re
Search
The search() function searches for specified patterns within a string.
- Here is an example that explains how to use the search() function to search for the word “Jackson” in the string “Michael Jackson is the best”.
s1 = "Michael Jackson is the best"

# Define the pattern to search for
pattern = r"Jackson"

# Use the search() function to search for the pattern in the string
result = re.search(pattern, s1)

# Check if a match was found
if result:
    print("Match found!")
else:
    print("Match not found.")
Match found!
Regular expressions (RegEx) are patterns used to match and manipulate strings of text. There are several special sequences in RegEx that can be used to match specific characters or patterns.
Special Sequence | Meaning | Example |
---|---|---|
\d | Matches any digit character (0-9) | “123” matches “\d\d\d” |
\D | Matches any non-digit character | “hello” matches “\D\D\D\D\D” |
\w | Matches any word character (a-z, A-Z, 0-9, and _) | “hello_world” matches “\w\w\w\w\w\w\w\w\w\w\w” |
\W | Matches any non-word character | “@#$%” matches “\W\W\W\W” |
\s | Matches any whitespace character (space, tab, newline, etc.) | “hello world” matches “\w\w\w\w\w\s\w\w\w\w\w” |
\S | Matches any non-whitespace character | “hello_world” matches “\S\S\S\S\S\S\S\S\S\S\S” |
\b | Matches the boundary between a word character and a non-word character | “cat” matches “\bcat\b” in “The cat sat on the mat” |
\B | Matches any position that is not a word boundary | “cat” matches “\Bcat\B” in “concatenate” but not in “The cat sat on the mat” or “category” (where “cat” starts at a word boundary) |
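A few rows of the table can be checked with a short sketch. It uses re.fullmatch() to require that the whole string matches the pattern and re.search() for the boundary cases; the variable names are illustrative.

```python
import re

# Whole-string matches for the character classes
digits_ok = re.fullmatch(r"\d\d\d", "123") is not None
non_word_ok = re.fullmatch(r"\W\W\W\W", "@#$%") is not None
spaced_ok = re.fullmatch(r"\w\w\w\w\w\s\w\w\w\w\w", "hello world") is not None

# \b needs a word boundary on each side of "cat"
whole_word = re.search(r"\bcat\b", "The cat sat on the mat") is not None

# \B needs "cat" to sit strictly inside a longer word
inside_word = re.search(r"\Bcat\B", "concatenate") is not None
at_word_start = re.search(r"\Bcat\B", "category") is not None

print(digits_ok, non_word_ok, spaced_ok)       # True True True
print(whole_word, inside_word, at_word_start)  # True True False
```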
d
- A simple example of using the \d special sequence in a regular expression pattern
- The regular expression pattern is defined as r"\d\d\d\d\d\d\d\d\d\d", which uses the \d special sequence to match any digit character (0-9)
- The \d sequence is repeated ten times to match ten consecutive digits
pattern = r"\d\d\d\d\d\d\d\d\d\d"  # Matches any ten consecutive digits
text = "My Phone number is 1234567890"
match = re.search(pattern, text)

if match:
    print("Phone number found:", match.group())
else:
    print("No match")
Phone number found: 1234567890
w
- A simple example of using the \W special sequence in a regular expression pattern
- The regular expression pattern is defined as r"\W", which uses the \W special sequence to match any character that is not a word character (a-z, A-Z, 0-9, or _)
- The string we search for matches is "Hello, world!"
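The description above can be sketched as follows. Using re.findall() to collect the matches is a choice made for this illustration; the original notes do not show which function was used.

```python
import re

pattern = r"\W"  # any character that is not a-z, A-Z, 0-9 or _
text = "Hello, world!"

non_word = re.findall(pattern, text)
print(non_word)  # [',', ' ', '!']
```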
findall
The findall() function finds all occurrences of a specified pattern within a string.
import re
s2 = "Michael Jackson was a singer and known as the 'King of Pop'"
# Use the findall() function to find all occurrences of "as" in the string
re.findall("as", s2)
['as', 'as']
# The result is the list of matched substrings
# Note: the "as" inside "was" is counted as well as the standalone word "as"
split
The re module’s split() function splits a string into a list of substrings based on a specified pattern.
# Use the split function to split the string on the "\s" pattern (any whitespace character)
re.split(r'\s', s2)
['Michael', 'Jackson', 'was', 'a', 'singer', 'and', 'known', 'as', 'the', "'King", 'of', "Pop'"]
# The resulting list contains all the substrings, split on whitespace characters
sub
The sub() function of the re module replaces all occurrences of a pattern in a string with a specified replacement.
# Define the regular expression pattern to search for
pattern = r"King of Pop"

# Define the replacement string
replacement = "legend"

# Use the sub function to replace the pattern with the replacement string
new_string = re.sub(pattern, replacement, s2, flags=re.IGNORECASE)
# The new_string contains the original string with the pattern replaced by the replacement string
print(new_string)
Michael Jackson was a singer and known as the 'legend'
Formatting Strings
f-string - Interpolation
Format strings are a way to inject variables into a string in Python. They are used to format strings and produce more human-readable outputs. There are several ways to format strings in Python.
Introduced in Python 3.6, f-strings are a new way to format strings in Python. They are prefixed with ‘f’ and use curly braces {} to enclose the variables that will be formatted.
name = "John"
age = 30
print(f"My name is {name} and I am {age} years old.")
My name is John and I am 30 years old.
- F-strings are also able to evaluate expressions inside the curly braces
x = 10
y = 20
print(f"The sum of x and y is {x+y}.")
The sum of x and y is 30.
str.format
- Use curly braces {} as placeholders for variables which are passed as arguments in the format() method.
name = "John"
age = 50
print("My name is {} and I am {} years old.".format(name, age))
My name is John and I am 50 years old.
% operator
- This is one of the oldest ways to format strings in Python. It uses the % operator to replace variables in the string.
name = "Johnathan"
age = 30
print("My name is %s and I am %d years old." % (name, age))
My name is Johnathan and I am 30 years old.
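To compare the three approaches side by side, the sketch below formats the same two values each way and confirms that the resulting strings are identical. The variable names with_fstring, with_format and with_percent are illustrative.

```python
name = "John"
age = 30

with_fstring = f"My name is {name} and I am {age} years old."
with_format = "My name is {} and I am {} years old.".format(name, age)
with_percent = "My name is %s and I am %d years old." % (name, age)

print(with_fstring)
print(with_fstring == with_format == with_percent)  # True
```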