Regular Expressions in Python

This program demonstrates common regular-expression tasks in Python using the standard re module. Here’s the full source code with explanations:

import re

# re.match tests whether a pattern matches at the beginning of a string.
# It returns a Match object on success and None otherwise.
match = re.match(r"p([a-z]+)ch", "peach")
print(bool(match))

# In Python, we compile the regular expression pattern into a regex object.
r = re.compile(r"p([a-z]+)ch")

# Many methods are available on these objects. Here's a match test like we saw earlier.
print(bool(r.match("peach")))

# search finds the first match anywhere in the string; group() returns the matched text.
print(r.search("peach punch").group())

# This also finds the first match but returns the
# start and end indexes for the match instead of the matching text.
match = r.search("peach punch")
print(f"idx: [{match.start()}, {match.end()}]")

# group() (or group(0)) returns the whole match, and group(1) returns
# the text captured by the first parenthesized subpattern.
match = r.search("peach punch")
print([match.group(), match.group(1)])

# Similarly, we can get information about the
# indexes of matches and submatches.
match = r.search("peach punch")
print([match.start(), match.end(), match.start(1), match.end(1)])

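# findall finds every match in the input, not just the first. Because this
# pattern has a capturing group, findall would return only the group text
# ('ea', 'un', 'in'), so use finditer to collect the whole matches.
print([m.group() for m in r.finditer("peach punch pinch")])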

# We can use finditer to get all matches along with their positions
matches = r.finditer("peach punch pinch")
print("all:", [[m.start(), m.end(), m.start(1), m.end(1)] for m in matches])

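# findall has no match-limit argument: its optional second argument is the
# position at which to start searching. To keep only the first two matches,
# slice the results.
print([m.group() for m in r.finditer("peach punch pinch")][:2])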

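# Patterns can work on str or bytes, but the pattern and the input must be the
# same type (a str pattern on bytes raises TypeError), so use a bytes pattern here.
print(bool(re.match(rb"p([a-z]+)ch", b"peach")))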

# When creating module-level (global) regex variables, use re.compile.
# It raises re.error immediately for an invalid pattern, so there's no need
# for a separate "must compile" variant.
r = re.compile(r"p([a-z]+)ch")
print("regex:", r.pattern)

# The sub method replaces matched substrings with other values.
print(r.sub("<fruit>", "a peach"))

# sub also accepts a function, which receives each Match object and returns the replacement text.
def upper_match(match):
    return match.group().upper()

print(r.sub(upper_match, "a peach"))

To run the program, save it as regular_expressions.py and use python:

$ python regular_expressions.py
True
True
peach
idx: [0, 5]
['peach', 'ea']
[0, 5, 1, 3]
['peach', 'punch', 'pinch']
all: [[0, 5, 1, 3], [6, 11, 7, 9], [12, 17, 13, 15]]
['peach', 'punch']
True
regex: p([a-z]+)ch
a <fruit>
a PEACH
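
One detail worth a closer look: what findall returns depends on the pattern's capturing groups, and its optional second argument is a start position rather than a match count. The minimal sketch below reuses the program's pattern under the illustrative names pattern and text, shows both behaviors, and caps the number of matches with itertools.islice (one reasonable choice, not the only one):

```python
import itertools
import re

pattern = re.compile(r"p([a-z]+)ch")
text = "peach punch pinch"

# With one capturing group, findall returns the group's text, not the whole match.
groups_only = pattern.findall(text)

# A non-capturing group (?:...) keeps the grouping but lets findall return whole matches.
whole = re.findall(r"p(?:[a-z]+)ch", text)

# The second positional argument is pos (where scanning starts), not a count:
# starting at index 2 skips "peach".
from_pos_2 = pattern.findall(text, 2)

# To cap the number of matches lazily, take the first n items from finditer.
first_two = [m.group() for m in itertools.islice(pattern.finditer(text), 2)]

print(groups_only)  # ['ea', 'un', 'in']
print(whole)        # ['peach', 'punch', 'pinch']
print(from_pos_2)   # ['un', 'in']
print(first_two)    # ['peach', 'punch']
```

Unlike slicing a full list, islice stops pulling from finditer once it has enough matches, which can matter for long inputs.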

For a complete reference on Python regular expressions, check the re module documentation.
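
That documentation also covers three points this program only touches on: group references such as \1 in sub replacement strings, the count argument of sub, and the rule that str patterns and bytes inputs cannot be mixed. A minimal sketch using the same example pattern (the variable names are illustrative only):

```python
import re

pattern = re.compile(r"p([a-z]+)ch")

# \1 in the replacement string inserts the text captured by group 1.
swapped = pattern.sub(r"P[\1]CH", "a peach and a punch")

# count limits how many replacements are made (the default, 0, means all).
first_only = pattern.sub("<fruit>", "peach punch pinch", count=1)

# Bytes input needs a bytes pattern (note the rb"" prefix).
bytes_pattern = re.compile(rb"p([a-z]+)ch")
bytes_match = bytes_pattern.match(b"peach")

# Applying the str pattern to bytes raises TypeError.
try:
    pattern.match(b"peach")
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(swapped)               # a P[ea]CH and a P[un]CH
print(first_only)            # <fruit> punch pinch
print(bytes_match.group(1))  # b'ea'
print(mixed_ok)              # False
```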