Python RegEx

A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern. RegEx can be used to check if a string contains the specified search pattern.

Python has a built-in package called re, which can be used to work with Regular Expressions.

1. re module

To use regular expressions, import the re module. The example below checks whether the string starts with "The" and ends with "Spain":

import re

txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)

if x:
  print("YES! We have a match!")

2. Key Methods

The re module offers a set of functions that allow us to search a string for a match.

`findall()`

Returns a list containing all matches. If no matches are found, an empty list is returned.

import re
txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x) # ['ai', 'ai']
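The empty-list case described above can be sketched as follows. The search term "Portugal" and the variable name `no_matches` are illustrative choices, not part of the original example.

```python
import re

txt = "The rain in Spain"

# "Portugal" does not occur in txt, so findall() returns an empty list
no_matches = re.findall("Portugal", txt)
print(no_matches)  # []

# An empty list is falsy, so the result can be tested directly
if not no_matches:
    print("No match")
```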

`search()`

Searches the string for a match, and returns a Match object if there is a match. If there is more than one match, only the first occurrence is returned.

import re
txt = "The rain in Spain"
x = re.search(r"\s", txt) # Search for white-space (raw string keeps the backslash)
print("The first white-space character is located in position:", x.start())
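As a small sketch of what the returned Match object offers, the snippet below reads the matched text and its position, and shows that `search()` gives `None` when nothing matches. The names `first` and `missing` and the search term "Portugal" are assumptions made for illustration.

```python
import re

txt = "The rain in Spain"

# Two "ai" substrings exist, but search() reports only the first one
first = re.search("ai", txt)
if first:
    print(first.group())  # ai
    print(first.span())   # (5, 7)

# With no match, search() returns None instead of a Match object
missing = re.search("Portugal", txt)
print(missing)  # None
```

Checking the result with `if first:` before calling its methods avoids an AttributeError when there is no match.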

`split()`

Returns a list where the string has been split at each match.

import re
txt = "The rain in Spain"
x = re.split(r"\s", txt) # Split at each white-space
print(x) # ['The', 'rain', 'in', 'Spain']

`sub()`

Replaces the matches with the text of your choice.

import re
txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt) # Replace each white-space with 9
print(x) # The9rain9in9Spain

3. Metacharacters

Metacharacters are characters with a special meaning:

Character	Description	Example
`[]`	A set of characters	`"[a-m]"`
`\`	Signals a special sequence (can also be used to escape special characters)	`"\d"`
`.`	Any character (except newline character)	`"he..o"`
`^`	Starts with	`"^hello"`
`$`	Ends with	`"planet$"`
`*`	Zero or more occurrences	`"he.*o"`
`+`	One or more occurrences	`"he.+o"`
`?`	Zero or one occurrence	`"he.?o"`
`{}`	Exactly the specified number of occurrences	`"al{2}o"`
`|`	Either or	`"falls|stays"`
`()`	Capture and group
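The metacharacters in the table can be tried out with a short sketch like the one below. The sample string "hello planet" and the variable names are illustrative assumptions; the patterns are taken from or modeled on the table's examples.

```python
import re

txt = "hello planet"

# [] : every single character between a and m
set_matches = re.findall("[a-m]", txt)
print(set_matches)  # ['h', 'e', 'l', 'l', 'l', 'a', 'e']

# ^ and $ : anchor the pattern to the start or end of the string
starts = bool(re.search("^hello", txt))
ends = bool(re.search("planet$", txt))
print(starts, ends)  # True True

# . * + ? : "he", some characters, then "o"
dots = re.findall("he..o", txt)      # exactly two characters in between
star = re.findall("he.*o", txt)      # zero or more characters in between
plus = re.findall("he.+o", txt)      # one or more characters in between
optional = re.findall("he.?o", txt)  # at most one character in between
print(dots, star, plus, optional)  # ['hello'] ['hello'] ['hello'] []

# {} : exactly two consecutive "l" characters
exact = re.findall("l{2}", txt)
print(exact)  # ['ll']

# | : either alternative
either = re.findall("hello|planet", txt)
print(either)  # ['hello', 'planet']

# () : capture parts of the match as groups
groups = re.search("(hel)(lo)", txt).groups()
print(groups)  # ('hel', 'lo')
```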

4. Special Sequences

A special sequence is a \ followed by one of the characters in the list below:

\d: Returns a match where the string contains digits (numbers from 0-9)
\w: Returns a match where the string contains any word characters (letters, digits from 0-9, and the underscore _ character)
\s: Returns a match where the string contains a white space character
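\D, \W, \S: The opposites of the above. They return a match where the string contains a character that is not a digit, not a word character, or not a white space character, respectively.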
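A minimal sketch of the special sequences listed above, applied with `findall()`. The sample string "Room 42, floor_3" and the variable names are made up for illustration; the `+` quantifier is used in some patterns to collect runs of characters instead of single ones.

```python
import re

txt = "Room 42, floor_3"

digits = re.findall(r"\d", txt)       # each digit on its own
words = re.findall(r"\w+", txt)       # runs of word characters
spaces = re.findall(r"\s", txt)       # each white space character
print(digits)  # ['4', '2', '3']
print(words)   # ['Room', '42', 'floor_3']
print(spaces)  # [' ', ' ']

non_digits = re.findall(r"\D+", txt)  # runs of anything that is not a digit
non_words = re.findall(r"\W", txt)    # each character that is not a word character
non_spaces = re.findall(r"\S+", txt)  # runs of anything that is not white space
print(non_digits)  # ['Room ', ', floor_']
print(non_words)   # [' ', ',', ' ']
print(non_spaces)  # ['Room', '42,', 'floor_3']
```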

5. Grouping

You can extract parts of the match by grouping them with parentheses ().

import re
txt = "John Doe, age: 30"
match = re.search(r"(\w+) (\w+), age: (\d+)", txt)

if match:
    print("First Name:", match.group(1)) # John
    print("Last Name:", match.group(2))  # Doe
    print("Age:", match.group(3))        # 30

(The r before the string literal defines a raw string, which prevents Python from interpreting the backslashes as escape characters.)
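To illustrate the raw-string note, the sketch below compares a raw string with its manually escaped equivalent. The variable names `raw` and `escaped` are illustrative assumptions.

```python
import re

# A raw string keeps the backslash, so these two spellings are the same pattern
raw = r"\d+"
escaped = "\\d+"
same = raw == escaped
print(same)  # True

# In a normal string "\t" is one tab character; in a raw string it stays two characters
normal_len = len("\t")
raw_len = len(r"\t")
print(normal_len, raw_len)  # 1 2

# Either spelling finds the age in the example text
ages = re.findall(raw, "John Doe, age: 30")
print(ages)  # ['30']
```

Writing patterns as raw strings avoids doubling every backslash by hand.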
