A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern. RegEx can be used to check if a string contains the specified search pattern.
Python has a built-in package called re, which can be used to work with Regular Expressions.
1. re module
To use regular expressions, import the module:
import re
txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
if x:
    print("YES! We have a match!")
2. Key Methods
The re module offers a set of functions for searching a string for matches, splitting it at matches, and replacing matched text.
findall()
Returns a list containing all matches. If no matches are found, an empty list is returned.
import re
txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x) # ['ai', 'ai']
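The empty-list behaviour described above can be checked with a short sketch. The search word "Portugal" and the variable names are illustrative choices, not part of the original example.

```python
import re

txt = "The rain in Spain"

# A pattern that occurs twice yields two entries, in order of appearance.
matches = re.findall(r"ai", txt)
print(matches)  # ['ai', 'ai']

# A pattern that never occurs yields an empty list rather than None,
# so the result can be tested or looped over without a None check.
no_matches = re.findall(r"Portugal", txt)
print(no_matches)  # []
if not no_matches:
    print("No match")
```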
search()
Searches the string for a match, and returns a Match object if there is a match. If there is more than one match, only the first occurrence is returned.
import re
txt = "The rain in Spain"
x = re.search(r"\s", txt)  # Search for whitespace; the raw string keeps \s intact
print("The first white-space character is located in position:", x.start())
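Because search() returns None when nothing matches, calling x.start() on a failed search raises an AttributeError. The sketch below shows one way to guard against that and to read the matched text and its position; the variable names and the word "Portugal" are illustrative.

```python
import re

txt = "The rain in Spain"

# search() stops at the first occurrence, even though "ai" appears twice.
m = re.search(r"ai", txt)
if m:
    print(m.group())  # ai
    print(m.span())   # (5, 7)

# When nothing matches, search() returns None, so test the result
# before calling any Match methods on it.
missing = re.search(r"Portugal", txt)
print(missing)  # None
```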
split()
Returns a list where the string has been split at each match.
import re
txt = "The rain in Spain"
x = re.split(r"\s", txt)  # Split at each whitespace character
print(x) # ['The', 'rain', 'in', 'Spain']
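Two details of split() are worth seeing in practice: the optional maxsplit argument limits the number of splits, and consecutive separators produce empty strings unless the pattern matches the whole run. This is a minimal sketch with illustrative variable names and sample text.

```python
import re

txt = "The rain in Spain"

# maxsplit=1 splits only at the first match and leaves the rest intact.
head_and_rest = re.split(r"\s", txt, maxsplit=1)
print(head_and_rest)  # ['The', 'rain in Spain']

# Two spaces in a row count as two separate matches for \s,
# so an empty string appears between them.
double_space = re.split(r"\s", "The  rain")
print(double_space)  # ['The', '', 'rain']

# \s+ treats a run of whitespace as a single separator.
collapsed = re.split(r"\s+", "The  rain")
print(collapsed)  # ['The', 'rain']
```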
sub()
Replaces the matches with the text of your choice.
import re
txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)  # Replace every whitespace character with 9
print(x) # The9rain9in9Spain
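sub() also accepts an optional count argument that caps the number of replacements. A short sketch, with illustrative variable names:

```python
import re

txt = "The rain in Spain"

# Without count, every match is replaced.
all_replaced = re.sub(r"\s", "9", txt)
print(all_replaced)  # The9rain9in9Spain

# count=2 replaces only the first two matches, scanning left to right.
first_two = re.sub(r"\s", "9", txt, count=2)
print(first_two)  # The9rain9in Spain
```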
3. Metacharacters
Metacharacters are characters with a special meaning:
| Character | Description | Example |
|---|---|---|
| [] | A set of characters | "[a-m]" |
| \ | Signals a special sequence (can also be used to escape special characters) | "\d" |
| . | Any character (except newline character) | "he..o" |
| ^ | Starts with | "^hello" |
| $ | Ends with | "planet$" |
| * | Zero or more occurrences | "he.*o" |
| + | One or more occurrences | "he.+o" |
| ? | Zero or one occurrence | "he.?o" |
| {} | Exactly the specified number of occurrences | "al{2}o" |
| \| | Either or | "falls\|stays" |
| () | Capture and group | "(\w+) (\w+)" |
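A few of the metacharacters in the table can be tried out as follows. The patterns come from the table; the sample strings and variable names are made up for illustration.

```python
import re

# [] matches one character from a set; here any lowercase letter from a to m.
in_set = re.findall(r"[a-m]", "regex")
print(in_set)  # ['e', 'g', 'e']

# ^ and $ anchor the pattern to the start and end of the string.
starts = bool(re.search(r"^hello", "hello planet"))
ends = bool(re.search(r"planet$", "hello planet"))
print(starts, ends)  # True True

# ? makes the preceding item optional (zero or one occurrence).
optional = re.findall(r"he.?o", "hello heo hero")
print(optional)  # ['heo', 'hero']

# {2} requires exactly two repetitions of the preceding item.
exact = re.findall(r"al{2}o", "hallo alo allo")
print(exact)  # ['allo', 'allo']

# | matches either alternative.
either = re.findall(r"falls|stays", "The rain falls, then stays")
print(either)  # ['falls', 'stays']

# * is greedy: it consumes as much text as it can while still matching.
greedy = re.findall(r"he.*o", "hello hero")
print(greedy)  # ['hello hero']
```

Note that the greedy `*` returns one long match rather than two short ones, which is a common surprise when a pattern contains `.*`.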
4. Special Sequences
A special sequence is a \ followed by one of the characters in the list below:
- \d: Matches any digit (0-9).
- \w: Matches any word character (letters a-z and A-Z, digits 0-9, and the underscore _).
- \s: Matches any whitespace character (such as a space, tab, or newline).
- \D, \W, \S: The opposites of the above: any non-digit, any non-word character, and any non-whitespace character, respectively.
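The special sequences can be compared side by side on one string. The sample text "Room 42, floor_3" and the variable names are illustrative assumptions.

```python
import re

txt = "Room 42, floor_3"

# \d matches a single digit; \d+ matches a run of digits.
digits = re.findall(r"\d", txt)
numbers = re.findall(r"\d+", txt)
print(digits)   # ['4', '2', '3']
print(numbers)  # ['42', '3']

# \w+ matches runs of letters, digits, and underscores.
words = re.findall(r"\w+", txt)
print(words)  # ['Room', '42', 'floor_3']

# \s matches each whitespace character.
spaces = re.findall(r"\s", txt)
print(spaces)  # [' ', ' ']

# \D+ is the opposite of \d+: runs of anything that is not a digit.
non_digits = re.findall(r"\D+", txt)
print(non_digits)  # ['Room ', ', floor_']
```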
5. Grouping
You can extract parts of the match by grouping them with parentheses ().
import re
txt = "John Doe, age: 30"
match = re.search(r"(\w+) (\w+), age: (\d+)", txt)
if match:
    print("First Name:", match.group(1))  # John
    print("Last Name:", match.group(2))   # Doe
    print("Age:", match.group(3))         # 30
(The r before the string literal defines a raw string, which prevents Python from interpreting the backslashes as escape characters.)
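As an extension of the example above, groups can also be read all at once with groups(), or given names with the (?P<name>...) syntax so they can be looked up by name instead of by position. The group names first, last, and age are illustrative choices.

```python
import re

txt = "John Doe, age: 30"

# (?P<name>...) captures like a normal group but also assigns a name.
pattern = r"(?P<first>\w+) (?P<last>\w+), age: (?P<age>\d+)"
m = re.search(pattern, txt)

if m:
    # group(0) is the entire matched text.
    print(m.group(0))        # John Doe, age: 30
    # groups() returns every captured group as a tuple of strings.
    print(m.groups())        # ('John', 'Doe', '30')
    # Named groups can be read by name or by number.
    print(m.group("first"))  # John
    # Captured values are always strings; convert when a number is needed.
    age = int(m.group("age"))
    print(age + 1)           # 31
```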