Regular Expressions in Python
Regular expressions, or regex, are powerful tools for searching, matching, and manipulating text. Python’s re module lets you harness this power to validate inputs, extract data, and automate string operations efficiently.
What You’re Going to Learn
You’ll learn the basics of regular expressions in Python, including how to write patterns, use common symbols, and call the main functions of the re module.
What Is a Regular Expression?
A regular expression (or regex) is a special pattern of characters used to search, match, or manipulate text. Instead of scanning strings manually, you can describe what you're looking for using precise patterns — whether it's digits, email addresses, dates, or specific word formats.
Python comes with a built-in module called re that makes working with regular expressions both powerful and efficient.
Here's a basic example: searching for the word "hello" in a string.
import re
# Look for the word 'hello' in a string
match = re.search(r'hello', 'hello world')
if match:
    print("Found it!")
🔍 How It Works
- import re: Brings in Python’s regular expression module.
- r'hello': The pattern to search for. The r prefix makes it a raw string, so Python won’t treat backslashes (\) as escape characters.
- re.search(): Scans the string and returns a match object if the pattern is found anywhere in the text.
- match: If a match is found, this variable will be a match object — otherwise, it will be None.
- print("Found it!"): This runs only if a match is detected.
Output:
Found it!
This example shows just how easy it is to start using regex in Python — and how powerful it can be for finding patterns in text.
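To see what the r prefix actually changes, and what happens when nothing is found, here is a small sketch. The sample strings are arbitrary illustrations.

```python
import re

# A raw string keeps the backslash as-is: r'\d+' is the three characters \ d +
raw_pattern = r'\d+'
escaped_pattern = '\\d+'  # the same pattern written without the r prefix
print(raw_pattern == escaped_pattern)  # True

# re.search() returns None when the pattern does not occur, so check before using it
match = re.search(r'hello', 'goodbye world')
if match is None:
    print("No match")
```

Both spellings produce the same pattern; the raw string is simply easier to read once patterns contain several backslashes.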
Basic re Module Methods
Python’s re module provides powerful functions to work with regular expressions. These methods let you search, match, find all, split, and replace strings based on patterns. They’re essential for tasks like input validation, text parsing, and string manipulation.
Method | Details |
---|---|
re.search() | Searches for the first match anywhere in the string. Returns a match object or None. Example: re.search(r'\d+', 'Item 123') → match object |
re.match() | Checks for a match only at the beginning of the string. Returns a match object or None. Example: re.match(r'\d+', '123abc') → match object |
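re.findall() | Returns a list of all non-overlapping matches as strings. If the pattern contains capture groups, it returns the group values instead (tuples when there are several groups). Example: re.findall(r'\d+', 'abc123xyz456') → ['123', '456'] |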
re.finditer() | Returns an iterator yielding match objects for all matches. Example: [m.group() for m in re.finditer(r'\d+', 'abc123xyz456')] → ['123', '456'] |
re.sub() | Replaces matches with a replacement string (or function). Example: re.sub(r'\d+', '456', 'hello 123') → 'hello 456' |
re.split() | Splits string by regex pattern into a list. Example: re.split(r'[;,]', 'a,b;c') → ['a', 'b', 'c'] |
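re.compile() | Compiles a regex pattern into a Pattern object that you can reuse; it offers the same methods (search(), findall(), sub(), and so on). Useful when the same pattern is applied many times. Example: pattern = re.compile(r'\d+'); pattern.findall('a1b22') → ['1', '22'] |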
match_obj.group() | Extracts the matched text from a match object. Example: m = re.search(r'\d+', 'abc123'); m.group() → '123' |
match_obj.start() | Returns the start index of the match. Example: m.start() → 3 |
match_obj.end() | Returns the end index of the match. Example: m.end() → 6 |
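The difference between re.match() and re.search(), and the match-object methods from the table, can be seen side by side in this short sketch. The sample text is made up for illustration.

```python
import re

text = 'Item 123 costs 45'

# re.match() only tries position 0, and the string starts with 'I', not a digit
print(re.match(r'\d+', text))  # None

# re.search() scans ahead and stops at the first run of digits
m = re.search(r'\d+', text)
print(m.group(), m.start(), m.end())  # 123 5 8

# re.finditer() yields one match object per match, each with its own positions
for found in re.finditer(r'\d+', text):
    print(found.group(), found.span())  # 123 (5, 8), then 45 (15, 17)
```

m.span() is simply the pair (m.start(), m.end()), and the end index is exclusive, so text[m.start():m.end()] equals m.group().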
Example usage:
import re
pattern = r"\d+"
text = "There are 3 cats and 4 dogs."
matches = re.findall(pattern, text)
print(matches) # Output: ['3', '4']
To use regular expressions in Python, always start by importing the re module. Write patterns as raw strings (e.g., r"\d+") so backslashes reach the regex engine unchanged instead of being interpreted as Python escape sequences.
To learn more about the re module in depth, check out this full re module page.
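The re.compile(), re.split(), and re.sub() rows of the table can be tried together in a minimal sketch like the one below. The lambda that doubles numbers is only an illustration of passing a function to re.sub().

```python
import re

# Compile once and reuse the Pattern object; its methods mirror the module functions
number = re.compile(r'\d+')
print(number.findall('a1b22c333'))  # ['1', '22', '333']

# A character class lets re.split() handle several separators at once
parts = re.split(r'[;,\s]+', 'a, b;c  d')
print(parts)  # ['a', 'b', 'c', 'd']

# re.sub() can take a function; it receives each match object and returns the replacement
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'hello 123')
print(doubled)  # hello 246
```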
Basic Patterns
These are some essential regex characters you'll frequently use in Python (with the re module):
Pattern | Meaning & Examples |
---|---|
. | Matches any character (except newline by default) Pattern c.t Matched "cat", "cut" Not Matched "ct", "cart" |
^ | Matches the start of a string or line (with re.MULTILINE) Pattern ^a Matched "apple", "ant" Not Matched "bat", "maple" |
$ | Matches the end of a string or line (with re.MULTILINE) Pattern e$ Matched "toe", "apple" Not Matched "each", "tap" |
* | Matches 0 or more repetitions Pattern lo* Matched "loo", "lo", "l" Not Matched "snap", "pop" |
+ | Matches 1 or more repetitions Pattern lo+ Matched "loo", "lo" Not Matched "lap", "lip" |
? | Matches 0 or 1 repetition Pattern colou?r Matched "color", "colour" Not Matched "colouur", "colr" |
{n} | Matches exactly n times Pattern a{3} Matched "aaa" Not Matched "aa", "aaaa" (as whole strings; re.search() still finds "aaa" inside "aaaa") |
{n,} | Matches at least n times Pattern a{2,} Matched "aa", "aaa", "aaaa" Not Matched "a" |
{n,m} | Matches between n and m times Pattern a{2,4} Matched "aa", "aaa", "aaaa" Not Matched "a", "aaaaa" (as whole strings; re.search() still finds "aaaa" inside "aaaaa") |
[ae] | Matches one character in the set Pattern [ae] Matched "a", "e" Not Matched "i", "o" |
\| | Matches one pattern or another Pattern cat\|dog Matched "cat", "dog" Not Matched "bat", "fog" |
() | Groups pattern for capturing or applying quantifiers Pattern (ab)+ Matched "ab", "abab" Not Matched "a", "aba" (as whole strings; re.search() still finds "ab" inside "aba") |
\d | Matches any decimal digit (0–9, plus other Unicode digits in str patterns) Pattern \d Matched "1", "7" Not Matched "a", "-" |
\D | Matches any non-digit Pattern \D Matched "a", "@" Not Matched "0", "9" |
\w | Matches a word character (letters, digits, _; includes Unicode letters such as "é" in str patterns) Pattern \w Matched "a", "5", "_" Not Matched "!", " " |
\W | Matches non-word character Pattern \W Matched "@", "!" Not Matched "a", "9" |
\s | Matches whitespace character (space, tab, newline) Pattern \s Matched " ", "\n" Not Matched "a", "." |
\S | Matches non-whitespace character Pattern \S Matched "a", "@", "1" Not Matched " ", "\t" |
\b | Matches word boundary Pattern \bcat Matched "cat food", "the cat" Not Matched "concatenate", "educate" |
\B | Matches non-word boundary Pattern \Bcat Matched "concatenate", "educate" Not Matched "cat food", "the cat" |
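Several rows of the pattern table can be checked directly in Python. This sketch uses re.fullmatch() where the table's "Not Matched" column assumes the whole string must match, and re.search() where any position is fine; the sample words come from the table.

```python
import re

# Quantifiers: fullmatch() needs the entire string, search() accepts a match anywhere
print(bool(re.fullmatch(r'a{3}', 'aaaa')))  # False
print(bool(re.search(r'a{3}', 'aaaa')))     # True

# Optional character (?) and alternation (|)
spellings = [w for w in ['color', 'colour', 'colr'] if re.fullmatch(r'colou?r', w)]
print(spellings)  # ['color', 'colour']
print(re.findall(r'cat|dog', 'cat, dog, bat'))  # ['cat', 'dog']

# Word boundaries: \b requires a non-word character (or the string edge) before 'cat'
print(bool(re.search(r'\bcat', 'the cat')))      # True
print(bool(re.search(r'\bcat', 'concatenate')))  # False
print(bool(re.search(r'\Bcat', 'concatenate')))  # True

# Grouping with a quantifier: the group (ab) repeats one or more times
print(re.fullmatch(r'(ab)+', 'abab').group())  # abab
```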
Example 1: Validating an Email Address
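You can use the re module in Python to check whether a string looks like an email address. The re.match() function tests whether the pattern matches at the start of the string and returns a match object or None. The pattern below is a simple structural check, not a complete implementation of the email address standard.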
import re
email_pattern = r'^[^\s@]+@[^\s@]+\.[^\s@]+$'
email = "user@example.com"
if re.match(email_pattern, email):
    print("Valid email address")
else:
    print("Invalid email address")
How It Works:
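- r'^[^\s@]+@[^\s@]+\.[^\s@]+$': Regex pattern that checks the email structure: one or more characters that are neither whitespace nor @, then @, then a domain part that contains a literal . (written \.) with characters on both sides. The ^ and $ anchor the pattern to the start and end of the string.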
- re.match(): Function that checks if the pattern matches from the start of the string, returning a match object or None.
- "user@example.com": The input string being validated.
- If the string matches, "Valid email address" is printed; otherwise, "Invalid email address".
Output
Valid email address
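Running the same pattern against a few more inputs shows what it accepts and rejects. The candidate addresses are made-up examples, and the trailing-newline case illustrates a documented detail of $: it also matches just before a final newline, which re.fullmatch() does not allow.

```python
import re

# Same pattern as above; it only checks the rough shape name@domain.tld
email_pattern = r'^[^\s@]+@[^\s@]+\.[^\s@]+$'

candidates = ['user@example.com', 'user@example', 'user name@example.com', '@example.com']
results = {email: bool(re.match(email_pattern, email)) for email in candidates}
for email, ok in results.items():
    print(f'{email!r}: {"valid" if ok else "invalid"}')

# '$' also matches right before a trailing newline, so re.match() accepts this string;
# re.fullmatch() requires the match to cover the entire string and rejects it
print(bool(re.match(email_pattern, 'user@example.com\n')))      # True
print(bool(re.fullmatch(email_pattern, 'user@example.com\n')))  # False
```

If inputs may carry stray newlines (for example, lines read from a file), re.fullmatch() or stripping the input first avoids accepting them by accident.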
Example 2: Extracting All Numbers from a String
You can use the re.findall() function in Python to find all occurrences of a pattern in a string. This example extracts all the numbers from a given string.
import re
text = "There are 12 apples, 24 oranges, and 7 bananas."
numbers = re.findall(r'\d+', text)
print(numbers) # Output: ['12', '24', '7']
How It Works:
- r'\d+': Regex pattern that matches one or more digits.
- re.findall(): Finds all substrings in text matching the pattern and returns them as a list.
- The result is a list of strings representing all numbers found in the text.
Output
['12', '24', '7']
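Because re.findall() returns strings, and because capture groups change what it returns, a slightly extended sketch may help. The price string is a hypothetical example.

```python
import re

text = "There are 12 apples, 24 oranges, and 7 bananas."

# findall() returns strings; convert them before doing arithmetic
total = sum(int(n) for n in re.findall(r'\d+', text))
print(total)  # 43

# With one capture group, findall() returns only the group, not the whole match
prices = re.findall(r'\$(\d+\.\d+)', "Tea $3.50, cake $4.25")
print(prices)  # ['3.50', '4.25']

# With two or more groups, each item is a tuple of the group values
pairs = re.findall(r'(\d+) (\w+)', text)
print(pairs)  # [('12', 'apples'), ('24', 'oranges'), ('7', 'bananas')]
```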
Example 3: Replacing All Whitespace with a Dash
You can use the re.sub() function in Python to replace parts of a string matching a regex pattern. This example replaces all whitespace characters with a dash -.
import re
text = "Hello world! This is regex."
result = re.sub(r'\s+', '-', text)
print(result) # Output: Hello-world!-This-is-regex.
How It Works:
- r'\s+': Regex pattern matching one or more whitespace characters (spaces, tabs, newlines).
- re.sub(): Replaces all occurrences of the pattern in text with a dash -.
- The result is a string where all whitespace sequences are replaced by a single dash.
Output
Hello-world!-This-is-regex.
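re.sub() has a few more options that are handy in practice: a count limit, backreferences to captured groups in the replacement, and re.subn(), which also reports how many replacements were made. The 'admin@server' string is only an illustration.

```python
import re

text = "Hello world! This is regex."

# count limits the number of replacements (0, the default, replaces all)
limited = re.sub(r'\s+', '-', text, count=2)
print(limited)  # Hello-world!-This is regex.

# \1 and \2 in the replacement refer to the first and second captured groups
swapped = re.sub(r'(\w+)@(\w+)', r'\2 at \1', 'admin@server')
print(swapped)  # server at admin

# re.subn() returns a tuple: (new_string, number_of_substitutions)
result, n = re.subn(r'\s+', '-', text)
print(result, n)  # Hello-world!-This-is-regex. 4
```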
Frequently Asked Questions
What is regex in Python?
A regex (regular expression) is a sequence of characters that defines a search pattern for text. Python uses the re module to work with regex for searching, matching, and manipulating strings.
How do I use the re module in Python?
You need to import the re module first. Then you can use functions like re.match(), re.search(), re.findall(), and re.sub() to work with regular expressions.
What is the difference between re.match() and re.search()?
re.match() checks for a match only at the beginning of the string, while re.search() scans through the entire string and returns the first match found.
How can I use regex to find all matches in a string?
You can use re.findall() to return a list of all non-overlapping matches of a pattern in the string.
Can I use regex to replace text in Python?
Yes, the re.sub() function allows you to replace occurrences of a regex pattern with a specified replacement string.
What's Next?
Now that you've learned the basics of regular expressions in Python, you can explore asynchronous programming with async and await. This powerful feature helps you write efficient, non-blocking code for tasks like I/O operations, networking, and concurrency.