Regular Expressions in Python

Regular expressions, or regex, are powerful tools for searching, matching, and manipulating text. Python’s re module lets you harness this power to validate inputs, extract data, and automate string operations efficiently.



What You’re Going to Learn

You’ll learn the basics of regular expressions in Python, including how to write patterns, use common symbols, and apply the re module's main functions to everyday text tasks.


What Is a Regular Expression?

A regular expression (or regex) is a special pattern of characters used to search, match, or manipulate text. Instead of scanning strings manually, you can describe what you're looking for using precise patterns — whether it's digits, email addresses, dates, or specific word formats.

Python comes with a built-in module called re that makes working with regular expressions both powerful and efficient.

Here's a basic example: searching for the word "hello" in a string.

python
import re

# Look for the word 'hello' in a string
match = re.search(r'hello', 'hello world')

if match:
    print("Found it!")

🔍 How It Works

  • import re: Brings in Python’s regular expression module.
  • r'hello': The pattern to search for. The r prefix makes it a raw string, so Python won’t treat backslashes (\) as the start of escape sequences.
  • re.search(): Scans the string and returns a match object if the pattern is found anywhere in the text.
  • match: If a match is found, this variable will be a match object — otherwise, it will be None.
  • print("Found it!"): This runs only if a match is detected.

Output:

Found it!

This example shows just how easy it is to start using regex in Python — and how powerful it can be for finding patterns in text.
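
To see why the raw-string prefix matters, here is a small sketch comparing a raw and a non-raw version of the same pattern. The sample text is made up for illustration.

```python
import re

# In a normal string, '\b' is the backspace character (one character).
# In a raw string, r'\b' is a backslash followed by 'b' (two characters),
# which the regex engine reads as a word boundary.
plain_len = len('\b')
raw_len = len(r'\b')

text = 'the cat sat'
raw_match = re.search(r'\bcat', text)    # word boundary, then "cat"
plain_match = re.search('\bcat', text)   # backspace character, then "cat"

print(plain_len, raw_len)
print(raw_match is not None)
print(plain_match is not None)
```

The raw pattern finds "cat", while the non-raw one returns None because the text contains no backspace character.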


Basic re Module Methods:

Python’s re module provides powerful functions to work with regular expressions. These methods let you search, match, find all, split, and replace strings based on patterns. They’re essential for tasks like input validation, text parsing, and string manipulation.

  • re.search(): Searches for the first match anywhere in the string. Returns a match object or None. Example: re.search(r'\d+', 'Item 123') → match object
  • re.match(): Checks for a match only at the beginning of the string. Returns a match object or None. Example: re.match(r'\d+', '123abc') → match object
  • re.findall(): Returns a list of all non-overlapping matches as strings. Example: re.findall(r'\d+', 'abc123xyz456') → ['123', '456']
  • re.finditer(): Returns an iterator yielding match objects for all matches. Example: [m.group() for m in re.finditer(r'\d+', 'abc123xyz456')] → ['123', '456']
  • re.sub(): Replaces matches with a replacement string (or the result of a function). Example: re.sub(r'\d+', '456', 'hello 123') → 'hello 456'
  • re.split(): Splits a string by a regex pattern into a list. Example: re.split(r'[;,]', 'a,b;c') → ['a', 'b', 'c']
  • re.compile(): Compiles a regex pattern into a regex object for repeated use (more efficient). Example: pattern = re.compile(r'\d+')
  • match_obj.group(): Extracts the matched text from a match object. Example: m = re.search(r'\d+', 'abc123'); m.group() → '123'
  • match_obj.start(): Returns the start index of the match. Example: m.start() → 3
  • match_obj.end(): Returns the end index of the match (one past the last matched character). Example: m.end() → 6

Example usage:

python
import re

pattern = r"\d+"
text = "There are 3 cats and 4 dogs."

matches = re.findall(pattern, text)
print(matches)  # Output: ['3', '4']

To use regular expressions in Python, always start by importing the re module. Write your patterns as raw strings (e.g., r"\d+") so that backslashes don't need to be doubled.
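
As a further sketch, the functions above can be combined: compile a pattern once, then reuse it to locate matches and to split the text. The variable names here are illustrative, and the sample string is the one used in the findall() example.

```python
import re

# Compile once, then reuse the pattern object's methods.
number = re.compile(r'\d+')
text = 'abc123xyz456'

# finditer() yields match objects, so positions are available too.
found = [(m.group(), m.start(), m.end()) for m in number.finditer(text)]
print(found)

# split() on the same pattern keeps the text between the numbers.
parts = number.split(text)
print(parts)
```

Note that split() returns a trailing empty string here, because the text ends with a match and nothing follows it.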

To learn more about the re module in depth, check out this full re module page.


Basic Patterns

These are some essential regex characters you'll frequently use in Python (with the re module):

  • .: Matches any character (except newline by default)
      Pattern: c.t
      Matched: "cat", "cut"
      Not matched: "ct", "cart"
  • ^: Matches the start of the string (or of each line with re.MULTILINE)
      Pattern: ^a
      Matched: "apple", "ant"
      Not matched: "bat", "maple"
  • $: Matches the end of the string (or of each line with re.MULTILINE)
      Pattern: e$
      Matched: "toe", "apple"
      Not matched: "each", "tap"
  • *: Matches 0 or more repetitions
      Pattern: lo*
      Matched: "loo", "lo", "l"
      Not matched: "snap", "pop"
  • +: Matches 1 or more repetitions
      Pattern: lo+
      Matched: "loo", "lo"
      Not matched: "lap", "lip"
  • ?: Matches 0 or 1 repetition
      Pattern: colou?r
      Matched: "color", "colour"
      Not matched: "colouur", "colr"
  • {n}: Matches exactly n times
      Pattern: a{3}
      Matched: "aaa"
      Not matched: "aa", "aaaa"
  • {n,}: Matches at least n times
      Pattern: a{2,}
      Matched: "aa", "aaa", "aaaa"
      Not matched: "a"
  • {n,m}: Matches between n and m times
      Pattern: a{2,4}
      Matched: "aa", "aaa", "aaaa"
      Not matched: "a", "aaaaa"
  • [ae]: Matches one character in the set
      Pattern: [ae]
      Matched: "a", "e"
      Not matched: "i", "o"
  • |: Matches one pattern or another
      Pattern: cat|dog
      Matched: "cat", "dog"
      Not matched: "bat", "fog"
  • (): Groups a pattern for capturing or for applying quantifiers
      Pattern: (ab)+
      Matched: "ab", "abab"
      Not matched: "a", "aba"
  • \d: Matches any digit (0–9)
      Pattern: \d
      Matched: "1", "7"
      Not matched: "a", "-"
  • \D: Matches any non-digit
      Pattern: \D
      Matched: "a", "@"
      Not matched: "0", "9"
  • \w: Matches a word character (a-z, A-Z, 0-9, _)
      Pattern: \w
      Matched: "a", "5", "_"
      Not matched: "!", " "
  • \W: Matches a non-word character
      Pattern: \W
      Matched: "@", "!"
      Not matched: "a", "9"
  • \s: Matches a whitespace character (space, tab, newline)
      Pattern: \s
      Matched: " ", "\n"
      Not matched: "a", "."
  • \S: Matches a non-whitespace character
      Pattern: \S
      Matched: "a", "@", "1"
      Not matched: " ", "\t"
  • \b: Matches a word boundary
      Pattern: \bcat
      Matched: "cat food", "the cat"
      Not matched: "concatenate", "educate"
  • \B: Matches a non-word boundary
      Pattern: \Bcat
      Matched: "concatenate", "educate"
      Not matched: "cat food", "the cat"

In the quantifier and grouping examples ({n}, {n,m}, and (ab)+), "Not matched" means the whole string does not fit the pattern, as with re.fullmatch(). re.search() would still find a{3} inside "aaaa" and (ab)+ inside "aba".

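A quick way to check examples like these is to run them. The sketch below uses re.fullmatch() for the quantifier cases and re.search() for the boundary cases. The sample strings come from the examples above, apart from the two-line string in the last step, which is made up for illustration.

```python
import re

# fullmatch() requires the whole string to fit the pattern.
exact_three = [s for s in ['aa', 'aaa', 'aaaa'] if re.fullmatch(r'a{3}', s)]
two_to_four = [s for s in ['a', 'aa', 'aaaa', 'aaaaa'] if re.fullmatch(r'a{2,4}', s)]

# search() only needs the pattern to appear somewhere in the string.
search_hit = re.search(r'a{3}', 'aaaa') is not None

# \b needs a word boundary before "cat"; \B needs there to be none.
words = ['cat food', 'the cat', 'concatenate', 'educate']
boundary = [w for w in words if re.search(r'\bcat', w)]
no_boundary = [w for w in words if re.search(r'\Bcat', w)]

# With re.MULTILINE, ^ matches at the start of every line.
line_starts = re.findall(r'^\w+', 'apple pie\nant hill', flags=re.MULTILINE)

print(exact_three, two_to_four, search_hit)
print(boundary, no_boundary)
print(line_starts)
```

Choosing between fullmatch() and search() is the main design decision here: the first answers "does the whole string have this shape?", the second "does this shape occur anywhere?".
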
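Example 1: Validating an Email Address

You can use the re module in Python to check whether a string has a valid email format. The re.match() function tests whether the pattern matches at the start of the string and returns a match object or None. The pattern below is deliberately loose: it checks the overall shape of the address, not every rule a real-world address must follow.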

python
import re

email_pattern = r'^[^\s@]+@[^\s@]+\.[^\s@]+$'
email = "user@example.com"

if re.match(email_pattern, email):
    print("Valid email address")
else:
    print("Invalid email address")

How It Works:

  • r'^[^\s@]+@[^\s@]+\.[^\s@]+$': Regex pattern that checks the email structure: one or more characters before and after @, and a . in the domain part.
  • re.match(): Function that checks if the pattern matches from the start of the string, returning a match object or None.
  • "user@example.com": The input string being validated.
  • If the string matches, "Valid email address" is printed; otherwise, "Invalid email address".

Output

Valid email address
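
If you need to check several addresses, one option is to wrap the check in a small helper. The following is a minimal sketch, not a complete validator: the function name looks_like_email and the sample addresses are hypothetical, and it swaps re.match() with ^ and $ for re.fullmatch(), which anchors at both ends on its own.

```python
import re

# Same loose shape as above: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r'[^\s@]+@[^\s@]+\.[^\s@]+')

def looks_like_email(address):
    """Return True if the whole string fits the loose email shape."""
    # fullmatch() anchors at both ends, so ^ and $ are not needed.
    return EMAIL_PATTERN.fullmatch(address) is not None

# Hypothetical sample inputs for illustration.
samples = ['user@example.com', 'user@example', 'user name@example.com', '@example.com']
results = {s: looks_like_email(s) for s in samples}
print(results)
```

Only the first sample passes. One practical difference from the re.match() version: $ also matches just before a trailing newline, so "user@example.com\n" would pass there, while fullmatch() rejects it.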

Example 2: Extracting All Numbers from a String

You can use the re.findall() function in Python to find all occurrences of a pattern in a string. This example extracts all the numbers from a given string.

python
import re

text = "There are 12 apples, 24 oranges, and 7 bananas."
numbers = re.findall(r'\d+', text)

print(numbers)  # Output: ['12', '24', '7']

How It Works:

  • r'\d+': Regex pattern that matches one or more digits.
  • re.findall(): Finds all substrings in text matching the pattern and returns them as a list.
  • The result is a list of strings representing all numbers found in the text.

Output

['12', '24', '7']
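
Because re.findall() returns strings, a common follow-up step is converting them to integers before doing any arithmetic. A short sketch using the same text:

```python
import re

text = "There are 12 apples, 24 oranges, and 7 bananas."

# findall() returns strings, so convert before doing arithmetic.
numbers = [int(n) for n in re.findall(r'\d+', text)]
total = sum(numbers)

print(numbers)  # [12, 24, 7]
print(total)    # 43
```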

Example 3: Replacing All Whitespace with a Dash

You can use the re.sub() function in Python to replace parts of a string matching a regex pattern. This example replaces all whitespace characters with a dash -.

python
import re

text = "Hello   world! This is   regex."
result = re.sub(r'\s+', '-', text)

print(result)  # Output: Hello-world!-This-is-regex.

How It Works:

  • r'\s+': Regex pattern matching one or more whitespace characters (spaces, tabs, newlines).
  • re.sub(): Replaces all occurrences of the pattern in text with a dash -.
  • The result is a string where all whitespace sequences are replaced by a single dash.

Output

Hello-world!-This-is-regex.
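
re.sub() also accepts a count argument and a function as the replacement. The sketch below shows both on the same text; the helper name shout is made up for illustration.

```python
import re

text = "Hello   world! This is   regex."

# count=1 limits the replacement to the first match only.
first_only = re.sub(r'\s+', '-', text, count=1)

# A function receives each match object and returns the replacement text.
def shout(match):
    return match.group().upper()

# Upper-case only the words written entirely in lowercase.
upper_words = re.sub(r'\b[a-z]+\b', shout, text)

print(first_only)
print(upper_words)
```

A replacement function is useful when the new text depends on what was matched, which a fixed replacement string cannot express.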

Frequently Asked Questions

What is regex in Python?

A regex (regular expression) is a sequence of characters that defines a pattern to match in text. Python uses the re module to work with regex for searching, matching, and manipulating strings.


How do I use the re module in Python?

You need to import the re module first. Then you can use functions like re.match(), re.search(), re.findall(), and re.sub() to work with regular expressions.


What is the difference between re.match() and re.search()?

re.match() checks for a match only at the beginning of the string, while re.search() scans through the entire string and returns the first match found.
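
A small sketch of the difference, using the sample strings from the methods list earlier in this article:

```python
import re

text = 'Item 123'

at_start = re.match(r'\d+', text)       # None: the string starts with "I"
anywhere = re.search(r'\d+', text)      # finds "123" later in the string
prefix_ok = re.match(r'\d+', '123abc')  # matches "123" at the start

print(at_start)
print(anywhere.group())
print(prefix_ok.group())
```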


How can I use regex to find all matches in a string?

You can use re.findall() to return a list of all non-overlapping matches of a pattern in the string.
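
One detail worth knowing: when the pattern contains capturing groups, re.findall() returns the groups instead of the full match. A brief sketch, with a made-up sample string:

```python
import re

text = 'width=40, height=25'

# No groups: each item is the full matched text.
whole = re.findall(r'\w+=\d+', text)

# Two capturing groups: each item is a (name, value) tuple.
pairs = re.findall(r'(\w+)=(\d+)', text)

print(whole)
print(pairs)
```

If you need both the full match and its groups, re.finditer() gives you match objects with access to each.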


Can I use regex to replace text in Python?

Yes, the re.sub() function allows you to replace occurrences of a regex pattern with a specified replacement string.



What's Next?

Now that you've learned the basics of regular expressions in Python, you can explore asynchronous programming with async and await. This powerful feature helps you write efficient, non-blocking code for tasks like I/O operations, networking, and concurrency.