String Operations with NumPy np.char

NumPy’s char module (np.char) provides a set of element-wise string operations that work on entire arrays of strings. These functions let you manipulate, transform, and test strings without writing explicit Python loops.

Why Use np.char for Strings?

  • Vectorized syntax: Apply a string operation to every element of an array in a single call, with no explicit Python loop in your code.
  • Consistent syntax: Functions mirror Python’s native string methods (e.g., lower, split, replace).
  • Seamless integration: Easily combine with NumPy arrays and other numerical data.

Whether you're cleaning up data, parsing strings, or building text-based pipelines, np.char functions can make your string manipulation code shorter and more consistent.


Basic String Case Operations

NumPy's np.char functions allow you to easily change the case of strings within an array. These operations are vectorized, so they apply to every string element in the array.

python
import numpy as np

arr = np.array(["Hello WORLD", "NumPy STRING functions", "python is Fun"])

print("Original:", arr)

print("Lowercase:", np.char.lower(arr))
print("Uppercase:", np.char.upper(arr))
print("Capitalized:", np.char.capitalize(arr))
print("Title Case:", np.char.title(arr))

Common Case Functions

  • np.char.lower() – converts all characters to lowercase
  • np.char.upper() – converts all characters to uppercase
  • np.char.capitalize() – capitalizes the first letter of each string and lowercases the rest
  • np.char.title() – capitalizes the first letter of each word and lowercases the rest (so "NumPy" becomes "Numpy")

Output

Original: ['Hello WORLD' 'NumPy STRING functions' 'python is Fun']
Lowercase: ['hello world' 'numpy string functions' 'python is fun']
Uppercase: ['HELLO WORLD' 'NUMPY STRING FUNCTIONS' 'PYTHON IS FUN']
Capitalized: ['Hello world' 'Numpy string functions' 'Python is fun']
Title Case: ['Hello World' 'Numpy String Functions' 'Python Is Fun']

💡 Tip: These operations are especially useful for cleaning inconsistent text data in preprocessing steps.
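
As a rough illustration of that kind of cleanup, the sketch below normalizes a small array of labels so that differently typed versions of the same word compare equal. The array `raw` and its contents are made up for this example.

```python
import numpy as np

# Hypothetical raw labels with inconsistent casing and stray spaces
raw = np.array(["  Apple", "apple ", "APPLE", "Banana", "banana  "])

# Strip surrounding whitespace first, then lowercase every element
cleaned = np.char.lower(np.char.strip(raw))

# After normalization, variants of the same label collapse to one value
unique_labels = np.unique(cleaned)

print("Cleaned:", cleaned)
print("Unique:", unique_labels)
```

Nesting the calls works because each np.char function returns a new array that can be passed straight into the next one.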


Replacing and Splitting Strings

NumPy offers convenient functions for modifying string content within arrays. Whether you need to replace substrings or split text into parts, np.char provides efficient and vectorized tools for the job.

1. Replacing Substrings

Use np.char.replace() to replace occurrences of a substring with another. It works on each string element in the array.

python
import numpy as np

arr = np.array(["data-cleaning", "data-mining", "data-visualization"])

# Replace 'data-' with 'info-'
replaced = np.char.replace(arr, "data-", "info-")

print("Original:", arr)
print("Replaced:", replaced)

Output

Original: ['data-cleaning' 'data-mining' 'data-visualization']
Replaced: ['info-cleaning' 'info-mining' 'info-visualization']

2. Splitting Strings

Use np.char.split() to split strings based on a separator. This returns an object array in which each element is a Python list holding the parts of the corresponding string.

python
import numpy as np

arr = np.array(["first,last", "name,surname", "python,numpy"])

# Split by comma
split = np.char.split(arr, ",")

print("Split:", split)

Output

Split: [list(['first', 'last']) list(['name', 'surname']) list(['python', 'numpy'])]

💡 Tip: After splitting, you’ll get an array of lists. To work with individual elements, consider looping or converting to a different structure depending on your use case.
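
One possible way to work with the split result is sketched below. It assumes every string splits into the same number of parts, which holds for this sample data but may not hold for yours; the names `first_parts` and `table` are chosen for this example only.

```python
import numpy as np

arr = np.array(["first,last", "name,surname", "python,numpy"])
parts = np.char.split(arr, ",")  # object array of Python lists

# Pick the first piece of each list with a list comprehension
first_parts = np.array([p[0] for p in parts])

# When every list has the same length, the lists can be stacked
# into a regular 2-D string array (one row per original string)
table = np.array(parts.tolist())

print("First parts:", first_parts)
print("Table shape:", table.shape)
print("Second column:", table[:, 1])
```

If the strings split into different numbers of parts, the 2-D conversion is not appropriate and a plain loop over the lists is the safer choice.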


String Comparisons with NumPy

NumPy provides vectorized string comparison functions that let you compare entire arrays of strings efficiently. These functions behave similarly to Python’s native string comparison methods, but work element-wise over arrays.

1. Equality and Inequality Checks

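Use np.char.equal() and np.char.not_equal() to compare two string arrays element by element. According to the NumPy documentation, these functions strip trailing whitespace before comparing, so strings that differ only in trailing spaces are treated as equal.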

python
import numpy as np

arr1 = np.array(["sky", "land", "sea"])
arr2 = np.array(["sky", "water", "sea"])

equal = np.char.equal(arr1, arr2)
not_equal = np.char.not_equal(arr1, arr2)

print("Equal:", equal)
print("Not Equal:", not_equal)

Output

Equal: [ True False  True]
Not Equal: [False  True False]

2. Startswith and Endswith

You can use np.char.startswith() and np.char.endswith() to test whether strings start or end with specific substrings.

python
import numpy as np

arr = np.array(["filename.txt", "report.pdf", "image.png"])

starts_with_file = np.char.startswith(arr, "file")
ends_with_pdf = np.char.endswith(arr, ".pdf")

print("Starts with 'file':", starts_with_file)
print("Ends with '.pdf':", ends_with_pdf)

Output

Starts with 'file': [ True False False]
Ends with '.pdf': [False  True False]

💡 Tip: These functions are particularly useful for filtering or validating string-based data like filenames, tags, or labels.
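
Because these functions return boolean arrays, the results can be used directly as masks. The following sketch filters a made-up list of filenames by extension; `files` and the mask names are illustrative.

```python
import numpy as np

# Hypothetical filenames for illustration
files = np.array(["filename.txt", "report.pdf", "image.png", "notes.txt"])

# Boolean mask: True where the name ends with ".txt"
txt_mask = np.char.endswith(files, ".txt")

# Boolean indexing keeps only the matching elements
txt_files = files[txt_mask]

# Masks can be inverted with ~ or combined with | and &
other_files = files[~txt_mask]
docs = files[txt_mask | np.char.endswith(files, ".pdf")]

print("Text files:", txt_files)
print("Other files:", other_files)
print("Documents:", docs)
```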


Stripping and Joining Strings

NumPy provides string manipulation tools to clean and concatenate string arrays efficiently. With np.char.strip() and np.char.join(), you can remove unwanted characters or combine elements in a structured way.

1. Stripping Whitespace or Characters

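Use np.char.strip(), np.char.lstrip(), and np.char.rstrip() to remove characters from both ends, the left, or the right of each string. By default they remove whitespace; pass a string of characters as the second argument to remove those characters instead.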

python
import numpy as np

arr = np.array(["  hello ", "  world", "python  "])

stripped = np.char.strip(arr)
left_stripped = np.char.lstrip(arr)
right_stripped = np.char.rstrip(arr)

print("Original:", arr)
print("Stripped:", stripped)
print("Left Stripped:", left_stripped)
print("Right Stripped:", right_stripped)

Output

Original: ['  hello ' '  world' 'python  ']
Stripped: ['hello' 'world' 'python']
Left Stripped: ['hello ' 'world' 'python  ']
Right Stripped: ['  hello' '  world' 'python']
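
The example above only removes whitespace. As a small sketch of the character-stripping case, the snippet below passes a set of characters as the second argument; the sample strings are invented for this illustration.

```python
import numpy as np

# Hypothetical strings wrapped in decoration characters
arr = np.array(["##title##", "**note", "value--"])

# Remove any leading or trailing '#', '*', or '-' characters
trimmed = np.char.strip(arr, "#*-")

# lstrip only touches the left side, and only the listed characters
left_only = np.char.lstrip(arr, "#")

print("Trimmed:", trimmed)
print("Left only:", left_only)
```

The second argument is treated as a set of characters to remove, not as a prefix or suffix to match, which mirrors Python's str.strip().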

2. Joining Strings

Use np.char.add() to concatenate string arrays element-wise, and np.char.join() to insert a separator between characters of each string.

python
import numpy as np

arr1 = np.array(["data", "machine"])
arr2 = np.array(["science", "learning"])

added = np.char.add(arr1, arr2)
joined = np.char.join("-", arr1)

print("Added:", added)
print("Joined with hyphen:", joined)

Output

Added: ['datascience' 'machinelearning']
Joined with hyphen: ['d-a-t-a' 'm-a-c-h-i-n-e']

💡 Tip: Use np.char.add() for combining arrays of strings, and np.char.join() for formatting individual string elements with custom separators.
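
If you want a separator between the two parts rather than a plain concatenation, one option is to chain two np.char.add() calls, as in this sketch. The variable names are illustrative, and it relies on np.char.add() broadcasting a single string against the array.

```python
import numpy as np

first = np.array(["data", "machine"])
second = np.array(["science", "learning"])

# Append a space to every element of `first`, then add `second`
spaced = np.char.add(np.char.add(first, " "), second)

print("Spaced:", spaced)
```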


Frequently Asked Questions

What is the purpose of np.char in NumPy?

The np.char module is used for vectorized string operations in NumPy. It provides efficient functions for string manipulations like case transformations, string replacements, and more.


How can I convert all strings in a NumPy array to uppercase?

Use np.char.upper() to convert all strings in the NumPy array to uppercase. For example: np.char.upper(arr).


How do I replace a substring in all elements of a NumPy string array?

You can replace substrings using np.char.replace(). For example, np.char.replace(arr, 'old', 'new') replaces 'old' with 'new' in all strings of the array.


Can I check if a string contains a substring in a NumPy array?

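Yes, you can use np.char.find() to check for substrings in a string array. It returns the index of the first occurrence in each string, or -1 if the substring is not found, so comparing the result with 0 (for example, np.char.find(arr, 'sub') >= 0) gives a boolean array.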
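
A minimal sketch of that check, using made-up sample strings, might look like this:

```python
import numpy as np

# Hypothetical sample strings
arr = np.array(["numpy array", "python list", "pandas frame"])

# Index of the first "py" in each string, or -1 when absent
positions = np.char.find(arr, "py")

# Turn the indices into a True/False "contains" mask
contains_py = positions >= 0

print("Positions:", positions)
print("Contains 'py':", contains_py)
print("Matches:", arr[contains_py])
```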


How do I join strings from a NumPy array?

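You can use np.char.add() to concatenate two string arrays element-wise. Note that np.char.join() does not merge the elements of an array; it inserts a separator between the characters of each string. To combine all elements of an array into a single string, use Python's str.join(), for example ' '.join(arr).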
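
The difference between joining the elements of an array and joining the characters of each element is easy to mix up, so here is a short sketch that shows both side by side. The array `words` is invented for this example.

```python
import numpy as np

words = np.array(["numpy", "string", "tools"])

# Python's str.join merges all array elements into one string
sentence = " ".join(words)

# np.char.join works inside each element: the separator goes
# between the characters of every individual string
per_element = np.char.join("-", words)

print("Sentence:", sentence)
print("Per element:", per_element)
```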


How do I split a string in a NumPy array?

Use np.char.split() to split strings in a NumPy array. For example, np.char.split(arr, ' ') will split strings by spaces.


How do I check the length of each string in a NumPy array?

You can check the length of each string using np.char.str_len(). Example: np.char.str_len(arr) gives you the length of each string in the array.
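
As a brief sketch, the lengths can also be used to filter the array, for instance to drop empty strings. The sample data here is hypothetical.

```python
import numpy as np

# Hypothetical sample with one empty string
words = np.array(["sky", "water", ""])

# Number of characters in each element
lengths = np.char.str_len(words)

# Keep only the non-empty strings
non_empty = words[lengths > 0]

print("Lengths:", lengths)
print("Non-empty:", non_empty)
```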


Can I modify strings in a NumPy array directly using np.char?

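Not in place. np.char functions such as np.char.replace() and np.char.upper() return a new array and leave the original unchanged, so assign the result to a variable (or back to the same name) to keep it. Also keep in mind that NumPy string arrays have a fixed width: assigning a longer string into an existing array truncates it to fit.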
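
The sketch below illustrates both points with a tiny array: the original stays untouched after calling an np.char function, and a direct assignment of a longer string is cut to the array's width. The names `upper` and `short` are chosen for this example.

```python
import numpy as np

arr = np.array(["sky", "sea"])  # fixed-width Unicode dtype, 3 characters

# np.char.upper returns a new array; `arr` itself is not modified
upper = np.char.upper(arr)
print("Original:", arr)
print("Upper:", upper)

# Direct assignment into a fixed-width array truncates long strings
short = arr.copy()
short[0] = "mountain"
print("Truncated:", short[0])  # only the first 3 characters are kept
```

If longer values are expected, one option is to create the array with a wider dtype up front (for example dtype="U20") before assigning into it.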



What's Next?

Up next, we’ll dive into datetime64 and timedelta64 in NumPy — powerful tools for handling and manipulating dates and times in arrays. You’ll learn how to work with date and time data, perform arithmetic on dates, and format time intervals in your NumPy arrays.