NumPy String Functions with Examples

In this tutorial, we will be diving deep into the world of strings and characters. NumPy is primarily known for its powerful array manipulation capabilities, especially when it comes to numerical data. However, NumPy also supports string operations for several reasons:

Unified Data Manipulation: NumPy aims to provide a unified framework for handling different types of data, including numerical data and strings. This makes it convenient to work with mixed data types in a consistent manner.

Data Preprocessing: In data analysis and preprocessing, strings play a significant role. For instance, when dealing with data containing textual information, you might need to clean, tokenize, and process the text before analysis. NumPy’s string functions can assist in these tasks.

Array-Like Operations: NumPy’s string functions accept arrays of strings and apply the operation element-wise. This means a single call can process an entire array of strings, with no explicit loop in your code.

Efficiency: NumPy is optimized for numerical computations, and its string functions let you express an operation over a whole array in one call instead of a hand-written Python loop. Whether this is actually faster than plain Python string methods depends on the NumPy version and the operation, so it is worth measuring on your own data.

Integration with Numerical Data: Often, data contains a mix of numerical and textual information. NumPy’s string capabilities allow you to work seamlessly with both types of data within the same framework.

Apart from its core functionalities, NumPy also offers a range of string manipulation functions that are quite handy for data processing tasks. In this tutorial, we’ll cover some of the basic string functions that NumPy provides, and we’ll walk through examples to illustrate their usage. So, let’s dive in!

Let’s go through the string functions in NumPy, which are available in the numpy.char module:

add()

The add() function concatenates corresponding elements of arrays, making it a handy tool for combining strings.
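To see the element-wise behaviour on real arrays, here is a minimal sketch. The array contents are made-up sample values; the second call assumes ordinary NumPy broadcasting, so a single string is paired with every element.

```python
import numpy as np

# Hypothetical sample data for illustration
first = np.array(["Data", "Num", "Sci"])
second = np.array(["Flair", "Py", "Py"])

# Element-wise concatenation: first[i] + second[i]
combined = np.char.add(first, second)
print(combined)  # ['DataFlair' 'NumPy' 'SciPy']

# A single string is broadcast against every element
with_ext = np.char.add(first, ".txt")
print(with_ext)  # ['Data.txt' 'Num.txt' 'Sci.txt']
```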

multiply()

The multiply() function repeats each string the specified number of times. For instance, multiplying the string ‘hello’ by 3 results in ‘hellohellohello’; no separator is inserted between the copies.
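A short sketch of multiply() on an array, using made-up sample strings. The second call assumes the repeat count may itself be an array, giving a different count per element.

```python
import numpy as np

# Hypothetical sample data for illustration
words = np.array(["ab", "-", "xyz"])

# Repeat every element three times
repeated = np.char.multiply(words, 3)
print(repeated.tolist())  # ['ababab', '---', 'xyzxyzxyz']

# One repeat count per element; a count of 0 yields an empty string
per_element = np.char.multiply(words, [1, 2, 0])
print(per_element.tolist())  # ['ab', '--', '']
```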

center()

With center(), you can center-align a string within a specified width. The extra space on both sides is filled with the specified fill character (a space by default).
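As a quick illustration, the sketch below centers strings of different lengths in the same width. The sample strings are arbitrary, and the width of 7 was chosen so the padding splits evenly on both sides.

```python
import numpy as np

# Hypothetical sample data for illustration
names = np.array(["a", "abc", "abcde"])

# Pad every element to width 7, filling with '*'
centered = np.char.center(names, 7, "*")
print(centered)  # ['***a***' '**abc**' '*abcde*']
```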

capitalize()

The capitalize() function converts the first character of a string to uppercase and the remaining characters to lowercase.

title()

Converting a string into a title case, where the first letter of each word is capitalized, is made easy with the title() function.
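The difference between capitalize() and title() is easiest to see side by side. This is a small sketch with made-up mixed-case input:

```python
import numpy as np

# Hypothetical sample data for illustration
headlines = np.array(["hello wORLD", "numPy strings"])

# capitalize(): first character upper, everything else lower
capitalized = np.char.capitalize(headlines)
print(capitalized.tolist())  # ['Hello world', 'Numpy strings']

# title(): first character of every word upper, the rest lower
titled = np.char.title(headlines)
print(titled.tolist())  # ['Hello World', 'Numpy Strings']
```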

lower()

To convert all characters in a string to lowercase, use the lower() function.

upper()

Conversely, the upper() function converts all characters to uppercase.
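One common use of lower() and upper() is normalizing labels before comparing them. The following is a sketch of that idea; the labels are invented for the example.

```python
import numpy as np

# Hypothetical survey answers with inconsistent casing
answers = np.array(["Yes", "YES", "no", "yEs"])

# Normalize the case, then compare element-wise against one value
normalized = np.char.lower(answers)
is_yes = normalized == "yes"
print(is_yes.tolist())  # [True, True, False, True]

shouted = np.char.upper(answers)
print(shouted.tolist())  # ['YES', 'YES', 'NO', 'YES']
```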

split()

To split each string into a list of words, the split() function comes in handy. By default it splits on whitespace; you can also pass your own separator.
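Because each element can produce a different number of pieces, split() returns an object array whose elements are Python lists. Here is a minimal sketch with made-up comma-separated rows:

```python
import numpy as np

# Hypothetical comma-separated records
rows = np.array(["a,b,c", "d,e"])

# Split every element on ','
parts = np.char.split(rows, sep=",")

print(parts.dtype)  # object
print(parts[0])     # ['a', 'b', 'c']
print(parts[1])     # ['d', 'e']
```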

splitlines()

Breaking a string into lines is done effortlessly with the splitlines() function.
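The sketch below applies splitlines() to an array and uses the optional keepends argument to retain the newline characters. The sample text is made up.

```python
import numpy as np

# Hypothetical multi-line text entries
notes = np.array(["one\ntwo", "three"])

# Default: line endings are dropped
lines = np.char.splitlines(notes)
print(lines[0])  # ['one', 'two']
print(lines[1])  # ['three']

# keepends=True keeps the '\n' at the end of each line
with_ends = np.char.splitlines(notes, keepends=True)
print(with_ends[0])  # ['one\n', 'two']
```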

strip()

To remove leading and trailing whitespace from a string, use the strip() function. You can optionally pass the set of characters to remove instead of whitespace.
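Here is a small sketch of strip() on an array, first with the default whitespace behaviour and then with an explicit set of characters. The input values are invented for illustration.

```python
import numpy as np

# Hypothetical messy input values
raw = np.array(["  alpha ", "\tbeta\n", "--gamma--"])

# Default: remove leading and trailing whitespace only
cleaned = np.char.strip(raw)
print(cleaned.tolist())  # ['alpha', 'beta', '--gamma--']

# Remove leading and trailing '-' characters instead
no_dashes = np.char.strip(raw, "-")
print(no_dashes.tolist())  # ['  alpha ', '\tbeta\n', 'gamma']
```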

join()

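The join() function works element-wise: each string in the array is treated as a sequence of characters, and the specified separator is inserted between those characters. It does not merge the separate elements of an array into one string.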
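This behaviour can be surprising, so here is a short sketch that contrasts np.char.join with Python’s built-in str.join. Using str.join to merge the elements is one possible approach, not something the NumPy function does for you.

```python
import numpy as np

# Hypothetical sample data for illustration
words = np.array(["Data", "Flair"])

# np.char.join puts the separator between the characters of each element
spaced = np.char.join("-", words)
print(spaced)  # ['D-a-t-a' 'F-l-a-i-r']

# To merge the elements themselves, plain Python str.join is enough
merged = " ".join(words.tolist())
print(merged)  # Data Flair
```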

replace()

Replacing occurrences of a substring in a string is achieved with the replace() function.
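As a sketch, the example below replaces a separator in every element and then uses the optional count argument to limit the number of replacements per element. The path-like strings are made up.

```python
import numpy as np

# Hypothetical path-like strings
paths = np.array(["a/b/c", "x/y"])

# Replace every '/' with '.'
dotted = np.char.replace(paths, "/", ".")
print(dotted)  # ['a.b.c' 'x.y']

# Replace only the first occurrence in each element
first_only = np.char.replace(paths, "/", ".", count=1)
print(first_only)  # ['a.b/c' 'x.y']
```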

decode()

The decode() function decodes an array of byte strings element-wise into text using the specified codec.

encode()

Conversely, the encode() function encodes each text string element-wise into bytes using the specified codec.
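A round trip makes the relationship between the two functions concrete. This sketch assumes UTF-8 and uses sample words with non-ASCII characters so the encoding step is visible in the dtype.

```python
import numpy as np

# Hypothetical text containing non-ASCII characters
text = np.array(["café", "naïve"])

# str -> bytes: the result is a byte-string array (dtype kind 'S')
encoded = np.char.encode(text, "utf-8")
print(encoded.dtype.kind)  # S

# bytes -> str: decoding with the same codec restores the original text
decoded = np.char.decode(encoded, "utf-8")
print(decoded.tolist())  # ['café', 'naïve']
```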

Examples

import numpy as np

# Example Strings
string1 = np.array("Data")
string2 = np.array("Flair")
sentence = np.array("  dataflair tutorials  ")

# add()
concatenated = np.char.add(string1, string2)
print("add():", concatenated)  # Output- add(): DataFlair

# multiply()
multiplied = np.char.multiply("DataFlair ", 3)
print("multiply():", multiplied)  # Output-  multiply(): DataFlair DataFlair DataFlair

# center()
centered = np.char.center("DataFlair", 20, "*")
print("center():", centered)  # Output- center(): *****DataFlair******

# capitalize()
capitalized = np.char.capitalize("dataflair")
print("capitalize():", capitalized)  # Output- capitalize(): Dataflair

# title()
title_case = np.char.title("dataflair tutorials")
print("title():", title_case)  # Output-  title(): Dataflair Tutorials

# lower()
lower_case = np.char.lower("DATAFLAIR")
print("lower():", lower_case)  # Output-  lower(): dataflair

# upper()
upper_case = np.char.upper("dataflair")
print("upper():", upper_case)  # Output-  upper(): DATAFLAIR

# split() - with no separator given, splits on whitespace and drops empty strings
words = np.char.split(sentence)
print("split():", words)  # Output-  split(): ['dataflair', 'tutorials']

# splitlines()
lines = np.char.splitlines("line 1\nline 2\nline 3")
print("splitlines():", lines)  # Output-  splitlines(): ['line 1', 'line 2', 'line 3']

# strip()
stripped = np.char.strip(sentence)
print("strip():", stripped)  # Output- strip(): dataflair tutorials

# join() - the separator is inserted between the characters of each element
words_list = ["Data", "Flair", "Tutorials"]
joined = np.char.join(" ", words_list)
print("join():", joined)  # Output-  join(): ['D a t a' 'F l a i r' 'T u t o r i a l s']

# replace()
replaced = np.char.replace("DataFlairFlair", "Flair", "Tutorials")
print("replace():", replaced)  # Output- replace(): DataTutorialsTutorials

# decode() and encode()
decoded = np.char.decode(np.char.encode("DataFlair", encoding='utf-8'), encoding='utf-8')
print("decode():", decoded)  # Output-  decode(): DataFlair

String functions at a glance:

add(x1, x2): Element-wise string concatenation for two arrays of str or unicode.
multiply(a, i): Element-wise string multiple concatenation, equivalent to (a * i).
mod(a, values): Element-wise pre-Python 2.6 string formatting (interpolation) for a pair of array_likes of str or unicode.
capitalize(a): Return a copy of ‘a’ with only the first character of each element capitalized.
center(a, width[, fillchar]): Return a copy of ‘a’ with its elements centered in a string of length ‘width’.
decode(a[, encoding, errors]): Calls bytes.decode element-wise.
encode(a[, encoding, errors]): Calls str.encode element-wise.
expandtabs(a[, tabsize]): Return a copy of each string element where all tab characters are replaced by one or more spaces.
join(sep, seq): Return a string, which is the concatenation of the strings in the sequence ‘seq’ using ‘sep’ as the delimiter.
ljust(a, width[, fillchar]): Return an array with the elements of ‘a’ left-justified in a string of length ‘width’.
lower(a): Return an array with the elements converted to lowercase.
lstrip(a[, chars]): For each element in ‘a’, return a copy with the leading characters removed, optionally specified by ‘chars’.
partition(a, sep): Partition each element in ‘a’ around ‘sep’.
replace(a, old, new[, count]): For each element in ‘a’, return a copy of the string with all occurrences of substring ‘old’ replaced by ‘new’, optionally limited by ‘count’.
rjust(a, width[, fillchar]): Return an array with the elements of ‘a’ right-justified in a string of length ‘width’.
rpartition(a, sep): Partition (split) each element around the right-most separator ‘sep’.
rsplit(a[, sep, maxsplit]): For each element in ‘a’, return a list of words in the string, using ‘sep’ as the delimiter string and ‘maxsplit’ as the maximum number of splits.
rstrip(a[, chars]): For each element in ‘a’, return a copy with the trailing characters removed, optionally specified by ‘chars’.
split(a[, sep, maxsplit]): For each element in ‘a’, return a list of words in the string, using ‘sep’ as the delimiter string and ‘maxsplit’ as the maximum number of splits.
splitlines(a[, keepends]): For each element in ‘a’, return a list of lines in the element, breaking at line boundaries, optionally keeping line endings.
strip(a[, chars]): For each element in ‘a’, return a copy with the leading and trailing characters removed, optionally specified by ‘chars’.
swapcase(a): Return element-wise a copy of the string with uppercase characters converted to lowercase and vice versa.
title(a): Return element-wise title cased version of the string or unicode.
translate(a, table[, deletechars]): For each element in ‘a’, return a copy of the string where characters in ‘deletechars’ are removed, and the remaining characters are mapped through ‘table’.
upper(a): Return an array with the elements converted to uppercase.
zfill(a, width): Return the numeric string left-filled with zeros to reach a width of ‘width’.
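A few of the functions in this list were not covered in the walkthrough, so here is a brief sketch of zfill(), partition(), and swapcase(). All input values are made up for illustration.

```python
import numpy as np

# zfill(): left-pad numeric strings with zeros to a fixed width
ids = np.array(["7", "42", "123"])
padded = np.char.zfill(ids, 4)
print(padded)  # ['0007' '0042' '0123']

# partition(): split each element at the first separator into
# (before, separator, after); the result gains a trailing axis of length 3
pairs = np.array(["key=value", "a=b=c"])
pieces = np.char.partition(pairs, "=")
print(pieces.tolist())  # [['key', '=', 'value'], ['a', '=', 'b=c']]

# swapcase(): upper becomes lower and vice versa
swapped = np.char.swapcase(np.array(["NumPy"]))
print(swapped)  # ['nUMpY']
```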

Conclusion

Congratulations! You’ve now expanded your knowledge of NumPy string functions. With these functions, you’re equipped to perform a wide range of element-wise string manipulation tasks on arrays. As you continue your journey into data analysis and manipulation, remember that these functions will be valuable tools in your toolkit.

Stay tuned to TechVidvan for more tutorials on Python libraries and data manipulation techniques. Happy coding, and keep exploring the world of data manipulation!