Top 10 String Manipulation Functions in R programming

In this blog of TechVidvan’s R tutorial series, we will take a look at the string manipulation functions in R programming. String manipulation functions are the functions that allow creation and modification of strings in R.

Using these functions, you can construct strings with definite patterns or even at random. You can change and modify them in any desired way.

String Manipulation in R Programming

Here are a few of the string manipulation functions available in R’s base packages. We are going to look at these functions in detail.

  1. The nchar function
  2. The toupper function
  3. The tolower function
  4. The substr function
  5. The grep function
  6. The paste function
  7. The strsplit function
  8. The sprintf function
  9. The cat function
  10. The sub function

Let’s take a look at all of the above functions one by one.

1. The nchar function

The nchar() function takes a character vector as input and returns an integer vector containing the number of characters in each element. Here is the syntax for the nchar function.

Code:

nchar(x, type = "chars", allowNA = FALSE, keepNA = NA)

Where x is a character vector (or an object that can be coerced to one),

type sets how the length is measured: "chars" (the default), "bytes", or "width",

allowNA is a boolean that decides whether NA should be returned, instead of an error, for elements that are not valid in the current encoding,

keepNA is a boolean that decides whether NA should be returned for elements that are NA; the default, NA, returns NA for type = "chars" and "bytes" and 2 for type = "width".

Here is an example of the usage of the nchar function.

Code:

string <- "Hello My Name Is TechVidvan"
nchar(string)
strvec <- c(string,"HI", "hey", "haHa")
nchar(strvec)

Output:

[Image: output of the nchar example]
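To see what the type and keepNA arguments change, here is a small sketch. The variable names (word, fruits) are made up for illustration, and the byte count assumes the string is stored as UTF-8, which is what a \u escape produces in R.

```r
# "café", written with a Unicode escape so the string is stored as UTF-8
word <- "caf\u00e9"
n_chars <- nchar(word)                  # counts characters
n_bytes <- nchar(word, type = "bytes")  # counts bytes; the accented letter takes two in UTF-8

# A character vector containing a missing value
fruits <- c("apple", NA)
with_na    <- nchar(fruits)                  # the NA element stays NA
without_na <- nchar(fruits, keepNA = FALSE)  # the NA element is counted as the two letters "NA"

print(n_chars)
print(n_bytes)
print(with_na)
print(without_na)
```

The character and byte counts only differ when the string contains non-ASCII characters, so for plain English text the default is usually all you need.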


2. The toupper function

The toupper() function, as the name suggests, turns the input character vector to upper case. The syntax of the toupper function is very simple.

Code:

toupper(x)

Where x is the input character vector.

Here is an example of the usage of the toupper function.

Code:

toupper(string)
toupper(strvec)

Output:

[Image: output of the toupper example]
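One common use of toupper() is comparing text without caring about case. The sketch below assumes a hypothetical user_input value; it also shows that digits and punctuation pass through unchanged.

```r
# Hypothetical user input compared against a fixed answer, ignoring case
user_input <- "yEs"
is_yes <- toupper(user_input) == "YES"

# Only letters are changed; digits, spaces, and symbols are left alone
mixed <- toupper("r version 4, build #7")

print(is_yes)
print(mixed)
```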

3. The tolower function

The tolower() function does the opposite of the toupper() function. It turns the input character vector to lowercase. The syntax of the tolower function is as follows.

Code:

tolower(x)

Where x is the input character vector.

Here is an example of the usage of the tolower function.

Code:

tolower(string)
tolower(strvec)

Output:

[Image: output of the tolower example]
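Lowercasing is handy for cleaning up labels that differ only in case. This is a minimal sketch with an invented labels vector, not a full data-cleaning recipe.

```r
# The same label typed three different ways, plus one other label
labels <- c("Apple", "apple", "APPLE", "Banana")

# After lowercasing, the duplicates collapse into one value
unique_labels <- unique(tolower(labels))

# table() counts how often each lowercased label occurs
counts <- table(tolower(labels))

print(unique_labels)
print(counts)
```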

4. The substr function

The substr() function extracts and returns part of an input string. It takes a string, a start position, and a stop position, and returns the characters from start to stop, inclusive. Positions are counted from 1. The syntax of the substr function is as follows.

Code:

substr(x, start, stop)

Where x is the input string,

start is the starting point of extraction,

and stop is the endpoint of extraction.

Here is an example of the usage of the substr function.

Code:

substr(string, 5, 20)

Output:

[Image: output of the substr example]
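The sketch below combines substr() with nchar() to pick out the first and last words of the example string, and also shows the replacement form substr(x, start, stop) <- value. The variable names are chosen for illustration only.

```r
string <- "Hello My Name Is TechVidvan"

# Positions start at 1 and both ends are included
first_word <- substr(string, 1, 5)

# nchar() gives the position of the last character
last_word <- substr(string, nchar(string) - 9, nchar(string))

# The replacement form overwrites characters in place (on a copy here)
greeting <- string
substr(greeting, 1, 5) <- "HELLO"

# A stop position past the end of the string is simply cut off
tail_part <- substr("abc", 2, 10)

print(first_word)
print(last_word)
print(greeting)
print(tail_part)
```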

5. The grep function

The grep() function searches for a pattern in each element of a character vector and returns the indices of the elements that contain a match. The following is the syntax of the grep function.

Code:

grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE)

Where pattern is a regular expression (or a literal string when fixed = TRUE) to search for,

x is the input character vector,

ignore.case is a boolean which shows whether the search should ignore case,

perl is a boolean which shows whether Perl-compatible regular expressions are to be used,

value is a boolean which shows whether the output should contain the positions of the matching elements or the matching elements themselves,

fixed is a boolean that shows whether pattern is to be matched as a literal string rather than as a regular expression,

useBytes is a boolean that shows whether the matching is to be done byte-by-byte or character-by-character,

invert is a boolean that shows whether the output should contain the elements that match or the elements that do not match.

Here is an example of the usage of the grep function:

Code:

grep("Tech", string)

Output:

[Image: output of the grep example]
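Because grep() works on a whole vector, its arguments are easier to understand with more than one string. Here is a rough sketch using a made-up fruits vector; grepl(), also from base R, is included because it returns TRUE/FALSE for every element instead of indices.

```r
fruits <- c("apple", "Banana", "cherry", "pineapple")

idx        <- grep("apple", fruits)                               # positions of matching elements
matched    <- grep("apple", fruits, value = TRUE)                 # the matching elements themselves
no_case    <- grep("banana", fruits, ignore.case = TRUE)          # case-insensitive search
not_apple  <- grep("apple", fruits, invert = TRUE, value = TRUE)  # elements that do NOT match
literal    <- grep(".", c("a.b", "abc"), fixed = TRUE)            # "." as a literal dot, not "any character"
none       <- grep("zzz", fruits)                                 # no match gives an empty integer vector
has_an     <- grepl("an", fruits)                                 # logical vector, one value per element

print(idx)
print(matched)
print(no_case)
print(not_apple)
print(literal)
print(none)
print(has_an)
```

If you want to know how many elements matched, wrap the call in length(), for example length(grep("apple", fruits)).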

6. The paste function

The paste() function converts objects into characters and concatenates them. The syntax of the paste function is as follows.

Code:

paste(..., sep = " ", collapse = NULL)

Where ... are the objects to be concatenated (after being converted into character vectors),

sep is a character string that acts as the separator between the concatenated terms,

and collapse is an optional character string; when it is given, the elements of the result are joined into a single string with collapse between them.

Here is an example of the usage of the paste function.

Code:

paste("hello", "techvidvan", string, sep = "-")

Output:

[Image: output of the paste example]
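The difference between sep and collapse shows up once one of the inputs is a vector. The following sketch uses throwaway values; paste0() is the base R shortcut for paste(..., sep = "").

```r
# sep goes between the arguments, element by element
ids <- paste("x", 1:3, sep = "_")

# collapse then joins the resulting elements into one string
one_string <- paste("x", 1:3, sep = "_", collapse = ", ")

# paste0() concatenates with no separator at all
files <- paste0("file", 1:2, ".csv")

print(ids)
print(one_string)
print(files)
```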

7. The strsplit function

The strsplit() function splits each element of the input character vector into substrings wherever the split argument matches. It returns a list with one character vector per input element. Here is the syntax of the strsplit function.

Code:

strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)

Where x is the input character vector,

split is a regex according to which the input string is split,

fixed is a boolean that tells whether split is to be matched as a literal string rather than as a regex,

perl is a boolean that tells whether Perl-compatible regexes have to be used,

useBytes is a boolean that tells whether the matching has to be done byte-by-byte or character-by-character.

Here is an example of the usage of the strsplit function:

Code:

strsplit(string,'e')

Output:

[Image: output of the strsplit example]
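Since strsplit() returns a list, you usually need [[1]] to get at the pieces of a single string. The sketch below uses made-up inputs and also shows why fixed = TRUE matters when the separator is a regex special character such as a dot.

```r
# The result is a list, even for a single input string
parts <- strsplit("2024-01-15", "-")
date_fields <- parts[[1]]

# "." means "any character" in a regex, so use fixed = TRUE to split on a literal dot
dotted <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]

# An empty split string breaks the input into single characters
chars <- strsplit("abc", "")[[1]]

# With several input strings you get one list element per string
words <- strsplit(c("a b", "c d e"), " ")

print(date_fields)
print(dotted)
print(chars)
print(words)
```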

8. The sprintf function

The sprintf() function of R is very similar to the printf family of functions in C/C++. It returns a character vector built from a format string, replacing each format specifier (such as %d or %s) with the corresponding value. The syntax of the sprintf function is:

Code:

sprintf(fmt, ...)

Where fmt is a C-style format string containing format specifiers that mark where values go and how they are formatted,

and ... are the values to be substituted into fmt, in order.

Here is an example of the usage of the sprintf function.

Code:

count <- 5L
name <- "Bob"
place <- "pocket"
sprintf("There are %d dollars in %s's %s", count, name, place)

Output:

[Image: output of the sprintf example]
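Beyond %d and %s, a few other format specifiers come up often. Here is a short sketch with illustrative values; note that sprintf() is vectorized, so passing vectors gives one formatted string per element.

```r
rounded <- sprintf("%.2f", pi)       # two digits after the decimal point
padded  <- sprintf("%05d", 42L)      # zero-padded to a width of 5
aligned <- sprintf("%-8s|", "left")  # left-aligned in a field of width 8

# %% prints a literal percent sign; vectors are formatted element by element
scores <- sprintf("%s scored %d%%", c("Ann", "Bob"), c(90L, 85L))

print(rounded)
print(padded)
print(aligned)
print(scores)
```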

9. The cat function

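The cat() function converts its input objects to character strings, concatenates them, and prints the result. It returns nothing useful (NULL, invisibly), so it is meant for output rather than for building strings. It can also create, overwrite, or append to a file to save the output. The syntax of the cat function looks like this.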

Code:

cat(..., file = "", sep = " ", append = FALSE)

Where,
... is the set of objects, character vectors, or strings to be concatenated and printed.

file is an optional argument that specifies the name of a file to write to; with the default "", the output goes to the console.

sep specifies the string that is placed between the objects in the ... argument.

append controls whether the output is appended to the file or overwrites it when a filename has been provided in the file argument.

Code:

cat("hello","this","is","Techvidvan",sep = "-")

Output:

[Image: output of the cat example]
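The file and append arguments are easiest to understand by writing to a file and reading it back. This sketch writes to a temporary file created with tempfile(), so the path is not one you would use in practice; it also checks what cat() returns.

```r
# cat() prints its arguments; the value it returns is NULL
returned <- cat("hello", "this", "is", "Techvidvan", sep = "-")
cat("\n")  # cat() does not add a newline on its own

# Write one line to a temporary file, then append a second line
out_file <- tempfile(fileext = ".txt")
cat("first line\n", file = out_file)
cat("second line\n", file = out_file, append = TRUE)

# Read the file back to check what was saved, then clean up
saved <- readLines(out_file)
unlink(out_file)

print(is.null(returned))
print(saved)
```

If you need the combined text as a value you can store in a variable, paste() is the better fit.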

10. The sub function

The sub() function replaces the first match of a pattern in a string with a replacement string. The pattern is treated as a regular expression by default. In its simplest form, the syntax is as follows:

Code:

sub(pattern, replacement, x)

Where,

pattern is the substring (or regular expression) that has to be replaced,

replacement is the new substring that will take the place of the match,

x is the string (or character vector) in which the replacement is made.

Code:

sub("My Name Is", "I Am", string)

Output:

[Image: output of the sub example]
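A few more cases help show how sub() behaves. The sketch below uses invented inputs and compares sub() with gsub(), its base R sibling that replaces every match instead of only the first.

```r
string <- "Hello My Name Is TechVidvan"
renamed <- sub("My Name Is", "I Am", string)

# sub() changes only the first match; gsub() changes all of them
first_only <- sub("a", "A", "banana")
all_of_them <- gsub("a", "A", "banana")

# The pattern is a regex: this one drops everything from the last comma onward
trimmed <- sub(",[^,]*$", "", "ab,cd,ef")

# "." matches any character unless fixed = TRUE is set
regex_dot   <- sub(".", "!", "a.b")
literal_dot <- sub(".", "!", "a.b", fixed = TRUE)

print(renamed)
print(first_only)
print(all_of_them)
print(trimmed)
print(regex_dot)
print(literal_dot)
```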

Summary

R has a wide variety of functions that can manipulate any kind of data. Strings and character vectors are no exception. In this R tutorial, we learned about a few R functions that help manipulate strings or give more information about them.

Keep Executing!!


1 Response

  1. Michael Yung says:

    Thanks for your post, very helpful. I have a dataset containing strings of variable lengths with commas separating them e.g. a,bc,def,
    I want to truncate the strings by removing anything after the last comma. e.g. for ab, cd, ef
    I want to remove ef
    Does substr and/or nchar help here and if so how?
