In this blog of TechVidvan’s R tutorial series, we will take a look at the string manipulation functions in R programming. String manipulation functions are functions that create and modify strings in R.
Using these functions, you can construct strings that follow a definite pattern, inspect them, and modify them in any desired way.
String Manipulation in R Programming
Here are a few of the string manipulation functions available in R’s base packages. We are going to look at these functions in detail.
- The nchar function
- The toupper function
- The tolower function
- The substr function
- The grep function
- The paste function
- The strsplit function
- The sprintf function
- The cat function
- The sub function
Let’s take a look at all of the above functions one by one.
1. The nchar function
The nchar() function takes a character vector as input and returns an integer vector containing the number of characters in each element. Here is the syntax of the nchar function.
Code:
nchar(x, type = "chars", allowNA = FALSE, keepNA = NA)
Where:
- x is a character vector (or an object that can be coerced to one),
- type specifies what is counted: "bytes", "chars" or "width"; by default, its value is set to "chars",
- allowNA is a boolean that decides whether NA should be returned, instead of an error, for elements that are invalid in their encoding,
- keepNA is a boolean that decides whether NA should be returned for elements of the input vector that are NA. With the default value, NA, the function returns NA for such elements unless type is "width", in which case it returns 2.
Here is an example of the usage of the nchar function.
Code:
string <- "Hello My Name Is TechVidvan"
nchar(string)
strvec <- c(string, "HI", "hey", "haHa")
nchar(strvec)
Output:
[1] 27
[1] 27  2  3  4
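The type and keepNA arguments are easier to follow with a small experiment. The following is only a sketch: the variable names are made up for illustration, and the accented character is written as a Unicode escape so that the string is stored in UTF-8.

```r
# "\u00e9" is the single character e-acute, stored in UTF-8 as two bytes
accented <- "\u00e9"
chars_count <- nchar(accented, type = "chars")
bytes_count <- nchar(accented, type = "bytes")

# A missing string gives NA by default and 2 when keepNA = FALSE
na_default <- nchar(NA_character_)
na_as_text <- nchar(NA_character_, keepNA = FALSE)

print(c(chars = chars_count, bytes = bytes_count))
print(c(default = na_default, keepNA_FALSE = na_as_text))
```

Counting bytes instead of characters mostly matters when the text contains non-ASCII characters.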
2. The toupper function
The toupper() function, as the name suggests, converts the input character vector to upper case. The syntax of the toupper function is very simple.
Code:
toupper(x)
Where x is the input character vector.
Here is an example of the usage of the toupper function.
Code:
# string and strvec were created in the nchar example
toupper(string)
toupper(strvec)
Output:
[1] "HELLO MY NAME IS TECHVIDVAN"
[1] "HELLO MY NAME IS TECHVIDVAN" "HI"
[3] "HEY"                         "HAHA"
3. The tolower function
The tolower() function does the opposite of the toupper() function: it converts the input character vector to lower case. The syntax of the tolower function is as follows.
Code:
tolower(x)
Where x is the input character vector.
Here is an example of the usage of the tolower function.
Code:
# string and strvec were created in the nchar example
tolower(string)
tolower(strvec)
Output:
[1] "hello my name is techvidvan"
[1] "hello my name is techvidvan" "hi"
[3] "hey"                         "haha"
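A common use of these two functions is to normalise case before comparing strings. The sketch below assumes two hypothetical inputs that differ only in letter case; it is one possible approach, not the only one.

```r
# Hypothetical inputs that differ only in letter case
a <- "TechVidvan"
b <- "TECHVIDVAN"

# A direct comparison is case sensitive
direct_match <- a == b

# Converting both sides with tolower() makes the comparison case insensitive
normalised_match <- tolower(a) == tolower(b)

print(direct_match)
print(normalised_match)
```

Using toupper() on both sides would work just as well; what matters is that both strings go through the same conversion.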
4. The substr function
The substr() function extracts and returns a part of a given input string. It takes a string, a start position, and a stop position as input, and returns the characters from start to stop, both inclusive. Positions are counted from 1. The syntax of the substr function is as follows.
Code:
substr(x, start, stop)
Where x is the input string, start is the position of the first character to extract, and stop is the position of the last character to extract.
Here is an example of the usage of the substr function.
Code:
substr(string, 5, 20)
Output:
[1] "o My Name Is Tec"
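Beyond simple extraction, substr() also works on whole vectors and has a replacement form. The following sketch uses made-up example strings to show both, along with what happens when stop is larger than the string length.

```r
word <- "Hello World"

# A stop value beyond the end of the string is cut off at the string length
tail_part <- substr(word, 7, 100)

# substr() is vectorised over its first argument
prefixes <- substr(c("apple", "banana", "cherry"), 1, 3)

# The replacement form overwrites characters 1 to 5 in place
substr(word, 1, 5) <- "HELLO"

print(tail_part)
print(prefixes)
print(word)
```

Note that the replacement form never changes the length of the string; it only overwrites the characters in the given range.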
5. The grep function
The grep() function searches for a pattern in each element of a character vector and returns the indices of the elements that contain a match. The following is the syntax of the grep function.
Code:
grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE)
Where:
- pattern is a regular expression that is used as the search keyword,
- x is the input character vector,
- ignore.case is a boolean that shows whether the search should ignore case; by default, the search is case sensitive,
- perl is a boolean that shows whether Perl-compatible regular expressions are to be used,
- value is a boolean that shows whether the output should contain the matching elements themselves (TRUE) or their positions (FALSE),
- fixed is a boolean that shows whether pattern should be matched as a literal string instead of a regular expression,
- useBytes is a boolean that shows whether the matching is to be done byte-by-byte or character-by-character,
- invert is a boolean that shows whether the output should contain the elements that do not match instead of the elements that match.
Here is an example of the usage of the grep function:
Code:
grep("Tech", string)
Output:
[1] 1
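Because grep() reports which elements match, its arguments are easier to see on a vector with several elements. The sketch below uses a made-up vector called fruits to illustrate value, ignore.case, invert, and fixed.

```r
fruits <- c("apple", "Banana", "cherry", "pineapple")

# Positions of the elements that contain "apple"
idx <- grep("apple", fruits)

# The matching elements themselves
vals <- grep("apple", fruits, value = TRUE)

# Case-insensitive search
ci <- grep("banana", fruits, ignore.case = TRUE)

# Elements that do NOT match
rest <- grep("apple", fruits, invert = TRUE, value = TRUE)

# fixed = TRUE treats "." as a literal dot instead of "any character"
literal <- grep(".", c("a.b", "ab"), fixed = TRUE)

print(idx)
print(vals)
print(ci)
print(rest)
print(literal)
```

When no element matches, grep() returns an empty integer vector (or an empty character vector with value = TRUE).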
6. The paste function
The paste() function converts objects into character vectors and concatenates them. The syntax of the paste function is as follows.
Code:
paste(..., sep = " ", collapse = NULL)
Where:
- ... are the objects to be concatenated (after being converted into character vectors),
- sep is a character string that acts as the separator between the concatenated terms,
- collapse is an optional character string; when it is supplied, the elements of the result are joined into a single string with collapse as the separator.
Here is an example of the usage of the paste function.
Code:
paste("hello", "techvidvan", string, sep = "-")
Output:
[1] "hello-techvidvan-Hello My Name Is TechVidvan"
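The difference between sep and collapse is a frequent source of confusion, so here is a small sketch with made-up values. In short, sep joins the arguments element by element, while collapse joins the elements of the result into one string.

```r
# sep joins the arguments element-wise; the result has three elements
ids <- paste("id", 1:3, sep = "_")

# collapse joins the elements of one vector into a single string
joined <- paste(c("a", "b", "c"), collapse = "+")

# Both together: join element-wise first, then collapse the result
both <- paste("x", 1:3, sep = "", collapse = ", ")

print(ids)
print(joined)
print(both)
```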
7. The strsplit function
The strsplit() function splits each element of the input character vector into substrings at the matches of the split argument and returns the pieces as a list. Here is the syntax of the strsplit function.
Code:
strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)
Where:
- x is the input character vector,
- split is the regular expression at which the input is split,
- fixed is a boolean that tells whether split should be matched as a literal string instead of a regular expression,
- perl is a boolean that tells whether Perl-compatible regular expressions have to be used,
- useBytes is a boolean that tells whether the matching has to be done byte-by-byte or character-by-character.
Here is an example of the usage of the strsplit function:
Code:
strsplit(string,'e')
Output:
[[1]]
[1] "H"          "llo My Nam" " Is T"      "chVidvan"
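Two details of strsplit() are worth trying out: the result is always a list with one element per input string, and split is a regular expression unless fixed = TRUE. The sketch below uses made-up inputs to show both.

```r
# One list element per input string
parts_list <- strsplit(c("a b", "c d e"), " ")

# "." is a regex metacharacter, so match it literally with fixed = TRUE ...
dotted_fixed <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]

# ... or escape it in the regular expression
dotted_regex <- strsplit("a.b.c", "\\.")[[1]]

print(parts_list)
print(dotted_fixed)
print(dotted_regex)
```

The [[1]] at the end pulls the character vector for the first (and here only) input string out of the list.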
8. The sprintf function
The sprintf() function of R is very similar to the printf family of functions in C/C++. It builds a string from a format string: each format specifier, such as %d or %s, is replaced by the value of the corresponding argument. The function does not print anything itself; it returns the result as a character vector. The syntax of the sprintf function is:
Code:
sprintf(fmt, ...)
Where:
- fmt is a C-style format string whose format specifiers mark where the values go and what their data types are,
- ... are the values to be passed to fmt.
Here is an example of the usage of the sprintf function.
Code:
count <- 5L
name <- "Bob"
place <- "pocket"
sprintf("There are %d dollars in %s's %s", count, name, place)
Output:
[1] "There are 5 dollars in Bob's pocket"
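A few more format specifiers come up often in practice. The following sketch, with made-up values, shows rounding, zero padding, a literal percent sign, and the fact that sprintf() is vectorised over its arguments.

```r
# Two digits after the decimal point
rounded <- sprintf("%.2f", pi)

# Pad an integer with zeros to a width of 5
padded <- sprintf("%05d", 42L)

# "%%" produces a literal percent sign
percent <- sprintf("%d%%", 50L)

# Vector arguments give one result per element (names are hypothetical)
lines <- sprintf("%s has %d items", c("Ann", "Raj"), c(3L, 10L))

print(rounded)
print(padded)
print(percent)
print(lines)
```

Note that %d expects an integer value, which is why the numbers are written with the L suffix.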
9. The cat function
The cat() function converts its arguments to character strings, concatenates them, and writes the result to the console or to a file. It can create, overwrite, or append to a file to save the output. The syntax of the cat function looks like this.
Code:
cat(..., file = "", sep = " ", append = FALSE)
Where:
- ... is the set of objects, character vectors, or strings that will be concatenated and written out,
- file is an optional argument that specifies the name of a file to be created, appended to, or overwritten; the default, "", writes to the console,
- sep specifies the character string that separates the objects in the ... argument; the default is a single space,
- append controls whether the output should be appended to the output file (TRUE) or should overwrite it (FALSE) if a file name has been provided in the file argument.
Code:
cat("hello","this","is","Techvidvan",sep = "-")
Output:
hello-this-is-Techvidvan
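The file and append arguments can be tried safely with a temporary file. This is a minimal sketch: tempfile(), readLines(), and unlink() are base R helpers that the text above does not cover, and the variable names are made up.

```r
# A temporary file, so that no real file is touched
tmp <- tempfile(fileext = ".txt")

# The first call creates the file; the second one appends to it
cat("first line\n", file = tmp)
cat("second line\n", file = tmp, append = TRUE)
contents <- readLines(tmp)

# cat() writes its output and returns NULL, so there is nothing useful to assign
result <- cat("a", "b", "\n")

print(contents)
print(is.null(result))

# Clean up the temporary file
unlink(tmp)
```

Because cat() returns NULL, use paste() instead when the combined string is needed as a value.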
10. The sub function
The sub() function replaces the first match of a pattern in a string with another string. The pattern is treated as a regular expression unless fixed = TRUE is set. In its basic form, the syntax of the sub function is very simple. It is as follows:
Code:
sub(pattern, replacement, x)
Where:
- pattern is the old substring (or regular expression) that has to be replaced,
- replacement is the new substring that will take the place of the first match,
- x is the string, or character vector, in which the replacement is made.
Code:
sub("My Name Is", "I Am", string)
Output:
[1] "Hello I Am TechVidvan"
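Since sub() only touches the first match, it helps to see it next to gsub(), the base R function that replaces every match. The sketch below uses made-up strings and also shows the effect of a regular expression and of fixed = TRUE.

```r
word <- "banana"

# sub() replaces only the first match
first_only <- sub("a", "A", word)

# gsub() replaces every match
all_matches <- gsub("a", "A", word)

# The pattern is a regular expression: remove trailing whitespace
trimmed <- sub("\\s+$", "", "hello   ")

# fixed = TRUE treats "." as a literal dot
literal <- sub(".", "!", "a.b", fixed = TRUE)

print(first_only)
print(all_matches)
print(trimmed)
print(literal)
```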
Summary
R has a wide variety of functions that can manipulate many kinds of data. Strings and character vectors are no exceptions. In this R tutorial, we learned about a few R functions that help manipulate strings or give more information about them.