6 Inbuilt Data Structures in R with practical examples

In R, most of the time you will be dealing with collections of data and not singular elements. We hold collections of data in data structures.

Data structures are objects in R that provide a method to arrange data in the desired format.

In this article, we will take a look at the data structures in R. We will learn what they are and what they are used for. We will also explore a few features and functions of these structures.

Let’s start with understanding the basics of data structures.

Introduction to Data Structures in R

R has six types of basic data structures. We can organize these data structures according to their dimensions (1d, 2d, nd). We can also classify them as homogeneous or heterogeneous, that is, by whether their contents can be of different types or not.

Homogeneous data structures are ones that can only store a single type of data (numeric, integer, character, etc.).

Heterogeneous data structures are ones that can store more than one type of data at the same time.

R does not have a 0-dimensional, or scalar, type. Variables containing single values are vectors of length 1.
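A quick way to check this yourself is the short sketch below. The variable name x is only an illustration and is not used elsewhere in this article.

```r
# A single value is stored as a vector of length 1
x <- 5

is.vector(x)   # TRUE
length(x)      # 1
x[1]           # 5, indexing works just like on a longer vector
```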

We covered R data types in our previous article; now we are going to understand R data structures in detail.

R has the following basic data structures:

Vector
List
Matrix
Data frame
Array
Factor

So, let’s not wait anymore and get to it!

1. Vectors

Vectors are single-dimensional, homogeneous data structures. To create a vector, use the c() function.

For example:

> vec <- c(1,2,3) # creates a vector named vec
> vec

Output:

[1] 1 2 3

The assign() function is another way to create a vector.

For example:

> assign("vec2", c(4,5,6))
> vec2

Output:

[1] 4 5 6

Vectors can hold values of a single data type. Thus, they can be numeric, logical, character, integer or complex vectors.

For example:

> numeric_vec <- c(1,2,3,4,5)
> integer_vec <- c(1L,2L,3L,4L,5L)
> logical_vec <- c(TRUE, TRUE, FALSE, FALSE, FALSE)
> complex_vec <- c(12+2i, 3i, 4+1i, 5+12i, 6i)
> character_vec <- c("techvidvan", "this", "is", "a", "character vector")
> numeric_vec
> integer_vec
> logical_vec
> complex_vec
> character_vec

Output:

[1] 1 2 3 4 5

[1] 1 2 3 4 5

[1]  TRUE  TRUE FALSE FALSE FALSE

[1] 12+ 2i  0+ 3i  4+ 1i  5+12i  0+ 6i

[1] "techvidvan"       "this"
[3] "is"               "a"
[5] "character vector"

The code above creates five vectors, one for each basic type, and prints their values.

Note: Technically, we can pass values of different types to c(), but R converts them to a single common type to maintain the vector’s homogeneous nature. This is called coercion.
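The sketch below illustrates coercion. The names mixed_num and mixed_chr are made up for this illustration. R picks the most flexible type present, in the order logical, integer, double (numeric), complex, character.

```r
# Mixing types inside c() coerces every value to one common type
mixed_num <- c(1L, 2.5, TRUE)   # integer and logical become double
mixed_chr <- c(1, "a", TRUE)    # everything becomes character

typeof(mixed_num)   # "double"
mixed_num           # 1.0 2.5 1.0
typeof(mixed_chr)   # "character"
mixed_chr           # "1" "a" "TRUE"
```

You can use typeof() on any vector to see which type R settled on.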

If you want to try these examples on your own, we recommend installing R first by following our step-by-step R tutorial.

2. Lists

Lists are heterogeneous data structures. They are very similar to vectors except they can store data of different types. To create a list, we use the list() function.

For example:

> test_list <- list(1, "hello", c(2,3,1), FALSE, 3+4i, 6L)
> test_list

Output:

[[1]]
[1] 1

[[2]]
[1] "hello"

[[3]]
[1] 2 3 1

[[4]]
[1] FALSE

[[5]]
[1] 3+4i

[[6]]
[1] 6
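To work with the elements of a list, it helps to know the difference between double and single brackets. The following is a small sketch that reuses the test_list from above: [[ ]] returns the element itself, while [ ] returns a shorter list.

```r
test_list <- list(1, "hello", c(2,3,1), FALSE, 3+4i, 6L)

test_list[[3]]          # the element itself: the numeric vector 2 3 1
test_list[3]            # a list of length 1 that contains that vector
class(test_list[[3]])   # "numeric"
class(test_list[3])     # "list"
test_list[[3]][2]       # 3, the second value of the third element
```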

Lists are often called “recursive vectors” as you can store a list inside another list.

Example:

> test_list2<-list(list(1,"a",TRUE), list("b",45L,"c"), list(1,2))
> str(test_list2) #shows the structure of an object

Output:

List of 3
 $ :List of 3
  ..$ : num 1
  ..$ : chr "a"
  ..$ : logi TRUE
 $ :List of 3
  ..$ : chr "b"
  ..$ : int 45
  ..$ : chr "c"
 $ :List of 2
  ..$ : num 1
  ..$ : num 2

Note: If the c() function has a list as an argument, the result will be a list. Other values inside the c() function will be coerced into lists themselves.
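Here is a minimal sketch of that behavior. The name combined is only illustrative.

```r
# c() with a list argument returns a list;
# the remaining arguments become list elements
combined <- c(list(1, "a"), 2, "b")

class(combined)    # "list"
length(combined)   # 4
str(combined)      # shows four elements: num, chr, num, chr
```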

3. Matrix

Matrices are two-dimensional, homogeneous data structures with rows and columns. This means that all values in a matrix have to be of the same type. Coercion takes place if there is more than one data type.

By default, matrices are filled in column-wise order. The basic syntax to create a matrix is:

> matrix(data, nrow, ncol, byrow, dimnames)

Where data is the input values in the matrix given as a vector,

nrow is the number of rows,

ncol is the number of columns,

byrow is a logical value that tells the function to fill the matrix row-wise; by default it is set to FALSE,

dimnames is a list of the names of the rows/columns created.

The following code will create a matrix with 3 columns (and therefore 3 rows) and values 1 to 9 in a column-wise order.

For example:

> test_matrix1 <- matrix(c(1:9), ncol = 3)
> test_matrix1

Output:

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
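For comparison, the sketch below fills the same values row-wise by setting byrow = TRUE, and then reads values back by position. The name by_row is chosen just for this illustration.

```r
# Same data as above, but filled row by row instead of column by column
by_row <- matrix(1:9, ncol = 3, byrow = TRUE)
by_row

dim(by_row)     # 3 3 (rows, columns)
by_row[1, ]     # first row: 1 2 3
by_row[2, 3]    # row 2, column 3: 6
```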

An example of a matrix with row-names and column-names:

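> row_names <- c("row1", "row2", "row3") # named row_names to avoid masking the base rownames() function
> col_names <- c("col1", "col2", "col3") # named col_names to avoid masking the base colnames() function
> test_matrix2 <- matrix(c(1:9), ncol = 3, dimnames = list(row_names, col_names))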
> test_matrix2

Output:

     col1 col2 col3
row1    1    4    7
row2    2    5    8
row3    3    6    9

4. Data Frames

Data frames are two-dimensional, heterogeneous data structures. They are lists of vectors of equal lengths. Data frames have the following constraints placed upon them:

A data frame must have column names, and each row should have a unique name.
Each column should have the same number of items.
Each item in a single column should be of the same type.
Different columns can have different data types.

To create a data frame, use the data.frame() function.

For example:

> student_id <- c(1:5)
> student_name <- c("raj", "jacob", "iqbal", "shawn", "hitesh")
> student_rank <- c("third", "fifth", "second", "fourth", "first")
> student.data <- data.frame(student_id , student_name, student_rank)
> student.data

Output:

  student_id student_name student_rank
1          1          raj        third
2          2        jacob        fifth
3          3        iqbal       second
4          4        shawn       fourth
5          5       hitesh        first
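The sketch below rebuilds the same data frame and shows a few common ways to inspect it. Note that in R versions before 4.0.0, data.frame() converts character columns to factors by default, so the column types reported by str() depend on your R version.

```r
student_id <- 1:5
student_name <- c("raj", "jacob", "iqbal", "shawn", "hitesh")
student_rank <- c("third", "fifth", "second", "fourth", "first")
student.data <- data.frame(student_id, student_name, student_rank)

str(student.data)            # one line per column, with its type
student.data$student_name    # a single column, selected by name
student.data[2, ]            # the second row, with all columns
nrow(student.data)           # 5
ncol(student.data)           # 3
```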

5. Arrays

Arrays are n-dimensional, homogeneous data structures. A three-dimensional array can be thought of as a collection of matrices stacked one on top of the other in layers.

You can create an array using the array() function. The following is the syntax of it:

array_name <- array(data, dim, dimnames)

Where array_name is the name of the array,

data is the data that is filled inside the array,

dim is a vector containing the dimensions of the array,

and dimnames is a list containing the names of the rows, columns, and matrices inside the array.

Here is an example of the array() function:

> arr1 <- array(c(1:18),dim=c(2,3,3))
> arr1

Output:

, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18
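Array elements are addressed with one index per dimension. The following sketch reuses arr1 and also builds a four-dimensional array, arr4, a name chosen only for this illustration.

```r
arr1 <- array(1:18, dim = c(2, 3, 3))

dim(arr1)         # 2 3 3 (rows, columns, layers)
arr1[2, 3, 1]     # row 2, column 3 of the first layer: 6
arr1[, , 2]       # the whole second layer, as a 2 x 3 matrix

# Arrays are not limited to three dimensions
arr4 <- array(1:24, dim = c(2, 3, 2, 2))
length(dim(arr4)) # 4
```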

6. Factors

Factors are vectors that can only store predefined values. They are useful for storing categorical data. Factors have two attributes:

Class – which has the value “factor” and makes it behave differently from a normal vector.
Levels – which is the set of allowed values

You can create a factor using the factor() function.

For example:

> fac <- factor(c("a", "b", "a", "b", "b"))
> fac

Output:

[1] a b a b b
Levels: a b

Factors can store both strings and integers. They are useful to categorize unique values in columns like “TRUE” or “FALSE”, or “MALE” or “FEMALE”, etc.
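The sketch below shows what "predefined values" means in practice. The names sizes and bad are made up for this illustration. The levels are set explicitly, and assigning a value that is not one of the levels produces NA along with a warning.

```r
# Set the allowed values explicitly with the levels argument
sizes <- factor(c("small", "large", "small"),
                levels = c("small", "medium", "large"))

levels(sizes)       # "small" "medium" "large"
as.integer(sizes)   # 1 3 1, the position of each value in the levels
table(sizes)        # counts per level, including medium with 0

# A value outside the levels is not stored; R warns and inserts NA
bad <- sizes
bad[2] <- "huge"
is.na(bad[2])       # TRUE
```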

Summary

In this tutorial, we looked at the basic data structures available in R. R also has many more complex data structures, which we can build from these basic structures in different combinations and formations.

In R, almost every calculation is done on structures containing many values. Calculations on singular values are very rare.

Thus, these data structures are very important building blocks in the R programming language.

