Data Reshaping in R – Popular Functions to Organise Data

In this TechVidvan tutorial, discover why data reshaping is required in R and learn how to use different functions in R to do this.

For an analysis project, the gathered data is messy and unstructured most of the time. It is collected from different sources, has different variables, and has irregular formats.

With this tutorial, learn how to change the gathered data and conform it to our desired format in R. We will take a look at the functions in R that allow us to do this and much more. So, get ready for the ride!

What is Data Reshaping in R?

Before we can perform any kind of analysis, we first need to shape the gathered data into a regular and processable format. We need to ensure that all of the data fits into proper variables.

We also need to take care of missing values and put placeholders in their place that our analysis tools can understand.

This is the first step in any analysis project. We call this process data reshaping.
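As a small illustration of the missing-value step mentioned above, the sketch below assumes a hypothetical temperature vector in which -999 was used as a "no reading" code. Both the data and the sentinel value are made up for this example. R's own placeholder for a missing value is NA, and functions such as mean() can be told to skip it.

```r
# Hypothetical readings; -999 is an assumed "no reading" code
temp_raw <- c(21.5, -999, 19.0, 23.5, -999)

# Replace the sentinel with NA, R's missing-value placeholder
temp <- replace(temp_raw, temp_raw == -999, NA)

# Count the missing values and average the remaining ones
n_missing <- sum(is.na(temp))
avg_temp <- mean(temp, na.rm = TRUE)
n_missing
avg_temp
```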

Now let’s learn popular functions used for data reshaping in R.

The cbind(), rbind(), and t() Functions

There are many functions in R that allow us to manipulate data objects in many ways.

cbind(), rbind(), and t() are among the most commonly used base R functions for data reshaping. We will go through them one by one:

1. cbind(): The cbind() function allows us to join objects as columns. We can combine matrices, data frames, vectors, or any combination of these.

Code:

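# Two numeric vectors of length 5
vec1 <- c(1, 2, 3, 4, 5)
vec2 <- c(6, 7, 8, 9, 10)
# Two 5 x 3 matrices
mat1 <- matrix(1:15, nrow = 5, ncol = 3)
mat2 <- matrix(16:30, nrow = 5, ncol = 3)
# Two data frames with 5 rows and 6 columns each
df1 <- data.frame(matrix(1:30, nrow = 5))
df2 <- data.frame(matrix(31:60, nrow = 5))
# Join the two vectors as columns
cbind(vec1, vec2)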

Output:

[Image: output of cbind(vec1, vec2)]

Code:

cbind(mat1,mat2)

Code:

cbind(df1,df2)

Output:

[Image: output of cbind(mat1, mat2) and cbind(df1, df2)]

Code:

cbind(vec1,mat2)

Code:

cbind(vec2,df1)

Output:

[Image: output of cbind(vec1, mat2) and cbind(vec2, df1)]

Code:

cbind(mat1,df2)

Output:

[Image: output of cbind(mat1, df2)]

Note: The objects should have the same number of rows (or, for vectors, the same length) for the cbind() function to work as expected.
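To make the note above concrete, here is a small sketch that re-creates some of the objects from this section and checks what cbind() returns. The names vv_out, md_out, bad_out and df_short are introduced only for this illustration.

```r
vec1 <- c(1, 2, 3, 4, 5)
vec2 <- c(6, 7, 8, 9, 10)
mat1 <- matrix(1:15, nrow = 5, ncol = 3)
df1 <- data.frame(matrix(1:30, nrow = 5))
df2 <- data.frame(matrix(31:60, nrow = 5))

# Two vectors of length 5 give a 5 x 2 matrix
vv_out <- cbind(vec1, vec2)
dim(vv_out)

# If any argument is a data frame, the result is a data frame
md_out <- cbind(mat1, df2)
class(md_out)
ncol(md_out)   # 3 matrix columns + 6 data frame columns

# Data frames with different row counts cannot be joined
df_short <- data.frame(x = 1:3)
bad_out <- tryCatch(cbind(df1, df_short), error = function(e) "error")
bad_out
```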

2. rbind(): The rbind() function allows us to join objects as rows.

Code:

rbind(vec1,vec2)

Code:

rbind(vec1,df2)

Output:

[Image: output of rbind(vec1, vec2) and rbind(vec1, df2)]

Note: The number of columns should be the same for the rbind() function to work.
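The following sketch illustrates the note above with objects like those defined earlier. The names vv_rows, dd_rows, bad_rows and df_other are made up for this example.

```r
vec1 <- c(1, 2, 3, 4, 5)
vec2 <- c(6, 7, 8, 9, 10)
df1 <- data.frame(matrix(1:30, nrow = 5))
df2 <- data.frame(matrix(31:60, nrow = 5))

# Two vectors of length 5 give a 2 x 5 matrix, one row per vector
vv_rows <- rbind(vec1, vec2)
dim(vv_rows)
rownames(vv_rows)

# Data frames with the same column names are stacked on top of each other
dd_rows <- rbind(df1, df2)
dim(dd_rows)

# Data frames with a different number of columns are rejected
df_other <- data.frame(a = 1:5, b = 6:10)
bad_rows <- tryCatch(rbind(df1, df_other), error = function(e) "error")
bad_rows
```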

3. t(): The t() function transposes a matrix, that is, it turns the rows into columns and the columns into rows.

Code:

t(mat1)

Code:

t(mat2)

Output:

[Image: output of t(mat1) and t(mat2)]
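A quick way to see what t() does is to compare dimensions and individual elements before and after the call. This is a minimal sketch; t_mat1 is a name chosen for this example.

```r
mat1 <- matrix(1:15, nrow = 5, ncol = 3)

# A 5 x 3 matrix becomes a 3 x 5 matrix
t_mat1 <- t(mat1)
dim(t_mat1)

# Element [i, j] of the original is element [j, i] of the transpose
mat1[4, 2] == t_mat1[2, 4]

# Transposing twice gives back the original matrix
identical(t(t_mat1), mat1)
```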

The tidyr Package

The tidyr package is one of the most commonly used R packages for data reshaping. As the name suggests, tidyr helps you tidy your data.

It allows you to convert your data into the desired format, which makes it easier to process and analyze. In short, tidyr simplifies the process of data reshaping.

To install tidyr, use the following command:

install.packages("tidyr")

Once installed, load it into your current R session by using the library() command:

library(tidyr)

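The following tidyr functions are very useful for reshaping data and keeping it tidy. Note that in tidyr 1.0.0 and later, gather() and spread() are marked as superseded by pivot_longer() and pivot_wider(), although they still work.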

  1. gather()
  2. spread()
  3. unite()
  4. separate()

Let’s look at these functions and their usage:

1. gather() Function

The gather() function helps us in reshaping wide-format data-frames to long-format.

Sometimes, datasets have attributes of common concern spread across different columns. This creates unnecessary variables. Such a dataset is said to be in the wide-format.

It would be more efficient to stack similar attributes together and turn the dataset into long-format. The gather() function allows us to do that.

Code:

month <- month.abb[1:3]
# Possible temperatures from -5 to 47 in steps of 0.01
temps <- seq(-5, 47, by = 0.01)
# Draw three random temperatures per city, with replacement
delhi <- sample(temps, 3, replace = TRUE)
mumbai <- sample(temps, 3, replace = TRUE)
chennai <- sample(temps, 3, replace = TRUE)
bangalore <- sample(temps, 3, replace = TRUE)
kolkata <- sample(temps, 3, replace = TRUE)
data <- data.frame(month, delhi, mumbai, bangalore, chennai, kolkata)
data

Output:

[Image: the wide-format data frame data]

Code:

gathered_data <- gather(data,key="city",value="avg.temp",-month)
gathered_data

Output:

[Image: output of gather(), the long-format gathered_data]
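In tidyr 1.0.0 and later, the package documentation describes gather() as superseded by pivot_longer(). The sketch below shows what the same kind of reshaping might look like with the newer function. It assumes a tidyr version that provides pivot_longer(). The temperature values are made up for illustration, and the name long_data is not part of the original example.

```r
library(tidyr)

# Small wide-format data frame with made-up temperatures
data <- data.frame(
  month  = month.abb[1:3],
  delhi  = c(14.2, 17.5, 23.1),
  mumbai = c(24.0, 25.3, 27.8)
)

# Stack every column except month into city / avg.temp pairs
long_data <- pivot_longer(
  data,
  cols = -month,
  names_to = "city",
  values_to = "avg.temp"
)
long_data
```

Unlike gather(), which stacks the data one city column at a time, pivot_longer() works row by row, so the rows come out in a different order.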

2. spread() Function

The spread() function is the complement to the gather() function. It spreads long-format data-frames to wide-format.

Code:

spread_data <- spread(gathered_data,key="city",value="avg.temp")
spread_data

Output:

[Image: output of spread(), the wide-format spread_data]
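The newer counterpart of spread() is pivot_wider(). Here is a minimal sketch, again assuming a tidyr version of 1.0.0 or later and using made-up temperatures. The names long_data and wide_data are chosen for this example only.

```r
library(tidyr)

# Small long-format data frame with made-up temperatures
long_data <- data.frame(
  month    = rep(month.abb[1:3], times = 2),
  city     = rep(c("delhi", "mumbai"), each = 3),
  avg.temp = c(14.2, 17.5, 23.1, 24.0, 25.3, 27.8)
)

# One column per city, one row per month
wide_data <- pivot_wider(long_data, names_from = city, values_from = avg.temp)
wide_data
```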

3. unite() Function

Take a look at the following dataset:

S.no  Month  Year  Temp.
1     jan    2018  4.64
2     feb    2018  19.68
3     jan    2019  2.56
4     mar    2019  36.74

In the dataset, the month and year have separate columns. It looks inefficient, doesn’t it?

The two variables month and year can be in the same column without affecting the information conveyed by the data. This is exactly what the unite() function does.

Code:

months <- c("jan","feb","jan","mar")
year <- c("2018","2018","2019","2019")
temp <- c(4.64,19.68,2.56,36.74)
delhi_temp <- data.frame(months,year,temp)
delhi_temp

Output:

[Image: the delhi_temp data frame]

Code:

united_delhi_temp <- unite(delhi_temp,"interval",months,year)
united_delhi_temp

Output:

[Image: output of unite(), united_delhi_temp]
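By default, unite() joins the values with an underscore and drops the original columns. The sketch below shows how this could be changed with the sep and remove arguments. The name united_keep is introduced only for this illustration.

```r
library(tidyr)

delhi_temp <- data.frame(
  months = c("jan", "feb", "jan", "mar"),
  year   = c("2018", "2018", "2019", "2019"),
  temp   = c(4.64, 19.68, 2.56, 36.74)
)

# sep sets the separator (the default is "_");
# remove = FALSE keeps the original columns next to the new one
united_keep <- unite(delhi_temp, "interval", months, year,
                     sep = "-", remove = FALSE)
united_keep
```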

4. separate() Function

The separate() function is the complement to the unite() function. It splits the values of a single column into several columns.

Code:

sep_delhi_temp <- separate(united_delhi_temp,
                           interval, c("month", "year"))
sep_delhi_temp

Output:

[Image: output of separate(), sep_delhi_temp]
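By default, separate() splits on any run of non-alphanumeric characters and leaves the new columns as character strings. The sketch below passes the separator explicitly and asks for type conversion. The data frame is re-created by hand here, and sep_typed is a name chosen for this example.

```r
library(tidyr)

united_delhi_temp <- data.frame(
  interval = c("jan_2018", "feb_2018", "jan_2019", "mar_2019"),
  temp     = c(4.64, 19.68, 2.56, 36.74)
)

# sep names the separator explicitly; convert = TRUE lets
# separate() turn the year strings into integers
sep_typed <- separate(united_delhi_temp, interval,
                      into = c("month", "year"),
                      sep = "_", convert = TRUE)
str(sep_typed)
```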

The reshape2 Package

reshape2 is another R package that is used for data reshaping. It can be considered a predecessor of the tidyr package.

The reshape2 package is retired: it receives only the changes needed to keep it on CRAN. Its best-known function is melt(). This section also covers merge(), which is a base R function rather than part of reshape2, but which is often used alongside it to combine data frames.

1. melt() Function

The melt() function is very similar to the gather() function from the tidyr package. It melts the input data frame and converts wide-format data into long-format. For example:

Code:

# melt() comes from the reshape2 package, so load it first
library(reshape2)
mdata <- melt(data,id=c("month"),variable.name="city", value.name="avg.temp")
mdata

Output:

[Image: output of melt(), mdata]
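reshape2 also has a counterpart to melt(): dcast() casts long-format data back into wide format using a formula. Below is a minimal sketch, assuming reshape2 is installed and using made-up temperatures. The name wide_again is introduced only for this example.

```r
library(reshape2)

# Small wide-format data frame with made-up temperatures
data <- data.frame(
  month  = month.abb[1:3],
  delhi  = c(14.2, 17.5, 23.1),
  mumbai = c(24.0, 25.3, 27.8)
)

# Wide to long, as in the melt() example
mdata <- melt(data, id.vars = "month",
              variable.name = "city", value.name = "avg.temp")

# Long back to wide: one row per month, one column per city
wide_again <- dcast(mdata, month ~ city, value.var = "avg.temp")
wide_again
```

Note that dcast() orders the rows by the values of month, so they may not come back in calendar order.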

2. merge() Function

The merge() function comes from base R, not from reshape2. It joins two data frames horizontally by matching rows on one or more common key columns, which are named in the by argument. If by is not given, merge() uses all columns whose names appear in both data frames. For example:

Code:

months2 <- c("apr", "mar", "feb", "jun")
year2 <- c("2018","2018","2019","2019")
temp2 <- c(38.75,37.68,28.56,41.74)
delhi_temp2 <- data.frame(months2,year2,temp2)
colnames(delhi_temp2) <- c("months","year","temp")
delhi_temp2

Output:

[Image: the delhi_temp2 data frame]

Code:

merge_delhi_temp <- merge(delhi_temp,delhi_temp2,by="year")
merge_delhi_temp

Output:

[Image: output of merge(), merge_delhi_temp]
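The choice of key columns changes the result considerably. The sketch below re-creates the two data frames and compares a merge on year alone with a merge on both months and year that also keeps unmatched rows (all = TRUE). The names by_year and all_rows are introduced only for this illustration.

```r
delhi_temp <- data.frame(
  months = c("jan", "feb", "jan", "mar"),
  year   = c("2018", "2018", "2019", "2019"),
  temp   = c(4.64, 19.68, 2.56, 36.74)
)
delhi_temp2 <- data.frame(
  months = c("apr", "mar", "feb", "jun"),
  year   = c("2018", "2018", "2019", "2019"),
  temp   = c(38.75, 37.68, 28.56, 41.74)
)

# by = "year": every 2018 row pairs with every 2018 row, and so on.
# Non-key columns that share a name get the suffixes .x and .y
by_year <- merge(delhi_temp, delhi_temp2, by = "year")
names(by_year)
nrow(by_year)

# Two key columns, keeping unmatched rows from both sides
all_rows <- merge(delhi_temp, delhi_temp2,
                  by = c("months", "year"), all = TRUE)
names(all_rows)
nrow(all_rows)
```

Because no month and year pair appears in both data frames, the second merge only finds rows when all = TRUE is set, and it fills the missing temperatures with NA.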

Summary

Data reshaping is the first step of any data analysis project. It is also called data formatting and data cleaning.

In this article, we looked at the functions in base R that allow us to reshape and transform our data.

We also looked at the most popular data processing and transformation package, the tidyr package. The tidyr package is a part of the tidyverse collection of R packages. It is a must-have for beginner, intermediate and advanced R programmers.

Still have some doubts about data reshaping in R? Ask us and our TechVidvan experts will be happy to help you.
