For an analysis project, the gathered data is messy and unstructured most of the time. It is collected from different sources, has different variables, and has irregular formats.

## What is Data Reshaping in R?

Before we can perform any kind of analysis, we first need to shape the gathered data into a regular and processable format. We need to ensure that all of the data fits into proper variables.

We also need to take care of missing values and put placeholders in their place that our analysis tools can understand.

This is the first step in any analysis project. We call this process **data reshaping**.
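As a minimal sketch of the placeholder idea, assume a small, made-up data frame in which `-999` and an empty string were used to mark missing readings (both the data and the sentinel codes are hypothetical). Replacing them with `NA` gives R's analysis functions a placeholder they understand:

```r
# Hypothetical raw readings: -999 and "" stand in for missing values.
raw <- data.frame(
  city = c("delhi", "mumbai", "", "kolkata"),
  temp = c(31.2, -999, 28.4, -999),
  stringsAsFactors = FALSE
)

# Replace the sentinel codes with NA, the missing-value placeholder in R.
clean <- raw
clean$temp[clean$temp == -999] <- NA
clean$city[clean$city == ""] <- NA

# NA-aware functions can now skip the gaps explicitly.
mean_temp <- mean(clean$temp, na.rm = TRUE)
missing_per_column <- colSums(is.na(clean))

print(clean)
print(mean_temp)
print(missing_per_column)
```

Without `na.rm = TRUE`, `mean()` would return `NA`, which is usually a helpful reminder that the gaps still need a decision.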

Now let’s learn popular functions used for data reshaping in R.

### The cbind(), rbind(), and t() Functions

R provides many functions for manipulating data objects.

**cbind()**, **rbind()**, and **t()** are among the most commonly used base R functions for data reshaping. We will go through them one by one:

**1. cbind():** The cbind() function allows us to **join objects as columns**. We can combine matrices, data frames, **vectors**, or any combination of these.

**Code:**

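    # Two vectors, two 5x3 matrices, and two data frames with five rows each
    vec1 <- c(1, 2, 3, 4, 5)
    vec2 <- c(6, 7, 8, 9, 10)
    mat1 <- matrix(1:15, nrow = 5, ncol = 3)
    mat2 <- matrix(16:30, nrow = 5, ncol = 3)
    df1 <- data.frame(matrix(1:30, nrow = 5))
    df2 <- data.frame(matrix(31:60, nrow = 5))

    # Join the two vectors as columns
    cbind(vec1, vec2)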

**Code:**

cbind(mat1,mat2)

**Code:**

cbind(df1,df2)

**Code:**

cbind(vec1,mat2)

**Code:**

cbind(vec2,df1)

**Code:**

cbind(mat1,df2)

**Note:** The number of rows should be the same for the **cbind()** function to work.
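To illustrate the row-count rule, here is a small sketch with made-up matrices (the names `m_five`, `m_four`, and `flag` are illustrative only). It catches the error that `cbind()` raises for mismatched matrices and also shows that a shorter vector is recycled to fit the rows of a matrix:

```r
m_five <- matrix(1:15, nrow = 5)   # 5 rows, 3 columns
m_four <- matrix(1:12, nrow = 4)   # 4 rows, 3 columns

# Matrices with different row counts cannot be joined as columns;
# tryCatch() captures the error message instead of stopping the script.
result <- tryCatch(
  cbind(m_five, m_four),
  error = function(e) conditionMessage(e)
)
print(result)

# A shorter vector is recycled to fill the four rows of the matrix.
recycled <- cbind(flag = c(0, 1), matrix(1:8, nrow = 4))
print(recycled)
```

Recycling is convenient for constants and flags, but it can hide a length mismatch, so it is worth checking `nrow()` or `length()` before binding.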

**2. rbind():** The **rbind()** function **allows us to join objects as rows**.

**Code:**

rbind(vec1,vec2)

**Code:**

rbind(vec1,df2)

**Note:** The number of columns should be the same for the **rbind()** function to work.
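For data frames, the columns also need matching names. The sketch below uses small, made-up data frames (`q1`, `q2`, and `q3` are hypothetical) to show one successful stack and one mismatch that `rbind()` rejects:

```r
q1 <- data.frame(month = c("jan", "feb"), temp = c(4.64, 19.68))
q2 <- data.frame(month = c("mar", "apr"), temp = c(36.74, 38.75))

# Both frames have the same columns, so the rows stack cleanly.
stacked <- rbind(q1, q2)
print(stacked)

# A frame with a different column name is rejected;
# tryCatch() captures the error message instead of stopping the script.
q3 <- data.frame(month = "may", temperature = 41.2)
mismatch <- tryCatch(
  rbind(q1, q3),
  error = function(e) conditionMessage(e)
)
print(mismatch)
```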

**3. t():** The **t()** function transposes a **matrix**, that is, it **turns the rows into columns and the columns into rows**.

**Code:**

t(mat1)

**Code:**

t(mat2)

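A quick way to check what a transpose does is to compare dimensions and individual elements. This sketch uses an illustrative matrix `mat` and a made-up data frame `df`; it also shows that calling `t()` on a data frame returns a matrix rather than a data frame:

```r
mat <- matrix(1:15, nrow = 5, ncol = 3)
flipped <- t(mat)

print(dim(mat))      # rows, columns of the original
print(dim(flipped))  # the two dimensions are swapped

# The element at row 4, column 2 moves to row 2, column 4.
same_value <- mat[4, 2] == flipped[2, 4]
print(same_value)

# Transposing twice gives back the original matrix.
round_trip <- identical(t(flipped), mat)
print(round_trip)

# t() on a data frame converts it to a matrix first.
df <- data.frame(a = 1:3, b = 4:6)
df_t <- t(df)
print(is.matrix(df_t))
```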

### The tidyr Package

The **tidyr** package is one of the most **commonly used** R packages for data reshaping. `tidyr` **helps you tidy your data.** It lets you convert your data into the desired format, which makes it easier to process and analyze and simplifies data reshaping.

To install **tidyr**, use the following command:

install.packages("tidyr")

Once installed, load it into your current R session with the **library()** command:

library(tidyr)

The following **tidyr** functions are very useful for reshaping data and keeping it tidy:

- gather()
- spread()
- unite()
- separate()

Let’s look at these **functions** and their usage:

#### 1. gather() Function

The **gather()** function helps us reshape wide-format data frames into long format.

Sometimes a dataset spreads values of the same kind across several columns, for example one temperature column per city. This creates unnecessary variables. Such a dataset is said to be in wide format.

It would be more efficient to stack similar attributes together and turn the dataset into long format. The **gather()** function allows us to do that.

**Code:**

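    # Three months of randomly generated temperatures for five cities
    month <- month.abb[1:3]
    delhi <- sample(seq(-5, 47, by = 0.01), 3, replace = TRUE)
    mumbai <- sample(seq(-5, 47, by = 0.01), 3, replace = TRUE)
    chennai <- sample(seq(-5, 47, by = 0.01), 3, replace = TRUE)
    bangalore <- sample(seq(-5, 47, by = 0.01), 3, replace = TRUE)
    kolkata <- sample(seq(-5, 47, by = 0.01), 3, replace = TRUE)

    # Wide format: one column per city
    data <- data.frame(month, delhi, mumbai, bangalore, chennai, kolkata)
    data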

**Code:**

    # Stack every column except month into city/avg.temp pairs
    gathered_data <- gather(data, key = "city", value = "avg.temp", -month)
    gathered_data

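In tidyr 1.0.0 and later, `gather()` still works but is superseded by `pivot_longer()`. The sketch below shows the same wide-to-long reshaping with `pivot_longer()`, using a small data frame with made-up, fixed temperatures (the `temps` values are hypothetical) so that the result is reproducible:

```r
library(tidyr)

# Hypothetical fixed values instead of random samples.
temps <- data.frame(
  month  = c("Jan", "Feb", "Mar"),
  delhi  = c(14.1, 17.6, 23.0),
  mumbai = c(24.5, 25.2, 27.3)
)

# Every column except month becomes a (city, avg.temp) pair.
long_temps <- pivot_longer(
  temps,
  cols = -month,
  names_to = "city",
  values_to = "avg.temp"
)
print(long_temps)
```

The result has one row per month and city. Note that `pivot_longer()` returns a tibble and orders the rows month by month, whereas `gather()` stacks one city column after another.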

#### 2. spread() Function

The **spread()** function is the complement to the `gather()` function. It spreads long-format data frames to wide format.

**Code:**

    # Turn each distinct city back into its own column
    spread_data <- spread(gathered_data, key = "city", value = "avg.temp")
    spread_data

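The counterpart that supersedes `spread()` in tidyr 1.0.0 and later is `pivot_wider()`. As a sketch, assuming a small long-format data frame with made-up values (`long_temps` here is hypothetical and defined inline):

```r
library(tidyr)

# Hypothetical long-format data: one row per month and city.
long_temps <- data.frame(
  month    = c("Jan", "Jan", "Feb", "Feb"),
  city     = c("delhi", "mumbai", "delhi", "mumbai"),
  avg.temp = c(14.1, 24.5, 17.6, 25.2)
)

# Each distinct city becomes a column filled from avg.temp.
wide_temps <- pivot_wider(
  long_temps,
  names_from = city,
  values_from = avg.temp
)
print(wide_temps)
```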

#### 3. unite() Function

Take a look at the following dataset:

| S.no | Month | Year | Temp. |
|------|-------|------|-------|
| 1 | jan | 2018 | 4.64 |
| 2 | feb | 2018 | 19.68 |
| 3 | jan | 2019 | 2.56 |
| 4 | mar | 2019 | 36.74 |

In the dataset, the month and year have separate columns. It looks inefficient, doesn’t it?

The two variables month and year can be in the same column without affecting the information conveyed by the data. This is exactly what the **unite()** function does.

**Code:**

    months <- c("jan", "feb", "jan", "mar")
    year <- c("2018", "2018", "2019", "2019")
    temp <- c(4.64, 19.68, 2.56, 36.74)
    delhi_temp <- data.frame(months, year, temp)
    delhi_temp

**Code:**

    # Combine months and year into one "interval" column (joined with "_" by default)
    united_delhi_temp <- unite(delhi_temp, "interval", months, year)
    united_delhi_temp

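`unite()` also takes a `sep` argument for the separator and a `remove` argument that controls whether the original columns are dropped. A minimal sketch, using a shortened, illustrative copy of the data (`readings` is a hypothetical name):

```r
library(tidyr)

readings <- data.frame(
  months = c("jan", "feb"),
  year   = c("2018", "2018"),
  temp   = c(4.64, 19.68)
)

# Join with a dash instead of the default underscore.
dashed <- unite(readings, "interval", months, year, sep = "-")
print(dashed)

# Keep the original columns next to the new one.
kept <- unite(readings, "interval", months, year, sep = "-", remove = FALSE)
print(kept)
```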

#### 4. separate() Function

The **separate()** function is the complement to the `unite()` function. It separates values into separate columns.

**Code:**

    # Split "interval" back into month and year columns
    sep_delhi_temp <- separate(united_delhi_temp, interval, c("month", "year"))
    sep_delhi_temp

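By default, `separate()` splits on any non-alphanumeric character and leaves the new columns as character vectors. The sketch below, built on a small made-up data frame (`united` is hypothetical), names the separator explicitly and uses `convert = TRUE` so that the year column comes back as a number:

```r
library(tidyr)

united <- data.frame(
  interval = c("jan_2018", "feb_2018"),
  temp     = c(4.64, 19.68)
)

# Split on the underscore and let tidyr convert column types where it can.
parts <- separate(
  united,
  interval,
  into = c("month", "year"),
  sep = "_",
  convert = TRUE
)
print(parts)
print(class(parts$year))
```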

### The Reshape2 Package

Reshape2 is another R package that is used for data reshaping. It can be considered a predecessor of the **tidyr** package.

The development of the reshape2 package has stopped. Two functions commonly used alongside it are **melt()**, which comes from reshape2, and `merge()`, which is part of base R.

#### 1. melt() Function

The **melt()** function is very similar to the `gather()` function from the `tidyr` package. It melts the input **data frame** and converts wide-format data into long format. For example:

**Code:**

    library(reshape2)

    # Keep month as the identifier and melt every other column
    mdata <- melt(data, id.vars = "month", variable.name = "city", value.name = "avg.temp")
    mdata

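reshape2 also provides `dcast()`, which is not covered in this article, for going back from long to wide format. As a rough sketch with made-up, fixed values (the `temps` data frame is hypothetical):

```r
library(reshape2)

temps <- data.frame(
  month  = c("Jan", "Feb"),
  delhi  = c(14.1, 17.6),
  mumbai = c(24.5, 25.2)
)

# Wide to long: one row per month and city.
long_temps <- melt(
  temps,
  id.vars = "month",
  variable.name = "city",
  value.name = "avg.temp"
)
print(long_temps)

# Long to wide again: one row per month, one column per city.
wide_again <- dcast(long_temps, month ~ city, value.var = "avg.temp")
print(wide_again)
```

The row order of `wide_again` may differ from that of `temps`, so it is safer to compare values by month than by position.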

#### 2. merge() Function

The **merge()** function joins two data frames horizontally by matching rows on one or more shared columns. The data frames should therefore have at least one variable, that is, one column name, in common. For example:

**Code:**

    months2 <- c("apr", "mar", "feb", "jun")
    year2 <- c("2018", "2018", "2019", "2019")
    temp2 <- c(38.75, 37.68, 28.56, 41.74)
    delhi_temp2 <- data.frame(months2, year2, temp2)

    # Rename the columns so they match those of delhi_temp
    colnames(delhi_temp2) <- c("months", "year", "temp")
    delhi_temp2

**Code:**

    # Join the two data frames on their shared "year" column
    merge_delhi_temp <- merge(delhi_temp, delhi_temp2, by = "year")
    merge_delhi_temp

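`merge()` can also match on several columns at once, and its `all` argument decides whether unmatched rows are kept. The sketch below uses two small, made-up data frames (`first`, `second`, and the `rain` column are hypothetical) to contrast the default behavior with `all = TRUE`:

```r
first <- data.frame(
  months = c("jan", "feb", "mar"),
  year   = c("2019", "2019", "2019"),
  temp   = c(2.56, 28.56, 36.74)
)
second <- data.frame(
  months = c("feb", "mar", "apr"),
  year   = c("2019", "2019", "2019"),
  rain   = c(20.1, 12.3, 8.9)
)

# Default: keep only the month/year pairs present in both data frames.
inner <- merge(first, second, by = c("months", "year"))
print(inner)

# all = TRUE keeps every row and fills the gaps with NA.
outer <- merge(first, second, by = c("months", "year"), all = TRUE)
print(outer)
```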

## Summary

Data reshaping is the first step of any data analysis project. It is closely related to **data formatting** and **data cleaning**.

In this article, we looked at the functions in base R that allow us to reshape and transform our data.

We also looked at the most popular data processing and transformation package, the **tidyr** package. The `tidyr` package is a part of the tidyverse collection of R packages. It is a must-have for beginner, intermediate and advanced R programmers.

