R Factors – Operating on Factors and Factor Levels

by TechVidvan Team

Factors are data structures in R that store categorical data. Datasets often contain fields that take only a few predefined values, for example gender, availability, country, or marital status. Such data is called categorical data.

R factors only allow values that are among a predefined set of possible values known as levels.

Today, we are going to learn everything about factors in the R programming language.

Get ready for an exciting tutorial, because here it comes!

Categorical vs. Continuous Variables

In a dataset, there are two types of variables:

Continuous variables
Categorical variables

Continuous variables can take any value within a range and are usually numeric. Examples of such variables would be height, weight, temperature, etc.

Categorical variables can only take values from a finite set of possible values. They can be logical, character, numeric or integer values. Examples of categorical variables would be gender, marital status, country, yes/no flags, etc.

Factors in R

Factors are data structures in R that store categorical data. They have a levels attribute that holds all the possible values that elements of the factor can take.

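R factors can be created from vectors of any atomic type, such as character, numeric or logical. Internally, R stores a factor as integer codes that point into its levels, and the levels themselves are stored as character strings. Factors only allow values permitted by the levels: assigning a value that is not among the levels produces NA, with a warning.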
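The snippet below is a minimal sketch of how a factor and its levels attribute can be inspected. The variable name status and its data are only illustrative and do not come from the examples in this article.

```r
# Hypothetical example data: marital status for five people
status <- factor(c("single", "married", "single", "divorced", "married"))

levels(status)      # the permitted values, sorted alphabetically by default
as.integer(status)  # the integer codes that index into levels(status)
typeof(status)      # a factor is stored as an integer vector...
class(status)       # ...that carries the class "factor"
```

Each element of the factor is a position in levels(status), which is why only values that appear in the levels can be represented.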

Creating a Factor

We use the factor() function to create factors. The following is the syntax of the factor() function:

factor_name <- factor(x = character(), levels, labels = levels, exclude = NA, ordered = is.ordered(x), nmax = NA)

Where x is a vector with the data for the factor,

levels is an optional vector with the unique values that x might take,

labels is an optional vector of labels for the levels in the factor,

exclude is a vector of values that are left out when forming the levels of the factor (elements of x with these values become NA),

ordered is a logical value that determines whether the factor is an ordered or unordered factor,

nmax is an upper limit on the number of levels.
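As a rough illustration of the levels, labels and exclude arguments described above, the sketch below builds two factors from the same vector. The answers vector and the names f_labels and f_excl are hypothetical and chosen only for this example.

```r
# Hypothetical survey answers coded as single letters
answers <- c("y", "n", "n", "u", "y")

# levels fixes the permitted values and their order;
# labels renames each level, matched by position
f_labels <- factor(answers,
                   levels = c("n", "y", "u"),
                   labels = c("no", "yes", "unknown"))
f_labels

# exclude removes "u" from the levels; the matching element becomes NA
f_excl <- factor(answers, exclude = "u")
f_excl
```

Because labels are matched to levels by position, both vectors should list the categories in the same order.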

Code:

> gender <- c("male","male","female","male","female","male")
> fac1 <- factor(gender)
> fac1

Output:

[1] male male female male female male
Levels: female male

Indexing R Factor

We can access the elements of a factor with the same indexing techniques that we use for a vector.

Need a refresher on the basic data structures in R? Review the concept of R vectors.

1. Using positive integers

We can index factors by using positive integers or vectors of positive integers.

Code:

> fac1[3]

Output:

[1] female
Levels: female male

Code:

> fac1[c(2:4)]

Output:

[1] male female male
Levels: female male

2. Using negative integers

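We can use negative integers or vectors of negative integers to exclude certain elements from an R factor.

Code:

> fac1[-2]

Output:

[1] male female male female male
Levels: female male

Code:

> fac1[c(-3,-4)]

Output:

[1] male male female male
Levels: female male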

3. Using logical vectors

We can index R factors by using logical vectors.

Code:

> fac1[c(T,F,T,T,F,F)]

Output:

[1] male female male
Levels: female male

Code:

> fac1[c(T,F,T)] #logical index vector recycling

Output:

[1] male female male male
Levels: female male
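In practice, the logical vector often comes from a comparison rather than being typed out by hand. The following is a small sketch of that pattern; it recreates fac1 so that it runs on its own, and the name is_female is introduced only for this example.

```r
gender <- c("male", "male", "female", "male", "female", "male")
fac1 <- factor(gender)

# A comparison on a factor yields a logical vector usable as an index
is_female <- fac1 == "female"
is_female
fac1[is_female]

# Subsetting keeps all levels by default; drop = TRUE discards unused ones
fac1[is_female, drop = TRUE]
```

Note that the plain subset still lists "male" among its levels even though no element has that value.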

Modifying the R Factor

We can modify existing values and add new ones by assigning to elements of the factor. The new values must be among the levels of the factor.

Initial state of factor:

> fac1

Output:

[1] male male female male female male
Levels: female male

Example 1:

> fac1[2] <- "female"
> fac1

Output:

[1] male female female male female male
Levels: female male

Example 2:

> fac1[7] <- "other"

Output:

Warning message:
In `[<-.factor`(`*tmp*`, 7, value = "other") :
invalid factor level, NA generated

Note: a factor does not allow values that are not in its levels.

Final state of factor:

> fac1

Output:

[1] male female female male female male <NA>
Levels: female male

Adding and Dropping Levels of R Factor

To add a value that does not exist in the levels of a factor, we must add it to the levels first.

Code:

> levels(fac1) <- c("female","male","other")
> fac1

Output:

[1] male female female male female male <NA>
Levels: female male other

We can also remove levels from a factor by using the droplevels() function. This function removes unused levels from a factor.

Code:

> droplevels(fac1)

Output:

[1] male female female male female male <NA>
Levels: female male
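Putting the two steps together, the sketch below first extends the levels and then assigns the new value, which avoids the NA seen earlier. The factor fac2 is a hypothetical example created just for this snippet.

```r
fac2 <- factor(c("male", "female", "male"))

# Extend the levels first, keeping the existing ones in their current order
levels(fac2) <- c(levels(fac2), "other")

# The assignment now succeeds without a warning
fac2[4] <- "other"
fac2

# In this subset "other" is unused, so droplevels() removes it again
droplevels(fac2[1:3])
```

Appending to levels(fac2) rather than retyping the whole vector is a deliberate choice: assigning to levels() renames levels by position, so listing the existing levels in a different order would silently relabel the data.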

Ordered Factors in R

We can classify R factors as ordered or unordered. By default, the levels are arranged in alphabetical order and are all considered equal irrespective of their arrangement. For comparison purposes, these levels can be ordered according to increasing weight or value.

Here is an example of an unordered R factor:

Code:

> sizes <- c("s","m","s","l","m","xs","l","m","xl","xxl","s", "l","xs","xl","m","l")
> sizef <- factor(sizes)
> sizef

Output:

[1] s m s l m xs l m xl xxl s l xs xl m l
Levels: l m s xl xs xxl

We can provide a specific order of levels by using the levels = argument of the factor() function. However, the resultant factor is still considered unordered.

Code:

> sizef <- factor(sizes,levels=c("xs","s","m","l","xl","xxl"))
> sizef

Output:

[1] s m s l m xs l m xl xxl s l xs xl m l
Levels: xs s m l xl xxl

To create an ordered factor, we can use the ordered() function, which accepts the same arguments as the factor() function.

Code:

> sizeof <- ordered(sizes,levels=c("xs","s","m","l","xl","xxl"))
> sizeof

Output:

[1] s m s l m xs l m xl xxl s l xs xl m l
Levels: xs < s < m < l < xl < xxl

We can also convert unordered factors into ordered ones by using the as.ordered() function.

Code:

> as.ordered(sizef)

Output:

[1] s m s l m xs l m xl xxl s l xs xl m l
Levels: xs < s < m < l < xl < xxl
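The practical benefit of an ordered factor is that its elements can be compared. The sketch below shows a few such comparisons on a shortened, hypothetical version of the sizes data; the name size_ord is introduced only for this example.

```r
sizes <- c("s", "m", "xl", "xs", "l")
size_ord <- factor(sizes,
                   levels = c("xs", "s", "m", "l", "xl", "xxl"),
                   ordered = TRUE)

size_ord < "l"             # element-wise comparison against a level
size_ord[size_ord >= "m"]  # keep sizes of at least "m"
max(size_ord)              # largest size present in the data
sort(size_ord)             # sorted by level order, not alphabetically
```

The same comparisons on an unordered factor are not meaningful: R returns NA with a warning for operators such as < and >=.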

Functions of R Factors

A few functions give us information about the R factor variables we use or convert other objects into factors, for example is.factor(), as.factor() and is.ordered(). This is what these functions do:

1. `is.factor()`

The is.factor() function checks if a variable is a factor or not. It returns a logical value of TRUE if the variable is a factor and FALSE if it is not.

Code:

> is.factor(fac1)
> is.factor(gender)

Output:

[1] TRUE
[1] FALSE

2. `as.factor()`

The as.factor() function converts vector inputs into factors.

Code:

> as.factor(c(TRUE,FALSE,TRUE,TRUE,FALSE))

Output:

[1] TRUE FALSE TRUE TRUE FALSE
Levels: FALSE TRUE

3. `is.ordered()`

The is.ordered() function checks if the variable is an ordered factor or not. This function returns TRUE if the variable is an ordered factor and FALSE if it is not.

Code:

> is.ordered(sizeof)
> is.ordered(fac1)

Output:

[1] TRUE
[1] FALSE
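One point worth keeping in mind when using as.factor() on numbers is that the resulting factor stores level codes, not the numbers themselves. The sketch below illustrates this with a hypothetical vector of years; the name years is an assumption made for this example.

```r
# Hypothetical numeric data converted to a factor
years <- as.factor(c(2019, 2021, 2019, 2020))
years

as.integer(years)                # the level codes: 1 3 1 2
as.numeric(as.character(years))  # the original numbers
```

Converting through as.character() first is the usual way to recover the original numeric values from such a factor.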

Summary

R factors are data structures that store categorical data. They only allow observations that take one of a predefined set of values. Therefore, they are useful for fields that have a limited number of possible values, like gender, marital status, availability, confirmation, etc.

In this article, we learned about R factors. We learned how to create them and how to modify them. We saw what they are used for. Finally, we looked at a few functions that provide more insights into factors and convert other variables into R factors.

Before you start working with R, you should have a basic understanding of its data structures.

Still have doubts about R factors? Ask us and our TechVidvan experts will be happy to help you.

Keep Learning!
