R Factors – Operating on Factors and Factor Levels

Factors are data structures in R that store categorical data. In datasets, there are often fields that take only a few predefined values. For example – gender, availability, country, marital status, etc. Such data is called categorical data.

R factors only allow values that are among a predefined set of possible values known as levels.

Today, we are going to learn everything about factors in the R programming language.

Get ready for an exciting tutorial, because here it comes!


Categorical vs. Continuous Variables

In a dataset, there are two types of variables:

  1. Continuous variables
  2. Categorical variables

Continuous variables can take any value, usually any number within a range. They are stored as numbers. Examples of such variables would be height, weight, temperature, income, etc.

Categorical variables can only take values from a finite group of possible values. They can be logical, characters, numerics or integers. Examples of categorical variables would be gender, marital status, logical values, country, etc.

Factors in R

Factors are data structures in R that store categorical data. They have a levels attribute that holds all the possible values that elements of the factor can take.

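A factor can be built from a vector of any basic type (character, numeric, logical), but R stores it as integer codes that point into the character vector of levels. Factors only allow values permitted by their levels. If a value that is not among the levels is assigned to a factor, it is stored as NA.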

Creating a Factor

We use the factor() function to create factors. The following is the syntax of the factor() function:

factor_name <- factor(x = character(), levels, labels = levels, exclude = NA, ordered = is.ordered(x), nmax = NA)

Where x is a vector with the data for the factor,

levels is an optional vector of the unique values that x might take (by default, the sorted unique values of x),

labels is an optional vector of labels for the levels in the factor,

exclude is a set of values that are excluded from the levels of the factor (by default NA),

ordered is a logical value that determines whether the factor is an ordered or unordered factor,

nmax is an upper limit on the number of levels.
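To see the optional arguments in action, here is a minimal sketch. The vectors used are made-up illustration data: levels fixes the allowed values and their order, labels renames them, exclude removes a value from the levels, and ordered = TRUE produces an ordered factor.

```r
# Made-up survey answers; "maybe" is not among the allowed levels
answers <- c("y", "n", "y", "maybe", "n")

# levels = allowed values (in this order), labels = display names for them
f <- factor(answers, levels = c("n", "y"), labels = c("No", "Yes"))
print(f)      # "maybe" has no matching level, so it is stored as NA
levels(f)     # "No" "Yes"

# exclude drops "c" from the levels; its occurrences become NA
g <- factor(c("a", "b", "c", "a"), exclude = "c")
levels(g)     # "a" "b"

# ordered = TRUE marks the levels as ranked (low < high)
h <- factor(c("low", "high", "low"), levels = c("low", "high"), ordered = TRUE)
is.ordered(h) # TRUE
```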

Code:

> gender <- c("male","male","female","male","female","male")
> fac1 <- factor(gender)
> fac1

Output:

[1] male male female male female male
Levels: female male
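A factor looks like a character vector when printed, but R stores it differently. This short sketch, using the same gender data, inspects the levels attribute and the integer codes behind fac1.

```r
gender <- c("male", "male", "female", "male", "female", "male")
fac1 <- factor(gender)

levels(fac1)      # "female" "male": the sorted unique values
nlevels(fac1)     # 2
as.integer(fac1)  # 2 2 1 2 1 2: each element's position in levels(fac1)
typeof(fac1)      # "integer": the underlying storage type
class(fac1)       # "factor"
table(fac1)       # counts per level: female 2, male 4
```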

Indexing R Factor

We can use the same indexing techniques as a vector to access the elements of a factor.

Forgot the basic data structure in R? Revise the concept of R vector.

1. Using positive integers

We can index factors by using positive integers or vectors of positive integers.

Code:

> fac1[3]

Output:

[1] female
Levels: female male

Code:

> fac1[2:4]

Output:

[1] male female male
Levels: female male
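Subsetting a factor keeps all of its levels, even ones that no longer occur in the result. A minimal sketch of this behaviour, using the drop argument of the factor indexing method to discard unused levels:

```r
fac1 <- factor(c("male", "male", "female", "male", "female", "male"))

sub <- fac1[3]                      # a single "female" element
levels(sub)                         # "female" "male": the unused level is kept

sub_dropped <- fac1[3, drop = TRUE]
levels(sub_dropped)                 # "female": unused levels are removed
```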

2. Using negative integers

We can use negative integers or vectors of negative integers to exclude certain elements from R factor.

Code:

> fac1[-2]

Output:

[1] male female male female male
Levels: female male

Code:

> fac1[c(-3,-4)]

Output:

[1] male male female male
Levels: female male

3. Using logical vectors

We can index R factors by using logical vectors.

Code:

> fac1[c(TRUE,FALSE,TRUE,TRUE,FALSE,FALSE)]

Output:

[1] male female male
Levels: female male

Code:

> fac1[c(TRUE,FALSE,TRUE)]  # the logical index is recycled to length 6

Output:

[1] male female male male
Levels: female male
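In practice, the logical vector usually comes from a comparison rather than being typed by hand. A small sketch, assuming the original six-element fac1:

```r
fac1 <- factor(c("male", "male", "female", "male", "female", "male"))

is_female <- fac1 == "female"   # comparing with a level gives a logical vector
is_female                       # FALSE FALSE TRUE FALSE TRUE FALSE
fac1[is_female]                 # the two "female" elements
which(is_female)                # 3 5

# %in% tests membership in a set of values
fac1[fac1 %in% c("female", "other")]
```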

Modifying the R Factor

We can modify existing values and add new ones as well by using reassignment. The new values must be among the levels of the factor.

Initial state of factor:

> fac1

Output:

[1] male male female male female male
Levels: female male

Example 1:

> fac1[2] <- "female"
> fac1

Output:

[1] male female female male female male
Levels: female male

Example 2:

> fac1[7] <- "other"

Output:

Warning message:
In `[<-.factor`(`*tmp*`, 7, value = "other") :
  invalid factor level, NA generated

Note: a factor does not allow values that are not among its levels. The assignment still extends fac1 to seven elements, but the new element is stored as NA.

Final state of factor:

> fac1

Output:

[1] male female female male female male <NA>
Levels: female male
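Since an invalid assignment only produces a warning, it is worth checking for the generated NA afterwards. A minimal sketch on a small made-up factor:

```r
fac <- factor(c("male", "female", "female"))
fac[4] <- "other"    # warning: invalid factor level, NA generated

length(fac)          # 4: assigning past the end extends the factor
which(is.na(fac))    # 4: position of the generated NA
levels(fac)          # "female" "male": "other" was not added as a level
```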


Adding and Dropping Levels of R Factor

To add a value that is not among the levels of a factor, we must first add it to the levels.

Code:

> levels(fac1) <- c("female","male","other")
> fac1

Output:

[1] male female female male female male <NA>
Levels: female male other
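Once the new level exists, assigning it works without producing NA. The sketch below appends to the existing levels with c(levels(fac), "other") rather than retyping them, because levels<- relabels the existing levels by position instead of reordering them. The second half shows that pitfall on a made-up factor.

```r
fac <- factor(c("male", "female", "male"))

# Append the new level so the existing code-to-label mapping is preserved
levels(fac) <- c(levels(fac), "other")
fac[4] <- "other"      # valid now: no warning and no NA
table(fac)             # female 1, male 2, other 1

# Pitfall: levels<- renames by position, it does not reorder
g <- factor(c("male", "female", "male"))   # levels: "female" "male"
levels(g) <- c("male", "female")           # code 1 becomes "male", code 2 "female"
as.character(g)                            # "female" "male" "female": values swapped
```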

We can also remove levels from a factor by using the droplevels() function. This function removes unused levels from a factor.

Code:

> droplevels(fac1)

Output:

[1] male female female male female male <NA>
Levels: female male
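Like most R functions, droplevels() returns a modified copy and leaves its input unchanged. This small sketch shows that the result must be reassigned to keep it:

```r
fac <- factor(c("male", "female", "male"), levels = c("female", "male", "other"))

dropped <- droplevels(fac)
levels(dropped)          # "female" "male": the unused "other" level is removed
levels(fac)              # "female" "male" "other": the original is unchanged

fac <- droplevels(fac)   # reassign to keep the result
```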

Ordered Factors in R

We can classify R factors as ordered or unordered. By default, the levels are sorted alphabetically and treated as unordered: their arrangement carries no meaning of rank. When the categories have a natural ranking, such as clothing sizes, the levels can be ordered by increasing weight or value so that they can be compared.

Here is an example of an unordered R factor:

Code:

> sizes <- c("s","m","s","l","m","xs","l","m","xl","xxl","s", "l","xs","xl","m","l")
> sizef <- factor(sizes)
> sizef

Output:

[1] s m s l m xs l m xl xxl s l xs xl m l
Levels: l m s xl xs xxl

We can provide a specific order of levels by using the levels = argument of the factor() function. However, the resultant factor is still considered unordered.

Code:

> sizef <- factor(sizes,levels=c("xs","s","m","l","xl","xxl"))
> sizef

Output:

[1] s m s l m xs l m xl xxl s l xs xl m l
Levels: xs s m l xl xxl
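The level order of an unordered factor still matters for tables and sorting, but it does not make comparisons meaningful. A minimal sketch on a shortened version of the sizes data:

```r
sizes <- c("s", "m", "s", "l")
sizef <- factor(sizes, levels = c("xs", "s", "m", "l", "xl", "xxl"))

names(table(sizef))   # "xs" "s" "m" "l" "xl" "xxl": tables follow level order
sort(sizef)           # s s m l: sorted by level order, not alphabetically
is.ordered(sizef)     # FALSE
sizef[1] < sizef[2]   # NA, with a warning that '<' is not meaningful for factors
```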

We can also use the ordered() function to create an ordered factor. It accepts the same arguments as the factor() function and is equivalent to calling factor() with ordered = TRUE.

Code:

> sizeof <- ordered(sizes,levels=c("xs","s","m","l","xl","xxl"))
> sizeof

Output:

[1] s m s l m xs l m xl xxl s l xs xl m l
Levels: xs < s < m < l < xl < xxl
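With an ordered factor, comparison operators and functions like max() and min() respect the ranking of the levels. A minimal sketch with a few made-up sizes:

```r
sizeof <- ordered(c("s", "m", "xl", "xs"), levels = c("xs", "s", "m", "l", "xl", "xxl"))

sizeof > "m"            # FALSE FALSE TRUE FALSE
sizeof[sizeof >= "s"]   # s m xl
max(sizeof)             # xl
min(sizeof)             # xs
```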

We can also convert unordered factors into ordered ones by using the as.ordered() function.

Code:

> as.ordered(sizef)

Output:

[1] s m s l m xs l m xl xxl s l xs xl m l
Levels: xs < s < m < l < xl < xxl
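as.ordered() keeps whatever level order its input already has. That works for sizef, whose levels were set explicitly, but applied to a plain character vector it ranks the levels alphabetically. A small sketch of the difference:

```r
sizes <- c("s", "m", "xl", "xs")

# On a plain vector, the levels are ranked alphabetically
alpha <- as.ordered(sizes)
levels(alpha)     # "m" "s" "xl" "xs": probably not the intended ranking

# Supplying the levels explicitly gives the intended ranking
ranked <- ordered(sizes, levels = c("xs", "s", "m", "l", "xl", "xxl"))
ranked < "m"      # TRUE FALSE FALSE TRUE
```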

Functions of R Factors

A few functions tell us about the R factor variables we use or convert other variables into factors, such as is.factor(), as.factor() and is.ordered(). This is what these functions do:

1. is.factor()

The is.factor() function checks if a variable is a factor or not. It returns a logical value of TRUE if the variable is a factor and FALSE if it is not.

Code:

> is.factor(fac1)
> is.factor(gender)

Output:

[1] TRUE
[1] FALSE

2. as.factor()

The as.factor() function converts vector inputs into factors.

Code:

> as.factor(c(TRUE,FALSE,TRUE,TRUE,FALSE))

Output:

[1] TRUE FALSE TRUE TRUE FALSE
Levels: FALSE TRUE
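A common use of is.factor() and as.factor() is converting a column of a data frame. In this sketch, df is a made-up data frame; stringsAsFactors = FALSE is set explicitly so the text column starts out as character regardless of the R version.

```r
df <- data.frame(gender = c("male", "female", "male"),
                 age = c(30, 25, 41),
                 stringsAsFactors = FALSE)

is.factor(df$gender)              # FALSE: stored as character
df$gender <- as.factor(df$gender) # convert the column to a factor
sapply(df, is.factor)             # gender TRUE, age FALSE
levels(df$gender)                 # "female" "male"
```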

3. is.ordered()

The is.ordered() function checks if the variable is an ordered factor or not. This function returns TRUE if the variable is an ordered factor and FALSE if it is not.

Code:

> is.ordered(sizeof)
> is.ordered(fac1)

Output:

[1] TRUE
[1] FALSE
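An ordered factor is still a factor, so is.factor() returns TRUE for both kinds and only is.ordered() tells them apart. A quick sketch with small stand-ins for sizeof and fac1:

```r
sizeof <- ordered(c("s", "m", "l"), levels = c("s", "m", "l"))
fac1 <- factor(c("male", "female"))

is.factor(sizeof)    # TRUE: an ordered factor is also a factor
is.ordered(sizeof)   # TRUE
class(sizeof)        # "ordered" "factor"
is.ordered(fac1)     # FALSE
class(fac1)          # "factor"
```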

Summary

R factors are data structures that store categorical data. They only allow observations of certain predefined values. Therefore, they are useful for fields that have a limited number of possible values, like gender, marital status, availability, confirmation, etc.

In this article, we learned about R factors. We learned how to create them and how to modify them. We saw what they are used for. Finally, we looked at a few functions that provide more insights into factors and convert other variables into R factors.

Before you start working with R, you must have a basic understanding of its data structures.

Still have some doubts about R factors? Ask us, and our TechVidvan experts will be happy to help you.


2 Responses

  1. ulpiano says:

    How do we convert factor to numbers?

    • TechVidvan Team says:

      Hey Ulpiano,
      You can use the as.numeric() function, but be careful with factors: as.numeric() returns a factor's internal integer codes, i.e. each value's position in its levels, even when the values themselves are numbers. To get the original numbers back, convert through character first with as.numeric(as.character(f)), or use as.numeric(levels(f))[f].

      Glad to help. If you need anything else please let me know.
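The difference between the codes and the values is easy to check. In this minimal sketch, f is a made-up factor whose labels look like numbers; the two conversion idioms are the ones recommended in the R FAQ.

```r
f <- factor(c(10, 5, 10, 20))   # levels are the strings "5" "10" "20"

as.numeric(f)                   # 2 1 2 3: the integer codes, not the values
as.numeric(as.character(f))     # 10 5 10 20
as.numeric(levels(f))[f]        # 10 5 10 20: converts each level once, then indexes by code
```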
