R Factors – Operating on Factors and Factor Levels
Factors are data structures in R that store categorical data. In datasets, there are often fields that take only a few predefined values. For example – gender, availability, country, marital status, etc. Such data is called categorical data.
R factors only allow values that are among a predefined set of possible values known as levels.
Today, we are going to learn everything about factors in the R programming language.
Get ready for an exciting tutorial, because here it comes!
Categorical vs. Continuous Variables
In a dataset, there are two types of variables:
- Continuous variables
- Categorical variables
Continuous variables can take any numeric value within a range, so they are stored as numerics or integers. Examples of such variables would be height, weight, temperature, income, etc.
Categorical variables can only take values from a finite group of possible values. They can be logical, characters, numerics or integers. Examples of categorical variables would be gender, marital status, logical values, country, etc.
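As a rough illustration of the difference, the sketch below summarises one variable of each kind. The data and the names heights_cm and marital_status are invented for this example.

```r
# Hypothetical sample data, invented for this sketch
heights_cm <- c(162.5, 170.2, 158.9, 181.0, 175.4)   # continuous variable
marital_status <- c("single", "married", "single",
                    "divorced", "married")            # categorical variable

# A continuous variable is usually summarised with numeric statistics
mean(heights_cm)

# A categorical variable is summarised by its distinct values and their counts
unique(marital_status)
table(marital_status)
```

Only three distinct values appear in marital_status, which is what makes it a good candidate for a factor.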
Factors in R
Factors are data structures in R that store categorical data. They have a levels attribute that holds all the possible values that elements of the factor can take.
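R factors can be created from a vector of any atomic type. Internally, each element is stored as an integer code that points to one of the levels. Factors only allow values permitted by the levels. If a value that is not in the levels of a factor is entered into it, R stores NA in its place.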
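A minimal sketch of how a factor is stored, using a made-up vector named blood: the levels attribute holds the permitted values, and each element is an integer code into that attribute.

```r
# Hypothetical data, invented for this sketch
blood <- factor(c("A", "B", "O", "A", "AB"))

levels(blood)      # the permitted values: "A" "AB" "B" "O"
as.integer(blood)  # integer codes into the levels: 1 3 4 1 2
typeof(blood)      # "integer"
class(blood)       # "factor"
```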
Creating a Factor
We use the factor() function to create factors. The following is the syntax of the factor() function:
factor_name <- factor(x = character(), levels, labels = levels, exclude = NA, ordered = is.ordered(x), nmax = NA)
Where:
- x is a vector with the data for the factor,
- levels is an optional vector of the unique values that x might take,
- labels is an optional vector of labels for the levels in the factor,
- exclude is a set of values that are excluded from the levels of the factor,
- ordered is a logical value that determines whether the factor is ordered or unordered,
- nmax is an upper limit on the number of levels.
Code:
> gender <- c("male","male","female","male","female","male")
> fac1 <- factor(gender)
> fac1
Output:
[1] male   male   female male   female male
Levels: female male
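The optional arguments are easier to follow with a small example. The sketch below uses made-up survey answers (the names answers, resp and resp2 are hypothetical) to show levels, labels and exclude in action.

```r
# Hypothetical survey data, invented for this sketch
answers <- c("y", "n", "n", "y", "unknown", "y")

# levels restricts the permitted values; labels renames them in the same order
resp <- factor(answers, levels = c("n", "y"), labels = c("No", "Yes"))
levels(resp)        # "No" "Yes"
which(is.na(resp))  # 5, because "unknown" is not among the levels

# exclude removes a value from the levels that factor() would otherwise infer
resp2 <- factor(answers, exclude = "unknown")
levels(resp2)       # "n" "y"
```

In both cases the "unknown" entry becomes NA rather than raising an error.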
Indexing R Factor
We can access the elements of a factor with the same indexing techniques that we use for vectors.
1. Using positive integers
We can index factors by using positive integers or vectors of positive integers.
Code:
> fac1[3]
Output:
[1] female
Levels: female male
Code:
> fac1[c(2:4)]
Output:
[1] male   female male
Levels: female male
2. Using negative integers
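We can use negative integers or vectors of negative integers to exclude certain elements from an R factor.
Code:
> fac1[-2]
Output:
[1] male   female male   female male
Levels: female male
Code:
> fac1[c(-3,-4)]
Output:
[1] male   male   female male
Levels: female male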
3. Using logical vectors
We can index R factors by using logical vectors.
Code:
> fac1[c(T,F,T,T,F,F)]
Output:
[1] male   female male
Levels: female male
Code:
> fac1[c(T,F,T)] #logical index vector recycling
Output:
[1] male   female male   male
Levels: female male
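One detail worth knowing when indexing: a subset keeps every level of the original factor, even levels that no longer occur. The sketch below, using a made-up factor named colours, shows how the drop argument of the factor subsetting method discards them.

```r
# Hypothetical data, invented for this sketch
colours <- factor(c("red", "green", "blue", "green"))

# Plain indexing keeps all original levels
subset_keep <- colours[c(1, 2)]
levels(subset_keep)   # "blue" "green" "red"

# drop = TRUE removes levels that do not occur in the subset
subset_drop <- colours[c(1, 2), drop = TRUE]
levels(subset_drop)   # "green" "red"
```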
Modifying the R Factor
We can modify existing values and add new ones by assignment. The new values must be among the levels of the factor.
Initial state of factor:
> fac1
Output:
[1] male   male   female male   female male
Levels: female male
Example 1:
> fac1[2] <- "female"
> fac1
Output:
[1] male   female female male   female male
Levels: female male
Example 2:
> fac1[7] <- "other"
Output:
Warning message:
In `[<-.factor`(`*tmp*`, 7, value = "other") :
  invalid factor level, NA generated
Note: a factor does not allow values that are not among its levels. The invalid value is stored as NA instead.
Final state of factor:
> fac1
Output:
[1] male   female female male   female male   <NA>
Levels: female male
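Because an invalid assignment only raises a warning and silently stores NA, it can help to check the value first. The following is one possible guard, not the only approach; the names status and new_value are hypothetical.

```r
# Hypothetical data, invented for this sketch
status <- factor(c("open", "closed", "open"))
new_value <- "pending"

# Assign only when the value is one of the existing levels
if (new_value %in% levels(status)) {
  status[4] <- new_value
} else {
  message("'", new_value, "' is not a level; skipping assignment")
}

length(status)   # 3, the factor is unchanged
anyNA(status)    # FALSE, no NA was introduced
```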
Adding and Dropping Levels of R Factor
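To add a value that is not among the levels of a factor, we must first add it to the levels. The existing levels must keep their positions in the new vector, because this assignment renames levels by position.
Code:
> levels(fac1) <- c("female","male","other")
> fac1
Output:
[1] male   female female male   female male   <NA>
Levels: female male other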
We can also remove levels from a factor by using the droplevels() function. This function removes unused levels from a factor.
Code:
> droplevels(fac1)
Output:
[1] male   female female male   female male   <NA>
Levels: female male
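Putting both steps together, here is a small sketch of the full round trip: append a level, assign a value that uses it, then drop a level that is no longer needed. The factor status is made up for this example, and appending with c(levels(...), ...) is just one way to avoid accidentally renaming the existing levels.

```r
# Hypothetical data, invented for this sketch
status <- factor(c("open", "closed", "open"))

# Append the new level while keeping the existing ones in place
levels(status) <- c(levels(status), "pending")
status[4] <- "pending"          # now valid, no warning
levels(status)                  # "closed" "open" "pending"

# After filtering out "closed", the level remains until we drop it
trimmed <- droplevels(status[status != "closed"])
levels(trimmed)                 # "open" "pending"
```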
Ordered Factors in R
We can classify R factors as ordered or unordered. By default, the levels are arranged in alphabetical order and carry no ranking, whatever their arrangement. For comparison purposes, the levels can be ordered by increasing weight or value.
Here is an example of an unordered R factor:
Code:
> sizes <- c("s","m","s","l","m","xs","l","m","xl","xxl","s","l","xs","xl","m","l")
> sizef <- factor(sizes)
> sizef
Output:
 [1] s   m   s   l   m   xs  l   m   xl  xxl s   l   xs  xl  m   l
Levels: l m s xl xs xxl
We can provide a specific order of levels by using the levels = argument of the factor() function. However, the resultant factor is still considered unordered.
Code:
> sizef <- factor(sizes,levels=c("xs","s","m","l","xl","xxl"))
> sizef
Output:
 [1] s   m   s   l   m   xs  l   m   xl  xxl s   l   xs  xl  m   l
Levels: xs s m l xl xxl
We can also use the ordered() function to create an ordered factor. This function has the same syntax as the factor() function.
Code:
> sizeof <- ordered(sizes,levels=c("xs","s","m","l","xl","xxl"))
> sizeof
Output:
 [1] s   m   s   l   m   xs  l   m   xl  xxl s   l   xs  xl  m   l
Levels: xs < s < m < l < xl < xxl
We can also convert unordered factors into ordered ones by using the as.ordered() function.
Code:
> as.ordered(sizef)
Output:
 [1] s   m   s   l   m   xs  l   m   xl  xxl s   l   xs  xl  m   l
Levels: xs < s < m < l < xl < xxl
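The practical benefit of an ordered factor is that comparison and ranking operations become meaningful. The sketch below illustrates this with a made-up ordered factor named shirt.

```r
# Hypothetical data, invented for this sketch
shirt <- ordered(c("m", "xs", "l", "s"), levels = c("xs", "s", "m", "l"))

shirt < "m"           # FALSE TRUE FALSE TRUE
shirt[1] > shirt[2]   # TRUE, because "m" ranks above "xs"
max(shirt)            # l
sort(shirt)           # xs s m l
```

The same comparisons on an unordered factor produce NA with a warning, since its levels carry no ranking.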
Functions of R Factors
There are a few functions that let us inspect or convert the R factor variables we use, such as is.factor(), as.factor(), and is.ordered(). This is what these functions do:
1. is.factor()
The is.factor() function checks if a variable is a factor or not. It returns a logical value of TRUE if the variable is a factor and FALSE if it is not.
Code:
> is.factor(fac1)
> is.factor(gender)
Output:
[1] TRUE
[1] FALSE
2. as.factor()
The as.factor() function converts vector inputs into factors.
Code:
> as.factor(c(TRUE,FALSE,TRUE,TRUE,FALSE))
Output:
[1] TRUE  FALSE TRUE  TRUE  FALSE
Levels: FALSE TRUE
3. is.ordered()
The is.ordered() function checks if the variable is an ordered factor or not. This function returns TRUE if the variable is an ordered factor and FALSE if it is not.
Code:
> is.ordered(sizeof)
> is.ordered(fac1)
Output:
[1] TRUE
[1] FALSE
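The three functions are often used together when cleaning data. The sketch below runs them on a made-up character vector named cities. It also calls nlevels(), a base R function not covered above, to count the levels.

```r
# Hypothetical data, invented for this sketch
cities <- c("Pune", "Delhi", "Pune", "Mumbai")

is.factor(cities)                # FALSE, still a character vector
city_f <- as.factor(cities)
is.factor(city_f)                # TRUE
is.ordered(city_f)               # FALSE, as.factor() creates unordered factors
is.ordered(as.ordered(city_f))   # TRUE
nlevels(city_f)                  # 3
```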
Summary
R factors are data structures that store categorical data. They only allow observations of certain predefined values. Therefore, they are useful for fields that have a limited number of possible values like gender, marital status, availability, confirmation, etc.
In this article, we learned about R factors. We learned how to create them and how to modify them. We saw what they are used for. Finally, we looked at a few functions that provide more insights into factors and convert other variables into R factors.
Before you start working with R, you should have a basic understanding of its data structures.