# R Factors – Operating on Factors and Factor Levels

Factors are data structures in R that store categorical data. Datasets often contain fields that take only a few predefined values, for example gender, availability, country, or marital status. Such data is called categorical data.

R factors only allow values that are among a predefined set of possible values known as levels.

Today, we are going to learn everything about factors in the R programming language.

Get ready for an exciting tutorial, because here it comes!

### Categorical vs. Continuous Variables

In a dataset, there are two types of variables:

1. Continuous variables
2. Categorical variables

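Continuous variables are numeric and can take any value within a range. Examples of such variables would be height, weight, temperature, etc. Free-text fields such as names, addresses and phone numbers are not categorical either, because their values do not come from a small predefined set.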

Categorical variables can only take values from a finite group of possible values. They can be logical, characters, numerics or integers. Examples of categorical variables would be gender, marital status, logical values, country, etc.

## Factors in R

Factors are data structures in R that store categorical data. They have a levels attribute that holds all the possible values that elements of the factor can take.

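A factor can be built from a vector of any atomic type (character, numeric, integer or logical). Internally, R stores each element as an integer code that points into the levels, and the levels themselves are stored as character strings. Factors only allow values permitted by the levels. If a value that is not among the levels is assigned to a factor, R stores `NA` in its place and issues a warning.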
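To make the relationship between the levels and the stored values concrete, here is a minimal sketch. The vector `blood` is a made-up example, not part of the tutorial's data.

```r
# Hypothetical data: blood groups stored as a factor
blood <- factor(c("A", "O", "B", "O", "AB"))

levels(blood)      # the sorted unique values: "A" "AB" "B" "O"
as.integer(blood)  # the integer codes behind each element: 1 4 3 4 2
typeof(blood)      # "integer"
class(blood)       # "factor"
```

Each element is stored as the position of its value in `levels(blood)`, which is why only values that appear in the levels can be represented.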

### Creating a Factor

We use the `factor()` function to create factors. The following is the syntax of the `factor()` function:

`factor_name <- factor(x = character(), levels, labels = levels, exclude = NA, ordered = is.ordered(x), nmax = NA)`

Where `x` is a vector with the data for the factor,

`levels` is an optional vector with the unique values that `x` might take,

`labels` is an optional vector of labels for the levels in the factor,

`exclude` is a set of values that are excluded from the levels of the factor,

`ordered` is a logical value that determines whether the factor is an ordered or unordered factor,

`nmax` is an upper limit on the number of levels.
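The arguments listed above can be tried out with a short sketch. The `answers` vector and the variable names are hypothetical and only serve as an illustration.

```r
# Hypothetical survey answers; "unknown" is not a real answer
answers <- c("y", "n", "y", "unknown", "n", "y")

# levels fixes the allowed values and their order, labels renames them.
# "unknown" is not among the levels, so it becomes NA.
resp <- factor(answers, levels = c("n", "y"), labels = c("no", "yes"))
resp
levels(resp)   # "no" "yes"

# exclude removes a value from the automatically detected levels
resp2 <- factor(answers, exclude = "unknown")
levels(resp2)  # "n" "y"
```

Note that when `labels` is given, it needs one entry per remaining level, so combining it with `exclude` requires some care.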

Code:

```
> gender <- c("male","male","female","male","female","male")
> fac1 <- factor(gender)
> fac1
```

Output:

[1] male male female male female male
Levels: female male

### Indexing R Factor

We can use the same indexing techniques as a vector to access the elements of a factor.

Forgot the basic data structures in R? Revisit the concept of R vectors.

#### 1. Using positive integers

We can index factors by using positive integers or vectors of positive integers.

Code:

`> fac1[3]`

Output:

[1] female
Levels: female male

Code:

`> fac1[c(2:4)]`

Output:

[1] male female male
Levels: female male

#### 2. Using negative integers

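We can use negative integers or vectors of negative integers to exclude certain elements from an R factor.

Code:

`> fac1[-2]`

Output:

[1] male female male female male
Levels: female male

Code:

`> fac1[c(-3,-4)]`

Output:

[1] male male female male
Levels: female male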

#### 3. Using logical vectors

We can index R factors by using logical vectors.

Code:

`> fac1[c(T,F,T,T,F,F)]`

Output:

[1] male female male
Levels: female male

Code:

`> fac1[c(T,F,T)] #logical index vector recycling`

Output:

[1] male female male male
Levels: female male
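Logical indexing is most often driven by a condition rather than a hand-written vector. The following sketch rebuilds the gender factor under the hypothetical name `fac` so that it runs on its own, and also shows that a subset keeps all the original levels unless `drop = TRUE` is passed.

```r
gender <- c("male", "male", "female", "male", "female", "male")
fac <- factor(gender)

# The comparison yields a logical vector, which is then used as the index
only_male <- fac[fac == "male"]
only_male
levels(only_male)                        # still "female" "male"

# drop = TRUE discards the levels that no longer occur in the subset
levels(fac[fac == "male", drop = TRUE])  # "male"
```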

### Modifying the R Factor

We can modify existing values and add new ones by using assignment. The new values must be among the levels of the factor.

Initial state of factor:

`> fac1`

Output:

[1] male male female male female male
Levels: female male

Example 1:

```
> fac1[2] <- "female"
> fac1
```

Output:

[1] male female female male female male
Levels: female male

Example 2:

`> fac1[7] <- "other"`

Output:

Warning message:
In `[<-.factor`(`*tmp*`, 7, value = "other") :
invalid factor level, NA generated

Note: a factor does not allow values that are not among its levels.

Final state of factor:

`> fac1`

Output:

[1] male female female male female male <NA>
Levels: female male
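One way to avoid the accidental `NA` is to check a value against the levels before assigning it. This is only a sketch. The names `fac` and `new_value` are hypothetical.

```r
fac <- factor(c("male", "male", "female"))
new_value <- "other"

# Assign only when the value is a known level
if (new_value %in% levels(fac)) {
  fac[2] <- new_value
} else {
  message("'", new_value, "' is not a level, factor left unchanged")
}

fac[2] <- "female"  # allowed: "female" is an existing level
fac
```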

### Adding and Dropping Levels of R Factor

To add a value that does not exist among the levels of a factor, we must add it to the levels first.

Code:

```
> levels(fac1) <- c("female","male","other")
> fac1
```

Output:

[1] male female female male female male <NA>
Levels: female male other

We can also remove levels from a factor by using the `droplevels()` function. This function removes unused levels from a factor.

Code:

`> droplevels(fac1)`

Output:

[1] male female female male female male <NA>
Levels: female male
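The whole cycle of adding a level, using it, and dropping it again can be sketched as follows. The factor `fac` is a fresh, hypothetical example. Appending to `levels(fac)` rather than retyping the full vector keeps the existing levels in their original positions, which matters because `levels<-` replaces level names by position.

```r
fac <- factor(c("male", "female", "male"))

# Append the new level, existing values keep their meaning
levels(fac) <- c(levels(fac), "other")
fac[4] <- "other"   # now a valid assignment, no warning
fac

# Remove the "other" rows, then drop the level that is no longer used
fac2 <- droplevels(fac[fac != "other"])
levels(fac2)        # "female" "male"
```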

### Ordered Factors in R

We can classify R factors as ordered or unordered. By default, the levels are arranged in sorted order (alphabetical for character data) and are all considered equal irrespective of their arrangement. For comparison purposes, the levels can be given an explicit order of increasing weight or value.

Here is an example of an unordered R factor:

Code:

```
> sizes <- c("s","m","s","l","m","xs","l","m","xl","xxl","s", "l","xs","xl","m","l")
> sizef <- factor(sizes)
> sizef
```

Output:

[1] s m s l m xs l m xl xxl s l xs xl m l
Levels: l m s xl xs xxl

We can provide a specific order of levels by using the `levels =` argument of the `factor()` function. However, the resultant factor is still considered unordered.

Code:

```
> sizef <- factor(sizes,levels=c("xs","s","m","l","xl","xxl"))
> sizef
```

Output:

[1] s m s l m xs l m xl xxl s l xs xl m l
Levels: xs s m l xl xxl

We can also use the `ordered()` function to create an ordered factor. This function accepts the same arguments as the `factor()` function.

Code:

```
> sizeof <- ordered(sizes,levels=c("xs","s","m","l","xl","xxl"))
> sizeof
```

Output:

[1] s m s l m xs l m xl xxl s l xs xl m l
Levels: xs < s < m < l < xl < xxl

We can also convert unordered factors into ordered ones by using the `as.ordered()` function.

Code:

`> as.ordered(sizef)`

Output:

[1] s m s l m xs l m xl xxl s l xs xl m l
Levels: xs < s < m < l < xl < xxl
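The practical benefit of an ordered factor is that its elements can be compared. The sketch below uses a shortened, hypothetical vector `sz` so that it runs on its own.

```r
sz <- factor(c("s", "m", "l", "xs", "xl"),
             levels = c("xs", "s", "m", "l", "xl"),
             ordered = TRUE)

sz > "s"     # FALSE TRUE TRUE FALSE TRUE
max(sz)      # the largest size present: xl
sort(sz)     # sorted by level order, not alphabetically
```

The same comparison on an unordered factor is not meaningful: R returns `NA` values and a warning instead.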

### Functions of R Factors

There are a few functions that give us information about the R factor variables we use, such as `is.factor()`, `as.factor()` and `is.ordered()`. This is what these functions do:

#### 1. `is.factor()`

The `is.factor()` function checks if a variable is a factor or not. It returns a logical value of `TRUE` if the variable is a factor and `FALSE` if it is not.

Code:

```
> is.factor(fac1)
> is.factor(gender)
```

Output:

[1] TRUE
[1] FALSE

#### 2. `as.factor()`

The `as.factor()` function converts vector inputs into factors.

Code:

`> as.factor(c(TRUE,FALSE,TRUE,TRUE,FALSE))`

Output:

[1] TRUE FALSE TRUE TRUE FALSE
Levels: FALSE TRUE

#### 3. `is.ordered()`

The `is.ordered()` function checks if the variable is an ordered factor or not. This function returns `TRUE` if the variable is an ordered factor and `FALSE` if it is not.

Code:

```
> is.ordered(sizeof)
> is.ordered(fac1)
```

Output:

[1] TRUE
[1] FALSE
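Alongside the three functions above, base R also provides `levels()`, `nlevels()` and `table()` for inspecting a factor. The sketch below uses hypothetical variables `flags` and `nums`, and also shows a common pitfall when converting a factor of numbers back to numeric values.

```r
flags <- as.factor(c(TRUE, FALSE, TRUE, TRUE, FALSE))

is.factor(flags)    # TRUE
is.ordered(flags)   # FALSE
levels(flags)       # "FALSE" "TRUE"
nlevels(flags)      # 2
table(flags)        # number of elements per level

# Converting back: as.integer() returns the internal codes,
# so go through as.character() to recover the original numbers
nums <- as.factor(c(10, 20, 10))
as.integer(nums)                # 1 2 1
as.numeric(as.character(nums))  # 10 20 10
```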

## Summary

R factors are data structures that store categorical data. They only allow values from a predefined set of levels. Therefore, they are useful for fields that have a limited number of possible values like gender, marital status, availability, confirmation, etc.

In this article, we learned about R factors. We learned how to create them and how to modify them. We saw what they are used for. Finally, we looked at a few functions that provide more insights into factors and convert other variables into R factors.

Before you start working with R, you should have a basic understanding of its data structures.