Performance Tuning in R: For Efficient R Programming

by TechVidvan Team

In this article of TechVidvan’s R tutorial series, we are going to look at several techniques for improving R code and its performance. We will discuss the factors that help R programs run faster and the ones to avoid because they degrade performance. We will then go through some tips for performance tuning in R and learn how to find out why code is slow and how to make it faster.

Performance Tuning in R

1. Is R slow?

R was developed in 1992 by statisticians Ross Ihaka and Robert Gentleman as a learning aid for students of statistics. It was created with convenience in mind rather than efficiency or speed, and its major advantage is its flexible syntax. This, however, means that the language does not enforce many of the coding practices and computer science principles that make programs efficient. Essentially, R trades performance and speed for convenience. So, R is often slower than many other modern programming languages, but that does not mean that it has to stay that way. With good coding practices, R programs can be made faster.

How to Write R Code That Runs Faster?

Here are a few tips to improve the performance of your R code:

1. Use matrix and vector operations as much as possible. Vectorized operations run in compiled code and are much less resource-intensive than processing the same data one element at a time.

2. Use double instead of rep. To create a zero-filled numeric vector of length n, double(n) is generally faster than rep(0, n), because it only allocates the vector instead of repeating a value.

3. Avoid using data frames where a matrix will do. When all the values are of the same type, a matrix is usually faster to process and lighter on memory than a data frame.

4. Avoid creating too many objects in your environment. Objects that are no longer needed still occupy memory, and when memory runs low, the execution of code and processing of data become slower. Large objects that are no longer needed can be removed with rm().

5. Avoid changing the type and size of objects too much. Changing the type or size of an object results in the reallocation of memory, which slows the program down and can make it use far more memory than necessary.

6. Split big objects into smaller ones to operate on them. Operating on large data objects is slow and resource-intensive. Splitting them into smaller objects and operating on them individually might be more efficient.
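Some of the tips above can be sketched in base R. This is a minimal illustration and not a benchmark: the size n, the helper functions grow() and prealloc(), and the other variable names are made up for this example, and the timings will differ between machines and R versions.

```r
# Illustrative sketch of tips 2, 3 and 5; n and all names are hypothetical.
n <- 1e4

# Tip 2: double(n) and rep(0, n) both give a zero-filled numeric vector.
zeros_double <- double(n)
zeros_rep    <- rep(0, n)
print(identical(zeros_double, zeros_rep))

# Tip 5: growing a vector one element at a time changes its size on
# every iteration, so memory is reallocated again and again.
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Preallocating fixes the type and size once, before the loop starts.
prealloc <- function(n) {
  out <- double(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

print(system.time(grow(n)))
print(system.time(prealloc(n)))

# Tip 3: purely numeric data can be held in a matrix instead of a data frame.
m  <- matrix(runif(n * 4), ncol = 4)
df <- as.data.frame(m)
col_means_matrix <- colMeans(m)
col_means_df     <- colMeans(df)   # the data frame is converted internally
```

Both functions return the same values; only the way the result vector is built differs.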

2. Functional Programming and the Memory Issues With it

R is primarily a functional programming language. This means that an R program can be thought of as a collection of functions and that functions in R are first-class objects. It also means that most objects in R behave as if they were immutable: an object is not changed in place. When we modify an object, R typically creates a new, modified copy and discards the old one. This causes memory issues.

If a lot of altering and updating takes place, memory is reallocated again and again. This puts a significant and unnecessary load on the system’s memory and on its processing power, which results in slower execution of R code.
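The copying behaviour described above can be observed with a small sketch. The variable names and the set_first() helper are hypothetical. The tracemem() part only works when R was built with memory profiling, so it is guarded with a check.

```r
# Sketch of R's copy-on-modify behaviour; all names are hypothetical.
x <- c(1, 2, 3)
y <- x          # y starts out with the same values as x
y[1] <- 100     # modifying y does not touch x

print(x)
print(y)

# A function that "modifies" its argument works on its own copy,
# so the caller's object stays as it was.
set_first <- function(v) {
  v[1] <- -1
  v
}
z <- set_first(x)
print(x)
print(z)

# tracemem() reports when an object is copied. It is only available
# when R was built with memory profiling, hence the guard.
if (capabilities("profmem")) {
  a <- c(1, 2, 3)
  tracemem(a)
  b <- a
  b[1] <- 0     # a copy is reported at this point
  untracemem(a)
}
```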

3. Avoid Using Loops

R is a vector-based language. As a result, R does not need loops to operate on a set of values such as a vector. Apart from this, R also includes many functions that automatically operate iteratively on other data objects like matrices, arrays, and data frames. This does not mean that R programs never require loops.

However, it does mean that R is not particularly efficient at handling loops. R programmers avoid loops wherever a vectorized alternative exists, and with good reason.
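As a rough illustration, the same calculation can be written with an explicit loop and as a vector operation. The data and variable names below are made up for this sketch, and the timings depend on the machine and R version.

```r
# Sketch: squaring every element with a loop and without one.
# The data and all variable names are hypothetical.
x <- runif(1e5)

# Explicit loop over a preallocated result vector
loop_time <- system.time({
  loop_sq <- double(length(x))
  for (i in seq_along(x)) loop_sq[i] <- x[i]^2
})

# Vectorized: the operation is applied to the whole vector at once
vec_time <- system.time({
  vec_sq <- x^2
})

print(loop_time)
print(vec_time)
print(all.equal(loop_sq, vec_sq))

# Built-in functions iterate over other structures as well,
# for example over the rows of a matrix.
m <- matrix(x, ncol = 10)
row_totals <- rowSums(m)
```

Functions such as rowSums() and colMeans() cover many cases where a loop over rows or columns would otherwise be written by hand.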

4. The Rprof Function

The Rprof() function allows you to profile the execution of your R code. The syntax of the Rprof() function is as follows:

Rprof( filename = "Rprof.out", append = FALSE, interval = 0.02, memory.profiling = FALSE, gc.profiling = FALSE, line.profiling = FALSE, numfiles = 100L, bufsize = 10000L)

where,

filename is the name of the file where the result of the profiling is stored,
append is a logical that controls whether the results should be overwritten in the file or appended after existing contents,
interval specifies the time interval between samples, in seconds,
memory.profiling is a logical that controls whether the memory usage data should be stored in the output file,
gc.profiling is a logical that controls whether garbage collection (GC) status should be recorded or not,
line.profiling is a logical that controls whether line locations should be recorded to the output file or not,
numfiles and bufsize are integers that specify the memory allocation for line profiling.

Profiling R code gives us the opportunity to identify bottlenecks and sections of code that may have more efficient alternatives.

By calling Rprof(NULL), that is, with NULL as the filename argument, we can stop profiling and return R to normal execution.
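A typical profiling session might look like the sketch below. The slow_task() function and the out_file variable are hypothetical stand-ins for your own code and output path. summaryRprof() from the utils package is used here to read the results. It is wrapped in tryCatch() because it signals an error when no samples were collected, which can happen if the profiled code finishes too quickly.

```r
# Sketch of a profiling session; slow_task and out_file are hypothetical names.
slow_task <- function(reps = 100) {
  total <- 0
  for (i in seq_len(reps)) {
    total <- total + sum(sort(runif(1e5)))
  }
  total
}

out_file <- tempfile(fileext = ".out")

Rprof(out_file, interval = 0.01)  # start profiling, sampling every 0.01 seconds
result <- slow_task()
Rprof(NULL)                       # stop profiling

# Summarise the samples; fall back to NULL if none were recorded.
prof <- tryCatch(summaryRprof(out_file), error = function(e) NULL)
if (!is.null(prof)) {
  print(head(prof$by.self))       # functions ranked by time spent in them
}
```

The by.self table lists the time spent inside each function itself, which is usually the quickest way to spot a bottleneck.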

Summary

In this chapter of TechVidvan’s R tutorial series, we learned about various techniques and tips that can help us write more efficient R code and improve the performance of our R programs. We learned about functional programming and how it affects the memory management of R programs. We also studied the drawbacks of using loops in R programs.

Finally, we looked at the Rprof() function and how profiling helps us in identifying flaws and bottlenecks in our code. These are some of the ways of Performance Tuning in R for efficient R Programming.