# Regularization in Machine Learning to Prevent Overfitting

In machine learning, we face a lot of problems while working with data. These problems can affect the accuracy of your ML model.

So, to tackle these situations, we have various methods and techniques. Regularization is one of them.

This method helps reduce overfitting, a very common problem that makes a model perform poorly on unseen data.

So, in this article, we will look at what regularization is and how it is used.

We will use some mathematical expressions and examples along the way; this mathematical side of ML is essential to understand.

We will also look at the types of regularization.

So, let’s begin.

## What is Regularization in Machine Learning?

Regularization is a technique, most commonly applied to regression models, that solves the problem of overfitting. This helps to ensure better performance and accuracy of the ML model.

First, let’s understand why we face overfitting in the first place.

This happens when the ML model learns from irrelevant datapoints as well. By irrelevant datapoints, we mean points that reflect random noise rather than the actual underlying pattern in the data.

When this happens, the model fits both the useful data and the noise, which gives less accurate results on new data.

In other words, the model becomes too flexible: it learns from all types of datapoints, including ones that do not generalize, and we do not want that to happen.

It also leads to an increase in complexity and slower performance of the model.

We can also call these irrelevant datapoints noise. So, we can say that, with the help of regularization, we reduce the effect of noise in the data.

There is also the related concept of balancing variance and bias.

The bias-variance tradeoff helps us understand overfitting and how to tackle it with various mathematical and statistical methods.

## How does Regularization work in ML?

The main idea of regularization is to solve overfitting. To do that, a penalty is imposed on models that are very complex.

The algorithm also includes a loss function; regularization adds its penalty to this loss, and minimizing the combined objective is what reduces overfitting.

We can see how this method works in mathematical terms.

Regularization means shrinking or limiting the coefficients toward zero. Since it is most commonly applied to regression, we can start from a simple linear regression relation.

Y ≈ β0 + β1X1 + β2X2 + β3X3 + … + βpXp

We can also write this, for each observation i, as

Yi ≈ β0 + Σ βjXij

Or

Yi – β0 – Σ βjXij ≈ 0 …..eq(1)

Here, the summation is from j=1 to p, β0 is the bias, and β1…βp are the weights or coefficients.

Now, for regularization, we use the RSS or Residual Sum of Squares as the loss function:

RSS = Σi (Yi − β0 − Σj βjXij)²

Here, the outer summation is over the observations i=1 to n, and the inner one is over the features j=1 to p.

The coefficients that minimize this quantity are the ordinary least-squares estimates; regularization will adjust them, shrinking them toward zero.

RSS by itself is the linear regression objective without regularization.
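To make the objective concrete, here is a minimal sketch in Python with NumPy. The data, the true coefficients, and the noise level are all invented for illustration; the sketch fits a linear model by least squares and then evaluates the RSS formula above:

```python
import numpy as np

# Toy data: y depends linearly on three features, plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                  # n = 20 samples, p = 3 features
true_beta = np.array([2.0, -1.0, 0.5])
y = 1.0 + X @ true_beta + rng.normal(scale=0.1, size=20)

# Least-squares fit for [beta0, beta1, ..., beta3]; the intercept beta0
# is handled by prepending a column of ones.
X1 = np.column_stack([np.ones(len(X)), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)

# RSS = sum over i of (Y_i - beta0 - sum_j beta_j * X_ij)^2
rss = np.sum((y - X1 @ beta_hat) ** 2)
```

With low noise, the fitted `beta_hat` lands close to the invented true coefficients, and `rss` is small but positive.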

By minimizing this objective, the model learns: it adjusts the weights to fit the training data, including any noise in it.

In the case of overfitting, the estimated coefficients will not be able to generalize on the unseen data. This is where regularization comes in.

It imposes a penalty on the magnitude of the coefficients and shrinks them toward zero.

Note: Here, X1…Xp are the features or feature set, β1…βp are the weights or coefficients that accompany them, and β0 is the bias.

For reducing overfitting, the algorithm requires a loss function whose parameters, the weights and the bias, are optimized.

The collective work of all these pieces helps the model predict the value of Y accurately.
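This optimization can be sketched with plain gradient descent. The data, penalty strength, and learning rate below are arbitrary choices for illustration; the loop minimizes RSS plus an L2 penalty on the weights, leaving the bias unpenalized:

```python
import numpy as np

# Synthetic data: the true weights and bias below are invented for this sketch.
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 2))
y = 5.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.2, size=40)

alpha, lr = 1.0, 0.005   # penalty strength and learning rate (arbitrary)
w = np.zeros(2)          # weights: these ARE penalized
b = 0.0                  # bias beta0: left out of the penalty

for _ in range(2000):
    resid = y - (b + X @ w)
    # Gradients of the penalized loss  RSS + alpha * sum(w_j^2):
    grad_w = -2 * X.T @ resid + 2 * alpha * w
    grad_b = -2 * resid.sum()
    w -= lr * grad_w
    b -= lr * grad_b
```

After the loop, `w` and `b` sit close to the invented true values, with the weights pulled slightly toward zero by the penalty.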

There are also some particular ways to impose a penalty on coefficients. We will look at it next.

## Types of Regularization in Machine Learning

There are several techniques involving regularization.

But for this article, we will be focusing on two of the most important types.

### 1. Ridge Regression (L2 Regularization)

In this regularization, the loss function RSS is modified by the addition of a penalty. The penalty, in this case, is the sum of the squares of the coefficient magnitudes.

Here, we will be learning about some new terms. First, let’s look at the modified mathematical expression of the loss function.

Modified Loss Function = RSS + α Σ βj²

The expression of RSS is given above. Also, the summation is from j=1 to p.

The expression that is added to RSS is called the shrinkage penalty. The symbol α (alpha) is the tuning parameter.

This modified loss function can now estimate the coefficients. The tuning parameter decides how much we should penalize our model.

Penalizing the model affects the flexibility of the model.

If the flexibility is high, the magnitude of coefficients would be high. This results in overfitting.

On the other hand, if flexibility is low, then the magnitudes of the coefficients will be low as well. We achieve this by minimizing the above loss function.

Also, it is important to note that β0 is not penalized, but all the other coefficients are. β0 is the bias, and it gives the mean response when all the features X1…Xp are 0.

We have some particular cases for the tuning parameter.

• If α=0, the penalty has no effect. The loss function reduces to plain RSS, and the coefficients are the ordinary least-squares estimates.
• As α→∞, the impact of the penalty grows, and the coefficient estimates tend towards zero.
• If 0<α<∞, the ridge coefficient estimates lie somewhere between the ordinary least-squares estimates and zero.

This is the importance of a tuning parameter.
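These three cases can be checked directly using the closed-form ridge solution. In this NumPy sketch (the data and the α values are arbitrary choices for illustration), the data is centered so that β0 drops out and is never penalized, and the coefficient magnitudes shrink as α grows:

```python
import numpy as np

# Synthetic data; the true coefficients are invented for this illustration.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
y = X @ np.array([3.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=30)

# Center X and y so the intercept beta0 drops out of the penalized problem.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

def ridge(alpha):
    # Closed-form ridge solution: (X^T X + alpha * I)^-1 X^T y
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)

beta_ols = ridge(0.0)    # alpha = 0: plain least squares, no shrinkage
beta_mid = ridge(10.0)   # moderate alpha: coefficients shrink a little
beta_big = ridge(1e6)    # huge alpha: coefficients pushed almost to zero
```

Comparing `np.linalg.norm` of the three solutions shows the monotonic shrinkage: the larger α is, the smaller the coefficient vector becomes.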

The penalty used in Ridge Regression is based on the L2 norm of the coefficients, which is why it is also known as L2 regularization.

### 2. Lasso Regression (L1 Regularization)

In this case, the penalty is the sum of the absolute values of the coefficients rather than the sum of their squares.

The mathematical representation of the modified loss function will be:

Modified Loss Function = RSS + αΣ|βj|

This is called the L1 norm. The conditions for the tuning parameter are similar to those in the L2 case, with one important difference: the L1 penalty can shrink some coefficients exactly to zero, so lasso also performs feature selection.
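The feature-selection effect can be seen in a small from-scratch sketch. This is a coordinate-descent implementation of lasso using the standard soft-thresholding update (the data, α, and the 1/2 scaling of RSS are illustrative choices, not a prescribed setup):

```python
import numpy as np

# Synthetic data: only features 0 and 2 actually matter (values invented).
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 5))
beta_true = np.array([4.0, 0.0, -3.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(scale=0.3, size=50)

def soft_threshold(z, t):
    # Soft-thresholding operator: shrinks z by t, clipping to exactly 0.
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso(X, y, alpha, n_sweeps=200):
    # Coordinate descent for (1/2) * RSS + alpha * sum(|beta_j|).
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # residual w/o feature j
            z = X[:, j] @ r
            beta[j] = soft_threshold(z, alpha) / (X[:, j] @ X[:, j])
    return beta

beta = lasso(X, y, alpha=20.0)
# The coefficients of the irrelevant features are driven to (or very near) 0,
# while the two genuine coefficients stay large.
```

This zeroing-out behavior is exactly what the L2 penalty cannot do: ridge shrinks every coefficient but almost never makes one exactly zero.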