Customer Segmentation using Machine Learning

With this Machine Learning Project, we are going to do customer segmentation on data. This data is from a mall. We are going to divide its customers into different groups. This is a very important project in data science because you will be always analyzing such kinds of data and dividing your customers into groups. For this project, we are going to use KMeans clustering algorithm.

So, let’s start with the project.

Customer Segmentation

Today, many businesses are going online and therefore online marketing is essential to retain customers. However, considering all customers as equal and targeting them all with similar marketing strategies is not an efficient way, since it also annoys the customers by neglecting their individuality, so customer segmentation has become very popular and has also become a viable solution.

The goal of customer segmentation is to divide the company’s customers based on their demographic characteristics (age, gender, marital status) and their behavior characteristics (types of products ordered, annual income). It’s a better approach for customer segmentation to focus on behavioral aspects rather than demographic characteristics since they do not emphasize individuality of customers.

The Model Architecture

It is often used in marketing to create customer segments and understand the behaviors of these segments. The K-means clustering algorithm provides insight into formats and differences in a database. The Python environment offers a great environment for building assemblies

With K-means Clustering, we are given data points with their data sets and features, and the mechanism is to classify their data into clusters according to their similarities. K clusters are formed based on the similarity of the data. A Euclidean distance measurement method is used by KMeans to calculate similarity.

  1. This is step one. A random initialization of k points is done in the first step.
  2. Using K-means, each data point is classified according to its nearest mean, and its coordinates are rewritten according to the mean.
  3. Iteration is continuing up till all data points are classified.

Project Prerequisites

The required modules for this project are :

  1. Pandas – pip install pandas
  2. Numpy- pip install numpy
  3. Seaborn – pip install seaborn
  4. Matplotlib – pip install matplotlib

Customer Segmentation Project

The dataset for this project is a csv file. This file contains four columns which are customer_id, genre, age, annual_income, and spending_score. This data is from customers of a mall. We are going to analyze this data and divide its customers into segments. Please download customer segmentation machine learning project along with the dataset from the following link: Customer Segmentation Project

Steps to Implement Customer Segmentation Project

1. Importing all the modules to be used in the project.

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import seaborn as sns
 
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from tensorflow.keras.layers import Input,Conv2D,MaxPooling2D,Dropout,Flatten,Dense,Activation,BatchNormalization,add
from tensorflow.keras.models  import Model,Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import plot_model
from tensorflow.keras.applications.vgg16 import VGG16,preprocess_input

import os

2. Here we are reading our dataset.

df = pd.read_csv('Mall_Customers.csv')

df.info()

customer segmentation dataset

3. Here we are defining our KMeans Model and we are fitting our training data.

X = df.iloc[:, [3,4]].values
 
from sklearn.cluster import KMeans
wcss=[]
for i in range(1,11):
    kmeans = KMeans(n_clusters= i, init='k-means++', random_state=0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)

4. Here we are plotting the result.

plt.plot(range(1,11), wcss)
plt.title('The Elbow Method')
plt.xlabel('no of clusters')
plt.ylabel('wcss')

plt.show()

elbow method plot

5. Here we are passing the testing data and we are predicting the result and we are plotting the data as clusters of the points.

kmeansmodel = KMeans(n_clusters= 5, init='k-means++', random_state=0)
y_kmeans= kmeansmodel.fit_predict(X)
 
plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s = 80, c = 'red', label = 'Customer 1')
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1, 1], s = 80, c = 'blue', label = 'Customer 2')
plt.scatter(X[y_kmeans == 2, 0], X[y_kmeans == 2, 1], s = 80, c = 'green', label = 'Customer 3')
plt.scatter(X[y_kmeans == 3, 0], X[y_kmeans == 3, 1], s = 80, c = 'cyan', label = 'Customerr 4')
plt.scatter(X[y_kmeans == 4, 0], X[y_kmeans == 4, 1], s = 80, c = 'magenta', label = 'Customer 5')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s = 100, c = 'yellow', label = 'Centroids')

plt.title('Clusters of customers')

plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')

plt.legend()

plt.show()

customer segmentation output

Summary

In this Machine Learning project, we learned how to implement customer segmentation. This is a really important project for data science. We are going to analyze such kinds of data a lot of times in data science. For segments, we have used kmeans clustering algorithm.