NumPy Join and Split Array

NumPy’s joining and splitting functions are essential tools for working with arrays of data. Splitting functions such as split() and array_split() divide an array into multiple sub-arrays, while joining functions such as concatenate() combine multiple arrays into a single array.

These functions are useful for a variety of tasks, such as:

  • Splitting a dataset into training, testing, and validation sets
  • Splitting a dataset into different categories
  • Combining multiple datasets into a single dataset

In this tutorial, let’s explore these joining and splitting functions in more detail and see how to use them effectively.
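As a quick preview of the first use case listed above, here is a minimal sketch of a 70/15/15 train/validation/test split. The dataset, the variable names, and the split ratios are assumptions chosen for illustration, not part of any NumPy API.

```python
import numpy as np

# Hypothetical stand-in dataset: 20 samples labelled 0..19
data = np.arange(20)

# Shuffle with a seeded generator so the split is reproducible
rng = np.random.default_rng(seed=0)
shuffled = rng.permutation(data)

# Integer arithmetic avoids floating-point rounding in the cut points
n = len(shuffled)
train_end = n * 70 // 100   # 14
val_end = n * 85 // 100     # 17

# A list of indices tells np.split where to cut
train, val, test = np.split(shuffled, [train_end, val_end])
print(len(train), len(val), len(test))   # 14 3 3
```

Passing a list of cut points rather than a section count is what allows the three parts to have different sizes.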

Joining Arrays with numpy.concatenate

Joining, in NumPy, refers to combining the contents of two or more arrays into a single array. The primary function used for joining arrays is numpy.concatenate(). It concatenates arrays along a specified axis, and if the axis is not provided, it defaults to axis 0.

numpy.concatenate((a1, a2, ...), axis=0, out=None, dtype=None, casting="same_kind")

| Parameter | Description | Notes |
|-----------|-------------|-------|
| a1, a2, … | A sequence of arrays or array-like objects to concatenate. They must have the same shape, except in the dimension corresponding to axis. | Required. |
| axis | The axis along which the arrays are joined. If axis is None, the arrays are flattened before use. | Optional; default is 0. |
| out | The destination array in which to place the result. Its shape must match what concatenate would have returned if no out argument were specified. | Optional. |
| dtype | The data type of the result array. Cannot be provided together with out. | Optional; new in v1.20.0. |
| casting | Controls what kind of data casting may occur during concatenation. | Optional; default is 'same_kind'; new in v1.20.0. |
| Returns | The concatenated array containing the elements of the input arrays, joined along the specified axis. | |
import numpy as np


# Creating two arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])


# Joining end to end along axis 0 (the only axis of a 1-D array)
concatenated_array = np.concatenate((arr1, arr2))


print("Concatenated Array:")
print(concatenated_array)

Output:

Concatenated Array:
[1 2 3 4 5 6]
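The axis=None and dtype options described in the parameter table can be sketched as follows. The array values and variable names are illustrative assumptions; the dtype argument requires NumPy 1.20 or later.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# axis=None flattens every input before joining, so the result is 1-D
flat = np.concatenate((a, b), axis=None)
print(flat)             # [1 2 3 4 5 6]

# dtype sets the data type of the result array (NumPy 1.20+)
as_float = np.concatenate((a, b), axis=0, dtype=np.float64)
print(as_float.dtype)   # float64
print(as_float.shape)   # (3, 2)
```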

Joining with Axis

When joining arrays, you can specify the axis along which the concatenation should occur. Here are some essential points:

  • If axis is 0 (the default), 2-D arrays are joined along rows (stacking vertically).
  • If axis is 1, 2-D arrays are joined along columns (stacking horizontally).
  • If axis is 2 or higher, arrays are joined along that higher dimension, so every input must have at least axis + 1 dimensions.

import numpy as np


# Creating two 2D arrays
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6]])


# Joining along rows (vertical stacking, axis=0)
concatenated_axis0 = np.concatenate((arr1, arr2), axis=0)


# Joining along columns (horizontal stacking, axis=1)
concatenated_axis1 = np.concatenate((arr1, arr2.T), axis=1)


print("Concatenated Along Rows (Axis 0):")
print(concatenated_axis0)


print("Concatenated Along Columns (Axis 1):")
print(concatenated_axis1)

Output for Concatenation Along Rows (Axis 0):

Concatenated Along Rows (Axis 0):
[[1 2]
 [3 4]
 [5 6]]

Output for Concatenation Along Columns (Axis 1):

Concatenated Along Columns (Axis 1):
[[1 2 5]
 [3 4 6]]
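The example above transposes arr2 before joining along axis 1 because the inputs must agree in every dimension except the one being joined. A minimal sketch of that shape rule, reusing the same two arrays (the mismatch_raised flag is only an illustrative helper):

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])   # shape (2, 2)
arr2 = np.array([[5, 6]])           # shape (1, 2)

# Along axis 0 only the column counts must match: (2, 2) + (1, 2) -> (3, 2)
print(np.concatenate((arr1, arr2), axis=0).shape)   # (3, 2)

# Along axis 1 the row counts must match, and 2 != 1, so NumPy raises
try:
    np.concatenate((arr1, arr2), axis=1)
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("ValueError:", exc)

# Transposing arr2 gives shape (2, 1), which is compatible along axis 1
print(np.concatenate((arr1, arr2.T), axis=1).shape)  # (2, 3)
```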

Stacking Arrays

Stacking is a common way to join arrays that have the same number of dimensions. NumPy provides three convenience functions for it, each joining along a different axis:

Horizontal Stacking:

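hstack() stacks arrays horizontally (column-wise). 1-D arrays are joined end to end; arrays with two or more dimensions are joined along axis 1.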

import numpy as np


# Creating two arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])


# Horizontal stacking
horizontal_stacked = np.hstack((arr1, arr2))


print("Horizontal Stacking:")
print(horizontal_stacked)

Output:

Horizontal Stacking:
[1 2 3 4 5 6]

Vertical Stacking:

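vstack() stacks arrays vertically (row-wise). 1-D arrays of length N are first treated as rows of shape (1, N) and then joined along axis 0.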

import numpy as np


# Creating two arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])


# Vertical stacking
vertical_stacked = np.vstack((arr1, arr2))


print("Vertical Stacking:")
print(vertical_stacked)

Output:

Vertical Stacking:
[[1 2 3]
 [4 5 6]]

Depth Stacking:

dstack() stacks arrays depth-wise, along the third axis. 1-D arrays of length N are first treated as arrays of shape (1, N, 1).

import numpy as np


# Creating two arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])


# Depth stacking
depth_stacked = np.dstack((arr1, arr2))


print("Depth Stacking:")
print(depth_stacked)

Output:

Depth Stacking:
[[[1 4]
  [2 5]
  [3 6]]]
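The difference between the three stacking functions is easier to see with 2-D inputs. The sketch below compares the result shapes, checks that hstack() and vstack() match concatenate() for 2-D arrays, and shows numpy.stack(), which joins along a new axis. The array values are illustrative assumptions.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(np.hstack((a, b)).shape)   # (2, 4): joined along axis 1
print(np.vstack((a, b)).shape)   # (4, 2): joined along axis 0
print(np.dstack((a, b)).shape)   # (2, 2, 2): joined along a third axis

# For 2-D inputs, hstack and vstack match concatenate on axis 1 and axis 0
same_h = np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))
same_v = np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))
print(same_h, same_v)            # True True

# np.stack always creates a new axis at the position you choose
stacked = np.stack((a, b), axis=0)
print(stacked.shape)             # (2, 2, 2)
```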

Splitting Arrays with numpy.array_split

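The opposite of joining is splitting, where one array is divided into multiple arrays. NumPy provides two closely related functions for this, numpy.split() and numpy.array_split(), which take the same arguments:

numpy.split(ary, indices_or_sections, axis=0)
numpy.array_split(ary, indices_or_sections, axis=0)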

| Parameter | Description | Notes |
|-----------|-------------|-------|
| ary | The input NumPy array to be divided into sub-arrays. | Required. |
| indices_or_sections | An integer or a 1-D array of sorted integers that determines how the array is split. An integer N divides the array into N equal parts; an array gives the indices at which to split. | Required. |
| axis | The axis along which to split the array. | Optional; default is 0. |
| Returns | A list of sub-arrays as views into ary. | |
| Raises | numpy.split() raises ValueError if indices_or_sections is an integer that does not divide the array into equal parts. | numpy.array_split() does not raise in this case. |
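The two forms of indices_or_sections can be sketched as follows. The array values and variable names are illustrative assumptions.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

# Integer: three equal sections of two elements each
equal_parts = np.split(arr, 3)
print(equal_parts)
# [array([10, 20]), array([30, 40]), array([50, 60])]

# List of indices: cut before index 2 and before index 5
by_index = np.split(arr, [2, 5])
print(by_index)
# [array([10, 20]), array([30, 40, 50]), array([60])]
```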

Splitting Arrays in NumPy

numpy.array_split() divides an array into multiple sub-arrays.

If the array cannot be divided evenly, it makes the earlier sub-arrays one element larger than the later ones instead of failing.

If you want strict splitting (no adjustment), use numpy.split(). It raises a ValueError when an integer section count does not divide the array evenly.
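A minimal sketch of the difference, using a seven-element array that cannot be divided into three equal parts (the split_raised flag is only an illustrative helper):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

# array_split tolerates uneven division: earlier pieces get the extra elements
parts = np.array_split(arr, 3)
print([p.tolist() for p in parts])   # [[1, 2, 3], [4, 5], [6, 7]]

# split insists on equal sections, so 7 elements into 3 parts fails
try:
    np.split(arr, 3)
    split_raised = False
except ValueError as exc:
    split_raised = True
    print("ValueError:", exc)
```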

Accessing Split Arrays

After splitting an array, you can access the individual sub-arrays using index notation. For example, if you split an array into three parts, you can access them as split_array[0], split_array[1], and split_array[2].

import numpy as np


# Creating an array
arr = np.array([1, 2, 3, 4, 5, 6])


# Splitting into three parts
split_array = np.array_split(arr, 3)
print(split_array)
# Output: [array([1, 2]), array([3, 4]), array([5, 6])]


# Accessing split arrays
print(split_array[0])
# Output: [1 2]

Splitting 2-D Arrays in NumPy

For 2-D arrays, hsplit() (horizontal split) divides an array column-wise along axis 1, and vsplit() (vertical split) divides it row-wise along axis 0. There’s also dsplit(), which splits along the third axis and therefore requires arrays with three or more dimensions.

import numpy as np


# Creating a 2-D array
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])


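# Horizontal split at column index 2 into two arrays
# (np.hsplit(arr, 2) would raise ValueError: 3 columns cannot be divided equally)
horizontal_split = np.hsplit(arr, [2])


# Vertical split into three arrays, one per row
vertical_split = np.vsplit(arr, 3)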


print("Horizontal Split:")
for sub_arr in horizontal_split:
    print(sub_arr)


print("\nVertical Split:")
for sub_arr in vertical_split:
    print(sub_arr)

Output for Horizontal Split:

Horizontal Split:
[[1 2]
 [4 5]
 [7 8]]
[[3]
 [6]
 [9]]

Output for Vertical Split:

Vertical Split:
[[1 2 3]]
[[4 5 6]]
[[7 8 9]]
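Two variations on the example above, sketched under assumed array shapes: dsplit() on a 3-D array, and array_split() with axis=1 as a way to divide an odd number of columns by section count without raising an error.

```python
import numpy as np

# dsplit needs at least three dimensions; this array has shape (2, 2, 4)
cube = np.arange(16).reshape(2, 2, 4)
front, back = np.dsplit(cube, 2)
print(front.shape, back.shape)   # (2, 2, 2) (2, 2, 2)

# array_split along axis 1 divides 3 columns into widths 2 and 1
grid = np.arange(1, 10).reshape(3, 3)
left, right = np.array_split(grid, 2, axis=1)
print(left.shape, right.shape)   # (3, 2) (3, 1)
```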

Difference between Join and Split in NumPy

Let’s summarize the key differences between joining and splitting arrays:

| Aspect | Join | Split |
|--------|------|-------|
| Operation | Combines multiple arrays | Divides one array into multiple arrays |
| Primary Function | numpy.concatenate() | numpy.array_split() or numpy.split() |
| Axis Specification | Choose axis for joining | Choose axis for splitting |
| Handling of Uneven Data | Shapes must match except along the joining axis | array_split() adjusts sub-array sizes; split() raises ValueError |
| Accessing Results | Not applicable (result is a single array) | Access sub-arrays using index notation |
| Use for 2-D Arrays | Stacking (vstack/hstack/dstack) | Splitting along rows/columns (vsplit/hsplit) |
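Because the two operations mirror each other, splitting an array and then joining the pieces along the same axis gives back the original. A minimal round-trip sketch, with an assumed 3×4 array:

```python
import numpy as np

original = np.arange(12).reshape(3, 4)

# Split the 4 columns into 3 pieces (widths 2, 1, 1), then join them again
pieces = np.array_split(original, 3, axis=1)
restored = np.concatenate(pieces, axis=1)

print([p.shape for p in pieces])             # [(3, 2), (3, 1), (3, 1)]
print(np.array_equal(original, restored))    # True
```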

Conclusion:

NumPy’s merging and partitioning functions offer robust capabilities for efficiently combining and segmenting arrays. Proficiency in concatenating arrays along various axes and dividing arrays into sub-arrays is essential for effective data handling in scientific computing, data analysis, and machine learning endeavors.

By gaining expertise in these functions, you’ll enhance your ability to manage intricate data manipulation tasks within your Python projects. Enjoy coding!