Serialization in Python

TechVidvan Team

3 years ago

Serialization, as it relates to data storage, is the process of converting a data structure or object state into a representation that can be stored (for instance, in a file or an in-memory buffer) or transmitted, and later reconstructed.

In other words, serialization turns an object into a storable format so that it can later be deserialized, that is, restored to its original form from the serialized data.

When we want to serialize and deserialize a Python object, we can use the functions provided by Python's pickle module. The process of turning a Python object into a stream of bytes is called pickling.

This is also referred to as “serialization,” “marshaling,” or “flattening.” Unpickling is the inverse operation: it converts a byte stream (read from a binary file or a bytes-like object) back into a Python object.

Why is Serialization required in Python?

Serialization puts an object into a format suitable for storage or transmission. Because the serialized data can later be used to reconstruct an object with the same structure, we can keep using a saved object after storing or transmitting it, rather than having to recreate it from scratch.

How to perform serialization in Python?

Python offers several serialization options. JSON is a well-known, human-readable, text-based format that is supported by many languages. It lets us store a dictionary and rebuild it with the same structure. However, JSON can only hold strings, numbers, booleans, null, and simple containers such as lists (arrays) and dictionaries (objects). It does not preserve Python data types: for example, it cannot distinguish between a Python list and a tuple, since both are written as JSON arrays.
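To make the list/tuple limitation concrete, here is a small sketch using Python's standard json module; the dictionary contents are just illustrative values.

```python
import json

original = {"point": (1, 2), "tags": ["a", "b"]}

text = json.dumps(original)      # serialize to a JSON string
restored = json.loads(text)      # parse it back into Python objects

print(text)                      # {"point": [1, 2], "tags": ["a", "b"]}
print(type(restored["point"]))   # <class 'list'>: the tuple became a list
print(restored == original)      # False, because [1, 2] != (1, 2)
```

The round trip keeps the dictionary's shape, but the tuple silently comes back as a list.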

In the following section, we will examine pickle, a popular Python serialization library.

The pickle module, which is part of the Python standard library, provides functions for serializing and deserializing Python objects.

To begin, import the pickle module:

Input:

# Importing the library
import pickle

We can then use pickle’s dump() function to serialize a Python object, such as a dictionary, and write the resulting byte stream to a file.

Input:

dic = {"Hii": "Everyone!"}
# Open the file in binary write mode ("wb") and serialize the dictionary into it
with open("demo.pickle", "wb") as outfile:
    pickle.dump(dic, outfile)

The file “demo.pickle” now contains the serialized byte stream of dic. To retrieve the original object, we use pickle’s load() function to read the serialized bytes back from the file:

Input:

with open("demo.pickle", "rb") as infile:
    dic_new = pickle.load(infile)

A word of caution: only unpickle data that comes from a source you trust. A pickle can be crafted so that loading it runs arbitrary code on your computer.
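The sketch below, loosely modeled on the "Restricting Globals" example in the Python pickle documentation, shows both sides of this warning. The Payload class uses the standard __reduce__ hook to make the unpickler call print() (a harmless stand-in for malicious code), and a hypothetical RestrictedUnpickler overrides find_class() so that only an allow-list of built-ins can be loaded. Treat the allow-list as an illustrative choice; restricting globals narrows what a pickle can do, but it is not a substitute for loading only trusted data.

```python
import builtins
import io
import pickle

# Built-ins treated as safe to load (illustrative allow-list)
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle asks for a global (a class or function)
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads(), but refuses globals that are not allow-listed."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Payload:
    # __reduce__ tells pickle how to rebuild the object. Here it asks the
    # unpickler to call print(...), which shows that loading can run code.
    def __reduce__(self):
        return (print, ("this ran while unpickling!",))


malicious = pickle.dumps(Payload())

pickle.loads(malicious)  # plain pickle.loads() executes the call

# Plain data still loads with the restricted unpickler
print(restricted_loads(pickle.dumps({"Hii": "Everyone!"})))

try:
    restricted_loads(malicious)
except pickle.UnpicklingError as exc:
    print("Blocked:", exc)  # Blocked: global 'builtins.print' is forbidden
```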

Putting these steps together, the following code shows that pickle restores an object equal to the original:

Input:

import pickle
# Demo entity
dic = {"Hii": "Everyone!"}
# Serialization: write the object to a file
with open("demo.pickle", "wb") as outfile:
    pickle.dump(dic, outfile)
print("Original entity:", dic)
# Deserialization: read the object back from the file
with open("demo.pickle", "rb") as infile:
    dic_new = pickle.load(infile)
print("Rebuilt entity:", dic_new)
if dic == dic_new:
    print("Successfully rebuilt")
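The same dump()/load() pair also works for several objects in one file: each dump() call appends one pickle, and each load() call reads the next one back in order, raising EOFError once the file is exhausted. Here is a minimal sketch; the records and the file name "many.pickle" are hypothetical, and a temporary directory keeps the example self-contained.

```python
import os
import pickle
import tempfile

records = [{"Hii": "Everyone!"}, [1, 2, 3], ("a", "tuple")]
loaded = []

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "many.pickle")  # hypothetical file name

    # Write several objects into one file, one dump() call per object
    with open(path, "wb") as outfile:
        for item in records:
            pickle.dump(item, outfile)

    # Read them back in the same order until the file runs out
    with open(path, "rb") as infile:
        while True:
            try:
                loaded.append(pickle.load(infile))
            except EOFError:
                break

print(loaded == records)  # True
```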

Besides writing the serialized data to a pickle file, we can obtain the serialized object as a bytes object using pickle’s dumps() function:

Input:

array_dic = pickle.dumps(dic)

Likewise, pickle’s loads() function converts a bytes object back into the original object:

Input:

array_dic_new = pickle.loads(array_dic)
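A self-contained sketch of this dumps()/loads() round trip follows. The extra "point" and "ids" entries are illustrative additions to the dictionary, chosen to show that pickle, unlike JSON, keeps tuples and sets as they were.

```python
import pickle

dic = {"Hii": "Everyone!", "point": (1, 2), "ids": {3, 1, 2}}

array_dic = pickle.dumps(dic)             # serialize to an in-memory bytes object
print(type(array_dic))                    # <class 'bytes'>

array_dic_new = pickle.loads(array_dic)   # rebuild the object from the bytes
print(array_dic_new == dic)               # True
print(type(array_dic_new["point"]))       # <class 'tuple'>: the tuple survives
print(array_dic_new is dic)               # False: a new, equal object
```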

One of pickle’s many advantages is that it can serialize almost any Python object, including instances of user-defined classes such as the one below:

Input:

import pickle

class Demo:
    def __init__(self, data):
        print(data)
        self.data = data

# Build an object of class Demo
Demo_obj = Demo(1)
# Serialization and deserialization
info = pickle.dumps(Demo_obj)
rebuilt = pickle.loads(info)
# Confirmation
print("Data from rebuilt entity:", rebuilt.data)

Output:

1
Data from rebuilt entity: 1

Note that the print statement in the class’s constructor (__init__) runs only when Demo_obj is first created. It is not executed again when pickle.loads() is called, because the object is rebuilt from its saved state rather than constructed anew.
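To check this behavior, the sketch below counts constructor calls with a hypothetical class-level counter, init_calls. By default, pickle recreates an instance without calling __init__ and then restores its attribute dictionary directly.

```python
import pickle


class Demo:
    init_calls = 0  # hypothetical counter of how many times __init__ runs

    def __init__(self, data):
        Demo.init_calls += 1
        self.data = data


obj = Demo(1)
rebuilt = pickle.loads(pickle.dumps(obj))

print(Demo.init_calls)   # 1: only the original construction called __init__
print(rebuilt.data)      # 1: the attribute was restored from the pickle
print(rebuilt is obj)    # False: a separate object was created
```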

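Pickle can even serialize Python functions. Note, however, that a function is pickled by reference: only its module and name are stored, not its code, so the same function must be importable when the data is loaded: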

Input:

import pickle
def fun():
    return "Hii Everyone!!"
# Serialization and deserialization
p_fun = pickle.dumps(fun)
fun_new = pickle.loads(p_fun)
# Confirmation
print(fun_new())  # prints "Hii Everyone!!"
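The following sketch illustrates what pickling a function by reference means in practice. It inspects the raw bytes: the function's name appears, but the string it returns (part of its code) does not. A lambda has no importable name, so pickling one fails; the exact exception type may vary across Python versions, which is why two are caught here.

```python
import pickle


def fun():
    return "Hii Everyone!!"


data = pickle.dumps(fun)

# Only a reference (module and name) is stored, not the function's code
print(b"fun" in data)              # True
print(b"Hii Everyone!!" in data)   # False
# Loading looks the name up again and returns the very same function object
print(pickle.loads(data) is fun)   # True

# A lambda has no importable name, so it cannot be pickled by reference
try:
    pickle.dumps(lambda: "anonymous")
    lambda_failed = False
except (pickle.PicklingError, AttributeError):
    lambda_failed = True
print(lambda_failed)               # True
```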

What can be pickled and unpickled in Python?

Pickling is the process of converting a Python object hierarchy into a stream of bytes. Unpickling is the reverse: the byte stream, read for example from a pickle file or a bytes object, is converted back into the original Python objects.

By default, anything built (recursively) from basic Python types, such as dicts, lists, primitives, and class instances, can be pickled, including objects that reference one another and even circular references. Objects tied to external resources, such as sockets, open file handles, and database connections, typically cannot be pickled.
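Here is a short sketch of both halves of that statement: a self-referencing list and a shared reference survive a round trip intact, while an open file handle is rejected. The temporary file "demo.txt" is a hypothetical stand-in for any open file.

```python
import os
import pickle
import tempfile

# A list that contains itself: a circular reference
loop = [1, 2]
loop.append(loop)
restored = pickle.loads(pickle.dumps(loop))
print(restored[2] is restored)   # True: the cycle is rebuilt

# Two references to one object stay shared after a round trip
shared = [0]
pair = pickle.loads(pickle.dumps((shared, shared)))
print(pair[0] is pair[1])        # True

# An open file handle is tied to the operating system and cannot be pickled
file_error = None
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "demo.txt"), "w") as handle:
        try:
            pickle.dumps(handle)
        except TypeError as exc:
            file_error = exc
print(type(file_error).__name__)  # TypeError
```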

For such objects, you can write custom pickling logic, for instance to save and restore the settings of a database connection rather than the connection itself, but this requires code written specifically for each class.
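A minimal sketch of such custom logic uses pickle's standard __getstate__ and __setstate__ hooks. DatabaseClient is a hypothetical class, and the standard library's sqlite3 stands in for a real database server. The live connection is dropped when pickling and reopened from the saved settings when unpickling. Note that reopening ":memory:" gives a fresh, empty database, so only the settings carry over, not the data.

```python
import pickle
import sqlite3


class DatabaseClient:
    """Hypothetical wrapper holding a live connection plus its settings."""

    def __init__(self, path):
        self.path = path                    # plain setting: picklable
        self.conn = sqlite3.connect(path)   # live connection: not picklable

    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable connection
        state = self.__dict__.copy()
        del state["conn"]
        return state

    def __setstate__(self, state):
        # Restore the settings, then reopen the connection from them
        self.__dict__.update(state)
        self.conn = sqlite3.connect(self.path)


client = DatabaseClient(":memory:")
clone = pickle.loads(pickle.dumps(client))
print(clone.path)                                  # :memory:
print(clone.conn.execute("SELECT 1").fetchone())   # (1,)
```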

How to save your work

Pickle can also be used to save our work. For instance, instead of retraining a machine learning model every time it is needed, a model trained with Keras or scikit-learn can be serialized with pickle and loaded later. The following code uses Keras to build a LeNet5 model that recognizes the MNIST handwritten digits, trains it, and then serializes the trained model. The model is then rebuilt without retraining, and it should produce the same results as the original:

Input:

# Importing Libraries
import pickle
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, AveragePooling2D, Dropout, Flatten
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping
 
# Loading the MNIST digits
(XTrain, YTrain), (XTest, YTest) = mnist.load_data()
 
# Reshaping the data
XTrain = np.expand_dims(XTrain, axis=3).astype("float32")
XTest = np.expand_dims(XTest, axis=3).astype("float32")
 
# Encoding the output
YTrain = to_categorical(YTrain)
YTest = to_categorical(YTest)
 
# LeNet5 model
model = Sequential([
    Conv2D(6, (5,5), input_shape=(28,28,1), padding="same", activation="tanh"),
    AveragePooling2D((2,2), strides=2),
    Conv2D(16, (5,5), activation="tanh"),
    AveragePooling2D((2,2), strides=2),
    Conv2D(120, (5,5), activation="tanh"),
    Flatten(),
    Dense(84, activation="tanh"),
    Dense(10, activation="softmax")
])
 
# Training the model
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
earlystopping = EarlyStopping(monitor="val_loss", patience=4, restore_best_weights=True)
model.fit(XTrain, YTrain, validation_data=(XTest, YTest), epochs=100, batch_size=32, callbacks=[earlystopping])
 
# Evaluation of the model
print(model.evaluate(XTest, YTest, verbose=0))
 
# Using Pickle for serialization and deserialization
p_model = pickle.dumps(model)
new_model = pickle.loads(p_model)
 
# Evaluation
print(new_model.evaluate(XTest, YTest, verbose=0))

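The output of the code above is shown below. Early stopping halts training after epoch 14 and restores the weights from epoch 10, which had the lowest validation loss; the last line gives the resulting test loss and accuracy. Because the rebuilt model carries the same weights, evaluating it should report the same values: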

Output:

Epoch 1/100
1875/1875 [==============================] - 48s 25ms/step - loss: 0.1495 - accuracy: 0.9561 - val_loss: 0.0650 - val_accuracy: 0.9796
Epoch 2/100
1875/1875 [==============================] - 49s 26ms/step - loss: 0.0640 - accuracy: 0.9805 - val_loss: 0.0502 - val_accuracy: 0.9835
Epoch 3/100
1875/1875 [==============================] - 47s 25ms/step - loss: 0.0504 - accuracy: 0.9841 - val_loss: 0.0472 - val_accuracy: 0.9860
Epoch 4/100
1875/1875 [==============================] - 47s 25ms/step - loss: 0.0406 - accuracy: 0.9870 - val_loss: 0.0443 - val_accuracy: 0.9861
Epoch 5/100
1875/1875 [==============================] - 46s 24ms/step - loss: 0.0352 - accuracy: 0.9882 - val_loss: 0.0491 - val_accuracy: 0.9835
Epoch 6/100
1875/1875 [==============================] - 47s 25ms/step - loss: 0.0303 - accuracy: 0.9900 - val_loss: 0.0379 - val_accuracy: 0.9868
Epoch 7/100
1875/1875 [==============================] - 46s 24ms/step - loss: 0.0289 - accuracy: 0.9907 - val_loss: 0.0471 - val_accuracy: 0.9860
Epoch 8/100
1875/1875 [==============================] - 46s 25ms/step - loss: 0.0252 - accuracy: 0.9919 - val_loss: 0.0390 - val_accuracy: 0.9872
Epoch 9/100
1875/1875 [==============================] - 46s 24ms/step - loss: 0.0208 - accuracy: 0.9934 - val_loss: 0.0435 - val_accuracy: 0.9869
Epoch 10/100
1875/1875 [==============================] - 48s 26ms/step - loss: 0.0206 - accuracy: 0.9930 - val_loss: 0.0376 - val_accuracy: 0.9890
Epoch 11/100
1875/1875 [==============================] - 47s 25ms/step - loss: 0.0205 - accuracy: 0.9935 - val_loss: 0.0448 - val_accuracy: 0.9865
Epoch 12/100
1875/1875 [==============================] - 47s 25ms/step - loss: 0.0182 - accuracy: 0.9939 - val_loss: 0.0402 - val_accuracy: 0.9884
Epoch 13/100
1875/1875 [==============================] - 47s 25ms/step - loss: 0.0157 - accuracy: 0.9948 - val_loss: 0.0468 - val_accuracy: 0.9871
Epoch 14/100
1875/1875 [==============================] - 47s 25ms/step - loss: 0.0134 - accuracy: 0.9958 - val_loss: 0.0416 - val_accuracy: 0.9877
[0.03760313242673874, 0.9890000224113464]

Conclusion

Pickle is a powerful library, but there are limits to what can be pickled. Live resources such as database connections and open file handles, for instance, cannot be pickled. Re-creating them is outside the scope of what pickle is meant for: it may require the right credentials, and pickle cannot re-establish the connection to the database or file for you.

In this blog, you learned what serialization is and how to use Python’s pickle module to serialize objects such as dictionaries and TensorFlow Keras models. You also saw the benefits and limitations of pickle.