NumPy Statistical Functions with Examples

Stats under our Hats!

A Beginner-Friendly Tutorial on NumPy Statistical Functions

Welcome to Techvidvan’s beginner-friendly tutorial on NumPy statistical functions! NumPy is a Python library that is widely used for numerical computation and data analysis. In this tutorial, we’ll walk through its essential statistical functions, each with a short example showing how to use it.

Importing NumPy

To use NumPy in your Python code, you need to import it. By convention, NumPy is imported under the alias np.

import numpy as np

Calculating Mean and Median

Mean

The mean is the average value of a set of numbers: the sum of the values divided by how many there are. NumPy’s np.mean() function calculates this arithmetic mean.

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])
mean_value = np.mean(data)
print("Mean:", mean_value)

Output:

Mean: 21.25
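To connect np.mean() to the definition, here is a small sketch that computes the mean by hand and then averages along each axis of a 2-D array. The names manual_mean and table, and the 2x4 reshape, are only illustrative choices and not part of the original example.

```python
import numpy as np

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])

# The arithmetic mean is the sum of the values divided by their count.
manual_mean = data.sum() / data.size
print("Manual mean:", manual_mean)  # 21.25

# For a 2-D array, `axis` selects the direction to average over.
# The 2x4 reshape is an assumption made purely for illustration.
table = data.reshape(2, 4)
column_means = np.mean(table, axis=0)  # one mean per column
row_means = np.mean(table, axis=1)     # one mean per row
print("Column means:", column_means)
print("Row means:", row_means)
```

Without the axis argument, np.mean() averages over every element, whatever the shape of the array.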

Median

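The median is the middle value of a dataset when it’s arranged in ascending order. When the dataset has an even number of values, as in the example below, the median is the average of the two middle values. NumPy’s np.median() function calculates the median.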

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])
median_value = np.median(data)
print("Median:", median_value)

Output:

Median: 21.0
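If you are curious how that value comes about, the following sketch reproduces it by sorting the data and picking the middle. The variable names are illustrative, and this is only a rough picture of the idea, not a description of how NumPy implements np.median() internally.

```python
import numpy as np

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])

sorted_data = np.sort(data)  # [12 15 18 20 22 25 28 30]
n = sorted_data.size

if n % 2 == 0:
    # Even count: average the two middle values (here 20 and 22).
    manual_median = (sorted_data[n // 2 - 1] + sorted_data[n // 2]) / 2
else:
    # Odd count: take the single middle value.
    manual_median = float(sorted_data[n // 2])

print("Manual median:", manual_median)  # 21.0
```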

Calculating Standard Deviation and Variance

Standard Deviation

The standard deviation measures how far data points are spread around the mean. NumPy’s np.std() function calculates it. By default it returns the population standard deviation, which divides by the number of values N.

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])
std_deviation = np.std(data)
print(f"Standard Deviation: {std_deviation:.4f}")

Output:

Standard Deviation: 5.8470
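One detail that often trips up beginners is the divisor. The sketch below compares the default (divide by N) with the sample version (divide by N - 1), which you select with the ddof argument of np.std(). The variable names are illustrative.

```python
import numpy as np

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])

# Default: ddof=0, the sum of squared deviations is divided by N.
population_std = np.std(data)

# ddof=1: the divisor becomes N - 1, giving the sample standard deviation.
sample_std = np.std(data, ddof=1)

print(f"Population std: {population_std:.4f}")  # 5.8470
print(f"Sample std:     {sample_std:.4f}")      # 6.2507
```

The sample version is always slightly larger, and the gap shrinks as the dataset grows.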

Variance

Variance quantifies the spread of a dataset as the average of the squared deviations from the mean, so it equals the square of the standard deviation. NumPy’s np.var() function calculates the variance.

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])
variance = np.var(data)
print("Variance:", variance)

Output:

Variance: 34.1875
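Variance is the mean of the squared deviations from the mean. As a quick check, this sketch builds that quantity step by step and confirms that it matches both np.var() and the square of np.std(). The intermediate names are illustrative.

```python
import numpy as np

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])

deviations = data - data.mean()   # distance of each value from the mean
squared = deviations ** 2         # squaring removes the sign
manual_variance = squared.sum() / data.size  # divide by N, as np.var() does by default

print("Manual variance:", manual_variance)  # 34.1875
print("Matches np.var:", np.isclose(manual_variance, np.var(data)))
print("Matches std squared:", np.isclose(manual_variance, np.std(data) ** 2))
```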

Finding Maximum and Minimum Values

Maximum

To find the maximum value in a dataset, NumPy provides the np.max() function.

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])
max_value = np.max(data)
print("Maximum Value:", max_value)

Output:

Maximum Value: 30

Minimum

Similarly, to find the minimum value, you can use NumPy’s np.min() function.

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])
min_value = np.min(data)
print("Minimum Value:", min_value)

Output:

Minimum Value: 12
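Often you also want to know where the extreme values sit, or how wide the data is. The sketch below uses np.argmax() and np.argmin(), which return positions instead of values, and computes the range as maximum minus minimum. The variable names are illustrative.

```python
import numpy as np

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])

max_index = np.argmax(data)  # position of the largest value (30)
min_index = np.argmin(data)  # position of the smallest value (12)
value_range = np.max(data) - np.min(data)  # distance between the extremes

print("Index of maximum:", max_index)  # 6
print("Index of minimum:", min_index)  # 0
print("Range:", value_range)           # 18
```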

Calculating Correlation Coefficient

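The correlation coefficient measures the strength and direction of the linear relationship between two variables, on a scale from -1 to 1. NumPy’s np.corrcoef() function computes the Pearson correlation matrix. For two inputs this is a 2x2 matrix whose diagonal entries are 1, and the off-diagonal entry [0, 1] is the coefficient between them.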

data_x = np.array([15, 18, 20, 22, 25, 30, 28, 35])
data_y = np.array([50, 55, 60, 65, 70, 75, 80, 85])
correlation_matrix = np.corrcoef(data_x, data_y)
correlation_coefficient = correlation_matrix[0, 1]
print(f"Correlation Coefficient: {correlation_coefficient:.4f}")

Output:

Correlation Coefficient: 0.9758
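To see what that number is made of, here is a sketch that computes the Pearson coefficient directly from its textbook formula: the sum of the products of the deviations, divided by the square root of the product of the summed squared deviations. The names dx, dy, and manual_r are illustrative.

```python
import numpy as np

data_x = np.array([15, 18, 20, 22, 25, 30, 28, 35])
data_y = np.array([50, 55, 60, 65, 70, 75, 80, 85])

dx = data_x - data_x.mean()  # deviations of x from its mean
dy = data_y - data_y.mean()  # deviations of y from its mean

# Pearson r: co-variation of x and y, scaled by the spread of each.
manual_r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(f"Manual r: {manual_r:.4f}")  # 0.9758
```

A value this close to 1 indicates a strong positive linear relationship between the two arrays.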

Additional Statistical Functions

Percentiles

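Percentiles help you understand the distribution of data. NumPy’s np.percentile() function calculates the value below which a given percentage of observations fall. When that position lands between two data points, NumPy interpolates linearly between them by default.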

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])
percentile_75 = np.percentile(data, 75)
print("75th Percentile:", percentile_75)

Output:

75th Percentile: 25.75
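np.percentile() also accepts a list of percentages, which makes it easy to get all three quartiles at once. The sketch below does that and derives the interquartile range (IQR) from them. The names q1, q2, q3, and iqr are illustrative.

```python
import numpy as np

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])

# Passing a list of percentages returns one value per percentage.
q1, q2, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # interquartile range: spread of the middle 50% of the data

print("Q1:", q1)    # 17.25
print("Q2:", q2)    # 21.0 (same as the median)
print("Q3:", q3)    # 25.75
print("IQR:", iqr)  # 8.5
```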

Histogram

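Histograms are a common way to summarize how data is distributed. NumPy’s np.histogram() function does not draw anything. It counts how many values fall into each bin and returns those counts together with the bin edges. With bins=3, the range from the minimum to the maximum is split into three equal-width bins, so four edges are returned.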

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])
hist, bins = np.histogram(data, bins=3)
print("Histogram:", hist)
print("Bin Edges:", bins)

Output:

Histogram: [2 3 3]
Bin Edges: [12. 18. 24. 30.]
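Instead of a bin count, you can pass the bin edges yourself. The sketch below uses the edges 10, 20, 30, and 40, which are an arbitrary choice for illustration. Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.

```python
import numpy as np

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])

# Explicit edges define the bins [10, 20), [20, 30) and [30, 40].
counts, edges = np.histogram(data, bins=[10, 20, 30, 40])

print("Counts:", counts)  # [3 4 1]
print("Edges:", edges)

# Every value lies inside the overall range, so the counts add up to the data size.
print("Total counted:", counts.sum())  # 8
```

Note that the value 30 is counted in the last bin, not the middle one, because it sits exactly on the edge between them.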

Covariance

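Covariance measures how two variables vary together: it is positive when they tend to increase together and negative when one tends to decrease as the other increases. NumPy’s np.cov() function calculates the covariance matrix. By default it uses the sample formula, which divides by N - 1, and the off-diagonal entry [0, 1] is the covariance between the two inputs.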

data_x = np.array([15, 18, 20, 22, 25, 30, 28, 35])
data_y = np.array([50, 55, 60, 65, 70, 75, 80, 85])
covariance_matrix = np.cov(data_x, data_y)
covariance = covariance_matrix[0, 1]
print(f"Covariance: {covariance:.4f}")

Output:

Covariance: 79.6429
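Unlike np.var() and np.std(), np.cov() divides by N - 1 unless told otherwise. The sketch below shows both divisors side by side using the ddof argument, and checks that the diagonal of the covariance matrix holds each input’s own variance. The variable names are illustrative.

```python
import numpy as np

data_x = np.array([15, 18, 20, 22, 25, 30, 28, 35])
data_y = np.array([50, 55, 60, 65, 70, 75, 80, 85])

cov_matrix = np.cov(data_x, data_y)                     # default: divide by N - 1
cov_sample = cov_matrix[0, 1]
cov_population = np.cov(data_x, data_y, ddof=0)[0, 1]   # divide by N

print(f"Sample covariance:     {cov_sample:.4f}")      # 79.6429
print(f"Population covariance: {cov_population:.4f}")  # 69.6875

# The diagonal entries are the variances of x and y, computed with the same divisor.
print(np.isclose(cov_matrix[0, 0], np.var(data_x, ddof=1)))
```

When comparing covariance with values from np.var() or np.std(), make sure both use the same ddof.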

Conclusion

Congratulations! You’ve learned some fundamental NumPy statistical functions through this Techvidvan tutorial. These functions cover the everyday work of summarizing and comparing data. Feel free to experiment with different datasets and explore the more advanced statistical functions that NumPy offers. Happy coding!