Numpy Statistical Functions with Examples
Stats under our Hats!
A Beginner-Friendly Tutorial on Numpy Statistical Functions
Welcome to Techvidvan’s beginner-friendly tutorial on Numpy statistical functions! Numpy is a powerful library in Python that is widely used for numerical computations and data analysis. In this tutorial, we’ll dive into some essential statistical functions offered by Numpy, along with examples to help you understand how to use them effectively.
Importing Numpy
To use Numpy in your Python code, you need to import it. Conventionally, Numpy is imported as np.
import numpy as np
Calculating Mean and Median
Mean
The mean is the arithmetic average of a set of numbers: the sum of the values divided by how many values there are. Numpy’s np.mean() function calculates the arithmetic mean.
data = np.array([12, 18, 22, 15, 20, 25, 30, 28])
mean_value = np.mean(data)
print("Mean:", mean_value)
Output:
Mean: 21.25
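To see what np.mean() is doing, the sketch below computes the same mean by hand (sum divided by count) and then shows the axis argument on a small two-dimensional array. The 2-D array is a hypothetical rearrangement of the same eight values, used only for illustration.

```python
import numpy as np

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])

# Arithmetic mean by hand: sum of the values divided by how many there are
manual_mean = data.sum() / data.size
print("Manual mean:", manual_mean)
print("Matches np.mean():", np.isclose(manual_mean, np.mean(data)))

# Hypothetical 2-D layout of the same values: 2 rows x 4 columns
grid = np.array([[12, 18, 22, 15],
                 [20, 25, 30, 28]])

# axis=0 averages down each column, axis=1 averages across each row
print("Column means:", np.mean(grid, axis=0))
print("Row means:", np.mean(grid, axis=1))
```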
Median
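The median is the middle value of a dataset when it’s sorted in ascending order. When the dataset has an even number of values, as in the example below, the median is the average of the two middle values. Numpy’s np.median() function calculates the median.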
data = np.array([12, 18, 22, 15, 20, 25, 30, 28])
median_value = np.median(data)
print("Median:", median_value)
Output:
Median: 21.0
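The sort-and-pick rule for the median can be written out explicitly. This is a minimal sketch for a 1-D array without missing values; it only illustrates the arithmetic and is not meant to replace np.median().

```python
import numpy as np

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])

# Step 1: sort the values in ascending order
sorted_data = np.sort(data)
print("Sorted:", sorted_data)

# Step 2: pick the middle value(s); the rule depends on whether the count is odd or even
n = sorted_data.size
if n % 2 == 1:
    # Odd count: a single middle element
    manual_median = float(sorted_data[n // 2])
else:
    # Even count: average the two middle elements
    manual_median = (sorted_data[n // 2 - 1] + sorted_data[n // 2]) / 2

print("Manual median:", manual_median)
print("np.median():  ", np.median(data))
```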
Calculating Standard Deviation and Variance
Standard Deviation
The standard deviation measures how widely data points are spread around the mean. Numpy’s np.std() function calculates the standard deviation of the data provided; by default it computes the population standard deviation, dividing by the number of values N.
data = np.array([12, 18, 22, 15, 20, 25, 30, 28])
std_deviation = np.std(data)
# Rounded to 4 decimal places for readability
print("Standard Deviation:", round(std_deviation, 4))
Output:
Standard Deviation: 5.847
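np.std() also accepts a ddof ("delta degrees of freedom") argument: the sum of squared deviations is divided by N - ddof. The default ddof=0 gives the population standard deviation, while ddof=1 gives the sample standard deviation, which is usually the one you want when your data is a sample drawn from a larger population. A short comparison on the same data:

```python
import numpy as np

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])

# Population standard deviation: divides the squared deviations by N (ddof=0, the default)
population_std = np.std(data)
# Sample standard deviation: divides by N - 1 (Bessel's correction)
sample_std = np.std(data, ddof=1)

print("Population std:", round(population_std, 4))
print("Sample std:    ", round(sample_std, 4))
```

With only eight values the difference is noticeable; it shrinks as the dataset grows.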
Variance
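Variance is the average of the squared deviations from the mean, so it quantifies how spread out the values in a dataset are; the standard deviation is its square root. Numpy’s np.var() function calculates the variance, again dividing by N by default.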
data = np.array([12, 18, 22, 15, 20, 25, 30, 28])
variance = np.var(data)
print("Variance:", variance)
Output:
Variance: 34.1875
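Because the variance is the mean of the squared deviations, it can be reproduced in a couple of lines, which also makes its link to the standard deviation explicit. A minimal sketch, assuming the default population variance (ddof=0):

```python
import numpy as np

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])

# Deviation of each value from the mean
deviations = data - data.mean()

# Population variance: the mean of the squared deviations (np.var's default)
manual_variance = np.mean(deviations ** 2)
print("Manual variance:", manual_variance)
print("np.var():       ", np.var(data))

# The standard deviation is the square root of the variance
print("sqrt(variance):", round(np.sqrt(manual_variance), 4))
print("np.std():      ", round(np.std(data), 4))
```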
Finding Maximum and Minimum Values
Maximum
To find the maximum value in a dataset, Numpy provides the np.max() function.
data = np.array([12, 18, 22, 15, 20, 25, 30, 28])
max_value = np.max(data)
print("Maximum Value:", max_value)
Output:
Maximum Value: 30
Minimum
Similarly, to find the minimum value, you can use the np.min() function.
data = np.array([12, 18, 22, 15, 20, 25, 30, 28])
min_value = np.min(data)
print("Minimum Value:", min_value)
Output:
Minimum Value: 12
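When you also need to know where the extremes occur, np.argmax() and np.argmin() return the index of the first maximum and first minimum, and np.ptp() ("peak to peak") returns the range, i.e. the maximum minus the minimum. A short sketch on the same data:

```python
import numpy as np

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])

# Positions (indices) of the largest and smallest values
max_index = np.argmax(data)
min_index = np.argmin(data)
print("Index of maximum:", max_index, "-> value", data[max_index])
print("Index of minimum:", min_index, "-> value", data[min_index])

# Range of the data: maximum minus minimum
value_range = np.ptp(data)
print("Range (max - min):", value_range)
```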
Calculating Correlation Coefficient
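The Pearson correlation coefficient measures the strength and direction of a linear relationship between two variables, on a scale from -1 to 1. Numpy’s np.corrcoef() function returns the correlation matrix of its inputs; for two variables this is a 2-by-2 matrix whose off-diagonal element [0, 1] is the coefficient between them.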
data_x = np.array([15, 18, 20, 22, 25, 30, 28, 35])
data_y = np.array([50, 55, 60, 65, 70, 75, 80, 85])
# np.corrcoef returns a 2x2 matrix; element [0, 1] is the x-y coefficient
correlation_matrix = np.corrcoef(data_x, data_y)
correlation_coefficient = correlation_matrix[0, 1]
print("Correlation Coefficient:", round(correlation_coefficient, 4))
Output:
Correlation Coefficient: 0.9758
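The Pearson coefficient is the covariance of x and y divided by the product of their standard deviations. The sketch below computes that ratio directly from the deviations and compares it with np.corrcoef(); the normalization (N or N - 1) cancels out because it appears in both the numerator and the denominator.

```python
import numpy as np

data_x = np.array([15, 18, 20, 22, 25, 30, 28, 35])
data_y = np.array([50, 55, 60, 65, 70, 75, 80, 85])

# Deviations from each variable's mean
dx = data_x - data_x.mean()
dy = data_y - data_y.mean()

# Pearson r = sum(dx * dy) / sqrt(sum(dx**2) * sum(dy**2))
manual_r = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

library_r = np.corrcoef(data_x, data_y)[0, 1]
print("Manual r:   ", round(manual_r, 4))
print("np.corrcoef:", round(library_r, 4))
```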
Additional Statistical Functions
Percentiles
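Percentiles help you understand the distribution of data. Numpy’s np.percentile() function calculates the value below which a given percentage of observations fall. By default it interpolates linearly between neighboring sorted values, so the result need not be one of the original data points.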
data = np.array([12, 18, 22, 15, 20, 25, 30, 28])
percentile_75 = np.percentile(data, 75)
print("75th Percentile:", percentile_75)
Output:
75th Percentile: 25.75
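The default linear interpolation can be reproduced step by step: place the requested percentile on a 0 to N - 1 scale over the sorted data, then interpolate between the two neighboring values. This sketch assumes the default method and only illustrates the arithmetic.

```python
import numpy as np

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])
q = 75  # percentile to compute

# Map q onto a 0..(n-1) position in the sorted data
sorted_data = np.sort(data)
position = (q / 100) * (sorted_data.size - 1)   # 0.75 * 7 = 5.25
lower = int(np.floor(position))
upper = int(np.ceil(position))
fraction = position - lower

# Linear interpolation between the two neighboring sorted values
manual_p75 = sorted_data[lower] + fraction * (sorted_data[upper] - sorted_data[lower])
print("Manual 75th percentile:", manual_p75)

# Several percentiles at once; the 50th percentile equals the median
print("25th/50th/75th:", np.percentile(data, [25, 50, 75]))
```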
Histogram
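Histograms are a common way to visualize how data is distributed. Numpy’s np.histogram() function does not draw a plot; it computes the counts a histogram would display. It returns two arrays: the number of values in each bin and the bin edges. With bins=3, the range from the minimum to the maximum is split into three equal-width bins; each bin includes its left edge, and the last bin also includes its right edge.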
data = np.array([12, 18, 22, 15, 20, 25, 30, 28])
hist, bin_edges = np.histogram(data, bins=3)
print("Histogram:", hist)
print("Bin Edges:", bin_edges)
Output:
Histogram: [2 3 3]
Bin Edges: [12. 18. 24. 30.]
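The bins argument also accepts a sequence of edges instead of a count. In the sketch below, the edges [10, 20, 30] are arbitrary, hand-picked values for illustration. Every bin is half-open except the last, which includes its right edge, so 20 lands in the second bin and 30 is still counted.

```python
import numpy as np

data = np.array([12, 18, 22, 15, 20, 25, 30, 28])

# Hypothetical, hand-picked bin edges: [10, 20) and [20, 30]
counts, edges = np.histogram(data, bins=[10, 20, 30])

# Pair each count with the interval it covers
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left} to {right}: {count}")

# Every value falls into exactly one bin, so the counts add up to the data size
print("Total count:", counts.sum(), "of", data.size)
```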
Covariance
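Covariance measures how two variables vary together: it is positive when they tend to increase together and negative when one tends to decrease as the other increases. Numpy’s np.cov() function calculates the covariance matrix; unlike np.var(), it divides by N - 1 by default (the sample covariance).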
data_x = np.array([15, 18, 20, 22, 25, 30, 28, 35])
data_y = np.array([50, 55, 60, 65, 70, 75, 80, 85])
# np.cov returns a 2x2 matrix; element [0, 1] is the x-y covariance
covariance_matrix = np.cov(data_x, data_y)
covariance = covariance_matrix[0, 1]
print("Covariance:", round(covariance, 4))
Output:
Covariance: 79.6429
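The off-diagonal entry of the covariance matrix can be checked by hand, and the diagonal holds each variable’s own sample variance, so it matches np.var(..., ddof=1). A minimal sketch, which also shows ddof=0 for the population covariance:

```python
import numpy as np

data_x = np.array([15, 18, 20, 22, 25, 30, 28, 35])
data_y = np.array([50, 55, 60, 65, 70, 75, 80, 85])
n = data_x.size

# Sample covariance by hand: sum of products of deviations, divided by N - 1
manual_cov = np.sum((data_x - data_x.mean()) * (data_y - data_y.mean())) / (n - 1)

cov_matrix = np.cov(data_x, data_y)
print("Manual covariance:", round(manual_cov, 4))
print("np.cov [0, 1]:    ", round(cov_matrix[0, 1], 4))

# Diagonal entries are the sample variances of x and y
sample_vars = [np.var(data_x, ddof=1), np.var(data_y, ddof=1)]
print("Diagonal:", np.diag(cov_matrix))
print("Same as np.var(ddof=1):", np.allclose(np.diag(cov_matrix), sample_vars))

# Population covariance (divide by N) if that is what you need
population_cov = np.cov(data_x, data_y, ddof=0)[0, 1]
print("Population covariance:", round(population_cov, 4))
```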
Conclusion
Congratulations! You’ve learned some fundamental Numpy statistical functions through this Techvidvan tutorial. These functions are incredibly useful for analyzing and understanding your data. Feel free to experiment with different datasets and explore more advanced statistical functions that Numpy offers. Happy coding!