One of the most common probability distributions is the Normal (or Gaussian) distribution. Many natural phenomena that vary randomly follow it approximately, which becomes apparent when the number of observations is large.
I am going to use the Python programming language to build a normal distribution.
For my model I decided to use CPU utilisation data. I am running Windows on my machine. The "psutil" library can be used to measure CPU utilisation at a given time. As shown in the code below, I take 50 measurements in total, with a 5 s delay between measurements.
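import matplotlib.pyplot as plt
import psutil
import time

raw_data = []
# The first cpu_percent() call returns a meaningless 0.0, so discard it.
psutil.cpu_percent()
for i in range(50):
    time.sleep(5)
    # CPU utilisation since the previous call, in percent
    raw_data.append(psutil.cpu_percent())

plt.ylabel('CPU Utilisation, %')
plt.xlabel('Measurement sequence')
plt.plot(raw_data)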
Now I am going to calculate and plot a normal distribution using functions from the SciPy library. First, I need to calculate the mean and standard deviation:
import math
from scipy.stats import norm

# norm.fit returns the estimated mean and standard deviation of the data
mean, std = norm.fit(raw_data)
print("mean", mean)
print("std", std)
Result:
mean 10.69
std 3.83474901395
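Before relying on the fitted parameters, it can be worth checking that a normal model is plausible for the data at all. The sketch below is one rough way to do that, using only the standard library: it counts what fraction of samples fall within one and two standard deviations of the mean, which for normal data should be close to 68% and 95%. The `samples` list is synthetic stand-in data generated with `random.gauss` (an assumption for illustration, not the CPU measurements from this post), and `fraction_within` is a hypothetical helper name.

```python
import random
import statistics

def fraction_within(samples, k):
    """Return the fraction of samples within k standard deviations of the mean."""
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)  # divide-by-N standard deviation
    inside = sum(1 for x in samples if abs(x - mean) <= k * std)
    return inside / len(samples)

# Synthetic stand-in data (NOT real measurements): 5000 draws from a normal
# distribution with roughly the mean and std reported above.
random.seed(0)
samples = [random.gauss(10.69, 3.83) for _ in range(5000)]

within_1 = fraction_within(samples, 1)
within_2 = fraction_within(samples, 2)
print("within 1 std:", within_1)  # close to 0.68 for normal data
print("within 2 std:", within_2)  # close to 0.95 for normal data
```

With only 50 real measurements the fractions will be noisier, so this is a rough plausibility check rather than a formal test.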
Second, I need to define the range of values of the random variable X. Each value represents a certain CPU utilisation level. I assume that practically all values lie within four standard deviations of the mean, so I evaluate the PDF on integer values from 0 up to mean + 4*std.
range_value = int(mean + 4*std)
X = list(range(range_value))
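The four-standard-deviation assumption can be checked with a short calculation. For a normal distribution, the probability of landing within k standard deviations of the mean is erf(k/sqrt(2)), so the sketch below evaluates it with `math.erf` from the standard library. The helper name `coverage` is mine, for illustration only.

```python
import math

def coverage(k):
    """Probability that a normal variable lies within k std of its mean."""
    return math.erf(k / math.sqrt(2))

# Coverage for one to four standard deviations.
for k in (1, 2, 3, 4):
    print(k, coverage(k))
```

At k = 4 the coverage is above 99.99%, so very little probability mass lies beyond the upper end of the range.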
Finally, I pass X, the mean and the standard deviation to the pdf function and display the chart.
p = norm.pdf(X, mean, std)
plt.ylabel('PDF(X)')
plt.xlabel('X')
plt.plot(X,p)
The mean, or expected value, of the variable (10.69) is the centre of the PDF: the curve peaks there and is symmetric around it.
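To make the role of the mean concrete, the sketch below evaluates the fitted PDF on a fine grid and looks for its highest point. It uses `statistics.NormalDist` from the Python standard library (3.8+) as a stand-in for `scipy.stats.norm`, with the mean and standard deviation reported earlier; the grid step of 0.01 is an arbitrary choice.

```python
import math
from statistics import NormalDist

mean, std = 10.69, 3.83474901395  # values reported earlier in the post
dist = NormalDist(mu=mean, sigma=std)

# Evaluate the PDF on a fine grid from 0 to 26 in steps of 0.01.
xs = [i * 0.01 for i in range(2601)]
ps = [dist.pdf(x) for x in xs]

# Location and height of the highest point on the grid.
peak_p, peak_x = max(zip(ps, xs))
print("peak at x =", peak_x)
print("peak height =", peak_p)

# The curve is symmetric around the mean: equal distances give equal densities.
left, right = dist.pdf(mean - 2.0), dist.pdf(mean + 2.0)
print("pdf(mean - 2) =", left, " pdf(mean + 2) =", right)
```

The peak height equals 1/(std*sqrt(2*pi)), which is roughly 0.104 for these parameters.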
Python, through SciPy, offers a very efficient and easy-to-use implementation of the normal distribution. However, I decided to implement it myself in order to understand how it works in detail.
First, I calculated the mean and standard deviation.
The mean is the sum of all CPU utilisation values divided by the number of items.
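import numpy as np

# Convert the list to a NumPy array so that .sum() and .shape are available
raw_data = np.asarray(raw_data)
total = raw_data.sum()
number_of_el = raw_data.shape[0]
mean = total / number_of_el
print("mean_value", mean)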
Result:
mean_value 10.69
The standard deviation quantifies the variation of the data around the mean.
It is the square root of the average squared difference between each CPU utilisation value and the mean, i.e. the sum of squared differences divided by the number of items.
std = math.sqrt(math.fsum((x - mean)**2 for x in raw_data) / number_of_el)
print("std", std)
Result:
std 3.83474901395
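As a worked example of the two formulas, the sketch below applies them to a small made-up data set (hypothetical values, not the CPU measurements) where the arithmetic is easy to follow by hand, and compares the result with `statistics.pstdev` from the standard library. Dividing by the number of items gives the population standard deviation, which as far as I know is also what `norm.fit` returns, since it computes maximum-likelihood estimates. The sample standard deviation divides by N - 1 instead and comes out slightly larger.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration

n = len(data)
mean = sum(data) / n                              # 40 / 8 = 5.0
squared_diffs = [(x - mean) ** 2 for x in data]   # 9, 1, 1, 1, 0, 0, 4, 16
std = math.sqrt(sum(squared_diffs) / n)           # sqrt(32 / 8) = 2.0

print("mean", mean)                               # mean 5.0
print("std", std)                                 # std 2.0
print("pstdev", statistics.pstdev(data))          # divides by N
print("stdev", statistics.stdev(data))            # divides by N - 1
```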
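The normal PDF is defined by the following formula:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
where \mu is the mean and \sigma is the standard deviation.
And the Python implementation: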
range_value = int(mean + 4*std)
p = []
for x in range(range_value):
    # Evaluate the normal PDF at x
    f_x = (1/(std*math.sqrt(2*math.pi)))*math.exp((-(x-mean)**2)/(2*std**2))
    p.append(f_x)
plt.ylabel('PDF(X)')
plt.xlabel('X')
plt.plot(p)
The resulting PDF looks exactly the same as in the previous example, which used the SciPy implementation.
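The visual agreement can also be checked numerically. The sketch below is one way to do that without plotting: it evaluates the hand-written formula and a library implementation on the same integer grid and compares them point by point. It assumes Python 3.8+ and uses `statistics.NormalDist` from the standard library as the reference; with SciPy installed, `norm.pdf` could be swapped in instead. `normal_pdf` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mean, std):
    """Normal PDF written out from the formula."""
    coeff = 1 / (std * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mean) ** 2) / (2 * std ** 2))

mean, std = 10.69, 3.83474901395  # values reported earlier in the post
xs = range(int(mean + 4 * std))   # same integer grid as in the post

manual = [normal_pdf(x, mean, std) for x in xs]
reference = [NormalDist(mean, std).pdf(x) for x in xs]

# Largest absolute difference between the two curves.
max_diff = max(abs(a - b) for a, b in zip(manual, reference))
print("points compared:", len(manual))
print("max difference:", max_diff)
```

Any difference should be at the level of floating-point rounding, which is far below what a plot can show.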





