Sunday, July 30, 2017

Why Python Numpy is fast


The dot product of two vectors a and be is the sum of the products of its coordinates. It can also be written as:






where b in power T is the transpose of the vector b.

I am going to use python Numpy library to create two vectors a and b.

a = np.random.randint(10, size=10000000)
b = np.random.randint(10, size=10000000)

Each vector has a size of 10,000,000 elements and each element is an integer between zero and 10.

I am going to perform a dot product operation on these two vectors. I'll repeat each operation 10 times and then calculate the average execution time. First, i am going to use a for loop to iterate through the arrays.

import numpy as np
import time

def Dot_no_numpy(a,b):
'''
performs dot product with for loop
accepts two vectors on the input and returns the dot product
'''
sum = 0;
for i in range(len(a)):
sum = sum + a[i]*b[i]
return sum


time_measurements = np.array([]) for i in range (10): time_begin = time.time() Dot_no_numpy(a,b) time_end = time.time() delta_time = time_end - time_begin time_measurements = np.append(time_measurements, delta_time) mean = np.mean(time_measurements) print "average time", mean

Result:
average time 5.15


Now I am going to use Numpy dot function to perform a dot product and check if the result is different this time.

def Dot_with_numpy(a,b):
'''
performs dot product with numpy dot function
accepts two vectors on the input and returs the dot product
'''

return np.dot(a,b)

time_measurements = np.array([]) for i in range (10): time_begin = time.time() Dot_with_numpy(a,b) time_end = time.time() delta_time = time_end - time_begin time_measurements = np.append(time_measurements, delta_time) mean = np.mean(time_measurements) print "average time", mean

Result:
average time 0.00909998416901

By comparing average times we can see that if Numpy is used to perform the dot product, the time it takes is almost 560 times less compared to the case without Numpy. This is a huge difference.

Why the solution utilising for loop is so slow? Original python interpreter (CPython implemented in C language) was written with a GIL (Global Interpreter Lock) which prevents executing multiple threads in parallel. In this situation a systems performs as a SISD (Single Instruction Single Data) or a uniprocessor machine. It executes a single instruction at a time.

Why Numpy is different? Numpy utilises BLAS (Basic Linear Algebra Subprograms) libraries for its linear algebra operations. BLAS in turn makes use of SIMD (Single Instruction Multiple Data) instructions which allows to fully utilise multicore machine. SIMD instructions can greatly increase performance if the same operation is performed on multiple data objects. Intel has introduced a SIMD instruction set support in his Pentium III processors (in year 1999).



No comments:

Post a Comment