Use of Scientific Libraries and GPU Acceleration
Write a program that implements the matrix multiplication C = AB, where A is m × n and B is n × k. Your program should take m, n, and k as command-line arguments (i.e. ./executable m n k) and perform the multiplication in several different ways. Create a separate function for each of the following operations and call each function from one main program:

1. Create a CPU version of the naïve matrix multiplication similar to the one I presented in class.

2. Compute the inner products of rows of A with columns of B using the level-1 BLAS function ddot( ), which calculates the dot product of two arrays. READ THE NOTES ON CANVAS THAT INTRODUCE LEVEL-1 BLAS OPERATIONS. The first d in ddot( ) stands for double, which means that this operation is performed on arrays of doubles.

3. This method also uses a level-1 BLAS function for the matrix multiplication. In this case, you will use daxpy( ) to form each column of C as a linear combination of the columns of A. Once again, the d in daxpy( ) stands for double, so use double arrays.

4. Implement the same matrix multiplication using the dgemm routine, which is the most common function for matrix multiplication. Intel provides the following page to explain its usage:
   https://software.intel.com/en-us/mkl-tutorial-cmultiplying-matrices-using-dgemm
   In this step, you should create a random number function to initialize your matrices with random integers ranging from 1 to 10.

5. Create a kernel that does the naïve matrix multiplication for square matrices. Calculate your grid and block sizes and execute as follows:
   dim3 block(16, 16);
   dim3 grid( (n+15)/16, (n+15)/16 );
   my_kernel<<
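The difference between the ddot and daxpy formulations (items 2 and 3) can be illustrated with a minimal sketch. It assumes column-major storage, which the assignment does not specify. The helpers dot_stride and axpy_stride are hypothetical stand-ins that mirror the (length, vector, stride) argument pattern of the BLAS routines so that the example compiles without linking a BLAS library; in the actual assignment, the BLAS calls would take their place.

```c
#include <stddef.h>

/* Hypothetical stand-in for ddot: returns the sum of x[i*incx] * y[i*incy]. */
static double dot_stride(int len, const double *x, int incx,
                         const double *y, int incy)
{
    double sum = 0.0;
    for (int i = 0; i < len; ++i)
        sum += x[(size_t)i * incx] * y[(size_t)i * incy];
    return sum;
}

/* Hypothetical stand-in for daxpy: y := alpha * x + y. */
static void axpy_stride(int len, double alpha, const double *x, int incx,
                        double *y, int incy)
{
    for (int i = 0; i < len; ++i)
        y[(size_t)i * incy] += alpha * x[(size_t)i * incx];
}

/* Item 2: C(i,l) = (row i of A) . (column l of B).
 * Assumed layout: column-major, so A(i,j) = A[i + j*m], B(j,l) = B[j + l*n].
 * Row i of A therefore starts at &A[i] and has stride m. */
void matmul_dot(int m, int n, int k,
                const double *A, const double *B, double *C)
{
    for (int l = 0; l < k; ++l)
        for (int i = 0; i < m; ++i)
            C[i + (size_t)l * m] =
                dot_stride(n, &A[i], m, &B[(size_t)l * n], 1);
}

/* Item 3: column l of C = sum over j of B(j,l) * (column j of A).
 * Columns are contiguous in column-major storage, so both strides are 1. */
void matmul_axpy(int m, int n, int k,
                 const double *A, const double *B, double *C)
{
    for (int l = 0; l < k; ++l) {
        double *c_col = &C[(size_t)l * m];
        for (int i = 0; i < m; ++i)
            c_col[i] = 0.0;               /* start from a zero column */
        for (int j = 0; j < n; ++j)
            axpy_stride(m, B[j + (size_t)l * n],
                        &A[(size_t)j * m], 1, c_col, 1);
    }
}
```

Both functions should fill C with the same values for the same inputs. The dot-product version walks rows of A with stride m, while the axpy version only touches contiguous columns, which is one reason the two can perform differently on large matrices.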