Use of Scientific Libraries and GPU Acceleration

Write programs implementing the matrix multiplication C = AB, where A is m × n and B is n × k. Your program should take m, n, and k as command-line arguments (i.e. ./executable m n k), and the multiplication is to be done in several different ways. Create a separate function for each of the following operations and call each function from one main program:

1. Create a CPU version of the naive matrix multiplication similar to the one presented in class (a sketch follows this list).

2. Compute the inner products of rows of A with columns of B using the level-1 BLAS function ddot( ), which calculates the dot product of two arrays. READ THE NOTES ON CANVAS THAT INTRODUCE LEVEL-1 BLAS OPERATIONS. The first d in ddot( ) stands for double, which means that this operation is performed on arrays of doubles (see the ddot sketch below).

3. This method also uses a level-1 BLAS function for the matrix multiplication. In this case, you will use daxpy( ) to form each column of C as a linear combination of columns of A. Once again, the d in daxpy( ) stands for double, so use double arrays (see the daxpy sketch below).

4. Implement the same matrix multiplication using the dgemm routine, the most common function for matrix multiplication. Intel provides the following page to explain its usage: https://software.intel.com/en-us/mkl-tutorial-c-multiplying-matrices-using-dgemm. In this step, you should create a random-number function to initialize your matrices with random integers ranging from 1 to 10 (see the dgemm sketch below).

5. Create a kernel that does the naive matrix multiplication for square matrices. Calculate your grid and block sizes and launch as follows (a fuller sketch appears below):

    dim3 block(16, 16);
    dim3 grid( (n+15)/16, (n+15)/16 );
    my_kernel<<<grid, block>>>(arguments);

6. CUDA also provides a GPU version of BLAS, which is cuBLAS. Repeat Tasks 2, 3, and 4 using cuBLAS (see the cuBLAS sketch below).

In addition:

• Demonstrate matrix multiplication for a small problem (e.g. 5 × 5) and print all elements of the matrices. This step is for verification.
• Time your code for square matrix sizes N = 100, 500, 1000, 2000, and 5000 for both CPU and GPU using the ddot, daxpy, and dgemm routines, and present your results in a table and a plot. See the example above. Comment on your findings. Make sure to compile with optimization level -O3 during your testing.

HELPFUL TIPS:

• BLAS functions should look like ddot_( ) rather than ddot( ). Also, since you are making a .cu file and compiling with nvcc, your function prototype for BLAS functions should look like: extern "C" double ddot_(...arguments);
• It is sometimes useful to pass in the transpose of a matrix rather than the original matrix. Remember that C stores arrays using row-major ordering, while BLAS routines assume column-major ordering.
• Don't forget to clean up when your code is done by using free( ) and cudaFree( ).
• Make sure to include all of the required headers and link the appropriate libraries in your Makefile.
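A minimal sketch of the naive CPU version (Task 1). It assumes the matrices are stored column-major in flat double arrays so that the later BLAS calls can reuse the same layout; the function name matmul_naive is illustrative, not prescribed by the assignment:

    /* Naive triple loop. Column-major storage:
       A is m x n, B is n x k, C is m x k; A(i,p) = A[i + p*m]. */
    void matmul_naive(int m, int n, int k,
                      const double *A, const double *B, double *C)
    {
        for (int j = 0; j < k; j++)          /* column of C */
            for (int i = 0; i < m; i++) {    /* row of C    */
                double sum = 0.0;
                for (int p = 0; p < n; p++)
                    sum += A[p*m + i] * B[j*n + p];  /* A(i,p) * B(p,j) */
                C[j*m + i] = sum;
            }
    }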
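One possible shape for the ddot version (Task 2), again assuming column-major storage. With that layout, row i of A starts at A[i] with stride m, and column j of B is contiguous, so each element of C is a single ddot_ call:

    extern "C" double ddot_(const int *n, const double *x, const int *incx,
                            const double *y, const int *incy);

    /* C(i,j) = dot(row i of A, column j of B). */
    void matmul_ddot(int m, int n, int k,
                     const double *A, const double *B, double *C)
    {
        int incA = m;   /* row i of A has stride m in column-major layout */
        int incB = 1;   /* column j of B is contiguous                    */
        for (int j = 0; j < k; j++)
            for (int i = 0; i < m; i++)
                C[j*m + i] = ddot_(&n, &A[i], &incA, &B[j*n], &incB);
    }

Note the Fortran calling convention: every argument, including the lengths and strides, is passed by pointer.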
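A sketch of the daxpy version (Task 3), under the same column-major assumption. Column j of C is built up as the linear combination sum over p of B(p,j) times column p of A, which is exactly one daxpy_ call per term:

    extern "C" void daxpy_(const int *n, const double *alpha,
                           const double *x, const int *incx,
                           double *y, const int *incy);

    /* Column j of C = sum_p B(p,j) * (column p of A). */
    void matmul_daxpy(int m, int n, int k,
                      const double *A, const double *B, double *C)
    {
        const int one = 1;
        for (int j = 0; j < k; j++) {
            for (int i = 0; i < m; i++)
                C[j*m + i] = 0.0;                /* zero column j first */
            for (int p = 0; p < n; p++) {
                double alpha = B[j*n + p];       /* B(p,j) */
                daxpy_(&m, &alpha, &A[p*m], &one, &C[j*m], &one);
            }
        }
    }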
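A sketch of the dgemm version plus the random initializer (Task 4). Because dgemm is column-major and our C is m × k = (m × n)(n × k), the BLAS dimensions are M = m, N = k, K = n, with leading dimensions m, n, and m; rand_init is an illustrative helper name:

    #include <stdlib.h>

    extern "C" void dgemm_(const char *transa, const char *transb,
                           const int *m, const int *n, const int *k,
                           const double *alpha, const double *A, const int *lda,
                           const double *B, const int *ldb,
                           const double *beta, double *C, const int *ldc);

    /* Fill an array with random integers in [1, 10], stored as doubles. */
    void rand_init(double *X, int len)
    {
        for (int i = 0; i < len; i++)
            X[i] = (double)(rand() % 10 + 1);
    }

    void matmul_dgemm(int m, int n, int k,
                      const double *A, const double *B, double *C)
    {
        const double alpha = 1.0, beta = 0.0;
        /* Column-major, no transposes: C(m x k) = A(m x n) * B(n x k). */
        dgemm_("N", "N", &m, &k, &n, &alpha, A, &m, B, &n, &beta, C, &m);
    }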
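A sketch of the naive GPU kernel and its launch for the square case (Task 5), using the grid and block sizes given in the assignment. Here h_A, h_B, h_C are assumed host arrays and d_A, d_B, d_C the corresponding device buffers:

    __global__ void matmul_kernel(int n, const double *A,
                                  const double *B, double *C)
    {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < n && col < n) {                     /* guard the ragged edge */
            double sum = 0.0;
            for (int p = 0; p < n; p++)
                sum += A[p*n + row] * B[col*n + p];   /* A(row,p) * B(p,col) */
            C[col*n + row] = sum;
        }
    }

    /* Host side: allocate, copy, launch, copy back, clean up. */
    double *d_A, *d_B, *d_C;
    size_t bytes = (size_t)n * n * sizeof(double);
    cudaMalloc(&d_A, bytes);  cudaMalloc(&d_B, bytes);  cudaMalloc(&d_C, bytes);
    cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid( (n+15)/16, (n+15)/16 );
    matmul_kernel<<<grid, block>>>(n, d_A, d_B, d_C);

    cudaMemcpy(h_C, d_C, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_A);  cudaFree(d_B);  cudaFree(d_C);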
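For Task 6, the cuBLAS analogue of the dgemm version might look like the sketch below; the ddot and daxpy repeats follow the same pattern using cublasDdot and cublasDaxpy, which take the same handle as their first argument. cuBLAS also assumes column-major storage, so the layout above carries over unchanged; d_A, d_B, d_C are device buffers already holding the data:

    #include <cublas_v2.h>

    /* Square n x n case: d_C = d_A * d_B on the device. */
    void matmul_cublas(int n, const double *d_A, const double *d_B, double *d_C)
    {
        cublasHandle_t handle;
        cublasCreate(&handle);
        const double alpha = 1.0, beta = 0.0;
        cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    n, n, n, &alpha, d_A, n, d_B, n, &beta, d_C, n);
        cublasDestroy(handle);
    }

Link with -lcublas; in practice you would create the handle once in main rather than per call, since handle creation is expensive.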
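For the timing bullet, one simple approach is a POSIX wall-clock helper like the sketch below (wall_seconds is an illustrative name). For the GPU versions, remember that kernel launches are asynchronous, so call cudaDeviceSynchronize() before reading the stop time:

    #include <stdio.h>
    #include <time.h>

    double wall_seconds(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + 1e-9 * ts.tv_nsec;
    }

    /* Usage: */
    double t0 = wall_seconds();
    matmul_dgemm(m, n, k, A, B, C);   /* or cudaDeviceSynchronize() after a kernel */
    double t1 = wall_seconds();
    printf("dgemm: %.3f s\n", t1 - t0);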
