Homework 2
Due Friday, 2nd October at 11 pm EDT
Goal
The goal of this homework is to practice the CUDA optimization techniques you learned in the lectures and hands-on session.Correlation in 1D
In this homework, you are going to optimize a 1D correlation for the purposes of time series matching. Suppose we have a sampling of a periodic function, F, and we wish to match it to a library of peridioc functions Gi (for this problem, we shall assume that all contain the same number of samples, ns). For suitably normalised functions, we can do this by forming the correlationHomework Task
Download the skeleton code into your working directory $WORK. Unzip and untar it, and go into the src/ directory.tar -xvzf hw2.tgz
cd correlates/src/
You should be able to compile the code with make, and then run itcd correlates/src/
make
../bin/Correlate
The program will prompt you for a random seed (to generate the input arrays), and the number of Gi to create, the number of times to repeat the test and whether to run on the CPU or GPU. The program will then create random F and Gi functions. One of the Gi functions will be F itself. The program then attempts to use a correlation routine to discover which Gi function matches. A CPU version is supplied in the routine RunCPUmatch. Your task is to duplicate the functionality on the GPU, and minimize the run time.To complete this homework, open the file gpucorrelate.cu, which contains a skeleton routine, RunGPUmatch called by the main program. This is the routine you have to provide. It takes pointers to the F and G arrays as inputs (both of these are resident in pinned memory), as well as the number of G functions provided. There are three output arguments - the index of the matching G function (i.e. the value of i above), the offset at which it matches (i.e. the value of j above) and the value of the correlation function for this combination.
Note that this project does not link against CUTIL. However, the file cudacheck.hpp provides CUDA_SAFE_CALL and CUDA_CHECK_KERNEL macros, while a timer class is available in chronometer.hpp.
Questions
- Is your code memory or compute bound?
- How does this change as the length of the F and G arrays increases?
- Suppose you had a much longer F function, and wished to find out if any of the G functions appeared within it. How would you modify your code to achieve this?
Extra Credit
Reimplement the correlation comparison to make use of Fourier transforms. Compare and discuss the speed differences you obtain.Grading
This homework will be graded primarily on the performance you obtain on the code. The breakdown is as follows:- Functionality (30 pts)
- Code compiles and runs on resonance - 10 pts
- Code has no memory leaks - 10 pts
- Code works for any number of Gi up to 131072 (edit: 28/09)- 10 pts
- Performance on 131072 comparison functions (60 pts)
- GPU runs faster than the CPU (10 seconds) - 20pts
- GPU runs in less than 285 ms - 20 pts
- GPU runs in less than 265 ms - 20 pts
- Answering questions 1-3 above (10 pts)
Extra credit is worth a possible 5 pts.
Submission
Create a gzipped tarball containing your copy of gpucorrelate.cu and a text file answering the questions above. If you chose to work on the extra credit section, put it in a separate .cu file in the same submission.To submit your homework, create a folder named lastname_firstinitial_hw1 and place your files in this folder. Compress the folder (please use .zip compression) and submit on the course iSite page in the HW 2 dropbox. Please contact the TFs if you miss the deadline on Fri, Oct 2, 11 pm EDT.