Homework 2

Due Friday, 2nd October at 11 pm EDT

Goal

The goal of this homework is to practice the CUDA optimization techniques you learned in the lectures and hands-on session.

Correlation in 1D

In this homework, you are going to optimize a 1D correlation for the purposes of time series matching. Suppose we have a sampling of a periodic function, F, and we wish to match it to a library of peridioc functions Gi (for this problem, we shall assume that all contain the same number of samples, ns). For suitably normalised functions, we can do this by forming the correlation
C_i(j) = \sum_{k}^{n_s}F(k)G_i(j+k)
We then search for the maximum value of C_i(j), to determine the best matching function Gi, and the corresponding phase, j.

Homework Task

Download the skeleton code into your working directory $WORK. Unzip and untar it, and go into the src/ directory.
tar -xvzf hw2.tgz
cd correlates/src/
You should be able to compile the code with make, and then run it
make
../bin/Correlate
The program will prompt you for a random seed (to generate the input arrays), and the number of Gi to create, the number of times to repeat the test and whether to run on the CPU or GPU. The program will then create random F and Gi functions. One of the Gi functions will be F itself. The program then attempts to use a correlation routine to discover which Gi function matches. A CPU version is supplied in the routine RunCPUmatch. Your task is to duplicate the functionality on the GPU, and minimize the run time.

To complete this homework, open the file gpucorrelate.cu, which contains a skeleton routine, RunGPUmatch called by the main program. This is the routine you have to provide. It takes pointers to the F and G arrays as inputs (both of these are resident in pinned memory), as well as the number of G functions provided. There are three output arguments - the index of the matching G function (i.e. the value of i above), the offset at which it matches (i.e. the value of j above) and the value of the correlation function for this combination.

Note that this project does not link against CUTIL. However, the file cudacheck.hpp provides CUDA_SAFE_CALL and CUDA_CHECK_KERNEL macros, while a timer class is available in chronometer.hpp.

Questions

  1. Is your code memory or compute bound?
  2. How does this change as the length of the F and G arrays increases?
  3. Suppose you had a much longer F function, and wished to find out if any of the G functions appeared within it. How would you modify your code to achieve this?

Extra Credit

Reimplement the correlation comparison to make use of Fourier transforms. Compare and discuss the speed differences you obtain.

Grading

This homework will be graded primarily on the performance you obtain on the code. The breakdown is as follows:
  • Functionality (30 pts)
    • Code compiles and runs on resonance - 10 pts
    • Code has no memory leaks - 10 pts
    • Code works for any number of Gi  up to 131072 (edit: 28/09)- 10 pts
  • Performance on 131072 comparison functions (60 pts)
    • GPU runs faster than the CPU (10 seconds) - 20pts
    • GPU runs in less than 285 ms - 20 pts
    • GPU runs in less than 265 ms - 20 pts
  • Answering questions 1-3 above (10 pts)
Note that the code must work with any number of Gi, or no performance measurements will be taken. Performance measurements will be averaged over 10 runs, and performed on resonance.

Extra credit is worth a possible 5 pts.

Submission

Create a gzipped tarball containing your copy of gpucorrelate.cu and a text file answering the questions above. If you chose to work on the extra credit section, put it in a separate .cu file in the same submission. 

To submit your homework, create a folder named lastname_firstinitial_hw1 and place your files in this folder. Compress the folder (please use .zip compression) and submit on the course iSite page in the HW 2 dropbox. Please contact the TFs if you miss the deadline on Fri, Oct 2, 11 pm EDT.