I have a bunch of .cu files that use dynamic parallelism (a.cu, b.cu, c.cu, ..., f.cu), and a main.c file that uses MPI to call functions from a.cu on multiple nodes. I'm trying to write a Makefile to build the executable, but I keep getting the following errors:
cudafiles.o: In function `__cudaRegisterLinkedBinary_66_tmpxft_00001a84_00000000_17_cuda_device_runtime_compute_61_cpp1_ii_8b1a5d37':
link.stub:(.text+0x1fb): undefined reference to `__fatbinwrap_66_tmpxft_00001a84_00000000_17_cuda_device_runtime_compute_61_cpp1_ii_8b1a5d37'
Here is my makefile:
INCFILES=-I/usr/local/cuda-8.0/include -I/opt/mpi/mvapich2-gnu/2.2/include -I./
LIBFILES=-L/usr/local/cuda-8.0/lib64 -L/opt/mpi/mvapich2-gnu/2.2/lib
LIBS=-lcudart -lcudadevrt -lcublas_device -lmpi
ARCH=-gencode arch=compute_60,code=sm_60
NVCC=nvcc -ccbin g++
default: all
all: clean final.o
io.o: io.cpp
	g++ -c -std=c++11 io.cpp
final.o: io.o a.cu b.cu c.cu d.cu e.cu f.cu main.cpp
	$(NVCC) -std=c++11 $(INCFILES) $(LIBFILES) $(LIBS) -g -G -Xptxas -v -dc $(ARCH) a.cu b.cu c.cu d.cu e.cu f.cu
	$(NVCC) -std=c++11 $(ARCH) $(INCFILES) $(LIBFILES) $(LIBS) -rdc=true -dlink a.o b.o c.o d.o e.o f.o io.o -o cudafiles.o
	mpicxx -O3 $(INCFILES) $(LIBFILES) -c main.cpp -o main.o
	mpicxx $(INCFILES) $(LIBFILES) $(LIBS) cudafiles.o a.o b.o c.o d.o e.o f.o io.o main.o -o exec
clean:
	rm -rf *.o exec
The original problem reported was an undefined reference to main. This arose from one of the nvcc lines in the Makefile. As constructed, that line actually instructs nvcc to perform full/final linking. However, the intent of the line was to perform the device-link step only, which is required when compiling with -rdc=true or -dc and when not performing the final link with nvcc. In this case, the final link was being performed by mpicc/mpicxx. To perform the device-link step only, we need to specify -dlink. Without that switch, nvcc expects to do final linking, but fails because none of the supplied objects contain a main function. The correct solution, since we have no intent to do the final link at this point, is to use the -dlink switch.

I also suggested converting everything to C++ style linking, since nvcc links that way. It might be possible to sort out a C-style link with a C++-style link, but this just seems troublesome to me. Therefore I suggested converting the only .c file (main.c) to a .cpp file, and converting from mpicc to mpicxx.

The next problem that arose was undefined references to e.g. cudaSetDevice() and cudaFree(). These are part of the CUDA runtime API library ("libcudart"). When performing the final link with nvcc, these are linked automatically. But since the final link is being performed by mpicxx (basically a wrapper on g++), it's necessary to call out the link against that library specifically with -lcudart.

Finally, the remaining problem was a link-order problem. In a nutshell, link dependencies need to be satisfied from left to right in the linker command line. Different compilers are more or less picky about this. The final reordering changes were to specify the libraries to link against in the correct order, and also to specify these libraries at the end of the link command line, so that any dependencies on these libraries, to their left in the link command line, are satisfied.
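
The steps described above (separate compilation with -dc, a device-link-only step with -dlink, a final link by mpicxx, and libraries placed last on the link line) could be laid out roughly as follows. This is only a sketch, not a tested fix: the paths and file names are taken from the question, the CUDA_HOME, MPI_HOME and CUOBJS variable names are made up for illustration, and it assumes the device code needs only the CUDA runtime and device runtime libraries (the -lcublas_device from the question is left out). Recipe lines must begin with a tab character.

```makefile
# Sketch only: adjust paths, architecture and file names for your system.
CUDA_HOME = /usr/local/cuda-8.0
MPI_HOME  = /opt/mpi/mvapich2-gnu/2.2
INCFILES  = -I$(CUDA_HOME)/include -I$(MPI_HOME)/include -I./
LIBFILES  = -L$(CUDA_HOME)/lib64 -L$(MPI_HOME)/lib
ARCH      = -gencode arch=compute_60,code=sm_60
NVCC      = nvcc -ccbin g++
CUOBJS    = a.o b.o c.o d.o e.o f.o

.PHONY: all clean
all: exec

# Step 1: compile each .cu file to a relocatable device-code object (-dc).
%.o: %.cu
	$(NVCC) -std=c++11 $(ARCH) $(INCFILES) -dc $< -o $@

# Step 2: device-link only (-dlink). nvcc does NOT attempt a final link here,
# so no main function is needed among the supplied objects.
cudafiles.o: $(CUOBJS)
	$(NVCC) $(ARCH) -dlink $(CUOBJS) -lcudadevrt -o $@

# Host-only sources, compiled as C++.
io.o: io.cpp
	g++ -std=c++11 -c io.cpp -o $@

main.o: main.cpp
	mpicxx -O3 $(INCFILES) -c main.cpp -o $@

# Step 3: final link by mpicxx. Objects come first and libraries last, so that
# references are resolved left to right; -lcudart must be named explicitly
# because nvcc is not doing the final link.
exec: main.o io.o $(CUOBJS) cudafiles.o
	mpicxx $(LIBFILES) main.o io.o $(CUOBJS) cudafiles.o -lcudadevrt -lcudart -o $@

clean:
	rm -f *.o exec
```

Splitting the build into one rule per step keeps the device-link object (cudafiles.o) separate from the final executable, which makes it easier to see which tool performs which link.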