cublas direct fortran c-binding using cublas.lib

I'm attempting to set up an interface to use cublas.lib in Fortran without any separate C code. I have seen a few examples of this and tried to duplicate them, but I'm having trouble.

Both of these examples work for me (cudart and cusolver):

Find available graphics card memory using Fortran

https://forums.developer.nvidia.com/t/using-cusolverdn-in-fortran-code/39732/5

I have an additional include directory of C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\lib\x64 and additional dependencies of cublas.lib, cusolver.lib, and cudart.lib. Everything compiles fine (I was able to run the examples above).

When I run the code below, cublasCreate returns 7 (CUBLAS_STATUS_INVALID_VALUE):

!==================================================================
!Interface to cusolverDn and CUDA C functions
!==================================================================
! C binding
!     https://gcc.gnu.org/onlinedocs/gfortran/ISO_005fC_005fBINDING.html
!
! Similar CUDA examples
!     https://stackoverflow.com/questions/27507169/find-available-graphics-card-memory-using-fortran
!     https://forums.developer.nvidia.com/t/using-cusolverdn-in-fortran-code/39732/5
!     https://stackoverflow.com/questions/22390812/returning-a-pointer-to-a-device-allocated-matrix-from-c-to-fortran
!     https://stackoverflow.com/questions/35150748/mixed-language-cuda-programming

module cudaThings

interface

    ! cudaMalloc
    integer (c_int) function cudaMalloc ( buffer, size ) bind (C, name="cudaMalloc" ) 
        use iso_c_binding
        implicit none
        type (c_ptr)  :: buffer
        integer (c_size_t), value :: size
    end function cudaMalloc
    
    ! cudaMemcpy 
    ! A_mem_stat = cudaMemcpy(gpuPtr,cpuPtr,sizeof(ptr),cudaMemcpyHostToDevice)
    !     note: cudaMemcpyHostToDevice = 1
    !     note: cudaMemcpyDeviceToHost = 2
    integer (c_int) function cudaMemcpy ( dst, src, count, kind ) bind (C, name="cudaMemcpy" )
        use iso_c_binding
        implicit none
        type (C_PTR), value :: dst, src
        integer (c_size_t), value :: count
        integer (c_int), value :: kind   ! cudaMemcpyKind is a C enum, passed as int
    end function cudaMemcpy
    
    ! cudaFree
    integer (c_int) function cudaFree(buffer)  bind(C, name="cudaFree")
        use iso_c_binding
        implicit none
        type (C_PTR), value :: buffer
    end function cudaFree
    
    ! get memory info
    integer (c_int) function cudaMemGetInfo(fre, tot) bind(C, name="cudaMemGetInfo")
        use iso_c_binding
        implicit none
        type(c_ptr),value :: fre
        type(c_ptr),value :: tot
    end function cudaMemGetInfo


     integer(c_int) function cusolverDnCreate(cusolver_Hndl) bind(C,name="cusolverDnCreate")
       use iso_c_binding
       implicit none
     
       type(c_ptr)::cusolver_Hndl
     end function
     
     integer(c_int) function cusolverDnDestroy(cusolver_Hndl) bind(C,name="cusolverDnDestroy")
       use iso_c_binding
       implicit none
     
       type(c_ptr),value::cusolver_Hndl
     end function
     
     integer(c_int) function cublasCreate(cublas_Hndl) bind(C,name="cublasCreate_v2")
        use iso_c_binding
        implicit none
        
        type(c_ptr),value::cublas_Hndl
     end function
     
    integer(c_int) function cublasDestroy(cublas_Hndl) bind(C,name="cublasDestroy_v2")
      use iso_c_binding
      implicit none
     
      type(c_ptr),value::cublas_Hndl
    end function
end interface  
  
end module 

program cudaTest
  use iso_c_binding
  use cudaThings
  implicit none
  
  ! GPU stuff
  type(c_ptr) :: cublas_Hndl
  integer*4    :: cublas_stat
  
  ! get handle
  cublas_stat = cublasCreate(cublas_Hndl) 
  write(*,*) cublas_stat
  if (cublas_stat .ne. 0 ) then
     write (*, '(A, I2)') " cublasCreate error: ", cublas_stat
     stop
  end if
end program

I'm on Windows 10, Intel Fortran, CUDA 12.2, with a 930M graphics card.

There is 1 answer

Answer by talonmies

To understand what is going on, it is worth analyzing how the underlying C code works before writing an interface to it.

In C, the correct canonical call looks like this:

cublasHandle_t handle;
cublasStatus_t status = cublasCreate(&handle);

This passes the address of the cublasHandle_t (itself a pointer to an opaque structure), which is the C idiom for pass by reference (C has no explicit pass-by-reference semantics). cublasCreate writes the newly created handle through that address.

If you did this:

cublasHandle_t *handle;
cublasStatus_t status = cublasCreate(handle);

you would be passing an uninitialized pointer to the routine, which should result in a failure. I haven't done much work with F2003-style C interop, but to my eyes this:

  type(c_ptr) :: cublas_Hndl
  integer*4    :: cublas_stat
  
  ! get handle
  cublas_stat = cublasCreate(cublas_Hndl) 

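is the same as the theoretically non-working C version, because the interface declares the argument with the value attribute and so hands the uninitialized pointer itself to the routine, whereas this: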

  type(c_ptr), target :: cublas_Hndl   ! c_loc requires the target (or pointer) attribute
  integer*4    :: cublas_stat
  
  ! get handle
  cublas_stat = cublasCreate(c_loc(cublas_Hndl))

would be like the first working C version and more likely to work correctly.
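
An alternative that keeps the original call unchanged is to drop the value attribute from the cublasCreate interface, which is what the cusolverDnCreate interface in the question already does. The following is only a minimal sketch, assuming the C prototypes cublasCreate_v2(cublasHandle_t *handle) and cublasDestroy_v2(cublasHandle_t handle) and that cublas.lib is linked as described in the question; the module and program names (cublas_iface, cublas_handle_demo) are made up for illustration, and I have not run this on the asker's Windows/Intel Fortran setup.

```fortran
module cublas_iface
  use iso_c_binding, only: c_int, c_ptr
  implicit none

  interface
    ! C prototype: cublasStatus_t cublasCreate_v2(cublasHandle_t *handle);
    ! No VALUE attribute: Fortran passes the address of the c_ptr,
    ! so the library can write the new handle into it.
    integer(c_int) function cublasCreate(handle) bind(C, name="cublasCreate_v2")
      import :: c_int, c_ptr
      implicit none
      type(c_ptr), intent(out) :: handle
    end function cublasCreate

    ! C prototype: cublasStatus_t cublasDestroy_v2(cublasHandle_t handle);
    ! Here the handle itself is passed, hence VALUE.
    integer(c_int) function cublasDestroy(handle) bind(C, name="cublasDestroy_v2")
      import :: c_int, c_ptr
      implicit none
      type(c_ptr), value :: handle
    end function cublasDestroy
  end interface
end module cublas_iface

program cublas_handle_demo
  use iso_c_binding, only: c_int, c_ptr
  use cublas_iface
  implicit none

  type(c_ptr)    :: handle
  integer(c_int) :: stat

  ! Equivalent to the C call cublasCreate(&handle)
  stat = cublasCreate(handle)
  if (stat /= 0) then
    write (*, '(A, I0)') "cublasCreate error: ", stat
    stop 1
  end if

  ! Equivalent to the C call cublasDestroy(handle)
  stat = cublasDestroy(handle)
  write (*, '(A, I0)') "cublasDestroy status: ", stat
end program cublas_handle_demo
```

The rule of thumb this sketch relies on: a C parameter of type `T *` that the library writes through maps to a Fortran dummy of the matching type without value, while a parameter taken as plain `T` maps to one with value. With a value dummy you would instead pass c_loc of a target variable, as in the snippet above.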