Using the rules

cuda_archs

cuda_archs(name)

A build setting that specifies the CUDA architectures to compile for.

To retain the flexibility of NVCC, an extended notation is adopted.

When cuda_archs is passed on the command line, its spec follows this grammar:

ARCH_SPECS   ::= ARCH_SPEC [ ';' ARCH_SPECS ]
ARCH_SPEC    ::= [ VIRTUAL_ARCH ':' ] GPU_ARCHS
GPU_ARCHS    ::= GPU_ARCH [ ',' GPU_ARCHS ]
GPU_ARCH     ::= 'sm_' ARCH_NUMBER
               | 'lto_' ARCH_NUMBER
               | VIRTUAL_ARCH
VIRTUAL_ARCH ::= 'compute_' ARCH_NUMBER
               | 'lto_' ARCH_NUMBER
ARCH_NUMBER  ::= (a string in predefined cuda_archs list)

E.g.:

  • compute_80:sm_80,sm_86: Use compute_80 PTX and generate cubins for sm_80 and sm_86; no PTX is embedded.
  • compute_80:compute_80,sm_80,sm_86: Use compute_80 PTX and generate cubins for sm_80 and sm_86; the PTX is embedded.
  • compute_80:compute_80: Embed compute_80 PTX and rely entirely on ptxas.
  • sm_80,sm_86: Same as compute_80:sm_80,sm_86. The virtual arch is populated automatically from the arch with the lowest number.
  • sm_80;sm_86: Two separate specs.
  • compute_80: Same as compute_80:compute_80.
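
The grammar and examples above can be sketched as a small standalone parser. This is an illustrative Python sketch, not the rules_cuda implementation. The function names are hypothetical, ARCH_NUMBER is assumed to be a plain integer (the predefined list may contain other forms), and a spec without an explicit virtual arch is assumed to default to compute_ plus the lowest number, as the sm_80,sm_86 example suggests.

```python
import re

# Assumption: ARCH_NUMBER is a plain integer such as 80 or 86.
_ARCH_RE = re.compile(r"^(sm|lto|compute)_(\d+)$")


def parse_arch(token):
    """Split an arch token such as 'sm_80' into ('sm', 80)."""
    match = _ARCH_RE.match(token)
    if match is None:
        raise ValueError("invalid arch: %r" % token)
    return match.group(1), int(match.group(2))


def parse_arch_spec(spec):
    """Parse one ARCH_SPEC into (virtual_arch, [gpu_arch, ...])."""
    head, sep, tail = spec.partition(":")
    gpu_archs = [a.strip() for a in (tail if sep else head).split(",")]
    # Extracting the numbers also validates every GPU_ARCH token.
    numbers = [parse_arch(a)[1] for a in gpu_archs]
    if sep:
        virtual = head.strip()
        if parse_arch(virtual)[0] == "sm":
            raise ValueError("sm_ is not a virtual arch: %r" % virtual)
    else:
        # No explicit virtual arch: assume compute_<lowest number>.
        virtual = "compute_%d" % min(numbers)
    return virtual, gpu_archs


def parse_arch_specs(specs):
    """Parse a ';'-separated ARCH_SPECS string into a list of specs."""
    return [parse_arch_spec(s) for s in specs.split(";")]


print(parse_arch_specs("sm_80,sm_86"))
```

Under these assumptions the call prints [('compute_80', ['sm_80', 'sm_86'])], which matches the statement that sm_80,sm_86 is the same as compute_80:sm_80,sm_86.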

Best Practices:

  • If a library supports a full range of archs from xx to yy, embed the yy PTX.
  • If a library supports a sparse range of archs from xx to yy, embed the xx PTX.
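
One way to read the two recommendations above is sketched below in Python. The helper names, the KNOWN_ARCHS list, and the interpretation of "full range" (every known arch between the lowest and the highest supported one is covered) are assumptions made for illustration. The generated string only uses notation from the examples above: sm_NN for a cubin and a bare compute_NN to embed PTX.

```python
def ptx_arch_to_embed(supported, known_archs):
    """Return the arch number whose PTX should be embedded."""
    supported = sorted(set(supported))
    lowest, highest = supported[0], supported[-1]
    # Every known arch between the lowest and highest supported arch.
    in_range = sorted(a for a in set(known_archs) if lowest <= a <= highest)
    is_full_range = in_range == supported
    # Full range: embed the highest PTX. Sparse range: embed the lowest.
    return highest if is_full_range else lowest


def arch_specs_for(supported, known_archs):
    """Build a spec string: one cubin per arch plus one embedded PTX."""
    parts = ["sm_%d" % a for a in sorted(set(supported))]
    parts.append("compute_%d" % ptx_arch_to_embed(supported, known_archs))
    return ";".join(parts)


# Illustrative list only; the real predefined list may differ.
KNOWN_ARCHS = [70, 75, 80, 86, 89, 90]

print(arch_specs_for([80, 86], KNOWN_ARCHS))  # full range
print(arch_specs_for([70, 86], KNOWN_ARCHS))  # sparse range
```

With this list, the first call prints sm_80;sm_86;compute_86 and the second prints sm_70;sm_86;compute_70. A plausible reason for the rule is that PTX can only be JIT-compiled for GPUs of the same or a newer arch, so a sparse range needs the lowest PTX to cover the archs in the gaps, while a full range only needs PTX for GPUs newer than yy.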

ATTRIBUTES

Name | Description | Type | Mandatory | Default
name | A unique name for this target. | Name | required |

cuda_library

This rule compiles CUDA kernel code and creates a static library from it. The resulting targets can then be consumed by C/C++ rules.

ATTRIBUTES

Name | Description | Type | Mandatory | Default
name | A unique name for this target. | Name | required |
deps | - | List of labels | optional | []
srcs | - | List of labels | optional | []
hdrs | - | List of labels | optional | []
alwayslink | - | Boolean | optional | False
copts | Add these options to the CUDA device compilation command. | List of strings | optional | []
defines | List of defines to add to the compile line. | List of strings | optional | []
host_copts | Add these options to the CUDA host compilation command. | List of strings | optional | []
host_defines | List of defines to add to the compile line. | List of strings | optional | []
host_linkopts | Add these flags to the host library link command. | List of strings | optional | []
host_local_defines | List of defines to add to the compile line, applied only to this rule. | List of strings | optional | []
includes | List of include dirs to be added to the compile line. | List of strings | optional | []
linkopts | Add these flags to the CUDA device link command. | List of strings | optional | []
local_defines | List of defines to add to the compile line, applied only to this rule. | List of strings | optional | []
ptxasopts | Add these flags to the ptxas command. | List of strings | optional | []
rdc | Whether to perform device linking for relocatable device code. Transitive deps that contain device code must all be either cuda_objects or cuda_library(rdc = True). | Boolean | optional | False

cuda_objects

This rule produces incomplete object files that can only be consumed by cuda_library. It is intended for source files that need relocatable device code or device link-time optimization.

ATTRIBUTES

Name | Description | Type | Mandatory | Default
name | A unique name for this target. | Name | required |
deps | - | List of labels | optional | []
srcs | - | List of labels | optional | []
hdrs | - | List of labels | optional | []
copts | Add these options to the CUDA device compilation command. | List of strings | optional | []
defines | List of defines to add to the compile line. | List of strings | optional | []
host_copts | Add these options to the CUDA host compilation command. | List of strings | optional | []
host_defines | List of defines to add to the compile line. | List of strings | optional | []
host_local_defines | List of defines to add to the compile line, applied only to this rule. | List of strings | optional | []
includes | List of include dirs to be added to the compile line. | List of strings | optional | []
local_defines | List of defines to add to the compile line, applied only to this rule. | List of strings | optional | []
ptxasopts | Add these flags to the ptxas command. | List of strings | optional | []

cuda_test

cuda_test(name, attrs)

Wrapper to ensure the test is compiled with the CUDA compiler.

PARAMETERS

Name | Description | Default Value
name | - | none
attrs | - | none

register_detected_cuda_toolchains

register_detected_cuda_toolchains()

Helper to register the automatically detected CUDA toolchain(s).

Users can set up their own toolchain if needed and ignore the detected ones by not calling this macro.

rules_cuda_dependencies

rules_cuda_dependencies(toolkit_path)

Populate the dependencies for rules_cuda. This sets up workspace dependencies (other Bazel rules) and local toolchains.

PARAMETERS

Name | Description | Default Value
toolkit_path | Optional path to the CUDA toolkit. If not specified, it is detected automatically. | None