Using the rules
cuda_archs
cuda_archs(name)
A build setting for specifying cuda archs to compile for.
To retain the flexibility of NVCC, the extended notation is adopted.
When passing cuda_archs from the command line, its spec grammar is as follows:
    ARCH_SPECS   ::= ARCH_SPEC [ ';' ARCH_SPECS ]
    ARCH_SPEC    ::= [ VIRTUAL_ARCH ':' ] GPU_ARCHS
    GPU_ARCHS    ::= GPU_ARCH [ ',' GPU_ARCHS ]
    GPU_ARCH     ::= 'sm_' ARCH_NUMBER
                   | 'lto_' ARCH_NUMBER
                   | VIRTUAL_ARCH
    VIRTUAL_ARCH ::= 'compute_' ARCH_NUMBER
                   | 'lto_' ARCH_NUMBER
    ARCH_NUMBER  ::= (a string in predefined cuda_archs list)
E.g.:
- `compute_80:sm_80,sm_86`: Use `compute_80` PTX, generate cubin with `sm_80` and `sm_86`, no PTX embedded.
- `compute_80:compute_80,sm_80,sm_86`: Use `compute_80` PTX, generate cubin with `sm_80` and `sm_86`, PTX embedded.
- `compute_80:compute_80`: Embed `compute_80` PTX, fully rely on `ptxas`.
- `sm_80,sm_86`: Same as `compute_80:sm_80,sm_86`; the virtual arch is populated automatically from the arch with the minimum integer value.
- `sm_80;sm_86`: Two specs are used.
- `compute_80`: Same as `compute_80:compute_80`.
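The grammar and the examples above can be illustrated with a small parser. This is a minimal sketch in Python, not the parser used by the rules: the names `ArchSpec`, `parse_arch_spec`, and `parse_arch_specs` are hypothetical, and `KNOWN_ARCH_NUMBERS` is an assumed, illustrative subset standing in for the predefined cuda_archs list.

```python
from dataclasses import dataclass

# Assumption: an illustrative subset only; the real predefined list is
# defined by the rules themselves.
KNOWN_ARCH_NUMBERS = {"70", "75", "80", "86", "90"}


@dataclass(frozen=True)
class ArchSpec:
    virtual_arch: str  # arch whose PTX is the compilation input
    gpu_archs: tuple   # targets to generate (and possibly embed) code for


def _arch_number(arch, prefixes):
    """Validate one arch token and return its ARCH_NUMBER part."""
    prefix, sep, number = arch.partition("_")
    if not sep or prefix not in prefixes or number not in KNOWN_ARCH_NUMBERS:
        raise ValueError(f"invalid arch: {arch!r}")
    return number


def parse_arch_spec(spec):
    """Parse ARCH_SPEC ::= [ VIRTUAL_ARCH ':' ] GPU_ARCHS."""
    head, sep, tail = spec.partition(":")
    gpu_archs = tuple(a.strip() for a in (tail if sep else head).split(","))
    numbers = [_arch_number(a, ("sm", "compute", "lto")) for a in gpu_archs]
    if sep:
        virtual_arch = head.strip()
        _arch_number(virtual_arch, ("compute", "lto"))
    else:
        # No explicit virtual arch: populate it from the minimum arch number.
        # Assumption: the populated arch always uses the 'compute_' prefix.
        virtual_arch = "compute_" + min(numbers, key=int)
    return ArchSpec(virtual_arch, gpu_archs)


def parse_arch_specs(specs):
    """Parse ARCH_SPECS ::= ARCH_SPEC [ ';' ARCH_SPECS ]."""
    return [parse_arch_spec(s.strip()) for s in specs.split(";")]
```

Under these assumptions, `parse_arch_specs("sm_80,sm_86")` yields a single spec whose virtual arch is `compute_80`, while `parse_arch_specs("sm_80;sm_86")` yields two specs, each with its own populated virtual arch.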
Best Practices:
- If a library supports a full range of archs from xx to yy, you should embed the yy PTX.
- If a library supports a sparse range of archs from xx to yy, you should embed the xx PTX.
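One way to read these two recommendations is as a rule for building the spec string. The sketch below is an interpretation, not part of the rules: `recommended_specs` is a hypothetical helper, and it assumes that each cubin gets its own spec (so its virtual arch is populated automatically) and that PTX is embedded for exactly one arch.

```python
def recommended_specs(arch_numbers, full_range):
    """Build a cuda_archs spec string following the best practices.

    arch_numbers: arch numbers the library supports, e.g. [80, 86, 90].
    full_range:   True if every arch between the lowest and highest is
                  covered, False if the range is sparse.
    """
    nums = sorted(set(arch_numbers))
    # Full range: embed PTX of the highest arch. Sparse range: embed PTX
    # of the lowest arch.
    ptx = nums[-1] if full_range else nums[0]
    specs = []
    for n in nums:
        if n == ptx:
            # Generate the cubin and also embed the PTX for this arch.
            specs.append(f"compute_{n}:compute_{n},sm_{n}")
        else:
            # Cubin only; the virtual arch is populated automatically.
            specs.append(f"sm_{n}")
    return ";".join(specs)
```

For example, `recommended_specs([80, 86, 90], full_range=True)` produces `sm_80;sm_86;compute_90:compute_90,sm_90`, and `recommended_specs([70, 90], full_range=False)` produces `compute_70:compute_70,sm_70;sm_90`.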
ATTRIBUTES
Name | Description | Type | Mandatory | Default |
---|---|---|---|---|
name | A unique name for this target. | Name | required | |
cuda_library
cuda_library(name, deps, srcs, hdrs, alwayslink, copts, defines, host_copts, host_defines, host_linkopts, host_local_defines, includes, linkopts, local_defines, ptxasopts, rdc)
This rule compiles CUDA kernel code and creates a static library from it. The resulting targets can then be consumed by C/C++ rules.
ATTRIBUTES
Name | Description | Type | Mandatory | Default |
---|---|---|---|---|
name | A unique name for this target. | Name | required | |
deps | - | List of labels | optional | [] |
srcs | - | List of labels | optional | [] |
hdrs | - | List of labels | optional | [] |
alwayslink | - | Boolean | optional | False |
copts | Add these options to the CUDA device compilation command. | List of strings | optional | [] |
defines | List of defines to add to the compile line. | List of strings | optional | [] |
host_copts | Add these options to the CUDA host compilation command. | List of strings | optional | [] |
host_defines | List of defines to add to the compile line. | List of strings | optional | [] |
host_linkopts | Add these flags to the host library link command. | List of strings | optional | [] |
host_local_defines | List of defines to add to the compile line; they apply only to this rule. | List of strings | optional | [] |
includes | List of include dirs to be added to the compile line. | List of strings | optional | [] |
linkopts | Add these flags to the CUDA device link command. | List of strings | optional | [] |
local_defines | List of defines to add to the compile line; they apply only to this rule. | List of strings | optional | [] |
ptxasopts | Add these flags to the ptxas command. | List of strings | optional | [] |
rdc | Whether to perform device linking for relocatable device code. Transitive deps that contain device code must all be either cuda_objects or cuda_library(rdc = True). | Boolean | optional | False |
cuda_objects
cuda_objects(name, deps, srcs, hdrs, copts, defines, host_copts, host_defines, host_local_defines, includes, local_defines, ptxasopts)
This rule produces incomplete object files that can only be consumed by `cuda_library`. It is created for relocatable device code and device link time optimization source files.
ATTRIBUTES
Name | Description | Type | Mandatory | Default |
---|---|---|---|---|
name | A unique name for this target. | Name | required | |
deps | - | List of labels | optional | [] |
srcs | - | List of labels | optional | [] |
hdrs | - | List of labels | optional | [] |
copts | Add these options to the CUDA device compilation command. | List of strings | optional | [] |
defines | List of defines to add to the compile line. | List of strings | optional | [] |
host_copts | Add these options to the CUDA host compilation command. | List of strings | optional | [] |
host_defines | List of defines to add to the compile line. | List of strings | optional | [] |
host_local_defines | List of defines to add to the compile line; they apply only to this rule. | List of strings | optional | [] |
includes | List of include dirs to be added to the compile line. | List of strings | optional | [] |
local_defines | List of defines to add to the compile line; they apply only to this rule. | List of strings | optional | [] |
ptxasopts | Add these flags to the ptxas command. | List of strings | optional | [] |
cuda_test
Wrapper to ensure the test is compiled with the CUDA compiler.
PARAMETERS
Name | Description | Default Value |
---|---|---|
name | - | none |
attrs | - | none |
register_detected_cuda_toolchains
register_detected_cuda_toolchains()
Helper to register the automatically detected CUDA toolchain(s).
Users can set up their own toolchain if needed and ignore the detected ones by not calling this macro.
rules_cuda_dependencies
rules_cuda_dependencies(toolkit_path)
Populate the dependencies for rules_cuda. This will set up workspace dependencies (other Bazel rules) and local toolchains.
PARAMETERS
Name | Description | Default Value |
---|---|---|
toolkit_path | Optionally specify the path to the CUDA toolkit. If not specified, it will be detected automatically. | None |