Using the rules
cuda_archs
cuda_archs(name)
A build setting for specifying cuda archs to compile for.
To retain the flexibility of NVCC, the extended notation is adopted.
When cuda_archs is passed from the command line, its spec grammar is as follows:
ARCH_SPECS ::= ARCH_SPEC [ ';' ARCH_SPECS ]
ARCH_SPEC ::= [ VIRTUAL_ARCH ':' ] GPU_ARCHS
GPU_ARCHS ::= GPU_ARCH [ ',' GPU_ARCHS ]
GPU_ARCH ::= 'sm_' ARCH_NUMBER
| 'lto_' ARCH_NUMBER
| VIRTUAL_ARCH
VIRTUAL_ARCH ::= 'compute_' ARCH_NUMBER
| 'lto_' ARCH_NUMBER
ARCH_NUMBER ::= (a string in predefined cuda_archs list)
E.g.:
- `compute_80:sm_80,sm_86`: Use `compute_80` PTX, generate cubin with `sm_80` and `sm_86`, no PTX embedded.
- `compute_80:compute_80,sm_80,sm_86`: Use `compute_80` PTX, generate cubin with `sm_80` and `sm_86`, PTX embedded.
- `compute_80:compute_80`: Embed `compute_80` PTX, rely fully on `ptxas`.
- `sm_80,sm_86`: Same as `compute_80:sm_80,sm_86`; the arch with the minimum integer value is populated automatically.
- `sm_80;sm_86`: Two specs used.
- `compute_80`: Same as `compute_80:compute_80`.
Best Practices:
- If the library supports a full range of archs from xx to yy, embed the yy PTX.
- If the library supports a sparse range of archs from xx to yy, embed the xx PTX.
ATTRIBUTES
| Name | Description | Type | Mandatory | Default |
|---|---|---|---|---|
| name | A unique name for this target. | Name | required | |
cuda_library
cuda_library(name, deps, srcs, hdrs, alwayslink, copts, defines, host_copts, host_defines, host_linkopts, host_local_defines, includes, linkopts, local_defines, ptxasopts, rdc)
This rule compiles CUDA kernel code and creates a static library from it. The resulting targets can then be consumed by C/C++ rules.
ATTRIBUTES
| Name | Description | Type | Mandatory | Default |
|---|---|---|---|---|
| name | A unique name for this target. | Name | required | |
| deps | - | List of labels | optional | [] |
| srcs | - | List of labels | optional | [] |
| hdrs | - | List of labels | optional | [] |
| alwayslink | - | Boolean | optional | False |
| copts | Add these options to the CUDA device compilation command. | List of strings | optional | [] |
| defines | List of defines to add to the compile line. | List of strings | optional | [] |
| host_copts | Add these options to the CUDA host compilation command. | List of strings | optional | [] |
| host_defines | List of defines to add to the compile line. | List of strings | optional | [] |
| host_linkopts | Add these flags to the host library link command. | List of strings | optional | [] |
| host_local_defines | List of defines to add to the compile line, but only apply to this rule. | List of strings | optional | [] |
| includes | List of include dirs to be added to the compile line. | List of strings | optional | [] |
| linkopts | Add these flags to the CUDA device link command. | List of strings | optional | [] |
| local_defines | List of defines to add to the compile line, but only apply to this rule. | List of strings | optional | [] |
| ptxasopts | Add these flags to the ptxas command. | List of strings | optional | [] |
| rdc | Whether to perform device linking for relocatable device code. Transitive deps that contain device code must all be either cuda_objects or cuda_library(rdc = True). | Boolean | optional | False |
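As an illustration of the attributes above, here is a minimal BUILD sketch. The load label `@rules_cuda//cuda:defs.bzl` is an assumption about where the rules are exported, and every target name, file name, and define is hypothetical.

```starlark
# BUILD.bazel (illustrative sketch; names and files are hypothetical)
load("@rules_cuda//cuda:defs.bzl", "cuda_library")  # assumed load label

cuda_library(
    name = "kernel",
    srcs = ["kernel.cu"],
    hdrs = ["kernel.h"],
    # Added to the compile line of this target only.
    local_defines = ["KERNEL_BLOCK_SIZE=256"],
    # Forwarded to the ptxas command; -v asks ptxas for verbose output.
    ptxasopts = ["-v"],
)

# The static library can then be consumed by an ordinary C/C++ rule.
cc_binary(
    name = "main",
    srcs = ["main.cpp"],
    deps = [":kernel"],
)
```

In this sketch `local_defines` is used instead of `defines` so that the macro applies only to this rule, as the attribute table describes.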
cuda_objects
cuda_objects(name, deps, srcs, hdrs, copts, defines, host_copts, host_defines, host_local_defines, includes, local_defines, ptxasopts)
This rule produces incomplete object files that can only be consumed by cuda_library. It is created for relocatable device
code and device link time optimization source files.
ATTRIBUTES
| Name | Description | Type | Mandatory | Default |
|---|---|---|---|---|
| name | A unique name for this target. | Name | required | |
| deps | - | List of labels | optional | [] |
| srcs | - | List of labels | optional | [] |
| hdrs | - | List of labels | optional | [] |
| copts | Add these options to the CUDA device compilation command. | List of strings | optional | [] |
| defines | List of defines to add to the compile line. | List of strings | optional | [] |
| host_copts | Add these options to the CUDA host compilation command. | List of strings | optional | [] |
| host_defines | List of defines to add to the compile line. | List of strings | optional | [] |
| host_local_defines | List of defines to add to the compile line, but only apply to this rule. | List of strings | optional | [] |
| includes | List of include dirs to be added to the compile line. | List of strings | optional | [] |
| local_defines | List of defines to add to the compile line, but only apply to this rule. | List of strings | optional | [] |
| ptxasopts | Add these flags to the ptxas command. | List of strings | optional | [] |
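The relationship between cuda_objects and cuda_library(rdc = True) might look like the following sketch. The load label is an assumption, and the target and file names are hypothetical.

```starlark
# BUILD.bazel (illustrative sketch; names and files are hypothetical)
load("@rules_cuda//cuda:defs.bzl", "cuda_library", "cuda_objects")  # assumed load label

# Produces incomplete object files holding relocatable device code.
# These can only be consumed by cuda_library.
cuda_objects(
    name = "device_funcs",
    srcs = ["device_funcs.cu"],
    hdrs = ["device_funcs.h"],
)

# rdc = True asks this rule to perform the device link, which
# completes the objects coming from :device_funcs.
cuda_library(
    name = "kernels",
    srcs = ["kernels.cu"],
    deps = [":device_funcs"],
    rdc = True,
)
```

C/C++ rules would then depend on `:kernels`, not on `:device_funcs` directly, since the objects are incomplete until the device link has run.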
cuda_test
Wrapper to ensure the test is compiled with the CUDA compiler.
PARAMETERS
| Name | Description | Default Value |
|---|---|---|
| name | - | none |
| attrs | - | none |
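A possible use of the wrapper is sketched below. This page documents only `name` and `attrs`, so the sketch assumes that the extra attributes (`srcs`, `deps`) are forwarded to the underlying rules in the style of `cc_test`. The load label and all names are likewise assumptions.

```starlark
# BUILD.bazel (illustrative sketch; names and files are hypothetical)
load("@rules_cuda//cuda:defs.bzl", "cuda_test")  # assumed load label

cuda_test(
    name = "kernel_test",
    # Assumption: srcs and deps are accepted through attrs.
    srcs = ["kernel_test.cu"],
    deps = [":kernel"],
)
```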
register_detected_cuda_toolchains
register_detected_cuda_toolchains()
Helper to register the automatically detected CUDA toolchain(s).
Users can set up their own toolchain if needed and ignore the detected ones by not calling this macro.
rules_cuda_dependencies
rules_cuda_dependencies(toolkit_path)
Populates the dependencies for rules_cuda. This sets up workspace dependencies (other Bazel rules) and local toolchains.
PARAMETERS
| Name | Description | Default Value |
|---|---|---|
| toolkit_path | Optionally specify the path to CUDA toolkit. If not specified, it will be detected automatically. | None |
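The two macros above might be combined in a WORKSPACE file as in the following sketch. It assumes the `rules_cuda` repository has already been declared (for example with `http_archive`) and that the macros are exported from `@rules_cuda//cuda:repositories.bzl`. The toolkit path shown in the comment is hypothetical.

```starlark
# WORKSPACE (illustrative sketch)
load(
    "@rules_cuda//cuda:repositories.bzl",  # assumed load label
    "register_detected_cuda_toolchains",
    "rules_cuda_dependencies",
)

# Let the CUDA toolkit be detected automatically.
rules_cuda_dependencies()

# Alternatively, point at an explicit install (hypothetical path):
# rules_cuda_dependencies(toolkit_path = "/usr/local/cuda")

# Register the detected toolchain(s). Skip this call when
# registering a custom toolchain instead.
register_detected_cuda_toolchains()
```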