Using the rules

cuda_archs

cuda_archs(name)

A build setting that specifies the CUDA archs to compile for.

To retain the flexibility of NVCC, the extended notation is adopted.

When cuda_archs is passed on the command line, its spec grammar is as follows:

ARCH_SPECS   ::= ARCH_SPEC [ ';' ARCH_SPECS ]
ARCH_SPEC    ::= [ VIRTUAL_ARCH ':' ] GPU_ARCHS
GPU_ARCHS    ::= GPU_ARCH [ ',' GPU_ARCHS ]
GPU_ARCH     ::= 'sm_' ARCH_NUMBER
               | 'lto_' ARCH_NUMBER
               | VIRTUAL_ARCH
VIRTUAL_ARCH ::= 'compute_' ARCH_NUMBER
               | 'lto_' ARCH_NUMBER
ARCH_NUMBER  ::= (a string in predefined cuda_archs list)

E.g.:

compute_80:sm_80,sm_86: Use compute_80 PTX, generate cubins for sm_80 and sm_86, no PTX embedded.
compute_80:compute_80,sm_80,sm_86: Use compute_80 PTX, generate cubins for sm_80 and sm_86, PTX embedded.
compute_80:compute_80: Embed compute_80 PTX only and rely fully on ptxas.
sm_80,sm_86: Same as compute_80:sm_80,sm_86; the virtual arch is populated automatically from the arch with the minimum integer value.
sm_80;sm_86: Two separate specs are used, each expanded on its own.
compute_80: Same as compute_80:compute_80.
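
The grammar and the defaulting rule can be illustrated with a small parser. This is a minimal sketch in Python, not the implementation used by the rules. The names `ArchSpec`, `parse_arch_spec`, `parse_arch_specs` and the contents of `KNOWN_ARCH_NUMBERS` are hypothetical. It also assumes that arch numbers are purely numeric and that a missing virtual arch always expands to `compute_N`, which the text above only confirms for `sm_` and `compute_` inputs.

```python
import re
from typing import List, NamedTuple

# Hypothetical stand-in for the "predefined cuda_archs list"; not the real list.
KNOWN_ARCH_NUMBERS = {"70", "75", "80", "86", "89", "90"}

_ARCH_RE = re.compile(r"(sm|lto|compute)_(\d+)")


class ArchSpec(NamedTuple):
    virtual_arch: str     # the PTX stage, e.g. "compute_80"
    gpu_archs: List[str]  # the targets listed after ':', e.g. ["sm_80", "sm_86"]


def _arch_number(token: str, prefixes: tuple) -> str:
    """Validate one arch token and return its ARCH_NUMBER."""
    match = _ARCH_RE.fullmatch(token)
    if (
        match is None
        or match.group(1) not in prefixes
        or match.group(2) not in KNOWN_ARCH_NUMBERS
    ):
        raise ValueError(f"invalid arch: {token!r}")
    return match.group(2)


def parse_arch_spec(spec: str) -> ArchSpec:
    """ARCH_SPEC ::= [ VIRTUAL_ARCH ':' ] GPU_ARCHS"""
    head, sep, tail = spec.partition(":")
    gpu_archs = [a.strip() for a in (tail if sep else head).split(",")]
    numbers = [_arch_number(a, ("sm", "lto", "compute")) for a in gpu_archs]
    if sep:
        virtual_arch = head.strip()
        _arch_number(virtual_arch, ("compute", "lto"))
    else:
        # No explicit virtual arch: take the smallest arch number
        # (assumed here to map to compute_N).
        virtual_arch = "compute_" + min(numbers, key=int)
    return ArchSpec(virtual_arch, gpu_archs)


def parse_arch_specs(specs: str) -> List[ArchSpec]:
    """ARCH_SPECS ::= ARCH_SPEC [ ';' ARCH_SPECS ]"""
    return [parse_arch_spec(s.strip()) for s in specs.split(";")]
```

Under these assumptions, `parse_arch_specs("sm_80,sm_86")` and `parse_arch_specs("compute_80:sm_80,sm_86")` return the same single spec, while `parse_arch_specs("sm_80;sm_86")` returns two.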

Best Practices:

If the library supports a full range of archs from xx to yy, you should embed the yy PTX.
If the library supports a sparse range of archs from xx to yy, you should embed the xx PTX.
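
One possible reading of these two practices, expressed as a spec string in the grammar above, is sketched below in Python. The helper `recommended_archs` is hypothetical and not part of the rules. It assumes that every supported arch gets its own cubin and that only the chosen arch (yy for a full range, xx for a sparse range) additionally embeds its PTX.

```python
from typing import Iterable


def recommended_archs(arch_numbers: Iterable[int], full_range: bool) -> str:
    """Build a cuda_archs spec string that embeds PTX for exactly one arch.

    full_range=True  -> embed PTX of the highest arch (yy).
    full_range=False -> embed PTX of the lowest arch (xx).
    """
    numbers = sorted(set(arch_numbers))
    if not numbers:
        raise ValueError("at least one arch number is required")
    ptx_number = numbers[-1] if full_range else numbers[0]
    specs = []
    for n in numbers:
        if n == ptx_number:
            # cubin plus embedded PTX for the chosen arch
            specs.append(f"compute_{n}:compute_{n},sm_{n}")
        else:
            # cubin only; the virtual arch is populated automatically
            specs.append(f"sm_{n}")
    return ";".join(specs)
```

For example, `recommended_archs([80, 86, 90], full_range=True)` gives `sm_80;sm_86;compute_90:compute_90,sm_90`, and `recommended_archs([70, 90], full_range=False)` gives `compute_70:compute_70,sm_70;sm_90`.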

ATTRIBUTES

Name	Description	Type	Mandatory	Default
name	A unique name for this target.	Name	required

cuda_library

cuda_library(name, deps, srcs, hdrs, alwayslink, copts, defines, host_copts, host_defines,
             host_linkopts, host_local_defines, includes, linkopts, local_defines, ptxasopts, rdc)

This rule compiles CUDA kernel code and creates a static library from it. The resulting targets can then be consumed by C/C++ rules.

ATTRIBUTES

Name	Description	Type	Mandatory	Default
name	A unique name for this target.	Name	required
deps	-	List of labels	optional	`[]`
srcs	-	List of labels	optional	`[]`
hdrs	-	List of labels	optional	`[]`
alwayslink	-	Boolean	optional	`False`
copts	Add these options to the CUDA device compilation command.	List of strings	optional	`[]`
defines	List of defines to add to the compile line.	List of strings	optional	`[]`
host_copts	Add these options to the CUDA host compilation command.	List of strings	optional	`[]`
host_defines	List of defines to add to the compile line.	List of strings	optional	`[]`
host_linkopts	Add these flags to the host library link command.	List of strings	optional	`[]`
host_local_defines	List of defines to add to the compile line; they apply only to this rule.	List of strings	optional	`[]`
includes	List of include dirs to be added to the compile line.	List of strings	optional	`[]`
linkopts	Add these flags to the CUDA device link command.	List of strings	optional	`[]`
local_defines	List of defines to add to the compile line; they apply only to this rule.	List of strings	optional	`[]`
ptxasopts	Add these flags to the ptxas command.	List of strings	optional	`[]`
rdc	Whether to perform device linking for relocatable device code. Transitive deps that contain device code must all be either cuda_objects or cuda_library(rdc = True).	Boolean	optional	`False`

cuda_objects

cuda_objects(name, deps, srcs, hdrs, copts, defines, host_copts, host_defines, host_local_defines,
             includes, local_defines, ptxasopts)

This rule produces incomplete object files that can only be consumed by cuda_library. It is intended for source files that use relocatable device code or device link-time optimization.

ATTRIBUTES

Name	Description	Type	Mandatory	Default
name	A unique name for this target.	Name	required
deps	-	List of labels	optional	`[]`
srcs	-	List of labels	optional	`[]`
hdrs	-	List of labels	optional	`[]`
copts	Add these options to the CUDA device compilation command.	List of strings	optional	`[]`
defines	List of defines to add to the compile line.	List of strings	optional	`[]`
host_copts	Add these options to the CUDA host compilation command.	List of strings	optional	`[]`
host_defines	List of defines to add to the compile line.	List of strings	optional	`[]`
host_local_defines	List of defines to add to the compile line; they apply only to this rule.	List of strings	optional	`[]`
includes	List of include dirs to be added to the compile line.	List of strings	optional	`[]`
local_defines	List of defines to add to the compile line; they apply only to this rule.	List of strings	optional	`[]`
ptxasopts	Add these flags to the ptxas command.	List of strings	optional	`[]`

cuda_test

cuda_test(name, attrs)

Wrapper to ensure the test is compiled with the CUDA compiler.

PARAMETERS

Name	Description	Default Value
name	-	none
attrs	-	none

register_detected_cuda_toolchains

register_detected_cuda_toolchains()

Helper to register the automatically detected CUDA toolchain(s).

Users can set up their own toolchain if needed and ignore the detected ones by not calling this macro.

rules_cuda_dependencies

rules_cuda_dependencies(toolkit_path)

Populate the dependencies for rules_cuda. This sets up workspace dependencies (other Bazel rules) and local toolchains.

PARAMETERS

Name	Description	Default Value
toolkit_path	Optionally specify the path to the CUDA toolkit. If not specified, it will be detected automatically.	`None`