Brian2CUDA specific preferences

For information on the Brian2 preference system, read the Brian2 preference documentation. The Brian2CUDA preferences listed below are set in the same way.
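As a minimal sketch of what this can look like in practice: Brian2's prefs object accepts full preference names as keys, so a group of Brian2CUDA settings can be collected in a dictionary and applied in a loop. The helper name apply_preferences and the chosen values are illustrative assumptions. A plain dict stands in for brian2.prefs so that the snippet runs without Brian2, Brian2CUDA or a GPU.

```python
# Illustrative values only; the preference names come from the list below.
CUDA_SETTINGS = {
    "devices.cuda_standalone.SM_multiplier": 2,
    "devices.cuda_standalone.use_atomics": True,
    "devices.cuda_standalone.cuda_backend.gpu_id": 0,
}


def apply_preferences(prefs, settings):
    """Assign each setting by its full dotted name (hypothetical helper).

    `prefs` can be any mutable mapping. In a real script it would be
    `brian2.prefs`, after `import brian2cuda` has registered the
    `devices.cuda_standalone` preferences.
    """
    for name, value in settings.items():
        prefs[name] = value
    return prefs


# A plain dict stands in for brian2.prefs in this sketch.
stand_in_prefs = apply_preferences({}, CUDA_SETTINGS)
print(stand_in_prefs["devices.cuda_standalone.SM_multiplier"])
```

In a Brian2 script the same names can also be set with attribute syntax, e.g. prefs.devices.cuda_standalone.SM_multiplier = 2.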

List of preferences

Brian2CUDA preferences

devices.cuda_standalone.SM_multiplier = 1: The number of blocks per SM. By default, this value is set to 1.

devices.cuda_standalone.bundle_threads_warp_multiple = False: Whether to round the number of threads used per synapse bundle during effect application (see devices.cuda_standalone.threads_per_synapse_bundle) to a multiple of the warp size. Round to next multiple if preference is 'up', round to previous multiple if 'low' and don’t round at all if False (default). If rounding down results in 0 threads, 1 thread is used instead.
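The rounding rule can be sketched in a few lines of Python. This is an illustration of the description above, not Brian2CUDA's implementation: the function name round_to_warp_multiple is hypothetical, a warp size of 32 is assumed, and a thread count that is already a multiple of the warp size is assumed to stay unchanged when rounding up.

```python
WARP_SIZE = 32  # assumed warp size (threads per warp on current NVIDIA GPUs)


def round_to_warp_multiple(n_threads, mode, warp_size=WARP_SIZE):
    """Round n_threads as described for bundle_threads_warp_multiple.

    mode is 'up', 'low' or False. Hypothetical helper, not Brian2CUDA code.
    """
    if mode is False:
        return n_threads
    if mode == "up":
        # Ceiling division; an exact multiple is assumed to stay unchanged.
        return -(-n_threads // warp_size) * warp_size
    if mode == "low":
        rounded = (n_threads // warp_size) * warp_size
        # Rounding down to 0 threads falls back to a single thread.
        return max(rounded, 1)
    raise ValueError(f"unknown mode: {mode!r}")


for mode in ("up", "low", False):
    print(mode, round_to_warp_multiple(40, mode))
```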

devices.cuda_standalone.calc_occupancy = True: Whether or not to use the CUDA occupancy API to choose num_threads and num_blocks.

devices.cuda_standalone.default_functions_integral_convertion = float64: The floating-point precision to which integral types are converted when passed as arguments to default functions that have no integral type overload in device code (sin, cos, tan, sinh, cosh, tanh, exp, log, log10, sqrt, ceil, floor, arcsin, arccos, arctan). NOTE: Conversion from 32-bit and 64-bit integral types to single-precision (32-bit) floating-point types is not type safe, and neither is conversion from 64-bit integral types to double-precision (64-bit) floating-point types. In those cases, the closest higher or lower (implementation-defined) representable value is selected.
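The precision loss mentioned in the note can be reproduced on the host with plain Python. The sketch below is not device code. It only illustrates the IEEE 754 limits involved: single precision has a 24-bit significand and double precision a 53-bit one, so larger integers may not be representable exactly. The helper to_float32 is a hypothetical name.

```python
import struct


def to_float32(value):
    """Round-trip a number through a 4-byte IEEE 754 float."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


# float32 has a 24-bit significand: 2**24 + 1 is not representable.
big32 = 2**24 + 1
print(big32, "->", int(to_float32(big32)))

# float64 has a 53-bit significand: 2**53 + 1 is not representable.
big64 = 2**53 + 1
print(big64, "->", int(float(big64)))

# Integers just below these limits survive the conversion unchanged.
print(int(to_float32(2**24 - 1)) == 2**24 - 1)
print(int(float(2**53 - 1)) == 2**53 - 1)
```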

devices.cuda_standalone.extra_threshold_kernel = True: Whether or not to use an extra threshold kernel for resetting.

devices.cuda_standalone.launch_bounds = False: Whether or not to use __launch_bounds__ to optimise register usage in kernels.

devices.cuda_standalone.no_post_references = False: Set this preference if you don’t need access to j in any synaptic code string and no Synapses object applies effects to postsynaptic variables. This preference is a memory optimization until unnecessary device memory allocations in synapse creation are fixed. It is only relevant if your network uses close to all memory.

devices.cuda_standalone.no_pre_references = False: Set this preference if you don’t need access to i in any synaptic code string and no Synapses object applies effects to presynaptic variables. This preference is a memory optimization until unnecessary device memory allocations in synapse creation are fixed. It is only relevant if your network uses close to all memory.

devices.cuda_standalone.parallel_blocks = 1: The total number of parallel blocks to use. If None, the number of parallel blocks equals the number of streaming multiprocessors on the GPU.

devices.cuda_standalone.profile_statemonitor_copy_to_host = None: Profile the final device-to-host copy of StateMonitor data. This preference is used for benchmarking and assumes that there is only one active StateMonitor in the network. The value of this preference is the name of the recorded variable for which the device-to-host copy is profiled (e.g. 'v').

devices.cuda_standalone.push_synapse_bundles = True: If True, synaptic events are propagated by pushing bundles of synapse IDs with the same delay into the corresponding delay queue. If False, each synapse of a spiking neuron is pushed into the corresponding queue individually. For very small bundle sizes (number of synapses with the same delay, connected to a single neuron), pushing single synapses can be faster. This option only has an effect for Synapses objects with heterogeneous delays.
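To make the notion of a bundle concrete, the sketch below groups synapse IDs by presynaptic neuron and delay, and compares the number of queue pushes per spike with and without bundles. It is a conceptual illustration only: the function build_bundles and the example network are hypothetical, and the actual queue implementation on the GPU is not shown.

```python
from collections import defaultdict


def build_bundles(pre_neuron_ids, delays):
    """Group synapse IDs by (presynaptic neuron, delay). Sketch only."""
    bundles = defaultdict(list)
    for synapse_id, (pre, delay) in enumerate(zip(pre_neuron_ids, delays)):
        bundles[(pre, delay)].append(synapse_id)
    return dict(bundles)


# Hypothetical network: neuron 0 has four synapses, neuron 1 has two.
pre_ids = [0, 0, 0, 0, 1, 1]
delay_steps = [1, 1, 3, 3, 2, 5]  # delays in time steps
bundles = build_bundles(pre_ids, delay_steps)

# With bundles, a spike of neuron 0 causes one push per bundle;
# without bundles, it causes one push per synapse.
pushes_bundled = sum(1 for (pre, _delay) in bundles if pre == 0)
pushes_single = pre_ids.count(0)
print(pushes_bundled, pushes_single)
```

In this example neuron 1 has only bundles of size 1, which is the case where the text notes that pushing single synapses can be faster.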

devices.cuda_standalone.random_number_generator_ordering = False: The ordering parameter (str) used to choose how the results of cuRAND random number generation are ordered in global memory. See cuRAND documentation for more details on generator types and orderings.

devices.cuda_standalone.random_number_generator_type = 'CURAND_RNG_PSEUDO_DEFAULT': Generator type (str) that cuRAND uses for random number generation. Setting the generator type automatically resets the generator ordering (prefs.devices.cuda_standalone.random_number_generator_ordering) to its default value. See cuRAND documentation for more details on generator types and orderings.

devices.cuda_standalone.syn_launch_bounds = False: Whether or not to use __launch_bounds__ in synapses and synapses_push to optimise register usage in kernels.

devices.cuda_standalone.threads_per_synapse_bundle = '{max}': The number of threads used per synapse bundle during effect application. This has to be a string that can be passed to Python’s eval function. The string can use the expressions {mean}, {std}, {max} and {min}, which refer to the statistics across all bundles, and the function 'ceil'. The result of this expression is converted to the next lower int (e.g. 1.9 becomes 1). Example: '{mean} + 2 * {std}' uses the mean bundle size plus 2 times the standard deviation of the bundle sizes, rounded down to the next lower integer. If you want to round up instead, use 'ceil({mean} + 2 * {std})'.
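The evaluation of such an expression can be sketched as follows. This is an assumption-laden illustration of the description above, not Brian2CUDA's code: the function threads_per_bundle and the example bundle sizes are hypothetical, and the population standard deviation is assumed for {std}.

```python
import math
import statistics


def threads_per_bundle(expression, bundle_sizes):
    """Evaluate a threads_per_synapse_bundle-style expression (sketch)."""
    stats = {
        "mean": statistics.mean(bundle_sizes),
        "std": statistics.pstdev(bundle_sizes),  # population std, an assumption
        "max": max(bundle_sizes),
        "min": min(bundle_sizes),
    }
    # Substitute the placeholders, then evaluate with only `ceil` available.
    code = expression.format(**stats)
    value = eval(code, {"__builtins__": {}}, {"ceil": math.ceil})
    return int(value)  # truncates, i.e. rounds positive values down


sizes = [1, 2, 3, 4]  # hypothetical bundle sizes
for expr in ("{max}", "{mean} + 2 * {std}", "ceil({mean} + 2 * {std})"):
    print(expr, "->", threads_per_bundle(expr, sizes))
```

For these sizes the mean is 2.5 and the standard deviation is about 1.12, so the second expression evaluates to roughly 4.74 before the conversion to int.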

devices.cuda_standalone.use_atomics = True: Whether to try to use atomic operations for synaptic effect application. Since this avoids race conditions, effect application can be parallelised.

Preferences for the CUDA backend in Brian2CUDA

devices.cuda_standalone.cuda_backend.compute_capability = None: Manually set the compute capability for which CUDA code will be compiled. Has to be a float (e.g. 6.1) or None. If None, the compute capability is chosen based on the GPU in use.
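For orientation, nvcc identifies a target architecture by the compute capability without the decimal point, e.g. sm_61 for 6.1. The sketch below converts a float into such a flag. The function nvcc_arch_flag is hypothetical, and whether Brian2CUDA passes exactly this flag to nvcc is an assumption, not something stated here.

```python
def nvcc_arch_flag(compute_capability):
    """Turn a float such as 6.1 into an nvcc flag such as '-arch=sm_61'.

    Hypothetical helper; the exact flag Brian2CUDA generates is an assumption.
    """
    # round() guards against floating-point error in the multiplication.
    return f"-arch=sm_{round(compute_capability * 10)}"


for cc in (6.1, 7.5, 8.6):
    print(cc, nvcc_arch_flag(cc))
```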

devices.cuda_standalone.cuda_backend.cuda_path = None: The path to the CUDA installation. If set, this preference takes precedence over the environment variable CUDA_PATH.

devices.cuda_standalone.cuda_backend.cuda_runtime_version = None: The CUDA runtime version.

devices.cuda_standalone.cuda_backend.detect_cuda = True: Whether to try to detect CUDA installation paths and version. Disable this if you want to generate CUDA standalone code on a system without CUDA installed.

devices.cuda_standalone.cuda_backend.detect_gpus = True: Whether to detect names and compute capabilities of all available GPUs. This needs access to nvidia-smi and deviceQuery binaries.

devices.cuda_standalone.cuda_backend.device_query_path = None: Path to CUDA’s deviceQuery binary. Used to detect a GPU’s compute capability.

devices.cuda_standalone.cuda_backend.extra_compile_args_nvcc = ['-w', '-use_fast_math']: Extra compile arguments (a list of strings) to pass to the nvcc compiler.

devices.cuda_standalone.cuda_backend.gpu_heap_size = 128: Size of the heap (in MB) used by malloc() and free() device system calls, which are used in the cudaVector implementation. cudaVectors are used to dynamically allocate device memory for SpikeMonitors and the synapse queues in the CudaSpikeQueue implementation for networks with heterogeneously distributed delays.

devices.cuda_standalone.cuda_backend.gpu_id = None

The ID of the GPU that should be used for code execution. Default value is None, in which case the GPU with the highest compute capability and lowest ID is used.

If the environment variable CUDA_VISIBLE_DEVICES is set, this preference is interpreted as an ID among the visible devices (e.g. with CUDA_VISIBLE_DEVICES=2 and gpu_id=0, GPU 2 will be used).
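The mapping between gpu_id and the physical GPU can be sketched as an index into the list of visible devices. The function physical_gpu_id is a hypothetical helper for illustration. It assumes that CUDA_VISIBLE_DEVICES holds comma-separated integer IDs and does not handle GPU UUIDs.

```python
def physical_gpu_id(gpu_id, visible_devices=None):
    """Map a gpu_id preference to a physical GPU ID (hypothetical helper).

    Assumes CUDA_VISIBLE_DEVICES holds comma-separated integer IDs;
    GPU UUIDs, which CUDA also accepts, are not handled.
    """
    if not visible_devices:
        return gpu_id  # no restriction: the ID is used as is
    visible = [int(entry) for entry in visible_devices.split(",")]
    return visible[gpu_id]


# Example from the text: CUDA_VISIBLE_DEVICES=2 and gpu_id=0 select GPU 2.
print(physical_gpu_id(0, "2"))

# With several visible devices, gpu_id indexes into the list in order.
print(physical_gpu_id(1, "3,1"))
```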