Brian2CUDA specific preferences
For information on the Brian2 preference system, read the Brian2 preference documentation. The Brian2CUDA preferences listed here are used in the same way.
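As a brief illustration, a preference can be set through Brian2's prefs object before the simulation is built. This is a minimal sketch that assumes Brian2 and Brian2CUDA are installed; the values are arbitrary examples, not recommendations.

```python
from brian2 import prefs, set_device
import brian2cuda  # importing registers the cuda_standalone device and its preferences

set_device("cuda_standalone")

# Attribute-style access to a preference
prefs.devices.cuda_standalone.SM_multiplier = 2

# Dictionary-style access using the full preference name
prefs["devices.cuda_standalone.cuda_backend.gpu_id"] = 0
```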
List of preferences
Brian2CUDA preferences
devices.cuda_standalone.SM_multiplier = 1
The number of blocks per SM. By default, this value is set to 1.
devices.cuda_standalone.bundle_threads_warp_multiple = False
Whether to round the number of threads used per synapse bundle during effect application (see devices.cuda_standalone.threads_per_synapse_bundle) to a multiple of the warp size. Round to the next multiple if the preference is 'up', round to the previous multiple if it is 'low', and don’t round at all if it is False (default). If rounding down results in 0 threads, 1 thread is used instead.
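The rounding rule for bundle_threads_warp_multiple can be sketched in plain Python. This is an illustration of the behaviour described above, not Brian2CUDA's actual implementation. The function name round_threads, the warp size of 32, and the choice to leave exact multiples unchanged when rounding up are assumptions.

```python
def round_threads(num_threads, mode=False, warp_size=32):
    """Round a thread count to a multiple of the warp size (hypothetical helper)."""
    if mode is False:
        # No rounding at all (the default)
        return num_threads
    if mode == "up":
        # Ceiling division, then scale back to a multiple of warp_size
        return -(-num_threads // warp_size) * warp_size
    if mode == "low":
        # Floor to the previous multiple, but never go below 1 thread
        return max((num_threads // warp_size) * warp_size, 1)
    raise ValueError("mode must be 'up', 'low' or False")


print(round_threads(40, "up"))   # 64
print(round_threads(40, "low"))  # 32
print(round_threads(20, "low"))  # 1
```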
devices.cuda_standalone.calc_occupancy = True
Whether or not to use the CUDA occupancy API to choose num_threads and num_blocks.
devices.cuda_standalone.default_functions_integral_convertion = float64
The floating-point precision to which integral types will be converted when passed as arguments to default functions that have no integral type overload in device code (sin, cos, tan, sinh, cosh, tanh, exp, log, log10, sqrt, ceil, floor, arcsin, arccos, arctan). NOTE: Conversion from 32-bit and 64-bit integral types to single-precision (32-bit) floating-point types is not type safe, and neither is conversion from 64-bit integral types to double-precision (64-bit) floating-point types. In those cases the closest higher or lower (implementation-defined) representable value will be selected.
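The precision loss mentioned in the note can be reproduced on the host with plain Python. This sketch only shows that large integers are not exactly representable in 32-bit or 64-bit floats. It does not run device code, and the rounding direction on the GPU may differ. The helper to_float32 is a hypothetical name.

```python
import struct


def to_float32(value):
    """Round-trip a number through a 32-bit float, as a single-precision cast would."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


big_int32 = 2**24 + 1  # 16777217, fits in a 32-bit integer
big_int64 = 2**53 + 1  # 9007199254740993, fits in a 64-bit integer

as_float32 = to_float32(big_int32)
as_float64 = float(big_int64)

# Neither value survives the conversion unchanged
print(int(as_float32) == big_int32)  # False
print(int(as_float64) == big_int64)  # False
```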
devices.cuda_standalone.extra_threshold_kernel = True
Whether or not to use an extra threshold kernel for resetting.
devices.cuda_standalone.launch_bounds = False
Whether or not to use __launch_bounds__ to optimise register usage in kernels.
devices.cuda_standalone.no_post_references = False
Set this preference if you don’t need access to j in any synaptic code string and no Synapses object applies effects to postsynaptic variables. This preference is a memory optimization until unnecessary device memory allocations in synapse creation are fixed. It is only relevant if your network uses close to all available memory.
devices.cuda_standalone.no_pre_references = False
Set this preference if you don’t need access to i in any synaptic code string and no Synapses object applies effects to presynaptic variables. This preference is a memory optimization until unnecessary device memory allocations in synapse creation are fixed. It is only relevant if your network uses close to all available memory.
devices.cuda_standalone.parallel_blocks = 1
The total number of parallel blocks to use. If None, the number of parallel blocks equals the number of streaming multiprocessors on the GPU.
devices.cuda_standalone.profile_statemonitor_copy_to_host = None
Profile the final device-to-host copy of StateMonitor data. This preference is used for benchmarking and assumes that there is only one active StateMonitor in the network. The value of this preference is the recorded variable for which the device-to-host copy is profiled (e.g. 'v').
devices.cuda_standalone.push_synapse_bundles = True
If True, synaptic events are propagated by pushing bundles of synapse IDs with the same delay into the corresponding delay queue. If False, each synapse of a spiking neuron is pushed into the corresponding queue individually. For very small bundle sizes (number of synapses with the same delay, connected to a single neuron), pushing single synapses can be faster. This option only has an effect for Synapses objects with heterogeneous delays.
devices.cuda_standalone.random_number_generator_ordering = False
The ordering parameter (str) used to choose how the results of cuRAND random number generation are ordered in global memory. See the cuRAND documentation for more details on generator types and orderings.
devices.cuda_standalone.random_number_generator_type = 'CURAND_RNG_PSEUDO_DEFAULT'
Generator type (str) that cuRAND uses for random number generation. Setting the generator type automatically resets the generator ordering (prefs.devices.cuda_standalone.random_number_generator_ordering) to its default value. See the cuRAND documentation for more details on generator types and orderings.
devices.cuda_standalone.syn_launch_bounds = False
Whether or not to use __launch_bounds__ in synapses and synapses_push to optimise register usage in kernels.
devices.cuda_standalone.threads_per_synapse_bundle = '{max}'
The number of threads used per synapse bundle during effect application. This has to be a string that can be passed to Python’s eval function. The string can use the {mean}, {std}, {max} and {min} expressions, which refer to the statistics across all bundles, and the function ceil. The result of this expression will be converted to the next lower int (e.g. 1.9 will be cast to 1). Example: '{mean} + 2 * {std}' will use the mean bundle size plus 2 times the standard deviation over bundle sizes, rounded down to the next lower integer. If you want to round up instead, use 'ceil({mean} + 2 * {std})'.
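How such an expression string might be turned into a thread count can be sketched as follows. This is an illustration under stated assumptions, not the code Brian2CUDA runs. The function name threads_from_expression is hypothetical, the bundle sizes are made-up data, and the population standard deviation is assumed for {std}.

```python
import math
import statistics


def threads_from_expression(expression, bundle_sizes):
    """Fill in bundle statistics, evaluate the string, and truncate to int."""
    filled = expression.format(
        mean=float(statistics.mean(bundle_sizes)),
        std=float(statistics.pstdev(bundle_sizes)),  # assumption: population std
        max=max(bundle_sizes),
        min=min(bundle_sizes),
    )
    # Only ceil is made available to the evaluated expression
    value = eval(filled, {"__builtins__": {}}, {"ceil": math.ceil})
    return int(value)  # converts to the next lower int for positive values


sizes = [2, 4, 4, 6]  # made-up bundle sizes: mean 4, std about 1.41

print(threads_from_expression("{max}", sizes))                      # 6
print(threads_from_expression("{mean} + 2 * {std}", sizes))         # 6
print(threads_from_expression("ceil({mean} + 2 * {std})", sizes))   # 7
```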
devices.cuda_standalone.use_atomics = True
Whether to try to use atomic operations for synaptic effect application. Since this avoids race conditions, effect application can be parallelised.
Preferences for the CUDA backend in Brian2CUDA
devices.cuda_standalone.cuda_backend.compute_capability = None
Manually set the compute capability for which CUDA code will be compiled. Has to be a float (e.g. 6.1) or None. If None, the compute capability is chosen depending on the GPU in use.
devices.cuda_standalone.cuda_backend.cuda_path = None
The path to the CUDA installation. If set, this preference takes precedence over the environment variable CUDA_PATH.
devices.cuda_standalone.cuda_backend.cuda_runtime_version = None
The CUDA runtime version.
devices.cuda_standalone.cuda_backend.detect_cuda = True
Whether to try to detect CUDA installation paths and version. Disable this if you want to generate CUDA standalone code on a system without CUDA installed.
devices.cuda_standalone.cuda_backend.detect_gpus = True
Whether to detect names and compute capabilities of all available GPUs. This needs access to the nvidia-smi and deviceQuery binaries.
devices.cuda_standalone.cuda_backend.device_query_path = None
Path to CUDA’s deviceQuery binary. Used to detect a GPU’s compute capability.
devices.cuda_standalone.cuda_backend.extra_compile_args_nvcc = ['-w', '-use_fast_math']
Extra compile arguments (a list of strings) to pass to the nvcc compiler.
devices.cuda_standalone.cuda_backend.gpu_heap_size = 128
Size of the heap (in MB) used by the malloc() and free() device system calls, which are used in the cudaVector implementation. cudaVectors are used to dynamically allocate device memory for SpikeMonitors and the synapse queues in the CudaSpikeQueue implementation for networks with heterogeneously distributed delays.
devices.cuda_standalone.cuda_backend.gpu_id = None
The ID of the GPU that should be used for code execution. The default value is None, in which case the GPU with the highest compute capability and lowest ID is used. If the environment variable CUDA_VISIBLE_DEVICES is set, this preference is interpreted as an ID among the visible devices (e.g. with CUDA_VISIBLE_DEVICES=2 and gpu_id=0, GPU 2 will be used).
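The GPU selection described for gpu_id can be sketched in plain Python. This is an illustration of the stated rules, not Brian2CUDA's own code. Both function names are hypothetical, and only integer entries in CUDA_VISIBLE_DEVICES are handled (the variable can also hold device UUIDs, which this sketch ignores).

```python
def pick_default_gpu(compute_capabilities):
    """Index of the GPU with the highest compute capability; ties go to the lowest ID."""
    best = max(compute_capabilities)
    # list.index returns the first (lowest) index holding the best value
    return compute_capabilities.index(best)


def physical_gpu(gpu_id, cuda_visible_devices=None):
    """Map a gpu_id preference to a physical GPU ID (hypothetical helper)."""
    if cuda_visible_devices is None:
        # Without CUDA_VISIBLE_DEVICES, IDs are not remapped
        return gpu_id
    visible = [int(entry) for entry in cuda_visible_devices.split(",")]
    return visible[gpu_id]


print(pick_default_gpu([6.1, 7.5, 7.5]))  # 1
print(physical_gpu(0, "2"))               # 2
print(physical_gpu(1, "3,1"))             # 1
```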