
schedGPU API Reference

The schedGPU framework API is composed of three functions:
cudaError_t schedGPUInit(int timeout_in_seconds = 60, int priority = 0);

 

The schedGPUInit() function initializes the scheduler and must be the first function called in the CUDA program. The parameter timeout_in_seconds indicates the maximum time the scheduler may block the application when it requests GPU memory before control is given back to the CUDA program. The parameter priority indicates the priority of the application when requesting GPU memory; a higher priority value means a lower priority.


Error handling is implemented using the following standard CUDA error codes:
- cudaSuccess: indicates that the scheduler was successfully initialized
- cudaErrorUnknown: indicates that the scheduler initialization failed
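
A minimal initialization sketch (the timeout and priority values here are illustrative):
  #include <cuda_runtime.h>
  // The schedGPU header is assumed to declare schedGPUInit().

  int main() {
    // Must be called before any other CUDA function: wait at most
    // 120 seconds for memory requests, default priority.
    cudaError_t err = schedGPUInit(120, 0);
    if (err != cudaSuccess) {
      return 1;  // cudaErrorUnknown: scheduler initialization failed
    }
    // ... rest of the CUDA program ...
    return 0;
  }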
 

cudaError_t preCudaMalloc(int &device, size_t bytes, bool block = true, int *validDevices = NULL, int validDevicesLen = 0);

The preCudaMalloc() function does not allocate GPU memory, but instead ensures that the requested memory in bytes can be safely allocated on the specified device. If device is less than 0, then the memory is reserved on any available device with sufficient memory. If several devices can satisfy the request, then the device with the largest amount of free memory is selected in order to balance GPU utilization. The device selected by the scheduler is returned in device.
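
For example, a request that lets the scheduler choose any suitable device might look as follows (a sketch; it assumes the application selects the returned device with cudaSetDevice() before allocating):
  int device = -1;                     // less than 0: let the scheduler choose
  size_t bytes = 512UL * 1024 * 1024;  // request 512 MB
  preCudaMalloc(device, bytes);        // blocks until the memory can be safely allocated
  cudaSetDevice(device);               // device now holds the scheduler's choice
  float *devPtr;
  cudaMalloc((void**)&devPtr, bytes);  // the actual allocation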


If bytes is set to 0, then the total GPU memory of the selected device is reserved. This allows the GPU to be used in an exclusive manner, since no other application can share its memory.


If block is true (the default value), then the requesting application is blocked by the framework until enough memory is available on the device; the application waits at most for the time specified during initialization. If block is set to false, then the application is not blocked, and control returns immediately with the cudaErrorNotReady code when there is not sufficient memory available.
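
A non-blocking request can, for example, be retried while the application performs other useful work (a sketch, reusing bytes from above; doOtherHostWork() is a hypothetical placeholder):
  int device = -1;
  cudaError_t err = preCudaMalloc(device, bytes, false);  // do not block
  while (err == cudaErrorNotReady) {
    doOtherHostWork();                          // hypothetical CPU-side work
    err = preCudaMalloc(device, bytes, false);  // retry the request
  }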


The validDevices array gives the user the option to restrict the request to a set of devices, with validDevicesLen specifying the length of the array. By default, all devices available to the CUDA application are considered.
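
For example, restricting a request to devices 0 and 2 could look like this (a sketch, reusing bytes from above):
  int validDevices[] = {0, 2};  // only devices 0 and 2 may be selected
  int device = -1;              // let the scheduler choose among them
  preCudaMalloc(device, bytes, true, validDevices, 2);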

It is not efficient to use the pre-allocation and CUDA memory allocation functions progressively in the following manner:
  ... 
  int device = 0; 
  preCudaMalloc(device, bytes1); 
  cudaMalloc((void**)&devPtr1, bytes1); 
  preCudaMalloc(device, bytes2); 
  cudaMalloc((void**)&devPtr2, bytes2);
  preCudaMalloc(device, bytes3); 
  cudaMalloc((void**)&devPtr3, bytes3); 
  ... 


When two applications request memory progressively at the same time, as shown in the above code for one application, they can end up competing for memory in a way that eventually satisfies the requirements of neither application. This can be fully avoided by pre-allocating the total memory in a single request:
  ... 
  int device = 0; 
  preCudaMalloc(device, bytes1 + bytes2 + bytes3); 
  cudaMalloc((void**)&devPtr1, bytes1); 
  cudaMalloc((void**)&devPtr2, bytes2); 
  cudaMalloc((void**)&devPtr3, bytes3); 
  ... 
  
Error handling includes the following error codes:
- cudaSuccess: memory can safely be allocated
- cudaErrorMemoryAllocation: the requested memory is larger than the total physical memory of the GPU
- cudaErrorNotReady: indicates that there is not sufficient memory available
- cudaErrorUnknown: indicates an internal error
 

cudaError_t postCudaFree(int device, size_t bytes);

The postCudaFree() function does not free GPU memory, but instead notifies the framework that memory has been freed on the device. If device is less than 0, then memory pre-allocated by the framework to the application on any device is released. If a device is specified, then the given number of bytes is released on that device. If bytes is set to 0, then all memory pre-allocated to the application on the selected device is released.


Error handling is implemented using the following standard CUDA error codes:
- cudaSuccess: the framework has been successfully notified
- cudaErrorUnknown: indicates an internal error
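
A typical pairing with cudaFree() might look as follows (a sketch, reusing device, bytes, and devPtr from the earlier examples):
  cudaFree(devPtr);             // release the memory in the CUDA runtime
  postCudaFree(device, bytes);  // notify the scheduler that the memory is free again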
