CP2K Offload Interface
include/libxstream_cp2k.h implements CP2K's offload runtime interface — the hardware-abstraction layer that CP2K uses for GPU-accelerated operations beyond DBCSR (grid integration, PW operations, etc.). The original interface provides static inline wrappers for CUDA, HIP, and OpenCL; LIBXSTREAM replaces the OpenCL path with a dedicated translation unit (src/libxstream_cp2k.c) that routes directly through LIBXSTREAM's internal API.
Relationship to the DBCSR Interface
CP2K's code has two accelerator interfaces:
| Interface | Header | Purpose |
|---|---|---|
| DBCSR ACC | libxstream_dbcsr.h | Sparse matrix operations (DBCSR library) |
| Offload Runtime | libxstream_cp2k.h | General offload (memory, streams, events, synchronization) |
The DBCSR adapter (src/libxstream_dbcsr.c) uses opaque void* handles and translates to LIBXSTREAM's typed API. The offload runtime adapter does the same but uses CP2K's offloadStream_t/offloadEvent_t typedefs (also void*). Both share the underlying LIBXSTREAM implementation.
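The handle translation described above can be pictured with a minimal, host-only sketch. Every name in it (demo_stream_t, demo_handle_t, and the demo_* functions) is hypothetical and merely stands in for LIBXSTREAM's typed API and CP2K's void* typedefs; it does not reproduce the actual adapter code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical typed stream object (stand-in for a typed backend stream). */
typedef struct demo_stream_t {
  int priority;
} demo_stream_t;

/* Opaque handle as seen by the caller (plays the role of a void* typedef). */
typedef void* demo_handle_t;

/* Hypothetical typed API: allocates and initializes a stream object. */
static int demo_typed_stream_create(demo_stream_t** stream, int priority) {
  demo_stream_t* s = NULL;
  if (NULL == stream) return EXIT_FAILURE;
  s = (demo_stream_t*)malloc(sizeof(*s));
  if (NULL == s) return EXIT_FAILURE;
  s->priority = priority;
  *stream = s;
  return EXIT_SUCCESS;
}

/* Adapter: hands the typed object out as an opaque handle. */
int demo_stream_create(demo_handle_t* handle) {
  demo_stream_t* typed = NULL;
  int result;
  assert(NULL != handle);
  result = demo_typed_stream_create(&typed, 0 /* default priority */);
  if (EXIT_SUCCESS == result) *handle = typed;
  return result;
}

/* Adapter: casts the opaque handle back before touching typed fields. */
int demo_stream_priority(demo_handle_t handle) {
  const demo_stream_t* typed = (const demo_stream_t*)handle;
  assert(NULL != typed);
  return typed->priority;
}

/* Adapter: releases the object behind the handle. */
int demo_stream_destroy(demo_handle_t handle) {
  free(handle);
  return EXIT_SUCCESS;
}
```

The caller only ever stores a void*, so the typed layout can change without affecting code compiled against the opaque interface.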
API
The header is self-contained (C99, no LIBXSTREAM headers required) and provides opaque handle types, error-handling functions with an error-checking macro, and functions covering five domains: streams, events, memory, transfers, and device synchronization.
Types and Constants
typedef void* offloadStream_t;
typedef void* offloadEvent_t;
typedef int offloadError_t;
#define offloadSuccess EXIT_SUCCESS
Error Handling
const char* offloadGetErrorName(offloadError_t error);
offloadError_t offloadGetLastError(void);
offloadGetErrorName maps error codes to OpenCL error strings via libxstream_opencl_strerror. offloadGetLastError consumes and clears the last recorded error.
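The OFFLOAD_CHECK macro evaluates an expression of type offloadError_t and aborts on failure after printing the error name and source location. The offload functions listed on this page are declared void, so the macro is applied to a call that yields an error code, such as offloadGetLastError:
OFFLOAD_CHECK(offloadGetLastError());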
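As a rough illustration of the consume-and-clear behavior and of a check macro of this kind, here is a host-only sketch. All demo_* and DEMO_* names are hypothetical stand-ins, the error value -5 is arbitrary, and the macro body is an assumption about the general pattern rather than the header's actual definition.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef int demo_error_t;          /* plays the role of offloadError_t */
#define DEMO_SUCCESS EXIT_SUCCESS  /* plays the role of offloadSuccess */

static demo_error_t demo_last_error = DEMO_SUCCESS;

/* Remember a failure so that a later query can report it. */
void demo_record_error(demo_error_t error) {
  if (DEMO_SUCCESS != error) demo_last_error = error;
}

/* Return the last recorded error and reset it (consume-and-clear). */
demo_error_t demo_get_last_error(void) {
  const demo_error_t result = demo_last_error;
  demo_last_error = DEMO_SUCCESS;
  return result;
}

/* Placeholder name lookup; a real backend would map codes to strings. */
const char* demo_get_error_name(demo_error_t error) {
  return (DEMO_SUCCESS == error) ? "success" : "failure";
}

/* Abort with error name and source location if CMD reports a failure. */
#define DEMO_CHECK(CMD) \
  do { \
    const demo_error_t demo_check_error_ = (CMD); \
    if (DEMO_SUCCESS != demo_check_error_) { \
      fprintf(stderr, "ERROR: %s %s:%d\n", \
        demo_get_error_name(demo_check_error_), __FILE__, __LINE__); \
      abort(); \
    } \
  } while (0)

/* Usage: returns EXIT_SUCCESS if the first query consumed the error. */
int demo_error_roundtrip(void) {
  demo_error_t first, second;
  demo_record_error(-5);
  first = demo_get_last_error();  /* reports the recorded error */
  second = demo_get_last_error(); /* already cleared by the first query */
  DEMO_CHECK(second);             /* passes: nothing left to report */
  assert(DEMO_SUCCESS == demo_last_error);
  return (-5 == first) ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Because the query clears the stored error, checking twice in a row reports a failure only once.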
Streams
void offloadStreamCreate(offloadStream_t* stream);
void offloadStreamDestroy(offloadStream_t stream);
void offloadStreamSynchronize(offloadStream_t stream);
void offloadStreamWaitEvent(offloadStream_t stream, offloadEvent_t event);
offloadStreamCreate creates a stream with default priority (LIBXSTREAM_STREAM_DEFAULT).
Events
void offloadEventCreate(offloadEvent_t* event);
void offloadEventDestroy(offloadEvent_t event);
void offloadEventRecord(offloadEvent_t event, offloadStream_t stream);
void offloadEventSynchronize(offloadEvent_t event);
bool offloadEventQuery(offloadEvent_t event);
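The page lists the event functions without spelling out their semantics. The following host-only model assumes the CUDA-like convention that an event marks the work submitted to a stream at the time of recording, and that a query reports whether that work has finished. The demo_* types and functions are hypothetical and use plain counters instead of a device queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stream model: counts submitted and completed operations. */
typedef struct demo_stream_t {
  size_t submitted;
  size_t completed;
} demo_stream_t;

/* Hypothetical event model: remembers the stream and the position marked. */
typedef struct demo_event_t {
  const demo_stream_t* stream;
  size_t mark;
} demo_event_t;

/* Enqueue one operation (nothing is executed in this model). */
void demo_stream_submit(demo_stream_t* stream) {
  assert(NULL != stream);
  ++stream->submitted;
}

/* Wait for the stream: everything submitted so far counts as completed. */
void demo_stream_synchronize(demo_stream_t* stream) {
  assert(NULL != stream);
  stream->completed = stream->submitted;
}

/* Record: the event marks all work submitted to the stream up to now. */
void demo_event_record(demo_event_t* event, const demo_stream_t* stream) {
  assert(NULL != event && NULL != stream);
  event->stream = stream;
  event->mark = stream->submitted;
}

/* Query: true if the marked work is done (or the event was never recorded). */
bool demo_event_query(const demo_event_t* event) {
  assert(NULL != event);
  return NULL == event->stream || event->stream->completed >= event->mark;
}
```

In this model, work submitted after the record call does not delay the event, which is what makes events useful for expressing dependencies between streams.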
Memory
void offloadMalloc(void** ptr, size_t size);
void offloadFree(void* ptr);
void offloadMallocHost(void** ptr, size_t size);
void offloadFreeHost(void* ptr);
Transfers
void offloadMemcpyAsyncHtoD(void* ptr_dev, const void* ptr_hst, size_t size, offloadStream_t stream);
void offloadMemcpyAsyncDtoH(void* ptr_hst, const void* ptr_dev, size_t size, offloadStream_t stream);
void offloadMemcpyAsyncDtoD(void* dst, const void* src, size_t size, offloadStream_t stream);
void offloadMemcpyHtoD(void* ptr_dev, const void* ptr_hst, size_t size);
void offloadMemcpyDtoH(void* ptr_hst, const void* ptr_dev, size_t size);
void offloadMemsetAsync(void* ptr, int val, size_t size, offloadStream_t stream);
void offloadMemset(void* ptr, int val, size_t size);
The synchronous variants (offloadMemcpyHtoD, offloadMemcpyDtoH, offloadMemset) pass a NULL stream. offloadMemsetAsync supports arbitrary fill values via libxstream_opencl_memset.
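The delegation from synchronous to asynchronous variants can be sketched as follows. This is a host-only illustration: the demo_* functions are hypothetical, "device" memory is ordinary host memory, and the asynchronous stand-in copies immediately instead of enqueuing work.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void* demo_stream_t; /* plays the role of offloadStream_t */

/* Counts calls that arrived without a stream (for illustration only). */
static int demo_null_stream_calls = 0;

/* Hypothetical asynchronous copy; this stand-in copies right away. */
void demo_memcpy_async_htod(void* ptr_dev, const void* ptr_hst, size_t size,
                            demo_stream_t stream) {
  assert(NULL != ptr_dev && NULL != ptr_hst);
  if (NULL == stream) ++demo_null_stream_calls;
  memcpy(ptr_dev, ptr_hst, size);
}

/* Synchronous variant: same operation, issued with a NULL stream. */
void demo_memcpy_htod(void* ptr_dev, const void* ptr_hst, size_t size) {
  demo_memcpy_async_htod(ptr_dev, ptr_hst, size, NULL);
}

/* Hypothetical asynchronous fill with an arbitrary byte value. */
void demo_memset_async(void* ptr, int val, size_t size, demo_stream_t stream) {
  assert(NULL != ptr);
  if (NULL == stream) ++demo_null_stream_calls;
  memset(ptr, val, size);
}

/* Synchronous fill: again a NULL-stream call of the asynchronous variant. */
void demo_memset(void* ptr, int val, size_t size) {
  demo_memset_async(ptr, val, size, NULL);
}
```

Keeping a single code path per operation means the synchronous functions need no logic of their own beyond choosing the stream argument.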
Device
void offloadDeviceSynchronize(void);
Stubs
void offloadMemcpyToSymbol(const void* symbol, const void* src, size_t count);
void offloadEnsureMallocHeapSize(size_t required_size);
These are CUDA-specific operations (constant-memory writes and device-heap sizing) that have no direct OpenCL equivalent. They are currently stubs guarded by assertions. CP2K's OpenCL path disables the GPU grid subsystem (__NO_OFFLOAD_GRID) that would call them.
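One way to picture an assertion-guarded stub together with a compile-time switch that keeps callers away from it is sketched below. DEMO_NO_OFFLOAD_GRID, demo_ensure_malloc_heap_size, and demo_grid_init are hypothetical names chosen for illustration; the actual assertions used in the stubs are not shown on this page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch playing the role of __NO_OFFLOAD_GRID. */
#define DEMO_NO_OFFLOAD_GRID

/* Hypothetical stub: the operation has no equivalent in this backend,
 * so reaching it at all is treated as a programming error. */
void demo_ensure_malloc_heap_size(size_t required_size) {
  (void)required_size; /* unused */
  assert(0 && "demo_ensure_malloc_heap_size: not supported");
}

/* Hypothetical caller: returns 1 for the device path, 0 for the host path. */
int demo_grid_init(void) {
#if defined(DEMO_NO_OFFLOAD_GRID)
  return 0; /* device grid code is compiled out; the stub is never called */
#else
  demo_ensure_malloc_heap_size((size_t)1 << 20);
  return 1;
#endif
}
```

With the switch defined, the stub still links but stays unreachable, so the assertion only fires if the configuration is changed without providing a real implementation.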
See Also
- LIBXSTREAM API (include/libxstream.h): the underlying OpenCL backend API
- DBCSR ACC Interface: the DBCSR adapter layer
- CP2K offload_runtime.h: upstream interface definition