OpenCL Backend
include/libxstream_opencl.h is the internal OpenCL layer that powers every public libxstream_* function. It owns the OpenCL platform/device/context lifecycle, memory management, kernel compilation, and error handling. Sample code and other LIBXSTREAM extensions (e.g., LIBSMM, Ozaki) include this header to access the OpenCL runtime directly.
Compile-Time Configuration
The header is guarded by __OPENCL (set automatically when __OFFLOAD_OPENCL is defined). Key compile-time knobs:
| Macro | Default | Description |
|---|---|---|
| LIBXSTREAM_MAXALIGN | 2 MB | Maximum alignment for device allocations |
| LIBXSTREAM_BUFFERSIZE | 8 KB | Internal scratch-buffer size |
| LIBXSTREAM_MAXSTRLEN | 48 | Maximum string length for names |
| LIBXSTREAM_MAXNDEVS | 64 | Maximum number of OpenCL devices |
| LIBXSTREAM_MAXNITEMS | 1024 | Per-thread maximum item count |
| LIBXSTREAM_USM | SVM coarse-grain | Runtime Unified Shared Memory level (unset = OpenCL 2.0 SVM coarse-grain with non-USM fallback, 0 = off, 1 = Intel USM, 2 = OpenCL 2.0 SVM coarse-grain, 3 = OpenCL 2.0 SVM reported caps) |
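A minimal sketch of how size knobs of this kind are commonly consumed. It assumes the header supplies a default only when the build has not already set the macro, which this page does not confirm. All `EXAMPLE_*` and `example_*` names are hypothetical stand-ins, and only the numeric defaults are taken from the table above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the library's macros. A default applies only
 * if the build did not set the knob, e.g. with -DEXAMPLE_MAXNDEVS=8. */
#if !defined(EXAMPLE_MAXSTRLEN)
# define EXAMPLE_MAXSTRLEN 48
#endif
#if !defined(EXAMPLE_MAXNDEVS)
# define EXAMPLE_MAXNDEVS 64
#endif

/* Fixed-capacity table whose storage is sized entirely by the knobs. */
typedef struct example_device_table_t {
  char names[EXAMPLE_MAXNDEVS][EXAMPLE_MAXSTRLEN];
  size_t count;
} example_device_table_t;

/* Append a name (truncated to EXAMPLE_MAXSTRLEN - 1 characters).
 * Returns 0 on success, or -1 for invalid arguments or a full table. */
int example_device_table_add(example_device_table_t* table, const char* name)
{
  if (NULL == table || NULL == name) return -1;
  if ((size_t)EXAMPLE_MAXNDEVS <= table->count) return -1;
  strncpy(table->names[table->count], name, EXAMPLE_MAXSTRLEN - 1);
  table->names[table->count][EXAMPLE_MAXSTRLEN - 1] = '\0';
  ++table->count;
  return 0;
}
```

With limits fixed at compile time, such a table needs no dynamic allocation. The cost is that raising a limit requires rebuilding.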
Data Types
libxstream_opencl_config_t
The central singleton (libxstream_opencl_config) populated by libxstream_init. It holds:
- Device table: ordered array of discovered cl_device_id entries.
- Active device (libxstream_opencl_device_t): context, default stream, error slot, OpenCL standard level, workgroup limits, memory caps, vendor flags, and optional USM function pointers.
- Resource pools: lock objects, streams, events, memory-pointer registrations, and a host-memory pool (libxs_malloc_pool_t).
- Runtime switches: verbosity, async mode, debug/dump level, profiling, execution hints, and workaround level.
- Histograms: optional transfer-time histograms for H2D, D2H, and D2D copies.
libxstream_opencl_stream_t / libxstream_event_t
Thin wrappers around cl_command_queue and cl_event respectively. Streams additionally carry a thread-ID and optional priority.
libxstream_opencl_info_memptr_t
Associates a cl_mem buffer object with its host-side pointer, used to translate between SVM/USM pointers and buffer-based memory.
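The pointer translation described above can be pictured as a range lookup over registrations. The following is a sketch under stated assumptions, not the library's data layout: `example_memptr_t`, its `size` field, and `example_find_memptr` are hypothetical, and a plain `void*` stands in for the `cl_mem` handle so that the snippet compiles without OpenCL headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registration record (field names are assumptions). */
typedef struct example_memptr_t {
  void* memory;       /* stand-in for the cl_mem buffer object */
  const void* memptr; /* host-side pointer marking the start of the range */
  size_t size;        /* extent in bytes (assumed to be tracked) */
} example_memptr_t;

/* Find the registration whose range contains ptr. On success, the byte
 * offset of ptr inside that range is stored in *offset (if not NULL).
 * Returns NULL when no registration covers ptr. */
const example_memptr_t* example_find_memptr(const example_memptr_t* table,
  size_t n, const void* ptr, size_t* offset)
{
  const uintptr_t p = (uintptr_t)ptr;
  size_t i;
  for (i = 0; i < n; ++i) {
    const uintptr_t base = (uintptr_t)table[i].memptr;
    if (base <= p && (p - base) < table[i].size) {
      if (NULL != offset) *offset = (size_t)(p - base);
      return table + i;
    }
  }
  return NULL;
}
```

A linear scan keeps the sketch short. A table holding many registrations would likely favor a sorted array or a hash keyed by the base pointer.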
libxstream_opencl_atomic_fp_t
Enumerates floating-point atomics support: none, 32-bit, or 64-bit.
Error Handling Macros
| Macro | Description |
|---|---|
| CL_CHECK(RESULT, CALL) | Execute an OpenCL call; on failure record the error code and human-readable name |
| CL_ERROR_REPORT(NAME) | Print the last error to stderr (if verbosity is enabled) |
| CL_RETURN(RESULT, NAME) | Return from function, reporting the error if non-zero |
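To illustrate the check/report/return pattern, here is a sketch using hypothetical `EXAMPLE_*` macros rather than the real ones. Several details are assumptions of this sketch and may differ from the header: the call is skipped when an earlier step already failed, the recorded "name" is simply the stringified call, and the error slot is a plain global.

```c
#include <stdio.h>

/* Hypothetical error slot and verbosity switch (not the library's). */
typedef struct example_error_t { int code; const char* name; } example_error_t;
example_error_t example_last_error = { 0, NULL };
int example_verbosity = 0;

/* Run CALL only if RESULT is still zero; record code and call text on failure. */
#define EXAMPLE_CHECK(RESULT, CALL) do { \
  if (0 == (RESULT)) { \
    (RESULT) = (CALL); \
    if (0 != (RESULT)) { \
      example_last_error.code = (RESULT); \
      example_last_error.name = #CALL; \
    } \
  } \
} while (0)

/* Print the recorded error to stderr when verbosity is enabled. */
#define EXAMPLE_ERROR_REPORT(NAME) do { \
  if (0 != example_verbosity && 0 != example_last_error.code) { \
    fprintf(stderr, "ERROR %s: %s (code=%i)\n", (NAME), \
      example_last_error.name, example_last_error.code); \
  } \
} while (0)

/* Report if RESULT is non-zero, then return RESULT from the enclosing function. */
#define EXAMPLE_RETURN(RESULT, NAME) do { \
  if (0 != (RESULT)) EXAMPLE_ERROR_REPORT(NAME); \
  return (RESULT); \
} while (0)

/* Stand-in for an OpenCL call that returns a status code. */
int example_step(int code) { return code; }

/* Two chained steps: the second is skipped if the first one fails. */
int example_run(int first, int second)
{
  int result = 0;
  EXAMPLE_CHECK(result, example_step(first));
  EXAMPLE_CHECK(result, example_step(second));
  EXAMPLE_RETURN(result, "example_run");
}
```

Chaining checks this way keeps the happy path linear and ensures the first failure is the one reported.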
Key Functions
Device and Context
| Function | Description |
|---|---|
| libxstream_opencl_set_active_device | Internal device activation (lock-aware) |
| libxstream_opencl_create_context | Create an OpenCL context for a given device |
| libxstream_opencl_device_name | Return device name, platform name, and UID |
| libxstream_opencl_device_level | Query OpenCL version and device type |
| libxstream_opencl_device_vendor | Confirm a device's vendor string |
| libxstream_opencl_device_ext | Check for required OpenCL extensions |
| libxstream_opencl_device_uid | Capture or compute a unique device identifier |
| libxstream_opencl_info_devmem | Query free/total/local device memory |
Memory
| Function | Description |
|---|---|
| libxstream_opencl_info_devptr | Look up a device-pointer registration (read-only) |
| libxstream_opencl_info_devptr_modify | Look up a device-pointer registration (writable) |
| libxstream_opencl_info_hostptr | Look up a host-pointer registration |
| libxstream_opencl_memset | Fill device memory with an arbitrary byte pattern |
| libxstream_opencl_use_cmem | Whether OpenCL constant-memory hints apply |
| libxstream_opencl_set_kernel_ptr | Set a pointer kernel argument (USM-aware) |
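As a host-side reference for what "fill with a byte pattern" means, the sketch below repeats a pattern across a plain memory range. It is not the device implementation and `example_fill` is a hypothetical name. The handling of a trailing partial pattern is an assumption of this sketch, since the standard OpenCL counterpart `clEnqueueFillBuffer` requires the size to be a multiple of the pattern size.

```c
#include <stddef.h>
#include <string.h>

/* Fill size bytes at dst by repeating pattern (pattern_size bytes each).
 * A final partial repetition is copied if size is not a multiple of
 * pattern_size (assumed behavior). Returns 0 on success, -1 on bad input. */
int example_fill(void* dst, size_t size, const void* pattern, size_t pattern_size)
{
  unsigned char* out = (unsigned char*)dst;
  size_t i;
  if (NULL == dst || NULL == pattern || 0 == pattern_size) return -1;
  /* whole repetitions of the pattern */
  for (i = 0; i + pattern_size <= size; i += pattern_size) {
    memcpy(out + i, pattern, pattern_size);
  }
  /* remaining bytes receive the leading part of the pattern */
  if (i < size) memcpy(out + i, pattern, size - i);
  return 0;
}
```

A reference like this can be used to verify a device fill by copying the buffer back to the host and comparing byte for byte.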
Kernel Build
| Function | Description |
|---|---|
| libxstream_opencl_program | Compile an OpenCL program from source, file, or binary |
| libxstream_opencl_kernel_query | Extract a named kernel from a compiled program |
| libxstream_opencl_kernel | Convenience: build + extract + release in one call |
| libxstream_opencl_kernel_flags | Assemble combined build flags from params, options, and extras |
| libxstream_opencl_defines | Merge user defines with internal definitions |
| libxstream_opencl_flags_atomics | Generate compiler flags for FP-atomic extensions |
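Assembling build flags from several optional sources amounts to a bounded string join. The sketch below shows one way to do that. `example_build_flags` and its signature are hypothetical and do not reflect the actual signature of libxstream_opencl_kernel_flags. The flag strings in the usage note are ordinary OpenCL build options chosen for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Join up to three optional flag strings with single spaces into buffer.
 * NULL or empty parts are skipped. Returns the resulting length, or -1 if
 * the arguments are invalid or the buffer is too small. */
int example_build_flags(char* buffer, size_t size,
  const char* params, const char* options, const char* extras)
{
  const char* parts[3];
  size_t length = 0;
  int i;
  if (NULL == buffer || 0 == size) return -1;
  parts[0] = params; parts[1] = options; parts[2] = extras;
  buffer[0] = '\0';
  for (i = 0; i < 3; ++i) {
    if (NULL != parts[i] && '\0' != parts[i][0]) {
      /* a separator is only needed once something was written */
      const int n = snprintf(buffer + length, size - length, "%s%s",
        0 != length ? " " : "", parts[i]);
      if (0 > n || (size_t)n >= size - length) return -1; /* truncated */
      length += (size_t)n;
    }
  }
  return (int)length;
}
```

For example, joining `"-DN=4"`, no options, and `"-cl-fast-relaxed-math"` yields a single string that could be handed to `clBuildProgram` as its options argument.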
Streams, Events, and Timing
| Function | Description |
|---|---|
| libxstream_opencl_stream | Find an existing stream for a thread-ID |
| libxstream_opencl_stream_default | Return the device's default (internal) stream |
| libxstream_opencl_device_synchronize | Per-thread device synchronization |
| libxstream_opencl_duration | Measure elapsed seconds from a cl_event |
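Event-based timing in OpenCL generally rests on profiling counters: `clGetEventProfilingInfo` reports nanosecond timestamps such as `CL_PROFILING_COMMAND_START` and `CL_PROFILING_COMMAND_END` for events from a queue created with `CL_QUEUE_PROFILING_ENABLE`. The sketch below covers only the conversion step and takes the two timestamps as plain integers, so it compiles without OpenCL headers. Which pair of counters libxstream_opencl_duration actually uses is not stated here, so the START/END choice is an assumption, and `example_duration_seconds` is a hypothetical name.

```c
#include <stdint.h>

/* Convert two profiling timestamps (nanoseconds) into elapsed seconds.
 * begin_ns and end_ns are assumed to come from clGetEventProfilingInfo.
 * Returns 0 if the counters are out of order. */
double example_duration_seconds(uint64_t begin_ns, uint64_t end_ns)
{
  return (end_ns >= begin_ns) ? (1E-9 * (double)(end_ns - begin_ns)) : 0.0;
}
```

Subtracting in integer arithmetic before converting to `double` avoids losing precision on large absolute timestamps.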
Error Utilities
| Function | Description |
|---|---|
| libxstream_opencl_strerror | Map a cl_int error code to a string |
| libxstream_opencl_error_consume | Clear and return the last recorded error |
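The two utilities can be pictured with the following sketch, which uses hypothetical `example_*` names and a plain `int` in place of `cl_int`. The numeric codes are a small subset taken from the OpenCL specification. The single global slot is an assumption made for brevity, and the real layer may keep its error state elsewhere (the data-type section mentions an error slot on the active device).

```c
#include <stddef.h>

/* Map a subset of OpenCL status codes to their symbolic names. */
const char* example_strerror(int code)
{
  switch (code) {
    case 0: return "CL_SUCCESS";
    case -1: return "CL_DEVICE_NOT_FOUND";
    case -2: return "CL_DEVICE_NOT_AVAILABLE";
    case -3: return "CL_COMPILER_NOT_AVAILABLE";
    case -4: return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5: return "CL_OUT_OF_RESOURCES";
    case -6: return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -30: return "CL_INVALID_VALUE";
    default: return "unknown error";
  }
}

/* Hypothetical slot holding the most recent error (0 means none). */
int example_error_slot = 0;

/* Record an error; only non-zero codes overwrite the slot. */
void example_error_record(int code)
{
  if (0 != code) example_error_slot = code;
}

/* Return the recorded error and reset the slot to "no error". */
int example_error_consume(void)
{
  const int result = example_error_slot;
  example_error_slot = 0;
  return result;
}
```

Consuming rather than merely reading the error means a later check does not see a stale failure from an earlier, already handled call.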
See Also
- LIBXSTREAM API (include/libxstream.h): public API built on top of this layer
- DBCSR ACC Interface: the DBCSR compatibility shim
Copyright (c) Intel Corporation, LIBXSTREAM Contributors.