_Kernel Float_ is a header-only library for CUDA/HIP that simplifies working with vector types and reduced precision floating-point arithmetic in GPU code.
## Summary
CUDA natively offers several reduced precision floating-point types (`__half`, `__nv_bfloat16`, `__nv_fp8_e4m3`, `__nv_fp8_e5m2`)
and vector types (e.g., `__half2`, `__nv_fp8x4_e4m3`, `float3`).
However, working with these types is cumbersome:
mathematical operations require intrinsics (e.g., `__hadd2` performs addition for `__half2`),
and some functionality is missing (e.g., one cannot convert a `__half` to `__nv_bfloat16`).
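To illustrate, here is a minimal sketch (not taken from the library) of what element-wise addition of packed half-precision values looks like with plain CUDA intrinsics:

```cpp
#include <cuda_fp16.h>

// Element-wise addition of packed half-precision pairs goes through the
// `__hadd2` intrinsic from `cuda_fp16.h` rather than a plain `+` operator.
__global__ void add_half2(const __half2* a, const __half2* b, __half2* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = __hadd2(a[i], b[i]);
}
```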
_Kernel Float_ resolves this by offering a single data type `kernel_float::vec<T, N>` that stores `N` elements of type `T`.
Internally, the data is stored as a fixed-size array of elements.
Operator overloading (like `+`, `*`, `&&`) is implemented such that the optimal intrinsic for the available types is selected automatically.
Many mathematical functions (like `log`, `exp`, `sin`) and common operations (such as `sum`, `range`, `for_each`) are also available.
Using Kernel Float, developers avoid the complexity of reduced precision floating-point types in CUDA and can focus on their applications.
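As a rough sketch of how this looks in practice (the kernel name and the specific types here are illustrative, not taken from the library's documentation):

```cpp
#include "kernel_float.h"
namespace kf = kernel_float;

__global__ void double_and_exp(kf::vec<half, 4>* data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    kf::vec<half, 4> v = data[i];
    v = v + v;             // `+` dispatches to a suitable intrinsic (e.g., `__hadd2`) when available
    data[i] = kf::exp(v);  // element-wise math function applied to all four lanes
}
```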
## Features
In a nutshell, _Kernel Float_ offers the following features:
* Easy integration as a single header file.
* Written for C++17.
* Compatible with NVCC (NVIDIA Compiler) and NVRTC (NVIDIA Runtime Compilation).
* Compatible with HIPCC (AMD HIP Compiler).
## Example
Check out the [examples](https://github.com/KernelTuner/kernel_float/tree/master/examples) directory for more examples.
Below is a simple example of a CUDA kernel that adds a `constant` to the `input` array and writes the results to the `output` array.
Each thread processes two elements.
Notice how easy it would be to change the precision (for example, `double` to `half`) or the vector size (for example, 4 instead of 2 items per thread).
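A sketch of such a kernel might look as follows (the kernel name and exact signature are illustrative; see the linked examples for the real code):

```cpp
#include "kernel_float.h"
namespace kf = kernel_float;

// Each thread adds `constant` to two consecutive elements of `input`
// and writes the two results to `output`.
__global__ void add_constant(
    const kf::vec<double, 2>* input,
    double constant,
    kf::vec<double, 2>* output
) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    output[i] = input[i] + constant;  // the scalar broadcasts across both lanes
}
```

Switching to half precision is then just a matter of replacing `double` with `half` in the template arguments, and processing four items per thread a matter of replacing `2` with `4`.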