Basic GEMM
Test Environment
- GPU: NVIDIA A100
- CUDA Version: 12.6
Results
| [M, N, K] | [kTM, kTN, kTK] | WarpLayout | kRK | CUTLASS (ms) | TileFusion (ms) |
|---|---|---|---|---|---|
| [1024, 1024, 512] | [64, 128, 128] | [2, 2] | 16 | 0.017591 | 0.016548 |
| [1024, 1024, 1024] | [64, 128, 128] | [2, 2] | 16 | 0.029245 | 0.027156 |
| [2048, 2048, 1024] | [64, 128, 128] | [2, 2] | 16 | 0.065372 | 0.070431 |
| [2048, 2048, 2048] | [64, 128, 128] | [2, 2] | 16 | 0.101253 | 0.128143 |
| [4096, 4096, 4096] | [64, 128, 128] | [2, 2] | 16 | 0.818606 | 0.969605 |
| [8192, 8192, 1024] | [64, 128, 128] | [2, 2] | 16 | 0.871526 | 0.971059 |
| [8192, 8192, 2048] | [64, 128, 128] | [2, 2] | 16 | 1.937879 | 1.931223 |
| [8192, 8192, 4096] | [64, 128, 128] | [2, 2] | 16 | 3.924275 | 3.956757 |
| [8192, 8192, 8192] | [64, 128, 128] | [2, 2] | 16 | 7.740396 | 8.080589 |
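
Raw latencies are hard to compare across problem sizes, so it can help to convert them to throughput and to a relative ratio. The sketch below is one way to do that. It assumes the usual convention of counting 2·M·N·K floating-point operations per GEMM; the helper names `gemm_tflops` and `speedup` are hypothetical and are not part of TileFusion or CUTLASS. The two sample rows are copied from the table above.

```python
def gemm_tflops(m, n, k, elapsed_ms):
    """Throughput in TFLOPS, assuming 2*M*N*K floating-point operations per GEMM."""
    flops = 2.0 * m * n * k
    seconds = elapsed_ms * 1e-3
    return flops / seconds / 1e12


def speedup(baseline_ms, candidate_ms):
    """Ratio of baseline time to candidate time; values above 1 favor the candidate."""
    return baseline_ms / candidate_ms


# (shape, CUTLASS ms, TileFusion ms), copied from the results table.
rows = [
    ((1024, 1024, 512), 0.017591, 0.016548),
    ((8192, 8192, 8192), 7.740396, 8.080589),
]

for (m, n, k), cutlass_ms, tilefusion_ms in rows:
    print(
        f"[{m}, {n}, {k}] "
        f"CUTLASS: {gemm_tflops(m, n, k, cutlass_ms):.2f} TFLOPS, "
        f"TileFusion: {gemm_tflops(m, n, k, tilefusion_ms):.2f} TFLOPS, "
        f"TileFusion vs. CUTLASS: {speedup(cutlass_ms, tilefusion_ms):.3f}x"
    )
```

A ratio above 1 means TileFusion finished faster than CUTLASS for that shape, and a ratio below 1 means it was slower.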