Test Environment
- GPU: NVIDIA Tesla A100
- CUDA Version: 12.6
Results
| [M, N, K] | [kTM, kTN, kTK] | WarpLayout | kRK | CUTLASS (ms) | TileFusion (ms) |
|---|---|---|---|---|---|
| [1024, 1024, 512] | [64, 128, 128] | [2, 2] | 16 | 0.017591 | 0.016548 |
| [1024, 1024, 1024] | [64, 128, 128] | [2, 2] | 16 | 0.029245 | 0.027156 |
| [2048, 2048, 1024] | [64, 128, 128] | [2, 2] | 16 | 0.065372 | 0.070431 |
| [2048, 2048, 2048] | [64, 128, 128] | [2, 2] | 16 | 0.101253 | 0.128143 |
| [4096, 4096, 4096] | [64, 128, 128] | [2, 2] | 16 | 0.818606 | 0.969605 |
| [8192, 8192, 1024] | [64, 128, 128] | [2, 2] | 16 | 0.871526 | 0.971059 |
| [8192, 8192, 2048] | [64, 128, 128] | [2, 2] | 16 | 1.937879 | 1.931223 |
| [8192, 8192, 4096] | [64, 128, 128] | [2, 2] | 16 | 3.924275 | 3.956757 |
| [8192, 8192, 8192] | [64, 128, 128] | [2, 2] | 16 | 7.740396 | 8.080589 |
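To compare rows of different sizes, it can help to convert the raw timings above into a time ratio and an effective throughput. The snippet below is a minimal sketch, not part of the benchmark itself. It assumes each reported time covers a single GEMM and that a GEMM costs 2·M·N·K floating-point operations; the helper names `gemm_tflops` and `relative_time` are hypothetical.

```python
def gemm_tflops(m, n, k, time_ms):
    """Effective throughput in TFLOP/s, assuming 2*M*N*K operations per GEMM."""
    flops = 2 * m * n * k
    return flops / (time_ms * 1e-3) / 1e12


def relative_time(cutlass_ms, tilefusion_ms):
    """TileFusion time divided by CUTLASS time; below 1.0 means TileFusion is faster."""
    return tilefusion_ms / cutlass_ms


# Two rows copied from the results above: (M, N, K, CUTLASS ms, TileFusion ms)
rows = [
    (1024, 1024, 512, 0.017591, 0.016548),
    (8192, 8192, 8192, 7.740396, 8.080589),
]

for m, n, k, cutlass_ms, tilefusion_ms in rows:
    ratio = relative_time(cutlass_ms, tilefusion_ms)
    print(
        f"[{m}, {n}, {k}] "
        f"CUTLASS: {gemm_tflops(m, n, k, cutlass_ms):.1f} TFLOP/s, "
        f"TileFusion: {gemm_tflops(m, n, k, tilefusion_ms):.1f} TFLOP/s, "
        f"time ratio: {ratio:.3f}"
    )
```

A ratio below 1.0 marks the rows where TileFusion finishes sooner than CUTLASS, and a ratio above 1.0 marks the rows where it is slower.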