Releases: xlite-dev/LeetCUDA

v3.0.5

09 Apr 15:15
ba6fac2

What's Changed

New Contributors

Full Changelog: v3.0.4...v3.0.5

v3.0.4

15 Mar 03:14
ca63606

What's Changed

Full Changelog: DefTruth/CUDA-Learn-Notes@v3.0.3...v3.0.4

v3.0.3

04 Mar 04:14
077096a

What's Changed

Full Changelog: DefTruth/CUDA-Learn-Notes@v3.0.2...v3.0.3

v3.0.2

24 Feb 01:30
a9e2d17

What's Changed

New Contributors

Full Changelog: DefTruth/CUDA-Learn-Notes@v3.0.1...v3.0.2

v3.0.1

06 Feb 12:08
ee9f706

v3.0.0

22 Jan 10:08
7f35ae1

What's Changed

Full Changelog: DefTruth/CUDA-Learn-Notes@v2.6.15...v3.0.0

🔥FFPA L1 release

08 Jan 03:38
62cb712

What's Changed

Full Changelog: DefTruth/CUDA-Learn-Notes@v2.6.14...v2.6.15

QKV Fine-grained Tiling

03 Jan 08:51
82f1d04

What's Changed

New Contributors

Full Changelog: DefTruth/CUDA-Learn-Notes@v2.6.13...v2.6.14

FA2 QKV SMEM Swizzle✔️

28 Dec 05:38
b7966f0

What's Changed

Full Changelog: DefTruth/CUDA-Learn-Notes@v2.6.12...v2.6.13

🎉FA2/HGEMM SMEM Swizzle

25 Dec 05:52
bdd361a

What's Changed

flash_attn_mma_stages_split_q_tiling_qk_swizzle_kernel

void flash_attn_mma_stages_split_q_tiling_qk_swizzle_kernel<512, 16, 8, 16, 8, 1, 8, 1, 1, 16, 1, 64, 2, 0, 0, 8>(__half *, __half *, __half *, __half *, int, int) (8, 48, 1)x(256, 1, 1), Context 1, Stream 7, Device 0, CC 8.9
    Section: Command line profiler metrics
    ------------------------------------------------------------------ ----------- ------------
    Metric Name                                                        Metric Unit Metric Value
    ------------------------------------------------------------------ ----------- ------------
    sm__sass_l1tex_data_bank_conflicts_pipe_lsu_mem_shared_op_ldsm.avg                        0
    sm__sass_l1tex_data_bank_conflicts_pipe_lsu_mem_shared_op_ldsm.max                        0
    sm__sass_l1tex_data_bank_conflicts_pipe_lsu_mem_shared_op_ldsm.min                        0
    sm__sass_l1tex_data_bank_conflicts_pipe_lsu_mem_shared_op_ldsm.sum                        0
    ------------------------------------------------------------------ ----------- ------------
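
The zero ldsm bank-conflict counts above are what an XOR swizzle of the shared-memory layout is meant to achieve. The following host-side sketch is a simplified model of that idea, not the kernel's actual indexing. It assumes a tile of 16-bit values with 128-byte rows and an 8x8 ldmatrix-style access in which 8 threads each read one 16-byte row chunk. The names `chunk_offset` and `max_bank_hits` are hypothetical and exist only for this illustration.

```cpp
#include <array>
#include <cassert>

// Assumed layout (not taken from the release notes): a shared-memory tile of
// 16-bit values with 64 elements (128 bytes) per row, i.e. exactly 32 banks.
constexpr int kRowBytes   = 128;  // bytes per tile row
constexpr int kChunkBytes = 16;   // one 8x8 matrix row = 8 halves = 16 bytes
constexpr int kBankBytes  = 4;    // shared-memory bank width
constexpr int kNumBanks   = 32;   // banks per SM shared memory

// Byte offset of the 16-byte chunk `chunk` in row `row`.
// With swizzle enabled, the chunk index is XOR-ed with the low 3 row bits.
int chunk_offset(int row, int chunk, bool swizzle) {
  assert(chunk >= 0 && chunk < kRowBytes / kChunkBytes);
  int physical_chunk = swizzle ? (chunk ^ (row % 8)) : chunk;
  return row * kRowBytes + physical_chunk * kChunkBytes;
}

// Worst-case number of accesses that land on a single bank when 8 threads
// each fetch the same logical chunk from 8 consecutive rows (one 8x8 matrix).
// A result of 1 means the access is conflict-free in this model.
int max_bank_hits(int chunk, bool swizzle) {
  std::array<int, kNumBanks> hits{};
  for (int row = 0; row < 8; ++row) {
    int base = chunk_offset(row, chunk, swizzle);
    for (int b = 0; b < kChunkBytes; b += kBankBytes) {
      ++hits[((base + b) / kBankBytes) % kNumBanks];
    }
  }
  int worst = 0;
  for (int h : hits) worst = (h > worst) ? h : worst;
  return worst;
}
```

In this model, the unswizzled layout sends all 8 rows to the same 4 banks, so `max_bank_hits(c, false)` is 8. With the XOR applied, the 8 rows occupy 8 distinct chunks that together cover all 32 banks, so `max_bank_hits(c, true)` is 1.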

Full Changelog: DefTruth/CUDA-Learn-Notes@v2.6.11...v2.6.12