@Lura518 Lura518 commented Jul 16, 2025

This PR extends the snitch_cluster by making all FPUs available on a separate offload port.

Merging all 8 compute-core FPUs allows 512-bit wide vectorised (8x) double-precision operations. The offload request can optionally be cut with a new parameter introduced in the configuration.
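As a rough illustration of what "8x double in 512 bits" means, the sketch below packs eight 64-bit doubles into one 512-bit word and applies a lane-wise add. It is a software model only: the lane ordering (lane 0 in the least-significant bits) and the helper names `pack_lanes`, `unpack_lanes` and `vec_fadd` are assumptions for illustration, not the layout or interface of the offload port in this PR.

```python
import struct

LANES = 8        # one lane per FPU, as stated in the PR description
LANE_BITS = 64   # IEEE 754 double precision

def pack_lanes(values):
    """Pack 8 doubles into a 512-bit integer (assumed: lane 0 in the LSBs)."""
    if len(values) != LANES:
        raise ValueError(f"expected {LANES} lanes, got {len(values)}")
    raw = struct.pack("<8d", *values)
    return int.from_bytes(raw, "little")

def unpack_lanes(word):
    """Split a 512-bit integer back into 8 doubles."""
    raw = word.to_bytes(LANES * LANE_BITS // 8, "little")
    return list(struct.unpack("<8d", raw))

def vec_fadd(a, b):
    """Lane-wise add of two 512-bit words, one double per lane."""
    lanes = [x + y for x, y in zip(unpack_lanes(a), unpack_lanes(b))]
    return pack_lanes(lanes)
```

Each lane is computed independently here, which mirrors the idea of eight FPUs working side by side on one wide operand.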

hw: add offload port to all relevant files

hw: add proper handshaking to merge the control flow of the 8 FPUs
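One common way to merge several valid/ready streams is a join: the merged output is valid only when every lane is valid, and every lane is acknowledged in the same cycle the merged transfer fires. The sketch below models that rule in Python. Whether the RTL in this PR uses exactly this scheme is an assumption, and `join_handshake` is a hypothetical name.

```python
def join_handshake(lane_valid, out_ready):
    """Model a valid/ready join over several lanes (assumed scheme).

    lane_valid: list of bools, one per FPU lane
    out_ready:  bool, downstream is able to accept the merged result
    Returns (out_valid, lane_ready).
    """
    # The merged result is only valid once all lanes have a result.
    out_valid = all(lane_valid)
    # A transfer happens when the merged result is valid and accepted.
    fire = out_valid and out_ready
    # All lanes are acknowledged together so they stay in lockstep.
    lane_ready = [fire] * len(lane_valid)
    return out_valid, lane_ready
```

Acknowledging all lanes in the same cycle keeps a fast lane from advancing while a slower one still holds its result.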

tracer: extend to log offload operations

Raphael and others added 30 commits July 9, 2025 14:08
* Extend snitch by adding CSR register for the user field

* All collective operations are forwarded to the (default) SoC port for further processing.
  Even if the crossbar is defined as mcast, it is currently not used as such.
  The problem is reduction operations: the crossbar does not support them, and therefore
  all collective ops are forwarded to the SoC port (to be processed in the router there)

* Bump bender dependencies

(Cherry-picked from 6402d71 of Lorenzo's fork)
* New configuration for the narrow reduction

* Rename existing configuration options

* Integrate multicast/reduction into the global barrier functions

(Cherry-picked from c9ed65c of Lorenzo's fork)
* Extend the axi xbar to offload collective operations to the SoC

* Add new iDMA opcode

(Cherry-picked from 0226903 of Lorenzo's fork)
* Extend the dma driver to support collective operations by setting the user field in the axi transmission

(Cherry-picked from 3af3efa of Lorenzo's fork)
* Bump axi bender version due to renaming on axi side

* Rename rest of collectiv to collective

* multicast rule / port fix
Also maps CSR_USER_LOW and CSR_USER_HIGH respectively to addresses
0x7C4 and 0x7C5 (instead of the opposite), to reflect LOW and HIGH
names also in the CSR address mapping.
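The mapping described in this commit message can be sketched as follows. The two addresses come from the message above. The 32-bit CSR width and the helper `split_user` are assumptions for illustration, not part of the runtime.

```python
CSR_USER_LOW = 0x7C4   # address from the commit message
CSR_USER_HIGH = 0x7C5  # address from the commit message
XLEN = 32              # assumption: each CSR holds 32 bits

def split_user(value):
    """Split a 64-bit user value into the words written to each CSR."""
    mask = (1 << XLEN) - 1
    return {
        CSR_USER_LOW: value & mask,             # lower half, lower address
        CSR_USER_HIGH: (value >> XLEN) & mask,  # upper half, higher address
    }
```

With this ordering the LOW register sits at the lower address, which is the point of the fix.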
colluca and others added 8 commits July 13, 2025 16:34
Doesn't satisfy multicast rule conversion constraints.
* Access the combined FPU from outside of the cluster

* Extension of the tracer

(Cherry-Picked from 5d029e6)
Squashed commit of the following commits:

* commit 8fd7a66 - Fix tracing

* commit e27b57e - sw: Use DataGen class in FlashAttention-2 and FusedConcatLinear data generators

* commit cfee4e1 - sw: Add MHA kernel

* commit e040704 - sw: Enable GEMM parallelized over K on subset of clusters

* commit 43b8dd8 - target: Separate HAL source and build dirs

* commit 16b74ea - target: Add missing RDL files to clean targets

* commit 4d2b312 - flashattention_2: Fix to work on multi-cluster systems

* commit 2a9536f - docs: Add system integration page

* commit 5769bbd - sw: Make `snitch_cluster_cfg.h.tpl` depend only on config

* commit c937cfc - docs: Add system integration guide

* commit 5fcf257 - runtime: Fix global reduction with DMA

* commit 1650654 - target: Streamline `SNRT_APPS` integration in derived systems

* commit bc60d21 - target: Pick up CLI gentrace flags and set `--permissive` when debugging

* commit 155f764 - target: Update trace visualization command after `SN_CFG` name change

* commit 3720c55 - sw: Add multicast 2D tile transfer functions

* commit d17d87c - sw: Enable overriding scripts directory

* commit 6b75d99 - runtime: Fix CLS pointer initialization

* commit 9528b4a - runtime: Fix `snrt_wake_up` with fence

* commit 8d73450 - runtime: Add `snrt_fence` routine

* commit 7f430f2 - Expose multiple wide TCDM ports (pulp-platform#258)