hw: add offloadport for the FPU in snitch #261
Draft
Lura518 wants to merge 38 commits into pulp-platform:develop from Lura518:feature/dca
* Extend snitch by adding a CSR register for the user field
* All collective operations are forwarded to the (default) SoC port for further processing. Even if the crossbar is defined as mcast, it is currently not used as such. The problem is reduction operations: the crossbar does not support them, and therefore all collective ops are forwarded to the SoC port (to be processed in the router there)
* Bump bender dependencies
(Cherry-picked from 6402d71 of Lorenzo's fork)
* New configuration for the narrow reduction
* Rename existing configuration options
* Integrate multicast/reduction into the global barrier functions
(Cherry-picked from c9ed65c of Lorenzo's fork)
* Extend the AXI xbar to offload collective operations to the SoC
* Add new iDMA opcode
(Cherry-picked from 0226903 of Lorenzo's fork)
* Extend the DMA driver to support collective operations by setting the user field in the AXI transmission
(Cherry-picked from 3af3efa of Lorenzo's fork)
Also maps CSR_USER_LOW and CSR_USER_HIGH to addresses 0x7C4 and 0x7C5, respectively (instead of the opposite), so that the LOW and HIGH names are also reflected in the CSR address mapping.
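To illustrate the LOW/HIGH ordering described above, here is a minimal behavioural sketch in Python. Only the two CSR names and their addresses come from the commit message; the 64-bit width of the user value, the 32-bit width of each CSR, and the helper names `split_user` and `join_user` are assumptions made for illustration, not the actual hardware or runtime code.

```python
# CSR addresses as stated in the commit message.
CSR_USER_LOW = 0x7C4
CSR_USER_HIGH = 0x7C5

# Assumption: each CSR holds 32 bits and the full user value is 64 bits wide.
CSR_BITS = 32
CSR_MASK = (1 << CSR_BITS) - 1


def split_user(user: int) -> dict:
    """Split a user value into the two CSR writes, keyed by CSR address."""
    if not 0 <= user < (1 << (2 * CSR_BITS)):
        raise ValueError("user value must fit in 64 bits")
    return {
        CSR_USER_LOW: user & CSR_MASK,                 # lower half -> lower address
        CSR_USER_HIGH: (user >> CSR_BITS) & CSR_MASK,  # upper half -> higher address
    }


def join_user(csrs: dict) -> int:
    """Rebuild the user value from the two CSR contents."""
    return (csrs[CSR_USER_HIGH] << CSR_BITS) | csrs[CSR_USER_LOW]
```

With this ordering, the lower address carries the lower half of the value, which is the property the commit restores.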
Doesn't satisfy multicast rule conversion constraints.
* Access the combined FPU from outside of the cluster
* Extension of the tracer
(Cherry-picked from 5d029e6)
Squashed commit of the following commits:
* commit 8fd7a66 - Fix tracing
* commit e27b57e - sw: Use DataGen class in FlashAttention-2 and FusedConcatLinear data generators
* commit cfee4e1 - sw: Add MHA kernel
* commit e040704 - sw: Enable GEMM parallelized over K on subset of clusters
* commit 43b8dd8 - target: Separate HAL source and build dirs
* commit 16b74ea - target: Add missing RDL files to clean targets
* commit 4d2b312 - flashattention_2: Fix to work on multi-cluster systems
* commit 2a9536f - docs: Add system integration page
* commit 5769bbd - sw: Make `snitch_cluster_cfg.h.tpl` depend only on config
* commit c937cfc - docs: Add system integration guide
* commit 5fcf257 - runtime: Fix global reduction with DMA
* commit 1650654 - target: Streamline `SNRT_APPS` integration in derived systems
* commit bc60d21 - target: Pick up CLI gentrace flags and set `--permissive` when debugging
* commit 155f764 - target: Update trace visualization command after `SN_CFG` name change
* commit 3720c55 - sw: Add multicast 2D tile transfer functions
* commit d17d87c - sw: Enable overriding scripts directory
* commit 6b75d99 - runtime: Fix CLS pointer initialization
* commit 9528b4a - runtime: Fix `snrt_wake_up` with fence
* commit 8d73450 - runtime: Add `snrt_fence` routine
* commit 7f430f2 - Expose multiple wide TCDM ports (pulp-platform#258)
This PR aims to extend the snitch_cluster by making all FPUs available on a separate offload port.
Merging all 8 FPUs of the compute cores allows running 512-bit wide vectorised (8x) double-precision operations. The offload request can optionally be cut by a new parameter introduced in the configuration.
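The width arithmetic (8 lanes of 64-bit doubles giving one 512-bit operand) can be sketched behaviourally in Python. This is only a model under stated assumptions: the lane ordering (lane 0 in the least significant bits) and the helper names `pack_lanes`, `unpack_lanes` and `vector_add` are hypothetical and not taken from the PR.

```python
import struct

NUM_FPUS = 8    # one lane per FPU, as stated in the PR description
LANE_BITS = 64  # one IEEE 754 double per lane
LANE_MASK = (1 << LANE_BITS) - 1


def pack_lanes(values):
    """Pack 8 doubles into one 512-bit integer (assumed: lane 0 in the low bits)."""
    if len(values) != NUM_FPUS:
        raise ValueError("expected exactly 8 lane values")
    word = 0
    for lane, value in enumerate(values):
        bits = struct.unpack("<Q", struct.pack("<d", value))[0]
        word |= bits << (lane * LANE_BITS)
    return word


def unpack_lanes(word):
    """Split a 512-bit integer back into 8 doubles."""
    lanes = []
    for lane in range(NUM_FPUS):
        bits = (word >> (lane * LANE_BITS)) & LANE_MASK
        lanes.append(struct.unpack("<d", struct.pack("<Q", bits))[0])
    return lanes


def vector_add(a, b):
    """Model one 8x double addition: each lane is computed independently."""
    return pack_lanes([x + y for x, y in zip(unpack_lanes(a), unpack_lanes(b))])
```

Each lane is handled independently, which mirrors the idea of one FPU per 64-bit slice of the wide operand.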
* hw: add offload port to all relevant files
* hw: add proper handshaking to merge the control flow of the 8 FPUs
* tracer: extend to log offload operations
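The PR does not spell out how the control flow of the 8 FPUs is merged. One common way to combine several valid/ready interfaces into a single one is sketched below as a single-cycle combinational model in Python; the function `merge_handshake` and its signal names are hypothetical, and the actual RTL may use a different scheme.

```python
def merge_handshake(req_valid, lane_req_ready, lane_resp_valid, resp_ready):
    """Combine per-lane valid/ready signals into one offload-port handshake.

    Assumed scheme (not confirmed by the PR):
      * a request is accepted only when every lane can accept it, so all
        lanes start the same operation in the same cycle;
      * a response is presented only when every lane has produced its
        result, and it is popped from all lanes at once.
    """
    all_ready = all(lane_req_ready)
    all_done = all(lane_resp_valid)
    return {
        # Upstream sees a single ready: the AND of all lane readies.
        "req_ready": all_ready,
        # Each lane sees valid only if the whole group can take the request.
        "lane_req_valid": [req_valid and all_ready] * len(lane_req_ready),
        # Upstream sees a single valid: the AND of all lane valids.
        "resp_valid": all_done,
        # Lanes are drained together once the combined response is accepted.
        "lane_resp_ready": [resp_ready and all_done] * len(lane_resp_valid),
    }
```

In this model a single stalled lane holds back the whole group, which keeps the 8 lanes in lockstep. Note that the lane-side valid depends on ready here, which is a simplification; real RTL would typically avoid that combinational dependency, for example by buffering.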