Skip to content

duc - Pipeline PhasedAccu for n>8 and allow non-log(2) values of n when n>8 #170

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

JILAelec
Copy link

This patch was added with the Artiq EFC LTC2K variant in order to get timing closure for n>8. We also needed non-log(2) phases so that requirement is also lifted with the patch.

@sbourdeauducq sbourdeauducq requested a review from Spaqin July 16, 2025 13:48
@Spaqin
Copy link
Contributor

Spaqin commented Jul 17, 2025

Looks good to me, nice to see tests extended as well.

@@ -205,11 +205,13 @@ def __init__(self, width, constants):

###

self.latency = 2 if n > 8 else 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latency should be 0 for comb.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what happened here is that I wrote the latency of the PhasedAccu in here rather than just the MCM. I think that results in the following:

PhasedAccu latency 0 for n<=8, 2 for n>8
MCM latency 0 for n<=8, 1 for n>8

I think it makes sense to give both classes a latency attribute so this is clear to users even if there's currently no code checking for latency attributes in these classes.

@@ -223,11 +225,25 @@ def __init__(self, width, constants):
if n > 5:
ctx += o[5].eq(i + (i << 2))
if n > 6:
ctx += o[6].eq(o[3] << 1)
ctx += o[6].eq((i << 2) + (i << 1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this actually make a difference?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes if the context is sync.

if n > 14:
ctx += o[14].eq((i << 3) + (i << 2) + (i << 1))
if n > 15:
ctx += o[15].eq(i + (i << 3) + (i << 2) + (i << 1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the entire list be generated programmatically?

Also I'd have thought Vivado would do these optimizations already if you simply write *3, *4, *5, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skimmed through docs and haven't found a definite answer. Even if so, I wouldn't want to risk Vivado messing it up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's easy to check, and issues would fail timing closure, and we rely on vivado for more complicated things than that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point; replaced it with a simple loop with *x syntax, TNS fails severely:

------------------------------------------------------------------------------------------------
| Design Timing Summary
| ---------------------
------------------------------------------------------------------------------------------------

    WNS(ns)      TNS(ns)  TNS Failing Endpoints  TNS Total Endpoints      WHS(ns)      THS(ns)  THS Failing Endpoints  THS Total Endpoints     WPWS(ns)     TPWS(ns)  TPWS Failing Endpoints  TPWS Total Endpoints  
    -------      -------  ---------------------  -------------------      -------      -------  ---------------------  -------------------     --------     --------  ----------------------  --------------------  
     -0.651     -325.990                   1640                72371        0.042        0.000                      0                72364       -0.259       -4.802                      34                 33075  


Timing constraints are not met.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What design is that? Take a look at the netlist as well to confirm what Vivado did.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this could use a code comment explaining the situation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that vivado isn't smart enough to use shift and add operations when optimizing multiplication. Instead, it uses a full multiplier and fails timing as a consequence.

One option I considered was to add a "use_dsp" synthesis attribute to use DSP slices which can meet timing even for larger n, especially on newer FPGAs. At the end, though, I went back to shift and add to stick with the principle that is already in use.

A way to generate the list programmatically is to use Canonical Signed Digit (CSD) representations to find the optimal decomposition. If this is of interest I'll be happy to create a small n multiplier based on CSD for review. Here's a link with some more info:
https://www.allaboutcircuits.com/technical-articles/an-introduction-to-canonical-signed-digit-representation/

Copy link
Contributor

@Spaqin Spaqin Jul 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the netlist and it does use a DSP48E1 slice if you just use "*". It does meet timing, with a caveat.

Regardless of method used for MCM, EFC for LTC synthesis fails to meet timing at few points with 125MHz clock (OK on 100MHz). I was looking at the 125MHz output with the initial report - in this case, using the DSP made the timing issues worse. Regardless, this should be fine and the timing failure is in another spot in ARTIQ side that I will fix next.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds wasteful to allocate DSP slices for those multiplications anyway.

@sbourdeauducq
Copy link
Member

Are these modifications compatible with https://github.com/quartiq/phaser ?

@Spaqin
Copy link
Contributor

Spaqin commented Jul 22, 2025

Are these modifications compatible with https://github.com/quartiq/phaser ?

Phaser gateware is fixed on early 2021 misoc/migen

Copy link
Member

@jordens jordens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments as on #172

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants