Gpu type 3 #517
Merged: DiamonDinoia merged 82 commits into flatironinstitute:master from DiamonDinoia:gpu-type-3 on Sep 12, 2024.
Commits (82, all by DiamonDinoia; messages verbatim):

- 45333fa basic benchmarks
- b95a082 added plotting script
- ae55ca5 optimised plotting
- 16e27f0 fixed plotting and metrics
- 49d1f21 fixed the plot script
- 2fdae68 bin_size_x is as function of the shared memory available
- c0d9923 bin_size_x is as function of the shared memory available
- 907797c minor optimizations in 1D
- 60f4780 otpimized nupts driven
- 35dcc66 Optimized 1D and 2D
- e1ad9bb Merge branch 'master' into gpu-optimizations
- 366295d 3D integer operations
- 24bf6be 3D SM and GM optimized
- 960117a bump cuda version
- 4295a86 Merge remote-tracking branch 'flatiron/master' into gpu-optimizations
- c1b14c6 changed matlab to generate necessary cuda upsampfact files
- f300d2d added new coeffs
- e86c762 Merge remote-tracking branch 'refs/remotes/origin/gpu-optimizations' …
- db0457a restoring .m from master
- d0ce11e updated hook
- 513ce4b updated matlab upsampfact
- 798717d updated coefficients
- 282baf5 new coeffs
- 12822a2 updated cufinufft to new coeff
- badf22f Merge remote-tracking branch 'flatiron/master' into gpu-optimizations
- bf6328b Merge remote-tracking branch 'flatiron/master' into gpu-optimizations
- ae783da picked good defaults for method
- d29fcf5 update configuration
- 73f937b upated build system
- 0724866 fixing jenkins
- 8cd50fc using cuda 11.2
- 49a9d7e using sm90 atomics
- 041a536 updated script
- 54683c3 fixed bin sizes
- 4f19103 Merge branch 'master' into gpu-optimizations
- dc3a628 using floor in fold_rescale updated changelog
- b3237f7 fixed a mistake
- db80aad added comments for review
- c225fb5 fixing review comments
- 394550f Merge remote-tracking branch 'flatiron/master' into gpu-optimizations
- 5606aa0 merged master
- 74ccd71 fixed cmake
- ee28d05 Gcc-9 fixes; Ker size fixed too
- 466ddff windows compatibility tweak; unit testing the 1.25 upsampfact
- 3f60ca4 Merge remote-tracking branch 'flatiron/master' into gpu-optimizations
- fb48ff8 added forgotten c++17 flag
- 8c42061 Merge remote-tracking branch 'flatiron/master' into gpu-type-3
- b64f68e Preliminary type 3 commit. Incomplete setpts but greatly simplifies t…
- 7c810a5 Merge remote-tracking branch 'flatiron/master' into gpu-type-3
- 9d44993 testing
- 074dda5 Adding prephase and deconv with tests
- 332b5b7 first 3D working version
- 53a7c63 First working version, Horner breaks
- 9f517e3 Type 3 working
- 096cf1e added 1D&2d type 3
- 3cfe406 fixed tests for type3
- 1842f68 fixed possible memory leaks
- c13a6a9 minor changes, mainly for debug
- f0a0fa4 small fixes
- 066906e adding debug prints
- 6da956b testing inner plan2 & using cudamemcpyasync
- 6098edc testing the intter type 2 completely
- e89a4f9 fixed type 3 without horner
- fe1da53 type3 many support
- d415f0d type3 many tests for one target
- 289fb4f updated docstring
- bca0a73 removed small transf tests
- d29cbba XMerge remote-tracking branch 'flatiron/master' into gpu-type-3
- 71ad464 added extended lambda flag to tests
- a494518 CleanUP
- 5788320 Updated changelog
- 4c7388e fixed printf warning
- 46eb1d4 restored fftw behaviour
- 0ada7a0 Added devnotes on the issue
- 671e4ac removed sprurious changes
- 7a7cff5 Minor cleanup
- 9b0da66 fixed math test
- d3d4d34 Addressed review comments
- 52cd6cc Merge remote-tracking branch 'flatiron/master' into gpu-type-3
- 1355818 splitting onedim_f_series in two functions
- bc64a92 GPU flipwind type 1-2; fseries and nuft renaming to match CPU code
- 96980d3 fixed complex math test
New file added by this PR (diff `@@ -0,0 +1,173 @@`): `helper_math.h`

```cpp
#ifndef FINUFFT_INCLUDE_CUFINUFFT_CONTRIB_HELPER_MATH_H
#define FINUFFT_INCLUDE_CUFINUFFT_CONTRIB_HELPER_MATH_H

#include <cuComplex.h>

// This header provides some helper functions for cuComplex types.
// It mainly wraps existing CUDA implementations to provide operator overloads
// e.g. cuAdd, cuSub, cuMul, cuDiv, cuCreal, cuCimag, cuCabs, cuCarg, cuConj are all
// provided by CUDA

// Addition for cuDoubleComplex (double) with cuDoubleComplex (double)
__host__ __device__ __forceinline__ cuDoubleComplex operator+(
    const cuDoubleComplex &a, const cuDoubleComplex &b) noexcept {
  return cuCadd(a, b);
}

// Subtraction for cuDoubleComplex (double) with cuDoubleComplex (double)
__host__ __device__ __forceinline__ cuDoubleComplex operator-(
    const cuDoubleComplex &a, const cuDoubleComplex &b) noexcept {
  return cuCsub(a, b);
}

// Multiplication for cuDoubleComplex (double) with cuDoubleComplex (double)
__host__ __device__ __forceinline__ cuDoubleComplex operator*(
    const cuDoubleComplex &a, const cuDoubleComplex &b) noexcept {
  return cuCmul(a, b);
}

// Division for cuDoubleComplex (double) with cuDoubleComplex (double)
__host__ __device__ __forceinline__ cuDoubleComplex operator/(
    const cuDoubleComplex &a, const cuDoubleComplex &b) noexcept {
  return cuCdiv(a, b);
}

// Equality for cuDoubleComplex (double) with cuDoubleComplex (double)
__host__ __device__ __forceinline__ bool operator==(const cuDoubleComplex &a,
                                                    const cuDoubleComplex &b) noexcept {
  return cuCreal(a) == cuCreal(b) && cuCimag(a) == cuCimag(b);
}

// Inequality for cuDoubleComplex (double) with cuDoubleComplex (double)
__host__ __device__ __forceinline__ bool operator!=(const cuDoubleComplex &a,
                                                    const cuDoubleComplex &b) noexcept {
  return !(a == b);
}

// Addition for cuDoubleComplex (double) with double
__host__ __device__ __forceinline__ cuDoubleComplex operator+(const cuDoubleComplex &a,
                                                              double b) noexcept {
  return make_cuDoubleComplex(cuCreal(a) + b, cuCimag(a));
}

__host__ __device__ __forceinline__ cuDoubleComplex operator+(
    double a, const cuDoubleComplex &b) noexcept {
  return make_cuDoubleComplex(a + cuCreal(b), cuCimag(b));
}

// Subtraction for cuDoubleComplex (double) with double
__host__ __device__ __forceinline__ cuDoubleComplex operator-(const cuDoubleComplex &a,
                                                              double b) noexcept {
  return make_cuDoubleComplex(cuCreal(a) - b, cuCimag(a));
}

__host__ __device__ __forceinline__ cuDoubleComplex operator-(
    double a, const cuDoubleComplex &b) noexcept {
  return make_cuDoubleComplex(a - cuCreal(b), -cuCimag(b));
}

// Multiplication for cuDoubleComplex (double) with double
__host__ __device__ __forceinline__ cuDoubleComplex operator*(const cuDoubleComplex &a,
                                                              double b) noexcept {
  return make_cuDoubleComplex(cuCreal(a) * b, cuCimag(a) * b);
}

__host__ __device__ __forceinline__ cuDoubleComplex operator*(
    double a, const cuDoubleComplex &b) noexcept {
  return make_cuDoubleComplex(a * cuCreal(b), a * cuCimag(b));
}

// Division for cuDoubleComplex (double) with double
__host__ __device__ __forceinline__ cuDoubleComplex operator/(const cuDoubleComplex &a,
                                                              double b) noexcept {
  return make_cuDoubleComplex(cuCreal(a) / b, cuCimag(a) / b);
}

__host__ __device__ __forceinline__ cuDoubleComplex operator/(
    double a, const cuDoubleComplex &b) noexcept {
  double denom = cuCreal(b) * cuCreal(b) + cuCimag(b) * cuCimag(b);
  return make_cuDoubleComplex((a * cuCreal(b)) / denom, (-a * cuCimag(b)) / denom);
}

// Addition for cuFloatComplex (float) with cuFloatComplex (float)
__host__ __device__ __forceinline__ cuFloatComplex operator+(
    const cuFloatComplex &a, const cuFloatComplex &b) noexcept {
  return cuCaddf(a, b);
}

// Subtraction for cuFloatComplex (float) with cuFloatComplex (float)
__host__ __device__ __forceinline__ cuFloatComplex operator-(
    const cuFloatComplex &a, const cuFloatComplex &b) noexcept {
  return cuCsubf(a, b);
}

// Multiplication for cuFloatComplex (float) with cuFloatComplex (float)
__host__ __device__ __forceinline__ cuFloatComplex operator*(
    const cuFloatComplex &a, const cuFloatComplex &b) noexcept {
  return cuCmulf(a, b);
}

// Division for cuFloatComplex (float) with cuFloatComplex (float)
__host__ __device__ __forceinline__ cuFloatComplex operator/(
    const cuFloatComplex &a, const cuFloatComplex &b) noexcept {
  return cuCdivf(a, b);
}

// Equality for cuFloatComplex (float) with cuFloatComplex (float)
__host__ __device__ __forceinline__ bool operator==(const cuFloatComplex &a,
                                                    const cuFloatComplex &b) noexcept {
  return cuCrealf(a) == cuCrealf(b) && cuCimagf(a) == cuCimagf(b);
}

// Inequality for cuFloatComplex (float) with cuFloatComplex (float)
__host__ __device__ __forceinline__ bool operator!=(const cuFloatComplex &a,
                                                    const cuFloatComplex &b) noexcept {
  return !(a == b);
}

// Addition for cuFloatComplex (float) with float
__host__ __device__ __forceinline__ cuFloatComplex operator+(const cuFloatComplex &a,
                                                             float b) noexcept {
  return make_cuFloatComplex(cuCrealf(a) + b, cuCimagf(a));
}

__host__ __device__ __forceinline__ cuFloatComplex operator+(
    float a, const cuFloatComplex &b) noexcept {
  return make_cuFloatComplex(a + cuCrealf(b), cuCimagf(b));
}

// Subtraction for cuFloatComplex (float) with float
__host__ __device__ __forceinline__ cuFloatComplex operator-(const cuFloatComplex &a,
                                                             float b) noexcept {
  return make_cuFloatComplex(cuCrealf(a) - b, cuCimagf(a));
}

__host__ __device__ __forceinline__ cuFloatComplex operator-(
    float a, const cuFloatComplex &b) noexcept {
  return make_cuFloatComplex(a - cuCrealf(b), -cuCimagf(b));
}

// Multiplication for cuFloatComplex (float) with float
__host__ __device__ __forceinline__ cuFloatComplex operator*(const cuFloatComplex &a,
                                                             float b) noexcept {
  return make_cuFloatComplex(cuCrealf(a) * b, cuCimagf(a) * b);
}

__host__ __device__ __forceinline__ cuFloatComplex operator*(
    float a, const cuFloatComplex &b) noexcept {
  return make_cuFloatComplex(a * cuCrealf(b), a * cuCimagf(b));
}

// Division for cuFloatComplex (float) with float
__host__ __device__ __forceinline__ cuFloatComplex operator/(const cuFloatComplex &a,
                                                             float b) noexcept {
  return make_cuFloatComplex(cuCrealf(a) / b, cuCimagf(a) / b);
}

__host__ __device__ __forceinline__ cuFloatComplex operator/(
    float a, const cuFloatComplex &b) noexcept {
  float denom = cuCrealf(b) * cuCrealf(b) + cuCimagf(b) * cuCimagf(b);
  return make_cuFloatComplex((a * cuCrealf(b)) / denom, (-a * cuCimagf(b)) / denom);
}

#endif // FINUFFT_INCLUDE_CUFINUFFT_CONTRIB_HELPER_MATH_H
```