Skip to content

Commit a679d6d

Browse files
committed
Merge remote-tracking branch 'flatiron/master' into HEAD
2 parents d68a661 + defdd48 commit a679d6d

File tree

12 files changed

+113
-107
lines changed

12 files changed

+113
-107
lines changed

CHANGELOG

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,11 @@
11
List of features / changes made / release notes, in reverse chronological order.
22
If not stated, FINUFFT is assumed (cuFINUFFT <=1.3 is listed separately).
33

4+
Master (9/10/24)
5+
6+
* reduced roundoff error in a[n] phase calc in CPU onedim_fseries_kernel().
7+
#534 (Barnett).
8+
49
V 2.3.0 (9/5/24)
510

611
* Switched C++ standards from C++14 to C++17, allowing various templating

docs/opts.rst

Lines changed: 17 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -20,17 +20,17 @@ to the simple, vectorized, or guru makeplan routines.
2020
Recall how to do this from C++:
2121

2222
.. code-block:: C++
23-
23+
2424
// (... set up M,x,c,tol,N, and allocate F here...)
25-
finufft_opts* opts;
26-
finufft_default_opts(opts);
27-
opts->debug = 1;
28-
int ier = finufft1d1(M,x,c,+1,tol,N,F,opts);
25+
finufft_opts opts;
26+
finufft_default_opts(&opts);
27+
opts.debug = 1;
28+
int ier = finufft1d1(M,x,c,+1,tol,N,F,&opts);
2929

3030
This setting produces more timing output to ``stdout``.
3131

3232
.. warning::
33-
33+
3434
In C/C++ and Fortran, don't forget to call the command which sets default options
3535
(``finufft_default_opts`` or ``finufftf_default_opts``)
3636
before you start changing them and passing them to FINUFFT.
@@ -51,9 +51,9 @@ Here are their default settings (from ``src/finufft.cpp:finufft_default_opts``):
5151
.. literalinclude:: ../src/finufft.cpp
5252
:start-after: @defopts_start
5353
:end-before: @defopts_end
54-
54+
5555
As for quick advice, the main options you'll want to play with are:
56-
56+
5757
- ``modeord`` to flip ("fftshift") the Fourier mode ordering
5858
- ``debug`` to look at timing output (to determine if your problem is spread/interpolation dominated, vs FFT dominated)
5959
- ``nthreads`` to run with a different number of threads than the current maximum available through OpenMP (a large number can sometimes be detrimental, and very small problems can sometimes run faster on 1 thread)
@@ -92,15 +92,15 @@ Data handling options
9292
.. note:: The index *sets* are the same in the two ``modeord`` choices; their ordering differs only by a cyclic shift. The FFT ordering cyclically shifts the CMCL indices $\mbox{floor}(N/2)$ to the left (often called an "fftshift").
9393

9494
**chkbnds**: [DEPRECATED] has no effect.
95-
95+
9696

9797
Diagnostic options
9898
~~~~~~~~~~~~~~~~~~~~~~~
9999

100100
**debug**: Controls the amount of overall debug/timing output to stdout.
101101

102102
* ``debug=0`` : silent
103-
103+
104104
* ``debug=1`` : print some information
105105

106106
* ``debug=2`` : prints more information
@@ -113,11 +113,11 @@ Diagnostic options
113113

114114
* ``spread_debug=2`` : prints lots. This can print thousands of lines since it includes one line per *subproblem*.
115115

116-
116+
117117
**showwarn**: Whether to print warnings (these go to stderr).
118-
118+
119119
* ``showwarn=0`` : suppresses such warnings
120-
120+
121121
* ``showwarn=1`` : prints warnings
122122

123123

@@ -173,16 +173,16 @@ for only two settings, as follows. Otherwise, setting it to zero chooses a good
173173
**spread_thread**: in the case of multiple transforms per call (``ntr>1``, or the "many" interfaces), controls how multithreading is used to spread/interpolate each batch of data.
174174

175175
* ``spread_thread=0`` : makes an automatic choice between the below. Recommended.
176-
176+
177177
* ``spread_thread=1`` : acts on each vector in the batch in sequence, using multithreaded spread/interpolate on that vector. It can be slightly better than ``2`` for large problems.
178-
178+
179179
* ``spread_thread=2`` : acts on all vectors in a batch (of size chosen typically to be the number of threads) simultaneously, assigning each a thread which performs a single-threaded spread/interpolate. It is much better than ``1`` for all but large problems. (Historical note: this was used by Melody Shih for the original "2dmany" interface in 2018.)
180180

181181
.. note::
182-
182+
183183
Historical note: A former option ``3`` has been removed. This was like ``2`` except allowing nested OMP parallelism, so multi-threaded spread-interpolate was used for each of the vectors in a batch in parallel. This was used by Andrea Malleo in 2019. We have not yet found a case where this beats both ``1`` and ``2``, hence removed it due to complications with changing the OMP nesting state in both old and new OMP versions.
184184

185-
185+
186186
**maxbatchsize**: in the case of multiple transforms per call (``ntr>1``, or the "many" interfaces), set the largest batch size of data vectors.
187187
Here ``0`` makes an automatic choice. If you are unhappy with this, then for small problems it should equal the number of threads, while for large problems it appears that ``1`` often better (since otherwise too much simultaneous RAM movement occurs). Some further work is needed to optimize this parameter.
188188

include/cufinufft/common.h

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -41,8 +41,8 @@ template<typename T>
4141
void cufinufft_setup_binsize(int type, int ns, int dim, cufinufft_opts *opts);
4242

4343
template<typename T, typename V>
44-
auto cufinufft_set_shared_memory(V *kernel, const int dim,
45-
const cufinufft_plan_t<T> &d_plan) {
44+
int cufinufft_set_shared_memory(V *kernel, const int dim,
45+
const cufinufft_plan_t<T> &d_plan) {
4646
/**
4747
* WARNING: this function does not handle cuda errors. The caller should check them.
4848
*/

include/finufft.h

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -35,10 +35,6 @@
3535
#include <stdint.h>
3636
#define FINUFFT_BIGINT int64_t
3737

38-
#ifndef __cplusplus
39-
#include <stdbool.h> // for bool type in C (needed for item in plan struct)
40-
#endif
41-
4238
// this macro name has to be safe since exposed to user
4339
#define FINUFFT_SINGLE
4440
#include <finufft_eitherprec.h>

include/finufft/defs.h

Lines changed: 46 additions & 37 deletions
Original file line numberDiff line numberDiff line change
@@ -24,18 +24,16 @@
2424

2525
// All indexing in library that potentially can exceed 2^31 uses 64-bit signed.
2626
// This includes all calling arguments (eg M,N) that could be huge someday.
27-
#define BIGINT int64_t
28-
#define UBIGINT uint64_t
27+
using BIGINT = int64_t;
28+
using UBIGINT = uint64_t;
2929
// Precision-independent real and complex types, for private lib/test compile
3030
#ifdef SINGLE
31-
#define FLT float
31+
using FLT = float;
3232
#else
33-
#define FLT double
33+
using FLT = double;
3434
#endif
35-
// next line possibly obsolete...
36-
#define _USE_MATH_DEFINES
3735
#include <complex> // we define C++ complex type only
38-
#define CPX std::complex<FLT>
36+
using CPX = std::complex<FLT>;
3937

4038
// inline macro, to force inlining of small functions
4139
// this avoids the use of macros to implement functions
@@ -65,44 +63,49 @@
6563
// ------------- Library-wide algorithm parameter settings ----------------
6664

6765
// Library version (is a string)
68-
#define FINUFFT_VER "2.3.0"
66+
#define FINUFFT_VER "2.3.0"
6967

7068
// Smallest possible kernel spread width per dimension, in fine grid points
7169
// (used only in spreadinterp.cpp)
72-
#define MIN_NSPREAD 2
70+
inline constexpr int MIN_NSPREAD = 2;
7371

7472
// Largest possible kernel spread width per dimension, in fine grid points
7573
// (used only in spreadinterp.cpp)
76-
#define MAX_NSPREAD 16
74+
inline constexpr int MAX_NSPREAD = 16;
7775

7876
// Fraction growth cut-off in utils:arraywidcen, sets when translate in type-3
79-
#define ARRAYWIDCEN_GROWFRAC 0.1
77+
inline constexpr double ARRAYWIDCEN_GROWFRAC = 0.1;
8078

8179
// Max number of positive quadr nodes for kernel FT (used only in common.cpp)
82-
#define MAX_NQUAD 100
80+
inline constexpr int MAX_NQUAD = 100;
8381

8482
// Internal (nf1 etc) array allocation size that immediately raises error.
8583
// (Note: next235 takes 1s for 1e11, so it is also to prevent hang here.)
8684
// Increase this if you need >10TB (!) RAM...
87-
#define MAX_NF (BIGINT)1e12
85+
inline constexpr BIGINT MAX_NF = BIGINT(1e12);
8886

8987
// Maximum allowed number M of NU points; useful to catch incorrectly cast int32
9088
// values for M = nj (also nk in type 3)...
91-
#define MAX_NU_PTS (BIGINT)1e14
89+
inline constexpr BIGINT MAX_NU_PTS = BIGINT(1e14);
9290

9391
// -------------- Math consts (not in math.h) and useful math macros ----------
9492
#include <math.h>
9593

9694
// either-precision unit imaginary number...
9795
#define IMA (CPX(0.0, 1.0))
98-
// using namespace std::complex_literals; // needs C++14, provides 1i, 1if
96+
97+
// MR: In the longer term I suggest to move
98+
// away from M_PI, which was never part of the standard.
99+
// Perhaps a constexpr pi in the namespace finufft, or a constexpr finufft_pi
100+
// if no namespaces are used?
101+
// In C++20 these constants will be part of the language, and the problem will go away.
99102
#ifndef M_PI // Windows apparently doesn't have this const
100103
#define M_PI 3.14159265358979329
101104
#endif
102105
#define M_1_2PI 0.159154943091895336
103106
#define M_2PI 6.28318530717958648
104107
// to avoid mixed precision operators in eg i*pi, an either-prec PI...
105-
#define PI (FLT) M_PI
108+
#define PI FLT(M_PI)
106109

107110
// machine epsilon for decisions of achievable tolerance...
108111
#ifdef SINGLE
@@ -115,36 +118,45 @@
115118
// These macros should probably be replaced by modern C++ std lib or random123.
116119
// (RAND_MAX is in stdlib.h)
117120
#include <stdlib.h>
118-
// #define rand01() (((FLT)(rand()%RAND_MAX))/RAND_MAX)
119-
#define rand01() ((FLT)rand() / (FLT)RAND_MAX)
121+
static inline FLT rand01() { return FLT(rand()) / FLT(RAND_MAX); }
120122
// unif[-1,1]:
121-
#define randm11() (2 * rand01() - (FLT)1.0)
123+
static inline FLT randm11() { return 2 * rand01() - FLT(1); }
122124
// complex unif[-1,1] for Re and Im:
123-
#define crandm11() (randm11() + IMA * randm11())
125+
static inline CPX crandm11() { return randm11() + IMA * randm11(); }
124126

125127
// Thread-safe seed-carrying versions of above (x is ptr to seed)...
128+
// MR: we have to leave those as macros for now, as "rand_r" is deprecated
129+
// and apparently no longer available on Windows.
130+
#if 1
126131
#define rand01r(x) ((FLT)rand_r(x) / (FLT)RAND_MAX)
127132
// unif[-1,1]:
128133
#define randm11r(x) (2 * rand01r(x) - (FLT)1.0)
129134
// complex unif[-1,1] for Re and Im:
130135
#define crandm11r(x) (randm11r(x) + IMA * randm11r(x))
136+
#else
137+
static inline FLT rand01r(unsigned int *x) { return FLT(rand_r(x)) / FLT(RAND_MAX); }
138+
// unif[-1,1]:
139+
static inline FLT randm11r(unsigned int *x) { return 2 * rand01r(x) - FLT(1); }
140+
// complex unif[-1,1] for Re and Im:
141+
static inline CPX crandm11r(unsigned int *x) { return randm11r(x) + IMA * randm11r(x); }
142+
#endif
131143

132144
// ----- OpenMP macros which also work when omp not present -----
133145
// Allows compile-time switch off of openmp, so compilation without any openmp
134146
// is done (Note: _OPENMP is automatically set by -fopenmp compile flag)
135147
#ifdef _OPENMP
136148
#include <omp.h>
137149
// point to actual omp utils
138-
#define MY_OMP_GET_NUM_THREADS() omp_get_num_threads()
139-
#define MY_OMP_GET_MAX_THREADS() omp_get_max_threads()
140-
#define MY_OMP_GET_THREAD_NUM() omp_get_thread_num()
141-
#define MY_OMP_SET_NUM_THREADS(x) omp_set_num_threads(x)
150+
static inline int MY_OMP_GET_NUM_THREADS() { return omp_get_num_threads(); }
151+
static inline int MY_OMP_GET_MAX_THREADS() { return omp_get_max_threads(); }
152+
static inline int MY_OMP_GET_THREAD_NUM() { return omp_get_thread_num(); }
153+
static inline void MY_OMP_SET_NUM_THREADS(int x) { omp_set_num_threads(x); }
142154
#else
143155
// non-omp safe dummy versions of omp utils...
144-
#define MY_OMP_GET_NUM_THREADS() 1
145-
#define MY_OMP_GET_MAX_THREADS() 1
146-
#define MY_OMP_GET_THREAD_NUM() 0
147-
#define MY_OMP_SET_NUM_THREADS(x)
156+
static inline int MY_OMP_GET_NUM_THREADS() { return 1; }
157+
static inline int MY_OMP_GET_MAX_THREADS() { return 1; }
158+
static inline int MY_OMP_GET_THREAD_NUM() { return 0; }
159+
static inline void MY_OMP_SET_NUM_THREADS(int) {}
148160
#endif
149161

150162
// Prec-switching name macros (respond to SINGLE), used in lib & test sources
@@ -194,12 +206,11 @@
194206
#include <finufft/fft.h> // (must come after complex.h)
195207

196208
// group together a bunch of type 3 rescaling/centering/phasing parameters:
197-
#define TYPE3PARAMS FINUFFTIFY(_type3Params)
198-
typedef struct {
199-
FLT X1, C1, D1, h1, gam1; // x dim: X=halfwid C=center D=freqcen h,gam=rescale
200-
FLT X2, C2, D2, h2, gam2; // y
201-
FLT X3, C3, D3, h3, gam3; // z
202-
} TYPE3PARAMS;
209+
template<typename T> struct type3params {
210+
T X1, C1, D1, h1, gam1; // x dim: X=halfwid C=center D=freqcen h,gam=rescale
211+
T X2, C2, D2, h2, gam2; // y
212+
T X3, C3, D3, h3, gam3; // z
213+
};
203214

204215
typedef struct FINUFFT_PLAN_S { // the main plan object, fully C++
205216

@@ -243,7 +254,7 @@ typedef struct FINUFFT_PLAN_S { // the main plan object, fully C++
243254
CPX *deconv; // reciprocal of kernel FT, phase, all output NU pts
244255
CPX *CpBatch; // working array of prephased strengths
245256
FLT *Sp, *Tp, *Up; // internal primed targs (s'_k, etc), allocated
246-
TYPE3PARAMS t3P; // groups together type 3 shift, scale, phase, parameters
257+
type3params<FLT> t3P; // groups together type 3 shift, scale, phase, parameters
247258
FINUFFT_PLAN innerT2plan; // ptr used for type 2 in step 2 of type 3
248259

249260
// other internal structs; each is C-compatible of course
@@ -255,6 +266,4 @@ typedef struct FINUFFT_PLAN_S { // the main plan object, fully C++
255266

256267
} FINUFFT_PLAN_S;
257268

258-
#undef TYPE3PARAMS
259-
260269
#endif // DEFS_H

include/finufft/fft.h

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -3,10 +3,10 @@
33

44
#ifdef FINUFFT_USE_DUCC0
55
#include "ducc0/fft/fftnd_impl.h"
6-
#define FFTW_FORGET_WISDOM() // temporary hack since some tests call this unconditionally
7-
#define FFTW_CLEANUP() // temporary hack since some tests call this unconditionally
8-
#define FFTW_CLEANUP_THREADS() // temporary hack since some tests call this
9-
// unconditionally
6+
// temporary hacks to allow compilation of tests that assume FFTW is used
7+
static inline void FFTW_FORGET_WISDOM() {}
8+
static inline void FFTW_CLEANUP() {}
9+
static inline void FFTW_CLEANUP_THREADS() {}
1010
#else
1111
#include "fftw_defs.h"
1212
#endif

include/finufft/fftw_defs.h

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -22,11 +22,7 @@
2222
// now use this tool (note we replaced typedefs v<=2.0.4, in favor of macros):
2323
#define FFTW_CPX FFTWIFY(complex)
2424
#define FFTW_PLAN FFTWIFY(plan)
25-
#define FFTW_ALLOC_RE FFTWIFY(alloc_real)
2625
#define FFTW_ALLOC_CPX FFTWIFY(alloc_complex)
27-
#define FFTW_PLAN_1D FFTWIFY(plan_dft_1d)
28-
#define FFTW_PLAN_2D FFTWIFY(plan_dft_2d)
29-
#define FFTW_PLAN_3D FFTWIFY(plan_dft_3d)
3026
#define FFTW_PLAN_MANY_DFT FFTWIFY(plan_many_dft)
3127
#define FFTW_EX FFTWIFY(execute)
3228
#define FFTW_DE FFTWIFY(destroy_plan)

include/finufft/spreadinterp.h

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -20,10 +20,12 @@
2020
NOTE: non-zero values are for experts only, since
2121
NUMERICAL OUTPUT MAY BE INCORRECT UNLESS finufft_spread_opts.flags=0 !
2222
*/
23-
#define TF_OMIT_WRITE_TO_GRID 1 // don't add subgrids to out grid (dir=1)
24-
#define TF_OMIT_EVALUATE_KERNEL 2 // don't evaluate the kernel at all
25-
#define TF_OMIT_EVALUATE_EXPONENTIAL 4 // omit exp() in kernel (kereval=0 only)
26-
#define TF_OMIT_SPREADING 8 // don't interp/spread (dir=1: to subgrids)
23+
enum {
24+
TF_OMIT_WRITE_TO_GRID = 1, // don't add subgrids to out grid (dir=1)
25+
TF_OMIT_EVALUATE_KERNEL = 2, // don't evaluate the kernel at all
26+
TF_OMIT_EVALUATE_EXPONENTIAL = 4, // omit exp() in kernel (kereval=0 only)
27+
TF_OMIT_SPREADING = 8 // don't interp/spread (dir=1: to subgrids)
28+
};
2729

2830
namespace finufft {
2931
namespace spreadinterp {

include/finufft_eitherprec.h

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,6 @@
2424

2525
// decide which kind of complex numbers FINUFFT_CPX is (four options)
2626
#ifdef __cplusplus
27-
#define _USE_MATH_DEFINES
2827
#include <complex> // C++ type
2928
#define FINUFFT_COMPLEXIFY(X) std::complex<X>
3029
#else
@@ -183,4 +182,3 @@ FINUFFT_EXPORT int FINUFFT_CDECL FINUFFTIFY(3d3many)(
183182
#undef FINUFFT_CPX
184183
#undef FINUFFT_PLAN
185184
#undef FINUFFT_PLAN_S
186-
#undef FINUFFT_TYPE3PARAMS

0 commit comments

Comments
 (0)