docs/opts.rst: 17 additions & 17 deletions
--- a/docs/opts.rst
+++ b/docs/opts.rst
@@ -20,17 +20,17 @@ to the simple, vectorized, or guru makeplan routines.
 Recall how to do this from C++:
 
 .. code-block:: C++
-
+
   // (... set up M,x,c,tol,N, and allocate F here...)
-  finufft_opts* opts;
-  finufft_default_opts(opts);
-  opts->debug = 1;
-  int ier = finufft1d1(M,x,c,+1,tol,N,F,opts);
+  finufft_opts opts;
+  finufft_default_opts(&opts);
+  opts.debug = 1;
+  int ier = finufft1d1(M,x,c,+1,tol,N,F,&opts);
 
 This setting produces more timing output to ``stdout``.
 
 .. warning::
-
+
    In C/C++ and Fortran, don't forget to call the command which sets default options
    (``finufft_default_opts`` or ``finufftf_default_opts``)
    before you start changing them and passing them to FINUFFT.
@@ -51,9 +51,9 @@ Here are their default settings (from ``src/finufft.cpp:finufft_default_opts``):
 .. literalinclude:: ../src/finufft.cpp
    :start-after: @defopts_start
    :end-before: @defopts_end
-
+
 As for quick advice, the main options you'll want to play with are:
-
+
 - ``modeord`` to flip ("fftshift") the Fourier mode ordering
 - ``debug`` to look at timing output (to determine if your problem is spread/interpolation dominated, vs FFT dominated)
 - ``nthreads`` to run with a different number of threads than the current maximum available through OpenMP (a large number can sometimes be detrimental, and very small problems can sometimes run faster on 1 thread)
@@ -92,15 +92,15 @@ Data handling options
 .. note:: The index *sets* are the same in the two ``modeord`` choices; their ordering differs only by a cyclic shift. The FFT ordering cyclically shifts the CMCL indices $\mbox{floor}(N/2)$ to the left (often called an "fftshift").
 
 **chkbnds**: [DEPRECATED] has no effect.
-
+
 
 Diagnostic options
 ~~~~~~~~~~~~~~~~~~~~~~~
 
 **debug**: Controls the amount of overall debug/timing output to stdout.
 
 * ``debug=0`` : silent
-
+
 * ``debug=1`` : print some information
 
 * ``debug=2`` : prints more information
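The ``modeord`` note in the hunk above says that the FFT ordering is the CMCL ordering cyclically shifted floor(N/2) places to the left. A minimal sketch of that relationship follows. It assumes the CMCL ordering is the increasing run of integers starting at -floor(N/2), which this diff does not restate. The helper names ``cmcl_mode_order`` and ``fft_mode_order`` are hypothetical and are not part of FINUFFT.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not a FINUFFT function): mode indices in increasing
// order, assumed here to start at -floor(N/2) and to contain N entries.
std::vector<long> cmcl_mode_order(long N) {
    assert(N > 0);
    std::vector<long> k(N);
    const long kmin = -(N / 2);  // integer division is floor(N/2) for N > 0
    for (long j = 0; j < N; ++j) k[j] = kmin + j;
    return k;
}

// Hypothetical helper: the same index set, cyclically shifted floor(N/2)
// places to the left, so that index 0 comes first.
std::vector<long> fft_mode_order(long N) {
    std::vector<long> k = cmcl_mode_order(N);
    std::rotate(k.begin(), k.begin() + N / 2, k.end());
    return k;
}
```

Under these assumptions, ``N = 4`` gives ``-2, -1, 0, 1`` in the increasing ordering and ``0, 1, -2, -1`` after the shift, while ``N = 5`` gives ``-2, -1, 0, 1, 2`` and ``0, 1, 2, -2, -1``. The index sets are identical in both cases and only the ordering changes.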
@@ -113,11 +113,11 @@ Diagnostic options
 
 * ``spread_debug=2`` : prints lots. This can print thousands of lines since it includes one line per *subproblem*.
 
-
+
 **showwarn**: Whether to print warnings (these go to stderr).
-
+
 * ``showwarn=0`` : suppresses such warnings
-
+
 * ``showwarn=1`` : prints warnings
 
 
@@ -173,16 +173,16 @@ for only two settings, as follows. Otherwise, setting it to zero chooses a good
 **spread_thread**: in the case of multiple transforms per call (``ntr>1``, or the "many" interfaces), controls how multithreading is used to spread/interpolate each batch of data.
 
 * ``spread_thread=0`` : makes an automatic choice between the below. Recommended.
-
+
 * ``spread_thread=1`` : acts on each vector in the batch in sequence, using multithreaded spread/interpolate on that vector. It can be slightly better than ``2`` for large problems.
-
+
 * ``spread_thread=2`` : acts on all vectors in a batch (of size chosen typically to be the number of threads) simultaneously, assigning each a thread which performs a single-threaded spread/interpolate. It is much better than ``1`` for all but large problems. (Historical note: this was used by Melody Shih for the original "2dmany" interface in 2018.)
 
 .. note::
-
+
    Historical note: A former option ``3`` has been removed. This was like ``2`` except allowing nested OMP parallelism, so multi-threaded spread-interpolate was used for each of the vectors in a batch in parallel. This was used by Andrea Malleo in 2019. We have not yet found a case where this beats both ``1`` and ``2``, hence removed it due to complications with changing the OMP nesting state in both old and new OMP versions.
 
-
+
 **maxbatchsize**: in the case of multiple transforms per call (``ntr>1``, or the "many" interfaces), set the largest batch size of data vectors.
 Here ``0`` makes an automatic choice. If you are unhappy with this, then for small problems it should equal the number of threads, while for large problems it appears that ``1`` is often better (since otherwise too much simultaneous RAM movement occurs). Some further work is needed to optimize this parameter.
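To make "largest batch size" concrete, here is a small sketch of how ``ntr`` data vectors would split into consecutive batches of at most ``maxbatchsize`` vectors each. It is an illustration only: the helper ``batch_sizes`` is hypothetical, it does not model the automatic choice made for ``0``, and it is not taken from the FINUFFT source.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not a FINUFFT function): sizes of the successive
// batches when ntr vectors are handled at most maxbatchsize at a time.
std::vector<int> batch_sizes(int ntr, int maxbatchsize) {
    assert(ntr > 0 && maxbatchsize > 0);
    std::vector<int> sizes;
    for (int done = 0; done < ntr; done += maxbatchsize) {
        // The last batch may be smaller than maxbatchsize.
        sizes.push_back(std::min(maxbatchsize, ntr - done));
    }
    return sizes;
}
```

For example, ``batch_sizes(10, 4)`` yields batches of 4, 4 and 2 vectors, while ``batch_sizes(5, 1)`` yields five batches of one vector each, which corresponds to the ``1`` setting suggested above for large problems.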