I'm opening a new issue here since the other was closed and I am unable to reopen it.
As stated in the manual, given the nature of FFTW_ESTIMATE you would expect a deterministic result (i.e. bit reproducibility). However, this does not always seem to hold when comparing a plan_many call with howmany=1 against a plan_many call with howmany>1. Is it expected that a different plan is selected when howmany varies, so that bit reproducibility is not supported in these cases?
I submit the following in-place R2C/C2R example using fftw_plan_many_dft_r2c/c2r with the FFTW_ESTIMATE plan flag (I have also experimented with FFTW_NO_SIMD and FFTW_UNALIGNED, and see similar failures). Many particular combinations of howmany and N produce a "failure."
I transform a "reference" 1D array of length N and compare it to the results of Ny transforms on a "test" set of identical 1D arrays (i.e., each of the Ny "rows" is initially identical to the reference row, laid out in memory as a 2D array). For the reference array I plan using howmany=1, and for the Ny test arrays I plan using howmany=Ny. For some values of N and Ny, the transformed test arrays are not identical to the transformed reference array. Looking further, I see that the plans returned in these "failure" cases are not equal.
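To make the memory layout concrete, here is a minimal sketch of the padded in-place row layout described above. It assumes the usual in-place r2c padding of 2*(N/2+1) doubles per row; the helper names `padded_row_len`, `row_index` and `fill_rows` are hypothetical and are not part of FFTW or of my test code.

```c
#include <stddef.h>

/* Doubles per row for an in-place r2c/c2r transform of logical length n:
   room for n/2+1 complex outputs, i.e. 2*(n/2+1) doubles. */
size_t padded_row_len(int n) {
    return 2 * ((size_t)n / 2 + 1);
}

/* Flat index of real element j in row i of the padded 2D layout. */
size_t row_index(int n, int i, int j) {
    return (size_t)i * padded_row_len(n) + (size_t)j;
}

/* Copy the n real samples of ref into each of the ny padded rows of test.
   The padding elements at the end of each row are left untouched. */
void fill_rows(double *test, const double *ref, int n, int ny) {
    for (int i = 0; i < ny; ++i)
        for (int j = 0; j < n; ++j)
            test[row_index(n, i, j)] = ref[j];
}
```

With N=2522 this gives rows of 2524 doubles, which is the value used for Nx and rdist in the test code.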
I won't paste the full output, which shows the results of the checks, but the mismatched values differ only in the LSB. Instead I paste the generated plans so they can be compared.
Here is the result for Ny=2, N=2522 (although, as you say, results may vary compiler to compiler), where the mismatches occur after the backward (c2r) transform only. (I have seen cases where r2c failed as well):
Reference Plan:
(rdft2-ct-dif/2
(hc2c-direct-2/4/0 "hc2cbdftv_2_avx"
(rdft2-hc2r-direct-2 "r2cb_2")
(rdft2-nop))
(dft-ct-dif/13
(dftw-generic-dif-13-97
(dft-direct-13-x97 "n1bv_13_avx"))
(dft-buffered-97-x13/13-5
(dft-vrank>=1-x13/1
(dft-rader-97/is=2/os=2
(dft-ct-dit/16
(dftw-direct-16/16 "t3bv_16_avx")
(dft-direct-6-x16 "n1_6"))
(dft-ct-dit/8
(dftw-direct-8/12 "t3fv_8_avx")
(dft-directbuf/14-12-x8 "n2fv_12_avx"))
(dft-ct-dit/8
(dftw-direct-8/12 "t3fv_8_avx")
(dft-directbuf/14-12-x8 "n2fv_12_avx"))))
(dft-r2hc-1
(rdft-rank0-tiledbuf/2-x13-x97))
(dft-nop))))
Test Plan:
(rdft2-ct-dif/2
(hc2c-direct-2/4/0-x2 "hc2cbdftv_2_avx"
(rdft2-hc2r-direct-2 "r2cb_2")
(rdft2-nop))
(dft-buffered-1261-x2/2-1
(dft-vrank>=1-x2/1
(dft-ct-dif/13
(dftw-generic-dif-13-97
(dft-direct-13-x97 "n1bv_13_avx"))
(dft-vrank>=1-x13/1
(dft-rader-97/is=2/os=26
(dft-ct-dit/8
(dftw-direct-8/28 "t1buv_8_avx")
(dft-direct-12-x8 "n1_12"))
(dft-ct-dit/8
(dftw-direct-8/12 "t3fv_8_avx")
(dft-directbuf/14-12-x8 "n2fv_12_avx"))
(dft-ct-dit/8
(dftw-direct-8/12 "t3fv_8_avx")
(dft-directbuf/14-12-x8 "n2fv_12_avx"))))))
(dft-r2hc-1
(rdft-rank0-iter-ci/2522-x2))
(dft-nop)))
Notice that the plans are similar, but not identical -- likely accounting for the slight difference in the transformed values (which are of the order 1e-10, but again, only for some of the elements). Is there something about the plan_many that itself doesn't guarantee bit reproducibility, seemingly variable with howmany?
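For anyone who wants to quantify the mismatches rather than just count them, the following is a rough sketch of an LSB-level comparison. It assumes IEEE 754 binary64 doubles and no NaNs in the data; `ordered_bits`, `ulp_distance` and `count_mismatches` are hypothetical helper names, not FFTW functions and not taken from my test code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Map a double onto an integer scale that is monotonic in its value
   (assumes IEEE 754 binary64; NaNs are not handled). */
int64_t ordered_bits(double x) {
    int64_t i;
    memcpy(&i, &x, sizeof i);
    return (i < 0) ? INT64_MIN - i : i;
}

/* Number of representable doubles between a and b; 1 means they differ
   in the last bit only. Note that +0.0 and -0.0 compare as distance 0. */
uint64_t ulp_distance(double a, double b) {
    int64_t ia = ordered_bits(a), ib = ordered_bits(b);
    return (ia > ib) ? (uint64_t)ia - (uint64_t)ib
                     : (uint64_t)ib - (uint64_t)ia;
}

/* Count elements of a and b that differ and report the largest distance. */
size_t count_mismatches(const double *a, const double *b, size_t n,
                        uint64_t *max_ulp) {
    size_t nfail = 0;
    *max_ulp = 0;
    for (size_t k = 0; k < n; ++k) {
        uint64_t d = ulp_distance(a[k], b[k]);
        if (d != 0) ++nfail;
        if (d > *max_ulp) *max_ulp = d;
    }
    return nfail;
}
```

Calling `count_mismatches` on the reference row and on each test row would report how many elements differ and by how many units in the last place, which distinguishes a different summation order from a genuinely wrong result.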
A test code follows (sorry if it's a bit long, I have a lot of checks in there):
#include <stdio.h>
#include <stdlib.h>
#include "fftw3.h"
int main(void) {
double *ref, *test;
void *in, *out;
fftw_plan fpref, fptest, bpref, bptest;
int i, j, Npass, Nfail, Ndims, N, Nx, Ny, stride, rdist, cdist, inembed, onembed;
Ndims = 1;
stride = 1;
inembed = onembed = 0;
Ny = 2;
N = 2522;
Nx = (N/2+1)*2;
rdist = Nx;
cdist = N/2+1;
// Allocate reference and test arrays
ref = fftw_malloc(Nx*sizeof(double));
test = fftw_malloc(Nx*Ny*sizeof(double));
// Plan reference (Nx)
in = out = ref;
fpref = fftw_plan_many_dft_r2c(Ndims, &N, 1, in, &inembed, stride, rdist, out, &onembed, stride, cdist, FFTW_ESTIMATE);
bpref = fftw_plan_many_dft_c2r(Ndims, &N, 1, out, &onembed, stride, cdist, in, &inembed, stride, rdist, FFTW_ESTIMATE);
// Plan test (Nx*Ny)
in = out = test;
fptest = fftw_plan_many_dft_r2c(Ndims, &N, Ny, in, &inembed, stride, rdist, out, &onembed, stride, cdist, FFTW_ESTIMATE);
bptest = fftw_plan_many_dft_c2r(Ndims, &N, Ny, out, &onembed, stride, cdist, in, &inembed, stride, rdist, FFTW_ESTIMATE);
// printf("Filling ref array\n");
for (j=0; j<N; ++j){ ref[j] = (double) (j+1); }
// printf("Filling test array\n");
for (i=0; i<Ny; ++i){ for (j=0; j<N; ++j){ test[i*Nx+j] = ref[j]; } }
// printf("Executing forward plans\n");
fftw_execute(fpref);
fftw_execute(fptest);
// printf("Evaluating arrays after forward transform\n");
Nfail = 0;
Npass = 0;
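// Compare every element of each transformed test row (Nx doubles) against the reference row
for (i=0; i<Ny; ++i){ for (j=0; j<Nx; ++j){ if (test[i*Nx+j] == ref[j]) { ++Npass; } else { ++Nfail; } } }
printf("FWD: N = %d, Nx = %d, Ny = %d, Npass = %d, Nfail = %d\n", N, Nx, Ny, Npass, Nfail);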
if ( Nfail > 0 ){
printf("Reference Plan:\n");
fftw_print_plan(fpref);
printf("\nTest Plan:\n");
fftw_print_plan(fptest);
printf("\n\n");
}
// printf("Executing backward plans\n");
fftw_execute(bpref);
fftw_execute(bptest);
// printf("Evaluating arrays after backward transform\n");
Nfail = 0;
Npass = 0;
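// Compare the N real values of each test row against the reference row (padding is skipped)
for (i=0; i<Ny; ++i){ for (j=0; j<N; ++j){ if (test[i*Nx+j] == ref[j]) { ++Npass; } else { ++Nfail; } } }
printf("BWD: N = %d, Nx = %d, Ny = %d, Npass = %d, Nfail = %d\n", N, Nx, Ny, Npass, Nfail);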
if ( Nfail > 0 ){
printf("Reference Plan:\n");
fftw_print_plan(bpref);
printf("\nTest Plan:\n");
fftw_print_plan(bptest);
printf("\n\n");
}
// printf("Freeing resources\n");
fftw_destroy_plan(fpref);
fftw_destroy_plan(fptest);
fftw_destroy_plan(bpref);
fftw_destroy_plan(bptest);
fftw_free(ref);
fftw_free(test);
return 0;
}