mrirecon / bart

BART: Toolbox for Computational Magnetic Resonance Imaging

Home Page: https://mrirecon.github.io/bart/

License: BSD 3-Clause "New" or "Revised" License

Shell 0.45% MATLAB 0.20% Makefile 5.06% C 73.48% Cuda 2.93% C++ 16.35% Python 1.05% CMake 0.02% HTML 0.11% JavaScript 0.35%
bart-toolbox compressed-sensing computational-imaging deep-learning iterative-methods mri

bart's People

Contributors

atrotier, christiantoennes, davidssmith, frankong, fyrdahl, grlee77, hakkelt, hcmh, heidemri, hjmjohnson, jtamir, maddin200, martink84, mavel101, mblum94, melrobin, mschil, nalaxon, pehses, s-rosenzweig, schaten, scholand, sdimoudi, sidward, soumickovgu, spam-depository, takishima, uecker, volroe, xqwang1


bart's Issues

Support MKL for LAPACKE

Has this been investigated/performed?

We have a compilation working, using the templates for the other math libraries, with the MKL that ships with Intel 18.0, but as we are not users of BART we'd need some help making sure the basic tests work.

-Kevin (from MARCC - Maryland Advanced Research Computing Center - Johns Hopkins)

Document tools that require different inputs/outputs

Some tools modify the input in-place (e.g. transpose). Sometimes there are special cases where this does not matter (e.g. transposing across singleton dimensions). Other tools use temporary memory. We need clearer documentation on these different cases.

Different behaviour between RTNLINV and NLINV with oversampled grid/image

I have been using the NLINV algorithm for the reconstruction of some 2D radial data (thank you for that, it has been working great!). Because I sometimes (for various reasons) measure with a very tight field of view, it has been pointed out (here #220) that I can also reconstruct the image with the oversampled grid/FOV by simply scaling my trajectory. This has also been working great with the NLINV algorithm.

When using the RTNLINV version (since I have also acquired frames/repetitions with the data) with the exact same inputs and parameters as NLINV, it works in principle and also returns images of the same size as NLINV, including the oversampled grid/FOV. However, the data is masked to the original/reduced FOV without oversampling:

(image: NLINVvsRTNLINV)

Surprisingly, the coil sensitivities returned by RTNLINV are only half the size of those returned by NLINV. Judging from the debug output below, I would guess that the psf dims would need to be twice as large to be consistent with NLINV. Is this difference between NLINV and RTNLINV intentional, a mistake, or just something that has not been added yet?

>> [recoNLINV, coilsensNLINV] = bart('nlinv -d4 -i14 -t', trajFile, data.vol);
nufft kernel dims: [  1  96 120   1   1   1   1   1   1   1  20   1   1   1   1   1   1 ]
nufft psf dims:    [350 350   1   1   1   1   1   1   1   1  20   1   1   1   1   1   1 ]
nufft traj dims:   [  3  96 120   1   1   1   1   1   1   1   1   1   1   1   1   1   1 ]
Allocating 230400 (vs. 2680400) + 2450000
ksp : [  1  96 120   1   1   1   1   1   1   1  20   1   1   1   1   1   1 ]
cim : [350 350   1   1   1   1   1   1   1   1  20   1   1   1   1   1   1 ]
traj: [  3  96 120   1   1   1   1   1   1   1   1   1   1   1   1   1   1 ]
ksp : [  1  96 120   8   1   1   1   1   1   1  20   1   1   1   1   1 ]
cim : [350 350   1   8   1   1   1   1   1   1  20   1   1   1   1   1 ]
traj: [  3  96 120   1   1   1   1   1   1   1   1   1   1   1   1   1 ]
Scaling: 8.542377

>> [recoNLINVRT, coilsensNLINVRT] = bart('rtnlinv -d4 -i14 -t', trajFile, data.vol);
[  0   0   0 ]
nufft kernel dims: [  1  96 120   1   1   1   1   1   1   1   1   1   1   1   1   1   1 ]
nufft psf dims:    [175 175   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 ]
nufft traj dims:   [  3  96 120   1   1   1   1   1   1   1   1   1   1   1   1   1   1 ]
Allocating 11520 (vs. 42145) + 30625
ksp : [  1  96 120   1   1   1   1   1   1   1   1   1   1   1   1   1   1 ]
cim : [175 175   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 ]
traj: [  3  96 120   1   1   1   1   1   1   1   1   1   1   1   1   1   1 ]
ksp : [  1  96 120   8   1   1   1   1   1   1   1   1   1   1   1   1 ]
cim : [175 175   1   8   1   1   1   1   1   1   1   1   1   1   1   1 ]
traj: [  3  96 120   1   1   1   1   1   1   1   1   1   1   1   1   1 ]

>> size(recoNLINV)
ans = 175   175     1     1     1     1     1     1     1     1    20

>> size(recoNLINVRT)
ans = 175   175     1     1     1     1     1     1     1     1    20

>> size(coilsensNLINV)
ans = 350   350     1     8     1     1     1     1     1     1    20

>> size(coilsensNLINVRT)
ans = 175   175     1     8     1     1     1     1     1     1    20

Ecalib test fails under Mac + Clang

Steps to replicate:

FFTW_BASE=/opt/local CC=clang OMP=0 PARALLEL=1 make
make test

Produces error:

/Users/jtamir/tmp/bart/phantom -s8 -k /Users/jtamir/tmp/bart/tests/out//shepplogan_coil.ra
set -e ; mkdir /Users/jtamir/tmp/bart/tests/tmp/$$/ ; cd /Users/jtamir/tmp/bart/tests/tmp/$$/                   ;\
    /Users/jtamir/tmp/bart/ecalib -m1 /Users/jtamir/tmp/bart/tests/out//shepplogan_coil.ra coils.ra         ;\
    /Users/jtamir/tmp/bart/pocsense -i1 /Users/jtamir/tmp/bart/tests/out//shepplogan_coil.ra coils.ra proj.ra   ;\
    /Users/jtamir/tmp/bart/nrmse -t 0.05 proj.ra /Users/jtamir/tmp/bart/tests/out//shepplogan_coil.ra       ;\
    rm *.ra ; cd .. ; rmdir /Users/jtamir/tmp/bart/tests/tmp/$$/
Calibration region...  (size: 5x5x1, pos: 62x61x0)
Energy: 0.599 0.298 0.088 0.011 0.003 0.001 0.000 0.000
Build calibration matrix and SVD...
Assertion failed: (dim[i] >= dimk[i]), function casorati_dims, file /Users/jtamir/tmp/bart/src/num/casorati.c, line 40.
/bin/sh: line 1: 72553 Abort trap: 6           /Users/jtamir/tmp/bart/ecalib -m1 /Users/jtamir/tmp/bart/tests/out//shepplogan_coil.ra coils.ra
make[1]: *** [tests/test-ecalib] Error 134

If I manually change the kernel size, I don't get a runtime error but the output of ecalib will be all-zeros.

Makefile gcc6 for MAC

The Makefile needs to be set to gcc6 to match the instructions in the README file:
ifeq ($(BUILDTYPE), MacOSX)
CC ?= gcc-mp-6

Improve CMake build system to simplify it

See #56 for more details:

Is there a workaround for the LAPACKE issue, so that I can try this on my machine?

I will merge this now, but in the long run we should simplify this. It seems that some things which are simple to do with make and shell scripts are rather complicated with cmake.

  • For example, reading the list of targets from a file seems very complicated. Would a different/simpler format for this file (the list of targets) help?
  • In bart.c, we could include a config.h with the list of main functions defined as a macro. Would autogenerating the config.h be easier than editing bart.c during the build? (For our build system, config.h could just be empty.) See the sketch after this list.
  • I also wonder if we could simply omit some build options for cmake. For example, compiling commands into individual tools is something most people do not need. Just not supporting this option would remove some of the complexity.
  • Special stuff such as Mat2cfl might disappear anyway at some point.
  • Finally, would it be possible to add a section to .travis.yml where the cmake build is tested?
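A minimal sketch of how such an autogenerated config.h could define the main functions as a macro (purely illustrative; the macro shape and names here are assumptions, not the actual BART build system):

/* config.h -- hypothetical, generated from the list of targets */
#define MAIN_LIST(X) \
	X(avg) \
	X(ecalib) \
	X(nufft) \
	X(pics)
/* ... one entry per tool ... */

/* bart.c could then expand the list, e.g. to declare all main functions: */
#define DECLMAIN(name) extern int main_##name(int argc, char* argv[]);
MAIN_LIST(DECLMAIN)
#undef DECLMAIN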

Martin

Creating a BART GitHub Wiki

Would there be interest in creating GitHub Wiki pages for BART?

I know that BART comes bundled with some documentation, but making that available directly from the GitHub repository might be more convenient for users looking to troubleshoot common problems.

As an example, I have written a small page on building pyBART after a question I got from @jtamir (pyBART wiki). If there is interest, I could expand that to include a detailed description of the CMake build system (I personally don't use the Makefile...). This could be a good addition to the existing website but more focused towards building BART.

"undefined reference to" errors when compiling nufft.c with gcc

I have been trying to build the nufft.c file located in the bart/src directory. However, when I compile it by running gcc nufft.c, I get a lot of "undefined reference to" errors. How do I solve this issue?

Compile fail on CentOS

I am trying to compile BART 0.5.00 on a CentOS cluster. I cloned the master branch from git, and encountered the following error:

gcc -Wall -Wextra -MMD -MF /var/scratch/tbbakker/bart/src/num/.lapack.d -iquote /var/scratch/tbbakker/bart/src/ -I/usr//include/ -I/usr//include -DFFTWTHREADS -DMAIN_LIST="avg, bench, bitmask, cabs, caldir, calmat, carg, casorati, cc, ccapply, cdf97, circshift, conj, conv, copy, cpyphs, creal, crop, delta, ecalib, ecaltwo, estdelay, estdims, estshift, estvar, extract, fakeksp, fft, fftmod, fftrot, fftshift, filter, flatten, flip, fmac, homodyne, index, invert, itsense, join, looklocker, lrmatrix, mandelbrot, mip, moba, nlinv, noise, normalize, nrmse, nufft, ones, pattern, phantom, pics, pocsense, poisson, poly, repmat, reshape, resize, rmfreq, rof, rss, sake, saxpy, scale, sdot, show, slice, spow, sqpics, squeeze, ssa, std, svd, tgv, threshold, toimg, traj, transpose, twixread, var, vec, version, walsh, wave, wavelet, wavepsf, whiten, window, wshfl, zeros, zexp, ()" -include src/main.h -O3 -ffast-math -Wmissing-prototypes -std=gnu11 -fopenmp -c -o /var/scratch/tbbakker/bart/src/num/lapack.o /var/scratch/tbbakker/bart/src/num/lapack.c
/var/scratch/tbbakker/bart/src/num/lapack.c:14:21: fatal error: lapacke.h: No such file or directory
 #include <lapacke.h>
                     ^
compilation terminated.
make[1]: *** [/var/scratch/tbbakker/bart/src/num/lapack.o] Error 1

I don't have root access on this system, so could not follow the instructions detailed in the README for installation on CentOS:
sudo yum install devtoolset-8 atlas-devel fftw3-devel libpng-devel lapack-devel

Instead, given the error, I tried building lapack (version 3.9.0) from source. Unfortunately, the error remains.

I tried inspecting the Makefile for a hint, and found the following lines:

ifeq ($(BUILDTYPE), Linux)
ifneq (,$(findstring Red Hat,$(shell gcc --version)))
CPPFLAGS+=-I/usr/include/lapacke/
LDFLAGS+=-L/usr/lib64/atlas -laterals
endif
endif

I suspect these lines tell the compiler where to look for the missing lapacke.h? As a side note, they were missing from the Makefile in the bart-0.5.00.tar I obtained from the BART webpage in another attempt to solve this issue (is this intentional?).

Regardless, /usr/include/lapacke/ indeed exists on this system, and contains (among other files) a file lapacke.h. I'm rather unfamiliar with these systems, and am unsure how to proceed.

CentOS version: CentOS Linux release 7.4.1708 (Core)
BART version: 0.5.00

Error: "shared clf" using twixread

Command:
bart twixread -A meas_MID00165_FID19032_mfc_3dflash_T1w_GRAPPA4_11mm.dat t1w_data

Output:
VD Header. MeasID: 143 FileID: 19010 Scans: 2
VD Header. MeasID: 143 FileID: 19010 Scans: 2
ERROR: shared cfl t1w_data.cfl

The error is returned immediately with the second line of output. The input file is a 3D gradient-echo image with GRAPPA 2x2 acceleration, 40x40 oversampling, 32 channels and 9 echoes. According to mapVBVD, the full shape of the data is: [416 32 182 159 1 1 1 9 1 1 1 1 1 1 1 1].

Possible reason:
I suspect that twixreader cannot handle the two scans in the TWIX file, one of which is a reference scan.

Any help would be greatly appreciated.

Cuda out of memory error for ecalib and pics

I'm new to this MRI business, so I may be doing something totally crazy. Here is my code (part of it, anyway):

kspace = cfl.readcfl('data/{}'.format(filename))
sensitivities = bart(1, 'ecalib -g', kspace)

I get this error:

CUDA Pointwise Eigendecomposition...
ERROR: Error: ERROR: cuda error: 230 out of memory 
Aborted (core dumped)
Traceback (most recent call last):
  File "sandbox.py", line 32, in <module>
    sensitivities = bart(1, 'ecalib -g', kspace)
  File "/home/chris/bart/python/bart.py", line 61, in bart
    raise Exception("Command exited with an error.")

The file is a 1 GB .cfl file (and the kspace array is just over 1 GB in memory). Is that too big? Here are my GPU specs:
(image: GPU specs)

It worked great without using the GPU. pics also works fine without the GPU, but dies with the same error.

The shape of kspace is (320, 288, 236, 6). I'm not sure what the last dimension is; can I somehow slim down the 4th dimension from 6 to 1 to help things fit? Maybe I'm going down the wrong road.

edit: I got my .cfl from here (the first entry): http://mridata.org/undersampled/knees
2nd edit: I tried using kspace[:,:,:100,:], but I get an error as well:

ERROR: invalid argument
Aborted (core dumped)
Traceback (most recent call last):
  File "sandbox.py", line 32, in <module>
    sensitivities = bart(1, 'ecalib -g', kspace)
  File "/home/chris/bart/python/bart.py", line 61, in bart
    raise Exception("Command exited with an error.")
Exception: Command exited with an error.

Matlab memory / write-read speed

Hey,
First off, thanks for BART - I am getting into using the toolbox and so far really like the results!

I am interfacing with Matlab and noticed that readcfl always returns doubles and was wondering if that is intentional (as BART seems to be outputting only single precision values). We are working with very large datasets and therefore memory usage must be kept down.

Right now readcfl.m reads float32s and converts them to double. Then it instantiates a double (non-complex) array which will be immediately reallocated as a complex matrix to fit the complex doubles.
writecfl.m also instantiates an extra double array for the values (where in my case then all the single data first needs to be promoted to double).

I modified both methods and saw speed increase significantly. Memory usage halves due to keeping the values as singles.
Should I make a pull request about this?
Best, Tim

Residual convergence graph - intuition

Hi, I'm adding this as an issue here because I'm not sure where to discuss these things. Please let me know if there is a better place.

In any case, I've been working on a golden angle radial real-time flow sequence with a temporal TV constraint. I started thinking about convergence, so I made a few plots of the convergence measures that the ADMM optimization outputs (rnorm and snorm). One example can be found in the link below.

https://imgur.com/1sQ8XPl

The left plot shows the convergence measures as a function of the total number of CG iterations performed (i.e. what you specify with the -i option), up to 1000 iterations. The lambda value is set to something that gives the best results I could get so far. The rho factor is dynamic, but I set it close to what it usually converges to for my problem, and it settles within the first few iterations.

I can see some different "phases" here:

  1. A rapid decay phase (0-100 or so)
  2. A slower decay phase (100-400)
  3. A "chaotic" phase (400-850)
  4. A sharp drop (~850)
  5. Very slow decay (850-1000)

So my question is this - is there some kind of intuition or knowledge to be gained from this? Can we explain the different phases? Can we use that knowledge to improve the optimization? Is it ADMM-specific, MRI-specific, or is it due to the combination of the two?
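For what it's worth, the standard ADMM primal and dual residuals (in the usual notation of Boyd et al.; rnorm and snorm presumably report these or closely related quantities) are

    r^k = A x^k + B z^k - c,        s^k = \rho A^T B (z^k - z^{k-1})

so the primal residual measures how far the split variables are from satisfying the consensus constraint, while the dual residual tracks how much the splitting variable still moves between iterations.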

It might also be completely meaningless, but I thought I should at least show it.

Cheers,

Johannes

Activated CUDA?

hello,

I built BART successfully with CUDA:

CUDA?= 1
CUDA_BASE ?= /usr/local/cuda/

Then I run bart ecalib. It works; however, I can't see the process via nvidia-smi.
So I am wondering whether bart uses CUDA or not.

thx.

thread safety problem

If fft_apply is called multiple times from different threads, a segmentation fault occurs:

fft_apply exit critical
fft_apply exit critical
fft_apply exit critical
fft_apply enter critical
fft_apply enter critical
fftw: alloc.c:187: assertion failed: (stat->cnt == 0 && stat->siz == 0) || (stat->cnt > 0 && stat->siz > 0)
Aborted (core dumped)

But according to FFTW, the execute interface should be thread-safe.

Do you have any ideas about it? Thanks.
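As far as I know, FFTW only guarantees thread safety for the execute functions; all planner calls (including plan creation and destruction) must be serialized by the caller. If the crash comes from concurrent planning inside fft_apply's setup rather than from execution itself, the usual workaround looks like this (standalone sketch, not BART code):

#include <pthread.h>
#include <fftw3.h>

static pthread_mutex_t fftw_plan_lock = PTHREAD_MUTEX_INITIALIZER;

/* Planner calls are not thread-safe and must be serialized;
 * fftwf_execute() on plans for distinct arrays may run concurrently. */
static fftwf_plan make_plan_locked(int n, fftwf_complex* in, fftwf_complex* out)
{
	pthread_mutex_lock(&fftw_plan_lock);
	fftwf_plan p = fftwf_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
	pthread_mutex_unlock(&fftw_plan_lock);
	return p;
}

(FFTW 3.3.5 and later also provide fftwf_make_planner_thread_safe(), which installs such a lock inside the library itself.)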

On compiling on Fedora 27: Makefile modifications.

I needed to modify the Makefile in a way that makes me not sure on how to integrate the changes into the Makefile.local system.

All the steps I did (as much as I can recollect):

  1. dnf install lapack-devel
  2. dnf install openblas-devel
  3. Set BLAS_BASE=/usr/local/openblas.
  4. Set BLAS_H=-I$(BLAS_BASE)
  5. Set BLAS_L=-lopenblas64 -llpacake

The BLAS_BASE environment variable is basically redundant in this case. Also, to the best of my knowledge, yum or dnf does not offer an explicit liblapacke_dev package. Rather, it is integrated into openblas-dev. The other packages were available by default.

Would it make sense to have direct BLAS_H and BLAS_L environment variables that can be force-set in Makefile.local? If so, I'll submit a pull request.

NLINV with oversampled grid

For some applications, I'm acquiring data with 2D and/or 3D radial sampling using a very narrow field of view (due to various reasons). The NUFFT implemented within BART has an option to reconstruct the images by keeping the 2x oversampled grid,

-1		use/return oversampled grid

The NLINV algorithm currently is missing this (even though surprisingly the coil maps returned do include the 2-times oversampled grid).

I have tried to implement a parameter in the NLINV code to allow the images to be returned on the 2x oversampled grid, which has so far been quite straightforward:
MartinK84@0aa6a4d

When looking at the results this appears to work quite well for the NLINV reconstructed images:
(image: reco)

However, the sensitivity maps appear to be influenced by this in the oversampled areas of the FoV:
(image: sens)

The differences in the sensitivity maps are only outside of the object and can easily be removed by using a magnitude threshold. I would nevertheless like to ask whether there is a better way to implement this, and whether this behavior is something that would be expected when reconstructing NLINV with an oversampled grid. Finally, I would very much like to see this option included for the NLINV algorithm in some way or another; if my changes proposed above are sufficient, feel free to incorporate them, and I can also create a pull request if desired.

Compilation error on RHEL7

I believe I have all the prerequisite libraries and gcc 4.8.5. Many parts do compile, but I reach this error on __auto_type and I'm unsure what is needed. Please advise. Thanks much, Brian Hanna CMRR

In file included from /opt/local/mrirecon/bart/src/num/ops.h:11:0,
from /opt/local/mrirecon/bart/src/linops/linop.c:18:
/opt/local/mrirecon/bart/src/linops/linop.c: In function ‘plus_apply’:
/opt/local/mrirecon/bart/src/misc/types.h:44:14: error: unknown type name ‘__auto_type’
#define auto __auto_type
^

gcc -v

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC)

pics total variation algorithm switch

It should be possible to select different iterative algorithms for total variation (currently: Chambolle-Pock or ADMM) instead of always setting it to ADMM. We should move the hard-coded ADMM in optreg.c to an assertion check in pics.c.

MATLAB Wrapper and tools that output the result with bart_printf

I ran into an issue with the MATLAB wrapper when using tools that output the result with bart_printf, in particular estdelay. Ideally, I would like to get the output as a variable in MATLAB. Using out = bart('estdelay',traj,data) results in an error because there is no output from estdelay. I'm aware that I could capture the command window output by doing something like out = evalc("bart('estdelay',traj,data)");, but it's not very tidy.

I made a very hacky attempt to have bart.m drop the out_str and try again if it detects a specific error message. While it doesn't seem to break any code for me, it might result in unforeseen consequences down the line. Another approach would be to have bart.m recognize these tools based on the cmd string and handle them accordingly, i.e. calling bart without an out_str, but this doesn't seem like a very clean solution either.

operator_p_(co)domain do not work correctly

operator_p_domain() and operator_p_codomain() contain the following code:
assert(1u == op->op.io_flags);
(see https://github.com/mrirecon/bart/blob/master/src/num/ops.c#L304)

However, in operator_p_create2(), we initialize this as:
o->op.io_flags = MD_BIT(1);
which is 2 (https://github.com/mrirecon/bart/blob/master/src/num/ops.c#L383). Therefore, operator_p_domain() can never work on any operator_p_s created by that function (which are all or almost all of them).

This is obviously not working as intended. Should the assert check for 2 instead? In general, I don't really see what that check is there for...
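For illustration, the mismatch boils down to the following (standalone check; assumes MD_BIT is the usual single-bit macro, i.e. MD_BIT(x) == 1ul << x):

#include <assert.h>

#define MD_BIT(x) (1ul << (x))   /* assumed definition */

int main(void)
{
	unsigned long io_flags = MD_BIT(1);   /* what operator_p_create2() stores: == 2 */

	/* operator_p_domain() asserts (1u == io_flags), which can never hold here;
	 * testing the intended bit (or comparing against MD_BIT(1)) would: */
	assert(1u != io_flags);
	assert(MD_BIT(1) == io_flags);

	return 0;
}

So either the assertion should check for MD_BIT(1) (i.e. 2), or operator_p_create2() should set MD_BIT(0), depending on which argument index is actually meant to be the output.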

Question: pre-kspace time series data?

I'm looking for data straight off the receive coil (or whatever the most raw, original data is), before it is processed/transformed to kspace. Do you know of any datasets like that? Also, does your tool handle any of that type of processing?

Thanks!

MATLAB Wrapper Unit Test

During the recent BART Workshop, one of the polls showed that apparently a majority of users use BART in combination with the MATLAB wrapper. However, as far as I can see, there is no unit test available for the MATLAB wrapper to test its functionality.

MATLAB does include its own unit testing framework (see https://www.mathworks.com/help/matlab/matlab-unit-test-framework.html) and I wonder if there would be enough interest for a small unit test of the BART MATLAB wrapper to be developed.

The idea would be to have a matlabTest.m/bartTest.m function in bart/tests (or bart/matlab?) that runs a basic unit test on the MATLAB wrapper. Useful things to test could be:

  • Toolbox path set and finding of the bart binary
  • Execution of the bart binary
  • BART help from within MATLAB (see #219)
  • Running bart with:
    • single and multiple input variables
    • no, single and multiple output variables

MATLAB Unit Tests are not trivial, but I have used the MATLAB Unit Testing Framework before and would be willing to contribute.

double free or corruption -- noncart example

environment :
centos : 6.8 x64
gcc : 6.3.1 20170216 from devtoolset-6
bart : v0.4.03 built with openblas && gfortran

When I run the example noncart.m in espirit-matlab-examples, I get a 'double free or corruption' error; the log:

lowres_img = bart('nufft -i -d30:30:1 -t', traj_rad2, ksp_sim);
*** glibc detected *** /home/solutions/SWR5.0/R5.0_new/system/CMS160/SW/SW_ENG/DPC/BART/bart: double free or corruption (out): 0x0000000001e42b80 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x75e5e)[0x7f5a7d7f1e5e]
/lib64/libc.so.6(+0x78cf0)[0x7f5a7d7f4cf0]
/home/solutions/SWR5.0/R5.0_new/system/CMS160/SW/SW_ENG/DPC/BART/bart(xfree+0x18)[0x491406]
/home/solutions/SWR5.0/R5.0_new/system/CMS160/SW/SW_ENG/DPC/BART/bart(md_free+0x18)[0x484223]
/home/solutions/SWR5.0/R5.0_new/system/CMS160/SW/SW_ENG/DPC/BART/bart[0x44cff1]
/home/solutions/SWR5.0/R5.0_new/system/CMS160/SW/SW_ENG/DPC/BART/bart[0x43d976]
/home/solutions/SWR5.0/R5.0_new/system/CMS160/SW/SW_ENG/DPC/BART/bart(operator_generic_apply_unchecked+0x49)[0x4861a2]
/home/solutions/SWR5.0/R5.0_new/system/CMS160/SW/SW_ENG/DPC/BART/bart(operator_apply_unchecked+0x39)[0x4861f8]
/home/solutions/SWR5.0/R5.0_new/system/CMS160/SW/SW_ENG/DPC/BART/bart(linop_adjoint_unchecked+0x55)[0x43e298]
/home/solutions/SWR5.0/R5.0_new/system/CMS160/SW/SW_ENG/DPC/BART/bart(compute_psf+0x1b1)[0x44c07f]
/home/solutions/SWR5.0/R5.0_new/system/CMS160/SW/SW_ENG/DPC/BART/bart[0x44c3e1]
/home/solutions/SWR5.0/R5.0_new/system/CMS160/SW/SW_ENG/DPC/BART/bart(nufft_create+0x9f4)[0x44b3c1]
/home/solutions/SWR5.0/R5.0_new/system/CMS160/SW/SW_ENG/DPC/BART/bart(main_nufft+0x56f)[0x41fe91]
/home/solutions/SWR5.0/R5.0_new/system/CMS160/SW/SW_ENG/DPC/BART/bart(main_bart+0x2a4)[0x40db15]
/home/solutions/SWR5.0/R5.0_new/system/CMS160/SW/SW_ENG/DPC/BART/bart(main_bart+0x251)[0x40dac2]
/home/solutions/SWR5.0/R5.0_new/system/CMS160/SW/SW_ENG/DPC/BART/bart(main+0x20)[0x40d7d6]
/lib64/libc.so.6(__libc_start_main+0x100)[0x7f5a7d79ad20]
/home/solutions/SWR5.0/R5.0_new/system/CMS160/SW/SW_ENG/DPC/BART/bart[0x40d6cd]
======= Memory map: ========
00400000-004be000 r-xp 00000000 00:27 16777                              /home/solutions/SWR5.0/R5.0_new/system/CMS160/SW/SW_ENG/DPC/BART/bart
006bd000-006be000 r-xp 000bd000 00:27 16777                              /home/solutions/SWR5.0/R5.0_new/system/CMS160/SW/SW_ENG/DPC/BART/bart
006be000-006bf000 rwxp 000be000 00:27 16777                              /home/solutions/SWR5.0/R5.0_new/system/CMS160/SW/SW_ENG/DPC/BART/bart
01dad000-01e7f000 rwxp 00000000 00:00 0                                  [heap]
3034c00000-3034cef000 r-xp 00000000 fd:00 2007117                        /usr/lib64/libfftw3f.so.3.2.3
3034cef000-3034eef000 ---p 000ef000 fd:00 2007117                        /usr/lib64/libfftw3f.so.3.2.3
3034eef000-3034ef6000 rwxp 000ef000 fd:00 2007117                        /usr/lib64/libfftw3f.so.3.2.3
3035000000-3035006000 r-xp 00000000 fd:00 2007103                        /usr/lib64/libfftw3f_threads.so.3.2.3
3035006000-3035205000 ---p 00006000 fd:00 2007103                        /usr/lib64/libfftw3f_threads.so.3.2.3
3035205000-3035206000 rwxp 00005000 fd:00 2007103                        /usr/lib64/libfftw3f_threads.so.3.2.3
33d9800000-33d9815000 r-xp 00000000 fd:00 659611                         /lib64/libz.so.1.2.3
33d9815000-33d9a14000 ---p 00015000 fd:00 659611                         /lib64/libz.so.1.2.3
33d9a14000-33d9a15000 r-xp 00014000 fd:00 659611                         /lib64/libz.so.1.2.3
33d9a15000-33d9a16000 rwxp 00015000 fd:00 659611                         /lib64/libz.so.1.2.3
33dc400000-33dc425000 r-xp 00000000 fd:00 2004150                        /usr/lib64/libpng12.so.0.49.0
33dc425000-33dc625000 ---p 00025000 fd:00 2004150                        /usr/lib64/libpng12.so.0.49.0
33dc625000-33dc626000 rwxp 00025000 fd:00 2004150                        /usr/lib64/libpng12.so.0.49.0
3cc6a00000-3cc6a16000 r-xp 00000000 fd:00 680540                         /lib64/libgcc_s-4.4.7-20120601.so.1
3cc6a16000-3cc6c15000 ---p 00016000 fd:00 680540                         /lib64/libgcc_s-4.4.7-20120601.so.1
3cc6c15000-3cc6c16000 rwxp 00015000 fd:00 680540                         /lib64/libgcc_s-4.4.7-20120601.so.1
3cc6e00000-3cc6e15000 r-xp 00000000 fd:00 1978363                        /usr/lib64/libgomp.so.1.0.0
3cc6e15000-3cc7014000 ---p 00015000 fd:00 1978363                        /usr/lib64/libgomp.so.1.0.0
3cc7014000-3cc7015000 rwxp 00014000 fd:00 1978363                        /usr/lib64/libgomp.so.1.0.0
7f5a74000000-7f5a74021000 rwxp 00000000 00:00 0 
7f5a74021000-7f5a78000000 ---p 00000000 00:00 0 
7f5a7b4d7000-7f5a7b4d8000 ---p 00000000 00:00 0 
7f5a7b4d8000-7f5a7bed8000 rwxp 00000000 00:00 0 
7f5a7bed8000-7f5a7bed9000 ---p 00000000 00:00 0 
7f5a7bed9000-7f5a7c8d9000 rwxp 00000000 00:00 0 
7f5a7c8d9000-7f5a7cad9000 rwxp 00000000 fd:00 1834589                    /tmp/tpd084480d_170a_48dd_90e7_079622dc00f0in2.cfl
7f5a7cad9000-7f5a7cf2b000 r-xp 00000000 fd:00 1977989                    /usr/lib64/atlas/libatlas.so.3.0
7f5a7cf2b000-7f5a7d12b000 ---p 00452000 fd:00 1977989                    /usr/lib64/atlas/libatlas.so.3.0
7f5a7d12b000-7f5a7d135000 rwxp 00452000 fd:00 1977989                    /usr/lib64/atlas/libatlas.so.3.0
7f5a7d135000-7f5a7d13c000 r-xp 00000000 fd:00 654123                     /lib64/librt-2.12.so
7f5a7d13c000-7f5a7d33b000 ---p 00007000 fd:00 654123                     /lib64/librt-2.12.so
7f5a7d33b000-7f5a7d33c000 r-xp 00006000 fd:00 654123                     /lib64/librt-2.12.so
7f5a7d33c000-7f5a7d33d000 rwxp 00007000 fd:00 654123                     /lib64/librt-2.12.so
7f5a7d33d000-7f5a7d35d000 r-xp 00000000 fd:00 1977991                    /usr/lib64/atlas/libcblas.so.3.0
7f5a7d35d000-7f5a7d55c000 ---p 00020000 fd:00 1977991                    /usr/lib64/atlas/libcblas.so.3.0
7f5a7d55c000-7f5a7d55d000 rwxp 0001f000 fd:00 1977991                    /usr/lib64/atlas/libcblas.so.3.0
7f5a7d55d000-7f5a7d57c000 r-xp 00000000 fd:00 1977995                    /usr/lib64/atlas/libf77blas.so.3.0
7f5a7d57c000-7f5a7d77b000 ---p 0001f000 fd:00 1977995                    /usr/lib64/atlas/libf77blas.so.3.0
7f5a7d77b000-7f5a7d77c000 rwxp 0001e000 fd:00 1977995                    /usr/lib64/atlas/libf77blas.so.3.0
7f5a7d77c000-7f5a7d907000 r-xp 00000000 fd:00 654095                     /lib64/libc-2.12.so
7f5a7d907000-7f5a7db06000 ---p 0018b000 fd:00 654095                     /lib64/libc-2.12.so
7f5a7db06000-7f5a7db0a000 r-xp 0018a000 fd:00 654095                     /lib64/libc-2.12.so
7f5a7db0a000-7f5a7db0c000 rwxp 0018e000 fd:00 654095                     /lib64/libc-2.12.so
7f5a7db0c000-7f5a7db10000 rwxp 00000000 00:00 0 
7f5a7db10000-7f5a7db27000 r-xp 00000000 fd:00 654119                     /lib64/libpthread-2.12.so
7f5a7db27000-7f5a7dd27000 ---p 00017000 fd:00 654119                     /lib64/libpthread-2.12.so
7f5a7dd27000-7f5a7dd28000 r-xp 00017000 fd:00 654119                     /lib64/libpthread-2.12.so
7f5a7dd28000-7f5a7dd29000 rwxp 00018000 fd:00 654119                     /lib64/libpthread-2.12.so
7f5a7dd29000-7f5a7dd2d000 rwxp 00000000 00:00 0 
7f5a7dd2d000-7f5a7ddb0000 r-xp 00000000 fd:00 654103                     /lib64/libm-2.12.so
7f5a7ddb0000-7f5a7dfaf000 ---p 00083000 fd:00 654103                     /lib64/libm-2.12.so
7f5a7dfaf000-7f5a7dfb0000 r-xp 00082000 fd:00 654103                     /lib64/libm-2.12.so
7f5a7dfb0000-7f5a7dfb1000 rwxp 00083000 fd:00 654103                     /lib64/libm-2.12.so
7f5a7dfb1000-7f5a7e0a1000 r-xp 00000000 fd:00 1982889                    /usr/lib64/libgfortran.so.3.0.0
7f5a7e0a1000-7f5a7e2a0000 ---p 000f0000 fd:00 1982889                    /usr/lib64/libgfortran.so.3.0.0
7f5a7e2a0000-7f5a7e2a2000 rwxp 000ef000 fd:00 1982889                    /usr/lib64/libgfortran.so.3.0.0
7f5a7e2a2000-7f5a7e2a3000 rwxp 00000000 00:00 0 
7f5a7e2a3000-7f5a7ee37000 r-xp 00000000 fd:00 2007163                    /usr/local/lib/libopenblas_nehalem-r0.3.3.dev.so
7f5a7ee37000-7f5a7f036000 ---p 00b94000 fd:00 2007163                    /usr/local/lib/libopenblas_nehalem-r0.3.3.dev.so
7f5a7f036000-7f5a7f03a000 r-xp 00b93000 fd:00 2007163                    /usr/local/lib/libopenblas_nehalem-r0.3.3.dev.so
7f5a7f03a000-7f5a7f043000 rwxp 00b97000 fd:00 2007163                    /usr/local/lib/libopenblas_nehalem-r0.3.3.dev.so
7f5a7f043000-7f5a7f08e000 rwxp 00000000 00:00 0 
7f5a7f08e000-7f5a7f59f000 r-xp 00000000 fd:00 1977997                    /usr/lib64/atlas/liblapack.so.3.0
7f5a7f59f000-7f5a7f79e000 ---p 00511000 fd:00 1977997                    /usr/lib64/atlas/liblapack.so.3.0
7f5a7f79e000-7f5a7f7a1000 rwxp 00510000 fd:00 1977997                    /usr/lib64/atlas/liblapack.so.3.0
7f5a7f7a1000-7f5a7f8af000 rwxp 00000000 00:00 0 
7f5a7f8af000-7f5a7f8cf000 r-xp 00000000 fd:00 654479                     /lib64/ld-2.12.so
7f5a7f8eb000-7f5a7f9ee000 rwxp 00000000 00:00 0 
7f5a7f9ee000-7f5a7faae000 rwxp 00000000 fd:00 1834587                    /tmp/tpd084480d_170a_48dd_90e7_079622dc00f0in1.cfl
7f5a7faae000-7f5a7fab6000 rwxp 00000000 00:00 0 
7f5a7fabe000-7f5a7facd000 rwxs 00000000 fd:00 1834591                    /tmp/tpd084480d_170a_48dd_90e7_079622dc00f0out1.cfl
7f5a7facd000-7f5a7facf000 rwxp 00000000 00:00 0 
7f5a7facf000-7f5a7fad0000 r-xp 00020000 fd:00 654479                     /lib64/ld-2.12.so
7f5a7fad0000-7f5a7fad1000 rwxp 00021000 fd:00 654479                     /lib64/ld-2.12.so
7f5a7fad1000-7f5a7fad2000 rwxp 00000000 00:00 0 
7ffcea5c0000-7ffcea5d5000 rwxp 00000000 00:00 0                          [stack]
7ffcea5d5000-7ffcea5d6000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
/home/solutions/SWR5.0/R5.0_new/system/CMS160/SW/SW_ENG/DPC/BART/bart nufft -i -d30:30:1 -t  /tmp/tpd084480d_170a_48dd_90e7_079622dc00f0in1 /tmp/tpd084480d_170a_48dd_90e7_079622dc00f0in2  /tmp/tpd084480d_170a_48dd_90e7_079622dc00f0out1: Aborted
Error using bart (line 90)
command exited with an error

I tried to run

lowres_img = bart('nufft -i -d2:2:1 -t', traj_rad2, ksp_sim);

and got the same problem.

I traced all the way to src/noncart/grid.c; it turns out that in the last few lines of the grid_point function,

	for (int w = sti[2]; w <= eni[2]; w++) {

		float frac = fabs(((float)w - pos[2]));
		float dw = 1. * intlookup(kb_size, kb_table, frac / width);
		int indw = ((dims[2] + w) % dims[2]) * dims[1];

	for (int v = sti[1]; v <= eni[1]; v++) {

		float frac = fabs(((float)v - pos[1]));
		float dv = dw * intlookup(kb_size, kb_table, frac / width);
		int indv = (indw + ((dims[1] + v) % dims[1])) * dims[0];

	for (int u = sti[0]; u <= eni[0]; u++) {

		float frac = fabs(((float)u - pos[0]));
		float du = dv * intlookup(kb_size, kb_table, frac / width);
		int indu = (indv + ((dims[0] + u) % dims[0]));

	for (unsigned int c = 0; c < ch; c++) {
		// we are allowed to update real and imaginary part independently which works atomically
		#pragma omp atomic
		__real(dst[indu + c * dims[0] * dims[1] * dims[2]]) += __real(val[c]) * du;
		#pragma omp atomic
		__imag(dst[indu + c * dims[0] * dims[1] * dims[2]]) += __imag(val[c]) * du;
	}}}}

the index computation

indu + c * dims[0] * dims[1] * dims[2]

will have negative values, and I print them out (only pasting some of them here):

08:41:10 >>> grid_point : max = 0, w = 0, v = -240, u = 138, c = 0
08:41:10 >>> grid_point : min = -11364, w = 0, v = -335, u = 156, c = 0
08:41:10 >>> grid_point : max = 0, w = 0, v = -240, u = 138, c = 0
08:41:10 >>> grid_point : min = -11604, w = 0, v = -337, u = 156, c = 0

for lowres_img = bart('nufft -i -d30:30:1 -t', traj_rad2, ksp_sim);

and

08:43:47 >>> grid_point : min = -63, w = 0, v = -95, u = -159, c = 0
08:43:47 >>> grid_point : max = 0, w = 140499651464528, v = 4623989346310029312, u = 4682471242102004196, c = 4512192
08:43:47 >>> grid_point : min = -54, w = 0, v = -94, u = -158, c = 0
08:43:47 >>> grid_point : max = 7, w = 0, v = -104, u = 207, c = 0
08:43:47 >>> grid_point : min = -54, w = 0, v = -103, u = 202, c = 0
08:43:47 >>> grid_point : max = 7, w = 0, v = -600, u = 7, c = 0

for lowres_img = bart('nufft -i -d2:2:1 -t', traj_rad2, ksp_sim);

and

08:41:59 >>> grid_point : max = 638627, w = 0, v = 518, u = 451, c = 0
08:41:59 >>> grid_point : min = 0, w = 43941584, v = 4736723, u = 140384598179344, c = 140384598179072
08:41:59 >>> grid_point : max = 639861, w = 0, v = 519, u = 453, c = 0
08:41:59 >>> grid_point : min = 0, w = 43941584, v = 4736723, u = 140384598179344, c = 140384598179072
08:41:59 >>> grid_point : max = 626681, w = 0, v = 508, u = 825, c = 0
08:41:59 >>> grid_point : min = 0, w = 47805136, v = 4736723, u = 140384587689488, c = 140384587689216
08:41:59 >>> grid_point : max = 625451, w = 0, v = 507, u = 827, c = 0
08:41:59 >>> grid_point : min = 0, w = 47805136, v = 4736723, u = 140384587689488, c = 140384587689216

for lowres_img = bart('nufft -i -t', traj_rad2, ksp_sim);
things worked out for nufft without '-d' specified

So I assume that somewhere the dimension calculation is not correct; please help me out, thanks a lot.
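As a side note, the wrap-around expression quoted above stays negative once v (or u, w) drops below -dims[N], because C's % operator truncates toward zero. A standalone check (illustrative values only, nothing BART-specific):

#include <stdio.h>

int main(void)
{
	int dim = 60;     /* illustrative grid dimension */
	int v = -335;     /* a value like those printed in the debug output above */

	/* (dim + v) % dim is still negative when v < -dim, so the resulting
	 * index goes out of bounds, matching the heap corruption seen above. */
	printf("%d\n", (dim + v) % dim);   /* prints -35 */

	return 0;
}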

Bug : deallocate_mem_cfl_name will not delete memory correctly

mmiocc.cc :

_Bool deallocate_mem_cfl_name(const char* name)
{
     return mem_handler.remove(name);
}

We have two overloads of remove:

	  template <typename T>
	  bool remove(T* ptr)
	       {
		    return remove_(PtrDataEqual(ptr));
	       }
	  bool remove(const std::string& name)
	       {
		    return remove_(NameEqual(name));
	       }

But char* will be treated as T* instead of std::string&.
Either char* should be eliminated from T* by something like enable_if2,
or deallocate_mem_cfl_name should pass a std::string.

PS: I'm working on CentOS6, using gcc 4.9.2

gpu_wrapper does not honor selected GPU

Hi,

When, for example, using pics -G1 -g ..., the following wrapping code will still try to initialize CUDA on other GPUs, not honoring my selected GPU:

bart/src/num/ops.c

Lines 1140 to 1143 in dca1cd4

int nr_cuda_devices = MIN(cuda_devices(), MAX_CUDA_DEVICES);
int gpun = omp_get_thread_num() % nr_cuda_devices;
cuda_init(gpun);

This is unfortunate because it makes pics unusable on the GPU when CUDA is configured to only allow a single process per GPU.
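A minimal sketch of the behavior being requested (pick_gpu and requested_gpu are hypothetical names, not BART code): honor an explicitly requested device and only fall back to round-robin over all visible GPUs when none was requested, passing the result to cuda_init() as in the snippet quoted above.

#include <omp.h>

/* requested_gpu < 0 means "not specified"; otherwise use exactly that device
 * on every thread instead of omp_get_thread_num() % nr_cuda_devices. */
static int pick_gpu(int requested_gpu, int nr_cuda_devices)
{
	return (requested_gpu >= 0)
		? requested_gpu
		: omp_get_thread_num() % nr_cuda_devices;
}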

Segmentation fault on WSL (Debian and Ubuntu)

Hi,
Thanks for implementing bart, it's an excellent tool.
I tried to compile (with default options, except turning on DEBUG=1 in the Makefile after encountering the problem) and run bart on both Debian (4.4.0) and Ubuntu (4.4.0) on WSL.
Both versions compile successfully on WSL. Running the benchmark and other commands on version 0.4.00 seems fine, but version 0.4.03 causes a segmentation fault.

Bart 0.4.03
simon@DESKTOP-6P96G2J:~$ ./bart-0.4.03/bart bench
Segmentation fault (core dumped)

Bart 0.4.00
simon@DESKTOP-6P96G2J:~$ ./bart-0.4.00/bart bench
add (md_zaxpy) | 0.0070 0.0060 0.0059 0.0084 0.0061 | Avg: 0.0067 Max: 0.0084 Min: 0.0059
add (md_zaxpy), contiguous | 0.0027 0.0028 0.0028 0.0028 0.0028 | Avg: 0.0028 Max: 0.0028 Min: 0.0027
add (for loop) | 0.0249 0.0247 0.0249 0.0248 0.0248 | Avg: 0.0248 Max: 0.0249 Min: 0.0247
sum (md_zaxpy) | 0.0983 0.1067 0.1105 0.1083 0.1087 | Avg: 0.1065 Max: 0.1105 Min: 0.0983
sum (md_zaxpy), contiguous | 0.0381 0.0393 0.0394 0.0402 0.0402 | Avg: 0.0394 Max: 0.0402 Min: 0.0381
sum (for loop) | 0.0076 0.0079 0.0083 0.0086 0.0077 | Avg: 0.0080 Max: 0.0086 Min: 0.0076
complex transpose | 0.0151 0.0090 0.0090 0.0089 0.0090 | Avg: 0.0102 Max: 0.0151 Min: 0.0089
complex resize | 0.0012 0.0012 0.0028 0.0016 0.0012 | Avg: 0.0016 Max: 0.0028 Min: 0.0012
complex matrix multiply | 0.0256 0.0246 0.0249 0.0247 0.0246 | Avg: 0.0249 Max: 0.0256 Min: 0.0246
batch matrix multiply 1 | 0.0045 0.0061 0.0043 0.0044 0.0043 | Avg: 0.0047 Max: 0.0061 Min: 0.0043
batch matrix multiply 2 | 0.0319 0.0249 0.0246 0.0246 0.0249 | Avg: 0.0262 Max: 0.0319 Min: 0.0246
tall matrix multiply 1 | 0.0659 0.0654 0.0653 0.0653 0.0653 | Avg: 0.0654 Max: 0.0659 Min: 0.0653
tall matrix multiply 2 | 0.0138 0.0135 0.0150 0.0167 0.0151 | Avg: 0.0148 Max: 0.0167 Min: 0.0135
complex dot product | 0.0103 0.0103 0.0103 0.0103 0.0104 | Avg: 0.0103 Max: 0.0104 Min: 0.0103
complex dot product | 0.0103 0.0103 0.0103 0.0102 0.0103 | Avg: 0.0103 Max: 0.0103 Min: 0.0102
real complex dot product | 0.0012 0.0011 0.0011 0.0010 0.0011 | Avg: 0.0011 Max: 0.0012 Min: 0.0010
l2 norm | 0.0008 0.0009 0.0009 0.0008 0.0009 | Avg: 0.0008 Max: 0.0009 Min: 0.0008
l1 norm | 0.0026 0.0098 0.0022 0.0027 0.0022 | Avg: 0.0039 Max: 0.0098 Min: 0.0022
copy 1 | 0.0319 0.0312 0.0316 0.0327 0.0314 | Avg: 0.0318 Max: 0.0327 Min: 0.0312
copy 2 | 0.0328 0.0331 0.0328 0.0325 0.0332 | Avg: 0.0329 Max: 0.0332 Min: 0.0325
wavelet soft thresh | 0.0206 0.3222 0.0209 0.0208 0.0217 | Avg: 0.0812 Max: 0.3222 Min: 0.0206
wavelet soft thresh | 0.0267 0.0215 0.0281 0.0217 0.1693 | Avg: 0.0534 Max: 0.1693 Min: 0.0215

simon@DESKTOP-6P96G2J:~$ ./bart-0.4.03/bart zeros 2 256 256 m
Segmentation fault (core dumped)

Compile in debug mode and debug using gdb with bt
simon@DESKTOP-6P96G2J:~/bart-0.4.03$ gdb --arg ./bart bench
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./bart...done.
(gdb) run
Starting program: /home/simon/bart-0.4.03/bart bench

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff9f90700 (LWP 2861)]
[New Thread 0x7ffff7780700 (LWP 2862)]
[New Thread 0x7ffff4f70700 (LWP 2863)]
[New Thread 0x7ffff2760700 (LWP 2864)]
[New Thread 0x7fffeff50700 (LWP 2865)]
[New Thread 0x7fffed740700 (LWP 2866)]
[New Thread 0x7fffeaf30700 (LWP 2867)]
[New Thread 0x7fffe8720700 (LWP 2868)]
[New Thread 0x7fffe5f10700 (LWP 2869)]
[New Thread 0x7fffe3700700 (LWP 2870)]
[New Thread 0x7fffe0ef0700 (LWP 2871)]
[New Thread 0x7fffdcdd0700 (LWP 2872)]
[New Thread 0x7fffdc5c0700 (LWP 2873)]
[New Thread 0x7fffdbdb0700 (LWP 2874)]
[New Thread 0x7fffdb5a0700 (LWP 2875)]
[New Thread 0x7fffdad90700 (LWP 2876)]
[New Thread 0x7fffda580700 (LWP 2877)]
[New Thread 0x7fffd9d70700 (LWP 2878)]
[New Thread 0x7fffd9560700 (LWP 2879)]
[New Thread 0x7fffd8d50700 (LWP 2880)]
[New Thread 0x7fffd8540700 (LWP 2881)]
[New Thread 0x7fffd7d30700 (LWP 2882)]

Thread 19 "bart" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd9d70700 (LWP 2878)]
0x00007ffffffed848 in ?? ()
(gdb)
(gdb) bt
#0 0x00007ffffffed848 in ?? ()
#1 0x00000000080783dd in md_nary (fun=0x7ffffffed848, ptr=0x7fffd9d6fb80, str=0x7ffffffed630, dim=0x7ffffffed530,
D=, C=3) at /home/simon/bart-0.4.03/src/num/multind.c:60
#2 md_nary (fun=0x7ffffffed848, ptr=0x7fffd9d6fba0, str=0x7ffffffed630, dim=0x7ffffffed530, D=1, C=3)
at /home/simon/bart-0.4.03/src/num/multind.c:71
#3 md_nary (C=3, D=, dim=0x7ffffffed530, str=0x7ffffffed630, ptr=0x7fffd9d6fd90,
fun=0x7ffffffed848) at /home/simon/bart-0.4.03/src/num/multind.c:71
#4 0x000000000807868a in md_nary (fun=, ptr=0x7fffd9d6fdb0, str=,
dim=, D=, C=) at /home/simon/bart-0.4.03/src/num/multind.c:71
#5 md_parallel_nary._omp_fn.0 () at /home/simon/bart-0.4.03/src/num/multind.c:142
#6 0x00007ffffdda66b6 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#7 0x00007ffffdb77494 in start_thread (arg=0x7fffd9d70700) at pthread_create.c:333
#8 0x00007ffffd8b8acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
(gdb)

I also tried to reproduce the results with the zeros command; gdb bt returns:
Thread 1 "bart" received signal SIGSEGV, Segmentation fault.
0x00007ffffffedc68 in ?? ()
(gdb) bt
#0 0x00007ffffffedc68 in ?? ()
#1 0x000000000807838c in md_nary (C=, D=, dim=0x7ffffffedbd0, str=,
ptr=0x7ffffffedb80, fun=0x7ffffffedc68) at /home/simon/bart-0.4.03/src/num/multind.c:60
#2 0x0000000008078a73 in md_parallel_nary (C=C@entry=1, D=D@entry=0, dim=dim@entry=0x7ffffffedbd0,
flags=flags@entry=0, str=str@entry=0x7ffffffedb70, ptr=ptr@entry=0x7ffffffedb80, fun=0x7ffffffedc68)
at /home/simon/bart-0.4.03/src/num/multind.c:87
#3 0x0000000008088d17 in optimized_nop (N=N@entry=1, io=io@entry=1, D=D@entry=2, dim=dim@entry=0x7ffffffedd20,
nstr=nstr@entry=0x7ffffffedd48, nptr=nptr@entry=0x7ffffffedd50, sizes=0x7ffffffedd58)
at /home/simon/bart-0.4.03/src/num/optimize.c:618
#4 0x000000000807988b in md_clear2 (D=2, dim=, str=str@entry=0x7ffffffeddd0,
ptr=ptr@entry=0x7fffff710000) at /home/simon/bart-0.4.03/src/num/multind.c:569
#5 0x0000000008079952 in md_clear (D=D@entry=2, dim=dim@entry=0x7ffffffede10, ptr=ptr@entry=0x7fffff710000,
size=size@entry=8) at /home/simon/bart-0.4.03/src/num/multind.c:610
#6 0x00000000080114ce in main_zeros (argc=, argv=) at src/zeros.c:48
#7 0x000000000800fd74 in main_bart (argc=, argv=)
at /home/simon/bart-0.4.03/src/bart.c:104
#8 0x00007ffffd7f02e1 in __libc_start_main (main=0x800fa90

, argc=6, argv=0x7ffffffedfd8,
init=, fini=, rtld_fini=, stack_end=0x7ffffffedfc8)
at ../csu/libc-start.c:291
#9 0x000000000800faea in _start ()

Am I missing anything?

Thanks in advance.
Simon

test-ecalib sometimes fails on mac

Unclear what is going on, but the output from ecalib/pocsense has some incorrect boundary issues that cause it to fail the NRMSE test.

(image attached)

FFTW Wisdom

Hi Team,

I have a proposal to add an FFTW wisdom feature to BART. In combination with suggestion #151, it can further reduce the pipeline processing time for l1-wavelet GPU recon, using the wisdom feature of the FFTW library: http://www.fftw.org/fftw3_doc/Words-of-Wisdom_002dSaving-Plans.html#Words-of-Wisdom_002dSaving-Plans. It serves as a cache to store previously optimized plans.

In order to do this, several minor modifications were implemented and trial-run on bart-0.4.03.
A) In file ./src/num/fft.h, the following line is added after #define __FFT_H:

#define FFTW_DEFAULT_WISDOM_CACHE_FILE "./fftw_wisdom_cache"

Explanation: this is the default location of the FFTW wisdom file.

B) In the file ./src/num/init.c,
At the end of the function void num_init(void), the following code is added:

// Import previously measured fftw plans from environmental path, if any, or default cache path
char* env_wisdom_file = getenv("FFTW_WISDOM_FILE");
char *wisdom_file = (NULL != env_wisdom_file)? env_wisdom_file: FFTW_DEFAULT_WISDOM_CACHE_FILE;
char* env_disable_wisdom_cache = getenv("FFTW_DISABLE_WISDOM_CACHE");
int fftw_disable_wisdom_cache = (NULL != env_disable_wisdom_cache)? atoi(env_disable_wisdom_cache): 0;
if (0 == fftw_disable_wisdom_cache)
{
	if (0 == fftwf_import_wisdom_from_filename(wisdom_file))
	{
		debug_printf(DP_DEBUG1, "fftw wisdcom file import error\n");
	}
}

Explanation: the fftw wisdom file is determined by the environment variable FFTW_WISDOM_FILE. If this environment variable is not set, the file specified in (A) is used. For use cases where a wisdom file is unfavorable, the user has the option to turn it off explicitly by setting the FFTW_DISABLE_WISDOM_CACHE environment variable to 1. Afterwards, fftwf_import_wisdom_from_filename imports the wisdom file. Note: if the file is not available on the file system, or any error is encountered while reading or importing the wisdom file, fftwf_import_wisdom_from_filename returns 0 (zero).

(C) In the file ./src/num/fft.c, after "plan->fftw = fft_fftwf_plan(D, dimensions, flags, strides, dst, strides, src, backwards, true);", the following code is inserted:

// Export previously measured fftw plans from environmental path, if any, or default cache path
char* env_wisdom_file = getenv("FFTW_WISDOM_FILE");
char *wisdom_file = (NULL != env_wisdom_file)? env_wisdom_file: FFTW_DEFAULT_WISDOM_CACHE_FILE;
char* env_disable_wisdom_cache = getenv("FFTW_DISABLE_WISDOM_CACHE");
int fftw_disable_wisdom_cache = (NULL != env_disable_wisdom_cache)? atoi(env_disable_wisdom_cache): 0;
if (0 == fftw_disable_wisdom_cache)
{
	if (0 == fftwf_export_wisdom_to_filename(wisdom_file))
	{
		debug_printf(DP_DEBUG1, "fftw wisdom file export error\n");
	}
}

Explanation: similar to the wisdom import, with the same options, the wisdom is written back to the same file on the file system.

Analysis:
Hardware platform: Intel 8700k cpu with 64GB ram, NVidia 1080 Ti
Bart command: bart pics -l1 -g -S -r 0.1 ./ksp_cc ./ecalib_map ./image
Benchmark tool: NVidia nvvp
Data: 256 x 256 x 90 x 7 (compressed from 16 ch)
Acceleration: 3.74x with poisson sampling

  1. Without fftw wisdom file or 1st time running bart pics without fftw wisdom file
    screenshot from 2018-09-06 12-14-27

The whole process takes ~8.5 s. The CPU thread spends around ~5.5 s in fftw plan creation.

  2. Enabling fftw wisdom and running bart for the 2nd time
    screenshot from 2018-09-06 12-17-48

Without re-creating fftw plans already known from previous runs, the CPU time is reduced from ~5.5 s to ~3.1 s (about a 43% reduction in this particular case).

  3. Enabling fftw wisdom and executing bart directly on the command prompt, without nvvp overhead, the whole process actually takes only ~4.69 s.

Hope this suggestion is useful for the BART community.

Cheers,
Simon

pyBART transposes data

@Takishima

The reading/writing of pyBART seems to transpose data. Here is an example where the phantom ends up being transposed:

import pyBART as bart
import matplotlib.pyplot as plt
im0 = bart.phantom('-x 128 im0.mem')
im1 = bart.load_cfl('im0.mem')
im2 = im1.copy()
bart.register_memory('im2.mem', im2)
print(bart.nrmse('im0.mem im2.mem')) # not zero
im3 = bart.load_cfl('im2.mem')
plt.figure()
plt.imshow(abs(im1.squeeze()))
plt.figure()
plt.imshow(abs(im3.squeeze()))

(images attached)

Can't compile with CUDA alone on Mac OS X

Mac OS X Yosemite uses Clang, which doesn't support OpenMP.

You can install separately GCC, which supports OpenMP, but GCC is not compatible with nvcc on Yosemite.

¯\_(ツ)_/¯

So CUDA compilation works on OS X, but some of the GPU source assumes the presence of OpenMP, so the make dies on src/num/gpuops.c.

Would it be possible to unravel the CUDA and OpenMP dependencies so that CUDA does not require OpenMP?

Great package, by the way. I hope to contribute some of my code to it at some point. Just have to plug it into the mmio concept.

Can bart support for non-parallel MRI reconstruction?

There are many examples for parallel imaging and compressed sensing using bart. Can bart support non-parallel MRI data, i.e. single-coil MRI data? These data don't have a coil sensitivity map, so this would cause errors.

Thanks!

estvar always returns 0.00000

As input I have data of shape (num_slices, height, width, num_coils) made by writecfl (coming from the fastmri dataset). However, when I run bart estvar -k1 -r <number_of_central_lines> I always get Estimated noise variance: 0.00000. Is there a test image somewhere which returns something positive?

Segfault in cdf97

Hey,

starting from commit 3c1c2eb and including current master, the following code segfaults for me:

bart phantom -x 256 p
bart cdf97 3 p pw

Strangely enough, it does not segfault for 126, 127, or 128, although the result is the original phantom instead of its wavelet decomposition:

x=128: (image attached)

Previous commits do not show this problem.

md_copy2 improvement

Hi Team,

Using BART to recon 3D MRI data with dimensions 256 x 256 x 90, 16-channel (compressed to 7 virtual channels) data. Using nvvp to analyze the performance of the "bart pics" command for the recon, on my system (Debian on Intel 8700k cpu with 64GB RAM & GTX 1080 Ti) more than 90% of the time is spent in memory copy and the kernel occupancy is less than 10%.

E.g. the following l1-wavelet 3D recon: "bart pics -l1 -i 30 -S -r 0.1 -g -G 0 ./ksp_cc ./map_cc ./img_cc" takes ~26 s (5.8 s before the GPU recon (up to sense_init), 20 s for the GPU recon). I traced the source code and figured out that the performance is bounded by the md_copy2 function, which performs a cudaMemcpy2D only when (ND - Skip == 1).

After studying the FIXME comment about memory alignment in md_copy2, we began rewriting the code for the use_gpu==1 case. When ND - Skip == 0, a single cudaMemcpy copies the data in one go. For the case ND - Skip >= 1, by not using the NESTED macro and the corresponding nd_ function and instead calling cudaMemcpy2D from an explicit nested loop (with some modifications to the original code), we were able to successfully recon and cut the recon time from ~26 s down to 9.5 s (i.e. 5.8 s before the GPU recon and 3.7 s for the recon); memcpy is reduced to less than 25% and kernel occupancy goes up to 75%. (See the sketch below.)
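Purely as an illustration of the kind of loop described above (hypothetical helper, not the actual patch; the names and the 3-D layout are assumptions): the two innermost dimensions are handled by a single cudaMemcpy2D call per slice, and the remaining dimension is iterated explicitly.

#include <cuda_runtime.h>

/* Strided device-to-device copy of a 3-D block: one cudaMemcpy2D per
 * outermost slice instead of descending element by element. */
static void copy3_strided(void* dst, size_t dst_pitch, size_t dst_slice,
			  const void* src, size_t src_pitch, size_t src_slice,
			  size_t width_bytes, size_t height, size_t depth)
{
	for (size_t k = 0; k < depth; k++)
		cudaMemcpy2D((char*)dst + k * dst_slice, dst_pitch,
			     (const char*)src + k * src_slice, src_pitch,
			     width_bytes, height, cudaMemcpyDeviceToDevice);
}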
screen shot 2018-09-03 at 1 07 37 am

A closer look at one iteration looks like this (the kernel looks more busy ;)):
screen shot 2018-09-03 at 1 46 46 am

The modified version seems fine for the workshop example and some other demo code as well.

Let me touch up the code and see if it can be combined with the NESTED macro, and I will share it with you later. After this, I am thinking of making some improvements on:

  • implement a CUDA kernel to accelerate multi-dimensional strided device-to-device memory copy, i.e. cudaMemcpyNDPitched
  • H->D memcpy of a small matrix with very long pitch, by extracting the small matrix in host memory to another host buffer, then transferring the tightly packed matrix to the device, and then using cudaMemcpyNDPitched to perform the D->D pitched ND memory copy. Similar for D->H as well.
  • make the sense_init function a little bit faster (although it's quite fast already)

Cheers,
Simon

Cygwin compile fails

Hi, just tried to compile BART v0.4.01 on Cygwin to integrate with Matlab on a colleagues computer (Windows 7). I installed cygwin 64-bit and the packages that are mentioned in the README file.

First, I got an error that lapacke.h is missing, but I got past that error by downloading the LAPACKE (C integration) header files from the LAPACK website: http://www.netlib.org/lapack/

After this the file execinfo.h is missing. However, this is not provided at all in Cygwin due to this function being basically incompatible with Windows.

Is there a better way of compiling on Windows?

Error log from compilation below:

gcc -Wall -Wextra -MMD -MF /cygdrive/c/Users/johannes/code/csf-resp-recon/bart/src/misc/.debug.d -I/cygdrive/c/Users/johannes/code/csf-resp-recon/bart/src/ -I/usr//include/ -I/usr//include -DFFTWTHREADS -DMAIN_LIST="avg, bench, bitmask, bpsense, cabs, caldir, calmat, carg, casorati, cc, ccapply, cdf97, circshift, conj, conv, copy, cpyphs, creal, crop, delta, ecalib, ecaltwo, estdelay, estdims, estshift, estvar, extract, fakeksp, fft, fftmod, fftshift, filter, flatten, flip, fmac, homodyne, invert, itsense, join, lrmatrix, mandelbrot, mip, nlinv, noise, normalize, nrmse, nufft, ones, pattern, phantom, pics, pocsense, poisson, repmat, reshape, resize, rof, rss, sake, saxpy, scale, sdot, show, slice, spow, sqpics, squeeze, svd, threshold, toimg, traj, transpose, twixread, version, walsh, wave, wavelet, zeros, zexpj, ()" -include src/main.h -O3 -ffast-math -Wmissing-prototypes -std=gnu11 -I/cygdrive/c/Users/johannes/code/csf-resp-recon/bart/src/ -I../lapacke-include/ -fopenmp -c -o /cygdrive/c/Users/johannes/code/csf-resp-recon/bart/src/misc/debug.o /cygdrive/c/Users/johannes/code/csf-resp-recon/bart/src/misc/debug.c
/cygdrive/c/Users/johannes/code/csf-resp-recon/bart/src/misc/debug.c:22:22: fatal error: execinfo.h: No such file or directory
#include <execinfo.h>
^
compilation terminated.
make[1]: *** [Makefile:373: /cygdrive/c/Users/johannes/code/csf-resp-recon/bart/src/misc/debug.o] Error 1
make[1]: Leaving directory '/cygdrive/c/Users/johannes/code/csf-resp-recon/bart'
make: *** [Makefile:194: default] Error 2

Rename CC

Is it possible to rename the cc binary? If I install it in my PATH, it shadows the soft link to my C compiler (which CMake looks for).

error: unknown type name 'uint'

I pulled the latest version of bart from git today and recompiled on both my CentOS Linux and macOS systems. It compiled fine on the Linux system, but failed on my mac due to the following error:

/Users/jwoods/Documents/MATLAB/bart/src/num/ops.c:728:7: error: unknown type name 'uint'
for (uint i = 0; i < D + 1; i++)
^~~~

I fixed this by simply changing "uint" to "unsigned int".

I'm using the gcc-mp-6 compiler and haven't had this issue before. Just thought I'd let you know!

linop API consistency

The linop API should be more consistent - some linop constructors start with linop_XXX_create and some don't (e.g. src/linops/rvc.h). A more consistent and searchable alternative is linop_create_XXX

fPIC flag missing in CMakeLists.txt

The standard Makefile includes the -fPIC compiler flag, so that bart can be used as an external library (e.g. in gadgetron). However, this flag is missing in CMakeLists.txt. Possible solution: Append the following lines somewhere after set(_compile_flags_c/cxx):

list(APPEND _compile_flags_c "-fPIC")
list(APPEND _compile_flags_cxx "-fPIC")

Missing 'end' in /bart/matlab/bart.m @ branch v0.4.04

Hey,

just downloaded the last update (v0.4.04) and there is a missing 'end' in /bart/matlab/bart.m at line 33.

It does not exist in the master branch, but I used the download link on the gh-pages site and got this version.

Best regards
Max
