alexeedm / pytorch-fortran

Pytorch bindings for Fortran

License: MIT License

Python 26.52% Shell 16.59% Pawn 0.41% CMake 17.41% C++ 30.02% C 6.02% Gnuplot 3.02%

deep-learning fortran pytorch

pytorch-fortran's Issues

Feature Request: Include PyTorch-Geometric in pytorch-fortran

Feature Request

Hi @alexeedm,

For my studies, I added the PyTorch package "PyTorch-Geometric" [1] to the torchfortran bindings so that I can load models containing torch-geometric modules and run inference on them.
This works well and was not too complicated. However, I realize it would add more dependencies to this repository, which is probably not desired.

My question: is there an elegant way to include my solution, so that people with the same goal don't have to work it all out again, or can at least see this sample implementation?

[1] https://github.com/pyg-team/pytorch_geometric

How I included the package

Build the C++ backends of TorchSparse [2] and TorchScatter [3] (subpackages of PyG) in ./pyg/.
Add both to CMAKE_PREFIX_PATH.

[2] https://github.com/rusty1s/pytorch_sparse
[3] https://github.com/rusty1s/pytorch_scatter

In `make_gnu.sh`

PYG="./pyg"
CMAKE_PREFIX_PATH=$CMAKE_PREFIX_PATH:"$PYG/pytorch_scatter/install/share/cmake:$PYG/pytorch_sparse/install/share/cmake"

In `src/proxy_lib/torch_proxy.cpp`

#include <torchscatter/scatter.h>
#include <torchsparse/sparse.h>

In `src/proxy_lib/CMakeLists.txt`

find_package(TorchScatter REQUIRED)
find_package(TorchSparse REQUIRED)
find_package(Python3 COMPONENTS Development)

target_link_libraries(pytorch_proxy PRIVATE TorchScatter::TorchScatter)
target_link_libraries(pytorch_proxy PRIVATE TorchSparse::TorchSparse)

failed to build gnu target

Steps:

cd container
python container.py gnu --pytorch-tag 1.13.1-cuda11.6-cudnn8-devel
docker build -t pytorch-fortran:v1.13.1-devel .
docker run --name pytorch-fortran -d pytorch-fortran:v1.13.1-devel sleep 1000000
docker exec -it pytorch-fortran bash
apt update -y && apt install git -y
git clone https://github.com/alexeedm/pytorch-fortran.git
cd pytorch-fortran
./make_gnu.sh

result

Did I do something wrong? I can't get the make_gnu.sh script to run successfully as described in README.md.

Citing pytorch-fortran

Hi Dmitry,

I am a Ph.D. student at UC Santa Cruz and Los Alamos National Laboratory. I specialize in ML-based turbulence modeling within stellar explosions. This repo has been incredibly helpful for the last chapter of my thesis, which involved integrating PyTorch models into a legacy Fortran code for 1D supernovae (pikarpov-lanl/COLLAPSO1D). We are writing a paper for the Astrophysical Journal on this subject, and I would like to give you proper credit for the pytorch-fortran repo. Do you have any preferences on how to cite your work?

In addition, I wrote a fairly general interface for integrating your ML wrapper into any legacy F90 code. I think it would be highly beneficial for the astrophysical community if this pipeline were published separately, e.g., in the Journal of Open Source Software. Please let me know your thoughts and whether you would want to collaborate. Feel free to send me an email ([email protected]).

torch_tensor_to_array: pointer needs to be unassociated!

Thank you @alexeedm for this nice repository.

I've built your code with GCC 11.3.0, without the container, using the make_gnu.sh script on an AMD EPYC 7352 CPU.

I used pip rather than conda, but I assume that shouldn't be an issue here.

Environment

GCC 11.3.0
CUDA 11.8.0
NCCL 2.15.5
Python 3.10.4
PyTorch 2.0

Build script modifications to run with pip

-PYPATH=$(find /opt/conda/lib/ -maxdepth 1 -name 'python?.*' -type d)
+VIRTUAL_ENV="path/to/your/virtualenv/myenv"
+PYPATH=${VIRTUAL_ENV}/lib/python3.10
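
To avoid hard-coding `python3.10`, the `find` pattern from the original script can be pointed at the virtualenv instead of `/opt/conda`. This is a small sketch, assuming the standard virtualenv layout (`lib/pythonX.Y` under the environment root); the function name `find_pypath` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lib/pythonX.Y directory of a virtualenv,
# reusing the find pattern from the original build script.
find_pypath() {
    local venv="$1"
    # sort + head keeps the result deterministic if several versions exist
    find "$venv/lib" -maxdepth 1 -name 'python?.*' -type d | sort | head -n 1
}

# Usage sketch (VIRTUAL_ENV is set by `source myenv/bin/activate`):
#   PYPATH=$(find_pypath "$VIRTUAL_ENV")
```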

Known Issue with gnu compiler

As mentioned by @ch21d012 in #5, adding the proposed `target` attribute to `value` fixes the error.
(I also tried the `pointer` attribute instead, but that fails to compile.)
So I made this edit in the template file ./src/f90_bindings/torch_ftn.f90.templ:

-        {dtype.fortran_id} ({dtype.fortran_prec}), intent(in)    :: value
+        {dtype.fortran_id} ({dtype.fortran_prec}), intent(in), target    :: value

Error

I appear to get an error similar to the one @ch21d012 mentioned in #6.

> ./install/bin/resnet_forward  ../examples/resnet_forward/traced_model.pt                                                                                                 
terminate called after throwing an instance of 'std::runtime_error'                                                                         
  what():  torch_tensor_to_array: pointer needs to be unassociated!                                                                         
                                                                                                                                            
Program received signal SIGABRT: Process abort signal.                                                                                      
                                                                                                                                            
Backtrace for this error:                                                                                                                   
#0  0x2af1b94ff3ff in ???                                                                                                                   
#1  0x2af1b94ff387 in ???                                                                                                                   
#2  0x2af1b9500a77 in ???                                                                                                                   
#3  0x2af1ff0e3879 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv                                                                          
        at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95                                                                              
#4  0x2af1ff0ef2e9 in _ZN10__cxxabiv111__terminateEPFvvE                                                                                    
        at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48                                                                            
#5  0x2af1ff0ef354 in _ZSt9terminatev                                                                                                       
        at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:58
#6  0x2af1ff0ef5a8 in __cxa_throw
        at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:95                                                                                
#7  0x2af1b8d382c0 in torch_throw_cpp 
        at pytorch-fortran/src/proxy_lib/torch_proxy.cpp:169                          
#8  0x2af1b8d20b5d in __torch_ftn_MOD_torch_tensor_to_2_fp32                                                                                
        at pytorch-fortran/gnu/build_fortproxy/torch_ftn.f90:729                      
#9  0x4014b0 in resnet_forward                                                                                                              
        at pytorch-fortran/examples/resnet_forward/resnet_forward.f90:53              
#10  0x40115c in main                                                                                                                       
        at pytorch-fortran/examples/resnet_forward/resnet_forward.f90:23              
Aborted

This happens for the resnet_forward, polynomial and python_training examples every time an array pointer is passed to out_tensor%to_array(output).

My (probably dirty) workaround

Calling nullify(output) somewhere before call out_tensor%to_array(output) solves the problem: the tensor can then be converted to an array and printed.

I just wanted to leave this here in case someone else is struggling.
Let me know if there are other fixes that I missed that would also let me compile the pytorch-fortran bindings with the GNU GCC compiler.

A way to build the program without using the docker container?

Hi,
thank you very much for this nice code!
I am very interested in using it independently of Docker. Are there any build instructions for that?
I tried it on my own and the build itself seemed to succeed. I could also run the "resnet" and "polynomial" examples. However, running the "python_training" example fails with a SIGABRT error:

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0 0x7fb132490700 in ???
#1 0x7fb13248f8a5 in ???
#2 0x7fb1322c008f in ???
#3 0x7fb1322c000b in ???
#4 0x7fb13229f858 in ???
#5 0x7fb123960359 in _ZN10__cxxabiv111__terminateEPFvvE
at /opt/conda/conda-bld/gcc-compiler_1654084175708/work/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:48
#6 0x7fb1239603c4 in _ZSt9terminatev
at /opt/conda/conda-bld/gcc-compiler_1654084175708/work/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:58
#7 0x7fb123960657 in __cxa_throw
at /opt/conda/conda-bld/gcc-compiler_1654084175708/work/gcc/libstdc++-v3/libsupc++/eh_throw.cc:95
#8 0x7fb13225713a in _ZN8pybind117module_6importEPKc
at /home/pia/anaconda3/lib/python3.9/site-packages/torch/include/pybind11/pybind11.h:1197
#9 0x7fb13225713a in PyModule
at /home/workspace/pytorch-fortran/src/proxy_lib/torch_proxy.cpp:152
...

It seems to point to a problem with the "torch_pymodule" and the libraries. I would be very happy to get any helpful remarks, especially on which prerequisites are required for building it. Maybe I made a silly mistake or forgot to install something.
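
The backtrace shows the exception leaving `pybind11::module_::import`, which throws when the embedded interpreter fails to import the requested Python module. One way to narrow this down is to try the same import with the plain `python3` interpreter first. The sketch below assumes that `python3` on the `PATH` is the same installation the bindings were built against; the function name `check_py_import` and the module and directory in the usage comment are placeholders, not names from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check whether python3 can import MODULE when DIR is
# prepended to PYTHONPATH. Returns 0 on success, non-zero otherwise.
check_py_import() {
    local module="$1" dir="$2"
    PYTHONPATH="$dir${PYTHONPATH:+:$PYTHONPATH}" \
        python3 -c 'import importlib, sys; importlib.import_module(sys.argv[1])' "$module"
}

# Usage sketch (placeholder module and directory):
#   check_py_import my_training_module ./path/to/python/sources \
#       && echo "import works" \
#       || echo "import fails: check PYTHONPATH and installed packages"
```

If the import already fails here, the Python traceback it prints is usually more informative than the SIGABRT from the Fortran executable.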

Best regards

implicit none missing in example

pytorch-fortran/examples/resnet_forward/resnet_forward.f90

Line 22 in cd4334a

program resnet_forward

The example program uses implicit typing. It might lead to some surprises if someone tries to extend the example without noticing.

add topic

I suggest adding the topic pytorch in the About section at https://github.com/alexeedm/pytorch-fortran.

Issue generating singularity container

Hi,

I am attempting to utilize your example code in an HPC environment. As such, I would prefer to generate a singularity container during the build phase as I don't believe I have the permissions (or access) for Docker. In container.py the parser arguments appear to allow selection of docker or singularity containers (line 126). However, using the flag --format singularity yields an error. If you have a solution to this issue I'd much appreciate it. Thank you!

how to use the wrapper

Hi @alexeedm, this is Varun. I am working on using a PyTorch model in Fortran and would like to use your wrapper, but I have some questions about how to use it. I have a trained PyTorch model file, e.g. model.pth, and I want to get the output for a given input held in a Fortran array. Can you tell me how to do this? Also, I am using an in-house CFD code written in Fortran; can this wrapper be integrated into my code? Thanks in advance.

Some questions about the future plans of pytorch-fortran

Hi @alexeedm, I am LuChen, a postgraduate student majoring in Software Engineering at Tongji University, China. My current research interests are around climate AI. Since I couldn't find your contact information, I am creating an issue here.

As you might expect, we also ran into the problem of a missing AI ecosystem during our research. So over the past few months I developed a tool, Fortran-Torch-Adapter, from scratch and used it in my research. (🤣 Yes, it is based on exactly the same idea as pytorch-fortran: calling a TorchScript model directly from Fortran through C++/Fortran interoperability.) I was also working on a paper to introduce this new tool when I found your repo yesterday. It seems that Nvidia was working on this even earlier. What a coincidence! 😂😂

Given that, I would like to know what the future plans for pytorch-fortran are. On the project side, Fortran-Torch-Adapter is still in its infancy, and I would love to see a more powerful and well-organized tool like pytorch-fortran take over; maybe I could also make some small contributions to this wonderful project. On the paper side, I don't know whether Nvidia plans to apply for a patent or publish a paper on this. Since I am currently preparing a paper on it, you are very welcome to join as a co-author or in any other way, if you are interested.

Everything is still open at this point. I just want to hear your thoughts.