scientific-computing / fkb

A two-way deep learning bridge between Keras and Fortran

License: MIT License

CMake 3.48% Python 23.60% Jupyter Notebook 30.61% Shell 0.65% Fortran 41.65%
deep-learning keras fortran

fkb's Introduction

The Fortran-Keras Bridge (FKB)

This two-way bridge connects environments where deep learning resources are plentiful with those where they are scarce. You can find the paper here (arXiv:2004.10652).

@article{ott2020fortran,
  title={A Fortran-Keras Deep Learning Bridge for Scientific Computing},
  author={Ott, Jordan and Pritchard, Mike and Best, Natalie and Linstead, Erik and Curcic, Milan and Baldi, Pierre},
  journal={arXiv preprint arXiv:2004.10652},
  year={2020}
}

This library allows users to convert models built and trained in Keras into models usable in Fortran. To make this possible, FKB implements a neural network library in Fortran, the foundations of which are derived from Milan Curcic's original neural-fortran work.

Additions

  • An extendable layer type
    • The original library only supported dense layers
      • Forward and backward operations occurred outside the layer (in the network module)
    • Ability to implement arbitrary layers
      • Simply extend the layer_type and specify these functions (a hedged sketch follows this list):
        • forward
        • backward
  • Training
    • Backprop takes place inside the extended layer_type
    • Ability to train with arbitrary cost functions
  • Implemented layers
    • Dense
    • Dropout
    • Batch Normalization
  • Ensembles
    • Read in a directory of network configs
    • Create a network for each config
    • Run in parallel using OpenMP (!$OMP PARALLEL) directives
    • Average results of all predictions in ensemble
  • A two-way bridge between Keras and Fortran
    • Convert model trained in Keras (h5 file) to Fortran
      • Any of the above layers are allowed
      • Sequential or Functional API
    • Convert Fortran models back to Keras
    • Check out this for supported model types
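
To illustrate the layer-extension pattern mentioned in the list above, here is a minimal, self-contained sketch. The abstract type, component names, and procedure signatures below are assumptions made for illustration (loosely modeled on the dense_backward(self, g, lr) routine quoted in an issue further down this page); FKB's actual layer_type in src/lib may differ.

  module mod_custom_layer_example
    implicit none

    integer, parameter :: rk = kind(1.0)

    ! Hypothetical abstract layer; FKB's real layer_type may have different components.
    type, abstract :: layer_type
      real(rk), allocatable :: o(:)          ! layer output
      real(rk), allocatable :: gradient(:)   ! gradient passed back to the previous layer
    contains
      procedure(forward_interface),  deferred :: forward
      procedure(backward_interface), deferred :: backward
    end type layer_type

    abstract interface
      subroutine forward_interface(self, x)
        import :: layer_type, rk
        class(layer_type), intent(in out) :: self
        real(rk), intent(in) :: x(:)
      end subroutine forward_interface
      subroutine backward_interface(self, g, lr)
        import :: layer_type, rk
        class(layer_type), intent(in out) :: self
        real(rk), intent(in) :: g(:), lr
      end subroutine backward_interface
    end interface

    ! A custom layer only has to extend layer_type and provide the two procedures.
    type, extends(layer_type) :: scale_layer
      real(rk) :: factor = 2.0_rk
    contains
      procedure :: forward  => scale_forward
      procedure :: backward => scale_backward
    end type scale_layer

  contains

    subroutine scale_forward(self, x)
      class(scale_layer), intent(in out) :: self
      real(rk), intent(in) :: x(:)
      self % o = self % factor * x           ! y = factor * x
    end subroutine scale_forward

    subroutine scale_backward(self, g, lr)
      class(scale_layer), intent(in out) :: self
      real(rk), intent(in) :: g(:), lr
      self % gradient = self % factor * g    ! dL/dx = factor * dL/dy (no trainable weights, lr unused)
    end subroutine scale_backward

  end module mod_custom_layer_example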

Getting started

Check out an example in the getting started notebook

Get the code:

git clone https://github.com/scientific-computing/FKB

Dependencies:

  • Fortran 2018-compatible compiler
  • OpenCoarrays (optional, for parallel execution, gfortran only)
  • BLAS, MKL (optional)

Build

sh build_steps.sh

  • Tests and examples will be built in the bin/ directory
  • To use a different compiler or a serial build, modify the cmake invocation accordingly (e.g. FC=mpif90 cmake .. -DSERIAL=1)

Examples

Loading a model trained in Keras

python convert_weights.py --weights_file path/to/keras_model.h5 --output_file path/to/model_config.txt

This would create the model_config.txt file with the following:

9                         --> How many total layers (includes input and activations)
input	5                 --> 5 inputs
dense	3                 --> Hidden layer 1 has 3 nodes
leakyrelu	0.3       --> Hidden layer 1 activation LeakyReLU with alpha = 0.3
dense	4                 --> Hidden layer 2 has 4 nodes
leakyrelu	0.3       --> Hidden layer 2 activation LeakyReLU with alpha = 0.3
dense	3                 --> Hidden layer 3 has 3 nodes
leakyrelu	0.3       --> Hidden layer 3 activation LeakyReLU with alpha = 0.3
dense	2                 --> 2 outputs in the last layer
linear	0                 --> Linear activation with no alpha
0.5                       --> Learning rate
<BIASES>
.
.
.
<DENSE LAYER WEIGHTS>
.
.
.
<BATCH NORMALIZATION PARAMETERS>

Creating a network

Architecture descriptions are specified in a config text file:

9                         --> How many total layers (includes input and activations)
input	5                 --> 5 inputs
dense	3                 --> Hidden layer 1 has 3 nodes
leakyrelu	0.3       --> Hidden layer 1 activation LeakyReLU with alpha = 0.3
dense	4                 --> Hidden layer 2 has 4 nodes
leakyrelu	0.3       --> Hidden layer 2 activation LeakyReLU with alpha = 0.3
dense	3                 --> Hidden layer 3 has 3 nodes
leakyrelu	0.3       --> Hidden layer 3 activation LeakyReLU with alpha = 0.3
dense	2                 --> 2 outputs in the last layer
linear	0                 --> Linear activation with no alpha
0.5                       --> Learning rate

Then the network configuration can be loaded in Fortran:

use mod_network, only: network_type
type(network_type) :: net

call net % load('model_config.txt')
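
Once loaded, the network can be evaluated on an input vector. A minimal sketch, assuming the output type-bound function inherited from the original neural-fortran network_type and default real precision; check mod_network.F90 for the exact interface and kind:

use mod_network, only: network_type
type(network_type) :: net
real, allocatable :: input(:), prediction(:)

call net % load('model_config.txt')

input = [1.0, 2.0, 3.0, 4.0, 5.0]     ! 5 inputs, matching the config above
prediction = net % output(input)      ! forward pass through the network
print *, prediction                   ! 2 outputs for this example config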

Ensembles

mod_ensemble allows ensembles of neural networks to be run in parallel. The ensemble_type will read all networks provided in the user-specified directory. Calling average passes the input through all networks in the ensemble and averages their outputs. noise_perturbation is used to perturb the input to each model with Gaussian noise.

Put the names of the model files in ensemble_members.txt:

simple_model.txt
simple_model_with_weights.txt

Then to run an ensemble:

ensemble = ensemble_type('$HOME/Desktop/neural-fortran/ExampleModels/', noise_perturbation)

result1 = ensemble % average(input)
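
For context, a fuller sketch of the calling code; the declarations, real kind, and input values below are assumptions (only the constructor and average call come from this README):

use mod_ensemble, only: ensemble_type
type(ensemble_type) :: ensemble
real :: noise_perturbation
real, allocatable :: input(:), result1(:)

noise_perturbation = 0.0              ! no Gaussian perturbation of the inputs
ensemble = ensemble_type('$HOME/Desktop/neural-fortran/ExampleModels/', noise_perturbation)

input = [1.0, 2.0, 3.0, 4.0, 5.0]     ! must match the member networks' input size
result1 = ensemble % average(input)   ! averaged prediction over all ensemble members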

You can run the test_ensembles.F90 file:

./test_ensembles $HOME/Desktop/neural-fortran/ExampleModels/

Saving and loading from file

To save a network to a file, do:

call net % save('model_config.txt')

Loading from file works the same way:

call net % load('model_config.txt')

fkb's People

Contributors

jordanott


fkb's Issues

add topics

I suggest adding the topics deep-learning, keras in the About section.

Recognize InputLayer

Hi congratulation for the great work!

A minor thing I realized during initial testing.
If the input layer of the model is given by

model.add(keras.models.InputLayer(input_shape=5))

instead of

model.add(Dense(8, input_dim=5, activation='relu'))

then this results in the error:

Traceback (most recent call last):
  File "convert_weights.py", line 353, in <module>
    h5_to_txt(
  File "convert_weights.py", line 183, in h5_to_txt
    info = model_config['config']['layers'][0]['config']['batch_input_shape'][1]

dense layer with no biases can't be converted

Hi,
Thanks for your work with this repo. I have a tf.keras model saved as an .h5 file that I tried converting to text with convert_weights.py. However, it looks like the main Dense layers of my model don't have biases saved, and this makes the python script throw a "'bias:0' object doesn't exist" error.
I made a temporary workaround by just skipping adding the biases if they can't be found, but this leads to an end-of-file error when loading the model into Fortran. I'm guessing that the Fortran code is expecting to read the biases even if they aren't there. Any suggestions for solving this issue?

At line 208 of file /Users/rurata/work/FKB/src/lib/mod_network.F90
Fortran runtime error: End of file

corrupted size vs. prev_size

Hello,
I'm trying to load a simple network and do a prediction but I get the following message:

corrupted size vs. prev_size

More specifically, I use the attached network script mytest.txt (to be renamed mytest.py and run with python3 mytest.py) with the data file dataset_STS_kd_kr_N2.txt. Then I adapted the attached test test_simpleNN.txt (to be renamed test_simpleNN.F90) and compiled it.

Do you have some hints about its origin?
Thank you,
Lorenzo

Keras saved model is not an .h5 file

Hi, I am planning to embed an ML model trained with Keras into a climate model written in Fortran 90. I checked your package and found that it claims to be able to translate a saved Keras model into Fortran. However, my model is not saved in .h5 format but as a folder. I used

def get_model():
    # Create a simple model.
    inputs = keras.Input(shape=(32,))
    outputs = keras.layers.Dense(1)(inputs)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model


model = get_model()

# Train the model.
test_input = np.random.random((128, 32))
test_target = np.random.random((128, 1))
model.fit(test_input, test_target)

# Calling `save('my_model')` creates a SavedModel folder `my_model`.
model.save("my_model")

# It can be used to reconstruct the model identically.
reconstructed_model = keras.models.load_model("my_model")

so there is a folder named "my_model", and inside there are assets, saved_model.pb, and variables. So can I still use your package? Thanks!

FKB with threading turned on

I recently noticed that the calls to the FKB library are not thread-safe. After I link the FKB library to my code and run my code in a threaded mode (OpenMP), the code produces unexpected results unless I call FKB in an omp critical section. Are there any special flags I need to turn on while compiling the FKB library so that it works for threaded runs?
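
For illustration, the reporter's workaround looks roughly like the sketch below: each FKB call is wrapped in an OpenMP critical section, which serializes inference (losing the parallel speedup) but avoids the corrupted results. The output call and the declarations are assumptions:

use mod_network, only: network_type
type(network_type) :: net
integer :: i, n
real, allocatable :: inputs(:,:), outputs(:,:)
! ... net is loaded and inputs/outputs are allocated and filled elsewhere ...

!$omp parallel do private(i) shared(net, inputs, outputs, n)
do i = 1, n
  !$omp critical
  outputs(:, i) = net % output(inputs(:, i))   ! serialize the FKB call
  !$omp end critical
end do
!$omp end parallel do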

Unsupported layer found! Skipping...

Hi thanks for developing this fantastic interface between Fortran and Keras. I have been running into this error however when running the example files:

../FKB/KerasWeightsProcessing/convert_weights.py:208: UserWarning: Unsupported layer found! Skipping...

I was hoping that you might have some insight into what the underlying cause of this issue might be and what I could do to address it.
Thank you.

"Unable to open object (object 'dense_1' doesn't exist)"

I created a convolutional neural network in keras and when I run

h5_to_txt(weights_file_name='model2.h5', output_file_name='model2.txt')

I get the following error:


KeyError Traceback (most recent call last)
in
----> 1 h5_to_txt(weights_file_name='model2.h5', output_file_name='model2.txt')

/media/nas/x21957/.../KerasWeightsProcessing/convert_weights.py in h5_to_txt(weights_file_name, output_file_name)
211 # get weights and biases out of dictionary
212 layer_weights = np.array(
--> 213 model_weights[name][name]['kernel:0']
214 )
215

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

/media/nas/x21957/data/reinhardt/py3_venv/lib/python3.7/site-packages/h5py/_hl/group.py in getitem(self, name)
262 raise ValueError("Invalid HDF5 object reference")
263 else:
--> 264 oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
265
266 otype = h5i.get_type(oid)

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5o.pyx in h5py.h5o.open()

KeyError: "Unable to open object (object 'dense_1' doesn't exist)"

Can someone help?

Performance enhancements (batched predictions using GEMM)

Hi,

Depending on the application - model and problem sizes - it's possible to make inference much faster by doing it in batches (packing vector-sized inputs into a 2D array) and replacing the matrix-vector multiplications with matrix-matrix multiplications delegated to a BLAS library. I have a Fortran application based on FKB, or actually its earlier incarnation neural-fortran, where I did exactly that (I referenced neural-fortran in my paper). It works well, and the nice thing is that it's trivial to run the code on GPUs too:

#ifdef USE_CUDA
#define sgemm cublassgemm
#endif

plus some OpenACC directives above the bias addition and activation loops. You can find my code here. I think a similar batched output procedure for 2D arrays would be a valuable contribution to the main repo. I am happy to work on a pull request if you agree. If so let me know if you'd like to keep the GPU stuff: I'd have to add a few things to make it more general, like copying the input array to device, and creating the intermediate arrays for hidden layers (in my code I can get away with just two intermediate arrays where I do pointer swapping, because my models had the same number of neurons in all hidden layers).
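
To make the batching idea above concrete, here is a minimal hedged sketch of a batched dense-layer forward pass that delegates the matrix product to BLAS. The routine name, array shapes, and the ReLU activation are illustrative assumptions, not FKB internals:

  subroutine dense_forward_batched(x, w, b, y)
    ! x: (batch, n_in) inputs packed as rows
    ! w: (n_in, n_out) dense-layer weights
    ! b: (n_out) biases
    ! y: (batch, n_out) outputs
    implicit none
    real, intent(in)  :: x(:,:), w(:,:), b(:)
    real, intent(out) :: y(:,:)
    integer :: batch, n_in, n_out, i

    batch = size(x, 1); n_in = size(x, 2); n_out = size(w, 2)

    ! y = x * w via single-precision BLAS GEMM (use dgemm for double precision)
    call sgemm('N', 'N', batch, n_out, n_in, 1.0, x, batch, w, n_in, 0.0, y, batch)

    ! add the bias and apply the activation row-wise
    do i = 1, batch
      y(i, :) = max(0.0, y(i, :) + b)   ! ReLU shown as an example activation
    end do
  end subroutine dense_forward_batched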

There are a few other points too:

Possible typo (filename) in the header comment of test_training.F90

src/tests/test_training.F90 has this header information:

! TO RUN
! ./test_training $NF_PATH/ExampleModels/simple_model_with_weights.txt

! load the simple_model
! train on simple example
!   run for 10 epochs
!   print predictions, targets, and loss
!   loss should decrease to verify backprop works

However, for some reason, using ExampleModels/simple_model_with_weights.txt results in an End of file error.

If I use ExampleModels/simple_model.txt instead, the test finishes successfully (with decreasing loss values).

Wrong derivative of leaky-relu activation

According to the mod_activation.F90 file, it seems that the derivative function of leaky_relu is the same as that of ReLU.

  pure function leaky_relu_prime(x, alpha) result(res)
    ! First derivative of the REctified Linear Unit (RELU) activation function.
    real(rk), intent(in) :: x(:)
    real(rk), intent(in) :: alpha
    real(rk) :: res(size(x))
    where (0.3 * x > 0)
      res = 1
    elsewhere
      res = 0
    end where
  end function leaky_relu_prime
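
For reference, the LeakyReLU derivative should be 1 for positive inputs and alpha otherwise; a hedged sketch of a corrected version (not checked against any upstream fix) could look like:

  pure function leaky_relu_prime(x, alpha) result(res)
    ! First derivative of the Leaky REctified Linear Unit activation function.
    real(rk), intent(in) :: x(:)
    real(rk), intent(in) :: alpha
    real(rk) :: res(size(x))
    where (x > 0)
      res = 1
    elsewhere
      res = alpha     ! slope alpha for negative inputs, not 0
    end where
  end function leaky_relu_prime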

Redundant evaluation of db in dense_backward() in mod_dense_layer.F90 ?

In the dense_backward() routine in src/lib/mod_dense_layer.F90 below,

  subroutine dense_backward(self, g, lr)

    class(Dense), intent(in out) :: self
    real(rk), intent(in) :: g(:), lr
    real(rk), allocatable :: t(:), dw(:,:), db(:)

    db = self % activation_prime(self % z, self % alpha) * g   !<--- [1]

    dw = matmul(&
      reshape(self % i, (/size(self % i), 1/)), &
      reshape(db, (/1, size(db)/))&
    )

    db = self % activation_prime(self % z, self % alpha) * g   !<--- [2]

    self % gradient = matmul(self % w, db)

    ! weight updates
    self % w = self % w - lr * dw
    self % b = self % b - lr * db

  end subroutine dense_backward

the second evaluation of db, marked by [2], seems redundant with the one marked by [1], because self % z, self % alpha, and g have not been modified between [1] and [2]. So I guess it may be OK to eliminate [2] (which would also reduce computation).

Should linear_prime return 1 instead of x?

In src/lib/mod_activation.F90, should linear_prime return 1 as the result rather than x?

  pure function linear_prime(x, alpha) result(res)
    !! Linear activation function.
    real(rk), intent(in) :: x(:)
    real(rk), intent(in) :: alpha
    real(rk) :: res(size(x))
    res = x ! <-- should this be 1?
  end function linear_prime

It seems to me like it should.

Reported by septcolor here.
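
For reference, since the derivative of f(x) = x is 1, a corrected version would simply return 1; a minimal sketch:

  pure function linear_prime(x, alpha) result(res)
    !! First derivative of the linear activation function.
    real(rk), intent(in) :: x(:)
    real(rk), intent(in) :: alpha
    real(rk) :: res(size(x))
    res = 1
  end function linear_prime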

Undefined TBP (train()) used in src/tests/example_simple.F90 and example_mnist.F90

src/tests/example_simple.F90 and example_mnist.F90 have this line:

call net % train(input, output, eta=xxx)   !! where xxx is some number

However, % train() is not defined for the network_type type, so the compilation fails.

The structure constructor for the network_type also fails because
the signature is different from that defined in mod_network.F90.

net = network_type([3, 5, 2])

So I am wondering whether these example files were prepared for some former version of FKB, or possibly assume neural-fortran...?

CC: @milancurcic (for possible relation with neural-fortran)

How can we pass a value from one subroutine to another subroutine containing a function?

Hi, my question is that I have some variable outputs; say z is a variable and I have the output of z in txt form.

From Brent's method, I want to calculate lambda (another variable), and the function containing lambda depends on z. So how can I put the z value into the function that depends on z, so that I can get the value of lambda correctly?

      FUNCTION fzlamda(lamda)
      REAL fzlamda,z,eL,lamda
      COMMON /flamda/ eL,z,x(3,1)
!     eL=4685.0

      lamda=9*3.1415926535/180.0
      fzlamda=0.5*sin(lamda)*sqrt(1.+3.*sin(lamda)**2)+(0.5/sqrt(3.))
     +    *log(abs(sqrt(3.)*sin(lamda)+sqrt(1.+3.*sin(lamda)**2)))
     +    -abs(z)/4685.0    ! taking abs(z) means: lamda increases from the equator as abs(z) increases
      RETURN
      END

This is the function, and here is how I am writing z:

      if(rt.eq.0)then
          write(1,1000)t/(2*pi),(x(i,1),i=1,3),lamdaroot,(x(i,2),i=1,2),
     &(x(i,3),i=1,2),alpha*180.d0/pi,kew,deta
      else
          write(1,1000)t/(2*pi),(x(i,1),i=1,3),lamdaroot,(x(i,2),i=1,2),
     &(x(i,3),i=1,2),alpha*180.d0/pi,kew,deta
      endif

where z is z = x(3,1), and I am getting txt results like the attached image.

The second-to-last column values are z and the last column is lambda, but although lambda depends on z through the function, lambda stays constant throughout the calculations.

Incompatibilities with Tensorflow 2

I've recently started using this for a project, and I noticed that I was having issues because of my Tensorflow version.

In PR #13 you can find the changes I made to update the code to the new TensorFlow version. Unfortunately, I couldn't test the changes thoroughly, especially in the examples.

multi-input multi-output regression

Hello,

thanks for this nice way to interface fortran and keras!
I have a question about running a multi-input (or even single-input) and multi-output model. In this case, should I use an ensemble?

For example:
input = temperature
output = [concentration1, ..., concentration100]

What should I do in this case? Also, I do not need any averaging of the results when using ensembles. And what does "Put the names of the model files" mean? Does it assume that my original network has already been split into many single-output networks?

Thank you,
Kind regards,
Lorenzo Campoli

No "tanh" activation in "convert_weights.py"

Thanks for sharing this package! I ran your example on my local machine, changing your network to my own network, and got the following warning:

../convert_weights.py:234: UserWarning: Unsupported activation, tanh, found! Replacing with Linear.
  warnings.warn(warning_str)

So I checked convert_weights.py and found that the supported activation methods are ['relu', 'linear', 'leakyrelu', 'sigmoid']. But in the Fortran code, FKB/src/lib/mod_activation.F90, there are indeed functions for tanh. So do I just need to change ACTIVATIONS = ['relu', 'linear', 'leakyrelu', 'sigmoid'] in convert_weights.py to run my model? Thanks a lot!
