
MXfold2: RNA secondary structure prediction using deep learning with thermodynamic integration

License: MIT License


mxfold2's Introduction

MXfold2

RNA secondary structure prediction using deep learning with thermodynamic integration

Installation

System requirements

  • python (>=3.7)
  • pytorch (>=1.4)
  • C++17 compatible compiler (tested on Apple clang version 12.0.0 and GCC version 7.4.0) (optional)

Install from wheel

We provide Python wheel packages for several platforms on the release page. Download the package that matches your platform and install it as follows:

% pip3 install mxfold2-0.1.2-cp310-cp310-manylinux_2_17_x86_64.whl

Install from sdist

You can build and install from the source distribution, downloaded from the release page, as follows:

% pip3 install mxfold2-0.1.2.tar.gz

To build MXfold2 from the source distribution, you need a C++17 compatible compiler.

Prediction

You can predict RNA secondary structures of given FASTA-formatted RNA sequences like:

% mxfold2 predict test.fa
>DS4440
GGAUGGAUGUCUGAGCGGUUGAAAGAGUCGGUCUUGAAAACCGAAGUAUUGAUAGGAAUACCGGGGGUUCGAAUCCCUCUCCAUCCG
(((((((........(((((..((((.....))))...)))))...................(((((.......)))))))))))). (24.8)

By default, MXfold2 employs the parameters trained from TrainSetA and TrainSetB (see our paper).
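The output shown above has three lines per record: a FASTA header, the sequence, and a dot-bracket structure followed by a number in parentheses. A minimal sketch of reading that layout from Python follows; the names `Prediction` and `parse_predictions` are hypothetical, and the number is exposed simply as `score` because its exact meaning is not defined in this section.

```python
import re
from typing import Iterator, NamedTuple


class Prediction(NamedTuple):
    name: str
    sequence: str
    structure: str
    score: float


# Structure line: dot-bracket string, whitespace, then "(number)".
_STRUCT_RE = re.compile(r"^(\S+)\s+\((-?\d+(?:\.\d+)?)\)\s*$")


def parse_predictions(text: str) -> Iterator[Prediction]:
    """Yield one Prediction per three-line record of `mxfold2 predict` output."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i in range(0, len(lines) - 2, 3):
        header, seq, struct_line = lines[i:i + 3]
        if not header.startswith(">"):
            raise ValueError(f"expected a FASTA header, got: {header!r}")
        match = _STRUCT_RE.match(struct_line)
        if match is None:
            raise ValueError(f"unexpected structure line: {struct_line!r}")
        yield Prediction(header[1:], seq, match.group(1), float(match.group(2)))
```

The text to parse could come from capturing the command's standard output, for example with `subprocess.run(["mxfold2", "predict", "test.fa"], capture_output=True, text=True).stdout`.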

We provide other pre-trained models used in our paper. You can download models-0.1.0.tar.gz and extract the pre-trained models from it as follows:

% tar -zxvf models-0.1.0.tar.gz

Then, you can predict RNA secondary structures of given FASTA-formatted RNA sequences like:

% mxfold2 predict @./models/TrainSetA.conf test.fa
>DS4440
GGAUGGAUGUCUGAGCGGUUGAAAGAGUCGGUCUUGAAAACCGAAGUAUUGAUAGGAAUACCGGGGGUUCGAAUCCCUCUCCAUCCG
(((((((.((....))...........(((((.......))))).(((((......))))).(((((.......)))))))))))). (24.3)

Here, ./models/TrainSetA.conf specifies many parameters, including the hyper-parameters of the DNN model.
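The `@file` form resembles what Python's argparse provides through `fromfile_prefix_chars`, where each line of the file is read as one command-line argument. Whether MXfold2 parses its .conf files exactly this way is an assumption here; the sketch below only illustrates the mechanism. The parser, the `--num-filters` option and the `demo.conf` contents are hypothetical.

```python
import argparse
import os
import tempfile


def build_parser() -> argparse.ArgumentParser:
    # With fromfile_prefix_chars="@", an argument "@path" is replaced by the
    # arguments read from that file, one argument per line.
    parser = argparse.ArgumentParser(prog="demo", fromfile_prefix_chars="@")
    parser.add_argument("--model", default="Turner")             # illustrative default
    parser.add_argument("--num-filters", type=int, default=32)   # hypothetical option
    parser.add_argument("input")
    return parser


def demo() -> argparse.Namespace:
    with tempfile.TemporaryDirectory() as tmp:
        conf = os.path.join(tmp, "demo.conf")
        with open(conf, "w") as fh:
            fh.write("--model\nMixC\n--num-filters\n64\n")
        # Equivalent to: demo --model MixC --num-filters 64 test.fa
        return build_parser().parse_args(["@" + conf, "test.fa"])
```

Under this reading, a .conf file is just a stored list of command-line options, so options given after the `@file` argument would override the values it sets.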

Training

MXfold2 can train its parameters from BPSEQ-formatted RNA sequences. You can also download the datasets used in our paper from the release page.

% mxfold2 train --model MixC --param model.pth --save-config model.conf data/TrainSetA.lst

You can specify many of the model's hyper-parameters; see mxfold2 train --help. In this example, the model's hyper-parameters and the trained parameters are saved in model.conf and model.pth, respectively.
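BPSEQ is commonly written as one line per nucleotide with three columns: the 1-based position, the base, and the position of its pairing partner (0 if unpaired). As a rough sketch under that assumption, a dot-bracket structure such as those printed by the predict command can be converted to BPSEQ text as follows. The function name `dot_bracket_to_bpseq` is hypothetical, and pseudoknot brackets are not handled.

```python
def dot_bracket_to_bpseq(seq: str, structure: str) -> str:
    """Convert a sequence and its dot-bracket structure to BPSEQ text."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    partner = [0] * len(seq)   # 0 means unpaired
    stack = []                 # indices of unmatched "("
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i + 1}")
            j = stack.pop()
            partner[i] = j + 1   # BPSEQ positions are 1-based
            partner[j] = i + 1
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r}")
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return "\n".join(
        f"{i + 1} {base} {p}" for i, (base, p) in enumerate(zip(seq, partner))
    )
```

For example, `dot_bracket_to_bpseq("GCAAGC", "((..))")` yields six lines, the first being `1 G 6` and the third `3 A 0`.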

Web server

A web server is available at http://www.dna.bio.keio.ac.jp/mxfold2/.


mxfold2's Issues

energy

Hi,

What does the number inside the bracket show?
I mean output from 'mxfold2 predict'.
Is it the energy of the secondary structure?
Or how can I extract energy from it?

Best,
Peyman

ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /home/ekofman/new_anaconda3/envs/mxfold2_python38/lib/python3.8/site-packages/mxfold2/interface.cpython-38-x86_64-linux-gnu.so)

Looks like it installed successfully, but now when I try to run it I get the following error:

ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /home/ekofman/new_anaconda3/envs/mxfold2_python38/lib/python3.8/site-packages/mxfold2/interface.cpython-38-x86_64-linux-gnu.so)

Anybody seen this before/ know how to resolve it? Thank you!
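The message says that the system library /lib64/libstdc++.so.6 does not export the symbol version GLIBCXX_3.4.20, which the compiled mxfold2 extension requires. One way to confirm this is to list the GLIBCXX versions a given libstdc++ file contains. The sketch below does so by scanning the file's bytes, which is a rough stand-in for inspecting the symbol table; the function names `glibcxx_versions` and `provides` are hypothetical.

```python
import re
from typing import List

_GLIBCXX_RE = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def glibcxx_versions(blob: bytes) -> List[str]:
    """Return the distinct GLIBCXX version strings found in blob, sorted numerically."""
    found = {m.group(1).decode("ascii") for m in _GLIBCXX_RE.finditer(blob)}
    return sorted(found, key=lambda v: tuple(int(p) for p in v.split(".")))


def provides(path: str, wanted: str = "3.4.20") -> bool:
    """True if the library file at path mentions GLIBCXX_<wanted>."""
    with open(path, "rb") as fh:
        return wanted in glibcxx_versions(fh.read())
```

Running `provides("/lib64/libstdc++.so.6")` and then the same check on any other libstdc++.so.6 on the machine, such as one shipped inside a conda environment, would show whether a newer library is already available to load instead.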

Batch size

I can't find how to change the batch size. With the default parameters, training is very slow. How can I change the batch size to speed up training? Thanks.

mxfold2-0.1.1.tar.gz vs mxfold2-0.1.2.tar.gz

Regarding mxfold2-0.1.1.tar.gz and mxfold2-0.1.2.tar.gz: the former installs successfully on Windows, but the latter does not. What could be the reason? The failed installation of the latter prints the following message:
Building wheel for mxfold2 (pyproject.toml) did not run successfully.
exit code: 1
[228 lines of output]
INFO: could not find files for the given pattern(s).

Difficulty replicating webserver results using command line

What method is used for predicting structures on the webserver? As it is not detailed anywhere, I have tried several different models to predict RNA structures using the command line tool, but can not find a way to replicate the results I get from the webserver.

I am using different variations of the command mxfold2 predict @./different_models.conf sequence.fasta

Is there a .conf file I can point to which will replicate webserver results?

Thanks

Sequences > 1000 nt

Hi,

I was wondering if there is any way to predict structures of sequences > 1000 nt? I have only used the web server. Is this possible with the command-line tools?

Many thanks,

Nick

mxfold2-0.1.1-cp38-cp38-linux_x86_64.whl is not a supported wheel on this platform

Hi there,

Any ideas what might be happening to yield this error? When I type:

(mxfold2) [ekofman@tscc-1-36]$ uname -m

I see that my distribution is:

x86_64

I have pip3 installed, but when I type:

(mxfold2) [ekofman@tscc-1-36]$ pip3 install mxfold2-0.1.1-cp38-cp38-linux_x86_64.whl
I see:

ERROR: mxfold2-0.1.1-cp38-cp38-linux_x86_64.whl is not a supported wheel on this platform.
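A wheel filename encodes compatibility tags in its last three dash-separated fields: the Python tag, the ABI tag and the platform tag. One thing worth checking for this error is whether the Python tag (here cp38) matches the interpreter that pip3 belongs to. The sketch below covers only that check; pip also compares the ABI and platform tags, and compressed tags such as py2.py3 are not handled. The names `WheelTags`, `parse_wheel_tags` and `matches_cpython` are hypothetical.

```python
import sys
from typing import NamedTuple, Tuple


class WheelTags(NamedTuple):
    python: str
    abi: str
    platform: str


def parse_wheel_tags(filename: str) -> WheelTags:
    """Split name-version[-build]-python-abi-platform.whl into its three tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected number of fields in wheel filename")
    python, abi, platform = parts[-3:]
    return WheelTags(python, abi, platform)


def matches_cpython(tags: WheelTags,
                    version: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """True if the wheel's Python tag names this CPython major.minor version."""
    return tags.python == f"cp{version[0]}{version[1]}"
```

Calling `matches_cpython(parse_wheel_tags("mxfold2-0.1.1-cp38-cp38-linux_x86_64.whl"))` from the same environment as pip3 returns False when the interpreter is not CPython 3.8, which is one common cause of this message.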

Score output of Neural Network

Hi Dr Kengo Sato,

Please let me know if raising issues here about MXfold2 bothers you. I have had a lot of trouble trying to understand MXfold2 and apply it myself. I can email you if you think that's more appropriate.

This issue is about zuker.py (https://github.com/keio-bioinformatics/mxfold2/blob/master/mxfold2/fold/zuker.py) and Fig. 2, "The network structure of our algorithm", from your paper published in Nature Communications.

Assuming we are using --model MixC and --pair-join cat:

  1. In Fig. 2 there are four types of score outputs (Helix Stacking, Unpaired Region, Helix Opening and Helix Closing), while in zuker.py lines 103-113 there are 10 scores (all of shape L * L). I can see that score_helix_stacking should be Helix Stacking (the first dimension of score_paired), but what about score_basepair (all zeros), the four types of score_mismatch (the second dimension of score_paired) and the four types of score_base (the third dimension of score_paired)?
  2. There are also 7 other scores in lines 39-45. Can you explain a bit about them as well? I can't find any matches in Fig. 2.

Thanks a lot in advance.

'mxfold2' terminated by signal SIGSEGV (Address boundary error)

I can't get mxfold2 to run on my MacBook at all. Here's what happens:

BenjaminLee@mbp ~/r/viroid-search (master) [SIGSEGV]> clang --version                                                          (viroid-search)
Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: x86_64-apple-darwin22.1.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
BenjaminLee@mbp ~/r/viroid-search (master)> python --version                                                                   (viroid-search)
Python 3.8.13
BenjaminLee@mbp ~/r/viroid-search (master)> pip install torch                                                                  (viroid-search)
Requirement already satisfied: torch in /usr/local/Caskroom/miniconda/base/envs/viroid-search/lib/python3.8/site-packages (1.13.0)
Requirement already satisfied: typing-extensions in /usr/local/Caskroom/miniconda/base/envs/viroid-search/lib/python3.8/site-packages (from torch) (4.2.0)
WARNING: Error parsing requirements for chardet: [Errno 2] No such file or directory: '/usr/local/Caskroom/miniconda/base/envs/viroid-search/lib/python3.8/site-packages/chardet-3.0.4.dist-info/METADATA'
BenjaminLee@mbp ~/r/viroid-search (master)> mxfold2 --help                                                                     (viroid-search)
fish: Job 1, 'mxfold2 --help' terminated by signal SIGSEGV (Address boundary error)
BenjaminLee@mbp ~/r/viroid-search (master) [SIGSEGV]> mxfold2 predict                                                          (viroid-search)
fish: Job 1, 'mxfold2 predict' terminated by signal SIGSEGV (Address boundary error)
BenjaminLee@mbp ~/r/viroid-search (master) [SIGSEGV]> mxfold2                                                                  (viroid-search)
fish: Job 1, 'mxfold2' terminated by signal SIGSEGV (Address boundary error)

I have tried installing via both the wheel and sdist but can't get it to work. Do you have any pointers?

Code of Loss (objective) function

Hi Dr Kengo Sato,

I have been reading your paper and trying to understand the code in this GitHub repo. I have several questions I would like to ask you. In this issue, I will ask about the code corresponding to the loss function.

In the paper, the objective function is:

[Image: the objective function from the paper]

I found the corresponding Python loss function in loss.py:

import torch
import torch.nn as nn

class StructuredLossWithTurner(nn.Module):
    def __init__(self, model, loss_pos_paired=0, loss_neg_paired=0, loss_pos_unpaired=0, loss_neg_unpaired=0, 
                l1_weight=0., l2_weight=0., sl_weight=1., verbose=False):
        super(StructuredLossWithTurner, self).__init__()
        self.model = model
        self.loss_pos_paired = loss_pos_paired
        self.loss_neg_paired = loss_neg_paired
        self.loss_pos_unpaired = loss_pos_unpaired
        self.loss_neg_unpaired = loss_neg_unpaired
        self.l1_weight = l1_weight
        self.l2_weight = l2_weight
        self.sl_weight = sl_weight
        self.verbose = verbose
        from .fold.rnafold import RNAFold
        from . import param_turner2004
        if getattr(self.model, "turner", None):
            self.turner = self.model.turner
        else:
            self.turner = RNAFold(param_turner2004).to(next(self.model.parameters()).device)

    def forward(self, seq, pairs, fname=None):
        pred, pred_s, _, param = self.model(seq, return_param=True, reference=pairs,
                                loss_pos_paired=self.loss_pos_paired, loss_neg_paired=self.loss_neg_paired, 
                                loss_pos_unpaired=self.loss_pos_unpaired, loss_neg_unpaired=self.loss_neg_unpaired)
        ref, ref_s, _ = self.model(seq, param=param, constraint=pairs, max_internal_length=None)
        with torch.no_grad():
            ref2, ref2_s, _ = self.turner(seq, constraint=pairs, max_internal_length=None)
        l = torch.tensor([len(s) for s in seq], device=pred.device)
        loss = (pred - ref) / l
        loss += self.sl_weight * (ref-ref2) * (ref-ref2) / l
        if self.verbose:
            print("Loss = {} = ({} - {})".format(loss.item(), pred.item(), ref.item()))
            print(seq)
            print(pred_s)
            print(ref_s)
        if loss.item()> 1e10 or torch.isnan(loss):
            print()
            print(fname)
            print(loss.item(), pred.item(), ref.item())
            print(seq)

        if self.l1_weight > 0.0:
            for p in self.model.parameters():
                loss += self.l1_weight * torch.sum(torch.abs(p))

        # if self.l2_weight > 0.0:
        #     l2_reg = 0.0
        #     for p in self.model.parameters():
        #         l2_reg += torch.sum((self.l2_weight * p) ** 2)
        #     loss += torch.sqrt(l2_reg)

        return loss

From the Python code alone, I am not able to match it to the function in the paper. I can see both the L1 and L2 regularization terms, but I cannot find the code for the structured hinge loss (the margin term and the max function). I suppose f(x, y) in the paper is the score (the first returned item) from self.model. I assume that the max-margin part is embedded in self.model, in the C++ part of the code. According to the paper, you are using MixedFold; could you please point out which part and lines of the C++ code implement the max-margin function?
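One possible reading of the quoted forward method, offered as an interpretation and not as a confirmed description of MXfold2: pred is computed with reference=pairs and the loss_* weights, and ref with constraint=pairs, so the margin may be added inside the model's decoding (loss-augmented inference), with pred - ref then playing the role of the hinge term. The toy sketch below illustrates that general technique over an explicit candidate set. The names `margin` and `structured_hinge`, the cost parameters, and the simplification of counting whole base pairs are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Tuple

Pair = Tuple[int, int]
Structure = FrozenSet[Pair]


def margin(candidate: Structure, reference: Structure,
           fp_cost: float = 1.0, fn_cost: float = 1.0) -> float:
    """Weighted count of base pairs that differ between candidate and reference."""
    false_pos = len(candidate - reference)   # predicted but not in the reference
    false_neg = len(reference - candidate)   # in the reference but not predicted
    return fp_cost * false_pos + fn_cost * false_neg


def structured_hinge(scores: Dict[Structure, float], reference: Structure) -> float:
    """max_y [f(y) + margin(y, ref)] - f(ref); never negative because y = ref is a candidate."""
    augmented_best = max(s + margin(y, reference) for y, s in scores.items())
    return augmented_best - scores[reference]
```

In a real folding model the maximum would be computed by dynamic programming over all structures, with the margin folded into the per-position scores, so no separate max or margin term needs to appear in the Python loss code.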

Running Issue -- interface.cpp

Hi,

I get an error when I run 'mxfold2 predict test.fa'.

Here is the error message:

from .. import interface
ImportError: cannot import name 'interface' from 'mxfold2' (/mnt/home/wangru25/anaconda3/lib/python3.8/site-packages/mxfold2/__init__.py)

Could you please help me with this issue?

Thank you!

mxfold2 output

Hi, I am wondering if you could clarify what the output from mxfold2 represents? Specifically the number in parentheses at the bottom right? Is it a score or metric of some kind? Is a larger number better or worse for a given output?

Apologies if this is stated somewhere, but I couldn't find it!

Inconsistent prediction result with Web Service

Hello Developers,
I'm trying to use mxfold2 to predict structures for some RNA sequences. However, I encountered an inconsistent result between my local prediction and the web service at http://www.dna.bio.keio.ac.jp/mxfold2/predict, as shown below:

#my local running
$mxfold2 predict  demo5.fa 
>KC131142_UTR5
AGTTGTTAGTCTACGTGGACCGACAAAGACAGATTCTTCGAGGAAGCTAAGCTTAACGTAGTTCTAACAGTTTTTTAATTAGAGAGCAGATCTCTG
................................................................................................ (4.8)
>KR920365_UTR5
AGTTGTTAGTCTACGTGGACCGACAAAGACAGATTCTTTGAGGAAGCTAAGCTTAACGTAGTTCTAACAGTTTTTTAATTAGAGAGCAGATCTCTG
................................................................................................ (4.7)


$mxfold2 predict @./mxfold2/models/TrainSetAB.conf demo5.fa 
>KC131142_UTR5
AGTTGTTAGTCTACGTGGACCGACAAAGACAGATTCTTCGAGGAAGCTAAGCTTAACGTAGTTCTAACAGTTTTTTAATTAGAGAGCAGATCTCTG
................................................................................................ (4.8)
>KR920365_UTR5
AGTTGTTAGTCTACGTGGACCGACAAAGACAGATTCTTTGAGGAAGCTAAGCTTAACGTAGTTCTAACAGTTTTTTAATTAGAGAGCAGATCTCTG
................................................................................................ (4.7)

The two input sequences in `demo5.fa` have no secondary structure in my local predictions, with either the `default` settings or `mxfold2/models/TrainSetAB.conf`.
However, when the same sequences are submitted to the `mxfold2` web service, both get well-formed secondary structures.

Why do the same input sequences give different results? Did I use the wrong local settings? Could you please help me?

cannot import name 'interface' from 'mxfold2'

I cloned the repository into my university's cluster. To work around importing issues and get things working properly, I put this file in my home directory:

# Called `mf2`
import re
import sys
from mxfold2.mxfold2.__main__ import main
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(main())

and have this repository cloned in the same directory, and when I run

python mf2 predict some_sequence.fasta

I get the following errors:

Traceback (most recent call last):
  File "/home/ewhiting/mxfold2/mf2", line 5, in <module>
    from mxfold2.__main__ import main
  File "/home/ewhiting/mxfold2/mxfold2/__main__.py", line 5, in <module>
    from .predict import Predict
  File "/home/ewhiting/mxfold2/mxfold2/predict.py", line 14, in <module>
    from .fold.mix import MixedFold
  File "/home/ewhiting/mxfold2/mxfold2/fold/mix.py", line 2, in <module>
    from .. import interface
ImportError: cannot import name 'interface' from 'mxfold2' (/home/ewhiting/mxfold2/mxfold2/__init__.py)

Is there a better workaround for this?
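The GLIBCXX issue above names the file interface.cpython-38-x86_64-linux-gnu.so inside an installed mxfold2 package, which suggests that `interface` is a compiled extension module. A plain clone of the repository would not contain it until the package is built, for example by installing the source distribution with pip3 as the README describes. A minimal sketch for checking whether a submodule is importable in the current environment follows; the helper name `submodule_available` is hypothetical.

```python
import importlib.util


def submodule_available(package: str, submodule: str) -> bool:
    """True if package.submodule can be located in the current environment."""
    try:
        # Note: find_spec imports the parent package in order to search it.
        return importlib.util.find_spec(f"{package}.{submodule}") is not None
    except ModuleNotFoundError:
        # The parent package itself is not installed.
        return False
```

If `submodule_available("mxfold2", "interface")` is False when run from the directory containing the clone, Python is most likely picking up the unbuilt source tree instead of an installed package.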

mxfold2-0.1.2-cp310-cp310-manylinux_2_17_x86_64.whl is not a supported wheel on this platform.

Using the latest build (Apr version), trying to install it on WSL2 with python 3.11 gives the error in the header.

mxfold2-0.1.2-cp310-cp310-manylinux_2_17_x86_64.whl is not a supported wheel on this platform.

I have also tried installing from source; however, I get the following error:

ERROR: Cannot install mxfold2, mxfold2==0.1.2 and torchvision==0.15.2+cpu because these package versions have conflicting dependencies.

The conflict is caused by:
    mxfold2 0.1.2 depends on torch<2.0 and >=1.4
    torchvision 0.15.2+cpu depends on torch==2.0.1
    mxfold2 0.1.2 depends on torch<2.0 and >=1.4
    torchvision 0.15.2 depends on torch==2.0.1
    mxfold2 0.1.2 depends on torch<2.0 and >=1.4
    torchvision 0.15.1 depends on torch==2.0.0

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
