Comments (14)
@joshim5 Thanks for the reply; sorry I missed the explanation that there is no plan to release any pre-training code at this time.
I hugely appreciate the quality of the released code, but I think not releasing the training code (which is obviously non-trivial for a model of this complexity) severely hinders the reproducibility of the paper and is bad practice, especially in the compbio domain. For anyone wondering how it can be done well, here is a good example: https://github.com/kundajelab/bpnet-manuscript by @Avsecz.
Just saw the closed issue #11, which is very similar! If you're still not planning on providing any of your fairseq code, then I understand closing this out as a duplicate. I'd really appreciate it if you could provide the code, though!
Yes, they're just the regular model weights of the MSA Transformer, all of which are trained.
Hi @Jacoberts, thanks for your interest! In our experiments, we didn't see much of an improvement from evolutionary fine-tuning. For example, see Figure 15 in the appendix of our recent paper at ICLR 2021. We don't have plans to release any pre-training code at this time, but I would encourage you to try using ESM even without evolutionary fine-tuning. You may be surprised by the results!
Hi @joshim5, thanks for your reply! I'm finding that ESM underperforms UniRep and eUniRep on the prediction task defined by Alley et al. Honestly, your results make sense to me: I wouldn't expect evotuning to do much. But the Church lab saw a phenomenal increase in recall on the generalization set from evotuning! I think I'll try to whip up a fairseq config for ESM and see if eESM does any better.
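In case it helps anyone attempting the same thing, here is a rough sketch of what evolutionary fine-tuning could look like directly on the released ESM-1b checkpoint via the esm package (rather than through a fairseq config): simply continuing masked-language-model training on a family-specific sequence set. The masking ratio, optimizer settings, and the family_seqs list are placeholders of my own, not values from the paper.

```python
import torch
import torch.nn.functional as F
import esm

# Load the released ESM-1b checkpoint and its alphabet.
model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
batch_converter = alphabet.get_batch_converter()
model.train()

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # placeholder hyperparameters

# Placeholder: sequences from the protein family of interest (e.g. hits from a jackhmmer search).
family_seqs = [
    ("seq1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"),
    ("seq2", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"),
]

for epoch in range(10):
    labels, strs, tokens = batch_converter(family_seqs)
    targets = tokens.clone()

    # Randomly mask ~15% of residue positions (special tokens excluded).
    maskable = (
        (tokens != alphabet.cls_idx)
        & (tokens != alphabet.eos_idx)
        & (tokens != alphabet.padding_idx)
    )
    mask = (torch.rand(tokens.shape) < 0.15) & maskable
    tokens[mask] = alphabet.mask_idx

    logits = model(tokens)["logits"]
    # Cross-entropy only on the masked positions.
    loss = F.cross_entropy(logits[mask], targets[mask])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice you would want to batch and shard this over GPUs for a 650M-parameter model; the loop above is just meant to show the shape of the objective.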
Could y'all provide any of the code you used for the pre-training task? E.g., your implementation of noising/masking, your loss function, or your optimization code?
@joshim5 Do you mind commenting on this part of the issue too? Thanks :)
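For context, this is roughly what I mean: a minimal reconstruction of BERT-style noising as the pre-print describes it (15% of positions selected; of those, 80% replaced with the mask token, 10% with a random token, 10% left unchanged), with the loss taken only over the selected positions. This is my own reading, not the authors' fairseq code, and the function and its arguments are stand-ins.

```python
import torch

def bert_noise(tokens, mask_idx, vocab_size, special_idxs, p_select=0.15):
    """BERT-style noising: select ~15% of positions; of those, 80% -> <mask>,
    10% -> random token, 10% -> unchanged. Returns (noised tokens, target mask)."""
    tokens = tokens.clone()

    # Never corrupt special tokens (cls/eos/padding indices passed by the caller).
    special = torch.zeros_like(tokens, dtype=torch.bool)
    for idx in special_idxs:
        special |= tokens == idx

    selected = (torch.rand(tokens.shape) < p_select) & ~special
    roll = torch.rand(tokens.shape)

    # 80% of selected positions become the mask token.
    tokens[selected & (roll < 0.8)] = mask_idx
    # 10% become a uniformly random token.
    random_slot = selected & (roll >= 0.8) & (roll < 0.9)
    tokens[random_slot] = torch.randint(vocab_size, tokens.shape)[random_slot]
    # The remaining 10% are left unchanged; the loss is still computed on them.
    return tokens, selected
```

The loss would then just be cross-entropy between the model's logits at the selected positions and the original tokens there.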
@gokceneraslan these details are listed in our pre-print; see page 21, "Pre-training task." If you find anything missing, feel free to start a new discussion and we'll be happy to clarify any details.
@Jacoberts In case you managed to whip up a fairseq config, I'd be very grateful if you could share it!
@Jacoberts, or anyone else who has created pre-training code with fairseq or any other framework and can share it: it would be a big help.
Since ESM-1b is now available on Huggingface (https://huggingface.co/facebook/esm-1b), you should be able to use the HuggingFace tooling for evolutionary finetuning/pretraining.
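For anyone going this route, a minimal sketch of what that could look like with the transformers Trainer API. The checkpoint name, sequences, and hyperparameters are placeholders, and the exact hub identifier for ESM-1b may differ from the URL above.

```python
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Placeholder checkpoint name; check the hub for the exact ESM-1b identifier.
checkpoint = "facebook/esm-1b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Placeholder: family-specific sequences for evolutionary fine-tuning.
family_seqs = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"]
encodings = tokenizer(family_seqs, truncation=True, padding=True)
dataset = [
    {"input_ids": ids, "attention_mask": am}
    for ids, am in zip(encodings["input_ids"], encodings["attention_mask"])
]

# Dynamic masking with the standard MLM collator (15% masking probability).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="eesm", num_train_epochs=3,
                           per_device_train_batch_size=2, learning_rate=1e-5),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
```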
Along these lines, I have the following question (tangentially relevant to #143): it isn't 100% clear to me, having read the MSA Transformer paper, whether the initial token embedding weights were also randomly initialised and learnt as part of the overall MLM pre-training, or whether pre-computed embeddings (trained separately in some other way) were fed to the model at pre-training time. I imagine the former was the case, but would appreciate the clarification. Thanks!
"initial token embedding weights were also randomly initialised"

Correct, this is how it was done; there is no change w.r.t. fairseq's TransformerSentenceEncoder self.embed_tokens.
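In other words, something along these lines: a plain, randomly initialised embedding table whose weights are trained jointly with the rest of the model under the MLM objective. The dimensions and initialisation below are illustrative, not taken from the released configs.

```python
import torch.nn as nn

# Token embeddings as in fairseq's TransformerSentenceEncoder: a plain nn.Embedding,
# randomly initialised and updated by backprop during MLM pre-training
# (no separately pre-computed embeddings are fed in).
vocab_size, embed_dim, padding_idx = 33, 768, 1  # illustrative values
embed_tokens = nn.Embedding(vocab_size, embed_dim, padding_idx=padding_idx)
nn.init.normal_(embed_tokens.weight, mean=0.0, std=embed_dim ** -0.5)
```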
Thanks @tomsercu, really appreciate the fast replies.
@tomsercu just one last thing: you probably meant to also quote "and learnt as part of the overall MLM pre-training", right?