tbepler / prose
Multi-task and masked language model-based protein sequence embedding models.
License: Other
Hello, sir!
I'm trying to use embeddings for distance calculation between protein sequences. In both papers (2019, 2021), you proposed the "soft sequence alignment" method.
I have several questions regarding the SSA:
1. In the SkipLSTM encoder, there is a proj attribute, which I assume relates to the linear projection matrix. Can I apply this layer to the embeddings to get the same representation as the one used in the paper for distance calculation?
2. The "alpha" and "beta" (a_ij and b_ij) parameters have the normalization summed over n and m sequence elements, respectively (sum_l^n (k_il), sum_l^m (k_lj)). However, i and j are indices reserved for the elements of the first and the second sequences, respectively. For instance, in the first sum (sum_l^n (k_il)), we fix i and sum over each l-th element of the second sequence, of which there are m in total (and not n). In short, could you please clarify how to correctly calculate the normalization constant in the "alpha" and "beta" parameters?

I realize that's a lot of abstract questions, so any insight you could give is highly appreciated!
Ivan
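To make the normalization question concrete, here is a minimal sketch of one reading of the soft alignment, the reading argued for in the question above: for a fixed i, alpha is normalized over the m positions of the second sequence, and for a fixed j, beta is normalized over the n positions of the first sequence. The kernel k_ij = exp(-L1 distance), the combination alpha + beta - alpha * beta, and the weighted-average score are assumptions based on my reading of the 2019 paper. The function names are hypothetical and are not part of the prose codebase, and none of this has been confirmed by the authors.

```python
import math


def l1_distance(u, v):
    """L1 distance between two embedding vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(u, v))


def soft_alignment(x, y):
    """Return (d, alpha, beta) for x (n vectors) and y (m vectors).

    Assumption: k_ij = exp(-d_ij), with d_ij the L1 distance.
    alpha[i][j] is normalized over j (the m positions of y) for fixed i.
    beta[i][j] is normalized over i (the n positions of x) for fixed j.
    """
    n, m = len(x), len(y)
    d = [[l1_distance(x[i], y[j]) for j in range(m)] for i in range(n)]
    k = [[math.exp(-d[i][j]) for j in range(m)] for i in range(n)]
    row_norm = [sum(k[i]) for i in range(n)]                         # sum over m terms
    col_norm = [sum(k[i][j] for i in range(n)) for j in range(m)]    # sum over n terms
    alpha = [[k[i][j] / row_norm[i] for j in range(m)] for i in range(n)]
    beta = [[k[i][j] / col_norm[j] for j in range(m)] for i in range(n)]
    return d, alpha, beta


def ssa_score(x, y):
    """Similarity as the negative alignment-weighted mean distance (assumed form)."""
    d, alpha, beta = soft_alignment(x, y)
    num = 0.0
    den = 0.0
    for i in range(len(x)):
        for j in range(len(y)):
            # Combined weight: large if position i prefers j or j prefers i.
            w = alpha[i][j] + beta[i][j] - alpha[i][j] * beta[i][j]
            num += w * d[i][j]
            den += w
    return -num / den
```

Under this reading, each row of alpha sums to 1 over the second sequence and each column of beta sums to 1 over the first sequence, so the upper limits in the two sums would be m and n, respectively. For long sequences or large distances, exp(-d) can underflow, so a practical version would likely compute the two normalizations with a numerically stable softmax.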
Thanks for your great work! I am going to use it to generate embeddings for each of my individual downstream tasks. Is there any limit on the length of the input sequence?
Thanks for your great work! I was short on time, so I have only read it roughly.
Your paper states: "Our encoder consists of 3 biLSTM layers with 512 hidden units each and a final output embedding dimension of 100."
Your example command for the pretrained model is:
python embed_sequences.py --pool avg -o data/demo.h5 data/demo.fa
The output dimension is 6165.
If I only want the embedding with dimension 100, how do I get it?
And what does the 6165 mean?
Thank you for your great work! I was just wondering how I would modify the code to control the output length, as running the SkipLSTM model produces a Tensor of dimensions [386, 6165] (no pooling) when run on my pre-aligned sequences. I would like to produce much shorter representations for each of the 386 sequence components. Thank you.