Comments (3)

chaitjo commented on August 11, 2024

Hi @yuehua-Song666, many thanks for your interest! And apologies for this very delayed response.

> I got an issue here, after getting processed.pt, I tried to run main.py.

Have you created the processed dataset yourself from the raw RNAsolo PDB files? Or have you downloaded it from our link: https://drive.google.com/file/d/1gcUUaRxbGZnGMkLdtVwAILWVerVCbu4Y/view?usp=sharing

> It uses das_split.pt to split the data into train, val and test, right? But I got an "index out of range" error. I wonder if you have any clues why this happened?

I think the index error could be happening if you have created the processed dataset by yourself and there are fewer entries/samples in the new processed dataset than there were when I created the splits. Could you check whether this is the case?

Essentially, an index out of range error means that the list of indices in das_split.pt contains one or more values that are too large to index the processed data list. If the processed data list has length N, valid indices run from 0 to N - 1, so any index of N or greater raises the error.
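A quick way to check for this mismatch is to compare every index in the split against the length of the processed data list. The snippet below is only a minimal sketch: `find_out_of_range` is a hypothetical helper (not part of the repository), and it works on plain Python lists so it runs without the real files.

```python
def find_out_of_range(split_indices, num_samples):
    """Return, for each split, the indices that cannot index a dataset
    with num_samples entries.

    Negative indices are also reported: Python lists accept them, but a
    split file would not normally contain any.
    """
    bad = {}
    for name, indices in split_indices.items():
        bad[name] = [int(i) for i in indices if not 0 <= int(i) < num_samples]
    return bad


# Toy example: the dataset has 5 samples, but the split was made for a larger one.
splits = {"train": [0, 1, 2], "val": [3, 4], "test": [5, 7]}
report = find_out_of_range(splits, num_samples=5)
print(report)  # {'train': [], 'val': [], 'test': [5, 7]}
```

To run this on the real files, one would load processed.pt and das_split.pt (for example with `torch.load`) and pass in the length of the processed data list together with the three index lists. How the split file is laid out internally is an assumption here, so inspect the loaded object first.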

> By the way, I saw under the data folder, you have three '_split.pt' files, can you please tell me the difference between them?

We have provided two splits used in our experiments in the data/ directory:

  • Single-state split from Das et al., 2010: data/das_split.pt (called the Das split for compatibility with older code)
    • This split is used to fairly evaluate gRNAde for single-state design on a set of RNA structures of interest from the PDB identified by the Das et al. paper, which mainly includes riboswitches, aptamers, and ribozymes.
    • We identify the structural clusters belonging to the RNAs identified in Das et al. and add all the RNAs in these clusters to the test set (100 samples).
    • The remaining clusters are randomly added to the training and validation splits.
  • Multi-state split of structurally flexible RNAs: data/structsim_split.pt
    • This split is used to test gRNAde's ability to design RNA with multiple distinct conformational states.
    • We order the structural clusters based on median intra-sequence RMSD among available structures within the cluster.
    • The top 100 samples from clusters with the highest median intra-sequence RMSD are added to the test set. The next 100 samples are added to the validation set and all remaining samples are used for training.
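The two procedures above can be sketched roughly as follows. This is an illustrative sketch rather than the code used to produce the split files: the function names, the `clusters` and `rmsd` dictionaries, the `num_val_clusters` parameter, and the choice to cut the ordered samples at exactly 100 (which may divide a cluster) are all assumptions made for the example.

```python
import random
from statistics import median


def single_state_split(clusters, interest, num_val_clusters=1, seed=0):
    """Clusters containing any sample of interest go to the test set;
    the remaining clusters are shuffled and divided between val and train.

    clusters: dict mapping cluster id -> list of sample indices
    interest: set of sample indices for the RNAs of interest
    """
    test, remaining = [], []
    for cluster_id in sorted(clusters):
        members = clusters[cluster_id]
        if any(idx in interest for idx in members):
            test.extend(members)  # the whole cluster is held out
        else:
            remaining.append(members)
    random.Random(seed).shuffle(remaining)
    val = [idx for members in remaining[:num_val_clusters] for idx in members]
    train = [idx for members in remaining[num_val_clusters:] for idx in members]
    return train, val, test


def multi_state_split(clusters, rmsd, num_test=100, num_val=100):
    """Rank clusters by median intra-sequence RMSD (highest first), then
    take the first num_test samples for test and the next num_val for val.

    rmsd: dict mapping sample index -> intra-sequence RMSD of that sample
    """
    ranked = sorted(
        clusters.values(),
        key=lambda members: median(rmsd[idx] for idx in members),
        reverse=True,
    )
    ordered = [idx for members in ranked for idx in members]
    test = ordered[:num_test]
    val = ordered[num_test:num_test + num_val]
    train = ordered[num_test + num_val:]
    return train, val, test


# Toy data: three clusters, five samples.
clusters = {"a": [0, 1], "b": [2, 3], "c": [4]}
rmsd = {0: 0.5, 1: 0.7, 2: 4.0, 3: 6.0, 4: 2.0}

s_train, s_val, s_test = single_state_split(clusters, interest={2})
m_train, m_val, m_test = multi_state_split(clusters, rmsd, num_test=2, num_val=1)
print(s_test)                  # [2, 3]
print(m_train, m_val, m_test)  # [0, 1] [4] [2, 3]
```

In both cases the splits are built from whole structural clusters, so structurally similar RNAs are kept away from the test set. The hard cutoff in `multi_state_split` is the one place where this sketch can divide a cluster.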

Let me know if this is helpful.

from geometric-rna-design.

chaitjo commented on August 11, 2024

Hi @yuehua-Song666, I recently updated the instructions for preparing the data and for reproducing our splits for benchmarking: #16

Somebody else told me that RNAsolo no longer allows downloading older versions based on date cutoffs, and I suspect the issues you were facing may be due to the same reason. If you try the new data instructions in the README, I think it should work.

Let me know how it goes!

yuehua-Song666 commented on August 11, 2024

Hi authors,

Thank you so much for such a useful reply! I figured it out. =)

Thanks a lot,
Yuehua
