sjgosai / boda2

Computational Optimization of DNA Activity (CODA)

License: GNU Affero General Public License v3.0

Python 57.64% Jupyter Notebook 41.12% Dockerfile 1.01% Shell 0.24%

boda2's Introduction

sjgosai

For public data sharing

boda2's People

Contributors

asr2210, frankliuyc, irodcast, sjgosai

Forkers

outpace-bio

boda2's Issues

Error in example implementation of sequence generation.

Hi there, I recently tried your sequence generation implementation using the tutorial command. Below is exactly what I ran:

python src/generate.py \
  --params_module StraightThroughParameters \
    --n_channels 4 --length 200 --n_samples 10 \
  --energy_module OverMaxEnergy \
    --model_artifact tmp/test_new_config/model_artifacts__20231127_235004__284796.tar.gz \
    --bias_cell 0 --bending_factor 1.0 --a_min -2.0 --a_max 6.0 \
  --generator_module FastSeqProp \
    --energy_threshold -2.0 --max_attempts 20 --n_steps 200 \
    --n_proposals 20 \
  --proposal_path tmp/generated/

The model training worked fine and I was able to get pretty close to the preprint's Pearson correlations. However, the generate command as given does not seem to work. Below is a snippet of the error I get:

File "/opt/py/conda/PyLib_Common/envs/boda2/lib/python3.10/site-packages/boda-0.2.0-py3.10.egg/boda/generator/FastSeqProp.py", line 230, in generate
    states    = torch.cat([states,     final_states[energy_filter].cpu()], dim=0)
RuntimeError: Tensors must have same number of dimensions: got 3 and 4

Would you know why this may be the case? I have tried several different parameter combinations and have never gotten FastSeqProp to work. I have gotten Simulated Annealing to work fine.
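The message says that `torch.cat` was handed one 3-dimensional and one 4-dimensional tensor. As a debugging aid, here is a minimal pure-Python sketch of that rule. `check_cat_shapes` is a hypothetical helper, not part of boda2 or PyTorch, and the example shapes are assumptions loosely based on `--n_channels 4 --length 200`, not values read from the failing run.

```python
def check_cat_shapes(shape_a, shape_b, dim=0):
    """Return None if two shapes could be concatenated along `dim`, else a message.

    Hypothetical debugging helper: it mimics the rule that concatenation needs
    the same number of dimensions and equal sizes everywhere except `dim`.
    """
    if len(shape_a) != len(shape_b):
        return (f"dimension mismatch: got {len(shape_a)} and {len(shape_b)} "
                f"(shapes {tuple(shape_a)} vs {tuple(shape_b)})")
    for axis, (a, b) in enumerate(zip(shape_a, shape_b)):
        if axis != dim and a != b:
            return f"size mismatch on axis {axis}: {a} vs {b}"
    return None


# Assumed shapes for illustration only: (batch, channels, length)
# versus a tensor that picked up an extra axis somewhere upstream.
accumulated = (0, 4, 200)
new_batch = (1, 10, 4, 200)
print(check_cat_shapes(accumulated, new_batch))
```

Printing `states.shape` and `final_states[energy_filter].shape` just before the failing line in `FastSeqProp.py` should show which operand carries the extra axis.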

Also, there seems to be no add_generator_specific_args function in AdaLead.py even though there is one in the SimulatedAnnealing and FastSeqProp class. Therefore, AdaLead also does not work. Is there a reason why this is the case?
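For context, a method like `add_generator_specific_args` usually registers a class's command-line options on a shared `argparse` parser. The sketch below shows that general pattern only. `ToyGenerator`, the method signature, and the argument defaults are assumptions for illustration; the existing methods in `FastSeqProp.py` and `SimulatedAnnealing.py` would be the real template for an AdaLead version.

```python
import argparse


class ToyGenerator:
    """Hypothetical stand-in for a generator class; not boda2's AdaLead."""

    @staticmethod
    def add_generator_specific_args(parent_parser):
        # Build a new parser that inherits everything from the parent.
        parser = argparse.ArgumentParser(parents=[parent_parser], add_help=False)
        group = parser.add_argument_group("Generator Module args")
        # Argument names and defaults below are illustrative assumptions.
        group.add_argument("--n_steps", type=int, default=20)
        group.add_argument("--n_proposals", type=int, default=1)
        return parser


# The parent parser holds options shared by every generator.
base = argparse.ArgumentParser(add_help=False)
base.add_argument("--generator_module", type=str, required=True)

parser = ToyGenerator.add_generator_specific_args(base)
args = parser.parse_args(["--generator_module", "ToyGenerator", "--n_steps", "200"])
print(args.generator_module, args.n_steps, args.n_proposals)
```

If the command-line script looks this method up on whichever class `--generator_module` names, a class without it would fail before generation starts, which would match the behavior described here.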

Error in training due to pytorch-lightning version

Hi, I recently tried running your construct_new_model.ipynb notebook and hit an error caused by the pytorch-lightning version: some of the features your training modules rely on were deprecated after version 2.0. Could you please add the pytorch-lightning version used in this paper to the requirements.txt file?
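Until a pinned version is published, one way to see which side of the 2.0 boundary an environment sits on is to query the installed package metadata. This is a minimal sketch using only the standard library. The two distribution names checked are the usual PyPI names, and which version boda2 actually needs is not stated in this thread.

```python
from importlib import metadata


def major_version(package):
    """Return the installed major version of `package`, or None if absent."""
    try:
        return int(metadata.version(package).split(".")[0])
    except metadata.PackageNotFoundError:
        return None


# Lightning is distributed under both of these names on PyPI.
for name in ("pytorch-lightning", "lightning"):
    major = major_version(name)
    if major is None:
        print(f"{name}: not installed")
    elif major >= 2:
        print(f"{name}: major version {major} (2.0 or later)")
    else:
        print(f"{name}: major version {major} (pre-2.0)")
```

If the result is 2.0 or later, installing a pre-2.0 release in a fresh environment is a plausible workaround, but that is a guess until the maintainers confirm the version they used.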

Acceptable data input formats

Hi all,

I have a data table with just two columns: DNA sequence and activity level. Is there a way for me to input this data into the boda2 pipeline? The tutorial I see only shows how it can be done for datasets in the MPRA format.

Thanks,
Adam
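As a starting point, a two-column table can be read and one-hot encoded with the standard library alone. The sketch below is not boda2's loader: the column names `sequence` and `activity`, the tab delimiter, and the `ACGT` channel order are all assumptions, and the output would still need to be adapted to whatever the pipeline's data module expects.

```python
import csv
import io

ALPHABET = "ACGT"  # assumed channel order; boda2's own order may differ


def one_hot(seq):
    """Encode a DNA string as 4 rows (one per base) of 0.0/1.0 values."""
    seq = seq.upper()
    # Bases outside ACGT (e.g. N) become all-zero columns.
    return [[1.0 if base == letter else 0.0 for base in seq] for letter in ALPHABET]


def read_table(handle):
    """Yield (sequence, activity) pairs from a two-column, tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        yield row["sequence"], float(row["activity"])


# Stand-in for a file on disk; the column names are hypothetical.
table = io.StringIO("sequence\tactivity\nACGT\t1.5\nGGNA\t-0.25\n")
for seq, activity in read_table(table):
    print(seq, activity, one_hot(seq))
```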

input data length

According to the tutorial, the input data length should be 600. However, it was mentioned that both downstream and upstream sequences are padded with a fixed length of 200. Could you please explain the rationale behind this approach?
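To make the arithmetic concrete: a total length of 600 with 200 nt of fixed padding on each side leaves 200 nt for the variable sequence. A minimal sketch of that padding step follows. `pad_with_flanks` is a hypothetical helper, and the flanks are `N` placeholders because the real constant sequences are not given in this thread.

```python
def pad_with_flanks(insert, upstream, downstream, flank_len=200):
    """Flank `insert` with the last/first `flank_len` bases of fixed sequences."""
    if len(upstream) < flank_len or len(downstream) < flank_len:
        raise ValueError("flank sequences are shorter than flank_len")
    left = upstream[-flank_len:]    # bases immediately 5' of the insert
    right = downstream[:flank_len]  # bases immediately 3' of the insert
    return left + insert + right


# Placeholder flanks: the real constant sequences are not given in this thread.
upstream = "N" * 200
downstream = "N" * 200
insert = "ACGT" * 50  # a 200-nt example insert

padded = pad_with_flanks(insert, upstream, downstream)
print(len(padded))  # 200 + 200 + 200 = 600
```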

data preprocessing

Hey @sjgosai,

Is there an example Python script for the data preprocessing mentioned in the paper? Basically, going from the ENCODE files for MPRA to Supplementary Table 2 (SupTable 2 - UKBB_GTEX_CODA_averaged_no_cutoffs.txt) in the paper.

Model not found

Hi,

I'm trying to load the pre-trained model, but got this error: FileNotFoundError: [Errno 2] No such file or directory: './model_artifacts__20231222_004129__133866.tar.gz'

Thanks,
JC
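A `FileNotFoundError` with a relative path like this often means the archive is not in the current working directory. The sketch below is a small, hypothetical check (not part of boda2) that reports where Python is actually looking and whether the file is a readable tar archive.

```python
from pathlib import Path
import tarfile


def check_artifact(path):
    """Return a short diagnosis of whether `path` is a readable tar archive."""
    p = Path(path).expanduser()
    if not p.is_file():
        # Show the absolute path so a wrong working directory is obvious.
        return f"missing: {p.resolve()} (cwd is {Path.cwd()})"
    if not tarfile.is_tarfile(p):
        return f"not a tar archive: {p.resolve()}"
    return f"ok: {p.resolve()}"


print(check_artifact("./model_artifacts__20231222_004129__133866.tar.gz"))
```

If the result is `missing`, the archive either needs to be downloaded into that directory first or the loader needs the full path to wherever it was saved.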
