For public data sharing
sjgosai / boda2
Computational Optimization of DNA Activity (CODA)
License: GNU Affero General Public License v3.0
Hi there, I recently attempted your implementation of generation using the tutorial command, below is exactly what I used:
python src/generate.py \
--params_module StraightThroughParameters \
--n_channels 4 --length 200 --n_samples 10 \
--energy_module OverMaxEnergy \
--model_artifact tmp/test_new_config/model_artifacts__20231127_235004__284796.tar.gz \
--bias_cell 0 --bending_factor 1.0 --a_min -2.0 --a_max 6.0 \
--generator_module FastSeqProp \
--energy_threshold -2.0 --max_attempts 20 --n_steps 200 \
--n_proposals 20 \
--proposal_path tmp/generated/
The model training worked fine, and I was able to get pretty close to the preprint's Pearson correlations. However, the generate command as given does not seem to work. Below is a snippet of the error I get:
File "/opt/py/conda/PyLib_Common/envs/boda2/lib/python3.10/site-packages/boda-0.2.0-py3.10.egg/boda/generator/FastSeqProp.py", line 230, in generate
states = torch.cat([states, final_states[energy_filter].cpu()], dim=0)
RuntimeError: Tensors must have same number of dimensions: got 3 and 4
Would you know why this may be the case? I have tried several different parameter combinations and have never gotten FastSeqProp to work. I have gotten Simulated Annealing to work fine.
Also, there seems to be no add_generator_specific_args function in AdaLead.py, even though the SimulatedAnnealing and FastSeqProp classes each have one. As a result, AdaLead does not work either. Is there a reason for this?
Hi, I recently tried running your construct_new_model.ipynb notebook and encountered an error caused by the pytorch-lightning version: some functionality used by your training modules was deprecated after version 2.0. Could you please add the lightning.pytorch version used in the paper to the requirements.txt file?
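When reporting or debugging a version mismatch like this, it can help to state which Lightning distribution is installed. The sketch below uses only the standard library; `installed_major` is a hypothetical helper, and the two distribution names checked are assumptions about how Lightning may have been installed. It does not say which version the authors used.

```python
from importlib import metadata

def installed_major(package):
    """Return the installed major version of a package, or None if it is absent."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    # Take the leading numeric component, e.g. "1.9.5" -> 1.
    return int(version.split(".")[0])

# Assumed distribution names; either, both, or neither may be installed.
for name in ("pytorch-lightning", "lightning"):
    print(name, installed_major(name))
```

A major version of 2 or higher would be consistent with the deprecation errors described above.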
Hi all,
I have a datatable with just two columns: DNA sequence and activity level. Is there a way for me to input this data into the boda2 pipeline? The tutorial I see only shows how it can be done for datasets in the MPRA format.
Thanks,
Adam
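For the two-column case described above, a minimal sketch of reading such a table and one-hot encoding the sequences with only the standard library might look like the following. The column names `sequence` and `activity`, the helper names, the tab delimiter, and the A/C/G/T channel order are all assumptions for illustration. This is not boda2's own loader, and the output would still need to be adapted to whatever the pipeline expects.

```python
import csv
import io

ALPHABET = "ACGT"  # assumed channel order; check against the model's convention

def one_hot(seq):
    """Return a 4 x len(seq) nested list; unknown bases (e.g. N) become all zeros."""
    seq = seq.upper()
    return [[1.0 if base == letter else 0.0 for base in seq] for letter in ALPHABET]

def load_two_column_table(handle, seq_col="sequence", act_col="activity"):
    """Read a tab-separated table with a header row into (encoded, activities)."""
    reader = csv.DictReader(handle, delimiter="\t")
    encoded, activities = [], []
    for row in reader:
        encoded.append(one_hot(row[seq_col]))
        activities.append(float(row[act_col]))
    return encoded, activities

# Tiny in-memory example standing in for a real file.
example = io.StringIO("sequence\tactivity\nACGT\t1.5\nTTNA\t-0.2\n")
encoded, activities = load_two_column_table(example)
```

Each encoded entry has 4 rows (one per channel), which matches the `--n_channels 4` setting used in the generate command earlier on this page.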
According to the tutorial, the input data length should be 600. However, it was mentioned that both downstream and upstream sequences are padded with a fixed length of 200. Could you please explain the rationale behind this approach?
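The length arithmetic in the question can be checked with a short sketch: a 200 bp insert with 200 bp added on each side gives 600 bp. The flank strings below are placeholder runs of `N`, and `pad_sequence` is a hypothetical helper; neither is the flank sequence or function boda2 actually uses.

```python
INSERT_LEN = 200
FLANK_LEN = 200

def pad_sequence(seq, upstream, downstream):
    """Concatenate fixed flanks around an insert and return the padded string."""
    return upstream + seq + downstream

# Placeholder flanks; the real flanking sequences are not given here.
upstream = "N" * FLANK_LEN
downstream = "N" * FLANK_LEN
insert = "ACGT" * (INSERT_LEN // 4)

padded = pad_sequence(insert, upstream, downstream)
print(len(padded))
```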
Hey @sjgosai,
Do we have an example Python script for the data preprocessing mentioned in the paper?
Basically, going from the ENCODE files for MPRA to Supplementary Table 2 (SupTable 2 - UKBB_GTEX_CODA_averaged_no_cutoffs.txt) in the paper.
We need to load weights into the branched linear layers when the number of branches changes. This requires updating /boda2/boda/graph/utils.py.
Hi,
I'm trying to load the pre-trained model, but got this error: FileNotFoundError: [Errno 2] No such file or directory: './model_artifacts__20231222_004129__133866.tar.gz'
Thanks,
JC
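The error above means the relative path did not resolve to a file from the current working directory. A small check like the following sketch can confirm that before any loading is attempted. `check_artifact` is a hypothetical helper, not part of boda2, and where the archive should be obtained from is not covered here.

```python
import tarfile
from pathlib import Path

def check_artifact(path):
    """Return the resolved path if it is a readable tar archive, else raise."""
    artifact = Path(path).expanduser().resolve()
    if not artifact.is_file():
        # Report the working directory, since './...' paths are relative to it.
        raise FileNotFoundError(f"{artifact} not found (cwd is {Path.cwd()})")
    if not tarfile.is_tarfile(str(artifact)):
        raise ValueError(f"{artifact} is not a tar archive")
    return artifact
```

Calling `check_artifact('./model_artifacts__20231222_004129__133866.tar.gz')` from the directory where the notebook or script runs would show the absolute location being searched.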
The current MPRA_DataModule (pl.LightningDataModule) does not run in a Google Colab environment due to memory requirements.