calico / basenji

Sequential regulatory activity predictions with deep convolutional neural networks.

License: Apache License 2.0

Python 53.78% Jupyter Notebook 46.13% Shell 0.07% Makefile 0.02%

basenji's People

Contributors

chihuahua, davek44, gfudenberg, jaspersnoek, mlbileschi, vagarwal87


basenji's Issues

Tutorial sad.ipynb not working

Hi Basenji Developers, thanks a lot for the great project. I ran into a list of issues trying to reproduce the sad.ipynb tutorial.

I was able to download the additional files in the first three blocks in the notebook. Then errors occur in the block when executing this command:

basenji_sad.py -f data/hg19.ml.fa -g data/human.hg19.genome -l 131072 -o output/rfx6_sad --rc -t data/heart_wigs_index.txt models/params_small.txt models/heart/model_best.tf data/rs339331.vcf

I was able to solve the first three errors but not the fourth. Any help is greatly appreciated.

  1. The zarr package is required but not listed in setup.py. I installed it with pip install zarr.

  2. The HG19 environment variable is required even if -f data/hg19.ml.fa is specified. I worked around it by setting an empty string: export HG19=''.

  3. An error occurs when the script loads the index file as a DataFrame, i.e. df = pd.read_table("data/heart_wigs_index.txt"), and tries to access df.identifier and df.description. I think this is because the data/heart_wigs_index.txt created in the tutorial does not have a first row of column names. I manually added identifier for the 2nd and description for the 4th column based on my guess (see the sketch at the end of this post).

  4. An error occurs when building a TF tensor, saying a dimension is wrong:

>>> ~/codes/basenji/bin/basenji_sad.py -f data/hg19.ml.fa -g data/human.hg19.genome -o output/rfx6_sad --rc -t data/test_index.txt models/params_small.txt models/heart/model_best.tf data/rs339331.vcf
{'batch_size': 4, 'batch_buffer': 4096, 'link': 'softplus', 'loss': 'poisson', 'optimizer': 'adam', 'adam_beta1': 0.97, 'adam_beta2': 0.98, 'learning_rate': 0.002, 'num_targets': 3, 'target_pool': 128, 'cnn_dropout': 0.1, 'cnn_filter_sizes': [20, 7, 7, 7, 3, 3, 3, 3, 3, 3, 3, 1], 'cnn_filters': [128, 128, 192, 256, 256, 32, 32, 32, 32, 32, 32, 384], 'cnn_pool': [2, 4, 4, 4, 1, 0, 0, 0, 0, 0, 0, 0], 'cnn_dilation': [1, 1, 1, 1, 1, 2, 4, 8, 16, 32, 64, 1], 'cnn_dense': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0]}
Targets pooled by 128 to length 8
Traceback (most recent call last):
  File "/home/qyd/codes/basenji/bin/basenji_sad.py", line 474, in <module>
    main()
  File "/home/qyd/codes/basenji/bin/basenji_sad.py", line 162, in main
    embed_penultimate=options.penultimate, target_subset=target_subset)
  File "/home/qyd/codes/basenji/basenji/seqnn.py", line 51, in build_feed
    target_subset=target_subset)
  File "/home/qyd/codes/basenji/basenji/seqnn.py", line 75, in build_from_data_ops
    save_reprs=True)
  File "/home/qyd/codes/basenji/basenji/seqnn.py", line 238, in build_predict
    seq_length - batch_buffer_pool, :]
  File "/home/qyd/codes/anaconda3/envs/basenji/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 206, in __sub__
    return Dimension(self._value - other.value)
  File "/home/qyd/codes/anaconda3/envs/basenji/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 42, in __init__
    raise ValueError("Dimension %d must be >= 0" % self._value)
ValueError: Dimension -24 must be >= 0

I found that the dimension seq_length - batch_buffer_pool is -24 because seq_length is 8 and batch_buffer_pool is 32.
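
For what it's worth, a minimal sketch (in Python) of the failing arithmetic, using the values from the log above; note that the failing command omits the -l 131072 flag used in the tutorial command, which may be why the pooled target length is only 8:

  target_pool = 128
  batch_buffer = 4096                              # from params_small.txt
  seq_length_pooled = 8                            # "Targets pooled by 128 to length 8"
  batch_buffer_pool = batch_buffer // target_pool  # 4096 // 128 = 32
  print(seq_length_pooled - batch_buffer_pool)     # -24, matching the ValueError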

Hopefully the above information makes sense, and the tutorials can be updated to work again. Thanks!
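
For reference on item 3, a hedged sketch of a header row consistent with the guess above (these column names are assumptions based on the attributes the script accesses, not a confirmed spec):

  index  identifier  file                description
  0      CNhs11760   data/CNhs11760.bw   aorta
  1      CNhs12843   data/CNhs12843.bw   artery
  2      CNhs12856   data/CNhs12856.bw   pulmonic_valve

With a header like this, pd.read_table('data/heart_wigs_index.txt', index_col=0) would expose df.identifier and df.description as the script expects.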

Access to Older Versions of the Model

Hi!

Thank you so much for making all of your code available on GitHub; we really appreciate it. I noticed that there have been a few changes to this repository, particularly to the model architecture. I would greatly appreciate it if I could gain access to the older versions of the model. We are trying to use the older parameters file to restore the model, but it fails to do so, as the newer model is in the .h5 format. Hence we would like access to the older human model files: human_model_model_best.tf.meta, human_model_model_best.tf.index, and human_model_model_best.tf.data-00000-of-00001.

Thank you very much for all your help in this regard,
Faiz

Release the processed data

I was very impressed by your paper, software, and tutorial. I found the pre-processing step very tedious, and I was wondering if you could release the processed data. By the way, have you compared the performance when using processed data from ENCODE and Roadmap directly? Thank you for your help!

`sad.ipynb` gets some abnormal output

sad.ipynb gives some abnormal output, with no error beforehand.

This is mine:
[screenshot]

This is your ipynb file's output:
[screenshot]

BTW, I want to ask a question: in 'train_test.ipynb', is my training process normal? The above is my run; the following is the original tutorial run:

[screenshot]

Best wishes!

basenji_train.py has bugs when run according to your tutorial

I installed tensorflow-gpu, version 1.4, and downloaded the data according to your tutorial:
https://github.com/calico/basenji/blob/master/tutorials/train_test.ipynb
Then I ran the code at cell 2:
[screenshot]

basenji_train.py --augment_rc --ensemble_rc --augment_shifts "1,0,-1" --logdir models/heart --params models/params_small.txt --train_data data/heart_l131k/tfrecords/train*.tfr --test_data data/heart_l131k/tfrecords/valid*.tfr

Then an error was raised: "local variable 'metrics_queue' referenced before assignment".
[screenshot]

Could you please help solve the problem?
Thanks a lot.

Basset-style peak prediction problem

Hello. I am following the instructions in manuscripts/basset to make predictions using my BED files. I modified the targets and ran make_data.sh without problems. Next, I changed "units": 164 to 2 in params_basset.json, since I want to use two BED files. However, when I executed basenji_train.py -k -o train_basset --rc params_basset.json data_basset, I got an error that --rc is an unknown parameter. Removing that parameter resulted in the following exception:

Traceback (most recent call last):
  File "/home/user/basenji-master/bin/basenji_train.py", line 165, in <module>
    main()
  File "/home/user/basenji-master/bin/basenji_train.py", line 154, in main
    seqnn_trainer.fit_keras(seqnn_model)
  File "/home/user/basenji-master/basenji/trainer.py", line 132, in fit_keras
    validation_steps=self.eval_epoch_batches[0])
  File "/home/user/anaconda3/envs/basenji/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1100, in fit
    tmp_logs = self.train_function(iterator)
  File "/home/user/anaconda3/envs/basenji/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/home/user/anaconda3/envs/basenji/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 888, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/user/anaconda3/envs/basenji/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2943, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "/home/user/anaconda3/envs/basenji/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1919, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/home/user/anaconda3/envs/basenji/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 560, in call
    ctx=ctx)
  File "/home/user/anaconda3/envs/basenji/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [200,448,2] vs. [200,64,2]
	 [[node LogicalAnd_2 (defined at /basenji-master/basenji/metrics.py:65) ]] [Op:__inference_train_function_5687]

Function call stack:
train_function

Setting the batch size to 1 makes the code run, but I don't know if the results are correct. Please advise what is wrong!

Can't import util and slurm in basenji_data.py

Hi David,

Thank you so much for sharing the Basenji code. I was trying to run preprocess.py from the tutorial, but I ran into these errors:

ModuleNotFoundError: No module named 'util'
ModuleNotFoundError: No module named 'slurm'

Do you know where I can find these two packages?

Thanks!

AttributeError in running genes.ipynb

When I try to run the command in Google Colab:
! basenji_hdf5_genes.py -g data/human.hg19.genome -l 131072 -c 0.333 -p 3 -t data/heart_wigs.txt -w 128 data/hg19.ml.fa data/gencode_chr9.gtf data/gencode_chr9.h5

I get the following error:

Traceback (most recent call last):
  File "basenji_hdf5_genes.py", line 465, in <module>
    main()
  File "basenji_hdf5_genes.py", line 102, in main
    check_wigs(options.target_wigs_file)
  File "basenji_hdf5_genes.py", line 357, in check_wigs
    for wig_file in target_wigs_df.file:
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py", line 3614, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'file'

Use of slurm code

Hello David,

I'd like to launch basenji_train.py on a cluster with slurm, but I'm not sure how to go about it. Do you have an example, like your tutorials, to help?
Thank you !

Using the model trained by `train_test.ipynb`, subsequent gene expression predictions are abnormal; how can I solve this?

Dear developer

I don't know if there is a problem with my operation; I hope I can get your help. I am learning to use basenji because I think it is very good, thank you very much!

If I use the model trained by train_test.ipynb for the subsequent gene.ipynb and sat.ipynb, abnormal predictions occur.

I pulled the latest basenji, and my environment was installed according to the tutorial.

With the train.tfr files output by my preprocess.ipynb and my own trained model, no matter how good the AUPRC is, I get the abnormal output.

Specifically, as I retrain the model, the training process and results are as follows:

[screenshot]

[screenshot]

[screenshot]

I put my trained model into gene.ipynb and the result is still abnormal:
[screenshot]

[screenshot]

This is the result of my trained model, model_best.tf, applied to sad.ipynb:
[screenshot]

Later I downloaded the model you provided and put it into gene.ipynb, and I successfully reproduced the original result:

[screenshot]

[screenshot]

So this suggests that the train.tfr my preprocess.ipynb produced is problematic, which made my training process abnormal. Although I achieve stable convergence in the end and a high AUPRC in the test phase, my initial training process shows a suspiciously low loss, as follows:

[screenshot]

This also troubles me: when I execute preprocess.ipynb, I sometimes don't get the train.tfr and valid.tfr files. Last time, only after running chmod 777 * to change the file permissions in /basenji and /basenji/bin did the .tfr files get written.

I don’t know if my environment is wrong or something else.

Some problems with `sat_mut.ipynb`: NameError: name 'bed_file' is not defined

Hello, dear developer! I don't know if there is a problem with my operation; I hope I can get your help. I am learning to use basenji because I think it is very good, thank you very much!

The following uses the 'models/heart/model_best.tf.meta' model file you provided.

  • This is the basenji I pulled on April 5th.

The first step of sat_mut has no errors.
[screenshot]

The second step has an error, KeyError: 'index'.
[screenshot]

  • This is the basenji I pulled today.

The first step of sat_mut has an error, NameError: name 'bed_file' is not defined, but I clearly have the data/gata4.bed file.
[screenshot]

Clarification on output

The output array for basenji is: (960, 4229)
Could you explain in more detail where the 960 comes from? I understand that it is the number of bins on the sequence, but why is it not 1024bp bins? 1024 comes from 2^17/128 = 1024.
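
For what it's worth, 960 would be consistent with the model's batch_buffer of 4096 bp (the value in the params files elsewhere on this page) being trimmed from each end of the 131,072 bp sequence before binning; a minimal sketch under that assumption:

  seq_length = 131072   # 2**17
  batch_buffer = 4096   # assumed trimmed from each end
  target_pool = 128
  print((seq_length - 2 * batch_buffer) // target_pool)  # (131072 - 8192) // 128 = 960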

Additionally, in the picture below there is a labeling ("pred/n/m") that is ordered by digit rather than by value. Which ordering should we consult when looking at Supplementary Table 1? I presume the 10th output feature corresponds to the 10th value in SuppTable1, not the third (as it would under digit order).

[screenshot]
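
On the ordering point, a minimal illustration (in Python) of digit (lexicographic) versus numeric ordering of such labels:

  labels = ['pred/0/%d' % i for i in range(12)]
  print(sorted(labels))  # digit order: .../0, .../1, .../10, .../11, .../2, ...
  print(sorted(labels, key=lambda s: int(s.rsplit('/', 1)[-1])))  # numeric order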

Thanks

Format of sample_wigs_file in basenji_hdf5_single.py?

Hello Dave,

I was wondering what the input format is for sample_wigs_file in the inputs to basenji_hdf5_single.py.

When I looked at the tutorial, it suggested a format with two columns (from data/heart_wigs.txt):

aorta        data/CNhs11760.bw
artery        data/CNhs12843.bw
pulmonic_valve        data/CNhs12856.bw

Unfortunately, when I run basenji_hdf5_single.py it gives me an index error at line 186:

  for line in open(sample_wigs_file, encoding='UTF-8'):
    a = line.rstrip().split('\t')
    target_wigs[a[0]] = a[1]
    target_strands.append(a[2])

It looks like it was trying to access a[2], which would be a third column of data/heart_wigs.txt that does not exist.

What is the expected input format for sample_wigs_file?
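
For what it's worth, judging from the parsing loop quoted above, the script appears to expect three tab-separated columns (identifier, BigWig path, strand); a hedged example, with the strand values being guesses:

  aorta           data/CNhs11760.bw   +
  artery          data/CNhs12843.bw   +
  pulmonic_valve  data/CNhs12856.bw   +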

Best,

Jake

Barcoded BAM files as inputs to bam_cov.py

Hi,

I had an issue inputting barcoded BAM files to the bam_cov.py script, where the output bigWig files were essentially empty except for a header. I've tracked the issue at least part of the way down to tags. For example, line 1205: if (not align.has_tag('NH') or align.get_tag('NH')==1) and not align.has_tag('XA'):

If I understand correctly, this is essentially asking whether the read is uniquely aligned, but my BAM file doesn't have NH or XA tags. I can compare AS (alignment score of the primary alignment) with XS (alignment score of the best suboptimal alignment) to determine whether a read is essentially uniquely mapped. But I was wondering if you could point out other places in the script I should worry about, or whether you have an alternative strategy, besides re-alignment, to deal with this issue.
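
A minimal sketch (using pysam) of the AS/XS fallback described above; the helper name and exact policy are hypothetical, not bam_cov.py's actual logic:

  import pysam

  def is_unique(align):
      # Prefer the NH/XA convention that bam_cov.py already checks.
      if align.has_tag('NH'):
          return align.get_tag('NH') == 1
      if align.has_tag('XA'):
          return False
      # Fallback: the primary alignment score must strictly beat the best
      # suboptimal score for the read to count as uniquely mapped.
      if align.has_tag('AS') and align.has_tag('XS'):
          return align.get_tag('AS') > align.get_tag('XS')
      return True  # no multi-mapping evidence either way

  bam_in = pysam.AlignmentFile('FLAG-MSI2_2_iCLIP.bam', 'rb')
  unique_aligns = [a for a in bam_in if is_unique(a)]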

Thanks!

tutorials for basenji_data.py

I used exactly the same data as you provided on the website and ran basenji_data.py -d .1 -g data/unmap_macro.bed -l 131072 --local -o data/heart_l131k -p 8 -t .1 -v .1 -w 128 data/hg19.ml.fa data/heart_wigs.txt, and I got sequences.bed and other files, but in the seqs_cov folder I had no *.h5 files, so I didn't get the *.tfr files. I am wondering why I didn't get the h5 and tfr files. Is the unmap_macro.bed on GitHub the one that should be used in the command line?

I attached my log:
stride_train 1 converted to 131072.000000
stride_test 1 converted to 131072.000000
Contigs divided into
Train: 4701 contigs, 2169074921 nt (0.8005)
Valid: 572 contigs, 270358978 nt (0.0998)
Test: 584 contigs, 270330829 nt (0.0998)
/work/06731/yliu5/lonestar/softwares/basenji/bin/basenji_data.py:255: FutureWarning: read_table is deprecated, use read_csv instead, passing sep='\t'.
targets_df = pd.read_table(targets_file, index_col=0)
basenji_data_read.py -w 128 -u sum -c 384.000000 -s 1.000000 data/CNhs11760.bw data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov/0.h5 &> data/heart_l131k/seqs_cov/0.err
basenji_data_read.py -w 128 -u sum -c 384.000000 -s 1.000000 data/CNhs12843.bw data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov/1.h5 &> data/heart_l131k/seqs_cov/1.err
basenji_data_read.py -w 128 -u sum -c 384.000000 -s 1.000000 data/CNhs12856.bw data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov/2.h5 &> data/heart_l131k/seqs_cov/2.err
basenji_data_write.py -s 0 -e 256 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-0.tfr &> data/heart_l131k/tfrecords/train-0.err
basenji_data_write.py -s 256 -e 512 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-1.tfr &> data/heart_l131k/tfrecords/train-1.err
basenji_data_write.py -s 512 -e 768 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-2.tfr &> data/heart_l131k/tfrecords/train-2.err
basenji_data_write.py -s 768 -e 1024 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-3.tfr &> data/heart_l131k/tfrecords/train-3.err
basenji_data_write.py -s 1024 -e 1280 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-4.tfr &> data/heart_l131k/tfrecords/train-4.err
basenji_data_write.py -s 1280 -e 1499 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-5.tfr &> data/heart_l131k/tfrecords/train-5.err
basenji_data_write.py -s 1499 -e 1679 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/valid-0.tfr &> data/heart_l131k/tfrecords/valid-0.err
basenji_data_write.py -s 1679 -e 1858 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/test-0.tfr &> data/heart_l131k/tfrecords/test-0.err

warning during installing basenji

Hi, thanks for your brilliant work!
I have a question about warnings when installing basenji. I opened a conda environment for installing basenji and chose Python version 3.6.

  1. I ran git clone https://github.com/calico/basenji in a new work path.
  2. Then python setup.py develop; after a long report I hit the error: The 'pyBigWig' distribution was not found and is required by basenji.
  3. Then conda install pybigwig -c bioconda.
  4. Then python setup.py develop again; this time I also got a long report, in which I found:
...
Processing dependencies for basenji==0.0.1
Searching for networkx
Reading https://pypi.org/simple/networkx/
Downloading https://files.pythonhosted.org/packages/f3/f4/7e20ef40b118478191cec0b58c3192f822cace858c19505c7670961b76b2/networkx-2.2.zip#sha256=45e56f7ab6fe81652fb4bc9f44faddb0e9025f469f602df14e3b2551c2ea5c8b
Best match: networkx 2.2
Processing networkx-2.2.zip
Writing /tmp/easy_install-jsx89xlw/networkx-2.2/setup.cfg
Running networkx-2.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-jsx89xlw/networkx-2.2/egg-dist-tmp-nbez5udu
warning: no files found matching '*.html' under directory 'doc'
warning: no files found matching '*.css' under directory 'doc'
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '.svn' found anywhere in distribution
no previously-included directories found matching 'doc/build'
no previously-included directories found matching 'doc/auto_examples'
no previously-included directories found matching 'doc/modules'
no previously-included directories found matching 'doc/reference/generated'
no previously-included directories found matching 'doc/reference/algorithms/generated'
no previously-included directories found matching 'doc/reference/classes/generated'
no previously-included directories found matching 'doc/reference/readwrite/generated'
creating /project2/yangili1/mengguo/Mn/envs/EnvPy37/lib/python3.6/site-packages/networkx-2.2-py3.6.egg
Extracting networkx-2.2-py3.6.egg to /project2/yangili1/mengguo/Mn/envs/EnvPy37/lib/python3.6/site-packages
Adding networkx 2.2 to easy-install.pth file

Installed /project2/yangili1/mengguo/Mn/envs/EnvPy37/lib/python3.6/site-packages/networkx-2.2-py3.6.egg
Searching for joblib
Reading https://pypi.org/simple/joblib/
Downloading https://files.pythonhosted.org/packages/0d/1b/995167f6c66848d4eb7eabc386aebe07a1571b397629b2eac3b7bebdc343/joblib-0.13.0-py2.py3-none-any.whl#sha256=9002b53b88ae0adb3872164e0846a489b7e112c50087c5e3e1bcee35f18424c4
Best match: joblib 0.13.0
Processing joblib-0.13.0-py2.py3-none-any.whl
Installing joblib-0.13.0-py2.py3-none-any.whl to /project2/yangili1/mengguo/Mn/envs/EnvPy37/lib/python3.6/site-packages
Adding joblib 0.13.0 to easy-install.pth file

Installed /project2/yangili1/mengguo/Mn/envs/EnvPy37/lib/python3.6/site-packages/joblib-0.13.0-py3.6.egg
Searching for h5py
...
Finished processing dependencies for basenji==0.0.1

In the end there was no error, but these warnings still make me worried about unknown effects on running basenji or on the correctness of the results. The 'doc' directory really doesn't have the mentioned files, so I don't understand why; it also looks like the warnings didn't stop the install from finishing. I'd be very grateful if you could give some views. Thanks!

Best,
Grace

basenji_data sample_pct

Hi Dave,

Thank you for providing such a great tool!
In the tutorial basenji_data, I have a question regarding the -d option when it is set below 1, as in the tutorial (-d .1).
It seems that the resulting sequences.bed file mixes train/valid/test regions in random order via the random.sample function:
mseqs = random.sample(mseqs, int(options.sample_pct*len(contigs)))

Therefore, could this lead to an issue when the TFRecords are written, i.e., when the loop goes through each label, will it write the next 256 sequences (the seqs_per_tfr option) whatever their labels?
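
For what it's worth, random.sample does return its picks in randomized order, which is the behavior in question; a minimal demonstration:

  import random

  random.seed(1)
  mseqs = [('train', i) for i in range(8)] + [('valid', i) for i in range(2)]
  # The sampled subset comes back shuffled, so labels can interleave.
  print(random.sample(mseqs, 5))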

Thank you again for your help on this.

Best regards,

Camille

Running tutorials error

Hi, I was following the tutorials for trying to run the sad.ipynb notebook in the tutorial folder. Installations all worked fine.
However I get an error when running the following command:

! basenji_sad.py --cpu -f data/hg19.ml.fa -g data/human.hg19.genome --h5 -o output/rfx6_sad --rc --shift "1,0,-1" -t data/heart_wigs.txt models/params_small.txt models/heart/model_best.tf data/rs339331.vcf

The first error I get is zsh:1: command not found: basenji_sad.py, which I can solve by using the path ../bin/basenji_sad.py instead.

Subsequently, "no such option" errors arise. After running ! ../bin/basenji_sad.py --help and removing the flags that aren't shown, the code runs.

Finally, the command looks like
! ../bin/basenji_sad.py -f data/hg19.ml.fa -o output/rfx6_sad --rc --shift "1,0,-1" -t data/heart_wigs.txt models/params_small.json models/heart/model_best.tf data/rs339331.vcf

I get an assertion error:
Traceback (most recent call last):
  File "../bin/basenji_sad.py", line 426, in <module>
    main()
  File "../bin/basenji_sad.py", line 170, in main
    seqnn_model.restore(model_file)
  File "/Users/andyjiang/basenji/basenji/basenji/seqnn.py", line 350, in restore
    self.models[head_i].load_weights(model_file)
  File "/Users/andyjiang/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 2216, in load_weights
    status.assert_nontrivial_match()
  File "/Users/andyjiang/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 1023, in assert_nontrivial_match
    return self.assert_consumed()
  File "/Users/andyjiang/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 998, in assert_consumed
    raise AssertionError(
AssertionError: Some objects had attributes which were not restored:

We are focusing on predicting the effect of genetic variants, so we started with this script, using the model in the tutorial. Could this error be caused by a difference in the version used to generate that model? We would prefer to use a pre-trained model to predict genetic effects, if one is available.

Would appreciate any help.

Andy

basenji_data.py stochastic data

Despite my efforts, basenji_data.py does not produce a deterministic random partition and sort of training sequences. Resolve that.

GC bias correction

In the paper you mention how GC% bias correction was performed:

We normalized for GC% bias using a procedure that incorporates several established ideas, aiming to model the trend across the GC% spectrum without precluding a GC% enrichment for active regions (Benjamini and Speed 2012; Teng and Irizarry 2016). We assigned each position an estimated relevant GC% value using a Gaussian filter (to assign greater weight to nearby nucleotides more likely to have been part of a fragment relevant to that genomic position). Then we fit a third-degree polynomial regression to the log2 coverage estimates. Finally, we reconfigured the coverage estimates to highlight the residual coverage unexplained by the GC% model. The parameters of the bias models exhibit a wide range, both within and across assays, suggesting the absence of a common sequencing bias. A Python script implementing these procedures to transform a BAM file of alignments to a BigWig file of inferred coverage values is available in the Basenji tool suite.

I couldn't locate the script you are talking about; where is it? 😄

TypeError: __init__() takes 1 positional argument but 2 were given

Hi,
I'm trying to explore Akita but when loading the model I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-70b692f42586> in <module>
     10 target_length = params_model['target_length']
     11 target_crop = params_model['target_crop']
---> 12 seqnn_model = seqnn.SeqNN(params_model)

TypeError: __init__() takes 1 positional argument but 2 were given

As I understand it, the error is produced by basenji, but I couldn't find out what's wrong with it. What could the reason be?

Running error

I was following the tutorials for installation and trying to run the notebook in the tutorial folder. The installation was through conda and import basenji worked without an error.

When I run the following command in preprocess.ipynb

! /home/shams/miniconda3/envs/basenji/bin/python /home/shams/basenji/bin/basenji_data.py -d .1 -g data/unmap_macro.bed -l 131072 --local -o data/heart_l131k -p 8 -t .1 -v .1 -w 128 data/hg19.ml.fa data/heart_wigs.txt

and get the following error

stride_train 1 converted to 131072.000000
stride_test 1 converted to 131072.000000
Contigs divided into
 Train:  4701 contigs, 2169074921 nt (0.8005)
 Valid:   572 contigs,  270358978 nt (0.0998)
 Test:    584 contigs,  270330829 nt (0.0998)
basenji_data_read.py --crop 0 -w 128 -u sum -c 384.000000 -s 1.000000 data/CNhs11760.bw data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov/0.h5
basenji_data_read.py --crop 0 -w 128 -u sum -c 384.000000 -s 1.000000 data/CNhs12843.bw data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov/1.h5
/bin/sh: 1: basenji_data_read.py: not found
basenji_data_read.py --crop 0 -w 128 -u sum -c 384.000000 -s 1.000000 data/CNhs12856.bw data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov/2.h5
/bin/sh: 1: basenji_data_read.py: not found
/bin/sh: 1: basenji_data_read.py: not found
basenji_data_write.py -s 0 -e 256 --umap_clip 1.000000 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-0.tfr
basenji_data_write.py -s 256 -e 512 --umap_clip 1.000000 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-1.tfr
/bin/sh: 1: basenji_data_write.py: not found
basenji_data_write.py -s 512 -e 768 --umap_clip 1.000000 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-2.tfr
/bin/sh: 1: basenji_data_write.py: not found
basenji_data_write.py -s 768 -e 1024 --umap_clip 1.000000 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-3.tfr
/bin/sh: 1: basenji_data_write.py: not found
basenji_data_write.py -s 1024 -e 1280 --umap_clip 1.000000 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-4.tfr
/bin/sh: 1: basenji_data_write.py: not found
basenji_data_write.py -s 1280 -e 1499 --umap_clip 1.000000 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-5.tfr
/bin/sh: 1: basenji_data_write.py: not found
basenji_data_write.py -s 1499 -e 1679 --umap_clip 1.000000 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/valid-0.tfr
/bin/sh: 1: basenji_data_write.py: not found
basenji_data_write.py -s 1679 -e 1858 --umap_clip 1.000000 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/test-0.tfr
/bin/sh: 1: basenji_data_write.py: not found
/bin/sh: 1: basenji_data_write.py: not found

I know this is a Python newbie error, but I could not find a fix by googling.
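
For what it's worth, the /bin/sh "not found" lines above suggest basenji_data_read.py and basenji_data_write.py are simply not on the shell's PATH; adding the basenji bin directory, e.g. export PATH="$PATH:/home/shams/basenji/bin", would presumably let those subcommands launch. This is a guess based on the log, not a confirmed fix.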

pep-8 compliant formatting

This is a small issue, but some people prefer readability conventions like four-space indentation and the other things black helps with, so I ran black on the project. I understand why you'd use 2 spaces, so no hard feelings if the pull request isn't taken. I have run some of the tests in the tests/ folder and they seem to produce the same output as before, although I'm not sure whether the test scripts require args.

Trying to restore full model breaks basenji_sat

I successfully ran the example command in the sat_mut.ipynb:
#! ../bin/basenji_sat.py -g -f 20 -l 200 -o output/gata4_sat --rc -t 0,1,2 models/params_small.txt models/heart/model_best.tf data/gata4.fa

So then tried to do this for the full model that is specified in basenji/manuscript/get_model.sh

After downloading the 3 model files to models/full, I attempted to run the same command but with the new model:
! ../bin/basenji_sat.py -g -f 20 -l 200 -o output/practice --rc -t 0,1,2 models/params_small.txt ./models/full/model.tf data/gata4.fa

And got these errors (what I think is the issue is the "Key final/dense/bias not found in checkpoint" line):

{'batch_size': 4, 'batch_buffer': 4096, 'link': 'softplus', 'loss': 'poisson', 'optimizer': 'adam', 'adam_beta1': 0.97, 'adam_beta2': 0.98, 'learning_rate': 0.002, 'num_targets': 3, 'target_pool': 128, 'cnn_dropout': 0.1, 'cnn_filter_sizes': [20, 7, 7, 7, 3, 3, 3, 3, 3, 3, 3, 1], 'cnn_filters': [128, 128, 192, 256, 256, 32, 32, 32, 32, 32, 32, 384], 'cnn_pool': [2, 4, 4, 4, 1, 0, 0, 0, 0, 0, 0, 0], 'cnn_dilation': [1, 1, 1, 1, 1, 2, 4, 8, 16, 32, 64, 1], 'cnn_dense': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0]}
Targets pooled by 128 to length 1024
Convolution w/ 3 384x1 filters to final targets
Model building time 11.241484
2018-06-21 10:31:31.380303: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-06-21 10:31:34.284572: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key final/dense/bias not found in checkpoint
Traceback (most recent call last):
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key final/dense/bias not found in checkpoint
	 [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "../bin/basenji_sat.py", line 706, in <module>
    main()
  File "../bin/basenji_sat.py", line 203, in main
    saver.restore(sess, model_file)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1802, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key final/dense/bias not found in checkpoint
	 [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "../bin/basenji_sat.py", line 706, in <module>
    main()
  File "../bin/basenji_sat.py", line 199, in main
    saver = tf.train.Saver()
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1338, in __init__
    self.build()
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1347, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1384, in _build
    build_save=build_save, build_restore=build_restore)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 835, in _build_internal
    restore_sequentially, reshape)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 472, in _AddRestoreOps
    restore_sequentially)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 886, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1463, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): Key final/dense/bias not found in checkpoint
	 [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

What can I do to fix this?

Thanks

Update
I tried using all the different parameter files for the model with no luck either.

Issue in running bam_cov.py

Hello,
I was trying to run the preprocess step, and when I run bam_cov.py it gives a KeyError: 'HG19'.

The command I am giving is: bam_cov.py FLAG-MSI2_2_iCLIP.bam FLAG-MSI2_2.bw

The error message I am getting is:
Traceback (most recent call last):
  File "/ysm-gpfs/home/kvp6/Kiran.vp/Tools/basenji-master/bin/bam_cov.py", line 1631, in <module>
    main()
  File "/ysm-gpfs/home/kvp6/Kiran.vp/Tools/basenji-master/bin/bam_cov.py", line 95, in main
    default='%s/assembly/hg19.fa' % os.environ['HG19'],
  File "/ysm-gpfs/apps/software/Python/3.5.1-foss-2016a/lib/python3.5/os.py", line 683, in __getitem__
    raise KeyError(key) from None
KeyError: 'HG19'
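
For what it's worth, the traceback shows bam_cov.py reading os.environ['HG19'] to build a default option value, so exporting the variable before running (even as an empty string, e.g. export HG19='', as in the sad.ipynb issue above) should presumably get past the KeyError. A hedged guess from the traceback, not a confirmed fix.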

Tutorials not working, input/parameter files?

Hi there,

Thanks for releasing your interesting bioRxiv paper and code. I'd love to try out your model on the CAGE data you presented in the paper. The tutorial links seem broken, however, and I can't access them; the one available in /bin/tutorial seems to be an older one for Basset. I also cannot find the "params_file" that you used for training. It would be greatly helpful if you could upload these, and perhaps instructions for how to set up an example CAGE file for training. If you have the bandwidth, your pretrained files would also be very helpful for me to test.

Thanks again and best wishes,
Vikram

Basenji for keras

Hi David

I'm interested in getting a saved version of the 2018 Basenji model that can be loaded in Keras and used in a Python script, rather than as a command line tool. I tried going through your code to convert the model from TensorFlow to Keras but, due to my inexperience with TensorFlow, had little luck. Would it be difficult for you to make available such a model that can be loaded into Keras?

Thanks!

AssertionError when running basenji_test.py and basenji_sad.py

Hi David,

I'm running into the following error (pasted below) when running basenji_test.py or basenji_sad.py on the sample data provided in the tutorials.
This is the command I'm running for basenji_sad: python basenji_sad.py --cpu -f /global/scratch/poojakathail/data/hg19.ml.fa -o /global/scratch/poojakathail/output/rfx6_sad --rc --shift "1,0,-1" -t /global/scratch/poojakathail/data/heart_wigs.txt /global/scratch/poojakathail/models/params_small.json /global/scratch/poojakathail/models/heart/model_best.tf /global/scratch/poojakathail/data/rs339331.vcf

params_small.json is the file provided here: https://github.com/calico/basenji/blob/master/tutorials/models/params_small.json
model_best.tf is the pre-trained model included in the tutorials

Thanks for providing this package and for your help!

  File "basenji_sad.py", line 419, in <module>
    main()
  File "basenji_sad.py", line 176, in main
    seqnn_model.restore(model_file)
  File "/global/home/users/poojakathail/basenji/basenji/seqnn.py", line 316, in restore
    self.model.load_weights(model_file)
  File "/global/home/users/poojakathail/.conda/envs/basenji/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 182, in load_weights
    return super(Model, self).load_weights(filepath, by_name)
  File "/global/home/users/poojakathail/.conda/envs/basenji/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/network.py", line 1356, in load_weights
    status.assert_nontrivial_match()
  File "/global/home/users/poojakathail/.conda/envs/basenji/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/util.py", line 966, in assert_nontrivial_match
    return self.assert_consumed()
  File "/global/home/users/poojakathail/.conda/envs/basenji/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/util.py", line 943, in assert_consumed
    "".join(unused_attribute_strings)))
AssertionError: Some objects had attributes which were not restored:

Error training Basset-style model (Incompatible shapes: [200,448,164] vs. [200,64,164])

Hi, I've been attempting to train a Basset-style model using the guidelines that you outlined here: https://github.com/calico/basenji/tree/master/manuscripts/basset

When I get to the actual training step, though, using the included params_basset.json file, I keep getting the following incompatible-dimensions error message, in which it appears that the batch_size value of 64 is being multiplied by 7:

Total params: 5,967,782
Trainable params: 5,960,460
Non-trainable params: 7,322
__________________________________________________________________________________________________
None
model_strides [192]
target_lengths [7]
target_crops [0]
2021-02-07 16:50:08.427245: I tensorflow/core/profiler/lib/profiler_session.cc:164] Profiler session started.
Epoch 1/10000
Traceback (most recent call last):
  File "/home/bettimj/basenji/bin/basenji_train.py", line 165, in <module>
    main()
  File "/home/bettimj/basenji/bin/basenji_train.py", line 154, in main
    seqnn_trainer.fit_keras(seqnn_model)
  File "/gpfs23/scratch/bettimj/basenji/basenji/trainer.py", line 126, in fit_keras
    seqnn_model.model.fit(
  File "/home/bettimj/miniconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/home/bettimj/miniconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1098, in fit
    tmp_logs = train_function(iterator)
  File "/home/bettimj/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/home/bettimj/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 840, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/bettimj/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2829, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/bettimj/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1843, in _filtered_call
    return self._call_flat(
  File "/home/bettimj/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1923, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/home/bettimj/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 545, in call
    outputs = execute.execute(
  File "/home/bettimj/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError:  Incompatible shapes: [200,448,164] vs. [200,64,164]
	 [[node LogicalAnd_6 (defined at gpfs23/scratch/bettimj/basenji/basenji/metrics.py:65) ]] [Op:__inference_train_function_6085]

Function call stack:
train_function

The only way that I have been able to get the script to run so far is by decreasing the batch_size parameter in the JSON file down to 1. Do you know what might be causing this issue on my end?

Thank you so much for your help!

Tutorial for identifying distal regulatory elements

Hi Basenji Developers, thanks a lot for your project.

I would like to use your software to identify distal regulatory elements and outline their driving motifs in the K562 and PrEC cell lines. I see that part of this procedure is in silico saturation mutagenesis, and you posted a tutorial on that, but I cannot find any tutorial for regulatory element identification.

I would be grateful if you could provide a comprehensive tutorial for this task.

Thanks!

This call to matplotlib.use() has no effect because the backend has already been chosen

Dear developer, this tutorial is very good! Now I have some problems. In the tutorial tutorials/genes.ipynb, I ran the following line and got an error. I checked the related issues but still haven't solved it.

[screenshot]

In addition, the final output (R2, etc.) of genes.ipynb also has problems, as follows:

[screenshot]

Before that, I installed stat according to the prompt; I don't know if that has an effect. Could I know the versions of your stat and matplotlib modules?

[screenshot]

ImportError when running basenji_test_genes.py (gene expression predictions)

Hello David,

I would like to use your gene expression prediction modules on my sequences that contain disease-associated SNPs. I have run the lines as explained in the tutorial, but in step 6,
python3 basenji_test_genes.py -o ../tutorials/output/gencode_chr9_test --rc -s --table ../tutorials/models/params_small.txt ../tutorials/models/heart/model_best.tf ../tutorials/data/gencode_chr9.h5

I get: ImportError: cannot import name 'infer_replicates' from 'basenji_test_reps'.
I launched the script in a conda environment created as described in your package installation section.

Thank you for your help.

Training basenji on google colaboratory

I'm trying to run basenji_train.py on Google Colab and I got this error:

/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
return f(*args, **kwds)
{'batch_size': 4, 'batch_buffer': 4096, 'link': 'softplus', 'loss': 'poisson', 'optimizer': 'adam', 'adam_beta1': 0.97, 'adam_beta2': 0.98, 'learning_rate': 0.002, 'num_targets': 3, 'target_pool': 128, 'cnn_dropout': 0.1, 'cnn_filter_sizes': [20, 7, 7, 7, 3, 3, 3, 3, 3, 3, 3, 1], 'cnn_filters': [128, 128, 192, 256, 256, 32, 32, 32, 32, 32, 32, 384], 'cnn_pool': [2, 4, 4, 4, 1, 0, 0, 0, 0, 0, 0, 0], 'cnn_dilation': [1, 1, 1, 1, 1, 2, 4, 8, 16, 32, 64, 1], 'cnn_dense': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0]}
Targets pooled by 128 to length 1024
Convolution w/ 3 384x1 filters to final targets
Targets pooled by 128 to length 1024
Convolution w/ 3 384x1 filters to final targets
Model building time 5.505798
Batcher initialized
2018-10-30 21:35:40.047109: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
Initializing...
Initialization time 1.197716
Traceback (most recent call last):
  File "bin/basenji_train.py", line 214, in <module>
    tf.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "bin/basenji_train.py", line 42, in main
    test_epoch_batches=FLAGS.test_epoch_batches)
  File "bin/basenji_train.py", line 165, in run
    no_steps=FLAGS.no_steps)
TypeError: train_epoch_h5() got an unexpected keyword argument 'fwdrc'

The preprocessing works fine, and I got no errors installing. Does anyone know how to fix it?

Training crashes

Hello,

I am trying to train a model, but I get a bunch of warnings from TensorFlow and then it crashes. I don't know if this might be a TensorFlow compatibility issue; I am using tensorflow 1.7.
Any ideas? Thanks in advance!

/home/laura/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
WARNING:tensorflow:From /home/laura/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
WARNING:tensorflow:From /home/laura/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py:497: calling conv1d (from tensorflow.python.ops.nn_ops) with data_format=NHWC is deprecated and will be removed in a future version.
Instructions for updating:
NHWC for data_format is deprecated, use NWC instead
2018-04-20 19:06:34.278450: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
{'batch_size': 2, 'seq_length': [262144, 65536], 'seq_depth': 4, 'target_length': 2048, 'num_targets': 10, 'target_pool': 128, 'batch_buffer': 16384, 'batch_renorm': 1, 'link': 'exp_linear', 'loss': 'poisson', 'learning_rate': 0.005769, 'momentum': 0.99574, 'optimizer': 'momentum', 'cnn_dropout': 0.04375, 'cnn_l2': 1.44e-09, 'cnn_filter_sizes': [3, 1, 3, 3, 3, 3], 'cnn_filters': [6, 6, 6, 6, 6, 376], 'cnn_pool': [1, 2, 4, 4, 4, 1], 'dense': 1, 'dcnn_dropout': 0.072917, 'dcnn_l2': 1.93e-08, 'dcnn_filter_sizes': [3, 3, 3, 3, 3, 3, 3], 'dcnn_filters': [8, 8, 8, 8, 8, 8, 8], 'full_units': 32, 'full_dropout': 0.009375, 'full_l2': 2.58e-08, 'final_l1': 1.45e-08, 'load_hd5': 1}
Targets pooled by 128 to length 2048
Convolution w/ 10 376x1 filters to final targets
Model building time 59.275235
Batcher initialized
Initializing...
Initialization time 0.484647
Traceback (most recent call last):
  File "bin/basenji_train.py", line 207, in <module>
    tf.app.run(main)
  File "/home/laura/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "bin/basenji_train.py", line 40, in main
    num_train_epochs=FLAGS.num_train_epochs)
  File "bin/basenji_train.py", line 145, in run
    for epoch in range(num_train_epochs):
TypeError: 'NoneType' object cannot be interpreted as an integer

Predicting activity of sequences using model trained in the manuscript

Hi,

I am trying to use a custom BED file as input to predict the activity score at each genomic locus using the model trained and tested in the manuscript.

I was able to download the trained model from https://github.com/calico/basenji/tree/master/manuscript, but I am not clear on how to use this model to make predictions on new sequences.

Also, if I want to run basenji_motifs.py on the same set of sequences, can you explain what the data file (an HDF5 file with test_in/test_out keys) means here? I am getting confused by the "test_in/test_out keys" bit.

Thanks a lot; this is my first time implementing any neural network model, and I appreciate your help!

num_targets in params file vs --ti option in basenji_test.py

Hi,
I am trying to understand how basenji works. We are trying to train a model on a small dataset and produce bw files of predictions for all five tracks. I do not understand the role of num_targets and its relation to --ti. For example, we had a dataset that included 2 files to train a model, and we wanted to make predictions on all five tracks, but it failed to do so. Could you explain the roles of num_targets in the params file and the --ti option in basenji_test.py? Many thanks!
Best,
Faiz

Meaning of the test data's result from kipoi

Hi, I used the kipoi version of basenji and got results like this:
[screenshot]
I know the meaning of preds/m/n, but what do the specific numbers in the picture above mean? There is a lot of data for this small test set. Is it the expression result of each region in a different sample, which I need to add up to obtain a final prediction level, or is it just a score? If I want to know the expression level of specific genes, do you have any suggestions for using basenji?

No *.tfr files found

I am trying to run the preprocessing tutorial notebook, and although there are no errors, no *.tfr files are created in the process. The created tfrecords folder contains only .err files.

How do I replicate Basset's peak classification using Basenji?

Hi,

Firstly, thanks for making these tools open source. Much appreciated!

I have a few questions related to the same topic. I have some DHS data in various cell types that I would like to train a predictive model for. I initially intended to use Basset, but then saw this in the ReadMe section of Basenji: "Basenji makes predictions in bins across the sequences you provide. You could replicate Basset's peak classification by simply providing smaller sequences and binning the target for the entire sequence." 1) I am unsure what binning the target for the entire sequence means. 2) What would be the process for replicating Basset's peak classification on DHS data using Basenji? It seems that Basenji only scores in 128 bp windows. 3) Can I score full DHS sequences (150 bp)?

In addition to this, I see that Basset was trained on 600 bp sequences and Basenji is trained on much larger sequences. 4) If I want to train on new DHS data, would I be able to train using the Basenji architecture on smaller DHS sequences (around 150-600 bp), or do I have to use Basset? I would definitely prefer to use Basenji, as it is based on TensorFlow, and Lua (which Basset uses) isn't compatible with the Power9 architecture.

I appreciate any help.

Best,
Zain

New stable release?

Hi!
I'm working on a conda package for basenji in bioconda/bioconda-recipes#26534
As the available releases of basenji seem old compared to the latest developments on the master branch, I've written the recipe to use a recent commit from master. Do you think that's OK? Or should I stick to the released 0.4? Or do you plan to make a new release any time soon?

Second question: environment.yml refers to cudnn and cudatoolkit. Are they only needed when using tensorflow-gpu, or are they always required by basenji?

Failure to install basenji

Dear Authors:
I tried to install basenji per the instructions in README.md, and I input the command below:
python setup.py develop --install-dir=/software/python3lib
The error message is shown below:
#======#
/tmp/easy_install-567tun7k/h5py-2.8.0/h5py/defs.c: In function ‘__pyx_f_4h5py_4defs_H5Sget_regular_hyperslab’:
/tmp/easy_install-567tun7k/h5py-2.8.0/h5py/defs.c:31571:15: warning: implicit declaration of function ‘H5Sget_regular_hyperslab’ [-Wimplicit-function-declaration]
__pyx_t_1 = H5Sget_regular_hyperslab(__pyx_v_spaceid, __pyx_v_start, __pyx_v_stride, __pyx_v_count, __pyx_v_block); if (unlikely(PyErr_Occur
^
/tmp/easy_install-567tun7k/h5py-2.8.0/h5py/defs.c: In function ‘__Pyx_modinit_function_export_code’:
/tmp/easy_install-567tun7k/h5py-2.8.0/h5py/defs.c:40634:67: error: '__pyx_f_4h5py_4defs_H5Pset_virtual_view' undeclared (first use in this function)
if (__Pyx_ExportFunction("H5Pset_virtual_view", (void (*)(void))__pyx_f_4h5py_4defs_H5Pset_virtual_view, "herr_t (hid_t, H5D_vds_view_t)") <
^
/tmp/easy_install-567tun7k/h5py-2.8.0/h5py/defs.c:40634:67: note: each undeclared identifier is reported only once for each function it appears in
/tmp/easy_install-567tun7k/h5py-2.8.0/h5py/defs.c:40635:67: error: '__pyx_f_4h5py_4defs_H5Pget_virtual_view' undeclared (first use in this function)
if (__Pyx_ExportFunction("H5Pget_virtual_view", (void (*)(void))__pyx_f_4h5py_4defs_H5Pget_virtual_view, "herr_t (hid_t, H5D_vds_view_t *)")
^
In file included from /opt/apps/tools/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/numpy/core/include/numpy/ndarrayobject.h:27:0,
from /opt/apps/tools/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from /tmp/easy_install-567tun7k/h5py-2.8.0/h5py/api_compat.h:26,
from /tmp/easy_install-567tun7k/h5py-2.8.0/h5py/defs.c:611:
/tmp/easy_install-567tun7k/h5py-2.8.0/h5py/defs.c: At top level:
/opt/apps/tools/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/numpy/core/include/numpy/__multiarray_api.h:1463:1: warning: ‘_import_array’ defined but not used [-Wunused-function]
_import_array(void)
^
error: Setup script exited with error: command 'gcc' failed with exit status 1
#==========#

Could you give me some tips on how to figure it out? Thank you so much.

Best,
Qi

`crop_bp` argument in `basenji_data_read.py`

I am following tutorials in tutorials/preprocess.ipynb to reproduce the data used in the publication step-by-step.

However, I got an error message executing the command
!basenji_data.py -d .1 -g data/unmap_macro.bed -l 131072 --local -o data/heart_l131k -p 8 -t .1 -v .1 -w 128 data/hg19.ml.fa data/heart_wigs.txt

It was because basenji_data.py forces the --crop parameter to be larger than 0:
in line 69 of basenji_data_read.py, assert(options.crop_bp > 0).
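
If --crop 0 is meant to be legal (the tutorial command passes no --crop at all), a minimal local workaround, assuming the assertion is simply too strict, would be to relax the check:

  # basenji_data_read.py, line 69: allow zero cropping (hypothetical local patch)
  assert options.crop_bp >= 0

Passing an explicit positive --crop value should also get past the assertion, at the cost of trimming the targets.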

Could you help with it?

Additionally, I have another trivial question. The command has an argument -d with 0.1. Does it mean the output is a subsample of the data used in the publication?

Sincerely Yours

Old links to SeqNN in basenji_sat.py and basenji_sed.py

Both basenji_sat.py and basenji_sed.py call attributes that don't exist in the SeqNN class:

{'batch_size': 4, 'cnn_dilation': [1, 1, 1, 1, 1, 2, 4, 8, 16, 32, 64, 1], 'link': 'softplus', 'batch_buffer': 4096, 'learning_rate': 0.002, 'cnn_filters': [128, 128, 192, 256, 256, 32, 32, 32, 32, 32, 32, 384], 'cnn_dropout': 0.1, 'adam_beta2': 0.98, 'cnn_dense': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0], 'loss': 'poisson', 'cnn_filter_sizes': [20, 7, 7, 7, 3, 3, 3, 3, 3, 3, 3, 1], 'cnn_pool': [2, 4, 4, 4, 1, 0, 0, 0, 0, 0, 0, 0], 'num_targets': 3, 'adam_beta1': 0.97, 'optimizer': 'adam', 'target_pool': 128}
Targets pooled by 128 to length 1024
Convolution w/ 3 384x1 filters to final targets
Targets pooled by 128 to length 1024
Convolution w/ 3 384x1 filters to final targets
2018-09-24 19:52:17.490718: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Mutating sequence 1 / 1
Traceback (most recent call last):
  File "/shared/workspace/software/basenji/bin/basenji_sat.py", line 660, in <module>
    main()
  File "/shared/workspace/software/basenji/bin/basenji_sat.py", line 189, in main
    sat_preds = model.predict(sess, batcher_sat,
AttributeError: 'SeqNN' object has no attribute 'predict'

The predict attribute seems to exist in SeqNNOrig.
