calico / basenji
Sequential regulatory activity predictions with deep convolutional neural networks.
License: Apache License 2.0
The output array for basenji is: (960, 4229)
Could you explain in more detail where the 960 comes from? I understand that it is the number of bins along the sequence, but why is it not 1024 bins? 2^17 / 128 = 1024.
Additionally, in the picture below the labels ("pred/n/m") are ordered as strings (digit by digit) rather than by numeric value. Which ordering should we use when consulting Supplementary Table 1? I presume the 10th output feature corresponds to the 10th entry in Supplementary Table 1, not the third (as it would under string ordering).
Thanks
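A minimal sketch of both points, under stated assumptions. The bin arithmetic assumes prediction trims batch_buffer = 4096 bp from each end of the 131,072 bp input; that matches the 2018 parameters printed elsewhere on this page (batch_buffer 4096, target_pool 128, batch_buffer_pool 32) but is not confirmed for this particular model. The "preds/N" key names are hypothetical stand-ins for the labels in the picture.

```python
def predicted_bins(seq_length_bp, target_pool, batch_buffer_bp):
    """Output bins left after pooling and trimming the buffer from both ends."""
    total_bins = seq_length_bp // target_pool      # 2**17 // 128 = 1024
    buffer_bins = batch_buffer_bp // target_pool   # 4096 // 128 = 32
    return total_bins - 2 * buffer_bins


def numeric_order(keys):
    """Sort keys such as 'preds/10' by their trailing integer, not as strings."""
    return sorted(keys, key=lambda k: int(k.rsplit("/", 1)[-1]))


print(predicted_bins(2**17, 128, 4096))  # 960

keys = [f"preds/{i}" for i in range(12)]
print(sorted(keys)[:4])         # ['preds/0', 'preds/1', 'preds/10', 'preds/11']
print(numeric_order(keys)[:4])  # ['preds/0', 'preds/1', 'preds/2', 'preds/3']
```

Under string ordering preds/10 lands in third place, which is the mismatch described above; sorting on the integer suffix restores numeric order.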
I am trying to run the preprocessing tutorial notebook, and although there are no errors, no *.tfr files are created in the process. The tfrecords folder that gets created contains only .err files.
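The basenji_data.py pipeline redirects each step's stderr into a matching .err file (the "&> ....err" suffixes in the basenji_data.py log reported elsewhere on this page), so those files usually hold the real error. A small sketch to print the non-empty ones; the directory path is the tutorial's output location and may differ on your machine.

```python
from pathlib import Path


def nonempty_err_files(directory):
    """Return {filename: text} for every non-empty *.err file in `directory`."""
    reports = {}
    for err_path in sorted(Path(directory).glob("*.err")):
        text = err_path.read_text(errors="replace").strip()
        if text:  # empty .err files mean the step wrote nothing to stderr
            reports[err_path.name] = text
    return reports


if __name__ == "__main__":
    # Example path from the tutorial (-o data/heart_l131k); adjust as needed
    for name, text in nonempty_err_files("data/heart_l131k/tfrecords").items():
        print(f"== {name} ==\n{text}\n")
```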
Hi,
I am trying to use a custom bed file as input to predict the activity score at each genomic locus using the model trained and tested in the manuscript.
I was able to download the trained model from https://github.com/calico/basenji/tree/master/manuscript but I am not clear how to now use this model to make predictions on new sequences.
Also, if I want to run basenji_motifs.py on the same set of sequences, can you explain what the data file argument (an HDF5 file with test_in/test_out keys) means here? I am confused by the "test_in/test_out keys" part.
Thanks a lot! This is my first time implementing any neural network model, and I appreciate your help!
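As a hedged sketch of what test_in most likely holds: one-hot encoded DNA in an array shaped (sequences, length, 4), with test_out presumably holding the matching targets. The A/C/G/T column order and the all-zero encoding for N are assumptions, not confirmed from basenji_motifs.py; check them against the repository's dna_io module before writing the array into an HDF5 file (for example with h5py).

```python
import numpy as np

BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed column order


def one_hot(seqs):
    """Encode equal-length DNA strings as a float array of shape (n, length, 4)."""
    length = len(seqs[0])
    out = np.zeros((len(seqs), length, 4), dtype=np.float32)
    for i, seq in enumerate(seqs):
        if len(seq) != length:
            raise ValueError("all sequences must have the same length")
        for j, base in enumerate(seq.upper()):
            k = BASE_INDEX.get(base)
            if k is not None:  # N and other ambiguity codes stay all-zero
                out[i, j, k] = 1.0
    return out


x = one_hot(["ACGTN", "GGCCA"])
print(x.shape)  # (2, 5, 4)
```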
Hi David
I'm interested in getting a saved version of the 2018 Basenji model that can be loaded in Keras and used in a Python script, rather than as a command-line tool. I tried going through your code to convert the model from TensorFlow to Keras but, due to my inexperience with TensorFlow, had little luck. Would it be difficult for you to make such a model available that can be loaded into Keras?
Thanks!
Hi,
I'm trying to explore Akita but when loading the model I get the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-70b692f42586> in <module>
10 target_length = params_model['target_length']
11 target_crop = params_model['target_crop']
---> 12 seqnn_model = seqnn.SeqNN(params_model)
TypeError: __init__() takes 1 positional argument but 2 were given
As I understand it, the error is raised from basenji, but I couldn't figure out what's wrong. What could be the reason?
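The message means the installed SeqNN constructor accepts nothing besides self, which suggests (as an inference, not a confirmed diagnosis) that the notebook is importing a different basenji version than it was written for. Python's inspect module shows which constructor your environment actually has; the two dummy classes below are hypothetical stand-ins for an older and a newer SeqNN.

```python
import inspect


def init_params(cls):
    """Names of the parameters cls.__init__ accepts, excluding self."""
    params = inspect.signature(cls.__init__).parameters
    return [name for name in params if name != "self"]


class OldSeqNN:  # hypothetical: constructor takes no arguments
    def __init__(self):
        pass


class NewSeqNN:  # hypothetical: constructor takes a params dict
    def __init__(self, params):
        self.params = params


print(init_params(OldSeqNN))  # []
print(init_params(NewSeqNN))  # ['params']

# The equivalent check in the notebook would be:
#   from basenji import seqnn
#   print(seqnn.__file__, init_params(seqnn.SeqNN))
```

Printing seqnn.__file__ as well shows which installed copy of basenji is being imported.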
Hi, I've been attempting to train a Basset-style model using the guidelines that you outlined here: https://github.com/calico/basenji/tree/master/manuscripts/basset
When I get to the actual training step, though, using the included params_basset.json file, I keep getting the following incompatible dimensions error, in which it appears that the batch_size value of 64 is being multiplied by 7:
Total params: 5,967,782
Trainable params: 5,960,460
Non-trainable params: 7,322
__________________________________________________________________________________________________
None
model_strides [192]
target_lengths [7]
target_crops [0]
2021-02-07 16:50:08.427245: I tensorflow/core/profiler/lib/profiler_session.cc:164] Profiler session started.
Epoch 1/10000
Traceback (most recent call last):
File "/home/bettimj/basenji/bin/basenji_train.py", line 165, in <module>
main()
File "/home/bettimj/basenji/bin/basenji_train.py", line 154, in main
seqnn_trainer.fit_keras(seqnn_model)
File "/gpfs23/scratch/bettimj/basenji/basenji/trainer.py", line 126, in fit_keras
seqnn_model.model.fit(
File "/home/bettimj/miniconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
return method(self, *args, **kwargs)
File "/home/bettimj/miniconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1098, in fit
tmp_logs = train_function(iterator)
File "/home/bettimj/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
result = self._call(*args, **kwds)
File "/home/bettimj/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 840, in _call
return self._stateless_fn(*args, **kwds)
File "/home/bettimj/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2829, in __call__
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "/home/bettimj/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1843, in _filtered_call
return self._call_flat(
File "/home/bettimj/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1923, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "/home/bettimj/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 545, in call
outputs = execute.execute(
File "/home/bettimj/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [200,448,164] vs. [200,64,164]
[[node LogicalAnd_6 (defined at gpfs23/scratch/bettimj/basenji/basenji/metrics.py:65) ]] [Op:__inference_train_function_6085]
Function call stack:
train_function
The only way I have been able to get the script to run so far is by decreasing the batch_size parameter in the JSON file to 1. Do you know what might be causing this issue on my end?
Thank you so much for your help!
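The numbers are consistent with the model emitting 7 positions per sequence (448 = 64 × 7) while the targets supply one per sequence. A hypothetical sanity check to run before training; the 1344 bp sequence length is inferred from 7 × 192, and a data target length of 1 (one label per sequence, as in Basset-style peak data) is an assumption.

```python
def model_target_length(seq_length, model_stride, target_crop=0):
    """Positions emitted per sequence: pooled length minus the crop on each side."""
    return seq_length // model_stride - 2 * target_crop


def check_target_length(seq_length, model_stride, target_crop,
                        data_target_length, batch_size):
    """Raise if the model's per-sequence output length differs from the data's."""
    model_len = model_target_length(seq_length, model_stride, target_crop)
    if model_len != data_target_length:
        raise ValueError(
            f"model emits {model_len} positions per sequence but the data has "
            f"{data_target_length}: {batch_size * model_len} vs "
            f"{batch_size * data_target_length} positions per batch"
        )
    return model_len


# model_strides [192] and target_crops [0] come from the log above;
# seq_length 1344 and data_target_length 1 are assumptions.
try:
    check_target_length(1344, 192, 0, data_target_length=1, batch_size=64)
except ValueError as err:
    print(err)
```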
Hi,
I had an issue inputting barcoded BAM files to the bam_cov.py script: the resulting bigWig files were essentially empty except for a header. I've tracked the issue at least part of the way down to the alignment tags. For example, line 1205: if (not align.has_tag('NH') or align.get_tag('NH')==1) and not align.has_tag('XA'):
If I understand correctly, this is essentially asking whether the read is uniquely aligned, but my BAM file doesn't have NH or XA tags. I can compare AS (alignment score of the primary alignment) with XS (alignment score of the best suboptimal alignment) to determine whether a read is essentially uniquely mapping. But I was wondering if you could point out other places in the script I should be worried about, or if you have an alternative strategy besides re-alignment to deal with this issue.
Thanks!
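A sketch of the two uniqueness rules side by side. The first mirrors the condition quoted from line 1205; the second is the AS/XS alternative, with an illustrative minimum margin that is an assumption rather than anything bam_cov.py does. Tags are passed as a plain dict so the example runs without pysam; with pysam you could build one via dict(align.get_tags()). Aligners such as bowtie2 omit XS when no other alignment was found.

```python
def is_unique_nh_xa(tags):
    """Rule quoted from bam_cov.py: NH absent or equal to 1, and no XA tag."""
    return ("NH" not in tags or tags["NH"] == 1) and "XA" not in tags


def is_unique_as_xs(tags, min_margin=1):
    """Alternative rule (an assumption, not basenji's): the primary score AS must
    beat the best suboptimal score XS by at least min_margin."""
    if "AS" not in tags:
        return False  # unaligned or no score reported
    if "XS" not in tags:
        return True   # no secondary alignment was found
    return tags["AS"] - tags["XS"] >= min_margin


print(is_unique_as_xs({"AS": -5, "XS": -5}))   # False: equally good elsewhere
print(is_unique_as_xs({"AS": -2, "XS": -10}))  # True
```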
Hi, I used the Kipoi version of basenji and got results like this.
I understand the meaning of preds/m/n, but what do the specific numbers in the picture above mean? There are a lot of values for this small test input. Are they the expression results of each region in different samples, which I need to add up to obtain a final prediction level, or are they just scores? If I want to know the expression level of specific genes, do you have any suggestions for doing that with basenji?
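For a single gene, one hedged approach is to read the predictions in the 128 bp bin that contains its TSS rather than adding up every value. The sketch assumes 128 bp bins and that you know the genomic coordinate where the first predicted bin starts (with a trimmed buffer, that is the sequence start plus the buffer); both depend on the model you loaded, and the array values are made up.

```python
import numpy as np


def bin_index(position, first_bin_start, bin_size=128):
    """Index of the output bin covering a 0-based genomic position."""
    offset = position - first_bin_start
    if offset < 0:
        raise ValueError("position lies before the first predicted bin")
    return offset // bin_size


def tss_prediction(preds, tss_position, first_bin_start, bin_size=128):
    """Per-target predictions at the bin containing the TSS; preds is (bins, targets)."""
    i = bin_index(tss_position, first_bin_start, bin_size)
    if i >= preds.shape[0]:
        raise ValueError("position lies after the last predicted bin")
    return preds[i]


preds = np.arange(12, dtype=float).reshape(4, 3)  # toy: 4 bins x 3 targets
print(tss_prediction(preds, tss_position=1300, first_bin_start=1000))  # [6. 7. 8.]
```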
Hi there,
Thanks for releasing your interesting bioRxiv paper and code. I'd love to try out your model on the CAGE data you presented in the paper. However, the tutorial links seem broken and I can't access them, and the one available in /bin/tutorial seems to be an older one for Basset. I also cannot find the "params_file" that you used for training. It would be very helpful if you could upload these, and perhaps instructions for how to set up an example CAGE file for training. If you have the bandwidth, your pretrained files would also be very helpful for my testing.
Thanks again and best wishes,
Vikram
I successfully ran the example command in the sat_mut.ipynb:
#! ../bin/basenji_sat.py -g -f 20 -l 200 -o output/gata4_sat --rc -t 0,1,2 models/params_small.txt models/heart/model_best.tf data/gata4.fa
So then I tried to do this for the full model that is specified in basenji/manuscript/get_model.sh.
After downloading the 3 model files to models/full, I attempted to run the same command with the new model:
! ../bin/basenji_sat.py -g -f 20 -l 200 -o output/practice --rc -t 0,1,2 models/params_small.txt ./models/full/model.tf data/gata4.fa
And got these errors (I bolded what I think is the issue):
{'batch_size': 4, 'batch_buffer': 4096, 'link': 'softplus', 'loss': 'poisson', 'optimizer': 'adam', 'adam_beta1': 0.97, 'adam_beta2': 0.98, 'learning_rate': 0.002, 'num_targets': 3, 'target_pool': 128, 'cnn_dropout': 0.1, 'cnn_filter_sizes': [20, 7, 7, 7, 3, 3, 3, 3, 3, 3, 3, 1], 'cnn_filters': [128, 128, 192, 256, 256, 32, 32, 32, 32, 32, 32, 384], 'cnn_pool': [2, 4, 4, 4, 1, 0, 0, 0, 0, 0, 0, 0], 'cnn_dilation': [1, 1, 1, 1, 1, 2, 4, 8, 16, 32, 64, 1], 'cnn_dense': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0]}
Targets pooled by 128 to length 1024
Convolution w/ 3 384x1 filters to final targets
Model building time 11.241484
2018-06-21 10:31:31.380303: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-06-21 10:31:34.284572: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key final/dense/bias not found in checkpoint
Traceback (most recent call last):
File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key final/dense/bias not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "../bin/basenji_sat.py", line 706, in <module>
main()
File "../bin/basenji_sat.py", line 203, in main
saver.restore(sess, model_file)
File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1802, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key final/dense/bias not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
Caused by op 'save/RestoreV2', defined at:
File "../bin/basenji_sat.py", line 706, in <module>
main()
File "../bin/basenji_sat.py", line 199, in main
saver = tf.train.Saver()
File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1338, in __init__
self.build()
File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1347, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1384, in _build
build_save=build_save, build_restore=build_restore)
File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 835, in _build_internal
restore_sequentially, reshape)
File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 472, in _AddRestoreOps
restore_sequentially)
File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 886, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1463, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
op_def=op_def)
File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
NotFoundError (see above for traceback): Key final/dense/bias not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
What can I do to fix this?
Thanks
Update
I tried using all the different parameter files for the model with no luck either.
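A missing checkpoint key usually means the graph built from the params file does not describe the same architecture as the checkpoint being restored (params_small.txt, for instance, declares num_targets 3). A hedged way to confirm: list the checkpoint's variables with tf.train.list_variables(checkpoint_path), which returns (name, shape) pairs, and compare them with the graph's variable names (tf.global_variables() in TensorFlow 1.x). The comparison itself is plain Python, shown here with hypothetical names and shapes.

```python
def compare_variables(expected_names, checkpoint_vars):
    """Return (missing_from_checkpoint, unused_in_checkpoint), both sorted.

    checkpoint_vars: iterable of (name, shape) pairs, the format returned by
    tf.train.list_variables(checkpoint_path).
    """
    in_ckpt = {name for name, _shape in checkpoint_vars}
    expected = set(expected_names)
    return sorted(expected - in_ckpt), sorted(in_ckpt - expected)


# Hypothetical names and shapes, for illustration only
expected = ["cnn0/conv/kernel", "final/dense/kernel", "final/dense/bias"]
ckpt = [("cnn0/conv/kernel", [20, 4, 128]), ("final/conv/kernel", [1, 384, 3])]
missing, unused = compare_variables(expected, ckpt)
print(missing)  # ['final/dense/bias', 'final/dense/kernel']
print(unused)   # ['final/conv/kernel']
```

Names present in the checkpoint but unused by the graph often hint at which params file the checkpoint was trained with.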
I was very impressed by your paper, software, and tutorial. I found the preprocessing step very tedious, and I was wondering if you could release the processed data. By the way, have you compared the performance when using processed data from ENCODE and Roadmap directly? Thank you for your help!
Dear developer
I don't know whether I am doing something wrong, and I hope you can help. I am learning to use basenji because I think it is very good. Thank you very much!
If I use the model trained by train_test.ipynb in the subsequent gene.ipynb or sat.ipynb, the predictions are abnormal.
I am using the latest basenji, and I installed my environment following your tutorial.
With the train.tfr file produced by my run of preprocess.ipynb, any model I train myself gives abnormal output, no matter how good its AUPRC is.
Specifically, when I retrain the model, the training process and results are as follows:
I put my trained model into gene.ipynb and the result is still abnormal.
This is the result of my trained model_best.tf applied to sad.ipynb.
Later I downloaded the model you provided and used it in gene.ipynb, and I successfully reproduced the original result.
So this suggests that the train.tfr produced by my preprocess.ipynb is problematic, which made my training process abnormal. Although training eventually converges stably and I get a high AUPRC in the test phase, the loss is already very low at the start of training, as follows:
This is also where I am stuck. When I execute preprocess.ipynb, I sometimes don't get the train.tfr and valid.tfr files. Last time, the tfr files were only written after I ran chmod 777 * to change the file permissions in /basen/basenji and /basenji/bin.
I don't know whether my environment is wrong or something else.
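Needing chmod before the .tfr files appear is consistent with the notebook invoking the bin/ scripts directly, which fails when they lack the execute bit (an inference, not a confirmed cause). A sketch that lists which scripts the current user cannot execute; the bin path is an example.

```python
import os
from pathlib import Path


def non_executable_scripts(bin_dir, pattern="*.py"):
    """Return names of scripts in bin_dir that the current user cannot execute."""
    return sorted(
        p.name for p in Path(bin_dir).glob(pattern)
        if p.is_file() and not os.access(p, os.X_OK)
    )


if __name__ == "__main__":
    print(non_executable_scripts("basenji/bin"))  # example path; adjust to your clone
```

If it reports anything, setting only the execute bit (chmod +x on those scripts) is a narrower fix than chmod 777.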
Hello, dear developer! I don't know whether I am doing something wrong, and I hope you can help. I am learning to use basenji because I think it is very good. Thank you very much!
The following uses the 'models/heart/model_best.tf.meta' model file you provided.
The first step of sat_mut runs without errors.
The second step fails with KeyError: 'index'.
The first step of sat_mut fails with NameError: name 'bed_file' is not defined, even though I do have the data/gata4.bed file.
Dear Authors:
I tried to install basenji following the instructions in README.md,
using the command below:
python setup.py develop --install-dir=/software/python3lib
The error message is shown below:
#======#
/tmp/easy_install-567tun7k/h5py-2.8.0/h5py/defs.c: In function ‘__pyx_f_4h5py_4defs_H5Sget_regular_hyperslab’:
/tmp/easy_install-567tun7k/h5py-2.8.0/h5py/defs.c:31571:15: warning: implicit declaration of function ‘H5Sget_regular_hyperslab’ [-Wimplicit-function-declaration]
__pyx_t_1 = H5Sget_regular_hyperslab(__pyx_v_spaceid, __pyx_v_start, __pyx_v_stride, __pyx_v_count, __pyx_v_block); if (unlikely(PyErr_Occur
^
/tmp/easy_install-567tun7k/h5py-2.8.0/h5py/defs.c: In function ‘__Pyx_modinit_function_export_code’:
/tmp/easy_install-567tun7k/h5py-2.8.0/h5py/defs.c:40634:67: error: ‘__pyx_f_4h5py_4defs_H5Pset_virtual_view’ undeclared (first use in this function)
if (__Pyx_ExportFunction("H5Pset_virtual_view", (void (*)(void))__pyx_f_4h5py_4defs_H5Pset_virtual_view, "herr_t (hid_t, H5D_vds_view_t)") <
^
/tmp/easy_install-567tun7k/h5py-2.8.0/h5py/defs.c:40634:67: note: each undeclared identifier is reported only once for each function it appears in
/tmp/easy_install-567tun7k/h5py-2.8.0/h5py/defs.c:40635:67: error: ‘__pyx_f_4h5py_4defs_H5Pget_virtual_view’ undeclared (first use in this function)
if (__Pyx_ExportFunction("H5Pget_virtual_view", (void (*)(void))__pyx_f_4h5py_4defs_H5Pget_virtual_view, "herr_t (hid_t, H5D_vds_view_t *)")
^
In file included from /opt/apps/tools/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/numpy/core/include/numpy/ndarrayobject.h:27:0,
from /opt/apps/tools/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from /tmp/easy_install-567tun7k/h5py-2.8.0/h5py/api_compat.h:26,
from /tmp/easy_install-567tun7k/h5py-2.8.0/h5py/defs.c:611:
/tmp/easy_install-567tun7k/h5py-2.8.0/h5py/defs.c: At top level:
/opt/apps/tools/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/numpy/core/include/numpy/__multiarray_api.h:1463:1: warning: ‘_import_array’ defined but not used [-Wunused-function]
_import_array(void)
^
error: Setup script exited with error: command 'gcc' failed with exit status 1
#==========#
Could you give me some tips on how to fix it? Thank you so much.
Best,
Qi
Hi!
I'm working on a conda package for basenji in bioconda/bioconda-recipes#26534
As the available releases of basenji seem old compared to the latest developments in the master branch, I've written the recipe to use a recent commit from master. Do you think that's OK? Or should I stick to the 0.4 release? Or do you plan to make a new release soon?
Second question: environment.yml refers to cudnn and cudatoolkit. Are they only needed when using tensorflow-gpu, or are they always required by basenji?
Hello,
I was trying to run the preprocessing step, and when I run bam_cov.py it gives a KeyError 'HG19'.
The command I am running is bam_cov.py FLAG-MSI2_2_iCLIP.bam FLAG-MSI2_2.bw
The error message I am getting is:
Traceback (most recent call last):
File "/ysm-gpfs/home/kvp6/Kiran.vp/Tools/basenji-master/bin/bam_cov.py", line 1631, in <module>
main()
File "/ysm-gpfs/home/kvp6/Kiran.vp/Tools/basenji-master/bin/bam_cov.py", line 95, in main
default='%s/assembly/hg19.fa' % os.environ['HG19'],
File "/ysm-gpfs/apps/software/Python/3.5.1-foss-2016a/lib/python3.5/os.py", line 683, in __getitem__
raise KeyError(key) from None
KeyError: 'HG19'
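The traceback shows the HG19 lookup happening while the option defaults are built (the default= on line 95), so it fails before any arguments are read; setting HG19 before running works around it, as another report on this page also found. A hedged sketch of a local patch, written with argparse for illustration: read the variable with os.environ.get and only build the HG19-based default when it is set. The --fasta option name is hypothetical, not necessarily bam_cov.py's.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser()
    hg19 = os.environ.get("HG19")  # None instead of KeyError when unset
    default_fasta = f"{hg19}/assembly/hg19.fa" if hg19 else None
    # Hypothetical option name; mirrors the default= pattern in the traceback
    parser.add_argument("--fasta", default=default_fasta)
    return parser


args = build_parser().parse_args(["--fasta", "data/hg19.ml.fa"])
print(args.fasta)  # data/hg19.ml.fa
```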
Hi!
Thank you so much for making all of your code available on GitHub; we really appreciate it. I noticed that there have been a few changes in this repository, particularly to the model architecture. I would greatly appreciate access to the older versions of the model. We are trying to use the older parameters file to restore the model, but it fails because the newer model is in the .h5 format. Hence we would like access to the older human model files: human_model_model_best.tf.meta, human_model_model_best.tf.index, and human_model_model_best.tf.data-00000-of-00001.
Thank you very much for all your help in this regard,
Faiz
In the paper you mention how GC% bias correction was performed:
We normalized for GC% bias using a procedure that incorporates several established ideas, aiming to model the trend across the GC% spectrum without precluding a GC% enrichment for active regions (Benjamini and Speed 2012; Teng and Irizarry 2016). We assigned each position an estimated relevant GC% value using a Gaussian filter (to assign greater weight to nearby nucleotides more likely to have been part of a fragment relevant to that genomic position). Then we fit a third-degree polynomial regression to the log2 coverage estimates. Finally, we reconfigured the coverage estimates to highlight the residual coverage unexplained by the GC% model. The parameters of the bias models exhibit a wide range, both within and across assays, suggesting the absence of a common sequencing bias. A Python script implementing these procedures to transform a BAM file of alignments to a BigWig file of inferred coverage values is available in the Basenji tool suite.
I couldn't locate the script you are talking about, where is it? 😄
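The quoted procedure can be sketched while the script itself is tracked down. This is an illustrative reimplementation under assumptions (a 50 bp Gaussian sigma, a pseudocount of 1, per-base coverage as input), not the Basenji script, and it leaves out the refinements that avoid removing genuine GC enrichment in active regions.

```python
import numpy as np


def gaussian_kernel(sigma, radius=None):
    """Normalized Gaussian weights over positions -radius..radius."""
    if radius is None:
        radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def gc_corrected_log2_coverage(seq, coverage, sigma=50.0, pseudocount=1.0):
    """Residual log2 coverage after regressing out a smoothed GC% trend.

    Assumes len(seq) == len(coverage) and len(seq) >= the kernel length.
    """
    is_gc = np.array([c in "GCgc" for c in seq], dtype=float)
    kernel = gaussian_kernel(sigma)
    # Gaussian-weighted local GC fraction, renormalized where the kernel hits an edge
    weight = np.convolve(np.ones_like(is_gc), kernel, mode="same")
    gc = np.convolve(is_gc, kernel, mode="same") / weight
    log_cov = np.log2(np.asarray(coverage, dtype=float) + pseudocount)
    # Third-degree polynomial regression of log2 coverage on local GC%
    coefs = np.polyfit(gc, log_cov, deg=3)
    trend = np.polyval(coefs, gc)
    # Keep only the coverage the GC% model does not explain
    return log_cov - trend, coefs
```

Because the fitted polynomial includes an intercept, the residuals average to zero; converting them back to a coverage scale for a BigWig is a further step not shown here.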
Hello David,
I'd like to launch basenji_train.py on a cluster with Slurm, but I'm not sure how to go about it. Do you have an example, like your tutorials, that could help?
Thank you !
This is a small issue, but some people prefer readability conventions like four-space indentation and other things that black helps with, so I ran black on the project. I do understand why you use 2 spaces, so no hard feelings if the pull request isn't merged. I have run some of the tests in the tests/ folder and they seem to produce the same output as before, although I'm not sure whether the test scripts require arguments.
Hi Basenji Developers, thanks a lot for your project.
I would like to use your software to identify distal regulatory elements and outline their driving motifs in the K562 and PrEC cell lines. I see that in silico saturation mutagenesis is part of this procedure and you posted a tutorial on that, but I cannot find one for identifying regulatory elements.
I would be grateful if you could provide a comprehensive tutorial for this task.
Thanks!
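basenji_sat.py (used in the sat_mut tutorial) performs the mutagenesis for you; purely to show what that step computes, here is a sketch of in silico saturation mutagenesis with a hypothetical predict function standing in for the trained model. Positions whose substitutions move the prediction most are candidate motif positions.

```python
def single_nucleotide_variants(seq, alphabet="ACGT"):
    """Yield (position, ref, alt, mutated_sequence) for every single-base substitution."""
    for pos, ref in enumerate(seq):
        for alt in alphabet:
            if alt != ref:
                yield pos, ref, alt, seq[:pos] + alt + seq[pos + 1:]


def saturation_scores(seq, predict):
    """Score each substitution as predict(mutant) - predict(reference).

    `predict` is a hypothetical stand-in for a trained model that maps a
    sequence to a single activity value.
    """
    ref_score = predict(seq)
    return {
        (pos, ref, alt): predict(mut) - ref_score
        for pos, ref, alt, mut in single_nucleotide_variants(seq)
    }


# Toy predictor (GC count), used only to make the example runnable
scores = saturation_scores("ACG", lambda s: s.count("G") + s.count("C"))
print(len(scores))            # 9 (3 positions x 3 alternatives)
print(scores[(0, "A", "G")])  # 1
```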
Dear David,
Could you tell me where I can find "data/heart_l131k.bed" in the "Preprocess new datasets for training" part of the tutorial? Thank you so much.
I installed tensorflow-gpu version 1.4 and downloaded the data according to your tutorial:
https://github.com/calico/basenji/blob/master/tutorials/train_test.ipynb
Then I ran the code in cell 2:
basenji_train.py --augment_rc --ensemble_rc --augment_shifts "1,0,-1" --logdir models/heart --params models/params_small.txt --train_data data/heart_l131k/tfrecords/train*.tfr --test_data data/heart_l131k/tfrecords/valid*.tfr
Then I got the error "local variable 'metrics_queue' referenced before assignment".
Could you please help solve the problem?
Thanks a lot.
tensorflow=1.14 should be tensorflow==1.14
I used exactly the same data you provide on the website and ran basenji_data.py -d .1 -g data/unmap_macro.bed -l 131072 --local -o data/heart_l131k -p 8 -t .1 -v .1 -w 128 data/hg19.ml.fa data/heart_wigs.txt, which produced sequences.bed and other files. However, the seqs_cov folder contains no *.h5 files, so no *.tfr files were produced either. Why didn't I get the h5 and tfr files? Is the unmap_macro.bed in the GitHub repository the one that should be used in the command line?
I attached my log:
stride_train 1 converted to 131072.000000
stride_test 1 converted to 131072.000000
Contigs divided into
Train: 4701 contigs, 2169074921 nt (0.8005)
Valid: 572 contigs, 270358978 nt (0.0998)
Test: 584 contigs, 270330829 nt (0.0998)
/work/06731/yliu5/lonestar/softwares/basenji/bin/basenji_data.py:255: FutureWarning: read_table is deprecated, use read_csv instead, passing sep='\t'.
targets_df = pd.read_table(targets_file, index_col=0)
basenji_data_read.py -w 128 -u sum -c 384.000000 -s 1.000000 data/CNhs11760.bw data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov/0.h5 &> data/heart_l131k/seqs_cov/0.err
basenji_data_read.py -w 128 -u sum -c 384.000000 -s 1.000000 data/CNhs12843.bw data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov/1.h5 &> data/heart_l131k/seqs_cov/1.err
basenji_data_read.py -w 128 -u sum -c 384.000000 -s 1.000000 data/CNhs12856.bw data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov/2.h5 &> data/heart_l131k/seqs_cov/2.err
basenji_data_write.py -s 0 -e 256 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-0.tfr &> data/heart_l131k/tfrecords/train-0.err
basenji_data_write.py -s 256 -e 512 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-1.tfr &> data/heart_l131k/tfrecords/train-1.err
basenji_data_write.py -s 512 -e 768 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-2.tfr &> data/heart_l131k/tfrecords/train-2.err
basenji_data_write.py -s 768 -e 1024 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-3.tfr &> data/heart_l131k/tfrecords/train-3.err
basenji_data_write.py -s 1024 -e 1280 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-4.tfr &> data/heart_l131k/tfrecords/train-4.err
basenji_data_write.py -s 1280 -e 1499 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-5.tfr &> data/heart_l131k/tfrecords/train-5.err
basenji_data_write.py -s 1499 -e 1679 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/valid-0.tfr &> data/heart_l131k/tfrecords/valid-0.err
basenji_data_write.py -s 1679 -e 1858 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/test-0.tfr &> data/heart_l131k/tfrecords/test-0.err
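In this log, each basenji_data_read.py and basenji_data_write.py call writes its output to the path just before "&>" and its stderr to the .err file after it, so a missing seqs_cov/*.h5 will usually be explained in the matching .err file, and the tfrecords step has nothing to read without those .h5 files. A sketch that parses such lines and reports missing outputs with their .err paths; it assumes exactly the "... OUTPUT &> ERRFILE" shape shown above.

```python
import os
import shlex


def missing_outputs(log_lines):
    """For commands shaped like '... OUTPUT &> ERRFILE', return (output, errfile)
    pairs whose OUTPUT file does not exist."""
    missing = []
    for line in log_lines:
        if "&>" not in line:
            continue  # skip non-command log lines
        command, err_file = line.rsplit("&>", 1)
        output = shlex.split(command)[-1]
        if not os.path.exists(output):
            missing.append((output, err_file.strip()))
    return missing


log = [
    "basenji_data_read.py -w 128 -u sum -c 384.000000 -s 1.000000 "
    "data/CNhs11760.bw data/heart_l131k/sequences.bed "
    "data/heart_l131k/seqs_cov/0.h5 &> data/heart_l131k/seqs_cov/0.err",
]
for output, err_file in missing_outputs(log):
    print(f"missing {output}; see {err_file}")
```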
Both basenji_sat.py and basenji_sed.py call attributes that don't exist in the SeqNN class:
{'batch_size': 4, 'cnn_dilation': [1, 1, 1, 1, 1, 2, 4, 8, 16, 32, 64, 1], 'link': 'softplus', 'batch_buffer': 4096, 'learning_rate': 0.002, 'cnn_filters': [128, 128, 192, 256, 256, 32, 32, 32, 32, 32, 32, 384], 'cnn_dropout': 0.1, 'adam_beta2': 0.98, 'cnn_dense': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0], 'loss': 'poisson', 'cnn_filter_sizes': [20, 7, 7, 7, 3, 3, 3, 3, 3, 3, 3, 1], 'cnn_pool': [2, 4, 4, 4, 1, 0, 0, 0, 0, 0, 0, 0], 'num_targets': 3, 'adam_beta1': 0.97, 'optimizer': 'adam', 'target_pool': 128}
Targets pooled by 128 to length 1024
Convolution w/ 3 384x1 filters to final targets
Targets pooled by 128 to length 1024
Convolution w/ 3 384x1 filters to final targets
2018-09-24 19:52:17.490718: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Mutating sequence 1 / 1
Traceback (most recent call last):
File "/shared/workspace/software/basenji/bin/basenji_sat.py", line 660, in <module>
main()
File "/shared/workspace/software/basenji/bin/basenji_sat.py", line 189, in main
sat_preds = model.predict(sess, batcher_sat,
AttributeError: 'SeqNN' object has no attribute 'predict'
The predict attribute seems to exist in SeqNNOrig.
Dear developer, this tutorial is very good! Now I have some problems. In the tutorial tutorial/genes.ipynb, I ran the following line and got an error. I checked the related issues but still couldn't solve it.
In addition, the final output ("R2, etc.") of gene.ipynb also has problems, as follows:
Before that, I installed stat as the prompt suggested. I don't know whether it affects anything. Could you tell me which versions of the stat and matplotlib modules you use?
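When comparing environments, exact versions help. A small sketch using the standard library's importlib.metadata (Python 3.8+); the package list is only an example, so substitute whatever names pip list shows.

```python
from importlib.metadata import PackageNotFoundError, version


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print(package_versions(["matplotlib", "numpy", "tensorflow"]))
```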
Hello Dave,
I was wondering what the input format is for sample_wig_files in the inputs to basenji_hdf5_single.py.
When I looked at the tutorial, it suggested a format with two columns (from data/heart_wigs.txt):
aorta data/CNhs11760.bw
artery data/CNhs12843.bw
pulmonic_valve data/CNhs12856.bw
Unfortunately, when I run basenji_hdf5_single.py, it gives me an IndexError at line 186:
for line in open(sample_wigs_file, encoding='UTF-8'):
a = line.rstrip().split('\t')
target_wigs[a[0]] = a[1]
target_strands.append(a[2])
It looks like it is trying to access a[2], which would be the third column of data/heart_wigs.txt, a column that does not exist in that file.
What is the expected input format for sample_wig_files?
Best,
Jake
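The quoted loop reads a third tab-separated column (a strand) that the tutorial's heart_wigs.txt lacks, which is why a[2] fails. Until the intended format is confirmed, a hedged workaround is a tolerant reader that defaults the strand when the column is absent; the '.' default is an assumption, not the script's documented behavior.

```python
def read_sample_wigs(lines, default_strand="."):
    """Parse 'label<TAB>path[<TAB>strand]' lines into (target_wigs, target_strands)."""
    target_wigs = {}
    target_strands = []
    for line in lines:
        a = line.rstrip("\r\n").split("\t")
        if not a[0]:
            continue  # skip blank lines
        target_wigs[a[0]] = a[1]
        target_strands.append(a[2] if len(a) > 2 else default_strand)
    return target_wigs, target_strands


wigs, strands = read_sample_wigs(
    ["aorta\tdata/CNhs11760.bw\n", "artery\tdata/CNhs12843.bw\t+\n"]
)
print(strands)  # ['.', '+']
```

In the script, lines would come from open(sample_wigs_file, encoding='UTF-8') as in the original loop.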
Despite my efforts, basenji_data.py does not produce a deterministic random partition and sort of training sequences. Resolve that.
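A common source of this kind of nondeterminism is shuffling or sampling through the module-level random state, or over a collection whose iteration order is not fixed (such as a set). A minimal sketch, not basenji_data.py's code: sort the inputs first, then shuffle with a dedicated random.Random(seed). The seed value 44 is arbitrary.

```python
import random


def deterministic_split(items, fractions, seed=44):
    """Shuffle a sorted copy of `items` with a private RNG, then cut it into
    consecutive chunks sized by `fractions` (the last chunk takes the rest)."""
    ordered = sorted(items)    # remove dependence on input order
    rng = random.Random(seed)  # independent of the global random state
    rng.shuffle(ordered)
    chunks, start = [], 0
    for frac in fractions[:-1]:
        end = start + int(frac * len(ordered))
        chunks.append(ordered[start:end])
        start = end
    chunks.append(ordered[start:])
    return chunks


regions = ["chr1:0-10", "chr2:0-10", "chr3:0-10", "chr4:0-10"]
a = deterministic_split(set(regions), [0.5, 0.25, 0.25])
b = deterministic_split(list(reversed(regions)), [0.5, 0.25, 0.25])
print(a == b)  # True
```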
I am following the tutorial in tutorials/preprocess.ipynb to reproduce the data used in the publication step by step.
However, I got an error message when executing the command
!basenji_data.py -d .1 -g data/unmap_macro.bed -l 131072 --local -o data/heart_l131k -p 8 -t .1 -v .1 -w 128 data/hg19.ml.fa data/heart_wigs.txt
It is because basenji_data.py requires the --crop parameter to be larger than 0: line 69 of basenji_data_read.py has assert(options.crop_bp > 0).
Could you help with it?
Additionally, I have another small question. The command passes -d 0.1. Does that mean the output is a subsample of the data used in the publication?
Sincerely Yours
Hi Dave,
Thank you for providing such a great tool!
In the basenji_data tutorial, I have a question regarding the -d option when it is set below 1, as in the tutorial (-d .1).
It seems that the resulting sequences.bed file mixes train/valid/test regions in random order because of the random.sample function:
mseqs = random.sample(mseqs, int(options.sample_pct*len(contigs)))
Therefore, could this lead to an issue when the TF records are written, i.e., when the loop goes through each label, will it write the next 256 sequences (the seqs_per_tfr option) regardless of their labels?
Thank you again for your help on this.
Best regards,
Camille
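One way to keep labels from mixing, sketched as an assumption rather than a description of basenji_data.py, is to subsample within each split separately, concatenate the splits in a fixed label order, and cut TFRecord-sized chunks within each label's block so no chunk crosses a label boundary. The (label, region) tuples and the seed are hypothetical.

```python
import random
from collections import defaultdict


def subsample_by_label(records, fraction, seed=44,
                       label_order=("train", "valid", "test")):
    """Keep `fraction` of each label's regions (private RNG), grouped by label."""
    groups = defaultdict(list)
    for label, region in records:
        groups[label].append(region)
    rng = random.Random(seed)
    kept = []
    for label in label_order:
        regions = sorted(groups.get(label, []))
        n_keep = int(fraction * len(regions))
        kept.extend((label, r) for r in sorted(rng.sample(regions, n_keep)))
    return kept


def chunk_within_label(kept, seqs_per_tfr):
    """Cut each label's block into chunks of at most seqs_per_tfr regions."""
    chunks, current, current_label = [], [], None
    for label, region in kept:
        if current and (label != current_label or len(current) == seqs_per_tfr):
            chunks.append((current_label, current))
            current = []
        current_label = label
        current.append(region)
    if current:
        chunks.append((current_label, current))
    return chunks
```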
Hi Basenji Developers, thanks a lot for the great project. I ran into a series of issues trying to reproduce the sad.ipynb tutorial.
I was able to download the additional files in the first three blocks of the notebook. Then errors occurred in the block that executes this command:
basenji_sad.py -f data/hg19.ml.fa -g data/human.hg19.genome -l 131072 -o output/rfx6_sad --rc -t data/heart_wigs_index.txt models/params_small.txt models/heart/model_best.tf data/rs339331.vcf
I was able to solve the first three errors but not the fourth. Any help is greatly appreciated.
1. The zarr package is required but not listed in setup.py. I installed it with pip install zarr.
2. The HG19 environment variable is required even if -f data/hg19.ml.fa is specified. I solved it by setting it to an empty string: export HG19=''
3. An error occurs when the script loads the index file as a DataFrame, i.e. df = pd.dataFrame("data/heart_wigs_index.txt"), and tries to access df.identifier and df.description. I think this is because the data/heart_wigs_index.txt created in the tutorial does not have a header row of column names. Based on my guess, I manually added identifier as the name of the 2nd column and description as the name of the 4th.
4. Building a TF tensor fails because a dimension is wrong:
>>> ~/codes/basenji/bin/basenji_sad.py -f data/hg19.ml.fa -g data/human.hg19.genome -o output/rfx6_sad --rc -t data/test_index.txt models/params_small.txt models/heart/model_best.tf data/rs339331.vcf
{'batch_size': 4, 'batch_buffer': 4096, 'link': 'softplus', 'loss': 'poisson', 'optimizer': 'adam', 'adam_beta1': 0.97, 'adam_beta2': 0.98, 'learning_rate': 0.002, 'num_targets': 3, 'target_pool': 128, 'cnn_dropout': 0.1, 'cnn_filter_sizes': [20, 7, 7, 7, 3, 3, 3, 3, 3, 3, 3, 1], 'cnn_filters': [128, 128, 192, 256, 256, 32, 32, 32, 32, 32, 32, 384], 'cnn_pool': [2, 4, 4, 4, 1, 0, 0, 0, 0, 0, 0, 0], 'cnn_dilation': [1, 1, 1, 1, 1, 2, 4, 8, 16, 32, 64, 1], 'cnn_dense': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0]}
Targets pooled by 128 to length 8
Traceback (most recent call last):
File "/home/qyd/codes/basenji/bin/basenji_sad.py", line 474, in <module>
main()
File "/home/qyd/codes/basenji/bin/basenji_sad.py", line 162, in main
embed_penultimate=options.penultimate, target_subset=target_subset)
File "/home/qyd/codes/basenji/basenji/seqnn.py", line 51, in build_feed
target_subset=target_subset)
File "/home/qyd/codes/basenji/basenji/seqnn.py", line 75, in build_from_data_ops
save_reprs=True)
File "/home/qyd/codes/basenji/basenji/seqnn.py", line 238, in build_predict
seq_length - batch_buffer_pool, :]
File "/home/qyd/codes/anaconda3/envs/basenji/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 206, in __sub__
return Dimension(self._value - other.value)
File "/home/qyd/codes/anaconda3/envs/basenji/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 42, in __init__
raise ValueError("Dimension %d must be >= 0" % self._value)
ValueError: Dimension -24 must be >= 0
I found that the dimension seq_length - batch_buffer_pool is -24 because seq_length is 8 and batch_buffer_pool is 32.
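The numbers in this traceback can be reproduced with a little arithmetic. This sketch assumes, from the slice at line 238 of seqnn.py shown above, that predictions are cropped to [batch_buffer_pool : seq_length - batch_buffer_pool], with both lengths counted in target bins (base pairs divided by target_pool); the function name is hypothetical. With target_pool 128 and batch_buffer 4096 from the printed params, a 131072 bp input gives 1024 bins and 960 bins after cropping, whereas 8 bins (which would correspond to a 1024 bp input) cannot absorb a 32-bin buffer. One difference consistent with this is that the command in the traceback omits the -l 131072 option used in the tutorial command.

```python
def cropped_bins(seq_length_bp, target_pool=128, batch_buffer=4096):
    """Return the number of prediction bins left after trimming the buffer.

    Assumes predictions are sliced as [buffer_bins : seq_bins - buffer_bins].
    """
    seq_bins = seq_length_bp // target_pool    # e.g. 131072 // 128 = 1024
    buffer_bins = batch_buffer // target_pool  # 4096 // 128 = 32
    end = seq_bins - buffer_bins
    if end < 0:
        # Mirrors TensorFlow's "Dimension ... must be >= 0" error.
        raise ValueError(f"Dimension {end} must be >= 0")
    # An end before the start simply yields an empty slice.
    return max(end - buffer_bins, 0)


print(cropped_bins(131072))  # 960
try:
    cropped_bins(1024)  # 8 bins, as in "Targets pooled by 128 to length 8"
except ValueError as err:
    print(err)  # Dimension -24 must be >= 0
```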
Hopefully the above information makes sense, and the tutorials can be updated to work again. Thanks!
Hello. I am following the instructions in manuscripts/basset to make predictions using my BED files. I have modified the targets and ran make_data.sh without problems. Next, I changed "units": 164 to 2 in params_basset.json, since I want to use two BED files. However, when I executed basenji_train.py -k -o train_basset --rc params_basset.json data_basset, I got an error that --rc is an unknown parameter. Removing that parameter resulted in the following exception:
Traceback (most recent call last):
File "/home/user/basenji-master/bin/basenji_train.py", line 165, in
main()
File "/home/user/basenji-master/bin/basenji_train.py", line 154, in main
seqnn_trainer.fit_keras(seqnn_model)
File "/home/user/basenji-master/basenji/trainer.py", line 132, in fit_keras
validation_steps=self.eval_epoch_batches[0])
File "/home/user/anaconda3/envs/basenji/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1100, in fit
tmp_logs = self.train_function(iterator)
File "/home/user/anaconda3/envs/basenji/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 828, in call
result = self._call(*args, **kwds)
File "/home/user/anaconda3/envs/basenji/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 888, in _call
return self._stateless_fn(*args, **kwds)
File "/home/user/anaconda3/envs/basenji/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2943, in call
filtered_flat_args, captured_inputs=graph_function.captured_inputs) # pylint: disable=protected-access
File "/home/user/anaconda3/envs/basenji/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1919, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/home/user/anaconda3/envs/basenji/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 560, in call
ctx=ctx)
File "/home/user/anaconda3/envs/basenji/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [200,448,2] vs. [200,64,2]
[[node LogicalAnd_2 (defined at /basenji-master/basenji/metrics.py:65) ]] [Op:__inference_train_function_5687]
Function call stack:
train_function
Setting the batch size to 1 makes the code run, but I don't know whether the results are correct. Please advise on what is wrong!
Hi,
I am trying to understand how basenji works. We are trying to train a model on a small dataset and produce bigWig prediction files for all five tracks. I do not understand the role of num_targets and its relation to --ti. For example, we had a dataset that included 2 files to train a model, and we wanted to make predictions on all five tracks, but it failed to do so. Could you explain the roles of num_targets in the params file and of the --ti option in basenji_test.py? Many thanks!
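A generic sketch of the constraint behind this question, assuming that num_targets sets the number of output channels of the model's final layer and that --ti takes a comma-separated list of target indexes into those channels. Both are assumptions about basenji's behavior rather than documented facts, and the helper name is hypothetical. Under those assumptions, a model trained on 2 targets only has indexes 0 and 1, so requesting five tracks from it cannot succeed.

```python
def parse_target_indexes(ti_option, num_targets):
    """Parse a comma-separated index list and check it fits the model output."""
    indexes = [int(token) for token in ti_option.split(",") if token.strip()]
    out_of_range = [i for i in indexes if not 0 <= i < num_targets]
    if out_of_range:
        raise ValueError(
            f"target indexes {out_of_range} exceed num_targets={num_targets}"
        )
    return indexes


print(parse_target_indexes("0,1", num_targets=2))  # [0, 1]
try:
    parse_target_indexes("0,1,2,3,4", num_targets=2)
except ValueError as err:
    print(err)  # target indexes [2, 3, 4] exceed num_targets=2
```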
Best,
Faiz
Hello,
I am trying to train a model, but I get a series of warnings from TensorFlow and then it crashes. I don't know whether this is a TensorFlow compatibility issue; I am using TensorFlow 1.7.
Any ideas? Thanks in advance!
/home/laura/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
WARNING:tensorflow:From /home/laura/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
WARNING:tensorflow:From /home/laura/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py:497: calling conv1d (from tensorflow.python.ops.nn_ops) with data_format=NHWC is deprecated and will be removed in a future version.
Instructions for updating:
NHWC for data_format is deprecated, use NWC instead
2018-04-20 19:06:34.278450: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
{'batch_size': 2, 'seq_length': [262144, 65536], 'seq_depth': 4, 'target_length': 2048, 'num_targets': 10, 'target_pool': 128, 'batch_buffer': 16384, 'batch_renorm': 1, 'link': 'exp_linear', 'loss': 'poisson', 'learning_rate': 0.005769, 'momentum': 0.99574, 'optimizer': 'momentum', 'cnn_dropout': 0.04375, 'cnn_l2': 1.44e-09, 'cnn_filter_sizes': [3, 1, 3, 3, 3, 3], 'cnn_filters': [6, 6, 6, 6, 6, 376], 'cnn_pool': [1, 2, 4, 4, 4, 1], 'dense': 1, 'dcnn_dropout': 0.072917, 'dcnn_l2': 1.93e-08, 'dcnn_filter_sizes': [3, 3, 3, 3, 3, 3, 3], 'dcnn_filters': [8, 8, 8, 8, 8, 8, 8], 'full_units': 32, 'full_dropout': 0.009375, 'full_l2': 2.58e-08, 'final_l1': 1.45e-08, 'load_hd5': 1}
Targets pooled by 128 to length 2048
Convolution w/ 10 376x1 filters to final targets
Model building time 59.275235
Batcher initialized
Initializing...
Initialization time 0.484647
Traceback (most recent call last):
File "bin/basenji_train.py", line 207, in
tf.app.run(main)
File "/home/laura/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "bin/basenji_train.py", line 40, in main
num_train_epochs=FLAGS.num_train_epochs)
File "bin/basenji_train.py", line 145, in run
for epoch in range(num_train_epochs):
TypeError: 'NoneType' object cannot be interpreted as an integer
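The last frame fails because range() received None, which suggests num_train_epochs was never set when the script ran. A minimal reproduction, plus a guard that fails with a clearer message; the guard function is hypothetical and not part of basenji_train.py, and whether the num_train_epochs flag has a default depends on the script version.

```python
def epochs(num_train_epochs):
    """Return a range of epoch numbers, failing early with a clear message."""
    if num_train_epochs is None:
        raise ValueError("num_train_epochs is None; set it explicitly")
    return range(num_train_epochs)


# Reproduces the error raised at "for epoch in range(num_train_epochs)".
try:
    for epoch in range(None):
        pass
except TypeError as err:
    print(err)  # 'NoneType' object cannot be interpreted as an integer

print(list(epochs(3)))  # [0, 1, 2]
```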
Hi, I was following the tutorials to run the sad.ipynb notebook in the tutorials folder. The installation worked fine.
However I get an error when running the following command:
! basenji_sad.py --cpu -f data/hg19.ml.fa -g data/human.hg19.genome --h5 -o output/rfx6_sad --rc --shift "1,0,-1" -t data/heart_wigs.txt models/params_small.txt models/heart/model_best.tf data/rs339331.vcf
The first error I get is zsh:1: command not found: basenji_sad.py, which I can solve by changing the path to ../bin/basenji_sad.py.
Subsequently, "no such option" errors arise. After running ! ../bin/basenji_sad.py --help and removing the flags that aren't listed, the code runs.
Finally, the command looks like this:
! ../bin/basenji_sad.py -f data/hg19.ml.fa -o output/rfx6_sad --rc --shift "1,0,-1" -t data/heart_wigs.txt models/params_small.json models/heart/model_best.tf data/rs339331.vcf
I get an assertion error:
Traceback (most recent call last):
File "../bin/basenji_sad.py", line 426, in <module>
main()
File "../bin/basenji_sad.py", line 170, in main
seqnn_model.restore(model_file)
File "/Users/andyjiang/basenji/basenji/basenji/seqnn.py", line 350, in restore
self.models[head_i].load_weights(model_file)
File "/Users/andyjiang/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 2216, in load_weights
status.assert_nontrivial_match()
File "/Users/andyjiang/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 1023, in assert_nontrivial_match
return self.assert_consumed()
File "/Users/andyjiang/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 998, in assert_consumed
raise AssertionError(
AssertionError: Some objects had attributes which were not restored:
We are focusing on predicting the effects of genetic variants, so we started with this script, using the model from the tutorial. Could this error be caused by a difference between the version of basenji used to generate that model and the current code? We would prefer to use a pre-trained model to predict variant effects, if one is available.
I would appreciate any help.
Andy
Hi basenji/akita developers, I'd like to ask whether it's normal that akita_data.py uses a lot of RAM (while running only 1 process, although -p was set to 8). I've been running it for quite some time (~9 h), but no output files other than options.json have appeared, and RAM utilization keeps rising. I'm not sure whether I'm doing it right. I'm following https://github.com/calico/basenji/blob/master/manuscripts/akita/tutorial.ipynb
Thanks for any help on this.
Hi David,
Thank you so much for sharing the Basenji code. I was trying to run preprocess.py from the tutorial, but I ran into these errors:
ModuleNotFoundError: No module named 'util'
ModuleNotFoundError: No module named 'slurm'
Do you know where I can find these two packages?
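A ModuleNotFoundError for plain names such as util and slurm usually means Python is not searching the directory that contains util.py and slurm.py. Assuming those modules ship as files somewhere in your basenji checkout (locate them first; the directory used below is a throwaway placeholder), one way to make them importable from a notebook is to put that directory on sys.path. Setting PYTHONPATH before launching Jupyter has the same effect.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def add_module_dir(directory):
    """Prepend a directory to sys.path so modules inside it become importable."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    importlib.invalidate_caches()


# Self-contained demo: a temporary directory holding a stand-in slurm.py.
# In practice, pass the real directory that contains util.py and slurm.py.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "slurm.py"), "w") as handle:
    handle.write("NAME = 'stand-in slurm module'\n")

add_module_dir(demo_dir)
slurm = importlib.import_module("slurm")
print(slurm.NAME)  # stand-in slurm module
```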
Thanks!
Hi,
Firstly, thanks for making these tools open source. Much appreciated!
I have a few questions related to the same topic. I have DHS data in various cell types on which I would like to train a predictive model. I initially intended to use Basset, but then saw this in the README of Basenji: "Basenji makes predictions in bins across the sequences you provide. You could replicate Basset's peak classification by simply providing smaller sequences and binning the target for the entire sequence." 1) I am unsure what binning the target for the entire sequence means. 2) What would be the process for replicating Basset's peak classification on DHS data using Basenji? It seems that Basenji only scores in 128 bp windows. 3) Can I score full DHS sequences (150 bp)?
In addition, I see that Basset was trained on 600 bp sequences, whereas Basenji is trained on much larger sequences. 4) If I want to train on new DHS data, would I be able to train the Basenji architecture on smaller DHS sequences (around 150-600 bp), or do I have to use Basset? I would definitely prefer to use Basenji because it is based on TensorFlow, and Lua (which Basset uses) isn't compatible with the Power9 architecture.
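One possible reading of "binning the target for the entire sequence", sketched under assumptions: Basenji summarizes per-base coverage within fixed-width bins (the -w 128 -u sum arguments passed to basenji_data_read.py in another log on this page suggest summing into 128 bp bins), and setting the bin width equal to the sequence length collapses each sequence to a single target value, which is the kind of per-peak value a Basset-style classifier predicts. The coverage values are made up, and bin_coverage is a hypothetical helper, not a basenji function.

```python
import numpy as np


def bin_coverage(coverage, bin_width):
    """Sum per-base coverage into consecutive, non-overlapping bins."""
    coverage = np.asarray(coverage, dtype=float)
    if coverage.size % bin_width != 0:
        raise ValueError("sequence length must be a multiple of bin_width")
    return coverage.reshape(-1, bin_width).sum(axis=1)


# Made-up coverage for a 512 bp sequence: signal only between 192 and 320.
cov = np.zeros(512)
cov[192:320] = 1.0

print(bin_coverage(cov, 128))       # four 128 bp bins: 0, 64, 64, 0
print(bin_coverage(cov, cov.size))  # a single bin for the whole sequence: 128
```

In this sketch a 150 bp sequence is not a multiple of 128, so a whole-sequence bin (width 150) would be the natural choice for it.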
I appreciate any help.
Best,
Zain
When I try to run the following command in Google Colab:
! basenji_hdf5_genes.py -g data/human.hg19.genome -l 131072 -c 0.333 -p 3 -t data/heart_wigs.txt -w 128 data/hg19.ml.fa data/gencode_chr9.gtf data/gencode_chr9.h5
I get the following error:
Traceback (most recent call last):
File "basenji_hdf5_genes.py", line 465, in <module>
main()
File "basenji_hdf5_genes.py", line 102, in main
check_wigs(options.target_wigs_file)
File "basenji_hdf5_genes.py", line 357, in check_wigs
for wig_file in target_wigs_df.file:
File "/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py", line 3614, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'file'
Hi David,
I'm running into the following error (pasted below) when running basenji_test.py or basenji_sad.py on the sample data provided in the tutorials.
This is the command I'm running for basenji_sad: python basenji_sad.py --cpu -f /global/scratch/poojakathail/data/hg19.ml.fa -o /global/scratch/poojakathail/output/rfx6_sad --rc --shift "1,0,-1" -t /global/scratch/poojakathail/data/heart_wigs.txt /global/scratch/poojakathail/models/params_small.json /global/scratch/poojakathail/models/heart/model_best.tf /global/scratch/poojakathail/data/rs339331.vcf
params_small.json is the file provided here: https://github.com/calico/basenji/blob/master/tutorials/models/params_small.json
model_best.tf is the pre-trained model included in the tutorials.
Thanks for providing this package and for your help!
File "basenji_sad.py", line 419, in <module>
main()
File "basenji_sad.py", line 176, in main
seqnn_model.restore(model_file)
File "/global/home/users/poojakathail/basenji/basenji/seqnn.py", line 316, in restore
self.model.load_weights(model_file)
File "/global/home/users/poojakathail/.conda/envs/basenji/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 182, in load_weights
return super(Model, self).load_weights(filepath, by_name)
File "/global/home/users/poojakathail/.conda/envs/basenji/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/network.py", line 1356, in load_weights
status.assert_nontrivial_match()
File "/global/home/users/poojakathail/.conda/envs/basenji/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/util.py", line 966, in assert_nontrivial_match
return self.assert_consumed()
File "/global/home/users/poojakathail/.conda/envs/basenji/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/util.py", line 943, in assert_consumed
"".join(unused_attribute_strings)))
AssertionError: Some objects had attributes which were not restored:
I was following the tutorials for installation and trying to run a notebook in the tutorials folder. I installed through conda, and import basenji worked without an error.
When I run the following command in preprocess.ipynb
! /home/shams/miniconda3/envs/basenji/bin/python /home/shams/basenji/bin/basenji_data.py -d .1 -g data/unmap_macro.bed -l 131072 --local -o data/heart_l131k -p 8 -t .1 -v .1 -w 128 data/hg19.ml.fa data/heart_wigs.txt
I get the following error:
stride_train 1 converted to 131072.000000
stride_test 1 converted to 131072.000000
Contigs divided into
Train: 4701 contigs, 2169074921 nt (0.8005)
Valid: 572 contigs, 270358978 nt (0.0998)
Test: 584 contigs, 270330829 nt (0.0998)
basenji_data_read.py --crop 0 -w 128 -u sum -c 384.000000 -s 1.000000 data/CNhs11760.bw data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov/0.h5
basenji_data_read.py --crop 0 -w 128 -u sum -c 384.000000 -s 1.000000 data/CNhs12843.bw data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov/1.h5
/bin/sh: 1: basenji_data_read.py: not found
basenji_data_read.py --crop 0 -w 128 -u sum -c 384.000000 -s 1.000000 data/CNhs12856.bw data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov/2.h5
/bin/sh: 1: basenji_data_read.py: not found
/bin/sh: 1: basenji_data_read.py: not found
basenji_data_write.py -s 0 -e 256 --umap_clip 1.000000 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-0.tfr
basenji_data_write.py -s 256 -e 512 --umap_clip 1.000000 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-1.tfr
/bin/sh: 1: basenji_data_write.py: not found
basenji_data_write.py -s 512 -e 768 --umap_clip 1.000000 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-2.tfr
/bin/sh: 1: basenji_data_write.py: not found
basenji_data_write.py -s 768 -e 1024 --umap_clip 1.000000 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-3.tfr
/bin/sh: 1: basenji_data_write.py: not found
basenji_data_write.py -s 1024 -e 1280 --umap_clip 1.000000 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-4.tfr
/bin/sh: 1: basenji_data_write.py: not found
basenji_data_write.py -s 1280 -e 1499 --umap_clip 1.000000 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/train-5.tfr
/bin/sh: 1: basenji_data_write.py: not found
basenji_data_write.py -s 1499 -e 1679 --umap_clip 1.000000 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/valid-0.tfr
/bin/sh: 1: basenji_data_write.py: not found
basenji_data_write.py -s 1679 -e 1858 --umap_clip 1.000000 data/hg19.ml.fa data/heart_l131k/sequences.bed data/heart_l131k/seqs_cov data/heart_l131k/tfrecords/test-0.tfr
/bin/sh: 1: basenji_data_write.py: not found
/bin/sh: 1: basenji_data_write.py: not found
I know this is probably a Python newbie error, but I could not find a fix by googling.
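The "/bin/sh: 1: ...: not found" lines suggest that basenji_data.py launches basenji_data_read.py and basenji_data_write.py by bare name, so those scripts must be executable and on PATH for the child shell, even when the parent script is called by its full path. A sketch for a notebook session, assuming the scripts live in the repository's bin directory (/home/shams/basenji/bin, inferred from the command above); it only changes PATH for the current kernel and the processes it starts.

```python
import os
import shutil
import stat
import tempfile


def prepend_to_path(directory):
    """Put a directory at the front of PATH for this process and its children."""
    os.environ["PATH"] = directory + os.pathsep + os.environ.get("PATH", "")


# Self-contained demo with a stand-in executable script. In the notebook you
# would instead call prepend_to_path("/home/shams/basenji/bin").
demo_bin = tempfile.mkdtemp()
script = os.path.join(demo_bin, "basenji_data_read.py")
with open(script, "w") as handle:
    handle.write("#!/usr/bin/env python\nprint('stand-in')\n")
os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR)  # mark executable

prepend_to_path(demo_bin)
print(shutil.which("basenji_data_read.py"))  # now resolves inside demo_bin
```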
Hi, thanks for your brilliant work!
I have a question about warnings when installing basenji.
I created a conda environment for installing basenji with Python 3.6, ran git clone https://github.com/calico/basenji in a new working directory, and then ran python setup.py develop. After a long report, I got the error: The 'pyBigWig' distribution was not found and is required by basenji. I then ran conda install pybigwig -c bioconda and python setup.py develop again. This time I also got a long report, which included the following:
...
Processing dependencies for basenji==0.0.1
Searching for networkx
Reading https://pypi.org/simple/networkx/
Downloading https://files.pythonhosted.org/packages/f3/f4/7e20ef40b118478191cec0b58c3192f822cace858c19505c7670961b76b2/networkx-2.2.zip#sha256=45e56f7ab6fe81652fb4bc9f44faddb0e9025f469f602df14e3b2551c2ea5c8b
Best match: networkx 2.2
Processing networkx-2.2.zip
Writing /tmp/easy_install-jsx89xlw/networkx-2.2/setup.cfg
Running networkx-2.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-jsx89xlw/networkx-2.2/egg-dist-tmp-nbez5udu
warning: no files found matching '*.html' under directory 'doc'
warning: no files found matching '*.css' under directory 'doc'
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '.svn' found anywhere in distribution
no previously-included directories found matching 'doc/build'
no previously-included directories found matching 'doc/auto_examples'
no previously-included directories found matching 'doc/modules'
no previously-included directories found matching 'doc/reference/generated'
no previously-included directories found matching 'doc/reference/algorithms/generated'
no previously-included directories found matching 'doc/reference/classes/generated'
no previously-included directories found matching 'doc/reference/readwrite/generated'
creating /project2/yangili1/mengguo/Mn/envs/EnvPy37/lib/python3.6/site-packages/networkx-2.2-py3.6.egg
Extracting networkx-2.2-py3.6.egg to /project2/yangili1/mengguo/Mn/envs/EnvPy37/lib/python3.6/site-packages
Adding networkx 2.2 to easy-install.pth file
Installed /project2/yangili1/mengguo/Mn/envs/EnvPy37/lib/python3.6/site-packages/networkx-2.2-py3.6.egg
Searching for joblib
Reading https://pypi.org/simple/joblib/
Downloading https://files.pythonhosted.org/packages/0d/1b/995167f6c66848d4eb7eabc386aebe07a1571b397629b2eac3b7bebdc343/joblib-0.13.0-py2.py3-none-any.whl#sha256=9002b53b88ae0adb3872164e0846a489b7e112c50087c5e3e1bcee35f18424c4
Best match: joblib 0.13.0
Processing joblib-0.13.0-py2.py3-none-any.whl
Installing joblib-0.13.0-py2.py3-none-any.whl to /project2/yangili1/mengguo/Mn/envs/EnvPy37/lib/python3.6/site-packages
Adding joblib 0.13.0 to easy-install.pth file
Installed /project2/yangili1/mengguo/Mn/envs/EnvPy37/lib/python3.6/site-packages/joblib-0.13.0-py3.6.egg
Searching for h5py
...
Finished processing dependencies for basenji==0.0.1
In the end there was no error, but these warnings still worry me: I don't know whether they affect running the code or the correctness of the results. The 'doc' directory really doesn't contain the mentioned files, so I don't understand the warnings, although the installation ran to completion. I would be very grateful for your views. Thanks!
Best,
Grace
I'm trying to run basenji_train.py on google colab and I got this error:
/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
return f(*args, **kwds)
{'batch_size': 4, 'batch_buffer': 4096, 'link': 'softplus', 'loss': 'poisson', 'optimizer': 'adam', 'adam_beta1': 0.97, 'adam_beta2': 0.98, 'learning_rate': 0.002, 'num_targets': 3, 'target_pool': 128, 'cnn_dropout': 0.1, 'cnn_filter_sizes': [20, 7, 7, 7, 3, 3, 3, 3, 3, 3, 3, 1], 'cnn_filters': [128, 128, 192, 256, 256, 32, 32, 32, 32, 32, 32, 384], 'cnn_pool': [2, 4, 4, 4, 1, 0, 0, 0, 0, 0, 0, 0], 'cnn_dilation': [1, 1, 1, 1, 1, 2, 4, 8, 16, 32, 64, 1], 'cnn_dense': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0]}
Targets pooled by 128 to length 1024
Convolution w/ 3 384x1 filters to final targets
Targets pooled by 128 to length 1024
Convolution w/ 3 384x1 filters to final targets
Model building time 5.505798
Batcher initialized
2018-10-30 21:35:40.047109: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
Initializing...
Initialization time 1.197716
Traceback (most recent call last):
File "bin/basenji_train.py", line 214, in
tf.app.run(main)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "bin/basenji_train.py", line 42, in main
test_epoch_batches=FLAGS.test_epoch_batches)
File "bin/basenji_train.py", line 165, in run
no_steps=FLAGS.no_steps)
TypeError: train_epoch_h5() got an unexpected keyword argument 'fwdrc'
The preprocessing works fine, and I got no errors during installation. Does anyone know how to fix this?
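An "unexpected keyword argument" error typically means the calling script and the code it imports come from different versions: here basenji_train.py passes fwdrc, but the train_epoch_h5() it reached does not accept it. A generic check is to inspect the signature of the function actually being called. The stand-in train_epoch_h5 below, including its parameter names, is hypothetical and only illustrates the check; in practice you would inspect the real method from the basenji package your script imports.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    for param in inspect.signature(func).parameters.values():
        if param.kind == param.VAR_KEYWORD:  # **kwargs accepts anything
            return True
        if param.name == name and param.kind in (
            param.POSITIONAL_OR_KEYWORD,
            param.KEYWORD_ONLY,
        ):
            return True
    return False


# Hypothetical stand-in for an older train_epoch_h5() without fwdrc.
def train_epoch_h5(sess, batcher, epoch_batches=None):
    pass


print(accepts_keyword(train_epoch_h5, "fwdrc"))          # False
print(accepts_keyword(train_epoch_h5, "epoch_batches"))  # True
```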
Hello David,
I would like to use your gene expression prediction modules on my sequences that contain disease-associated SNPs. I have run the commands as explained in the tutorial, but at step 6, when running
python3 basenji_test_genes.py -o ../tutorials/output/gencode_chr9_test --rc -s --table ../tutorials/models/params_small.txt ../tutorials/models/heart/model_best.tf ../tutorials/data/gencode_chr9.h5
I get the ImportError: cannot import name 'infer_replicates' from 'basenji_test_reps'.
I launched the script in a conda environment created as described in your package installation section.
Thank you for your help.
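"cannot import name X from Y" means Python found a module Y that does not define X, which often points to an older or different copy of Y being picked up. A generic way to see which file is loaded and whether it defines the name is sketched below; the standard-library json module serves as a stand-in, and you would substitute basenji_test_reps and infer_replicates. The locate helper is hypothetical.

```python
import importlib
import importlib.util


def locate(module_name, attribute):
    """Report where a module is loaded from and whether it defines `attribute`."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None, False
    module = importlib.import_module(module_name)
    return spec.origin, hasattr(module, attribute)


# Stand-in demo with a standard-library module.
origin, has_name = locate("json", "dumps")
print(origin)    # path to the json package's __init__.py
print(has_name)  # True
```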