jessieren / deepvirfinder
Identifying viruses from metagenomic data by deep learning
License: Other
Hi, you have made a great tool here.
I have used this tool on a bunch of single-isolate assemblies to try and ID phage regions and have two queries regarding processing these outputs:
Is there any way to extract the fasta coordinates of the phage region which have k-mer matches? Or do I just get a score for the entire contig?
What threshold would you confidently consider phage? Above say, 0.8ish?
Example output
name | len | score | pvalue |
---|---|---|---|
contig_19:67348-75230 | 7882 | 0.999377728 | 0.003021376 |
contig_48:1-8398 | 8397 | 0.746047735 | 0.029401768 |
contig_6:76096-82039 | 5943 | 0.733631253 | 0.030289297 |
contig_4:106376-112319 | 5943 | 0.733631253 | 0.030289297 |
contig_2:364020-370963 | 6943 | 0.689914644 | 0.034009366 |
contig_30:1-6033 | 6032 | 0.682190657 | 0.034689176 |
contig_7:300801-306685 | 5884 | 0.566177189 | 0.046264824 |
contig_15:59051-67163 | 8112 | 0.552826881 | 0.048020999 |
contig_21:82770-90882 | 8112 | 0.552826881 | 0.048020999 |
contig_30:3621-11733 | 8112 | 0.552826881 | 0.048020999 |
contig_43:1-5567 | 5566 | 0.450131565 | 0.083427751 |
Thanks
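A minimal sketch of how the table above could be post-processed. The output shown has one row (and one score) per input sequence, so the coordinates here are simply parsed back out of the sequence names the poster supplied (`contig:start-end`). The helper names and the cutoffs `min_score=0.9` and `max_pvalue=0.05` are hypothetical placeholders, not thresholds recommended by the authors, and the file is assumed to be tab-separated.

```python
import csv
import io
import re

# Hypothetical name layout, taken from the example rows: "contig_19:67348-75230".
REGION = re.compile(r"^(?P<contig>.+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(name):
    """Split 'contig:start-end' into its parts; return (name, None, None) otherwise."""
    match = REGION.match(name)
    if match is None:
        return name, None, None
    return match.group("contig"), int(match.group("start")), int(match.group("end"))


def filter_predictions(handle, min_score=0.9, max_pvalue=0.05):
    """Keep rows whose score and p-value pass the (assumed) cutoffs."""
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        score = float(row["score"])
        if score >= min_score and float(row["pvalue"]) <= max_pvalue:
            contig, start, end = parse_region(row["name"])
            kept.append((contig, start, end, score))
    return kept


example = (
    "name\tlen\tscore\tpvalue\n"
    "contig_19:67348-75230\t7882\t0.999377728\t0.003021376\n"
    "contig_48:1-8398\t8397\t0.746047735\t0.029401768\n"
)
hits = filter_predictions(io.StringIO(example))
print(hits)
```

With a real run, `io.StringIO(example)` would be replaced by an open file handle on the prediction table.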
Hi there,
I was just wondering whether contigs need to be filtered by score as well as by q-value, since the VirFinder/DeepVirFinder papers appear to filter by q-value only?
Thanks
Thanks for your awesome work, which helps me a lot. However, when I use DeepVirFinder I cannot find any guidance on setting the "-l" parameter, so I would like to know a suitable value for my data. (I used MEGAHIT for assembly beforehand.)
DeepVirFinder always goes into a sleep state while the program is still running, before it has finished.
Hi! I downloaded all the training sets used in the paper and concatenated the virus and prokaryote sequences into two separate FASTA files of about 76 MB and 110 GB respectively.
Running the host encoding is taking more than one week... the command used is
python ../encode.py -i ../datasets/training/host-training.fa -l 500 -p host
But the README says that it should take about 5 minutes. Am I doing something wrong?
Thanks
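Hello, I am glad to be able to use DeepVirFinder. However, its output seems to contain only the name, length, score and p-value. How can I obtain a file of the viral sequences, that is, how do I extract the viral sequences from the contigs? DeepVirFinder appears to output only the sequence names, not the sequences themselves. How should I proceed with the downstream analysis?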
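One way to get the sequences is to take the names from the prediction table and pull the matching records out of the original contig FASTA. The sketch below is a plain-Python illustration, not part of DeepVirFinder; `read_fasta`, `first_token` and `select_records` are hypothetical helpers. Because it is not certain here whether the `name` column holds the full FASTA header or only its first word, the match accepts either.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def first_token(header):
    """Return the first whitespace-delimited word of a header, or ''."""
    parts = header.split(None, 1)
    return parts[0] if parts else ""


def select_records(fasta_lines, wanted_names):
    """Keep records whose full header or first token is in wanted_names."""
    wanted = set(wanted_names)
    return [
        (header, seq)
        for header, seq in read_fasta(fasta_lines)
        if header in wanted or first_token(header) in wanted
    ]


fasta = [">contig_1 len=8\n", "ACGTACGT\n", ">contig_2\n", "GGGG\n", "CCCC\n"]
selected = select_records(fasta, ["contig_2"])
for header, seq in selected:
    print(">" + header)
    print(seq)
```

For a real file, `fasta` would be an open file handle, and `wanted_names` would be the names that passed whatever score or p-value filter is being used.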
Running on Windows within Miniconda3 I get the warning below. Does val_acc need to be changed to val_accuracy in the early stopping rules?
C:\Users\...\miniconda3\envs\dvf\lib\site-packages\keras\callbacks\callbacks.py:846: RuntimeWarning: Early stopping conditioned on metric
val_acc which is not available. Available metrics are: val_loss,val_accuracy,loss,accuracy (self.monitor, ','.join(list(logs.keys()))), RuntimeWarning
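The warning lists `val_accuracy` among the available metrics, so on this install the early-stopping rule is watching a name that is never logged. A small sketch of choosing the monitor name from whatever the logs contain; `pick_monitor` is a hypothetical helper and is not in the DeepVirFinder code.

```python
def pick_monitor(available_metrics, candidates=("val_acc", "val_accuracy")):
    """Return the first candidate metric name present in the training logs."""
    for name in candidates:
        if name in available_metrics:
            return name
    raise ValueError(
        "none of {} found in {}".format(list(candidates), list(available_metrics))
    )


# Metric names copied from the warning message above.
logged = ["val_loss", "val_accuracy", "loss", "accuracy"]
monitor = pick_monitor(logged)
print(monitor)
```

The chosen string would then be passed as the `monitor` argument of the Keras `EarlyStopping` callback in place of the hard-coded `val_acc`.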
Hi!
I trained a model using my own datasets, but it raised the error below during viral sequence scanning.
IndexError: list index out of range
I used code:
python dvf.py -i /home/input/contigs.fasta -o /home/trained_model_scanning -l 300 -m /home/trained_model
And get error:
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Using Theano backend.
Any suggestions are appreciated!
Best,
Yi
The code to show the help crashes with:
dvf.py: ERROR: missing required command-line argument
Traceback (most recent call last):
File "dvf.py", line 33, in
filelog.write(prog_base + ": ERROR: missing required command-line argument")
NameError: name 'filelog' is not defined
The filelog is not instantiated anywhere I can see.
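Since `filelog` is undefined at that point, one possible workaround is to write the message to standard error instead. The function below is a hypothetical stand-in for the failing line, shown only to illustrate the idea; it is not the maintainers' fix.

```python
import io
import sys


def report_missing_argument(prog_base, stream=None):
    """Write the usage error to the given stream (standard error by default)."""
    if stream is None:
        stream = sys.stderr
    stream.write(prog_base + ": ERROR: missing required command-line argument\n")


# Capture the message in memory to show what would be printed.
buffer = io.StringIO()
report_missing_argument("dvf.py", stream=buffer)
print(buffer.getvalue(), end="")
```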
conda create --name dvf python=3.6 numpy theano=1.0.3 keras=2.2.4 scikit-learn Biopython h5py
conda activate dvf
python dvf.py -h
python: can't open file 'dvf.py': [Errno 2] No such file or directory
dvf.py -h
dvf.py: command not found
Please let me know?
many thanks,
Rick
Hello there,
thanks for this tool!
I was wondering if you plan to make a release, to make it easier to make analyses reproducible.
Best
Hi there,
Thanks for your work with DeepVirFinder. I've been trialing it with some environmental metagenome data, but was also interested in whether it would be appropriate for use with identifying RNA viruses in metatranscriptome data?
From what I can tell, the provided trained models have been developed including prokaryotes in the host database and only DNA viruses in the virus database. Have you experimented with training equivalent models for RNA viruses at all? And/or do you know of any obvious reason why this might be problematic? (Presumably, if the database of known RNA viruses is much smaller than that of DNA viruses, this might not be robust enough to infer across a broad range of putative RNA viruses? And/or are RNA viruses generally considered to contain enough conserved features that the DeepVirFinder approach wouldn't improve on the other available tools?).
I was also curious why eukaryotes appear to have been omitted from the host database. The VirFinder GitHub page mentions an updated trained model including eukaryote data, but it looks like this was pared back again to just prokaryotes for the development of DeepVirFinder? I currently include a subsequent step to filter out contigs identified as eukaryote-derived (as suggested in the VirFinder docs), but having eukaryotes specifically accounted for within the model would be great.
Thanks in advance for any info, it's much appreciated.
Kind regards,
Mike.
It seems that in the supplementary file "SupplementaryTable1_NCBI_accession.xlsx", most of the records labelled "prokaryote" are not listed by their NCBI accession IDs (they may be given by some other type of ID). It would be good if the accession IDs could be made available.
Dear Jie,
Thanks for this amazing tool.
I am using DeepVirFinder to find viral sequences from contigs from metagenomic data. But there was this error when calculating the q value using p-value:
Here is the whole script:
library(qvalue)
result <- read.csv("/Users/leesungeun/Desktop/DeepVirFinder/test.fasta_gt1000bp_dvfpred.txt", sep='\t')
result$qvalue <- qvalue(result$pvalue)$qvalues
Error in smooth.spline(lambda, pi0, df = smooth.df) :
missing or infinite values in inputs are not allowed
head(result)
name len score pvalue
1 scaffold_18377_c1 4557 0.87375259 0.019865549
2 scaffold_44088_c1 2944 0.98103178 0.009158547
3 scaffold_57332_c1 2580 0.99517071 0.006023869
4 scaffold_65082_c1 2416 0.98514450 0.008516504
5 scaffold_72628_c1 2286 0.13312286 0.172426165
6 scaffold_83131_c1 2133 0.05367623 0.229719012
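The error text says that missing or infinite values reached `smooth.spline`, and the six rows printed by `head(result)` look fine, so a reasonable first check is whether any p-value further down the column is missing or outside [0, 1]. The sketch below does that check in Python and then computes Benjamini-Hochberg adjusted p-values. Note that this is a swapped-in technique: it is not the Storey method implemented by the R `qvalue` package, and its values will generally be equal to or larger than Storey q-values. `bh_qvalues` is a hypothetical helper name.

```python
import math


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    for p in pvalues:
        if p is None or math.isnan(p) or not 0.0 <= p <= 1.0:
            raise ValueError("p-values must be finite and within [0, 1]: {!r}".format(p))
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


qvals = bh_qvalues([0.01, 0.04, 0.03, 0.5])
print(qvals)
```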
Hi,
When I run DeepVirFinder using multiple cores (e.g. 72 cores), I encounter messages like the ones below. DeepVirFinder seems to get slower when these messages appear and uses less than one core on average.
Warning messages:
INFO (theano.gof.compilelock): Waiting for existing lock by process '52117' (I am process '52060')
INFO (theano.gof.compilelock): To manually release the lock, delete /home/user/.theano/compiledir_Linux-5.3--generic-x86_64-with-Ubuntu-18.04-bionic-x86_64-3.6.9-64/lock_dir
INFO (theano.gof.compilelock): Waiting for existing lock by process '52117' (I am process '52054')
INFO (theano.gof.compilelock): To manually release the lock, delete /home/user/.theano/compiledir_Linux-5.3--generic-x86_64-with-Ubuntu-18.04-bionic-x86_64-3.6.9-64/lock_dir
(continues maybe endlessly)
I'm running DeepVirFinder on Ubuntu 18.04, and the messages do not disappear after deleting the .theano directory.
Thanks.
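The messages show several processes waiting on one lock inside a shared Theano compile directory. When several independent DeepVirFinder runs are launched at once, one thing to try is giving each run its own compile directory through Theano's `base_compiledir` flag. This is a sketch under that assumption: `theano_env_for_job` and the `/tmp` scratch location are hypothetical, and it does not change how the worker processes of a single run share their directory.

```python
import os


def theano_env_for_job(job_id, scratch="/tmp", base_env=None):
    """Return an environment dict whose THEANO_FLAGS point at a per-job compiledir."""
    env = dict(os.environ if base_env is None else base_env)
    compiledir = os.path.join(scratch, "theano_job_{}".format(job_id))
    extra = "base_compiledir={}".format(compiledir)
    existing = env.get("THEANO_FLAGS", "")
    # Append to any flags that are already set rather than overwriting them.
    env["THEANO_FLAGS"] = existing + "," + extra if existing else extra
    return env


env = theano_env_for_job(3, base_env={"THEANO_FLAGS": "floatX=float32"})
print(env["THEANO_FLAGS"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting each run.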
Hello, thank you for this useful tool for finding viruses. I ran into a small bug; can you help me?
I have run many samples successfully, but today I hit a bug.
(dvf) qinjunjun@dn1 /data1/qinjunjun/zhu_bingduzu/fifteen_pig_contig/fifteen_pig_contig/Lbig $ python /data1/qinjunjun/zhu_bingduzu/DeepVirFinder/dvf.py -i final_L2_contig.fa -o final_L2_deepvirfinder_out -l 5000
Using Theano backend.
I have a couple dozen sequences that were in my input files that cause the code to get stuck.
The load average goes to 0, and progress through the input file stops. I would like to forward the collection of sequences it gets stuck on so that the problem can be identified and debugged.
Hello,
I am having a problem with my run getting stuck on the first section, loading the models.
I have downloaded the required dependencies, and Python is fully updated.
I input
python dvf.py -i ~/Documents/PairwiseANI/my_seqs.fna -l 1000 -c 2
It returns
Using Theano backend.
1. Loading Models.
model directory /data2/joshcole/DeepVirFinder/models
Traceback (most recent call last):
File "dvf.py", line 131, in <module>
modDict[contigLengthk] = load_model(os.path.join(modDir, modName))
File "/home/ggb_joshcole/miniconda3/envs/dvf/lib/python3.6/site-packages/keras/engine/saving.py", line 419, in load_model
model = _deserialize_model(f, custom_objects, compile)
File "/home/ggb_joshcole/miniconda3/envs/dvf/lib/python3.6/site-packages/keras/engine/saving.py", line 224, in _deserialize_model
model_config = json.loads(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'
Any help with what I'm doing wrong would be greatly appreciated!
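This `'str' object has no attribute 'decode'` failure inside `keras/engine/saving.py` typically appears when h5py 3.x is installed alongside Keras 2.2.4, because h5py 3 returns the stored model config as `str` rather than `bytes`. Other posts in this list report that pinning h5py to 2.10.0 resolves it. The sketch below only checks the installed version; the helper names are hypothetical.

```python
def h5py_major(version_string):
    """Return the major version number from a string such as '2.10.0'."""
    return int(version_string.split(".")[0])


def works_with_keras_224(version_string):
    """Assumption: h5py releases before 3.0 load these models without the error."""
    return h5py_major(version_string) < 3


try:
    import h5py  # only needed to inspect the local install
    print(h5py.__version__, works_with_keras_224(h5py.__version__))
except ImportError:
    print("h5py is not installed in this environment")
```

If the check reports an h5py 3.x release, reinstalling with `h5py=2.10.0` in the conda environment is the fix other users describe.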
I used conda to install dvf following the github steps. But when I submitted my script to HPC, there is a problem:
DeprecationWarning: 'source deactivate' is deprecated. Use 'conda deactivate'.
Using Theano backend.
miniconda3/envs/dvf/lib/python3.6/site-packages/theano/configdefaults.py:560: UserWarning: DeprecationWarning: there is no c++ compiler.This is deprecated and with Theano 0.11 a c++ compiler will be mandatory
warnings.warn("DeprecationWarning: there is no c++ compiler."
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
Traceback (most recent call last):
File "tools/DeepVirFinder/dvf.py", line 132, in
modDict[contigLengthk] = load_model(os.path.join(modDir, modName))
File "miniconda3/envs/dvf/lib/python3.6/site-packages/keras/engine/saving.py", line 419, in load_model
model = _deserialize_model(f, custom_objects, compile)
File "miniconda3/envs/dvf/lib/python3.6/site-packages/keras/engine/saving.py", line 224, in _deserialize_model
model_config = json.loads(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'
But when I run the same script on the local computer, no such problem. It's really strange.
Anyone know how to address this problem? Thanks!
Hi,
I installed the DeepVirFinder dependencies via conda with the command:
conda install python=3.6 numpy theano=1.0.3 keras=2.2.4 scikit-learn Biopython h5py=2.10.0 mkl-service=2.3.0
I did this because I had hit the "AttributeError: 'str' object has no attribute 'decode'" error (#18) and could not import theano for lack of mkl-service.
I solved the problems above and re-installed the DeepVirFinder dependencies, but now I get a mysterious error like this:
'''
You can find the C code in this temporary file: /tmp/theano_compilation_error_kp_1iqu4
Traceback (most recent call last):
File "/test/.conda/envs/DeepVirFinder-1.0/DeepVirFinder/dvf.py", line 131, in
modDict[contigLengthk] = load_model(os.path.join(modDir, modName))
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/keras/engine/saving.py", line 419, in load_model
model = _deserialize_model(f, custom_objects, compile)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/keras/engine/saving.py", line 225, in _deserialize_model
model = model_from_config(model_config, custom_objects=custom_objects)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/keras/engine/saving.py", line 458, in model_from_config
return deserialize(config, custom_objects=custom_objects)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/keras/layers/init.py", line 55, in deserialize
printable_module_name='layer')
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/keras/utils/generic_utils.py", line 145, in deserialize_keras_object
list(custom_objects.items())))
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/keras/engine/network.py", line 1032, in from_config
process_node(layer, node_data)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/keras/engine/network.py", line 991, in process_node
layer(unpack_singleton(input_tensors), **kwargs)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/keras/engine/base_layer.py", line 431, in call
self.build(unpack_singleton(input_shapes))
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/keras/layers/convolutional.py", line 141, in build
constraint=self.kernel_constraint)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/keras/engine/base_layer.py", line 249, in add_weight
weight = K.variable(initializer(shape),
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/keras/initializers.py", line 218, in call
dtype=dtype, seed=self.seed)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/keras/backend/theano_backend.py", line 2600, in random_uniform
return rng.uniform(shape, low=minval, high=maxval, dtype=dtype)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/theano/sandbox/rng_mrg.py", line 872, in uniform
rstates = self.get_substream_rstates(nstreams, dtype)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/theano/configparser.py", line 117, in res
return f(*args, **kwargs)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/theano/sandbox/rng_mrg.py", line 779, in get_substream_rstates
multMatVect(rval[0], A1p72, M1, A2p72, M2)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/theano/sandbox/rng_mrg.py", line 62, in multMatVect
[A_sym, s_sym, m_sym, A2_sym, s2_sym, m2_sym], o, profile=False)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/theano/compile/function.py", line 317, in function
output_keys=output_keys)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/theano/compile/pfunc.py", line 486, in pfunc
output_keys=output_keys)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/theano/compile/function_module.py", line 1841, in orig_function
fn = m.create(defaults)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/theano/compile/function_module.py", line 1715, in create
input_storage=input_storage_lists, storage_map=storage_map)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/theano/gof/link.py", line 699, in make_thunk
storage_map=storage_map)[:3]
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/theano/gof/vm.py", line 1091, in make_all
impl=impl))
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/theano/gof/op.py", line 955, in make_thunk
no_recycling)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/theano/gof/op.py", line 858, in make_c_thunk
output_storage=node_output_storage)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/theano/gof/cc.py", line 1217, in make_thunk
keep_lock=keep_lock)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/theano/gof/cc.py", line 1157, in compile
keep_lock=keep_lock)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/theano/gof/cc.py", line 1620, in cthunk_factory
key=key, lnk=self, keep_lock=keep_lock)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/theano/gof/cmodule.py", line 1181, in module_from_key
module = lnk.compile_cmodule(location)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/theano/gof/cc.py", line 1523, in compile_cmodule
preargs=preargs)
File "/test/.conda/envs/DeepVirFinder-1.0/lib/python3.6/site-packages/theano/gof/cmodule.py", line 2388, in compile_str
(status, compile_stderr.replace('\n', '. ')))
Exception: ('The following error happened while compiling the node', DotModulo(A, s, m, A2, s2, m2), '\n', "Compilation failed (return status=1): /tmp/cczygCtx.s: Assembler messages:. /tmp/cczygCtx.s:7455: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0'. /tmp/cczygCtx.s:7475: Error: no such instruction:
vinserti128 $0x1,%xmm1,%ymm0,%ymm0'. ", '[DotModulo(A, s, m, A2, s2, m2)]')
'''
I have no idea about this error, hope you can help me,
Thank you very much!
Could you please comment on what data was used for model training. Thank you
Total GPU beginner question here, apologies, but grateful for pointers.
Whilst I will likely ultimately use DeepVirFinder on HPC, I am testing it out on a Windows box with NVidia GPU (Quadro K620) as I can iterate quicker. I installed the latest NVidia CUDA and cuDNN drivers following their instructions. I use the environment variable as specified in the training test: THEANO_FLAGS='mode=FAST_RUN,device=cuda0,floatX=float32,GPUARRAY_CUDA_VERSION=80'
However, I cannot find the Windows equivalent for:
source /<path_to_cuda_setup>/setup.sh
source /<path_to_cuDNN_setup>/setup.sh
And it seems to run with CPU regardless of what I do.
Do I need specific versions of CUDA/cuDNN? And do I do anything special to let python use them on Windows?
When trying to run the test, I always get this error.
(dvf) C:\Users\X\DeepVirFinder>python dvf.py -i ./test/crAssphage.fa -o ./test/ -l 300
Using Theano backend.
Loading Models.
model directory C:\Users\X\DeepVirFinder\models
Encoding and Predicting Sequences.
processing line 1
processing line 1389
Using Theano backend.
Loading Models.
model directory C:\Users\X\DeepVirFinder\models
Encoding and Predicting Sequences.
processing line 1
processing line 1389
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\X.conda\envs\dvf\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\X.conda\envs\dvf\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "C:\Users\X.conda\envs\dvf\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\X.conda\envs\dvf\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
run_name="mp_main")
File "C:\Users\X.conda\envs\dvf\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "C:\Users\X.conda\envs\dvf\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "C:\Users\X.conda\envs\dvf\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\X\DeepVirFinder\dvf.py", line 211, in
pool = multiprocessing.Pool(core_num)
File "C:\Users\X.conda\envs\dvf\lib\multiprocessing\context.py", line 119, in Pool
context=self.get_context())
File "C:\Users\X.conda\envs\dvf\lib\multiprocessing\pool.py", line 174, in init
self._repopulate_pool()
File "C:\Users\X.conda\envs\dvf\lib\multiprocessing\pool.py", line 239, in _repopulate_pool
w.start()
File "C:\Users\X.conda\envs\dvf\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Users\X.conda\envs\dvf\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\X.conda\envs\dvf\lib\multiprocessing\popen_spawn_win32.py", line 33, in init
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\X.conda\envs\dvf\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "C:\Users\X.conda\envs\dvf\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
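The traceback shows `multiprocessing.Pool(core_num)` being created at module level in `dvf.py`. On Windows, child processes are started by re-importing the main module, so that line runs again in every child unless it sits behind the guard that the error message describes. A minimal sketch of the idiom; `score_one` is a hypothetical stand-in for the per-sequence prediction, not the real function.

```python
import multiprocessing


def score_one(index):
    """Stand-in for per-sequence work; hypothetical, not the real predictor."""
    return index * index


def main(core_num=2):
    """Create the pool inside a function so importing this module has no side effects."""
    with multiprocessing.Pool(core_num) as pool:
        return pool.map(score_one, range(5))


if __name__ == "__main__":
    # freeze_support() is only required for frozen executables and is harmless otherwise.
    multiprocessing.freeze_support()
    print(main())
```

Applied to `dvf.py`, this would mean moving the model loading and pool creation into a function that is called only under the `if __name__ == "__main__":` guard.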
Just installed DeepVirFinder and configured my conda env, not sure what could be going wrong?
Full traceback:
python /home/linda/DeepVirFinder/dvf.py -i nonredundant_contigs.fasta -m /home/linda/DeepVirFinder/models -o ./deepvirfinder -l 1000 -c 5
Using Theano backend.
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
1. Loading Models.
model directory /home/linda/DeepVirFinder/models
Traceback (most recent call last):
File "/home/linda/DeepVirFinder/dvf.py", line 131, in <module>
modDict[contigLengthk] = load_model(os.path.join(modDir, modName))
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/keras/engine/saving.py", line 419, in load_model
model = _deserialize_model(f, custom_objects, compile)
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/keras/engine/saving.py", line 225, in _deserialize_model
model = model_from_config(model_config, custom_objects=custom_objects)
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/keras/engine/saving.py", line 458, in model_from_config
return deserialize(config, custom_objects=custom_objects)
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/keras/layers/__init__.py", line 55, in deserialize
printable_module_name='layer')
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/keras/utils/generic_utils.py", line 145, in deserialize_keras_object
list(custom_objects.items())))
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/keras/engine/network.py", line 1032, in from_config
process_node(layer, node_data)
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/keras/engine/network.py", line 991, in process_node
layer(unpack_singleton(input_tensors), **kwargs)
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/keras/engine/base_layer.py", line 431, in __call__
self.build(unpack_singleton(input_shapes))
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/keras/layers/convolutional.py", line 141, in build
constraint=self.kernel_constraint)
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/keras/engine/base_layer.py", line 249, in add_weight
weight = K.variable(initializer(shape),
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/keras/initializers.py", line 218, in __call__
dtype=dtype, seed=self.seed)
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/keras/backend/theano_backend.py", line 2600, in random_uniform
return rng.uniform(shape, low=minval, high=maxval, dtype=dtype)
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/theano/sandbox/rng_mrg.py", line 872, in uniform
rstates = self.get_substream_rstates(nstreams, dtype)
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/theano/configparser.py", line 117, in res
return f(*args, **kwargs)
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/theano/sandbox/rng_mrg.py", line 779, in get_substream_rstates
multMatVect(rval[0], A1p72, M1, A2p72, M2)
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/theano/sandbox/rng_mrg.py", line 62, in multMatVect
[A_sym, s_sym, m_sym, A2_sym, s2_sym, m2_sym], o, profile=False)
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/theano/compile/function.py", line 317, in function
output_keys=output_keys)
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/theano/compile/pfunc.py", line 486, in pfunc
output_keys=output_keys)
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/theano/compile/function_module.py", line 1841, in orig_function
fn = m.create(defaults)
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/theano/compile/function_module.py", line 1715, in create
input_storage=input_storage_lists, storage_map=storage_map)
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/theano/gof/link.py", line 699, in make_thunk
storage_map=storage_map)[:3]
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/theano/gof/vm.py", line 1091, in make_all
impl=impl))
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/theano/gof/op.py", line 955, in make_thunk
no_recycling)
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/theano/gof/op.py", line 858, in make_c_thunk
output_storage=node_output_storage)
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/theano/gof/cc.py", line 1217, in make_thunk
keep_lock=keep_lock)
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/theano/gof/cc.py", line 1157, in __compile__
keep_lock=keep_lock)
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/theano/gof/cc.py", line 1609, in cthunk_factory
key = self.cmodule_key()
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/theano/gof/cc.py", line 1300, in cmodule_key
c_compiler=self.c_compiler(),
File "/home/linda/anaconda3/envs/deepvirfinder/lib/python3.6/site-packages/theano/gof/cc.py", line 1379, in cmodule_key_
np.core.multiarray._get_ndarray_c_version())
AttributeError: ('The following error happened while compiling the node', DotModulo(A, s, m, A2, s2, m2), '\n', "module 'numpy.core.multiarray' has no attribute '_get_ndarray_c_version'")
I've noticed that by default DVF uses multiple cores (up to 20 on my machine) and that the flag -c or --core does not change this behavior. Do you know why this might be happening?
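One possible explanation, offered as an assumption rather than a diagnosis, is that the extra load comes from BLAS/OpenMP threads inside NumPy and Theano, which the `-c` flag would not control. Those thread pools usually honour the environment variables below, provided they are set before the numerical libraries are imported. `limit_blas_threads` is a hypothetical helper.

```python
import os


def limit_blas_threads(n):
    """Cap the thread pools of common BLAS/OpenMP backends to n threads."""
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        os.environ[var] = str(n)


# Must run before numpy or theano is imported for the limits to take effect.
limit_blas_threads(2)
print({k: v for k, v in os.environ.items() if k.endswith("_NUM_THREADS")})
```

The same variables can also be exported in the shell before calling `python dvf.py`, which avoids editing the script.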
Hi!
When trying to run the test sample on my server, I got a warning error that I haven't installed mkl-service. I also ran into an incompatibility issue between keras and the h5py version that is automatically installed with python3.60.
I found that h5py= 2.10.0 and mkl-service=2.4.0 work to run the test successfully.
I wanted to recommend adding that to the conda install command you have on the README file to help those who are not familiar with package management.
Best,
Erfan
Hello
I'm trying to run the training example given in the README (https://github.com/jessieren/DeepVirFinder#example) without luck.
Is anyone with a working setup willing to share their pip freeze, or at least the Keras/TensorFlow versions that work?
regards
Eric
I received the error above when running DeepVirFinder. It appears to be an error between h5py and tensorflow. If you downgrade your h5py to 2.10.0 the error will disappear. @jessieren you may want to set the version in the markdown to prevent this issue.
Cheers,
Cody
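Hello,
This tool is great!
I tried training a model, but the accuracy of the resulting model is not high.
I noticed that the current /train_example/tr/virus_tr.fa, i.e. the training data, contains only 1000 entries.
Did you also train on this dataset?
If you have a larger dataset, would it be possible to share it?
Looking forward to your reply (I sent an email two days ago).
Thank you!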
Hi, @jessieren ,
may I know whether there is a plan to support Python 3.7 or later for DeepVirFinder?
Thanks.
Hi,
Thanks for putting together DeepVirFinder.
However, I am having trouble processing one particular FASTA file. I am getting the error below.
2. Encoding and Predicting Sequences.
processing line 1
processing line 7014
Traceback (most recent call last):
File "/nfs/production/interpro/metagenomics/mags-scripts/dependencies/DeepVirFinder/dvf.py", line 212, in <module>
head, score, pvalue = zip(*pool.map(pred, range(0, len(code))))
ValueError: not enough values to unpack (expected 3, got 0)
There is nothing obviously wrong with the FASTA file (I ran it with VirSorter2 and VIBRANT and had no issues). Any ideas on what could be the problem here?
Many thanks in advance.
Best,
Alex
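The message `not enough values to unpack (expected 3, got 0)` is exactly what Python raises when `zip(*...)` is applied to an empty list and the result is unpacked into three names. A plausible cause, stated as an assumption, is that the last batch handed to `pool.map` contained no sequences, for instance because the remaining contigs were shorter than the `-l` cutoff. The sketch shows the failure mode with a guard; `unpack_results` is a hypothetical helper, not code from `dvf.py`.

```python
def unpack_results(results):
    """Split a list of (head, score, pvalue) tuples into three lists.

    Raises a descriptive error instead of the bare unpacking failure
    when no sequences were scored.
    """
    if not results:
        raise ValueError("no sequences were scored; check the length cutoff and input")
    head, score, pvalue = zip(*results)
    return list(head), list(score), list(pvalue)


names, scores, pvalues = unpack_results([("a", 0.9, 0.01), ("b", 0.1, 0.5)])
print(names, scores, pvalues)

try:
    unpack_results([])
except ValueError as err:
    print("guarded:", err)
```

Counting how many sequences in the FASTA file meet the `-l` cutoff is a quick way to test this hypothesis on the failing input.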
Running into an error with the code:
DeepVirFinder> python3 dvf.py -i ./test/crAssphage.fa -o ./test/ -l 300
Using Theano backend.
- Loading Models.
model directory /global/projectb/scratch/snayfach/projects/checkv-db/4_viral/DeepVirFinder/models
Traceback (most recent call last):
File "dvf.py", line 131, in
modDict[contigLengthk] = load_model(os.path.join(modDir, modName))
File "/global/homes/s/snayfach/.conda/envs/dvf/lib/python3.6/site-packages/keras/engine/saving.py", line 417, in load_model
f = h5dict(filepath, 'r')
File "/global/homes/s/snayfach/.conda/envs/dvf/lib/python3.6/site-packages/keras/utils/io_utils.py", line 186, in init
self.data = h5py.File(path, mode=mode)
File "/global/homes/s/snayfach/.conda/envs/dvf/lib/python3.6/site-packages/h5py/_hl/files.py", line 394, in init
swmr=swmr)
File "/global/homes/s/snayfach/.conda/envs/dvf/lib/python3.6/site-packages/h5py/_hl/files.py", line 170, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 85, in h5py.h5f.open
OSError: Unable to open file (unable to lock file, errno = 524, error message = 'Unknown error 524')
Any ideas what could be wrong? Using the latest version from conda
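The `unable to lock file` failure comes from HDF5 trying to take a file lock on the model file, which some network or scratch filesystems do not support. HDF5 1.10 and later read the `HDF5_USE_FILE_LOCKING` environment variable, so one thing to try is disabling locking before h5py is imported. This is a sketch under the assumption that the models directory sits on such a filesystem; `disable_hdf5_locking` is a hypothetical helper.

```python
import os


def disable_hdf5_locking():
    """Ask the HDF5 library not to lock files it opens."""
    os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"


# Must run before h5py (and therefore keras) is imported.
disable_hdf5_locking()
print(os.environ["HDF5_USE_FILE_LOCKING"])
```

Exporting the variable in the shell before running `dvf.py` has the same effect, as does copying the models directory to a local disk.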
Hi there,
Is there a way to have DeepVirFinder available through singularity or docker? I am working on an HPC and conda is not allowed as it causes conflicts.
I tried to get the program working as follows:
$ module load python
$USER/deepvirfinder_env
$USER/deepvirfinder_env/bin/activate
$ pip install numpy theano keras scikit-learn
$ pip install biopython
But when I run the program, I get the following errors:
Using Theano backend.
Traceback (most recent call last):
File "/cvmfs/soft.HPC/easybuild/software/2017/Core/python/3.6.3/lib/python3.6/configparser.py", line 1138, in _unify_values
sectiondict = self._sections[section]
KeyError: 'blas'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/USER/deepvirfinder_env/lib/python3.6/site-packages/theano/configparser.py", line 168, in fetch_val_for_key
return theano_cfg.get(section, option)
File "/cvmfs/soft.HPC/easybuild/software/2017/Core/python/3.6.3/lib/python3.6/configparser.py", line 781, in get
d = self._unify_values(section, vars)
File "/cvmfs/soft.HPC/easybuild/software/2017/Core/python/3.6.3/lib/python3.6/configparser.py", line 1141, in _unify_values
raise NoSectionError(section)
configparser.NoSectionError: No section: 'blas'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/USER/deepvirfinder_env/lib/python3.6/site-packages/theano/configparser.py", line 328, in get
delete_key=delete_key)
File "/home/USER/deepvirfinder_env/lib/python3.6/site-packages/theano/configparser.py", line 172, in fetch_val_for_key
raise KeyError(key)
KeyError: 'blas.ldflags'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/USER/deepvirfinder_env/lib/python3.6/site-packages/theano/configdefaults.py", line 1250, in check_mkl_openmp
import mkl
ModuleNotFoundError: No module named 'mkl'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/USER/deepvirfinder_env/bin/DeepVirFinder/dvf.py", line 53, in <module>
import keras
File "/home/USER/deepvirfinder_env/lib/python3.6/site-packages/keras/__init__.py", line 3, in <module>
from . import utils
File "/home/USER/deepvirfinder_env/lib/python3.6/site-packages/keras/utils/__init__.py", line 6, in <module>
from . import conv_utils
File "/home/USER/deepvirfinder_env/lib/python3.6/site-packages/keras/utils/conv_utils.py", line 9, in <module>
from .. import backend as K
File "/home/USER/deepvirfinder_env/lib/python3.6/site-packages/keras/backend/__init__.py", line 1, in <module>
from .load_backend import epsilon
File "/home/USER/deepvirfinder_env/lib/python3.6/site-packages/keras/backend/load_backend.py", line 87, in <module>
from .theano_backend import *
File "/home/USER/deepvirfinder_env/lib/python3.6/site-packages/keras/backend/theano_backend.py", line 7, in <module>
import theano
File "/home/USER/deepvirfinder_env/lib/python3.6/site-packages/theano/__init__.py", line 124, in <module>
from theano.scan_module import (scan, map, reduce, foldl, foldr, clone,
File "/home/USER/deepvirfinder_env/lib/python3.6/site-packages/theano/scan_module/__init__.py", line 41, in <module>
from theano.scan_module import scan_opt
File "/home/USER/deepvirfinder_env/lib/python3.6/site-packages/theano/scan_module/scan_opt.py", line 60, in <module>
from theano import tensor, scalar
File "/home/USER/deepvirfinder_env/lib/python3.6/site-packages/theano/tensor/__init__.py", line 17, in <module>
from theano.tensor import blas
File "/home/USER/deepvirfinder_env/lib/python3.6/site-packages/theano/tensor/blas.py", line 155, in <module>
from theano.tensor.blas_headers import blas_header_text
File "/home/USER/deepvirfinder_env/lib/python3.6/site-packages/theano/tensor/blas_headers.py", line 987, in <module>
if not config.blas.ldflags:
File "/home/USER/deepvirfinder_env/lib/python3.6/site-packages/theano/configparser.py", line 332, in get
val_str = self.default()
File "/home/USER/deepvirfinder_env/lib/python3.6/site-packages/theano/configdefaults.py", line 1430, in default_blas_ldflags
check_mkl_openmp()
File "/home/USER/deepvirfinder_env/lib/python3.6/site-packages/theano/configdefaults.py", line 1262, in check_mkl_openmp
""")
RuntimeError:
Could not import 'mkl'. Either install mkl-service with conda or set
MKL_THREADING_LAYER=GNU in your environment for MKL 2018.
If you have MKL 2017 install and are not in a conda environment you
can set the Theano flag blas.check_openmp to False. Be warned that if
you set this flag and don't set the appropriate environment or make
sure you have the right version you will get wrong results.
Any thoughts?
Thanks
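The RuntimeError text above names two ways out: install `mkl-service` with conda, or set `MKL_THREADING_LAYER=GNU`. Below is a minimal sketch of the second option, assuming the variable has to be in the environment before Theano is first imported. The helper name `prefer_gnu_threading` is hypothetical and is not part of DeepVirFinder or Theano.

```python
import os


def prefer_gnu_threading(env=os.environ):
    """Set MKL_THREADING_LAYER=GNU unless a value was already chosen.

    Hypothetical helper: `env` is any dict-like mapping, os.environ by default.
    """
    env.setdefault("MKL_THREADING_LAYER", "GNU")
    return env["MKL_THREADING_LAYER"]


# Assumption: this must run before `import keras` / `import theano`.
prefer_gnu_threading()
```

Exporting the same variable in the shell or job script before calling `dvf.py` should have the same effect without touching any code.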
I have used DeepVirFinder many times before, but it has suddenly started failing with an error. I removed the conda environment and installed it again, but the error persists.
Here is the output:
`python dvf.py -i /home/gabrielfernandes/tools/DeepVirFinder/test/crAssphage.fa -o /home/gabrielfernandes/tools/DeepVirFinder/teste/ -c 10`
`arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
Hi,
After importing the DeepVirFinder predictions and p-values into R to estimate the FDR with the qvalue package, I ran into the following error:
"Error in pi0est(p, ...) :
ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use a different range of lambda."
I looked at similar issues posted here and on the qvalue package's GitHub page. My concern is with DeepVirFinder's p-values for all my metagenomes: none of them span the full range from 0 to 1, but rather from 0 to about 0.98, and the p-value distributions look quite anti-conservative (my samples have not been enriched for viruses). I suspect this is why the qvalue package is failing, because some of its assumptions are violated. The truncated p-value range did not occur with the VirFinder p-values, but I have had other challenges with VirFinder, so I have moved to DeepVirFinder. I have attached p-value histograms for the same metagenome, from the DeepVirFinder and VirFinder predictions respectively.
pvalue_hist_DeepVirFinder.pdf
pvalue_hist_Virfinder_pdf.pdf
Please let me know if you have any suggestions - your time and input are greatly appreciated.
Best,
Nikhil
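One possible workaround, sketched here under assumptions: the Benjamini-Hochberg procedure controls the FDR without estimating pi0 (it behaves as if pi0 = 1), so it avoids the `pi0est` step that fails above. This is a different and more conservative procedure than the q-value method used in the papers, and it does not settle whether the p-values themselves are valid. The function name `benjamini_hochberg` is illustrative only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In R, `p.adjust(p, method = "BH")` computes the same adjustment.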
Hello,
I'm using DeepVirFinder on some large FASTA files (200 to 300+ MB each), but it takes around 6 hours to run even with 100 CPUs supplied via the -c 100 option.
I have been reading the documentation and the other issues, which mention that GPUs are preferable, but none are available on our server.
Could you please answer the questions below?
DeepVirFinder isn't working on Ubuntu 20.04 after installing the dependencies successfully...
(dvf)$ python dvf.py -i ./test/crAssphage.fa -o ./test/ -l 300
Using Theano backend.
WARNING (theano.configdefaults): install mkl with `conda install mkl-service`: No module named 'mkl'
1. Loading Models.
model directory /raw_data/DeepVirFinder/models
Traceback (most recent call last):
File "dvf.py", line 131, in <module>
modDict[contigLengthk] = load_model(os.path.join(modDir, modName))
File "/opt/anaconda/envs/dvf/lib/python3.6/site-packages/keras/engine/saving.py", line 419, in load_model
model = _deserialize_model(f, custom_objects, compile)
File "/opt/anaconda/envs/dvf/lib/python3.6/site-packages/keras/engine/saving.py", line 224, in _deserialize_model
model_config = json.loads(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'
and after installing mkl-service...
(dvf) $ conda install mkl-service
(dvf) $ python3.6 dvf.py -i ./test/crAssphage.fa -o ./test/ -l 300
Using Theano backend.
1. Loading Models.
model directory /soft/DeepVirFinder/models
Traceback (most recent call last):
File "dvf.py", line 131, in <module>
modDict[contigLengthk] = load_model(os.path.join(modDir, modName))
File "/opt/anaconda/envs/dvf/lib/python3.6/site-packages/keras/engine/saving.py", line 419, in load_model
model = _deserialize_model(f, custom_objects, compile)
File "/opt/anaconda/envs/dvf/lib/python3.6/site-packages/keras/engine/saving.py", line 224, in _deserialize_model
model_config = json.loads(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'
Any idea of what's going on here?
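A commonly reported cause of this exact `AttributeError` is an h5py release of 3.0 or later combined with Keras 2.2.4: from h5py 3.0 on, string attributes are read back as `str` rather than `bytes`, so the `.decode('utf-8')` call in `keras/engine/saving.py` fails. Whether that is what is happening in this environment is an assumption. The sketch below only reports the installed h5py version; the helper name `h5py_returns_bytes` is hypothetical.

```python
def h5py_returns_bytes(version):
    """True if this h5py version predates 3.0 (string attributes read as bytes)."""
    major = int(version.split(".")[0])
    return major < 3


try:
    import h5py
    # Assumption: Keras 2.2.4 expects the pre-3.0 behaviour.
    print("h5py", h5py.__version__, "pre-3.0:", h5py_returns_bytes(h5py.__version__))
except ImportError:
    print("h5py is not installed in this environment")
```

If the check prints `False`, pinning h5py below 3 in the conda environment (for example 2.10.0) is the workaround usually suggested for this error.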
Dear author, I want to use DeepVirFinder on my metagenomic samples.
Many of the contigs are longer than 3000 bp (some are about 10000-50000 bp).
How does DeepVirFinder ensure the accuracy of predictions for long contigs, given that it uses 150-3000 bp contigs as training data?
How does DeepVirFinder process contigs over 3000 bp: does it simply discard the portion beyond 3000 bp, or handle it in some other way?
Thanks for your answer. Best wishes.
Hi everyone,
I'm running into a problem when installing dvf on a server:
mamba install python=3.8 numpy theano=1.0.3 keras=2.2.4 scikit-learn Biopython h5py
...
Encountered problems while solving:
- package theano-1.0.3-py27_0 requires python >=2.7,<2.8.0a0, but none of the providers can be installed
I tried changing the theano version to 1.0.5, but then keras fails to resolve:
- package keras-2.2.4-0 requires keras-base 2.2.4.*, but none of the providers can be installed
Adding keras-base leads to a dead end: keras-base requires python=2.7, which cannot be installed alongside python=3.6 in the same environment (see https://stackoverflow.com/questions/67595045/how-can-i-install-two-versions-of-python-on-a-single-conda-environment).
- package keras-base-2.2.4-py27_0 requires python >=2.7,<2.8.0a0, but none of the providers can be installed
Am I missing something obvious? Thank you, any help is greatly appreciated!
Cheers
Hi!
I'm trying to run DeepVirFinder on the test metagenome provided by the authors (CRC_meta.fa), just to see how long it takes to finish...
It has been running for over 27 hours now. Is that normal, or is something corrupted?
Used command:
python dvf.py -i test/CRC_meta.fa -l 1000 -c 14
I have:
Hi!
I am using DeepVirFinder on RNA-seq experiments for virus discovery. But when I try to calculate the q-values, for both my dataset and the CRC_meta test, the following error appears:
Error in smooth.spline(lambda, pi0, df = smooth.df) :
missing or infinite values in inputs are not allowed.
One detail: the p-values are 0.0 for all the matches. How can I fix this?
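Before passing the p-value column to the qvalue package, it may help to confirm what it actually contains. The sketch below is a sanity check only, assuming the p-values have already been read into a list of floats; it does not explain or fix the `smooth.spline` error, and the name `summarize_pvalues` is hypothetical.

```python
import math


def summarize_pvalues(pvalues):
    """Count values that commonly cause trouble in downstream FDR estimation."""
    finite = [p for p in pvalues if not math.isnan(p) and not math.isinf(p)]
    return {
        "n": len(pvalues),
        "non_finite": len(pvalues) - len(finite),
        "zeros": sum(1 for p in finite if p == 0.0),
        "out_of_range": sum(1 for p in finite if p < 0.0 or p > 1.0),
        "max": max(finite) if finite else None,
    }
```

A summary in which `zeros` equals `n` would confirm the observation above that every reported p-value is exactly 0.0.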