hostg's People

Contributors

kennthshang

hostg's Issues

Database Extension

Dear authors, thanks for your nice tool.
I have some questions:

  1. Were the results in your paper produced with all of prokaryote.csv, or only with the database on GitHub?
  2. If I download all the genomes listed in prokaryote.csv, do I have to retrain the model?

Knowledge graph and run_Speed_up.py Error

Traceback (most recent call last):
  File "run_phage_host.py", line 64, in <module>
    _ = subprocess.check_call(blast_cmd, shell=True)
  File "/fslhome/fslcollab273/.conda/envs/Host/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'blastn -query out/query.fa -db blast_db/bin -outfmt 6 -out blast_tab/bin.tab -num_threads 8' returned non-zero exit status 2.
folder Cyber_data/ exist... cleaning dictionary
Cannot clean your folder... permission denied
cat: pred/*: No such file or directory
folder input exist... cleaning dictionary
Dictionary cleaned
folder pred exist... cleaning dictionary
folder Split_files exist... cleaning dictionary
Dictionary cleaned
folder tmp_pred exist... cleaning dictionary
Knowledge Graph Error for file contig_0
Knowledge Graph Error for file contig_1
Knowledge Graph Error for file contig_2
Knowledge Graph Error for file contig_3
phage_host Error for file contig_4
Pre-trained CNN Error for file contig_5
Traceback (most recent call last):
  File "run_Speed_up.py", line 157, in <module>
    out = subprocess.check_call(cmd, shell=True)
  File "/fslhome/fslcollab273/.conda/envs/Host/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'cat pred/* > final_prediction.csv' returned non-zero exit status 1.

We were so close I think. Any ideas?
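
A note on reading this log: the final CalledProcessError from cat looks like a downstream symptom rather than the root cause. pred/ is empty because the earlier steps already failed, starting with blastn. The BLAST+ manual lists exit status 2 as an error in the BLAST database, so checking that blast_db/bin was actually built is probably the first thing to try. The sketch below is not HostG code; merge_predictions is a hypothetical replacement for the cat step that reports an empty pred/ directly instead of failing with a shell error.

```python
from pathlib import Path


def merge_predictions(pred_dir, out_file):
    """Concatenate every file in pred_dir into out_file.

    Hypothetical stand-in for the shell `cat` step; it raises a
    descriptive error when no per-contig prediction file exists.
    """
    pred_dir = Path(pred_dir)
    # Sort for a stable order, similar to shell glob expansion.
    parts = sorted(p for p in pred_dir.glob("*") if p.is_file()) if pred_dir.is_dir() else []
    if not parts:
        raise FileNotFoundError(
            f"no prediction files in {pred_dir}; check the earlier "
            "BLAST / knowledge-graph / CNN steps for errors"
        )
    with open(out_file, "w") as out:
        for part in parts:
            out.write(part.read_text())
    return len(parts)
```

With a check like this, the run would stop with a message that points back to the failed upstream steps.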

Output of score values

Hi there,

Thanks for your reply to my question in #2

I just had a couple of quick, unrelated follow-up questions:

  • Is there currently an output file that includes the score for each match (i.e. the threshold referenced by the -t parameter) to enable filtering after the completion of the run? I may have just overlooked it in the output files, but hadn't come across this info yet.
    • This would be useful if, e.g., -t 0 was set for the run but the user then wished to filter by a threshold (e.g. 0.8) downstream.
    • And/or, I'm guessing different scores might alter whether you are confident in taking the host match at a higher taxonomic rank (e.g. phylum) even if the prediction doesn't seem confident at a lower one (e.g. genus). And so, presumably the user could set their own thresholds for which taxonomic rank to take for the host prediction based on the output score? (Perhaps 0.7 might be appropriate to take the phylum level prediction, but with 0.9 you might be confident with the genus prediction?)
  • I also just spotted that in config.py the default for --dataset is 'Caudoviridae'. Does this imply that the model is specifically designed to only really cover Caudoviridae virus-host pairings, or is that line in the script there unrelated to how the actual model was trained?

Thanks again!
Mike.
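
To make the rank-specific threshold idea concrete, here is a minimal sketch. It assumes a per-rank score is available for each contig, which this thread does not confirm HostG writes out. The RANKS order, the pick_rank function and the example thresholds (0.9 for genus, 0.7 for phylum, taken from the question above) are all illustrative.

```python
# Ranks ordered from most to least specific (an assumption for this sketch).
RANKS = ["genus", "family", "order", "class", "phylum"]


def pick_rank(scores, thresholds):
    """Return the most specific rank whose score meets its threshold, or None.

    scores:     dict mapping rank name -> prediction score for one contig
    thresholds: dict mapping rank name -> minimum acceptable score;
                ranks without an entry default to 1.0
    """
    for rank in RANKS:
        if rank in scores and scores[rank] >= thresholds.get(rank, 1.0):
            return rank
    return None


# Example: a genus score of 0.75 fails the 0.9 cut-off, but the phylum passes 0.7.
thresholds = {"genus": 0.9, "phylum": 0.7}
chosen = pick_rank({"genus": 0.75, "phylum": 0.75}, thresholds)
```

Running with -t 0 and filtering afterwards in this way would let the same run be reused with different cut-offs.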

invalid literal for int() with base 8

I ran into a problem when running this command:

python run_Speed_up.py --contigs test_contigs.fa --len 8000 --t 0

error:

folder input exist... cleaning dictionary
folder pred exist... cleaning dictionary
folder Split_files exist... cleaning dictionary
folder tmp_pred exist... cleaning dictionary
folder Cyber_data/ exist... cleaning dictionary
Capturing compressed features
Running with cpu
Traceback (most recent call last):
  File "/home/wfugui/.conda/envs/Host/lib/python3.7/tarfile.py", line 187, in nti
    n = int(s.strip() or "0", 8)
ValueError: invalid literal for int() with base 8: 'r_v2\nq\x03('

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/wfugui/.conda/envs/Host/lib/python3.7/tarfile.py", line 2289, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/home/wfugui/.conda/envs/Host/lib/python3.7/tarfile.py", line 1095, in fromtarfile
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
  File "/home/wfugui/.conda/envs/Host/lib/python3.7/tarfile.py", line 1037, in frombuf
    chksum = nti(buf[148:156])
  File "/home/wfugui/.conda/envs/Host/lib/python3.7/tarfile.py", line 189, in nti
    raise InvalidHeaderError("invalid header")
tarfile.InvalidHeaderError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/wfugui/.conda/envs/Host/lib/python3.7/site-packages/torch/serialization.py", line 556, in _load
    return legacy_load(f)
  File "/home/wfugui/.conda/envs/Host/lib/python3.7/site-packages/torch/serialization.py", line 467, in legacy_load
    with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar, \
  File "/home/wfugui/.conda/envs/Host/lib/python3.7/tarfile.py", line 1593, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/home/wfugui/.conda/envs/Host/lib/python3.7/tarfile.py", line 1623, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/home/wfugui/.conda/envs/Host/lib/python3.7/tarfile.py", line 1486, in __init__
    self.firstmember = self.next()
  File "/home/wfugui/.conda/envs/Host/lib/python3.7/tarfile.py", line 2301, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_CNN.py", line 101, in <module>
    pretrained_dict=torch.load(args.classifier, map_location='cpu')
  File "/home/wfugui/.conda/envs/Host/lib/python3.7/site-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/wfugui/.conda/envs/Host/lib/python3.7/site-packages/torch/serialization.py", line 560, in _load
    raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
RuntimeError: Params.pkl is a zip archive (did you mean to use torch.jit.load()?)
Pre-trained CNN Error for file contig_0
cat: pred/*: No such file or directory
Traceback (most recent call last):
  File "run_Speed_up.py", line 157, in <module>
    out = subprocess.check_call(cmd, shell=True)
  File "/home/wfugui/.conda/envs/Host/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'cat pred/* > final_prediction.csv' returned non-zero exit status 1.

I don't know what happened.
Thanks!
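
A possible explanation, not confirmed by the maintainer in this thread: since PyTorch 1.6, torch.save writes a zip-based file by default, and older PyTorch releases fail on such files with exactly this "is a zip archive" message after the legacy tar loader gives up. If that is the cause here, upgrading PyTorch in the Host environment should help, as should re-saving Params.pkl from a newer PyTorch with torch.save(..., _use_new_zipfile_serialization=False). The sketch below uses only the standard library to check which kind of file Params.pkl is; checkpoint_format is a hypothetical helper name.

```python
import zipfile


def checkpoint_format(path):
    """Return "zip" if the file is a zip container, else "legacy-or-other".

    A "zip" result is consistent with a checkpoint written by
    PyTorch >= 1.6 using the default serialization.
    """
    return "zip" if zipfile.is_zipfile(path) else "legacy-or-other"


# Typical use, run from the HostG directory (shown as comments so the
# snippet does not depend on the file or on PyTorch being present):
#   import torch
#   print(torch.__version__)
#   print(checkpoint_format("Params.pkl"))
```

If the file is reported as "zip" and torch.__version__ is below 1.6, the version mismatch is the likely culprit.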

An error occurred while running run_Speed_up.py with example data

Hi KennthShang,
Thank you very much for providing an excellent tool to predict the host of prokaryotic viruses. When I run the test data with the default command, python3 raises the errors shown below.
How do I fix this problem?

(Host) [xubu@mn01 HostG]$ python3 run_Speed_up.py --contigs test_contigs.fa --len 8000 --t 0
folder input exist... cleaning dictionary
folder pred exist... cleaning dictionary
folder Split_files exist... cleaning dictionary
folder tmp_pred exist... cleaning dictionary
folder Cyber_data/ exist... cleaning dictionary
Capturing compressed features
Running with cpu
Traceback (most recent call last):
  File "/tiagor_home/xubu/miniconda3/envs/Host/lib/python3.7/tarfile.py", line 187, in nti
    n = int(s.strip() or "0", 8)
ValueError: invalid literal for int() with base 8: 'r_v2\nq\x03('

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tiagor_home/xubu/miniconda3/envs/Host/lib/python3.7/tarfile.py", line 2289, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/tiagor_home/xubu/miniconda3/envs/Host/lib/python3.7/tarfile.py", line 1095, in fromtarfile
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
  File "/tiagor_home/xubu/miniconda3/envs/Host/lib/python3.7/tarfile.py", line 1037, in frombuf
    chksum = nti(buf[148:156])
  File "/tiagor_home/xubu/miniconda3/envs/Host/lib/python3.7/tarfile.py", line 189, in nti
    raise InvalidHeaderError("invalid header")
tarfile.InvalidHeaderError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tiagor_home/xubu/miniconda3/envs/Host/lib/python3.7/site-packages/torch/serialization.py", line 556, in _load
    return legacy_load(f)
  File "/tiagor_home/xubu/miniconda3/envs/Host/lib/python3.7/site-packages/torch/serialization.py", line 467, in legacy_load
    with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar, \
  File "/tiagor_home/xubu/miniconda3/envs/Host/lib/python3.7/tarfile.py", line 1593, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/tiagor_home/xubu/miniconda3/envs/Host/lib/python3.7/tarfile.py", line 1623, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/tiagor_home/xubu/miniconda3/envs/Host/lib/python3.7/tarfile.py", line 1486, in __init__
    self.firstmember = self.next()
  File "/tiagor_home/xubu/miniconda3/envs/Host/lib/python3.7/tarfile.py", line 2301, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_CNN.py", line 101, in <module>
    pretrained_dict=torch.load(args.classifier, map_location='cpu')
  File "/tiagor_home/xubu/miniconda3/envs/Host/lib/python3.7/site-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/tiagor_home/xubu/miniconda3/envs/Host/lib/python3.7/site-packages/torch/serialization.py", line 560, in _load
    raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
RuntimeError: Params.pkl is a zip archive (did you mean to use torch.jit.load()?)
Pre-trained CNN Error for file contig_0
cat: pred/*: No such file or directory
Traceback (most recent call last):
  File "run_Speed_up.py", line 157, in <module>
    out = subprocess.check_call(cmd, shell=True)
  File "/tiagor_home/xubu/miniconda3/envs/Host/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'cat pred/* > final_prediction.csv' returned non-zero exit status 1.

Error in Prediction

I get a lot of errors in this format and I am not sure how to interpret them:

(Host) bash-4.2$ python run_Speed_up.py --contigs renamed_VF_VS_cdhit.fa --len 8000 --t 0
rm: cannot remove ‘dataset/*’: No such file or directory
Capturing compressed features
Running with cpu
Traceback (most recent call last):
  File "/fslhome/fslcollab273/.conda/envs/Host/lib/python3.7/tarfile.py", line 187, in nti
    n = int(s.strip() or "0", 8)
ValueError: invalid literal for int() with base 8: 'r_v2\nq\x03('

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/fslhome/fslcollab273/.conda/envs/Host/lib/python3.7/tarfile.py", line 2289, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/fslhome/fslcollab273/.conda/envs/Host/lib/python3.7/tarfile.py", line 1095, in fromtarfile
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
  File "/fslhome/fslcollab273/.conda/envs/Host/lib/python3.7/tarfile.py", line 1037, in frombuf
    chksum = nti(buf[148:156])
  File "/fslhome/fslcollab273/.conda/envs/Host/lib/python3.7/tarfile.py", line 189, in nti
    raise InvalidHeaderError("invalid header")
tarfile.InvalidHeaderError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/fslhome/fslcollab273/.conda/envs/Host/lib/python3.7/site-packages/torch/serialization.py", line 556, in _load
    return legacy_load(f)
  File "/fslhome/fslcollab273/.conda/envs/Host/lib/python3.7/site-packages/torch/serialization.py", line 467, in legacy_load
    with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar, \
  File "/fslhome/fslcollab273/.conda/envs/Host/lib/python3.7/tarfile.py", line 1593, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/fslhome/fslcollab273/.conda/envs/Host/lib/python3.7/tarfile.py", line 1623, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/fslhome/fslcollab273/.conda/envs/Host/lib/python3.7/tarfile.py", line 1486, in __init__
    self.firstmember = self.next()
  File "/fslhome/fslcollab273/.conda/envs/Host/lib/python3.7/tarfile.py", line 2301, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_CNN.py", line 101, in <module>
    pretrained_dict=torch.load(args.classifier, map_location='cpu')
  File "/fslhome/fslcollab273/.conda/envs/Host/lib/python3.7/site-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/fslhome/fslcollab273/.conda/envs/Host/lib/python3.7/site-packages/torch/serialization.py", line 560, in _load
    raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
RuntimeError: Params.pkl is a zip archive (did you mean to use torch.jit.load()?)
Pre-trained CNN Error for file contig_0
folder Cyber_data/ exist... cleaning dictionary
Capturing compressed features

Currently necessary to run HostG from within the HostG directory?

Hi there,

Thanks for your work on HostG. It looks like a great tool, and I'm keen to give it a test run with some current data we're working on.

I just wanted to check if I'm missing something, but from what I can tell, is it currently necessary to run the tool from within the HostG/ directory?

I.e. The current calls to other python scripts in the format cmd = "python run_CNN.py" appear to only look within the working directory for run_CNN.py even if HostG/ is added to $PATH. Similarly, the call to the database (dataset/) appears to be hardcoded as being within the current directory (e.g. within run_KnowledgeGraph.py: pkl.load(open("dataset/phage2id.dict",'rb')) ).

Thanks again for all your work on this, and I'm looking forward to seeing how the outputs from our data look.

Kind regards,
Mike.

(p.s. I installed simply by cloning the repo rather than via anaconda, but perhaps it was written with the assumption that it only be run from directly within the conda environment?)
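
For illustration, one common way to make such calls independent of the working directory is to resolve sibling scripts and the dataset/ folder relative to the location of the running script. This is only a sketch of that pattern, not HostG's code: install_dir, build_command and dataset_file are hypothetical helper names, and passing __file__ from inside run_Speed_up.py is an assumption about how they would be used.

```python
import sys
from pathlib import Path


def install_dir(script_file):
    """Directory containing script_file, regardless of the current working directory."""
    return Path(script_file).resolve().parent


def build_command(script_file, sibling, *args):
    """Argument list that runs a sibling script with the current interpreter."""
    return [sys.executable, str(install_dir(script_file) / sibling), *args]


def dataset_file(script_file, name):
    """Path to a file inside the dataset/ folder next to script_file."""
    return install_dir(script_file) / "dataset" / name


# Hypothetical use inside run_Speed_up.py or run_KnowledgeGraph.py:
#   subprocess.check_call(build_command(__file__, "run_CNN.py"))
#   pkl.load(open(dataset_file(__file__, "phage2id.dict"), "rb"))
```

Passing an argument list with sys.executable also avoids shell=True and ensures the sub-scripts run under the same interpreter as the caller. Output folders such as pred/ would still be created relative to the working directory unless they are handled separately.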
