xapple / seqenv

Assign environment ontology (EnvO) terms to DNA sequences

License: MIT License

Python 98.42% Shell 1.58%

seqenv's Issues

Bug with upui normalisation

Running sequences through the upui normalisation gives an error:

seqenv AllSites_C05.fa --min_identity 0.95 --num_threads 32 --out_dir All_95_upui --min_coverage 0.95 --max_targets 100 --normalization upui            
seqenv version 1.1.4 (pid 60382)
The exact version of the code is: 4e407e1
Start at: 2016-05-03 13:34:45.570319
--> STEP 1: Parse the input FASTA file.
Elapsed time: 0:00:00.046785
Using: All_95_upui/renamed.fasta
--> STEP 2: Similarity search against the 'nt' database with 32 processes
Elapsed time: 0:01:33.193498
--> STEP 3: Filter out bad hits from the search results
Elapsed time: 0:00:00.017426
--> STEP 4: Parsing the search results
Elapsed time: 0:00:00.027052
--> STEP 5: Setting up the SQLite3 database connection.
Elapsed time: 0:00:00.000913
Got 4114 GI hits and 3851 of them had one or more EnvO terms associated.
--> STEP 6: Computing EnvO term frequencies.
Traceback (most recent call last):
  File "/home/chris/repos/seqenv/seqenv/seqenv", line 68, in <module>
    seqenv.Analysis(input_path, **kwargs).run()
  File "/home/chris/repos/seqenv/seqenv/analysis.py", line 151, in run
    self.outputs.make_all()
  File "/home/chris/repos/seqenv/seqenv/outputs.py", line 40, in make_all
    self.tsv_seq_to_concepts()
  File "/home/chris/repos/seqenv/seqenv/outputs.py", line 93, in tsv_seq_to_concepts
    content = self.df_seqs_concepts.to_csv(None, sep=self.sep, float_format=self.float_format)
  File "/home/chris/repos/seqenv/seqenv/common/cache.py", line 35, in retrieve_from_cache
    else: result = f(self)
  File "/home/chris/repos/seqenv/seqenv/outputs.py", line 53, in df_seqs_concepts
    df = pandas.DataFrame(self.a.seq_to_counts)
  File "/home/chris/repos/seqenv/seqenv/common/cache.py", line 35, in retrieve_from_cache
    else: result = f(self)
  File "/home/chris/repos/seqenv/seqenv/analysis.py", line 396, in seq_to_counts
    if not results: raise Exception("We found no isolation sources with your input. Sorry.")
Exception: We found no isolation sources with your input. Sorry.

Output files seq_to_concepts.tsv and seq_to_names.tsv identical

It appears that the seq_to_names.tsv output file does not have the EnvO IDs mapped to their names, so it is identical to seq_to_concepts.tsv.
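As a rough illustration of the mapping step that seems to be missing, here is a minimal sketch that swaps EnvO IDs in a header row for their names. The function name `rename_concept_columns`, the header layout, and the IDs and names in the usage lines are hypothetical placeholders, not taken from the seqenv code or from the EnvO ontology:

```python
def rename_concept_columns(header, id_to_name):
    """Replace EnvO IDs in a TSV header row with their names.

    IDs missing from the mapping are left unchanged, so an incomplete
    mapping is visible in the output instead of being silently dropped.
    """
    return [id_to_name.get(column, column) for column in header]


# Placeholder IDs and names, for illustration only.
example_header = ["ENVO:PLACEHOLDER1", "ENVO:PLACEHOLDER2", "ENVO:PLACEHOLDER3"]
example_mapping = {
    "ENVO:PLACEHOLDER1": "example name A",
    "ENVO:PLACEHOLDER2": "example name B",
}
renamed_header = rename_concept_columns(example_header, example_mapping)
```

If the two output files are byte-identical, a check like `renamed_header != example_header` on real data would show whether any ID was actually translated.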

Bug if output_dir not set

If output_dir is not specified, the program throws an error:

Traceback (most recent call last):
  File "/home/chris/repos/seqenv/seqenv/seqenv", line 59, in <module>
    seqenv.Analysis(input_path, **kwargs).run()
  File "/home/chris/repos/seqenv/seqenv/analysis.py", line 128, in __init__
    if not os.path.exists(self.out_dir): os.makedirs(self.out_dir)
  File "/usr/lib64/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 17] File exists: 'AllSites_C05.fa/'

This seems to be because it attempts to create a directory with exactly the same name as the input file.
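One possible way to avoid the clash is sketched below, assuming the default output directory is derived from the input path. The names `default_out_dir` and `ensure_dir` and the `_seqenv` suffix are hypothetical and do not come from the seqenv code base:

```python
import os


def default_out_dir(input_path, suffix="_seqenv"):
    """Derive an output directory name that cannot equal the input file name."""
    # Drop any trailing separator and the file extension, then add a suffix.
    base, _extension = os.path.splitext(input_path.rstrip(os.sep))
    return base + suffix + os.sep


def ensure_dir(path):
    """Create `path` if needed and fail clearly if a regular file is in the way."""
    target = path.rstrip(os.sep)
    if os.path.isfile(target):
        raise ValueError("Output directory %r clashes with an existing file." % target)
    if not os.path.isdir(target):
        os.makedirs(target)
    return path
```

With this sketch, an input of `AllSites_C05.fa` would map to the directory `AllSites_C05_seqenv/` rather than `AllSites_C05.fa/`.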

README.md file placement during install

Running 'pip install seqenv' places the seqenv program in /usr/local/bin. However, an error occurs on the first run:

Traceback (most recent call last):
  File "/usr/local/bin/seqenv", line 18, in <module>
    doc_params = re.findall('^### All parameters(.+?)###', readme_contents, flags=re.M|re.DOTALL)[0]
IndexError: list index out of range

I think this is an issue with the placement of the README.md file. Line 13 of /usr/local/bin/seqenv sets the README.md path as:

readme_path = current_dir + '../README.md'

This implies that the README.md file should be in /usr/local. On my server, another README.md file was already there. I downloaded the seqenv README.md file from GitHub and copied it to /usr/local, which seems to have resolved the issue. Seqenv now works.

Could this be avoided in the install, for example by placing the README.md file somewhere else, giving it a more distinctive name, or taking a specific action when another README.md file is already in /usr/local?
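Independently of where the file is installed, the crash itself could be softened. The sketch below reuses the regular expression from the traceback but falls back to a default message when the expected section is missing. The helper name `extract_parameters_doc` and the fallback text are hypothetical, not part of seqenv:

```python
import re


def extract_parameters_doc(readme_contents,
                           fallback="See the seqenv README for the list of parameters."):
    """Return the '### All parameters' section of a README, or a fallback string."""
    # re.findall returns an empty list when the section is absent, for example
    # when an unrelated README.md was read, so check before indexing.
    matches = re.findall('^### All parameters(.+?)###', readme_contents,
                         flags=re.M | re.DOTALL)
    return matches[0].strip() if matches else fallback
```

An unrelated README.md found at the same path would then produce the fallback text instead of an IndexError.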

Changing default number of sequences to use

Hi Lucas,

Rather than defaulting to using the 1000 most abundant sequences, could we just default to using all of them? The --N option can remain as is otherwise as a way for people to speed up analyses.

Thanks,
Chris
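The requested behaviour could look roughly like the sketch below, where leaving out `--N` means "use every sequence". Only the `--N` option name comes from this issue; the parser layout, the program name, and the helper `select_sequences` are hypothetical:

```python
import argparse


def build_parser():
    """Build a toy command-line parser with an optional --N limit."""
    parser = argparse.ArgumentParser(prog="seqenv-sketch")
    parser.add_argument("input_file")
    # Assumption: None stands for "no limit, use all sequences".
    parser.add_argument("--N", type=int, default=None,
                        help="Use only the N most abundant sequences (default: all).")
    return parser


def select_sequences(sequences_by_abundance, n=None):
    """Return the first n sequences, or all of them when n is None.

    The input is assumed to be sorted from most to least abundant.
    """
    sequences = list(sequences_by_abundance)
    return sequences if n is None else sequences[:n]
```

Using `None` as the default keeps "no limit" distinct from any numeric value a user might pass.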

0 GI hits and 0 of them had one or more EnvO terms associated with test.fasta

Hi-

I've been trying for a few days now to get seqenv working without success, so I figured it's time to reach out.

I'm attempting to run seqenv on your test.fasta file against the nt database.

I can use local blastn, i.e.:
blastn -db /fdb/blastdb/nt -query test.fasta >> test.out.txt

And I get results for each OTU given, but when I run seqenv using the same input/db parameters:
seqenv test.fasta --search_db /fdb/blastdb/nt

Start at: 2017-10-30 12:31:20.710219
Got 0 GI hits and 0 of them had one or more EnvO terms associated.
--> STEP 6: Computing EnvO term frequencies.
Traceback (most recent call last):
  File "/home/krajacichbj/.conda/envs/krajpy/bin/seqenv", line 64, in <module>
    seqenv.Analysis(input_path, **kwargs).run()
  File "/home/krajacichbj/.conda/envs/krajpy/lib/python2.7/site-packages/seqenv/analysis.py", line 151, in run
    self.outputs.make_all()
  File "/home/krajacichbj/.conda/envs/krajpy/lib/python2.7/site-packages/seqenv/outputs.py", line 41, in make_all
    self.tsv_seq_to_concepts()
  File "/home/krajacichbj/.conda/envs/krajpy/lib/python2.7/site-packages/seqenv/outputs.py", line 108, in tsv_seq_to_concepts
    content = self.df_seqs_concepts.to_csv(None, sep=self.sep, float_format=self.float_format)
  File "/home/krajacichbj/.conda/envs/krajpy/lib/python2.7/site-packages/seqenv/common/cache.py", line 35, in retrieve_from_cache
    else: result = f(self)
  File "/home/krajacichbj/.conda/envs/krajpy/lib/python2.7/site-packages/seqenv/outputs.py", line 55, in df_seqs_concepts
    df = pandas.DataFrame(self.a.seq_to_counts)
  File "/home/krajacichbj/.conda/envs/krajpy/lib/python2.7/site-packages/seqenv/common/cache.py", line 35, in retrieve_from_cache
    else: result = f(self)
  File "/home/krajacichbj/.conda/envs/krajpy/lib/python2.7/site-packages/seqenv/analysis.py", line 398, in seq_to_counts
    if not results: raise Exception("We found no isolation sources with your input. Sorry.")
Exception: We found no isolation sources with your input. Sorry.

I get nothing back. Is this an issue with the seqenv installation/configuration?
blastn is in my path. When I installed seqenv I had to manually add pygraphviz and orange from conda, but other than that things went smoothly.

I'm pretty new to Python and Unix in general, so if you need something more to help diagnose, please let me know.

Thanks for any help you can offer.

-Ben

Pre-September 2016 nt database

Do you have a source for a pre-September 2016 version of the nt database that I can use for seqenv?

Thank you,

Dan.
