takaram / kofam_scan

CLI tool to annotate genes with KOfam

Home Page: https://www.genome.jp/tools/kofamkoala/

License: MIT License

Ruby 99.79% Emacs Lisp 0.21%

kofam_scan's Issues

ruby/2.6.0/open3.rb:213:in `spawn': Permission denied (Errno::EACCES)

Hey @takaram

I am running kofam_scan with the paths to parallel and hmmsearch set in the config.yml file, but I am getting the following error.

My config.yml file:

profile: /work/student/jigyasa-arora/kofamscan/db/profiles

# Path to the KO list file
ko_list: /work/student/jigyasa-arora/kofamscan/db/ko_list

# Path to an executable file of hmmsearch
# You do not have to set this if it is in your $PATH
hmmsearch: /apps/free72/hmmer/3.1b2/bin

# Path to an executable file of GNU parallel
# You do not have to set this if it is in your $PATH
parallel: /home/j/jigyasa-arora/local/parallel-20191022/src

The command I am running:
$ /work/student/jigyasa-arora/kofamscan/bin/kofamscan-1.1.0/exec_annotation -o 230-01-kopfam 230-01-prokka.faa

The error:
Traceback (most recent call last):
8: from /work/student/jigyasa-arora/kofamscan/bin/kofamscan-1.1.0/exec_annotation:7:in `<main>'
7: from /work/student/jigyasa-arora/kofamscan/bin/kofamscan-1.1.0/lib/kofam_scan/cli.rb:21:in `run'
6: from /work/student/jigyasa-arora/kofamscan/bin/kofamscan-1.1.0/lib/kofam_scan/executor.rb:8:in `execute'
5: from /work/student/jigyasa-arora/kofamscan/bin/kofamscan-1.1.0/lib/kofam_scan/executor.rb:35:in `execute'
4: from /work/student/jigyasa-arora/kofamscan/bin/kofamscan-1.1.0/lib/kofam_scan/executor.rb:74:in `run_hmmsearch'
3: from /work/student/jigyasa-arora/kofamscan/bin/kofamscan-1.1.0/lib/kofam_scan/parallel.rb:27:in `exec'
2: from /home/j/jigyasa-arora/lib/ruby/2.6.0/open3.rb:101:in `popen3'
1: from /home/j/jigyasa-arora/lib/ruby/2.6.0/open3.rb:213:in `popen_run'
/home/j/jigyasa-arora/lib/ruby/2.6.0/open3.rb:213:in `spawn': Permission denied - /home/j/jigyasa-arora/local/parallel-20191022 (Errno::EACCES)
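
The path in the error message appears to be a directory, and the hmmsearch and parallel entries in the config above end in bin and src, whereas the config comments ask for the path to an executable file. A minimal Ruby sketch of a check that could be run on each configured path is shown below. The helper name check_tool_path is made up for illustration and is not part of kofam_scan.

```ruby
require 'rbconfig'

# Hypothetical helper, not part of kofam_scan: explain why a configured
# tool path cannot be spawned as a program. Returns nil if it looks usable.
def check_tool_path(path)
  return "#{path}: does not exist" unless File.exist?(path)
  return "#{path}: is a directory, expected an executable file" if File.directory?(path)
  return "#{path}: is not executable" unless File.executable?(path)

  nil
end

# Demo: a directory is rejected, the running Ruby interpreter is accepted.
puts check_tool_path(Dir.pwd).inspect
puts check_tool_path(RbConfig.ruby).inspect
```

If the check reports a directory, pointing the config entry at the program file inside that directory is the likely fix.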

Issue when I run the analysis

Hello,

I've tried the command the tutorial suggests
$ ./exec_annotation -o output.tsv input.fasta but when I run it in my terminal, I get the message "Error: KO list not given"

I've also tried this command:

$ ./exec_annotation -p /path/to/profiles/directory -k /path/to/ko_list/directory -o output.tsv input.fasta --cpu=8

but I get this message

/home/hanna/kofamscan/bin/kofam_scan-1.3.0/lib/kofam_scan/ko.rb:11:in `gets': Is a directory @ io_fillbuf - fd:5 /home/hanna/kofamscan/db/ (Errno::EISDIR)
from /home/hanna/kofamscan/bin/kofam_scan-1.3.0/lib/kofam_scan/ko.rb:11:in `parse'
from /home/hanna/kofamscan/bin/kofam_scan-1.3.0/lib/kofam_scan/executor.rb:80:in `block in parse_ko'
from /home/hanna/kofamscan/bin/kofam_scan-1.3.0/lib/kofam_scan/executor.rb:80:in `open'
from /home/hanna/kofamscan/bin/kofam_scan-1.3.0/lib/kofam_scan/executor.rb:80:in `parse_ko'
from /home/hanna/kofamscan/bin/kofam_scan-1.3.0/lib/kofam_scan/executor.rb:22:in `execute'
from /home/hanna/kofamscan/bin/kofam_scan-1.3.0/lib/kofam_scan/executor.rb:8:in `execute'
from /home/hanna/kofamscan/bin/kofam_scan-1.3.0/lib/kofam_scan/cli.rb:21:in `run'
from ./exec_annotation:7:in `<main>'

I've searched for a way to solve this issue but none of the solutions worked. 😣

parallel.rb:28:in `write': Broken pipe (Errno::EPIPE)

Hi,

I'm on a HPC system. That means users don't have write permissions to the kofam folder.
When I execute the following:

./exec_annotation -o result -p $profile -k $ko $FA

I get

Ignoring etc-1.2.0 because its extensions are not built. Try: gem pristine etc --version 1.2.0
Traceback (most recent call last):
        10: from ./exec_annotation:7:in `<main>'
         9: from /scratch/students/apptest/2022-09-23-kofam/kofambin/lib/kofam_scan/cli.rb:21:in `run'
         8: from /scratch/students/apptest/2022-09-23-kofam/kofambin/lib/kofam_scan/executor.rb:8:in `execute'
         7: from /scratch/students/apptest/2022-09-23-kofam/kofambin/lib/kofam_scan/executor.rb:35:in `execute'
         6: from /scratch/students/apptest/2022-09-23-kofam/kofambin/lib/kofam_scan/executor.rb:107:in `run_hmmsearch'
         5: from /scratch/students/apptest/2022-09-23-kofam/kofambin/lib/kofam_scan/parallel.rb:27:in `exec'
         4: from /home/apps/ruby/2.7.2/lib/ruby/gems/2.7.0/gems/open3-0.1.1/lib/open3.rb:102:in `popen3'
         3: from /home/apps/ruby/2.7.2/lib/ruby/gems/2.7.0/gems/open3-0.1.1/lib/open3.rb:227:in `popen_run'
         2: from /scratch/students/apptest/2022-09-23-kofam/kofambin/lib/kofam_scan/parallel.rb:28:in `block in exec'
         1: from /scratch/students/apptest/2022-09-23-kofam/kofambin/lib/kofam_scan/parallel.rb:28:in `puts'
/scratch/students/apptest/2022-09-23-kofam/kofambin/lib/kofam_scan/parallel.rb:28:in `write': Broken pipe (Errno::EPIPE)

Any idea or hint in the right direction would be appreciated.

Best regards

What is a good e-value cut-off?

Hello!

It's not clear to me what a good e-value cut-off would be.

Is there guidance?

Thanks
Mick

config.rb:26:in `initialize': undefined method `map' for "profile:profiles":String (NoMethodError) }.merge(initial_values.map { |k, v| [k.intern, v] }.to_h) Did you mean? tap

Hi,
I set up and downloaded kofam_scan-1.3.0 with the following steps:
conda create -n kofamscan
conda activate kofamscan
conda install -c conda-forge ruby
conda install -c bioconda hmmer
conda install -c conda-forge parallel

mkdir KofamKOALA && cd KofamKOALA
wget https://www.genome.jp/ftp/db/kofam/ko_list.gz
wget https://www.genome.jp/ftp/db/kofam/profiles.tar.gz
gunzip ko_list.gz
tar -xzvf profiles.tar.gz

But when I run "exec_annotation -o Dlongan.querryko --cpu 2 --format mapper -E 1e-5 d.pep.fasta", it reports the following error:
config.rb:26:in `initialize': undefined method `map' for "/path/profiles":String (NoMethodError)
}.merge(initial_values.map { |k, v| [k.intern, v] }.to_h)
^^^^
Did you mean? tap

I have no idea how to solve this problem. Can you give me some help? Thanks.

wen
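
The `":String` part of the message suggests that the config file was parsed as one plain string rather than as key/value pairs. In YAML, a colon only separates a key from its value when it is followed by a space, so `profile:profiles` is a single string. A small Ruby sketch of that behaviour, using only the standard yaml library (the sample strings are illustrative):

```ruby
require 'yaml'

# "key:value" without a space after the colon is one plain scalar (a String).
bad = YAML.safe_load("profile:profiles")

# With a space after the colon it becomes a mapping (a Hash).
good = YAML.safe_load("profile: profiles")

puts bad.class
puts good.class
puts good.inspect
```

A config file that contains only a path, with no `profile:` key in front of it, would be read as a String in the same way.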

Unknown KO Error

Hello! I've come across a strange issue that happens when running multiple instances of kofamscan at once. "Error: Unknown KO: /dev/null" is thrown for one of the running instances while the other completes. This makes it difficult to run in batches. Is there a way around this problem?

Command:

./exec_annotation --cpu 20 -p ../../../2023/annotation_comparison/analysis/profiles/ -k ../../../2023/annotation_comparison/analysis/ko_list --format mapper -o ../../kegg_annotations/147893.tsv ../../translated_files/GTDB/faa/147893.faa

Mmseqs profiles for speed up

Dear @takaram, do you think it would be possible to make an mmseqs2 implementation of kofam_scan? mmseqs2 profiles work analogously to hmmer profiles but should be 300 times faster.

I can help with the implementation.

What do you think?

Guidance for KOfam score threshold adjustment (threshold-scale)

Hello,

I have a question regarding the adaptive score threshold of the KOfam database and the possibility to adjust it in KofamScan with the option -T, --threshold-scale.

-T, --threshold-scale=VALUE
The score thresholds are multiplied by VALUE. For example, with -T2 option, the thresholds become twice as strict.

Do you have any guidance or experience on whether to adjust the score threshold, and how to choose it?

In the paper by Takuya Aramaki et al., "KofamKOALA: KEGG Ortholog assignment based on profile HMM and adaptive score threshold" (DOI https://doi.org/10.1093/bioinformatics/btz859), I read that the KOfam database of profile hidden Markov models (pHMMs) contains an adaptive score threshold that is pre-computed for every KO family. If a new sequence has an hmmsearch score above this threshold, it is considered a reliable match, and KofamScan highlights it with an asterisk (*) in the output file.
The paper describes the adaptive score threshold as being determined by maximizing the F-measure over the sequence similarity scores (bit scores) of a positive and a negative dataset against the computed pHMMs.
Thus, the adaptive score threshold is a criterion to assign a KO to new sequences.
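
To make the arithmetic of the option concrete, here is a minimal Ruby sketch of the rule as the help text describes it: the pre-computed threshold is multiplied by the scale, and the score is compared against the result. The helper name is hypothetical, and whether kofam_scan compares with >= or > is an assumption here.

```ruby
# Hypothetical helper illustrating the -T/--threshold-scale description.
# The comparison operator (>=) is an assumption, not taken from kofam_scan.
def above_threshold?(score, threshold, scale = 1.0)
  score >= threshold * scale
end

# Example threshold 357.90 (the ko_list value for K00001).
puts above_threshold?(400.0, 357.90)       # scale 1: compared against 357.9
puts above_threshold?(400.0, 357.90, 2.0)  # -T2: compared against 715.8
puts above_threshold?(200.0, 357.90, 0.5)  # -T0.5: compared against 178.95
```

So a scale below 1 relaxes every KO's threshold by the same factor, and a scale above 1 tightens it.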

I am using KofamScan for my own project, and I wonder whether it makes sense, or is advisable, to relax the score threshold or, conversely, to make it stricter with KofamScan's -T, --threshold-scale option.

Can you share some guidance or your experience about this option please?
Thanks and BR,
Bernhard

Prokaryotic genome, eukaryotic genome, viral genome

Hello, can this tool be applied to prokaryotic genome, eukaryotic genome and viral genome?

Thanks for your reply!

Best wishes!

[Question] What criteria are used to determine significant HMMER hits?

I'm looking at the code here but not very familiar w/ Ruby:

module KofamScan
  class Result
    extend Autoload

    autoload :WithEvalueThreshold
    autoload :WithThresholdScale
    autoload :WithThresholdScaleAndEvalueThreshold

    def self.create(query_list, threshold_scale: nil, e_value_threshold: nil)
      if threshold_scale && e_value_threshold
        WithThresholdScaleAndEvalueThreshold.new(query_list, threshold_scale, e_value_threshold)
      elsif e_value_threshold
        WithEvalueThreshold.new(query_list, e_value_threshold)
      elsif threshold_scale
        WithThresholdScale.new(query_list, threshold_scale)
      else
        Result.new(query_list)
      end
    end
  end
end

Here's what ko_list looks like:

knum	threshold	score_type	profile_type	F-measure	nseq	nseq_used	alen	mlen	eff_nseq	re/pos	definition
K00001	357.90	domain	all	0.256163	2367	1915	1975	464	14.05	0.590	alcohol dehydrogenase [EC:1.1.1.1]
K00002	443.17	full	all	0.430391	2376	2273	5878	503	7.51	0.590	alcohol dehydrogenase (NADP+) [EC:1.1.1.2]
K00003	286.37	domain	all	0.945500	6268	5369	3257	782	7.03	0.590	homoserine dehydrogenase [EC:1.1.1.3]
K00004	369.60	domain	trim	0.809403	1572	1320	1364	436	5.41	0.590	(R,R)-butanediol dehydrogenase / meso-butanediol dehydrogenase / diacetyl reductase [EC:1.1.1.4 1.1.1.- 1.1.1.303]
K00005	320.93	full	all	0.984022	1449	1051	682	366	1.99	0.590	glycerol dehydrogenase [EC:1.1.1.6]
K00006	316.47	full	all	0.899202	2118	1971	3274	549	4.64	0.590	glycerol-3-phosphate dehydrogenase (NAD+) [EC:1.1.1.8]
K00007	520.80	full	all	0.998236	358	283	834	469	1.01	0.589	D-arabinitol 4-dehydrogenase [EC:1.1.1.11]
K00008	420.17	full	all	0.500025	3737	3281	3597	524	8.29	0.590	L-iditol 2-dehydrogenase [EC:1.1.1.14]
K00009	147.27	full	trim	0.997067	1556	1192	1443	459	3.28	0.590	mannitol-1-phosphate 5-dehydrogenase [EC:1.1.1.17]
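
For anyone who wants to inspect these values programmatically, here is a minimal Ruby sketch that reads the per-KO threshold and score_type from ko_list text. It assumes tab-separated columns with the header shown above. The struct and function names are made up for illustration, and `Float(..., exception: false)` needs Ruby 2.6 or newer.

```ruby
# Sketch only: names are hypothetical and not part of kofam_scan.
KoEntry = Struct.new(:knum, :threshold, :score_type, :definition)

def parse_ko_list(text)
  lines = text.lines.map(&:chomp).reject(&:empty?)
  header = lines.shift.split("\t")
  lines.map do |line|
    row = header.zip(line.split("\t", -1)).to_h
    # Yields nil instead of raising when the threshold is not a number.
    threshold = Float(row["threshold"], exception: false)
    KoEntry.new(row["knum"], threshold, row["score_type"], row["definition"])
  end
end

sample = <<~TSV
  knum\tthreshold\tscore_type\tprofile_type\tF-measure\tnseq\tnseq_used\talen\tmlen\teff_nseq\tre/pos\tdefinition
  K00001\t357.90\tdomain\tall\t0.256163\t2367\t1915\t1975\t464\t14.05\t0.590\talcohol dehydrogenase [EC:1.1.1.1]
  K00002\t443.17\tfull\tall\t0.430391\t2376\t2273\t5878\t503\t7.51\t0.590\talcohol dehydrogenase (NADP+) [EC:1.1.1.2]
TSV

parse_ko_list(sample).each do |ko|
  puts "#{ko.knum}: threshold #{ko.threshold} (#{ko.score_type})"
end
```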

I'm seeing a score threshold but not e-value threshold. Can you elaborate on the scheme used for determining significant hits?

Ruby error when running the kofam workflow

I'm using the eukaryotes profile (profiles/eukaryote.hal)

and all the programs are in my path.

Thanks for your help!

[bobbieshaban@spartan-bm067 kofam]$ ./exec_annotation -o result.txt R5300_S1.prodigal.proteins.fa --cpu=10
Traceback (most recent call last):
        6: from .../exec_annotation:7:in `<main>'
        5: from ../kofam/lib/kofam_scan/cli.rb:16:in `run'
        4: from .../kofam/lib/kofam_scan/config.rb:11:in `load'
        3: from .../psych.rb:349:in `safe_load'
        2: from .../psych.rb:390:in `parse'
        1: from ..../psych.rb:456:in `parse': (<unknown>): did not find expected key while parsing a block mapping at line 4 column 1 (Psych::SyntaxError)

suggestion to implement a "tmp directory already exists" warning

Hiya!

First, thanks for giving us all an avenue beyond the web interface for KO annotations :)

I wanted to suggest adding a check to see if the default ./tmp directory exists before launching a run. This was very much me not paying enough attention and making a silly mistake, but I've been running kofamscan simultaneously for several assemblies called from the same directory for a couple of days, and just realized now that they were all constantly overwriting the same files in the ./tmp directory, making it all meaningless 😬

It might save someone else from being as silly as I was if when the command is executed the program checked to see if the temporary directory that will be used (whether default or specified by the user) already exists, and if it does, exit and tell the user.

Just a suggestion, thanks again!
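
Until such a check exists, one possible workaround is to give every run its own temporary directory through the --tmp-dir option. The Ruby sketch below only builds the command lines and does not execute them. The input file names and the helper name are hypothetical.

```ruby
require 'tmpdir'

# Hypothetical helper: build an exec_annotation call with its own tmp dir.
def annotation_command(input_faa, output, tmp_dir)
  ['./exec_annotation', '-o', output, "--tmp-dir=#{tmp_dir}", input_faa]
end

%w[sample_a.faa sample_b.faa].each do |faa|  # made-up input names
  tmp_dir = Dir.mktmpdir('kofamscan_')       # unique directory per run
  cmd = annotation_command(faa, faa.sub(/\.faa\z/, '.tsv'), tmp_dir)
  puts cmd.join(' ')
  Dir.rmdir(tmp_dir)                         # cleanup for this demo only
end
```

Because each directory name is unique, concurrent runs started from the same working directory would no longer share ./tmp.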

Bacteria & Archaea hmms

Hi,

I have noticed that you have separate eukaryote and prokaryote profiles. Do you have plans to split prokaryote into bacteria and archaea?

Thanks,
Jie

default value of --threshold-scale?

Hi,

I am using kofam_scan 1.2.0. If --threshold-scale is not set, does that mean the precomputed score is used as the cutoff, i.e. --threshold-scale is effectively 1? Am I understanding this correctly?

Thank you

prokaryote

Hello,
I was wondering why, even if I select Prokaryote as a profile, I still get annotations related to eukaryotes.
For example, I get genes related to the "Longevity regulating pathway - worm [PATH:ko04212]".

Is it because the same gene can be involved in different pathways in eukaryotes and prokaryotes, or am I setting the profiles wrong?

I have done it in this way:
--PROFILE=/my_path...../Kofam/profiles/prokaryote.hal

Thanks for the awesome tool!
Gabri

default evalue?

What is the default e-value? I could not find it in the README. Is it 0.01, as with the online version? (https://www.genome.jp/tools/kofamkoala/)

Core dumped error when running kofamscan

Hi @takaram
I keep getting a 'core dumped' error when running kofamscan. I recently downloaded the HMM profiles and the kofamscan-1.3.0 tool with ko_list, and I am following this tutorial:
https://taylorreiter.github.io/2019-05-11-kofamscan/

I am running the following command:
./exec_annotation -p /path/to/profiles/ -k /path_to/ko_list -o out.tsv final_proteins.faa --cpu 20

I have attached the error file (.txt) for your reference, in case it helps. I would appreciate any help, and do let me know if further details are required.

kofam_80Y_error.txt

is kofam scan running correctly - tmp directory

Hi there,

I have just updated kofam_scan from 1.2 to 1.3, running on an HPC. Usually when I run the program, the tmp directory that is created disappears after the job has finished. In the new version 1.3, the job finishes but the tmp directory is still there. Is this normal? My Slurm file has no debugging info, so I can't check whether everything completed correctly.

I set up the program on our HPC as follows:
virtualenv ~/kofamscan
module load gcc/7.3.0 nixpkgs/16.09 hmmer/3.2.1 ruby/2.6.1 parallel/20160722
source kofamscan/bin/activate

cd kofamscan/bin
wget ftp://ftp.genome.jp/pub/tools/kofam_scan/kofam_scan-1.3.0.tar.gz
tar xvzf kofam_scan-1.3.0.tar.gz
mkdir db
cd db
wget ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz
wget ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz
gunzip ko_list.gz
tar xvzf profiles.tar.gz

I then edited config.yml and added the paths to ko_list and profiles.

I am running the program as follows:
#!/bin/bash
#SBATCH --mem=xxxx
#SBATCH --nodes=1
#SBATCH --account=xxx
#SBATCH --cpus-per-task=32
#SBATCH --time=0-23:59
#SBATCH --job-name=xxx

module load gcc/7.3.0 nixpkgs/16.09 hmmer/3.2.1 ruby/2.6.1 parallel/20160722
source /home/user/kofamscan/bin/activate
/home/user/kofamscan/bin/kofam_scan-1.3.0/./exec_annotation -o /path_to_output_dir/kofamscan.txt /path_to_orf_file/xxx.faa --tmp-dir=DIR

How to deal hmm files without a threshold & F-measure info

Hi,

How does kofam_scan deal with HMM files that lack the four columns "threshold", "score_type", "profile_type" and "F-measure" in the ko_list file? It seems to me that there will never be a valid hit for those HMMs.

Thank you,
Jie

Reorganize code and don't use require_relative in exec_annotation

The current organization of the code makes it a bit hard to install, because the main script (exec_annotation) uses require_relative to find the lib/ directory (therefore the lib/ directory and exec_annotation must be in the same directory).

I don't know the conventions in the Ruby community, but if I trust this book, the main script (exec_annotation):

should be in the bin/ directory
could perform some magic (modification of $LOAD_PATH) to find the lib/ directory (I'm not really fond of this idea; I would rather rely on the RUBYLIB variable and modify $LOAD_PATH only if this fails)
should use require 'lib/kofam_scan' instead of require_relative 'lib/kofam_scan'

This would simplify installation in our environment for example and probably in most environments too.
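
For illustration, the $LOAD_PATH variant mentioned above could look roughly like the Ruby sketch below. It assumes the proposed layout with the script in bin/ and the library in a sibling lib/ directory, which is not the current layout of the project.

```ruby
# Sketch: resolve lib/ relative to the script's own location instead of the
# caller's working directory, then put it on the load path once.
lib_dir = File.expand_path('../lib', __dir__)
$LOAD_PATH.unshift(lib_dir) unless $LOAD_PATH.include?(lib_dir)

# With lib/ on the load path, a plain require would then be enough:
# require 'kofam_scan'
puts lib_dir
```

Setting the RUBYLIB environment variable to the same directory would have a similar effect without touching the script.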

Incorrect hmmscan settings cause popen3 to hang rather than an error

I have manually set hmmscan in the config.yml file. I recently discovered that when it points to a broken path, the call to exec_annotation hangs forever at lib/kofam_scan/parallel.rb:28 rather than throwing a file-not-found error.

wrong number of arguments

~/software/KOALA/kofam_scan-1.3.0/bin/exec_annotation -o ./kofam.out -p ~/software/KOALA/kofam_scan-1.3.0/db/profiles -k ~/software/KOALA/kofam_scan-1.3.0/db/profiles/eukaryote.hal --cpu 5 --tmp-dir ./tmp -E 1e-5 -f detail-tsv Pshi.short.pep
Traceback (most recent call last):
11: from /home/jingtian/software/KOALA/kofam_scan-1.3.0/bin/exec_annotation:7:in `<main>'
10: from /home/jingtian/software/KOALA/kofam_scan-1.3.0/bin/lib/kofam_scan/cli.rb:21:in `run'
9: from /home/jingtian/software/KOALA/kofam_scan-1.3.0/bin/lib/kofam_scan/executor.rb:8:in `execute'
8: from /home/jingtian/software/KOALA/kofam_scan-1.3.0/bin/lib/kofam_scan/executor.rb:22:in `execute'
7: from /home/jingtian/software/KOALA/kofam_scan-1.3.0/bin/lib/kofam_scan/executor.rb:80:in `parse_ko'
6: from /home/jingtian/software/KOALA/kofam_scan-1.3.0/bin/lib/kofam_scan/executor.rb:80:in `open'
5: from /home/jingtian/software/KOALA/kofam_scan-1.3.0/bin/lib/kofam_scan/executor.rb:80:in `block in parse_ko'
4: from /home/jingtian/software/KOALA/kofam_scan-1.3.0/bin/lib/kofam_scan/ko.rb:12:in `parse'
3: from /home/jingtian/software/KOALA/kofam_scan-1.3.0/bin/lib/kofam_scan/ko.rb:12:in `each_line'
2: from /home/jingtian/software/KOALA/kofam_scan-1.3.0/bin/lib/kofam_scan/ko.rb:15:in `block in parse'
1: from /home/jingtian/software/KOALA/kofam_scan-1.3.0/bin/lib/kofam_scan/ko.rb:15:in `new'
/home/jingtian/software/KOALA/kofam_scan-1.3.0/bin/lib/kofam_scan/ko.rb:35:in `initialize': wrong number of arguments (given 1, expected 12) (ArgumentError)
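
The "expected 12" in the message matches the 12 tab-separated columns of a ko_list file, while the command above passes a .hal file to -k. A hedged Ruby sketch of a pre-flight check on the first line of the file given to -k follows. The helper is hypothetical and not part of kofam_scan, and the example .hal line is an assumption about that file's content.

```ruby
# Hypothetical check: does this line look like a ko_list header
# (12 tab-separated columns, first field "knum")?
EXPECTED_KO_LIST_COLUMNS = 12

def looks_like_ko_list?(first_line)
  fields = first_line.chomp.split("\t", -1)
  fields.size == EXPECTED_KO_LIST_COLUMNS && fields.first == 'knum'
end

header = %w[knum threshold score_type profile_type F-measure nseq
            nseq_used alen mlen eff_nseq re/pos definition].join("\t")

puts looks_like_ko_list?(header)        # a ko_list header line
puts looks_like_ko_list?("K00001.hmm")  # assumed shape of a .hal line
```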

undefined method `end_with?' for nil:NilClass (NoMethodError)

I'm trying to run kofam_scan with ruby 2.6.6, and I got the error as below:

Traceback (most recent call last):
        5: from ./exec_annotation:7:in `<main>'
        4: from /work/LAS/mash-lab/jing/bin/kofam_scan/lib/kofam_scan/cli.rb:21:in `run'
        3: from /work/LAS/mash-lab/jing/bin/kofam_scan/lib/kofam_scan/executor.rb:8:in `execute'
        2: from /work/LAS/mash-lab/jing/bin/kofam_scan/lib/kofam_scan/executor.rb:35:in `execute'
        1: from /work/LAS/mash-lab/jing/bin/kofam_scan/lib/kofam_scan/executor.rb:104:in `run_hmmsearch'
/work/LAS/mash-lab/jing/bin/kofam_scan/lib/kofam_scan/executor.rb:152:in `lookup_profiles': undefined method `end_with?' for nil:NilClass (NoMethodError)

Parsing output file

Has anybody else had issues when trying to read the output files?

They are not always fixed-width. Specifically, a high value in the "thrshld" column will push the next columns ("score", "E-Value" etc.) down a space. This makes it impossible to read correctly (see line 6 in the example).
Trying to use at least x2 spaces as a delimiter doesn't work either as some columns have only one space between them.

I would be so grateful if anybody had a solution! :)
output_file.txt
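
One way around the shifting columns is to split each row on runs of whitespace and cap the number of fields, so that only the free-text definition at the end may contain spaces. The Ruby sketch below assumes the column order marker, gene name, KO, thrshld, score, E-value, KO definition, and assumes that gene names contain no spaces. The sample rows are invented for illustration.

```ruby
# Sketch: parse whitespace-aligned rows without relying on column widths.
# The column order and the leading "*" marker are assumptions.
def parse_detail_line(line)
  return nil if line.start_with?('#') || line.strip.empty?

  significant = line.start_with?('*')
  body = line.sub(/\A\*/, '').strip
  gene, ko, threshold, score, evalue, definition = body.split(/\s+/, 6)
  {
    significant: significant,
    gene: gene,
    ko: ko,
    threshold: Float(threshold, exception: false), # nil when not numeric
    score: Float(score),
    evalue: Float(evalue),
    definition: definition
  }
end

rows = [
  '* gene_1 K00001 1357.90 1400.1 1.2e-120 alcohol dehydrogenase [EC:1.1.1.1]',
  '  gene_1 K00002  443.17   35.2   3.4e-05 alcohol dehydrogenase (NADP+) [EC:1.1.1.2]'
]
rows.each { |row| p parse_detail_line(row) }
```

The limit of 6 passed to split keeps the whole definition, spaces included, in the last field.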

hmmsearch was not run successfully

Hi There,

I am running a protein annotation with kofamscan. The error message "hmmsearch was not run successfully" keeps showing up. After a successful run with another .fasta file, I realized that something was wrong with my original file. Although I finally found that ONE sequence caused the problem, and the run succeeded after removing it, I still don't know why.

Could anyone help explain it?

Thanks~

The sequence attached below:

>VIRSorter_k141_1676536_flag=0_multi=16_9953_len=13692-cat_2_4 # 6084 # 13691 # -1 # ID=10186_4;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.465
NGEAGTSGTSGISGINGTNGLNGTGGSSGTSGLSGVDGTSGTAGTSGTSGYSGTDGTSGT
SGISGADGMPGTSGTSGISGVDGTSGTSGINGTSGSSGTTGTSGSSGTSGISGVDGTSGT
SGLSGVDGTSGSSGTSGSSGTSGISGVDGTSGTAGTSGSSGTSGTSGISGIDGTSGSSGT
NGTSGSSGTSGISGVDGTSGTAGTSGTSGIDGTSGTSGISGVDGTSGTSGTSGISGVDGV
DGTNGTSGTSGISGVDGTSGTAGSSGTSGTTGTSGSSGTSGISGVDGTSGSSGTSGTSGI
DGTSGTSGISGVDGTSGTSGTSGSSGTSGTSGISGVDGTSGTNGSSGTSGSSGTAGTSGT
SGISGVDGTSGTSGTGTSGTSGTSGTVGTSGSSGSSGTSGISGANGEAGTSGTSGISGLN
GTNGLNGTGGSSGTSGISGVDGTSGTAGTSGTSGYSGTDGTSGTSGISGADGMPGTSGTS
GISGVDGTSGTSGTTGTSGTSGTTGTSGSSGTSGISGVDGTSGSSGTSGTSGISGVDGTS
GTSGSSGTSGTSGTSGTSGTSGISGVDGTSGSSGTSGSSGTSGSSGTSGISGINGTNGSS
GTSGISGVDGTSGTSGIDGTSGTSGIDGTSGTSGISGINGTSGTNGSSGSSGTSGLSGVD
GTSGTSGIDGTSGTSGIDGTSGTSGISGINGTSGTNGSSGSSGTSGISGVDGTSGTSGSS
GSSGTSGISGVDGTSGTSGISGIDGTSGTAGTSGTSGVDGTSGTSGISGINGTNGSSGTS
GVSGVDGTSGTSGLDGTHGTSGTTGTSGSSGTSGISGANGEAGTSGTSGISGINGTNGIA
GTGGSSGTSGISGVDGTSGTAGTSGTSGYSGTDGTSGTSGISGADGMPGTSGSSGTSGLS
GVDGTSGTAGTSGSSGTSGTTGTSGSSGTSGISGVDGTSGTAGTSGTSGISGVDGTSGSS
GTSGSSGTSGSSGTSGTSGISGVDGTSGSSGTSGSSGTSGSSGTSGISGINGTNGSSGTS
GISGVDGTSGTSGIDGTSGTSGINGTSGTSGISGVDGTSGTNGSSGSSGTSGLSGVDGTS
GTSGIDGTSGTSGIDGTSGTSGISGINGTSGTNGSSGSSGTSGLSGVDGTSGTAGTSGSS
GTSGISGVDGTSGTSGISGVDGTSGTAGTSGTSGVNGTSGTSGISGINGTNGSSGTSGIS
GVDGTSGTSGLDGTHGTSGSSGTSGTSGSSGTSGISGANGEAGTSGTSGISGVAGTNGIA
GTGGSSGTSGLSGVDGTSGTAGTSGTSGYSGTDGTSGTSGISGADGMPGTSGTSGTNGSS
GTSGLSGVDGTSGTSGTNGTSGSSGTNGSSGTSGTSGTSGISGVDGTSGTAGSSGTSGSS
GTSGLSGVDGTSGSSGTSGSSGTSGSSGTSGTSGISGVDGTSGTSGSSGTSGSSGTSGIS
GVDGTSGTSGSSGTSGIDGTSGTTGTSGISGISGTSGTNGTSGSSGTSGISGVDGTSGSS
GTSGDAGTSGTSGITGTSGISGISGTSGTNGSSGSSGTSGLSGVDGTSGTSGSSGTSGTT
GTSGTSGISGVDGTSGTSGSAGTSGTSGVDGTSGVSGVSGINGTNGSSGTSGISGVDGTS
GTSGTVGTSGTSGTNGTSGSSGTSGISGANGEAGTSGTSGISGINGTAGRQGTGGSSGTS
GVSGVDGTSGTAGTSGTSGISGTTGTSGTSGISGADGMPGTSGTSGINGTSGSSGTSGSS
GTSGSSGTSGISGINGTNGTSGISGVDGTSGSSGTSGTSGSSGTSGSSGTSGISGINGTN
GSSGTSGISGVDGTSGTSGSSGTSGSSGTSGSSGTSGSSGTSGISGVDGTSGSSGTSGIS
GVDGTSGTSGTSGSSGTSGSSGTSGSSGTSGTSGISGVDGTNGTSGTSGTSGSSGTSGSS
GTSGSSGSSGTSGISGVDGTSGSSGTSGSSGTSGISGVDGTSGTSGTSGSSGTSGSSGTS
GSSGTSGISGVNGTSGSSGTSGISGVDGTSGTAGTSGSSGTSGSAGTSGSSGTSGISGIN
GTSGTNGSSGSSGTSGVDGTSGTSGSNGTSGSSGTSGISGANGAPGTSGTSGLSGVDGTS
GTAGTSGSSGTSGSSGTSGISGVDGTSGTAGSSGTSGSSGTSGSSGTSGSSGTSGISGIN
GTSGSSGTSGSSGTSGTSGTSGSSGTSGTSGISGVDGTSGSSGTNGTSGTSGTKGTSGTS
GSSGTSGSSGSSGTSGISGINGTSGSSGTSGISGVDGTSGTAGSSGTSGTSGTSGIDGTN
GTSGSSGTSGISGINGTNGSSGTSGISGVDGIDGTSGSSGTNGTSGSSGTSGISGANGAP
GTSGTSGLSGVSGISGTNGTSGTSGTSGTTGTSGISGLNGTTGTSGTSGTGFSAILNATN
NRLITSDGTQTNAVAEANLTFDGEILNLAGVFKSKTGEGSSITANTLLYAADTALGNGWI
IDYVVKATTGVAMRTGTILAVTDGIDVTFTETSSPDLGASTAAVTFGLTINSTDLEIAAN
ISFGTWDVKVAVRVI*

Which filtering parameters are recommended?

Hello, I annotated the gene set with kofamscan software.

What e-value and score cut-offs do you recommend for filtering the results?

Thanks
shenglong

Error: Unknown KO:

Hi,
I've seen a few Errors recently and wondered if you knew what might be causing them, or better still, how to fix them!
Error message examples:
kofamscan/bin/lib/kofam_scan/executor.rb:54:in `delete': No such file or directory @ apply2files - ./tmp/tabular/K20351 (Errno::ENOENT)
...
kofamscan/bin/lib/kofam_scan/executor.rb:62:in `initialize': No such file or directory @ rb_sysopen - ./tmp/tabular/K06344 (Errno::ENOENT)

Otherwise, it outputs a file just fine with plenty of hits, but I'm seeing more than a few of these.

Best wishes and thanks,
Tim

bin/lib/kofam_scan/parallel.rb:28:in `write': Broken pipe (Errno::EPIPE)

Thanks for providing such a useful tool!
When I used it, I encountered the following error:

/mdata/xxx/software/miniconda3/bin/lib/kofam_scan/parallel.rb:28:in `write': Broken pipe (Errno::EPIPE)
from /mdata/xxx/software/miniconda3/bin/lib/kofam_scan/parallel.rb:28:in `puts'
from /mdata/xxx/software/miniconda3/bin/lib/kofam_scan/parallel.rb:28:in `block in exec'
from /mdata/xxx/software/miniconda3/lib/ruby/2.5.0/open3.rb:205:in `popen_run'
from /mdata/xxx/software/miniconda3/lib/ruby/2.5.0/open3.rb:95:in `popen3'
from /mdata/xxx/software/miniconda3/bin/lib/kofam_scan/parallel.rb:27:in `exec'
from /mdata/xxx/software/miniconda3/bin/lib/kofam_scan/executor.rb:107:in `run_hmmsearch'
from /mdata/xxx/software/miniconda3/bin/lib/kofam_scan/executor.rb:35:in `execute'
from /mdata/xxx/software/miniconda3/bin/lib/kofam_scan/executor.rb:8:in `execute'
from /mdata/xxx/software/miniconda3/bin/lib/kofam_scan/cli.rb:21:in `run'
from /mdata/xxx/software/miniconda3/bin/exec_annotation:7:in `<main>'

Do you have any idea about how to solve this problem?
Thanks so much!

Pandeng

Hit KO that not found in KEGG web?

Hi takaram,

I found a K02487 hit in my result, and it doesn't have a match on the KEGG website.
Could you explain a little bit about this kind of hit?
Thanks.

query.fasta

Hi,

Am I understanding correctly that the query.fasta file is the sequence file to be analyzed? Or is it some kind of reference file?

profile.yml example

Hi takaram,

short question.
Can you give an example of the profile.yml file, per chance?

Thanks,

Thierry

The set of E-value (-E) did not work.

Hi kofam_scan colleagues,
I recently ran kofam_scan using the following command line:
exec_annotation -o 5k+_assembled_hits_to_vHMMs_ko.txt 5k+_all_assemblied_contigs.part_329_prot.fasta -k ~/database/kofam/ko_list -p ~/database/kofam/profiles/ -E 0.00001 --cpu 5
However, the "-E 0.00001" option did not seem to work: the result file contained many hits with an e-value higher than 0.00001.
I tried both "-E 0.00001" and "-E 1.0e-05", but neither worked.
My input file is attached.
Thank you all. I look forward to your reply.
5k+_all_assemblied_contigs.part_329_prot.txt

kofam_scan multiple outputs at same time

Is it possible to obtain more than one output at the same time? I mean, the mapper and the detail-tsv output.

The exec_annotaion can run on the directory that it belong instead of other directory

Hi,
Thank you for developing this great tool for KEGG annotation.
However, I am confused about the exec_annotation script, because it can only be run from the folder it belongs to. If I run it from another directory using its absolute path, it fails with the error below:
<internal:gem_prelude>:1:in `require': cannot load such file -- rubygems.rb (LoadError)
from <internal:gem_prelude>:1:in `<internal:gem_prelude>'
I tried modifying exec_annotation as shown below, but the same error occurred:
#!/usr/bin/env /Data/software/ruby-2.7.1/bin/ruby
# frozen_string_literal: true
#require_relative 'lib/kofam_scan'
require_relative '/Data/software/kofam_scan-1.3.0/lib/kofam_scan'
#require 'kofam_scan/cli'
require '/Data/software/kofam_scan-1.3.0/lib/kofam_scan/cli'
KofamScan::CLI.run(ARGV)
What should I do about this issue?
In addition, I want to run it from anywhere using the absolute path of exec_annotation. How can I configure that?
Best,
Hanhuihong

How to compare multi hmm hits from one protein sequence

Hi takaram,

Once a hit's score is over the threshold, the hit is marked as valid, which can result in multiple hits for one sequence.
However, each HMM has its own threshold, which confuses me a little, as these thresholds do not seem to be comparable to each other.

How would you suggest we deal with multiple HMM hits for one protein sequence?

Inconsistent output formatting

I am getting an output text file with inconsistent formatting. Take the following snippet of the text output:

NODE_2_length_14983_cov_10.199960_8	K01424
NODE_2_length_14983_cov_10.199960_9
NODE_2_length_14983_cov_10.199960_10	K02055

Running expand test.txt -t 1 > test.txt cleans up all my output files.

For reference, I am using kofamscan=1.3.0 and my script is:

  exec_annotation \
    --cpu $task.cpus \
    -k $ko_list \
    -p $profiles \
    -f mapper \
    -o ${sample}.kofam.txt \
    ${faa}
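
If the rows are tab-separated, as the snippet above suggests, a gene without an assignment is a row with only one field. A minimal Ruby sketch of reading such output into a Hash of gene to list of KOs is shown below. The function name is made up for illustration.

```ruby
# Sketch: parse "gene<TAB>KO" rows, where rows without a KO have one field.
def parse_mapper(text)
  assignments = Hash.new { |hash, gene| hash[gene] = [] }
  text.each_line do |line|
    gene, ko = line.chomp.split("\t", 2)
    next if gene.nil? || gene.empty?

    assignments[gene]  # make sure unannotated genes are listed too
    assignments[gene] << ko if ko && !ko.empty?
  end
  assignments
end

sample = "NODE_2_length_14983_cov_10.199960_8\tK01424\n" \
         "NODE_2_length_14983_cov_10.199960_9\n" \
         "NODE_2_length_14983_cov_10.199960_10\tK02055\n"

parse_mapper(sample).each { |gene, kos| puts "#{gene}: #{kos.inspect}" }
```

Splitting on the tab character, rather than on a fixed width or on spaces, keeps rows with and without a KO consistent.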