pvdam3 / foec

Fusarium oxysporum effector clustering (FoEC) is a pipeline to identify candidate effector genes in a set of F. oxysporum genomes and apply hierarchical clustering on the presence-absence pattern of these genes.

Home Page: http://onlinelibrary.wiley.com/doi/10.1111/1462-2920.13445/abstract

Python 75.03% R 24.97%

foec's Issues

signalP v5

Could you modify the script 01a... to support the new format options of SignalP v5?

unable to run 01a.mimpfinder_combine_to_putefflist_MetStop.py

I am trying to run the Mimpfinder script (01a.mimpfinder_combine_to_putefflist_MetStop.py) with the command "python 01a.mimpfinder_combine_to_putefflist_MetStop.py -h" to list its options, but it ends with the following error:

Traceback (most recent call last):
  File "01a.mimpfinder_combine_to_putefflist_MetStop.py", line 305, in <module>
    directory_folder = sys.argv[1]
IndexError: list index out of range

Should I prepare a config file separately, or do I have to edit the paths and parameters in 01a.mimpfinder_combine_to_putefflist_MetStop.py itself?

Please guide me on how to solve this.
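
The traceback points at a direct read of sys.argv[1], which suggests the script takes positional arguments and has no built-in help. A minimal sketch of a guard that returns nothing usable instead of raising IndexError; the function name and the usage text are hypothetical and are not taken from the FoEC source.

```python
# Hypothetical usage text; the real script's argument list may differ.
USAGE = "usage: python script.py <genome_directory> [further positional arguments...]"

def first_positional(argv):
    """Return argv[1], or None when it is missing or is a help flag."""
    if len(argv) < 2 or argv[1] in ("-h", "--help"):
        return None
    return argv[1]

# Example with explicit argument lists (sys.argv would be passed in practice).
for argv in (["script.py"], ["script.py", "-h"], ["script.py", "/data/genomes"]):
    directory_folder = first_positional(argv)
    print(USAGE if directory_folder is None else directory_folder)
```

A caller would print the usage text and stop whenever the function returns None, rather than continuing to index sys.argv.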

Could you please clarify one of the paths required?

Hello,
I am having a little trouble working out exactly what is required for the blastdatabasedir path. What should this directory contain? Is it the output of makeblastdb? If so, must I first run makeblastdb on my genomes, and why are these databases not provided with the testdata?

The default path in FoEC.py:

blastdatabasedir = '/Users/Peter/Documents/Sequences/Fo_genomes/blastdbs'

Thank you,
Jamie
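
A run log quoted in a later issue on this page shows the pipeline itself calling makeblastdb with -out pointing into this directory, which suggests blastdatabasedir is a writable location for databases the pipeline creates rather than something prepared by hand. A minimal sketch that rebuilds the command in the form that log shows; the helper name and the example paths are hypothetical.

```python
import os

def makeblastdb_command(blast_bin, fasta, blastdatabasedir):
    """Build a makeblastdb call in the form seen in the quoted run log."""
    # Database name = FASTA file name without its extension (as in the log).
    db_name = os.path.splitext(os.path.basename(fasta))[0]
    return [
        os.path.join(blast_bin, "makeblastdb"),
        "-dbtype", "nucl",
        "-in", fasta,
        "-out", os.path.join(blastdatabasedir, db_name),
    ]

# Hypothetical paths, for illustration only.
cmd = makeblastdb_command(
    "/usr/local/ncbi/blast/bin",
    "/data/02.cluster_putative_effectors/all_putative_effectors_concatenated.fasta",
    "/data/blastdbs",
)
print(" ".join(cmd))
```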

Test data output and example output do not match in all_putative_effectors_concatenated.vs.all_putative_effectors_concatenated.blastout

Hello,

I have recently run the test data for this pipeline. The example output and the output from my run match in the output/output_[date]_[time]/01.mimpfinder/testdata_MetStopOut directory. However, an additional directory called testdata_AugustusOut has been created in output/output_[date]_[time]/01.mimpfinder. Furthermore, the output from my run and the example output do not match exactly in output/output_[date]_[time]/02.cluster_putative_effectors/all_putative_effectors_concatenated.vs.all_putative_effectors_concatenated.blastout. There appear to be 14 extra BLAST hits in my run, and the e-values are slightly different for most hits. I can’t work out why this would be.

There are no error messages that I can see in the log for the FoEC.py run, and the same number of mimps appears to be found. The only difference is that FoEC.py seems to be creating a directory containing predictions from Augustus, whereas the example output does not have this directory. I think that some of the additional BLAST hits (8) come from this additional Augustus directory, but that does not explain all of the extra hits. Some of the hits also appear to be duplicated. I am quite stuck here and can’t work out where the extra hits are coming from. Could it be due to different versions of BLAST+ being used? Would you be able to suggest an avenue for me to explore, or why this may be happening?

BLAST+ version = 2.6

AUGUSTUS version = 3.3.3

Command used: python FoEC.py -i /home/u1983390/apps/FoEC/testdata

Testdata example all_putative_effectors_concatenated.vs.all_putative_effectors_concatenated.blastout

0001.MRFLLLIAMSMTWVCSIAG_Fol4287sample_d2m230_len163	0001.MRFLLLIAMSMTWVCSIAG_Fol4287sample_d2m230_len163	100.000	492	0	0	1	492	1	492	0.0	909	492
0002.MKLLALLALASPLVSA_Fol4287sample_d2m786_len124	0002.MKLLALLALASPLVSA_Fol4287sample_d2m786_len124	100.000	375	0	0	1	375	1	375	0.0	693	375
0003.MLFKIAWVSLFTTWAISVAA_Fol4287sample_d2m209_len232	0003.MLFKIAWVSLFTTWAISVAA_Fol4287sample_d2m209_len232	100.000	699	0	0	1	699	1	699	0.0	1291	699
0004.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m256_len55	0006.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m1213_len55	100.000	168	0	0	1	168	1	168	8.58e-89	311	168
0004.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m256_len55	0005.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m679_len55	100.000	168	0	0	1	168	1	168	8.58e-89	311	168
0004.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m256_len55	0004.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m256_len55	100.000	168	0	0	1	168	1	168	8.58e-89	311	168
0005.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m679_len55	0006.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m1213_len55	100.000	168	0	0	1	168	1	168	8.58e-89	311	168
0005.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m679_len55	0005.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m679_len55	100.000	168	0	0	1	168	1	168	8.58e-89	311	168
0005.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m679_len55	0004.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m256_len55	100.000	168	0	0	1	168	1	168	8.58e-89	311	168
0006.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m1213_len55	0006.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m1213_len55	100.000	168	0	0	1	168	1	168	8.58e-89	311	168
0006.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m1213_len55	0005.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m679_len55	100.000	168	0	0	1	168	1	168	8.58e-89	311	168
0006.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m1213_len55	0004.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m256_len55	100.000	168	0	0	1	168	1	168	8.58e-89	311	168
0001.MKLALIASILAAGCVA_Fom001sample_d2m431_len215	0001.MKLALIASILAAGCVA_Fom001sample_d2m431_len215	100.000	648	0	0	1	648	1	648	0.0	1197	648
0001.MDRTHRGRLSHVLVMLSGGALA_Foq001sample_d2m818_len54	0002.MDRTHRGRLSHVLVMLSGGALA_Foq001sample_d2m781_len54	100.000	165	0	0	1	165	1	165	3.92e-87	305	165
0001.MDRTHRGRLSHVLVMLSGGALA_Foq001sample_d2m818_len54	0001.MDRTHRGRLSHVLVMLSGGALA_Foq001sample_d2m818_len54	100.000	165	0	0	1	165	1	165	3.92e-87	305	165
0002.MDRTHRGRLSHVLVMLSGGALA_Foq001sample_d2m781_len54	0002.MDRTHRGRLSHVLVMLSGGALA_Foq001sample_d2m781_len54	100.000	165	0	0	1	165	1	165	3.92e-87	305	165
0002.MDRTHRGRLSHVLVMLSGGALA_Foq001sample_d2m781_len54	0001.MDRTHRGRLSHVLVMLSGGALA_Foq001sample_d2m818_len54	100.000	165	0	0	1	165	1	165	3.92e-87	305	165

Test data output for all_putative_effectors_concatenated.vs.all_putative_effectors_concatenated.blastout for my run

0001.MKLALIASILAAGCVA_Fom001sample_d2m431_len215	001.MKLALIASILAAGCVA_Fom001sample_d2m431_len225	100.000	648	0	0	1	648	1	648	0.0	1197	648
0001.MKLALIASILAAGCVA_Fom001sample_d2m431_len215	0001.MKLALIASILAAGCVA_Fom001sample_d2m431_len215	100.000	648	0	0	1	648	1	648	0.0	1197	648
0001.MRFLLLIAMSMTWVCSIAG_Fol4287sample_d2m230_len163	001.MRFLLLIAMSMTWVCSIAG_Fol4287sample_d2m230_len163	100.000	492	0	0	1	492	1	492	0.0	909	492
0001.MRFLLLIAMSMTWVCSIAG_Fol4287sample_d2m230_len163	0001.MRFLLLIAMSMTWVCSIAG_Fol4287sample_d2m230_len163	100.000	492	0	0	1	492	1	492	0.0	909	492
0002.MKLLALLALASPLVSA_Fol4287sample_d2m786_len124	0002.MKLLALLALASPLVSA_Fol4287sample_d2m786_len124	100.000	375	0	0	1	375	1	375	0.0	693	375
0003.MLFKIAWVSLFTTWAISVAA_Fol4287sample_d2m209_len232	002.MLFKIAWVSLFTTWAISVAA_Fol4287sample_d2m209_len232	100.000	699	0	0	1	699	1	699	0.0	1291	699
0003.MLFKIAWVSLFTTWAISVAA_Fol4287sample_d2m209_len232	0003.MLFKIAWVSLFTTWAISVAA_Fol4287sample_d2m209_len232	100.000	699	0	0	1	699	1	699	0.0	1291	699
0004.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m256_len55	0006.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m1213_len55	100.000	168	0	0	1	168	1	168	1.70e-88	311	168
0004.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m256_len55	0005.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m679_len55	100.000	168	0	0	1	168	1	168	1.70e-88	311	168
0004.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m256_len55	0004.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m256_len55	100.000	168	0	0	1	168	1	168	1.70e-88	311	168
0005.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m679_len55	0006.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m1213_len55	100.000	168	0	0	1	168	1	168	1.70e-88	311	168
0005.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m679_len55	0005.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m679_len55	100.000	168	0	0	1	168	1	168	1.70e-88	311	168
0005.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m679_len55	0004.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m256_len55	100.000	168	0	0	1	168	1	168	1.70e-88	311	168
0006.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m1213_len55	0006.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m1213_len55	100.000	168	0	0	1	168	1	168	1.70e-88	311	168
0006.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m1213_len55	0005.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m679_len55	100.000	168	0	0	1	168	1	168	1.70e-88	311	168
0006.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m1213_len55	0004.MHTEYLFLLLIPMGAVS_Fol4287sample_d2m256_len55	100.000	168	0	0	1	168	1	168	1.70e-88	311	168
0001.MDRTHRGRLSHVLVMLSGGALA_Foq001sample_d2m818_len54	0002.MDRTHRGRLSHVLVMLSGGALA_Foq001sample_d2m781_len54	100.000	165	0	0	1	165	1	165	7.75e-87	305	165
0001.MDRTHRGRLSHVLVMLSGGALA_Foq001sample_d2m818_len54	0001.MDRTHRGRLSHVLVMLSGGALA_Foq001sample_d2m818_len54	100.000	165	0	0	1	165	1	165	7.75e-87	305	165
0002.MDRTHRGRLSHVLVMLSGGALA_Foq001sample_d2m781_len54	0002.MDRTHRGRLSHVLVMLSGGALA_Foq001sample_d2m781_len54	100.000	165	0	0	1	165	1	165	7.75e-87	305	165
0002.MDRTHRGRLSHVLVMLSGGALA_Foq001sample_d2m781_len54	0001.MDRTHRGRLSHVLVMLSGGALA_Foq001sample_d2m818_len54	100.000	165	0	0	1	165	1	165	7.75e-87	305	165
001.MKLALIASILAAGCVA_Fom001sample_d2m431_len225	001.MKLALIASILAAGCVA_Fom001sample_d2m431_len225	100.000	727	0	0	1	727	1	727	0.0	1343	727
001.MKLALIASILAAGCVA_Fom001sample_d2m431_len225	0001.MKLALIASILAAGCVA_Fom001sample_d2m431_len215	100.000	648	0	0	1	648	1	648	0.0	1197	727
001.MRFLLLIAMSMTWVCSIAG_Fol4287sample_d2m230_len163	001.MRFLLLIAMSMTWVCSIAG_Fol4287sample_d2m230_len163	100.000	492	0	0	1	492	1	492	0.0	909	492
001.MRFLLLIAMSMTWVCSIAG_Fol4287sample_d2m230_len163	0001.MRFLLLIAMSMTWVCSIAG_Fol4287sample_d2m230_len163	100.000	492	0	0	1	492	1	492	0.0	909	492
002.MLFKIAWVSLFTTWAISVAA_Fol4287sample_d2m209_len232	002.MLFKIAWVSLFTTWAISVAA_Fol4287sample_d2m209_len232	100.000	699	0	0	1	699	1	699	0.0	1291	699
002.MLFKIAWVSLFTTWAISVAA_Fol4287sample_d2m209_len232	0003.MLFKIAWVSLFTTWAISVAA_Fol4287sample_d2m209_len232	100.000	699	0	0	1	699	1	699	0.0	1291	699
003.MAPYSMVLLGALSILGFGAYA_Fol4287sample_d2m1174_len154	003.MAPYSMVLLGALSILGFGAYA_Fol4287sample_d2m1174_len154	100.000	533	0	0	1	533	1	533	0.0	985	533
001.MRPSRMLLLLPLAVSVAT_Foq001sample_d2m459_len55	002.MRPSRMLLLLPLAVSVAT_Foq001sample_d2m422_len55	100.000	276	0	0	1	276	1	276	2.65e-148	510	276
001.MRPSRMLLLLPLAVSVAT_Foq001sample_d2m459_len55	001.MRPSRMLLLLPLAVSVAT_Foq001sample_d2m459_len55	100.000	276	0	0	1	276	1	276	2.65e-148	510	276
002.MRPSRMLLLLPLAVSVAT_Foq001sample_d2m422_len55	002.MRPSRMLLLLPLAVSVAT_Foq001sample_d2m422_len55	100.000	276	0	0	1	276	1	276	2.65e-148	510	276
002.MRPSRMLLLLPLAVSVAT_Foq001sample_d2m422_len55	001.MRPSRMLLLLPLAVSVAT_Foq001sample_d2m459_len55	100.000	276	0	0	1	276	1	276	2.65e-148	510	276

Thank you,
Jamie
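
One way to narrow this down is to compare the two files by (query, subject) pair instead of by eye. The sketch below assumes tab-separated BLAST output with the query ID in the first column and the subject ID in the second, as in the listings above; the function names and the tiny inline inputs are hypothetical. As a possible but unconfirmed explanation for the e-values: BLAST e-values scale with the size of the search space, so a database that also contains the Augustus predictions could shift them even where the bit scores are identical.

```python
from collections import Counter

def hit_pairs(blastout_text):
    """Count (query, subject) pairs in tab-separated BLAST output."""
    pairs = Counter()
    for line in blastout_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:  # skips blank or malformed lines
            pairs[(fields[0], fields[1])] += 1
    return pairs

def compare_hits(reference_text, run_text):
    """Return pairs only in the run, pairs only in the reference, and repeated pairs."""
    reference, run = hit_pairs(reference_text), hit_pairs(run_text)
    extra = sorted(set(run) - set(reference))
    missing = sorted(set(reference) - set(run))
    duplicated = sorted(pair for pair, count in run.items() if count > 1)
    return extra, missing, duplicated

# Tiny made-up inputs; real use would read the two .blastout files instead.
reference = "q1\tq1\t100.000\nq1\tq2\t100.000\n"
run = "q1\tq1\t100.000\nq1\tq2\t100.000\nq3\tq3\t100.000\nq3\tq3\t100.000\n"
extra, missing, duplicated = compare_hits(reference, run)
print("extra:", extra)
print("missing:", missing)
print("duplicated:", duplicated)
```

Grouping the extra pairs by ID prefix would then show how many of them involve sequences that are absent from the example output.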

unable to use FoEC.py

I am trying to run FoEC.py in a Python 2.7 environment, as shown below, to identify mimps only, and I am getting errors.

python FoEC.py -i /home/microbiology/FoEC2/16/ -o /home/microbiology/FoEC2/16
Traceback (most recent call last):
  File "/home/microbiology/FoEC/01a.mimpfinder_combine_to_putefflist_MetStop.py", line 308, in <module>
    directory = directory_folder.split(folder)[0]
ValueError: empty separator
python /home/microbiology/FoEC/01a.mimpfinder_combine_to_putefflist_MetStop.py /home/microbiology/FoEC2/16/ /home/microbiology/FoEC2/16 contig_ 2500 25 600 2000 /home/microbiology/anaconda3/envs/py38/bin/signalp 0.550 256
Traceback (most recent call last):
  File "/home/microbiology/FoEC/01b.mimpfinder_combine_to_putefflist_AUGUSTUS.py", line 399, in <module>
    directory = directory_folder.split(folder)[0]
ValueError: empty separator
python /home/microbiology/FoEC/01b.mimpfinder_combine_to_putefflist_AUGUSTUS.py /home/microbiology/FoEC2/16/ /home/microbiology/FoEC2/16 contig_ /home/microbiology/anaconda3/envs/py38/bin/augustus --AUGUSTUS_CONFIG_PATH=/home/microbiology/anaconda3/envs/py38/config 5000 25 600 2000 /home/microbiology/anaconda3/envs/py38/bin/signalp 0.550 256
Traceback (most recent call last):
  File "/home/microbiology/FoEC/02.cluster_putefflists.py", line 82, in <module>
    outfilewriter = open(outfile, 'w')
IOError: [Errno 2] No such file or directory: '/home/microbiology/FoEC2/16/01.mimpfinder/16_MetStopOut/all_putative_effectors_clustered.fasta'
python /home/microbiology/FoEC/02.cluster_putefflists.py /home/microbiology/FoEC2/16/01.mimpfinder/16_MetStopOut/all_putative_effectors.fasta /home/microbiology/anaconda3/envs/py38/bin/blastdbs TRUE /usr/local/ncbi/blast/bin 256
Traceback (most recent call last):
  File "/home/microbiology/FoEC/02.cluster_putefflists.py", line 82, in <module>
    outfilewriter = open(outfile, 'w')
IOError: [Errno 2] No such file or directory: '/home/microbiology/FoEC2/16/01.mimpfinder/16_AugustusOut/all_putative_effectors_clustered.fasta'
python /home/microbiology/FoEC/02.cluster_putefflists.py /home/microbiology/FoEC2/16/01.mimpfinder/16_AugustusOut/all_putative_effectors.fasta /home/microbiology/anaconda3/envs/py38/bin/blastdbs TRUE /usr/local/ncbi/blast/bin 256
cat: /home/microbiology/FoEC2/16/01.mimpfinder/16_MetStopOut/all_putative_effectors.fasta: No such file or directory
cat: /home/microbiology/FoEC2/16/01.mimpfinder/16_AugustusOut/all_putative_effectors.fasta: No such file or directory
cat /home/microbiology/FoEC2/16/01.mimpfinder/16_MetStopOut/all_putative_effectors.fasta /home/microbiology/FoEC2/16/01.mimpfinder/16_AugustusOut/all_putative_effectors.fasta > /home/microbiology/FoEC2/16/02.cluster_putative_effectors/all_putative_effectors_concatenated.fasta 256

// Running clustering script on file /home/microbiology/FoEC2/16/02.cluster_putative_effectors/all_putative_effectors_concatenated.fasta
---BLASTDB---
sh: 1: /usr/local/ncbi/blast/bin/makeblastdb: not found
/usr/local/ncbi/blast/bin/makeblastdb -dbtype nucl -in /home/microbiology/FoEC2/16/02.cluster_putative_effectors/all_putative_effectors_concatenated.fasta -out /home/microbiology/anaconda3/envs/py38/bin/blastdbs/all_putative_effectors_concatenated 32512 ---

sh: 1: /usr/local/ncbi/blast/bin/blastn: not found

// Found 0 putative effectors in these genomes:

python /home/microbiology/FoEC/02.cluster_putefflists.py /home/microbiology/FoEC2/16/02.cluster_putative_effectors/all_putative_effectors_concatenated.fasta /home/microbiology/anaconda3/envs/py38/bin/blastdbs FALSE /usr/local/ncbi/blast/bin 0
Traceback (most recent call last):
  File "/home/microbiology/FoEC/03.local_blast_clustered_putefflist_to_pres-abs_table.py", line 76, in <module>
    genome_folder_a = genome_folder.split(genome_folder_b)[0]
ValueError: empty separator
python /home/microbiology/FoEC/03.local_blast_clustered_putefflist_to_pres-abs_table.py /home/microbiology/FoEC2/16/02.cluster_putative_effectors/all_putative_effectors_concatenated_clustered.fasta /home/microbiology/FoEC2/16/ /home/microbiology/anaconda3/envs/py38/bin/blastdbs /usr/local/ncbi/blast/bin /home/microbiology/FoEC2/16 30 blastn yes 256
[1] "-------------------------------"
[1] "//Executing R script for clustering and plotting into a tree"

Attaching package: ‘gplots’

The following object is masked from ‘package:stats’:

lowess

Error in library("ctc") : there is no package called ‘ctc’
Execution halted
Rscript /home/microbiology/FoEC/04.cluster_and_plot_heatmap3.R /home/microbiology/FoEC/heatmap.3.R /home/microbiology/FoEC2/16/03.blastn_presence_absence/blastn_presence_absence.txt /home/microbiology/FoEC2/16/04.cluster_and_plot/ 1 average 1 average 256

Starttime: 23.09.04_10h06m54
Endtime 23.09.04_10h06m55

Total time used for whole script: 0:00:01.779838

I am unable to understand the issue. Please help me solve it.
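
"ValueError: empty separator" is what str.split raises when it is given an empty string. A minimal sketch of how that can arise, assuming (this is not confirmed from the FoEC source) that the scripts derive the folder name from the last component of the input path, which is empty when the path ends in a slash as in "-i /home/microbiology/FoEC2/16/". The helper name is hypothetical.

```python
import os

def folder_name(path):
    """Return the last path component, ignoring any trailing slash."""
    return os.path.basename(os.path.normpath(path))

raw = "/home/microbiology/FoEC2/16/"
print(repr(os.path.basename(raw)))  # '' because of the trailing slash
print(folder_name(raw))             # 16

try:
    raw.split(os.path.basename(raw))  # same as raw.split('')
except ValueError as err:
    print(err)                        # empty separator
```

If that assumption holds, passing the input directory without the trailing slash may avoid the first errors. The same log also reports that makeblastdb and blastn were not found under /usr/local/ncbi/blast/bin and that the R package 'ctc' is not installed, so those would presumably need addressing separately.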
