broadinstitute / synerclust

source code for SynerClust

License: Other

Python 31.91% Shell 0.86% C 67.15% Dockerfile 0.08%

synerclust's Issues

FormatAnnotation_external.py GFF file format specificity

The FormatAnnotation_external.py helper script results in errors if the GFF3 file format/content deviates from the test dataset.

Specifically, the script assumes that "CDS" lines are preceded by "gene" lines. This is not always the case in prokaryote annotation: Prodigal 2.6.3 writes no "gene" lines by default, and Prokka 1.14 writes no "gene" lines by default either and, when the --addgenes flag is used, adds them below the "CDS" line.

I fixed this locally for my use case, but do not have a fix for the parser that is usable for the various GFF3 formats.
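
One order-independent way to read such files is to key on the "CDS" rows themselves instead of relying on a preceding "gene" row. The following is a minimal sketch under that assumption, not the logic of FormatAnnotation_external.py; the function name `collect_cds_features` and the returned dictionary layout are hypothetical, and it targets simple prokaryotic annotations only.

```python
def collect_cds_features(gff_lines):
    """Group CDS rows by their ID (or Parent) attribute, ignoring 'gene' rows.

    Hypothetical helper: it does not reproduce SynerClust's own parser.
    """
    features = {}
    for raw in gff_lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more feature rows
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # only CDS rows are used, wherever 'gene' rows appear
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        key = attrs.get("ID") or attrs.get("Parent")
        if key is None:
            continue  # nothing to group this row under
        start, end = int(cols[3]), int(cols[4])
        entry = features.setdefault(
            key,
            {"seqid": cols[0], "strand": cols[6], "start": start, "end": end},
        )
        # Rows sharing a key are merged into one span.
        entry["start"] = min(entry["start"], start)
        entry["end"] = max(entry["end"], end)
    return features
```

Because only "CDS" rows are inspected, the same function accepts files with no "gene" rows and files where the "gene" row comes after the "CDS" row.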

Support for pseudogenes in refseq gff3 files

I had to change line 196 of FormatAnnotation_external.py to:
if line[2] == "gene" or line[2] == "pseudogene":
Some genes were labeled as "pseudogene" instead of "gene", and this was affecting all downstream analysis.
I suggest updating the code to something like this to support GFF3 files with pseudogenes.
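
The same check can be written as a set membership test, which makes it easier to add further feature types later. This is only a sketch of the idea; `GENE_LIKE_TYPES` and `is_gene_like` are hypothetical names that do not appear in the script.

```python
# Feature types treated as genes; hypothetical constant, extend as needed.
GENE_LIKE_TYPES = frozenset(["gene", "pseudogene"])


def is_gene_like(columns):
    """Return True if the GFF3 type column (index 2) names a gene-like feature."""
    return len(columns) > 2 and columns[2] in GENE_LIKE_TYPES
```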

Error building repo_spec

Hi,

I am having trouble with the first steps of SynerClust and cannot figure out what is going wrong. I am trying to run it on a set of 80 genomes (a mix of draft and complete), using a Newick tree built with PhyloPhlan2 as input.

I always get the error message that one of my genomes is present in the tree but absent in repo_spec.

bin/synerclust.py -w wd/ -r wd/paths.txt -t wd/xanthomonadaceae.nwk --run single -n 3

Started
Wrote locus tags to locus_tag_file.txt
reading genome to locus
reading tree
[TREE.NWK]
parsing tree
Error: Genome
Stenotrophomonas_maltophilia_CFBP3035 found in the tree but not in the repo_spec.

I checked the spelling of the names across all the input files multiple times and found nothing wrong.

Here are the log files and the paths file:
locus_tag_file.txt
needed_extractions.cmd.txt
paths.txt
run_SynerClust.log

Could you give me a hand to understand what's wrong here?

Thanks.
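
When names look identical by eye, one thing worth ruling out is invisible characters (line breaks, trailing spaces) in the tree leaf names. The sketch below is a generic diagnostic, not part of SynerClust and not a confirmed cause of this error. It assumes unquoted Newick leaf names, and because the layout of the paths file is not described here, the genome names are passed in as a plain list (`repo_names`, a hypothetical parameter).

```python
import re

# A leaf name follows "(" or "," and runs until ":", ",", ")" or ";".
# Whitespace is kept on purpose so hidden characters stay visible.
LEAF_PATTERN = re.compile(r"[(,]([^():,;]+)")


def leaf_names(newick_text):
    """Return raw leaf names from an unquoted Newick string."""
    return LEAF_PATTERN.findall(newick_text)


def compare_names(newick_text, repo_names):
    """Report tree leaves that are absent from repo_names.

    'whitespace_only' lists leaves that match once surrounding whitespace
    is stripped; 'missing' lists leaves with no match at all.
    """
    repo = set(repo_names)
    stripped_repo = {name.strip() for name in repo}
    report = {"missing": [], "whitespace_only": []}
    for leaf in leaf_names(newick_text):
        if leaf in repo:
            continue
        if leaf.strip() in stripped_repo:
            report["whitespace_only"].append(leaf)
        else:
            report["missing"].append(leaf)
    return report
```

Printing the report entries with `repr()` shows characters such as `\n` or `\r` that a text editor would hide.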

AttributeError: 'Graph' object has no attribute 'edge'

I get this error when starting to run SynerClust. It then fails and exits.
Traceback (most recent call last):
  File "/gsap/garage-bacterial/Users/Tim/SynerClust/bin/synerclust.py", line 197, in <module>
    main()
  File "/gsap/garage-bacterial/Users/Tim/SynerClust/bin/synerclust.py", line 107, in main
    myTree.rootTree(root_edge)
  File "/gsap/garage-bacterial/Users/Tim/SynerClust/bin/TreeLib.py", line 193, in rootTree
    re_weight = (self.tree.edge[root_edge[0]][root_edge[1]]['weight']) / 2.0
AttributeError: 'Graph' object has no attribute 'edge'
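
For context: to my knowledge, the `Graph.edge` attribute belongs to the NetworkX 1.x series and was removed in NetworkX 2.0, so this traceback is what one would expect if `self.tree` is a NetworkX graph and a 2.x release is installed. Subscript access (`graph[u][v]`) is available in both series. The sketch below shows that access pattern on a plain dict-of-dicts stand-in so that it runs without NetworkX; `edge_weight` and the sample data are hypothetical and are not taken from TreeLib.py.

```python
def edge_weight(graph, u, v):
    """Read an edge weight via subscript access instead of the `.edge` attribute."""
    return graph[u][v]["weight"]


# Stand-in adjacency structure (assumption: mimics graph[u][v] -> attribute dict).
adjacency = {
    "A": {"B": {"weight": 4.0}},
    "B": {"A": {"weight": 4.0}},
}
root_edge = ("A", "B")
re_weight = edge_weight(adjacency, root_edge[0], root_edge[1]) / 2.0
```

The alternative is to keep the code unchanged and install a NetworkX 1.x release in the environment.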

WF_RefineClusters.py errors at line 284 and stalls

I got this error message:
Traceback (most recent call last):
  File "/home/unix/tstraub/.conda/envs/synerclust/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/gsap/garage-bacterial/Users/Tim/.conda/envs/synerclust/bin/WF_RefineClusters.py", line 58, in run
    identical_index = next_task(self.mrca, genes_to_cluster, self.cluster_counter, self.lock, ok_trees, identical_orphans_to_check, identical_orphans_to_check_dict, identical_index, potentials, self.minSynFrac, self.synteny)
  File "/gsap/garage-bacterial/Users/Tim/.conda/envs/synerclust/bin/WF_RefineClusters.py", line 284, in __call__
    if pairs[k][1] == 0.0 and self.graph[n1][k]['identical'] == 1 and self.graph[k][n1]['identical'] == 1:
KeyError: 0

I traced it to line 284 and edited the script to check that each key exists before using it to index into the dictionary, i.e.:

if pairs[k][1] == 0.0 and n1 in self.graph and k in self.graph[n1] and self.graph[n1][k]['identical'] == 1 and k in self.graph and n1 in self.graph[k] and self.graph[k][n1]['identical'] == 1:

This seemed to fix the error, though I did not verify that the tool performed the analysis as expected.
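
The guarded condition can be factored into a small helper so the membership checks are written once. This is a sketch of the same workaround, with the same caveat that skipping missing edges may or may not be the intended behaviour; `directed_flag` and `mutually_identical` are hypothetical names, and the graph is assumed to support `in` and `graph[a][b]` returning a dict of edge attributes.

```python
def directed_flag(graph, src, dst, key="identical"):
    """Return graph[src][dst][key], or None if the node, edge or key is absent."""
    if src in graph and dst in graph[src]:
        return graph[src][dst].get(key)
    return None


def mutually_identical(graph, a, b):
    """True only if both a->b and b->a exist and are flagged identical == 1."""
    return directed_flag(graph, a, b) == 1 and directed_flag(graph, b, a) == 1
```

With this helper, the condition on line 284 would read `pairs[k][1] == 0.0 and mutually_identical(self.graph, n1, k)`.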

ClusterPostProcessing

Hello,
I am running SynerClust on the test E. coli example files you provided, but I am having problems compiling the final output files.

The command given in the instructions is:
path/to/SynerClust/bin/ClusterPostProcessing.py genomes/ nodes/N____*****/locus_mappings.pkl n

However, there are only locus_mapping.pkl files in the L_000000_* folders, not in the N____* folders. Is this a typo, or should there be "locus_mapping.pkl" files in the N_* node folders?
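
To see exactly which node folders hold a mapping pickle, and under which of the two spellings, the folders can be listed programmatically. This is a generic sketch (Python 3, standard library only) and not part of SynerClust; `find_mapping_pickles` is a hypothetical name, and the two file names are simply the spellings that appear in this report.

```python
from pathlib import Path


def find_mapping_pickles(nodes_dir, names=("locus_mapping.pkl", "locus_mappings.pkl")):
    """Map each immediate subfolder of nodes_dir to the mapping pickles it contains."""
    found = {}
    for name in names:
        # "*/name" matches the file one level below nodes_dir.
        for path in sorted(Path(nodes_dir).glob("*/" + name)):
            found.setdefault(path.parent.name, []).append(path.name)
    return found
```

Folders that are absent from the returned dictionary contain neither file name.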

Thanks, Blake