reynoldsk / pysca

A Python implementation of Statistical Coupling Analysis (SCA)

License: BSD 3-Clause "New" or "Revised" License

Jupyter Notebook 81.93% Python 1.47% HTML 15.50% CSS 0.30% JavaScript 0.77% Shell 0.03%

pysca's Issues

Using "|" as a delimiter in AnnotPfam is problematic

I have had trouble constructing a dictionary of phylogenetic groups after annotation with the annotate_MSA script, because some annotations contain a vertical bar symbol in the description string. For example:

"DNLJ_ZYMMO|DNA ligase {ECO:0000255|HAMAP-Rule:MF_01588}|Zymomonas mobilis subsp. mobilis (strain ATCC 31821 / ZM4 / CP4)|Bacteria,Proteobacteria,Alphaproteobacteria,Sphingomonadales,Sphingomonadaceae,Zymomonas."

A different delimiter is probably necessary.
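
One way to sidestep the ambiguity without changing the delimiter is to split the header from both ends. The sketch below assumes the four-field layout seen in the example header (identifier, description, species, taxonomy) and that only the description can contain extra "|" characters. The function name parse_annotated_header is hypothetical and is not part of pySCA.

```python
def parse_annotated_header(header):
    """Split an annotated header into its four fields.

    Assumes the layout id|description|species|taxonomy, where only the
    description may itself contain '|' (hypothetical helper, not pySCA API).
    """
    # The identifier is everything before the first '|'.
    seq_id, rest = header.split("|", 1)
    # Species and taxonomy are the last two fields, so split from the right.
    description, species, taxonomy = rest.rsplit("|", 2)
    return {
        "id": seq_id,
        "description": description,
        "species": species,
        # Drop the trailing period, then break the lineage into its ranks.
        "taxonomy": taxonomy.rstrip(".").split(","),
    }


header = ("DNLJ_ZYMMO|DNA ligase {ECO:0000255|HAMAP-Rule:MF_01588}|"
          "Zymomonas mobilis subsp. mobilis (strain ATCC 31821 / ZM4 / CP4)|"
          "Bacteria,Proteobacteria,Alphaproteobacteria,Sphingomonadales,"
          "Sphingomonadaceae,Zymomonas.")
fields = parse_annotated_header(header)
print(fields["description"])
print(fields["taxonomy"])
```

A header with fewer than four fields raises ValueError at the unpacking step, which makes malformed annotations easy to spot.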

Error!!! Something wrong with PDBid or path...

I found an issue related to the MSAsearch function in scaTools.py. I am new to Python and came up with only a naive solution, and the output is not consistent with what is described in the pySCA tutorial.

First, I have ggsearch36 installed, so I can run ggsearch36 from the command line:

$ ggsearch36
USAGE
 ggsearch36 [-options] query_file library_file
 ggsearch36 -help for a complete option list

DESCRIPTION
 GGSEARCH performs a global/global database searches
 version: 36.3.8 Jul, 2015

COMMON OPTIONS (options must preceed query_file library_file)
 -s:  scoring matrix;
 -f:  gap-open penalty;
 -g:  gap-extension penalty;
 -S   filter lowercase (seg) residues;
 -b:  high scores reported (limited by -E by default);
 -d:  number of alignments shown (limited by -E by default);
 -I   interactive mode;

Then, in Python 3, I ran
./scaProcessMSA.py Inputs/s1Ahalabi_1470_nosnakes.an -s 3TGI -c E -t -n
to process the MSA for the S1A family.

It gives the following error output:

Trying MSASearch with ggsearch
Trying MSASearch with EMBOSS
Trying MSASearch with BioPython
Error!!! Something wrong with PDBid or path...

After debugging for days I found this issue comes from this line in scaTools.py:
i_0 = [i for i in range(len(hd)) if output.split('\t')[1] in hd[i]]

The problem is that the bytes type is not compatible with str. I worked around it by changing the line to:
i_0 = [i for i in range(len(hd)) if output.split(b'\t')[1] in bytes(hd[i],'utf-8')]
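
An alternative to converting every header to bytes is to decode the search output once and then compare str with str. The sketch below assumes that output holds the raw bytes captured from the external search program and that hd is a list of alignment headers as str, as the workaround implies. The helper name matching_header_indices and the sample values are hypothetical.

```python
def matching_header_indices(output, headers):
    """Return indices of headers containing the hit ID from tab-separated output.

    Hypothetical helper: assumes the hit ID is the second tab-separated field.
    """
    # Under Python 3, captured subprocess output is bytes; decode it once.
    if isinstance(output, bytes):
        output = output.decode("utf-8")
    hit_id = output.split("\t")[1]
    return [i for i, h in enumerate(headers) if hit_id in h]


# Made-up sample data for illustration only.
sample_output = b"query\tseq2\t99.0\n"
sample_headers = ["seq1 some description", "seq2 another description"]
i_0 = matching_header_indices(sample_output, sample_headers)
print(i_0)
```

Because the isinstance check passes str through untouched, the same function also works when the output has already been decoded.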

This works and produces the final MSA outputs, but with 205 positions for S1A instead of 245.

Thank you very much for your help.

TypeError: addition operator on list with integer

test_case.zip

Hello, running the following command (input files attached), I get this error:

Code/./scaProcessMSA.py Inputs/aln-kaic.fasta --refseq ref_seq_KaiC_elongatus.fa -c A --output Outputs/KaiC1_processed.db

Using reference sequence but no position list provided! Just numbering positions 1 to length(sequence)
Traceback (most recent call last):
  File "Code/./scaProcessMSA.py", line 85, in <module>
    options.refpos = range(len(options.refseq))+1
TypeError: can only concatenate list (not "int") to list
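
The failing line adds an integer to the result of range(), which Python allows neither for a list (Python 2) nor for a range object (Python 3). Going by the message printed just before the traceback, the intent appears to be numbering positions from 1 to the sequence length. A minimal sketch of that, using a hypothetical helper name that is not part of pySCA:

```python
def default_positions(refseq):
    """Return 1-based position labels, one per residue (hypothetical helper)."""
    # range(1, n + 1) yields 1..n and behaves the same in Python 2 and 3.
    return list(range(1, len(refseq) + 1))


positions = default_positions("MTSAEM")
print(positions)  # [1, 2, 3, 4, 5, 6]
```

By contrast, range(len(refseq) + 1) starts at 0 and has one more entry than the sequence has residues.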

I presume you meant options.refpos = range(len(options.refseq)+1). After changing that line, I ran the same command and got the following error:

Using reference sequence but no position list provided! Just numbering positions 1 to length(sequence)
Using the reference sequence and position list...
Loaded alignment of 1194 sequences, 1530 positions.
Checking alignment for non-standard amino acids
Aligment size after removing sequences with non-standard amino acids: 1194
Trimming alignment for highly gapped positions (80% or more).
Alignment size post-trimming: 536 positions
Finding reference sequence using provided sequence file...
Trying MSASearch with ggsearch
Trying MSASearch with EMBOSS
Trying MSASearch with BioPython
Error!!  Can't find reference sequence...

At line 174, I printed h_tmp and s_tmp[0] and got:

(['Elongatus_KaiC'], 'MTSAEMTSPNNNSEHQAIAKMRTMIEGFDDISHGGLPIGRSTLVSGTSGTGKTLFSIQFLYNGIIEFDEPGVFVTFEETPQDIIKNARSFGWDLAKLVDEGKLFILDASPDPEGQEVVGGFDLSALIERINYAIQKYRARRVSIDSVTSVFQQYDASSVVRRELFRLVARLKQIGATTVMTTERIEEYGPIARYGVEEFVSDNVVILRNVLEGERRRRTLEILKLRGTSHMKGEYPFTITDHGINIFPLGAMRLTQRSSNVRVSSGVVRLDEMCGGGFFKDSIILATGATGTGKTLLVSRFVENACANKERAILFAYEESRAQLLRNAYSWGMDFEEMERQNLLKIVCAYPESAGLEDHLQIIKSEINDFKPARIAIDSLSALARGVSNNAFRQFVIGVTGYAKQEEITGLFTNTSDQFMGAHSITDSHISTITDTIILLQYVEIRGEMSRAINVFKMRGSWHDKAIREFMISDKGPDIKDSFRNFERIISGSPTRITVDEKSELSRIVRGVQEKGPES')

Python version:
'2.7.13 |Enthought, Inc. (x86_64)| (default, Mar 2 2017, 08:20:50) \n[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]'

Biopython version:
1.71

Thanks in advance!

Can we use SCA for oligomeric proteins?

The main idea of SCA is to use an MSA as its foundation. My question: is it equally reliable for oligomeric proteins?

pfamseq.txt

The FTP URL provided for downloading pfamseq.txt (ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/database_files/) is no longer active. It would be helpful to know which release of pfamseq was used when the code for annotate_MSA.py was written, as the current release does not seem to work.

scaSectorID.py issue with kpos

I have been having an issue with the scaSectorID.py tool. When I run the command:
./scaSectorID.py ./PF00028.db
I receive the error:
Selected kpos=4 significant eigenmodes.
Traceback (most recent call last):
  File "./scaSectorID.py", line 70, in <module>
    ics,icsize,sortedpos,cutoff,scaled_pd, pd = sca.icList(Vpica,kpos,Csca, p_cut=options.cutoff)
  File "/home/dylan/pySca-master/scaTools.py", line 998, in icList
    h_params = np.histogram(Vpica[:,k], nbins)
  File "/home/dylan/anaconda2/lib/python2.7/site-packages/numpy/lib/function_base.py", line 719, in histogram
    'bins must be an integer, a string, or an array')
TypeError: bins must be an integer, a string, or an array

I was wondering if anyone had insight into this error?
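
The message suggests that nbins reaches np.histogram as something other than an int, for example a float such as the one round() returns under Python 2. That is an assumption, since the code that computes nbins is not shown here. Under that assumption, a minimal sketch of coercing the value before use (as_bin_count is a hypothetical helper, not part of pySCA):

```python
def as_bin_count(nbins):
    """Coerce a computed bin count to a positive int (hypothetical helper)."""
    # int(round(...)) gives a true int under both Python 2 and Python 3.
    count = int(round(nbins))
    if count < 1:
        raise ValueError("bin count must be at least 1, got %r" % (nbins,))
    return count


print(as_bin_count(12.0))
```

The coerced value could then be passed on in place of the original, e.g. np.histogram(Vpica[:,k], as_bin_count(nbins)).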
