
reynoldsk / pysca


A python implementation of the Statistical Coupling Analysis (SCA)

License: BSD 3-Clause "New" or "Revised" License

Jupyter Notebook 81.93% Python 1.47% HTML 15.50% CSS 0.30% JavaScript 0.77% Shell 0.03%

pysca's People

Contributors

jamesmkrieger, olgais93, reynoldsk


pysca's Issues

Using "|" as a delimiter in AnnotPfam is problematic

I have had trouble constructing a dictionary of phylogenetic groups after annotating with the annotate_MSA script, because some annotations contain a vertical bar in the description string. For example:

"DNLJ_ZYMMO|DNA ligase {ECO:0000255|HAMAP-Rule:MF_01588}|Zymomonas mobilis subsp. mobilis (strain ATCC 31821 / ZM4 / CP4)|Bacteria,Proteobacteria,Alphaproteobacteria,Sphingomonadales,Sphingomonadaceae,Zymomonas."

A different delimiter is probably necessary.
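Until the delimiter changes, one possible workaround is to split once from the left and twice from the right, so that any extra bars stay inside the description. This is only a sketch: it assumes every header has exactly four fields (ID, description, species, taxonomy) and that only the description can contain additional bars. The function name split_annot_header is hypothetical and not part of pySCA.

```python
def split_annot_header(header, sep="|"):
    """Split an annotated header into (id, description, species, taxonomy).

    Assumption: four fields, and only the description may contain `sep`.
    """
    # The ID is everything before the first separator.
    seq_id, rest = header.split(sep, 1)
    # Species and taxonomy are the last two fields; whatever remains
    # on the left is the description, embedded bars included.
    description, species, taxonomy = rest.rsplit(sep, 2)
    return seq_id, description, species, taxonomy
```

A header with fewer than four fields raises ValueError on unpacking, which at least makes malformed annotations visible instead of silently shifting fields.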

Error!!! Something wrong with PDBid or path...

I found an issue related to the MSASearch function in scaTools.py. I am new to Python and came up with a naive fix, but the output is not consistent with what the pySCA tutorial describes.

First, I have ggsearch36 installed and can run it from the command line:

$ ggsearch36
USAGE
 ggsearch36 [-options] query_file library_file
 ggsearch36 -help for a complete option list

DESCRIPTION
 GGSEARCH performs a global/global database searches
 version: 36.3.8 Jul, 2015

COMMON OPTIONS (options must preceed query_file library_file)
 -s:  scoring matrix;
 -f:  gap-open penalty;
 -g:  gap-extension penalty;
 -S   filter lowercase (seg) residues;
 -b:  high scores reported (limited by -E by default);
 -d:  number of alignments shown (limited by -E by default);
 -I   interactive mode;

Then, in Python 3, I ran
./scaProcessMSA.py Inputs/s1Ahalabi_1470_nosnakes.an -s 3TGI -c E -t -n
to process the MSA for the S1A family.

It gives the following error output:

Trying MSASearch with ggsearch
Trying MSASearch with EMBOSS
Trying MSASearch with BioPython
Error!!! Something wrong with PDBid or path...

After debugging for days, I found that the issue comes from this line in scaTools.py:
i_0 = [i for i in range(len(hd)) if output.split('\t')[1] in hd[i]]

The problem is that output is of type bytes, which is not compatible with the str headers. I worked around it by changing the line to:
i_0 = [i for i in range(len(hd)) if output.split(b'\t')[1] in bytes(hd[i],'utf-8')]

This works and produces the final MSA output, but with 205 positions for S1A instead of 245.
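An alternative to encoding every header is to decode the program output once and keep the rest of the comparison in str. The sketch below illustrates that idea; the helper names and the tab-separated sample used to exercise it are hypothetical, and it is not the pySCA implementation.

```python
def to_text(raw, encoding="utf-8"):
    """Return `raw` as str, decoding it first if it is bytes."""
    return raw.decode(encoding) if isinstance(raw, bytes) else raw


def find_header_indices(output, headers):
    """Indices of headers that contain the second tab-separated field of `output`.

    Mirrors the list comprehension quoted above, but compares str with str.
    """
    target = to_text(output).split("\t")[1]
    return [i for i, header in enumerate(headers) if target in header]
```

Because to_text passes str through unchanged, the same function works whether the caller hands it bytes (as in Python 3 subprocess output) or text.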

Thank you very much for your help.

TypeError: addition operator on list with integer

test_case.zip

Hello, running the following command (input files attached), I get this error:

Code/./scaProcessMSA.py Inputs/aln-kaic.fasta --refseq ref_seq_KaiC_elongatus.fa -c A --output Outputs/KaiC1_processed.db

Using reference sequence but no position list provided! Just numbering positions 1 to length(sequence)
Traceback (most recent call last):
  File "Code/./scaProcessMSA.py", line 85, in <module>
    options.refpos = range(len(options.refseq))+1
TypeError: can only concatenate list (not "int") to list

I presume you meant options.refpos = range(len(options.refseq)+1). After changing that line, I ran the same command and got the following error:

Using reference sequence but no position list provided! Just numbering positions 1 to length(sequence)
Using the reference sequence and position list...
Loaded alignment of 1194 sequences, 1530 positions.
Checking alignment for non-standard amino acids
Aligment size after removing sequences with non-standard amino acids: 1194
Trimming alignment for highly gapped positions (80% or more).
Alignment size post-trimming: 536 positions
Finding reference sequence using provided sequence file...
Trying MSASearch with ggsearch
Trying MSASearch with EMBOSS
Trying MSASearch with BioPython
Error!!  Can't find reference sequence...

At line 174, I printed h_tmp and s_tmp[0] and got:

(['Elongatus_KaiC'], 'MTSAEMTSPNNNSEHQAIAKMRTMIEGFDDISHGGLPIGRSTLVSGTSGTGKTLFSIQFLYNGIIEFDEPGVFVTFEETPQDIIKNARSFGWDLAKLVDEGKLFILDASPDPEGQEVVGGFDLSALIERINYAIQKYRARRVSIDSVTSVFQQYDASSVVRRELFRLVARLKQIGATTVMTTERIEEYGPIARYGVEEFVSDNVVILRNVLEGERRRRTLEILKLRGTSHMKGEYPFTITDHGINIFPLGAMRLTQRSSNVRVSSGVVRLDEMCGGGFFKDSIILATGATGTGKTLLVSRFVENACANKERAILFAYEESRAQLLRNAYSWGMDFEEMERQNLLKIVCAYPESAGLEDHLQIIKSEINDFKPARIAIDSLSALARGVSNNAFRQFVIGVTGYAKQEEITGLFTNTSDQFMGAHSITDSHISTITDTIILLQYVEIRGEMSRAINVFKMRGSWHDKAIREFMISDKGPDIKDSFRNFERIISGSPTRITVDEKSELSRIVRGVQEKGPES')

Python version:
'2.7.13 |Enthought, Inc. (x86_64)| (default, Mar 2 2017, 08:20:50) \n[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]'

Biopython version:
1.71

Thanks in advance!
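On the first error: range(len(seq) + 1) yields 0 through N, which is N + 1 labels, whereas the log message speaks of numbering positions 1 to length(sequence). A minimal sketch of 1-based numbering that behaves the same on Python 2 and 3 follows. It assumes the argument is the sequence string itself; the function name default_positions is hypothetical, and whether downstream pySCA code expects integer or string labels is not confirmed here.

```python
def default_positions(sequence):
    """Return 1-based position labels, one per residue of `sequence`."""
    # range(1, n + 1) covers 1..n inclusive; list() makes it a real list
    # on Python 3, where range() is lazy and cannot be concatenated.
    return list(range(1, len(sequence) + 1))
```

For a sequence of length N this returns exactly N labels, so the position list stays aligned with the residues.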

pfamseq.txt

The FTP URL provided for downloading pfamseq.txt (ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/database_files/) is no longer active. It would be helpful to know which release of pfamseq was used when annotate_MSA.py was written, as the current release does not seem to work.

scaSectorID.py issue with kpos

I have been having an issue with the scaSectorID.py tool. When I run the command
./scaSectorID.py ./PF00028.db
I get the following error:
Selected kpos=4 significant eigenmodes.
Traceback (most recent call last):
  File "./scaSectorID.py", line 70, in <module>
    ics,icsize,sortedpos,cutoff,scaled_pd, pd = sca.icList(Vpica,kpos,Csca, p_cut=options.cutoff)
  File "/home/dylan/pySca-master/scaTools.py", line 998, in icList
    h_params = np.histogram(Vpica[:,k], nbins)
  File "/home/dylan/anaconda2/lib/python2.7/site-packages/numpy/lib/function_base.py", line 719, in histogram
    'bins must be an integer, a string, or an array')
TypeError: bins must be an integer, a string, or an array

I was wondering if anyone had insight into this error?
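np.histogram typically raises this TypeError when its bins argument is a scalar that is not an integer, for example a float such as 15.0. Whether that is the cause here is an assumption, since the line that computes nbins is not shown. If it is, coercing the value to a plain int before the call should avoid the error. The helper name as_bin_count below is hypothetical and not part of pySCA.

```python
import operator


def as_bin_count(nbins):
    """Return `nbins` as a positive int, rounding floats to the nearest integer."""
    try:
        # Ints and int-like objects pass through unchanged.
        count = operator.index(nbins)
    except TypeError:
        # Floats (e.g. the result of a division or np.round) land here.
        count = int(round(float(nbins)))
    if count < 1:
        raise ValueError("bin count must be at least 1, got %r" % (nbins,))
    return count
```

With this in place, the failing call would become np.histogram(Vpica[:,k], as_bin_count(nbins)).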
