
benhid / pymsa


Scoring multiple sequence alignments with Python

License: MIT License

Python 100.00%
msa python sequence-alignment score sumofpairs gaps entropy fasta

pymsa's Introduction


pyMSA

Scoring Multiple Sequence Alignments with Python


pyMSA is an open-source tool that provides a number of scores for evaluating multiple sequence alignments (MSAs). A tutorial about pyMSA is available in the resources folder of the project.

Features

Score functions implemented:

  • Sum of pairs,
  • Star,
  • Minimum entropy,
  • Percentage of non-gaps,
  • Percentage of totally conserved columns and
  • STRIKE (Single sTRucture Induced Evaluation).
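To make the sum-of-pairs idea concrete, here is a minimal pure-Python sketch: every column contributes the scores of all unordered pairs of characters in it. It is not pyMSA's implementation. pyMSA scores pairs with a substitution matrix such as BLOSUM62, whereas the match, mismatch, gap and gap/gap values below are hypothetical.

```python
from itertools import combinations

GAP = "-"  # assumed gap character


def pair_score(a, b, match=1, mismatch=-1, gap=-2, gap_gap=0):
    """Hypothetical scoring scheme standing in for a substitution matrix."""
    if a == GAP and b == GAP:
        return gap_gap
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def sum_of_pairs(sequences):
    """Sum, over every column, of the scores of all unordered character pairs."""
    total = 0
    for column in zip(*sequences):
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total


alignment = ["AC-T", "A--T", "ACGT"]
print(sum_of_pairs(alignment))  # -1
```

With three sequences each column has three pairs, so the cost grows quadratically with the number of sequences; this is why swapping in a real substitution matrix changes the values but not the structure of the loop.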

Downloading

To install pyMSA from source, clone the Git repository hosted on GitHub and run the setup script from inside the cloned directory:

$ git clone https://github.com/benhid/pyMSA.git
$ cd pyMSA
$ python setup.py install

Alternatively, you can install it with pip:

$ pip install pyMSA

Usage

An example of running all the included scores is located in the example folder.



Authors

Active development team

License

This project is licensed under the terms of the MIT License; see the LICENSE file for details.

pymsa's People

Contributors

ajnebro, benhid


pymsa's Issues

Negative entropy?

I just noticed that your entropy formula is missing the negative sign:

current_entropy += value * math.log(value)

So a high-entropy (i.e. very random) MSA appears as a large negative number, which is confusing; it should be a large positive number.
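With the minus sign restored, each column's Shannon entropy is non-negative: zero for a fully conserved column and largest when every symbol in the column is equally frequent. The sketch below keeps the natural logarithm from the quoted line; it is not pyMSA's code, and treating gaps as an ordinary symbol and summing column entropies over the alignment are assumptions.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy H = -sum(p * ln p) of one column (natural log).

    Gaps are treated as an ordinary symbol here, which is an assumption.
    """
    n = len(column)
    entropy = 0.0
    for count in Counter(column).values():
        p = count / n
        entropy -= p * math.log(p)  # the minus sign makes H >= 0
    return entropy


def alignment_entropy(sequences):
    """Sum of the column entropies of an alignment (hypothetical aggregate)."""
    return sum(column_entropy(col) for col in zip(*sequences))


print(column_entropy("AAAA"))  # 0.0, fully conserved
print(column_entropy("ACGT"))  # ln 4, about 1.386
```

Under this sign convention a "minimum entropy" score still means "lower is better", but the values read naturally as amounts of disorder.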

Simple CLI?

Hi, thanks for this wonderful library!

I'm wondering whether, since we have utilities like read_fasta_file_as_list_of_pairs and run_all_scores (which runs a comprehensive evaluation), we could write a CLI that calls these on an input FASTA alignment (initially without supporting other alignment formats, for simplicity) and makes the scores configurable via flags. It seems there was a benchmark.py that did this (it's alluded to in the PDF), but it must have been deleted.

This would offer a very useful and easy method of evaluating MSAs, which as far as I can tell is a gap in the ecosystem at the moment.
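A minimal sketch of such a CLI, using only the standard library. The FASTA reader, the `conserved` score, the `SCORES` registry and the `--score` flag are hypothetical illustrations, not part of pyMSA; a real tool would call pyMSA's own functions instead. `main()` takes an optional argument list so it can be exercised without a shell.

```python
import argparse

GAP = "-"  # assumed gap character


def read_fasta(path):
    """Hypothetical minimal FASTA reader returning (id, sequence) pairs."""
    pairs, name, chunks = [], None, []
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    pairs.append((name, "".join(chunks)))
                name, chunks = line[1:].strip(), []
            else:
                chunks.append(line)
    if name is not None:
        pairs.append((name, "".join(chunks)))
    return pairs


def conserved(sequences):
    """Percentage of columns whose characters are all identical.

    All-gap columns are not counted as conserved (an assumption).
    """
    columns = list(zip(*sequences))
    hits = sum(1 for col in columns if len(set(col)) == 1 and col[0] != GAP)
    return 100.0 * hits / len(columns)


# Registry of available scores; further scores can be added here.
SCORES = {"conserved": conserved}


def main(argv=None):
    parser = argparse.ArgumentParser(description="Score a FASTA alignment.")
    parser.add_argument("fasta", help="aligned sequences in FASTA format")
    parser.add_argument("--score", action="append", choices=sorted(SCORES),
                        help="score to compute (repeatable; default: all)")
    args = parser.parse_args(argv)

    sequences = [seq for _, seq in read_fasta(args.fasta)]
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1 or 0 in lengths:
        parser.error("expected non-empty aligned sequences of equal length")

    results = {}
    for name in args.score or sorted(SCORES):
        results[name] = SCORES[name](sequences)
        print(f"{name}\t{results[name]:.2f}")
    return results
```

As a script, the module would end with the usual `if __name__ == "__main__": main()` guard and be invoked as, for example, `python score_msa.py aligned.fasta --score conserved` (the file names are placeholders).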

Equality check fails with 'is'; use '==' instead

Issue with NumPy array and character comparison

When working with rather large sequences, we use NumPy arrays to save memory and speed up some manipulations. However, in the check char1 is self.gap_character, char1 can be of type np.str_ while self.gap_character is always a str, so the identity check fails even when both hold the gap character.

The error message is as follows:

  File "c:\Users\Dell\PycharmProjects\MSA_Gym_ENV\MultipleSequenceAlignmentEnv.py", line 92, in calculate_reward
    return SumOfPairs(msa_obj, Blosum62()).compute()
  File "C:\Users\Dell\AppData\Local\Programs\Python\Python311\Lib\site-packages\pymsa\core\score.py", line 40, in compute
    final_score += self.get_column_score(k)
  File "C:\Users\Dell\AppData\Local\Programs\Python\Python311\Lib\site-packages\pymsa\core\score.py", line 126, in get_column_score
    score_of_column += get_score_of_two_chars(self.substitution_matrix, char_a, char_b)
  File "C:\Users\Dell\AppData\Local\Programs\Python\Python311\Lib\site-packages\pymsa\core\score.py", line 27, in get_score_of_two_chars
    return int(substitution_matrix.get_distance(char_a, char_b))
  File "C:\Users\Dell\AppData\Local\Programs\Python\Python311\Lib\site-packages\pymsa\core\substitution_matrix.py", line 30, in get_distance
    raise Exception('The pair ({0},{1}) couldn't be found in the substitution matrix'.format(char1, char2))
Exception: The pair (S,-) couldn't be found in the substitution matrix

The current code (from substitution_matrix.py) is:

if char1 is self.gap_character and char2 is self.gap_character:
    distance = 1
elif char1 is self.gap_character or char2 is self.gap_character:
    distance = self.gap_penalty

A suggested fix is:

if char1 == self.gap_character and char2 == self.gap_character:
    distance = 1
elif char1 == self.gap_character or char2 == self.gap_character:
    distance = self.gap_penalty

This fixes the issue because == compares values, whereas is compares object identity: an np.str_ and a str holding the same character are equal but are never the same object.
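The pitfall can be reproduced in a few lines. The function below is a hypothetical stand-in that mirrors the suggested gap handling; it is not pyMSA's `get_distance`, and the gap penalty of -8 and the one-entry matrix are illustrative values only.

```python
import numpy as np

GAP = "-"  # assumed gap character, as in the traceback above


def gap_aware_distance(char1, char2, matrix, gap_penalty):
    """Hypothetical stand-in for a gap-aware lookup; compares with ==."""
    if char1 == GAP and char2 == GAP:
        return 1
    if char1 == GAP or char2 == GAP:
        return gap_penalty
    return matrix[(char1, char2)]


row = np.array(list("S-"))  # NumPy stores these elements as np.str_
char_a, char_b = row[0], row[1]

print(isinstance(char_b, np.str_))  # True
print(char_b is GAP)                # False: distinct objects
print(char_b == GAP)                # True: equal values
print(gap_aware_distance(char_a, char_b, {("S", "S"): 4}, gap_penalty=-8))  # -8
```

Because np.str_ subclasses str and hashes like it, dictionary lookups such as `matrix[(char1, char2)]` also work with NumPy characters; only the `is` checks break.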

Tutorial Issues

The tutorial PDF is very helpful, I just noticed some issues in it:

  • For the Conserved Columns section, the tutorial shows [image], but I think it should be 1 + 1 (in red and blue) in order to get 2/9.
  • For Non-Gaps, the equation seems to calculate the proportion of gaps, the opposite of what the text says:
    [image]
  • Finally, as mentioned elsewhere, I think the benchmark script no longer exists:
    [image]
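For reference, the percentage of non-gaps is the complement of the gap proportion: (total cells − gaps) / total cells, not gaps / total cells. A minimal sketch, assuming "-" as the gap character; this is an illustration, not the tutorial's or pyMSA's formula verbatim.

```python
GAP = "-"  # assumed gap character


def percentage_of_non_gaps(sequences):
    """Share of alignment cells holding a residue rather than a gap, in percent."""
    total_cells = sum(len(seq) for seq in sequences)
    gaps = sum(seq.count(GAP) for seq in sequences)
    # Non-gaps are the complement of gaps: (total - gaps) / total.
    return 100.0 * (total_cells - gaps) / total_cells


alignment = ["AC-T", "A--T", "ACGT"]
print(percentage_of_non_gaps(alignment))  # 75.0 (9 of 12 cells)
```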
