
mutalyzer / mutalyzer2

HGVS variant nomenclature checker

Home Page: https://mutalyzer.nl

License: Other

Python 83.01% Mako 0.04% HTML 7.25% C# 0.11% PHP 0.12% Ruby 0.10% CSS 0.59% JavaScript 1.65% XSLT 7.14%

mutalyzer2's Introduction

Mutalyzer

Mutalyzer is an HGVS variant nomenclature checker. The canonical Mutalyzer installation can be found at mutalyzer.nl.

Documentation

User documentation can be found on the wiki.

Developer documentation is hosted at Read The Docs.

Submit your bug reports and feature requests here.

If you're interested in running your own Mutalyzer installation, please have a look at our Mutalyzer Ansible role for a completely automated deployment.

Contributing

Contributions to Mutalyzer are very welcome! They can be feature requests, bug reports, bug fixes, unit tests, documentation updates, or anything else you may come up with.

Please refer to the documentation for more information on contributions.

Copyright

Mutalyzer is copyright (c) 2009-2016 by Leiden University Medical Center and contributors (see the AUTHORS.rst file for details).

Mutalyzer is licensed under the GNU Affero General Public License. Please contact the authors if you want to discuss custom licensing.

mutalyzer2's People

Contributors

codeunsolved, gstouten, jfjlaros, martijnvermaat, mihailefter, mkroon1

mutalyzer2's Issues

Position Converter

Original ticket: https://humgenprojects.lumc.nl/trac/mutalyzer/ticket/155
Original date: 2013/11/01
Original reporter: pchalasani AND partners DOT org

I am getting the error "No transcripts found". What does this mean?
Also, one of the descriptions from the Position Converter, when submitted to the Name Checker, gives "Intronic position given for a non-genomic reference sequence". Can you please help us with this? Also, can I get the contact email of someone I can reach with all my questions?

Thanks in advance.

Optional argument of `dup` ignored.

The following description is accepted by the name checker:

NM_002001.2:c.10_12dupGGGGGG

Apparently the optional argument GGGGGG is not matched to the reference sequence. This can be solved by using the strategy for checking the optional argument for deletions.
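
A minimal sketch of such a check, assuming the deletion strategy amounts to comparing the argument with the reference slice it claims to describe. The function name and the toy reference sequence are hypothetical and not taken from the Mutalyzer code base.

```python
def check_dup_argument(reference, start, end, argument):
    """Return True if the optional dup argument equals reference[start..end].

    Positions are 1-based and inclusive, as in HGVS descriptions.
    A missing or empty argument is always accepted.
    """
    if not argument:
        return True
    return reference[start - 1:end] == argument


# Toy reference sequence (hypothetical, not NM_002001.2).
reference = "ATGGCCGGGTTT"
print(check_dup_argument(reference, 7, 9, "GGG"))     # argument matches positions 7-9
print(check_dup_argument(reference, 7, 9, "GGGGGG"))  # wrong length, so rejected
```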

Optionally accept email for batch jobs from webservices

Following up on #101, it would be ideal if users could also specify an email address for the batch job when submitting through the webservices. Care should be taken not to break existing clients, so we should test that everything keeps working when no email address is provided.

Better handling of database connection errors

For example, restarting the PostgreSQL server, I could trigger the following SQLAlchemy exception:

Exception on /description-extractor [POST]
Traceback (most recent call last):
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/mutalyzer/website/views.py", line 739, in description_extractor_submit
    genbank_record = retriever.loadrecord(reference_accession_number)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/mutalyzer/Retriever.py", line 772, in loadrecord
    .filter_by(accession=identifier) \
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2367, in first
    ret = list(self[0:1])
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2228, in __getitem__
    return list(res)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2438, in __iter__
    return self._execute_and_instances(context)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2453, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 729, in execute
    return meth(self, multiparams, params)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 322, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 826, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 958, in _execute_context
    context)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1159, in _handle_dbapi_exception
    exc_info
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 951, in _execute_context
    context)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 436, in do_execute
    cursor.execute(statement, parameters)
OperationalError: (OperationalError) terminating connection due to administrator command
SSL connection has been closed unexpectedly

Instead of resulting in an internal server error, it would probably be better to respond with a helpful error message and status 503 Service Unavailable. Especially for the web services, this would allow clients to handle this in the same way as the maintenance status, also using 503.
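
One way this could look, sketched as plain WSGI middleware using only the standard library. `DatabaseUnavailable` is a hypothetical stand-in for the SQLAlchemy `OperationalError` in the traceback above, and the `Retry-After` value is an arbitrary assumption. In the actual Flask application an error handler registered for that exception would probably be the more natural place.

```python
class DatabaseUnavailable(Exception):
    """Hypothetical stand-in for a database connection error."""


def service_unavailable_middleware(app, retry_after=30):
    """Wrap a WSGI app so database outages yield 503 instead of 500."""
    def wrapped(environ, start_response):
        try:
            return app(environ, start_response)
        except DatabaseUnavailable:
            body = b"The database is temporarily unavailable. Please retry later.\n"
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
                ("Retry-After", str(retry_after)),
            ])
            return [body]
    return wrapped


def failing_app(environ, start_response):
    # Simulates a view whose query fails because the server was restarted.
    raise DatabaseUnavailable("terminating connection due to administrator command")


def call(app):
    """Invoke a WSGI app once and return (status, headers, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app({"REQUEST_METHOD": "POST"}, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call(service_unavailable_middleware(failing_app))
print(status)
```

Note that this only catches errors raised while the view is called, not errors raised later while a response iterator is consumed.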

Gene names in Position Converter batch output.

The website groups the output of the Position Converter by gene, for example:

https://mutalyzer.nl/position-converter?assembly_name_or_alias=GRCh37&description=chr13%3Ag.32972841C%3ET

This feature does not exist for the batch output. Perhaps we can add a column for the gene name.

Explicitly show input on all result pages

Observed situation:

  1. Go to the Name Checker and submit the description AB026906.1:c.274G>T.
  2. Change the description in the input form to AB026906.1:c.274G>A (changed T to A) and resubmit.
  3. Press the back button in your browser.
  4. Observe that you now see the result page for AB026906.1:c.274G>T, but the input form contains AB026906.1:c.274G>A.

This is just how web forms and browsers work, so we cannot really change it. What we can do is make it more obvious what the input was for a result page by explicitly showing it with the results (in addition to populating the input form with it):

[Screenshot, 2015-02-04]

I guess this would apply to most of our interfaces, not just the Name Checker.

Store a copy of our mapping database

Migrated from GitLab #10

Over the years we have picked up an interesting set of transcript mappings, including mappings for transcripts no longer available from major database sources. When setting up a new server (or restoring a crashed one...), we always take care to not start from scratch, but instead use this existing mapping database.

Perhaps we should (and/or):

  • Make our mapping database available as a download
  • Automate creating this download
  • Automate importing this download

Should this be a raw SQL dump? Then it's not database agnostic. Should we invent our own format? Just simple CSV?

NameChecker does not support gene symbols containing a hyphen

Original ticket: https://humgenprojects.lumc.nl/trac/mutalyzer/ticket/158
Original date: 2013/12/12
Original reporter: I DOT F DOT A DOT C DOT Fokkema AND LUMC DOT nl

The NameChecker does not support gene symbols containing a hyphen, and returns an error message that is unrelated to the error.

The variant UD_132464528477(KRTAP2-4_v001):c.100del is rejected with:

Expected ":" (at char 15), (line:1, col:16)
UD_132464528477(KRTAP2-4_v001):c.100del
               ^

However, the error is not near the parenthesis. If you remove the hyphen from the gene name, the error is reduced to:

Gene KRTAP24 not found. Please choose from: KRTAP2-4

Which makes sense. But filling in the correct gene symbol is not supported.
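
To illustrate the idea, here is a small regular-expression sketch in which the gene symbol may contain a hyphen. This is not the Mutalyzer grammar: the pattern and group names are assumptions and cover only simple accession(GENE_selector):variant descriptions.

```python
import re

# Hypothetical pattern: accession, optional (GENE_vNNN) or (GENE_iNNN), then
# the variant. The gene character class includes "-" so KRTAP2-4 is accepted.
REFERENCE = re.compile(
    r"^(?P<accession>[A-Z]+_?\d+(?:\.\d+)?)"
    r"(?:\((?P<gene>[A-Za-z0-9-]+)(?:_(?P<selector>[vi]\d+))?\))?"
    r":(?P<variant>.+)$"
)


def parse_description(description):
    """Return the named parts of a description, or None if it does not match."""
    match = REFERENCE.match(description)
    return match.groupdict() if match else None


parts = parse_description("UD_132464528477(KRTAP2-4_v001):c.100del")
print(parts["gene"], parts["selector"], parts["variant"])
```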

Uncertain phasing is ignored.

A description that contains uncertain phasing (;) is interpreted as a fully phased description. For example, the following:

NM_002001.2:c.[10del(;)12del]

becomes:

NM_002001.2:c.[10del;12del]

While this is one of the possibilities that can be used for effect prediction, the other options (two more for this example) are not explored. Perhaps we should simply skip the effect prediction phase for these types of descriptions.

Document maintenance mode status code 503

While in maintenance mode, Mutalyzer responds with HTTP status code 503. Especially for the web services, this should be documented so clients can handle this appropriately.
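
The documentation could include a client-side example along these lines. It is a sketch only: the retry count and delay are arbitrary assumptions, and `fetch` is any callable that performs the request and raises `urllib.error.HTTPError` on failure.

```python
import time
import urllib.error


def call_with_retry(fetch, retries=3, delay=5.0, sleep=time.sleep):
    """Call fetch(); on HTTP 503, wait and retry up to `retries` more times.

    Any other HTTP error, or a 503 on the final attempt, is re-raised.
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except urllib.error.HTTPError as error:
            if error.code != 503 or attempt == retries:
                raise
            sleep(delay)
```

A client would wrap its request, for example `call_with_retry(lambda: urllib.request.urlopen(url).read())`, where `url` points at the web service being called.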

Import Mapview transcript mappings for ALT_REF_LOCI contigs

Migrated from GitLab #15

For both hg19 and hg38 we have the ALT_REF_LOCI contigs defined as chromosomes in our database (see GitLab !20), but we ignore their mappings in the Mapview file on import, making them useless.

Leaving the issue of other random and unplaced contigs aside for now (see #5), we should at least use the mappings for the ALT_REF_LOCI.

It seems NCBI Mapview mappings on ALT_REF_LOCI contigs are described as on CHROMOSOME_NAME|ALT_REF_LOCI_ACCESSION, for example:

13|NT_187592.1
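
Splitting such a field on import could be as simple as the sketch below. The function name is hypothetical, and it assumes that mappings on the primary assembly carry only the chromosome name without a `|` separator.

```python
def parse_mapview_chromosome(field):
    """Split a Mapview chromosome field into (chromosome, alt_accession).

    alt_accession is None when the field has no "|" separator.
    """
    chromosome, separator, accession = field.partition("|")
    return chromosome, (accession if separator else None)


print(parse_mapview_chromosome("13|NT_187592.1"))  # ('13', 'NT_187592.1')
print(parse_mapview_chromosome("13"))              # ('13', None)
```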

Webservice rate limiting

Migrated from GitLab #27

Implementing some sort of rate limiting for the webservices should be quite easy, especially since we already use Redis for stat counters.
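
As a rough illustration, a fixed-window counter per client would be enough for a first version. The sketch below keeps the counters in a plain dictionary so it runs without a Redis server. With Redis, the same logic would presumably map onto an `INCR` on a per-client, per-window key plus an `EXPIRE`. The class name, limit, and window length are all assumptions.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` calls per client in each `window`-second window."""

    def __init__(self, limit, window, clock=time.time):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.counters = {}  # (client, window index) -> number of calls

    def allow(self, client):
        window_index = int(self.clock() // self.window)
        # Drop counters from earlier windows (Redis would expire keys instead).
        self.counters = {
            key: count for key, count in self.counters.items()
            if key[1] == window_index
        }
        key = (client, window_index)
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key] <= self.limit


limiter = FixedWindowRateLimiter(limit=2, window=60)
print([limiter.allow("203.0.113.5") for _ in range(3)])
```

A fixed window allows short bursts around window boundaries. If that matters, a sliding window or token bucket would be the next step.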

Description extraction interface suggestion.

Suggestion by Ken Doig:

It would also be great if the API allowed supply of a genomic position offset and chromosome as well as the ref/alt sequences and then the returned variant would be 'ready to go', already in genomic coords.

In-frame insertion before stop codon is not a p. EXT

Original ticket: https://humgenprojects.lumc.nl/trac/mutalyzer/ticket/159
Original date: 2014/01/02
Original reporter: martijn

Reported by Peter Robinson, forwarded by Johan:

> This variant:
>   NM_001005495.1:c.949_954dupGAAAAG
> gets called as an extension:
>
> NM_001005495.1(OR2T3_i001):p.(*319Gluext*2)
>
> however, the stop codon is the codon just after GAAAAG*TAG*
>
> Therefore, the effect on the protein is p.E317_K318dup, and the stop codon remains in frame. Or is there a reason why this needs to be called EXT?

Correct, I assume the definition of an extension in Mutalyzer differs from that according to HGVS. We will put it on the list. The correct description is p.(Glu317_Lys318dup).

Genes without transcripts can be ignored.

When running the following example through the name checker, the user is asked to select a transcript, but the list of available transcripts is empty.

NG_012232.1(NPM1P8):c.31+3A>C 

Broken link in batch notification from webservices

If a batch job was submitted using the webservices, the notification email will not contain a link to the results, just None. This is because BatchJob.download_url is not set when submitting from the webservices, which is because we don't know the absolute path to the website.

More informative error messages.

Checking the variant NM_004006.2:c.31+3A>C results in the following error message:

"Intronic position given for a non-genomic reference sequence."

More informative would be:

"This description cannot be verified since the variant reported is not present in the reference sequence given. NM_ records do not contain intron sequences."

Checking the variant LRG_199t2:c.31+3A>C results in the following error message:

"Multiple transcripts found for gene DMD. Please choose from: 1"

More informative would be:

"The reference sequence contains only 1 transcript, t2 is not defined"

Missing fields in description extractor output.

When the description extractor is called with the following arguments:

Raw sequence: ATGATGATCAGATACAGTGTGATACAGGTAGTTAGACAA
RefSeq accession number: NM_004006.2

The output shows an inversion where the deleted field is filled, but the inserted field is empty.

Add offsets to the description extractor output.

It would be useful if one could supply a reference offset, together with the reference and observed bases. As it is now, people have to unpack the description and reconstruct an HGVS description themselves.

Perhaps it would be an idea to make a separate webservice for arithmetic operations instead of adding this to the extractor.
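
A naive sketch of what such an arithmetic operation might do, assuming forward-strand coordinates and that `offset` is the number of bases preceding the supplied fragment. The function names are hypothetical, and the regular expression would also shift any other numbers in the description (such as repeat counts), so this is illustrative only.

```python
import re


def apply_offset(raw_variant, offset):
    """Add `offset` to every number in a raw variant such as '5_7delinsAT'."""
    return re.sub(r"\d+", lambda m: str(int(m.group()) + offset), raw_variant)


def to_genomic(raw_variant, accession, offset):
    """Prefix the shifted variant with a reference accession and 'g.'."""
    return "{}:g.{}".format(accession, apply_offset(raw_variant, offset))


print(to_genomic("5_7delinsAT", "NC_000013.10", 1000))
# NC_000013.10:g.1005_1007delinsAT
```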

Mutalyzer should normalise the same variant in different HGVS formats

Original ticket: https://humgenprojects.lumc.nl/trac/mutalyzer/ticket/156
Original date: 2013/11/06
Original reporter: ken DOT doig AND petermac DOT org

The following two variants are identical but are treated as two distinct variants by Mutalyzer. It would be great if 'dup's could be converted to 'ins' and positioned at the 3' end of a homopolymer run. This would make these two variants the same and facilitate matching of variants.

NM_000059.3:c.681+2dup NC_000013.10:g.32903631dup
NM_000059.3:c.681+1_681+2insT NC_000013.10:g.32903630_32903631insT
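
The matching described above can be sketched by rewriting a duplication as an insertion and rolling every insertion to its 3'-most equivalent position. The function names and the toy reference are hypothetical, and positions are plain 1-based coordinates on the reference rather than c. positions.

```python
def shift_insertion_3prime(reference, position, inserted):
    """Roll an insertion to its 3'-most equivalent position.

    `position` is the 1-based position of the base before the insertion,
    so the variant reads position_(position+1)ins<inserted>.
    """
    while position < len(reference) and reference[position] == inserted[0]:
        # The next reference base equals the first inserted base, so the
        # insertion can move one base to the right without changing the result.
        inserted = inserted[1:] + inserted[0]
        position += 1
    return position, inserted


def dup_as_insertion(reference, start, end):
    """Express start_end dup as an insertion directly after the duplicated range."""
    return end, reference[start - 1:end]


reference = "ACGTTTTG"  # toy sequence with a T homopolymer at positions 4-7
from_dup = shift_insertion_3prime(reference, *dup_as_insertion(reference, 5, 5))
from_ins = shift_insertion_3prime(reference, 4, "T")
print(from_dup == from_ins)  # both normalise to the same (position, inserted) pair
```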

e-mail suggestions trigger.

The e-mail suggestion script is now triggered when another element is selected. As a side effect, the buttons beneath the e-mail field move if there is a suggestion.

Perhaps it is possible to reserve some space for the suggestion.

Downloadable version of Mutalyzer

Original ticket: https://humgenprojects.lumc.nl/trac/mutalyzer/ticket/157
Original date: 2013/12/06
Original reporter: pchalasani AND partners DOT org

We are planning to use Mutalyzer for analysis of patient gene information for their treatment. Even though we have never had issues with the web version of Mutalyzer, we would be more comfortable using a local version for reliability and availability. I would like to know if this is a possible option.
Please let me know if you need more details.

Thanks,
Poornima.

Missing chromosomes

Migrated from GitLab #7

Apparently we always missed a few chromosome definitions used by the transcript mappings. They are:

hg18
chr13_random
chr2_random
chr3_random
chr19_random
chr4_random
chr7_random
chr9_random
chr8_random
chr5_h2_hap1
chrX_random
chr21_random
chr16_random
chr15_random
chr17_random
chr6_random
chr5_random
chr22_random
chr6_qbl_hap2
chr1_random

hg19
chr4_gl000193_random
chrUn_gl000228
chr4_gl000194_random
chrUn_gl000220
chrUn_gl000222
chr19_gl000209_random
chrUn_gl000211
chrUn_gl000212
chr17_gl000205_random
chrUn_gl000219
chrUn_gl000218
chr7_gl000195_random

And for GRCh38 we don't have any chromosomes defined yet besides the standard ones, see merge request !20.

Failing test.

The test TestScheduler.test_xlsx_file fails.

================================================================================================================================ FAILURES ================================================================================================================================
______________________________________________________________________________________________________________________ TestScheduler.test_xlsx_file ______________________________________________________________________________________________________________________

self = <test_scheduler.TestScheduler object at 0x7f58906ba290>

def test_xlsx_file(self):
    """
        Office Open XML Spreadsheet input for batch job.
        """
    path = os.path.join(os.path.dirname(os.path.realpath(__file__)),
                        'data',
                        'batch_input.xlsx')
    batch_file = open(path, 'rb')
    expected = [['AB026906.1:c.274G>T',
                 'OK'],
                ['AL449423.14(CDKN2A_v002):c.5_400del',
                 'OK']]
>   self._batch_job(batch_file, expected, 'syntax-checker')

tests/test_scheduler.py:327:


tests/test_scheduler.py:39: in _batch_job
job_type, argument=argument)


self = <mutalyzer.Scheduler.Scheduler instance at 0x7f5890661f38>, email = '[email protected]', queue = None, columns = 1, job_type = 'syntax-checker', argument = None, create_download_url = None

def addJob(self, email, queue, columns, job_type, argument=None,
           create_download_url=None):
    """
        Add a job to the Database and start the BatchChecker.

        @arg email:         e-mail address of batch supplier
        @type email:        unicode
        @arg queue:         A list of jobs
        @type queue:        list
        @arg columns:       The number of columns.
        @type columns:      int
        @arg job_type:       The type of Batch Job that should be run
        @type job_type:
        @arg argument:          Batch Arguments, for now only build info
        @type argument:
        @arg create_download_url: Function accepting a result_id and returning
                                  the URL for downloading the batch job
                                  result. Can be None.
        @type create_download_url: function

        @return: result_id
        @rtype:
        """
    # Add jobs to the database
    batch_job = BatchJob(job_type, email=email, argument=argument)
    if create_download_url:
        batch_job.download_url = create_download_url(batch_job.result_id)
    session.add(batch_job)
>   for i, inputl in enumerate(queue):
        # NOTE:
        # This is a very dirty way to skip entries before they are fed
        # to the batch processes. This is needed for e.g. an empty line
        # or because the File Module noticed wrong formatting. These lines
        # used to be discarded but are now preserved by the escape string.
        # The benefit of this is that the users input will match the
        # output in terms of input line and outputline.
        if inputl.startswith("~!"): #Dirty Escape

E TypeError: 'NoneType' object is not iterable

mutalyzer/Scheduler.py:752: TypeError
================================================================================================================= 1 failed, 428 passed in 43.06 seconds ==================================================================================================================

Links from position converter to name checker and vice versa.

It would be nice if the name checker results are linked to the position converter and vice versa.

On the position converter page this can be realised by making the results clickable, on the name checker page we would need a separate button or link.

(Request by Johan den Dunnen).
