mutalyzer / mutalyzer2

HGVS variant nomenclature checker

Home Page: https://mutalyzer.nl

License: Other

Python 83.01% Mako 0.04% HTML 7.25% C# 0.11% PHP 0.12% Ruby 0.10% CSS 0.59% JavaScript 1.65% XSLT 7.14%

mutalyzer2's Introduction

Mutalyzer

Mutalyzer is an HGVS variant nomenclature checker. The canonical Mutalyzer installation can be found at mutalyzer.nl.

Documentation

User documentation can be found on the wiki.

Developer documentation is hosted at Read The Docs.

Submit your bug reports and feature requests here.

If you're interested in running your own Mutalyzer installation, please have a look at our Mutalyzer Ansible role for a completely automated deployment.

Contributing

Contributions to Mutalyzer are very welcome! They can be feature requests, bug reports, bug fixes, unit tests, documentation updates, or anything else you may come up with.

Please refer to the documentation for more information on contributions.

Copyright

Mutalyzer is licensed under the GNU Affero General Public License. Please contact the authors if you want to discuss custom licensing.

mutalyzer2's Issues

Position Converter

Original ticket: https://humgenprojects.lumc.nl/trac/mutalyzer/ticket/155
Original date: 2013/11/01
Original reporter: pchalasani AND partners DOT org

We are getting the error "No transcripts found". What does this mean?
Also, one of the results from the Position Converter, when submitted to the Name Checker, gives "Intronic position given for a non-genomic reference sequence". Can you please help us with this? Also, could I get the email address of someone I can contact with my questions?

Thanks in advance.

Transcript mappings table needs another index

Migrated from GitLab #31

Didn't really investigate yet, but PostgreSQL stats indicate that we are missing an index on the transcript mappings table.

Optional argument of `dup` ignored.

The following description is accepted by the name checker:

NM_002001.2:c.10_12dupGGGGGG

Apparently the optional argument GGGGGG is not matched to the reference sequence. This can be solved by using the strategy for checking the optional argument for deletions.
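A minimal sketch of the check described above, assuming the reference sequence is available as a plain string and positions are 1-based and inclusive. The function name and the toy sequence are made up for illustration; this is not the name checker's actual code.

```python
def check_dup_argument(reference, start, end, argument):
    """Return True if `argument` equals the duplicated reference range.

    `start` and `end` are 1-based inclusive positions, as in HGVS.
    """
    if not argument:
        # The argument is optional, so there is nothing to verify.
        return True
    return reference[start - 1:end] == argument


# Toy reference, not the real NM_002001.2 sequence.
toy_reference = "ATGGCTCCTGCCATGGAA"
print(check_dup_argument(toy_reference, 10, 12, "GCC"))     # argument matches the range
print(check_dup_argument(toy_reference, 10, 12, "GGGGGG"))  # argument does not match
```

A mismatch could then be reported the same way as a mismatching argument of a deletion.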

Optionally accept email for batch jobs from webservices

Following up on #101, it would be best if users could specify an email address for the batch job, also when submitting using the webservices. Care should be taken to not break existing clients, so we should test that everything keeps working without providing an email address.

Change .html extension for Jinja2 templates

For example to .jinja2, so that editors recognize them more easily as Jinja2 templates.

Better handling of database connection errors

For example, by restarting the PostgreSQL server, I could trigger the following SQLAlchemy exception:

Exception on /description-extractor [POST]
Traceback (most recent call last):
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/mutalyzer/website/views.py", line 739, in description_extractor_submit
    genbank_record = retriever.loadrecord(reference_accession_number)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/mutalyzer/Retriever.py", line 772, in loadrecord
    .filter_by(accession=identifier) \
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2367, in first
    ret = list(self[0:1])
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2228, in __getitem__
    return list(res)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2438, in __iter__
    return self._execute_and_instances(context)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2453, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 729, in execute
    return meth(self, multiparams, params)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 322, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 826, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 958, in _execute_context
    context)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1159, in _handle_dbapi_exception
    exc_info
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 951, in _execute_context
    context)
  File "/opt/mutalyzer/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 436, in do_execute
    cursor.execute(statement, parameters)
OperationalError: (OperationalError) terminating connection due to administrator command
SSL connection has been closed unexpectedly

Instead of resulting in an internal server error, it would probably be better to respond with a helpful error message and status 503 Service Unavailable. Especially for the web services, this would allow clients to handle this in the same way as the maintenance status, also using 503.
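The idea can be sketched without any framework as a decorator that converts a connection error into a 503 response. Everything here is an assumption for illustration: `DatabaseUnavailable` stands in for the driver error (in the traceback above that would be SQLAlchemy's `OperationalError`), and the `(body, status, headers)` tuple mimics what a Flask view may return.

```python
import functools


class DatabaseUnavailable(Exception):
    """Hypothetical stand-in for a driver-level connection error."""


def service_unavailable_on(exc_type, retry_after=60):
    """Turn `exc_type` raised by a view into a 503 response tuple."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(*args, **kwargs):
            try:
                return view(*args, **kwargs)
            except exc_type:
                body = ("The database is temporarily unavailable, "
                        "please try again later.")
                return body, 503, {"Retry-After": str(retry_after)}
        return wrapper
    return decorator


@service_unavailable_on(DatabaseUnavailable)
def description_extractor_submit():
    # Simulates the lost connection from the traceback above.
    raise DatabaseUnavailable(
        "terminating connection due to administrator command")


print(description_extractor_submit())
```

In a real Flask application a single application-wide error handler would probably be preferable to decorating each view.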

Gene names in Position Converter batch output.

The website groups the output of the Position Converter by gene, e.g.,:

https://mutalyzer.nl/position-converter?assembly_name_or_alias=GRCh37&description=chr13%3Ag.32972841C%3ET

This feature does not exist for the batch output; perhaps we can add a column for the gene name.

Explicitly show input on all result pages

Observed situation:

Go to the Name Checker and submit the description AB026906.1:c.274G>T.
Change the description in the input form to AB026906.1:c.274G>A (changed T to A) and resubmit.
Press the back button in your browser.
Observe that you now see the result page for AB026906.1:c.274G>T, but the input form contains AB026906.1:c.274G>A.

This is just how web forms and browsers work, so we cannot really change it. What we can do is make it more obvious what the input was for a result page by explicitly showing it with the results (in addition to populating the input form with it).

I guess this would apply to most of our interfaces, not just the Name Checker.

Example input links are broken

Example input links are broken in most of the forms, probably due to the latest JavaScript updates.

Store a copy of our mapping database

Migrated from GitLab #10

Over the years we have picked up an interesting set of transcript mappings, including mappings for transcripts no longer available from major database sources. When setting up a new server (or restoring a crashed one...), we always take care to not start from scratch, but instead use this existing mapping database.

Perhaps we should (and/or):

Make our mapping database available as a download
Automate creating this download
Automate importing this download

Should this be a raw SQL dump? Then it's not database agnostic. Should we invent our own format? Just simple CSV?
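To make the CSV option concrete, here is a rough sketch of a database-agnostic round trip using only the standard library. The column names and the example row are invented; the real transcript mappings table has a different and larger set of fields.

```python
import csv
import io

# Hypothetical column set; the real table has more fields.
FIELDS = ["accession", "version", "chromosome", "start", "stop"]


def export_mappings(rows, handle):
    """Write mapping dictionaries to `handle` as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def import_mappings(handle):
    """Yield mapping dictionaries from CSV, restoring integer columns."""
    for row in csv.DictReader(handle):
        for name in ("version", "start", "stop"):
            row[name] = int(row[name])
        yield dict(row)


# Made-up example row, only to show the round trip.
rows = [{"accession": "NM_000000", "version": 1, "chromosome": "chr1",
         "start": 100, "stop": 200}]
buffer = io.StringIO()
export_mappings(rows, buffer)
buffer.seek(0)
restored = list(import_mappings(buffer))
print(restored == rows)
```

The cost of CSV is that types and nested data such as exon lists need an agreed encoding, which a raw SQL dump would preserve for free.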

NameChecker does not support gene symbols containing a hyphen

Original ticket: https://humgenprojects.lumc.nl/trac/mutalyzer/ticket/158
Original date: 2013/12/12
Original reporter: I DOT F DOT A DOT C DOT Fokkema AND LUMC DOT nl

The NameChecker does not support gene symbols containing a hyphen, and returns an error message that is unrelated to the error.

Variant UD_132464528477(KRTAP2-4_v001):c.100del is rejected with:

Expected ":" (at char 15), (line:1, col:16)
UD_132464528477(KRTAP2-4_v001):c.100del
               ^

However, the error is not near the parenthesis. If you remove the hyphen from the gene name, the error is reduced to:

Gene KRTAP24 not found. Please choose from: KRTAP2-4

Which makes sense. But filling in the correct gene symbol is not supported.
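As an illustration of what accepting the hyphen would involve, the following regular expression allows hyphens inside the gene symbol. It is only a stand-in under an assumed input shape, ACCESSION(GENE_vNNN):VARIANT; it is not the grammar the NameChecker actually uses.

```python
import re

# Assumed shape: ACCESSION(GENE_vNNN):VARIANT, with hyphens allowed in GENE.
REFERENCE = re.compile(
    r"^(?P<accession>[A-Z]+_?[0-9]+(?:\.[0-9]+)?)"
    r"\((?P<gene>[A-Za-z0-9-]+)_(?P<selector>[vi][0-9]{3})\)"
    r":(?P<variant>.+)$"
)

match = REFERENCE.match("UD_132464528477(KRTAP2-4_v001):c.100del")
print(match.group("gene"), match.group("selector"), match.group("variant"))
```

The underscore is deliberately left out of the gene character class, so the transcript selector is still split off unambiguously.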

Uncertain phasing is ignored.

A description that contains uncertain phasing (;) is interpreted as a fully phased description. For example, the following:

NM_002001.2:c.[10del(;)12del]

becomes:

NM_002001.2:c.[10del;12del]

While this is one of the possibilities that can be used for effect prediction, the other options (two more for this example) are not explored. Perhaps we should simply skip the effect prediction phase for these types of descriptions.

Use NCBI provided GFF files as transcript mapping source?

Migrated from GitLab #6

Instead of the mapview files, perhaps we should use the more standard GFF files as provided here?

ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/GFF/

I haven't yet checked if they contain the same information.

Interpret transl_except aa:TERM annotation

Several transcripts in NC_012920.1 (rCRS) are annotated with an incomplete CDS. This CDS is completed by the addition of a 3' polyA tail to the mRNA.

This is annotated in the GenBank file as a note and with the transl_except tag. Mutalyzer should interpret this tag in order to correctly construct the complete CDS.

http://www.ncbi.nlm.nih.gov/nuccore/251831106
https://mutalyzer.nl/name-checker?description=NC_012920.1%28ND1_v001%29%3Ac.232del

Make source code repository more discoverable from mutalyzer.nl

Dumping remote database (mutalyzer.nl) into my own local

Hi,

Is there any way I can replicate the mutalyzer.nl database on my local machine?

Regards,

Typo in name checker output

Protein codedicted from variant coding sequence

Drop BatchJob.download_url column

We did not drop this column in #111, but we should do this at some later point (waiting at least one release).

Document maintenance mode status code 503

While in maintenance mode, Mutalyzer responds with HTTP status code 503. Especially for the web services, this should be documented so clients can handle this appropriately.
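For the documentation, a client-side sketch of "handling this appropriately" might look as follows. The helper name, retry count and delay are assumptions; `fetch` is any callable that performs the request and raises `urllib.error.HTTPError` on an HTTP error status, as `urllib.request.urlopen` does.

```python
import time
import urllib.error


def call_with_retry(fetch, retries=3, delay=1.0):
    """Call `fetch()` and retry while the server answers with 503."""
    for attempt in range(retries + 1):
        try:
            return fetch()
        except urllib.error.HTTPError as error:
            # Only 503 is treated as temporary; give up after `retries`.
            if error.code != 503 or attempt == retries:
                raise
            time.sleep(delay)
```

A client could wrap its request as `call_with_retry(lambda: urllib.request.urlopen(url).read())`, where `url` is whichever web service endpoint it uses.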

Import Mapview transcript mappings for ALT_REF_LOCI contigs

Migrated from GitLab #15

For both hg19 and hg38 we have the ALT_REF_LOCI contigs defined as chromosomes in our database (see GitLab !20), but we ignore their mappings in the Mapview file on import, making them useless.

Leaving the issue of other random and unplaced contigs aside for now (see #5), we should at least use the mappings for the ALT_REF_LOCI.

It seems NCBI Mapview mappings on ALT_REF_LOCI contigs are described as on CHROMOSOME_NAME|ALT_REF_LOCI_ACCESSION, for example:

13|NT_187592.1
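Assuming the field really has the CHROMOSOME_NAME|ALT_REF_LOCI_ACCESSION form shown above, splitting it during import could be as small as this (the function name is made up):

```python
def parse_mapview_chromosome(field):
    """Split a Mapview chromosome field into (chromosome, contig accession).

    The accession is None when the field has no "|" separator.
    """
    chromosome, separator, accession = field.partition("|")
    return chromosome, (accession if separator else None)


print(parse_mapview_chromosome("13|NT_187592.1"))
print(parse_mapview_chromosome("13"))
```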

Webservice rate limiting

Migrated from GitLab #27

Implementing some sort of rate limiting for the webservices should be quite easy, especially since we already use Redis for stat counters.
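One simple scheme is a fixed-window counter per client. The sketch below keeps the counters in a dictionary so that it runs on its own; with Redis the same logic would be an increment on a per-window key plus an expiry. The class name, limits and client identifier are all assumptions.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` calls per client in each `window` seconds."""

    def __init__(self, limit, window, clock=time.time):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.counters = {}  # In-memory stand-in for Redis keys.

    def allow(self, client):
        # One counter per client per time window, comparable to a Redis
        # key that is incremented and expires after `window` seconds.
        key = (client, int(self.clock() // self.window))
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key] <= self.limit


limiter = FixedWindowLimiter(limit=2, window=60)
print([limiter.allow("198.51.100.7") for _ in range(3)])
```

Unlike Redis keys with an expiry, the dictionary here never forgets old windows, so this stand-in is only suitable for demonstration.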

Description extraction interface suggestion.

Suggestion by Ken Doig:

It would also be great if the API allowed supply of a genomic position offset and chromosome as well as the ref/alt sequences and then the returned variant would be 'ready to go', already in genomic coords.

Turn off internet during unit tests?

We sometimes forget to include fixtures or even miss bugs in caching because of availability of internet access during unit tests. Would it be a good idea to force disable that?

https://github.com/astropy/astropy/blob/master/astropy/tests/disable_internet.py
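A much smaller variant of the same idea, shown here only as a sketch: temporarily replace `socket.socket` so that any attempt to open a connection fails loudly. The context manager name is made up, and code that imported the `socket` class by name before the patch is not affected.

```python
import contextlib
import socket


@contextlib.contextmanager
def no_internet():
    """Make any attempt to create a socket fail inside the block."""
    original = socket.socket

    def guarded(*args, **kwargs):
        raise RuntimeError("Network access is disabled during unit tests")

    socket.socket = guarded
    try:
        yield
    finally:
        # Always restore the real class, even if the test failed.
        socket.socket = original
```

Wrapping the test run in `with no_internet():` would turn a missing fixture into an immediate error instead of a silent download.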

In-frame insertion before stop codon is not a p. EXT

Original ticket: https://humgenprojects.lumc.nl/trac/mutalyzer/ticket/159
Original date: 2014/01/02
Original reporter: martijn

Reported by Peter Robinson, forwarded by Johan:

> This variant:
>   NM_001005495.1:c.949_954dupGAAAAG
> gets called as an extension:
>
> NM_001005495.1(OR2T3_i001):p.(*319Gluext*2)
>
> however, the start codon is the codon just after GAAAAG*TAG*
>
> Therefore, the affect on the protein is p.E317_K318dup, and the stop codon remains in frame. Or is there a reason why this needs to be called EXT?

Correct, I assume the definition of an extension in Mutalyzer differs
from that according to HGVS. We will put it on the list, the correct
description is p.(Glu317_Lys318dup).

Use the `logging` module for logging.

The title is quite self-explanatory.

Genes without transcripts can be ignored.

When running the following example through the name checker, the user is asked to select a transcript, but the list of available transcripts is empty.

NG_012232.1(NPM1P8):c.31+3A>C

Refactor logging

Migrated from GitLab #20

Probably use standard logging module (perhaps with rotating functionality). Race conditions on the log file are probably a problem in the current setup.

Instead of rotating within the logging module, we could also use logrotate: http://serverfault.com/questions/55610/logrotate-and-open-files
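Both options can be sketched with the standard library. The logger name, size limit and format string below are arbitrary choices for illustration. Note that the Python documentation does not support several processes writing to one file through `RotatingFileHandler`, which matters given the race conditions mentioned above.

```python
import logging
import logging.handlers


def configure_logging(path, rotate=True):
    """Return a logger writing to `path` (hypothetical name 'mutalyzer')."""
    logger = logging.getLogger("mutalyzer")
    logger.setLevel(logging.INFO)
    if rotate:
        # Rotation handled in-process: 5 backups of at most 1 MB each.
        handler = logging.handlers.RotatingFileHandler(
            path, maxBytes=1024 * 1024, backupCount=5)
    else:
        # Rotation left to logrotate: reopen the file when it is replaced.
        handler = logging.handlers.WatchedFileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling this function more than once adds a handler each time, so an application would call it once at startup.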

Broken link in batch notification from webservices

If a batch job was submitted using the webservices, the notification email will not contain a link to the results, just None. This is because BatchJob.download_url is not set when submitting from the webservices, which is because we don't know the absolute path to the website.

Add hg38 notice to SNP converter

Migrated from GitLab #25

Something like

The SNP converter always uses the current dbSNP build (for human, that's currently GRCh38/hg38).

More informative error messages.

Checking the variant NM_004006.2:c.31+3A>C results in the following error message:

"Intronic position given for a non-genomic reference sequence."

More informative would be:

"This description can not be verified since the variant reported is not present in the reference sequence given. NM_ records do not contain intron sequences".

Checking the variant LRG_199t2:c.31+3A>C results in the following error message:

"Multiple transcripts found for gene DMD. Please choose from: 1"

More informative would be:

"The reference sequence contains only 1 transcript, t2 is not defined"

Batch checker crashes on non-text input.

When uploading the file referred to below to the Position Converter Batch Checker, the server returns an "Internal Server Error".

https://barmsijs.lumc.nl/jeroen/batch_job_crash_selection.txt
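A defensive check before parsing could reject such uploads with a clear message. This is only a sketch: the function name is invented, UTF-8 is an assumed encoding, and the contents of the linked file are not known here.

```python
def decode_batch_input(data):
    """Decode uploaded bytes as UTF-8 text or raise ValueError."""
    if b"\x00" in data:
        # NUL bytes do not occur in ordinary text files.
        raise ValueError("Batch input looks like binary data, not text")
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        raise ValueError("Batch input is not valid UTF-8 text")


print(decode_batch_input(b"AB026906.1:c.274G>T\n"))
```

The caller can then turn the `ValueError` into a normal error message for the user instead of an internal server error.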

Look into using mailcheck for batch jobs

https://github.com/mailcheck/mailcheck

Incorrect reporting of `ext*?` on protein-level

This example should yield p.(*143Gluext*31), not p.(*143Gluext*?).

Bug seems to have been introduced with #65 which broke the examples in the util.in_frame_description docstring due to the same problem.

Up maximum variant description length in batch input

Migrated from GitLab #36

Currently we use a VARCHAR(200) database field, but we might want to extend that. From our paper, 1MB would be a sensible upper bound (but I'm not sure we want to allow that).

Missing fields in description extractor output.

When the description extractor is called with the following arguments:

Raw sequence: ATGATGATCAGATACAGTGTGATACAGGTAGTTAGACAA
RefSeq accession number: NM_004006.2

The output shows an inversion where the deleted field is filled, but the inserted field is empty.

Don't use n. on protein-coding transcripts

For example, instead of NM_181690.2:n.798A>G we could just show NM_181690.2:798A>G.

Add offsets to the description extractor output.

It would be useful if one could supply a reference offset, together with the reference and observed bases. As it is now, people have to unpack the description and reconstruct an HGVS description themselves.

Perhaps it would be an idea to make a separate webservice for arithmetic operations instead of adding this to the extractor.

Mutalyzer should normalise the same variant in different HGVS formats

Original ticket: https://humgenprojects.lumc.nl/trac/mutalyzer/ticket/156
Original date: 2013/11/06
Original reporter: ken DOT doig AND petermac DOT org

The following two variants are identical but are treated as two distinct variants by Mutalyzer. It would be great if 'dup's could be converted to 'ins' and positioned at the 3' end of a homopolymer run. This would make these two variants the same and facilitate matching of variants.

NM_000059.3:c.681+2dup NC_000013.10:g.32903631dup
NM_000059.3:c.681+1_681+2insT NC_000013.10:g.32903630_32903631insT
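The equivalence can be checked on a toy sequence by applying both descriptions and comparing the results. The helper names and the sequence are made up; this is not the real NC_000013.10 context, and it is not how Mutalyzer normalises descriptions.

```python
def apply_dup(reference, position):
    """Duplicate the base at 1-based `position`."""
    return reference[:position] + reference[position - 1] + reference[position:]


def apply_ins(reference, position, inserted):
    """Insert `inserted` between 1-based `position` and `position + 1`."""
    return reference[:position] + inserted + reference[position:]


# Toy sequence with a run of three T bases at positions 4 to 6.
toy = "CAGTTTAC"
print(apply_dup(toy, 6) == apply_ins(toy, 5, "T"))
print(apply_dup(toy, 6) == apply_ins(toy, 3, "T"))
```

Both comparisons hold because an extra T anywhere in the run gives the same sequence, which is why a fixed rule such as shifting to the 3' end is needed to pick a single description.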

Use monoseq for protein sequences in the name checker

Migrated from GitLab #17

Use monoseq for showing protein sequences in the name checker (or perhaps look into BioJS).

Memory issue with Pyparsing 2.0.6

With Pyparsing 2.0.6, unit tests fail due to memory usage.

See http://sourceforge.net/p/pyparsing/mailman/message/34635387/

Drop TranscriptProteinLink database table

We did not drop this table in #94, but we should do this at some later point (waiting at least one release).

e-mail suggestions trigger.

The e-mail suggestion script is now triggered when another element is selected. As a side effect, the buttons beneath the e-mail field move if there is a suggestion.

Perhaps it is possible to reserve some space for the suggestion.

Check reporting of counters on about page

Not sure what's up with some of the totals, they don't add up. Also, why should the totals exclude batch jobs?

Downloadable version of Mutalyzer

Original ticket: https://humgenprojects.lumc.nl/trac/mutalyzer/ticket/157
Original date: 2013/12/06
Original reporter: pchalasani AND partners DOT org

We are planning to use Mutalyzer for the analysis of patient gene information for their treatment. Even though we have never had issues with the web version of Mutalyzer, we would be more comfortable using a local version for reliability and availability. I would like to know if this is a possible option.
Please let me know if you need more details.

Thanks,
Poornima.

Missing chromosomes

Migrated from GitLab #7

Apparently we always missed a few chromosome definitions used by the transcript mappings. They are:

hg18
chr13_random
chr2_random
chr3_random
chr19_random
chr4_random
chr7_random
chr9_random
chr8_random
chr5_h2_hap1
chrX_random
chr21_random
chr16_random
chr15_random
chr17_random
chr6_random
chr5_random
chr22_random
chr6_qbl_hap2
chr1_random

hg19
chr4_gl000193_random
chrUn_gl000228
chr4_gl000194_random
chrUn_gl000220
chrUn_gl000222
chr19_gl000209_random
chrUn_gl000211
chrUn_gl000212
chr17_gl000205_random
chrUn_gl000219
chrUn_gl000218
chr7_gl000195_random

And for GRCh38 we don't have any chromosomes defined yet besides the standard ones, see merge request !20.

Refactor batch job architecture

Migrated from GitLab #21

There are some problems with the current batch architecture, especially that there cannot be multiple workers without synchronisation problems.

Good read: http://news.ycombinator.com/item?id=3002861
Suggestion: http://celeryproject.org/

Failing test.

The test TestScheduler.test_xlsx_file fails.

================================================================================================================================ FAILURES ================================================================================================================================
______________________________________________________________________________________________________________________ TestScheduler.test_xlsx_file ______________________________________________________________________________________________________________________

self = <test_scheduler.TestScheduler object at 0x7f58906ba290>

def test_xlsx_file(self):
    """
        Office Open XML Spreadsheet input for batch job.
        """
    path = os.path.join(os.path.dirname(os.path.realpath(__file__)),
                        'data',
                        'batch_input.xlsx')
    batch_file = open(path, 'rb')
    expected = [['AB026906.1:c.274G>T',
                 'OK'],
                ['AL449423.14(CDKN2A_v002):c.5_400del',
                 'OK']]

  self._batch_job(batch_file, expected, 'syntax-checker')

tests/test_scheduler.py:327:

tests/test_scheduler.py:39: in _batch_job
job_type, argument=argument)

self = <mutalyzer.Scheduler.Scheduler instance at 0x7f5890661f38>, email = '[email protected]', queue = None, columns = 1, job_type = 'syntax-checker', argument = None, create_download_url = None

def addJob(self, email, queue, columns, job_type, argument=None,
           create_download_url=None):
    """
        Add a job to the Database and start the BatchChecker.

        @arg email:         e-mail address of batch supplier
        @type email:        unicode
        @arg queue:         A list of jobs
        @type queue:        list
        @arg columns:       The number of columns.
        @type columns:      int
        @arg job_type:       The type of Batch Job that should be run
        @type job_type:
        @arg argument:          Batch Arguments, for now only build info
        @type argument:
        @arg create_download_url: Function accepting a result_id and returning
                                  the URL for downloading the batch job
                                  result. Can be None.
        @type create_download_url: function

        @return: result_id
        @rtype:
        """
    # Add jobs to the database
    batch_job = BatchJob(job_type, email=email, argument=argument)
    if create_download_url:
        batch_job.download_url = create_download_url(batch_job.result_id)
    session.add(batch_job)

  for i, inputl in enumerate(queue):
        # NOTE:
        # This is a very dirty way to skip entries before they are fed
        # to the batch processes. This is needed for e.g. an empty line
        # or because the File Module noticed wrong formatting. These lines
        # used to be discarded but are now preserved by the escape string.
        # The benefit of this is that the users input will match the
        # output in terms of input line and outputline.
        if inputl.startswith("~!"): #Dirty Escape

E TypeError: 'NoneType' object is not iterable

mutalyzer/Scheduler.py:752: TypeError
================================================================================================================= 1 failed, 428 passed in 43.06 seconds ==================================================================================================================

Links from position converter to name checker and vice versa.

It would be nice if the name checker results are linked to the position converter and vice versa.

On the position converter page this can be realised by making the results clickable, on the name checker page we would need a separate button or link.

(Request by Johan den Dunnen).

New layout uses Google web fonts

Migrated from GitLab #32

The new layout by Landscape uses the Roboto font via the Google web fonts API:

<link href="http://fonts.googleapis.com/css?family=Roboto:400,300,300italic,400italic,500,500italic,700,700italic"
      rel="stylesheet" type="text/css">

We probably don't want this and should instead host any custom fonts on our own webserver.

Note we use transcript to genome mappings from NCBI

Migrated from GitLab #4

On the website, next to the already mentioned GenBank files.
