idmapping sleep/ wait (unipressed, open issue)

Dan-Burns commented on June 12, 2024
idmapping sleep/ wait

from unipressed.

Comments (10)

multimeric commented on June 12, 2024

Okay so what I would do is submit the request, then run a loop where you check the status of the job until it finishes. e.g.

import time
from unipressed import IdMappingClient

request = IdMappingClient.submit(
    source="UniProtKB_AC-ID", dest="Gene_Name", ids={"A1L190", "A0JP26", "A0PK11"}
)
# Poll until the job reaches a terminal state
while True:
    status = request.get_status()
    if status in {"FINISHED", "ERROR"}:
        break
    time.sleep(5)
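
A loop like this polls forever if the job never reaches a terminal state. As a minimal sketch, not part of unipressed, the same idea with an upper time bound might look like the following. `wait_for_status` and its parameters are hypothetical names; `get_status` is any zero-argument callable, for example `request.get_status`.

```python
import time
from typing import Callable


def wait_for_status(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns FINISHED or ERROR, or until the timeout passes.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in {"FINISHED", "ERROR"}:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(poll_interval)
```

With the request above, this would be called as `status = wait_for_status(request.get_status)`.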

Dan-Burns commented on June 12, 2024

Thank you - I didn't realize there was a get_status() method - should've looked harder.

I implemented that but I still got the same error. Although I might have looped through more request submissions before it happened this time.

Thank you,
Dan

multimeric commented on June 12, 2024

Can you please post a reproducible example?

Dan-Burns commented on June 12, 2024

The attached .zip contains a JSON file with a dictionary whose keys are integers and whose values are sets of UniProt IDs that I'm trying to get GI numbers for. This dictionary is called "id_lists" in the loop, and "chunk" is the dictionary key. I loop over the dictionary and submit each subset of UniProt IDs to the ID mapping client through the included function get_gi_numbers():

uniprot_to_gi = {}
for chunk, id_list in id_lists.items():
    uniprot_to_gi[chunk] = get_gi_numbers(id_list, delay=5)

uniprot_ids.zip

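import time
from unipressed import IdMappingClient

def get_gi_numbers(uniprot_ids, delay=5):
    # Submit the mapping job for this chunk of UniProt IDs
    request = IdMappingClient.submit(
        source="UniProtKB_AC-ID", dest="GI_number", ids=uniprot_ids
    )

    # Poll until the job reaches a terminal state
    while True:
        status = request.get_status()
        if status in {"FINISHED", "ERROR"}:
            break
        time.sleep(delay)

    # Collect all mapping results into a list
    return list(request.each_result())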

Using this, I still get:
IdMappingError: UniProt has not yet processed the results, consider using time.sleep() to wait until they are complete.

Thank you,
Dan

multimeric commented on June 12, 2024

I can't easily reproduce this. The only way I could see this happening is if uniprot is actually returning an invalid result which tricks my code into thinking it hasn't finished. If you could narrow down the IDs (or possibly single ID) that causes this by catching the error unipressed throws, that would be great.
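
One way to narrow this down, sketched here under assumptions rather than taken from unipressed, is to catch the error per chunk and record which IDs were in the chunk that failed. `collect_failures` is a hypothetical helper; `fetch` stands for any function that takes a collection of IDs and returns a list of results, such as `get_gi_numbers` above.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def collect_failures(
    id_lists: Dict[int, Iterable[str]],
    fetch: Callable[[List[str]], list],
    error_type: type = Exception,
) -> Tuple[Dict[int, list], Dict[int, List[str]]]:
    """Run fetch() on each chunk; return (results by chunk, IDs of chunks that raised).

    Hypothetical helper: the name and signature are illustrative only.
    """
    results: Dict[int, list] = {}
    failed: Dict[int, List[str]] = {}
    for chunk, ids in id_lists.items():
        ids = list(ids)  # materialise so the IDs can still be recorded after a failure
        try:
            results[chunk] = fetch(ids)
        except error_type as err:
            print(f"chunk {chunk} failed: {err}")
            failed[chunk] = sorted(ids)
    return results, failed
```

Calling `collect_failures(id_lists, get_gi_numbers, IdMappingError)` would keep going past a failing chunk, and the `failed` dictionary can then be resubmitted or split further to see whether the failure follows particular IDs.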

Dan-Burns commented on June 12, 2024

I see, I was wondering if it might be a bad id.

I'm not sure if that is the case since I can make it through one set of ids on one attempt but on another attempt, it will fail on that same set of ids.

I will look into it.

multimeric commented on June 12, 2024

It doesn't seem like a single bad ID would make it fail. I just tried it, and UniProt simply ignores invalid IDs but otherwise behaves reasonably.

multimeric commented on June 12, 2024

My guess is you're hitting an intermittent issue with the UniProt API itself, so you would get this same issue with any client library (not just unipressed). However, I would like to be able to smooth over that glitch in unipressed, which is why I want to catch it.
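
If the failure really is intermittent, retrying the failing call after a pause is one way to smooth it over on the caller's side. The following is only a sketch using the standard library; `call_with_retries` and its parameters are hypothetical, and it is not how unipressed itself handles the error.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
    tries: int = 5,
    delay: float = 2.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(); on a listed exception wait and retry, re-raising after `tries` attempts.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    for attempt in range(1, tries + 1):
        try:
            return func()
        except exceptions:
            if attempt == tries:
                raise  # out of attempts: let the last error propagate
            sleep(delay)
            delay *= backoff  # wait longer before each further attempt
    raise ValueError("tries must be at least 1")
```

For the case in this thread, the call that fails could be wrapped as `call_with_retries(lambda: list(request.each_result()), (IdMappingError,))`.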

yoonkihoon commented on June 12, 2024

I ran into this "unstable return/connection/timeout" behaviour with this package because of the lack of exception handling. Each of the three calls (submit, get_status, and each_result) can fail independently, and it is not easy to catch every possible exception.
In the end I came up with a solution that does not support pagination. I hope this example helps.

from retry import retry
from unipressed import IdMappingClient
from unipressed.id_mapping.core import IdMappingError
from unipressed.id_mapping.core import IdMappingJob


@retry(IdMappingError, delay=2, tries=5)
def submit_query(gene_ids: str) -> IdMappingJob:
    """
    Query UniProt DB with a string of Gene IDs
    Args:
        gene_ids: A string of NCBI Gene IDs separated by commas
    Returns:
        IdMappingJob object
    """
    try:
        # The whole comma-separated string is passed as a single set element
        return IdMappingClient.submit(
            source="GeneID", dest="UniProtKB", ids={gene_ids}
        )
    except Exception as err:
        # Convert any failure to IdMappingError so that @retry picks it up
        raise IdMappingError from err


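@retry(ValueError, delay=2, tries=5)
def check_status(job_request: IdMappingJob) -> str:
    """
    Obtain job status
    Args:
        job_request: an IdMappingJob object
    Returns:
        FINISHED or FAILED
    Raises:
        ValueError if the job has not finished after all retries
    """
    try:
        job_status = job_request.get_status()
    except Exception:
        # The status call itself failed
        return "FAILED"
    if job_status == "FINISHED":
        return job_status
    if job_status == "ERROR":
        return "FAILED"
    # Not finished yet: raise outside the try block so @retry sees it.
    # (Previously the bare except swallowed this and returned "FAILED".)
    raise ValueError(f"job not finished yet: {job_status}")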


@retry(IdMappingError, delay=2, tries=25)
def get_results(job_request: IdMappingJob) -> list:
    """
    Retrieves individual results
    Args:
        job_request: an IdMappingJob object
    Returns:
        A list of ID mapping results in the format of [{'from': '1', 'to': 'P04217'}, {'from': '1', 'to': 'V9HWD8'}]
    """
    try:
        return list(job_request.each_result())
    except Exception as err:
        # Convert any failure to IdMappingError so that @retry picks it up
        raise IdMappingError from err


def get_uniprot_ids_from_gene_ids(gene_ids: str) -> list[dict[str, str]]:
    """
    By using NCBI Gene IDs, this function maps to UniProt IDs. One NCBI Gene ID can be mapped to one or many.
    Args:
        gene_ids: A string of NCBI Gene IDs separated by comma
    Returns:
    A list of dictionaries, each dictionary consists of {'from': 'NCBI Gene ID', 'to': 'UniProt ID'}
    """
    job_request = returned = None
    results_parsed = None
    job_request = submit_query(gene_ids)
    if job_request is not None:
        jstatus = check_status(job_request)
        if jstatus != "FAILED":
            returned = get_results(job_request)
            if returned is not None:
                results_parsed = []
                for result in returned:
                    results_parsed.append(result)
    return results_parsed

multimeric commented on June 12, 2024

Hi @yoonkihoon. If there really is an intermittent issue with the uniprot API, then I think your @retry solution is a good one. Feel free to submit it as a PR.
