ranking-agent / aragorn

A Translator ARA combining asynchronous database querying, answer coalescence, and answer ranking.

License: MIT License

Dockerfile 0.71% Python 98.70% Shell 0.59%

ara ncats-translator trapi

aragorn's Introduction

ARAGORN

Autonomous Relay Agent for Generation Of Ranked Networks (ARAGORN)

A tool to query Knowledge Providers (KPs) and synthesize highly ranked answers relevant to user-specified questions.

Operates in a federated knowledge environment.
Bridges the precision mismatch between data specificity in KPs and more abstract levels of user queries.
Generalizes answer ranking.
Normalizes data to use preferred and equivalent identifiers.

The ARAGORN tool relies on a number of external services to produce standardized, ranked answers to a user-specified question.

Strider - Accepts a query and provides knowledge-provider querying, answer generation and ranking.
Answer Coalesce - Accepts a query containing Strider answers and returns answers that have been coalesced by property, graph and/or ontology analysis.
Node normalization - A Translator SRI service that provides the preferred CURIE and equivalent identifiers for data in the query.
ARAGORN Ranker - Accepts a query and provides Omnicorp overlays, score and weight-correctness rankings of coalesced answers.

Demonstration

A live version of the API can be found here.

Source Code

Below you will find references that detail the standards, web services and supporting tools that are part of ARAGORN.

Installation

This version of ARAGORN has all links to subordinate services hard-coded. In the future, these links will be defined in the Kubernetes configuration files.

In the meantime, some manual edits to the src/service_aggregator.py file will be needed to support your installation.

Subordinate services

The ARAGORN subordinate services must be deployed before ARAGORN is stood up. Please reference the following READMEs for more information on standing those up:

Command line installation

cd <aragorn codebase root>

python<version> -m venv venv
source venv/bin/activate

Install dependencies

pip install -r requirements.txt

Run Script

cd <aragorn root>

./main.sh

Docker installation

Or build an image and run it.

cd <aragorn root>

docker build --tag <image_tag> .

Then start the container

docker run --name aragorn -p 8080:4868 <image_tag>

Kubernetes configurations

Kubernetes configurations and helm charts for this project can be found at:

https://github.com/helxplatform/translator-devops/helm/aragorn

aragorn's People

Contributors

ueser jdr0887 dnlrkorn

aragorn's Issues

Aragorn providing edge_attribute_source without infores id information

See the most recent version of the workflow progress tracker / edge attribute source sheet.

https://docs.google.com/spreadsheets/d/1O1cMmYGxoIqP6xbzj6FG5owiKQVg57wx2O_XIA_hN_A/edit#gid=642301883

Rows at the top indicate in which workflows aragorn is returning edge attribute sources without infores id information.
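A minimal sketch for locating the offending attributes, assuming TRAPI-style edges like the ones in the issues on this page, where each attribute carries an `attribute_source` field. `find_missing_infores` is a hypothetical helper, not part of the ARAGORN codebase, and the example edge is made up.

```python
from typing import Any, Dict, List, Tuple


def find_missing_infores(knowledge_graph: Dict[str, Any]) -> List[Tuple[str, str]]:
    """List (edge id, attribute_type_id) pairs whose attribute_source is not an infores CURIE."""
    problems = []
    for edge_id, edge in (knowledge_graph.get("edges") or {}).items():
        for attribute in edge.get("attributes") or []:
            source = attribute.get("attribute_source")
            # None, a missing field, or a value without the "infores:" prefix is flagged.
            if not isinstance(source, str) or not source.startswith("infores:"):
                problems.append((edge_id, attribute.get("attribute_type_id")))
    return problems


kg = {"edges": {"edge2": {"attributes": [
    {"attribute_type_id": "biolink:aggregator_knowledge_source", "attribute_source": None},
    {"attribute_type_id": "biolink:primary_knowledge_source", "attribute_source": "infores:chembl"},
]}}}
print(find_missing_infores(kg))  # [('edge2', 'biolink:aggregator_knowledge_source')]
```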

self-subclasses as edges?

See NCATSTranslator/testing#130

For queries like (A_fixed)-(B*)-(C_fixed), C can sometimes be returned for B, since C is a subclass of itself. So you get results like
(A)-(C)-[subclass_of]-(C).

This is not necessarily wrong, but it's probably not what was intended.

This will occur any time A has a direct link to C in this kind of query.
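One possible post-processing fix, sketched under the assumption that results use TRAPI `node_bindings` (query node id mapped to a list of `{"id": CURIE}` objects), is to drop any result in which the same CURIE is bound to two different query nodes. Whether this is always desirable is an open question. The helper name is hypothetical.

```python
from typing import Any, Dict, List


def filter_repeated_node_bindings(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop results in which one CURIE is bound to two different query nodes."""
    kept = []
    for result in results:
        first_qnode: Dict[str, str] = {}
        repeated = False
        for qnode_id, bindings in result["node_bindings"].items():
            for binding in bindings:
                # setdefault returns the query node that first claimed this CURIE.
                if first_qnode.setdefault(binding["id"], qnode_id) != qnode_id:
                    repeated = True
        if not repeated:
            kept.append(result)
    return kept


results = [
    {"node_bindings": {"a": [{"id": "A"}], "b": [{"id": "C"}], "c": [{"id": "C"}]}},
    {"node_bindings": {"a": [{"id": "A"}], "b": [{"id": "B1"}], "c": [{"id": "C"}]}},
]
print(len(filter_repeated_node_bindings(results)))  # 1
```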

Resolve the neo4J changes for biolink3 qualifiers

Per the meeting notes from the 7/22 Aragorn team meeting.

Link for background info: https://docs.google.com/document/d/1x_ukP9eyy8ahb6ZlZAg78bdf6H2XxkcfGKR_r3WRcuM/edit?pli=1#

Update to TRAPI 1.1

We need a TRAPI 1.1 version of aragorn and all of its subcomponents.

Should we combine in results? Where should it happen?

Consider this one hop query:

query = {
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "object": "n0",
                    "predicates": [
                        "biolink:treats"
                    ],
                    "subject": "n1"
                }
            },
            "nodes": {
                "n0": {
                    "ids": [
                        "MONDO:0021187"
                    ]
                },
                "n1": {
                    "categories": [
                        "biolink:SmallMolecule"
                    ]
                }
            }
        }
    }
}

For some values of n1, two edges come back: one with the predicate "treats" and one with "approved_to_treat", which is a subclass of "treats".

Edge 1:

{
    "subject": "PUBCHEM.COMPOUND:24875259",
    "object": "MONDO:0021187",
    "predicate": "biolink:treats",
    "attributes": [
        {
            "attribute_type_id": "biolink:aggregator_knowledge_source",
            "value": "infores:molepro",
            "value_type_id": "biolink:InformationResource",
            "original_attribute_name": "biolink:aggregator_knowledge_source",
            "value_url": null,
            "attribute_source": "infores:molepro",
            "description": "Molecular Data Provider",
            "attributes": null
        },
        {
            "attribute_type_id": "biolink:aggregator_knowledge_source",
            "value": "infores:molepro",
            "value_type_id": "biolink:InformationResource",
            "original_attribute_name": "biolink:aggregator_knowledge_source",
            "value_url": null,
            "attribute_source": "infores:chembl",
            "description": "Molecular Data Provider",
            "attributes": []
        },
        {
            "attribute_type_id": "biolink:aggregator_knowledge_source",
            "value": "infores:aragorn",
            "value_type_id": null,
            "original_attribute_name": null,
            "value_url": null,
            "attribute_source": null,
            "description": null,
            "attributes": null
        },
        {
            "attribute_type_id": "biolink:primary_knowledge_source",
            "value": "infores:chembl",
            "value_type_id": "biolink:InformationResource",
            "original_attribute_name": "biolink:primary_knowledge_source",
            "value_url": null,
            "attribute_source": "infores:chembl",
            "description": "MolePro's ChEMBL indication transformer",
            "attributes": []
        },
        {
            "attribute_type_id": "biolink:FDA_approval_status",
            "value": "FDA Clinical Research Phase 2",
            "value_type_id": "biolink:FDA_approval_status_enum",
            "original_attribute_name": "max phase for indication",
            "value_url": null,
            "attribute_source": "infores:chembl",
            "description": null,
            "attributes": []
        },
        {
            "attribute_type_id": "biolink:Publication",
            "value": "NCT02719028",
            "value_type_id": "string",
            "original_attribute_name": "ClinicalTrials",
            "value_url": "https://clinicaltrials.gov/search?id=%22NCT02719028%22",
            "attribute_source": "infores:chembl",
            "description": null,
            "attributes": []
        }
    ]
}

Edge 2:

{
    "subject": "PUBCHEM.COMPOUND:24875259",
    "object": "MONDO:0021187",
    "predicate": "biolink:approved_to_treat",
    "attributes": [
        {
            "attribute_type_id": "biolink:aggregator_knowledge_source",
            "value": "infores:aragorn",
            "value_type_id": null,
            "original_attribute_name": null,
            "value_url": null,
            "attribute_source": null,
            "description": null,
            "attributes": null
        },
        {
            "attribute_type_id": "biolink:primary_knowledge_source",
            "value": [
                "infores:chembl"
            ],
            "value_type_id": "biolink:InformationResource",
            "original_attribute_name": null,
            "value_url": null,
            "attribute_source": null,
            "description": null,
            "attributes": null
        },
        {
            "attribute_type_id": "biolink:aggregator_knowledge_source",
            "value": [
                "infores:biothings-explorer"
            ],
            "value_type_id": "biolink:InformationResource",
            "original_attribute_name": null,
            "value_url": null,
            "attribute_source": null,
            "description": null,
            "attributes": null
        },
        {
            "attribute_type_id": "biolink:aggregator_knowledge_source",
            "value": [
                "infores:mychem-info"
            ],
            "value_type_id": "biolink:InformationResource",
            "original_attribute_name": null,
            "value_url": null,
            "attribute_source": null,
            "description": null,
            "attributes": null
        }
    ]
}

Currently, this creates 2 results in strider, and hence in aragorn. Each result binds to one of the two edges. These edges will not be merged in the KG because they have different predicates, but perhaps they should be rolled into a single result for scoring.

If so, where would this occur? Strider? AC? Elsewhere?

Note that there will be consortium-level discussions coming about what ARAs are supposed to do in cases like this. The last time it was discussed, the answer was that ARAs were free to do as they chose, but now that we want to merge ARA results, we might need to revisit.
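If the answer is to merge, one option is to fold together results whose node bindings are identical and union their edge bindings, so the "treats" and "approved_to_treat" edges above end up in one result before scoring. This is a minimal sketch, not existing strider or AC behavior; the KG edge ids in the example are made up, and any other result fields (such as scores) are dropped because scoring would happen after the merge.

```python
import json
from typing import Any, Dict, List


def node_key(result: Dict[str, Any]) -> str:
    """Canonical, hashable form of a result's node bindings (query node -> sorted CURIEs)."""
    return json.dumps(
        {q: sorted(b["id"] for b in bindings) for q, bindings in result["node_bindings"].items()},
        sort_keys=True,
    )


def merge_results_by_nodes(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Fold results with identical node bindings into one, unioning their edge bindings."""
    merged: Dict[str, Dict[str, Any]] = {}
    for result in results:
        key = node_key(result)
        if key not in merged:
            merged[key] = {
                "node_bindings": result["node_bindings"],
                "edge_bindings": {q: list(b) for q, b in result["edge_bindings"].items()},
            }
            continue
        target = merged[key]["edge_bindings"]
        for qedge, bindings in result["edge_bindings"].items():
            existing = {b["id"] for b in target.setdefault(qedge, [])}
            target[qedge].extend(b for b in bindings if b["id"] not in existing)
    return list(merged.values())


nodes = {"n0": [{"id": "MONDO:0021187"}], "n1": [{"id": "PUBCHEM.COMPOUND:24875259"}]}
results = [
    {"node_bindings": nodes, "edge_bindings": {"e01": [{"id": "treats_edge"}]}},
    {"node_bindings": nodes, "edge_bindings": {"e01": [{"id": "approved_to_treat_edge"}]}},
]
print(merge_results_by_nodes(results)[0]["edge_bindings"])
```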

Failing Query

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "category": [
                        "biolink:ChemicalSubstance"
                    ],
                    "is_set": false,
                    "name": "Chemical Substance"
                },
                "n1": {
                    "id": "MONDO:0018150",
                    "is_set": false,
                    "name": "Gaucher disease"
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicate": [
                        "biolink:treats"
                    ]
                }
            }
        }
    }
}

This query returns ~100 results in strider, but fails with an error in aragorn.

Workflow Implemented

TRAPI input has a workflow section listing operations that must be completed in the order specified by the workflow.

Due: July 29, 2021

Details in architecture repo Git issue here.
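A minimal sketch of executing a workflow in order, assuming each operation maps to a handler that takes and returns a message. The handler table and `UnsupportedOperation` are hypothetical. It accepts both bare operation strings (as some 1.1-era messages carry) and objects with an `id` and optional `parameters`, and it validates every step before running any of them, so an unsupported operation fails fast instead of after expensive work.

```python
from typing import Any, Callable, Dict, List, Tuple, Union

Message = Dict[str, Any]
Handler = Callable[[Message, Dict[str, Any]], Message]


class UnsupportedOperation(ValueError):
    """Raised when the workflow names an operation this service does not implement."""


def normalize(step: Union[str, Dict[str, Any]]) -> Tuple[str, Dict[str, Any]]:
    # Accept "lookup" as well as {"id": "lookup", "parameters": {...}}.
    if isinstance(step, str):
        return step, {}
    return step["id"], step.get("parameters") or {}


def run_workflow(message: Message, workflow: List[Any], handlers: Dict[str, Handler]) -> Message:
    steps = [normalize(step) for step in workflow]
    # Validate everything up front so we fail before doing any work.
    unknown = [op for op, _ in steps if op not in handlers]
    if unknown:
        raise UnsupportedOperation(f"unsupported operations: {unknown}")
    for op, params in steps:
        message = handlers[op](message, params)
    return message
```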

Passing logs kills ranker

If the input logs are not empty, omnicorp (and probably other services) throws a 500.

This appears to be because the pydantic model converts the inputs into datetime objects, which cause an error when they are JSON-serialized on output.
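The failure mode is easy to reproduce with the standard library, assuming the service ends up calling `json.dumps` on data that contains `datetime` objects. A `default=` hook that emits ISO 8601 strings is one way around it; `json_default` is a hypothetical name, and the log entry mirrors the shape of the logs shown elsewhere on this page.

```python
import json
from datetime import datetime
from typing import Any


def json_default(value: Any) -> Any:
    """Fallback encoder for json.dumps: datetimes become ISO 8601 strings."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


logs = [{"timestamp": datetime(2021, 8, 24, 22, 13, 34), "level": "WARNING", "message": "warning: empty returned"}]

try:
    json.dumps(logs)  # fails: datetime is not JSON serializable by default
except TypeError as exc:
    print("plain json.dumps:", exc)

print(json.dumps(logs, default=json_default))
# [{"timestamp": "2021-08-24T22:13:34", "level": "WARNING", "message": "warning: empty returned"}]
```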

TRAPI Timeline: ARAs complete v1.2 implementations and register in SmartAPI

Due Date: September 1
Note: Implement 1.2 with asynch.
TRAPI 1.1 to 1.2 Change Log.
ReasonerAPI Reference Ticket

500 in /score

This query gives 37 results in strider, but returns no results from aragorn

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "id": "UniProtKB:P52788",
                    "category": "biolink:Gene"
                },
                "n1": {
                    "category": "biolink:ChemicalSubstance"
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1"
                }
            }
        }
    }
}

Update tests

The aragorn test suite is not up to date. The test jsons are not conformant.

I think we should write tests here that mock the underlying services, so that at the aragorn level we only test whether we make the right calls for workflows, whether the callback functionality works, etc.

Then, if we want to test that, e.g., ranker is returning the right property, we test that in ranker.
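A minimal sketch of that style of test using `unittest.mock`. The `answer` function and the `services.strider` / `services.ranker` interface are hypothetical stand-ins for ARAGORN's real service clients; the point is that a parent `Mock` records calls to its children in order, so the test checks which services were called, with what, and in what sequence, without any network access.

```python
import unittest
from unittest.mock import Mock, call


def answer(message, services):
    """Hypothetical ARAGORN entry point: lookup via strider, then score via ranker."""
    looked_up = services.strider(message)
    return services.ranker(looked_up)


class AnswerTest(unittest.TestCase):
    def test_calls_strider_then_ranker(self):
        services = Mock()
        services.strider.return_value = {"results": ["r1"]}
        services.ranker.return_value = {"results": ["r1"], "scored": True}

        out = answer({"query_graph": {}}, services)

        self.assertTrue(out["scored"])
        # The parent mock records child calls, in order, in mock_calls.
        self.assertEqual(
            services.mock_calls,
            [call.strider({"query_graph": {}}), call.ranker({"results": ["r1"]})],
        )
```

Run it with `python -m unittest` as usual.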

Chunk RMQ

Within aragorn we use RabbitMQ to communicate between worker threads. That's done by pushing a strider response through the queue.

But there is a maximum size for RabbitMQ messages, so we need some chunking on the messages, which is probably going to complicate things.

Without it, though, we will always fail on strider results larger than X.
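A minimal sketch of byte-level chunking, independent of any RabbitMQ client: the payload is serialized once, split into slices no larger than an assumed `max_bytes` limit, and tagged with `index`/`total` so the consumer can reassemble them in any order. In a real implementation the sequence info and a job id would likely travel in AMQP message headers; that wiring is omitted here.

```python
import json
import math
from typing import Any, Dict, Iterable, Iterator


def chunk_message(payload: Any, max_bytes: int) -> Iterator[Dict[str, Any]]:
    """Serialize payload once and yield byte slices no larger than max_bytes, with sequence info."""
    body = json.dumps(payload).encode("utf-8")
    total = max(1, math.ceil(len(body) / max_bytes))
    for index in range(total):
        yield {"index": index, "total": total, "data": body[index * max_bytes:(index + 1) * max_bytes]}


def reassemble(chunks: Iterable[Dict[str, Any]]) -> Any:
    """Rebuild the payload; chunks may arrive in any order."""
    ordered = sorted(chunks, key=lambda chunk: chunk["index"])
    if not ordered or len(ordered) != ordered[0]["total"]:
        raise ValueError("missing chunks")
    # Join the bytes before decoding so multi-byte UTF-8 characters split across chunks survive.
    return json.loads(b"".join(chunk["data"] for chunk in ordered).decode("utf-8"))
```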

Cannot run direct three-hop ARAGORN queries for Workflow B

This issue is to report that both @xu-hao and I cannot run direct three-hop ARAGORN queries for Workflow B. I initially thought the error was on my end, but if Hao is encountering issues, then I think there's something not quite right on the ARAGORN side.

The TRAPI query can be found here. Note that Hao tested e01 with both biolink:has_real_world_evidence_of_association_with and biolink:correlated_with. I only tested the latter predicate, as that's the one I used when testing direct three-hop ARAX queries.

Here's the command:

curl -XPOST https://aragorn.renci.org/1.1/query -d '{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": ["MESH:D056487"],
                    "categories": ["biolink:DiseaseOrPhenotypicFeature"]
                },
                "n1": {
                    "categories": ["biolink:DiseaseOrPhenotypicFeature"]
                },
                "n2": {
                    "categories": ["biolink:Gene"]
                },
                "n3": {
                    "categories": ["biolink:ChemicalEntity"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:correlated_with"]
                },
                "e02": {
                    "subject": "n2",
                    "object": "n1",
                    "predicates": ["biolink:gene_associated_with_condition"]
                },
                "e03": {
                    "subject": "n2",
                    "object": "n3",
                    "predicates": ["biolink:related_to"]
                }
            }
        }
    }
}' -H "Content-Type: application/json"
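For scripted testing, a rough Python equivalent of the curl command plus a helper that pulls the warnings and errors out of the response's `logs` (like the ones shown below) may help. This uses only the standard library; the 600-second timeout is an assumption, and both function names are hypothetical. Note that `urlopen` raises `HTTPError` on 4xx/5xx responses, which itself is a useful signal when chasing 500s.

```python
import json
import urllib.request
from typing import Any, Dict, List, Sequence, Tuple


def post_query(url: str, query: Dict[str, Any], timeout: float = 600.0) -> Dict[str, Any]:
    """POST a TRAPI query and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def problem_logs(response: Dict[str, Any], levels: Sequence[str] = ("WARNING", "ERROR")) -> List[Tuple[str, str]]:
    """Pull (level, message) pairs out of a TRAPI response's logs."""
    return [
        (entry.get("level"), entry.get("message"))
        for entry in response.get("logs") or []
        if entry.get("level") in levels
    ]
```

Usage: `problem_logs(post_query("https://aragorn.renci.org/1.1/query", query))`, where `query` is the dict from the curl body.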

Here's the error message that Hao received with e01 set to biolink:has_real_world_evidence_of_association_with:

{"message":{"query_graph":{"nodes":{"n0":{"ids":["MESH:D056487"],"categories":["biolink:DiseaseOrPhenotypicFeature"],"is_set":false,"constraints":null},"n1":{"ids":null,"categories":["biolink:DiseaseOrPhenotypicFeature"],"is_set":false,"constraints":null},"n2":{"ids":null,"categories":["biolink:Gene"],"is_set":false,"constraints":null},"n3":{"ids":null,"categories":["biolink:ChemicalEntity"],"is_set":false,"constraints":null}},"edges":{"e01":{"subject":"n0","object":"n1","predicates":["biolink:has_real_world_evidence_of_association_with"],"relation":null,"constraints":null},"e02":{"subject":"n2","object":"n1","predicates":["biolink:gene_associated_with_condition"],"relation":null,"constraints":null},"e03":{"subject":"n2","object":"n3","predicates":["biolink:related_to"],"relation":null,"constraints":null}}},"knowledge_graph":{"nodes":{},"edges":{}},"results":[]},"logs":[{"timestamp":"2021-08-24T22:13:34.599883","level":"WARNING","code":null,"message":"warning: empty returned"},{"timestamp":"2021-08-24T22:13:34.648997","level":"ERROR","code":null,"message":"No results to coalesce"},{"timestamp":"2021-08-24T22:13:34.653174","level":"ERROR","code":null,"message":"answer_coalesce error: HTML error status code 422 returned."},{"timestamp":"2021-08-24T22:13:34.785124","level":"WARNING","code":null,"message":"warning: empty returned"},{"timestamp":"2021-08-24T22:13:34.836039","level":"WARNING","code":null,"message":"warning: empty returned"},{"timestamp":"2021-08-24T22:13:34.882219","level":"WARNING","code":null,"message":"warning: empty returned"}],"status":null,"workflow":["lookup","enrich_results","connect_knodes","score"]}

And here's the error message that was returned with e01 biolink:correlated_with:

Any chance you all can work on this query and send me/Hao both the executable query and the associated JSON output, so that Hao and I can figure out what we did wrong and (importantly) I can review the answers? I honestly think this might be the more efficient testing approach.

Handle lookup options

A lookup query can set a parameter limiting the maximum number of results, but we currently ignore it.
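A minimal sketch of honoring such a limit. The parameter name `max_results` is a hypothetical placeholder (check the TRAPI operation definitions for the real one), and truncation assumes results are already sorted best-first. Pruning knowledge-graph elements that are no longer bound is left out.

```python
from typing import Any, Dict


def apply_max_results(message: Dict[str, Any], parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Truncate results to parameters["max_results"] (a hypothetical parameter name), if given."""
    limit = parameters.get("max_results")
    if limit is None:
        return message
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 0:
        raise ValueError(f"max_results must be a non-negative integer, got {limit!r}")
    # Assumes results are already sorted best-first, so truncation keeps the top answers.
    return {**message, "results": (message.get("results") or [])[:limit]}
```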

Make a github action building/posting images on creation of a release

We should also do this for ranker and ac.

Update all components to reasoner-pydantic 1.2.0.4

We currently get an "X" on the ARAX webpage for TRAPI 1.2 validation. This is caused by an error in reasoner-pydantic that has been fixed in 1.2.0.4, so we need to update aragorn, ranker, and AC.

Local ARAGORN

ARAGORN receives TRAPI queries, which may contain workflows. If a TRAPI query does not contain a workflow, a default workflow is run.

One of the possible workflow elements is "lookup" which consults data sources to provide all possible answers to the question. In current ARAGORN, lookup is implemented by making an asynchronous TRAPI call to strider, which implements a federated lookup to all translator KPs.

We want to make a Local ARAGORN. This would be another instance of the container with a different configuration (Helm chart?).
When a lookup operation happens, Local ARAGORN will not call strider, but will instead consult a single automat endpoint (probably either robokop or covidkop). It's not quite as simple as replacing the strider URL with the automat URL, because automat can only be consulted synchronously.

Additionally, there will soon be another operation (infer or creative lookup) which will have different implementations in the two ARAGORNs: Federated ARAGORN will use mined rules to query strider, while Local ARAGORN will simply consult a database of pre-calculated results.
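One way to reconcile the synchronous automat call with an otherwise async service is to push it onto a worker thread with `asyncio.to_thread` (Python 3.9+), selected by a deployment-level mode. This is a sketch under assumptions: `mode`, `strider_lookup`, and `automat_lookup` are hypothetical names standing in for the config flag and the two real clients.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

Message = Dict[str, Any]


async def lookup(
    message: Message,
    mode: str,
    strider_lookup: Callable[[Message], Awaitable[Message]],
    automat_lookup: Callable[[Message], Message],
) -> Message:
    if mode == "federated":
        return await strider_lookup(message)
    if mode == "local":
        # automat can only be consulted synchronously, so run the blocking call in a
        # worker thread to keep the event loop free for other requests.
        return await asyncio.to_thread(automat_lookup, message)
    raise ValueError(f"unknown lookup mode: {mode!r}")
```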

Implement Aragorn Service

We want a TRAPI 0.9.2 interface that the ARS can hit, which does the following:

Call strider
Take the result from strider and send it to the ranker
A. Omnicorp
B. Weight
C. Score
Optionally send the result of scoring to answercoalesce
Return the result

To handle coalescence, we want an option at the message level, like so:

{
  "message": {...},
  "coalesce": ""
}

Valid options for coalesce are "none", "graph", "ontology", and "property". "graph", "ontology", and "property" should be passed as options to the AC service. If coalesce is "none", don't call the coalescer.

If the option does not exist, ... we decided on Friday to not coalesce, but I think we'll want to change that soon. Probably default to "graph" IMO.
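A minimal sketch of reading and validating the option. `coalesce_method` is a hypothetical helper; it treats a missing or empty value (as in the `"coalesce": ""` example above) as the default, so switching the default from "none" to "graph" later is a one-argument change.

```python
from typing import Any, Dict, Optional

VALID_COALESCE = {"none", "graph", "ontology", "property"}


def coalesce_method(request: Dict[str, Any], default: str = "none") -> Optional[str]:
    """Return the AC method to request, or None when the coalescer should not be called."""
    method = request.get("coalesce") or default
    if method not in VALID_COALESCE:
        raise ValueError(f"coalesce must be one of {sorted(VALID_COALESCE)}, got {method!r}")
    return None if method == "none" else method
```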

TRAPI 1.2 implementation: Verify registration, including uptime status:pass

Verify registration, including uptime status:pass here

Include timing information (and more!) in log

Right now the logs are pretty sparse.
We should include at least what services got called, how long they ran, and some sense of the size of the components in and out.
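A minimal sketch of a context manager that records which service was called, how long it took, and the serialized sizes in and out, appending an entry shaped like the TRAPI log entries (`timestamp`, `level`, `code`, `message`) seen elsewhere on this page. `timed_call` is a hypothetical name; the entry is written even if the call raises.

```python
import json
import time
from contextlib import contextmanager
from datetime import datetime
from typing import Any, Dict, Iterator, List


@contextmanager
def timed_call(logs: List[Dict[str, Any]], service: str, payload: Any) -> Iterator[Dict[str, Any]]:
    """Append a TRAPI-style log entry with the service name, duration and payload sizes."""
    record: Dict[str, Any] = {}
    start = time.perf_counter()
    try:
        yield record  # the caller stores the service's reply in record["response"]
    finally:
        elapsed = time.perf_counter() - start
        size_in = len(json.dumps(payload).encode("utf-8"))
        size_out = len(json.dumps(record["response"]).encode("utf-8")) if "response" in record else "n/a"
        logs.append({
            "timestamp": datetime.now().isoformat(),
            "level": "INFO",
            "code": None,
            "message": f"{service}: {elapsed:.3f}s, {size_in} bytes in, {size_out} bytes out",
        })


logs: List[Dict[str, Any]] = []
with timed_call(logs, "strider", {"message": {}}) as record:
    record["response"] = {"message": {"results": []}}  # stand-in for the real HTTP call
print(logs[0]["message"])
```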

Make use of asyncquery when calling components

With PR 29, we'll stand up an asyncquery version of aragorn. However, internally, aragorn will still connect to strider, AC, and ranker synchronously. Next, we will want to use asyncquery at this lower level too, so that we don't needlessly hold open many connections and fail on long-running jobs.
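A minimal sketch of the callback side, assuming the TRAPI asyncquery pattern in which the request carries a `callback` URL that the component later POSTs its result to. Each outgoing job gets an id and an `asyncio.Future`; the (omitted) HTTP handler for the callback route calls `resolve`, which wakes the waiting coroutine. `CallbackRegistry`, the callback URL layout, and `submit` are all hypothetical.

```python
import asyncio
import uuid
from typing import Any, Awaitable, Callable, Dict

Message = Dict[str, Any]


class CallbackRegistry:
    """Pairs outgoing asyncquery requests with the callbacks that later deliver their results."""

    def __init__(self, callback_base: str) -> None:
        self.callback_base = callback_base.rstrip("/")
        self.pending: Dict[str, "asyncio.Future[Message]"] = {}

    def resolve(self, job_id: str, response: Message) -> None:
        """Called by the HTTP handler that receives a component's POSTed result."""
        future = self.pending.pop(job_id, None)
        if future is not None and not future.done():
            future.set_result(response)

    async def call(
        self,
        submit: Callable[[Dict[str, Any]], Awaitable[None]],
        message: Message,
        timeout: float,
    ) -> Message:
        job_id = str(uuid.uuid4())
        future: "asyncio.Future[Message]" = asyncio.get_running_loop().create_future()
        self.pending[job_id] = future
        try:
            # The submit request returns quickly; the real answer arrives later at the callback URL.
            await submit({"message": message, "callback": f"{self.callback_base}/{job_id}"})
            return await asyncio.wait_for(future, timeout)
        finally:
            self.pending.pop(job_id, None)
```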

GitHub releases for Demo tagged with “Dec2021Demo”

Due by Nov 16.

Move endpoints to config

ARAGORN is a simple workflow engine that parcels out work to services at other endpoints. The URLs are currently hard-coded, but they should be pulled out into a config so they can be changed on the fly.
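A minimal sketch of pulling the endpoints from environment variables, which Kubernetes/Helm can set per deployment. The service names and the `<NAME>_URL` variable convention are hypothetical, not ARAGORN's actual settings. Missing endpoints fail loudly at startup rather than at the first query.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical service names and environment-variable convention: <NAME>_URL.
SERVICES = ("strider", "answer_coalesce", "ranker", "node_norm")


def load_endpoints(env: Mapping[str, str] = os.environ,
                   defaults: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Resolve every service URL from the environment, falling back to optional defaults."""
    defaults = defaults or {}
    endpoints = {}
    for name in SERVICES:
        var = f"{name.upper()}_URL"
        url = env.get(var) or defaults.get(name)
        if not url:
            raise RuntimeError(f"no endpoint configured for {name}; set {var}")
        endpoints[name] = url.rstrip("/")
    return endpoints
```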

Implement a data overlay

We now use omnicorp to add overlay edges, but I would like to be able to get overlay edges from any source, both to improve scoring and to provide extra context.

Remove invalid `filter_message_top_n` from the default workflow

It looks like we're including filter_message_top_n as a default workflow operation (source), and that isn't a valid operation. ARAX may have just improved their TRAPI validation, because they now show an error.

Make sure that the SRI services are also properly registered with SmartAPI

Discussed during the 7/22 Aragorn team meeting

Check out the meeting notes for background info: https://docs.google.com/document/d/1x_ukP9eyy8ahb6ZlZAg78bdf6H2XxkcfGKR_r3WRcuM/edit?pli=1#

Provide evidence by adding other edges/nodes

Suppose somebody looks up A->B. We go find that edge, and provide the evidence that we found in the KPs, like papers or p-values, etc...

But are there other graph elements that provide extra support (or, conversely, reduce support)? For instance, if I find A-not->B, that's something I should know; shouldn't it affect the ranking?

Or maybe we know that A->C->B can imply A->B or is often associated with A->B. If we then go find some C, does returning that information help convince a user?

@schatzkara @kennethmorton

Bad answer to (procedures treating cataracts)

Query:
https://github.com/NCATSTranslator/testing/blob/main/ars-requests/not-none/1.2/cataractTreatment.json
Results:
https://arax.ncats.io/?r=1ef9d36e-dd59-4248-b0fc-fb588a387010

The query is "procedure that treats cataracts". RTX KG2 is returning a bunch of answers relating to kidneys, with very general (low-IC) terms for the procedure (e.g. "Therapeutic Procedure").

Then AC happily says, "hey what do a bunch of kidney diseases and eye diseases have in common?" and finds some garbage high level node like "disease by anatomical region" and merges everything together.

Then the ranker looks at that, says "great, so many nodes!" and gives a high score.

So I think that there are multiple things that could be done here, affecting different components:

I don't think RTX should be returning those results; I have an issue open with them.
Should strider try to verify the subclass_of relationships and filter in cases where it thinks the KPs are wrong? How much trust vs. verification do we need in strider?
AC should probably be tuned; I doubt that "disease by anatomical feature" should ever be considered an enrichment.
Ranker should downweight this answer based on the low IC of "disease by anatomical feature".
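For the ranker suggestion, one simple form of IC-based downweighting is sketched below, using information content IC = -ln(p), where p is the fraction of annotations a term accounts for. The linear penalty below a threshold, the threshold value of 5.0, and penalizing by the least informative node in the answer are all assumptions for illustration, not the ranker's actual scoring.

```python
import math
from typing import Iterable


def information_content(term_count: int, total_count: int) -> float:
    """IC = -ln(p): very general, frequently used terms get a low IC."""
    return -math.log(term_count / total_count)


def ic_penalty(ic: float, threshold: float) -> float:
    """1.0 at or above the threshold, shrinking linearly toward 0.0 below it."""
    return min(1.0, max(ic, 0.0) / threshold)


def adjusted_score(score: float, node_ics: Iterable[float], threshold: float = 5.0) -> float:
    """Scale a result's score by the penalty of its least informative bound node."""
    return score * ic_penalty(min(node_ics), threshold)
```

With this sketch, an answer whose coalesced set node has IC 2.5 keeps half its score (`adjusted_score(1.0, [10.0, 2.5]) == 0.5`), while answers built only from specific terms are untouched.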

Convert to 1.0

We will want a TRAPI 1.0 version of this interface as well.

Operations and Workflow

We need to expose available operations (once they are defined)
We need to handle the workflow input in the extended TRAPI schema. This will require calling individual services in the order prescribed in the input.

Incorrect node category found - glucose as gene/protein

Glucose was returned as a gene or protein:

Query:
{ "edges": { "N1": { "constraints": [], "object": "n1", "predicates": [ "biolink:has_normalized_google_distance_with" ], "subject": "n0" }, "N2": { "constraints": [], "object": "n2", "predicates": [ "biolink:has_normalized_google_distance_with" ], "subject": "n0" }, "N3": { "constraints": [], "object": "n2", "predicates": [ "biolink:has_normalized_google_distance_with" ], "subject": "n1" }, "e00": { "constraints": [], "object": "n1", "subject": "n0" }, "e01": { "constraints": [], "object": "n1", "subject": "n2" } }, "nodes": { "n0": { "categories": [ "biolink:SmallMolecule" ], "constraints": [], "ids": [ "UMLS:C0034407" ], "is_set": false, "name": "Quinazolines" }, "n1": { "categories": [ "biolink:Gene" ], "constraints": [], "is_set": false }, "n2": { "categories": [ "biolink:Gene" ], "constraints": [], "ids": [ "NCBIGene:10628", "NCBIGene:22861", "NCBIGene:51085", "NCBIGene:1490", "NCBIGene:389692", "NCBIGene:3480", "NCBIGene:598", "NCBIGene:2308", "NCBIGene:22877", "NCBIGene:2033" ], "is_set": true, "name": "TXNIP, NLRP1, MLXIPL, CTGF, MAFA, IGF1R, BCL2L1, FOXO1, MLXIP, EP300" } } }

TRAPI TIMELINE: KPs complete v1.2 implementations and register in SmartAPI

Due Date: August 18
Note: Implement 1.2 with asynch.
TRAPI 1.1 to 1.2 Change Log.
ReasonerAPI Reference Ticket

Doubling Answers

This standup query:

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "id": "NCBIGENE:1017",
                    "category": "biolink:Gene"
                },
                "n1": {
                    "category": "biolink:Pathway"
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1"
                }
            }
        }
    }
}

Returns 116 results from strider.

AC does not do any aggregation, because we can't aggregate on pathways at the moment.

So we should get 116 results back.

We somehow get 232.

If I run each component by hand, I end up with only 116, i.e. none of the components directly seem to be doubling the answers.

EPC Corrections

Need to ensure this is working as expected.

Straighten out SMARTAPI registrations

https://aragorn.ci.transltr.io/1.2 is currently listed as 'x-maturity:development' but it should be 'x-maturity:staging'

https://aragorn.test.transltr.io/1.2 is listed as 'x-maturity:test' but it should be 'x-maturity:testing'

Aragorn calling COHD with duplicate queries

I was testing the following query through ARS, and noticed Aragorn is calling COHD four times, two of which are duplicate queries. Two of the queries are inverses of each other (the subject and object nodes are swapped in the edge). COHD returns essentially the same results either way, but it seems reasonable for an ARA to do this. But the second two queries are exact duplicates of the first two. This issue is potentially related to #10.

The duplication of queries also occurred on the larger 3-hop query graph for December Demo's Workflow B, with Aragorn sending COHD 4 queries total (2 duplicates) for the single edge relevant to COHD (same as the query below)

I double checked our registration in SmartAPI, and I don't think COHD is double registered or anything, but please let me know if it might be something on our end that's causing this.

This is not a high priority issue for us.

{
  "message": {
    "query_graph": {
      "nodes": {
        "n1": {
          "categories": [
            "biolink:DiseaseOrPhenotypicFeature"
          ],
          "name": "Disease Or Phenotypic Feature"
        },
        "n0": {
          "ids": [
            "SNOMEDCT:197358007"
          ],
          "categories": [
            "biolink:DiseaseOrPhenotypicFeature"
          ],
          "name": "drug-induced liver injury"
        }
      },
      "edges": {
        "e0": {
          "subject": "n0",
          "object": "n1",
          "predicates": [
            "biolink:correlated_with"
          ]
        }
      }
    }
  },
  "logs": [],
  "status": null
}
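Whatever the root cause turns out to be, exact duplicates can be collapsed at the client. A minimal sketch: key each outgoing request by URL plus a canonical (key-sorted) JSON body, and let identical requests share one in-flight `asyncio.Task`. The inverse-direction queries described above have different bodies and are still sent. `DedupingClient` and `send` are hypothetical; one client per incoming ARAGORN query keeps the cache from growing without bound.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Dict, Tuple

Send = Callable[[str, Dict[str, Any]], Awaitable[Dict[str, Any]]]


class DedupingClient:
    """Shares one request among identical (url, body) pairs for the lifetime of the client."""

    def __init__(self, send: Send) -> None:
        self.send = send
        self.inflight: Dict[Tuple[str, str], "asyncio.Task[Dict[str, Any]]"] = {}

    async def query(self, url: str, body: Dict[str, Any]) -> Dict[str, Any]:
        # sort_keys makes logically identical bodies produce the same key.
        key = (url, json.dumps(body, sort_keys=True))
        task = self.inflight.get(key)
        if task is None:
            task = asyncio.create_task(self.send(url, body))
            self.inflight[key] = task
        return await task
```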

Operations Exposed in SmartAPI Registry

Due: July 22, 2021

Details in architecture repo Git issue here.

Raising 500

m = {"message":{"query_graph":{
  "edges": {
    "e01": {
      "constraints": [],
      "object": "n0",
      "predicates": [
        "biolink:has_manifestation"
      ],
      "subject": "n1"
    }
  },
  "nodes": {
    "n0": {
      "categories": [
        "biolink:Disease"
      ],
      "constraints": [],
      "ids": [
        "MONDO:0004995"
      ],
      "fulltextname": "n0"
    },
    "n1": {
      "categories": [
        "biolink:PathologicalProcess"
      ],
      "constraints": [],
      "fulltextname": "n1"
    }
  }
}}}

The query returns fine from strider with 14 results, but then aragorn throws a 500.

Current tools all add “x-maturity: production” and “infores” to all existing SmartAPI registrations

Change default behavior

The default behavior now is to run answer coalescence (graph style). We have 3 AC types, as well as "none". Often we will want none, and more to the point,

The user won't know which one they want a priori
There is no control on the ARS to specify what they do want

So the default is very important.

I think we need to run all 3 types of coalescence, merge those results with all of the original results, and return them.
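A minimal sketch of that default, assuming a hypothetical `coalesce(message, method)` coroutine that wraps the AC call. All three methods run concurrently via `asyncio.gather`; a method that fails is skipped rather than sinking the response, and the knowledge-graph nodes and edges AC adds are merged so the coalesced results still resolve. If AC echoes the original results back, a de-duplication pass would also be needed.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

Message = Dict[str, Any]
Coalescer = Callable[[Message, str], Awaitable[Message]]
METHODS = ("graph", "ontology", "property")


async def coalesce_all(message: Message, coalesce: Coalescer) -> Message:
    """Run every AC method concurrently; return the original results plus all coalesced ones."""
    outcomes = await asyncio.gather(
        *(coalesce(message, method) for method in METHODS), return_exceptions=True
    )
    kg = message.get("knowledge_graph") or {}
    nodes = dict(kg.get("nodes") or {})
    edges = dict(kg.get("edges") or {})
    results = list(message.get("results") or [])
    for outcome in outcomes:
        if isinstance(outcome, BaseException):
            continue  # one failed AC method should not sink the whole response
        out_kg = outcome.get("knowledge_graph") or {}
        # AC may add new nodes/edges (e.g. shared superclasses) that its results bind to.
        nodes.update(out_kg.get("nodes") or {})
        edges.update(out_kg.get("edges") or {})
        results.extend(outcome.get("results") or [])
    return {**message, "knowledge_graph": {"nodes": nodes, "edges": edges}, "results": results}
```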

Add tests

There are currently no tests on this repository or service.

Graceful failure when operations not implemented

NCATSTranslator/testing#113

Add Robokop ARA to the SmartAPI registry

Discussed during the 7/22 Aragorn meeting (agenda contains background info)

Improve error handling

Rather than checking for 0-size responses from the component tools, we should check for status codes.
We should pass along errors and logs from the underlying elements, esp strider.
If a component fails, we should return whatever we have up to that point, rather than returning nothing. Everything after strider is a valid response, even if it does not contain everything we want.
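A minimal sketch of those three points together: check status codes instead of response size, forward each component's own logs, and on failure stop while keeping the last good message. The step signature (returning status, message, and logs) is a hypothetical interface, not how ARAGORN's service calls are currently structured.

```python
from datetime import datetime
from typing import Any, Callable, Dict, List, Tuple

Message = Dict[str, Any]
Log = Dict[str, Any]
# A step returns (HTTP status, response message, logs reported by the component).
Step = Callable[[Message], Tuple[int, Message, List[Log]]]


def log_entry(level: str, text: str) -> Log:
    return {"timestamp": datetime.now().isoformat(), "level": level, "code": None, "message": text}


def run_pipeline(message: Message, steps: List[Tuple[str, Step]]) -> Tuple[Message, List[Log]]:
    """Run steps in order; on failure, stop and keep the last good message instead of returning nothing."""
    logs: List[Log] = []
    for name, step in steps:
        try:
            status, response, step_logs = step(message)
        except Exception as exc:  # e.g. connection errors or timeouts
            logs.append(log_entry("ERROR", f"{name} raised {exc!r}"))
            break
        logs.extend(step_logs)  # pass along the component's own logs
        if status != 200:
            logs.append(log_entry("ERROR", f"{name} returned HTTP status {status}"))
            break
        message = response
    return message, logs
```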

Dev tools stand up, register in smartapi with “x-maturity: development”

Due by Nov 8.

Add “x-maturity: production” and “translator-id” to all existing SmartAPI registrations

Due Nov 2.

500 error on one-hop with many results.

NCATSTranslator/testing#91

Returns > 3000 results from strider.

This is causing Aragorn to return a 500, which is unexpected, since I think it should catch an AC error and continue.

ARAGORN removing parts of strider's message when empty?

Query:

{
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "object": "n0",
                    "subject": "n1",
                    "predicates": [
                        "biolink:negatively_regulates_entity_to_entity"
                    ]
                }
            },
            "nodes": {
                "n0": {
                    "ids": [
                        "NCBIGene:23221"
                    ],
                    "categories": [
                        "biolink:Gene"
                    ]
                },
                "n1": {
                    "categories": [
                        "biolink:Gene"
                    ]
                }
            }
        }
    }
}

Sent directly to strider, this produces no results, but returns this message:

{
    "query_graph": {
        "nodes": {
            "n1": {
                "ids": null,
                "categories": [
                    "biolink:Gene"
                ],
                "is_set": false,
                "constraints": null
            },
            "n0": {
                "ids": [
                    "NCBIGene:23221"
                ],
                "categories": [
                    "biolink:Gene"
                ],
                "is_set": false,
                "constraints": null
            }
        },
        "edges": {
            "e01": {
                "subject": "n1",
                "object": "n0",
                "predicates": [
                    "biolink:negatively_regulates_entity_to_entity"
                ],
                "relation": null,
                "constraints": null
            }
        }
    },
    "knowledge_graph": {
        "nodes": {},
        "edges": {}
    },
    "results": []
}

But when we call aragorn, we only get back the query graph in the message:

{
    "query_graph": {
        "nodes": {
            "n0": {
                "ids": [
                    "NCBIGene:23221"
                ],
                "categories": [
                    "biolink:Gene"
                ],
                "is_set": false
            },
            "n1": {
                "categories": [
                    "biolink:Gene"
                ],
                "is_set": false
            }
        },
        "edges": {
            "e01": {
                "subject": "n1",
                "object": "n0",
                "predicates": [
                    "biolink:negatively_regulates_entity_to_entity"
                ]
            }
        }
    }
}

This is legal TRAPI, but it creates problems for downstream components (like ARAX) that don't expect it. Also, I don't see why we would do it.
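The missing null-valued fields (`ids`, `constraints`, `relation`) in aragorn's output hint that empty or None values are being dropped somewhere during serialization, but that is a guess. Independent of the root cause, a small guard at the response boundary would keep the shape strider returns. `with_complete_message` is a hypothetical helper.

```python
import copy
from typing import Any, Dict


def with_complete_message(message: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a TRAPI message that always carries knowledge_graph and results."""
    out = copy.deepcopy(message)
    kg = out.get("knowledge_graph") or {}
    out["knowledge_graph"] = {"nodes": kg.get("nodes") or {}, "edges": kg.get("edges") or {}}
    if out.get("results") is None:
        out["results"] = []
    return out
```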

Multiple deployments

We need 2 deployments of ARAGORN. One Prod and one Dev.

At runtime, there needs to be a config defining the environment.
Based on the config, use the right components (all sub-tools will need multiple deployments as well).
Deploy both.
Update the SmartAPI registry to point at the two different aragorns, with the correct x-maturity levels.

Aragorn returning 2-hop results on 1-hop query (Workflow B.2x)

Top ranking results from B.2x queries have what looks like a 2-hop query format with a set of DiseaseOrPhenotypicFeature nodes in the middle and ChemicalEntity nodes on each end. Lower ranked results have the expected structure.

Sample Query:

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": ["MESH:D000077385"],
                    "categories": [
                        "biolink:ChemicalEntity"
                    ],
                    "name": "Silybin"
                },
                "n1": {
                    "categories": [
                        "biolink:DiseaseOrPhenotypicFeature"
                    ]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:related_to"]
                }
            }
        }
    }
}

Results:
https://arax.ncats.io/?r=964ae5cc-f9f1-4917-8ebb-0b95322e5fbf
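A diagnostic sketch that may help narrow this down: for a one-hop query, every knowledge-graph edge bound in a result should connect nodes that the same result binds. The helper below reports (result index, query edge, KG edge) triples that violate this, including bound edge ids missing from the KG. It is a hypothetical check, and it assumes TRAPI `node_bindings`/`edge_bindings` and KG edges with `subject`/`object`.

```python
from typing import Any, Dict, List, Tuple


def dangling_edge_bindings(message: Dict[str, Any]) -> List[Tuple[int, str, str]]:
    """Report bound KG edges whose endpoints are not among the result's bound nodes."""
    kg_edges = message["knowledge_graph"]["edges"]
    report = []
    for index, result in enumerate(message.get("results") or []):
        bound_nodes = {b["id"] for bindings in result["node_bindings"].values() for b in bindings}
        for qedge_id, bindings in result["edge_bindings"].items():
            for binding in bindings:
                edge = kg_edges.get(binding["id"])
                if edge is None or edge["subject"] not in bound_nodes or edge["object"] not in bound_nodes:
                    report.append((index, qedge_id, binding["id"]))
    return report
```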