Comments (18)
There's a chance some of these are obsolete classes or not in any ontology. You could look them up in Monarch to see how we categorize them via:
https://scigraph-data.monarchinitiative.org/scigraph/graph/{id}
for example:
HP:0004929 - obsolete Coronary atherosclerosis {class, node}
OMIM:612042 - Recombination Rate Quantitative Trait Locus 1 {class, node} (perhaps should be QTL or heritable phenotypic marker?)
Orphanet:98301 - Laminopathy {node} - should be eq to MONDO:0021106? ticket here - monarch-initiative/mondo#579
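A minimal sketch of scripting that lookup, assuming the endpoint accepts a plain CURIE in place of {id} and returns JSON. The helper names (scigraph_url, fetch_graph) are made up for illustration, and nothing is assumed about the shape of the response:

```python
import json
import urllib.request
from urllib.parse import quote

# URL template quoted in the comment above; "{id}" is replaced by a CURIE.
SCIGRAPH_TEMPLATE = "https://scigraph-data.monarchinitiative.org/scigraph/graph/{id}"


def scigraph_url(curie: str) -> str:
    """Build the lookup URL for one CURIE, leaving the ':' separator unescaped."""
    return SCIGRAPH_TEMPLATE.format(id=quote(curie, safe=":"))


def fetch_graph(curie: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the document for a CURIE (performs a network request).

    Assumption: the service answers with JSON; this is not verified here.
    """
    request = urllib.request.Request(
        scigraph_url(curie), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URLs; call fetch_graph() yourself to hit the service.
    for curie in ("HP:0004929", "OMIM:612042", "Orphanet:98301"):
        print(scigraph_url(curie))
```

Running the loop over every uncategorized id and inspecting what comes back would be one way to spot obsolete classes in bulk.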
from kgx.
@kshefchek Thank you, I'll try that.
I looked up ClinVar's two most frequent predicates, GENO:0000845 and GENO:0000844, which refer to has_uncertain_significance_for_condition and likely_benign_for_condition. Clearly the objects of these statements must be conditions/diseases.
Is there some way to automate the inference of node categories from the domain and range of predicates? Is there an OWL file that provides the domain and range for GENO:0000845 and so on?
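To make the question concrete, here is a rough sketch of what such inference could look like. The PREDICATE_SIGNATURES table is hand-written for illustration (it is not read from any OWL file), the category names in it are guesses, and the node ids in the usage lines are made up:

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical domain/range table: predicate -> (subject category, object category).
# None would mean "no constraint". Illustrative only, not taken from an ontology.
PREDICATE_SIGNATURES: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "GENO:0000845": ("sequence variant", "disease"),  # has_uncertain_significance_for_condition
    "GENO:0000844": ("sequence variant", "disease"),  # likely_benign_for_condition
}


def infer_categories(
    edges: Iterable[Tuple[str, str, str]],
    known: Dict[str, Optional[str]],
) -> Dict[str, Optional[str]]:
    """Fill in missing node categories from predicate domain/range.

    Categories that are already known are never overwritten.
    """
    inferred = dict(known)
    for subject, predicate, obj in edges:
        domain, range_ = PREDICATE_SIGNATURES.get(predicate, (None, None))
        if domain is not None and inferred.get(subject) is None:
            inferred[subject] = domain
        if range_ is not None and inferred.get(obj) is None:
            inferred[obj] = range_
    return inferred


# Made-up ids, purely to show the behaviour.
edges = [
    ("EX:variant1", "GENO:0000845", "EX:conditionA"),
    ("EX:variant2", "GENO:0000844", "EX:conditionB"),
]
known = {"EX:conditionB": "phenotype"}
result = infer_categories(edges, known)
```

Only filling blanks seems the safer choice, since the edge summary later in this thread shows some GENO:0000845 objects already categorized as phenotype or gene.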
Here are the results of performing a clique merge on clinvar.ttl, omim.ttl, hpoa.ttl, and orphanet.ttl:
+------------------+-----------+
| Category | Frequency |
+------------------+-----------+
| sequence feature | 418119 |
| disease | 9834 |
| phenotype | 9263 |
| None | 4483 |
| gene | 4212 |
| GENO:0000871 | 288 |
| GENO:0000847 | 220 |
+------------------+-----------+
+------------------+------------------+-----------------+-----------+
| Subject Category | Predicate | Object Category | Frequency |
+------------------+------------------+-----------------+-----------+
| disease | has phenotype | phenotype | 214331 |
| sequence feature | GENO:0000845 | None | 173963 |
| sequence feature | GENO:0000844 | None | 106413 |
| sequence feature | GENO:0000845 | disease | 94257 |
| sequence feature | GENO:0000840 | disease | 68787 |
| sequence feature | GENO:0000843 | None | 63215 |
| sequence feature | GENO:0000844 | disease | 45765 |
| sequence feature | GENO:0000840 | None | 37594 |
| None | has phenotype | phenotype | 31460 |
| sequence feature | GENO:0000843 | disease | 25573 |
| sequence feature | GENO:0000841 | disease | 21379 |
| sequence feature | GENO:0000841 | None | 19807 |
| sequence feature | causes condition | disease | 6113 |
| disease | has disposition | HP:0000005 | 5565 |
| gene | causes condition | disease | 4162 |
| phenotype | has phenotype | phenotype | 2179 |
| sequence feature | GENO:0000840 | phenotype | 1558 |
| disease | has disposition | HP:0031797 | 1306 |
| sequence feature | GENO:0000845 | phenotype | 1054 |
| disease | has phenotype | None | 904 |
| gene | RO:0003304 | disease | 830 |
| gene | RO:0002326 | disease | 778 |
| sequence feature | GENO:0000844 | phenotype | 580 |
| disease | has disposition | HP:0012823 | 537 |
| None | causes condition | disease | 472 |
| sequence feature | GENO:0000843 | phenotype | 372 |
| gene | RO:0002607 | disease | 370 |
| sequence feature | GENO:0000841 | phenotype | 368 |
| gene | causes condition | phenotype | 288 |
| None | has phenotype | None | 277 |
| gene | RO:0002326 | phenotype | 246 |
| disease | has phenotype | HP:0031797 | 245 |
| None | RO:0002326 | phenotype | 234 |
| sequence feature | causes condition | phenotype | 228 |
| GENO:0000871 | GENO:0000840 | disease | 218 |
| GENO:0000847 | GENO:0000840 | disease | 204 |
| sequence feature | GENO:0000845 | gene | 204 |
| gene | has phenotype | phenotype | 153 |
| None | RO:0002326 | disease | 132 |
| sequence feature | GENO:0000844 | gene | 123 |
| sequence feature | GENO:0000840 | gene | 122 |
| None | has disposition | HP:0000005 | 102 |
| gene | RO:0002607 | phenotype | 102 |
+------------------+------------------+-----------------+-----------+
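For anyone who wants to reproduce tallies like the two tables above outside of kgx, a small sketch follows. It assumes nodes are dicts with an "id" and an optional "category" list, and edges are dicts with "subject", "predicate" and "object" keys. That layout is an assumption for illustration, and only the first category of each node is counted:

```python
from collections import Counter
from typing import List, Optional


def first_category(node: Optional[dict]) -> Optional[str]:
    """Return the node's first category, or None if it has none (or is missing)."""
    categories = node.get("category") if node else None
    return categories[0] if categories else None


def category_counts(nodes: List[dict]) -> Counter:
    """Count nodes per category; uncategorized nodes are counted under None."""
    return Counter(first_category(node) for node in nodes)


def edge_counts(nodes: List[dict], edges: List[dict]) -> Counter:
    """Count edges per (subject category, predicate, object category) triple."""
    by_id = {node["id"]: node for node in nodes}
    return Counter(
        (
            first_category(by_id.get(edge["subject"])),
            edge["predicate"],
            first_category(by_id.get(edge["object"])),
        )
        for edge in edges
    )


# Toy data with made-up ids.
nodes = [
    {"id": "EX:1", "category": ["disease"]},
    {"id": "EX:2", "category": ["phenotype"]},
    {"id": "EX:3"},
]
edges = [
    {"subject": "EX:1", "predicate": "has phenotype", "object": "EX:2"},
    {"subject": "EX:3", "predicate": "has phenotype", "object": "EX:2"},
    {"subject": "EX:1", "predicate": "has phenotype", "object": "EX:2"},
]
node_table = category_counts(nodes)
edge_table = edge_counts(nodes, edges)
```

Calling most_common() on either counter gives rows sorted by frequency, as in the edge table above.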
I'd expect to see a count of genes.
Also, 'sequence feature' is quite generic; are these variants from ClinVar?
I think you may need another source to be complete with gene clique merges; the dipper team can advise.
> I'd expect to see a count of genes.
The count is in the first table: there are 4,212 genes. Am I misunderstanding what you mean?
> Also, 'sequence feature' is quite generic; are these variants from ClinVar?
Yes, I was mapping http://purl.obolibrary.org/obo/GENO_0000002 onto sequence feature. I don't remember why I was doing this, because the web page says its name is "variant allele". I think I made a mistake, and intended to map it onto "sequence variant" from the biolink model.
Should I change it from "sequence feature" to "sequence variant"? Or should I try to use more fine-grained categories?
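As a sketch of what the corrected mapping might look like in code: the TYPE_TO_CATEGORY table and both helper names are hypothetical, and the only entry is the GENO:0000002 to "sequence variant" mapping described above.

```python
from typing import Optional

OBO_BASE = "http://purl.obolibrary.org/obo/"

# Hypothetical lookup from rdf:type CURIE to biolink category name.
TYPE_TO_CATEGORY = {
    "GENO:0000002": "sequence variant",  # "variant allele"
}


def obo_iri_to_curie(iri: str) -> Optional[str]:
    """Turn an OBO PURL such as .../obo/GENO_0000002 into GENO:0000002."""
    if not iri.startswith(OBO_BASE):
        return None
    prefix, separator, local_id = iri[len(OBO_BASE):].partition("_")
    if not (separator and prefix and local_id):
        return None
    return f"{prefix}:{local_id}"


def category_for_type(iri: str) -> Optional[str]:
    """Look up the category for a type IRI; None when no mapping is configured."""
    curie = obo_iri_to_curie(iri)
    return TYPE_TO_CATEGORY.get(curie) if curie else None
```

More fine-grained categories would just mean adding more entries to the table.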
Summary of hgnc.ttl after being parsed:
+------------------------------------+--------------------------------------------------+-----------+
| Uncategorized Example Base IRI | Uncategorized Example Full IRI | Frequency |
+------------------------------------+--------------------------------------------------+-----------+
| http://purl.obolibrary.org/obo/CHR | http://purl.obolibrary.org/obo/CHR_9606chr2q12.3 | 1084 |
+------------------------------------+--------------------------------------------------+-----------+
+------------------+-----------+---------------------------------------------------+
| Category         | Frequency | Label                                             |
+------------------+-----------+---------------------------------------------------+
| gene             | 19237     |                                                   |
| SO:0000336       | 13001     | pseudogene                                        |
| SO:0001877       | 4062      | lnc_RNA                                           |
| SO:0001265       | 1914      | miRNA_gene                                        |
| None             | 1084      |                                                   |
| SO:0001267       | 567       | snoRNA_gene                                       |
| SO:0001500       | 518       | heritable_phenotypic_marker                       |
| SO:0001272       | 410       | tRNA_gene                                         |
| sequence feature | 295       |                                                   |
| SO:0002122       | 228       | immunoglobulin_gene                               |
| SO:0000460       | 207       | vertebrate_immunoglobulin_T_cell_receptor_segment |
| SO:0002098       | 203       | immunoglobulin_pseudogene                         |
| SO:0000655       | 150       | ncRNA                                             |
| SO:0000883       | 125       | stop_codon_read_through                           |
| SO:0001411       | 117       | biological_region                                 |
| SO:0000100       | 109       | endogenous_retroviral_gene                        |
+------------------+-----------+---------------------------------------------------+
+------------------+-----------------+------------------+-----------+
| Subject Category | Predicate | Object Category | Frequency |
+------------------+-----------------+------------------+-----------+
| None | has_subsequence | gene | 38462 |
| None | has_subsequence | SO:0000336 | 25982 |
| None | has_subsequence | SO:0001877 | 8122 |
| None | has_subsequence | SO:0001265 | 3828 |
| None | has_subsequence | SO:0001267 | 1134 |
| None | has_subsequence | SO:0001500 | 966 |
| None | has_subsequence | SO:0001272 | 774 |
| None | has_subsequence | sequence feature | 556 |
| None | has_subsequence | SO:0002122 | 456 |
| None | has_subsequence | SO:0000460 | 414 |
| None | has_subsequence | SO:0002098 | 404 |
| None | has_subsequence | SO:0000883 | 250 |
| None | has_subsequence | SO:0001411 | 234 |
| None | has_subsequence | SO:0000100 | 216 |
| None | has_subsequence | SO:0000651 | 100 |
+------------------+-----------------+------------------+-----------+
This is the result of performing the clique merge on everything plus hgnc.ttl (omitting the edge summary for now):
|Nodes|=484546
|Edges|=1010849
+----------------+-----------------------------------------+-----------+
| Prefix | Category | Frequency |
+----------------+-----------------------------------------+-----------+
| CHR | None | 1084 |
| HGNC | gene | 19224 |
| HGNC | SO:0000336 | 13001 |
| HGNC | SO:0001877 | 4061 |
| HGNC | SO:0001265 | 1914 |
| HGNC | sequence feature | 297 |
| HGNC | SO:0001267 | 566 |
| HGNC | SO:0001500 | 517 |
| HGNC | SO:0001268 | 37 |
| HGNC | SO:0001272 | 410 |
| HGNC | SO:0001411 | 117 |
| HGNC | SO:0000100 | 109 |
| HGNC | SO:0000460 | 207 |
| HGNC | SO:0000655 | 150 |
| HGNC | SO:0002098 | 203 |
| HGNC | SO:0000001 | 24 |
| HGNC | SO:0002122 | 228 |
| HGNC | SO:0000651 | 52 |
| HGNC | SO:0000883 | 125 |
| HGNC | GENO:0000418 | 27 |
| HGNC | SO:0002099 | 36 |
| HGNC | SO:0000405 | 4 |
| HGNC | disease | 28 |
| HGNC | SO:0000946 | 8 |
| HGNC | phenotype | 7 |
| HGNC | SO:0000404 | 4 |
| HGNC | SO:0001266 | 3 |
| ClinVarVariant | sequence feature | 411841 |
| MedGen | None | 1919 |
| OMIM | disease | 5234 |
| OMIM | phenotype | 878 |
| OMIM | None | 250 |
| Orphanet | disease | 1834 |
| ClinVarVariant | GENO:0000847 | 220 |
| ClinVarVariant | GENO:0000871 | 288 |
| ClinVarVariant | None | 196 |
| ClinVarVariant | GENO:0000848 | 97 |
| Orphanet | None | 49 |
| OMIM | owl:AnnotationProperty | 2 |
| Orphanet | http://www.orpha.net/ORDO/ObsoleteClass | 3 |
| https | None | 987 |
| https | sequence feature | 61 |
| BNODE | sequence feature | 6215 |
| OMIM | gene | 10 |
| BNODE | None | 6 |
| OMIM | owl:Axiom | 4 |
| DOID | disease | 2282 |
| HP | phenotype | 8326 |
| MESH | disease | 456 |
| MESH | None | 590 |
| DOID | None | 315 |
| HP | None | 87 |
| HP | HP:0031797 | 23 |
| HP | HP:0000005 | 23 |
| DECIPHER | None | 47 |
| HP | HP:0012823 | 5 |
| Orphanet | phenotype | 48 |
| DOID | phenotype | 3 |
| DOID | owl:Axiom | 1 |
| Orphanet | Orphanet:C001 | 3 |
| Orphanet | gene | 1 |
| Orphanet | owl:Restriction | 1 |
+----------------+-----------------------------------------+-----------+
@cmungall It's this one here. HGNC:7415, ENSEMBL:ENSG00000228253, and OMIM:516070 don't appear elsewhere in the file, which is why it didn't match anything.
{
  "category": [
    "gene"
  ],
  "description": [
    "mitochondrially encoded ATP synthase membrane subunit 8"
  ],
  "id": "Orphanet:159773",
  "iri": "http://www.orpha.net/ORDO/Orphanet_159773",
  "name": [
    "MT-ATP8"
  ],
  "provided_by": [
    "orphanet.ttl"
  ],
  "same_as": [
    "HGNC:7415",
    "ENSEMBL:ENSG00000228253",
    "OMIM:516070"
  ],
  "synonyms": [
    "A6L",
    "mitochondrially encoded ATP synthase membrane subunit A6L",
    "ATP8"
  ],
  "type": [
    "owl:Class"
  ],
}
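To illustrate why a node like this ends up alone, here is a simplified sketch of grouping nodes into cliques by shared identifiers. It is not the actual clique_merge code; build_cliques is a made-up name, and it only assumes nodes carry "id" and an optional "same_as" list as in the record above. The EX: ids are toy values:

```python
from collections import defaultdict
from typing import Dict, List


def build_cliques(nodes: List[dict]) -> List[List[str]]:
    """Group node ids that share any identifier (their own id or a same_as entry)."""
    parent: Dict[str, str] = {node["id"]: node["id"] for node in nodes}

    def find(node_id: str) -> str:
        # Follow parent links to the clique representative, compressing the path.
        while parent[node_id] != node_id:
            parent[node_id] = parent[parent[node_id]]
            node_id = parent[node_id]
        return node_id

    def union(left: str, right: str) -> None:
        left_root, right_root = find(left), find(right)
        if left_root != right_root:
            parent[right_root] = left_root

    # Remember which loaded node first mentioned each identifier.
    first_seen: Dict[str, str] = {}
    for node in nodes:
        for identifier in [node["id"], *node.get("same_as", [])]:
            if identifier in first_seen:
                union(first_seen[identifier], node["id"])
            else:
                first_seen[identifier] = node["id"]

    cliques = defaultdict(set)
    for node_id in parent:
        cliques[find(node_id)].add(node_id)
    return sorted(sorted(members) for members in cliques.values())


orphanet_node = {
    "id": "Orphanet:159773",
    "same_as": ["HGNC:7415", "ENSEMBL:ENSG00000228253", "OMIM:516070"],
}
others = [{"id": "EX:1", "same_as": ["EX:2"]}, {"id": "EX:2"}]

without_hgnc = build_cliques([orphanet_node, *others])
with_hgnc = build_cliques([orphanet_node, *others, {"id": "HGNC:7415"}])
```

In the first call none of the same_as identifiers belongs to a loaded node, so Orphanet:159773 stays a clique of one. Once a node with id HGNC:7415 is present, the two are grouped.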
Yes, and it exists in hgnc.ttl. It wasn't loaded because no edges were loaded for it. The RdfTransformer works by first loading all edges, and then loading all the nodes connected to those edges. And there were no edges because I'm treating equivalence predicates as node properties rather than edges.
I'll make it so that even isolated nodes get loaded and then re-run the scripts.
Edit: I only made this change for HGNC.
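A toy model of the loading order described above, to show how a node with only equivalence statements gets dropped. This is not the RdfTransformer code: load_graph, the keep_isolated flag, the choice of equivalence predicates and the EX: ids are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Assumed set of predicates that are stored as node properties instead of edges.
EQUIVALENCE_PREDICATES = {"owl:sameAs", "owl:equivalentClass"}


def load_graph(
    triples: Iterable[Triple],
    declared_ids: Iterable[str],
    keep_isolated: bool = False,
) -> Tuple[Dict[str, dict], List[Triple]]:
    """Load edges first, then only the nodes those edges touch.

    With keep_isolated=True, every declared node is loaded even without edges.
    """
    edges: List[Triple] = []
    same_as: Dict[str, List[str]] = {}
    for subject, predicate, obj in triples:
        if predicate in EQUIVALENCE_PREDICATES:
            # Equivalence becomes a node property, so it creates no edge.
            same_as.setdefault(subject, []).append(obj)
        else:
            edges.append((subject, predicate, obj))

    node_ids = {endpoint for s, _, o in edges for endpoint in (s, o)}
    if keep_isolated:
        node_ids |= set(declared_ids)

    nodes = {
        node_id: {"id": node_id, "same_as": same_as.get(node_id, [])}
        for node_id in sorted(node_ids)
    }
    return nodes, edges


triples = [
    ("EX:chr", "has_subsequence", "EX:gene1"),
    ("EX:gene2", "owl:sameAs", "EX:xref"),
]
declared = ["EX:gene1", "EX:gene2"]

edges_only_nodes, _ = load_graph(triples, declared)
all_nodes, _ = load_graph(triples, declared, keep_isolated=True)
```

Here EX:gene2 has no ordinary edges, so it is missing from the first result and only appears, with its same_as list, once isolated nodes are kept.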
I re-ran the workflow (main.py) from scratch. Here is the summary of the resulting graph:
$ kgx node-summary clique_merged.json
|Nodes|=446507
|Edges|=928655
Reading knowledge graph [####################################] 100%
xref prefixes: OMIM, NCBIGene, ENSEMBL, UMLS, HGNC, Orphanet
+----------------+------------------+-----------+
| Prefix | Category | Frequency |
+----------------+------------------+-----------+
| ClinVarVariant | sequence feature | 411841 |
| OMIM | phenotype | 4044 |
| MedGen | None | 1919 |
| OMIM | None | 2212 |
| Orphanet | None | 1938 |
| ClinVarVariant | GENO:0000847 | 220 |
| ClinVarVariant | GENO:0000871 | 288 |
| NCBIGene | gene | 125 |
| ClinVarVariant | GENO:0000848 | 97 |
| OMIM | disease | 84 |
| ClinVarVariant | None | 196 |
| OMIM | gene | 3856 |
| OMIM | HP:0031859 | 2 |
| NCBIGene | None | 987 |
| BNODE | SO:0001500 | 184 |
| OMIM | GENO:0000418 | 33 |
| NCBIGene | SO:0001500 | 61 |
| OMIM | SO:0001263 | 4 |
| BNODE | None | 6 |
| HP | None | 8464 |
| DOID | None | 2601 |
| MESH | None | 1046 |
| DECIPHER | None | 47 |
| BNODE | sequence variant | 6031 |
| Orphanet | gene | 218 |
| Orphanet | SO:0001263 | 13 |
| Orphanet | sequence feature | 2 |
+----------------+------------------+-----------+
+------------------+-----------+
| Category | Frequency |
+------------------+-----------+
| sequence feature | 411843 |
| phenotype | 4044 |
| None | 19416 |
| GENO:0000847 | 220 |
| GENO:0000871 | 288 |
| gene | 4199 |
| GENO:0000848 | 97 |
| disease | 84 |
| HP:0031859 | 2 |
| SO:0001500 | 245 |
| GENO:0000418 | 33 |
| SO:0001263 | 17 |
| sequence variant | 6031 |
+------------------+-----------+
+----------------+-----------+
| Prefixes | Frequency |
+----------------+-----------+
| ClinVarVariant | 412642 |
| OMIM | 10235 |
| MedGen | 1919 |
| Orphanet | 2171 |
| NCBIGene | 1173 |
| BNODE | 6221 |
| HP | 8464 |
| DOID | 2601 |
| MESH | 1046 |
| DECIPHER | 47 |
+----------------+-----------+
$ kgx edge-summary clique_merged.json
|Nodes|=446507
|Edges|=928655
Reading knowledge graph [####################################] 100%
+----------------+------------------+-----------------------------------------------+---------------+-----------------+-----------+
| Subject Prefix | Subject Category | Predicate | Object Prefix | Object Category | Frequency |
+----------------+------------------+-----------------------------------------------+---------------+-----------------+-----------+
| ClinVarVariant | sequence feature | GENO:0000840 | OMIM | phenotype | 68182 |
| OMIM | phenotype | has phenotype | HP | None | 75821 |
| OMIM | phenotype | has disposition | HP | None | 5482 |
| ClinVarVariant | sequence feature | GENO:0000844 | MedGen | None | 95210 |
| ClinVarVariant | sequence feature | GENO:0000843 | MedGen | None | 58763 |
| ClinVarVariant | sequence feature | GENO:0000845 | MedGen | None | 149396 |
| ClinVarVariant | sequence feature | GENO:0000845 | OMIM | phenotype | 89444 |
| ClinVarVariant | sequence feature | GENO:0000845 | OMIM | None | 26198 |
| ClinVarVariant | sequence feature | GENO:0000844 | OMIM | phenotype | 43690 |
| ClinVarVariant | sequence feature | GENO:0000840 | MedGen | None | 31021 |
| ClinVarVariant | sequence feature | GENO:0000841 | MedGen | None | 17799 |
| ClinVarVariant | sequence feature | GENO:0000841 | OMIM | phenotype | 20612 |
| ClinVarVariant | sequence feature | GENO:0000843 | OMIM | None | 4779 |
| ClinVarVariant | sequence feature | GENO:0000844 | OMIM | None | 12053 |
| ClinVarVariant | sequence feature | GENO:0000843 | OMIM | phenotype | 24874 |
| ClinVarVariant | sequence feature | GENO:0000840 | OMIM | None | 7130 |
| ClinVarVariant | sequence feature | GENO:0000840 | Orphanet | None | 662 |
| OMIM | phenotype | RO:0002326 | OMIM | phenotype | 28 |
| ClinVarVariant | sequence feature | GENO:0000845 | Orphanet | None | 3335 |
| Orphanet | None | has phenotype | HP | None | 24675 |
| ClinVarVariant | GENO:0000847 | GENO:0000840 | OMIM | phenotype | 205 |
| ClinVarVariant | sequence feature | GENO:0000841 | OMIM | None | 2459 |
| ClinVarVariant | sequence feature | GENO:0000841 | Orphanet | None | 467 |
| ClinVarVariant | sequence feature | GENO:0000843 | Orphanet | None | 381 |
| OMIM | None | has phenotype | HP | None | 14120 |
| ClinVarVariant | GENO:0000871 | GENO:0000845 | MedGen | None | 10 |
| ClinVarVariant | sequence feature | GENO:0000845 | NCBIGene | gene | 204 |
| NCBIGene | gene | causes condition | OMIM | phenotype | 121 |
| NCBIGene | gene | causes condition | OMIM | disease | 2 |
| NCBIGene | gene | RO:0002326 | OMIM | phenotype | 31 |
| NCBIGene | gene | causes condition | NCBIGene | gene | 85 |
| NCBIGene | gene | RO:0002607 | OMIM | phenotype | 17 |
| NCBIGene | gene | RO:0003304 | OMIM | phenotype | 25 |
| ClinVarVariant | sequence feature | GENO:0000844 | Orphanet | None | 1253 |
| OMIM | None | has disposition | HP | None | 2033 |
| ClinVarVariant | GENO:0000848 | GENO:0000840 | OMIM | phenotype | 60 |
| ClinVarVariant | sequence feature | GENO:0000840 | NCBIGene | gene | 120 |
| ClinVarVariant | sequence feature | GENO:0000841 | OMIM | disease | 11 |
| OMIM | disease | has phenotype | HP | None | 252 |
| OMIM | disease | has disposition | HP | None | 31 |
| ClinVarVariant | sequence feature | http://monarchinitiative.orghas_drug_response | MedGen | None | 31 |
| ClinVarVariant | GENO:0000871 | GENO:0000840 | OMIM | phenotype | 218 |
| ClinVarVariant | sequence feature | GENO:0000844 | NCBIGene | gene | 123 |
| ClinVarVariant | GENO:0000848 | GENO:0000845 | OMIM | phenotype | 19 |
| ClinVarVariant | None | GENO:0000840 | MedGen | None | 19 |
| ClinVarVariant | sequence feature | http://monarchinitiative.orghas_drug_response | OMIM | phenotype | 18 |
| ClinVarVariant | sequence feature | GENO:0000841 | NCBIGene | gene | 23 |
| ClinVarVariant | None | GENO:0000840 | OMIM | phenotype | 84 |
| ClinVarVariant | GENO:0000847 | GENO:0000845 | MedGen | None | 6 |
| ClinVarVariant | GENO:0000847 | GENO:0000840 | MedGen | None | 12 |
| ClinVarVariant | GENO:0000847 | GENO:0000844 | OMIM | phenotype | 2 |
| ClinVarVariant | sequence feature | GENO:0000840 | OMIM | HP:0031859 | 1 |
| ClinVarVariant | GENO:0000847 | GENO:0000843 | MedGen | None | 7 |
| ClinVarVariant | None | GENO:0000841 | OMIM | phenotype | 67 |
| ClinVarVariant | sequence feature | GENO:0000845 | OMIM | disease | 4 |
| ClinVarVariant | GENO:0000871 | GENO:0000841 | OMIM | phenotype | 29 |
| ClinVarVariant | GENO:0000871 | GENO:0000840 | MedGen | None | 21 |
| ClinVarVariant | GENO:0000871 | GENO:0000845 | OMIM | phenotype | 8 |
| ClinVarVariant | sequence feature | GENO:0000840 | OMIM | disease | 14 |
| ClinVarVariant | GENO:0000848 | GENO:0000844 | OMIM | phenotype | 3 |
| ClinVarVariant | sequence feature | GENO:0000843 | NCBIGene | gene | 14 |
| ClinVarVariant | None | GENO:0000841 | MedGen | None | 4 |
| ClinVarVariant | GENO:0000848 | GENO:0000843 | OMIM | phenotype | 13 |
| ClinVarVariant | None | GENO:0000840 | OMIM | None | 2 |
| ClinVarVariant | GENO:0000871 | GENO:0000845 | OMIM | None | 4 |
| ClinVarVariant | None | GENO:0000845 | MedGen | None | 5 |
| ClinVarVariant | GENO:0000871 | GENO:0000843 | OMIM | phenotype | 1 |
| ClinVarVariant | GENO:0000847 | GENO:0000841 | MedGen | None | 5 |
| ClinVarVariant | GENO:0000847 | GENO:0000843 | OMIM | phenotype | 12 |
| ClinVarVariant | GENO:0000871 | GENO:0000844 | OMIM | phenotype | 1 |
| ClinVarVariant | None | GENO:0000845 | Orphanet | None | 1 |
| ClinVarVariant | sequence feature | GENO:0000840 | OMIM | gene | 2 |
| OMIM | gene | has phenotype | HP | None | 110 |
| OMIM | gene | has disposition | HP | None | 29 |
| OMIM | gene | RO:0002326 | OMIM | gene | 1 |
| OMIM | gene | causes condition | OMIM | phenotype | 4201 |
| ClinVarVariant | GENO:0000847 | GENO:0000844 | MedGen | None | 1 |
| ClinVarVariant | GENO:0000871 | GENO:0000845 | Orphanet | None | 1 |
| ClinVarVariant | GENO:0000847 | GENO:0000845 | OMIM | phenotype | 5 |
| ClinVarVariant | None | GENO:0000845 | OMIM | None | 2 |
| ClinVarVariant | GENO:0000871 | GENO:0000841 | OMIM | None | 1 |
| ClinVarVariant | sequence feature | GENO:0000841 | OMIM | HP:0031859 | 1 |
| ClinVarVariant | GENO:0000847 | GENO:0000841 | OMIM | phenotype | 6 |
| ClinVarVariant | sequence feature | related to | OMIM | phenotype | 2 |
| ClinVarVariant | None | GENO:0000845 | OMIM | phenotype | 4 |
| ClinVarVariant | GENO:0000847 | GENO:0000845 | OMIM | None | 1 |
| ClinVarVariant | GENO:0000848 | GENO:0000841 | OMIM | phenotype | 2 |
| ClinVarVariant | None | GENO:0000844 | OMIM | None | 3 |
| ClinVarVariant | GENO:0000871 | GENO:0000841 | MedGen | None | 3 |
| ClinVarVariant | None | GENO:0000844 | OMIM | phenotype | 1 |
| ClinVarVariant | GENO:0000871 | GENO:0000840 | OMIM | None | 2 |
| ClinVarVariant | None | GENO:0000840 | Orphanet | None | 1 |
| ClinVarVariant | GENO:0000871 | GENO:0000844 | MedGen | None | 1 |
| ClinVarVariant | None | GENO:0000841 | OMIM | None | 3 |
| ClinVarVariant | GENO:0000871 | http://monarchinitiative.orghas_drug_response | OMIM | phenotype | 1 |
| OMIM | gene | RO:0002607 | OMIM | phenotype | 290 |
| OMIM | gene | RO:0003304 | OMIM | phenotype | 305 |
| OMIM | gene | RO:0002326 | OMIM | phenotype | 943 |
| OMIM | gene | RO:0003304 | Orphanet | None | 262 |
| NCBIGene | None | RO:0002326 | OMIM | phenotype | 354 |
| NCBIGene | gene | RO:0003304 | Orphanet | None | 48 |
| NCBIGene | None | causes condition | OMIM | phenotype | 537 |
| NCBIGene | gene | RO:0002326 | NCBIGene | gene | 33 |
| BNODE | SO:0001500 | causes condition | OMIM | phenotype | 157 |
| NCBIGene | None | RO:0002607 | OMIM | phenotype | 60 |
| OMIM | gene | causes condition | OMIM | disease | 11 |
| NCBIGene | None | RO:0002326 | OMIM | GENO:0000418 | 13 |
| OMIM | GENO:0000418 | causes condition | OMIM | phenotype | 29 |
| OMIM | gene | RO:0002607 | Orphanet | None | 98 |
| OMIM | GENO:0000418 | RO:0002607 | Orphanet | None | 7 |
| OMIM | GENO:0000418 | RO:0002607 | OMIM | phenotype | 12 |
| OMIM | phenotype | causes condition | OMIM | phenotype | 5 |
| NCBIGene | gene | has phenotype | HP | None | 43 |
| NCBIGene | gene | has disposition | HP | None | 13 |
| NCBIGene | gene | RO:0002607 | NCBIGene | gene | 11 |
| NCBIGene | gene | RO:0003304 | OMIM | disease | 1 |
| NCBIGene | SO:0001500 | has phenotype | OMIM | disease | 61 |
| NCBIGene | SO:0001500 | RO:0002326 | OMIM | disease | 9 |
| BNODE | SO:0001500 | RO:0002326 | OMIM | phenotype | 25 |
| NCBIGene | None | causes condition | OMIM | GENO:0000418 | 11 |
| OMIM | GENO:0000418 | has phenotype | HP | None | 21 |
| OMIM | GENO:0000418 | has disposition | HP | None | 5 |
| NCBIGene | gene | RO:0002607 | Orphanet | None | 10 |
| OMIM | GENO:0000418 | RO:0002326 | OMIM | phenotype | 21 |
| BNODE | SO:0001500 | RO:0002607 | OMIM | disease | 2 |
| OMIM | gene | causes condition | OMIM | gene | 2 |
| NCBIGene | gene | RO:0002607 | OMIM | disease | 1 |
| OMIM | SO:0001263 | RO:0003304 | OMIM | phenotype | 4 |
| OMIM | SO:0001263 | causes condition | OMIM | phenotype | 4 |
| OMIM | SO:0001263 | RO:0003304 | Orphanet | None | 5 |
| NCBIGene | SO:0001500 | causes condition | OMIM | disease | 7 |
| NCBIGene | None | RO:0002326 | OMIM | gene | 6 |
| NCBIGene | None | RO:0002607 | OMIM | gene | 2 |
| OMIM | gene | causes condition | NCBIGene | gene | 1 |
| OMIM | gene | RO:0002326 | OMIM | disease | 5 |
| OMIM | GENO:0000418 | RO:0003304 | OMIM | phenotype | 5 |
| OMIM | gene | RO:0002607 | OMIM | disease | 2 |
| BNODE | SO:0001500 | RO:0002326 | OMIM | disease | 1 |
| OMIM | GENO:0000418 | RO:0003304 | Orphanet | None | 4 |
| NCBIGene | None | RO:0002607 | OMIM | GENO:0000418 | 8 |
| BNODE | None | causes condition | OMIM | HP:0031859 | 2 |
| BNODE | SO:0001500 | causes condition | OMIM | disease | 2 |
| NCBIGene | SO:0001500 | RO:0002607 | OMIM | disease | 2 |
| NCBIGene | None | causes condition | OMIM | gene | 3 |
| OMIM | disease | causes condition | OMIM | phenotype | 5 |
| BNODE | None | causes condition | OMIM | disease | 2 |
| BNODE | None | causes condition | NCBIGene | gene | 1 |
| BNODE | SO:0001500 | RO:0002607 | OMIM | phenotype | 1 |
| NCBIGene | None | causes condition | OMIM | disease | 2 |
| BNODE | None | RO:0002326 | OMIM | gene | 1 |
| NCBIGene | None | RO:0002326 | NCBIGene | gene | 1 |
| NCBIGene | None | RO:0002326 | OMIM | disease | 2 |
| NCBIGene | None | causes condition | NCBIGene | gene | 1 |
| NCBIGene | SO:0001500 | has phenotype | OMIM | phenotype | 1 |
| NCBIGene | SO:0001500 | causes condition | OMIM | phenotype | 1 |
| DOID | None | has phenotype | HP | None | 95540 |
| MESH | None | has phenotype | HP | None | 36961 |
| DECIPHER | None | has phenotype | HP | None | 284 |
| DECIPHER | None | has disposition | HP | None | 1 |
| BNODE | sequence variant | causes condition | OMIM | phenotype | 5430 |
| BNODE | sequence variant | causes condition | Orphanet | None | 562 |
| BNODE | sequence variant | RO:0003304 | OMIM | phenotype | 36 |
| Orphanet | gene | RO:0003304 | OMIM | phenotype | 97 |
| Orphanet | SO:0001263 | RO:0003304 | Orphanet | None | 9 |
| Orphanet | gene | RO:0003304 | Orphanet | None | 88 |
| Orphanet | gene | RO:0002607 | OMIM | phenotype | 47 |
| Orphanet | SO:0001263 | RO:0002607 | Orphanet | None | 9 |
| Orphanet | SO:0001263 | RO:0003304 | OMIM | phenotype | 2 |
| Orphanet | gene | RO:0002607 | Orphanet | None | 20 |
| BNODE | sequence variant | RO:0003304 | Orphanet | None | 2 |
| Orphanet | sequence feature | RO:0002607 | OMIM | phenotype | 1 |
| BNODE | sequence variant | causes condition | OMIM | disease | 1 |
| Orphanet | sequence feature | RO:0003304 | OMIM | phenotype | 1 |
| Orphanet | SO:0001263 | RO:0002607 | OMIM | phenotype | 1 |
| BNODE | sequence variant | causes condition | OMIM | gene | 1 |
+----------------+------------------+-----------------------------------------------+---------------+-----------------+-----------+
> Once you have clique-merged, do you dispose of non-leader nodes? I wouldn't have expected so many OMIM genes
Assuming by "leader" you mean the node that each member of a clique is mapped to, no. I don't delete any nodes beyond what gets merged. But there is only a single node per clique since I'm not loading equivalencies as edges in the graph.
But if you were expecting that the genes would be Orphanet or HGNC rather than OMIM, I didn't write the clique merging method to be very selective about which prefixes it keeps.
I could make it so that particular prefixes are preferred for particular categories.
@cmungall Modified the clique_merge function to choose in the order you suggested:
Edit: Actually it uses the lowest index of a prefix match from the id_prefix lists of the categories of the clique as a sort key for the nodes in that clique. Since some cliques have multiple categories this doesn't always guarantee a particular order, so I do an ordinary sort first. And you can see this in the genes: NCBIGene is the most common prefix to use, then ENSEMBL, and then HGNC. And that's the order of the first three items in gene's id_prefix list.
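Roughly, the ordering described in that edit could be sketched like this. It is a simplified stand-in, not the actual clique_merge code: ID_PREFIXES holds only the first three gene prefixes mentioned above, and prefix_rank, choose_leader and the EX:/NCBIGene:example ids are illustrative.

```python
from typing import Dict, Iterable, List

# Only the first three entries of gene's id_prefix list, as stated above.
ID_PREFIXES: Dict[str, List[str]] = {
    "gene": ["NCBIGene", "ENSEMBL", "HGNC"],
}


def prefix_rank(node_id: str, categories: Iterable[str]) -> float:
    """Lowest index of the id's prefix across the id_prefix lists of the categories.

    Ids whose prefix is not listed for any category rank last.
    """
    prefix = node_id.split(":", 1)[0]
    ranks = [
        ID_PREFIXES[category].index(prefix)
        for category in categories
        if category in ID_PREFIXES and prefix in ID_PREFIXES[category]
    ]
    return min(ranks) if ranks else float("inf")


def choose_leader(clique_ids: Iterable[str], categories: Iterable[str]) -> str:
    """Pick the clique member whose prefix comes first in the preference lists."""
    categories = list(categories)
    ordered = sorted(clique_ids)  # ordinary sort first, so ties break the same way every run
    ordered.sort(key=lambda node_id: prefix_rank(node_id, categories))  # stable sort
    return ordered[0]


clique = ["HGNC:7415", "OMIM:516070", "ENSEMBL:ENSG00000228253"]
leader = choose_leader(clique, ["gene"])
leader_with_ncbi = choose_leader(clique + ["NCBIGene:example"], ["gene"])
fallback = choose_leader(["EX:b", "EX:a"], [])
```

Because Python's sort is stable, the initial alphabetical sort decides between ids that get the same rank.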
Edit2: I must have run the summary script on the wrong output file; it looked like hgnc.ttl was absent. This is now the correct summary of the clique merge of all Turtle files, including hgnc.ttl:
$ kgx node-summary clique_merged.json
|Nodes|=484434
|Edges|=1010841
Reading knowledge graph [####################################] 100%
xref prefixes: UMLS, NCBIGene, Orphanet, ENSEMBL, OMIM, HGNC
+----------------+------------------+-----------+
| Prefix | Category | Frequency |
+----------------+------------------+-----------+
| CHR | None | 1084 |
| ENSEMBL | SO:0002098 | 196 |
| NCBIGene | gene | 19225 |
| ENSEMBL | SO:0000336 | 11188 |
| ENSEMBL | SO:0001877 | 3923 |
| HGNC | SO:0001272 | 558 |
| ENSEMBL | SO:0000883 | 122 |
| HGNC | SO:0000100 | 100 |
| ENSEMBL | SO:0001265 | 1850 |
| HGNC | SO:0000336 | 1829 |
| ENSEMBL | SO:0001267 | 434 |
| ENSEMBL | SO:0000460 | 198 |
| HGNC | SO:0001500 | 254 |
| ENSEMBL | SO:0000404 | 4 |
| ENSEMBL | SO:0001263 | 13 |
| NCBIGene | GENO:0000418 | 33 |
| HGNC | SO:0000651 | 40 |
| HGNC | SO:0001411 | 117 |
| ENSEMBL | SO:0000651 | 20 |
| HGNC | SO:0002122 | 14 |
| ENSEMBL | sequence feature | 227 |
| HGNC | SO:0001877 | 129 |
| HGNC | None | 307 |
| HGNC | sequence feature | 90 |
| ENSEMBL | SO:0002122 | 208 |
| HGNC | SO:0001265 | 60 |
| HGNC | SO:0000655 | 146 |
| HGNC | SO:0001267 | 132 |
| ENSEMBL | disease | 12 |
| ENSEMBL | SO:0001500 | 15 |
| HGNC | SO:0000883 | 3 |
| ENSEMBL | SO:0002099 | 34 |
| NCBIGene | SO:0001877 | 9 |
| HGNC | SO:0000946 | 4 |
| ENSEMBL | SO:0001272 | 24 |
| NCBIGene | SO:0001267 | 1 |
| HGNC | SO:0000405 | 1 |
| NCBIGene | SO:0001265 | 4 |
| ENSEMBL | SO:0000655 | 2 |
| HGNC | SO:0002098 | 8 |
| ENSEMBL | SO:0001266 | 3 |
| NCBIGene | SO:0002122 | 6 |
| HGNC | SO:0000001 | 23 |
| ENSEMBL | SO:0001268 | 33 |
| ENSEMBL | SO:0000405 | 3 |
| NCBIGene | SO:0001263 | 4 |
| NCBIGene | SO:0000336 | 1 |
| NCBIGene | SO:0001500 | 32 |
| HGNC | disease | 8 |
| HGNC | SO:0000460 | 8 |
| HGNC | SO:0002099 | 2 |
| HGNC | phenotype | 8 |
| NCBIGene | disease | 5 |
| NCBIGene | phenotype | 3 |
| ENSEMBL | SO:0000100 | 9 |
| NCBIGene | SO:0000460 | 1 |
| NCBIGene | SO:0000655 | 2 |
| HGNC | SO:0001268 | 3 |
| NCBIGene | SO:0000001 | 1 |
| NCBIGene | SO:0001268 | 1 |
| HGNC | gene | 1 |
| ClinVarVariant | sequence feature | 411841 |
| OMIM | phenotype | 1935 |
| MedGen | None | 1919 |
| OMIM | None | 4275 |
| Orphanet | None | 1938 |
| ClinVarVariant | GENO:0000847 | 220 |
| ClinVarVariant | GENO:0000871 | 288 |
| ClinVarVariant | GENO:0000848 | 97 |
| ClinVarVariant | None | 196 |
| OMIM | disease | 54 |
| OMIM | HP:0031859 | 2 |
| NCBIGene | None | 637 |
| BNODE | SO:0001500 | 184 |
| BNODE | None | 6 |
| OMIM | gene | 1 |
| HP | None | 8464 |
| DOID | None | 2601 |
| MESH | None | 1046 |
| DECIPHER | None | 47 |
| BNODE | sequence variant | 6031 |
+----------------+------------------+-----------+
+------------------+-----------+
| Category | Frequency |
+------------------+-----------+
| None | 22520 |
| SO:0002098 | 204 |
| gene | 19227 |
| SO:0000336 | 13018 |
| SO:0001877 | 4061 |
| SO:0001272 | 582 |
| SO:0000883 | 125 |
| SO:0000100 | 109 |
| SO:0001265 | 1914 |
| SO:0001267 | 567 |
| SO:0000460 | 207 |
| SO:0001500 | 485 |
| SO:0000404 | 4 |
| SO:0001263 | 17 |
| GENO:0000418 | 33 |
| SO:0000651 | 60 |
| SO:0001411 | 117 |
| SO:0002122 | 228 |
| sequence feature | 412158 |
| SO:0000655 | 150 |
| disease | 79 |
| SO:0002099 | 36 |
| SO:0000946 | 4 |
| SO:0000405 | 4 |
| SO:0001266 | 3 |
| SO:0000001 | 24 |
| SO:0001268 | 37 |
| phenotype | 1946 |
| GENO:0000847 | 220 |
| GENO:0000871 | 288 |
| GENO:0000848 | 97 |
| HP:0031859 | 2 |
| sequence variant | 6031 |
+------------------+-----------+
+----------------+-----------+
| Prefixes | Frequency |
+----------------+-----------+
| CHR | 1084 |
| ENSEMBL | 18518 |
| NCBIGene | 19965 |
| HGNC | 3845 |
| ClinVarVariant | 412642 |
| OMIM | 6267 |
| MedGen | 1919 |
| Orphanet | 1938 |
| BNODE | 6221 |
| HP | 8464 |
| DOID | 2601 |
| MESH | 1046 |
| DECIPHER | 47 |
+----------------+-----------+
$ kgx edge-summary clique_merged.json
|Nodes|=484434
|Edges|=1010841
Reading knowledge graph [####################################] 100%
+----------------+------------------+-----------------------------------------------+---------------+------------------+-----------+
| Subject Prefix | Subject Category | Predicate | Object Prefix | Object Category | Frequency |
+----------------+------------------+-----------------------------------------------+---------------+------------------+-----------+
| CHR | None | has_subsequence | ENSEMBL | SO:0002098 | 392 |
| CHR | None | has_subsequence | NCBIGene | gene | 38510 |
| CHR | None | has_subsequence | ENSEMBL | SO:0001877 | 7846 |
| CHR | None | has_subsequence | HGNC | SO:0001267 | 264 |
| CHR | None | has_subsequence | ENSEMBL | SO:0000336 | 22370 |
| CHR | None | has_subsequence | ENSEMBL | SO:0002122 | 416 |
| CHR | None | has_subsequence | ENSEMBL | SO:0001265 | 3700 |
| CHR | None | has_subsequence | HGNC | SO:0001877 | 256 |
| CHR | None | has_subsequence | HGNC | SO:0002098 | 12 |
| CHR | None | has_subsequence | ENSEMBL | sequence feature | 454 |
| CHR | None | has_subsequence | ENSEMBL | SO:0000883 | 244 |
| CHR | None | has_subsequence | ENSEMBL | SO:0001267 | 868 |
| CHR | None | has_subsequence | HGNC | SO:0000336 | 3606 |
| CHR | None | has_subsequence | HGNC | SO:0001265 | 120 |
| NCBIGene | gene | causes condition | OMIM | phenotype | 2376 |
| NCBIGene | gene | has disposition | HP | None | 89 |
| NCBIGene | gene | has phenotype | HP | None | 378 |
| CHR | None | has_subsequence | HGNC | None | 586 |
| CHR | None | has_subsequence | ENSEMBL | SO:0000100 | 18 |
| CHR | None | has_subsequence | HGNC | SO:0000001 | 46 |
| CHR | None | has_subsequence | HGNC | SO:0000100 | 198 |
| CHR | None | has_subsequence | HGNC | SO:0001272 | 770 |
| CHR | None | has_subsequence | HGNC | SO:0001411 | 234 |
| CHR | None | has_subsequence | NCBIGene | GENO:0000418 | 70 |
| CHR | None | has_subsequence | ENSEMBL | SO:0000460 | 396 |
| CHR | None | has_subsequence | ENSEMBL | SO:0002099 | 68 |
| CHR | None | has_subsequence | HGNC | SO:0001500 | 374 |
| CHR | None | has_subsequence | HGNC | SO:0000460 | 16 |
| CHR | None | has_subsequence | NCBIGene | SO:0000460 | 2 |
| CHR | None | has_subsequence | ENSEMBL | SO:0000655 | 4 |
| CHR | None | has_subsequence | ENSEMBL | SO:0001268 | 66 |
| CHR | None | has_subsequence | HGNC | SO:0001268 | 6 |
| CHR | None | has_subsequence | HGNC | phenotype | 18 |
| CHR | None | has_subsequence | ENSEMBL | SO:0000404 | 8 |
| NCBIGene | gene | causes condition | OMIM | None | 1871 |
| NCBIGene | gene | RO:0002607 | Orphanet | None | 128 |
| CHR | None | has_subsequence | ENSEMBL | disease | 24 |
| CHR | None | has_subsequence | ENSEMBL | SO:0001500 | 30 |
| NCBIGene | gene | RO:0003304 | OMIM | None | 203 |
| NCBIGene | gene | RO:0002326 | OMIM | None | 220 |
| NCBIGene | gene | RO:0002607 | OMIM | None | 102 |
| NCBIGene | gene | causes condition | NCBIGene | gene | 192 |
| NCBIGene | gene | RO:0002326 | OMIM | phenotype | 764 |
| NCBIGene | gene | RO:0003304 | Orphanet | None | 398 |
| NCBIGene | gene | RO:0003304 | OMIM | phenotype | 225 |
| CHR | None | has_subsequence | HGNC | sequence feature | 104 |
| CHR | None | has_subsequence | NCBIGene | SO:0001877 | 18 |
| CHR | None | has_subsequence | NCBIGene | SO:0001268 | 2 |
| NCBIGene | gene | RO:0002607 | OMIM | phenotype | 256 |
| ENSEMBL | SO:0001267 | RO:0003304 | Orphanet | None | 6 |
| ENSEMBL | SO:0001263 | RO:0003304 | Orphanet | None | 9 |
| NCBIGene | GENO:0000418 | RO:0002326 | OMIM | phenotype | 20 |
| NCBIGene | GENO:0000418 | causes condition | NCBIGene | GENO:0000418 | 11 |
| NCBIGene | GENO:0000418 | causes condition | NCBIGene | gene | 11 |
| NCBIGene | gene | causes condition | NCBIGene | GENO:0000418 | 11 |
| CHR | None | has_subsequence | NCBIGene | SO:0001263 | 8 |
| CHR | None | has_subsequence | NCBIGene | phenotype | 6 |
| CHR | None | has_subsequence | NCBIGene | disease | 10 |
| CHR | None | has_subsequence | HGNC | SO:0002122 | 28 |
| CHR | None | has_subsequence | NCBIGene | SO:0002122 | 12 |
| NCBIGene | gene | RO:0002326 | NCBIGene | gene | 57 |
| CHR | None | has_subsequence | NCBIGene | SO:0000001 | 2 |
| CHR | None | has_subsequence | ENSEMBL | SO:0001263 | 12 |
| CHR | None | has_subsequence | HGNC | SO:0000883 | 6 |
| CHR | None | has_subsequence | NCBIGene | SO:0001500 | 4 |
| NCBIGene | GENO:0000418 | RO:0002607 | NCBIGene | GENO:0000418 | 8 |
| NCBIGene | GENO:0000418 | RO:0002607 | NCBIGene | gene | 8 |
| NCBIGene | gene | RO:0002607 | NCBIGene | GENO:0000418 | 8 |
| NCBIGene | gene | RO:0002607 | NCBIGene | gene | 22 |
| CHR | None | has_subsequence | ENSEMBL | SO:0000651 | 34 |
| HGNC | None | causes condition | OMIM | phenotype | 187 |
| CHR | None | has_subsequence | ENSEMBL | SO:0001266 | 6 |
| NCBIGene | gene | RO:0002326 | NCBIGene | phenotype | 11 |
| NCBIGene | gene | RO:0002326 | NCBIGene | disease | 10 |
| CHR | None | has_subsequence | HGNC | SO:0000655 | 64 |
| CHR | None | has_subsequence | HGNC | disease | 16 |
| CHR | None | has_subsequence | ENSEMBL | SO:0001272 | 4 |
| CHR | None | has_subsequence | NCBIGene | SO:0000336 | 2 |
| CHR | None | has_subsequence | HGNC | SO:0002099 | 4 |
| NCBIGene | gene | RO:0002607 | HGNC | phenotype | 2 |
| NCBIGene | gene | RO:0002607 | HGNC | SO:0001500 | 2 |
| NCBIGene | gene | causes condition | OMIM | disease | 5 |
| NCBIGene | GENO:0000418 | causes condition | OMIM | phenotype | 11 |
| NCBIGene | GENO:0000418 | RO:0003304 | OMIM | None | 3 |
| HGNC | None | causes condition | OMIM | None | 33 |
| ENSEMBL | SO:0000336 | has phenotype | ENSEMBL | SO:0000336 | 12 |
| ENSEMBL | SO:0000336 | has phenotype | ENSEMBL | disease | 12 |
| ENSEMBL | SO:0000336 | has phenotype | ENSEMBL | SO:0001500 | 12 |
| ENSEMBL | disease | has phenotype | ENSEMBL | SO:0000336 | 12 |
| ENSEMBL | disease | has phenotype | ENSEMBL | disease | 12 |
| ENSEMBL | disease | has phenotype | ENSEMBL | SO:0001500 | 12 |
| ENSEMBL | SO:0001500 | has phenotype | ENSEMBL | SO:0000336 | 12 |
| ENSEMBL | SO:0001500 | has phenotype | ENSEMBL | disease | 12 |
| ENSEMBL | SO:0001500 | has phenotype | ENSEMBL | SO:0001500 | 12 |
| ENSEMBL | SO:0000336 | RO:0002607 | ENSEMBL | SO:0000336 | 1 |
| ENSEMBL | SO:0000336 | RO:0002607 | ENSEMBL | disease | 1 |
| ENSEMBL | SO:0000336 | RO:0002607 | ENSEMBL | SO:0001500 | 1 |
| ENSEMBL | disease | RO:0002607 | ENSEMBL | SO:0000336 | 1 |
| ENSEMBL | disease | RO:0002607 | ENSEMBL | disease | 1 |
| ENSEMBL | disease | RO:0002607 | ENSEMBL | SO:0001500 | 1 |
| ENSEMBL | SO:0001500 | RO:0002607 | ENSEMBL | SO:0000336 | 1 |
| ENSEMBL | SO:0001500 | RO:0002607 | ENSEMBL | disease | 1 |
| ENSEMBL | SO:0001500 | RO:0002607 | ENSEMBL | SO:0001500 | 1 |
| ENSEMBL | SO:0000336 | has phenotype | HP | None | 1 |
| ENSEMBL | disease | has phenotype | HP | None | 1 |
| ENSEMBL | SO:0001500 | has phenotype | HP | None | 1 |
| CHR | None | has_subsequence | NCBIGene | SO:0001265 | 8 |
| CHR | None | has_subsequence | HGNC | SO:0000405 | 2 |
| CHR | None | has_subsequence | ENSEMBL | SO:0000405 | 6 |
| NCBIGene | SO:0001877 | RO:0002326 | OMIM | phenotype | 1 |
| CHR | None | has_subsequence | NCBIGene | SO:0001267 | 2 |
| CHR | None | has_subsequence | HGNC | SO:0000651 | 66 |
| NCBIGene | gene | causes condition | NCBIGene | phenotype | 91 |
| NCBIGene | gene | causes condition | NCBIGene | disease | 93 |
| NCBIGene | GENO:0000418 | RO:0002607 | OMIM | None | 4 |
| NCBIGene | GENO:0000418 | RO:0002607 | OMIM | phenotype | 8 |
| NCBIGene | GENO:0000418 | RO:0003304 | OMIM | phenotype | 2 |
| NCBIGene | GENO:0000418 | RO:0002326 | NCBIGene | GENO:0000418 | 11 |
| NCBIGene | GENO:0000418 | RO:0002326 | NCBIGene | gene | 11 |
| NCBIGene | gene | RO:0002326 | NCBIGene | GENO:0000418 | 11 |
| CHR | None | has_subsequence | NCBIGene | SO:0000655 | 4 |
| HGNC | None | RO:0002326 | OMIM | phenotype | 62 |
| NCBIGene | SO:0001267 | causes condition | OMIM | phenotype | 1 |
| NCBIGene | SO:0001265 | causes condition | OMIM | None | 1 |
| NCBIGene | GENO:0000418 | has phenotype | HP | None | 21 |
| NCBIGene | GENO:0000418 | has disposition | HP | None | 5 |
| NCBIGene | GENO:0000418 | RO:0002607 | Orphanet | None | 7 |
| NCBIGene | GENO:0000418 | RO:0003304 | Orphanet | None | 4 |
| ENSEMBL | SO:0000651 | RO:0002607 | Orphanet | None | 1 |
| ENSEMBL | SO:0001263 | RO:0002607 | Orphanet | None | 9 |
| CHR | None | has_subsequence | HGNC | SO:0000946 | 8 |
| NCBIGene | SO:0001877 | causes condition | OMIM | None | 4 |
| ENSEMBL | SO:0000336 | has phenotype | OMIM | disease | 3 |
| ENSEMBL | SO:0001500 | has phenotype | OMIM | disease | 3 |
| NCBIGene | GENO:0000418 | causes condition | OMIM | None | 18 |
| HGNC | None | has disposition | HP | None | 7 |
| HGNC | None | has phenotype | HP | None | 8 |
| HGNC | sequence feature | RO:0002607 | OMIM | phenotype | 1 |
| NCBIGene | SO:0002122 | has disposition | HP | None | 1 |
| NCBIGene | SO:0002122 | RO:0003304 | OMIM | phenotype | 1 |
| CHR | None | has_subsequence | HGNC | gene | 2 |
| NCBIGene | SO:0001263 | causes condition | OMIM | None | 2 |
| NCBIGene | SO:0001877 | RO:0003304 | Orphanet | None | 5 |
| NCBIGene | SO:0001263 | RO:0003304 | Orphanet | None | 5 |
| NCBIGene | SO:0000336 | RO:0002607 | OMIM | phenotype | 1 |
| NCBIGene | SO:0001877 | RO:0003304 | OMIM | phenotype | 3 |
| NCBIGene | SO:0001263 | RO:0003304 | OMIM | phenotype | 4 |
| HGNC | None | RO:0002326 | OMIM | None | 5 |
| HGNC | None | causes condition | NCBIGene | phenotype | 11 |
| HGNC | None | causes condition | NCBIGene | disease | 10 |
| HGNC | None | causes condition | NCBIGene | gene | 11 |
| NCBIGene | SO:0001500 | has phenotype | OMIM | disease | 28 |
| NCBIGene | gene | has phenotype | OMIM | disease | 1 |
| HGNC | disease | has phenotype | HGNC | disease | 8 |
| HGNC | disease | has phenotype | HGNC | SO:0001500 | 8 |
| HGNC | SO:0001500 | has phenotype | HGNC | disease | 8 |
| HGNC | SO:0001500 | has phenotype | HGNC | SO:0001500 | 8 |
| NCBIGene | SO:0001877 | causes condition | OMIM | phenotype | 3 |
| NCBIGene | SO:0001263 | causes condition | OMIM | phenotype | 2 |
| ENSEMBL | SO:0001877 | RO:0003304 | Orphanet | None | 3 |
| ENSEMBL | SO:0001272 | RO:0002607 | Orphanet | None | 8 |
| HGNC | phenotype | RO:0002607 | HGNC | phenotype | 6 |
| ENSEMBL | SO:0001877 | RO:0003304 | OMIM | None | 2 |
| ENSEMBL | SO:0001263 | RO:0003304 | OMIM | None | 2 |
| NCBIGene | disease | has phenotype | NCBIGene | disease | 2 |
| NCBIGene | disease | has phenotype | NCBIGene | gene | 2 |
| NCBIGene | gene | has phenotype | NCBIGene | disease | 2 |
| NCBIGene | gene | has phenotype | NCBIGene | gene | 2 |
| ENSEMBL | SO:0000655 | RO:0003304 | Orphanet | None | 3 |
| HGNC | phenotype | RO:0002326 | HGNC | phenotype | 1 |
| NCBIGene | phenotype | has disposition | HP | None | 13 |
| NCBIGene | disease | has disposition | HP | None | 10 |
| NCBIGene | phenotype | has phenotype | HP | None | 154 |
| NCBIGene | disease | has phenotype | HP | None | 144 |
| NCBIGene | phenotype | RO:0002326 | NCBIGene | phenotype | 2 |
| NCBIGene | phenotype | RO:0002326 | NCBIGene | disease | 1 |
| NCBIGene | phenotype | RO:0002326 | NCBIGene | gene | 2 |
| NCBIGene | disease | RO:0002326 | NCBIGene | phenotype | 1 |
| NCBIGene | disease | RO:0002326 | NCBIGene | disease | 1 |
| NCBIGene | disease | RO:0002326 | NCBIGene | gene | 1 |
| NCBIGene | phenotype | RO:0002326 | OMIM | None | 1 |
| NCBIGene | disease | RO:0002326 | OMIM | None | 1 |
| HGNC | phenotype | has phenotype | HP | None | 9 |
| HGNC | SO:0001500 | has phenotype | HP | None | 11 |
| HGNC | phenotype | has disposition | HP | None | 2 |
| HGNC | SO:0001500 | has disposition | HP | None | 3 |
| HGNC | phenotype | causes condition | HGNC | phenotype | 1 |
| HGNC | phenotype | causes condition | HGNC | SO:0001500 | 1 |
| HGNC | SO:0001500 | causes condition | HGNC | phenotype | 1 |
| HGNC | SO:0001500 | causes condition | HGNC | SO:0001500 | 1 |
| NCBIGene | SO:0001500 | RO:0002326 | NCBIGene | SO:0001500 | 1 |
| NCBIGene | SO:0001500 | RO:0002326 | NCBIGene | gene | 1 |
| NCBIGene | gene | RO:0002326 | NCBIGene | SO:0001500 | 1 |
| NCBIGene | SO:0001265 | RO:0002326 | OMIM | None | 1 |
| NCBIGene | disease | causes condition | NCBIGene | disease | 2 |
| NCBIGene | disease | causes condition | NCBIGene | gene | 2 |
| NCBIGene | disease | causes condition | OMIM | None | 1 |
| NCBIGene | disease | causes condition | OMIM | phenotype | 4 |
| NCBIGene | SO:0001265 | causes condition | OMIM | phenotype | 1 |
| NCBIGene | SO:0000460 | causes condition | OMIM | None | 1 |
| NCBIGene | gene | RO:0002326 | OMIM | disease | 5 |
| NCBIGene | phenotype | RO:0002326 | OMIM | phenotype | 2 |
| NCBIGene | SO:0000655 | RO:0003304 | OMIM | phenotype | 2 |
| NCBIGene | SO:0000655 | causes condition | OMIM | phenotype | 3 |
| NCBIGene | gene | RO:0002607 | NCBIGene | phenotype | 2 |
| NCBIGene | gene | RO:0002607 | NCBIGene | disease | 2 |
| HGNC | disease | RO:0002607 | HGNC | disease | 1 |
| HGNC | disease | RO:0002607 | HGNC | SO:0001500 | 1 |
| HGNC | SO:0001500 | RO:0002607 | HGNC | disease | 1 |
| HGNC | SO:0001500 | RO:0002607 | HGNC | SO:0001500 | 1 |
| HGNC | disease | has phenotype | HP | None | 2 |
| NCBIGene | SO:0001877 | RO:0003304 | OMIM | None | 1 |
| HGNC | SO:0001500 | has phenotype | OMIM | disease | 4 |
| NCBIGene | phenotype | causes condition | OMIM | phenotype | 1 |
| NCBIGene | gene | causes condition | NCBIGene | SO:0000001 | 1 |
| NCBIGene | SO:0000001 | causes condition | NCBIGene | gene | 1 |
| NCBIGene | SO:0000001 | causes condition | NCBIGene | SO:0000001 | 1 |
| NCBIGene | SO:0000001 | has phenotype | HP | None | 3 |
| NCBIGene | SO:0000001 | has disposition | HP | None | 1 |
| NCBIGene | gene | causes condition | HGNC | phenotype | 3 |
| NCBIGene | gene | causes condition | HGNC | SO:0001500 | 3 |
| NCBIGene | SO:0002122 | RO:0002607 | OMIM | phenotype | 1 |
| NCBIGene | SO:0002122 | causes condition | NCBIGene | SO:0002122 | 1 |
| NCBIGene | SO:0002122 | causes condition | NCBIGene | gene | 1 |
| NCBIGene | gene | causes condition | NCBIGene | SO:0002122 | 1 |
| NCBIGene | SO:0001268 | causes condition | OMIM | None | 2 |
| ENSEMBL | SO:0000336 | causes condition | OMIM | disease | 1 |
| ENSEMBL | SO:0001500 | causes condition | OMIM | disease | 1 |
| NCBIGene | SO:0002122 | causes condition | OMIM | phenotype | 1 |
| HGNC | disease | has disposition | HP | None | 1 |
| NCBIGene | gene | RO:0002607 | OMIM | disease | 2 |
| HGNC | None | RO:0002607 | OMIM | phenotype | 1 |
| NCBIGene | SO:0002122 | RO:0003304 | Orphanet | None | 1 |
| HGNC | None | RO:0002326 | NCBIGene | phenotype | 2 |
| HGNC | None | RO:0002326 | NCBIGene | disease | 2 |
| HGNC | None | RO:0002326 | NCBIGene | gene | 2 |
| NCBIGene | SO:0001265 | RO:0002607 | OMIM | phenotype | 1 |
| HGNC | SO:0001500 | causes condition | OMIM | None | 1 |
| HGNC | SO:0001500 | has phenotype | OMIM | None | 1 |
| NCBIGene | GENO:0000418 | RO:0002326 | OMIM | None | 1 |
| ENSEMBL | SO:0001272 | RO:0002607 | OMIM | None | 1 |
| ENSEMBL | SO:0001263 | RO:0002607 | OMIM | None | 1 |
| HGNC | sequence feature | causes condition | OMIM | phenotype | 1 |
| HGNC | sequence feature | RO:0003304 | OMIM | phenotype | 1 |
| NCBIGene | SO:0002122 | causes condition | OMIM | None | 1 |
| ClinVarVariant | sequence feature | GENO:0000840 | OMIM | phenotype | 43408 |
| OMIM | phenotype | has phenotype | HP | None | 28530 |
| OMIM | phenotype | has disposition | HP | None | 2313 |
| ClinVarVariant | sequence feature | GENO:0000844 | MedGen | None | 95210 |
| ClinVarVariant | sequence feature | GENO:0000843 | MedGen | None | 58763 |
| ClinVarVariant | sequence feature | GENO:0000845 | MedGen | None | 149396 |
| ClinVarVariant | sequence feature | GENO:0000845 | OMIM | phenotype | 47238 |
| ClinVarVariant | sequence feature | GENO:0000845 | OMIM | None | 68132 |
| ClinVarVariant | sequence feature | GENO:0000844 | OMIM | phenotype | 22866 |
| ClinVarVariant | sequence feature | GENO:0000840 | MedGen | None | 31021 |
| ClinVarVariant | sequence feature | GENO:0000841 | MedGen | None | 17799 |
| ClinVarVariant | sequence feature | GENO:0000841 | OMIM | None | 11311 |
| ClinVarVariant | sequence feature | GENO:0000843 | OMIM | None | 15333 |
| OMIM | None | has phenotype | HP | None | 61275 |
| OMIM | None | has disposition | HP | None | 5153 |
| ClinVarVariant | sequence feature | GENO:0000844 | OMIM | None | 32770 |
| ClinVarVariant | sequence feature | GENO:0000843 | OMIM | phenotype | 14245 |
| ClinVarVariant | sequence feature | GENO:0000840 | OMIM | None | 31012 |
| ClinVarVariant | sequence feature | GENO:0000840 | Orphanet | None | 662 |
| ClinVarVariant | sequence feature | GENO:0000840 | NCBIGene | phenotype | 882 |
| ClinVarVariant | sequence feature | GENO:0000840 | NCBIGene | disease | 880 |
| ClinVarVariant | sequence feature | GENO:0000840 | NCBIGene | gene | 1002 |
| ClinVarVariant | sequence feature | GENO:0000841 | OMIM | phenotype | 11631 |
| ClinVarVariant | sequence feature | GENO:0000845 | Orphanet | None | 3335 |
| Orphanet | None | has phenotype | HP | None | 24675 |
| ClinVarVariant | GENO:0000847 | GENO:0000840 | OMIM | None | 187 |
| ClinVarVariant | sequence feature | GENO:0000841 | Orphanet | None | 467 |
| ClinVarVariant | sequence feature | GENO:0000843 | Orphanet | None | 381 |
| ClinVarVariant | GENO:0000871 | GENO:0000845 | MedGen | None | 10 |
| ClinVarVariant | sequence feature | GENO:0000845 | NCBIGene | gene | 466 |
| ClinVarVariant | sequence feature | GENO:0000844 | Orphanet | None | 1253 |
| ClinVarVariant | GENO:0000848 | GENO:0000840 | OMIM | phenotype | 53 |
| ClinVarVariant | sequence feature | GENO:0000845 | NCBIGene | phenotype | 262 |
| ClinVarVariant | sequence feature | GENO:0000845 | NCBIGene | disease | 262 |
| ClinVarVariant | sequence feature | http://monarchinitiative.orghas_drug_response | MedGen | None | 31 |
| OMIM | phenotype | RO:0002326 | OMIM | phenotype | 25 |
| ClinVarVariant | sequence feature | GENO:0000841 | NCBIGene | phenotype | 128 |
| ClinVarVariant | sequence feature | GENO:0000841 | NCBIGene | disease | 128 |
| ClinVarVariant | sequence feature | GENO:0000841 | NCBIGene | gene | 151 |
| ClinVarVariant | sequence feature | GENO:0000844 | NCBIGene | phenotype | 107 |
| ClinVarVariant | sequence feature | GENO:0000844 | NCBIGene | disease | 107 |
| ClinVarVariant | sequence feature | GENO:0000844 | NCBIGene | gene | 230 |
| ClinVarVariant | GENO:0000871 | GENO:0000840 | OMIM | None | 115 |
| ClinVarVariant | GENO:0000848 | GENO:0000845 | OMIM | phenotype | 19 |
| ClinVarVariant | None | GENO:0000840 | MedGen | None | 19 |
| ClinVarVariant | sequence feature | http://monarchinitiative.orghas_drug_response | OMIM | phenotype | 12 |
| ClinVarVariant | GENO:0000871 | GENO:0000840 | OMIM | phenotype | 100 |
| ClinVarVariant | sequence feature | GENO:0000843 | NCBIGene | phenotype | 74 |
| ClinVarVariant | sequence feature | GENO:0000843 | NCBIGene | disease | 74 |
| ClinVarVariant | sequence feature | GENO:0000843 | NCBIGene | gene | 88 |
| ClinVarVariant | None | GENO:0000840 | OMIM | phenotype | 48 |
| ClinVarVariant | GENO:0000847 | GENO:0000840 | OMIM | phenotype | 18 |
| ClinVarVariant | GENO:0000847 | GENO:0000845 | MedGen | None | 6 |
| ClinVarVariant | GENO:0000847 | GENO:0000840 | MedGen | None | 12 |
| ClinVarVariant | GENO:0000847 | GENO:0000844 | OMIM | None | 2 |
| ClinVarVariant | GENO:0000848 | GENO:0000840 | OMIM | None | 7 |
| ClinVarVariant | sequence feature | GENO:0000840 | OMIM | HP:0031859 | 1 |
| ClinVarVariant | GENO:0000847 | GENO:0000843 | MedGen | None | 7 |
| ClinVarVariant | None | GENO:0000840 | OMIM | None | 36 |
| ClinVarVariant | None | GENO:0000841 | OMIM | phenotype | 37 |
| ClinVarVariant | sequence feature | GENO:0000845 | OMIM | disease | 2 |
| ClinVarVariant | GENO:0000871 | GENO:0000841 | OMIM | phenotype | 21 |
| ClinVarVariant | GENO:0000871 | GENO:0000841 | OMIM | None | 9 |
| ClinVarVariant | GENO:0000871 | GENO:0000840 | MedGen | None | 21 |
| ClinVarVariant | GENO:0000871 | GENO:0000845 | OMIM | None | 6 |
| ClinVarVariant | sequence feature | GENO:0000843 | HGNC | phenotype | 1 |
| ClinVarVariant | sequence feature | GENO:0000843 | HGNC | SO:0001500 | 1 |
| ClinVarVariant | sequence feature | GENO:0000845 | HGNC | phenotype | 10 |
| ClinVarVariant | sequence feature | GENO:0000845 | HGNC | SO:0001500 | 10 |
| ClinVarVariant | GENO:0000848 | GENO:0000844 | OMIM | phenotype | 3 |
| ClinVarVariant | None | GENO:0000841 | OMIM | None | 33 |
| ClinVarVariant | None | GENO:0000841 | MedGen | None | 4 |
| ClinVarVariant | sequence feature | GENO:0000840 | HGNC | phenotype | 12 |
| ClinVarVariant | sequence feature | GENO:0000840 | HGNC | SO:0001500 | 12 |
| ClinVarVariant | GENO:0000848 | GENO:0000843 | OMIM | phenotype | 13 |
| ClinVarVariant | None | GENO:0000845 | MedGen | None | 5 |
| ClinVarVariant | GENO:0000871 | GENO:0000843 | OMIM | phenotype | 1 |
| ClinVarVariant | GENO:0000871 | GENO:0000845 | OMIM | phenotype | 5 |
| ClinVarVariant | GENO:0000847 | GENO:0000841 | MedGen | None | 5 |
| ClinVarVariant | sequence feature | GENO:0000840 | OMIM | disease | 2 |
| ClinVarVariant | GENO:0000847 | GENO:0000843 | OMIM | None | 8 |
| OMIM | disease | has phenotype | HP | None | 150 |
| OMIM | disease | has disposition | HP | None | 24 |
| ClinVarVariant | GENO:0000871 | GENO:0000844 | OMIM | phenotype | 1 |
| ClinVarVariant | GENO:0000871 | GENO:0000840 | NCBIGene | phenotype | 5 |
| ClinVarVariant | GENO:0000871 | GENO:0000840 | NCBIGene | disease | 5 |
| ClinVarVariant | GENO:0000871 | GENO:0000840 | NCBIGene | gene | 5 |
| ClinVarVariant | None | GENO:0000845 | Orphanet | None | 1 |
| ClinVarVariant | GENO:0000871 | GENO:0000845 | NCBIGene | phenotype | 1 |
| ClinVarVariant | GENO:0000871 | GENO:0000845 | NCBIGene | disease | 1 |
| ClinVarVariant | GENO:0000871 | GENO:0000845 | NCBIGene | gene | 1 |
| ClinVarVariant | None | GENO:0000840 | NCBIGene | phenotype | 2 |
| ClinVarVariant | None | GENO:0000840 | NCBIGene | disease | 2 |
| ClinVarVariant | None | GENO:0000840 | NCBIGene | gene | 2 |
| ClinVarVariant | GENO:0000847 | GENO:0000844 | MedGen | None | 1 |
| ClinVarVariant | GENO:0000847 | GENO:0000843 | OMIM | phenotype | 4 |
| ClinVarVariant | GENO:0000871 | GENO:0000845 | Orphanet | None | 1 |
| ClinVarVariant | GENO:0000847 | GENO:0000845 | OMIM | None | 4 |
| ClinVarVariant | None | GENO:0000845 | OMIM | None | 3 |
| ClinVarVariant | sequence feature | GENO:0000841 | OMIM | HP:0031859 | 1 |
| ClinVarVariant | GENO:0000847 | GENO:0000841 | OMIM | phenotype | 5 |
| ClinVarVariant | sequence feature | related to | OMIM | phenotype | 2 |
| ClinVarVariant | sequence feature | GENO:0000841 | HGNC | phenotype | 1 |
| ClinVarVariant | sequence feature | GENO:0000841 | HGNC | SO:0001500 | 1 |
| ClinVarVariant | GENO:0000847 | GENO:0000845 | OMIM | phenotype | 2 |
| ClinVarVariant | None | GENO:0000845 | OMIM | phenotype | 3 |
| ClinVarVariant | GENO:0000848 | GENO:0000841 | OMIM | phenotype | 2 |
| ClinVarVariant | None | GENO:0000844 | OMIM | None | 3 |
| ClinVarVariant | GENO:0000871 | GENO:0000841 | MedGen | None | 3 |
| ClinVarVariant | sequence feature | http://monarchinitiative.orghas_drug_response | OMIM | None | 6 |
| ClinVarVariant | None | GENO:0000844 | OMIM | phenotype | 1 |
| ClinVarVariant | None | GENO:0000840 | Orphanet | None | 1 |
| ClinVarVariant | GENO:0000871 | GENO:0000844 | MedGen | None | 1 |
| ClinVarVariant | GENO:0000871 | http://monarchinitiative.orghas_drug_response | OMIM | phenotype | 1 |
| ClinVarVariant | GENO:0000847 | GENO:0000841 | OMIM | None | 1 |
| NCBIGene | None | RO:0002326 | OMIM | phenotype | 279 |
| NCBIGene | None | causes condition | OMIM | phenotype | 264 |
| BNODE | SO:0001500 | causes condition | OMIM | None | 82 |
| NCBIGene | None | RO:0002607 | OMIM | phenotype | 52 |
| BNODE | SO:0001500 | causes condition | OMIM | phenotype | 75 |
| OMIM | phenotype | causes condition | OMIM | phenotype | 5 |
| NCBIGene | SO:0001500 | RO:0002326 | OMIM | disease | 8 |
| BNODE | SO:0001500 | RO:0002326 | OMIM | phenotype | 20 |
| BNODE | SO:0001500 | RO:0002607 | OMIM | disease | 2 |
| NCBIGene | SO:0001500 | causes condition | OMIM | disease | 3 |
| NCBIGene | None | causes condition | OMIM | None | 37 |
| BNODE | SO:0001500 | RO:0002326 | OMIM | disease | 1 |
| NCBIGene | SO:0001500 | has phenotype | OMIM | None | 2 |
| NCBIGene | SO:0001500 | RO:0002326 | OMIM | None | 1 |
| BNODE | SO:0001500 | RO:0002326 | OMIM | None | 5 |
| NCBIGene | None | RO:0002326 | OMIM | None | 4 |
| BNODE | None | causes condition | OMIM | HP:0031859 | 2 |
| BNODE | SO:0001500 | causes condition | OMIM | disease | 2 |
| NCBIGene | SO:0001500 | causes condition | OMIM | None | 1 |
| BNODE | None | causes condition | OMIM | disease | 2 |
| BNODE | None | causes condition | NCBIGene | gene | 1 |
| NCBIGene | None | causes condition | HGNC | phenotype | 1 |
| NCBIGene | None | causes condition | HGNC | SO:0001500 | 1 |
| BNODE | SO:0001500 | RO:0002607 | OMIM | None | 1 |
| BNODE | None | RO:0002326 | OMIM | gene | 1 |
| NCBIGene | None | causes condition | NCBIGene | phenotype | 1 |
| NCBIGene | None | causes condition | NCBIGene | disease | 1 |
| NCBIGene | None | causes condition | NCBIGene | gene | 1 |
| NCBIGene | SO:0001500 | has phenotype | OMIM | phenotype | 1 |
| NCBIGene | SO:0001500 | causes condition | OMIM | phenotype | 1 |
| DOID | None | has phenotype | HP | None | 95540 |
| MESH | None | has phenotype | HP | None | 36961 |
| DECIPHER | None | has phenotype | HP | None | 284 |
| DECIPHER | None | has disposition | HP | None | 1 |
| BNODE | sequence variant | causes condition | OMIM | phenotype | 3048 |
| BNODE | sequence variant | causes condition | Orphanet | None | 562 |
| BNODE | sequence variant | RO:0003304 | OMIM | None | 14 |
| BNODE | sequence variant | causes condition | OMIM | None | 2236 |
| BNODE | sequence variant | causes condition | NCBIGene | phenotype | 137 |
| BNODE | sequence variant | causes condition | NCBIGene | disease | 136 |
| BNODE | sequence variant | causes condition | NCBIGene | gene | 137 |
| BNODE | sequence variant | RO:0003304 | OMIM | phenotype | 22 |
| BNODE | sequence variant | causes condition | HGNC | phenotype | 11 |
| BNODE | sequence variant | causes condition | HGNC | SO:0001500 | 11 |
| BNODE | sequence variant | RO:0003304 | Orphanet | None | 2 |
+----------------+------------------+-----------------------------------------------+---------------+------------------+-----------+
Hopefully this is not too unwieldy to read through. Let me know if the summaries should be more compact.
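If a more compact view would help, one option is to drop the two category columns and sum the frequencies per (subject prefix, predicate, object prefix). Below is a minimal sketch, assuming the table is available as plain text in the pipe-delimited layout shown above. `parse_rows` and `compact` are hypothetical helper names for illustration only; they are not part of kgx. The sample rows are copied from the table above.

```python
from collections import Counter


def parse_rows(table_text):
    """Yield (subject_prefix, subject_category, predicate,
    object_prefix, object_category, frequency) for each data row."""
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 6 or not cells[5].isdigit():
            continue  # skip the header row and anything malformed
        yield (*cells[:5], int(cells[5]))


def compact(table_text):
    """Sum frequencies per (subject prefix, predicate, object prefix),
    ignoring the subject and object category columns."""
    totals = Counter()
    for s_prefix, _s_cat, predicate, o_prefix, _o_cat, freq in parse_rows(table_text):
        totals[(s_prefix, predicate, o_prefix)] += freq
    return totals


# Four rows copied from the summary table above.
sample = """
| CHR | None | has_subsequence | HGNC | SO:0000336 | 3606 |
| CHR | None | has_subsequence | HGNC | SO:0001265 | 120 |
| NCBIGene | gene | causes condition | OMIM | phenotype | 2376 |
| NCBIGene | gene | causes condition | OMIM | None | 1871 |
"""

summary = compact(sample)
for (s_prefix, predicate, o_prefix), freq in summary.most_common():
    print(f"{s_prefix:10} {predicate:18} {o_prefix:10} {freq}")
```

This loses the category breakdown, so it would complement the full table rather than replace it; grouping on a different set of columns only requires changing the key tuple in `compact`.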
from kgx.
Closing this issue since we now have a successful workflow for building a subset of the Monarch Knowledge Graph.
from kgx.