cu-dbmi / rtx-kg2-gateway

Enabling RTX-KG2 data access through various means.

License: BSD 3-Clause "New" or "Revised" License

Jupyter Notebook 98.44% Python 1.56%

rtx-kg2-gateway's Issues

Check Kuzu ingest for LIST type entity attributes

While running further queries against the Kuzu database, I noticed a possible discrepancy in the multi-value LIST attributes of certain entities (mostly NODE entities). This issue tracks the need to double-check these values and make any adjustments required to keep them queryable.
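One way to approach such a check is to compare list-valued fields in the source records against what comes back from the database after ingest. The sketch below is only an illustration in plain Python: the helper `find_list_mismatches`, the key `id`, and the field `synonym` are hypothetical names, and it assumes both sides have already been read into lists of dictionaries (it does not call Kuzu itself).

```python
def find_list_mismatches(source_rows, ingested_rows, key, list_fields):
    """Return (key, field, expected, actual) tuples for list fields that differ.

    Hypothetical helper: both inputs are lists of dicts already loaded in memory.
    """
    ingested_by_key = {row[key]: row for row in ingested_rows}
    mismatches = []
    for src in source_rows:
        got = ingested_by_key.get(src[key])
        for field in list_fields:
            expected = src[field]
            # A missing row is reported with actual=None.
            actual = None if got is None else got.get(field)
            # Flag values that are not real lists (e.g. flattened to a string)
            # as well as lists whose contents differ.
            if not isinstance(actual, list) or actual != expected:
                mismatches.append((src[key], field, expected, actual))
    return mismatches


# Illustrative data only; "id" and "synonym" are assumed field names.
source = [
    {"id": "A", "synonym": ["x", "y"]},
    {"id": "B", "synonym": ["z"]},
]
ingested = [
    {"id": "A", "synonym": ["x", "y"]},
    {"id": "B", "synonym": "z"},  # list collapsed to a scalar during ingest
]

mismatches = find_list_mismatches(source, ingested, "id", ["synonym"])
print(mismatches)
```

A report like this would point at the specific entities and attributes to inspect, rather than confirming or ruling out a problem in the ingest itself.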

Enhance project with reusable Python package(s) for related data integrations

My hope is to generalize some of the functionality for potential reuse with property graphs and Kuzu (at least in this context).

There seems to be an opportunity to propose multi-dimensional property graph structures within Parquet as a strongly typed data storage alternative to JSON or TSV, one that may also come with performance benefits. I felt the metadata storage components of Parquet were especially well suited to shared schema and provenance understandings (along with default data citation within the files themselves).

It's likely we could also share a Neo4j-compatible version of the data for those who may prefer it over embedded approaches.

Originally posted by @d33bs in #1 (comment)

Elaborate on example cypher and related content from RTX-KG2 Kuzu database

Good that you show how to start Jupyter Lab! You might consider adding a short tutorial where you query the data, e.g. drawing their attention to a particular notebook where they can start trying queries and any setup they might need to do before running their first query. In the tutorial, you might even show a sample query, its result, and how you can do things with the result (e.g., drawing a graph of the resulting nodes).

Not necessary IMO, but you might consider giving a simple high-level overview of what Kuzu is doing, e.g. that it's creating an in-memory database on which to perform Cypher queries on the RTX-KG2 graph.

... I'd suggest having one or a few notebooks showing how to do that in detail, including getting into the schema of the dataset in the notebook. I see that you have a notebook called "example_cypher_kuzu" that shows an example query; perhaps that one could be extended to describe the data, etc. and show useful things you can do with Kuzu on the dataset?

Originally posted by @falquaddoomi in #1 (comment)

Use `ORDER BY` with `LIMIT` and `OFFSET` SQL queries

When paging with SQL LIMIT and OFFSET, one must also use ORDER BY to ensure deterministic results; without it, row order is not guaranteed from one query to the next, so chunks may overlap or skip rows. This issue pertains to the DuckDB queries that extract row chunks of node and edge data for ingest into a Kuzu database, and to adding ORDER BY so that every row is extracted exactly once.
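A minimal sketch of chunked extraction with a stable ordering follows. It is not the project's code: it uses Python's built-in `sqlite3` as a stand-in for DuckDB so that it runs without extra installs, and the `nodes` table, its `id` column, and the `fetch_chunks` helper are hypothetical. The `ORDER BY ... LIMIT ... OFFSET` clauses themselves are standard SQL that DuckDB also accepts.

```python
import sqlite3


def fetch_chunks(conn, table, order_col, chunk_size):
    """Yield lists of rows, paging with ORDER BY + LIMIT + OFFSET.

    table and order_col are interpolated into the SQL, so they must come
    from trusted code, never from user input.
    """
    offset = 0
    while True:
        rows = conn.execute(
            f"SELECT * FROM {table} ORDER BY {order_col} LIMIT ? OFFSET ?",
            (chunk_size, offset),
        ).fetchall()
        if not rows:
            break
        yield rows
        offset += chunk_size


# Illustrative in-memory table; names and values are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?)",
    [(f"N{i:03d}", f"node {i}") for i in range(10)],
)

chunks = list(fetch_chunks(conn, "nodes", "id", chunk_size=4))
all_ids = [row[0] for chunk in chunks for row in chunk]
print([len(chunk) for chunk in chunks], len(set(all_ids)))
```

Ordering on a unique column (here the primary key) matters: if the ordering column contains ties, the relative order of tied rows is still unspecified and chunks can again overlap or miss rows.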
