Deep Lynx is a unique data warehouse where users can provide a custom ontology and have their data stored under said ontology in a graph-like format. Deep Lynx is written in Node.js and Rust and is actively maintained.
When a container is updated via an ontology file, the user should see a warning for any metatypes, metatype relationships, or relationship pairs that have node/edge data associated with them. This will avoid the unintentional removal of data associated with one of these container elements.
Design
A container update via ontology import should remove these container elements if no data is associated with them. Otherwise the container update should fail and notify the user of the issue. This scenario could occur when a user deletes a class or relationship from an ontology and attempts to update an associated container in Deep Lynx.
Impact
This enhancement affects the container update via import process as well as the associated GUI.
The user should be able to easily delete all inserted nodes/edges from any given import.
Design
When a user attempts to delete an import via the UI, a prompt should allow them to choose whether or not the deletion of the import should also affect already inserted data. If they choose yes, all inserted data from that import should be deleted from the database.
The API endpoint should have a flag for deleting data along with the import: if the flag is set, delete the data with the import; if it is not set, delete only the import and leave inserted data untouched. A rough sketch of the endpoint follows.
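A minimal sketch, assuming an Express router; the route path, the withData query flag name, and the storage-layer helpers are all assumptions, not the real Deep Lynx API:

```typescript
import express, {Request, Response} from 'express';

const router = express.Router();

// hypothetical stubs standing in for the real storage-layer calls
async function deleteInsertedData(importID: string): Promise<void> {/* delete nodes/edges created from this import */}
async function deleteImport(importID: string): Promise<void> {/* delete the import record itself */}

router.delete('/containers/:containerID/import/:importID', async (req: Request, res: Response) => {
    const withData = String(req.query.withData).toLowerCase() === 'true';

    if (withData) {
        // remove all nodes/edges inserted as part of this import first
        await deleteInsertedData(req.params.importID);
    }

    await deleteImport(req.params.importID);
    res.sendStatus(204);
});
```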
Impact
The largest impact this has is that there will now be the chance for orphaned data - data not tied to any given import.
The imports listing page is not showing processing progress correctly - especially for completed imports.
Steps to Reproduce
Create an import.
Create enough type mappings/transformations to get it processed.
Return to the imports screen - the import should read as completed, but the processing % will still read 0.
Impact
Users will be unsure of whether or not their imports processed correctly.
The end user needs to be able to verify that their download of a file was completed successfully. This can be done by matching a file checksum generated at time of file upload.
Design
Generate and store a file checksum at the time the file is uploaded to the outside service.
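A minimal sketch of the generation side using Node's built-in crypto module; sha256 is an assumption - any stable digest works as long as upload and download use the same one:

```typescript
import * as crypto from 'crypto';
import * as fs from 'fs';

// stream the file through the hash so files larger than memory are handled
function fileChecksum(filePath: string): Promise<string> {
    return new Promise((resolve, reject) => {
        const hash = crypto.createHash('sha256');
        fs.createReadStream(filePath)
            .on('data', (chunk) => hash.update(chunk))
            .on('error', reject)
            .on('end', () => resolve(hash.digest('hex')));
    });
}
```

Store the resulting hex digest alongside the file record so the client can recompute and compare it after download.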
Right now type mappings can handle nested keys for PROPERTIES ONLY. The keys governing unique identifier, type, origin and destination ids, and relationship type are required to be top-level keys because the database trigger does not handle those keys being nested.
In order to make this work, you will need to modify the database triggers for the type mappings as well as the type mapping function itself (granted, that shouldn't be a big modification).
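A minimal sketch of what nested key resolution could look like in the type mapping function, assuming dot-delimited key paths (the trigger would need the equivalent change using Postgres's jsonb path operators, such as #>>):

```typescript
// resolve a dot-delimited key path against a raw payload object
function resolveKey(payload: Record<string, any>, keyPath: string): any {
    return keyPath
        .split('.')
        .reduce((value, key) => (value == null ? undefined : value[key]), payload);
}

// resolveKey({device: {id: '123'}}, 'device.id') === '123'
```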
Need to be able to interface with a Redis cache instance.
Design
This should be fairly simple. The cacher interface already exists and an in-memory implementation has been created. Follow the pattern there and implement Redis caching using this library: https://www.npmjs.com/package/redis.
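A minimal sketch, assuming the promise-based API of recent versions of the redis package; the CacheInterface shape below is an assumption and should be matched to whatever the existing in-memory implementation actually implements:

```typescript
import {createClient} from 'redis';

// assumed interface - match the real cacher interface in the codebase
interface CacheInterface {
    set(key: string, value: any, ttl?: number): Promise<boolean>;
    retrieve(key: string): Promise<any | undefined>;
    flush(key: string): Promise<boolean>;
}

class RedisCacheImpl implements CacheInterface {
    private client = createClient({url: process.env.REDIS_URL});

    async init(): Promise<void> {
        await this.client.connect(); // v4+ of the redis package requires an explicit connect
    }

    async set(key: string, value: any, ttl?: number): Promise<boolean> {
        if (ttl) {
            await this.client.set(key, JSON.stringify(value), {EX: ttl});
        } else {
            await this.client.set(key, JSON.stringify(value));
        }
        return true;
    }

    async retrieve(key: string): Promise<any | undefined> {
        const raw = await this.client.get(key);
        return raw === null ? undefined : JSON.parse(raw);
    }

    async flush(key: string): Promise<boolean> {
        await this.client.del(key);
        return true;
    }
}
```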
Impact
Should give us the ability to run Deep Lynx in a clustered environment while sharing a single cache.
io-ts is a powerful type and payload checking library, but the error messages on encoding/decoding errors can be cryptic. We need to make error reporting on encoding and decoding with io-ts more user friendly and allow for a more in-depth explanation of the error.
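One possible starting point is io-ts's bundled PathReporter, which at least annotates each failure with the path at which it occurred; a friendlier reporter could build on the same ValidationError structure:

```typescript
import * as t from 'io-ts';
import {PathReporter} from 'io-ts/lib/PathReporter';

// stand-in type for illustration only
const dataSource = t.type({name: t.string, active: t.boolean});

const result = dataSource.decode({name: 'test', active: 'yes'});
console.log(PathReporter.report(result));
// => [ 'Invalid value "yes" supplied to : { name: string, active: boolean }/active: boolean' ]
```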
The JSON import endpoint for data sources (src/api/data_source_routes) needs to be rewritten to use busboy when passed a file rather than an array of JSON objects as a POST payload. We should also be able to handle files larger than 4GB - but this will involve breaking the import up into multiple, smaller imports, as we can't store more than 4GB of data in a single import.
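A rough sketch of the streaming side; this assumes busboy's pre-1.0 constructor API (busboy 1.x exports a factory function instead), and the route path and sub-4GB rollover logic are placeholders:

```typescript
import * as Busboy from 'busboy';
import express, {Request, Response} from 'express';

const router = express.Router();

router.post('/containers/:containerID/import', (req: Request, res: Response) => {
    const busboy = new Busboy({headers: req.headers});

    busboy.on('file', (fieldname: string, file: NodeJS.ReadableStream) => {
        // `file` is a stream, so the payload never has to fit in memory
        file.on('data', (chunk) => {
            // append chunk to the current import, rolling over to a new,
            // smaller import before the 4GB single-record limit is hit
        });
        file.on('end', () => {
            // finalize the last partial import
        });
    });

    busboy.on('finish', () => res.sendStatus(201));
    req.pipe(busboy);
});
```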
We need the ability to have a user reset their own password.
Design
Create an endpoint that accepts a user, their current password, and their desired password + confirmation. If those passwords match and the current is valid, reset the user's password. If the email functionality exists at this point also integrate email into the reset password process.
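A minimal sketch of the endpoint, assuming bcrypt-hashed passwords; the route shape, field names, and storage helpers are all assumptions:

```typescript
import express, {Request, Response} from 'express';
import * as bcrypt from 'bcrypt';

const router = express.Router();
router.use(express.json());

// hypothetical stubs for the real user storage layer
async function findUser(id: string): Promise<{id: string; passwordHash: string}> {
    return {id, passwordHash: ''}; // placeholder
}
async function updatePassword(id: string, hash: string): Promise<void> {/* persist new hash */}

router.post('/users/:userID/reset-password', async (req: Request, res: Response) => {
    const {currentPassword, newPassword, confirmation} = req.body;
    if (newPassword !== confirmation) return res.status(400).send('passwords do not match');

    const user = await findUser(req.params.userID);
    if (!(await bcrypt.compare(currentPassword, user.passwordHash))) {
        return res.status(401).send('current password invalid');
    }

    await updatePassword(user.id, await bcrypt.hash(newPassword, 10));
    return res.sendStatus(200);
});
```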
The search boxes for the origin and destination metatypes, as well as metatype relationships, need to be dynamic based on user input. Currently they simply list all records, which will max out at 1000 records (as well as bog down the UI).
Design
Check out the type transformation dialog for examples of metatype and relationship search boxes.
Impact
Should lower the memory footprint and ensure all metatypes/metatype relationships are available to a user.
The boot functionality feels very squishy and short. There is a lot more that could be done here to make the initial boot of the application more robust. Config checking and data source checking could easily be added here.
Currently we only support a few filter operators, such as eq, neq, like, and in. We need to create at least these few more in order to have a feature complete system.
gt (greater than) and gte (greater than or equal to)
lt (less than) and lte (less than or equal to)
notLike
notIn
Design
TBD
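Though the design is TBD, one possible direction is a simple operator-to-SQL-fragment mapping alongside the existing eq/neq/like/in handling; the names below are illustrative, not the real filter builder:

```typescript
// map each new operator to a parameterized SQL fragment; notIn would need its
// placeholder expanded per array element in a real implementation
const operatorToSQL: {[operator: string]: (column: string) => string} = {
    gt: (column) => `${column} > $1`,
    gte: (column) => `${column} >= $1`,
    lt: (column) => `${column} < $1`,
    lte: (column) => `${column} <= $1`,
    notLike: (column) => `${column} NOT LIKE $1`,
    notIn: (column) => `${column} NOT IN ($1)`,
};
```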
Impact
This should only add functionality and not affect any existing tests or functionality.
Currently the property matching when attempting to filter nodes and edges will not recognize nested JSON properties. This is a problem because not all data stored in Deep Lynx will consist of a single level, and we'll need to be able to query on all parts of the data, not just the topmost level of the JSON object.
Design
TBD
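Though the design is TBD, one possible direction on Postgres is translating a dot-delimited property path into a jsonb path expression; the properties column name and path format below are assumptions, and real paths must be validated/escaped before interpolation to avoid injection:

```typescript
// translate 'vehicle.engine.serial' into a Postgres jsonb path expression
function propertyFilterSQL(propertyPath: string): string {
    const parts = propertyPath.split('.');
    if (parts.length === 1) return `properties ->> '${parts[0]}'`;
    // #>> extracts a nested value as text, e.g. properties #>> '{vehicle,engine,serial}'
    return `properties #>> '{${parts.join(',')}}'`;
}
```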
Impact
This should not impact any existing functionality or tests when added. This is a completely new feature and does not replace or subsume any existing feature.
The Gremlin and HttpImpl classes both have a function called New that needs to be refactored from using the io-ts type's is function to using the pipe() functionality instead. Look at any of the storage layers for examples of how to implement pipe().
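A generic sketch of the target shape (not the actual Gremlin/HttpImpl config types), using fp-ts's pipe and Either fold so decoding errors are surfaced instead of silently discarded by is():

```typescript
import * as t from 'io-ts';
import {pipe} from 'fp-ts/lib/pipeable';
import {fold} from 'fp-ts/lib/Either';

// stand-in config type for illustration
const config = t.type({host: t.string});
type Config = t.TypeOf<typeof config>;

// decode() + fold() handles both branches explicitly
function New(input: unknown): Config {
    return pipe(
        config.decode(input),
        fold(
            (errors) => {
                throw new Error(`invalid configuration payload: ${errors.length} error(s)`);
            },
            (decoded) => decoded,
        ),
    );
}
```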
Right now the program will encrypt and store the entire UserT object as part of the JWT token passed to the client. Correct this so that only a session identifier, or something like that, is included.
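A minimal sketch using jsonwebtoken, with the claim name, secret source, and expiry as placeholder choices:

```typescript
import * as jwt from 'jsonwebtoken';

// sign only an identifier, never the serialized user object
function issueToken(userID: string): string {
    return jwt.sign({sub: userID}, process.env.JWT_SECRET!, {expiresIn: '8h'});
}

// on each request, look the user up from the sub claim rather than trusting
// a full user object embedded in the token
```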
We need the ability to have Deep Lynx send emails to users for many different reasons, from registration to password resetting etc. A service should be built to take advantage of SMTP servers.
Design
https://nodemailer.com - should be used to create this integration. Create an email service singleton that can be called anywhere throughout the application. Store email templates alongside the source code.
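A minimal singleton sketch; the SMTP settings pulled from environment variables are placeholder assumptions:

```typescript
import * as nodemailer from 'nodemailer';

class Emailer {
    private static instance: Emailer;

    private transporter = nodemailer.createTransport({
        host: process.env.SMTP_HOST,
        port: Number(process.env.SMTP_PORT),
        auth: {user: process.env.SMTP_USER, pass: process.env.SMTP_PASS},
    });

    static Instance(): Emailer {
        if (!Emailer.instance) Emailer.instance = new Emailer();
        return Emailer.instance;
    }

    send(to: string, subject: string, html: string): Promise<unknown> {
        // html would be rendered from templates stored alongside the source
        return this.transporter.sendMail({from: process.env.EMAIL_ADDRESS, to, subject, html});
    }
}
```

Emailer.Instance().send(...) can then be called anywhere throughout the application.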
Taxonomy tables with many entries (e.g. Metatype Relationship Pairs) currently make a single API call to grab all of the rows in the corresponding table. These calls contain a limit, defaulting to 1000. These tables should be updated to act as data tables, grabbing rows with a limit and offset that match the pagination of the table.
Steps to Reproduce
Having more than 1000 rows in the Metatype Relationship Pairs table will result in not all of them being visible from the Taxonomies section of the Admin GUI.
A user must not delete a type mapping that is associated with data that hasn't been inserted as nodes/edges. Currently a user could potentially delete a type mapping for data pending insertion, and the system will not attempt to recreate that type mapping causing the system to error out.
Design
Three changes must occur.
UI must be modified to not allow a user to delete a type mapping with uninserted data
API endpoint for deleting a type mapping must not work if type mapping has uninserted data
Processing loop must be able to handle data in an import that no longer has a type mapping, and be able to recreate that type mapping.
Impact
Will impact all users across the board who deal with type mapping.
Right now imported data is stored as a JSON array in the data_staging table. This is stored in a single record, and a single record's size maxes out at 4GB. Find a way to handle imports larger than 4GB.
Currently the Docker Postgres setup does not persist the database - meaning each time you restart the Docker client the database will get wiped.
Steps to Reproduce
Run the Postgres Docker container
Run the migrate functionality (npm run migrate) while pointing to the Docker container
Close Docker and the container
Restart the Postgres Docker container and verify the database is in pristine condition once again.
Impact
This is a large problem for those individuals doing local development on Deep Lynx as running a Postgres database with the required plugin can be difficult cross platform without using Docker.
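A likely fix is mounting a named volume over the Postgres data directory; a minimal example, with the image tag and credentials as placeholders:

```bash
# persist the database across restarts by mounting a named volume over the
# Postgres data directory
docker run -d --name deep-lynx-postgres \
  -e POSTGRES_PASSWORD=deeplynx \
  -v deep_lynx_data:/var/lib/postgresql/data \
  -p 5432:5432 \
  postgres:12
```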
A user needs to be re-logged in if their JWT expires.
Design
The authentication service needs to do a periodic check on the JWT expiry time. This could be either a constant timer running in the background, or a check on each API call to validate that the user still has valid access.
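A minimal client-side sketch of the per-call check, assuming the raw JWT is available to the GUI's authentication service (a real implementation should also handle base64url-encoded payloads):

```typescript
// decode the JWT payload (middle segment) and compare its exp claim,
// which is in seconds, against the current time
function tokenExpired(token: string): boolean {
    const payload = JSON.parse(atob(token.split('.')[1]));
    return payload.exp * 1000 <= Date.now();
}

// if this returns true, clear stored credentials and redirect to login
```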
A user should not be able to delete a data source if there are any nodes or edges associated with it.
Design
The API method should return an error when a user attempts to delete a data source with existing data.
The UI should not allow a user to call the delete method if data exists for the data source.
An API method and UI function should exist for deleting all nodes/edges for a given data source.
While the MetatypeKey and MetatypeRelationshipKey both have data structures for recording cardinality, uniqueness, and regex pattern matching, currently nothing takes advantage of this.
Stored data's properties are never checked using these types of functionality. Validation should cover:
Default Value
Regex on string and enumerable type
Cardinality (not entirely sure what this means in the scope of the project yet. Communicate with Christopher Ritter)
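A rough sketch of what per-key property validation could look like; the KeyValidation shape below is an assumption, not the actual MetatypeKey structure:

```typescript
// assumed per-key validation fields - match to the real MetatypeKey structure
interface KeyValidation {
    propertyName: string;
    required: boolean;
    defaultValue?: any;
    regex?: string;
}

// returns an error message, or null if the property is valid
function validateProperty(key: KeyValidation, properties: Record<string, any>): string | null {
    let value = properties[key.propertyName];

    // apply the default value before validating
    if (value === undefined && key.defaultValue !== undefined) value = key.defaultValue;
    if (value === undefined) return key.required ? `${key.propertyName} is required` : null;

    if (key.regex && typeof value === 'string' && !new RegExp(key.regex).test(value)) {
        return `${key.propertyName} fails regex ${key.regex}`;
    }

    return null;
}
```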
Attempting to update a transformation's condition list is not working.
Steps to Reproduce
Attempt to modify an existing transformation's conditions and save that transformation. Either it's not updating the transformation itself, or it's not updating the UI. Either way, it needs to be fixed.
Impact
Users need the ability to modify all aspects of a transformation.
In order to cut down on the number of database calls and the time spent processing raw data, we need to implement caching on ontology retrieval and type mapping/transformation retrieval operations.
Design
This should take place entirely inside the storage layers. The retrieve functions should check the cache first and fall back to the database on a miss, populating the cache on the way out, while the update/delete functions must invalidate the relevant cache entries so stale data is never served.
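A cache-aside sketch of the retrieve side; the Cacher interface, key scheme, and TTL below are assumptions:

```typescript
// assumed interface - match the real cacher interface in the codebase
interface Cacher {
    get(key: string): Promise<any | undefined>;
    set(key: string, value: any, ttl: number): Promise<void>;
    flush(key: string): Promise<void>;
}

// check the cache, fall back to the database, populate the cache on the way out
async function retrieveMetatype(
    id: string,
    cache: Cacher,
    loadFromDB: (id: string) => Promise<any>,
): Promise<any> {
    const cached = await cache.get(`metatype:${id}`);
    if (cached !== undefined) return cached;

    const metatype = await loadFromDB(id);
    await cache.set(`metatype:${id}`, metatype, 21600); // 6 hour TTL, arbitrary
    return metatype;
}

// update/delete must call cache.flush(`metatype:${id}`) after a successful
// database write, or stale ontology data will be served
```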
Impact
This will impact at least half the existing storage layers.
While you can declare a manual data source, you can currently only upload JSON data using that data source. The endpoint and manual data source implementation will need to be updated at some future date to handle different data types.
It seems that the number of Metatype Relationship Pairs is limited to 1000.
Steps to Reproduce
Import a container with the DIAMOND ontology. Note that the number of relationship pairs shown is capped at 1000. Attempt to add an additional relationship pair, then search for it in the list.
Right now listing endpoints and functions only accept limit and offset parameters. We need the ability to query without those parameters, as well as to filter against properties like name and others.