Deep Lynx is a unique data warehouse where users can provide a custom ontology and have their data stored under said ontology in a graph-like format. Deep Lynx is written in Node.js and Rust and is actively maintained.
When a container is updated via an ontology file, the user should see a warning for any metatypes, metatype relationships, or relationship pairs that have node/edge data associated with them. This will avoid the unintentional removal of data associated with one of these container elements.
Design
A container update via ontology import should remove these container elements if no data is associated with them. Otherwise the container update should fail and notify the user of the issue. This scenario could occur when a user deletes a class or relationship from an ontology and attempts to update an associated container in Deep Lynx.
Impact
This enhancement affects the container update via import process as well as the associated GUI.
The user should be able to easily delete all inserted nodes/edges from any given import.
Design
When a user attempts to delete an import via the UI, a prompt should allow them to choose whether or not the deletion of the import should also affect already inserted data. If they choose yes, all inserted data from that import should be deleted from the database.
The API endpoint should have a flag for deleting data along with the import: if the flag is set, delete the data with the import; if it is not set, delete only the import and leave inserted data untouched. A rough sketch of the endpoint follows.
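A minimal sketch, assuming an Express router; the route path, the withData query flag name, and the storage-layer helpers are all assumptions, not the real Deep Lynx API:

```typescript
import express, {Request, Response} from 'express';

const router = express.Router();

// hypothetical stubs standing in for the real storage-layer calls
async function deleteInsertedData(importID: string): Promise<void> {/* delete nodes/edges created from this import */}
async function deleteImport(importID: string): Promise<void> {/* delete the import record itself */}

router.delete('/containers/:containerID/import/:importID', async (req: Request, res: Response) => {
    const withData = String(req.query.withData).toLowerCase() === 'true';

    if (withData) {
        // remove all nodes/edges inserted as part of this import first
        await deleteInsertedData(req.params.importID);
    }

    await deleteImport(req.params.importID);
    res.sendStatus(204);
});
```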
Impact
The largest impact this has is that there will now be the chance for orphaned data - data not tied to any given import.
The imports listing page is not showing processing progress correctly - especially for completed imports.
Steps to Reproduce
Create an import.
Create enough type mappings/transformations to get it processed.
Return to the imports screen - the import should read as completed, but the processing % will still read 0.
Impact
Users will be unsure of whether or not their imports processed correctly.
The end user needs to be able to verify that their download of a file was completed successfully. This can be done by matching a file checksum generated at time of file upload.
Design
Generate and store a file checksum at the time the file is uploaded to the outside service.
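A minimal sketch of the generation side using Node's built-in crypto module; sha256 is an assumption - any stable digest works as long as upload and download use the same one:

```typescript
import * as crypto from 'crypto';
import * as fs from 'fs';

// stream the file through the hash so files larger than memory are handled
function fileChecksum(filePath: string): Promise<string> {
    return new Promise((resolve, reject) => {
        const hash = crypto.createHash('sha256');
        fs.createReadStream(filePath)
            .on('data', (chunk) => hash.update(chunk))
            .on('error', reject)
            .on('end', () => resolve(hash.digest('hex')));
    });
}
```

Store the resulting hex digest alongside the file record so the client can recompute and compare it after download.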
Right now type mappings can handle nested keys for PROPERTIES ONLY. The keys governing unique identifier, type, origin and destination ids, and relationship type are required to be top-level keys because the database trigger does not handle those keys being nested.
In order to make this work, you will need to modify the database triggers for the type mappings as well as the type mapping function itself (granted, that shouldn't be a big modification).
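A minimal sketch of what nested key resolution could look like in the type mapping function, assuming dot-delimited key paths (the trigger would need the equivalent change using Postgres's jsonb path operators, such as #>>):

```typescript
// resolve a dot-delimited key path against a raw payload object
function resolveKey(payload: Record<string, any>, keyPath: string): any {
    return keyPath
        .split('.')
        .reduce((value, key) => (value == null ? undefined : value[key]), payload);
}

// resolveKey({device: {id: '123'}}, 'device.id') === '123'
```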
Need to be able to interface with a Redis cache instance.
Design
This should be fairly simple. The cacher interface already exists and an in-memory implementation has been created. Follow the pattern there and implement Redis caching using this library: https://www.npmjs.com/package/redis.
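A minimal sketch, assuming the promise-based API of recent versions of the redis package; the CacheInterface shape below is an assumption and should be matched to whatever the existing in-memory implementation actually implements:

```typescript
import {createClient} from 'redis';

// assumed interface - match the real cacher interface in the codebase
interface CacheInterface {
    set(key: string, value: any, ttl?: number): Promise<boolean>;
    retrieve(key: string): Promise<any | undefined>;
    flush(key: string): Promise<boolean>;
}

class RedisCacheImpl implements CacheInterface {
    private client = createClient({url: process.env.REDIS_URL});

    async init(): Promise<void> {
        await this.client.connect(); // v4+ of the redis package requires an explicit connect
    }

    async set(key: string, value: any, ttl?: number): Promise<boolean> {
        if (ttl) {
            await this.client.set(key, JSON.stringify(value), {EX: ttl});
        } else {
            await this.client.set(key, JSON.stringify(value));
        }
        return true;
    }

    async retrieve(key: string): Promise<any | undefined> {
        const raw = await this.client.get(key);
        return raw === null ? undefined : JSON.parse(raw);
    }

    async flush(key: string): Promise<boolean> {
        await this.client.del(key);
        return true;
    }
}
```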
Impact
Should give us the ability to run Deep Lynx in a clustered environment while sharing a single cache.
io-ts is a powerful type and payload checking library, but the error messages on encoding/decoding errors can be cryptic. We need to make error reporting on encoding and decoding with io-ts more user friendly and allow for a more in-depth explanation of the error.
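One possible starting point is io-ts's bundled PathReporter, which at least annotates each failure with the path at which it occurred; a friendlier reporter could build on the same ValidationError structure:

```typescript
import * as t from 'io-ts';
import {PathReporter} from 'io-ts/lib/PathReporter';

// stand-in type for illustration only
const dataSource = t.type({name: t.string, active: t.boolean});

const result = dataSource.decode({name: 'test', active: 'yes'});
console.log(PathReporter.report(result));
// => [ 'Invalid value "yes" supplied to : { name: string, active: boolean }/active: boolean' ]
```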
The JSON import endpoint for data sources (src/api/data_source_routes) needs to be rewritten to use busboy when passed a file rather than an array of JSON objects as a POST payload. We should also be able to handle files larger than 4GB - but this will involve breaking the import up into multiple, smaller imports, as we can't store more than 4GB of data in a single import.
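A rough sketch of the streaming side; this assumes busboy's pre-1.0 constructor API (busboy 1.x exports a factory function instead), and the route path and sub-4GB rollover logic are placeholders:

```typescript
import * as Busboy from 'busboy';
import express, {Request, Response} from 'express';

const router = express.Router();

router.post('/containers/:containerID/import', (req: Request, res: Response) => {
    const busboy = new Busboy({headers: req.headers});

    busboy.on('file', (fieldname: string, file: NodeJS.ReadableStream) => {
        // `file` is a stream, so the payload never has to fit in memory
        file.on('data', (chunk) => {
            // append chunk to the current import, rolling over to a new,
            // smaller import before the 4GB single-record limit is hit
        });
        file.on('end', () => {
            // finalize the last partial import
        });
    });

    busboy.on('finish', () => res.sendStatus(201));
    req.pipe(busboy);
});
```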
We need the ability to have a user reset their own password.
Design
Create an endpoint that accepts a user, their current password, and their desired password + confirmation. If those passwords match and the current is valid, reset the user's password. If the email functionality exists at this point also integrate email into the reset password process.
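A minimal sketch of the endpoint, assuming bcrypt-hashed passwords; the route shape, field names, and storage helpers are all assumptions:

```typescript
import express, {Request, Response} from 'express';
import * as bcrypt from 'bcrypt';

const router = express.Router();
router.use(express.json());

// hypothetical stubs for the real user storage layer
async function findUser(id: string): Promise<{id: string; passwordHash: string}> {
    return {id, passwordHash: ''}; // placeholder
}
async function updatePassword(id: string, hash: string): Promise<void> {/* persist new hash */}

router.post('/users/:userID/reset-password', async (req: Request, res: Response) => {
    const {currentPassword, newPassword, confirmation} = req.body;
    if (newPassword !== confirmation) return res.status(400).send('passwords do not match');

    const user = await findUser(req.params.userID);
    if (!(await bcrypt.compare(currentPassword, user.passwordHash))) {
        return res.status(401).send('current password invalid');
    }

    await updatePassword(user.id, await bcrypt.hash(newPassword, 10));
    return res.sendStatus(200);
});
```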
The search boxes for the origin and destination metatypes, as well as metatype relationships, need to be dynamic based on user input. Currently they simply list all records, which will max out at 1000 records (as well as bog down the UI).
Design
Check out the type transformation dialog for examples of metatype and relationship search boxes.
Impact
Should lower the memory footprint and ensure all metatypes/metatype relationships are available to a user.
The boot functionality feels very squishy and short. There is a lot more that could be done here to make the initial boot of the application more robust. Config checking and data source checking could easily be added here.
Currently we only support a few filter operators, such as eq, neq, like, and in. We need to create at least these few more in order to have a feature complete system.
gt (greater than) and gte (greater than or equal to)
lt (less than) and lte (less than or equal to)
notLike
notIn
Design
TBD
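Though the design is TBD, one possible direction is a simple operator-to-SQL-fragment mapping alongside the existing eq/neq/like/in handling; the names below are illustrative, not the real filter builder:

```typescript
// map each new operator to a parameterized SQL fragment; notIn would need its
// placeholder expanded per array element in a real implementation
const operatorToSQL: {[operator: string]: (column: string) => string} = {
    gt: (column) => `${column} > $1`,
    gte: (column) => `${column} >= $1`,
    lt: (column) => `${column} < $1`,
    lte: (column) => `${column} <= $1`,
    notLike: (column) => `${column} NOT LIKE $1`,
    notIn: (column) => `${column} NOT IN ($1)`,
};
```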
Impact
This should only add functionality and not affect any existing tests or functionality.
Currently the property matching when attempting to filter nodes and edges will not recognize nested JSON properties. This is a problem because not all data stored in Deep Lynx will consist of a single level, and we'll need to be able to query on all parts of the data, not just the topmost level of the JSON object.
Design
TBD
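Though the design is TBD, one possible direction on Postgres is translating a dot-delimited property path into a jsonb path expression; the properties column name and path format below are assumptions, and real paths must be validated/escaped before interpolation to avoid injection:

```typescript
// translate 'vehicle.engine.serial' into a Postgres jsonb path expression
function propertyFilterSQL(propertyPath: string): string {
    const parts = propertyPath.split('.');
    if (parts.length === 1) return `properties ->> '${parts[0]}'`;
    // #>> extracts a nested value as text, e.g. properties #>> '{vehicle,engine,serial}'
    return `properties #>> '{${parts.join(',')}}'`;
}
```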
Impact
This should not impact any existing functionality or tests when added. This is a completely new feature and does not replace or subsume any existing feature.
The Gremlin and HttpImpl classes both have a function called New that needs to be refactored from using the io-ts type's is function to using the pipe() functionality instead. Look at any of the storage layers for examples of how to implement pipe().
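A generic sketch of the target shape (not the actual Gremlin/HttpImpl config types), using fp-ts's pipe and Either fold so decoding errors are surfaced instead of silently discarded by is():

```typescript
import * as t from 'io-ts';
import {pipe} from 'fp-ts/lib/pipeable';
import {fold} from 'fp-ts/lib/Either';

// stand-in config type for illustration
const config = t.type({host: t.string});
type Config = t.TypeOf<typeof config>;

// decode() + fold() handles both branches explicitly
function New(input: unknown): Config {
    return pipe(
        config.decode(input),
        fold(
            (errors) => {
                throw new Error(`invalid configuration payload: ${errors.length} error(s)`);
            },
            (decoded) => decoded,
        ),
    );
}
```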
Right now the program will encrypt and store the entire UserT object as part of the JWT token passed to the client. Correct this so that only a session identifier, or something like that, is included.
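A minimal sketch using jsonwebtoken, with the claim name, secret source, and expiry as placeholder choices:

```typescript
import * as jwt from 'jsonwebtoken';

// sign only an identifier, never the serialized user object
function issueToken(userID: string): string {
    return jwt.sign({sub: userID}, process.env.JWT_SECRET!, {expiresIn: '8h'});
}

// on each request, look the user up from the sub claim rather than trusting
// a full user object embedded in the token
```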
We need the ability to have Deep Lynx send emails to users for many different reasons, from registration to password resetting etc. A service should be built to take advantage of SMTP servers.
Design
https://nodemailer.com - should be used to create this integration. Create an email service singleton that can be called anywhere throughout the application. Store email templates alongside the source code.
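A minimal singleton sketch; the SMTP settings pulled from environment variables are placeholder assumptions:

```typescript
import * as nodemailer from 'nodemailer';

class Emailer {
    private static instance: Emailer;

    private transporter = nodemailer.createTransport({
        host: process.env.SMTP_HOST,
        port: Number(process.env.SMTP_PORT),
        auth: {user: process.env.SMTP_USER, pass: process.env.SMTP_PASS},
    });

    static Instance(): Emailer {
        if (!Emailer.instance) Emailer.instance = new Emailer();
        return Emailer.instance;
    }

    send(to: string, subject: string, html: string): Promise<unknown> {
        // html would be rendered from templates stored alongside the source
        return this.transporter.sendMail({from: process.env.EMAIL_ADDRESS, to, subject, html});
    }
}
```

Emailer.Instance().send(...) can then be called anywhere throughout the application.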
Taxonomy tables with many entries (e.g. Metatype Relationship Pairs) currently make a single API call to grab all of the rows in the corresponding table. These calls contain a limit, defaulting to 1000. These tables should be updated to act as data tables, grabbing rows with a limit and offset that match the pagination of the table.
Steps to Reproduce
Having more than 1000 rows in the Metatype Relationship Pairs table will result in not all of them being visible from the Taxonomies section of the Admin GUI.
A user must not delete a type mapping that is associated with data that hasn't been inserted as nodes/edges. Currently a user could potentially delete a type mapping for data pending insertion, and the system will not attempt to recreate that type mapping causing the system to error out.
Design
Three changes must occur.
UI must be modified to not allow a user to delete a type mapping with uninserted data
API endpoint for deleting a type mapping must not work if type mapping has uninserted data
Processing loop must be able to handle data in an import that no longer has a type mapping, and be able to recreate that type mapping.
Impact
Will impact all users across the board who deal with type mapping.
Right now imported data is stored as a JSON array in the data_staging table. This is stored in a single record, and a single record's size maxes out at 4GB. Find a way to handle imports larger than 4GB.
Currently the Docker Postgres setup does not persist the database - meaning each time you restart the Docker client the database will get wiped.
Steps to Reproduce
Run the Postgres Docker container
Run the migrate functionality (npm run migrate) while pointing to the Docker container
Close Docker and the container
Restart the Postgres Docker container and verify the database is in pristine condition once again.
Impact
This is a large problem for those individuals doing local development on Deep Lynx as running a Postgres database with the required plugin can be difficult cross platform without using Docker.
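A likely fix is mounting a named volume over the Postgres data directory; a minimal example, with the image tag and credentials as placeholders:

```bash
# persist the database across restarts by mounting a named volume over the
# Postgres data directory
docker run -d --name deep-lynx-postgres \
  -e POSTGRES_PASSWORD=deeplynx \
  -v deep_lynx_data:/var/lib/postgresql/data \
  -p 5432:5432 \
  postgres:12
```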
A user needs to be re-logged in if their JWT expires.
Design
The authentication service needs to do a periodic check on the JWT expiry time. This could be either a constant timer running in the background, or a check on each API call to validate that the user still has valid access.
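A minimal client-side sketch of the per-call check, assuming the raw JWT is available to the GUI's authentication service (a real implementation should also handle base64url-encoded payloads):

```typescript
// decode the JWT payload (middle segment) and compare its exp claim,
// which is in seconds, against the current time
function tokenExpired(token: string): boolean {
    const payload = JSON.parse(atob(token.split('.')[1]));
    return payload.exp * 1000 <= Date.now();
}

// if this returns true, clear stored credentials and redirect to login
```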
A user should not be able to delete a data source if there are any nodes or edges associated with it.
Design
The API method should return an error when a user attempts to delete a data source with existing data.
The UI should not allow a user to call the delete method if data exists for the data source.
An API method and UI function should exist for deleting all nodes/edges for a given data source.
While the MetatypeKey and MetatypeRelationshipKey both have data structures for recording cardinality, uniqueness, and regex pattern matching, currently nothing takes advantage of this.
Stored data's properties are never checked using these types of functionality. Validation should cover:
Default Value
Regex on string and enumerable type
Cardinality (not entirely sure what this means in the scope of the project yet. Communicate with Christopher Ritter)
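A rough sketch of what per-key property validation could look like; the KeyValidation shape below is an assumption, not the actual MetatypeKey structure:

```typescript
// assumed per-key validation fields - match to the real MetatypeKey structure
interface KeyValidation {
    propertyName: string;
    required: boolean;
    defaultValue?: any;
    regex?: string;
}

// returns an error message, or null if the property is valid
function validateProperty(key: KeyValidation, properties: Record<string, any>): string | null {
    let value = properties[key.propertyName];

    // apply the default value before validating
    if (value === undefined && key.defaultValue !== undefined) value = key.defaultValue;
    if (value === undefined) return key.required ? `${key.propertyName} is required` : null;

    if (key.regex && typeof value === 'string' && !new RegExp(key.regex).test(value)) {
        return `${key.propertyName} fails regex ${key.regex}`;
    }

    return null;
}
```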
Attempting to update a transformation's condition list is not working.
Steps to Reproduce
Attempt to modify an existing transformation's conditions and save that transformation. Either it's not updating the transformation itself, or it's not updating the UI. Either way, it needs to be fixed.
Impact
Users need the ability to modify all aspects of a transformation.
In order to cut down on the number of database calls and the time spent processing raw data, we need to implement caching on ontology retrieval and type mapping/transformation retrieval operations.
Design
This should take place entirely inside the storage layers. The retrieve functions should check the cache first and fall back to the database on a miss, populating the cache on the way out, while the update/delete functions must invalidate the relevant cache entries so stale data is never served.
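A cache-aside sketch of the retrieve side; the Cacher interface, key scheme, and TTL below are assumptions:

```typescript
// assumed interface - match the real cacher interface in the codebase
interface Cacher {
    get(key: string): Promise<any | undefined>;
    set(key: string, value: any, ttl: number): Promise<void>;
    flush(key: string): Promise<void>;
}

// check the cache, fall back to the database, populate the cache on the way out
async function retrieveMetatype(
    id: string,
    cache: Cacher,
    loadFromDB: (id: string) => Promise<any>,
): Promise<any> {
    const cached = await cache.get(`metatype:${id}`);
    if (cached !== undefined) return cached;

    const metatype = await loadFromDB(id);
    await cache.set(`metatype:${id}`, metatype, 21600); // 6 hour TTL, arbitrary
    return metatype;
}

// update/delete must call cache.flush(`metatype:${id}`) after a successful
// database write, or stale ontology data will be served
```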
Impact
This will impact at least half the existing storage layers.
While you can declare a manual data source, you can currently only upload JSON data using that data source. The endpoint and manual data source implementation will need to be updated at some future date to handle different data types.
It seems that the number of Metatype Relationship Pairs is limited to 1000.
Steps to Reproduce
Import a container with the DIAMOND ontology. Note that the number of relationship pairs shown is capped at 1000. Attempt to add an additional relationship pair, then search for it in the list.
Right now listing endpoints and functions only accept limit and offset parameters. We need the ability to query without those parameters, as well as to filter against properties like name and others.