omenapps / django-postgresql-dag

Directed Acyclic Graphs with a variety of methods for both Nodes and Edges, and multiple exports (NetworkX, Pandas, etc). This project is the foundation for a commercial product, so expect regular improvements. PRs and other contributions are welcome.

License: Apache License 2.0

Python 100.00%

django graph dag graph-algorithms directed-acyclic-graph directed-graph postgresql cte common-table-expression

django-postgresql-dag's Issues

Add tooling for rearranging the graph

Inserting nodes on an edge (splitting an edge into two with a node in-between). Consider whether the existing edge properties should be mirrored to the two replacement edges, or if new values should be used. (aka: subdivision)
Removing nodes - we can currently remove child nodes one by one, but I want to be able to specify a particular node and remove it, with a choice of strategies for handling its associated edges:
- 1. simply delete, potentially leaving a gap
- 2. weld if there is only one parent and one child edge (aka: smoothing)
- 3. delete, and cascade-delete downward
- 4. Others?
Moving the start/end of an edge to another node, with circular check
Copy a section of the graph
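
As a rough illustration of the subdivision and smoothing operations listed above, here is a minimal sketch on a plain set of (parent, child) pairs. It does not use the library's models; `subdivide_edge` and `smooth_node` are hypothetical names, and edge properties are ignored entirely.

```python
def subdivide_edge(edges, parent, child, new_node):
    """Replace the edge parent -> child with parent -> new_node -> child."""
    if (parent, child) not in edges:
        raise ValueError(f"no edge {parent!r} -> {child!r}")
    result = set(edges)
    result.remove((parent, child))
    result.update({(parent, new_node), (new_node, child)})
    return result


def smooth_node(edges, node):
    """Remove a node with exactly one parent and one child, welding the two."""
    parents = [p for p, c in edges if c == node]
    children = [c for p, c in edges if p == node]
    if len(parents) != 1 or len(children) != 1:
        raise ValueError("smoothing needs exactly one parent edge and one child edge")
    # Drop every edge touching the node, then connect its parent to its child.
    result = {(p, c) for p, c in edges if node not in (p, c)}
    result.add((parents[0], children[0]))
    return result


edges = {("a", "b"), ("b", "c")}
split = subdivide_edge(edges, "a", "b", "x")  # a -> x -> b -> c
welded = smooth_node(split, "x")              # a -> b -> c again
```

Smoothing is the inverse of subdivision here, which makes the pair easy to test against each other. A real implementation would also have to decide what happens to edge properties, as the first item notes.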

Test 04 (deep dag): Issues with multiprocessing

Hi, I'm having trouble getting test_04_deep_dag() to run correctly.
When run as is, it fails with an AttributeError. The message points to a local object that cannot be pickled, which is a common problem with multiprocessing:
AttributeError: Can't pickle local object 'DagTestCase.test_04_deep_dag.<locals>.run_test'
The line where it fails is:

django-postgresql-dag/tests/test.py

Line 625 in 20c4797

p.start()

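I'm running this on Windows. As far as I know, multiprocessing on Windows starts child processes with the spawn method, which has to pickle the process target, and a function defined inside a test method cannot be pickled. Maybe there is an alternative to using multiprocessing? signal is not an option either, as it is missing some functionality on Windows.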
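
One possible alternative, sketched here under the assumption that the subprocess is only there to enforce a time limit on the test body (I have not confirmed that): run the function in a daemon thread and check whether it finished before the timeout. Threads do not pickle their target, so a local function works on Windows as well. `finishes_within` is a hypothetical helper, not part of the test suite.

```python
import threading
import time


def finishes_within(func, timeout_seconds):
    """Run func in a daemon thread; return True if it finished before the timeout."""
    worker = threading.Thread(target=func, daemon=True)
    worker.start()
    worker.join(timeout_seconds)
    return not worker.is_alive()


def quick_task():
    pass


def slow_task():
    time.sleep(0.5)


fast_ok = finishes_within(quick_task, timeout_seconds=1.0)
slow_ok = finishes_within(slow_task, timeout_seconds=0.05)
```

Two caveats: a thread cannot be terminated the way a process can, so a runaway query keeps running in the background, and Django database connections are per-thread, so rows created inside a TestCase transaction may not be visible from the worker thread.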

UpwardPathQuery and DownwardPathQuery result in error due to default casting

When performing path queries, the path is cast as bigint while the array of ids is cast as integer, so the two arrays cannot be concatenated.

Example query from our codebase:

WITH RECURSIVE traverse(parent_id, child_id, depth, PATH) AS
    (SELECT first.parent_id,
            first.child_id,
            1 AS depth, ARRAY[first.parent_id] AS PATH
     FROM flow_networks_networkedge AS FIRST
     WHERE parent_id = 4269
     UNION ALL SELECT first.parent_id,
                      first.child_id,
                      second.depth + 1 AS depth,
                      PATH || first.parent_id AS PATH
     FROM flow_networks_networkedge AS FIRST,
          traverse AS SECOND
     WHERE first.parent_id = second.child_id
         AND (first.parent_id <> ALL(second.path)) )
SELECT UNNEST(ARRAY[pkid]) AS pkid
FROM
    (SELECT PATH || ARRAY[4215], depth
     FROM traverse
     WHERE child_id = 4215
         AND depth <= 20
     LIMIT 1) AS x(pkid);

Result:

ERROR:  operator does not exist: bigint[] || integer[]
LINE 17:     (SELECT PATH || ARRAY[4215], depth
                          ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.
SQL state: 42883
Character: 657
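
One way to avoid the mismatch, as the hint suggests, is to cast the appended array explicitly so both operands of || have the same element type. The sketch below only builds the final SELECT as a Python string; it is not the library's query builder. `build_path_select` and `pk_cast` are hypothetical names, and the `%(name)s` placeholders assume a driver that accepts pyformat-style parameters (for example psycopg2).

```python
# Final SELECT of the traversal query, with an explicit cast on the appended array.
# {pk_cast} must match the column type of parent_id / child_id (an assumption here).
PATH_SELECT_TEMPLATE = """
SELECT UNNEST(ARRAY[pkid]) AS pkid
FROM
    (SELECT PATH || ARRAY[%(target_pk)s]::{pk_cast}[], depth
     FROM traverse
     WHERE child_id = %(target_pk)s
         AND depth <= %(max_depth)s
     LIMIT 1) AS x(pkid);
"""

# Only known type names are interpolated into the SQL text; values stay parameters.
ALLOWED_CASTS = {"integer", "bigint"}


def build_path_select(pk_cast="bigint"):
    """Return the SELECT text with the array cast filled in."""
    if pk_cast not in ALLOWED_CASTS:
        raise ValueError(f"unsupported primary key type: {pk_cast!r}")
    return PATH_SELECT_TEMPLATE.format(pk_cast=pk_cast)


query = build_path_select("bigint")
```

With the values from the example above, the affected line would read `PATH || ARRAY[4215]::bigint[]`, which matches the bigint[] type of PATH.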

Update CI matrix

Update the CI matrix to run the following combinations:

Python 3.7 through 3.10
Both Ubuntu and Windows
At least one recent version of Postgres (for Ubuntu, running against multiple versions of postgres should be easy; less so for Windows)

Implement basic CI

black
automatic testing

Rename `master` branch to `main`

Hard coded pk

Pytest fails due to hard coded pks.

Add option to prevent multiple shared edges between two nodes

In some graphs, it may be undesirable to have more than one edge between the same two nodes.

Add an option to disallow this behavior, raising a custom exception when the same parent or child is added to a node more than once.
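
A minimal sketch of what such an option could look like, using a plain in-memory edge list rather than the library's models. `DuplicateEdgeError`, `SimpleGraph`, and `allow_duplicate_edges` are hypothetical names chosen for illustration.

```python
class DuplicateEdgeError(Exception):
    """Raised when an edge between the same parent and child already exists."""


class SimpleGraph:
    def __init__(self, allow_duplicate_edges=True):
        # Defaulting to True keeps the current behavior described above.
        self.allow_duplicate_edges = allow_duplicate_edges
        self.edges = []  # list of (parent, child) pairs; duplicates are possible

    def add_edge(self, parent, child):
        if not self.allow_duplicate_edges and (parent, child) in self.edges:
            raise DuplicateEdgeError(f"edge {parent!r} -> {child!r} already exists")
        self.edges.append((parent, child))


permissive = SimpleGraph()
permissive.add_edge(2, 1)
permissive.add_edge(2, 1)  # allowed: two edges between the same nodes

strict = SimpleGraph(allow_duplicate_edges=False)
strict.add_edge(2, 1)
```

In a Django model the same rule could also be enforced at the database level with a uniqueness constraint on the parent and child columns, at the cost of making it non-optional for that model.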

networkx is missing when installing

When installing from pip, networkx doesn't get installed. It's imported in transformations.py at the top, but this may want to be moved to an inline import in nx_from_queryset().

The networkx stuff looks like an in-progress feature, but just wanted to let you know.
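
For reference, the inline-import pattern could look roughly like this. `digraph_from_edges` is a hypothetical stand-in that takes (parent, child) pairs; it is not the actual nx_from_queryset() implementation.

```python
def digraph_from_edges(edge_pairs):
    """Build a networkx.DiGraph from (parent, child) pairs, importing networkx lazily."""
    try:
        # Imported here so the rest of the package works without networkx installed.
        import networkx as nx
    except ImportError as exc:
        raise ImportError(
            "networkx is required for graph export; install it with 'pip install networkx'"
        ) from exc
    graph = nx.DiGraph()
    graph.add_edges_from(edge_pairs)
    return graph
```

The import error then only appears when the export is actually used, and the message says how to fix it.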

Add method for hashing connected graphs

In preparation for some future work, it would be convenient to be able to calculate the hash of a connected graph, assuming no directivity in edges. That is, if we query for all nodes connected to a node using django-postgresql-dag's connected_graph() method, we should be able to calculate a hash value of the current state of all connected elements. While potentially expensive to compute †, this should allow guaranteed identification of changes to the graph.

It is possible to have multiple DAGs using django-postgresql-dag. For this issue, we want to calculate the hash of a particular DAG, not all nodes/graphs within the project. To do this, we might approach the problem in the following manner:

Initially treat the DAG as an undirected graph (which is done with the connected_graph() query) to retrieve all nodes in the current graph.
For each identified node within the QuerySet, annotate the node with a sorted list of the parent id values of that node.
Sort the QuerySet by node id.
Convert the resulting QuerySet to a tuple of tuples (which contains the id and parents for each node).
Iterate through the tuple, calculating the combined hash of all elements.
Return the calculated hash value.
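
The steps above could be sketched as follows. This is a minimal version that assumes the node ids and their parent ids have already been fetched into a dict, so it skips the connected_graph() query and the annotation step; `graph_hash` is a hypothetical name.

```python
import hashlib


def graph_hash(parents_by_node):
    """Hash a mapping of node id -> iterable of parent ids with blake2s."""
    # Sort the nodes and each node's parents so the result does not depend on
    # the order in which rows were returned.
    canonical = tuple(
        (node_id, tuple(sorted(parent_ids)))
        for node_id, parent_ids in sorted(parents_by_node.items())
    )
    digest = hashlib.blake2s()
    for entry in canonical:
        # repr() gives each (id, parents) entry an unambiguous text form.
        digest.update(repr(entry).encode("utf-8"))
    return digest.hexdigest()


h1 = graph_hash({1: [], 2: [1], 3: [1, 2]})
h2 = graph_hash({3: [2, 1], 2: [1], 1: []})  # same graph, different ordering
h3 = graph_hash({1: [], 2: [1], 3: [2]})     # edge 1 -> 3 removed
```

Here `h1` and `h2` are equal, while `h3` differs. Feeding the digest one entry at a time avoids building a single large byte string, although the tuple of tuples itself is still held in memory.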

Recommendations for improvement are welcome. There's likely a more memory-friendly approach that will still get the job done.

Alternative approaches considered: NetworkX provides a weisfeiler_lehman_graph_hash() function, but it doesn't work well on graphs that are not isomorphic. The graphs produced with django-postgresql-dag are likely to be non-isomorphic.

† Potentially expensive in time and memory. The tuple we use to calculate the hash contains the id of every node in the graph, and for each node, the ids of the parent nodes. Using hashlib's blake2s hash will probably be a speedy approach, but the entire process of querying, converting, and iterating to calculate the hash will take a certain amount of time.

Missing code/tags in Github repo?

The PyPI package which purports to be from this repository has significantly more releases than this repository has tags:
https://pypi.org/project/django-postgresql-dag/#history

Notably, there are no tags on this repository since 0.2.3 on 2021-03-21, but PyPI has up to 0.4.0 now.
I see a commit to this repo on 2022-01-12 which bumps the version number in setup.py to 0.3.2, but nothing for 0.4.0.

Is this still the correct repository for that PyPI package?

Can we make max_depth configurable from the django settings

Hi, I just checked this package and saw that traversal is limited to a maximum depth of 20, because the limit is configured inside the BaseQuery class.
Can we make it configurable from settings.py?
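
A minimal sketch of how the lookup could work, assuming a per-call argument should still win over the project-wide setting. `DJANGO_POSTGRESQL_DAG_MAX_DEPTH` and `resolve_max_depth` are hypothetical names, and a SimpleNamespace stands in for django.conf.settings so the example runs without Django.

```python
from types import SimpleNamespace

DEFAULT_MAX_DEPTH = 20  # the limit reported above as currently fixed


def resolve_max_depth(settings, explicit=None):
    """Pick max_depth: explicit argument first, then the setting, then the default."""
    if explicit is not None:
        return explicit
    # DJANGO_POSTGRESQL_DAG_MAX_DEPTH is a hypothetical setting name.
    return getattr(settings, "DJANGO_POSTGRESQL_DAG_MAX_DEPTH", DEFAULT_MAX_DEPTH)


# Stand-ins for django.conf.settings with and without the setting defined.
project_settings = SimpleNamespace(DJANGO_POSTGRESQL_DAG_MAX_DEPTH=50)
empty_settings = SimpleNamespace()

from_setting = resolve_max_depth(project_settings)
from_default = resolve_max_depth(empty_settings)
from_argument = resolve_max_depth(project_settings, explicit=5)
```

Falling back to 20 when the setting is absent keeps existing projects unchanged.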

Unable to delete children/parents when they share more than one edge

Great library, this is working pretty well for us!

One issue I'm facing is that you can add multiple duplicate edges, resulting in some instability in the API.

node = GraphNode.objects.get(id=1)
new_parent = GraphNode.objects.get(id=2)

node.add_parent(new_parent)

# can call this multiple times
node.add_parent(new_parent)
node.add_parent(new_parent)

# need to refresh the related manager
node.refresh_from_db(fields=['parents'])
print(node.parents)

This results in 3 edges being created, all with the same parent and child pointers. This may be desired - it's still a valid graph. The issue comes when you try to delete the parent:

node = GraphNode.objects.get(id=1)
parent_to_remove = GraphNode.objects.get(id=2)

node.remove_parent(parent_to_remove)

An exception is raised:

Exception Type: MultipleObjectsReturned
Exception Value: get() returned more than one GraphEdge -- it returned 3!

Should remove_parent() be changed from parent.children.through.objects.get(parent=parent, child=self).delete() to a .filter()?

Compatibility with Django Polymorphic

The raw queries generated by the Node are incompatible with Django Polymorphic because the .get_pk_name() method returns the field name (e.g. parent_ptr) and not the field attribute name (e.g. parent_ptr_id).

Changing:

        def get_pk_name(self):
            """Sometimes we set a field other than 'pk' for the primary key.
            This method is used to get the correct primary key field name for the
            model so that raw queries return the correct information."""
            return self._meta.pk.name

to:

        def get_pk_name(self):
            """Sometimes we set a field other than 'pk' for the primary key.
            This method is used to get the correct primary key field name for the
            model so that raw queries return the correct information."""
            return self._meta.pk.attname

solves the problem.

The repo does not seem to be up to date with the PyPI package, so I am hesitant to make a pull request.

Implement pre-CTE filtering

Add the ability to filter down prior to running CTE.

For instance, in a graph of a municipal district, it may be more efficient to limit the search to a particular region, category of nodes/edges, or other characteristics.

This would involve modifying the raw PostgreSQL statement to include the additional filtering.
