
omenapps / django-postgresql-dag

Directed Acyclic Graphs with a variety of methods for both Nodes and Edges, and multiple exports (NetworkX, Pandas, etc). This project is the foundation for a commercial product, so expect regular improvements. PRs and other contributions are welcome.

License: Apache License 2.0

Python 100.00%
django graph dag graph-algorithms directed-acyclic-graph directed-graph postgresql cte common-table-expression

django-postgresql-dag's People

Contributors

alyjak, jackatomenapps, jacklinke, jinglinz, worsht

django-postgresql-dag's Issues

Add tooling for rearranging the graph

  • Inserting nodes on an edge (splitting an edge into two with a node in-between). Consider whether the existing edge properties should be mirrored to the two replacement edges, or if new values should be used. (aka: subdivision)
  • Removing nodes - we can currently remove children nodes one-by-one, but I want to be able to specify a particular node and remove it, potentially with various strategies for how to handle its associated edges
    • 1. simply delete, potentially leaving a gap
    • 2. weld if there is only one parent and one child edge (aka: smoothing)
    • 3. delete, and cascade-delete downward
    • 4. Others?
  • Moving the start/end of an edge to another node, with circular check
  • Copy a section of the graph
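
As a rough illustration of two of the operations listed above (subdivision and smoothing), here is a minimal sketch on a plain set of (parent, child) tuples. It does not use the library's models, and the function names `subdivide_edge` and `smooth_node` are hypothetical; edge properties and circular checks are left out.

```python
def subdivide_edge(edges, parent, child, new_node):
    """Replace the edge parent -> child with parent -> new_node -> child."""
    if (parent, child) not in edges:
        raise ValueError(f"no edge {parent!r} -> {child!r}")
    return (edges - {(parent, child)}) | {(parent, new_node), (new_node, child)}


def smooth_node(edges, node):
    """Remove a node with exactly one parent edge and one child edge,
    welding its parent directly to its child."""
    incoming = [edge for edge in edges if edge[1] == node]
    outgoing = [edge for edge in edges if edge[0] == node]
    if len(incoming) != 1 or len(outgoing) != 1:
        raise ValueError("smoothing needs exactly one parent and one child edge")
    parent, child = incoming[0][0], outgoing[0][1]
    return (edges - {incoming[0], outgoing[0]}) | {(parent, child)}


edges = {("a", "c")}
split = subdivide_edge(edges, "a", "c", "b")  # a -> b -> c
welded = smooth_node(split, "b")              # back to a -> c
```

In this sketch smoothing is the inverse of subdivision, which makes the pair easy to test against each other.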

Test 04 (deep dag): Issues with multiprocessing

Hi, I'm having trouble getting test_04_deep_dag() to run correctly.
Run as is, it raises an AttributeError. The message points to a local object, which is a common problem with multiprocessing:
AttributeError: Can't pickle local object 'DagTestCase.test_04_deep_dag.<locals>.run_test'
The line where it fails is:

I'm running this on Windows. As far as I know, Windows has trouble pickling objects like this. Maybe there is an alternative to using multiprocessing? signal is not an option either, as it is missing some functionality on Windows.
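
One possible alternative, sketched under the assumption that the test only needs to detect that a call exceeded a time limit (rather than forcibly kill it), is a thread-based timeout. Threads need no pickling, so a local function works on Windows too. The helper name `run_with_timeout` is hypothetical and not part of the project's test suite.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_with_timeout(func, timeout_seconds, *args, **kwargs):
    """Run func in a worker thread; return (finished_in_time, result)."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        return True, future.result(timeout=timeout_seconds)
    except FutureTimeout:
        # The worker thread cannot be killed; it keeps running in the background.
        return False, None
    finally:
        # Do not block here waiting for a slow worker to finish.
        executor.shutdown(wait=False)


def slow_call():
    time.sleep(0.3)
    return "done"


fast = run_with_timeout(lambda: 1 + 1, 1.0)
slow = run_with_timeout(slow_call, 0.05)
```

The trade-off is that a timed-out call is only abandoned, not stopped, so this suits a test assertion better than production code.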

UpwardPathQuery and DownwardPathQuery result in error due to default casting

When performing path queries, the path is cast as bigint while the array of ids is cast as integer, so the two arrays cannot be concatenated.

Example query from our codebase:

WITH RECURSIVE traverse(parent_id, child_id, depth, PATH) AS
    (SELECT first.parent_id,
            first.child_id,
            1 AS depth, ARRAY[first.parent_id] AS PATH
     FROM flow_networks_networkedge AS FIRST
     WHERE parent_id = 4269
     UNION ALL SELECT first.parent_id,
                      first.child_id,
                      second.depth + 1 AS depth,
                      PATH || first.parent_id AS PATH
     FROM flow_networks_networkedge AS FIRST,
          traverse AS SECOND
     WHERE first.parent_id = second.child_id
         AND (first.parent_id <> ALL(second.path)) )
SELECT UNNEST(ARRAY[pkid]) AS pkid
FROM
    (SELECT PATH || ARRAY[4215], depth
     FROM traverse
     WHERE child_id = 4215
         AND depth <= 20
     LIMIT 1) AS x(pkid);

Result:

ERROR:  operator does not exist: bigint[] || integer[]
LINE 17:     (SELECT PATH || ARRAY[4215], depth
                          ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.
SQL state: 42883
Character: 657
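
As the HINT suggests, one way to avoid the mismatch is an explicit cast on the literal array. The sketch below only builds the inner SELECT as text to show where the cast would go; the helper name `final_select_sql` and its `pk_sql_type` parameter are hypothetical, and real code should pass ids as query parameters rather than interpolate them.

```python
def final_select_sql(target_id, pk_sql_type="bigint"):
    """Build the inner SELECT of the path query with an explicit array cast."""
    if pk_sql_type not in ("integer", "bigint"):
        raise ValueError(f"unsupported primary key type: {pk_sql_type!r}")
    # int() guards against anything but a numeric id ending up in the SQL text.
    target = int(target_id)
    return (
        f"SELECT path || ARRAY[{target}]::{pk_sql_type}[], depth\n"
        "FROM traverse\n"
        f"WHERE child_id = {target}\n"
        "  AND depth <= 20\n"
        "LIMIT 1"
    )


sql = final_select_sql(4215)
```

With the cast in place, both operands of `||` are bigint[], so the operator lookup should no longer fail.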

Update CI matrix

Update the CI matrix to run the following combinations:

  • Python 3.7 through 3.10
  • Both Ubuntu and Windows
  • At least one recent version of Postgres (for Ubuntu, running against multiple versions of postgres should be easy; less so for Windows)

networkx is missing when installing

When installing from pip, networkx doesn't get installed. It's imported at the top of transformations.py, but that import could be moved to an inline import in nx_from_queryset().

The networkx stuff looks like an in-progress feature, but I just wanted to let you know.
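
A minimal sketch of the inline-import idea, assuming networkx stays an optional dependency. The names `require_optional` and `nx_from_edge_pairs` are hypothetical stand-ins and not the library's actual nx_from_queryset() implementation.

```python
import importlib


def require_optional(module_name, feature):
    """Import an optional dependency only when the feature is actually used."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'"
        ) from exc


def nx_from_edge_pairs(edge_pairs):
    """Build a networkx.DiGraph from (parent, child) pairs, importing lazily."""
    nx = require_optional("networkx", "NetworkX export")
    graph = nx.DiGraph()
    graph.add_edges_from(edge_pairs)
    return graph
```

With this layout, importing the module never fails; only calling the export without networkx installed raises a descriptive ImportError.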

Add method for hashing connected graphs

In preparation for some future work, it would be convenient to be able to calculate the hash of a connected graph, ignoring the direction of edges. That is, if we query for all nodes connected to a node using django-postgresql-dag's connected_graph() method, we should be able to calculate a hash value of the current state of all connected elements. While potentially expensive to compute †, this should allow guaranteed identification of changes to the graph.

It is possible to have multiple DAGs using django-postgresql-dag. For this issue, we want to calculate the hash of a particular DAG, not all nodes/graphs within the project. To do this, we might approach the problem in the following manner:

  1. Initially treat the DAG as an undirected graph (which is done with the connected_graph() query) to retrieve all nodes in the current graph.
  2. For each identified node within the QuerySet, annotate the node with a sorted list of the parent id values of that node.
  3. Sort the QuerySet by node id.
  4. Convert the resulting QuerySet to a tuple of tuples (which contains the id and parents for each node).
  5. Iterate through the tuple, calculating the combined hash of all elements.
  6. Return the calculated hash value.
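
The steps above can be sketched roughly as follows, assuming the QuerySet has already been reduced to a plain mapping of node id to parent ids. The function name `graph_hash` and the record format are illustrative choices, not part of the library.

```python
import hashlib


def graph_hash(node_parents):
    """Hash a graph given as {node_id: iterable of parent ids}.

    Nodes and parents are sorted first so the result does not depend on
    the order in which rows came back from the database.
    """
    hasher = hashlib.blake2s()
    for node_id in sorted(node_parents):
        parents = sorted(node_parents[node_id])
        record = f"{node_id}:{','.join(map(str, parents))};"
        # Feed one node at a time so no combined tuple has to be built.
        hasher.update(record.encode("utf-8"))
    return hasher.hexdigest()


original = graph_hash({1: [], 2: [1], 3: [1, 2]})
reordered = graph_hash({3: [2, 1], 1: [], 2: [1]})
changed = graph_hash({1: [], 2: [1], 3: [2]})
```

Updating the hasher incrementally avoids holding the whole tuple of tuples in memory, which may help with the memory concern raised in this issue.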

Recommendations for improvement are welcome. There's likely a more memory-friendly approach that will still get the job done.

Alternative approaches considered: NetworkX provides a weisfeiler_lehman_graph_hash() function, but it doesn't work well on graphs that are not isomorphic. The graphs produced with django-postgresql-dag are likely to be non-isomorphic.

† Potentially expensive in time and memory. The tuple we use to calculate the hash contains the id of every node in the graph, and for each node, the ids of the parent nodes. Using hashlib's blake2s hash will probably be a speedy approach, but the entire process of querying, converting, and iterating to calculate the hash will take a certain amount of time.

Missing code/tags in Github repo?

The PyPI package which purports to be from this repository has significantly more releases than this repository has tags:
https://pypi.org/project/django-postgresql-dag/#history

Notably, there are no tags on this repository since 0.2.3 on 2021-03-21, but PyPI is now up to 0.4.0.
I see a commit to this repo on 2022-01-12 which bumps the version number in setup.py to 0.3.2, but nothing for 0.4.0.

Is this still the correct repository for that PyPI package?

Unable to delete children/parents when they share more than one edge

Great library, this is working pretty well for us!

One issue I'm facing is that you can add multiple duplicate edges, resulting in some instability in the API.

node = GraphNode.objects.get(id=1)
new_parent = GraphNode.objects.get(id=2)

node.add_parent(new_parent)

# can call this multiple times
node.add_parent(new_parent)
node.add_parent(new_parent)

# need to refresh the related manager
node.refresh_from_db(fields=['parents'])
print(node.parents)

This results in 3 edges being created, all with the same parent and child pointers. This may be desired - it's still a valid graph. The issue comes when you try to delete the parent:

node = GraphNode.objects.get(id=1)
parent_to_remove = GraphNode.objects.get(id=2)

node.remove_parent(parent_to_remove)

An exception is raised:

Exception Type: MultipleObjectsReturned
Exception Value: get() returned more than one GraphEdge -- it returned 3!

Should remove_parent() be changed from parent.children.through.objects.get(parent=parent, child=self).delete() to a .filter()?

Compatibility with Django Polymorphic

The raw queries generated by the Node are incompatible with Django Polymorphic because the .get_pk_name() method returns the field name (e.g. parent_ptr) and not the field attribute name (e.g. parent_ptr_id).

Changing:

        def get_pk_name(self):
            """Sometimes we set a field other than 'pk' for the primary key.
            This method is used to get the correct primary key field name for the
            model so that raw queries return the correct information."""
            return self._meta.pk.name

to:

        def get_pk_name(self):
            """Sometimes we set a field other than 'pk' for the primary key.
            This method is used to get the correct primary key field name for the
            model so that raw queries return the correct information."""
            return self._meta.pk.attname

solves the problem.

The repo does not seem to be up to date with the PyPI package, so I am hesitant to make a pull request.

Implement pre-CTE filtering

Add the ability to filter down before running the CTE.

For instance, in a graph of a municipal district, it may be more efficient to limit the search to a particular region, category of nodes/edges, or other characteristics.

This would involve modifying the raw PostgreSQL statement to include the additional filtering.

Modify methods to use CTE

  • ancestors_ids
  • descendants_ids
  • path_ids_list
  • descendants_edges_ids
  • ancestors_edges_ids
  • descendants_tree
  • ancestors_tree
  • get_roots
  • get_leaves
