
root-11 / graph-theory


A simple graph library

License: MIT License

Python 100.00%
graph graph-algorithms shortest-path tsp-solver minimum-spanning-trees graphs graph-theory graph-library assignment-problem flow-problem

graph-theory's Introduction

graph-theory


A simple graph library...
... A bit like networkx, just without the overhead...
... similar to graph-tool, without the Python 2.7 legacy...
... with code that you can explain to your boss...

Detailed tutorial evolving in the examples section.


Install:

pip install graph-theory

Upgrade:

pip install graph-theory --upgrade --no-cache

Testing:

pytest tests

Import:

from graph import Graph
g = Graph()

from graph import Graph3D
g3d = Graph3D()

Modules:

module | description
from graph import Graph, Graph3D | Elementary methods (see basic methods below) for Graph and Graph3D.
from graph import ... | All methods available on Graph (see table below)
from graph.assignment_problem import ... | solvers for assignment problem, the Weapons-Target Assignment Problem, ...
from graph.hash import ... | graph hash functions: graph hash, merkle tree, flow graph hash
from graph.random import ... | graph generators for random, 2D and 3D graphs.
from graph.transshipment_problem import ... | solvers for the transshipment problem
from graph.traffic_scheduling_problem import ... | solvers for the traffic jams (and slide puzzle)
from graph.visuals import ... | methods for creating matplotlib plots
from graph.finite_state_machine import ... | finite state machine

All module functions are available from Graph and Graph3D (where applicable).

Graph Graph3D methods returns example
+ + a in g assert if g contains node a
+ + g.add_node(n, [obj]) adds a node (with a pointer to object obj if given)
+ + g.copy() returns a shallow copy of g
+ + g.node(node1) returns object attached to node 1
+ + g.del_node(node1) deletes node1 and all its edges
+ + g.nodes() returns a list of nodes
+ + len(g.nodes()) returns the number of nodes
+ + g.nodes(from_node=1) returns nodes with edges from node 1
+ + g.nodes(to_node=2) returns nodes with edges to node 2
+ + g.nodes(in_degree=2) returns nodes with 2 incoming edges
+ + g.nodes(out_degree=2) returns nodes with 2 outgoing edges
+ + g.add_edge(1,2,3) adds edge to g for vector (1,2) with value 3
+ + g.edge(1,2) returns value of edge between nodes 1 and 2
+ + g.edge(1,2,default=3) returns default=3 if edge(1,2) doesn't exist.
similar to d.get(key, 3)
+ + g.del_edge(1,2) removes edge between nodes 1 and 2
+ + g.edges() returns a list of edges
+ + len(g.edges()) returns the number of edges
+ + g.edges(path=[path]) returns a list of edges (along a path if given).
+ + same_path(p1,p2) compares two paths to determine if they contain same sequences
ex.: [1,2,3] == [2,3,1]
+ + g.edges(from_node=1) returns edges outgoing from node 1
+ + g.edges(to_node=2) returns edges incoming to node 2
+ + g.from_dict(d) updates the graph from a dictionary
+ + g.to_dict() returns the graph as a dictionary
+ + g.from_list(L) updates the graph from a list
+ + g.to_list() returns the graph as a list of edges
+ + g.shortest_path(start,end [, memoize, avoids]) returns the distance and path for path with smallest edge sum
If memoize=True, sub-results are cached for faster access on repeated calls.
If avoids=set(), then these nodes are not a part of the path.
+ + g.shortest_path_bidirectional(start,end) returns distance and path for the path with smallest edge sum using bidirectional search.
+ + g.is_connected(start,end) determines if there is a path from start to end
+ + g.breadth_first_search(start,end) returns the number of edges and path with fewest edges
+ + g.breadth_first_walk(start,end) returns a generator for a BFS walk
+ + g.degree_of_separation(n1,n2) returns the distance between two nodes using BFS
+ + g.distance_map(starts,ends, reverse) returns a dictionary with the distance from any start to any end (or reverse)
+ + g.network_size(n1, degree_of_separation) returns the nodes within the range given by degree_of_separation
+ + g.topological_sort(key) returns a generator that yields nodes in order from a non-cyclic graph.
+ + g.critical_path() returns the distance of the critical path and a list of Tasks.
+ + g.critical_path_minimize_for_slack() returns graph with artificial dependencies that minimises slack.
+ + g.phase_lines() returns a dictionary with the phase_lines for a non-cyclic graph.
+ + g.sources(n) returns the source_tree of node n
+ + g.depth_first_search(start,end) returns path using DFS and backtracking
+ + g.depth_scan(start, criteria) returns set of nodes where criteria is True
+ + g.distance_from_path(path) returns the distance for path.
+ + g.maximum_flow(source,sink) finds the maximum flow between a source and a sink
+ + g.maximum_flow_min_cut(source,sink) finds the maximum flow minimum cut between a source and a sink
+ + g.minimum_cost_flow(inventory, capacity) finds the total cost and flows of the capacitated minimum cost flow.
+ + g.solve_tsp() solves the traveling salesman problem for the graph.
Available methods: 'greedy' (default) and 'bnb'.
+ + g.subgraph_from_nodes(nodes) returns the subgraph of g involving nodes
+ + g.is_subgraph(g2) determines if graph g2 is a subgraph in g
+ + g.is_partite(n) determines if graph is n-partite
+ + g.has_cycles() determines if there are any cycles in the graph
+ + g.components() returns set of nodes in each component in g
+ + g.same_path(p1,p2) compares two paths, returns True if they're the same
+ + g.adjacency_matrix() returns the adjacency matrix for the graph
+ + g.all_pairs_shortest_paths() finds the shortest path between all nodes
+ + g.minsum() finds the node(s) with shortest total distance to all other nodes
+ + g.minmax() finds the node(s) with shortest maximum distance to all other nodes
+ + g.shortest_tree_all_pairs() finds the shortest tree for all pairs
+ + g.has_path(p) asserts whether a path p exists in g
+ + g.all_simple_paths(start,end) finds all simple paths between 2 nodes
+ + g.all_paths(start,end) finds all combinations of paths between 2 nodes
- + g3d.distance(n1,n2) returns the spatial distance between n1 and n2
- + g3d.n_nearest_neighbour(n1, [n]) returns the n nearest neighbours to node n1
- + g3d.plot() returns matplotlib plot of the graph.

FAQ

want to... | doesn't work... | do instead... | ...but why?
have multiple edges between two nodes | Graph(from_list=[(1,2,3), (1,2,4)]) | Add dummy nodes: [(1,a,3), (a,2,0), (1,b,4), (b,2,0)] | Explicit is better than implicit.
multiple values on an edge | g.add_edge(1,2,{'a':3, 'b':4}) | Have two graphs: g_a.add_edge(1,2,3) and g_b.add_edge(1,2,4) | Most graph algorithms don't work with multiple values
do repeated calls to shortest path | g.shortest_path(a,b) is slow | Use g.shortest_path(a,b,memoize=True) instead | memoize uses bidirectional search and caches sub-results along the shortest path for future retrievals
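
The dummy-node workaround from the first FAQ entry can be sketched in plain Python, independent of the library. The helper name expand_parallel_edges and the tuple-based naming of the dummy nodes are assumptions made for illustration only; the resulting edge list has no repeated (start, end) pairs and could then be passed to Graph(from_list=...).

```python
def expand_parallel_edges(edges):
    """Replace parallel edges with two-step paths through dummy nodes.

    edges: list of (start, end, value) tuples, possibly with repeated
    (start, end) pairs. Returns a list without repeated pairs.
    """
    # Count how often each (start, end) pair occurs.
    counts = {}
    for start, end, _ in edges:
        counts[(start, end)] = counts.get((start, end), 0) + 1

    result = []
    seen = {}
    for start, end, value in edges:
        if counts[(start, end)] == 1:
            result.append((start, end, value))  # unique edge: keep as is
            continue
        index = seen.get((start, end), 0)
        seen[(start, end)] = index + 1
        dummy = ("dummy", start, end, index)  # hypothetical naming scheme
        result.append((start, dummy, value))  # the edge value sits on the first leg
        result.append((dummy, end, 0))        # the second leg costs nothing
    return result


simple_edges = expand_parallel_edges([(1, 2, 3), (1, 2, 4), (2, 3, 1)])
```

The total cost of each route through a dummy node equals the value of the original edge, so shortest-path results are unchanged.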

Credits:

  • Arturo Soucase for packaging and testing.
  • Peter Norvig for inspiration on TSP from pytudes.
  • Harry Darby for the mountain river map.
  • Kyle Downey for depth_scan algorithm.
  • Ross Blandford for the Munich fire brigade centre, traffic jam and slide puzzle test cases.
  • Avi Kelman for type-tolerant search, and a number of micro optimizations.
  • Joshua Crestone for all simple paths test.
  • CodeMartyLikeYou for detecting a bug in @memoize
  • Tom Carroll for detecting the bug in del_edge and inspiration for topological sort.
  • Sappique for discovering bug in __eq__ and has_cycles.

graph-theory's People

Contributors

04t02, fiendish, qasim-at-tci, root-11


graph-theory's Issues

Deleting a node doesn't fully eliminate it from the graph

It appears that when you delete a node, certain traces of it are left behind in the graph's internal bookkeeping structures.

Consider this simple graph:

import graph

g = graph.Graph()
g.add_edge(1, 3)
g.add_edge(2, 3)

If we now try to do a topological sort on it by iteratively finding the nodes with in_degree=0:

the_sort = []

zero_in = g.nodes(in_degree=0)
while zero_in:
    for n in zero_in:
        the_sort.append(n)
        g.del_node(n)
    zero_in = g.nodes(in_degree=0)
    print(f"remaining nodes: {len(g.nodes())},"
          f" in_degree=0 nodes: {len(g.nodes(in_degree=0))}")

...this loop will never terminate. len(g.nodes()) reports 0 remaining nodes, yet g.nodes(in_degree=0) still returns
nodes. Looking at the code, the problem appears to be the accounting done in del_edge(): self._in_degree[node2] is
decremented, but the dict entry should probably be deleted if self._out_degree[node2] is zero as well. However, since
del_edge() is a public method, a floating node with no inbound or outbound edges may be perfectly acceptable, in which
case something else must decide when to remove a deleted node's entries from these two dicts.
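
For comparison, here is a minimal standalone sketch of the same in-degree-based topological sort (Kahn's algorithm) that keeps its own bookkeeping and deletes a node's in-degree entry as soon as the node is emitted. It does not use the library; the function name topological_order and the edge-list input are assumptions for illustration.

```python
from collections import deque


def topological_order(edges, nodes=()):
    """Kahn's algorithm on a list of (start, end) pairs.

    nodes may list additional isolated nodes. Raises ValueError on a cycle.
    """
    successors = {}
    in_degree = {}
    for node in nodes:
        successors.setdefault(node, [])
        in_degree.setdefault(node, 0)
    for start, end in edges:
        successors.setdefault(start, []).append(end)
        successors.setdefault(end, [])
        in_degree[end] = in_degree.get(end, 0) + 1
        in_degree.setdefault(start, 0)

    ready = deque(n for n, d in in_degree.items() if d == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        del in_degree[node]  # remove the entry so no stale zero is left behind
        for nxt in successors[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)
    if in_degree:  # nodes that never reached in-degree zero lie on a cycle
        raise ValueError("graph has a cycle")
    return order


order = topological_order([(1, 3), (2, 3)])
```

Because the entry is deleted when the node is emitted, the loop terminates once every node has been processed.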

Make examples with explanations in .ipynb

I'm adding this issue because I've received a request to add an examples section to the GitHub repo, so that the more sophisticated examples are explained using Jupyter notebooks, with images and visualisations. The first idea is to link the table of functions in the readme.md directly to the examples:

[image]
(cut for brevity)
[image]

An example could be:

[image]

which explains how the method works.

Suggestions are welcome

`__eq__` (equality) can return false for two identical graphs if an edge was added and removed in one of them

If one graph had an edge added and then removed again, comparing it to an identical graph can return false. This occurs if the added and removed edge is outgoing from a node that had no outgoing edges before.

To reproduce

Python 3.10.0 (tags/v3.10.0:b494f59, Oct  4 2021, 19:00:18) [MSC v.1929 64 bit (AMD64)] on win32
>>> from graph import Graph
>>> g1 = Graph(from_list=[(1,2),(2,3)])
>>> g2 = g1.copy()
>>> g1 == g2
True
>>> g1.add_edge(3,1)
>>> g1.del_edge(3,1)
>>> g1 == g2
False
>>> g1.edges() == g2.edges()
True
>>> g1.nodes() == g2.nodes()
True

Cause

As far as I can tell, this bug occurs because __eq__ compares the two graphs' private edge variables _edges instead of calling the public edge getter edges().
When an edge from a node that did not have any outgoing edges before is added and removed, that node remains as a key in the graph's internal edge variable _edges. Two dictionaries are unequal if one has a key the other has not, even if all shared values are equal, thus the two graphs are unequal.
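
The dictionary behaviour described above can be reproduced with plain nested dicts. This is only a sketch: the layout {node: {neighbour: value}} is an assumption about how _edges is organised, and prune_empty is a hypothetical helper, not part of the library.

```python
def prune_empty(adjacency):
    """Return a copy of a nested adjacency dict without empty entries."""
    return {node: dict(targets) for node, targets in adjacency.items() if targets}


# Two graphs with the same edges 1->2 and 2->3.
g1_edges = {1: {2: 1}, 2: {3: 1}}
g2_edges = {1: {2: 1}, 2: {3: 1}}

# Add and then remove the edge 3->1 in the first graph only.
g1_edges.setdefault(3, {})[1] = 1
del g1_edges[3][1]  # the key 3 stays behind with an empty dict

raw_equal = g1_edges == g2_edges                                   # False
normalized_equal = prune_empty(g1_edges) == prune_empty(g2_edges)  # True
```

Either comparing via the public getters or dropping empty entries on deletion would make the two graphs compare equal.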

Other examples

Because copy() uses the public edge getter edges() this bug can lead to some very confusing behavior:

Python 3.10.0 (tags/v3.10.0:b494f59, Oct  4 2021, 19:00:18) [MSC v.1929 64 bit (AMD64)] on win32
>>> from graph import Graph
>>> g1 = Graph(from_list=[(1,2),(2,3)])
>>> g1.add_edge(3,1)
>>> g1.del_edge(3,1)
>>> g2 = g1.copy()
>>> g1 == g2
False

Additional information

I'm using version 2023.7.5 of graph-theory with Python 3.10 on Windows.

More examples of assignment problems

Here's a list:

    - Assignment problems (AP)
      - Explain that AP is a special case of GAP with just one agent.
      - Explain why the Hungarian algorithm underperforms relative to the alternating iterative auction.

    - The Knapsack problem (code done, tutorial missing)
        - Cutting stock problem (https://en.wikipedia.org/wiki/Cutting_stock_problem)
        - 3D bin packing problem

    - Maximum flow (code done, tutorial missing)
    - Minimum costs (code done?)
    - Assignment problem with allowed groups

    - Quadratic assignment problem (https://en.wikipedia.org/wiki/Quadratic_assignment_problem)
      and Facility location problem (https://en.wikipedia.org/wiki/Facility_location_problem)
        - Explain that the problem resembles that of the assignment problem, except that the
          cost function is expressed in terms of quadratic inequalities, hence the name.
        - Example: The problem is to assign all facilities to different locations with the
          goal of minimizing the sum of the distances multiplied by the corresponding flows.

          Hint: Use XY-graph for solving the problem.

`has_cycles` always returns false if the graph is disconnected

The function has_cycles always returns false if the graph is disconnected, including when a cycle exists.

If this is the intended behavior, the documentation should be modified to clearly convey that.

To reproduce

Python 3.10.0 (tags/v3.10.0:b494f59, Oct  4 2021, 19:00:18) [MSC v.1929 64 bit (AMD64)] on win32
>>> from graph import Graph
>>> g = Graph(from_list=[(1,2),(2,3),(3,1)])
>>> g.has_cycles()
True
>>> g.add_node(4)
>>> g.has_cycles()
False

Additional information

I'm using version 2023.7.4 of graph-theory with Python 3.10 on Windows.
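
A cycle check that is not affected by disconnected nodes has to start a search from every unvisited node rather than from a single root. The following is a minimal standalone sketch for directed graphs using an iterative depth-first search with three colours; it is not the library's implementation, and the name has_cycle and its inputs are assumptions.

```python
def has_cycle(nodes, edges):
    """Detect a directed cycle; every component is searched."""
    successors = {n: [] for n in nodes}
    for start, end in edges:
        successors.setdefault(start, []).append(end)
        successors.setdefault(end, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    colour = {n: WHITE for n in successors}

    for root in successors:  # restart the search in every component
        if colour[root] != WHITE:
            continue
        colour[root] = GREY
        stack = [(root, iter(successors[root]))]
        while stack:
            node, remaining = stack[-1]
            for nxt in remaining:
                if colour[nxt] == GREY:  # back edge to the current path
                    return True
                if colour[nxt] == WHITE:
                    colour[nxt] = GREY
                    stack.append((nxt, iter(successors[nxt])))
                    break
            else:
                colour[node] = BLACK  # all successors explored
                stack.pop()
    return False


with_isolated_node = has_cycle([1, 2, 3, 4], [(1, 2), (2, 3), (3, 1)])
```

With the graph from the report, the cycle 1 -> 2 -> 3 -> 1 is found whether or not the isolated node 4 is present.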

A more sophisticated flow problem example?

Check if this could belong in examples. @04t02

from collections import defaultdict

from graph import Graph


def calculate_max_traffic(graph, unit_quantities):
    max_traffic = defaultdict(int)

    for in_node, out_nodes in unit_quantities.items():
        for out_node, quantity in out_nodes.items():
            _, path = graph.shortest_path(in_node, out_node)

            path_steps = list(zip(path[:-1], path[1:]))

            for (first_node, second_node) in path_steps:
                max_traffic[(first_node, second_node)] += quantity

    return max_traffic


# Example usage:
graph_edges = ["in_1", "out_1", "in_2", "out_2", "in_3", "out_3", "in_4", "out_4"]
graph_edges_pairs = list(zip(graph_edges[:-1], graph_edges[1:]))
graph_edges_pairs.append((graph_edges[-1], graph_edges[0]))

g = Graph(from_list=graph_edges_pairs)
unit_quantities = {"in_1": {"out_1": 10, "out_2": 5, "out_3": 5, "out_4": 5},  # 25
                   "in_2": {"out_2": 10, "out_3": 5, "out_4": 5, "out_1": 5},  # 25
                   "in_3": {"out_3": 5, "out_4": 5, "out_1": 5, "out_2": 10},  # 25
                   "in_4": {"out_4": 5, "out_1": 5, "out_2": 10, "out_3": 5}}  # 25

max_traffic = calculate_max_traffic(g, unit_quantities)

for (first_node, second_node), quantity in max_traffic.items():
    print(f"{first_node} -> {second_node}: {quantity}")

Feature request: minimum cut function

Hello,
I know that NetworkX has a function, minimum_cut(), that implements the max-flow min-cut algorithm. It works well with small graphs but fails for large graphs.

I checked graph-theory and found that it has test_flow_problem.py but I couldn't find a direct minimum_cut() function like NetworkX.

This is a feature request to add this functionality in graph-theory.
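
To make the request concrete, here is a minimal standalone sketch of what such a function computes, using the Edmonds-Karp method (breadth-first augmenting paths) on a plain edge list. It is an illustration only and not the library's code; the name max_flow_min_cut and the (start, end, capacity) input format are assumptions.

```python
from collections import deque


def max_flow_min_cut(edges, source, sink):
    """Return (max flow value, list of cut edges) for (start, end, capacity) edges."""
    residual = {source: {}, sink: {}}
    for start, end, capacity in edges:
        residual.setdefault(start, {})
        residual.setdefault(end, {})
        residual[start][end] = residual[start].get(end, 0) + capacity
        residual[end].setdefault(start, 0)  # reverse edge for undoing flow

    def search():
        """BFS over edges with spare capacity; returns the parent map."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt, spare in residual[node].items():
                if spare > 0 and nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        return parent

    flow = 0
    while True:
        parent = search()
        if sink not in parent:
            break  # no augmenting path left
        path = []
        node = sink
        while parent[node] is not None:
            path.append((parent[node], node))
            node = parent[node]
        bottleneck = min(residual[a][b] for a, b in path)
        for a, b in path:
            residual[a][b] -= bottleneck
            residual[b][a] += bottleneck
        flow += bottleneck

    # Nodes still reachable from the source form one side of the minimum cut.
    source_side = set(parent)
    cut = [(a, b) for a, b, c in edges
           if c > 0 and a in source_side and b not in source_side]
    return flow, cut


example_edges = [("s", "a", 3), ("s", "b", 2), ("a", "b", 1), ("a", "t", 2), ("b", "t", 3)]
flow, cut = max_flow_min_cut(example_edges, "s", "t")
```

The capacities of the returned cut edges sum to the maximum flow, which is the max-flow min-cut theorem in action.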

branch and bound algorithm for TSP isn't solid.

To recreate:

import time

from graph.random import random_xy_graph


def test_random_graph_3_bnb():
    for i in range(8,15):
        d = None
        for j in range(3):
            g = random_xy_graph(i, x_max=800, y_max=400)  # a fully connected graph.
            start = time.process_time()
            d1, t1 = g.solve_tsp('bnb')  # tsp_branch_and_bound(g)
            d2, t2 = g.solve_tsp('greedy')  # tsp_greedy(g)
            assert d1 <= d2, (d1, d2, g.edges())
            if d is None:
                d = d1
            else:
                assert d == d1, (d, d1)

            end = time.process_time()
            print(i, j, end-start)

Traceback (most recent call last):
  File "C:/Users/madsenbj/AppData/Roaming/JetBrains/PyCharm2020.2/scratches/scratch_7.py", line 54, in <module>
    test_random_graph_3_bnb()
  File "C:/Users/madsenbj/AppData/Roaming/JetBrains/PyCharm2020.2/scratches/scratch_7.py", line 42, in test_random_graph_3_bnb
    assert d1 <= d2, (d1, d2, g.edges())
AssertionError: (2293.897719652855, 2004.6354644817718, [((655, 58), (559, 45), 96.87620966986684), ((655, 58), (229, 72), 426.2299848673249), ((655, 58), (26, 380), 706.6293229126569), ((655, 58), (693, 380), 324.2344830520036), ((655, 58), (605, 217), 166.67633305301626), ((655, 58), (755, 53), 100.12492197250393), ((655, 58), (282, 126), 379.14772846477666), ((559, 45), (26, 380), 629.5347488423495), ((559, 45), (605, 217), 178.04493814764857), ((559, 45), (693, 380), 360.8060420780118), ((559, 45), (655, 58), 96.87620966986684), ((559, 45), (229, 72), 331.1027030998086), ((559, 45), (282, 126), 288.60006930006097), ((559, 45), (755, 53), 196.16319736382766), ((26, 380), (693, 380), 667.0), ((26, 380), (559, 45), 629.5347488423495), ((26, 380), (229, 72), 368.8807395351511), ((26, 380), (655, 58), 706.6293229126569), ((26, 380), (755, 53), 798.9806005154318), ((26, 380), (282, 126), 360.6272313622475), ((26, 380), (605, 217), 601.5064421932653), ((693, 380), (755, 53), 332.8257802514703), ((693, 380), (605, 217), 185.23768515072737), ((693, 380), (229, 72), 556.9201019895044), ((693, 380), (559, 45), 360.8060420780118), ((693, 380), (282, 126), 483.15318481823135), ((693, 380), (655, 58), 324.2344830520036), ((693, 380), (26, 380), 667.0), ((229, 72), (605, 217), 402.9900743194552), ((229, 72), (755, 53), 526.3430440311718), ((229, 72), (26, 380), 368.8807395351511), ((229, 72), (693, 380), 556.9201019895044), ((229, 72), (655, 58), 426.2299848673249), ((229, 72), (559, 45), 331.1027030998086), ((229, 72), (282, 126), 75.66372975210778), ((605, 217), (655, 58), 166.67633305301626), ((605, 217), (229, 72), 402.9900743194552), ((605, 217), (26, 380), 601.5064421932653), ((605, 217), (755, 53), 222.25210910135362), ((605, 217), (559, 45), 178.04493814764857), ((605, 217), (282, 126), 335.574134879314), ((605, 217), (693, 380), 185.23768515072737), ((755, 53), (229, 72), 526.3430440311718), ((755, 53), (693, 380), 332.8257802514703), ((755, 53), (559, 45), 
196.16319736382766), ((755, 53), (605, 217), 222.25210910135362), ((755, 53), (655, 58), 100.12492197250393), ((755, 53), (282, 126), 478.60004178854814), ((755, 53), (26, 380), 798.9806005154318), ((282, 126), (26, 380), 360.6272313622475), ((282, 126), (229, 72), 75.66372975210778), ((282, 126), (755, 53), 478.60004178854814), ((282, 126), (693, 380), 483.15318481823135), ((282, 126), (655, 58), 379.14772846477666), ((282, 126), (559, 45), 288.60006930006097), ((282, 126), (605, 217), 335.574134879314)])
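
One way to narrow down a failure like this is to compare against an exhaustive search on small instances, since a correct branch-and-bound solver must return the optimal tour length. The sketch below is independent of the library: brute_force_tsp and nearest_neighbour_tsp are hypothetical helpers, and the nearest-neighbour heuristic is only a stand-in that may differ from the library's 'greedy' method.

```python
import itertools
import math
import random


def tour_length(points, order):
    """Length of the closed tour visiting points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def brute_force_tsp(points):
    """Exact optimum by trying every tour that starts at point 0."""
    rest = range(1, len(points))
    best = min(
        ((0,) + perm for perm in itertools.permutations(rest)),
        key=lambda order: tour_length(points, order),
    )
    return tour_length(points, best), list(best)


def nearest_neighbour_tsp(points):
    """Heuristic tour: always move to the closest unvisited point."""
    unvisited = set(range(1, len(points)))
    order = [0]
    while unvisited:
        last = order[-1]
        nxt = min(unvisited, key=lambda i: math.dist(points[last], points[i]))
        unvisited.remove(nxt)
        order.append(nxt)
    return tour_length(points, order), order


rng = random.Random(0)
points = [(rng.randint(0, 800), rng.randint(0, 400)) for _ in range(7)]
exact, exact_tour = brute_force_tsp(points)
heuristic, heuristic_tour = nearest_neighbour_tsp(points)
```

Any solver that claims optimality should match the brute-force distance on such instances, and no heuristic can beat it.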



is_connected can probably be merged with breadth_first_search

In the performance-agnostic case, is_connected(graph, start, end) is just bool(breadth_first_search(graph, start, end)), though BFS could optimize a few things further by e.g. taking an argument to control whether the found path gets reconstructed or not.
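
The proposed merge could look roughly like the following standalone sketch, where a flag controls whether the path is reconstructed. The with_path parameter and the plain-dict graph representation are assumptions for illustration; the library's actual signatures are the ones quoted above.

```python
from collections import deque


def breadth_first_search(successors, start, end, with_path=True):
    """BFS over a {node: [neighbours]} dict.

    Returns the path with fewest edges (empty list if none), or just
    True/False when with_path is False.
    """
    if start == end:
        return [start] if with_path else True
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, ()):
            if nxt in parent:
                continue
            parent[nxt] = node
            if nxt == end:
                if not with_path:
                    return True  # skip the reconstruction work
                path = [end]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return [] if with_path else False


def is_connected(successors, start, end):
    """Connectivity is just BFS without path reconstruction."""
    return bool(breadth_first_search(successors, start, end, with_path=False))


example = {1: [2], 2: [3], 3: [], 4: [1]}
path = breadth_first_search(example, 1, 3)
```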

Missing path in all_paths

Given the following graph:

import graph

g = graph.Graph()
g.add_node('a')
g.add_node('b')
g.add_node('c')
g.add_node('d')
g.add_node('e')

g.add_edge('a', 'b', bidirectional=True)
g.add_edge('b', 'c', bidirectional=True)
g.add_edge('b', 'd', bidirectional=True)
g.add_edge('c', 'd', bidirectional=True)
g.add_edge('c', 'e', bidirectional=True)
g.add_edge('d', 'e', bidirectional=True)

A call to graph.all_paths(g, 'e', 'a') returns:
[['e', 'c', 'b', 'a'], ['e', 'd', 'b', 'a'], ['e', 'c', 'd', 'b', 'a']]

This is missing ['e', 'd', 'c', 'b', 'a']
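
The expected result can be cross-checked with a small standalone depth-first enumeration of simple paths (paths that visit no node twice). This sketch does not use the library; the function all_simple_paths below is a hypothetical helper operating on a plain neighbour dict.

```python
def all_simple_paths(neighbours, start, end):
    """Enumerate every path from start to end that repeats no node."""
    paths = []

    def extend(path):
        node = path[-1]
        if node == end:
            paths.append(list(path))
            return
        for nxt in neighbours[node]:
            if nxt not in path:  # keep the path simple
                path.append(nxt)
                extend(path)
                path.pop()

    extend([start])
    return paths


# The bidirectional edges from the report.
pairs = [("a", "b"), ("b", "c"), ("b", "d"), ("c", "d"), ("c", "e"), ("d", "e")]
neighbours = {}
for u, v in pairs:
    neighbours.setdefault(u, []).append(v)
    neighbours.setdefault(v, []).append(u)

paths = all_simple_paths(neighbours, "e", "a")
```

For this graph the enumeration yields four paths from 'e' to 'a', including ['e', 'd', 'c', 'b', 'a'].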

Feature: Option: Add regions to graphs

Minimal illustration of the problem

image
(classic graph vs multi-graph)

Assume G is a binary tree with a root and 2 levels of bifurcation resulting in $2^{2}$ leaves with randomized weights on the edges.

Assume that all search starts at the root and ends by identifying the route to a leaf using BFS to determine the shortest path.

Problem: Due to the symmetric nature of the graph, shortest path BFS will practically visit every node every time a search is performed.

Proposition 1: If (!) G is redesigned such that the graph holds information about what can be found below each bifurcation point, only 10 nodes need to be visited. This is ideal from a search perspective, but the memory overhead is problematic as it requires the graph to store all leaves at all bifurcation levels: ~10x more memory. A second problem with this approach is that it only works for DAGs.

Proposition 2: If a partition of G can be declared as another graph G', and BFS and shortest-path search can query G' as to whether or not it contains or has a route to the target node, then the search can be accelerated:

  1. If the target node is in G' and BFS sees G' as a single node in G, then the destination node has been found.
  2. If the target node is NOT in G', BFS can eliminate the search through G' altogether.

For the binary tree this means that, with G defined as $G_{1}' + G_{2}' = G_{1.1}' + G_{1.2}' + G_{2.1}' + G_{2.2}' \dots$, a BFS or shortest-path search will require only $2*10$ recursive queries akin to "is target in G'".

The reason for 2*10 is because at each recursive step the binary partition will have at least one failure.

Edge cases:

For non-trees, such as road networks, which may be partitioned using the "AA", "A", "B", ... road network classification, each branch will lead to a $G_{n}'$ where knowing the probability of reaching the target (for example using (lat, lon)-distance) will help to accelerate the search, but if such information isn't available - for example in information networks - the better method is to partition by proximity e.g. in clusters of $G/2$-nodes. The search must thereby treat G' as nodes that either have been visited or not.
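
The query-counting argument for the binary tree can be illustrated with a small standalone sketch. Everything here is hypothetical: the Region class, the build and locate helpers, and the choice of 10 bifurcation levels (taken from the "2*10" figure above) are assumptions for illustration, and each region simply stores the set of leaves below it, which carries the memory overhead discussed under Proposition 1.

```python
class Region:
    """A partition G' that can answer 'is the target in here?'."""

    def __init__(self, nodes, children=()):
        self.nodes = frozenset(nodes)
        self.children = list(children)

    def __contains__(self, node):
        return node in self.nodes


def build(leaves):
    """Recursively split a list of leaves into a binary tree of regions."""
    if len(leaves) == 1:
        return Region(leaves)
    mid = len(leaves) // 2
    return Region(leaves, [build(leaves[:mid]), build(leaves[mid:])])


def locate(region, target):
    """Descend towards target, counting 'is target in G'' queries."""
    queries = 0
    while region.children:
        for child in region.children:
            queries += 1
            if target in child:
                region = child  # skip every sibling region entirely
                break
        else:
            return None, queries  # target is in none of the partitions
    return region, queries


LEVELS = 10  # assumed depth, matching the 2*10 estimate
root = build(list(range(2 ** LEVELS)))
worst_region, worst_queries = locate(root, 2 ** LEVELS - 1)
best_region, best_queries = locate(root, 0)
```

In the worst case each level costs one failed and one successful query, giving at most 2 * LEVELS queries instead of a visit to every node.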
