gh-129847: Add graphlib.reverse(), graphlib.as_transitive() #130875
Misc/NEWS.d/next/Core_and_Builtins/2025-03-05-10-43-31.gh-issue-129847.dug2ca.rst
Bugs are fixed; they were bugs in how I unrolled a recursive implementation to iterative. For reference, a recursive version is:

    def as_transitive(graph):
        def visit(node, visited, stack):
            if node in stack:
                cycle = stack[stack.index(node):] + [node]
                raise CycleError("nodes are in a cycle", cycle)
            if node in visited:
                return visited[node]
            closure = set()
            stack.append(node)
            for child in graph.get(node, []):
                closure.add(child)
                closure.update(visit(child, visited, stack))
            stack.pop()
            visited[node] = closure
            return closure

        visited = {}
        return {node: visit(node, visited, []) for node in graph}

The problem with the recursive version is that it is limited to graphs with diameter < sys.getrecursionlimit() so 1000 by default.

I've also explored using TopologicalSorter:

    def as_transitive(graph):
        graph = {k: set(v) for k, v in graph.items()}
        transitive = {}
        for node in TopologicalSorter(graph).static_order():
            if node in graph:
                direct = graph[node]
                t = transitive[node] = set(direct)
                t.update(*(transitive.get(d, ()) for d in direct))
        return transitive

However, this is much slower as it needs to allocate a lot of temporary objects. Timings:
06ef96d to d90f7c2
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
791f94d to 1889b93
import sys
import os
Minor: sorting
- import sys
- import os
+ import os
+ import sys
class TestAsTransitive(unittest.TestCase):
    """Tests for graphlib.as_transitive()."""
The problem with the recursive version is that it is limited to graphs with diameter < sys.getrecursionlimit() so 1000 by default.
Could we add a test for such a pathological graph?
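One possible shape for such a test, as a hedged sketch. Since `graphlib.as_transitive()` is only proposed here, the two reference implementations quoted in this thread are reproduced locally under the hypothetical names `as_transitive_recursive` and `as_transitive_topo`; the helpers `make_chain` and `recursion_overflows` are also made up for illustration. The pathological graph is a single chain slightly longer than the recursion limit.

```python
import sys
from graphlib import CycleError, TopologicalSorter


def as_transitive_recursive(graph):
    # Recursive reference version from the discussion (hypothetical name).
    def visit(node, visited, stack):
        if node in stack:
            cycle = stack[stack.index(node):] + [node]
            raise CycleError("nodes are in a cycle", cycle)
        if node in visited:
            return visited[node]
        closure = set()
        stack.append(node)
        for child in graph.get(node, []):
            closure.add(child)
            closure.update(visit(child, visited, stack))
        stack.pop()
        visited[node] = closure
        return closure

    visited = {}
    return {node: visit(node, visited, []) for node in graph}


def as_transitive_topo(graph):
    # Non-recursive version from the discussion, built on TopologicalSorter.
    graph = {k: set(v) for k, v in graph.items()}
    transitive = {}
    for node in TopologicalSorter(graph).static_order():
        if node in graph:
            direct = graph[node]
            t = transitive[node] = set(direct)
            t.update(*(transitive.get(d, ()) for d in direct))
    return transitive


def make_chain(length):
    # 0 -> 1 -> 2 -> ... -> length: one path that visits every node.
    return {i: [i + 1] for i in range(length)}


def recursion_overflows(func, graph):
    # Report whether func(graph) dies with RecursionError.
    try:
        func(graph)
    except RecursionError:
        return True
    return False


deep_chain = make_chain(sys.getrecursionlimit() + 100)
print("recursive overflows:", recursion_overflows(as_transitive_recursive, deep_chain))
print("topological overflows:", recursion_overflows(as_transitive_topo, deep_chain))
```

A real test in the suite would presumably call the iterative `graphlib.as_transitive()` from the PR instead of these stand-ins and assert that it completes on the deep chain.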
Question: why is transitive closure making a special case of cycles? If there's a cycle involving

That is, the usual concept of transitive closure has nothing to do with cycles, so using the name to mean something else is at best surprising. I think it would be best to stick with the conventional meaning. This is, after all, not the "acyclic_graphlib" module 😉.
Despite the name, I understand it to be the acyclic graphlib module. From the discussion in #129847, there is no desire to compete with a full graph library like NetworkX, but we may be willing to admit functions that work on acyclic graphs, especially task dependency graphs, which are the original use case for TopologicalSorter. I usually use TopologicalSorter xor transitive closure in these kinds of applications, so it's useful to have a cycle check when computing the transitive closure: it visits every node anyway, so the check drops out. I could add a kw-only arg?
Transitive closure isn't competing with anyone 😉. It's a basic operation applicable to all graphs of all kinds, and has the same meaning everywhere. By default, in the absence of truly compelling reasons not to, Python should do the same as all other packages for it. So I'd be OK with adding an optional

That it's "natural" for a TC implementation to detect cycles isn't necessarily so. It depends on the specific algorithm. Here, for example, is an implementation of Warshall's algorithm (not tested much - may be buggy). It has no concept of "cycle", and does not build explicit paths:

    def tc(graph):
        tc = {i : set(j) for i, j in graph.items()}
        for M, Mset in tc.items():
            for R in Mset:
                for Lset in tc.values():
                    if M in Lset:
                        Lset.add(R)
        return tc

If you care about speed, you should check that out too. It builds very few temp objects.
Fun observation: a bunch of places on the web say Warshall's algorithm is only useful when using an adjacency bit matrix representation. But that's not so. This variation is even simpler, and runs much faster, especially so on dense input graphs

Although this is Python, and part of "the trick" is that no indexing or visible

    def tc(graph):
        tc = {i : set(j) for i, j in graph.items()}
        for M, Mset in tc.items():
            if Mset:
                for Lset in tc.values():
                    if M in Lset:
                        Lset.update(Mset)
        return tc
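To make the conventional meaning concrete, here is a small check, offered as a sketch rather than anything from the PR. It restates the row-wise Warshall variant from this comment under the hypothetical name `tc_rowwise` with longer variable names, then runs it on a three-node cycle and on a chain.

```python
def tc_rowwise(graph):
    # Row-wise Warshall variant from the comment, renamed for illustration.
    closure = {node: set(succs) for node, succs in graph.items()}
    for pivot, pivot_succs in closure.items():
        if pivot_succs:
            for succs in closure.values():
                # Anything that reaches the pivot also reaches
                # everything the pivot reaches.
                if pivot in succs:
                    succs.update(pivot_succs)
    return closure


cyclic = {"a": ["b"], "b": ["c"], "c": ["a"]}
chain = {"a": ["b"], "b": ["c"], "c": []}

cyclic_closure = tc_rowwise(cyclic)
chain_closure = tc_rowwise(chain)
```

On the cycle, every node ends up reaching all three nodes, itself included, and no error is raised; on the chain, `a` reaches `b` and `c`, `b` reaches `c`, and `c` reaches nothing.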
Huh! For more speed, replace:
with
I'm happy with I suspect for the use case of task graphs that |
Ya, Warshall is best suited for dense graphs, and I agree those aren't the natural focus here.

An idea is to split off "is there a cycle?" into its own function, so there's only one place that needs to change if ambitions change.

"The best" compromise for transitive closure is to find strongly connected components first (in linear time). Then each SCC trivially induces a complete subgraph, and the DAG of SCCs can be done via a topsort.

There was no "grand plan" at the start, but I doubt anyone had in mind "sparse graphs" as a goal. Even cycle-free graphs can have a number of edges quadratic in the number of nodes.

There was a desire not to complicate things by introducing a Graph class too. So it was implicitly accepted that we'd stick (at least at first) to "the natural" Python graph representation: a dict mapping a hashable node to a collection of neighbors. So "directed and unweighted" was implicit at the start. I'd prefer to leave it there too.

My own unstated opinion was that graphlib would reach its limit when it grew a function to compute a directed graph's strongly connected components, a delicate undertaking to do efficiently. The functions added by this PR are very comfortably within that limit.
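The "is there a cycle?" split suggested in this comment could be as small as the following sketch. The name `has_cycle` is hypothetical and not part of this PR; it leans on the existing `TopologicalSorter.prepare()`, which raises `CycleError` when the graph contains a cycle.

```python
from graphlib import CycleError, TopologicalSorter


def has_cycle(graph):
    """Return True if the dict-of-neighbours graph contains a cycle.

    Hypothetical helper for illustration; not an existing graphlib API.
    """
    try:
        # prepare() runs graphlib's own cycle detection.
        TopologicalSorter(graph).prepare()
    except CycleError:
        return True
    return False


print(has_cycle({"a": ["b"], "b": ["a"]}))  # two-node cycle
print(has_cycle({"a": ["b"], "b": []}))     # plain chain
```

With a helper like this, a transitive-closure function could stay cycle-agnostic and callers who want the check could opt in separately.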
graphlib
--------
This needs to be re-targeted to 3.15.
graphlib.reverse() for reversing a DAG in the form accepted by TopologicalSorter
graphlib.as_transitive() for computing a transitive closure

📚 Documentation preview 📚: https://cpython-previews--130875.org.readthedocs.build/
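As a rough illustration of what reversing such a mapping means, here is a sketch. `reverse_sketch` is a hypothetical stand-in, not the PR's implementation, and its handling of edge-free nodes is an assumption. `TopologicalSorter` reads the mapping as node to predecessors, so flipping every edge flips the resulting order.

```python
from graphlib import TopologicalSorter


def reverse_sketch(graph):
    # Hypothetical stand-in: flip every edge of a mapping
    # node -> iterable of neighbours. Every node keeps a key.
    reversed_graph = {node: set() for node in graph}
    for node, neighbours in graph.items():
        for neighbour in neighbours:
            reversed_graph.setdefault(neighbour, set()).add(node)
    return reversed_graph


# Each task maps to the tasks that must run before it.
deps = {"build": {"compile"}, "compile": {"fetch"}, "fetch": set()}

order = list(TopologicalSorter(deps).static_order())
reversed_order = list(TopologicalSorter(reverse_sketch(deps)).static_order())
```

For this three-task chain, `order` starts at `fetch` and ends at `build`, while `reversed_order` runs the other way.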