Graphframes Pyspark route compaction

36 views Asked by At

In pyspark given a directed graph structure represented by a nodes and edges dataframes how can i compact routes. Given certain nodes that can be sources and certain nodes that can be a route destination I want to apply route compaction so that if there is a route with multiple hops/edges I will have just one hop/edge between the source and destination nodes in the path. For example, given a->b->c->d The generated graph (compacted edges) will include just the edge between a->d

Graphframes provides Motif find, but It will require multiple runs with different depths, and the. Connecting the found sources to destinations found, which is not efficient at all. What other options are there?

I know that in Neo4j it is possible with route projection.

1

There are 1 answers

0
user238607 On

You can possibly use

networkx's Edmond's algorithm to find minimum spanning arborescence rooted at a particular root in a given directed graph.

https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.tree.branchings.Edmonds.html

In graph theory, an arborescence is a directed graph having a distinguished vertex u (called the root) such that, for any other vertex v, there is exactly one directed path from u to v.

In graph theory, Edmonds' algorithm or Chu–Liu/Edmonds' algorithm is an algorithm for finding a spanning arborescence of minimum weight (sometimes called an optimum branching). It is the directed analog of the minimum spanning tree problem.