I've been looking for packages using which I could create subgraphs with overlapping vertices.
From what I understand in Networkx and metis one could partition a graph into two or multi-parts. But I couldn't find how to partition into subgraphs with overlapping nodes.
Suggestions on libraries that support partitioning with overlapping vertices will be really helpful.
EDIT: I tried the angel algorithm in CDLIB to partition the original graph into subgraphs with 4 overlapping nodes.
import networkx as nx
from cdlib import algorithms
if __name__ == '__main__':
g = nx.karate_club_graph()
coms = algorithms.angel(g, threshold=4, min_community_size=10)
print(coms.method_name)
print(coms.method_parameters) # Clustering parameters)
print(coms.communities)
print(coms.overlap)
print(coms.node_coverage)
Output:
ANGEL
{'threshold': 4, 'min_community_size': 10}
[[14, 15, 18, 20, 22, 23, 27, 29, 30, 31, 32, 8], [1, 12, 13, 17, 19, 2, 21, 3, 7, 8], [14, 15, 18, 2, 20, 22, 30, 31, 33, 8]]
True
0.6470588235294118
From the communities returned, I understand 1 and 3 have an overlap of 4 nodes but 2 and 3 or 1 and 3 don't have an overlap size of 4 nodes.
It is not clear to me how the overlap threshold (4 overlaps) has to be specified
here algorithms. angel(g, threshold=4, min_community_size=10). I tried setting threshold=4 here to define an overlap size of 4 nodes. However, from the documentation available for angel
:param threshold: merging threshold in [0,1].
I am not sure how to translate the 4 overlaps to the value that has to be set between the bounds [0, 1]. Suggestions will be really helpful.
You can check out CDLIB:
They have a great amount of community finding algorithms applicable to networkX, including some overlapping communities algorithms.
The return type of the functions is called
Node Clusteringwhich might be a little confusing at first so here are the methods applicable to it, usually you simply want to convert to a Python dictionary.Specifically about the
angelalgorithm in CDLIB:According to ANGEL: efficient, and effective, node-centric community discovery in static and dynamic networks, the threshold is not the overlapping threshold, but used as follows:
Basically, this value determines whether to further merge the nodes into bigger communities, and is not equivalent to the number of overlapping nodes.
Also, don't mistake "labels" with "node's labels" (as in
nx.relabel_nodes(G, labels)). The "labels" referred are actually correlated with the Label Propagation Algorithm which is used by ANGEL.As for the effects of varying this threshold:
[based on the comment by @J. M. Arnold]
From ANGEL's github repository you can see that when
threshold >= 1only themin_comsizevalue is used: