Understanding why the leiden algorithm is not able to find communities for the iris dataset

Question

395 views Asked by Carles At 17 May 2023 at 09:12

Good morning,

I am trying to understand the Leiden algorithm and how to use it to find partitions and clusters. The example in the documentation finds a partition directly:

import leidenalg as la
import igraph as ig

G = ig.Graph.Famous('Zachary')

partition = la.find_partition(G, la.ModularityVertexPartition)
G.vs['cluster'] = partition.membership
ig.plot(partition,vertex_size = 30)

Checking partition.membership shows that it already finds 4 clusters.

However, when I try to do something similar with the iris dataset, the algorithm does not find any clusters. I have tried taking the X variables and building either:

1- a correlation matrix, or
2- a pairwise distance matrix,

but neither works well (not even after scaling the values): the algorithm does not group the observations into clusters. I assume that correlations or pairwise distances are not good at separating them. What am I misunderstanding?

Here is the code (the correlation-matrix variant is commented out; the pairwise-distance variant is active):

import numpy as np
from sklearn import datasets
import igraph as ig
import leidenalg
import cairo
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import pairwise_distances
# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data  # Features
y = iris.target  # Class labels
# Create an adjacency matrix based on observation similarity
# adj_matrix = abs(1-np.corrcoef(X))

adj_matrix = pairwise_distances(X)
print(adj_matrix)
# Create an igraph graph object
graph = ig.Graph.Weighted_Adjacency(adj_matrix)
# Apply the Leiden algorithm for community detection evaluating the nº of clusters created by changing the resolution parameter.
for i in np.arange(0.9,1.05,0.05):
    partition = leidenalg.find_partition(graph, leidenalg.CPMVertexPartition,
                                   resolution_parameter = i)
    print(i,len(np.unique(partition.membership)) )

#0.9 1
#0.9500000000000001 1
#1.0 150
#1.0500000000000003 150

As one can see, once the resolution reaches 1 there are 150 clusters (equal to the number of observations), and below that everything is placed in a single cluster. Let me know your ideas.

Thank you for your time.

**Vincent Traag** · Answer 1 · 2023-12-06T20:01:57+00:00

Make sure to pass in the weights to find_partition. See the documentation for more detail.

With correlations I highly recommend using CPM, not Modularity.
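
To see why the weights matter for CPM, here is a small pure-Python sketch of the CPM quality for an undirected graph with unit node sizes: the sum of (w_ij - resolution) over all pairs in the same community. The helper name cpm_quality and the toy matrices are hypothetical and only illustrate the idea; leidenalg's own implementation is not shown here.

```python
from itertools import combinations

def cpm_quality(weights, membership, resolution):
    """Sum of (w_ij - resolution) over all pairs i < j in the same community."""
    return sum(
        weights[i][j] - resolution
        for i, j in combinations(range(len(membership)), 2)
        if membership[i] == membership[j]
    )

n = 4
one_cluster = [0] * n
singletons = list(range(n))

# Complete graph where every edge counts as 1: what the algorithm
# effectively sees when a dense matrix is used and weights are ignored.
unweighted = [[0 if i == j else 1 for j in range(n)] for i in range(n)]
q_low = cpm_quality(unweighted, one_cluster, 0.95)   # positive: merge everything
q_high = cpm_quality(unweighted, one_cluster, 1.05)  # negative: merging hurts
q_single = cpm_quality(unweighted, singletons, 1.05) # zero: singletons win

# Toy correlation-like weights: two tight pairs, weakly linked to each other.
weighted = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.9],
    [0.1, 0.1, 0.9, 0.0],
]
two_clusters = [0, 0, 1, 1]
q_two = cpm_quality(weighted, two_clusters, 0.5)
q_one = cpm_quality(weighted, one_cluster, 0.5)
q_none = cpm_quality(weighted, singletons, 0.5)
```

In the unweighted case the best partition flips from one cluster to all singletons as the resolution crosses 1, which matches the output in the question. With real weights, a resolution between the weak and strong correlations favours the two-cluster partition.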
