Clustering lat/long data points that are very close to each other


I have a fairly small dataset (only 138 observations), and the points are all very close to each other geographically. I would like to create clusters from this data.

The requirements for my clusters would be:

  • hopefully between 10 and 12 clusters overall, with around 10-12 locations in each
  • locations within a geographical distance of, say, 5-10 km of each other
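To check a clustering against these two requirements, here is a minimal sketch of a helper (the name `cluster_report` is hypothetical) that reports each cluster's size and diameter, taken as the largest great-circle distance between two of its members. It assumes scikit-learn's `haversine_distances`, which expects `[lat, long]` pairs in radians and returns distances in radians.

```python
import numpy as np
from sklearn.metrics.pairwise import haversine_distances

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, same constant as the question's code


def cluster_report(latlon_deg, labels):
    """Return {label: (size, diameter_km)} for every non-noise cluster.

    latlon_deg: array of shape (n, 2) with (lat, long) in degrees.
    labels: one cluster label per point; -1 is treated as noise and skipped.
    """
    latlon_rad = np.radians(np.asarray(latlon_deg, dtype=float))
    labels = np.asarray(labels)
    report = {}
    for label in sorted(set(labels.tolist())):
        if label == -1:
            continue
        members = latlon_rad[labels == label]
        # pairwise great-circle distances in km; the maximum is the cluster diameter
        dists = haversine_distances(members) * EARTH_RADIUS_KM
        report[label] = (len(members), float(dists.max()))
    return report
```

For example, `cluster_report(coords1, cluster_labels)` after running the DBSCAN code below lists every cluster with its size and diameter in km.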

I have tried two techniques:

KMeans with the number of clusters set to 12. This gave decent results, but I also know that KMeans is not right for lat/long data, since it assumes Euclidean distances.
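One reason KMeans looks wrong here is that it minimizes squared Euclidean distance, and at about 33°N a degree of longitude is only about 0.84 times as long as a degree of latitude, so raw degrees distort distances. Over an area only a few km across, a rough fix is to project the points to local x/y kilometres first. This sketch assumes an equirectangular projection around the mean latitude; `project_to_km` and `kmeans_km` are hypothetical helper names, and the projection is an approximation that degrades over large areas.

```python
import numpy as np
from sklearn.cluster import KMeans

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius in km


def project_to_km(latlon_deg):
    """Equirectangular projection around the data's mean latitude.

    Returns an (n, 2) array of (x, y) in km. Adequate over a few km only.
    """
    latlon = np.asarray(latlon_deg, dtype=float)
    lat = np.radians(latlon[:, 0])
    lon = np.radians(latlon[:, 1])
    lat0 = lat.mean()
    x = EARTH_RADIUS_KM * (lon - lon.mean()) * np.cos(lat0)  # east-west km
    y = EARTH_RADIUS_KM * (lat - lat.mean())                 # north-south km
    return np.column_stack([x, y])


def kmeans_km(latlon_deg, n_clusters=12, seed=0):
    """Run KMeans on projected km coordinates instead of raw degrees."""
    xy = project_to_km(latlon_deg)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(xy)
```

`kmeans_km(coords1, n_clusters=12)` would then return one label per row. KMeans still does not bound cluster diameter or size; projecting only makes the distances it minimizes meaningful.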

I am also trying DBSCAN, but I think I am hitting an issue where the dataset is simply too small: it outputs only one cluster for all of my observations.
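Before concluding that the dataset is too small, it may be worth comparing `eps` with the extent of the data. By my reading of the posted coordinates, the points fit in a box roughly 7.4 km north-south by 3.4 km east-west, so with `eps` = 5 km a large fraction of the points are neighbours of each other and DBSCAN is likely to chain them all into one cluster. The hypothetical helper below reports the largest pairwise distance and the mean number of neighbours within a given radius, counting the point itself as DBSCAN's `min_samples` does.

```python
import numpy as np
from sklearn.metrics.pairwise import haversine_distances

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, as in the question's code


def neighbour_stats_km(latlon_deg, radius_km):
    """Return (largest pairwise distance in km, mean neighbour count).

    Neighbour counts include the point itself, like DBSCAN's min_samples.
    """
    latlon_rad = np.radians(np.asarray(latlon_deg, dtype=float))
    d_km = haversine_distances(latlon_rad) * EARTH_RADIUS_KM
    counts = (d_km <= radius_km).sum(axis=1)
    return float(d_km.max()), float(counts.mean())
```

Comparing `neighbour_stats_km(coords1, 5)` with a much smaller radius shows how far `eps` has to drop before points stop being mutual neighbours; a mean count close to the number of points means the radius is too large to separate anything.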

In this paper, I see it mentioned that DBSCAN gives nonsensical results on such a small dataset.

I want to create these clusters with a better approach than KMeans, and I want to know whether I am missing something in my code or whether there is a better approach to take here.

from sklearn.cluster import DBSCAN
from geopy.distance import great_circle
from shapely.geometry import MultiPoint
import pandas as pd
import numpy as np

coords = pd.read_csv("locs.csv")  # two columns: lat, long (in degrees)
coords1 = coords.to_numpy()

kms_per_radian = 6371.0088  # mean Earth radius in km
max_distance = 5  # neighbourhood radius in km
epsilon = max_distance / kms_per_radian  # the haversine metric works in radians

# sklearn's haversine metric expects [lat, long] pairs in radians
db = DBSCAN(eps=epsilon, min_samples=5, algorithm='ball_tree', metric='haversine').fit(np.radians(coords1))
cluster_labels = db.labels_

# label -1 marks noise points, so don't count it as a cluster
num_clusters = len(set(cluster_labels)) - (1 if -1 in cluster_labels else 0)
clusters = pd.Series([coords[cluster_labels == n] for n in range(num_clusters)])
print('Number of clusters: {}'.format(num_clusters))
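Since the single cluster likely comes from `eps` rather than from the dataset size, one way to investigate is to rerun the same haversine DBSCAN over a range of `eps` values and watch the number of clusters and noise points. This is a sketch reusing the question's parameters; `dbscan_sweep` is a hypothetical helper name.

```python
import numpy as np
from sklearn.cluster import DBSCAN

KMS_PER_RADIAN = 6371.0088  # mean Earth radius in km, as in the question


def dbscan_sweep(latlon_deg, eps_values_km, min_samples=5):
    """Run haversine DBSCAN for several eps values (in km).

    Returns {eps_km: (n_clusters, n_noise)}, excluding noise from the cluster count.
    """
    X = np.radians(np.asarray(latlon_deg, dtype=float))
    results = {}
    for eps_km in eps_values_km:
        labels = DBSCAN(eps=eps_km / KMS_PER_RADIAN, min_samples=min_samples,
                        algorithm="ball_tree", metric="haversine").fit(X).labels_
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        results[eps_km] = (n_clusters, int((labels == -1).sum()))
    return results
```

For example, `dbscan_sweep(coords1, [0.25, 0.5, 0.75, 1.0, 5.0])`; these `eps` values are arbitrary starting points, not tuned results. DBSCAN does not let you set the number of clusters directly, so expect some trial and error to land near 10-12.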

Here is my data (sorry if it shows up sloppily). It gets read in from locs.csv:

lat long
33.1176813  -96.6889152
33.1311714  -96.6673619
33.0960563  -96.6961237
33.1453321  -96.6842826
33.1391395  -96.6758389
33.0971504  -96.6870497
33.1262443  -96.6878928
33.1143839  -96.6945359
33.114209   -96.6945359
33.1149433  -96.6927684
33.109267   -96.6937003
33.1331405  -96.6666864
33.1383939  -96.6680508
33.1472743  -96.688812
33.1057274  -96.680582
33.1368991  -96.6796421
33.107386   -96.678968
33.150424   -96.6965207
33.1279273  -96.6806465
33.1087588  -96.6876371
33.1262534  -96.6742925
33.098562   -96.6853026
33.1125942  -96.6883732
33.1102927  -96.6961864
33.1035043  -96.6955179
33.1243762  -96.6788757
33.1291342  -96.677637
33.1156607  -96.6858248
33.093478   -96.6997758
33.1260525  -96.6890209
33.0916379  -96.6965207
33.1283669  -96.6724685
33.1033839  -96.6874843
33.1483239  -96.6828948
33.157496   -96.684684
33.0956047  -96.6976071
33.0956047  -96.6976071
33.1057274  -96.680582
33.1574837  -96.6850944
33.1582715  -96.6965416
33.1386326  -96.6783054
33.158112   -96.684861
33.138768   -96.675811
33.098562   -96.6853026
33.1030668  -96.6955179
33.107386   -96.678968
33.1379475  -96.6698679
33.1117192  -96.6878473
33.0926369  -96.691028
33.092519   -96.6948466
33.1542432  -96.6823992
33.1542432  -96.6823992
33.1330701  -96.6743789
33.1309656  -96.6879346
33.1469985  -96.6874333
33.1454754  -96.6839861
33.1261935  -96.6725202
33.1566998  -96.6841281
33.1566998  -96.6841281
33.107347   -96.6921094
33.107147   -96.6817192
33.1081097  -96.6880473
33.1243427  -96.6773343
33.1294931  -96.685219
33.1089024  -96.6894143
33.1348689  -96.6686227
33.125196   -96.6825663
33.1239856  -96.6892297
33.1549715  -96.6965207
33.1033242  -96.6887841
33.098562   -96.6853026
33.1360933  -96.67346
33.1081031  -96.6886583
33.1552268  -96.69299
33.1323984  -96.6658496
33.1262448  -96.6740307
33.1257552  -96.6869133
33.1257552  -96.6869133
33.1143839  -96.6945359
33.126066   -96.692029
33.1374841  -96.6677798
33.1405272  -96.6739612
33.1129799  -96.6893467
33.1320952  -96.6811459
33.1239267  -96.6899191
33.1252 -96.6837151
33.1033242  -96.6887841
33.1123512  -96.6774962
33.103736   -96.6867271
33.094192   -96.7010288
33.1392404  -96.6783889
33.1527167  -96.683548
33.1129669  -96.6826289
33.1193324  -96.6887284
33.1519071  -96.6921126
33.1358899  -96.6758085
33.1358899  -96.6758085
33.0987211  -96.6966042
33.1226774  -96.6762474
33.0968079  -96.6967923
33.1393173  -96.6642923
33.108696   -96.6970012
33.1078495  -96.6846551
33.1243762  -96.6788757
33.1243762  -96.6788757
33.1518825  -96.6811668
33.1239576  -96.6921962
33.1483239  -96.6828948
33.1125542  -96.6792695
33.1517525  -96.6835898
33.1145307  -96.6889703
33.1235157  -96.6751516
33.1549715  -96.6965207
33.1041763  -96.6868112
33.1352762  -96.6691161
33.1311522  -96.6726246
33.1526416  -96.6857622
33.1391395  -96.6758389
33.114209   -96.6945359
33.1343714  -96.6649288
33.1243142  -96.6937003
33.1343154  -96.6788902
33.0914204  -96.6888572
33.1087365  -96.6946195
33.1087365  -96.6946195
33.1123452  -96.6715541
33.1453321  -96.6842826
33.1573651  -96.6865559
33.1302622  -96.6730521
33.1549715  -96.6965207
33.1266383  -96.6807078
33.1091322  -96.6884788
33.114369   -96.6825872
33.1512367  -96.68994
33.126426   -96.6910263
33.1117173  -96.6940972
33.1117173  -96.6940972
33.1061857  -96.6810057
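The data as posted is whitespace-separated, while `pd.read_csv` splits on commas by default. If `locs.csv` really looks like the listing above, a minimal sketch of loading it passes `sep=r"\s+"`; the inline sample here just stands in for the file. If the actual file is comma-separated, the default works as in the question's code.

```python
import io

import pandas as pd

# A few rows copied from the listing above, standing in for locs.csv
sample = """lat long
33.1176813  -96.6889152
33.1311714  -96.6673619
33.1252 -96.6837151
"""

# sep=r"\s+" treats any run of spaces or tabs as a single separator
coords = pd.read_csv(io.StringIO(sample), sep=r"\s+")
coords1 = coords[["lat", "long"]].to_numpy()  # (lat, long) order for haversine
print(coords1.shape)
```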