sklearn KDTree giving incorrect output for nearest neighbours


I have two datasets containing points. I want to match the points to each other so that I can plot them with a line joining each pair. Using a KDTree seemed like the best way to find the nearest neighbour in each list, and it worked for some of my data but not for others.
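For reference, this is roughly the pairing pattern intended, sketched on small made-up arrays (the names A/B and the toy coordinates are placeholders, not the question's real data). The key detail is that `query(B)` on a tree built from `A` returns indices into `A`, one row per point of `B`:

```python
import numpy as np
from sklearn.neighbors import KDTree

# Toy stand-ins for the two point sets (hypothetical data).
A = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])   # e.g. centres
B = np.array([[0.1, 0.2], [9.8, 0.1], [5.2, 4.9]])    # e.g. tops

# Build the tree on A and query it with B:
# ind[i] is the row of A nearest to B[i] (not a row of B).
tree = KDTree(A, metric='euclidean')
dist, ind = tree.query(B)          # both have shape (len(B), 1)

# One (start, end) segment per pair, ready for LineCollection.
pairs = np.stack([B, A[ind[:, 0]]], axis=1)   # shape (len(B), 2, 2)
```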

[Image: example of how the plots should look]

Some of the plots come out looking like this instead, with lines connecting the wrong points even though it's clear there are closer points available:

[Image: example of a plot with lines connecting the wrong points]

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from sklearn.neighbors import KDTree

A = centres['arr_13b'].to_numpy()
B = tops['arr_13b'].to_numpy()
tree = KDTree(A, metric='euclidean')
dist_, ind_ = tree.query(B)

# Build one (start, end) segment per pair for LineCollection
coords = np.zeros((len(A), 2, 2))
for i, match in enumerate(coords):
    match[0] = A[i]
    match[1] = B[ind_[i]]

lines = LineCollection(coords, color='red')
print(coords)
print(ind_)
print(A)
print(B)

fig, ax = plt.subplots(dpi=200)
plt.scatter(centres['arr_13b']['X'], centres['arr_13b']['Y'], s=50, color='deepskyblue')
plt.scatter(tops['arr_13b']['X'], tops['arr_13b']['Y'], s=10, color='dodgerblue')

ax.add_artist(lines)
plt.show()

I originally assumed that the way I assigned coordinate values to the matrix was getting mixed up and causing the wrong points to be connected, but after looking at the output from KDTree.query() it appears that it's actually identifying the wrong points as nearest neighbours. Am I doing something wrong, or is there a better way to achieve what I want?
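One thing worth checking here is what KDTree.query() actually returns. A minimal sketch (the two-point arrays below are hypothetical) showing that the indices refer back to the array the tree was built on, and that the result has one row per query point with an extra k dimension:

```python
import numpy as np
from sklearn.neighbors import KDTree

# Hypothetical minimal data with one obvious nearest neighbour.
A = np.array([[0.0, 0.0], [100.0, 100.0]])   # tree data
B = np.array([[99.0, 99.0]])                 # query points

tree = KDTree(A, metric='euclidean')
dist_, ind_ = tree.query(B)

# query(B) returns one row per point of B, and ind_ indexes into A
# (the data the tree was built on), not into B.
print(ind_.shape)       # (len(B), k) with k=1 by default
print(A[ind_[0, 0]])    # the point in A nearest to B[0]
```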


There are 0 answers