What exactly does the spark.ml.feature.BucketedRandomProjectionLSH function give as output?

18 views Asked by PRATIK CHAPADGAONKAR At 17 January 2024 at 10:56

I am trying to use spark.ml.feature.BucketedRandomProjectionLSH for creating LSH hash vectors.

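import org.apache.spark.ml.feature.BucketedRandomProjectionLSH

// Configure the LSH estimator: 4 hash tables, reading "features" and writing "hashes"
val lsh = new BucketedRandomProjectionLSH()
  .setBucketLength(0.6812920690579612)
  .setNumHashTables(4)
  .setInputCol("features")
  .setOutputCol("hashes")
// Fit on the repo vectors
val lshModel = lsh.fit(repoDF)

// Add the "hashes" column to both DataFrames
val hashedUserDF = lshModel.transform(userDF)
val hashedRepoDF = lshModel.transform(repoDF)
hashedRepoDF.show(false)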

This gives me the following output:

// +-------+----------------------------------------------+--------------------------------+
// |repo_id|features                                      |hashes                          |
// +-------+----------------------------------------------+--------------------------------+
// |11     |(6,[0,1,2,3,4,5],[1.0,1.0,1.0,1.0,1.0,1.0])   |[[1.0], [-2.0], [-1.0], [-1.0]] |
// |12     |(6,[0,1,2,3,4,5],[9.0,-2.0,-21.0,9.0,1.0,9.0])|[[21.0], [-28.0], [18.0], [0.0]]|
// |13     |(6,[0,1,2,3,4,5],[1.0,1.0,-3.0,3.0,7.0,9.0])  |[[4.0], [-10.0], [6.0], [-3.0]] |
// |14     |(6,[0,1,2],[1.0,1.0,-3.0])                    |[[2.0], [-3.0], [2.0], [1.0]]   |
// |15     |(6,[1,2],[1.0,1.0])                           |[[-1.0], [0.0], [-2.0], [0.0]]  |
// +-------+----------------------------------------------+--------------------------------+

Based on my theoretical understanding of Bucketed Random Projection LSH, I was under the impression that the hash value is supposed to be a vector/array consisting only of 1s and -1s, depending on which side of the hyperplane the point lies.

The documentation for this class (https://spark.apache.org/docs/3.1.1/api/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSH.html) is rather sparse and doesn't really explain the output. Can anyone help me understand the output, or point me to a source that explains it?
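
For reference, as far as I can tell from the LSH section of the Spark MLlib guide, BucketedRandomProjectionLSH targets Euclidean distance and defines each hash as h(x) = floor((x · v) / r), where v is a random unit vector and r is the bucket length. If that reading is right, each entry in "hashes" is an integer-valued bucket index (one per hash table), not a +1/-1 sign; the sign-based scheme belongs to random-hyperplane LSH for cosine similarity. Below is a minimal plain-Scala sketch of that formula. It is not Spark's source code: the object name, the helper names, and the example vectors are all hypothetical and chosen only for illustration.

```scala
// Illustrative sketch of h(x) = floor((x . v) / bucketLength).
// NOT Spark's implementation; all names and values here are hypothetical.
object BucketedProjectionSketch {
  // Dot product of two dense vectors of equal length.
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // One hash value: the index of the bucket that the projection of x onto v falls into.
  def hash(x: Array[Double], v: Array[Double], bucketLength: Double): Double =
    math.floor(dot(x, v) / bucketLength)

  // One hash value per projection vector, analogous to one entry per hash table.
  def hashes(x: Array[Double], vs: Seq[Array[Double]], bucketLength: Double): Seq[Double] =
    vs.map(v => hash(x, v, bucketLength))
}

// Example with two hand-picked unit vectors instead of random ones:
//   x = (3, 5), bucketLength = 2
//   v1 = (1, 0)  ->  3 / 2 =  1.5  -> floor ->  1.0
//   v2 = (0, -1) -> -5 / 2 = -2.5  -> floor -> -3.0
val example = BucketedProjectionSketch.hashes(
  Array(3.0, 5.0),
  Seq(Array(1.0, 0.0), Array(0.0, -1.0)),
  2.0
)
```

Under this reading, a smaller bucket length spreads points over more buckets, so the magnitudes in the table above (for example 21.0 and -28.0 for repo 12, whose feature vector has a large norm) would simply reflect how far the projection lies from the origin in units of the bucket length.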
