Databricks Connect: spark-shell not found


I installed databricks-connect already:

Requirement already satisfied: databricks-connect==13.3.3 in /usr/local/lib/python3.10/dist-packages (13.3.3)
Requirement already satisfied: py4j==0.10.9.7 in /usr/local/lib/python3.10/dist-packages (from databricks-connect==13.3.3) (0.10.9.7)
Requirement already satisfied: six in /usr/lib/python3/dist-packages (from databricks-connect==13.3.3) (1.14.0)
Requirement already satisfied: pandas>=1.0.5 in /usr/local/lib/python3.10/dist-packages (from databricks-connect==13.3.3) (2.2.0)
Requirement already satisfied: pyarrow>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from databricks-connect==13.3.3) (15.0.0)
Requirement already satisfied: grpcio>=1.48.1 in /usr/local/lib/python3.10/dist-packages (from databricks-connect==13.3.3) (1.60.1)
Requirement already satisfied: grpcio-status>=1.48.1 in /usr/local/lib/python3.10/dist-packages (from databricks-connect==13.3.3) (1.60.1)
Requirement already satisfied: googleapis-common-protos>=1.56.4 in /usr/local/lib/python3.10/dist-packages (from databricks-connect==13.3.3) (1.62.0)
Requirement already satisfied: numpy>=1.15 in /usr/local/lib/python3.10/dist-packages (from databricks-connect==13.3.3) (1.26.4)
Requirement already satisfied: databricks-sdk>=0.1.11 in /usr/local/lib/python3.10/dist-packages (from databricks-connect==13.3.3) (0.19.0)
Requirement already satisfied: google-auth~=2.0 in /usr/local/lib/python3.10/dist-packages (from databricks-sdk>=0.1.11->databricks-connect==13.3.3) (2.27.0)
Requirement already satisfied: requests<3,>=2.28.1 in /usr/local/lib/python3.10/dist-packages (from databricks-sdk>=0.1.11->databricks-connect==13.3.3) (2.28.2)
Requirement already satisfied: protobuf!=3.20.0,!=3.20.1,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0.dev0,>=3.19.5 in /usr/local/lib/python3.10/dist-packages (from googleapis-common-protos>=1.56.4->databricks-connect==13.3.3) (4.25.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas>=1.0.5->databricks-connect==13.3.3) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas>=1.0.5->databricks-connect==13.3.3) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.10/dist-packages (from pandas>=1.0.5->databricks-connect==13.3.3) (2024.1)
Requirement already satisfied: cachetools<6.0,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from google-auth~=2.0->databricks-sdk>=0.1.11->databricks-connect==13.3.3) (5.3.2)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/lib/python3/dist-packages (from google-auth~=2.0->databricks-sdk>=0.1.11->databricks-connect==13.3.3) (0.2.1)
Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.10/dist-packages (from google-auth~=2.0->databricks-sdk>=0.1.11->databricks-connect==13.3.3) (4.9)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2.28.1->databricks-sdk>=0.1.11->databricks-connect==13.3.3) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests<3,>=2.28.1->databricks-sdk>=0.1.11->databricks-connect==13.3.3) (2.8)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/lib/python3/dist-packages (from requests<3,>=2.28.1->databricks-sdk>=0.1.11->databricks-connect==13.3.3) (1.25.8)
Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests<3,>=2.28.1->databricks-sdk>=0.1.11->databricks-connect==13.3.3) (2019.11.28)
Requirement already satisfied: pyasn1>=0.1.3 in /usr/lib/python3/dist-packages (from rsa<5,>=3.1.4->google-auth~=2.0->databricks-sdk>=0.1.11->databricks-connect==13.3.3) (0.4.2)
user@v00000Y:~$ 

Now, when I run databricks-connect test to connect to a remote cluster, I always get the following error:

x@vx:~$ databricks-connect test
* PySpark is installed at /usr/local/lib/python3.10/dist-packages/pyspark
* Checking SPARK_HOME
* Checking java version
openjdk version "1.8.0_392"
OpenJDK Runtime Environment (build 1.8.0_392-8u392-ga-1~20.04-b08)
OpenJDK 64-Bit Server VM (build 25.392-b08, mixed mode)
* Testing scala command
/bin/sh: 10: spark-shell: not found

Traceback (most recent call last):
  File "/usr/local/bin/databricks-connect", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/pyspark/databricks_connect.py", line 311, in main
    test()
  File "/usr/local/lib/python3.10/dist-packages/pyspark/databricks_connect.py", line 267, in test
    raise ValueError("Scala command failed to produce correct result")
ValueError: Scala command failed to produce correct result

What is missing, and how can I install spark-shell?

The documentation states:

If you can’t run commands like spark-shell, it is also possible your PATH was not automatically set up by pip3 install and you’ll need to add the installation bin dir to your PATH manually.

This is probably the issue, but where is the bin directory located by default?
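
If the bundled pyspark still ships a bin directory (an assumption on my side; see the first edit below), its location can be printed with a short Python snippet. The directory under the pyspark package root is the one that would need to be appended to PATH:

import os
import pyspark

# Print the bin directory bundled with the pip-installed pyspark; appending
# this directory to PATH should make spark-shell resolvable from the shell.
print(os.path.join(os.path.dirname(pyspark.__file__), "bin"))

Given the install location reported above, that would be /usr/local/lib/python3.10/dist-packages/pyspark/bin, so something like export PATH="$PATH:/usr/local/lib/python3.10/dist-packages/pyspark/bin" in ~/.bashrc.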

Edit: I used the same setup once with databricks-connect==11, which works, and once with databricks-connect==13, which doesn't. So it seems spark-shell is no longer included? If so, what is the recommended way to include it?

Edit: I found out that pyspark is working! The pyspark command, which uses the .databrickscfg file with almost the same auth information, works. However, databricks-connect test still fails. Has the default port (15001) been changed? I didn't see anything in the documentation.
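
As a sanity check, this minimal Spark Connect round trip (a sketch, assuming the DEFAULT profile in ~/.databrickscfg carries host, token, and cluster_id) should exercise the same connection path that the pyspark command uses:

from databricks.connect import DatabricksSession

# Build a Spark Connect session against the remote cluster; by default the
# builder picks up host/token/cluster_id from the DEFAULT profile in
# ~/.databrickscfg (assumption: that profile is fully configured).
spark = DatabricksSession.builder.getOrCreate()

# Trivial round trip to confirm the cluster answers.
print(spark.range(10).count())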
