I am trying to run Apache Drill in distributed mode on Google Cloud Dataproc, but I am unable to start the drillbit on each node in the cluster.
I created a basic cluster (1 master, 2 workers) with the GCP Dataproc service, using the initialization scripts and instructions provided on the Apache Drill website:
Installing Drill in Distributed Mode in Dataproc
The setup script was configured with Apache Drill 1.19.0 and Apache ZooKeeper 3.6.3. The cluster was provisioned successfully in Dataproc, and I can connect to each node over SSH. When I check the status of ZooKeeper by running telnet localhost 2181 and entering stats, it shows the following:
I then tried to start the drillbit service on each node with the command bin/drillbit.sh start, as described in Starting Drill in Distributed Mode, which prints:
Starting drillbit, logging to /opt/drill/log/drillbit.out
When I check the status of Drill using bin/drillbit.sh status, it displays:
/opt/drill/drillbit.pid file is present but drillbit is not running.
Please help me resolve this issue and set up Apache Drill in distributed mode.
I don't know Dataproc, but the contributed scripts you're using, specifically automation.sh and apache-drill.sh, already contain commands to start ZooKeeper and Drill, so you shouldn't be using drillbit.sh to start the Drillbits yourself. You can check whether Drill is running by going to its web UI at http://[drillbit-host]:8047. Note that there is no master node in a Drill cluster, so you can use the host of any Drillbit in the web UI URL.
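If opening the web UI in a browser is awkward from inside the cluster, a quick port probe gives the same signal. The following is only a rough sketch, not part of the Drill scripts: the helper names port_open and check_cluster are made up for illustration, and it assumes the ports mentioned in this thread (2181 for ZooKeeper, 8047 for the Drill web UI). It only tests that something accepts TCP connections on those ports, not that the service behind them is healthy.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_cluster(hosts, ports=(2181, 8047)):
    """Map each host to {port: reachable}.

    Assumed defaults: 2181 (ZooKeeper client port) and 8047 (Drill web UI),
    as used in the question and answer above.
    """
    return {host: {port: port_open(host, port) for port in ports} for host in hosts}
```

Calling check_cluster with a list of your node host names (or just ["localhost"] when logged in to a node over SSH) returns a dictionary you can print. A closed 8047 on a node suggests the Drillbit there is not up, in which case /opt/drill/log/drillbit.out is the next place to look. A closed 2181 may simply mean ZooKeeper is not installed on that particular node.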
Footnote: Drill has moved on a bit since 1.19 so you might try making the following change on line 10 of apache-drill.sh.