Dynamic allocation of Spark 2 cluster resources to running jobs


We have a Spark 2 HDInsight cluster with 650 GB of memory and 195 vcores, spread across 9 worker nodes and 2 head nodes. The problem is that jobs are not fully utilizing the cluster. For example, when I run one job, it uses only 164 GB of memory out of the available 650 GB. I worked around this by increasing spark.executor.memory from 10 GB to 40 GB, with spark.executor.instances set to 16. But the problem returns when I run multiple jobs: the job that arrives first uses the entire cluster until it finishes, while the other jobs sit in running mode with only 3 GB of memory. The requirement is that the cluster be fully utilized when only one job is running, and that the resources (RAM and vcores) be shared among the jobs when there are several.
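For reference, this is roughly how a job is submitted with the settings above (a minimal sketch; the jar name is a placeholder):

```bash
# Minimal sketch of the current submit (my_job.jar is a placeholder).
# 16 executors x 40 GB = ~640 GB requested before executor memory overhead,
# i.e. effectively the whole 650 GB cluster for a single job.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.executor.instances=16 \
  --conf spark.executor.memory=40g \
  my_job.jar
```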


There is 1 answer

Answered by Matt Andruff

I suggest that you change your YARN scheduler to the Capacity Scheduler, which is better at sharing resources between jobs; by default Hadoop schedules first in, first out, which is why the first job holds the cluster until it finishes. I respectfully disagree with chasing 100% utilization of HDInsight. Your fix of increasing executors to 40 GB is exactly why a new job stuck at 3 GB can't get into your cluster (and just because you allocate 40 GB does not mean your job will actually use it). If you want to increase cluster usage, you might consider adding more executors with fewer threads each, so work is spread across the cluster. That, in combination with the Capacity Scheduler and preemption, might be the answer to getting more performance and flexibility.
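As a rough starting point, a capacity-scheduler.xml along these lines defines two queues that can each borrow the whole cluster when it is otherwise idle (queue names and percentages are illustrative, not from your setup; preemption itself is switched on separately via yarn.resourcemanager.scheduler.monitor.enable in yarn-site.xml, typically through Ambari on HDInsight):

```xml
<!-- Sketch only: two queues, each guaranteed 50% of the cluster but
     allowed to grow to 100% when the other queue is idle. Preemption
     then reclaims borrowed capacity when a late job arrives. -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>jobs1,jobs2</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.jobs1.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.jobs1.maximum-capacity</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.jobs2.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.jobs2.maximum-capacity</name>
    <value>100</value>
  </property>
</configuration>
```

With something like this in place, submit each job to its own queue (for example `--queue jobs1` on spark-submit) so the scheduler actually has something to balance.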