Why is the Hadoop job slower in cloud (with multi-node clustering) than on normal pc?

Question

Asked by santobedi on 06 September 2017 at 12:37

I am using Cloud Dataproc as a cloud service for my research. Running Hadoop and Spark jobs on this platform (cloud) is a bit slower than running the same jobs on a lower-capacity virtual machine. I am running my Hadoop job on a 3-node cluster in the cloud (each node with 7.5 GB RAM and a 50 GB disk), which took 4 min 49 s, while the same job took 3 min 20 s on a single-node virtual machine (my PC) with 3 GB RAM and a 27 GB disk. Why is the job slower in the cloud with multi-node clustering than on a normal PC?
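To put the numbers from the question side by side, here is a small calculation. It only uses the figures stated above; summing RAM over all three nodes is an upper bound, since if the 3 nodes are one master plus two workers (an assumption, the question does not say), YARN would typically schedule containers only on the two workers.

```python
# Figures from the question: 3 nodes x 7.5 GB on Dataproc vs. one 3 GB VM.
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert an m:ss wall-clock time to seconds."""
    return minutes * 60 + seconds

cloud_s = to_seconds(4, 49)  # 3-node Dataproc cluster
local_s = to_seconds(3, 20)  # single-node VM on the PC
slowdown = cloud_s / local_s

cloud_ram_gb = 3 * 7.5  # raw RAM summed over all three nodes (upper bound)
local_ram_gb = 3.0

print(f"cloud {cloud_s} s vs local {local_s} s -> {slowdown:.3f}x the local time")
print(f"raw RAM: {cloud_ram_gb} GB vs {local_ram_gb} GB ({cloud_ram_gb / local_ram_gb:.1f}x)")
```

So the cluster has 7.5 times the raw RAM yet takes about 1.45 times as long, which suggests the extra capacity is either not being used or is outweighed by overhead.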

Original Q&A

There are 2 answers

kf2 On 06 September 2017 at 15:53

To be a bit more detailed, here are the numbers and facts that would help find the reason for the "slower" cloud environment:

Job type and size:
- size of the data (1 MB or 1 TB?)
- data format (XML, Parquet, ...)
- what kind of processing (e.g. word count, format conversion, ML, ...), and of course the options (executors and driver) for your spark-submit or spark-shell

Hadoop configuration:
- do you use a distribution (Hortonworks or Cloudera)?
- Spark standalone or in YARN mode?
- how are the NodeManagers configured?
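As an illustration of the "options (executors and driver)" point, the sketch below builds a spark-submit command line that states resources explicitly instead of relying on defaults. The flags (`--master`, `--num-executors`, `--executor-cores`, `--executor-memory`, `--driver-memory`) are standard spark-submit options; the script name `wordcount.py`, the input/output paths, and all sizes are hypothetical and would need to match your cluster.

```python
import shlex
from typing import List, Optional

def spark_submit_cmd(app: str, *, num_executors: int, executor_cores: int,
                     executor_memory: str, driver_memory: str,
                     master: str = "yarn",
                     app_args: Optional[List[str]] = None) -> List[str]:
    """Build a spark-submit argument list that states resources explicitly."""
    cmd = [
        "spark-submit",
        "--master", master,                     # "yarn" vs. a standalone "spark://..." URL
        "--num-executors", str(num_executors),  # number of executors to request
        "--executor-cores", str(executor_cores),
        "--executor-memory", executor_memory,   # e.g. "4g"
        "--driver-memory", driver_memory,
        app,
    ]
    cmd.extend(app_args or [])
    return cmd

# Hypothetical job and sizing, for illustration only.
cmd = spark_submit_cmd("wordcount.py", num_executors=2, executor_cores=2,
                       executor_memory="4g", driver_memory="2g",
                       app_args=["input/", "output/"])
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Posting the exact command (or the equivalent properties) you used makes it much easier to tell whether the cluster was sized sensibly for the job.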

**kf2** · Accepted Answer · 2017-09-06T13:24:36+00:00

First of all: this is not easy to answer without knowing the complete configuration and the type of job you're running.

Possible reasons are:

1. Misconfiguration: open the ResourceManager web UI (by default at http://HOSTNAME:8088) and compare the available vcores and memory.
2. Job type: the job incurs more overhead from running in parallel than it gains, so it is slower.
3. Hardware: the selected virtual hardware is slower than the local machine, through low disk I/O and network overhead.
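The vcores/memory comparison from the misconfiguration check can also be scripted against the ResourceManager's REST API, whose `/ws/v1/cluster/metrics` endpoint reports the same totals the web UI shows. This is a minimal sketch assuming the default web port 8088 and that you can reach the master node from where the script runs; `HOSTNAME` is a placeholder.

```python
import json
from urllib.request import urlopen

def fetch_cluster_metrics(rm_host: str, port: int = 8088, timeout: float = 10.0) -> dict:
    """GET the YARN ResourceManager cluster-metrics REST endpoint."""
    url = f"http://{rm_host}:{port}/ws/v1/cluster/metrics"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["clusterMetrics"]

def summarize(metrics: dict) -> str:
    """One-line view of what YARN can actually hand out to containers."""
    return (f"{metrics['activeNodes']} active NodeManagers, "
            f"{metrics['totalVirtualCores']} vcores, "
            f"{metrics['totalMB'] / 1024:.1f} GB schedulable memory "
            f"({metrics['availableMB'] / 1024:.1f} GB free)")

# Usage (needs network access to the master node; HOSTNAME is a placeholder):
#   print(summarize(fetch_cluster_metrics("HOSTNAME")))
```

If the schedulable memory or vcores are well below what the nodes physically have, or the job only ever occupies a small share of them, YARN or executor sizing is a likely suspect.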

I would say it is a combination of 1 and 2 (misconfiguration and job-type overhead).
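Why parallel overhead can make a small job slower on more nodes can be shown with a toy cost model: a fixed startup cost, work that divides perfectly across n nodes, and a coordination/shuffle cost that grows with each extra node. The model and every number in it are hypothetical, not measurements of Dataproc or of this job.

```python
def runtime(n: int, work_s: float, startup_s: float, per_node_s: float) -> float:
    """Toy model: fixed startup + perfectly split work + coordination cost per extra node."""
    return startup_s + work_s / n + per_node_s * (n - 1)

def best_node_count(work_s: float, startup_s: float, per_node_s: float,
                    max_nodes: int = 16) -> int:
    """Node count in 1..max_nodes with the lowest modeled runtime."""
    return min(range(1, max_nodes + 1),
               key=lambda n: runtime(n, work_s, startup_s, per_node_s))

# Hypothetical costs in seconds, for illustration only.
for work in (60, 3600):
    n = best_node_count(work, startup_s=20, per_node_s=40)
    print(f"work={work:>5} s: best n={n}, "
          f"T(1)={runtime(1, work, 20, 40):.0f} s, T(3)={runtime(3, work, 20, 40):.0f} s")
```

With these made-up costs, the small job is fastest on a single node while the large one benefits from several, which is the pattern to look for when the input is small.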

For a more detailed answer, let me know:

- the size and type of the job and how you run it
- the Hadoop configuration
- the cloud architecture

Best regards
