ConnectionError(MaxRetryError): HTTPConnectionPool max retries exceeded when using pywebhdfs


Hi, I am using the pywebhdfs Python library. I am connecting to EMR and trying to create a file on HDFS. I am getting the exception below, which seems irrelevant to what I am doing, since I am not hitting any connection limit here. Is it due to how WebHDFS works?

from pywebhdfs.webhdfs import PyWebHdfsClient
hdfs = PyWebHdfsClient(host='myhost',port='50070', user_name='hadoop')
my_data = '01010101010101010101010101010101'
my_file = 'user/hadoop/data/myfile.txt'
hdfs.create_file(my_file, my_data)

throws:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='masterDNS', port=50070): Max retries exceeded with url: /webhdfs/v1/user/hadoop/data/myfile.txt?op=CREATE&user.name=hadoop (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 115] Operation now in progress',))


There are 4 answers

Greg

I had this issue as well. I found that for some reason the call to:

def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):

is passed a timeout of 0, and that causes send to throw a

MaxRetryError

Bottom line: I found that if you just set timeout=1, it works fine:

hdfs = PyWebHdfsClient(host='yourhost', port='50070', user_name='hdfs', timeout=1)

Hope this works for you as well.
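Not from the answer itself, but here is a minimal sketch of combining that explicit timeout with a small retry loop, assuming your pywebhdfs version accepts the timeout keyword shown above; the host, path, and retry settings are placeholders:

from pywebhdfs.webhdfs import PyWebHdfsClient
import requests
import time

# timeout value taken from the answer above; host and user name are placeholders
hdfs = PyWebHdfsClient(host='myhost', port='50070', user_name='hadoop', timeout=1)

def create_with_retry(path, data, attempts=3, delay=2):
    # retry create_file a few times in case the connection is slow to come up
    for attempt in range(1, attempts + 1):
        try:
            return hdfs.create_file(path, data)
        except requests.exceptions.ConnectionError:
            if attempt == attempts:
                raise
            time.sleep(delay)

create_with_retry('user/hadoop/data/myfile.txt', '01010101010101010101010101010101')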

user7779187

Maybe the WebHDFS service is not running on the host that you specified. You may check your cluster to see which host is running the WebHDFS service.
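Not part of the original answer, but a quick way to check is to hit the WebHDFS REST endpoint directly with requests; op=LISTSTATUS is a standard WebHDFS operation, and the host name below is a placeholder for each candidate node you want to test:

import requests

# placeholder host; try each candidate NameNode in your cluster
url = 'http://myhost:50070/webhdfs/v1/?op=LISTSTATUS&user.name=hadoop'

try:
    resp = requests.get(url, timeout=5)
    # a 200 response with a FileStatuses payload means WebHDFS is up on this host
    print(resp.status_code, resp.json())
except requests.exceptions.ConnectionError as exc:
    print('WebHDFS is not reachable on this host:', exc)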

Angelo Di Donato

Formatting the namenode solved this problem for me several times.

hdfs namenode -format
Amit

Please check the status of your connection. Run the command below to see if the WebHDFS port is listening on your host (a Python connectivity check is also sketched after the notes below):

netstat -an | grep 50070 | grep LIST

Please note:

  • If SSL is enabled, then the port would be 50470.
  • hdfs namenode -format should not be run on the node, because it formats your NameNode and you lose everything.
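As a hedged sketch (not part of the original answer), here is the same reachability check done from Python with a plain socket, which also works when you cannot run netstat on the NameNode itself; the host and port are placeholders:

import socket

# placeholder host/port; use 50470 instead of 50070 if SSL is enabled
host, port = 'myhost', 50070

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(5)
try:
    sock.connect((host, port))
    print('Port is open, so WebHDFS should be reachable')
except OSError as exc:
    print('Cannot reach the WebHDFS port:', exc)
finally:
    sock.close()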