I'm trying to read/save my CSV file from HDFS using PySpark in Jupyter, but I get this error (which I didn't get before when reading or saving my data file):
output = "hdfs://localhost:9000/DATA/cache_data"
new_data.write.parquet(output)
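
For context, the surrounding code is roughly the following; the input CSV path and read options are placeholders, and `new_data` stands in for the actual DataFrame built in the notebook:

from pyspark.sql import SparkSession

# Sketch of the session setup and the read/write flow (paths are examples,
# not the exact ones from my notebook).
spark = SparkSession.builder.appName("cache_data").getOrCreate()

new_data = (
    spark.read
    .option("header", True)
    .csv("hdfs://localhost:9000/DATA/input.csv")  # hypothetical input path
)

output = "hdfs://localhost:9000/DATA/cache_data"
new_data.write.parquet(output)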
This is the error I get:
24/02/01 12:16:15 ERROR datanode.DataNode: BlockSender.sendChunks() exception:
java.io.IOException: An established connection was aborted by the software in your host machine
at sun.nio.ch.SocketDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:51)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:51)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:469)
at sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:516)
at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:609)
at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:223)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:624)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:808)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:755)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:552)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
at java.lang.Thread.run(Thread.java:750)
I looked into possible causes: the ResourceManager web UI reports an unhealthy node, but I checked disk utilization and it seems fine.
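
To double-check that HDFS itself is reachable from the notebook, I'm using a listing along these lines (a rough sketch, using the same SparkSession and the localhost:9000 NameNode from my config):

# Rough reachability check: list the target HDFS directory through Spark's
# bundled Hadoop FileSystem API (accessed via the py4j gateway).
hadoop_conf = spark._jsc.hadoopConfiguration()
Path = spark._jvm.org.apache.hadoop.fs.Path
fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(hadoop_conf)

for status in fs.listStatus(Path("hdfs://localhost:9000/DATA")):
    print(status.getPath().toString())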
I'm wondering if anyone has encountered the same issue, or knows how to fix it?