Aggregate Spark standalone executor logs


I am trying to test spark-submit in standalone mode and am running the example job below:

spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://MBP-49F32N-CSP.local:7077 \
    --driver-memory 3g \
    --executor-memory 3g \
    --num-executors 2 \
    --executor-cores 2 \
    --conf spark.dynamicAllocation.enabled=false \
    /opt/homebrew/Cellar/apache-spark/3.5.0/libexec/examples/jars/spark-examples_2.12-3.5.0.jar \
    10

I can see logs generated under

/apache-spark/3.5.0/libexec/work

There I can see directories named after the application ID:

app-20230929175322-0003 app-20231003110238-0000

Inside app-20231003110238-0000 there are sub-directories 0 1 2 3 4 5, named after the executors, and inside each of them I can see stderr and stdout files.
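
For reference, something like the following lists those per-executor log files (using the work directory and application ID from above; adjust the path prefix to your install):

# List the per-executor stdout/stderr files for one application
find /apache-spark/3.5.0/libexec/work/app-20231003110238-0000 \
    -maxdepth 2 -type f \( -name stderr -o -name stdout \)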

Is there any way to aggregate all executor logs under the application-id directory (e.g. app-20231003110238-0000)?

Something like running Spark in YARN mode, where we can see all the logs with yarn logs -applicationId <application_ID>.


1 Answer

Answer from Janani:

You can use a shell script to aggregate all the stdout and stderr logs, similar to the one below:

cd /apache-spark/3.5.0/libexec/work/<app_id>

# Concatenate executor stderr logs
cat */stderr > agg_stderr.log

# Concatenate executor stdout logs
cat */stdout > agg_stdout.log
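
If you want output closer to what yarn logs prints, with each executor's logs under its own header, a minimal sketch could look like the following. The WORK_DIR path, the script name, the header format, and the aggregated.log output file are all assumptions; point WORK_DIR at your worker's actual work directory and pass the application ID as the first argument.

#!/usr/bin/env bash
# Usage: ./aggregate_logs.sh app-20231003110238-0000
# Sketch only: WORK_DIR is assumed; adjust it to your installation.
set -euo pipefail

WORK_DIR=/apache-spark/3.5.0/libexec/work
APP_ID="$1"

# Walk each executor directory (0, 1, 2, ...) and print its logs
# under a header, similar to the per-container sections of `yarn logs`.
for exec_dir in "$WORK_DIR/$APP_ID"/*/; do
    exec_id=$(basename "$exec_dir")
    for log in stdout stderr; do
        if [ -f "$exec_dir/$log" ]; then
            echo "===== Executor $exec_id : $log ====="
            cat "$exec_dir/$log"
            echo
        fi
    done
done > "$WORK_DIR/$APP_ID/aggregated.log"

You still have to run this once per application, but it leaves a single file per application ID, which is roughly what yarn logs -applicationId gives you on YARN.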