How do I run GraphFrames in AWS Glue 3.0?

How do I use GraphFrames in AWS Glue 3.0? As far as I can see, a Python wheel package is only available for the Spark 2.x builds, not for other Spark versions. I am getting a class-loading exception:

py4j.protocol.Py4JJavaError: An error occurred while calling o180.loadClass.
: java.lang.ClassNotFoundException: org.graphframes.GraphFramePythonAPI
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)

I have the graphframes:0.8.2-spark3.1-s_2.12 package. I have passed it with --conf, and I have also added the wheel package as a Python library.

Answer by user238607

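The ClassNotFoundException for org.graphframes.GraphFramePythonAPI means the GraphFrames JAR is not on the JVM classpath; installing the Python package alone is not enough. You can specify the JAR by its Maven coordinates directly in the code. GraphFrames builds are also published for recent Spark 3.x versions, so pick the one that matches your Spark and Scala versions (AWS Glue 3.0 runs Spark 3.1 with Scala 2.12).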

The available packages are listed here: https://spark-packages.org/package/graphframes/graphframes
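The package coordinates on that page follow one naming pattern, graphframes:graphframes:<version>-spark<major.minor>-s_<scala major.minor>, so a mismatch between the package and the cluster is easy to spot. Below is a small sketch that builds the coordinate from the versions you run; the function name graphframes_package is hypothetical, and it assumes the naming pattern above holds for the release you pick (check the listing to confirm that build exists).

```python
def graphframes_package(gf_version: str, spark_version: str, scala_version: str) -> str:
    """Build a spark.jars.packages coordinate for GraphFrames (hypothetical helper).

    Assumes the spark-packages.org naming scheme:
    graphframes:graphframes:<gf_version>-spark<major.minor>-s_<scala major.minor>
    """
    # Only the major.minor parts of Spark and Scala appear in the artifact name.
    spark_mm = ".".join(spark_version.split(".")[:2])
    scala_mm = ".".join(scala_version.split(".")[:2])
    return f"graphframes:graphframes:{gf_version}-spark{spark_mm}-s_{scala_mm}"


if __name__ == "__main__":
    # AWS Glue 3.0 runs Spark 3.1.1 with Scala 2.12.
    print(graphframes_package("0.8.2", "3.1.1", "2.12"))
```

For Glue 3.0 this yields the spark3.1-s_2.12 build, not the spark2.4-s_2.11 one, which targets Spark 2.4 and Scala 2.11.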

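from pyspark.sql import SparkSession

##### Add the GraphFrames package (a DataFrame-based graph library for Spark) so it can be used from PySpark.
# AWS Glue 3.0 runs Spark 3.1 with Scala 2.12, so use the matching spark3.1-s_2.12 build.
# GraphFrames releases are hosted on the spark-packages repository, so add it to the resolver list.
# Note: spark.jars.packages only takes effect if this call is what creates the SparkContext.

spark = SparkSession.builder \
    .appName("MyApp") \
    .config('spark.jars.packages', 'graphframes:graphframes:0.8.2-spark3.1-s_2.12') \
    .config('spark.jars.repositories', 'https://repos.spark-packages.org') \
    .getOrCreate()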
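If the job cannot download packages at startup (for example, no internet access from the job), another option is to upload the JAR to S3 and pass it through Glue's special job parameters: --extra-jars adds it to the Java classpath and --extra-py-files adds Python modules to the Python path. The sketch below only builds that argument dictionary; the helper name, bucket and key are hypothetical, and reusing the JAR for --extra-py-files assumes the JAR bundles the graphframes Python package (if yours does not, pass a zip of the Python sources instead).

```python
from typing import Dict, Optional


def glue_graphframes_args(jar_s3_path: str, py_files_s3_path: Optional[str] = None) -> Dict[str, str]:
    """Build Glue job DefaultArguments that ship GraphFrames with the job (hypothetical helper).

    --extra-jars puts the JAR on the Java classpath, which is what
    org.graphframes.GraphFramePythonAPI needs. --extra-py-files adds the
    Python module to the Python path; by default it reuses the JAR, assuming
    the JAR bundles the graphframes Python package.
    """
    py_path = py_files_s3_path or jar_s3_path
    for path in (jar_s3_path, py_path):
        if not path.startswith("s3://"):
            raise ValueError(f"Glue expects an S3 URI, got: {path!r}")
    return {
        "--extra-jars": jar_s3_path,
        "--extra-py-files": py_path,
    }


if __name__ == "__main__":
    # Hypothetical bucket and key; upload the JAR from spark-packages.org there first.
    args = glue_graphframes_args("s3://my-bucket/jars/graphframes-0.8.2-spark3.1-s_2.12.jar")
    print(args)
```

The resulting dictionary can be passed as DefaultArguments to boto3's Glue create_job (together with GlueVersion="3.0"), or the same keys and values can be entered as job parameters in the Glue console.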