Access to a database from a Spark job in synapse

258 views Asked by At

I'm working within a Synapse workspace.

I have a linked service to a oracle database in a private network (10.x.x.x)

Linked service

I have a SparkJob which is trying to connect to that database with the oracle thin client and the default method in the spark library, i.e. spark.read.jdbc, and it can not reach it.

Error message

Is it possible for the spark job to reach the database?

I've tried to open firewall connections but can not find from where to where because the Spark pool is in a virtual network for which I don't know the network segment.

1

There are 1 answers

3
DileeprajnarayanThumula On

In order to read data from oracle DB using Pyspark inside synapse notebook The Below steps will help you:

Step 1: Add the jar ojdbc8-19.15.0.0.1.jar to the Synapse workspace packages.

Step 2: Add that workspace package to the Apache spark pool

Step 3:execute the following code:

host = 'my_host_url'  
port = 1521  
service_name = 'my_service_name'  
jdbcUrl = f'jdbc:oracle:thin:@{host}:{port}:{service_name}'  
    sql = 'SELECT * FROM my_table'  
    user = 'my_username'  
    password = 'my_password'  
    jdbcDriver = 'oracle.jdbc.driver.OracleDriver'  
    jdbcDF = spark.read.format('jdbc') \  
        .option('url', jdbcUrl) \  
        .option('query', sql) \  
        .option('user', user) \  
        .option('password', password) \  
        .option('driver', jdbcDriver) \  
        .load()  
    display(jdbcDF)

By following the above steps you can read data from oracle db in spark notebook.