How to write a Spark DataFrame into multiple JDBC tables based on a column


I'm working on a batch Spark (v2.4) pipeline written in Scala. I would like to save a DataFrame into a PostgreSQL database, but instead of saving all rows into a single table, I want to write them to multiple tables based on the value of a column.

Suppose the DataFrame has a column named country; I want to write each record into the table for its country, e.g.

df.show()
+-------+----+
|country|val1|
+-------+----+
|     CN| 1.0|
|     US| 2.5|
|     CN| 3.0|
+-------+----+

Then I would like to save the records (CN, 1.0) and (CN, 3.0) into table app_CN and the record (US, 2.5) into table app_US. Assume that the tables already exist.
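
For illustration, one approach would be to collect the distinct country codes on the driver and write each filtered subset with the built-in JDBC writer. This is only a sketch; the connection details below are placeholders:

import java.util.Properties
import org.apache.spark.sql.SaveMode

// Placeholder connection details, for illustration only
val jdbcUrl = "jdbc:postgresql://localhost:5432/mydb"
val connProps = new Properties()
connProps.setProperty("user", "postgres")
connProps.setProperty("password", "secret")
connProps.setProperty("driver", "org.postgresql.Driver")

import spark.implicits._

// Collect the distinct country codes on the driver, then append each
// filtered subset into its own pre-existing table (app_CN, app_US, ...)
val countries = df.select("country").distinct().as[String].collect()

countries.foreach { c =>
  df.filter($"country" === c)
    .write
    .mode(SaveMode.Append)
    .jdbc(jdbcUrl, s"app_$c", connProps)
}

However, this scans the dataframe once per distinct country, which seems wasteful when there are many countries.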

Can I achieve this with the DataFrame API, or should I repartition into an RDD, provide a JDBC-like object to the executors, and save the rows manually?
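
The RDD-based option would look roughly like the sketch below, reusing the placeholder jdbcUrl and the implicits import from the snippet above, and assuming every app_<country> table has (country, val1) columns:

// Rough sketch of the manual approach: cluster rows by country, then open
// one JDBC connection per partition and insert each row into its table
df.repartition($"country").rdd.foreachPartition { rows =>
  val conn = java.sql.DriverManager.getConnection(jdbcUrl, "postgres", "secret")
  try {
    rows.foreach { row =>
      val country = row.getAs[String]("country")
      val stmt = conn.prepareStatement(
        s"INSERT INTO app_$country (country, val1) VALUES (?, ?)")
      stmt.setString(1, country)
      stmt.setDouble(2, row.getAs[Double]("val1"))
      stmt.executeUpdate()
      stmt.close()
    }
  } finally {
    conn.close()
  }
}

This feels clunky (one INSERT per row and manual connection handling), so I'm hoping there is a cleaner way with the DataFrame API.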
