String Interpolation Issue Jupyter Notebook Pyspark


I am passing the following as a query (the dbtable option) to PySpark, running in a Jupyter notebook on AWS EMR.

num = [1234,5678]

newquery = "(SELECT * FROM db.table WHERE col = 1234) as new_table"
newquery = "(SELECT * FROM db.table WHERE col = {num}) as new_table"
newquery = "(SELECT * FROM db.table WHERE col IN %(num)s) as new_table"
newquery = "(SELECT * FROM db.table WHERE col IN :(num)) as new_table"

The first "newquery" will return results. The rest fail.

What is the correct way to interpolate the list into the query?


There is 1 answer

Answer by Lingesh.K

You can use an f-string to build the query:

num = [1234,5678]

# Drop the surrounding brackets from the list's repr: "1234, 5678"
num_str = str(num)[1:-1]

newquery = f"(SELECT * FROM db.table WHERE col IN ({num_str})) AS new_table"

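# Run the query by passing it as the dbtable option, as in the question
# (assuming a JDBC source: add the url and credential options, then call .load())
reader = spark.read.format("jdbc").option("dbtable", newquery)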

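Also note that the expression str(num)[1:-1] works for lists of strings too: ['1234', '5678'] becomes '1234', '5678', so the IN clause gets quoted values. It relies on Python's repr, though, so a string that itself contains a single quote is rendered with double quotes and may not be a valid SQL string literal.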
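
If the list might contain strings with quotes in them, a small formatting helper is one option. This is only a sketch: sql_in_list is a hypothetical name (not a PySpark function), it assumes the values are ints or strings, and it assumes the target database uses the standard SQL convention of doubling a single quote inside a string literal.

```python
def sql_in_list(values):
    """Render a Python list as the inside of a SQL IN (...) clause.

    Hypothetical helper for illustration; not part of PySpark.
    """
    if not values:
        raise ValueError("an empty IN () list is not valid SQL")
    parts = []
    for v in values:
        if isinstance(v, int):
            # Numbers are written as-is.
            parts.append(str(v))
        elif isinstance(v, str):
            # Wrap in single quotes and double any embedded single quote.
            parts.append("'" + v.replace("'", "''") + "'")
        else:
            raise TypeError(f"unsupported value type: {type(v).__name__}")
    return ", ".join(parts)


num = [1234, 5678]
newquery = f"(SELECT * FROM db.table WHERE col IN ({sql_in_list(num)})) AS new_table"
print(newquery)
```

If the values come from untrusted input, it may be safer to avoid building SQL text at all and filter the loaded DataFrame instead, for example with col("col").isin(num) from pyspark.sql.functions.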

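Also note that the "(...) AS new_table" wrapper only makes sense as a subquery, for example as the dbtable option of a JDBC read. spark.sql() expects a complete statement, so there you would pass the inner SELECT on its own.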
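
As for why the attempts in the question fail: a plain string literal is never interpolated, so {num} reaches the database as literal text, and %(num)s or :(num) are placeholder styles that only mean something to a driver that binds parameters, which does not happen when the string is passed as dbtable. Even with an f prefix, {num} inserts the list's Python repr with square brackets. A small sketch using only string operations (no Spark session needed; the variable names are illustrative):

```python
num = [1234, 5678]

# No f prefix: the braces are kept literally in the SQL text.
plain = "(SELECT * FROM db.table WHERE col = {num}) as new_table"

# f prefix, but the list is formatted with its repr, square brackets included.
bracketed = f"(SELECT * FROM db.table WHERE col IN {num}) as new_table"

# Join the values explicitly and add the SQL parentheses yourself.
num_str = ", ".join(str(n) for n in num)
inner = f"SELECT * FROM db.table WHERE col IN ({num_str})"

# Wrapped form, suitable as a dbtable subquery.
dbtable_query = f"({inner}) AS new_table"

print(plain)
print(bracketed)
print(dbtable_query)
```

Here inner is the form a complete-statement API would take, while dbtable_query is the aliased subquery form used in the question.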