String Interpolation Issue Jupyter Notebook Pyspark

Question

Asked by andruidthedude on 22 February 2024 at 22:12 · 23 views

I am passing the following as a query (the dbtable option) to PySpark, running in a Jupyter notebook on AWS EMR.

num = [1234,5678]

newquery = "(SELECT * FROM db.table WHERE col = 1234) as new_table"
newquery = "(SELECT * FROM db.table WHERE col = {num}) as new_table"
newquery = "(SELECT * FROM db.table WHERE col IN %(num)s) as new_table"
newquery = "(SELECT * FROM db.table WHERE col IN :(num)) as new_table"

The first "newquery" will return results. The rest fail.

What is the correct way to interpolate the list into the query?

There is 1 answer

**Lingesh.K** · Answer 1 · 2024-02-24T00:51:56+00:00

You can build the query string with a Python f-string:

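num = [1234, 5678]

# Strip the square brackets from the list's string form: "1234, 5678"
num_str = str(num)[1:-1]

newquery = f"(SELECT * FROM db.table WHERE col IN ({num_str})) AS new_table"

# Run the query by passing it as the dbtable option, as in the question.
# Assumption: a JDBC source; jdbc_options stands in for your existing connection settings.
df = spark.read.format("jdbc").options(**jdbc_options).option("dbtable", newquery).load()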
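For context, here is a small sketch of what the attempts from the question actually send to the database. It is plain Python and needs no Spark session; the variable names (plain, percent, fixed) are only illustrative. The :(num) form is not shown because Python leaves it in the string untouched; it only means something to a driver that supports named bind parameters.

```python
num = [1234, 5678]

# 1) No f prefix: the braces stay in the string, so the database receives "{num}".
plain = "(SELECT * FROM db.table WHERE col = {num}) as new_table"

# 2) %-style placeholders are only filled when the % operator is applied,
#    and a list is then rendered with square brackets, which SQL rejects.
percent = "(SELECT * FROM db.table WHERE col IN %(num)s) as new_table" % {"num": num}

# 3) Stripping the brackets from str(num) gives a valid IN list.
num_str = str(num)[1:-1]
fixed = f"(SELECT * FROM db.table WHERE col IN ({num_str})) as new_table"

for label, query in [("plain", plain), ("percent", percent), ("fixed", fixed)]:
    print(f"{label}: {query}")
```

Printing the string before handing it to Spark is a cheap way to check what the database will see.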

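Note that str(num)[1:-1] also works when the list holds simple strings: ['1234', '5678'] becomes '1234', '5678', so the IN clause receives quoted literals. It is not safe for arbitrary input, though. A value containing a single quote is rendered by Python in double quotes, which many databases read as an identifier rather than a string; an empty list produces the invalid IN (); and untrusted values open the door to SQL injection.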
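If the values are not fully under your control, a slightly more defensive way to build the list might look like the sketch below. The helper name sql_in_list is hypothetical, not part of PySpark. It assumes the target database uses the standard SQL escape of doubling single quotes. Some dialects also treat backslashes specially, so treat this as a starting point rather than a complete defense; parameterized queries or a DataFrame filter are preferable where available.

```python
def sql_in_list(values):
    """Render a non-empty list of ints or strings as the body of a SQL IN (...) list."""
    if not values:
        raise ValueError("IN () is not valid SQL; handle the empty case separately")
    rendered = []
    for v in values:
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(v, bool) or not isinstance(v, (int, str)):
            raise TypeError(f"unsupported value: {v!r}")
        if isinstance(v, int):
            rendered.append(str(v))
        else:
            # Double any embedded single quote (standard SQL escaping)
            rendered.append("'" + v.replace("'", "''") + "'")
    return ", ".join(rendered)


num = [1234, 5678]
newquery = f"(SELECT * FROM db.table WHERE col IN ({sql_in_list(num)})) AS new_table"
print(newquery)
```

Unlike the string-slicing trick, this version fails loudly on an empty list or an unexpected type instead of producing a malformed query.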

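Also, make sure the aliased form (...) AS new_table is used where a table expression is expected, such as the dbtable option or a FROM clause, and not as a standalone statement.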
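To illustrate the difference between the two forms, here is a sketch with two hypothetical helpers, as_dbtable_subquery and as_statement. The first builds the parenthesized, aliased subquery that the dbtable option accepts; the second builds a complete statement of the kind spark.sql() expects. As far as I know, passing the aliased subquery form directly to spark.sql() raises a parse error. The Spark calls are shown only as comments, since they assume a JDBC source and connection settings (jdbc_options) that are not in the question.

```python
def in_list(num):
    # int() raises for values such as "abc" that cannot be converted to an integer
    return ", ".join(str(int(n)) for n in num)


def as_dbtable_subquery(num):
    # Parenthesized and aliased: valid wherever a table name is expected
    return f"(SELECT * FROM db.table WHERE col IN ({in_list(num)})) AS new_table"


def as_statement(num):
    # A complete statement: no outer parentheses, no alias
    return f"SELECT * FROM db.table WHERE col IN ({in_list(num)})"


num = [1234, 5678]
print(as_dbtable_subquery(num))
print(as_statement(num))

# Assumed usage (requires a Spark session and your own connection options):
# df = spark.read.format("jdbc").options(**jdbc_options) \
#          .option("dbtable", as_dbtable_subquery(num)).load()
# df = spark.sql(as_statement(num))  # only if db.table is visible to Spark's catalog
```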
