Fetching data in Jupyter Notebook is taking too long


I want to fetch all rows from a table, using the following code:

# cursor is assumed to come from an open cx_Oracle connection
table_row_count = 1000000
batch_size = 10000

sql = """SELECT t.*
FROM (
    SELECT ROWNUM AS row_num,
           sub_t.*
    FROM (
        SELECT name_employer
        FROM my_table
        WHERE section = 'OTHER'
    ) sub_t
) t
WHERE t.row_num BETWEEN :LOWER_BOUND AND :UPPER_BOUND"""

data = []
for lower_bound in range(0, table_row_count, batch_size):
    cursor.execute(sql, {'LOWER_BOUND': lower_bound, 
                         'UPPER_BOUND': lower_bound + batch_size - 1})

    for row in cursor.fetchall():
        data.append(row)

The original source of the code: cx_Oracle: fetchall() stops working with big SELECT statements

However, it is taking forever. My table has 5 million rows. Is there a faster way to do this?


1 Answer

Christopher Jones

For big result sets, increase arraysize; a larger value reduces the number of network round-trips needed to fetch the rows. Try something like cursor.arraysize = 10000 and then tune the size to suit your data and performance requirements.
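
Here is a minimal sketch of that tuning, assuming placeholder connection details (my_user, my_password and my_host/my_service are not from the question). It runs the plain SELECT once and loops over fetchmany(), which returns up to cursor.arraysize rows per call, instead of re-running the ROWNUM subquery for every batch.

import cx_Oracle

# Placeholder credentials: replace with your own connection details
connection = cx_Oracle.connect(user="my_user", password="my_password",
                               dsn="my_host/my_service")
cursor = connection.cursor()

# A larger arraysize means fewer network round-trips per fetch
cursor.arraysize = 10000

cursor.execute("""SELECT name_employer
                  FROM my_table
                  WHERE section = 'OTHER'""")

data = []
while True:
    rows = cursor.fetchmany()   # fetches up to cursor.arraysize rows
    if not rows:
        break
    data.extend(rows)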

Refer to the Tuning cx_Oracle manual.

You may also want to look at the best practices in https://github.com/cjbj/cx-oracle-notebooks

Also see the latest cx_Oracle release announcement: it's time to upgrade to python-oracledb.
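
For reference, a rough sketch of the same fetch with python-oracledb (the renamed, upgraded cx_Oracle); the connection parameters are placeholders, not values from the question.

import oracledb

# python-oracledb runs in "thin" mode by default, so no Oracle Client
# libraries are needed; the credentials below are placeholders
connection = oracledb.connect(user="my_user", password="my_password",
                              dsn="my_host/my_service")

with connection.cursor() as cursor:
    cursor.arraysize = 10000    # same tuning knob as in cx_Oracle
    cursor.execute("""SELECT name_employer
                      FROM my_table
                      WHERE section = 'OTHER'""")
    data = cursor.fetchall()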