I have written this quick test script to compare the performance of bcp (via the bcpandas package) with that of the regular pandas df.to_sql. See the script below:
import time
import pandas as pd
import os
import numpy as np
from dotenv import load_dotenv
from db import create_con
from bcpandas import SqlCreds, to_sql
load_dotenv(override=True)
creds = SqlCreds(
    os.environ['SQL_SERVER'],
    os.environ['SQL_DB'],
    os.environ['SQL_USER'],
    os.environ['SQL_PW']
)
# np.ndarray allocates uninitialized memory here, so the column values are arbitrary ints
df = pd.DataFrame(
    data=np.ndarray(shape=(100000, 6), dtype=int),
    columns=[f"col_{x}" for x in range(6)]
)
# Using BCP
start = time.time()
print('Starting...')
to_sql(df, 'bcp_test_src', creds, index=False, if_exists='replace', schema='test', batch_size=10000)
print('Ending...')
end = time.time()
elapsed = end - start
print(f'Elapsed time: {elapsed} seconds')  # 15.5 seconds
# Using Pandas to_sql
start = time.time()
print('Starting...')
con = create_con()
df.to_sql('bcp_test_src', con=con, index=False, if_exists='replace', schema='test')
print('Ending...')
end = time.time()
elapsed = end - start
print(f'Elapsed time: {elapsed} seconds')  # 17.2 seconds
However, the difference between BCP and pandas is very small: BCP takes 15.5 seconds and df.to_sql takes 17.2 seconds. From what I have read about BCP, I was expecting it to take at most a few seconds to load all of this. What am I doing wrong? The database I am writing to is an Azure SQL Database on the Standard S0 pricing tier (10 DTUs). Could that have anything to do with it?
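
In case it helps diagnose this, my plan is to check the database's resource usage right after the load with something like the sketch below (assuming sys.dm_db_resource_stats is the right DMV for an Azure SQL Database, and that my create_con() helper returns a SQLAlchemy connectable, as it must for df.to_sql above):

# Rough sketch: read Azure SQL's ~15-second resource snapshots to see whether
# CPU, data IO, or log write percentages were near 100% during the load,
# which would point at the S0 DTU limit rather than at the insert method.
import pandas as pd
from db import create_con  # same helper as in the test script

con = create_con()
stats = pd.read_sql(
    """
    SELECT TOP (10)
        end_time,
        avg_cpu_percent,
        avg_data_io_percent,
        avg_log_write_percent
    FROM sys.dm_db_resource_stats
    ORDER BY end_time DESC
    """,
    con,
)
print(stats)

If avg_log_write_percent sits at 100% while either insert runs, I assume that would mean the S0 tier is the bottleneck rather than the loading method, but I have not verified that interpretation.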