Translating the whole dataframe with googletrans is taking too long


I'm trying to translate all the elements of a dataframe from Persian to English. I'm using the code below, but it takes a long time to run. Is there a quicker way?

import pandas as pd
from googletrans import Translator

exl_file = 'data.xlsx'
df = pd.read_excel(exl_file)

translator = Translator()

df_en = df.copy()

# Translate the column names
df_en.rename(columns=lambda x: translator.translate(x).text, inplace=True)

df_en.columns

# Translate each unique value once per column, then replace across the frame
translations = {}
for column in df_en.columns:
    unique_elements = df_en[column].unique()
    for element in unique_elements:
        translations[element] = translator.translate(element).text

df_en.replace(translations, inplace=True)

df_en.to_csv('en_data.csv', index=False)

There are 2 answers

Anas Altarazi

I'm not an expert in this domain, but I can suggest a few enhancements:

  • If the Excel file is very large, split it into segments so you can translate one segment and push it to the next step while the following segment is being processed.
  • Don't copy the df; instead, use a mapping function to transform the values.
  • If there is a way to add a caching feature around the translator calls, it would be great for performance (see the sketch after this list).
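For the caching point, here is a minimal sketch using functools.lru_cache, assuming the df from the question and that every cell is a Persian string (the function name translate_cached is illustrative, not part of googletrans):

from functools import lru_cache
from googletrans import Translator

translator = Translator()

@lru_cache(maxsize=None)
def translate_cached(text):
    # Each distinct string hits the API once; repeats are served from the cache
    return translator.translate(text, src='fa', dest='en').text

# Apply to every cell; duplicated values are only translated once
df_en = df.applymap(lambda x: translate_cached(x) if isinstance(x, str) else x)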
valentinmk

Besides the pandas optimisations that could be made, I believe the slowest part is fetching results from the Google Translate API.

But the library already provides a bulk-translate feature; try using it in your script.

https://py-googletrans.readthedocs.io/en/latest/#advanced-usage-bulk
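As an example, here is a sketch of how the bulk call could replace the per-element loop, assuming the same data.xlsx as in the question and that every cell is a Persian string:

import pandas as pd
from googletrans import Translator

df = pd.read_excel('data.xlsx')
translator = Translator()

# All distinct cell values across the frame
unique_texts = pd.unique(df.values.ravel()).tolist()

# One bulk call instead of one API round-trip per element;
# each result keeps the original text in .origin and the translation in .text
results = translator.translate(unique_texts, src='fa', dest='en')
mapping = {r.origin: r.text for r in results}

df_en = df.replace(mapping)
df_en.to_csv('en_data.csv', index=False)

Column names can still be translated separately (e.g. with the rename call from the question), but moving the cell values to a single bulk request removes most of the per-element network latency.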