I am using ijson to read large JSON files and trying to parallelize the processing, but it is not working

I am trying to read JSON files of about 30 GB, and I can do it with ijson. To speed up the process I am trying to use multiprocessing, but I am unable to make it work: I can see the n workers ready, but only one worker is taking all the load.

Does anyone know if it is possible to combine multiprocessing with ijson?

Here is a sample of the code:

import ijson
import pandas as pd
import multiprocessing

file = 'jsonfile'
player = []

def games(record):
    games01 = record["games"]
    for game01 in games01:
        try:
            player.append(game01['player'])
        except KeyError:
            player.append('No_record_found')

if __name__ == '__main__':
    with open(file, "rb") as f:
        pool = multiprocessing.Pool()
        pool.map(games, ijson.items(f, "game.item"))
        pool.close()
        pool.join()
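
For reference, this is the direction I have been experimenting with: having the worker return the players instead of appending to the module-level list (each worker process gets its own copy of player, so the list in the parent never fills), and passing a chunksize so records are handed to workers in batches. This is only a sketch, not something I have confirmed balances the load; the names extract_players and chunksize=100 are my own choices, and the "game.item" prefix and record layout are the same assumptions as in the code above. I also suspect the single ijson parser in the parent process may still be the bottleneck.

import ijson
import multiprocessing

FILE = 'jsonfile'  # same placeholder path as above

def extract_players(record):
    # Runs in a worker process: return the players for this record
    # instead of appending to a global list, because each worker has
    # its own copy of module-level variables.
    players = []
    for game in record.get("games", []):
        players.append(game.get("player", "No_record_found"))
    return players

if __name__ == '__main__':
    player = []
    with open(FILE, "rb") as f:
        with multiprocessing.Pool() as pool:
            # chunksize batches records so workers are not fed one tiny
            # record at a time; imap keeps results streaming back in order.
            for result in pool.imap(extract_players, ijson.items(f, "game.item"), chunksize=100):
                player.extend(result)
    print(len(player))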

There are 0 answers