Dealing with giant JSON files in Node.js


I'm trying to read in a "giant" (~1.22 GB) JSON file, manipulate it, and write out a new JSON file. Reading and manipulation work fine, but the write step keeps failing. I even stripped out all of my own code so the script only reads the JSON and writes the same content straight back out from one stream to another, and it still can't write the file.

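import fs from 'fs';
import {disassembler} from 'stream-json/Disassembler';
import {stringer} from 'stream-json/Stringer';
import {chain} from 'stream-chain';
import {parser} from 'stream-json';
import {streamValues} from 'stream-json/streamers/StreamValues';

// oldLevelFileName and newLevelFileName are defined elsewhere in the script.
const stream = chain([
  fs.createReadStream(oldLevelFileName),  // raw bytes from the input file
  parser(),                               // bytes -> JSON token stream
  streamValues(),                         // tokens -> each top-level value assembled as a JS object
  disassembler(),                         // JS objects -> token stream again
  stringer(),                             // tokens -> JSON text
  fs.createWriteStream(newLevelFileName)  // JSON text -> output file
]);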

let counter = 0;
stream.on('data', () => ++counter);
stream.on('end', () =>
  console.log(`Some ${counter} value to trigger processing.`));

If I run the script without an explicit memory limit (I believe the default is about 4 GB), it runs out of heap:

node code/myscript.js

<--- Last few GCs --->

[68840:000001C9561B7B20]    75520 ms: Mark-Compact (reduce) 4078.7 (4143.8) -> 4078.7 (4143.8) MB, 2995.13 / 0.00 ms  (average mu = 0.150, current mu = 0.090) allocation failure; GC in old space requested
[68840:000001C9561B7B20]    76505 ms: Mark-Compact (reduce) 4078.7 (4143.8) -> 4078.7 (4143.8) MB, 985.08 / 0.00 ms  (average mu = 0.106, current mu = 0.000) allocation failure; GC in old space requested


<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

 1: 00007FF690A56E7B node::SetCppgcReference+16075
 2: 00007FF6909CD996 v8::base::CPU::num_virtual_address_bits+79190
 3: 00007FF6909CFBA5 v8::base::CPU::num_virtual_address_bits+87909
 4: 00007FF69143D9E1 v8::Isolate::ReportExternalAllocationLimitReached+65
 5: 00007FF691427178 v8::Function::Experimental_IsNopFunction+1336
 6: 00007FF691288AA0 v8::Platform::SystemClockTimeMillis+659328
 7: 00007FF691294D23 v8::Platform::SystemClockTimeMillis+709123
 8: 00007FF691292684 v8::Platform::SystemClockTimeMillis+699236
 9: 00007FF6912857C0 v8::Platform::SystemClockTimeMillis+646304
10: 00007FF69129AE3A v8::Platform::SystemClockTimeMillis+733978
11: 00007FF69129B6B7 v8::Platform::SystemClockTimeMillis+736151
12: 00007FF6912A9FAF v8::Platform::SystemClockTimeMillis+795791
13: 00007FF690F6A44F v8::CodeEvent::GetFunctionName+116495
14: 00007FF63149AAFA
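The ~4078 MB heap in the GC log above is consistent with a default limit of roughly 4 GB. As a quick check (a sketch, not part of the original script), the limit the current process actually runs with can be read from Node's built-in `v8` module; `heapLimitMiB` is a hypothetical helper name.

```javascript
import { getHeapStatistics } from 'node:v8';

// Hypothetical helper: returns V8's maximum heap size for this process in MiB.
// heap_size_limit reflects --max-old-space-size when that flag is passed.
function heapLimitMiB() {
  return getHeapStatistics().heap_size_limit / (1024 * 1024);
}

console.log(`V8 heap limit: ${heapLimitMiB().toFixed(0)} MiB`);
```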

But if I run the script with the heap limit raised to 32 GB or 64 GB (the machine has 64 GB of RAM and runs Windows 11), it fails with a different fatal error:

node --max-old-space-size=65536 code/myscript.js


#
# Fatal error in , line 0
# Fatal JavaScript invalid size error 169220804 (see crbug.com/1201626)
#
#
#
#FailureMessage Object: 000000870B9FDC30
----- Native stack trace -----

 1: 00007FF690A56E7B node::SetCppgcReference+16075
 2: 00007FF69095622F node::TriggerNodeReport+70111
 3: 00007FF6918217F2 V8_Fatal+162
 4: 00007FF6912B8A55 v8::Platform::SystemClockTimeMillis+855861
 5: 00007FF691139BE3 v8::base::Thread::StartSynchronously+1456675
 6: 00007FF691158583 v8::Object::GetIsolate+15459
 7: 00007FF690F7A303 v8::CodeEvent::GetFunctionName+181699
 8: 00007FF63149AAFA
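`--max-old-space-size` takes a value in megabytes, so `65536` requests a 64 GB old generation. To confirm that a given value actually reaches V8, a sketch like the following starts a child Node process with the flag and reads back `heap_size_limit`; `heapLimitWithFlag` is a hypothetical helper, and the reported figure also counts young-generation space, so it is typically a little above the flag value.

```javascript
import { execFileSync } from 'node:child_process';

// Hypothetical helper: spawns a fresh Node process with the given
// --max-old-space-size (in MiB) and returns the heap_size_limit that
// V8 reports inside that process, converted to MiB.
function heapLimitWithFlag(maxOldSpaceMiB) {
  const out = execFileSync(process.execPath, [
    `--max-old-space-size=${maxOldSpaceMiB}`,
    '-e',
    'console.log(require("v8").getHeapStatistics().heap_size_limit)',
  ], { encoding: 'utf8' });
  return Number(out.trim()) / (1024 * 1024);
}

console.log(`heap limit with --max-old-space-size=512: ${heapLimitWithFlag(512).toFixed(0)} MiB`);
```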

I've tried trimming the code down, removing code, and switching to different libraries. I expect it to write out the JSON file.
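For comparison, a byte-level copy that never parses the JSON can show whether plain streaming I/O handles a file of this size. In the stream-json chain, `streamValues()` assembles each top-level value into a complete JavaScript object before `disassembler()` turns it back into tokens; the sketch below (a minimal, hypothetical `copyRaw` helper using only Node's standard library) builds no JSON values at all.

```javascript
import { createReadStream, createWriteStream } from 'node:fs';
import { pipeline } from 'node:stream/promises';

// Hypothetical helper: copies src to dest chunk by chunk without parsing JSON.
// pipeline() applies backpressure (reading pauses while the writer catches up)
// and rejects if either stream errors, so memory use does not grow with file size.
async function copyRaw(src, dest) {
  await pipeline(createReadStream(src), createWriteStream(dest));
}
```

Called as `await copyRaw(oldLevelFileName, newLevelFileName)`, it exercises only the file I/O. If that succeeds on the 1.22 GB file, it would point at the stages between `parser()` and `stringer()` rather than at writing itself.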
