Excessive RAM usage when feeding a TFRecord file into any model


Problem

I have a *.tfrecords file that I want to feed into a ConvLSTM2D model built with TensorFlow. Here is the model structure.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import ConvLSTM2D, BatchNormalization, Flatten, Dense

model = Sequential([
    ConvLSTM2D(64, (3, 3), activation='relu', input_shape=(20, 224, 224, 3), return_sequences=True),
    BatchNormalization(),
    ConvLSTM2D(64, (3, 3), activation='relu', return_sequences=True),
    BatchNormalization(),
    Flatten(),
    Dense(1, activation='sigmoid')
])

When I try to fit my data into the model, it takes up all of the system RAM.

Tested on an M1 MacBook (2020) in Jupyter Notebook and PyCharm, and on Google Colab.

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_input_fn(), steps_per_epoch=5, validation_data=val_input_fn(), epochs=10)

What we are doing

We have the Shanghai dataset, which contains fight and non-fight videos, and we are trying to classify fight versus non-fight videos using a Convolutional Long Short-Term Memory (ConvLSTM) model.

We have 800 training videos. We capture frames at a 250 ms interval, convert them into NumPy arrays, and then store all the arrays in a TFRecord file.
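As a rough illustration of that writing step, here is a minimal sketch of how each clip could be serialized into the TFRecord file. The helper name frames_to_example, the feature keys "frames"/"label", the output file name, and the clips iterable are illustrative assumptions, not our actual code:

import numpy as np
import tensorflow as tf

def frames_to_example(frames: np.ndarray, label: int) -> tf.train.Example:
    # frames: uint8 array of shape (20, 224, 224, 3) for one clip (assumed)
    feature = {
        "frames": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[frames.tobytes()])),
        "label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# Write one Example per clip so the reader can later stream clip by clip.
with tf.io.TFRecordWriter("train.tfrecords") as writer:
    for frames, label in clips:  # clips: iterable of (frame array, 0/1 label) pairs
        writer.write(frames_to_example(frames, label).SerializeToString())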

We pass the dataset into our model using the function train_input_fn(), which reads the TFRecord file and feeds the data to the model.


You can see the Colab notebook here.
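For reference, this is a minimal sketch of what a streaming train_input_fn() could look like with tf.data, so that records are decoded and batched on the fly instead of being materialized in RAM. The feature keys, file name, clip shape, and batch size are assumptions based on the description above, not the code from the actual notebook:

import tensorflow as tf

CLIP_SHAPE = (20, 224, 224, 3)  # frames per clip, height, width, channels (assumed)

def parse_example(serialized):
    features = tf.io.parse_single_example(
        serialized,
        {
            "frames": tf.io.FixedLenFeature([], tf.string),
            "label": tf.io.FixedLenFeature([], tf.int64),
        },
    )
    frames = tf.io.decode_raw(features["frames"], tf.uint8)
    frames = tf.reshape(frames, CLIP_SHAPE)
    frames = tf.cast(frames, tf.float32) / 255.0  # normalize on the fly
    return frames, tf.cast(features["label"], tf.float32)

def train_input_fn(batch_size=4):
    return (
        tf.data.TFRecordDataset("train.tfrecords")
        # Shuffle before map so the buffer holds serialized records,
        # not decoded float32 tensors.
        .shuffle(64)
        .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
        .batch(batch_size)
        .repeat()  # fit() controls epoch length via steps_per_epoch
        .prefetch(tf.data.AUTOTUNE)
    )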

The dataset structure is given below:

Dataset
    - train
        - Fight      # has 800 *.avi files
        - NonFight   # has 800 *.avi files
    - val
        - Fight      # has 200 *.avi files
        - NonFight   # has 200 *.avi files

What have we tried?

  • We have tried reducing the batch_size from 64 down to 16.
  • Reduced the training set from 800 videos down to 200 videos.
  • Tried reducing the filter size of ConvLSTM2D.
  • Did all of the same things with *.mp4 files.
  • Removed one BatchNormalization() and one ConvLSTM2D layer.

1 Answer

Govind Hrishikesh

Try doing it with PySpark rather than using up your own RAM. You will find that most big-data-based ML/DL solutions are done using Spark.
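For context, here is a minimal sketch of what reading the TFRecord file from Spark could look like. This assumes an external connector such as spark-tfrecord (or the older spark-tensorflow-connector) is available on the Spark classpath, and the file name train.tfrecords is an assumption, not from the original post:

from pyspark.sql import SparkSession

# The "tfrecord" data source is not built into Spark; it comes from an
# external connector package that must be added to the Spark session.
spark = (
    SparkSession.builder
    .appName("tfrecord-inspection")
    .getOrCreate()
)

# Load the TFRecord file as a DataFrame of tf.train.Example records.
df = (
    spark.read.format("tfrecord")
    .option("recordType", "Example")
    .load("train.tfrecords")
)

df.printSchema()
print(df.count())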