What's the quickest way to perform a convolution on multiple images stored in numpy arrays, in Python?


I would like to automate a process I usually do on videos, instead of repeating the same steps in a video editor every time. For example, I would like to sharpen a video with my own custom kernels. In most video editors this is only feasible by copying the movie clip multiple times and blending the copies appropriately, which is a tedious task.

My plan is to split the clip into individual images, process each of them with a Python script, and then recompile them into a video clip.

I have a Python script that opens each image with PIL, stores it in a numpy array, and then applies a function to it, namely a convolution with the kernel.

An example code is:

from PIL import Image
import numpy as np

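def convolve(img, w, h):
  kernel = [x, y, z]  # placeholder taps; normally a separable 3x3 kernel, so a 1D convolution is performed twice
  out = np.copy(img)

  # first convolution: vertical pass on channel 0 (Y); the first and last rows keep their original values
  for i in range(1, h - 1):
    for j in range(w):
      acc = 0  # reset the accumulator for every pixel
      for k in range(-1, 2):
        acc += int(img[i+k,j,0]) * kernel[k+1]
      out[i,j,0] = min(max(round(acc), 0), 255)  # clamp to the uint8 range

  # second convolution: horizontal pass on the first result; the first and last columns keep the first-pass values
  out2 = np.copy(out)
  for i in range(h):
    for j in range(1, w - 1):
      acc = 0
      for k in range(-1, 2):
        acc += int(out[i,j+k,0]) * kernel[k+1]
      out2[i,j,0] = min(max(round(acc), 0), 255)
  return out2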

img = Image.open("file")
img = img.convert("YCbCr")
arr = np.asarray(img)
h, w, d = np.shape(arr)
conv = convolve(arr, w, h)
new_img = Image.fromarray(conv, mode="YCbCr")
new_img.save("outputfile", [...parameters...])

This code is fine for standalone images, yet it is incredibly inefficient for a series of images. Granted, I'm doing the convolution by hand, and in Python of all tools, but I am not aware of another way to perform a convolution efficiently.
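For comparison, here is a minimal sketch of the same two-pass idea written with whole-array slicing instead of per-pixel loops, which is usually where most of the time goes in pure Python. The names `filter_axis` and `separable_filter` are hypothetical, the edge-replication border handling is an assumption (the loop version does not specify one), and the blur taps in the usage note are purely illustrative.

```python
import numpy as np

def filter_axis(arr, kernel, axis):
    """Apply a 3-tap kernel along one axis using shifted slices."""
    a = np.moveaxis(arr, axis, 0)
    # Assumption: replicate edge values so the output keeps the input size.
    pad = [(1, 1)] + [(0, 0)] * (a.ndim - 1)
    p = np.pad(a, pad, mode="edge")
    # Neighbours -1, 0, +1 weighted by kernel[0], kernel[1], kernel[2],
    # the same indexing as the loop version.
    out = kernel[0] * p[:-2] + kernel[1] * p[1:-1] + kernel[2] * p[2:]
    return np.moveaxis(out, 0, axis)

def separable_filter(img, kernel):
    """Filter channel 0 of a uint8 array shaped (..., H, W, C)."""
    y = img[..., 0].astype(np.float32)
    y = filter_axis(y, kernel, axis=-2)  # vertical pass
    y = filter_axis(y, kernel, axis=-1)  # horizontal pass
    out = img.copy()
    # Round and clamp before storing back into the uint8 array.
    out[..., 0] = np.clip(np.rint(y), 0, 255).astype(np.uint8)
    return out
```

Something like `separable_filter(arr, [0.25, 0.5, 0.25])` would then blur the Y channel of one frame. Because the spatial axes are addressed from the end, a stack of frames shaped `(N, H, W, C)` can probably be passed in one call as well. If SciPy is available, `scipy.ndimage.convolve1d` offers a per-axis filter that may serve the same purpose.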

What I've come up with is using numpy's convolve function, but it only accepts 1D arrays, so it does not work on 3D image arrays directly. Another method would be to transform the images into the frequency domain with a DCT or Fourier transform, multiply them by the kernel's transform, and then transform the result back to pixels, but this sounds like too much work and would counteract the efficiency.
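To make the frequency-domain idea concrete, here is a rough sketch for a single image plane using `numpy.fft`. It assumes circular (wrap-around) borders, which is what a plain FFT product gives, and the helper name `fft_filter` is hypothetical. Note that the kernel has to be zero-padded to the image size and transformed as well before the multiplication.

```python
import numpy as np

def fft_filter(plane, kernel2d):
    """Circular 2D convolution of one image plane via the FFT."""
    h, w = plane.shape
    kh, kw = kernel2d.shape
    # Zero-pad the kernel to the image size and shift its centre to (0, 0)
    # so the filtered image is not displaced.
    padded = np.zeros((h, w), dtype=np.float64)
    padded[:kh, :kw] = kernel2d
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    # Pointwise product in the frequency domain equals circular convolution.
    spectrum = np.fft.rfft2(plane) * np.fft.rfft2(padded)
    return np.fft.irfft2(spectrum, s=(h, w))
```

The cost of the transforms does not depend on the kernel size, so this route tends to pay off for large kernels. For a 3-tap separable kernel the direct approach is likely cheaper.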

So this approach does get rid of the repetitive video editing, but at the cost of taking far too long to render the final video, even for trivial tasks like sharpening or blurring a clip.

Thanks in advance
