Goal: To extract all the frames from a video, process the frames in a Machine Learning model and rebuild a new video out of those processed frames.
Problem: I am currently able to extract frames from a video, but only very slowly, at about 2 frames per second. I want to speed it up to at least 10-20 frames per second.
I am using Android's native MediaMetadataRetriever class to extract the frames as Bitmaps and store them in a List.
Here's the code:
    // setDataSource, extractMetadata and getFrameAtTime are MediaMetadataRetriever methods.
    fun getAllFrames(uri: Uri): List<Bitmap> {
        val frameList = ArrayList<Bitmap>()
        setDataSource(context, uri)
        // Playback duration of the data source, in ms.
        val duration: String? = extractMetadata(METADATA_KEY_DURATION)
        // Long arithmetic avoids Int overflow for clips longer than about 35 minutes.
        val durationMicros = duration!!.toLong() * 1000L
        // Step for a 30 fps output (input can be 60 or 30 fps), in microseconds.
        // 1_000_000L / 30 gives 33333; (1000/30)*1000 would truncate to 33000.
        val frameStepMicros = 1_000_000L / 30
        for (i in 0L..durationMicros step frameStepMicros) {
            val frame = getFrameAtTime(i, OPTION_CLOSEST)
            frame?.let {
                frameList.add(frame)
            }
        }
        return frameList
    }
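As a side note on the sampling times: a fixed step in whole microseconds cannot represent 1/30 s exactly, so stepping by a constant lets a small rounding error build up over a long clip. A minimal sketch of one way around that, deriving each timestamp from the frame index instead (the function name frameTimestampsMicros is hypothetical, not part of any Android API):

```kotlin
/**
 * Returns the timestamps (in microseconds) at which to sample a clip of
 * [durationMs] milliseconds in order to get [fps] frames per second.
 * Hypothetical helper for illustration; the end of the clip is inclusive.
 */
fun frameTimestampsMicros(durationMs: Long, fps: Int): List<Long> {
    require(durationMs >= 0) { "durationMs must not be negative" }
    require(fps > 0) { "fps must be positive" }
    val durationUs = durationMs * 1_000L
    val timestamps = ArrayList<Long>()
    var index = 0L
    while (true) {
        // Compute from the frame index so rounding does not accumulate.
        val t = index * 1_000_000L / fps
        if (t > durationUs) break
        timestamps.add(t)
        index++
    }
    return timestamps
}
```

For a 60-second clip at 30 fps this yields 1801 timestamps, from 0 through 60,000,000 microseconds inclusive. Each value could then be passed to getFrameAtTime.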
I have looked into the FFmpeg and JavaCV libraries, but I didn't see a method that extracts all the frames accurately and efficiently (maybe I've missed it?). Instead of passing a time interval to getFrameAtTime, I want a method like grabAllFrames.
Can anyone give me any hints on how to achieve this goal?
Nearly all videos you are processing will be encoded in one format or another, and these encoding formats, e.g. H.264, will usually compress the video to save storage and bandwidth.
The trade-off is that you need to do work to decompress the video and get each frame, and the more 'efficient' the codec, the more work the decoder usually has to do.
Most devices have dedicated hardware-accelerated paths to decode and display common video encoding formats, but these are usually optimised for displaying a video rather than analysing and modifying it.
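To make the decoding step concrete, here is a rough sketch of decoding a file sequentially with Android's MediaExtractor and MediaCodec classes, rather than seeking to a timestamp for every frame. These two classes are my suggestion and are not mentioned in the question; decodeAllFrames and onFrame are hypothetical names, and error handling is minimal. Treat it as a starting point under those assumptions, not a tuned implementation.

```kotlin
import android.content.Context
import android.media.MediaCodec
import android.media.MediaExtractor
import android.media.MediaFormat
import android.net.Uri

// Hypothetical helper: decodes every video frame in order and reports its timestamp.
fun decodeAllFrames(context: Context, uri: Uri, onFrame: (presentationTimeUs: Long) -> Unit) {
    val extractor = MediaExtractor()
    extractor.setDataSource(context, uri, null)
    // Find the first video track in the container.
    val trackIndex = (0 until extractor.trackCount).firstOrNull { i ->
        extractor.getTrackFormat(i).getString(MediaFormat.KEY_MIME)?.startsWith("video/") == true
    }
    if (trackIndex == null) {
        extractor.release()
        return
    }
    extractor.selectTrack(trackIndex)
    val format = extractor.getTrackFormat(trackIndex)
    val mime = format.getString(MediaFormat.KEY_MIME)!!
    val codec = MediaCodec.createDecoderByType(mime)
    // No output Surface: decoded frames arrive as raw (typically YUV) buffers.
    codec.configure(format, null, null, 0)
    codec.start()
    val info = MediaCodec.BufferInfo()
    var inputDone = false
    var outputDone = false
    try {
        while (!outputDone) {
            if (!inputDone) {
                val inIndex = codec.dequeueInputBuffer(10_000)
                if (inIndex >= 0) {
                    val buffer = codec.getInputBuffer(inIndex)!!
                    val size = extractor.readSampleData(buffer, 0)
                    if (size < 0) {
                        // No more compressed samples: signal end of stream to the decoder.
                        codec.queueInputBuffer(inIndex, 0, 0, 0, MediaCodec.BUFFER_FLAG_END_OF_STREAM)
                        inputDone = true
                    } else {
                        codec.queueInputBuffer(inIndex, 0, size, extractor.sampleTime, 0)
                        extractor.advance()
                    }
                }
            }
            val outIndex = codec.dequeueOutputBuffer(info, 10_000)
            if (outIndex >= 0) {
                // Negative values (try again later, format changed) are simply skipped.
                if (info.size > 0) onFrame(info.presentationTimeUs)
                codec.releaseOutputBuffer(outIndex, false)
                if ((info.flags and MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) outputDone = true
            }
        }
    } finally {
        codec.stop()
        codec.release()
        extractor.release()
    }
}
```

The sketch only reports each frame's timestamp. Getting pixels out means reading the output buffer (or decoding to a Surface) before releasing it, and converting from the decoder's colour format, which varies by device.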
If you have the option to do the work on the server side, it is usually much easier because of the greater processing power available and the wider set of libraries and services you can draw on.
If you do have to work on the mobile device, then it may be worth looking at OpenCV for Android, with the caveat that it can be tricky to compile and that the documentation is usually Eclipse-based.
Certainly you should be able to achieve better than 2 frames per second if your analysis of each frame is not too processor hungry.
A good simple example to look at first is the color blob detection sample, which detects an object or blob of a specific color in each frame: https://github.com/opencv/opencv/tree/master/samples/android/color-blob-detection/src/org/opencv/samples/colorblobdetect
This answer provides an annotated extract to explain how it works: https://stackoverflow.com/a/40918718/334402
It's worth adding that Machine Learning use cases may also be processor and time hungry, so it may be the combination of frame extraction and model inference that is slowing you down in your use case.
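One way to find out which of the two is the real cost is to time the stages separately over a handful of frames. A minimal sketch in plain Kotlin, where StageTiming and profileStages are hypothetical names and the decode and process lambdas stand in for your own frame extraction and model call:

```kotlin
/** Average cost per frame of each stage, in milliseconds. */
data class StageTiming(val decodeMs: Double, val processMs: Double) {
    /** Throughput when the two stages run one after the other. */
    val framesPerSecond: Double get() = 1000.0 / (decodeMs + processMs)

    /** Name of the slower stage. */
    val bottleneck: String get() = if (decodeMs >= processMs) "decode" else "process"
}

/** Runs both stages for [frameCount] frames and averages the time spent in each. */
fun <F> profileStages(frameCount: Int, decode: (Int) -> F, process: (F) -> Unit): StageTiming {
    require(frameCount > 0) { "frameCount must be positive" }
    var decodeNanos = 0L
    var processNanos = 0L
    for (i in 0 until frameCount) {
        val t0 = System.nanoTime()
        val frame = decode(i)
        val t1 = System.nanoTime()
        process(frame)
        val t2 = System.nanoTime()
        decodeNanos += t1 - t0
        processNanos += t2 - t1
    }
    return StageTiming(decodeNanos / 1e6 / frameCount, processNanos / 1e6 / frameCount)
}
```

For example, if extraction averages 400 ms per frame and inference 100 ms, the sequential throughput is 1000 / 500 = 2 frames per second and extraction is the stage to attack first.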