Video Analytics in Scala with Akka Actors, FFmpeg, GraphicsMagick, OpenCV and OpenIMAJ; Retrospective- A valuable learning experience

Architecture Diagram generated from repo source

This is going to be a long post. You can skip to the Design section if you don’t want to go through some background history of the project.

I first wrote Scala almost 6 years ago (and it was the last time for a project of this scale). Back then the word around was that Ruby would become obsolete as Rails was losing its charm after setting the web development world on the path of convention over configuration. Scala was a relatively new kid on the block, with Twitter having recently invested a lot of its stack development in it.

The whole idea of video analytics was quite a buzz 6 years ago. Around the time I started this, Azure was not even a cloud player; now it is the 2nd biggest player and has beaten AWS to the Pentagon's billion-dollar cloud project. The world was still running its own racks or getting dedicated servers in a data center somewhere on the globe; the cloud was not the cloud we know today. So this seemed like a daunting task: not many resources about how to go about it were available online, the tools to scale in the cloud were not readily available for commercial purposes, and neither were open source projects for doing such analysis. All you could find were some sample projects using the libraries.

As a young developer with just over 3 years of experience in the industry, this looked like an opportunity to disrupt the big players with commercial offerings. I named the project Retrospective then, and looking back at it now, it is a retrospective of that journey.

Being primarily a Ruby developer, I wanted to do this in Ruby and even found an OpenCV binding for it on GitHub called ropencv. It lacked resources on how to work with it and had limited features available. Having limited past experience with OpenCV, I managed to run the classifiers using ropencv and contributed examples for future users in PR #18.

The performance was not at a standard where you would be able to process videos in a reasonable amount of time. The next obvious choices were C/C++ or the JVM. I had never written C++ in my life and never enjoyed the limited amount of C that I did write. Java, on the other hand, was a decent compromise, but its boilerplate was a nuisance to a Ruby developer. Exploring other languages, I narrowed it down to Groovy and Scala. The choice ended up being Scala due to its strong support for actors, which seemed very impressive for multitasking. Luckily OpenCV has a guide for generating Java bindings, but none were available to download and use, so I open sourced the bindings on GitHub for anyone to use.

Enough of the background on how, why and when this project commenced. I will come back to why it was never pursued with commercial intent, why it has not been worked on in the last 6 years, and what the takeaways from this project are.

Design

Now on to how the whole program works, or how I designed it to run and process gigabytes of files on a modest HP Pavilion dm-3000ea laptop with 16 GB of RAM, in significantly less time than the total duration of the videos.

Videos are nothing but a sequence of images, and that was the primary unit of analysis: each video was broken down to the level of images. Each video to be processed was termed a Footage, which was broken down into chunks called Videos; each Video was further broken down into Clips, from which images were extracted for analysis. Each breakdown was configurable down to a minute. Say you have a Footage of 1 hour: it could be broken down into 15-minute Video chunks, each of which would then be broken down into 5-minute Clips, with frames extracted from each Clip. The clips could be generated down to 1-minute intervals for analysis.
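As a rough sketch (not code from the project), the chunk start offsets for this Footage → Video → Clip breakdown can be computed like so, with all durations in minutes:

```scala
object ChunkPlanner {
  // Start offsets of equal-sized chunks covering a total duration,
  // e.g. a 60-minute footage in 15-minute videos starts at 0, 15, 30, 45.
  def chunkStarts(totalMins: Int, chunkMins: Int): Seq[Int] =
    0 until totalMins by chunkMins
}
```

So `ChunkPlanner.chunkStarts(60, 15)` yields the video offsets within the footage, and `ChunkPlanner.chunkStarts(15, 5)` the clip offsets within each video.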


The breakdown can be seen in the folder structure above, which should help in understanding the basic unit of processing.

Each Footage was owned by an Akka actor (Akka being an amazing toolkit for building reactive and concurrent systems). Each Footage was assigned to a FootageActor, which would convert the footage into chunks of Video, where each video was owned by its own VideoActor, a child of the FootageActor. The VideoActor would convert each video into clips, and each clip in the same manner was owned by a ClipActor, a child of the VideoActor. So one FootageActor can have many child VideoActors, and each VideoActor can have many ClipActors. This made a hierarchy of workers, each working in its own space and doing analysis per clip.
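A minimal sketch of what that hierarchy could look like with classic Akka actors. The message types (ProcessFootage, ProcessVideo, ProcessClip) are my own invention for illustration; the project's actual protocol is not shown in this post:

```scala
import java.io.File
import akka.actor.{Actor, Props}

// Hypothetical messages -- the original protocol isn't shown in the post.
case class ProcessFootage(videoChunks: Seq[File])
case class ProcessVideo(clips: Seq[File])
case class ProcessClip(clip: File)

// One FootageActor fans out a child VideoActor per video chunk.
class FootageActor extends Actor {
  def receive = {
    case ProcessFootage(videoChunks) =>
      // In the real project the chunks came from FFmpeg; here they are given.
      videoChunks.foreach { _ =>
        context.actorOf(Props[VideoActor]) ! ProcessVideo(Seq.empty)
      }
  }
}

// Each VideoActor fans out a child ClipActor per clip.
class VideoActor extends Actor {
  def receive = {
    case ProcessVideo(clips) =>
      clips.foreach(clip => context.actorOf(Props[ClipActor]) ! ProcessClip(clip))
  }
}

// ClipActor is the leaf worker: frames are extracted and analysed here.
class ClipActor extends Actor {
  def receive = {
    case ProcessClip(clip) => // run the per-clip processors
  }
}
```

Creating children with `context.actorOf` is what makes each VideoActor a child of its FootageActor, giving the supervision hierarchy described above.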


FFmpeg was used to create the Video and Clip chunks from the Footage. It was simpler to use the CLI tool via Scala than a Java library which was not feature complete. To run CLI commands throughout the code, I wrote a small helper class that executes commands and captures their output to report success or errors.

 
package services

import grizzled.slf4j.Logging
import sys.process._
import scala.collection.mutable.ListBuffer

object CommandService {

  private object Locker
  val cmd: CommandService = new CommandService

  def execute(command: String): String = {
    //Locker.synchronized {
    cmd << command
    cmd.execute
    cmd.outputLogAll
    //}
  }
}

class CommandService extends Logging {
  var commands = new ListBuffer[String]
  val out = new StringBuilder
  val err = new StringBuilder

  // Capture stdout and stderr of the external process
  val outputLogger = ProcessLogger(
    (o: String) => out.append(o),
    (e: String) => err.append(e)
  )

  // Auxiliary constructor: must delegate to the primary constructor first
  def this(command: String) {
    this()
    commands += command
  }

  def <<(command: String) = commands += command

  def cmdNew = commands += "&&"

  def printCmd = info(createCommand)

  // Run the accumulated command line via sys.process and clear the buffer
  def execute {
    val command: String = createCommand
    info(command)
    command ! outputLogger
    commands.clear
  }

  def createCommand = commands.mkString(" ")

  def outputLog = outputLogger

  def outputLogOut = out.mkString

  def outputLogErr = err.mkString

  def outputLogAll = f"$outputLogOut %n $outputLogErr"

  def printStdOutput {
    printStdOut
    printStdErr
  }

  def printStdOut {
    info("output start")
    info(out.mkString)
    info("output end")
  }

  def printStdErr {
    info("error start")
    info(err.mkString)
    info("error end")
  }
}
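As a hypothetical usage example, a clip-cutting command can be assembled from parts in the same space-joined fashion CommandService uses and then handed to `CommandService.execute`. The flags -ss (start offset), -t (duration) and "-c copy" (stream copy, no re-encode) are standard ffmpeg options, though whether the project used exactly these is an assumption:

```scala
object ClipCommand {
  // Build an ffmpeg command that cuts a clip of durSecs starting at
  // startSecs out of the input file, copying streams without re-encoding.
  def cut(input: String, startSecs: Int, durSecs: Int, output: String): String =
    Seq("ffmpeg", "-ss", startSecs.toString, "-t", durSecs.toString,
        "-i", input, "-c", "copy", output).mkString(" ")
}
```

`CommandService.execute(ClipCommand.cut("video0.mp4", 0, 300, "clip0.mp4"))` would then shell out and capture the output.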

Once we had our basic unit of processing, it was time to do some image processing on the extracted frames. As many processors as you like can be added to the design, since each process is performed by an actor in parallel. The main processes run in parallel before the heatmaps were generated were the following.

  • Background Subtraction and Connected Component
  • Classifiers: Haar and HOG
  • ORB (Oriented FAST and Rotated BRIEF)

Background Subtraction

OpenCV was used for background subtraction. The background was subtracted per frame to identify changes between frames. This helped in identifying any motion in the video and also indicated the particular regions occupied by objects, along with their duration, which made it possible to create heatmaps of indoor and outdoor locations.
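The project relied on OpenCV's background subtractors; stripped of the library, the core idea per pixel is just thresholded differencing against a background model. A naive illustration on grayscale frames (0-255 per pixel, flattened to arrays):

```scala
object BackgroundSubtraction {
  // A pixel is foreground (255) when it differs from the background model
  // by more than the threshold, otherwise background (0). Real subtractors
  // such as OpenCV's MOG2 also update the model over time; this doesn't.
  def subtract(background: Array[Int], frame: Array[Int], threshold: Int): Array[Int] =
    background.zip(frame).map { case (b, f) =>
      if (math.abs(f - b) > threshold) 255 else 0
    }
}
```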


Connected Components

To calculate the connected components from the background-subtracted images, OpenCV was tried initially, but it had some issues and would cause segmentation faults. OpenIMAJ, another great library for image analysis, was used instead, and the results were decent enough to identify the active locations that people interacted with.
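OpenIMAJ did the real work here; as an illustration of the underlying idea, a minimal 4-connected flood-fill labelling over a binary mask (non-zero = foreground) looks like this:

```scala
object ConnectedComponents {
  // Count 4-connected foreground regions in a binary mask via flood fill.
  def count(mask: Array[Array[Int]]): Int = {
    val rows = mask.length
    val cols = if (rows > 0) mask(0).length else 0
    val seen = Array.ofDim[Boolean](rows, cols)
    var components = 0
    // Start a flood fill from every unvisited foreground pixel.
    for (r <- 0 until rows; c <- 0 until cols
         if mask(r)(c) != 0 && !seen(r)(c)) {
      components += 1
      var stack = List((r, c))
      seen(r)(c) = true
      while (stack.nonEmpty) {
        val (y, x) = stack.head
        stack = stack.tail
        // Visit the 4 neighbours that are in bounds, foreground and unseen.
        for ((ny, nx) <- Seq((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
             if ny >= 0 && ny < rows && nx >= 0 && nx < cols &&
               mask(ny)(nx) != 0 && !seen(ny)(nx)) {
          seen(ny)(nx) = true
          stack = (ny, nx) :: stack
        }
      }
    }
    components
  }
}
```

Each component found this way corresponds to one contiguous blob in the background-subtraction output, i.e. one "active location".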


Cascade Classifier

Nowadays identification classifiers have progressed by miles, and it is quite simple to build object-identifying programs using toolkits such as TensorFlow. OpenCV had two basic classifiers at that time, and they did a decent job of identifying objects when run together, with their results combined and outliers rejected.
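The post doesn't spell out the exact combination scheme, but one plausible reading of "run together, outliers rejected" is to keep only detections that both classifiers agree on, using bounding-box overlap as the agreement test. A sketch under that assumption:

```scala
// A detection bounding box; two boxes overlap when their areas intersect.
case class Box(x: Int, y: Int, w: Int, h: Int) {
  def overlaps(o: Box): Boolean =
    x < o.x + o.w && o.x < x + w && y < o.y + o.h && o.y < y + h
}

object DetectionMerge {
  // Keep a Haar detection only if some HOG detection overlaps it,
  // rejecting boxes that only one classifier produced as outliers.
  def confirmed(haar: Seq[Box], hog: Seq[Box]): Seq[Box] =
    haar.filter(h => hog.exists(h.overlaps))
}
```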


Heatmaps

This was my favourite feature of the whole project and the initial use case: identifying the hot zones in indoor and outdoor videos. It worked like a charm and was probably the most accurate of all the features built on the background subtraction output.
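A heatmap over background-subtraction output can be sketched as a per-pixel occupancy count across frames: the hotter a pixel, the more frames in which it was foreground. This is an illustration of the idea, not the project's code, and leaves out normalisation and colouring:

```scala
object Heatmap {
  // Sum per-pixel occupancy over a sequence of 0/255 foreground masks.
  // Each mask pixel contributes 1 when foreground, 0 when background.
  def accumulate(masks: Seq[Array[Int]]): Array[Int] =
    masks.map(_.map(_ / 255))
      .reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
}
```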


After all the processors had run came the job of accumulating the results: combining them per clip, then per video, and then at the footage level. This was also done using actors. An AccumulatorActor would combine results per clip and send them to the VideoActor, which would do the same by running an AccumulatorActor per video and then send the result on to the FootageActor.
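The accumulation step is essentially the same combine operation applied at every level of the hierarchy (clip → video → footage), a monoid-style fold. A sketch, with ClipResult as a hypothetical stand-in for whatever a clip's analysis actually produced:

```scala
// Hypothetical result type: assume each clip yields foreground "hit"
// counts per named zone of the frame.
case class ClipResult(hitsPerZone: Map[String, Int])

object Accumulator {
  // Combine two results by summing counts zone by zone; the same step
  // merges clips into a video result and video results into a footage result.
  def combine(a: ClipResult, b: ClipResult): ClipResult =
    ClipResult(
      (a.hitsPerZone.keySet ++ b.hitsPerZone.keySet).map { z =>
        z -> (a.hitsPerZone.getOrElse(z, 0) + b.hitsPerZone.getOrElse(z, 0))
      }.toMap
    )

  def reduceAll(results: Seq[ClipResult]): ClipResult =
    results.reduce(combine)
}
```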

This was all done using the Strategy pattern in Akka. Out of the experience of this project, I also got to share this as an answer to a question on Stack Overflow.

import akka.actor.Actor

case class EventOperation[T <: Event](eventType: T)

class OperationActor extends Actor {

  def receive = {
    case EventOperation(eventType) => eventType.execute()
  }

}

trait Event {
  def execute(): Unit // implement execute in the specific event class
}

class Event1 extends Event { def execute() = { /* business logic */ } }
class Event2 extends Event { def execute() = { /* business logic */ } }

This pretty much sums up the whole Retrospective project. The reason I gave up on Scala after this was of the language's own doing: Scala 3 was on the rise, with a Python 3 moment around the corner for it. The community was also split, trying to take each other down, which isn't a welcoming experience for newcomers. There are some really great individuals who can be credited with having newcomers join the Scala club, but for me the road ended with this project.

As for the project, it never got its chance to be commercialised due to the lack of demand; this was probably well before its time, before people understood the importance of video analytics.

Nonetheless, this project introduced me to a lot of challenges and gave me the opportunity to solve them. It gave me experience of working on a reactive, concurrent system. Nowadays, creating projects that identify objects is a piece of cake with the advances over time, but doing that at scale over gigabytes of footage is what this project turned out to be a model for, rather than just another object identifier in videos.

At some point I might put the source code on Github but for now give this a like if you learned something or enjoyed reading it.

Coder during the day, squash player in the evening and cricketer over the weekends. Doubts are the ants in the pants, that keep faith moving
