SIAT Architecture

SIAT: Video Data Processing Layer

Our work integrates the image and video processing library JavaCV with Spark so as to stay within the same ecosystem as much as possible. It is not limited to providing basic distributed video processing APIs; it also supports distributed dynamic feature extraction APIs that extract the prominent information from the video data. The Distributed Video Data Processing Layer (DVDPL) is mainly in charge of pre-processing the raw input videos and extracting the important features that are provided as input to the Distributed Video Data Mining Layer. It is composed of two main components, namely, the Distributed Video Pre-processor and the Distributed Feature Extractor. The Distributed Video Pre-processor is responsible for performing basic video processing algorithms such as video conversion, video enhancement, video restoration, video encoding and decoding, video compression, video segmentation, background subtraction, and so forth. The Distributed Feature Extractor is in charge of extracting the predominant features from the videos in a distributed manner; it includes a color feature extractor, dynamic and frame-based texture feature extractors, dynamic and frame-based shape feature extractors, a motion feature extractor, an object feature extractor, and a key-frame extractor. Which pre-processing and feature extraction steps are employed depends on the selection of higher-level services. The output of the Feature Extractor is represented either as a Bag-of-Words or as a histogram and is stored in the low-level result DS in the Big Data Curation Layer.
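To make the histogram representation concrete, the sketch below shows how a normalized histogram can be built over discrete per-frame feature codes (e.g., 256-level LBP codes). This is an illustrative pure-Python fragment, not part of the system's API; the function name `feature_histogram` and the assumption that features arrive as integer codes are ours.

```python
from collections import Counter

def feature_histogram(codes, num_bins=256):
    """Build a normalized histogram over discrete feature codes.

    `codes` is assumed to be a flat list of integer feature codes
    (e.g., LBP codes in [0, num_bins)); the result sums to 1.
    """
    counts = Counter(codes)
    total = len(codes)
    return [counts.get(b, 0) / total for b in range(num_bins)]
```

In the distributed setting, such per-frame histograms would be computed in parallel and then aggregated (e.g., summed and re-normalized) per video before being stored in the low-level result DS.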

Here, we integrated the popular image and video processing library JavaCV with Spark to support basic video processing operations, for example, frame extraction, frame conversion (RGB to gray-scale), and many more. On top of that, we built our distributed video processing and mining APIs, which play an important role in providing any higher-level service. Some sample APIs for distributed video processing are presented in Table 1. Our wrapper APIs allow most video processing operations to run in parallel in a distributed environment, which reduces the processing time dramatically. As in our previous work, we have implemented several video processing algorithms on top of Spark, including edge detection, video encoding, background subtraction, key-frame extraction, and dynamic feature extraction. For distributed dynamic feature extraction, we have built APIs for the Volume Local Binary Pattern (VLBP), the Volume Local Ternary Pattern (VLTP), the Local Binary Pattern from three orthogonal planes (LBP-TOP), and the Directional Local Ternary Pattern from three orthogonal planes (DLTP-TOP). These dynamic feature extraction algorithms operate on a circularly symmetric neighbor set. For edge detection, we have implemented distributed Sobel, Laplacian, and Canny operators. Furthermore, we have developed distributed video encoding using MPEG and H.264, and we have deployed a distributed key-frame extractor using Local Binary Pattern (LBP) and Histogram of Oriented Gradients (HOG) features. The main advantage of these APIs is that developers can produce any higher-level service using them without considering their internals, which keeps development simple. Developers can also build their own APIs and integrate them with the system.
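As an illustration of the texture descriptor underlying the LBP family (LBP-TOP, VLBP) and the LBP-based key-frame extractor, the sketch below computes the basic frame-level 8-neighbor LBP code. This is a minimal pure-Python sketch; the helper names are hypothetical, and the actual system applies such operators per frame in parallel via Spark and JavaCV rather than in plain Python loops.

```python
def lbp_code(patch):
    """Basic 8-neighbor LBP code for the center pixel of a 3x3 patch.

    Neighbors are visited in clockwise (circular) order; each neighbor
    greater than or equal to the center contributes one bit.
    """
    center = patch[1][1]
    neighbors = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                 patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for i, n in enumerate(neighbors):
        if n >= center:
            code |= 1 << i
    return code

def lbp_image(gray):
    """Apply lbp_code to every interior pixel of a 2-D gray-scale frame."""
    h, w = len(gray), len(gray[0])
    return [[lbp_code([row[c - 1:c + 2] for row in gray[r - 1:r + 2]])
             for c in range(1, w - 1)]
            for r in range(1, h - 1)]
```

LBP-TOP extends this idea by computing such codes on the XY, XT, and YT planes of a video volume and concatenating the resulting histograms, which is what captures the dynamic (temporal) texture.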

Figure: Scalability of low-level APIs for distributed video processing on the UCF50 dataset: (a) time (in seconds) required for dynamic feature extraction, (b) time (in seconds) required for edge detection, (c) time (in seconds) required for video encoding, and (d) time (in seconds) required for key-frame extraction.
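For reference, the edge-detection benchmark in panel (b) includes the Sobel operator, which convolves each frame with two 3x3 kernels and combines the horizontal and vertical responses. A minimal single-frame, pure-Python sketch (the Spark distribution layer, which the system adds on top, is not modeled here):

```python
def sobel_magnitude(gray):
    """Gradient magnitude of a 2-D gray-scale frame via 3x3 Sobel kernels."""
    gx_k = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient kernel
    gy_k = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient kernel
    h, w = len(gray), len(gray[0])
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = sum(gx_k[i][j] * gray[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(gy_k[i][j] * gray[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            out[r - 1][c - 1] = (gx * gx + gy * gy) ** 0.5
    return out
```

Because each frame is processed independently, this operator parallelizes naturally across a distributed collection of frames, which is what the scalability curves in panel (b) measure.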


Md Azher Uddin, Joolekha Bibi Joolee, Aftab Alam and Young-Koo Lee, "Human Action Recognition Using Adaptive Local Motion Descriptor in Spark," IEEE Access, 2017.