SIAT Architecture

SIAT: Video Data Mining Layer

The Distributed Video Data Mining Layer (DVDML) is responsible for producing high-level semantic results from the features generated by the DVDPL. It provides two lower-level services: batched video data processing and real-time video stream processing. The batch processing service defines a Video Classifier, Video Retrieval, and a Video Annotator, while the real-time video stream processing service defines a Future Behavior Predictor, a Video Classifier, an Object Tracker, and a Video Annotator. The Video Classifier classifies videos against pre-trained models, the Video Annotator automatically assigns tags to the objects or actions appearing in the videos, and Video Retrieval retrieves visually similar videos from the database. Lastly, the Future Behavior Predictor predicts near-future behaviors or actions by analyzing the preceding video frames. A parameter estimator estimates the parameters of these mining models. The result of the DVDML is stored in the High-Level Result DS in the Big Data Curation Layer, which serves as input to the Knowledge Curation Layer (KCL).
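To illustrate the idea behind the Video Retrieval service, the following sketch ranks stored videos by the similarity of their feature vectors to a query video's features. This is a simplified, framework-free illustration (the function names and the use of cosine similarity are our assumptions, not the SIAT implementation):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve_similar(query_features, database, top_k=2):
    """Rank stored videos by feature similarity to the query video.

    `database` maps a video id to its (already extracted) feature vector.
    """
    scored = [(vid, cosine_similarity(query_features, feats))
              for vid, feats in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [vid for vid, _ in scored[:top_k]]
```

In the actual layer, such a ranking would run in a distributed fashion over the feature data produced by the DVDPL rather than over an in-memory dictionary.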

Spark MLlib is a library built on top of Apache Spark that provides access to a large number of machine learning algorithms. It comprises fast and scalable implementations of common learning algorithms, including classification, clustering, regression, collaborative filtering, and dimensionality reduction. It also supports numerous underlying statistics, linear algebra, and optimization primitives, including a generic gradient descent optimization algorithm. In our work, we have integrated Spark MLlib to perform distributed mining on the low-level feature data extracted by the DVDPL.
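The generic gradient descent primitive mentioned above can be summarized in a few lines: given a gradient function for some loss, repeatedly step against the gradient. The sketch below is a plain-Python illustration of that primitive on a toy least-squares problem (MLlib itself exposes this through its own Scala/Java/Python APIs, which we do not reproduce here):

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Minimize a loss by stepping against its gradient."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Toy example: fit w in the model y = w * x by minimizing
# the mean squared error over a few (x, y) pairs.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

def grad_mse(w):
    # d/dw of mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
    return sum(2.0 * (w * x - y) * x for x, y in data) / len(data)

w = gradient_descent(grad_mse, w0=0.0)  # w converges toward 2.0
```

In MLlib the same idea is applied to distributed datasets, with per-partition gradients aggregated across the cluster at each step.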

Recently, data-driven and computationally intensive approaches such as neural networks, or deep learning (DL), have attracted considerable attention from the research community because of their effectiveness at extracting hierarchical features from raw data. Accordingly, they enable the discovery of meaningful information from vast, large-scale databases. Among DL techniques, the Convolutional Neural Network (CNN) is the most popular, with a particular focus on learning from and classifying image data. Building on the success of deep learning, several tools and libraries (e.g., TensorFlowOnSpark, SparkNet, DL4J, and BigDL) have been developed by integrating DL capabilities with big data frameworks such as Apache Spark. However, most current tools do not support video processing well, and Spark MLlib also lacks DL functions. Therefore, to complement our Spark MLlib-based video data mining and overcome these limitations, we further incorporate DL libraries into our DVDML. Our DL library not only fits into the Spark APIs but also provides a uniform set of APIs for programming video understanding applications. In other words, this integration makes DL more accessible for scalable video processing. Within the scope of this paper, we develop our DL library on top of the DeepLearning4j stack, which provides comprehensive support for DL technologies such as training and inference in a distributed setting.
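The classifier trained on top of the extracted deep features in our experiments is a softmax classifier. As a self-contained illustration of what such a classifier computes — not the DeepLearning4j API we actually use — the following plain-Python sketch trains a softmax (multinomial logistic regression) model with the standard cross-entropy gradient:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def train_softmax(samples, n_classes, n_feats, lr=0.5, epochs=200):
    """SGD on cross-entropy loss; samples are (feature_vector, label) pairs."""
    W = [[0.0] * n_feats for _ in range(n_classes)]  # class weights
    b = [0.0] * n_classes                            # class biases
    for _ in range(epochs):
        for x, y in samples:
            logits = [sum(W[c][j] * x[j] for j in range(n_feats)) + b[c]
                      for c in range(n_classes)]
            p = softmax(logits)
            for c in range(n_classes):
                err = p[c] - (1.0 if c == y else 0.0)  # dLoss/dlogit_c
                for j in range(n_feats):
                    W[c][j] -= lr * err * x[j]
                b[c] -= lr * err
    return W, b

def predict(W, b, x):
    """Return the class with the highest logit for feature vector x."""
    logits = [sum(wc[j] * x[j] for j in range(len(x))) + bc
              for wc, bc in zip(W, b)]
    return max(range(len(logits)), key=lambda c: logits[c])
```

In the distributed setting, DeepLearning4j performs the equivalent training across Spark workers over feature vectors produced by the CNN feature extractor.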

Scalability test of the deep learning API for distributed feature extraction and classifier training. (a) Running time for deep feature extraction on the Hollywood2 dataset; (b) running time for deep feature extraction on the UCF50 dataset; (c) running time for training the softmax classifier on the Hollywood2 dataset; (d) running time for training the softmax classifier on the UCF50 dataset.
