
SIAT: Intelligent Video Big Data Analytics In The Cloud

In recent years, the number of surveillance cameras deployed in public spaces has increased significantly, and visual data is being produced at an enormous rate. As a result, there is demand for distributed systems for video analytics. However, most existing research on video analytics focuses on improving video content management and relies on a traditional client/server framework. In this project, we develop a scalable and flexible architecture called SIAT, built on top of general-purpose big data technologies, for intelligent video big data analytics in the cloud. The proposed architecture acquires video streams from device-independent data sources using distributed stream and file management systems. High-level abstractions allow researchers to develop and deploy video analytics algorithms and services in the cloud under the as-a-service paradigm.

Video Big Data, Cloud Computing, and Their Relationship

The term big data was popularized by John R. Mashey in the late 1990s and refers to volumes of data that are impractical to store, process, and analyze using traditional data management and processing technologies. The data can be unstructured, semi-structured, or structured, but it is mostly unstructured data that is considered. The definition of big data has evolved and has been described in terms of three, four, or five characteristics. Across the literature, three of these characteristics are shared, i.e., Volume, Velocity, and Variety, while the others are Veracity and Value. Various video stream sources generate a considerable amount of unstructured video data on a regular basis, making video a new application field of big data. The data generated by such sources are further subject to contextual analysis and interpretation to uncover hidden patterns for decision-making and business purposes.

In the context of large volumes of video data, we specialize the generic big data characteristics. Volume refers to the size of the data; surveillance video alone accounts for the majority share, i.e., 65%. Variety refers to the types of data generated by various sources, such as text, pictures, video, voice, and logs. Video data are acquired from multimodal video stream sources, e.g., IP cameras, depth cameras, and body-worn cameras, and from different geolocations, which augments the Variety property. Velocity refers to the pace of data generation and transmission. Video data also possess the Velocity attribute, i.e., a Video Stream Data Source (VSDS) typically produces a video stream 24/7, which is acquired by the data center storage servers. Veracity can be defined as the diversity of quality, accuracy, and trustworthiness of the data. Video data are acquired directly from real-world domains and so meet the Veracity characteristic. Value refers to contextual analysis that extracts significant insights for decision-making and business purposes. Video data has high Value because of its direct relation to the real world. Automatic criminal investigation, illegal vehicle detection, and abnormal activity recognition are some examples of Value extraction. Video data thus dominates almost all of the big data properties, which motivates us to introduce the term Video Big Data.

These five characteristics pose many challenges for organizations that embrace video big data analytics. Storing, scaling, and analyzing are some of the apparent challenges associated with video big data. To cope with these challenges, converged and hyper-converged infrastructure and software-defined storage are the most convenient solutions. Distributed databases, data processing engines, and machine learning libraries have been introduced to address video big data management, processing, and analysis issues, respectively.

These big data technologies are deployed over a computer cluster to process and manage a massive amount of video data in parallel. A computer cluster may consist of a few to hundreds, or even thousands, of nodes that work together as a single integrated computing resource, each on a different part of the same program. Deploying an in-house computer cluster is an option for big data technologies, but it brings hardware costs and maintenance issues. An alternative is cloud computing, which reduces the costs associated with managing hardware and software resources.
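The idea of nodes working on different parts of the same program can be illustrated with a minimal, single-machine sketch. It is not SIAT's implementation: worker threads stand in for cluster nodes, and the function names, the chunk size, and the placeholder analysis (counting even-numbered frame indices) are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(num_frames, chunk_size):
    """Partition frame indices [0, num_frames) into contiguous chunks."""
    return [
        range(start, min(start + chunk_size, num_frames))
        for start in range(0, num_frames, chunk_size)
    ]


def analyze_chunk(frame_indices):
    """Hypothetical per-chunk analysis.

    Counts even-numbered frame indices as a stand-in for
    'frames in which an event was detected'.
    """
    return sum(1 for i in frame_indices if i % 2 == 0)


def analyze_video(num_frames, chunk_size=100, workers=4):
    """Run the same analysis on every chunk in parallel, then combine."""
    chunks = split_into_chunks(num_frames, chunk_size)
    # Each worker thread plays the role of one cluster node (assumption).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(analyze_chunk, chunks))
    return sum(partial_counts)


if __name__ == "__main__":
    print(analyze_video(1000))
```

In a real deployment the chunks would be video segments held in distributed storage and the workers would be processes on separate machines, but the split, process, and combine structure is the same.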

Typically, cloud services are provided on demand in a "pay-as-you-go" manner for the convenience of end-users and organizations, as shown in Figure 1. Cloud computing follows the "as-a-Service" philosophy and offers its services according to different models, for example, Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS). Under IaaS (e.g., Amazon's EC2), the cloud service provider allows consumers to provision fundamental computing resources and deploy arbitrary software. In PaaS, the service provider offers a platform that enables customers to develop, run, and manage applications without dealing with the complexities of building and maintaining the infrastructure. Examples of PaaS are Google's App Engine and Microsoft Azure. In SaaS, applications (e.g., email, docs, etc.) are deployed on cloud infrastructure by service providers, and consumers subscribe to them. These applications can be easily accessed from various client devices using a thin client or program interfaces.

In the cloud, big data technologies can be utilized for intelligent video analytics (IVA) solutions, as shown in Figure 2. These IVA solutions are made available under the as-a-Service (aaS) paradigm, i.e., IVAaaS and IVAAlgorithmaaS, as shown in Figure 3. This triangular relationship among big data technologies, cloud computing, and IVA leads to diverse types of research issues. In this research project, we propose a distributed, layered, service-oriented, and lambda-style-inspired reference architecture for large-scale IVA in the cloud. We also propose real-world, domain-specific services on top of SIAT.
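For readers unfamiliar with the lambda style mentioned above, the following toy sketch shows the general pattern: an append-only master log, a batch view recomputed from that log, a real-time view updated incrementally, and a query that merges the two. It is a generic illustration under our own assumptions, not SIAT's design; the class name, method names, and the event format (camera identifier plus detected label) are hypothetical.

```python
from collections import Counter


class LambdaStyleEventCounter:
    """Toy lambda-style pipeline that counts detected events per camera."""

    def __init__(self):
        self.master_log = []            # append-only record of every event
        self.batch_view = Counter()     # recomputed from the full log
        self.realtime_view = Counter()  # events seen since the last batch run

    def ingest(self, camera_id, label):
        """Speed layer: record the event and update the real-time view."""
        self.master_log.append((camera_id, label))
        self.realtime_view[(camera_id, label)] += 1

    def run_batch(self):
        """Batch layer: rebuild the batch view from the whole log.

        Once the batch view covers all logged events, the real-time
        view is reset so that nothing is counted twice.
        """
        self.batch_view = Counter(self.master_log)
        self.realtime_view.clear()

    def query(self, camera_id, label):
        """Serving layer: merge the batch and real-time views."""
        key = (camera_id, label)
        return self.batch_view[key] + self.realtime_view[key]


if __name__ == "__main__":
    pipeline = LambdaStyleEventCounter()
    pipeline.ingest("cam-1", "vehicle")
    pipeline.ingest("cam-1", "vehicle")
    pipeline.run_batch()
    pipeline.ingest("cam-1", "vehicle")
    print(pipeline.query("cam-1", "vehicle"))
```

The point of the pattern is that queries stay up to date between batch runs, while the batch layer can always rebuild its view from the raw log if the analysis logic changes.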