Video Annotation API

Video Annotation


Annotations are used to provide notes or additional information about a topic.


Video annotation provides information about the contents of a video at the level of each frame.

Spatial information includes objects such as cars, roads, and lanes.


Annotation example: [figure of video frames, each tagged with Car, Truck, Road]
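Frame-level annotation, as in the example above, can be pictured as a map from frame index to the tags that describe that frame. A minimal Java sketch, assuming frames are indexed from 0; the class FrameAnnotations and its methods are hypothetical illustrations, not part of the siat.vdml API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical container for frame-level annotations: frame index -> tags.
class FrameAnnotations {
    // TreeMap keeps frames in temporal order when iterating.
    private final Map<Integer, List<String>> tagsByFrame = new TreeMap<>();

    // Attach one or more tags to a frame (frames are indexed from 0 here).
    void addTags(int frameIndex, String... tags) {
        List<String> list = tagsByFrame.computeIfAbsent(frameIndex, k -> new ArrayList<>());
        Collections.addAll(list, tags);
    }

    // Tags of a frame, or an empty list if the frame has no annotation.
    List<String> tagsFor(int frameIndex) {
        return tagsByFrame.getOrDefault(frameIndex, Collections.emptyList());
    }

    public static void main(String[] args) {
        FrameAnnotations ann = new FrameAnnotations();
        ann.addTags(0, "Car", "Truck", "Road");
        ann.addTags(1, "Car", "Road");
        System.out.println(ann.tagsFor(0)); // [Car, Truck, Road]
        System.out.println(ann.tagsFor(5)); // []
    }
}
```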

Prerequisites

  • Java SE Development Kit 1.8

  • Eclipse IDE for Java EE Developers (Eclipse Kepler)

  • Hadoop 3.1.1 on Windows

  • Apache Spark 1.6.2

  • JavaCV 1.3.3 binary archive

  • Spark ML Library 1.3.0

  • DL4J nd4j-native-platform 1.0-beta3


Datasets

Download the KTH and SIAT datasets from the link below.


Run Program


  1. Video Annotation

    1. Download the SIAT dataset, which contains 20 training videos and 3 test videos per category.

    2. Create the Train and Test sets by moving all training videos into one folder and all test videos into another.

    3. Rename the video files from 1 to n, where n is the number of videos (in our case 1–60 for training and 1–9 for testing), and make sure each label refers to the new video number.

    4. Open the siat.vdml.videoAnnotation.App class (App.java).

    5. Set the paths to the Train and Test data.

    6. Set the output file paths for the train and test features, and for the final spatial annotation output.

    7. Call siat.vdml.videoAnnotation.spatial.App.main(trainDataPath, testDataPath, trainFeature, testFeature) to extract spatial features:

      • trainDataPath: location of the training data
      • testDataPath: location of the test data
      • trainFeature: output file for the spatial features of the training data
      • testFeature: output file for the spatial features of the test data
    8. Call siat.vdml.videoAnnotation.temporal.App.main(trainDataPath, testDataPath, trainFeature, testFeature) to extract temporal features:

      • trainDataPath: location of the training data
      • testDataPath: location of the test data
      • trainFeature: output file for the temporal features of the training data
      • testFeature: output file for the temporal features of the test data
    9. Call siat.vdml.videoAnnotation.classification.SearchNearestFrames.classify(trainFeaturePath, testFeaturePath, labelFile) for spatial annotation:

      • trainFeaturePath: training feature file produced in step 7
      • testFeaturePath: test feature file produced in step 7
      • labelFile: label file for spatial annotation
    10. Call siat.vdml.videoAnnotation.classification.SearchNearestClips.classify(trainFeaturePath, testFeaturePath) for temporal annotation:

      • trainFeaturePath: training feature file produced in step 8
      • testFeaturePath: test feature file produced in step 8
  2. Human Action Recognition

    1. Download the KTH dataset, which has three categories, each with 80 training videos and 20 test videos.

    2. Create the Train and Test sets by moving all training videos into one folder and all test videos into another.

    3. Rename the video files from 1 to n, where n is the number of videos (in our case 1–240 for training and 1–60 for testing).

    4. Open the siat.vdml.actionRecognition.App class (App.java) and set the train, test, and savePath values.

    5. Call siat.vdml.actionRecognition.callSpark.featureExtraction(data_url, result_url, trainType) to extract features:

      • data_url: path to the train or test data
      • result_url: output file for the extracted features
      • trainType: true for training data, false for test data
    6. Call siat.vdml.actionRecognition.classification.Train_RandomForest.TrainData(data_url, no_class, no_tree, depth, bins) to train the random forest:

      • data_url: training feature file produced in step 5
      • no_class: number of classes in the dataset
      • no_tree: number of trees to train
      • depth: depth of each tree
      • bins: number of bins used during training
    7. Call siat.vdml.actionRecognition.classification.Test_RandomForest.classify(data_url, model) to classify the test data:

      • data_url: test feature file produced in step 5
      • model: trained model produced in step 6

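Both procedures above rename the training and test videos to 1..n before feature extraction. A minimal Java sketch of that renaming step using java.nio.file, assuming each folder contains only the video files; VideoRenamer and renameSequentially are hypothetical names, not classes in the project.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of siat.vdml): renames every regular file in a
// folder to 1..n, keeping the original extension, and returns old name -> new number.
class VideoRenamer {

    static Map<String, Integer> renameSequentially(Path folder) throws IOException {
        List<Path> videos = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(folder)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    videos.add(p);
                }
            }
        }
        Collections.sort(videos); // lexicographic order, so the numbering is deterministic

        Map<String, Integer> mapping = new LinkedHashMap<>();
        List<String> extensions = new ArrayList<>();
        // Pass 1: move each file to a temporary name so a target such as "1.avi"
        // can never collide with a file that has not been renamed yet.
        for (int i = 0; i < videos.size(); i++) {
            String name = videos.get(i).getFileName().toString();
            mapping.put(name, i + 1);
            extensions.add(extensionOf(name));
            Files.move(videos.get(i), folder.resolve(".rename_tmp_" + i));
        }
        // Pass 2: move the temporary files to their final names 1..n.
        for (int i = 0; i < videos.size(); i++) {
            Files.move(folder.resolve(".rename_tmp_" + i),
                       folder.resolve((i + 1) + extensions.get(i)));
        }
        return mapping;
    }

    // ".avi" for "clip.avi", "" when the name has no extension.
    static String extensionOf(String name) {
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(dot) : "";
    }

    public static void main(String[] args) throws IOException {
        // Usage: java VideoRenamer <folder>
        Map<String, Integer> mapping = renameSequentially(Paths.get(args[0]));
        mapping.forEach((oldName, number) -> System.out.println(oldName + " -> " + number));
    }
}
```

Files are numbered in lexicographic order of their original names (so 10.avi comes before 2.avi). The returned map can be used to update the label file so that each label refers to the new video number.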

Run Program in Cluster


Go to a siat.kr console node (Node 1–5).


Change to the Spark client directory with the following command:

cd /usr/hdp/current/spark2-client

Then run the following command, replacing classPath with the main class and jarPath with the location of the jar file:

./bin/spark-submit --class classPath --master yarn-cluster --num-executors 4 --driver-memory 1G --executor-memory 5G --executor-cores 3 jarPath


Example: ./bin/spark-submit --class siat.dml.videoAnnotation.Apps --master yarn-cluster --num-executors 7 --driver-memory 1G --executor-memory 1G --executor-cores 3 hdfs:///user/abir/jar/Cluster-0.0.1-SNAPSHOT.jar


APIs

End User


APIs

  • siat.vdml.videoAnnotation(video_data, "Method"): annotates each frame of a video and returns the video annotation as tags or sentences

Input: video data (avi, mp4) and a method name
Output: tags and sentences (String)

Methods

  • distane_Learning: nearest-neighbor search approach for finding the nearest frames and clips
  • topic_Model: topic modeling is the process of identifying topics in a set of documents
  • Matrix_Completion: matrix completion is the task of filling in the missing entries of a partially observed matrix; a wide range of datasets are naturally organized in matrix form
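The Matrix_Completion method is described as filling in the missing entries of a partially observed matrix. One common way to do this is low-rank factorization: learn factors U and V so that U * V^T matches the observed entries, then read the missing entries from the product. A minimal Java sketch of that generic technique using stochastic gradient descent, assuming missing entries are marked with NaN; it is not the project's Matrix_Completion implementation, and MatrixCompletion.complete is a hypothetical name.

```java
import java.util.Random;

// Generic low-rank matrix-completion sketch: fit M ~ U * V^T on the observed
// entries by stochastic gradient descent, then fill each missing (NaN) entry
// with the dot product U[i] . V[j]. Observed entries are copied unchanged.
class MatrixCompletion {

    static double[][] complete(double[][] m, int rank, int epochs,
                               double learningRate, double lambda, long seed) {
        int rows = m.length, cols = m[0].length;
        Random rnd = new Random(seed);
        double[][] u = randomFactor(rows, rank, rnd);
        double[][] v = randomFactor(cols, rank, rnd);

        for (int e = 0; e < epochs; e++) {
            for (int i = 0; i < rows; i++) {
                for (int j = 0; j < cols; j++) {
                    if (Double.isNaN(m[i][j])) continue; // only observed entries drive the fit
                    double err = m[i][j] - dot(u[i], v[j]);
                    for (int k = 0; k < rank; k++) {
                        double uik = u[i][k];
                        // Gradient step on squared error with L2 regularization (lambda).
                        u[i][k] += learningRate * (err * v[j][k] - lambda * uik);
                        v[j][k] += learningRate * (err * uik - lambda * v[j][k]);
                    }
                }
            }
        }

        double[][] out = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                out[i][j] = Double.isNaN(m[i][j]) ? dot(u[i], v[j]) : m[i][j];
            }
        }
        return out;
    }

    // Small random initialization breaks the symmetry of the all-zero start.
    private static double[][] randomFactor(int n, int rank, Random rnd) {
        double[][] f = new double[n][rank];
        for (double[] row : f) {
            for (int k = 0; k < rank; k++) row[k] = 0.1 * rnd.nextGaussian();
        }
        return f;
    }

    private static double dot(double[] a, double[] b) {
        double s = 0;
        for (int k = 0; k < a.length; k++) s += a[k] * b[k];
        return s;
    }
}
```

For example, for the rank-1 matrix {{1, 2, 3}, {2, 4, 6}, {3, 6, NaN}}, complete(m, 1, 3000, 0.01, 0.0, 42L) should fill the missing entry with a value close to 9, the only rank-1 completion consistent with the observed entries.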



Developer


Scene Based Feature Extractor


APIs

  • siat.vdml.videoAnnotation(video_data): annotates each frame of a video and returns the video annotation as tags or sentences
  • siat.vdml.videoAnnotation.sceneBasedFeatureExtractor(AlgoOBJ): extracts spatial features from a video, one frame at a time

Input: CNN algorithm object (Object)
Output: feature vector (2D double array)

Algorithms

  • VGG19: VGG-19, from "Very Deep Convolutional Networks for Large-Scale Image Recognition"; takes a fixed-size image input of shape {3, 224, 224}
  • DarkNet19: two pretrained models are available, one for 224x224 images and one fine-tuned for 448x448 images; call setInputShape() with either {3, 224, 224} or {3, 448, 448} before initialization
  • ResNet: residual network for deep learning; takes input of shape {3, 224, 224}
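The algorithms above expect each frame as an input of shape {3, 224, 224}, that is channels, height, width. A minimal sketch of converting one decoded frame into that layout with plain java.awt, assuming the frame is already available as a BufferedImage (for example after decoding the video with JavaCV). FramePreprocessor is a hypothetical name; the channel order (RGB vs BGR) and any mean subtraction or scaling must match the pretrained model, and this sketch does neither.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical preprocessing: resize a frame to size x size and return it as a
// channels-first array [3][size][size] holding raw 0..255 values in RGB order.
class FramePreprocessor {

    static float[][][] toChw(BufferedImage frame, int size) {
        // Bilinear resize into an RGB buffer of the target size.
        BufferedImage resized = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = resized.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                           RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(frame, 0, 0, size, size, null);
        g.dispose();

        // Split packed ARGB pixels into separate channel planes.
        float[][][] chw = new float[3][size][size];
        for (int y = 0; y < size; y++) {
            for (int x = 0; x < size; x++) {
                int rgb = resized.getRGB(x, y);
                chw[0][y][x] = (rgb >> 16) & 0xFF; // red
                chw[1][y][x] = (rgb >> 8) & 0xFF;  // green
                chw[2][y][x] = rgb & 0xFF;         // blue
            }
        }
        return chw;
    }
}
```

For DarkNet19 with the fine-tuned model, the same helper would be called with size 448.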



Dynamic Feature Extractor


APIs

  • siat.vdml.videoAnnotation(video_data): annotates each frame of a video and returns the video annotation as tags or sentences
  • siat.vdml.videoAnnotation.dynamicFeatureExtractor(AlgoOBJ): extracts dynamic features from a video

Input: dynamic feature extractor algorithm object (Object)
Output: feature vector (double array)

Algorithms

  • VLBP: Volume Local Binary Pattern; compares neighboring pixels with the center pixel
  • VLTP: Volume Local Ternary Pattern; compares neighboring pixels with the center pixel and returns a ternary pattern
  • LBP_TOP: LBP-TOP (LBP from Three Orthogonal Planes) extends LBP from two-dimensional space to three-dimensional space, covering both the spatial and time domains
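VLBP and its 2D ancestor LBP both threshold neighboring pixels against a center pixel and pack the results into a binary code. A minimal sketch, assuming 8-bit gray frames stored as int[height][width], radius 1, and 8 neighbors per frame; with three frames this gives 3 * 8 + 2 = 26 comparisons (8 neighbors in each of frames t-1, t, t+1, plus the two temporal center pixels). The bit ordering, the histogram normalization, and the names Vlbp, lbp, vlbp and lbpHistogram are assumptions for illustration; border pixels are not handled, so callers pass interior coordinates.

```java
// Hypothetical LBP / VLBP helpers on gray frames (int[height][width]).
class Vlbp {
    // 8-neighborhood offsets (dy, dx), clockwise from the top-left pixel.
    private static final int[][] OFFSETS = {
        {-1, -1}, {-1, 0}, {-1, 1}, {0, 1}, {1, 1}, {1, 0}, {1, -1}, {0, -1}
    };

    // 2D LBP code at (y, x): bit k is 1 if neighbor k >= center.
    static int lbp(int[][] frame, int y, int x) {
        int center = frame[y][x];
        int code = 0;
        for (int k = 0; k < 8; k++) {
            if (frame[y + OFFSETS[k][0]][x + OFFSETS[k][1]] >= center) code |= 1 << k;
        }
        return code;
    }

    // VLBP code at (y, x) in frame t: compares 26 pixels from frames t-1, t, t+1
    // with the center pixel. Bit 0 = previous center, bits 1..24 = neighbors of
    // frames t-1, t, t+1 in that order, bit 25 = next center.
    static int vlbp(int[][][] frames, int t, int y, int x) {
        int center = frames[t][y][x];
        int code = 0;
        int bit = 0;
        if (frames[t - 1][y][x] >= center) code |= 1 << bit;
        bit++;
        for (int f = t - 1; f <= t + 1; f++) {
            for (int k = 0; k < 8; k++) {
                if (frames[f][y + OFFSETS[k][0]][x + OFFSETS[k][1]] >= center) code |= 1 << bit;
                bit++;
            }
        }
        if (frames[t + 1][y][x] >= center) code |= 1 << bit;
        return code;
    }

    // Normalized 256-bin histogram of LBP codes over all interior pixels,
    // one simple way to turn per-pixel codes into a feature vector.
    static double[] lbpHistogram(int[][] frame) {
        double[] hist = new double[256];
        int n = 0;
        for (int y = 1; y < frame.length - 1; y++) {
            for (int x = 1; x < frame[0].length - 1; x++) {
                hist[lbp(frame, y, x)]++;
                n++;
            }
        }
        if (n > 0) {
            for (int i = 0; i < hist.length; i++) hist[i] /= n;
        }
        return hist;
    }
}
```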



Similarity Measure


APIs

  • siat.vdml.videoAnnotation.SearchSimilarFrame(algoObj): calculates the similarity score between the query feature and the database features and returns an array of tags
  • siat.vdml.videoAnnotation.SearchSimilarClips(algoObj): calculates the similarity score between the query feature and the database features and returns a sentence

SearchSimilarFrame
Input: similarity measure algorithm object (Object)
Output: tags (String array)

SearchSimilarClips
Input: similarity measure algorithm object (Object)
Output: sentence (String)

Algorithms

  • distane_Learning: nearest-neighbor search approach for finding the nearest frames and clips
  • topic_Model: topic modeling is the process of identifying topics in a set of documents
  • Matrix_Completion: matrix completion is the task of filling in the missing entries of a partially observed matrix; a wide range of datasets are naturally organized in matrix form
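The distane_Learning algorithm is described as a nearest-neighbor search over frames. A minimal sketch of frame-level nearest-neighbor search, assuming features are plain double arrays and that Euclidean distance is the similarity measure (the measure SearchSimilarFrame actually uses is not stated). The returned row indices play the role of frame IDs; NearestFrameSearch is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical k-nearest-neighbor frame search: returns the indices of the k
// database feature vectors closest to the query in Euclidean distance.
class NearestFrameSearch {

    static int[] nearest(double[] query, double[][] database, int k) {
        Integer[] ids = new Integer[database.length];
        double[] dist = new double[database.length];
        for (int i = 0; i < database.length; i++) {
            ids[i] = i;
            dist[i] = squaredDistance(query, database[i]); // same ranking as true distance
        }
        // Stable sort by distance: ties keep the lower frame ID first.
        Arrays.sort(ids, (a, b) -> Double.compare(dist[a], dist[b]));
        int n = Math.min(k, ids.length);
        int[] result = new int[n];
        for (int i = 0; i < n; i++) result[i] = ids[i];
        return result;
    }

    static double squaredDistance(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            s += diff * diff;
        }
        return s;
    }
}
```

Squared distance is used because it preserves the nearest-neighbor ranking while avoiding a square root per comparison.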



Admin


Scene Based Feature Extractor


APIs

  • siat.vdml.videoAnnotation(video_data): preprocesses a video into color frames and gray frames
  • siat.vdml.videoAnnotation.sceneBasedFeatureExtractor: extracts spatial features from a video

Input: frame (frame data: jpg, png, jpeg)
Output: feature vector (2D double array)

Methods

  • VGG19: VGG-19, from "Very Deep Convolutional Networks for Large-Scale Image Recognition"; takes a fixed-size image input of shape {3, 224, 224}
  • DarkNet19: two pretrained models are available, one for 224x224 images and one fine-tuned for 448x448 images; call setInputShape() with either {3, 224, 224} or {3, 448, 448} before initialization
  • ResNet: residual network for deep learning; takes input of shape {3, 224, 224}
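The admin preprocessing step turns a video into color frames and gray frames. A minimal sketch of the color-to-gray conversion for a single frame, assuming frames are BufferedImage objects and using the ITU-R BT.601 luma weights (0.299 R + 0.587 G + 0.114 B); the weights the project actually uses are not stated, and GrayFrame is a hypothetical name.

```java
import java.awt.image.BufferedImage;

// Hypothetical color-to-gray conversion producing an int[height][width]
// array of 0..255 intensities.
class GrayFrame {

    static int[][] toGray(BufferedImage color) {
        int h = color.getHeight(), w = color.getWidth();
        int[][] gray = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = color.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                // BT.601 luma weights; rounding keeps the result in 0..255.
                gray[y][x] = (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
            }
        }
        return gray;
    }
}
```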



Dynamic Feature Extractor


APIs

  • siat.vdml.videoAnnotation(video_data): preprocesses a video into color frames and gray frames
  • siat.vdml.videoAnnotation.dynamicFeatureExtractor: extracts dynamic features from a video

Input: video (mp4, avi)
Output: feature vector (double array)

Methods

  • VLBP: Volume Local Binary Pattern; compares neighboring pixels with the center pixel
  • VLTP: Volume Local Ternary Pattern; compares neighboring pixels with the center pixel and returns a ternary pattern
  • LBP_TOP: LBP-TOP (LBP from Three Orthogonal Planes) extends LBP from two-dimensional space to three-dimensional space, covering both the spatial and time domains
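VLTP returns a ternary rather than a binary pattern. A minimal sketch of the 2D local ternary pattern on one 3x3 neighborhood, the building block that the volume variant extends to neighboring frames: each neighbor g is coded against the center c with a threshold t as +1 if g >= c + t, -1 if g <= c - t, and 0 otherwise. As is common for LTP, the ternary code is returned as two binary codes (upper for +1, lower for -1). The threshold, bit ordering, and the names Ltp and ltp are assumptions; t is expected to be positive.

```java
// Hypothetical local ternary pattern (LTP) on one 3x3 neighborhood of a
// gray frame stored as int[height][width].
class Ltp {
    // 8-neighborhood offsets (dy, dx), clockwise from the top-left pixel.
    private static final int[][] OFFSETS = {
        {-1, -1}, {-1, 0}, {-1, 1}, {0, 1}, {1, 1}, {1, 0}, {1, -1}, {0, -1}
    };

    // Returns {upper, lower}: upper bit k is set when neighbor k is coded +1,
    // lower bit k is set when neighbor k is coded -1; neighbors coded 0 set neither.
    static int[] ltp(int[][] frame, int y, int x, int t) {
        int c = frame[y][x];
        int upper = 0, lower = 0;
        for (int k = 0; k < 8; k++) {
            int g = frame[y + OFFSETS[k][0]][x + OFFSETS[k][1]];
            if (g >= c + t) {
                upper |= 1 << k;
            } else if (g <= c - t) {
                lower |= 1 << k;
            }
        }
        return new int[] {upper, lower};
    }
}
```

Splitting the code this way lets each half be histogrammed like an ordinary LBP code.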



Similarity Measure


APIs

  • siat.vdml.videoAnnotation.SearchSimilarFrame: calculates the similarity score between the query feature and the database features
  • siat.vdml.videoAnnotation.SearchSimilarClips: calculates the similarity score between the query feature and the database features

SearchSimilarFrame
Input: Query_feature, Database_feature (text files)
Output: array of frame IDs (integer array)

SearchSimilarClips
Input: Query_feature, Database_feature (text files)
Output: clip ID (integer)

Methods

  • distane_Learning: nearest-neighbor search approach for finding the nearest frames and clips
  • topic_Model: topic modeling is the process of identifying topics in a set of documents
  • Matrix_Completion: matrix completion is the task of filling in the missing entries of a partially observed matrix; a wide range of datasets are naturally organized in matrix form
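SearchSimilarClips returns a single clip ID rather than a list of frames. A minimal sketch of clip-level search, assuming each clip is summarized by the mean of its frame feature vectors and compared with cosine similarity; both choices are assumptions, since the source only says that a similarity score is computed. The database clip index is used as the clip ID, and NearestClipSearch is a hypothetical name.

```java
// Hypothetical clip-level search: mean-pool frame features per clip, then
// return the database clip with the highest cosine similarity to the query.
class NearestClipSearch {

    // Average of the frame feature vectors of one clip.
    static double[] meanFeature(double[][] frameFeatures) {
        double[] mean = new double[frameFeatures[0].length];
        for (double[] f : frameFeatures) {
            for (int d = 0; d < mean.length; d++) mean[d] += f[d] / frameFeatures.length;
        }
        return mean;
    }

    // Cosine similarity; defined as 0 when either vector is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int d = 0; d < a.length; d++) {
            dot += a[d] * b[d];
            na += a[d] * a[d];
            nb += b[d] * b[d];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // Index of the most similar database clip, or -1 if the database is empty.
    static int nearestClip(double[][] queryFrames, double[][][] databaseClips) {
        double[] q = meanFeature(queryFrames);
        int best = -1;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (int id = 0; id < databaseClips.length; id++) {
            double sim = cosine(q, meanFeature(databaseClips[id]));
            if (sim > bestSim) {
                bestSim = sim;
                best = id;
            }
        }
        return best;
    }
}
```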



Retrieve Tags


APIs

  • siat.vdml.videoAnnotation.retrieveTagsForFrame: retrieves the tags for the given frame IDs
  • siat.vdml.videoAnnotation.retrieveTagsForClip: retrieves the sentence for the given clip ID

retrieveTagsForFrame
Input: array of frame IDs (integer array)
Output: tags (String array)

retrieveTagsForClip
Input: clip ID (integer)
Output: sentence (String)
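The retrieval step maps the IDs returned by the similarity search back to text. A minimal in-memory sketch, assuming the annotations are held in two maps (frame ID to tags, clip ID to sentence); the class TagStore and its methods are hypothetical and only mirror the roles of retrieveTagsForFrame and retrieveTagsForClip. Removing duplicate tags across frames is also an assumption.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical lookup tables: frame ID -> tags, clip ID -> sentence.
class TagStore {
    private final Map<Integer, List<String>> frameTags;
    private final Map<Integer, String> clipSentences;

    TagStore(Map<Integer, List<String>> frameTags, Map<Integer, String> clipSentences) {
        this.frameTags = frameTags;
        this.clipSentences = clipSentences;
    }

    // Union of the tags of all given frames, without duplicates, in first-seen order.
    // Unknown frame IDs contribute no tags.
    String[] tagsForFrames(int[] frameIds) {
        Set<String> tags = new LinkedHashSet<>();
        for (int id : frameIds) {
            tags.addAll(frameTags.getOrDefault(id, Collections.emptyList()));
        }
        return tags.toArray(new String[0]);
    }

    // Sentence stored for the clip, or an empty string for an unknown clip ID.
    String sentenceForClip(int clipId) {
        return clipSentences.getOrDefault(clipId, "");
    }
}
```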

Create Jar File

Maven Clean


Right-click the project and select Run As > Maven clean [Figure 1], or


open a terminal in the project folder and run the following command [Figure 2]:

mvn clean


[Figures 1 and 2: Maven Clean]


Maven Build


Right-click the project and select Run As > Maven install [Figure 1], or


open a terminal in the project folder and run the following command [Figure 2]:

mvn clean package


[Figures 1 and 2: Maven Build]



Upload Files

Upload Jar File


Go to the cluster and create a directory named Jar.


Upload the jar file to that directory.






Upload Videos


Go to the cluster and create two directories named Train and Test.


Upload the training and test videos to the Train and Test directories, respectively.

