Product architecture

The analytic framework of IBM® Video Analytics is composed of the Metadata Ingestion Lookup and Signaling (MILS), the Semantic Streams Engine (SSE), and the Deep Learning Engine (DLE).

The following diagram shows a high-level view of the product architecture and the flow of information between the various components of Video Analytics.

A diagram of the solution architecture

Video data and metadata for alerts and recorded video flow through the system in the following sequence.

1. Video from cameras is uploaded by file or streamed from a video management system (VMS).
2. The SSE requests and receives video.
3. The SSE uses the DLE for object detection and attribute analysis.
4. The SSE generates metadata and sends it to the MILS.
5. The MILS indexes and stores the metadata.
6. The operator client either searches the metadata or receives alerts from the MILS.
7. Live or recorded video can be viewed in the video player in the operator client.
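
The sequence above can be illustrated with a minimal Python sketch. It is not product code: the class names, method names, and the toy detection rule are all hypothetical stand-ins chosen for illustration, and the real components communicate over a network rather than through in-process calls.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Hypothetical metadata record: which camera saw which object."""
    camera: str
    label: str


class DLE:
    """Stand-in for the Deep Learning Engine: returns object labels for a frame."""

    def detect(self, frame: str) -> list[str]:
        # Toy rule (assumption): every word in the frame description is a detected object.
        return frame.split()


@dataclass
class MILS:
    """Stand-in for the MILS: stores metadata and answers searches."""
    store: list[Metadata] = field(default_factory=list)

    def ingest(self, item: Metadata) -> None:
        self.store.append(item)

    def search(self, label: str) -> list[Metadata]:
        return [m for m in self.store if m.label == label]


class SSE:
    """Stand-in for the SSE: receives video, calls the DLE, sends metadata to the MILS."""

    def __init__(self, dle: DLE, mils: MILS) -> None:
        self.dle = dle
        self.mils = mils

    def process(self, camera: str, frames: list[str]) -> None:
        for frame in frames:
            for label in self.dle.detect(frame):
                self.mils.ingest(Metadata(camera=camera, label=label))


mils = MILS()
sse = SSE(DLE(), mils)
sse.process("cam-1", ["car person", "person"])  # the SSE receives and analyzes video
hits = mils.search("person")                    # the operator client searches the metadata
print(len(hits))  # prints 2
```

The sketch keeps each component behind its own class so that the direction of each call mirrors the direction of the data flow in the list: the SSE depends on the DLE and the MILS, while the search goes only to the MILS.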

Video Analytics is extensible, so each industry can build its own surveillance solution. A solution is built on top of the framework by developing a set of plug-ins:

Analytics with a new or modified schema
Video management system (VMS) integration
Graphical user interface
Sensor integration
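
As a rough illustration of how a solution might be thought of as a set of plug-ins, the following Python sketch groups plug-ins by the four kinds listed above. The `PluginKind` and `Solution` names, the `register` method, and the example plug-in names are all hypothetical; they do not describe the actual Video Analytics plug-in interfaces.

```python
from enum import Enum


class PluginKind(Enum):
    """The four plug-in kinds named in the list above."""
    ANALYTICS = "analytics with a new or modified schema"
    VMS = "video management system integration"
    GUI = "graphical user interface"
    SENSOR = "sensor integration"


class Solution:
    """Hypothetical container that groups plug-in names by kind."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.plugins: dict[PluginKind, list[str]] = {kind: [] for kind in PluginKind}

    def register(self, kind: PluginKind, plugin_name: str) -> None:
        self.plugins[kind].append(plugin_name)

    def missing_kinds(self) -> list[PluginKind]:
        # Reports kinds with no plug-in yet; the source does not say all four are required.
        return [kind for kind, names in self.plugins.items() if not names]


retail = Solution("retail-surveillance")
retail.register(PluginKind.ANALYTICS, "queue-length-schema")
retail.register(PluginKind.VMS, "example-vms-adapter")
print([kind.name for kind in retail.missing_kinds()])  # prints ['GUI', 'SENSOR']
```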

The following diagram shows the basic system architecture of Video Analytics.

Communication between MILS, SSE, and DLE servers over network

Metadata Ingestion Lookup and Signaling
The Metadata Ingestion Lookup and Signaling (MILS) of Video Analytics provides consolidated back-end metadata management capabilities, system management, user management, and various extensibility services. The MILS receives and stores the metadata that is generated when the Semantic Streams Engine processes and analyzes recorded video or the digital video feed from an integrated video management system (VMS).
Semantic Streams Engine
The Semantic Streams Engine (SSE) is a framework for capturing events that are observed by sensors such as cameras. The SSE is designed to process streams of live or recorded video footage and generate metadata information about the activities that occur in the field of view of a camera. The system can also process video from video capture cards that are on the SSE server.
Deep Learning Engine
The Deep Learning Engine (DLE) is used by the SSE to perform object detection and attribute classification with artificial neural networks. The DLE is an optional component, and Video Analytics can be installed without it if deep learning capabilities are not required. If it is installed, the DLE requires a graphics processing unit (GPU).

The SSE and DLE are distributed across one or more servers. Each server runs an engine service, and each engine service contains one or more engines. Communication between the SSE servers and the MILS, and between the DLE and SSE servers, takes place over an Internet Protocol (IP) network.
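
The server, engine service, and engine hierarchy can be modeled with a short Python sketch. The class names, host names, and engine names are hypothetical. The text does not say whether one engine service can mix SSE and DLE engines, so the sketch does not enforce a rule either way; it enforces only that each engine service has at least one engine.

```python
from dataclasses import dataclass, field


@dataclass
class Engine:
    kind: str  # "SSE" or "DLE"
    name: str


@dataclass
class EngineService:
    """One engine service per server; it holds one or more engines."""
    engines: list[Engine] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.engines:
            raise ValueError("an engine service needs at least one engine")


@dataclass
class Server:
    host: str
    service: EngineService


def engines_by_kind(servers: list[Server]) -> dict[str, int]:
    """Count the engines of each kind across all servers in a deployment."""
    counts: dict[str, int] = {}
    for server in servers:
        for engine in server.service.engines:
            counts[engine.kind] = counts.get(engine.kind, 0) + 1
    return counts


deployment = [
    Server("sse-host-1", EngineService([Engine("SSE", "sse-a"), Engine("SSE", "sse-b")])),
    Server("dle-host-1", EngineService([Engine("DLE", "dle-a")])),
]
print(engines_by_kind(deployment))  # prints {'SSE': 2, 'DLE': 1}
```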
