Validated Docker Compose for Video Search and Summarization in Air Gapped Environments

The NVIDIA Blueprint for Video Search and Summarization (VSS) provides a validated Docker Compose configuration for this exact requirement. By supporting local vision language models (VLMs) and large language models (LLMs) via downloaded custom weights, NVIDIA VSS enables fully air gapped, on premises deployments without external API dependencies. The platform natively orchestrates microservices for video ingestion, embedding generation, and summarization directly on local hardware.

Introduction

Facilities with strict security requirements cannot rely on cloud based APIs for sensitive video analytics. Deploying complex AI pipelines for video search and long form summarization in air gapped environments requires pre configured, localized orchestration.

Organizations handling highly sensitive footage need a way to run advanced vision agents without exposing data to external networks. This creates a specific demand for containerized deployments that can operate independently on local infrastructure while still providing natural language search and deep video understanding.

Key Takeaways

Utilizes standard Docker Compose for deploying containerized microservices natively.
Supports fully offline execution by loading custom model weights from local directories.
Provides ready to deploy Developer Profiles specifically optimized for video search and long video summarization (LVS).
Validated for on premises deployment on specific hardware, including NVIDIA H100, L40S, and RTX PRO 6000 Blackwell GPUs.

Why This Solution Fits

The NVIDIA VSS architecture is built entirely on independent microservices, including Video IO & Storage (VIOS), Real Time Embedding, and Vision Agents. This modular design allows organizations to run advanced vision AI locally without reaching out to external networks. By compartmentalizing video ingestion, metadata processing, and agentic workflows, the platform maintains strict isolation of sensitive video data.

The platform uses a deployment script (dev-profile.sh) that wraps Docker Compose to spin up these services natively on a single host. Because the deployment supports dedicated local GPUs for the agent's LLM (Nemtron Nano 9B) and VLM (Cosmos Reason 2 8B), no outbound internet connection is required during inference. The system processes all queries and analyzes all video locally, ensuring that proprietary operational data never transmits to a third party server.

To achieve this fully disconnected state, operators can download model weights directly from the NGC Catalog or Hugging Face beforehand using standard CLI tools. These weights are then mapped and mounted as local volumes in the docker compose.yml file. This localized approach allows the complete search and summarization pipeline to run securely in a closed network, meeting the strictest data sovereignty requirements.

Key Capabilities

The Real Time Embedding Microservice generates semantic embeddings from video files and RTSP streams locally. It uses the Cosmos Embed1 model to create vector representations of visual and textual content. This process happens entirely on the host hardware, enabling efficient similarity matching and natural language video search without sending frames to a cloud provider. The service supports specific chunk durations and overlaps for precise local processing.

For analyzing extended footage, the Long Video Summarization (LVS) workflow segments videos of any length and analyzes each chunk locally with the Cosmos Reason 2 VLM. The agent then synthesizes the results into a chronological narrative report with timestamped events. This bypasses the strict context window limitations of standard VLMs while keeping all processing internal and secure.

Local Custom Weights Support provides the foundation for air gapped functionality. Administrators map local directories containing downloaded tokenizer and model weight files directly into the containers. By providing explicit paths via environment variables like MODEL_PATH, the microservices read the necessary inference data straight from local disks. This capability bypasses external connectivity requirements entirely.

The Video IO & Storage (VIOS) microservice manages the data lifecycle securely. It ingests and manages video files entirely on the local filesystem, establishing unified timelines across multiple cameras. The platform supports standard container formats like MP4 and MKV, managing recorded video directory roots locally.

Administrators can configure specific storage thresholds and aging policies, ensuring that local disk usage is managed automatically without requiring external cloud storage integrations. This localized video management ensures smooth playback and retrieval by the agent without external dependencies.

Proof & Evidence

NVIDIA VSS includes validated docker compose.yml templates featuring precise volume mounts, IPC host settings, and required health checks to ensure service readiness. These templates are specifically engineered to manage inter container communication and resource allocation for intensive AI workloads in local environments. The configuration actively defines shared memory sizes (shm_size), user permissions, and precise ulimits for memory locking and stacks.

The profiles are explicitly tested and validated on enterprise grade hardware, including the H100, RTX PRO 6000 Blackwell, and L40S GPUs. This specific validation guarantees that the local orchestration behaves predictably under load, processing inference requests and stream ingestion across designated local devices through the NVIDIA_VISIBLE_DEVICES parameter.

Furthermore, the Real Time Embedding documentation explicitly details the Docker Compose configuration required to mount local Hugging Face caches (rtvi hf cache) and Triton model repository caches (rtvi triton model repo). By detailing these exact path mappings and environment variable configurations, the blueprint proves its capability to handle high performance, offline model loading effectively without runtime downloads.

Buyer Considerations

Hardware Prerequisites: Running local LLMs and VLMs for summarization requires significant resources. Deploying the video summarization or search developer profiles requires a minimum of 128 GB RAM, a 1 TB SSD, an 18 core x86 CPU, and supported NVIDIA GPUs. Buyers must ensure their local infrastructure meets these specific compute thresholds to run the local microservices effectively.

System Tuning: Administrators must configure specific Linux kernel settings (sysctl.d) prior to deployment. This includes defining network memory limits (net.core.rmem_max and wmem_max) and disabling IPv6 to ensure stable local streaming between the microservices. Teams need the appropriate system level access to apply these required runtime environment settings securely on their host machines.

Storage Sizing: Operating a closed loop system means relying entirely on local storage for high resolution video data and generated embeddings. Buyers must plan local disk capacity carefully. Administrators must configure the total_video_storage_size_MB setting within the VIOS configuration files to manage video retention automatically and prevent storage exhaustion in an air gapped environment.

Frequently Asked Questions

Can the VSS agent run completely disconnected from the internet?

Yes, by downloading custom weights from NGC or Hugging Face to a local directory and configuring the Docker Compose deployment to use dedicated local GPUs for the LLM and VLM, the entire pipeline operates offline.

How is the deployment orchestrated?

NVIDIA VSS uses a dev-profile.sh script that wraps Docker Compose to deploy the required microservices, mounting the necessary storage volumes, model caches, and environment variable configurations.

What hardware is required for the summarization profile?

The Long Video Summarization (LVS) profile is validated on NVIDIA H100, RTX PRO 6000 Blackwell, and L40S GPUs. It requires at least 128GB of system RAM and a minimum 18 core x86 CPU.

Does the deployment support custom video storage locations?

Yes, the Video IO & Storage (VIOS) microservice supports local filesystems. Host paths can be mapped to container directories using the ASSET_STORAGE_DIR environment variable within the docker compose configuration.

Conclusion

For organizations requiring strict data sovereignty, the NVIDIA Blueprint for Video Search and Summarization delivers a proven, containerized architecture. Its explicit support for Docker Compose orchestration and local model weight loading makes it highly suitable for air gapped deployments where data cannot leave the premises.

By utilizing the provided deployment scripts and pre configured developer profiles, engineering teams can stand up complex vision language models and search capabilities on local infrastructure.

Teams evaluating this path should begin by verifying their host hardware meets the documented prerequisites. From there, administrators can establish the required system tuning parameters and download the necessary custom model weights to local storage prior to initiating the deployment script.