NemoClaw Architecture Deep Dive
NemoClaw is not a single product — it is a layered security architecture designed to make autonomous AI agents safe for production deployment. This post walks through each layer, explains how they interact, and provides the technical context you need to evaluate NemoClaw for your own infrastructure.
Architecture Overview
The NemoClaw stack consists of four primary layers, each addressing a different dimension of agent security:
┌─────────────────────────────────────────────┐
│ Agent Application │
│ (OpenClaw Agent Framework) │
├─────────────────────────────────────────────┤
│ Privacy Router │
│ (Local vs. Cloud Model Routing) │
├─────────────────────────────────────────────┤
│ Nemotron Policy Engine │
│ (120B MoE Intent Classification) │
├─────────────────────────────────────────────┤
│ OpenShell Runtime │
│ (Kernel-Level Sandbox + Isolation) │
├─────────────────────────────────────────────┤
│ Host OS / Hardware (DGX) │
└─────────────────────────────────────────────┘
Each layer operates independently and can be deployed standalone, but the full stack provides defense-in-depth security that no single layer can achieve alone.
Layer 1: OpenShell Security Runtime
OpenShell is the foundation of NemoClaw's security model. It provides kernel-level sandboxing for agent execution, ensuring that even a compromised agent cannot escape its security boundary.
How OpenShell Works
OpenShell uses a combination of Linux kernel namespaces, seccomp-BPF filters, and NVIDIA's custom eBPF programs to create isolated execution environments for each agent task:
```yaml
# openshell-policy.yaml
apiVersion: openshell.nvidia.com/v1
kind: SandboxPolicy
metadata:
  name: customer-support-agent
spec:
  isolation:
    network: restricted
    filesystem: read-only
    syscalls: minimal
  resources:
    maxMemory: 4Gi
    maxCPU: 2
    gpuAccess: inference-only
  permissions:
    allowedAPIs:
      - crm.read
      - crm.update
      - ticket.create
      - ticket.resolve
    deniedAPIs:
      - admin.*
      - billing.*
      - user.delete
  auditLog:
    enabled: true
    destination: siem://security-events
```
Every system call made by the agent is intercepted by OpenShell's eBPF layer, classified against the policy, and either allowed, denied, or escalated for human approval. The entire decision pipeline runs in kernel space, adding less than 50 microseconds of latency per system call.
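For intuition, the allow/deny/escalate decision can be sketched in user-space Python. This is purely illustrative: the real pipeline runs in kernel-space eBPF, and the `SandboxPolicy` and `Decision` names below are assumptions for this sketch, not OpenShell APIs.

```python
# Conceptual sketch of OpenShell's per-syscall decision pipeline.
# The actual implementation runs in kernel space via eBPF; these
# class names are illustrative, not part of NemoClaw.
from dataclasses import dataclass, field
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ESCALATE = "escalate"

@dataclass
class SandboxPolicy:
    allowed_syscalls: set[str]
    escalated_syscalls: set[str] = field(default_factory=set)

def classify_syscall(policy: SandboxPolicy, syscall: str) -> Decision:
    if syscall in policy.escalated_syscalls:
        return Decision.ESCALATE   # pause and route to a human operator
    if syscall in policy.allowed_syscalls:
        return Decision.ALLOW
    return Decision.DENY           # default-deny everything else

policy = SandboxPolicy(
    allowed_syscalls={"read", "write", "openat"},
    escalated_syscalls={"unlinkat"},   # deletions require approval
)
```

The key property is the default-deny fall-through: a syscall the policy has never seen is rejected rather than permitted.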
Operator Approval Workflows
For high-risk operations — deleting data, modifying infrastructure, sending external communications — OpenShell can pause agent execution and route the action to a human operator for approval:
```javascript
// Approval workflow configuration
const approvalPolicy = {
  triggers: [
    { action: 'data.delete', threshold: 'always' },
    { action: 'infra.modify', threshold: 'always' },
    { action: 'email.send', threshold: 'external-only' },
    { action: 'payment.process', threshold: 'above-100-usd' },
  ],
  channels: ['slack', 'teams', 'pagerduty'],
  timeout: '15m',
  defaultAction: 'deny',
};
This ensures that agents can operate autonomously for routine tasks while maintaining human oversight for consequential actions.
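A minimal Python sketch of how such triggers might be evaluated, assuming the same trigger names and thresholds as the configuration above (the evaluation logic itself is an assumption, not the OpenShell implementation):

```python
# Illustrative evaluation of approval triggers. Trigger names and
# thresholds mirror the configuration above; the matching logic is
# a sketch, not OpenShell's actual approval engine.
def needs_approval(action: str, amount_usd: float = 0.0,
                   external: bool = False) -> bool:
    triggers = {
        "data.delete": lambda: True,                   # 'always'
        "infra.modify": lambda: True,                  # 'always'
        "email.send": lambda: external,                # 'external-only'
        "payment.process": lambda: amount_usd > 100,   # 'above-100-usd'
    }
    check = triggers.get(action)
    # Actions with no trigger proceed without human review.
    return check() if check else False
```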
Layer 2: Nemotron 120B MoE Policy Engine
The Nemotron 120B Mixture-of-Experts model serves as NemoClaw's intelligent policy evaluation engine. Unlike traditional rule-based security systems, Nemotron can understand the intent behind agent actions and evaluate them against natural-language security policies.
Intent Classification
When an agent requests an action, Nemotron classifies the intent across multiple dimensions:
- **Sensitivity:** How sensitive is the data involved? (public, internal, confidential, restricted)
- **Reversibility:** Can this action be undone? (fully reversible, partially reversible, irreversible)
- **Scope:** How many systems or users are affected? (single, team, organization, external)
- **Compliance:** Does this action fall under any regulatory frameworks? (GDPR, HIPAA, SOC 2, PCI-DSS)
The classification runs in under 200ms on a single A100 GPU, and under 50ms on a DGX Spark with the quantized model variant.
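One way the four dimensions could be combined into a single risk tier is an additive score. The weights and thresholds below are illustrative assumptions, not Nemotron's actual scoring:

```python
# Sketch of folding the four classification dimensions into one risk
# tier. All weights and cut-offs here are assumed for illustration.
RISK_WEIGHTS = {
    "sensitivity": {"public": 0, "internal": 1, "confidential": 2, "restricted": 3},
    "reversibility": {"fully": 0, "partial": 1, "irreversible": 3},
    "scope": {"single": 0, "team": 1, "organization": 2, "external": 3},
}

def risk_tier(sensitivity: str, reversibility: str, scope: str,
              regulated: bool) -> str:
    score = (RISK_WEIGHTS["sensitivity"][sensitivity]
             + RISK_WEIGHTS["reversibility"][reversibility]
             + RISK_WEIGHTS["scope"][scope]
             + (2 if regulated else 0))   # regulatory exposure adds weight
    if score >= 6:
        return "escalate"   # human approval required
    if score >= 3:
        return "review"     # extra policy checks before execution
    return "auto"           # proceed autonomously
```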
Natural Language Policies
Security teams can define policies in plain English, which Nemotron interprets and enforces:
```
Policy: "Customer support agents may access customer records for active tickets only.
         They may not access financial data, modify account settings, or communicate
         with customers outside of the ticketing system. All PII must be redacted
         from internal logs."
```
Nemotron converts these natural-language policies into executable security rules, bridging the gap between security team intent and technical enforcement.
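To make the translation step concrete, here is one hypothetical structured form the policy above could compile into. The schema, field names, and resource identifiers are all invented for illustration; the source does not document Nemotron's actual rule format.

```python
# Hypothetical compiled form of the natural-language policy above.
# Every field name in this schema is an illustrative assumption.
compiled_policy = {
    "role": "customer-support-agent",
    "allow": [
        {"resource": "customer.record",
         "condition": "ticket.status == 'active'"},
    ],
    "deny": [
        {"resource": "customer.financial_data"},
        {"resource": "account.settings", "action": "modify"},
        {"action": "customer.contact", "channel": "*",
         "except": "ticketing-system"},
    ],
    "transforms": [
        {"target": "internal-logs", "action": "redact-pii"},
    ],
}
```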
Layer 3: Privacy Router
The Privacy Router is NemoClaw's intelligent model routing layer. It determines whether each agent task should be processed by a local model (Nemotron running on-premises) or routed to a cloud model endpoint, based on the data sensitivity classification.
Routing Logic
```python
# Simplified Privacy Router logic
def route_request(request: AgentRequest) -> ModelEndpoint:
    sensitivity = classify_sensitivity(request.context)

    if sensitivity in ['restricted', 'confidential']:
        # Highly sensitive data stays local
        return local_nemotron_endpoint

    if sensitivity == 'internal':
        # Internal data can use cloud with encryption
        return cloud_endpoint_with_e2e_encryption

    if sensitivity == 'public':
        # Public data can use any endpoint for best performance
        return optimal_cloud_endpoint

    # Default: local processing
    return local_nemotron_endpoint
```
The Privacy Router maintains a real-time classification cache, so repeated requests with similar context are routed without re-evaluation. In benchmarks, the router adds less than 5ms of latency to the request pipeline.
Data Residency Compliance
For organizations operating under data residency requirements (EU GDPR, China's PIPL, etc.), the Privacy Router can enforce geographic routing constraints:
```yaml
privacyRouter:
  residencyRules:
    - region: EU
      dataTypes: [personalData, financialData]
      allowedEndpoints: [eu-west-1-local, eu-central-1-local]
    - region: CN
      dataTypes: [all]
      allowedEndpoints: [cn-north-1-local]
  fallback: local-only
```
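A minimal sketch of how rules like these could be enforced at routing time, assuming the same rule structure as the YAML above (the lookup logic is illustrative, not the router's actual code):

```python
# Sketch of residency-rule enforcement. Rule contents mirror the
# configuration above; the matching logic is an assumption.
RESIDENCY_RULES = [
    {"region": "EU", "data_types": {"personalData", "financialData"},
     "allowed": ["eu-west-1-local", "eu-central-1-local"]},
    {"region": "CN", "data_types": {"all"},
     "allowed": ["cn-north-1-local"]},
]

def allowed_endpoints(region: str, data_type: str) -> list[str]:
    for rule in RESIDENCY_RULES:
        if rule["region"] == region and (
            data_type in rule["data_types"] or "all" in rule["data_types"]
        ):
            return rule["allowed"]
    return ["local-only"]   # fallback: keep data on-premises
```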
Layer 4: Network Policy Engine
The Network Policy Engine controls what external resources an agent can access. It operates as a transparent proxy, inspecting and filtering all outbound network requests from agent sandboxes.
Policy Definition
```yaml
networkPolicy:
  name: sales-ops-agent
  egress:
    allow:
      - domain: "*.salesforce.com"
        methods: [GET, POST, PATCH]
      - domain: "api.hubspot.com"
        methods: [GET]
      - domain: "smtp.company.com"
        ports: [587]
    deny:
      - domain: "*"  # deny all other outbound traffic
  ingress:
    allow:
      - source: "webhook.salesforce.com"
        path: "/api/v1/events"
  inspection:
    tlsDecrypt: true
    logPayloads: false
    scanForPII: true
```
The Network Policy Engine supports TLS interception for outbound requests (with proper certificate management), allowing it to scan request payloads for accidental PII leakage.
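A toy version of the payload scan might look like the following. Real deployments would use far richer detectors; the two regexes here (email address, US SSN) are illustrative only:

```python
# Minimal sketch of scanning an outbound payload for PII before it
# leaves the sandbox. These two patterns are illustrative; production
# scanners cover many more PII classes.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_for_pii(payload: str) -> list[str]:
    """Return the names of all PII categories detected in the payload."""
    return [name for name, pattern in PII_PATTERNS.items()
            if pattern.search(payload)]
```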
Putting It All Together
When an OpenClaw agent receives a task, the request flows through the NemoClaw stack as follows:
1. OpenClaw receives the task and constructs an execution plan
2. Privacy Router classifies the data sensitivity and selects the appropriate model endpoint
3. Nemotron evaluates the execution plan against security policies and classifies intent
4. OpenShell creates an isolated sandbox for the task execution
5. Network Policy Engine configures the sandbox's network access based on the agent's role
6. The agent executes within the sandbox, with every action audited
7. High-risk actions are escalated to human operators for approval
8. Results are returned through the Privacy Router, with PII redacted from logs
This entire pipeline adds approximately 300ms of latency to the first request in a session, and under 100ms for subsequent requests (due to caching). For most enterprise workloads, this overhead is negligible compared to the model inference time.
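The flow above can be condensed into a short orchestration sketch. Each helper stands in for the corresponding NemoClaw layer, and the task fields (`sensitivity`, `high_risk`, `operator_approved`) are assumptions for this sketch, not real API fields:

```python
# High-level sketch of the request flow. Every function is a stand-in
# for a NemoClaw layer; field names are illustrative assumptions.
def route_endpoint(task: dict) -> str:       # Privacy Router
    local = task.get("sensitivity") in ("restricted", "confidential")
    return "local" if local else "cloud"

def policy_verdict(task: dict) -> str:       # Nemotron policy engine
    return "escalate" if task.get("high_risk") else "allow"

def handle_task(task: dict) -> dict:
    endpoint = route_endpoint(task)
    if policy_verdict(task) == "escalate" and not task.get("operator_approved"):
        return {"status": "pending-approval"}   # waits on a human operator
    # OpenShell would create the sandbox, and the Network Policy Engine
    # would scope its egress, before the agent actually executes.
    return {"status": "completed", "endpoint": endpoint}
```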
Performance Benchmarks
On a single DGX Spark:
| Metric | Value |
|---|---|
| Policy evaluation latency (p50) | 45ms |
| Policy evaluation latency (p99) | 180ms |
| Sandbox creation time | 120ms |
| Network policy application | 15ms |
| Throughput (concurrent agents) | 64 |
| Memory overhead per sandbox | 256MB |
On a DGX H100 cluster (8 GPUs):
| Metric | Value |
|---|---|
| Policy evaluation latency (p50) | 12ms |
| Policy evaluation latency (p99) | 45ms |
| Throughput (concurrent agents) | 512 |
Getting Started
The NemoClaw architecture documentation is available on GitHub at nvidia/nemoclaw. Each layer can be deployed independently, so you can adopt NemoClaw incrementally — starting with OpenShell sandboxing and adding the other layers as your security requirements evolve.
In the next post, we'll walk through a hands-on tutorial for deploying the full NemoClaw stack on a DGX Spark.