OpenShell: The Secure Runtime Behind NemoClaw

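If NemoClaw is the security stack, OpenShell is its foundation. Developed by NVIDIA's security engineering team over the past 18 months, OpenShell provides kernel-level sandboxing for AI agent execution, designed so that even a fully compromised agent cannot reach resources outside its security boundary.

This article is a technical deep dive into OpenShell's architecture, its isolation mechanisms, and how it enforces security policy with minimal performance overhead.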

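Why Kernel-Level Isolation?

Traditional application sandboxing (containers, virtual machines, process-level isolation) was designed for software whose behavior is predictable. AI agents are fundamentally different. They generate their own execution plans at runtime, make tool calls that interact with external systems, and can produce novel behavior that no test suite anticipated.

That unpredictability calls for a security model that operates at the lowest level available: the kernel. OpenShell intercepts every system call an agent process issues, classifies it against the current security policy, and decides whether to allow, deny, or escalate it before the kernel executes the call.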

Agent Process
     │
     ▼
OpenShell eBPF Layer  ←── Policy Engine
     │
     ├── ALLOW → System Call → Kernel
     │
     ├── DENY → Error returned to agent
     │
     └── ESCALATE → Human approval queue
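
The three-way decision in the diagram can be modeled in a few lines of user-space Python. This is only an illustrative sketch, not OpenShell's implementation: the `POLICY` table, the syscall names in it, and the choice to deny anything unlisted are all assumptions made for the example.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ESCALATE = "escalate"


# Hypothetical policy table: syscall name -> decision.
POLICY = {
    "read": Decision.ALLOW,
    "write": Decision.ALLOW,
    "connect": Decision.ESCALATE,
    "ptrace": Decision.DENY,
}


def classify(syscall: str, policy: dict = POLICY) -> Decision:
    """Return the decision for a syscall; unlisted syscalls are denied (assumption)."""
    return policy.get(syscall, Decision.DENY)


def handle(syscall: str) -> str:
    """Map a decision onto the three branches shown in the diagram."""
    decision = classify(syscall)
    if decision is Decision.ALLOW:
        return "forwarded to kernel"
    if decision is Decision.DENY:
        return "error returned to agent"
    return "queued for human approval"


for name in ("read", "connect", "mount"):
    print(name, "->", handle(name))
```

Keeping classification (`classify`) separate from the action taken (`handle`) mirrors the split between the policy engine and the eBPF layer in the diagram.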

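eBPF: The Technology Behind OpenShell

OpenShell is built on eBPF (extended Berkeley Packet Filter), a Linux kernel technology that lets custom programs run in kernel space without modifying the kernel itself. The NVIDIA team wrote a set of eBPF programs optimized specifically for AI agent workloads:

Syscall Interceptor

The syscall interceptor attaches to the sys_enter tracepoint and evaluates each system call against the current policy: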

// Simplified OpenShell eBPF syscall interceptor
SEC("tracepoint/raw_syscalls/sys_enter")
int openshell_syscall_enter(struct trace_event_raw_sys_enter *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    struct sandbox_policy *policy = bpf_map_lookup_elem(&sandbox_policies, &pid);

    if (!policy)
        return 0;  // Not a sandboxed process

    long syscall_nr = ctx->id;
    int decision = evaluate_policy(policy, syscall_nr, ctx->args);

    if (decision == DENY) {
        // Send event to userspace audit log
        emit_security_event(pid, syscall_nr, DENY);
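        // Override return value to -EPERM.
        // Note: the kernel only permits bpf_override_return() in kprobe
        // programs attached to error-injectable functions, so in a real
        // build this step cannot live in a tracepoint program.
        bpf_override_return(ctx, -EPERM);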
    } else if (decision == ESCALATE) {
        // Pause the process and notify approval queue
        emit_approval_request(pid, syscall_nr, ctx->args);
        // bpf_send_signal() signals the current task, which is the agent itself
        bpf_send_signal(SIGSTOP);
    }

    return 0;
}

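Filesystem Guard

The filesystem guard restricts which files and directories an agent can access. It operates at the VFS (virtual file system) operation level, intercepting open, read, write, unlink, and rename calls: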

# Filesystem policy for a customer support agent
filesystem:
  # Rules are nested under a `rules` key (an illustrative name) so that the
  # list and defaultAction can sit side by side in valid YAML
  rules:
    # Agent can read its own configuration
    - path: "/etc/nemoclaw/agent.yaml"
      permissions: [read]

    # Agent can read/write to its workspace
    - path: "/var/nemoclaw/workspace/**"
      permissions: [read, write, create]

    # Agent can read shared data
    - path: "/var/nemoclaw/shared/**"
      permissions: [read]

  # Everything else is denied by default
  defaultAction: deny
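
To make the default-deny behavior concrete, here is a minimal user-space sketch of how rules like the ones above might be evaluated. It is not how OpenShell works internally: the `RULES` list and `is_allowed` function are hypothetical, and `fnmatchcase` is used as a stand-in for whatever glob semantics the real policy engine applies to `**`.

```python
import posixpath
from fnmatch import fnmatchcase

# Hypothetical in-memory form of the policy above: (glob pattern, permissions).
RULES = [
    ("/etc/nemoclaw/agent.yaml", {"read"}),
    ("/var/nemoclaw/workspace/**", {"read", "write", "create"}),
    ("/var/nemoclaw/shared/**", {"read"}),
]


def is_allowed(path: str, permission: str, rules=RULES) -> bool:
    """Return True only if some rule grants `permission` on `path`."""
    # Collapse "." and ".." segments first so a path cannot escape a rule
    # with "../" tricks. Symlinks are not resolved in this sketch.
    normalized = posixpath.normpath(path)
    for pattern, permissions in rules:
        if fnmatchcase(normalized, pattern) and permission in permissions:
            return True
    return False  # defaultAction: deny


print(is_allowed("/var/nemoclaw/workspace/notes/a.txt", "write"))
print(is_allowed("/var/nemoclaw/workspace/../../../etc/passwd", "read"))
```

Normalizing before matching matters: without it, the second path would match the workspace pattern even though it ultimately points at `/etc/passwd`.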

Network Sentinel

The network sentinel hooks socket operations and controls an agent's network access at the connection level:

SEC("cgroup/connect4")
int openshell_connect4(struct bpf_sock_addr *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    struct network_policy *policy = bpf_map_lookup_elem(&net_policies, &pid);

    if (!policy)
        return 1;  // Allow non-sandboxed processes

    __be32 dst_ip = ctx->user_ip4;
    __be16 dst_port = ctx->user_port;

    if (!is_allowed_destination(policy, dst_ip, dst_port)) {
        emit_security_event(pid, NETWORK_BLOCKED, dst_ip, dst_port);
        return 0;  // Block connection
    }

    return 1;  // Allow connection
}
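
The listing calls `is_allowed_destination` without showing it. As a rough idea of what such a check could look like, the sketch below uses an allowlist of networks and ports. The `ALLOWED` entries are made-up values for illustration, and the Python signature drops the `policy` argument for brevity; none of this is taken from OpenShell itself.

```python
import ipaddress

# Hypothetical allowlist: (destination network, permitted ports).
ALLOWED = [
    (ipaddress.ip_network("10.0.0.0/8"), {443}),
    (ipaddress.ip_network("192.0.2.10/32"), {443, 8443}),
]


def is_allowed_destination(dst_ip: str, dst_port: int, allowed=ALLOWED) -> bool:
    """Return True if the destination falls inside an allowed network and port set."""
    address = ipaddress.ip_address(dst_ip)
    return any(
        address in network and dst_port in ports
        for network, ports in allowed
    )


print(is_allowed_destination("10.1.2.3", 443))
print(is_allowed_destination("198.51.100.7", 443))
```

An in-kernel version would more likely use a BPF map (for example a longest-prefix-match trie) than a linear scan, but the decision it makes is the same.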

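Policy Enforcement Architecture

OpenShell policies are compiled to eBPF bytecode for maximum enforcement speed. The compilation pipeline works as follows:

1. Security teams write YAML policy files in a human-readable format
2. The policy compiler translates the YAML into an intermediate representation (IR)
3. The Nemotron policy validator checks the IR for logical consistency and conflicts
4. The eBPF compiler generates verified bytecode and loads it into the kernel
5. The runtime verifier ensures the eBPF programs terminate and are memory-safe

For a typical policy set, the whole compilation pipeline finishes in under 2 seconds, and policies can be hot-reloaded without restarting the agent.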

# Compile and load a policy
nemoclaw policy compile policies/customer-support.yaml
nemoclaw policy load customer-support

# Hot-reload a modified policy (no agent restart required)
nemoclaw policy reload customer-support

# Verify policy is active
nemoclaw policy status
# Output:
# POLICY              STATUS    LOADED AT            RULES
# customer-support    active    2026-03-19 14:30:01  47
# network-default     active    2026-03-19 14:30:01  12
# filesystem-strict   active    2026-03-19 14:30:01  23
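
The early pipeline stages (lowering a parsed policy to an IR, then checking it for conflicts) can be illustrated with a toy example. This sketch assumes the YAML has already been parsed into a dictionary, and it defines a "conflict" narrowly as the same path appearing with different permission sets. The `Rule` type, the `rules` key, and both functions are hypothetical, and the real validator presumably checks far more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One IR entry: a path pattern and the permissions granted on it."""
    path: str
    permissions: frozenset


def to_ir(policy: dict) -> list:
    """Lower the parsed policy mapping into a flat list of Rule objects."""
    return [
        Rule(entry["path"], frozenset(entry["permissions"]))
        for entry in policy["rules"]
    ]


def find_conflicts(rules: list) -> list:
    """Return paths that appear more than once with different permissions."""
    seen = {}
    conflicts = []
    for rule in rules:
        previous = seen.setdefault(rule.path, rule.permissions)
        if previous != rule.permissions and rule.path not in conflicts:
            conflicts.append(rule.path)
    return conflicts


parsed = {
    "rules": [
        {"path": "/var/nemoclaw/shared/**", "permissions": ["read"]},
        {"path": "/var/nemoclaw/shared/**", "permissions": ["read", "write"]},
    ]
}
print(find_conflicts(to_ir(parsed)))
```

Rejecting a policy at this stage is cheap, which is one reason to validate the IR before any bytecode is generated.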

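Operator Approval Workflow

One of OpenShell's most distinctive features is its built-in operator approval system. When an agent attempts an operation classified as high-risk, OpenShell pauses the agent and routes an approval request to a human operator.

How Approval Works

1. The agent attempts a high-risk system call (for example, writing to a protected file or connecting to an unapproved endpoint)
2. OpenShell's eBPF program sends SIGSTOP to the agent process
3. An approval request is generated and sent through the configured channel (Slack, Teams, PagerDuty, email)
4. The operator reviews the request context and approves or rejects it
5. If approved, OpenShell sends SIGCONT to resume the agent; if rejected, the call returns EPERM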
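
The five steps above amount to a small state machine, simulated below without real signals. The `Agent` class and `request_approval` function are invented for this example, and the sketch assumes the agent is resumed in both outcomes so that it can observe the error on rejection. The source only states that SIGCONT is sent on approval.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Stand-in for a sandboxed agent process."""
    name: str
    state: str = "running"
    log: list = field(default_factory=list)


def request_approval(agent: Agent, action: str, operator_decides) -> int:
    """Pause the agent, ask the operator, then resume.

    Returns 0 on approval and -errno.EPERM on rejection.
    """
    agent.state = "stopped"  # stands in for SIGSTOP
    agent.log.append(f"approval requested: {action}")
    approved = operator_decides(action)  # the human in the loop
    agent.state = "running"  # stands in for SIGCONT
    if approved:
        agent.log.append("approved")
        return 0
    agent.log.append("rejected")
    return -errno.EPERM


agent = Agent("customer-support-agent-01")
result = request_approval(agent, "email.send", lambda action: False)
print(result == -errno.EPERM, agent.log)
```

Passing the operator's decision in as a callable keeps the flow testable; in production that callable would be replaced by whatever waits on the Slack, Teams, PagerDuty, or email response.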

The approval request carries the full context:

{
  "request_id": "apr-2026031914-00042",
  "agent": "customer-support-agent-01",
  "action": "email.send",
  "target": "[email protected]",
  "context": {
    "ticket_id": "TKT-12345",
    "customer_name": "[REDACTED]",
    "reason": "Agent wants to send a follow-up email to the customer regarding their refund request",
    "email_preview": "Dear Customer, your refund of $250 has been processed..."
  },
  "risk_level": "medium",
  "policy_rule": "external-communication-requires-approval",
  "timestamp": "2026-03-19T14:30:42Z"
}

Approval Timeouts and Default Actions

Operators can configure what happens when an approval times out:

approvalConfig:
  timeout: 15m
  onTimeout: deny          # deny | allow | escalate
  onEscalate:
    target: security-team
    channel: pagerduty
  maxPendingApprovals: 10  # Queue limit per agent
  autoApprove:
    # Automatically approve if the same action was approved
    # 3 times in the past 24 hours for this agent
    repeatThreshold: 3
    repeatWindow: 24h

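Performance Characteristics

OpenShell is designed for latency-sensitive production workloads. The following overheads were measured on a DGX Spark:

Operation	Overhead
Syscall interception (allow)	8 µs
Syscall interception (deny)	12 µs
Filesystem check	15 µs
Network connection check	20 µs
Policy hot reload	< 500 ms
Approval round trip (Slack)	2-30 s (depends on human response time)

For comparison, a typical LLM inference call takes 500 ms to 5,000 ms, so OpenShell's overhead is negligible in the context of agent workloads.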
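
A quick back-of-the-envelope calculation shows the scale. Suppose, purely as an assumption for illustration, that an agent step issues 1,000 allowed system calls around a single inference call at the fast end of the quoted range (500 ms). The interception cost is then 8 ms, or 1.6% of the inference time:

```python
SYSCALL_OVERHEAD_US = 8     # from the table: interception, allowed call
INFERENCE_MS = 500          # fastest inference time quoted in the text
SYSCALLS_PER_STEP = 1_000   # assumption for illustration, not a measurement


def overhead_fraction(syscalls: int, overhead_us: float, inference_ms: float) -> float:
    """Interception time as a fraction of one inference call."""
    overhead_ms = syscalls * overhead_us / 1_000  # microseconds -> milliseconds
    return overhead_ms / inference_ms


fraction = overhead_fraction(SYSCALLS_PER_STEP, SYSCALL_OVERHEAD_US, INFERENCE_MS)
print(f"{fraction:.1%}")
```

At the slow end of the range (5,000 ms) the same workload would account for a tenth of that share.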

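Comparison with Existing Sandboxing Solutions

Feature	OpenShell	Docker/OCI	gVisor	Firecracker
Isolation level	Kernel (eBPF)	Namespaces	User-space kernel	MicroVM
Syscall filtering	Per-policy, hot-reloadable	Static seccomp	Full interception	Full isolation
Network policy	Per-agent, L7-aware	iptables	iptables	iptables
Human approval	Built in	None	None	None
AI-aware policy	Yes (Nemotron)	No	No	No
Overhead	~10 µs	~5 µs	~50 µs	~100 ms (startup)
GPU passthrough	Native	NVIDIA CTK	Limited	Limited

OpenShell's core differentiator is that it was designed for AI agent workloads from the start, with built-in support for natural-language policies, human approval workflows, and GPU-accelerated policy evaluation.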

Getting Started with OpenShell

OpenShell can be used on its own, independently of the rest of the NemoClaw stack:

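# Install OpenShell standalone
# (this URL is the repository page rather than a raw script; substitute the
# installer script URL from the project's README before running)
curl -fsSL https://github.com/NVIDIA/OpenShell | bash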

# Create a minimal sandbox policy
cat > my-policy.yaml << 'EOF'
apiVersion: openshell.nvidia.com/v1
kind: SandboxPolicy
metadata:
  name: my-first-sandbox
spec:
  isolation:
    network: restricted
    filesystem: workspace-only
    syscalls: minimal
EOF

# Run any process inside the sandbox
openshell run --policy my-policy.yaml -- python my_agent.py

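OpenShell is open source under the Apache 2.0 license and is available on GitHub at nvidia/openshell. In the next article, we will explore real-world enterprise deployment scenarios for the full NemoClaw stack.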
