A Visual Guide to Sandboxing
Design the execution environment for an AI agent that writes and runs its own code. Start naive, discover each gap, and build defense in depth.
The Prompt
Your team is shipping an AI agent. It takes a user's request, writes Python or JavaScript, executes that code, browses the web, pulls data together, and returns results. Think of tools like ChatGPT's Code Interpreter, Devin, or any agent framework where the LLM can run arbitrary code on your infrastructure.
The core problem: you control the agent's system prompt, but you do not control what code it writes, what URLs it fetches, or what system calls it makes. A prompt injection, a hallucinated import, or a malicious user request could turn your agent into an attacker running inside your infrastructure.
Design the execution environment. Here is the first thing most teams sketch on a whiteboard:
The starting point. What could go wrong?
Watch Out
The Isolation Stack
Isolation is not binary. It is a spectrum with five major levels, each trading off security strength against startup latency and resource overhead. The right choice depends on your threat model: are you protecting against accidental bugs, or against an actively adversarial agent trying to escape?
At the weakest end, a chroot jail just changes the apparent root directory. The process can still see the real kernel, the real network stack, and the real process table. At the strongest end, a full VM gives you an entirely separate kernel, virtual hardware, and memory space. Between them sit containers, gVisor (a user-space kernel that intercepts syscalls before they reach the host), and microVMs like Firecracker (a minimal VM that boots a real Linux kernel in ~125ms).
Click each level to explore what it isolates, what it leaks, and how fast it starts:
The Isolation Stack
Click any level to explore its mechanism, weakness, and tradeoffs
Each step up the stack trades boot latency and runtime overhead for a stronger kernel-level boundary. Serverless platforms like AWS Lambda (Firecracker) and Google Cloud Run (gVisor) sit in the middle, strong enough for multi-tenant workloads, fast enough for sub-second cold starts.
Key Insight
The Escape
You picked your isolation boundary. The sandbox is running. But isolation only limits what the agent can touch on the host machine. There are four major attack surfaces that exist even inside a perfectly isolated sandbox, because they operate through legitimate channels the sandbox must keep open.
Network egress: If the sandbox has internet access (which most agents need, to call LLM APIs or fetch data), it can send stolen data to any server on the internet. It can also use DNS queries as a covert exfiltration channel, encoding data in subdomain lookups that bypass HTTP filters.
Resource exhaustion: A fork bomb or unbounded allocation loop inside one sandbox can consume all host CPU and memory, killing every other sandbox on the same machine and potentially the host itself.
Secret discovery: Environment variables, mounted config files, and the cloud metadata service (169.254.169.254) are all reachable from inside the sandbox. If the agent can read /proc/self/environ or curl the metadata endpoint, it can harvest API keys and IAM credentials.
Metadata service access: On AWS, GCP, and Azure, any process can query 169.254.169.254 to get temporary IAM credentials scoped to the host's role. If that role has broad permissions, the agent can access S3 buckets, databases, or other cloud resources.
Click each threat path below to see the full attack sequence:
Attack Surface Visualizer
Click a threat vector to see how an uncontained agent escapes the sandbox.
Select a threat vector above to trace the attack path
Watch Out
Network Egress
The most dangerous thing a compromised sandbox can do is talk to the outside world. If it can make arbitrary outbound requests, it can exfiltrate user data to an attacker-controlled server, download and execute malware, or call paid APIs using your credentials.
The defense is an egress proxy. All outbound traffic from the sandbox is routed through a gateway process that enforces an allowlist. The sandbox itself has no direct internet access; its network namespace only has a route to the proxy. The proxy terminates TLS, inspects the destination URL (not just the hostname), and blocks anything not on the list.
A common pattern for AI agents: instead of giving the sandbox a raw OpenAI API key, you run the LLM calls through your own proxy URL. The sandbox calls https://llm-proxy.internal/v1/chat which forwards to OpenAI with your key attached server-side. The sandbox never sees the real API key, and you get a single chokepoint for rate limiting, cost tracking, and content filtering.
DNS filtering is also critical. Even if you block all HTTP egress, the agent can encode data in DNS queries (e.g., stolen-data.attacker.com) and exfiltrate through your DNS resolver. The egress proxy should also control DNS resolution, returning NXDOMAIN for anything not on the allowlist.
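To make the allowlist behavior concrete, here is a minimal sketch of the decision the proxy makes for each request. The domain entries, the subdomain wildcard, and the catch-all * mirror the simulator below; they are illustrative, not a particular proxy's configuration.

```python
# Sketch of the proxy's allow/deny decision. ALLOWLIST entries are examples;
# "*" is the catch-all wildcard and "*.github.com" matches any subdomain.
from urllib.parse import urlparse

ALLOWLIST = {"api.openai.com", "pypi.org", "*.github.com"}

def is_allowed(url: str) -> bool:
    host = urlparse(url).hostname or ""
    for entry in ALLOWLIST:
        if entry == "*":                       # admits all traffic; use with caution
            return True
        if entry.startswith("*."):
            base = entry[2:]                   # wildcard: bare domain or any subdomain
            if host == base or host.endswith("." + base):
                return True
        elif host == entry:
            return True
    return False

assert is_allowed("https://pypi.org/simple/requests/")
assert is_allowed("https://raw.github.com/some/repo")
assert not is_allowed("https://stolen-data.attacker.com/exfil")   # exfiltration attempt denied
```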
Toggle the domains below and run the simulation to see how different allowlist configurations affect what gets through:
Egress Proxy Simulator
Configure the allowlist, then run the simulation
Toggle domains to see how the proxy responds. Wildcard (*) admits all traffic; use with caution.
Press "Run Simulation" to stream requests through the proxy
How to implement it
The simplest production setup uses a forward proxy like Squid or Envoy running as a sidecar. The sandbox's network namespace has iptables rules that redirect all outbound traffic (ports 80, 443, and 53) to the proxy. The proxy holds the allowlist and makes the real outbound connection on behalf of the sandbox.
For LLM API calls specifically, a reverse proxy pattern works well: the sandbox is configured with an internal endpoint (e.g., http://10.0.0.1:8080/v1/chat) that your proxy translates to the real OpenAI/Anthropic endpoint, injecting the API key from a secrets manager. This way the sandbox never touches the real credential, and you can swap providers, add rate limits, or log prompts without changing the sandbox code.
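As one illustration of that reverse proxy pattern, the sketch below uses Flask and requests (assumed dependencies); the internal route, port, and upstream URL are examples rather than a prescribed layout. The provider key lives only in the proxy's environment, never in the sandbox.

```python
# Minimal key-injecting LLM proxy sketch: the sandbox calls this internal endpoint,
# and the proxy attaches the real provider credential before forwarding upstream.
import os
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
UPSTREAM = "https://api.openai.com/v1/chat/completions"   # example upstream endpoint

@app.post("/v1/chat")
def chat():
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}  # server-side only
    upstream = requests.post(UPSTREAM, json=request.get_json(), headers=headers, timeout=60)
    # Single chokepoint: add rate limiting, cost tracking, and content filtering here.
    return jsonify(upstream.json()), upstream.status_code

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)    # matches the internal endpoint the sandbox sees
```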
Key Insight
Resource Limits
Even if the agent cannot escape or talk to the outside world, it can still cause damage by exhausting host resources. A Python loop that appends to a list forever, a fork bomb that spawns thousands of processes, or a dd if=/dev/zero that fills the disk can all take down the host and every other sandbox running on it.
Linux cgroups v2 solve this. A cgroup is a kernel feature that lets you set hard caps on resource usage for a group of processes. You configure three main limits:
- CPU: cpu.max throttles the process to a share of CPU time. If set to 100000 100000 (quota and period in microseconds), the sandbox gets at most 1 full CPU core; it can spike briefly within a period, but the kernel scheduler throttles it once the quota is spent.
- Memory: memory.max sets a hard ceiling (e.g., 512MB). When a process in the cgroup tries to allocate beyond this limit, the kernel's OOM killer terminates it with exit code 137. The host and other sandboxes are unaffected.
- Disk: Use a size-limited tmpfs mount (e.g., mount -t tmpfs -o size=1G tmpfs /workspace) so disk writes are capped and automatically cleaned up when the sandbox exits. The root filesystem should be read-only. (A sketch of setting these limits follows this list.)
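A minimal sketch of wiring these limits up from the orchestrator, assuming cgroup v2 is mounted at /sys/fs/cgroup, the cpu/memory/pids controllers are enabled for the parent cgroup, and the process has enough privilege to create cgroups. The cgroup name and entrypoint are hypothetical.

```python
# Create a per-sandbox cgroup, apply hard caps, then launch the agent inside it.
import os

CG = "/sys/fs/cgroup/sandbox-42"                 # hypothetical per-sandbox cgroup
os.makedirs(CG, exist_ok=True)

def set_limit(name: str, value: str) -> None:
    with open(os.path.join(CG, name), "w") as fh:
        fh.write(value)

set_limit("cpu.max", "100000 100000")            # quota/period in µs: at most one full core
set_limit("memory.max", str(512 * 1024 ** 2))    # hard 512MB ceiling; OOM kill beyond it
set_limit("pids.max", "128")                     # caps process count (fork-bomb protection)

pid = os.fork()
if pid == 0:
    # Child: join the cgroup first, then exec the agent's code under the limits.
    set_limit("cgroup.procs", str(os.getpid()))
    os.execvp("python3", ["python3", "/workspace/agent_code.py"])   # hypothetical entrypoint
os.waitpid(pid, 0)
```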
Watch what happens when a runaway process hits these limits:
Resource Limits
cgroup enforcement in action
Event Log
Note
Resource limits do not cap wall-clock time, so also enforce an overall execution timeout; a timeout(1) wrapper or a systemd timer works well.
Key Insight
Set pids.max in the cgroup to prevent fork bombs. A limit of 64 or 128 processes is usually enough for any legitimate agent workload. Also consider io.max to cap disk I/O bandwidth, preventing a single sandbox from saturating the host's storage.
Secret Scoping
The agent needs credentials to do useful work: an LLM API key, database connection strings, OAuth tokens for third-party services. The question is how those credentials get into the sandbox, what scope they have, and how long they live.
The anti-pattern is baking credentials into the container image (in a Dockerfile ENV line, a config file, or an .env committed to the image layer). This is dangerous because container images are layered and immutable. Even if you delete the secret in a later layer, it persists in the earlier layer and can be extracted with docker history or by pulling the image and inspecting its layers.
The correct approach has three parts:
1. Runtime injection. Credentials are fetched from a secrets manager (Vault, AWS Secrets Manager, GCP Secret Manager) at sandbox startup time and injected as environment variables or tmpfs-mounted files. They never touch the image.
2. Short-lived tokens. Instead of a static API key that lives forever, the orchestrator mints a short-lived token (e.g., 15-minute TTL) scoped to exactly the APIs the agent needs. AWS STS AssumeRole with a session policy is one way to do this; when the sandbox exits, the token automatically expires (see the sketch after this list).
3. Proxy-based credential hiding. For high-value keys (like your LLM provider API key), don't give the sandbox the key at all. Route the calls through an internal proxy that injects the key server-side. The sandbox calls http://proxy.internal/v1/chat, and the proxy adds the Authorization header before forwarding to OpenAI. The sandbox never sees the real key.
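Here is a sketch of step 2: minting a short-lived, narrowly scoped credential with AWS STS at sandbox startup. The role ARN, bucket path, and session policy are hypothetical; the point is that the sandbox only ever sees a 15-minute token restricted to what this one run needs.

```python
import json
import boto3

sts = boto3.client("sts")

# Session policy: the temporary credential is the *intersection* of this policy
# and the assumed role's own permissions (resource names are hypothetical).
session_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": ["arn:aws:s3:::agent-workspace/run-42/*"],
    }],
}

creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/sandbox-runtime-role",
    RoleSessionName="sandbox-run-42",
    DurationSeconds=900,                  # 15-minute TTL; expires shortly after the run
    Policy=json.dumps(session_policy),
)["Credentials"]

# Inject only these temporary values into the sandbox at launch; nothing is baked in.
sandbox_env = {
    "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
    "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
    "AWS_SESSION_TOKEN": creds["SessionToken"],
}
```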
Press Play to compare the anti-pattern with the correct lifecycle side by side:
Credential Lifecycle
Two paths for managing secrets in agent execution
Press Play to step through both lifecycles simultaneously
The metadata service problem
On cloud VMs, the instance metadata service at 169.254.169.254 exposes temporary IAM credentials to any process on the machine. If your sandbox runs on an EC2 instance with a broad IAM role, the agent can curl that endpoint and get credentials to access S3, DynamoDB, or any other AWS service the role permits.
The fix: block the metadata IP in the sandbox's network namespace with an iptables rule, or enforce IMDSv2 and keep the metadata response hop limit at its default of 1 (credentials require a session token fetched via PUT, and the token response cannot cross the extra network hop into a container). Better yet, run sandboxes on instances with minimal IAM roles and inject only the specific credentials each sandbox needs.
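For the iptables option, this is roughly what the orchestrator might run when it sets up each sandbox's network namespace; the namespace name is hypothetical and the sketch assumes iptables is available on the host.

```python
# Drop all traffic from this sandbox's network namespace to the metadata endpoint.
import subprocess

subprocess.run(
    ["ip", "netns", "exec", "sandbox-42",            # hypothetical namespace name
     "iptables", "-A", "OUTPUT", "-d", "169.254.169.254", "-j", "DROP"],
    check=True,
)
```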
Key Insight
The Hard Parts
Everything above is a security layer. But in production, the hardest problem is often performance: how fast can you spin up a fresh sandbox? For interactive AI agents, users expect responses to start within a second or two. If your sandbox takes 5 seconds to cold start, the user is staring at a spinner before the agent even begins working.
There are three main strategies to manage cold starts:
1. Pre-warming. Keep a pool of idle sandboxes ready to go. When a request arrives, assign an already-booted sandbox instead of starting a new one. This is how AWS Lambda Provisioned Concurrency works. Cold start drops to effectively zero, but you pay for idle capacity (see the pool sketch after this list).
2. Snapshot/restore. Boot a sandbox once, get it into a ready state (kernel loaded, language runtime warmed up, packages installed), then snapshot its entire memory to disk. When a new sandbox is needed, restore from the snapshot instead of booting from scratch. Firecracker can restore a snapshot in ~125ms, giving you VM-level isolation with near-container startup times.
3. Lightweight isolation. Use gVisor or a seccomp-hardened container instead of a full VM. You trade some isolation strength for faster startup. The right choice depends on your threat model.
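A bare-bones sketch of the pre-warming idea: a background thread keeps N sandboxes booted, and requests grab one off the pool instead of paying the cold start. start_sandbox here is a stand-in for whatever provisioning call your platform uses.

```python
import queue
import threading
import time

POOL_SIZE = 5
pool: "queue.Queue[str]" = queue.Queue()

def start_sandbox() -> str:
    """Boot a fresh sandbox and return a handle (the slow, multi-second path)."""
    time.sleep(2)                              # stand-in for a container/microVM boot
    return f"sandbox-{time.monotonic_ns()}"

def refill_forever() -> None:
    # Keep the pool topped up as warm sandboxes are handed out.
    while True:
        if pool.qsize() < POOL_SIZE:
            pool.put(start_sandbox())
        else:
            time.sleep(0.5)

threading.Thread(target=refill_forever, daemon=True).start()

def acquire_sandbox() -> str:
    try:
        return pool.get_nowait()               # warm path: effectively zero cold start
    except queue.Empty:
        return start_sandbox()                 # pool exhausted: fall back to a cold boot
```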
Different isolation approaches have very different cold start profiles. Watch them race:
Cold Start Race
How long until the first request is served? Bars fill proportionally in real time.
Cold start latency directly affects user-facing P99. In multi-tenant serverless, cold starts happen constantly. Isolation choice is a latency choice.
Read-only root filesystem
Another hard-but-important detail: the sandbox's root filesystem should be read-only. This prevents the agent from modifying its own runtime, injecting persistent backdoors, or tampering with binaries. Temporary writes go to a size-limited tmpfs mount that the kernel discards when the sandbox exits.
In Docker, this is --read-only --tmpfs /tmp:size=512m. The agent gets scratch space for temporary files, but nothing it writes survives past the current execution. This also means each new execution starts from a known clean state, which makes debugging and auditing much easier.
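In code, one execution might look like the sketch below, using the Docker SDK for Python (an assumed dependency); the image and the agent code string are placeholders.

```python
import docker

client = docker.from_env()
agent_code = 'print("hello from the sandbox")'        # stand-in for agent-generated code

output = client.containers.run(
    image="python:3.12-slim",                 # example runtime image
    command=["python3", "-c", agent_code],
    read_only=True,                           # equivalent to docker run --read-only
    tmpfs={"/tmp": "size=512m"},              # size-limited scratch space, discarded on exit
    network_disabled=True,                    # no direct egress; route through the proxy instead
    mem_limit="512m",
    pids_limit=128,
    remove=True,                              # nothing survives past this execution
)
print(output.decode())
```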
Key Insight
Observability
You have locked down the sandbox. But how do you know what the agent actually did during a run? If something goes wrong, or if a user reports suspicious behavior, or if you need to investigate an incident, you need a complete record of every action the agent took.
A production sandbox audit trail captures four types of signals:
- Syscalls: Tools like strace, seccomp-bpf, or gVisor's built-in logging record every kernel interaction: what files were opened, what processes were spawned, what sockets were created. An unusual burst of fork() calls is a fork bomb. A connect() to an unexpected IP is a data exfiltration attempt.
- Network: The egress proxy logs every request: method, URL, response code, latency, and bytes transferred. DNS queries are logged separately. This gives you a complete picture of every external service the agent talked to.
- Filesystem: After the sandbox exits, diff the filesystem against its initial state. What files were created, modified, or deleted? An overlay filesystem (used by Docker by default) makes this diff trivial: just inspect the upper layer.
- Resources: Record CPU usage, memory RSS, disk I/O, and network bytes over time. A sudden spike in memory from 50MB to 480MB in 2 seconds is a red flag. Correlate resource anomalies with syscall and network events to reconstruct what happened.
Drag the scrubber to replay a sandbox execution and see how these signals come together:
Audit Replay
Scrub the timeline to replay agent activity. Click any event for details.
Click an event to inspect details
Every syscall, file access, and network connection is recorded. You can replay any run to audit exactly what the agent did, and catch what it tried to do.
Structured logging for AI agents
Beyond kernel-level tracing, log the agent's LLM calls as structured events: the prompt, the completion, the model used, token count, and latency. This is your application-level audit trail. If a prompt injection caused the agent to misbehave, you can trace it back to the exact message that triggered the behavior.
Store all logs with the sandbox execution ID as a correlation key. When you need to investigate, you can pull up a single execution and see everything: what the user asked, what code the agent wrote, what syscalls it made, what network requests it sent, and what resources it consumed.
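A sketch of what one such structured event might look like; the field names are illustrative, not a fixed schema. The important part is the shared execution ID.

```python
import json
import time
import uuid

def log_llm_call(execution_id: str, model: str, prompt: str, completion: str,
                 prompt_tokens: int, completion_tokens: int, latency_ms: float) -> None:
    event = {
        "ts": time.time(),
        "event": "llm_call",
        "execution_id": execution_id,          # correlation key shared with syscall/network logs
        "model": model,
        "prompt": prompt,
        "completion": completion,
        "usage": {"prompt_tokens": prompt_tokens, "completion_tokens": completion_tokens},
        "latency_ms": latency_ms,
    }
    print(json.dumps(event))                   # in production, ship this to your log pipeline

log_llm_call(str(uuid.uuid4()), "gpt-4o", "Summarize sales.csv", "Total revenue was...",
             prompt_tokens=812, completion_tokens=240, latency_ms=1530.0)
```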
Watch Out
Building It
You understand the layers. Now the interviewer asks: “How would you actually build this? What infrastructure would you use?” There are two paths: use an off-the-shelf sandbox platform, or build your own from cloud primitives.
Off-the-shelf options
If you do not need to build the sandbox layer yourself, several platforms provide it as a service:
- E2B: Purpose-built for AI agent sandboxes. Firecracker microVMs, sub-second cold starts, built-in filesystem snapshots, and a simple API. You get strong isolation without managing infrastructure.
- Modal: Serverless containers with GPU support. Good if the agent needs to run ML inference or heavy compute inside the sandbox. Handles scaling and cold starts for you.
- Fly Machines: Firecracker microVMs with a REST API. You can start, stop, and snapshot machines programmatically. Low-level enough to customize, managed enough that you do not run your own hypervisor fleet.
- Daytona: Development environment platform that provisions isolated, standardized sandboxes with built-in identity, networking, and lifecycle management. A good fit if you want managed infrastructure with fine-grained access controls out of the box.
For most teams, starting with one of these is the right call. You focus on the agent logic, and the platform handles isolation, cold starts, and cleanup.
The safe DIY answer: Docker on Fargate
If the interviewer pushes on building it yourself, or you need more control, the pragmatic answer is Docker containers on AWS Fargate (or Cloud Run on GCP). Here is why this is a solid default:
1. No servers to manage. Fargate runs each task on its own Firecracker microVM under the hood, so you get VM-level isolation between tasks without managing EC2 instances. Each sandbox is a Fargate task with its own network namespace (see the launch sketch after this list).
2. Resource limits are built in. You set CPU and memory in the task definition. Fargate enforces them at the VM level. No cgroup configuration needed.
3. Network isolation via VPC. Put the Fargate tasks in a private subnet with no internet gateway. Route outbound traffic through a NAT gateway or, better, through an egress proxy running as a separate service. Security groups block the metadata service IP.
4. Secrets via IAM. Each task gets its own IAM task role with minimal permissions. Secrets come from AWS Secrets Manager, injected as environment variables at task launch. No static keys anywhere.
5. Logging for free. Fargate sends stdout/stderr to CloudWatch. Add an egress proxy log and a filesystem diff step at the end, and you have a reasonable audit trail.
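A sketch of launching one execution as a Fargate task with boto3; the cluster, task definition, subnet, security group, and container name are hypothetical, and the task definition is assumed to carry the CPU/memory limits and read-only filesystem settings.

```python
import boto3

ecs = boto3.client("ecs")

response = ecs.run_task(
    cluster="agent-sandboxes",
    taskDefinition="sandbox-runtime:7",              # defines CPU/memory and the task role
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0abc1234"],          # private subnet, no internet gateway
            "securityGroups": ["sg-0def5678"],       # egress allowed only to the proxy
            "assignPublicIp": "DISABLED",
        }
    },
    overrides={
        "containerOverrides": [{
            "name": "sandbox",
            "command": ["python3", "-c", "print('agent code runs here')"],
        }]
    },
)
task_arn = response["tasks"][0]["taskArn"]           # track this for logs and teardown
```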
The tradeoff is cold start latency. Fargate tasks take 10-30 seconds to launch, which is too slow for interactive use. To fix this, keep a warm pool: run N idle tasks that sit waiting for work, and replenish the pool as tasks are consumed. This is essentially building your own provisioned concurrency.
Be upfront about the cost: a warm pool means you are paying for idle compute 24/7. If you keep 20 warm Fargate tasks at 1 vCPU / 2GB each, that is ~$1,400/month sitting idle. You also add operational complexity: you need a pool manager that monitors pool size, replenishes consumed tasks, drains stale ones, and scales with traffic patterns. Autoscaling the pool (e.g., larger during business hours, smaller overnight) helps control costs but adds another moving part. This is exactly why most teams start with an off-the-shelf platform and only build custom when they have a clear reason.
If you need faster cold starts
For sub-second startup, you need to go lower. Run Firecracker directly on EC2 bare-metal instances (e.g., i3.metal) and manage the VM lifecycle yourself. This is what the off-the-shelf platforms do under the hood. You get snapshot/restore for ~125ms cold starts, but you also take on managing the hypervisor fleet, snapshot storage, and VM scheduling.
The middle ground is Lambda with container images. Package your sandbox runtime as a Lambda container (up to 10GB image), use provisioned concurrency for warm starts, and get Firecracker isolation for free. The 15-minute execution limit and 10GB ephemeral storage cap may or may not fit your workload.
Key Insight
The Full Picture
We started with a container and a dream. Now we have seven defense layers, each solving a specific failure mode that the previous layers left open. This is defense in depth: no single layer is sufficient on its own, but together they form a system that is much harder to compromise.
The orchestrator at the outermost layer ties everything together. It provisions a fresh sandbox for each execution, injects scoped credentials, configures the egress proxy allowlist, sets cgroup limits, starts the audit logger, and sets a wallclock timeout. When the execution finishes (or times out), it collects the filesystem diff, archives the audit log, revokes the credentials, and tears down the sandbox.
Press Build to watch the full architecture assemble layer by layer:
The Full Picture
All sandbox layers assembled. Watch them wrap around the agent.
Press "Build" to assemble the sandbox layers one by one
Flows appear once all layers are assembled
Key Insight
Knowledge Check
8 questions. Test whether you understood the tradeoffs, not just the terminology. Your answers are saved locally.