A Visual Guide to Sandboxing
Design the execution environment for an AI agent that writes and runs its own code. Start naive, discover each gap, and build defense in depth.
The Prompt
Your team is shipping an AI agent. It takes a user's request, writes Python or JavaScript, executes that code, browses the web, pulls data together, and returns results. Think of tools like ChatGPT's Code Interpreter, Devin, or any agent framework where the LLM can run arbitrary code on your infrastructure.
The core problem: you control the agent's system prompt, but you do not control what code it writes, what URLs it fetches, or what system calls it makes. A prompt injection, a hallucinated import, or a malicious user request could turn your agent into an attacker running inside your infrastructure.
Design the execution environment. Here is the first thing most teams sketch on a whiteboard:
The starting point. What could go wrong?
Watch Out
The Isolation Stack
Isolation is not binary. It is a spectrum with five major levels, each trading off security strength against startup latency and resource overhead. The right choice depends on your threat model: are you protecting against accidental bugs, or against an actively adversarial agent trying to escape?
At the weakest end, a chroot jail just changes the apparent root directory. The process can still see the real kernel, the real network stack, and the real process table. At the strongest end, a full VM gives you an entirely separate kernel, virtual hardware, and memory space. Between them sit containers, gVisor (a user-space kernel that intercepts syscalls before they reach the host), and microVMs like Firecracker (a minimal VM that boots a real Linux kernel in ~125ms).
Click each level to explore what it isolates, what it leaks, and how fast it starts:
The Isolation Stack
Click any level to explore its mechanism, weakness, and tradeoffs
Each step up the stack trades boot latency and runtime overhead for a stronger kernel-level boundary. Serverless platforms like AWS Lambda (Firecracker) and Google Cloud Run (gVisor) sit in the middle, strong enough for multi-tenant workloads, fast enough for sub-second cold starts.
Key Insight
The Escape
You picked your isolation boundary. The sandbox is running. But isolation only limits what the agent can touch on the host machine. There are four major attack surfaces that exist even inside a perfectly isolated sandbox, because they operate through legitimate channels the sandbox must keep open.
Network egress: If the sandbox has internet access (which most agents need, to call LLM APIs or fetch data), it can send stolen data to any server on the internet. It can also use DNS queries as a covert exfiltration channel, encoding data in subdomain lookups that bypass HTTP filters.
Resource exhaustion: A fork bomb or unbounded allocation loop inside one sandbox can consume all host CPU and memory, killing every other sandbox on the same machine and potentially the host itself.
Secret discovery: Environment variables, mounted config files, and the cloud metadata service (169.254.169.254) are all reachable from inside the sandbox. If the agent can read /proc/self/environ or curl the metadata endpoint, it can harvest API keys and IAM credentials.
Metadata service access: On AWS, GCP, and Azure, any process can query 169.254.169.254 to get temporary IAM credentials scoped to the host's role. If that role has broad permissions, the agent can access S3 buckets, databases, or other cloud resources.
Click each threat path below to see the full attack sequence:
Attack Surface Visualizer
Click a threat vector to see how an uncontained agent escapes the sandbox.
Select a threat vector above to trace the attack path
Watch Out
Network Egress
The most dangerous thing a compromised sandbox can do is talk to the outside world. If it can make arbitrary outbound requests, it can exfiltrate user data to an attacker-controlled server, download and execute malware, or call paid APIs using your credentials.
The defense is an egress proxy. All outbound traffic from the sandbox is routed through a gateway process that enforces an allowlist. The sandbox itself has no direct internet access; its network namespace only has a route to the proxy. The proxy terminates TLS, inspects the destination URL (not just the hostname), and blocks anything not on the list.
A common pattern for AI agents: instead of giving the sandbox a raw OpenAI API key, you run the LLM calls through your own proxy URL. The sandbox calls https://llm-proxy.internal/v1/chat which forwards to OpenAI with your key attached server-side. The sandbox never sees the real API key, and you get a single chokepoint for rate limiting, cost tracking, and content filtering.
DNS filtering is also critical. Even if you block all HTTP egress, the agent can encode data in DNS queries (e.g., stolen-data.attacker.com) and exfiltrate through your DNS resolver. The egress proxy should also control DNS resolution, returning NXDOMAIN for anything not on the allowlist.
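To make the allowlist behavior concrete, here is a minimal sketch of the decision the proxy makes for each request. The domain entries, the subdomain wildcard, and the catch-all * mirror the simulator below; they are illustrative, not a particular proxy's configuration.

```python
# Sketch of the proxy's allow/deny decision. ALLOWLIST entries are examples;
# "*" is the catch-all wildcard and "*.github.com" matches any subdomain.
from urllib.parse import urlparse

ALLOWLIST = {"api.openai.com", "pypi.org", "*.github.com"}

def is_allowed(url: str) -> bool:
    host = urlparse(url).hostname or ""
    for entry in ALLOWLIST:
        if entry == "*":                       # admits all traffic; use with caution
            return True
        if entry.startswith("*."):
            base = entry[2:]                   # wildcard: bare domain or any subdomain
            if host == base or host.endswith("." + base):
                return True
        elif host == entry:
            return True
    return False

assert is_allowed("https://pypi.org/simple/requests/")
assert is_allowed("https://raw.github.com/some/repo")
assert not is_allowed("https://stolen-data.attacker.com/exfil")   # exfiltration attempt denied
```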
Toggle the domains below and run the simulation to see how different allowlist configurations affect what gets through:
Egress Proxy Simulator
Configure the allowlist, then run the simulation
Toggle domains to see how the proxy responds. Wildcard (*) admits all traffic; use with caution.
Press "Run Simulation" to stream requests through the proxy
How to implement it
The simplest production setup uses a forward proxy like Squid or Envoy running as a sidecar. The sandbox's network namespace has iptables rules that redirect all outbound traffic (ports 80, 443, and 53) to the proxy. The proxy holds the allowlist and makes the real outbound connection on behalf of the sandbox.
For LLM API calls specifically, a reverse proxy pattern works well: the sandbox is configured with an internal endpoint (e.g., http://10.0.0.1:8080/v1/chat) that your proxy translates to the real OpenAI/Anthropic endpoint, injecting the API key from a secrets manager. This way the sandbox never touches the real credential, and you can swap providers, add rate limits, or log prompts without changing the sandbox code.
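As one illustration of that reverse proxy pattern, the sketch below uses Flask and requests (assumed dependencies); the internal route, port, and upstream URL are examples rather than a prescribed layout. The provider key lives only in the proxy's environment, never in the sandbox.

```python
# Minimal key-injecting LLM proxy sketch: the sandbox calls this internal endpoint,
# and the proxy attaches the real provider credential before forwarding upstream.
import os
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
UPSTREAM = "https://api.openai.com/v1/chat/completions"   # example upstream endpoint

@app.post("/v1/chat")
def chat():
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}  # server-side only
    upstream = requests.post(UPSTREAM, json=request.get_json(), headers=headers, timeout=60)
    # Single chokepoint: add rate limiting, cost tracking, and content filtering here.
    return jsonify(upstream.json()), upstream.status_code

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)    # matches the internal endpoint the sandbox sees
```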
Key Insight
Resource Limits
Even if the agent cannot escape or talk to the outside world, it can still cause damage by exhausting host resources. A Python loop that appends to a list forever, a fork bomb that spawns thousands of processes, or a dd if=/dev/zero that fills the disk can all take down the host and every other sandbox running on it.
Linux cgroups v2 solve this. A cgroup is a kernel feature that lets you set hard caps on resource usage for a group of processes. You configure three main limits:
- CPU: cpu.max throttles the process to a share of CPU time. If set to 100000 100000 (quota and period in microseconds), the sandbox gets at most 1 full CPU core; it can spike briefly within a period, but the kernel scheduler throttles it once the quota is spent.
- Memory: memory.max sets a hard ceiling (e.g., 512MB). When a process in the cgroup tries to allocate beyond this limit, the kernel's OOM killer terminates it with exit code 137. The host and other sandboxes are unaffected.
- Disk: Use a size-limited tmpfs mount (e.g., mount -t tmpfs -o size=1G tmpfs /workspace) so disk writes are capped and automatically cleaned up when the sandbox exits. The root filesystem should be read-only. (A sketch of setting these limits follows this list.)
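A minimal sketch of wiring these limits up from the orchestrator, assuming cgroup v2 is mounted at /sys/fs/cgroup, the cpu/memory/pids controllers are enabled for the parent cgroup, and the process has enough privilege to create cgroups. The cgroup name and entrypoint are hypothetical.

```python
# Create a per-sandbox cgroup, apply hard caps, then launch the agent inside it.
import os

CG = "/sys/fs/cgroup/sandbox-42"                 # hypothetical per-sandbox cgroup
os.makedirs(CG, exist_ok=True)

def set_limit(name: str, value: str) -> None:
    with open(os.path.join(CG, name), "w") as fh:
        fh.write(value)

set_limit("cpu.max", "100000 100000")            # quota/period in µs: at most one full core
set_limit("memory.max", str(512 * 1024 ** 2))    # hard 512MB ceiling; OOM kill beyond it
set_limit("pids.max", "128")                     # caps process count (fork-bomb protection)

pid = os.fork()
if pid == 0:
    # Child: join the cgroup first, then exec the agent's code under the limits.
    set_limit("cgroup.procs", str(os.getpid()))
    os.execvp("python3", ["python3", "/workspace/agent_code.py"])   # hypothetical entrypoint
os.waitpid(pid, 0)
```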
Watch what happens when a runaway process hits these limits:
Resource Limits
cgroup enforcement in action
Event Log
Note
Resource limits do not cap wall-clock time, so also enforce an overall execution timeout; a timeout(1) wrapper or a systemd timer works well.
Key Insight
Set pids.max in the cgroup to prevent fork bombs. A limit of 64 or 128 processes is usually enough for any legitimate agent workload. Also consider io.max to cap disk I/O bandwidth, preventing a single sandbox from saturating the host's storage.
Secret Scoping
The agent needs credentials to do useful work: an LLM API key, database connection strings, OAuth tokens for third-party services. The question is how those credentials get into the sandbox, what scope they have, and how long they live.
The anti-pattern is baking credentials into the container image (in a Dockerfile ENV line, a config file, or an .env committed to the image layer). This is dangerous because container images are layered and immutable. Even if you delete the secret in a later layer, it persists in the earlier layer and can be extracted with docker history or by pulling the image and inspecting its layers.
The correct approach has three parts:
1. Runtime injection. Credentials are fetched from a secrets manager (Vault, AWS Secrets Manager, GCP Secret Manager) at sandbox startup time and injected as environment variables or tmpfs-mounted files. They never touch the image.
2. Short-lived tokens. Instead of a static API key that lives forever, the orchestrator mints a short-lived token (e.g., 15-minute TTL) scoped to exactly the APIs the agent needs. AWS STS AssumeRole with a session policy is one way to do this; when the sandbox exits, the token automatically expires (see the sketch after this list).
3. Proxy-based credential hiding. For high-value keys (like your LLM provider API key), don't give the sandbox the key at all. Route the calls through an internal proxy that injects the key server-side. The sandbox calls http://proxy.internal/v1/chat, and the proxy adds the Authorization header before forwarding to OpenAI. The sandbox never sees the real key.
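Here is a sketch of step 2: minting a short-lived, narrowly scoped credential with AWS STS at sandbox startup. The role ARN, bucket path, and session policy are hypothetical; the point is that the sandbox only ever sees a 15-minute token restricted to what this one run needs.

```python
import json
import boto3

sts = boto3.client("sts")

# Session policy: the temporary credential is the *intersection* of this policy
# and the assumed role's own permissions (resource names are hypothetical).
session_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": ["arn:aws:s3:::agent-workspace/run-42/*"],
    }],
}

creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/sandbox-runtime-role",
    RoleSessionName="sandbox-run-42",
    DurationSeconds=900,                  # 15-minute TTL; expires shortly after the run
    Policy=json.dumps(session_policy),
)["Credentials"]

# Inject only these temporary values into the sandbox at launch; nothing is baked in.
sandbox_env = {
    "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
    "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
    "AWS_SESSION_TOKEN": creds["SessionToken"],
}
```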
Press Play to compare the anti-pattern with the correct lifecycle side by side:
Credential Lifecycle
Two paths for managing secrets in agent execution
Press Play to step through both lifecycles simultaneously
The metadata service problem
On cloud VMs, the instance metadata service at 169.254.169.254 exposes temporary IAM credentials to any process on the machine. If your sandbox runs on an EC2 instance with a broad IAM role, the agent can curl that endpoint and get credentials to access S3, DynamoDB, or any other AWS service the role permits.
The fix: block the metadata IP in the sandbox's network namespace with an iptables rule, or enforce IMDSv2 and keep the metadata response hop limit at its default of 1 (credentials require a session token fetched via PUT, and the token response cannot cross the extra network hop into a container). Better yet, run sandboxes on instances with minimal IAM roles and inject only the specific credentials each sandbox needs.
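For the iptables option, this is roughly what the orchestrator might run when it sets up each sandbox's network namespace; the namespace name is hypothetical and the sketch assumes iptables is available on the host.

```python
# Drop all traffic from this sandbox's network namespace to the metadata endpoint.
import subprocess

subprocess.run(
    ["ip", "netns", "exec", "sandbox-42",            # hypothetical namespace name
     "iptables", "-A", "OUTPUT", "-d", "169.254.169.254", "-j", "DROP"],
    check=True,
)
```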
Key Insight
The Hard Parts
Everything above is a security layer. But in production, the hardest problem is often performance: how fast can you spin up a fresh sandbox? For interactive AI agents, users expect responses to start within a second or two. If your sandbox takes 5 seconds to cold start, the user is staring at a spinner before the agent even begins working.
There are three main strategies to manage cold starts:
1. Pre-warming. Keep a pool of idle sandboxes ready to go. When a request arrives, assign an already-booted sandbox instead of starting a new one. This is how AWS Lambda Provisioned Concurrency works. Cold start drops to effectively zero, but you pay for idle capacity (see the pool sketch after this list).
2. Snapshot/restore. Boot a sandbox once, get it into a ready state (kernel loaded, language runtime warmed up, packages installed), then snapshot its entire memory to disk. When a new sandbox is needed, restore from the snapshot instead of booting from scratch. Firecracker can restore a snapshot in ~125ms, giving you VM-level isolation with near-container startup times.
3. Lightweight isolation. Use gVisor or a seccomp-hardened container instead of a full VM. You trade some isolation strength for faster startup. The right choice depends on your threat model.
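A bare-bones sketch of the pre-warming idea: a background thread keeps N sandboxes booted, and requests grab one off the pool instead of paying the cold start. start_sandbox here is a stand-in for whatever provisioning call your platform uses.

```python
import queue
import threading
import time

POOL_SIZE = 5
pool: "queue.Queue[str]" = queue.Queue()

def start_sandbox() -> str:
    """Boot a fresh sandbox and return a handle (the slow, multi-second path)."""
    time.sleep(2)                              # stand-in for a container/microVM boot
    return f"sandbox-{time.monotonic_ns()}"

def refill_forever() -> None:
    # Keep the pool topped up as warm sandboxes are handed out.
    while True:
        if pool.qsize() < POOL_SIZE:
            pool.put(start_sandbox())
        else:
            time.sleep(0.5)

threading.Thread(target=refill_forever, daemon=True).start()

def acquire_sandbox() -> str:
    try:
        return pool.get_nowait()               # warm path: effectively zero cold start
    except queue.Empty:
        return start_sandbox()                 # pool exhausted: fall back to a cold boot
```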
Different isolation approaches have very different cold start profiles. Watch them race:
Cold Start Race
How long until the first request is served? Bars fill proportionally in real time.
Cold start latency directly affects user-facing P99. In multi-tenant serverless, cold starts happen constantly. Isolation choice is a latency choice.
Read-only root filesystem
Another hard-but-important detail: the sandbox's root filesystem should be read-only. This prevents the agent from modifying its own runtime, injecting persistent backdoors, or tampering with binaries. Temporary writes go to a size-limited tmpfs mount that the kernel discards when the sandbox exits.
In Docker, this is --read-only --tmpfs /tmp:size=512m. The agent gets scratch space for temporary files, but nothing it writes survives past the current execution. This also means each new execution starts from a known clean state, which makes debugging and auditing much easier.
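In code, one execution might look like the sketch below, using the Docker SDK for Python (an assumed dependency); the image and the agent code string are placeholders.

```python
import docker

client = docker.from_env()
agent_code = 'print("hello from the sandbox")'        # stand-in for agent-generated code

output = client.containers.run(
    image="python:3.12-slim",                 # example runtime image
    command=["python3", "-c", agent_code],
    read_only=True,                           # equivalent to docker run --read-only
    tmpfs={"/tmp": "size=512m"},              # size-limited scratch space, discarded on exit
    network_disabled=True,                    # no direct egress; route through the proxy instead
    mem_limit="512m",
    pids_limit=128,
    remove=True,                              # nothing survives past this execution
)
print(output.decode())
```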
Key Insight
Observability
You have locked down the sandbox. But how do you know what the agent actually did during a run? If something goes wrong, or if a user reports suspicious behavior, or if you need to investigate an incident, you need a complete record of every action the agent took.
A production sandbox audit trail captures four types of signals:
- Syscalls: Tools like strace, seccomp-bpf, or gVisor's built-in logging record every kernel interaction: what files were opened, what processes were spawned, what sockets were created. An unusual burst of fork() calls is a fork bomb. A connect() to an unexpected IP is a data exfiltration attempt.
- Network: The egress proxy logs every request: method, URL, response code, latency, and bytes transferred. DNS queries are logged separately. This gives you a complete picture of every external service the agent talked to.
- Filesystem: After the sandbox exits, diff the filesystem against its initial state. What files were created, modified, or deleted? An overlay filesystem (used by Docker by default) makes this diff trivial: just inspect the upper layer.
- Resources: Record CPU usage, memory RSS, disk I/O, and network bytes over time. A sudden spike in memory from 50MB to 480MB in 2 seconds is a red flag. Correlate resource anomalies with syscall and network events to reconstruct what happened.
Drag the scrubber to replay a sandbox execution and see how these signals come together:
Audit Replay
Scrub the timeline to replay agent activity. Click any event for details.
Click an event to inspect details
Every syscall, file access, and network connection is recorded. You can replay any run to audit exactly what the agent did, and catch what it tried to do.
Structured logging for AI agents
Beyond kernel-level tracing, log the agent's LLM calls as structured events: the prompt, the completion, the model used, token count, and latency. This is your application-level audit trail. If a prompt injection caused the agent to misbehave, you can trace it back to the exact message that triggered the behavior.
Store all logs with the sandbox execution ID as a correlation key. When you need to investigate, you can pull up a single execution and see everything: what the user asked, what code the agent wrote, what syscalls it made, what network requests it sent, and what resources it consumed.
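A sketch of what one such structured event might look like; the field names are illustrative, not a fixed schema. The important part is the shared execution ID.

```python
import json
import time
import uuid

def log_llm_call(execution_id: str, model: str, prompt: str, completion: str,
                 prompt_tokens: int, completion_tokens: int, latency_ms: float) -> None:
    event = {
        "ts": time.time(),
        "event": "llm_call",
        "execution_id": execution_id,          # correlation key shared with syscall/network logs
        "model": model,
        "prompt": prompt,
        "completion": completion,
        "usage": {"prompt_tokens": prompt_tokens, "completion_tokens": completion_tokens},
        "latency_ms": latency_ms,
    }
    print(json.dumps(event))                   # in production, ship this to your log pipeline

log_llm_call(str(uuid.uuid4()), "gpt-4o", "Summarize sales.csv", "Total revenue was...",
             prompt_tokens=812, completion_tokens=240, latency_ms=1530.0)
```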
Watch Out
Building It
You understand the layers. Now the interviewer asks: “How would you actually build this? What infrastructure would you use?” There are two paths: use an off-the-shelf sandbox platform, or build your own from cloud primitives.
Off-the-shelf options
If you do not need to build the sandbox layer yourself, several platforms provide it as a service:
- E2B: Purpose-built for AI agent sandboxes. Firecracker microVMs, sub-second cold starts, built-in filesystem snapshots, and a simple API. You get strong isolation without managing infrastructure.
- Modal: Serverless containers with GPU support. Good if the agent needs to run ML inference or heavy compute inside the sandbox. Handles scaling and cold starts for you.
- Fly Machines: Firecracker microVMs with a REST API. You can start, stop, and snapshot machines programmatically. Low-level enough to customize, managed enough that you do not run your own hypervisor fleet.
- Daytona: Development environment platform that provisions isolated, standardized sandboxes with built-in identity, networking, and lifecycle management. A good fit if you want managed infrastructure with fine-grained access controls out of the box.
For most teams, starting with one of these is the right call. You focus on the agent logic, and the platform handles isolation, cold starts, and cleanup.
The safe DIY answer: Docker on Fargate
If the interviewer pushes on building it yourself, or you need more control, the pragmatic answer is Docker containers on AWS Fargate (or Cloud Run on GCP). Here is why this is a solid default:
1. No servers to manage. Fargate runs each task on its own Firecracker microVM under the hood, so you get VM-level isolation between tasks without managing EC2 instances. Each sandbox is a Fargate task with its own network namespace (see the launch sketch after this list).
2. Resource limits are built in. You set CPU and memory in the task definition. Fargate enforces them at the VM level. No cgroup configuration needed.
3. Network isolation via VPC. Put the Fargate tasks in a private subnet with no internet gateway. Route outbound traffic through a NAT gateway or, better, through an egress proxy running as a separate service. Security groups block the metadata service IP.
4. Secrets via IAM. Each task gets its own IAM task role with minimal permissions. Secrets come from AWS Secrets Manager, injected as environment variables at task launch. No static keys anywhere.
5. Logging for free. Fargate sends stdout/stderr to CloudWatch. Add an egress proxy log and a filesystem diff step at the end, and you have a reasonable audit trail.
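A sketch of launching one execution as a Fargate task with boto3; the cluster, task definition, subnet, security group, and container name are hypothetical, and the task definition is assumed to carry the CPU/memory limits and read-only filesystem settings.

```python
import boto3

ecs = boto3.client("ecs")

response = ecs.run_task(
    cluster="agent-sandboxes",
    taskDefinition="sandbox-runtime:7",              # defines CPU/memory and the task role
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0abc1234"],          # private subnet, no internet gateway
            "securityGroups": ["sg-0def5678"],       # egress allowed only to the proxy
            "assignPublicIp": "DISABLED",
        }
    },
    overrides={
        "containerOverrides": [{
            "name": "sandbox",
            "command": ["python3", "-c", "print('agent code runs here')"],
        }]
    },
)
task_arn = response["tasks"][0]["taskArn"]           # track this for logs and teardown
```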
The tradeoff is cold start latency. Fargate tasks take 10-30 seconds to launch, which is too slow for interactive use. To fix this, keep a warm pool: run N idle tasks that sit waiting for work, and replenish the pool as tasks are consumed. This is essentially building your own provisioned concurrency.
Be upfront about the cost: a warm pool means you are paying for idle compute 24/7. If you keep 20 warm Fargate tasks at 1 vCPU / 2GB each, that is ~$1,400/month sitting idle. You also add operational complexity: you need a pool manager that monitors pool size, replenishes consumed tasks, drains stale ones, and scales with traffic patterns. Autoscaling the pool (e.g., larger during business hours, smaller overnight) helps control costs but adds another moving part. This is exactly why most teams start with an off-the-shelf platform and only build custom when they have a clear reason.
If you need faster cold starts
For sub-second startup, you need to go lower. Run Firecracker directly on EC2 bare-metal instances (e.g., i3.metal) and manage the VM lifecycle yourself. This is what the off-the-shelf platforms do under the hood. You get snapshot/restore for ~125ms cold starts, but you also take on managing the hypervisor fleet, snapshot storage, and VM scheduling.
The middle ground is Lambda with container images. Package your sandbox runtime as a Lambda container (up to 10GB image), use provisioned concurrency for warm starts, and get Firecracker isolation for free. The 15-minute execution limit and 10GB ephemeral storage cap may or may not fit your workload.
Key Insight
The Full Picture
We started with a container and a dream. Now we have seven defense layers, each solving a specific failure mode that the previous layers left open. This is defense in depth: no single layer is sufficient on its own, but together they form a system that is much harder to compromise.
The orchestrator at the outermost layer ties everything together. It provisions a fresh sandbox for each execution, injects scoped credentials, configures the egress proxy allowlist, sets cgroup limits, starts the audit logger, and sets a wallclock timeout. When the execution finishes (or times out), it collects the filesystem diff, archives the audit log, revokes the credentials, and tears down the sandbox.
Press Build to watch the full architecture assemble layer by layer:
The Full Picture
All sandbox layers assembled. Watch them wrap around the agent.
Press "Build" to assemble the sandbox layers one by one
Flows appear once all layers are assembled
Key Insight
Knowledge Check
8 questions. Test whether you understood the tradeoffs, not just the terminology. Your answers are saved locally.