Sandboxes

Cookbook

Task-oriented recipes for common sandbox jobs - running model-generated code, wiring a sandbox into an agent loop, installing packages, parking a session between turns, and fanning out across many sandboxes at once. Each recipe is copy-paste ready in Python and JavaScript. If you are new here, start with the Quickstart and come back for patterns.

Public beta
Sandboxes are in public beta and free to start. Enroll in one click from the Sandboxes console, then mint a token scoped to sandboxes:read and sandboxes:write.

Run model-generated code safely

The core use case: a model hands you code, you need its output but not its side effects. Write it into a fresh sandbox, run it, read back only the result. The context manager destroys the sandbox on exit, so nothing the code touched - files, processes, network state - outlives the call.

run_generated
from orkestr import Sandbox

# A code string your model produced. Never exec this on your own host -
# run it in a throwaway sandbox and read back only the result.
generated = """
import statistics
nums = [4, 8, 15, 16, 23, 42]
print("mean:", statistics.mean(nums))
"""

with Sandbox.create(template="python-3.12") as sbx:
    sbx.files.write("/workspace/gen.py", generated)
    result = sbx.exec("python /workspace/gen.py")
    answer = result.stdout if result.exit_code == 0 else f"error: {result.stderr}"
    print(answer)
# The sandbox is destroyed on block exit - nothing the code did survives.
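
If the answer goes straight back into a model prompt, it can help to cap how much output you forward. The sketch below is a plain-Python helper, not part of the SDK: format_result and max_chars are illustrative names, and it assumes only the stdout, stderr and exit_code fields used in the recipe above.

```python
def format_result(exit_code: int, stdout: str, stderr: str, max_chars: int = 2000) -> str:
    """Turn a finished command into one bounded string for the model."""
    text = stdout if exit_code == 0 else f"error (exit {exit_code}): {stderr}"
    if len(text) <= max_chars:
        return text
    # Keep the tail: tracebacks and final results usually sit at the end.
    dropped = len(text) - max_chars
    return f"[... {dropped} chars truncated ...]\n" + text[-max_chars:]
```

Inside the with block you would call it as format_result(result.exit_code, result.stdout, result.stderr). Keeping the tail rather than the head is a judgment call; flip it if your scripts print their result first.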

Wire a sandbox into an agent loop

Give the model a single long-lived sandbox and let it run a sequence of commands, feeding each result back as context. Create once, exec many, terminate when the session ends. Every command, file write and lifecycle event is recorded - watch the run unfold on the sandbox's activity timeline in the console while the agent works.

agent_loop
from orkestr import Sandbox

# One sandbox for the whole agent session, reused across tool calls.
# network="restricted" lets the code pip-install and reach allowlisted APIs.
sbx = Sandbox.create(template="python-3.12", network="restricted")
try:
    history = []
    while not task_complete(history):
        # 1) your model decides the next shell command
        command = agent.next_command(history)

        # 2) run it in the sandbox, feed the result back to the model
        result = sbx.exec(command, timeout_seconds=120)
        history.append({
            "command": command,
            "stdout": result.stdout,
            "stderr": result.stderr,
            "exit_code": result.exit_code,
        })
finally:
    sbx.terminate()   # always free the sandbox when the session ends
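
The loop above leaves task_complete and agent to you. One way to fill them in, sketched under assumptions: a single callback that returns the next command or None when the task is done, plus a hard cap on steps so a confused model cannot loop forever. run_agent_session, next_command and max_steps are hypothetical names; the only sandbox calls used are exec and terminate, exactly as in the recipe.

```python
from typing import Any, Callable, Optional

Step = dict[str, Any]


def run_agent_session(
    sbx: Any,
    next_command: Callable[[list[Step]], Optional[str]],
    max_steps: int = 20,
    timeout_seconds: int = 120,
) -> list[Step]:
    """Drive one sandbox until the callback returns None or the step cap is hit."""
    history: list[Step] = []
    try:
        for _ in range(max_steps):
            command = next_command(history)
            if command is None:  # the model signals that it is done
                break
            result = sbx.exec(command, timeout_seconds=timeout_seconds)
            history.append({
                "command": command,
                "stdout": result.stdout,
                "stderr": result.stderr,
                "exit_code": result.exit_code,
            })
    finally:
        sbx.terminate()  # runs on normal exit, on the cap, and on exceptions
    return history
```

Because the sandbox is passed in rather than created inside, the same function can be exercised against a fake object in unit tests before it ever touches a real sandbox.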

Install packages with restricted egress

Use network="restricted" when the code needs to pull dependencies but you do not want to hand it open internet. Package registries, GitHub and the major LLM APIs are reachable through an allowlisting proxy; everything else is refused. Proxy-aware tools (pip, npm, curl, standard HTTP libraries) work with no setup.

packages
from orkestr import Sandbox

with Sandbox.create(template="python-3.12", network="restricted") as sbx:
    # In restricted mode pip / npm / curl go through an allowlisting proxy:
    # package registries, GitHub and the major LLM APIs are reachable,
    # everything else is blocked. No proxy setup needed - HTTP_PROXY is
    # already set inside the sandbox.
    sbx.exec("pip install --quiet requests")
    sbx.files.write(
        "/workspace/check.py",
        "import requests; print(requests.get('https://api.github.com').status_code)",
    )
    print(sbx.exec("python /workspace/check.py").stdout)  # 200

Park a session between agent turns

For agents that work in bursts, pause the sandbox between turns to stop the compute meter and resume from the exact same state - installed packages, files, everything - minutes or hours later, even from a different process. pause() returns the sandbox id; persist it with your agent state and pass it to Sandbox.resume().

pause_resume
from orkestr import Sandbox

# Turn 1: set up an environment, then park it to stop the compute meter.
sbx = Sandbox.create(template="node-22", timeout_seconds=3600)
sbx.exec("npm init -y && npm install lodash")
sandbox_id = sbx.pause()          # snapshot taken, meter stops
save_to_session(sandbox_id)        # your DB / Redis / agent memory

# Turn 2, minutes or hours later, possibly in another process:
sbx = Sandbox.resume(load_from_session())
out = sbx.exec("node -e \"console.log(require('lodash').VERSION)\"")
print(out.stdout)   # the installed deps are still there
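
save_to_session and load_from_session in the snippet above are placeholders for your own storage. A minimal sketch of what they could look like, assuming a local JSON file is good enough; SESSION_FILE and the file layout are illustrative, and a real deployment would more likely use the DB or Redis the comment mentions.

```python
import json
from pathlib import Path

SESSION_FILE = Path("agent_session.json")  # hypothetical location


def save_to_session(sandbox_id: str, path: Path = SESSION_FILE) -> None:
    """Persist the paused sandbox id so another process can resume it."""
    # Write to a temp file, then rename, so a crash cannot leave a
    # half-written session file behind.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"sandbox_id": sandbox_id}))
    tmp.replace(path)


def load_from_session(path: Path = SESSION_FILE) -> str:
    """Read back the sandbox id saved by save_to_session."""
    return json.loads(path.read_text())["sandbox_id"]
```

The id is the only thing you need to carry between turns; everything else lives in the paused sandbox.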

Data in, artifact out

Upload input, run a script, read back the artifact it produced. The whole /workspace directory is yours to write to; the sandbox sees only the files you upload, never the rest of your data.

analyze
from orkestr import Sandbox

scores_csv = "name,score\nada,91\nlinus,88\ngrace,95\n"

# Script that runs inside the sandbox: find the top scorer, write the name out.
analyze = '''
import csv
with open("/workspace/scores.csv", newline="") as f:
    rows = list(csv.DictReader(f))
top = max(rows, key=lambda r: int(r["score"]))
with open("/workspace/winner.txt", "w") as f:
    f.write(top["name"])
'''

with Sandbox.create(template="python-3.12") as sbx:
    sbx.files.write("/workspace/scores.csv", scores_csv)
    sbx.files.write("/workspace/analyze.py", analyze)
    result = sbx.exec("python /workspace/analyze.py")
    if result.exit_code != 0:
        raise RuntimeError(result.stderr)
    print(sbx.files.read("/workspace/winner.txt"))   # grace

Stream a long-running command

For builds, test suites or training runs, stream output as it arrives instead of waiting for the whole thing to finish. The exit code arrives on the final chunk. Always iterate to completion: breaking out early leaves the in-sandbox process running until its own timeout fires.

stream
from orkestr import Sandbox

with Sandbox.create(template="python-3.12") as sbx:
    sbx.files.write(
        "/workspace/build.py",
        "import time\nfor i in range(5):\n    print(f'step {i}', flush=True); time.sleep(1)",
    )
    for chunk in sbx.exec_stream("python /workspace/build.py"):
        if chunk.stream == "stdout":
            print(chunk.data, end="", flush=True)
        if chunk.is_final and chunk.exit_code != 0:
            raise RuntimeError("build failed")
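
If you want the streamed output as a value as well as on screen, one option is to wrap the loop in a helper that always reads the stream to the end. This is a sketch, not an SDK function: drain_stream is a hypothetical name, and it relies only on the chunk fields shown above (stream, data, is_final, exit_code).

```python
from typing import Any, Iterable, Optional


def drain_stream(chunks: Iterable[Any]) -> tuple[str, Optional[int]]:
    """Consume every chunk and return (captured stdout, exit code)."""
    stdout_parts: list[str] = []
    exit_code: Optional[int] = None
    for chunk in chunks:  # no early break: the stream is read to the end
        if chunk.stream == "stdout":
            print(chunk.data, end="", flush=True)
            stdout_parts.append(chunk.data)
        if chunk.is_final:
            exit_code = chunk.exit_code
    return "".join(stdout_parts), exit_code
```

Call it as output, code = drain_stream(sbx.exec_stream("python /workspace/build.py")) and decide what to do with a non-zero code afterwards. An exit code of None means no final chunk was seen.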

Fan out across many sandboxes

Each sandbox is fully isolated, so running several at once is natural - evaluate N model candidates, test N branches, process N inputs in parallel. Stay within your plan's concurrency cap; check Sandbox.limits().max_concurrent before fanning out wide.

fan_out
from concurrent.futures import ThreadPoolExecutor
from orkestr import Sandbox

def run_candidate(code: str) -> str:
    with Sandbox.create(template="python-3.12") as sbx:
        sbx.files.write("/workspace/c.py", code)
        return sbx.exec("python /workspace/c.py").stdout

# candidates: a list of code strings produced by your model.
# Evaluate them in parallel. Stay within your plan's concurrency cap -
# check Sandbox.limits().max_concurrent first.
with ThreadPoolExecutor(max_workers=3) as pool:
    outputs = list(pool.map(run_candidate, candidates))
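
Rather than hard-coding the worker count, you can derive it from the cap. A minimal sketch, assuming you pass in the number you read from Sandbox.limits().max_concurrent; fan_out and headroom are illustrative names, with headroom reserving slots for sandboxes other parts of your system may already hold.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def fan_out(
    fn: Callable[[T], R],
    items: Sequence[T],
    max_concurrent: int,
    headroom: int = 0,
) -> list[R]:
    """Map fn over items with at most (max_concurrent - headroom) in flight."""
    # Never ask for more workers than items, and never fewer than one.
    workers = max(1, min(len(items), max_concurrent - headroom))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))  # results keep the input order
```

With the recipe above this becomes outputs = fan_out(run_candidate, candidates, Sandbox.limits().max_concurrent). pool.map re-raises the first exception when its result is read, so one failing candidate aborts the batch; catch inside run_candidate if you would rather collect partial results.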

Handle timeouts and limits

A timed-out command does not kill the sandbox - it stays alive so you can collect partial state before deciding what to do. Catch ExecTimeout for that, and PlanLimitError when you are out of concurrent sandboxes or monthly budget. See the SDK reference for the full error hierarchy.

errors
from orkestr import Sandbox, ExecTimeout, PlanLimitError

try:
    with Sandbox.create(template="python-3.12") as sbx:
        try:
            result = sbx.exec("python train.py", timeout_seconds=300)
        except ExecTimeout:
            # The command timed out but the sandbox is still alive -
            # grab partial state before the block exits and terminates it.
            logs = sbx.files.read("/workspace/train.log")
            raise
except PlanLimitError as e:
    # Out of concurrent sandboxes or monthly budget.
    print(f"hit a plan limit: {e}")
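
Printing the error is fine for a script; a long-running service may prefer to wait and try again. The helper below is a generic backoff sketch, not an SDK feature: retry_on and its parameters are hypothetical names. It also assumes the limit is one that clears with time, which may hold for a concurrency cap but not for an exhausted monthly budget.

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def retry_on(
    exc_type: Type[BaseException],
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on exc_type, wait base_delay * 2**attempt and try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except exc_type:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Usage would look like retry_on(PlanLimitError, lambda: run_candidate(code)). The sleep parameter exists so tests can pass a stub instead of actually waiting.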

Production tips

  • Prefer the scoped form (with in Python, withTemp in JavaScript) so a crash in your agent loop still terminates the sandbox and bounds your bill.
  • Mint tokens scoped only to sandboxes:read / sandboxes:write for agent runtimes - a leaked scoped token cannot reach the rest of your account.
  • Set a tight timeout_seconds on both the sandbox and each exec; agent-written commands hang more often than yours do.
  • Call Sandbox.limits() once at startup to pick a size and concurrency that fit the running token's plan.
  • Use pause() for idle sessions instead of keeping a sandbox running - a paused sandbox does not accrue compute.

Next steps