How We Run Untrusted Haskell Code Safely

When you run a code challenge on Typify, you type Haskell into a browser, hit "Run," and get back results telling you which test cases passed. Behind the scenes, your code is compiled and executed on our servers. That is a problem: running arbitrary code from the internet on your own machines is one of the most dangerous things a service can do.

We built runGhcBWrap to solve it. It is a sandboxed Haskell execution system that compiles and runs untrusted code inside a bubblewrap sandbox with no network access, no access to the host filesystem, and a multi-phase compilation strategy designed to prevent a class of attacks that most sandboxes do not consider.

The Obvious Problem

The surface-level problem is straightforward: user code should not be able to read your database credentials, send network requests, delete files, or consume unlimited resources. Most code execution sandboxes handle this with containers or VMs. We use bubblewrap (bwrap), a lightweight sandboxing tool that provides filesystem isolation without the overhead of a full container runtime.

Inside the sandbox:

/nix/store is mounted read-only, providing GHC and all Haskell packages
Two temporary directories are the only writable locations: one for the project source, one for GHC's scratch space
There is no network access
There is no home directory
There is no access to any file on the host machine outside these mounts
Execution has a strict timeout
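
As a rough illustration of the mounts listed above, the argument list handed to bwrap could be assembled along these lines. This is a sketch, not runGhcBWrap's actual invocation: the function name sandboxArgs, the in-sandbox paths /project and /tmp, and the particular selection of flags are assumptions made for the example. The flags themselves are standard bubblewrap options. The timeout is not something bwrap provides, so it would have to be enforced by the caller.

```haskell
-- Sketch only: builds a bwrap argument list with the properties listed above.
-- `sandboxArgs` and the in-sandbox paths are hypothetical.
sandboxArgs
  :: FilePath   -- host directory holding the project source (writable)
  -> FilePath   -- host directory for GHC's scratch space (writable)
  -> [String]   -- command to run inside the sandbox
  -> [String]
sandboxArgs projectDir scratchDir cmd =
  [ "--unshare-all"                          -- fresh namespaces, so no network
  , "--die-with-parent"                      -- sandbox dies if the server dies
  , "--clearenv"                             -- start from an empty environment
  , "--ro-bind", "/nix/store", "/nix/store"  -- GHC and packages, read-only
  , "--bind", projectDir, "/project"         -- writable: project source
  , "--bind", scratchDir, "/tmp"             -- writable: GHC scratch space
  , "--setenv", "TMPDIR", "/tmp"
  , "--chdir", "/project"
  ] ++ cmd
```

Because nothing else is bound into the new mount namespace, there is no home directory and no other host file to find. A real invocation may well need a few more entries (for example --proc /proc or --dev /dev) depending on what GHC expects.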

Every binary path (GHC, bwrap, ghc-pkg) is resolved at compile time using Template Haskell via the which package. There are no string-based binary lookups at runtime, so PATH manipulation attacks are not possible.

This handles the obvious threats. But Haskell has a feature that creates a much more interesting attack surface.

The Template Haskell Problem

GHC's Template Haskell (TH) runs arbitrary Haskell code at compile time. This is a powerful feature for metaprogramming, but in a code challenge context, it is a security hole. Through runIO, a TH splice can call readFile, getDirectoryContents, or any other IO action during compilation.

Why does this matter? Because our test system works by compiling the user's code alongside a reference solution. The user's function is imported into a Main module that also imports the expected answers. If the user's code contains a TH splice, it runs during compilation, and at that point the reference solution module exists as a .hs file on disk.

A malicious user could write:

{-# LANGUAGE TemplateHaskell #-}
module UserModule where

import Language.Haskell.TH
import System.Directory (listDirectory)

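-- The splice below runs inside GHC at compile time, not when myFunc is called.
myFunc x = $(do
  files <- runIO $ listDirectory "."
  -- A real attack would pick out the test module here, read it with
  -- readFile, and splice the extracted answers into the result.
  runIO $ print files
  [| x |]
  )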

This splice executes during compilation and lists the project directory. A fuller version would go on to find the test module file, read the reference solution, and produce "correct" answers. The sandbox prevents reading /etc/passwd, but it cannot prevent reading files that sit inside the sandbox alongside the user's code.

Four-Phase Compilation

Our solution is architectural: we split compilation into four phases so that trusted and untrusted code never coexist on disk when TH can run.

Phase 1: Compile user code alone. Only the user's .hs files are written to the project directory. No test modules, no reference solutions, no Main.hs. We run ghc -c on the user modules. Any TH splices execute during this phase, but there is nothing secret to find. The object files (.o) and interface files (.hi) are produced and left in place.

Phase 2: Write the trusted modules. After Phase 1 completes, we write the test library module (containing the reference solution and test data) and Main.hs to disk. These files now exist alongside the already-compiled user object files.

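Phase 3: Compile and link. We run ghc --make Main.hs -o Main. GHC's recompilation checker sees that the .o and .hi files for the user modules are up to date with respect to their unchanged .hs sources, so it skips recompiling them. TH does not re-execute. GHC only compiles the newly written trusted modules and links everything together. This relies on Phase 3 using the same compiler flags as Phase 1, because GHC also recompiles a module when the flags that affect it change.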

Phase 4: Execute. The compiled binary runs inside the sandbox with the timeout enforced.

The key insight is that GHC's recompilation avoidance is not just a performance optimization. It is a security boundary. By ensuring that user code is compiled before trusted code exists on disk, we guarantee that TH splices in user code never have access to the reference solution.
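
The ordering argument can be made concrete by modelling the phases as data and checking the invariant directly. The following is only a sketch under that framing: Step, fourPhasePlan, and trustedHiddenFromTH are hypothetical names, not part of runGhcBWrap, and the model covers the order of operations only. It does not check that GHC actually skips recompilation in Phase 3.

```haskell
-- Sketch only: the four phases as data, plus a check of the ordering invariant.
-- All type and function names here are hypothetical.
data Step
  = WriteUntrusted FilePath   -- user-supplied source
  | WriteTrusted FilePath     -- reference solution, test data, Main.hs
  | Run [String]              -- a command executed inside the sandbox
  deriving (Eq, Show)

-- The plan described above, for a single user module.
-- `testModule` stands for the randomly named test library file.
fourPhasePlan :: FilePath -> [Step]
fourPhasePlan testModule =
  [ WriteUntrusted "UserModule.hs"
  , Run ["ghc", "-c", "UserModule.hs"]              -- Phase 1: TH may run here
  , WriteTrusted testModule                         -- Phase 2
  , WriteTrusted "Main.hs"
  , Run ["ghc", "--make", "Main.hs", "-o", "Main"]  -- Phase 3
  , Run ["./Main"]                                  -- Phase 4
  ]

-- True when no trusted file has been written before the first command runs,
-- i.e. nothing secret is on disk while user code is first compiled.
trustedHiddenFromTH :: [Step] -> Bool
trustedHiddenFromTH steps =
  all (not . isTrusted) (takeWhile (not . isRun) steps)
  where
    isTrusted (WriteTrusted _) = True
    isTrusted _                = False
    isRun (Run _)              = True
    isRun _                    = False
```

Here trustedHiddenFromTH (fourPhasePlan "M_a7kQm3xP.hs") evaluates to True, while a plan that writes Main.hs before the first compile step evaluates to False.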

Randomized Names as Defense in Depth

Independently of the phase separation, the test module uses randomly generated names, which makes the reference solution harder to target. The module name, function names, and variable names are all random alphanumeric strings generated fresh for each execution.

A user cannot write import TestSolution because the module might be called M_a7kQm3xP this time and M_rW9fBn2L next time. Random names alone would not stop a splice that lists the directory, which is why they are a second layer rather than the primary defense. Combined with the phase separation (the file does not exist when TH runs), this provides layered security: the names defeat static imports and guessing, and the phase separation defeats discovery at compile time.
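
A minimal sketch of how such a name might be derived from random numbers follows. The helper moduleNameFrom is hypothetical, and the source of randomness is deliberately left to the caller. It should be something the user cannot predict.

```haskell
-- Sketch only: turn caller-supplied random numbers into a module name.
-- `moduleNameFrom` is a hypothetical helper, not the library's generator.
alphabet :: String
alphabet = ['a' .. 'z'] ++ ['A' .. 'Z'] ++ ['0' .. '9']

-- The "M_" prefix keeps the result a valid module name (it must start
-- with an uppercase letter) whatever characters follow.
moduleNameFrom :: [Int] -> String
moduleNameFrom ns = "M_" ++ map pick ns
  where
    pick n = alphabet !! (n `mod` length alphabet)
```

For example, moduleNameFrom [0, 26, 52] yields "M_aA0". Eight numbers give a name of the same shape as M_a7kQm3xP.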

The Test Harness

The other half of the system is runGhcBWrap-core, which handles code generation. Its job is to construct the Haskell source files that get compiled inside the sandbox.

For a code challenge, the generated code follows this structure:

UserModule.hs: The user's submitted code, either as-is (if they wrote a full module) or wrapped in a generated module header (if they wrote bare definitions).
A randomly-named test library module: Contains the reference solution and test input data. This module exposes a function that produces test cases and the expected implementation.
Main.hs: Imports both modules, runs both implementations against the same inputs, and compares results. Output is JSON-encoded so the server can parse which test cases passed.

The comparison handles function arities from 1 through 20 by generating lambda expressions that unpack tuples. A 3-argument function gets wrapped as \(a, b, c) -> f a b c, converting between the test harness's tuple-based input format and the user function's curried signature. Both pure and monadic (IO) functions are supported through a purity flag that determines whether results need to be lifted into IO.
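
To show what generating such a wrapper can look like, here is a small sketch that produces the lambda as source text. The name uncurryLambda and the Maybe-based handling of out-of-range arities are assumptions for the example, not the library's API.

```haskell
import Data.List (intercalate)

-- Sketch only: generate the source text of a tuple-unpacking wrapper.
-- `uncurryLambda` is a hypothetical name. Assumes the function name `f`
-- does not clash with the generated variable names a, b, c, ...
uncurryLambda :: Int -> String -> Maybe String
uncurryLambda n f
  | n < 1 || n > 20 = Nothing                      -- outside the supported range
  | n == 1          = Just ("\\a -> " ++ f ++ " a") -- no tuple for one argument
  | otherwise       =
      Just ("\\(" ++ intercalate ", " vars ++ ") -> " ++ unwords (f : vars))
  where
    vars = map (: []) (take n ['a' ..])            -- "a", "b", "c", ...
```

With this definition, uncurryLambda 3 "myFunc" produces the text \(a, b, c) -> myFunc a b c. A production generator would also need to pick variable names that cannot shadow the user's function.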

User Input Parsing

Users submit code in various forms. Some write complete modules with headers and imports. Others write bare function definitions. The system needs to handle both without the user thinking about it.

The parser classifies input by scanning for module declarations, import statements, and language pragmas. If any are found, the input is treated as a complete module and used as-is. Otherwise, it is treated as bare definitions and wrapped in a generated module with the appropriate header, extensions, and imports.

This means a user can submit:

myFunc :: Int -> Int -> Int
myFunc x y = x + y

and the system wraps it in:

module UserModule where
myFunc :: Int -> Int -> Int
myFunc x y = x + y

Or the user can submit a full module and the system respects it:

module UserModule where
import Data.List (sort)
myFunc :: [Int] -> [Int]
myFunc = sort
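
The classification described above can be approximated with a simple line-based scan. This is a sketch, not the actual parser: InputKind, classify, and toModule are hypothetical names, and a scan like this can be fooled by the word "import" at the start of a line inside a comment or a multi-line string.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- Sketch only: a line-based approximation of the classification above.
-- `InputKind`, `classify`, and `toModule` are hypothetical names.
data InputKind = CompleteModule | BareDefinitions
  deriving (Eq, Show)

classify :: String -> InputKind
classify src
  | any looksLikeHeader (lines src) = CompleteModule
  | otherwise                       = BareDefinitions
  where
    -- A line counts as module-level syntax if, ignoring leading
    -- whitespace, it starts with one of these markers.
    looksLikeHeader l =
      any (`isPrefixOf` dropWhile isSpace l)
          ["module ", "import ", "{-# LANGUAGE"]

-- Use complete modules as-is; give bare definitions a generated header.
toModule :: String -> String
toModule src = case classify src of
  CompleteModule  -> src
  BareDefinitions -> "module UserModule where\n" ++ src
```

Applied to the first submission above, toModule prepends the module UserModule where line. Applied to the second, it returns the input unchanged.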

Type-Level Signatures

The core library includes a type-level DSL for describing and validating function signatures. This system uses GHC's type families to verify at compile time that test case types are consistent: if a polymorphic slot a is instantiated as Int in one position, it must be Int everywhere. The types support higher-kinded type applications (for testing functions over Maybe, lists, etc.) and typeclass constraints.

This means the code challenge system can test polymorphic functions by instantiating type variables to concrete types for testing, while verifying that the instantiation is consistent. A challenge asking for a function of type a -> [a] -> [a] might test it with Int -> [Int] -> [Int], and the type system ensures the test harness is well-typed before it ever reaches the sandbox.
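
To give a flavour of the idea, here is a deliberately tiny signature language with a single polymorphic slot and a closed type family that instantiates it. It is not the library's DSL, and every name in it (Sig, Slot, Con, List, Inst, Challenge) is hypothetical. In this sketch consistency holds by construction, because one type parameter is substituted for every occurrence of the slot.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}

import Data.Kind (Type)

-- Sketch only: a tiny signature language with one polymorphic slot.
-- This is not the library's DSL; all names here are hypothetical.
data Sig
  = Slot          -- the type variable `a`
  | Con Type      -- a concrete type
  | List Sig      -- a list of something
  | Sig :-> Sig   -- a function

-- Instantiate every occurrence of the slot with the same concrete type.
type family Inst (a :: Type) (s :: Sig) :: Type where
  Inst a 'Slot      = a
  Inst a ('Con t)   = t
  Inst a ('List s)  = [Inst a s]
  Inst a (s ':-> t) = Inst a s -> Inst a t

-- The challenge signature a -> [a] -> [a].
type Challenge = 'Slot ':-> ('List 'Slot ':-> 'List 'Slot)

-- Inst Int Challenge reduces to Int -> [Int] -> [Int], so this reference
-- implementation is checked against the instantiated signature.
reference :: Inst Int Challenge
reference x xs = x : xs
```

Changing the body of reference to something of type Int -> [Bool] -> [Int] is rejected by the compiler, which is the kind of mismatch the text describes catching before anything reaches the sandbox.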

Why Not Docker?

Docker would work for basic isolation, but it is heavy. Each execution would require either a pre-built image or a build step. Bubblewrap gives us the isolation we need (filesystem, no network) with minimal overhead because it sets up Linux namespaces directly for a single process, with no daemon to run and no images to manage. Since everything we need (GHC, packages) is already in the Nix store, we just mount it read-only and go.

The result is fast execution with strong isolation. No image pulls, no layer caching, no daemon. Just a namespace with the right mounts.

The Architecture in Summary

User code (browser)
  |
  v
Backend receives code + challenge ID
  |
  v
runGhcBWrap-core generates:
  - UserModule.hs (from user input)
  - M_<random>.hs (reference solution + test data)
  - Main.hs (comparison harness)
  |
  v
runGhcBWrap executes 4-phase compilation in bwrap sandbox:
  Phase 1: ghc -c UserModule.hs (TH runs, nothing secret exists)
  Phase 2: Write M_<random>.hs and Main.hs to disk
  Phase 3: ghc --make Main.hs -o Main (skips recompiling user code)
  Phase 4: ./Main (produces JSON results)
  |
  v
Backend parses JSON, returns results to browser

Every step is designed so that the untrusted parts (user code, TH execution) and the trusted parts (reference solutions, test data) are temporally separated. Security comes not from trying to restrict what TH can do, but from ensuring there is nothing dangerous to find when it runs.

Open Source

Both repositories are open source:

runGhcBWrap: The sandbox execution layer
runGhcBWrap-core: Code generation and test harness construction

If you are building a system that needs to run untrusted Haskell code, or if you are curious about what it takes to make Template Haskell safe in a multi-tenant environment, the source is there. The same idea, compiling untrusted code before any secrets exist on disk, should carry over to other languages with compile-time code execution.

The best sandbox is one where there is nothing worth stealing.