What is an RL Runtime?

A runtime is responsible for the hard stuff around execution. It's the infrastructure layer that takes your trained RL policy and makes it work reliably in production environments.

Core Capabilities

Policy Execution

Load policy artifacts from various formats and run inference efficiently:

Support for multiple formats: Stable-Baselines3, TorchScript, ONNX
Optimized inference pipelines
Hardware acceleration (GPU/CPU optimization)

Stateful Sessions

Maintain per-agent state across interactions:

RNN hidden states for recurrent policies
Observation and action normalization statistics
Last observation/action history
Episode context and metadata

Environment Loop Support

Optionally run step loops server-side with consistent timing:

Agent ↔ environment interaction cycles
Controlled frame rates and timing
Reduced network latency for tight loops

Determinism & Replay

Capture what happened and reproduce it exactly:

Random seed tracking and control
Version and configuration snapshots
Complete observation/action/reward traces
Full episode replay capabilities

Performance

Low overhead per step with production-grade optimizations:

Request batching for throughput
Efficient memory layout and allocation
CPU core pinning for consistency
Zero-copy data transfer where possible

Isolation

Crashes or bad policies shouldn't take down the system:

Process-level isolation per session
Resource caps (memory, CPU time)
Timeout enforcement
Graceful degradation

Observability

Minimal but essential signals for production monitoring:

Prometheus-compatible metrics
Distributed tracing support
Structured logging
Health checks and liveness probes

Integration Surface

Stable API/SDK that robotics, simulators, and games can embed against:

RESTful HTTP API for standard integrations
WebSocket support for real-time streaming
Language-agnostic protocol (gRPC, JSON-RPC)
Client SDKs for Python, JavaScript, C++

Why This Matters

Most RL deployment discussions focus solely on "serving a model," but that's only a small part of the picture. A production RL runtime needs to handle the complexity of stateful interactions, provide reproducibility guarantees, maintain performance under load, and integrate seamlessly with diverse systems.

Fournex provides all of these capabilities out of the box, letting you focus on training better policies instead of building infrastructure.