With the number of new subnets being added, it can be hard to keep information current across all of them, so some details here may be slightly out of date.
What TrajectoryRL Is
TrajectoryRL is a decentralized prompt optimization tournament built as Subnet 11 on the Bittensor network. It addresses the escalating cost of LLM-driven AI agent deployments by transforming prompt engineering into a verifiable, on-chain competition. Miners submit self-contained policy bundles — small natural-language instructions that guide LLM behavior — while validators enforce deterministic safety and correctness gates. By awarding rewards exclusively to the lowest-cost qualified submissions, TrajectoryRL motivates a global community of “prompt engineers” to iteratively discover cost-effective prompting strategies, skill compositions, and multi-model routing schemes that deliver production-grade agent policies at minimal inference expense.
Miner and Validator Workflow
Within TrajectoryRL, participants operate as either miners or validators, forming a closed loop that integrates on-chain commitments with off-chain evaluation. Miners author policy packs — AGENTS.md, SOUL.md, and tool_policy files — and upload them to any HTTP endpoint before committing their SHA256 hash on-chain. Validators periodically fetch these commitments, download the corresponding packs, and run them through a suite of five ClawBench scenarios under identical conditions. Each scenario applies deterministic regex-based rubric checks for safety and correctness; only packs that pass all gates qualify for cost measurement. Finally, validators aggregate scenario costs via an exponential moving average (EMA) and commit weight updates through a commit-reveal Yuma Consensus 3 (YC3) cycle, completing the loop.
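The content-addressed commitment step can be sketched in Python. The exact canonicalization TrajectoryRL uses before hashing is not specified in the source, so the sorted-filename framing below is an assumption for illustration only:

```python
import hashlib
from pathlib import Path

def pack_digest(pack_dir: str) -> str:
    """Compute a SHA256 digest over a policy pack's files.

    Assumption: files (e.g. AGENTS.md, SOUL.md, tool_policy) are fed to
    the hash in sorted-filename order, each prefixed by its name. The
    subnet's actual serialization may differ.
    """
    h = hashlib.sha256()
    for path in sorted(Path(pack_dir).glob("*")):
        h.update(path.name.encode())   # bind content to its filename
        h.update(path.read_bytes())
    return h.hexdigest()               # hex string committed on-chain
```

A miner would upload the pack to any HTTP endpoint and commit this digest on-chain; validators recompute it after download to verify integrity.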
Miner Responsibilities
Miners focus exclusively on optimizing policy bundles that instruct AI agents to perform real-world tasks efficiently and safely. These bundles encapsulate system prompts, behavioral rules, tool-usage constraints, and stop conditions, all expressed in markdown and JSON formats. With a strict 32 KB size cap and content-addressed commitment, miners iterate on prompt phrasing, instruction compression, skill definitions, and multi-LLM routing strategies to reduce total token utilization. Typical optimizations include redundancy elimination, stop-rule calibration, and dynamic model selection (e.g. routing triage steps to smaller, cheaper LLMs), which can yield 50–70% cost reductions at stage 1 and up to 93% with full hybrid routing. No GPU or persistent server is required; miners pay zero ongoing infrastructure cost.
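Because the 32 KB cap is a hard constraint, miners would typically validate pack size before committing. A minimal pre-submission check, assuming the cap applies to the total bytes of all files in the pack:

```python
from pathlib import Path

MAX_PACK_BYTES = 32 * 1024  # 32 KB cap stated for policy bundles

def pack_size_ok(pack_dir: str) -> bool:
    """Return True if the pack's total file size fits under the cap.

    Assumption: the cap covers the sum of all file sizes in the bundle,
    not each file individually.
    """
    total = sum(p.stat().st_size for p in Path(pack_dir).rglob("*") if p.is_file())
    return total <= MAX_PACK_BYTES
```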
Validator Responsibilities
Validators are responsible for running the deterministic ClawBench evaluation pipeline and scoring miner submissions. Each validator node autonomously reads on-chain commitments and retrieves policy packs via HTTP, verifying integrity with SHA256. In a two-phase evaluation, Phase 1 enforces a qualification gate wherein every safety and correctness check must pass or the pack is disqualified. Phase 2 measures inference cost — input, output, and caching token utilization — across fixed scenarios. Costs are smoothed through scenario-level and pack-level EMAs (α = 0.3), and validators coordinate weight updates on-chain through a commit-reveal scheme aggregated by YC3 with Liquid Alpha, ensuring a transparent and censorship-resistant scoring process.
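The EMA smoothing described above follows the standard recurrence; with the stated α = 0.3, each new observation contributes 30% of the updated value. A minimal sketch:

```python
def ema_update(prev, observed, alpha=0.3):
    """Exponential moving average: new = alpha * observed + (1 - alpha) * prev.

    Uses the document's alpha = 0.3; the first observation seeds the EMA.
    """
    if prev is None:
        return observed
    return alpha * observed + (1 - alpha) * prev
```

Validators would apply this per scenario and again at the pack level, so transient cost spikes decay rather than immediately reshuffling rankings.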
Incentives and Output
TrajectoryRL employs a winner-take-all incentive mechanism: the lowest-cost qualified miner captures 100% of the alpha emission for each epoch, driving continuous evolutionary pressure. A first-mover advantage threshold (δ = 0.10) prevents trivial cost-based dethroning, and an initial bootstrap phase distributes rewards to the top 3 (70/20/10) until sufficient miner diversity emerges. The subnet emits a fixed alpha volume each day, converted to TAO by stakers and delegated to validators. Compared to centralized prompt engineering — where a single team iterates on costly GPU runs — TrajectoryRL democratizes optimization, harnesses global collaboration, and eliminates infrastructure overhead while producing open-source policy bundles for immediate deployment.
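The winner-take-all split with δ-protection and the 70/20/10 bootstrap can be sketched as follows. This is a hedged reconstruction from the parameters in the text, not the subnet's actual implementation; the function name and signature are illustrative:

```python
def select_rewards(costs, incumbent=None, delta=0.10, bootstrap=False):
    """Map qualified miners to reward shares.

    costs: dict of miner id -> smoothed $/episode (qualified miners only).
    delta: first-mover threshold; a challenger must be at least 10% cheaper
           than the incumbent to dethrone it (per the document).
    bootstrap: if True, split 70/20/10 across the three cheapest miners.
    """
    ranked = sorted(costs, key=costs.get)          # cheapest first
    if bootstrap:
        return {uid: share for uid, share in zip(ranked[:3], [0.70, 0.20, 0.10])}
    challenger = ranked[0]
    if incumbent in costs and costs[challenger] > costs[incumbent] * (1 - delta):
        return {incumbent: 1.0}                    # not cheap enough to dethrone
    return {challenger: 1.0}                       # winner takes all
```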
Live Components and Development Roadmap
As of Q1 2026, TrajectoryRL has completed its first two milestone phases and entered Mainnet Launch under Milestone 3. Foundation features—deterministic evaluation of five ClawBench scenarios, on-chain pack commitments via set_commitment, binary quality gates, EMA-smoothed scoring (α = 0.3, ρ = 0.1), first-mover δ = 0.05 protection, and NCD-based anti-copy detection—shipped during M1. M2 introduced pure cost-based scoring (lower $/episode wins), cost-based δ = 0.10 dethroning, and updated leaderboards showing qualification status, cost per episode in USD, and percentage improvements over baselines. Under M3, which is in progress, full Docker-based validator deployment, live miner pack submissions, independent evaluation loops, automated weight-setting, node status API activation, and a live leaderboard with EMA convergence will come online.
Technical Architecture
TrajectoryRL’s infrastructure is built on a four-layer stack that leverages existing open source components with minimal custom code. At the base, the Bittensor network provides a secure, decentralized blockchain layer for on-chain commitments, staking, and alpha emissions. The incentive layer (SN11) implements protocol logic for miners and validators, interfacing with the network via BTCLI and the Bittensor Python SDK. ClawBench (v0.3.0) serves as the deterministic evaluation engine, executing fixed scenarios with regex-based rubric checks. Above it, OpenClaw (222K+ stars) functions as the AI agent runtime, orchestrating LLM calls, tool invocations, and cost accounting. All infrastructure components are containerized and orchestrated via Docker and GitHub Actions for automated builds and Watchtower-driven updates.
Repository Structure
The GitHub repository trajectoryRL/trajectoryRL organizes code, documentation, and deployment artifacts across seven main directories: .github/workflows for CI/CD, clawbench for scenario definitions, docker for validator and miner compose files, neurons for core mining and validation scripts, openclaw submodule for agent runtime, scripts for auxiliary tooling, and tests for integration and unit suites. Key files include README.md (high-level overview and quick-start), INCENTIVE_MECHANISM.md (full reward and anti-gaming specification), MINER_OPERATIONS.md and VALIDATOR_OPERATIONS.md (detailed runbooks), pack.json template, and .env.example for both miner and validator environments. The repository has accumulated over 335 commits, 9 stars, and 9 forks, reflecting rapid iteration and growing community engagement.
Key Metrics and Tokenomics
According to TaoStats, TrajectoryRL’s alpha token trades at approximately 0.0118 TAO, with a daily alpha emission rate of about 1,314 α per day distributed among active miners. Node registration for validators and miners requires a burn of 0.2 TAO per registration, adjustable every 360 blocks with an adjustment alpha of 0.97 and a target of 1 registration per interval. The subnet currently allows three concurrent registrations, with a 10,800-block immunity period preventing early deregistration. These parameters govern the supply of alpha, miner churn, and long-term stability of the competition.
Validator Scoring and Integration Points
Validators implement a commit-reveal pattern aggregated by YC3 with Liquid Alpha to set on-chain weights every tempo (~72 minutes). Cost is the sole competitive axis among qualified miners: validators measure inference cost — including input, output, and cache tokens converted to USD — across five scenarios and smooth values via EMA (α = 0.3). First-mover protection enforces that any new submission must be 10% cheaper than the incumbent to dethrone it. Developers can integrate via the Bittensor CLI (BTCLI) for wallet and hotkey management, the Bittensor Python SDK for programmatic chain interactions, and a node status reporting HTTP API provided by the M3 deployment for real-time monitoring.
Project Ownership
TrajectoryRL is maintained by the trajectoryRL organization on GitHub (UID 74 in the Bittensor registry), an entity dedicated to developing decentralized reinforcement learning workflows on the Bittensor network. The subnet owner's key, registered at UID 74, controls on-chain parameters and deployment, and validators temporarily assign weights to this owner during shadow mode for stability testing. While trajectoryRL's GitHub organization has no public members listed, core maintainers appear across repositories such as clawbench and openclaw, indicating a multidisciplinary team of reinforcement learning engineers and blockchain developers.
GitHub and Community Contributions
The trajectoryRL/trajectoryRL repository has garnered 9 stars, 9 forks, and over 335 commits, reflecting sustained development since 2024. Contributions span code reviews, pull requests, and documentation updates across a diverse contributor base, underscoring a community-driven ethos. The project’s Twitter presence (@TrajectoryRL) has grown to approximately 140 followers, broadcasting milestone updates, new releases, and community calls to miners and validators, further amplifying developer engagement and real-time discussion threads on X.
Rebranding and Evolution
Subnet 11 was originally launched as “Dippy Roleplay” under the impel-intelligence GitHub organization, focusing on open-source roleplay LLM evaluation and addressing digital loneliness through companion AI models. That project was archived in September 2025, and stewardship of the codebase transferred to trajectoryRL, which reoriented the subnet towards cost-optimized agent policies under the new moniker “TrajectoryRL.” Foundational evaluation pipelines and test suites from Dippy Roleplay were refactored into ClawBench and integrated with OpenClaw for trajectory-centric reinforcement learning scenarios.
Partnerships and Ecosystem Engagement
TrajectoryRL benefits from partnerships with the Opentensor Foundation and Latent Holdings, which maintain core ecosystem tools such as the Bittensor CLI, SDK, and TAO.app block explorer. The subnet is featured on analytics platforms like TaoStats and Backprop Finance, extending visibility to staking and trading communities. Validators and miners coordinate on the official Bittensor Discord server, and ecosystem X channels like Tao Alerts deliver real-time updates. Community-contributed scenarios, governance proposals, and educational content further strengthen TrajectoryRL’s role as a cornerstone in the decentralized AI landscape.
Milestone 1: Foundation
Milestone 1 established the core infrastructure and incentive primitives for TrajectoryRL. It introduced five deterministic ClawBench scenarios with regex-scored rubrics, enabling reproducible safety and correctness checks without reliance on LLM-based judges. On-chain pack commitments via set_commitment provided content-addressed, SHA256-verifiable submission integrity. The incentive mechanism adopted a winner-take-all model with an initial bootstrap phase distributing rewards to the top 3 miners (70/20/10), along with EMA-smoothed scoring parameters (α = 0.3, ρ = 0.1) and first-mover protection (δ = 0.05). NCD similarity detection (σ = 0.80) blocked plagiarism, and YC3 with Liquid Alpha ensured transparent, committed weight updates.
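The NCD (normalized compression distance) anti-copy check mentioned above compares how well two packs compress together versus separately. The compressor choice and the exact direction of the σ = 0.80 threshold are not specified in the source; this sketch assumes zlib and treats 1 − NCD as a similarity score flagged when it meets σ:

```python
import zlib

def ncd(a: bytes, b: bytes) -> float:
    """Normalized Compression Distance; near 0 for near-identical inputs,
    near 1 for unrelated ones. zlib is an illustrative compressor choice."""
    ca = len(zlib.compress(a))
    cb = len(zlib.compress(b))
    cab = len(zlib.compress(a + b))
    return (cab - min(ca, cb)) / max(ca, cb)

def looks_copied(a: bytes, b: bytes, sigma: float = 0.80) -> bool:
    # Flag a pack whose similarity (1 - NCD) meets the sigma threshold.
    return (1.0 - ncd(a, b)) >= sigma
```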
Milestone 2: Cost-Based Scoring
Milestone 2 pivoted the competition to pure cost optimization. Validators enforced a binary qualification gate where all safety and correctness checks must pass or the pack is disqualified, and adopted per-episode cost tracking measured in USD based on token usage. Among qualified miners, the lowest cost per episode wins, with a cost-based δ = 0.10 threshold requiring a challenger to be at least 10% cheaper than the incumbent before dethroning it. Cost EMAs run alongside score EMAs (α = 0.3), and the public leaderboard was enhanced to display qualification status, $/episode metrics, and improvement percentages against a baseline. This shift aligned rewards directly with tangible inference efficiency gains.
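The per-episode USD cost described here is a weighted sum of token counts by class. The price table below is a placeholder, not the subnet's actual rates:

```python
def episode_cost_usd(input_toks: int, output_toks: int, cached_toks: int,
                     prices: dict) -> float:
    """Convert token usage for one episode into USD.

    prices maps token class -> $ per 1M tokens (placeholder values;
    real per-model rates would be substituted by the validator).
    """
    return (input_toks * prices["input"]
            + output_toks * prices["output"]
            + cached_toks * prices["cached"]) / 1_000_000
```

For example, with hypothetical rates of $3/$15/$0.30 per million input/output/cached tokens, an episode using 200k input, 50k output, and 100k cached tokens would cost $1.38.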
Milestone 3: Mainnet Launch (In Progress)
Under Milestone 3, TrajectoryRL is transitioning from shadow mode to full Mainnet operation. Docker-based validator deployments with Watchtower auto-updates and GitHub Container Registry (GHCR) image streams are being finalized. The ClawBench evaluation pipeline is fully activated, and on-chain submission of HTTP-hosted policy packs is open. Independent validator loops periodically evaluate submissions and set weights on-chain. A node status reporting API, live leaderboard with EMA convergence tracking, and miner inactivity detection (14,400 blocks ≈ 48 h) are in development, enabling robust monitoring, failure alerts, and transparent competition staging.
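The inactivity window above checks out arithmetically under Bittensor's nominal 12-second block time (an assumption; actual block intervals vary slightly):

```python
BLOCK_TIME_S = 12  # nominal Bittensor block time, assumed here

def blocks_to_hours(blocks: int) -> float:
    """Convert a block count to hours at the nominal block time."""
    return blocks * BLOCK_TIME_S / 3600

# 14_400 blocks * 12 s = 172_800 s = 48 hours, matching the doc's ~48 h figure
```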
Future Milestones and Vision
Beyond the initial launch, Milestone 4 will introduce multi-model support as a competitive axis, allowing miners to optimize per-scenario model routing strategies (e.g. Opus for safety, Haiku for triage) and engage new scenario domains in finance, legal, and DevOps. Enhanced rubric checks—anti-hallucination, conciseness, structured outputs—and distilled LoRA models for enterprise deployment (100× cost reduction) will expand use cases. Milestone 5 aims to scale ecosystem growth through community-contributed scenarios, dynamic difficulty scaling, cross-subnet composability, and an enterprise licensing marketplace for winning policy packs, realizing a fully decentralized prompt-engineering ecosystem.