Subnet 14

Cacheon

Alpha Price

Value

Market Cap

Value

Neurons

Value

Registration Cost

Value

TAO Liquidity

Value

Alpha in Pool

Value

Total Alpha Supply

Value

% Alpha Staked

Value

ABOUT

What exactly does it do?

Purpose and Problem
Cacheon (Bittensor Subnet 14) addresses the growing challenge of serving large language models efficiently. Inference speed, latency, and cost have become the bottleneck for AI deployment. Cacheon turns this into an on-chain competition: miners submit Dockerized inference servers for a fixed open-source model (Qwen2.5-72B-Instruct) via an OpenAI-compatible chat API, and Validators run all submissions under identical conditions to benchmark performance. The subnet rewards the fastest servers that still produce correct outputs, effectively forcing a fair, apples-to-apples race to improve inference efficiency.

Miner Contributions
Miners participate by packaging their inference code in a container and committing it on-chain. Each miner pays a registration fee and attaches a container reference digest to the Bittensor chain, indicating their candidate server. The miners’ contribution is pure compute optimization: anything from new kernel algorithms (FlashAttention, quantized cache, etc.) to novel threading can be used, so long as the server serves the specified model faster while preserving answer correctness. In return, the best-performing miners earn almost the entire reward stream. By protocol design, the current leader (fastest server) receives 80% of the competition emission, and the runner-up gets 20%. This winner-take-most scheme means successful miners effectively earn the majority of the subnet’s block rewards so long as they stay unbeaten.

Validator Process
Validators are special nodes that orchestrate the competition. They continuously scan the Bittensor blockchain for new miner submissions (Docker image commits). When validators detect challengers, they automatically rent cloud GPU resources (e.g. an 8×H200 pod) via providers like Targon or Lium. Each validator then pulls the Docker images of all active submissions (including the current leader and runner-up) and evaluates them on the same hardware. The evaluation consists of two passes: measuring end-to-end response time (time-to-first-token plus throughput) against a pinned vLLM baseline, and then checking answer correctness (using log probabilities). Any submission failing correctness receives a zero score. To update stakes, the validator then writes the results to S3 and sets on-chain weights: if a challenger beats the old leader by more than ~1% (the built-in “moat” threshold), it becomes the new leader. Exact copies of the leader are disqualified to prevent plagiarism. In practice, this automated loop ensures the fastest valid server reigns, and its operator reaps the weighted reward.

End Product for Users
For end users and developers, Cacheon effectively produces a continuously-optimized inference service. The winning server from the competition can be used as a drop-in API: any client that supports the standard OpenAI chat completions endpoint can redirect inference to it. In theory, if Cacheon consistently finds faster implementations, AI developers will “route inference through” the subnet’s leader to get better performance. This directly translates to lower latency and per-token cost for applications (chatbots, agents, analytics, etc.). In that sense, Cacheon provides the community with the fastest open-source LLM serving systems. For investors and stakers, the benefit is in the emphasis on continual improvement: unlike a one-off sprint, Cacheon rewards are an ongoing emission stream tied to real performance gains, so the leading miner’s token accrual grows as long as it stays on top.

Comparison with Other Subnets
Cacheon differs from typical Bittensor subnets because it’s *not* focused on training new models or producing content. It is explicitly *not* a training subnet – the model weights are fixed at Qwen2.5-72B and cannot be updated through mining. It’s also not a general hosting platform: miners aren’t free to serve arbitrary models, only the specified model, and only speed (with correctness) matters. In essence, Cacheon is a Bittensor-native MLPerf: only the fastest correct inference server is paid. This means rewards concentrate on a single winner) rather than being distributed broadly. Other subnets (like content-generation or pretraining networks) often reward many participants; Cacheon’s unique niche is a continuous speed contest for LLM inference.

What exactly does it do?

PURPOSE

What exactly is the 'product/build'?

Deployment Status
After development and testnet evaluation, Cacheon launched its mainnet in May 2026. According to official announcements, the subnet (SN14) went live on May 19–20, 2026. That launch activated the first live competition: at least 13 inference servers were queued to compete on identical 8×H200 pods, racing for rewards up to $10,000/day. This means the core platform is currently live – miners can submit servers, and validators are benchmarking them against the baseline system. The first public test rounds took place just before launch (early May 2026) as planned. In summary, the V1 phase (single-model, 72B Qwen) is fully deployed on the network, while later features are in development.

Technical Architecture
Cacheon runs on the Bittensor framework (a Substrate chain). Its codebase is open under an MIT license. The architecture splits duties between a lightweight CPU orchestrator and externally provisioned GPU compute. Validators continuously pull on-chain commitments and rent an 8-GPU machine when needed. For evaluation, the system requires an 8×NVLink GPU setup (e.g. 8×H200 SXM) to run the large model with tensor parallelism. Miner servers are expected to expose an OpenAI-style `/v1/chat/completions` API, so validators can query them with test prompts. In operation, each evaluation round uploads results to an S3 storage layer, allowing the CPU gateway to fetch scoring data and set the appropriate on-chain weights. This S3 handoff ensures the GPUs never have wallet keys, improving security. In short, Cacheon combines Bittensor blockchain with on-demand GPU pods (via partners like Targon and Lium) and standard web interfaces to create a fair benchmarking pipeline.

What exactly is the 'product/build'?

WHO

Team Info

Core Contributors
Cacheon’s development team appears to come from the Bittensor developer community. Public records list the main contributors as **Xavier Lyu** (research lead), **Clément Blaise** (infrastructure), **Dera Okeke** (frontend), and **Cameron Fairchild** (Bittensor core). These individuals authored the Cacheon GitHub repo and documentation. Notably, they were involved with the prior TAOHash project: Subnet 14 was originally registered under the TAOHash name (a decentralized Bitcoin mining pool), and in 2026 the same owner and team pivoted it to the Cacheon inference project. No formal corporate backers or venture funding have been published; instead, Cacheon is largely community-driven via the latent-to/ Bittensor core organization. Communication is mainly through Bittensor Discord and social channels (e.g. the @cacheon_ai Twitter), and the team provides an email contact for inquiries.

History
Cacheon joined the Bittensor ecosystem in early 2026. The GitHub repo was created on April 13, 2026, and the first testnet evaluation rounds happened in May 2026. The project was publicly rebranded to “Cacheon” around that time, focusing its entire strategy on LLM inference. By mid-May 2026 the mainnet competition opened, attracting initial submissions. The team continues to develop the project post-launch, with active GitHub commits through May 2026. At this time, the core team remains stable, and there are no announcements of external partnerships or new members beyond what is documented.

Team Info

FUTURE

Roadmap

Roadmap Phases
Cacheon’s documentation outlines a multi-phase development plan. Phase V1 (currently live) is the single-model arena with the 72B Qwen-Instruct model. Phase V2 is described as “expanded optimization surface,” implying a broader set of performance parameters or hardware configurations to compete on. Phase V3 is dubbed the “production inference provider” phase, suggesting an eventual public API or service for inference. Phase V4 is “multi-model expansion,” meaning multiple language models would be supported. These phase labels come from the official docs, but no concrete dates have been given for V2–V4. To date, only Phase 1 is active: the subnetwork launched in May 2026 and has been focusing on that initial contest. Future rounds and features will likely roll out as the competition matures.

Future Vision
In the long run, the goal is for Cacheon to become both an optimization marketplace and an actual inference service. The subnet aims to continuously uncover faster inference engines, then expose them for real-world use. As the docs note, if Cacheon consistently produces the fastest and most cost-efficient way to serve a model, developers will naturally route their inference through it. In effect, Cacheon could evolve into a low-latency serving layer for AI applications, capturing value from every token served by the winning models. So far the announcements have centered on getting mainnet live and verifying the competition loop (May 2026). Beyond that, the team plans to follow the outlined phases above, with the fully-realized vision being a global inference optimization platform as described in their roadmap and docs.

Roadmap

Subnet 14

Cacheon

ABOUT

What exactly does it do?

PURPOSE

What exactly is the 'product/build'?

WHO

Team Info

FUTURE

Roadmap

View other subnets