With the number of new subnets being added, it can be hard to keep information current across all of them, so some details here may be slightly out of date from time to time.
Chutes (Subnet 64) is a decentralized serverless AI compute platform built on the Bittensor network. In essence, Chutes provides an open, on-demand AI inference service that lets developers deploy, run, and scale AI models in seconds without managing any infrastructure. It achieves this by combining a simple API (and web UI) for users with a decentralized backend of GPU “miners” that execute the AI models at scale. Chutes is designed as a Web3 alternative to services like OpenAI’s API, offering better accessibility, a greater diversity of models, and often superior performance. Developers can integrate Chutes into their applications via a standard API, or even access it through AI model aggregators (Chutes is a top provider on OpenRouter, alongside giants like Anthropic). The platform supports a wide range of model types – from large language models (LLMs) for text generation to image and audio models – reflecting its mission to be a one-stop, model-agnostic AI cloud.
Since its launch in late January 2025, Chutes’ adoption and scale have been explosive. It quickly grew from zero to processing tens of billions of tokens per day within the first few months. By May 28, 2025, Chutes was handling on the order of 100 billion tokens per day (roughly 3 trillion tokens per month), a 250× increase in usage since January. This volume is already about one-third of Google’s entire NLP processing throughput from a year prior – a staggering achievement for a decentralized network. At peak load, Chutes serves millions of AI inference requests daily. Its free-to-use period in early 2025 helped attract over 100,000 users to the API, and even after introducing paid tiers, many projects have stuck with Chutes due to its performance and cost advantages. In fact, Chutes’ latency and throughput have been publicly praised – for example, it was the first to offer cutting-edge models like DeepSeek V3, and partners noted its execution speed as best-in-class on OpenRouter.
A cornerstone benefit of Chutes is cost-efficiency. Because it taps a decentralized network of GPU providers with a crypto micropayment model, it can deliver AI compute far cheaper than traditional cloud services. Analyses indicate Chutes’ inference service operates at ~85% lower cost than AWS for comparable tasks. The platform was initially free to use (to bootstrap adoption), and as of April 2025 it introduced monetization for certain models – but it keeps prices well below competitors and offers some free model access to remain developer-friendly. Users pay per use (per query or per token) using the Bittensor TAO token (or fiat), rather than committing to hefty subscriptions. This pay-as-you-go design, enabled by on-chain micropayments, means you only pay for the exact compute you consume. All those token payments are cycled back into the network: Chutes automatically stakes the revenues to buy back its subnet token and reward miners/validators. This creates a virtuous economic loop aligning the interests of users (who get low-cost AI service), miners (who earn TAO for supplying GPU power), and token holders (as demand for the service drives token value).
Some of Chutes’ key capabilities and advantages include:
Decentralized GPU Compute: The subnet is powered by hundreds of high-end GPUs (e.g. NVIDIA H100/A6000 class cards) contributed by miners, collectively processing billions of tokens daily in a distributed fashion. This large, global GPU pool gives Chutes elasticity and resilience that rival or exceed centralized clouds.
Model Diversity & Rapid Innovation: Chutes supports text, image, and audio models, from popular open-source LLMs to custom ML models. Developers can choose pre-integrated models or deploy their own. The platform is quick to integrate the latest AI research – for example, it was first to host new models like DeepSeek v3, and regularly updates its catalog to give users cutting-edge options.
Serverless Ease of Use: As a serverless platform, Chutes abstracts away all infrastructure. Through a web interface or API, a user can spin up an AI model within seconds. Scaling is automatic – the network will allocate more GPUs as needed – and there’s built-in monitoring and optimization to ensure efficient usage. This makes advanced AI accessible even to small teams who don’t have cloud expertise.
Cost & Micropayments: Chutes offers significantly cheaper AI inference than traditional providers, leveraging crypto micropayments in TAO for a true pay-per-query model. This fine-grained billing (down to each token of output) eliminates waste and can save developers an estimated 40% or more on AI costs vs. subscription models. The micropayment system is seamlessly handled by the platform – users just get an API key and the usage is metered transparently in the background.
Alignment and Innovation: By operating on Bittensor, Chutes inherits a unique incentive structure: miners and validators compete to provide the best service, which continuously improves quality over time. The project’s commitment to features like Trusted Execution Environments (TEE) for privacy (in development) shows a focus on unlocking enterprise and sensitive use-cases soon. Chutes is not just a static service – it’s evolving rapidly, with new capabilities (agents, fine-tuning, etc.) rolling out to stay at the forefront of AI and Web3 convergence.
At its core, Chutes is an open-source AI deployment platform – essentially a decentralized PaaS (Platform-as-a-Service) for machine learning models. The product consists of two primary parts: (1) a developer-facing API & web interface to register, configure, and invoke models, and (2) a behind-the-scenes network of GPU miners that actually run the model computations. When a user deploys a model on Chutes, the platform handles packaging the code, scheduling it on available GPUs, scaling it up or down, and routing incoming inference requests to it. All of this happens “serverlessly” – the user doesn’t worry about which machine runs the model or how many instances are running, as the Chutes infrastructure orchestrates everything across the decentralized network.
Technical architecture: The Chutes backend uses containerization and orchestration to run arbitrary AI workloads across many nodes. Developers typically use the Chutes SDK/CLI to prepare their model as a Docker container image with all necessary dependencies. Chutes provides a streamlined process for this – for example, a base image (parachutes/base-python with CUDA and common libraries) can be extended with your model code and libraries. With a single CLI command, that image is built and uploaded to the platform’s registry (the build actually occurs on Chutes’ infrastructure). Once the image is ready, the developer can deploy a “chute” (model instance) via API or CLI, specifying requirements such as GPU count and memory. The Chutes scheduler will then assign the model to suitable miner nodes in Subnet 64 that meet those specs (for example, a model needing one A100 GPU will be placed on a miner with available A100 capacity). The platform uses technologies like Docker and Kubernetes under the hood to manage these deployments, ensuring reliability and auto-restarting or relocating instances if a miner goes offline. Developers can monitor their deployments in real time – Chutes offers a web dashboard with performance metrics, logs, and cost statistics for each running model.
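To make that workflow concrete, here is a minimal sketch of what a chute definition might look like in the Python SDK. The base image and the “cord” concept come from the description above, but the import paths, class names, and selector fields are assumptions for illustration, not confirmed Chutes API; consult the official SDK documentation for the real interface.

```python
# Illustrative chute definition -- a minimal sketch assuming a Python SDK
# with Chute/Image/NodeSelector classes. Import paths, field names, and the
# decorator signature are assumptions, not confirmed Chutes API.
from chutes.chute import Chute, NodeSelector  # assumed import path
from chutes.image import Image                # assumed import path

# Extend the CUDA-enabled base image mentioned above with model dependencies.
image = (
    Image(username="alice", name="my-llm", tag="0.1")
    .from_base("parachutes/base-python")
    .run_command("pip install torch transformers")
)

# Declare the deployment and its hardware requirements; the scheduler
# matches these against miners on Subnet 64 with matching free capacity.
chute = Chute(
    username="alice",
    name="my-llm",
    image=image,
    node_selector=NodeSelector(gpu_count=1, min_vram_gb_per_gpu=40),
)

# A "cord" function exposes an invocable endpoint on the deployed chute.
@chute.cord(public_api_path="/generate", method="POST")
async def generate(self, prompt: str) -> str:
    # Placeholder inference call; a real chute would load and run the model.
    return await self.model.generate(prompt)
```

From there, per the workflow above, a single CLI build command pushes the image to the registry, and a deploy command publishes the chute to the network.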
Decentralization and consensus: Because Chutes is built on Bittensor, it leverages the blockchain’s consensus and incentive mechanisms for coordination. Miners on Subnet 64 register on-chain and dedicate GPU resources to serve tasks, while validators monitor performance and help rank the miners. Good performance is rewarded with more task assignments and TAO token emissions. Chutes adds its own layer of task scheduling and result validation on top: for each inference request, one or more miners are selected to run the model, and results can be checked by others. In fact, Chutes has implemented adversarial validation whereby multiple miners may run the same query and their outputs are cross-verified using custom “cord” functions (a Bittensor primitive for validation). This helps ensure accuracy and trust in a trustless setting – if a miner tries to cheat or returns a bad result, validators or peer miners can flag it. The platform is also rolling out “auditing” features (as noted in their roadmap) to further guarantee that models behave correctly and safely.
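As a simplified illustration of that redundancy scheme (this is not Chutes’ actual validator code; the majority-vote rule and function names are assumptions), cross-verification might look like:

```python
import asyncio
from collections import Counter

async def validated_inference(query: str, miners: list, redundancy: int = 3):
    """Run one query on several miners and cross-check the results.

    A toy sketch of adversarial validation: the majority output is accepted
    and dissenting miners are flagged for the validators' scoring. A real
    system would compare outputs with model-appropriate tolerance rather
    than strict equality, and weight miner selection by past performance.
    """
    selected = miners[:redundancy]
    outputs = await asyncio.gather(*(m.run(query) for m in selected))

    # Majority vote over the returned outputs.
    consensus, _ = Counter(outputs).most_common(1)[0]

    # Miners that disagree with the consensus get flagged.
    flagged = [m for m, out in zip(selected, outputs) if out != consensus]
    return consensus, flagged
```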
AI functionality: Chutes is built to be flexible and model-agnostic. It already supports a broad array of AI models out-of-the-box, and importantly, it allows user-defined models and code to run as well. This means you’re not limited to a preset catalog – you can bring your own model (say a custom PyTorch model or a HuggingFace model) and deploy it. The platform’s support for “arbitrary code” execution means even complex pipelines or unique model architectures can be hosted. This opens the door to some novel possibilities. For example, Chutes can host not only inference endpoints but potentially fine-tuning jobs or other AI services. The team has demonstrated this by building Squad (AI Agents platform) on top of Chutes – Squad lets users create autonomous AI agents that chain model calls and even use tools like web search, with all the heavy lifting executed on Chutes’ infrastructure. Such agents could even be designed to spin up new model instances or allocate resources on the fly, since Chutes allows programmatic deployment and has built-in crypto payments to pay for those resources. In other words, the combination of a permissionless compute API and on-demand payments in code could enable fully autonomous services (imagine an AI that improves itself by deploying new subtasks on Chutes and paying for them with TAO – all without human intervention). While these are emerging use-cases, they highlight how Chutes’ architecture enables a new paradigm of self-service AI in a decentralized environment.
From a user perspective, the Chutes product can be accessed in two ways: through a web platform interface (for example, a dashboard where you can click to deploy popular models or manage your deployments), or through the direct API/CLI for automation and integration into applications. The web UI includes features like a Playground to test models, a Discover section to browse community-contributed models, and a Create section to set up a new deployment with custom settings. For developers, the REST API (and Python SDK) provides endpoints to list models, create deployments, send inference requests, and manage keys and billing. Security and privacy are also being taken seriously in the build: the team is actively working on Trusted Execution Environments (TEE) integration so that in the near future, models can run inside secure enclaves on miner hardware – meaning even the GPU operators cannot see the raw input data or model parameters. This feature will be crucial for enterprise users who need to handle sensitive data on a public network.
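For a sense of what API integration looks like, here is a minimal request sketch. The endpoint URL, model name, and OpenAI-style payload shape are assumptions for illustration; check the Chutes documentation for the actual routes and schemas.

```python
import requests

API_KEY = "cpk_..."  # issued from the Chutes dashboard (placeholder value)

# Assumed OpenAI-compatible chat-completions route; the URL and payload
# shape are illustrative, not confirmed.
resp = requests.post(
    "https://llm.chutes.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "deepseek-ai/DeepSeek-V3",
        "messages": [
            {"role": "user", "content": "Summarize Subnet 64 in one sentence."}
        ],
        "max_tokens": 128,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```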
Finally, Chutes’ payment and billing system is tightly integrated into the product. Each user has an account (linked to an API key) that can be topped up with credits (TAO tokens or fiat payments). Every time an inference is run, the system calculates the cost (for example, based on GPU time or tokens processed) and deducts the micro-fee from the user’s balance, distributing it to the miners. The auto-staking mechanism then uses those fees to reward the subnet’s contributors and maintain the token economy. As of Q2 2025, this mechanism is fully live – Chutes has “fully integrated TAO payment” across the platform, and demonstrated real product-market fit for decentralized AI-as-a-service by successfully converting usage into revenue for miners. Notably, the team also made the platform developer-friendly for Bittensor insiders: if you are a Bittensor validator or a subnet owner, you can link your on-chain identity to Chutes and get free access and a developer role (bypassing the normal deposit requirement) – a nice touch that encourages the Bittensor community to build on Chutes.
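The per-request metering described here reduces to simple arithmetic. The sketch below uses invented placeholder rates (not Chutes’ actual prices) to show how token-level billing deducts a micro-fee from a prepaid balance:

```python
# Toy metering sketch with invented placeholder rates -- not real Chutes
# pricing. Shows how token-level billing deducts micro-fees from credits.
PRICE_PER_MTOK_IN = 0.20   # hypothetical USD per million input tokens
PRICE_PER_MTOK_OUT = 0.80  # hypothetical USD per million output tokens

def meter_request(balance_usd: float, tokens_in: int, tokens_out: int):
    """Charge one request by tokens processed; return (new_balance, fee)."""
    fee = (tokens_in * PRICE_PER_MTOK_IN
           + tokens_out * PRICE_PER_MTOK_OUT) / 1_000_000
    if fee > balance_usd:
        raise RuntimeError("insufficient credits")
    return balance_usd - fee, fee

# Example: a 1,500-token prompt with a 400-token completion costs well
# under a cent, and only that exact amount leaves the balance.
balance, fee = meter_request(10.00, 1_500, 400)
print(f"fee=${fee:.6f}, remaining=${balance:.6f}")  # fee=$0.000620
```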
Team
JonDurbin – Dev
PaperMoney – Dev
Sirouk – Business Development
Veight – Business Development
Fezicles – Community Manager
Achievements
Q1 2025 – Core Platform Evolution
Q2 2025 – Pretraining as a Service
H2 2025 – Long Jobs
A special thanks to Mark Jeffrey for his amazing Hash Rate series! In this series, he provides valuable insights into Bittensor Subnets and the world of decentralized AI. Be sure to check out the full series on his YouTube channel for more expert analysis and deep dives.
Recorded in June 2025, this episode of Hash Rate features an in-depth conversation between host Mark Jeffrey and Jon Durbin of Chutes, one of the leading subnets in the Bittensor ecosystem. They explore Chutes’ rapid rise to prominence as a decentralized AI compute platform, capable of handling over 100 billion tokens daily – about one-third of Google’s AI load from a year prior. Jon explains how Chutes simplifies AI deployment by abstracting away infrastructure challenges, discusses its revenue model powered by Bittensor emissions, and outlines the platform’s long-term vision of delivering uncensored, privacy-protecting AI services globally. The episode also dives into broader Bittensor dynamics, including root emissions, subnet profitability, and the evolving interplay between crypto and AI innovation.
Novelty Search is great, but for most investors trying to understand Bittensor, the technical depth is a wall, not a bridge. If we’re going to attract investment into this ecosystem then we need more people to understand it! That’s why Siam Kidd and Mark Creaser from DSV Fund have launched Revenue Search, where they ask the simple questions that investors want to know the answers to.
Recorded in June 2025, this episode of Revenue Search dives into the business side of Chutes, Bittensor’s leading AI subnet. Jon Durbin, the founder of Chutes, explains how the platform enables developers to deploy AI models without the heavy lifting of traditional infrastructure. With a focus on serverless, scalable GPU-based compute, Chutes simplifies AI deployment while offering pricing up to 20 times cheaper than traditional cloud providers. The episode covers Chutes’ shift toward revenue generation – including new fiat payments, early monetization metrics, and plans to introduce privacy-guaranteeing Trusted Execution Environments (TEEs). Durbin also discusses future growth through enterprise adoption, agent platforms like Squad, and second-order AI apps – all aiming to make Chutes the “Linux of AI.”
FYI, our SGLang fork is public here: (branch chutes) to boost your SGLang perf from 73% to 97% 👀
When I tested a few manually there were a few discrepancies where instead of generating a string it generated an array of strings, etc.; curious if switching…
(Linked repository: chutesai/sglang on GitHub – SGLang is a fast serving framework for large language models and vision language models.)
Great to see the improvement of @chutes_ai on this! It’s now up among the best performers!
The new #1 Open Source model according to Artificial Analysis, now available for free on @chutes_ai.
Chutes infrastructure update 🚀
Currently running:
• Thousands of H200 & A6000 GPUs
• Processing billions of tokens daily
• Leading open source inference provider on OpenRouter
• 100% decentralized on Bittensor Subnet 64
All with ~85% cost savings vs AWS.
Open source AI is…
Thanks to all the users who tried Ling-1T 🚀 on @OpenRouterAI! We just hit the #2 trending model spot! A huge thank you to the incredible teams at @SiliconFlowAI and @chutes_ai for powering this run. 🙏✨ Also worth noting that Ring-1T (the reasoning model) is now available too! 🎓
so... we just found out something absolutely insane 🧵
we weren't even looking for this data.
one of our engineers was going through OpenRouter’s publicly available charts at 3am (as you do) and discovered something that made us all stop and stare at our screens.
This is one of those moments where you realize you've been building something bigger than you thought.
We're not just "a decentralized AI platform."
We might be building the Linux of AI inference.
And we're just getting started.
If you've been sleeping on open source AI infrastructure, maybe it's time to wake up.
Turns out the future of AI might not be owned by 3 companies.
It might be decentralized.
And it might already be happening.