With the number of new subnets being added, it can be hard to keep information current across all of them, so some data may be slightly out of date from time to time.

Subnet 09

IOTA


ABOUT

What exactly does it do?

IOTA (previously known as Pre-Training) is a specialized subnet of the Bittensor network designed to incentivize the open training of large language models (“foundation models”) on a massive web dataset. In August 2024, Bittensor’s Subnet 9 (SN9) demonstrated that a decentralized network of incentivized, permissionless actors could pretrain large language models (LLMs) ranging from 700 million to 14 billion parameters, surpassing established baselines. This work proved that blockchain-based decentralized pretraining was a viable approach, but it also revealed key issues: each miner had to fit an entire model locally, and the “winner-takes-all” reward structure led to model hoarding.

Now, they introduce IOTA (Incentivised Orchestrated Training Architecture), an architecture that solves these issues by transforming SN9’s previously isolated competitors into a single, cooperating unit. This approach allows them to scale arbitrarily while ensuring fair rewards for each contributor. IOTA is a data- and pipeline-parallel training algorithm designed to work across a network of heterogeneous, unreliable devices in adversarial and trustless environments. The outcome is a permissionless system capable of pretraining frontier-scale models without overloading individual devices with GPU demands. It also accommodates unreliable devices and aligns participants through transparent token economics.

Various solutions have tried to address the key technical challenges of distributed training, but most either lack a proper incentive model or fail to match the performance of a coordinated cluster. IOTA bridges this gap by combining novel techniques that tackle these limitations together.

 

 


PURPOSE

What exactly is the 'product/build'?

IOTA is structured around three core roles: the Orchestrator, Miners, and Validators. Rather than using a fully peer-to-peer topology, IOTA adopts a hub-and-spoke architecture centered around the Orchestrator. This design choice ensures global visibility and facilitates comprehensive monitoring of all participant interactions, which is essential for enforcing incentives, auditing behavior, and maintaining system integrity.

The Orchestrator’s primary responsibility is to monitor the training progress of each miner across all discrete layers and initiate weight-merging events accordingly. Given the heterogeneous nature of miner hardware and their unreliability, it is impractical to wait for all miners to complete an equal number of batches. Instead, a minimum batch threshold is defined for each miner, and once at least a specified fraction of miners complete the required number of batches, the Orchestrator prompts all qualifying miners to upload their weights. This mechanism is inspired by centralized training practices, where global batch sizes are used in typical large language model training. In the decentralized setting, it is coupled with DiLoCo, which allows miners to perform local optimization steps before synchronization. DiLoCo is especially suited for this paradigm because it:

  • Embraces partial participation from miners
  • Supports asynchronous and layer-wise updates
  • Reduces communication overhead by focusing on the most informative coordinate updates locally
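To make the merge trigger concrete, below is a minimal sketch of how such a threshold-based decision could be expressed. The threshold, quorum fraction, and names are illustrative assumptions, not the subnet's actual parameters.

MIN_BATCHES = 32          # assumed minimum batch threshold per miner
QUORUM_FRACTION = 0.66    # assumed fraction of miners required

def should_trigger_merge(batches_done: dict[str, int]) -> bool:
    """Return True once enough miners have cleared the batch threshold."""
    if not batches_done:
        return False
    ready = sum(1 for n in batches_done.values() if n >= MIN_BATCHES)
    return ready / len(batches_done) >= QUORUM_FRACTION

# Example: 3 of 4 miners are past the threshold, so a merge is triggered.
progress = {"miner-a": 40, "miner-b": 35, "miner-c": 33, "miner-d": 10}
assert should_trigger_merge(progress)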

 

Miners can register for the subnetwork at any time. Upon registration, the Orchestrator assigns each miner a model layer to train. Miners will wait until the next full synchronization period to begin active participation. During full synchronization, miners update their weights and optimizer states to align with the rest of the network, allowing them to proceed with the forward and backward activations in the training stage.

Validators play a key role in determining whether the work completed by a miner is valid. Validators rely on computational reproducibility to validate the work. The validator tracks a specific miner and reruns a portion of the miner’s training to verify its accuracy. Forward and backward passes are compared to the miner’s submitted activations using cosine similarity. While there are challenges related to reliable validation, these are addressed in the system’s incentive structure and in later sections of the documentation, which explore anomaly detection and adversarial robustness using Shapley values.
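As a rough illustration of this reproducibility check (assuming PyTorch tensors; the similarity threshold below is a placeholder, not the subnet's published tolerance):

import torch
import torch.nn.functional as F

SIMILARITY_THRESHOLD = 0.99  # assumed tolerance for numerical drift

def activations_match(reported: torch.Tensor, recomputed: torch.Tensor) -> bool:
    """Compare flattened activation tensors via cosine similarity."""
    sim = F.cosine_similarity(reported.flatten(), recomputed.flatten(), dim=0)
    return sim.item() >= SIMILARITY_THRESHOLD

# A recomputed pass should match the reported one up to small numeric noise.
reported = torch.randn(4, 128)
assert activations_match(reported, reported + 1e-6 * torch.randn_like(reported))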

The incentive structure is designed with the trade-offs between optimization and reproducibility in mind. Since validation depends on the validator’s ability to reproduce specific sections of the training process, miners are not granted the ability to innovate algorithmically. Validators monitor randomly assigned miners during full synchronization stages to ensure comprehensive oversight. The monitoring period is kept as short as possible to maximize the number of miners overseen by each validator. Importantly, miners are not aware of when they are being monitored, preventing them from selectively behaving correctly during observed intervals. After each validation stage, mining rewards are calculated based on the number of backward passes successfully processed by each miner.

The system incorporates a temporal decay mechanism governed by a hyperparameter, which determines how long a miner’s score is valid. After a fixed period, the score drops to zero. This ensures that miners are only rewarded for their active participation during validation periods, discouraging gaming strategies or manipulation of throughput during non-validation stages. The simple linear reward structure ensures that miners receive fixed compensation for each processed activation, and the recomputation requirement during validation stages provides additional security against exploitation.
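A minimal sketch of such a decay, modeled here as a hard validity window; SCORE_TTL stands in for the hyperparameter and its value is purely illustrative:

SCORE_TTL = 100  # assumed validity window, in orchestrator steps

def effective_score(raw_score: float, scored_at: int, now: int) -> float:
    """Return the reward-bearing score, zeroed once the window expires."""
    return raw_score if (now - scored_at) <= SCORE_TTL else 0.0

assert effective_score(12.5, scored_at=0, now=50) == 12.5   # still valid
assert effective_score(12.5, scored_at=0, now=150) == 0.0   # decayed to zero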

 

IOTA Mining Setup

IOTA (Incentivized Orchestrated Training Architecture) is a data- and pipeline-parallel training algorithm designed to operate across a network of heterogeneous, unreliable devices in adversarial and trustless environments.

Miners’ Purpose in IOTA

In the decentralized LLM-training network, miners are responsible for providing GPU compute, memory, and bandwidth to collaboratively train models. IOTA employs data- and pipeline-parallelism, which means miners are responsible for training sections of the model instead of the entire model. This approach reduces hardware requirements for participation. Miners download their assigned section, process forward and backward passes of activations, and periodically synchronize their weights with peers via a merging process. Distributing workloads across many independent miners allows the network to achieve massive parallelism, fault tolerance, and censorship resistance while eliminating single-point infrastructure costs.
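As an illustration of this split, a miner's pipeline stage might look like the following PyTorch sketch; the layer sizes, names, and communication details are hypothetical:

import torch
import torch.nn as nn

class MinerStage:
    """Holds only one contiguous slice of the model, not the whole network."""

    def __init__(self, hidden: int = 512, layers: int = 4) -> None:
        self.block = nn.Sequential(
            *[nn.Sequential(nn.Linear(hidden, hidden), nn.GELU()) for _ in range(layers)]
        )

    def forward(self, activation: torch.Tensor) -> torch.Tensor:
        """Process an incoming activation and pass the result downstream."""
        activation.requires_grad_(True)
        self._input = activation
        self._output = self.block(activation)
        return self._output.detach()

    def backward(self, grad_from_downstream: torch.Tensor) -> torch.Tensor:
        """Backprop through this stage only; return the gradient upstream."""
        self._output.backward(grad_from_downstream)
        return self._input.grad

stage = MinerStage()
out = stage.forward(torch.randn(2, 512))       # forward activation
grad_up = stage.backward(torch.ones_like(out))  # backward activation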

The IOTA incentive mechanism continuously scores miners based on throughput and the quality of their work throughout the training and merging processes. Miners are rewarded with subnet 9 alpha tokens according to the quality of their contributions.

Joining the Network

Miners can join the network by registering with the orchestrator through the API client; the orchestrator then assigns each miner to a training layer. Up to 50 miners can operate per layer. Once registered, miners download the current global weights for their assigned layer and begin processing activations.

Activations

There are two types of activations: forward and backward.

  • Forward activations propagate samples through the model to calculate losses.
  • Backward activations propagate samples in the opposite direction to produce gradients for training the layer’s weights.

 

Backward activations are given precedence because they provide the learning signal. If a miner fails to process an activation assigned to it, a penalty is applied. The design operates like an assembly line, with activations passed between adjacent stages of the pipeline, and the routing of samples through the pipeline is randomized.
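A minimal sketch of what backward-first scheduling at a single stage could look like; the queue structure and names are illustrative rather than the actual implementation:

import heapq

BACKWARD, FORWARD = 0, 1  # lower value = higher priority

class ActivationQueue:
    def __init__(self) -> None:
        self._heap: list[tuple[int, int, object]] = []
        self._seq = 0  # tie-breaker preserving arrival order

    def put(self, direction: int, activation: object) -> None:
        heapq.heappush(self._heap, (direction, self._seq, activation))
        self._seq += 1

    def get(self):
        _, _, activation = heapq.heappop(self._heap)
        return activation

q = ActivationQueue()
q.put(FORWARD, "sample-1")
q.put(BACKWARD, "grad-7")
assert q.get() == "grad-7"  # backward work is drained first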

Activation processing occurs across all layers simultaneously, but miners process samples asynchronously. Miners are incentivized to process as many activations as possible in each epoch, with their scores based on throughput.

Merging

Once enough samples have been processed in the network, the orchestrator signals the transition from training mode to merging mode. In this phase, miners engage in a multi-stage process based on a modified version of the Butterfly All-Reduce technique.

  • Miners upload their local weights and optimizer states to cloud storage.
  • They are then assigned random weight partitions, with multiple miners assigned to the same partitions for redundancy and improved fault tolerance.
  • Miners download their partitions, perform a local merge (currently using the element-wise geometric mean; see the sketch after this list), and upload the merged partitions.
  • Finally, miners download the complete set of merged weights and optimizer states.
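To illustrate the local merge step above, here is a minimal sketch of an element-wise geometric mean across redundant copies of a weight partition; the sign handling is a simplifying assumption rather than the subnet's exact rule:

import torch

def geometric_mean_merge(partitions: list[torch.Tensor]) -> torch.Tensor:
    """Element-wise geometric mean of redundant weight-partition copies."""
    stacked = torch.stack(partitions)                        # [k, ...]
    magnitude = stacked.abs().clamp_min(1e-12).log().mean(0).exp()
    sign = torch.sign(stacked.sum(0))                        # assumed sign rule
    return sign * magnitude

parts = [torch.randn(8, 8) for _ in range(3)]  # 3 redundant copies
merged = geometric_mean_merge(parts)
assert merged.shape == (8, 8)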

 

This process is designed to tolerate miner failures, so merging can continue even if some miners fail during this stage. Merging is the slowest part of the process, so the training stage runs for longer to amortize this delay by effectively training on larger batch sizes. After merging is complete, the orchestrator returns the system to training mode, and the cycle continues between training and merging modes.

Prerequisites

To begin setting up the miner on IOTA, the following are required:

  • Bittensor wallet – Setup instructions are available in the documentation.
  • Training infrastructure – Miners must operate on GPUs with at least 80 GB of VRAM (A100-class or higher). Hardware with less memory will process updates more slowly and may earn significantly lower rewards.
  • HuggingFace Access token – Basic access to pull the model from HuggingFace. There is no need to modify permissions.

 

 

Validating

Within the system, validators have a vital role in ensuring the integrity of the work completed by miners. They primarily rely on computational reproducibility to verify the miner’s work. To do so, the validator tracks a specific miner and reruns a portion of the miner’s training. This includes checking both forward and backward passes against the miner’s submitted activations using cosine similarity. However, reliable validation presents several challenges, which are further explored in the following sections.

Joining the Network

Validators join the network by registering with the orchestrator through their API client.

Shadowing Miners

Validators periodically reproduce an epoch’s worth of work from a randomly selected miner. This process can take up to an hour per miner in the current version, and efforts will be made to reduce this time post-launch. There is a tradeoff between effective batch size and network stability: longer training stages give validators more work to reproduce when scoring a miner. The system currently relies on basic but robust reproducibility checks, though a future integration of CLASP will enable an auditor-like mechanism to estimate miner contributions. Once a validator confirms that all activations were processed correctly by the miner, it performs a local all-reduce to compare its own local weights and optimizer state with those uploaded by the miner. If the miner passes the check, it earns a score based on the total number of activations processed during that period.

Setting Weights

Validators communicate through the orchestrator, submitting the scores they assign to miners after performing spot checks. These scores are pooled together to calculate a consensus score for each miner. Periodically, validators request miner scores from the orchestrator, normalize the data, and set weights. This process ensures agreement among validators and helps maintain a high level of trust within the validation system.
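A minimal sketch of that normalization step, with illustrative names (the real pooling and weight-setting logic lives in the subnet code and on-chain calls):

def normalize_scores(scores: dict[str, float]) -> dict[str, float]:
    """Scale pooled miner scores so the resulting weights sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        return {uid: 0.0 for uid in scores}
    return {uid: s / total for uid, s in scores.items()}

pooled = {"miner-a": 120.0, "miner-b": 60.0, "miner-c": 20.0}
weights = normalize_scores(pooled)
assert abs(sum(weights.values()) - 1.0) < 1e-9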

 

 


WHO

Team Info

Will Squires – CEO and Co-Founder

Will has dedicated his career to navigating complexity, spanning from designing and constructing significant infrastructure to spearheading the establishment of an AI accelerator. With a background in engineering, he made notable contributions to transport projects such as Crossrail and HS2. Will’s expertise led to an invitation to serve on the Mayor of London’s infrastructure advisory panel and to lecture at UCL’s Centre for Advanced Spatial Analysis (CASA). He was appointed by AtkinsRéalis to develop an AI accelerator, which expanded to encompass over 60 staff members globally. At XYZ Reality, a company specializing in augmented reality headsets, Will played a pivotal role in product and software development, focusing on holographic technology. Since 2023, Will has provided advisory services for the Opentensor Foundation, contributing to the launch of Revolution.

Steffen Cruz – CTO and Co-Founder

Steffen earned his PhD in subatomic physics from the University of British Columbia, Canada, focusing on developing software to enhance the detection of extremely rare events (10^-7). His groundbreaking research contributed to the identification of novel exotic states of nuclear matter and has been published in prestigious scientific journals. As the founding engineer of SolidState AI, he pioneered innovative techniques for physics-informed machine learning (PIML). Steffen was subsequently appointed as the Chief Technology Officer of the Opentensor Foundation, where he played a pivotal role as a core developer of Subnet 1, the foundation’s flagship subnet. In this capacity, he enhanced the adoption and accessibility of Bittensor by authoring technical documentation, tutorials, and collaborating on the development of the subnet template.

Michael Bunting – CFO

Before joining Macrocosmos, Mike spent 12 years in investment banking, where he guided clients through major strategic and financial transitions across more than £1 billion in international M&A and capital raising deals. Most recently serving as a Director at Piper Sandler, he brings deep experience in advising high-growth startups on strategy, business planning, funding pathways, and corporate governance. Mike has also worked closely with multinational corporations and prominent financial investors throughout his career.

Elena Nesterova – Head of Delivery

Volodymyr Truba – Senior Machine Learning Engineer

Alma Schalèn – Head of Product Design

Felix Quinque – Machine Learning Lead

Dmytro Bobrenko – Machine Learning/AI Lead

Alan Aboudib – Machine Learning Lead

Alex Williams – People & Talent Manager

Chris Zacharia – Communications Lead

Brian McCrindle – Senior Machine Learning Engineer

Lawrence Hunt – Frontend Engineer

Nicholas Miller – Senior Software Engineer

Kalei Brady – Data Scientist

Szymon Fonau – Machine Learning Engineer

Monika Stankiewicz – Executive Assistant

Amy Chai – Junior Machine Learning Engineer

Giannis Evagorou – Senior Software Engineer

Richard Wardle – Junior Software Engineer

Kai Morris – Content & Community specialist

Lewis Sword – Junior Software Engineer


FUTURE

Roadmap

IOTA is still in active development and, at the time of writing, an updated roadmap has yet to be released. Below is the previous Subnet 9 roadmap, which may still overlap with current plans.

  • Scaling Model Competitions: The subnet started with a single model size (~700M parameters) but has moved to multiple concurrent competitions at different model scales. As of August 2024, Subnet 9 supported parallel training contests at roughly 700M, 3B, and 7B parameters, with a 14B competition introduced by the end of August 2024. This allows more miners to participate (each can focus on a scale that matches their compute resources) and produces a range of pretrained models. Going forward, the team plans to introduce even larger model categories as hardware and collaboration allow. They are also exploring multi-modal and “omni” models, e.g. extending pretraining beyond text to other data modalities, as a future direction once the text-only LLM competitions are stable.
  • Incentive Mechanism Refinements: The Subnet 9 designers continuously tweak the reward algorithm (incentive scheme) to ensure it maximizes useful work. One focus is the epsilon advantage given to top models (a sketch of this mechanic follows the list below). In August 2024 they launched an experimental parallel competition called “7B★” (7B star), identical to the regular 7B competition except for a much smaller ε (only a 0.1% advantage). By comparing outcomes between the standard 7B (with ε = 3% initially) and 7B★ (ε = 0.1%), they aimed to empirically find the epsilon value that balances giving leaders a boost against keeping the playing field fair. Additionally, the team is working on a “dynamic epsilon” scheme in which the epsilon value decays over time within a competition, to avoid the stagnation or permanent lead that a fixed advantage could create. This dynamic epsilon was slated for release by the end of August 2024. Beyond epsilon, they are considering relaxing certain constraints, for example allowing a broader range of model architectures or tokenizers as the competition matures, to foster more innovation once they are confident it won’t be gamed unfairly. All these changes are first tested as live “open experiments” with the community of miners to gather data on their effects.
  • Improved Evaluation & Benchmarks: Currently, the primary metric for ranking models is the language modeling loss on the Falcon Web dataset (i.e. how well a model predicts text). While this directly measures pretraining quality, the team recognizes it is one-dimensional. A future step is to incorporate a suite of diverse benchmark tests, for example coding challenges, math word problems, and other NLP tasks, to evaluate models more holistically. This would mirror how centralized model evaluations are done (with benchmarks like MMLU, HumanEval, etc.). They are also developing synthetic benchmarking datasets specifically for the subnet, which can serve as standardized test sets to objectively measure improvements across iterations. The goal is that Subnet 9 will not only produce models that do well on the training data, but models that can be rigorously assessed on a variety of tasks, guiding miners to train models that are robust and generally useful. These enhanced evaluations build on work in other Bittensor subnets (for example, a code-focused subnet or a math-problem subnet could provide test data for the pretraining models).
  • Productization and Use-Case Integration: The broader vision is to make Subnet 9 a foundation for other AI services. The roadmap envisions that startups or researchers could effectively “outsource” their expensive pretraining runs to Subnet 9, tapping into the decentralized network’s combined compute and expertise. In practice, this means developing APIs or pipelines where a user can specify a model/dataset and have the subnet train it (or use an existing pretrained model from the subnet) for further fine-tuning. Already, the pretraining subnet provides base models for Bittensor’s fine-tuning subnet (another subnet focused on fine-tuning models for specific tasks). Over time, the team plans to make these base models easily accessible so that new subnets or external projects can bootstrap with a Subnet 9 model instead of starting from scratch. This “directability” could evolve into a marketplace where specific pretraining jobs are directed to the network. Ultimately, by delivering high-quality open models, Subnet 9 aims to attract partnerships with organizations that might otherwise rely on closed AI models, demonstrating the real-world economic value of decentralized pretraining.
  • Fully Decentralized Collaborative Training: While Subnet 9’s current format is competitive (each miner trains their own model in parallel), the team is researching ways to enable collaborative training in which many miners collectively train one model (or a shared set of models) in a distributed fashion. This is a harder problem (it involves coordinating gradient updates, parameter averaging, etc., securely on untrusted nodes), but it could unlock the ability to train much larger models than a single miner’s hardware could handle. Experiments are “already underway at Macrocosmos” on a prototype where the subnet functions more like a distributed SGD (stochastic gradient descent) system than a leaderboard. The vision is to “evolve the pretraining subnet towards a decentralized training model where miners are collaborating on model development, rather than each developing their own model”. This might involve techniques from federated learning or swarm learning, adapted to the blockchain context. If achieved, it would mean the network could tackle training tasks beyond the capability of any single participant, truly pooling compute and expertise. The roadmap does not give a set date for this, but emphasizes it as a long-term goal once the competitive framework is thoroughly proven.
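As referenced in the second bullet, here is an illustrative sketch of the epsilon advantage and a dynamic decay from 3% toward 0.1%; the actual decay schedule and values were not published:

EPS_START, EPS_FLOOR = 0.03, 0.001  # e.g. 3% decaying toward 0.1%
DECAY_PER_STEP = 0.999              # assumed fixed-rate decay

def epsilon_at(step: int) -> float:
    """Advantage margin at a given step within a competition."""
    return max(EPS_FLOOR, EPS_START * DECAY_PER_STEP ** step)

def challenger_wins(champion_loss: float, challenger_loss: float, step: int) -> bool:
    """The challenger must beat the champion by the current epsilon margin."""
    return challenger_loss < champion_loss * (1.0 - epsilon_at(step))

assert not challenger_wins(2.000, 1.995, step=0)  # inside the 3% margin
assert challenger_wins(2.000, 1.900, step=0)      # clear improvement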

 

 


MEDIA

Huge thanks to Keith Singery (aka Bittensor Guru) for all of his fantastic work in the Bittensor community. Make sure to check out his other video/audio interviews by clicking HERE.

Steffen Cruz, previously the CTO of the Opentensor Foundation, has joined forces with his longtime friend Will Squires to establish Macrocosmos. Leading subnets 1, 9, 13, 25 and 37, this team is actively shaping the future of Bittensor and stands as one of the most influential entities within the ecosystem.

In this second video, they spend much of the episode covering Subnet 13’s rebranding to “Gravity” and the team’s prediction of a Trump victory, along with how Macrocosmos has managed to build a team of PhDs and machine learning professionals around Bittensor.

A big thank you to Tao Stats for producing these insightful videos in the Novelty Search series. We appreciate the opportunity to dive deep into the groundbreaking work being done by Subnets within Bittensor! Check out some of their other videos HERE.

In this session, the team from Macrocosmos discuss various exciting updates and innovations surrounding their subnets. They delve into the launch and functionalities of new subnets, including the Data Universe subnet for web-scale data scraping and the Apex subnet for advanced AI models. The team highlights the integration of decentralized, real-time data scraping using custom crawlers, which surpass traditional methods in efficiency and scalability. They also explore how the Bittensor community can drive decentralized AI research, leveraging massive data sets and advanced AI training. Further, the discussion touches on the evolution of scientific research within Bittensor, with the introduction of the Mainframe subnet for protein docking and drug discovery. The session concludes with an emphasis on Macrocosmos’ commitment to pushing the boundaries of decentralized AI and computational science, and the potential collaborations that can be achieved in the growing ecosystem.

This Novelty Search session, recorded in late 2024, focuses on the ongoing work and developments of the Macrocosmos team within the Bittensor ecosystem. The team dives into their progress with multiple subnets, including subnet 13, which has become one of the largest social media datasets in the world. They also discuss their work in various sectors, such as decentralized AI, data scraping, and protein folding, which is set to revolutionize fields like healthcare and pharmaceuticals. The team highlights their ongoing efforts to create a decentralized, incentive-driven system that encourages miners to contribute and optimize their computational resources. They also introduce their latest advancements in distributed training, including novel approaches to handling latency and improving gradient averaging. With a focus on data and model quality, the discussion explores how they plan to scale and refine their subnets to provide value for both validators and token holders within the Bittensor network. The session ends with an exciting preview of upcoming decentralized games and the future potential of the Bittensor ecosystem.

A special thanks to Mark Jeffrey for his amazing Hash Rate series! In this series, he provides valuable insights into Bittensor Subnets and the world of decentralized AI. Be sure to check out the full series on his YouTube channel for more expert analysis and deep dives.

This session, recorded in mid-2024, features a conversation with Will Squires and Steffen Cruz from Macrocosmos, a prominent player in the Bittensor ecosystem. They discuss their involvement in the decentralized AI network Bittensor, which operates through various subnets that run AI models and create computational power through a decentralized system. Will and Steffen explain their backgrounds, including their expertise in AI, machine learning, and computational research, and how these experiences led them to explore the potential of Bittensor. The conversation covers several exciting aspects, from the challenges of training AI models to the importance of creating a decentralized and transparent AI ecosystem. They also highlight the innovative and competitive nature of Bittensor, where miners are incentivized to innovate and optimize processes, driving the network forward. The discussion touches on both technical details, like the development of the protein folding subnet, and broader concepts such as the future of decentralized AI and the evolving business model of the Bittensor network.

Another video from Mark Jeffrey, recorded in early 2025. Mark Jeffrey sits down with Steffen and Will from Macrocosmos to discuss the launch of dTAO (dynamic TAO) and the broader changes in the Bittensor ecosystem. The team reflects on the technical success of the dTAO launch, which exceeded expectations in terms of adoption and market dynamics. They delve into the complexities of the subnet launch, including the unexpected volumes and price fluctuations, and discuss how retail investors can approach the market strategically. The conversation also covers the development of Macrocosmos’ subnets, such as their work on protein folding, data scraping, and AI training, which are all part of their “Constellation” platform. This platform aims to unite various subnets into a seamless experience, providing powerful AI tools and decentralized computing. The team emphasizes the need for both strong technical development and effective marketing to make their products accessible and impactful in the growing decentralized AI space.

NEWS

Announcements
