With the number of new subnets being added, it can be hard to keep information current across all of them, so some data may be slightly out of date from time to time.

Subnet 32

It’s AI


ABOUT

What exactly does it do?

Subnet 32 focuses on detecting AI-generated content amid the rapid growth of Large Language Models (LLMs); ChatGPT alone reportedly produces around 100 billion words per day, compared with the roughly 100 trillion words humans produce. As AI-generated text becomes ubiquitous, accurately discerning its origin is increasingly crucial.

To address this challenge, the team has developed a subnet that incentivizes distributed solutions for identifying LLM-generated content. This includes defining incentive mechanisms and validation processes and establishing a baseline model for miners.

Subnet 32 offers a front end that can determine if text input is AI-generated or human-authored. This tool is valuable for verifying data authenticity, particularly given the rise of large language models in various applications. Aside from AI detection, Subnet 32’s capabilities extend to various fields. From aiding ML engineers in filtering data for model training to assisting educators in detecting AI-generated student work, the subnet provides versatile tools for diverse user needs.

PURPOSE

What exactly is the 'product/build'?

This Subnet is crucial in several scenarios. In schools, teachers need to distinguish between student-completed assignments and those done by AI. Bloggers and social media users aim to maintain authentic comment sections, free from AI-generated spam. Companies rely on identifying genuine job applications over AI-generated ones. Additionally, in more critical contexts, this technology aids in detecting fraudulent emails from scammers.

The project team tested their system and found that it correctly identifies AI-written text approximately 85% of the time, with few false positives (human-written text mislabeled as AI). This marks significant progress toward preserving authenticity online even as AI writing capabilities advance.

 

Validation Mechanism and Miner Evolution

The subnet uses a clever method to secure the validation process. Validators modify a database of 18 million human-authored texts slightly, preventing miners from easily deriving responses solely from existing data. Bittensor faced challenges with miners manipulating responses from models in Subnet 1, leading to the exploitation of computational resources. Subnet 32, however, employs a strategy where validators alter human-generated texts, making it difficult for miners to train on predetermined text.
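As a rough illustration of this augmentation idea, the sketch below applies light character-level perturbations (adjacent-letter swaps mimicking typos) to a human-written text so it no longer matches any stored copy. The subnet's actual augmentations are not specified in this article; the function name and swap probability are illustrative assumptions.

```python
import random

def augment(text, rng, p_swap=0.05):
    """Lightly perturb a human-written text so it no longer matches any
    stored copy while remaining clearly human in character. Here the
    perturbation is an adjacent-letter swap (a typo-like edit); the
    subnet's real augmentations are not public."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < p_swap:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the pair we just swapped
        else:
            i += 1
    return "".join(chars)

rng = random.Random(0)
print(augment("The quick brown fox jumps over the lazy dog.", rng, p_swap=0.2))
```

Because the perturbed text differs from every entry in the source database, a miner that simply memorizes the 18 million stored texts gains nothing.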

Early miners raised concerns that reward models built on language models could limit miner evolution, since fine-tuning responses solely for the existing reward model could hinder progress. Subnet 32 avoids this by scoring against known ground truth (whether a text is human- or AI-generated) rather than against a model-based scoring mechanism.

 

AI Detection

The subnet’s AI detection tool holds significance in educational settings, where it helps teachers identify AI-generated submissions, curbing academic dishonesty. It can also filter out automated comments on social media, preventing attention-seeking bots from distorting authentic interactions. The idea of using Bittensor for AI detection came from a team member who was aware of Bittensor’s rapid growth and had expertise in machine learning. The team wanted an AI-adjacent tool that could be extended and backed by substantial resources, making it a valuable application on the Bittensor network. Initial discussions led to a focus on detecting whether text was written by a human, aligning with the growing demand for such capabilities as large language models proliferate.

 

Text Authenticity

The project assesses text authenticity using likelihood calculations that compare a text against an AI model’s predictions. By evaluating the probability of each word given the words before it, a perplexity value (PPL) is derived to gauge whether text is AI-generated. A raw perplexity threshold alone is insufficient, however, since it is not normalized and cannot accurately express the probability that text is AI-generated or human-written. The system therefore compares actual and predicted words in sequence, using loss calculations to assess how likely each word is to be AI-generated given the preceding words.

Text is split into chunks, a loss is calculated within each chunk, and the chunk losses are fed into a linear model that outputs a normalized, robust probability that the text is AI-generated.
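A minimal sketch of this pipeline, assuming per-token negative log-likelihoods are already available from a language model; the logistic weights here are made-up illustrations, not the subnet's fitted values:

```python
import math

def chunk_losses(token_nlls, chunk_size=3):
    """Split per-token negative log-likelihoods into chunks and
    average within each chunk (a proxy for per-chunk loss)."""
    return [sum(token_nlls[i:i + chunk_size]) / len(token_nlls[i:i + chunk_size])
            for i in range(0, len(token_nlls), chunk_size)]

def ai_probability(token_nlls, w=-2.0, b=4.0, chunk_size=3):
    """Map the mean chunk loss through a logistic (linear) model to a
    normalized probability that the text is AI-generated: lower loss
    (more predictable text) -> higher probability. w and b are
    illustrative placeholders."""
    losses = chunk_losses(token_nlls, chunk_size)
    mean_loss = sum(losses) / len(losses)
    return 1.0 / (1.0 + math.exp(-(w * mean_loss + b)))

# AI-like text: the model predicts each token well (low loss)
print(ai_probability([0.5, 0.6, 0.4, 0.5, 0.7, 0.5]))  # close to 1
# Human-like text: less predictable (higher loss)
print(ai_probability([2.8, 3.1, 2.5, 3.0, 2.9, 3.2]))  # close to 0
```

The logistic layer is what supplies the normalization the article says a bare perplexity threshold lacks: it maps unbounded loss values onto a calibrated 0-to-1 probability.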

 

Use of the Ollama Tool

The subnet uses Ollama as an aggregator for large language models, optimizing them for faster performance. Ollama makes it easy to run over 30,000 optimized language models, which makes it a valuable tool for prompt generation.

 

Validator and Miner Roles

Validators generate text prompts and human reference data for miners, drawing on distinct data sets to ensure authenticity and to prevent miners from training on stored text.

Miners assess the supplied text for authenticity, leveraging advanced AI models to discern AI-generated content from human-written text. To prevent miners from training on the open dataset, text augmentations such as misspellings and other alterations are applied to ensure uniqueness and prevent memorization. The subnet aims to motivate miners through a competitive platform in which miners pursue improvements that identify AI-generated text more effectively. Baseline models are provided, and miners can either use them as-is or improve on them with machine-learning techniques to raise identification accuracy.
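The validator/miner interaction described above can be sketched as a single round in which the validator knows the ground truth and scores each miner's predicted probability against it. The names and the scoring rule are illustrative assumptions, not the subnet's actual protocol:

```python
import random

def validator_round(miners, human_texts, ai_texts, rng):
    """One illustrative validation round: the validator picks a text whose
    origin it knows, queries each miner for P(text is AI), and scores
    miners by how close their prediction is to the known truth."""
    is_ai = rng.random() < 0.5
    text = rng.choice(ai_texts if is_ai else human_texts)
    truth = 1.0 if is_ai else 0.0
    scores = {}
    for name, predict in miners.items():
        p = min(1.0, max(0.0, predict(text)))  # miner's predicted P(AI)
        scores[name] = 1.0 - abs(truth - p)    # closer to truth -> higher score
    return scores

rng = random.Random(42)
miners = {
    "careful": lambda t: 0.9 if t.startswith("AI:") else 0.1,  # toy classifier
    "guesser": lambda t: 0.5,                                  # always unsure
}
print(validator_round(miners, ["Human: rain is wet"], ["AI: rain is wet"], rng))
```

Because scoring uses known labels rather than a reward model, a miner can only raise its score by actually detecting AI text better, which is the incentive structure the section describes.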

WHO

Team Info

While some team members still hold other jobs, most are dedicated full-time to building and enhancing the subnet. The team comprises individuals specializing in various areas such as data science, crypto, and business, each contributing their expertise to the project.

FUTURE

Roadmap

Currently, many subnets ship state-of-the-art (SOTA) models in their miner code, achieving high quality for their tasks immediately. Improving such solutions yields only marginal gains for miners, leaving little room for growth. This subnet takes a different approach: detecting AI-generated content well is a genuinely hard task. Instead of merely creating a marketplace for SOTA-model inference like other subnets, the team aims to build a continually evolving environment in which miners must improve progressively over time rather than running the same models indefinitely.

To implement such an environment, the following steps are necessary:

 

Validators

Validators currently use a single large dataset with human data and two models (Mistral and Vicuna) for generating AI texts. To enhance this:

  1. Use softmax over miners’ scores to give miners stronger motivation
  2. Add more models: increasing the number and diversity of generation models will improve the overall quality of detection
  3. Add more languages
  4. Paraphrase AI texts
  5. Make the mechanism resilient to tricks and attacks
  6. Handle various types of text (articles, comments, posts, etc.) to improve quality on each distinct type
  7. Save all validator-generated data to the cloud to build an open-source dataset in the future
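Item 1's softmax idea can be illustrated as follows: raw miner scores are mapped to normalized emission weights, and a lower temperature sharpens the gap between top and bottom miners. The temperature knob is an assumption for illustration, not a documented parameter of the subnet:

```python
import math

def softmax_weights(scores, temperature=1.0):
    """Turn raw miner scores into normalized emission weights via softmax.
    A lower temperature sharpens the separation between top and bottom
    miners, strengthening the incentive to improve."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

raw = [0.95, 0.90, 0.60]                       # hypothetical miner accuracies
print(softmax_weights(raw))                    # mild separation
print(softmax_weights(raw, temperature=0.05))  # winner-take-most
```

Compared with paying miners linearly in their scores, the sharpened weighting makes a small accuracy edge translate into a visibly larger share of emissions, which is the motivational effect the roadmap item aims for.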

 

Miners

While miners’ improvement is primarily their responsibility, there are actions this Subnet can take to support them:

  1. Host testnet validators so miners can start without wasting TAO.
  2. Build a leaderboard and a local dataset: list miners’ metrics and let prospective miners evaluate their solutions on a local dataset, comparing against existing ones before going to mainnet.
  3. Create a Kaggle competition to introduce some of the best ML engineers to the subnet and get them running their top solutions on-chain.
  4. Although solving LLM detection is ultimately the miners’ problem, the team will continue its own research in the field to improve the baseline solution and raise overall subnet quality.

 

Applications

Deploying this subnet for practical use is crucial. Given the demand for solutions to this problem, plans include:

 

Web service: Developing a full version of their website where users, including those outside the Bittensor community, can detect AI-generated texts.

 

Twitter extension: Creating an extension for Twitter that labels tweets and comments as AI-generated or human-written based on predictions from the subnet, helping users gauge content quality.

 

Browser extension: Developing a browser extension that allows users to instantly check whether text is AI-generated or human-written by highlighting it.

 

API: Providing an API service for developers to integrate LLM detection into their applications or process large amounts of text for AI engineers cleaning up datasets.

 

Commerce

Each service mentioned above will offer subscription plans to monetize Subnet 32. These plans will be based on an API managed by validators to grant miners access, thereby allowing validators to earn additional revenue.

By commercializing this product, they aim to reduce reliance on emissions and establish real-world usage. This strategy will ensure their token has significant utility by the time dynamic TAO is introduced and validator emissions are phased out, allowing validators to earn from the services mentioned above.

MEDIA

Huge thanks to Keith Singery (aka Bittensor Guru) for all of his fantastic work in the Bittensor community. Make sure to check out his other video/audio interviews by clicking HERE.

In this audio interview, Keith interviews one of the lead developers of Subnet 32, Sergey. This subnet enables users to test whether a text block was generated by an LLM or a human. It’s a fascinating technology, and they’ve integrated some truly innovative aspects into their validation mechanism, which Keith delves into with enthusiasm.

NEWS

Announcements

MORE INFO

Useful Links