With the number of new subnets being added, it can be hard to keep information current across all of them, so some details here may be slightly out of date from time to time.
Subnet 32 focuses on detecting AI-generated content amid the rapid growth of Large Language Models (LLMs): ChatGPT alone is estimated to produce around 100 billion words daily, compared with roughly 100 trillion written by humans. As AI-generated text becomes ubiquitous, accurately discerning its origin is increasingly crucial.
To address this challenge, the team has developed a subnet that incentivizes distributed solutions for identifying LLM-generated content. This includes defining incentive mechanisms and validation processes and establishing a baseline model for miners.
Subnet 32 offers a front end that can determine whether input text is AI-generated or human-authored. This tool is valuable for verifying data authenticity, particularly given the spread of large language models across applications. Its capabilities extend to many fields: from helping ML engineers filter data for model training to assisting educators in detecting AI-generated student work, the subnet provides versatile tools for diverse user needs.
This Subnet is crucial in several scenarios. In schools, teachers need to distinguish between student-completed assignments and those done by AI. Bloggers and social media users aim to maintain authentic comment sections, free from AI-generated spam. Companies rely on identifying genuine job applications over AI-generated ones. Additionally, in more critical contexts, this technology aids in detecting fraudulent emails from scammers.
The project team tested their system and confirmed its ability to accurately identify AI-written text approximately 85% of the time, with minimal errors in mislabeling human-written text as AI. This marks significant progress in ensuring that despite advancements in AI writing capabilities, authenticity remains preserved online.
Validation Mechanism and Miner Evolution
The subnet uses a clever method to secure the validation process. Validators modify a database of 18 million human-authored texts slightly, preventing miners from easily deriving responses solely from existing data. Bittensor faced challenges with miners manipulating responses from models in Subnet 1, leading to the exploitation of computational resources. Subnet 32, however, employs a strategy where validators alter human-generated texts, making it difficult for miners to train on predetermined text.
Early miners raised concerns that reward models built on language models could limit miner evolution, since fine-tuning responses solely to please the existing reward model would hinder progress. Subnet 32 avoids this by scoring miners against known ground truth (whether a text is human- or AI-generated) rather than against a model-based scoring mechanism.
AI Detection
The subnet’s AI detection tool holds significance in educational settings, where it helps teachers identify AI-generated submissions, curbing academic dishonesty. It can also filter out automated comments on social media, preventing attention-seeking bots from influencing authentic interactions. The idea of using Bittensor for AI detection came from a team member who was following Bittensor’s rapid growth and had expertise in machine learning. It emerged from the need for an AI-related tool that could be extended and backed by substantial resources, making it a valuable application on the Bittensor network. Initial discussions narrowed the focus to detecting whether text was written by a human, aligning with the growing demand for such capabilities as large language models proliferate.
Text Authenticity
Using likelihood calculations, the project assesses text authenticity by comparing the observed text with what an AI model would predict. By evaluating the probabilities of word sequences, a perplexity value (PPL) is derived to help determine whether text is AI-generated. A raw threshold on this value alone is insufficient, since it is not normalized and does not directly yield the probability that text is AI-generated or human-written. The system therefore compares actual and predicted words in sequence, using loss calculations to assess how likely each word is given the words that precede it.
Text is split into chunks, loss is calculated within each chunk, and these chunk losses are fed into a linear model to produce a normalized probability that the text is AI-generated.
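A minimal sketch of this kind of perplexity-based check, assuming GPT-2 from Hugging Face as a stand-in scoring model and placeholder linear-model coefficients (the subnet's actual models and fitted parameters are not reproduced here):

```python
# Sketch: compute per-chunk language-model loss, then map the average loss
# to a normalized probability of being AI-generated with a simple linear model.
# GPT-2 and the weight/bias values below are illustrative placeholders.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def chunk_losses(text: str, chunk_size: int = 128) -> list[float]:
    """Split the token stream into chunks and compute mean next-token loss per chunk."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    losses = []
    for start in range(0, len(ids) - 1, chunk_size):
        chunk = ids[start:start + chunk_size + 1].unsqueeze(0)
        if chunk.shape[1] < 2:
            break
        with torch.no_grad():
            out = model(chunk, labels=chunk)  # loss = mean negative log-likelihood
        losses.append(out.loss.item())
    return losses

def ai_probability(text: str, weight: float = -1.5, bias: float = 6.0) -> float:
    """Map mean chunk loss to a probability with a toy linear/logistic model.

    Lower loss (lower perplexity) pushes the score toward 'AI-generated'.
    The weight and bias are placeholders, not the subnet's fitted values.
    """
    losses = chunk_losses(text)
    if not losses:
        return 0.5
    mean_loss = sum(losses) / len(losses)
    return 1.0 / (1.0 + math.exp(-(weight * mean_loss + bias)))

print(ai_probability("The quick brown fox jumps over the lazy dog. " * 20))
```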
Use of the Ollama Tool
The subnet utilizes Ollama as an aggregator for large language models, serving optimized versions of them for faster performance. Ollama makes it easy to use over 30,000 optimized language models, making it a valuable tool for prompt generation.
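As an illustration of how prompt generation through Ollama might look, here is a minimal sketch using the ollama Python client; the model name and prompt are placeholders rather than the subnet's actual validator configuration:

```python
# Minimal sketch of generating text through Ollama's Python client.
# The model name and prompt are illustrative placeholders, not the
# subnet's actual configuration.
import ollama

response = ollama.generate(
    model="mistral",  # any model already pulled locally, e.g. via `ollama pull mistral`
    prompt="Write a short paragraph about the history of the printing press.",
)
print(response["response"])  # the generated completion
```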
Validator and Miner Roles
Validators generate text prompts and human reference data for miners, utilizing distinct data sets to ensure authenticity and prevent training off stored text.
Miners assess the generated text for authenticity by leveraging advanced AI models to discern AI-generated content from human-written text. To prevent miners from training off the open data set, text augmentations like misspellings and alterations are added to ensure uniqueness and prevent memorization. The subnet aims to motivate miners by offering a competitive platform where miners pursue improvements in the system to identify AI-generated text effectively. Baseline models are offered to miners, who can either use these models as is or improve upon them through machine learning techniques to enhance text identification accuracy.
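As a toy illustration of that augmentation step, the sketch below injects random misspellings, deletions, and character swaps into a human-written sample; the subnet's real augmentation pipeline is more involved and is not reproduced here:

```python
# Toy illustration of the kind of augmentation applied to human-written
# reference texts (random misspellings and character-level alterations)
# so miners cannot simply memorize the open dataset.
import random

def augment(text: str, rate: float = 0.03, seed: int | None = None) -> str:
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and rng.random() < rate:
            op = rng.choice(["swap", "drop", "dupe"])
            if op == "swap" and chars[i + 1].isalpha():
                chars[i], chars[i + 1] = chars[i + 1], chars[i]  # transpose two letters
            elif op == "drop":
                chars[i] = ""  # delete a letter
            else:
                chars[i] = chars[i] * 2  # duplicate a letter
    return "".join(chars)

print(augment("This essay was written entirely by a human student.", seed=42))
```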
While some team members still hold other jobs, most are dedicated full-time to building and enhancing the subnet. The team comprises individuals specializing in various areas such as data science, crypto, and business, each contributing their expertise to the project.
Currently, many subnets have implemented state-of-the-art (SOTA) models in their miner code to achieve high quality for their tasks right away. Improving on those solutions yields only marginal gains for miners, leaving little room for growth. This subnet takes a different approach. Detecting AI-generated content to a high standard is a challenging task. Instead of simply creating a marketplace for running inference on SOTA models, as other subnets do, the team aims to establish a continually evolving environment where miners must progressively improve over time rather than running the same models for extended periods.
To implement such an environment, the following steps are necessary:
Validators
Validators currently use a single large dataset with human data and two models (Mistral and Vicuna) for generating AI texts. To enhance this:
Miners
While miners’ improvement is primarily their responsibility, there are actions this Subnet can take to support them:
Applications
Deploying this subnet for practical use is crucial. Given the demand for solutions to this problem, plans include:
Web service: Developing a full version of their website where users, including those outside the Bittensor community, can detect AI-generated texts.
Twitter extension: Creating an extension for Twitter to label tweets and comments as AI-generated or human-written based on predictions from the subnet, helping users identify content quality.
Browser extension: Developing a browser extension that allows users to instantly check whether text is AI-generated or human-written by highlighting it.
API: Providing an API service for developers to integrate LLM detection into their applications, or to process large amounts of text for AI engineers cleaning up datasets (a hypothetical client sketch follows this list).
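To show what such an integration might look like for a developer, here is a purely hypothetical sketch; the endpoint URL, request fields, and authentication scheme are invented for illustration, since the service is still in the planning stage:

```python
# Purely hypothetical sketch of calling an LLM-detection API like the one
# planned for Subnet 32. The endpoint, request fields, and auth header are
# invented for illustration; no public API is documented here.
import requests

API_URL = "https://example.com/api/v1/detect"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"                       # placeholder credential

def detect_ai(text: str) -> float:
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"text": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["ai_probability"]  # hypothetical response field

print(detect_ai("Sample paragraph to check for AI authorship."))
```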
Commerce
Each service mentioned above will offer subscription plans to monetize Subnet 32. These plans will be built on an API, managed by validators, that provides access to the miners, thereby allowing validators to earn additional revenue.
By commercializing this product, they aim to reduce reliance on emissions and establish real-world usage. This strategy will ensure their token has significant utility by the time dynamic TAO is introduced and validator emissions are phased out, allowing validators to earn from the services mentioned above.
Huge thanks to Keith Singery (aka Bittensor Guru) for all of his fantastic work in the Bittensor community. Make sure to check out his other video/audio interviews by clicking HERE.
In this audio interview, Keith interviews one of the lead developers of Subnet 32, Sergey. This subnet enables users to test whether a text block was generated by an LLM or a human. It’s a fascinating technology, and they’ve integrated some truly innovative aspects into their validation mechanism, which Keith delves into with enthusiasm.