With the number of new subnets being added, it can be hard to keep information up to date across all of them, so this data may occasionally be slightly stale.

Subnet 105

SoundsRight

ABOUT

What exactly does it do?

SoundsRight, designated as Subnet 105, is dedicated to the research and development of non-proprietary speech enhancement models. As more of our daily lives revolve around consuming online content, there is growing emphasis on high-quality audio. Speech enhancement is a complex field that involves tasks like separating desired speech from background noise, which requires training sophisticated models capable of distinguishing between different audio components under various circumstances.

The fundamental challenge that SoundsRight addresses is that much of speech enhancement technology is currently hidden behind paywalls, despite all necessary components for open-source innovation being readily available. SoundsRight aims to spearhead open-source speech enhancement technology through daily fine-tuning competitions, making high-quality audio processing more accessible to the broader community.

PURPOSE

What exactly is the 'product/build'?

SoundsRight operates as a specialized subnet within the Bittensor decentralized ecosystem, focusing exclusively on speech enhancement technology. The subnet creates a competitive environment where participants (miners) develop and fine-tune speech enhancement models, which are then evaluated by validators to determine the best-performing solutions.

The core function of SoundsRight is to facilitate daily fine-tuning competitions for speech enhancement models. These competitions currently focus on two primary tasks:

  • Denoising: Removing unwanted background noise from speech recordings while preserving the quality and intelligibility of the desired speech.
  • Dereverberation: Reducing or eliminating the echo and reverberation effects that occur when audio is recorded in spaces with reflective surfaces.
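
To illustrate what a denoising model must accomplish, here is a toy spectral-subtraction denoiser in Python using numpy. This is a classical baseline shown for illustration only, not a model from the subnet; the frame length, noise-estimation window, and subtraction factor are arbitrary choices:

```python
import numpy as np

def spectral_gate_denoise(signal, frame_len=256, noise_frames=4, factor=1.5):
    """Toy spectral-subtraction denoiser: estimate a per-bin noise floor from
    the first few (assumed noise-only) frames, then subtract a multiple of it
    from every frame's magnitude spectrum."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    spectra = np.fft.rfft(frames, axis=1)
    noise_floor = np.abs(spectra[:noise_frames]).mean(axis=0)
    mag, phase = np.abs(spectra), np.angle(spectra)
    mag = np.maximum(mag - factor * noise_floor, 0.0)  # gate bins near the floor
    cleaned = np.fft.irfft(mag * np.exp(1j * phase), n=frame_len, axis=1)
    return cleaned.reshape(-1)

# Example: a noise-only lead-in followed by a 440 Hz tone in white noise
rng = np.random.default_rng(0)
lead = 0.3 * rng.standard_normal(1024)           # frames used for the noise estimate
t = np.arange(4096) / 16000.0                    # 16 kHz, matching the competitions
noisy = np.concatenate(
    [lead, np.sin(2 * np.pi * 440 * t) + 0.3 * rng.standard_normal(4096)]
)
out = spectral_gate_denoise(noisy)
```

Modern learned models vastly outperform this kind of fixed rule, which is exactly why the subnet benchmarks fine-tuned models rather than classical filters.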

 

Each competition follows a winner-takes-all format, which incentivizes miners to submit their absolute best models rather than multiple variations. This format, combined with the validation mechanism, deters miner factions by making model duplication unviable.

 

How SoundsRight Works

The SoundsRight subnet operates through a well-defined workflow involving miners, validators, HuggingFace (as a model repository), the Bittensor blockchain, and the subnet’s website. Here’s a detailed breakdown of how the system functions:

Miner-Validator Architecture

There are two main entities in the subnet:

  1. Miners: These participants upload fine-tuned speech enhancement models to HuggingFace. Miners are responsible for developing and continuously improving speech enhancement models that can effectively perform denoising or dereverberation tasks.
  2. Validators: These entities benchmark the models and determine which miners’ models perform best. Validators generate fresh datasets daily, download models from HuggingFace, verify model ownership, run benchmarks, and assign scores based on performance metrics.

 

Competition Workflow

The daily competition process follows these steps:

  1. Miners fine-tune speech enhancement models and upload them to HuggingFace.
  2. Validators generate fresh benchmarking datasets so that models cannot simply overfit to a fixed test set.
  3. Validators send synapse requests for model information to the Bittensor chain.
  4. The Bittensor chain returns a synapse containing the model information.
  5. Validators reference model metadata and confirm model ownership.
  6. Validators download models from HuggingFace.
  7. Validators benchmark models on locally generated datasets.
  8. Validators report benchmarking results to the subnet website.
  9. The subnet website constructs competition leaderboards.
  10. Validators set weights for miners based on performance.
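
The steps above can be sketched as a single validator cycle. Every callable below is a hypothetical stand-in for the subnet's real components, injected as a parameter so the control flow stands on its own:

```python
def run_daily_competition(miners, generate_dataset, fetch_metadata,
                          verify_ownership, download_model, benchmark):
    """Sketch of one daily validator cycle (steps 2-10 above)."""
    dataset = generate_dataset()              # step 2: fresh data each day
    results = {}
    for miner in miners:
        meta = fetch_metadata(miner)          # steps 3-5: query the chain and
        if not verify_ownership(meta):        # reject unverified/duplicated models
            continue
        model = download_model(meta)          # step 6: pull from HuggingFace
        results[miner] = benchmark(model, dataset)  # step 7: local benchmark
    if not results:
        return {}
    # steps 8-10: winner-takes-all weighting from benchmark scores
    winner = max(results, key=results.get)
    return {m: (1.0 if m == winner else 0.0) for m in results}
```

The winner-takes-all rule at the end is what makes submitting many near-identical model variants pointless: only the single best score earns weight.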

 

This continuous cycle ensures that models are constantly being improved and evaluated on fresh data, driving innovation in speech enhancement technology.

 

Technical Architecture

The SoundsRight subnet is built on the Bittensor ecosystem, with a technical architecture designed to facilitate the competition and evaluation process efficiently.

Repository Structure

The codebase is organized into several key directories:

  • soundsright/: Main code directory containing the core implementation
  • base/: Contains base classes and utilities for the subnet
  • core/: Core functionality of the subnet
  • neurons/: Implementation of validator and miner neurons
  • benchmarking/: Code for benchmarking models
  • data/: Data handling utilities
  • models/: Model definitions and implementations
  • templates/: Template files
  • utils/: Utility functions

 

Core Components

  • BaseNeuron Class: Handles base operations for both miner and validator neurons.
  • Validator Neurons: Responsible for benchmarking models and setting weights.
  • Miner Neurons: Responsible for developing and uploading models.
  • Configuration System: Manages paths, logging, and other parameters.
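
A minimal sketch of how these components might relate. Beyond the documented roles (a shared base class, validator weight-setting, miner model uploads), all names and method signatures here are assumptions, not the subnet's actual code:

```python
class BaseNeuron:
    """Shared behavior for miners and validators: holds configuration
    (paths, logging, and other parameters)."""
    def __init__(self, config):
        self.config = config

class MinerNeuron(BaseNeuron):
    def upload_model(self, repo_id):
        """Publish a fine-tuned model to a HuggingFace repo (stubbed)."""
        return f"uploaded:{repo_id}"

class ValidatorNeuron(BaseNeuron):
    def set_weights(self, scores):
        """Winner-takes-all weights derived from benchmark scores."""
        winner = max(scores, key=scores.get)
        return {uid: float(uid == winner) for uid in scores}
```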

 

Technical Implementation

The implementation uses Python with dependencies including:

  • argparse for command-line arguments
  • bittensor for blockchain integration
  • numpy for numerical operations
  • Various file system and path handling utilities
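
As an illustration of the argparse-based entry point, here is a parser in that style. The flag names and defaults are assumptions for the sketch, not the subnet's real command-line interface:

```python
import argparse

def parse_args(argv=None):
    """Illustrative CLI parser for a subnet neuron (names are hypothetical)."""
    parser = argparse.ArgumentParser(description="SoundsRight neuron (sketch)")
    parser.add_argument("--netuid", type=int, default=105,
                        help="Subnet UID on the Bittensor network")
    parser.add_argument("--neuron-type", choices=["miner", "validator"],
                        default="validator",
                        help="Which neuron role to run")
    parser.add_argument("--log-level", default="INFO",
                        help="Logging verbosity")
    return parser.parse_args(argv)
```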

 

Competition Metrics

The subnet currently hosts competitions at a 16 kHz sample rate, with plans to expand to 48 kHz competitions in upcoming updates. The benchmarking metrics used include:

  • PESQ (Perceptual Evaluation of Speech Quality): 15% of total weights for denoising, 15% for dereverberation
  • ESTOI (Extended Short-Time Objective Intelligibility): 12.5% of total weights for denoising, 12.5% for dereverberation
  • SI-SDR (Scale-Invariant Signal-to-Distortion Ratio): 7.5% of total weights for denoising, 7.5% for dereverberation
  • SI-SAR (Scale-Invariant Signal-to-Artifacts Ratio): 7.5% of total weights for denoising, 7.5% for dereverberation
  • SI-SIR (Scale-Invariant Signal-to-Interference Ratio): 7.5% of total weights for denoising, 7.5% for dereverberation
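
The published weight split can be combined into a single score per miner. Only the percentages come from the list above; the assumption that each raw metric has already been normalized to [0, 1] is mine, since the normalization scheme is not documented here:

```python
# Per-task metric weights from the competition description; since the split
# is identical for denoising and dereverberation, each task's metrics sum
# to 50% of the total.
METRIC_WEIGHTS = {
    "PESQ": 0.15, "ESTOI": 0.125,
    "SI-SDR": 0.075, "SI-SAR": 0.075, "SI-SIR": 0.075,
}

def combined_score(denoising, dereverberation):
    """Combine normalized per-metric results for both tasks into one score."""
    return sum(
        w * denoising[metric] + w * dereverberation[metric]
        for metric, w in METRIC_WEIGHTS.items()
    )
```

A model scoring perfectly on every metric in both tasks would reach a combined score of 1.0.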

 

These metrics ensure comprehensive evaluation of model performance across different aspects of speech enhancement quality.

 

WHO

Team Info

Based on the GitHub repository, the project is maintained by the @synapsec-ai/subnet-owners team. Contributors visible in the GitHub interface include:

  • m4k1-dev
  • ceterum1

FUTURE

Roadmap

The subnet uses semantic versioning (Major.Minor.Patch) with specific implications for each release type:

Major Releases (X.0.0):

  • May include breaking changes
  • Updates are mandatory for all subnet users
  • The weights_version hyperparameter is adjusted immediately after release
  • Major releases are communicated at least 1 week in advance
  • Registration may be disabled for up to 24 hours

 

Minor Releases (0.X.0):

  • May include breaking changes
  • If breaking changes are included, updates are announced at least 48 hours in advance
  • Otherwise, a minimum of 24-hour notice is given
  • Updates are mandatory for all subnet users
  • Registration may be disabled for up to 24 hours

 

Patch Releases (0.0.X):

  • Do not contain breaking changes
  • Updates are not mandatory unless they include hotfixes for scoring or penalty algorithms
  • Releases without changes to scoring or penalty algorithms are pushed without prior notice
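
The release policy above boils down to a small decision rule. This is a sketch of the policy as stated; in practice the subnet enforces mandatory updates through the weights_version hyperparameter rather than a function like this:

```python
def is_mandatory(current, release, hotfix=False):
    """Per the policy above: major and minor releases are mandatory for all
    subnet users; patch releases are mandatory only when they carry hotfixes
    for the scoring or penalty algorithms."""
    cur = tuple(int(p) for p in current.split("."))
    new = tuple(int(p) for p in release.split("."))
    if new[:2] != cur[:2]:   # major or minor version changed
        return True
    return hotfix            # patch bump: mandatory only for hotfixes
```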

 

 

Version Milestones

SoundsRight v1.0.0

  • Register on testnet
  • 16 kHz competitions for denoising and dereverberation tasks

 

SoundsRight v1.1.0

  • Register on mainnet

 

SoundsRight v2.0.0

  • TTS generation upgrade
  • 48 kHz competitions for denoising and dereverberation tasks

 

SoundsRight v3.0.0

  • More utilities provided to miners and validators
  • Validator performance dashboards

 

SoundsRight v4.0.0

  • Complete subnet overhaul to a monetized API

 

Long-term Vision

The current goal for the subnet is to facilitate open-source research and development of state-of-the-art speech enhancement models. The documentation acknowledges that there is potential to create far more open-source work in this field.
The ultimate goal of the subnet is to create a monetized product in the form of an API. To make that product as competitive as possible, however, the subnet's first goal is to build a large body of work for miners to draw inspiration from.

 
