Self-Supervised Pre-Training on Unlabeled Speech and Text Corpora

1

wav2vec2-base-960h | Model | 51/100

via “multilingual-transfer-learning-through-pretrained-representations”

automatic-speech-recognition model. 1,210,723 downloads.

Unique: Leverages self-supervised pretraining on unlabeled audio to learn language-agnostic acoustic representations that transfer across languages. The feature extractor learns universal speech patterns (pitch, formants, spectral dynamics) without linguistic supervision, enabling zero-shot transfer to unseen languages.

vs others: Requires 10-100x less labeled data for new languages than training supervised ASR from scratch, because the pretrained feature extractor already captures acoustic patterns. It also outperforms language-specific models trained on equivalent amounts of data, owing to the quality of the self-supervised pretraining.

2

w2v-bert-2.0 | Model | 49/100

via “self-supervised acoustic representation learning without labeled data”

feature-extraction model. 3,341,362 downloads.

Unique: Combines wav2vec2's contrastive learning (picking the true target for a masked frame out of a set of distractors, given the surrounding context) with BERT-style masked prediction on speech. The resulting dual-objective pretraining learns both acoustic and contextual patterns without labels, unlike supervised models that require phoneme or speaker annotations.

vs others: Eliminates annotation requirements compared to supervised acoustic models, while providing better generalization than single-objective self-supervised approaches (wav2vec2 alone) due to dual pretraining objectives
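
The contrastive half of the dual objective described above can be illustrated with a minimal sketch. It is not the model's actual training code: the function names, the 2-dimensional toy vectors, and the temperature of 0.1 are assumptions chosen for illustration, and the masked-prediction half is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(context, target, distractors, temperature=0.1):
    """InfoNCE-style loss for one masked frame (hypothetical helper).

    context:     the model's output at the masked position
    target:      the true latent for that position
    distractors: latents sampled from other positions
    """
    candidates = [target] + list(distractors)
    logits = [cosine(context, c) / temperature for c in candidates]
    # Numerically stable log-sum-exp; the true target sits at index 0.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]

# Toy vectors (assumed values): the context points almost at the target.
context = [1.0, 0.0]
target = [0.9, 0.1]
distractors = [[0.0, 1.0], [-1.0, 0.0]]

good = contrastive_loss(context, target, distractors)
# Swap roles so the "target" is a poor match for the context.
bad = contrastive_loss(context, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

The loss is small when the context vector is closest to the true target and grows when a distractor is the better match, which is what pushes the encoder toward context-aware frame representations.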

3

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language... (SpeechT5) | Product | 24/100

via “self-supervised pre-training on unlabeled speech and text corpora”

* ⭐ 06/2022: [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing (WavLM)](https://ieeexplore.ieee.org/abstract/document/9814838)

Unique: Uses random mixing of speech/text latent states with vector quantization as the pre-training objective, forcing modality-agnostic semantic learning rather than modality-specific pre-training. This approach enables a single model to handle multiple speech tasks without separate task-specific pre-training.

vs others: Unified cross-modal pre-training enables knowledge transfer between speech and text tasks compared to separate speech-only (WavLM, HuBERT) and text-only (BERT, GPT) pre-training, though specific improvements in downstream task performance are not documented in the abstract.
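
As a rough illustration of mixing latent states with vector-quantized codes, the sketch below replaces a random subset of encoder states with their nearest entry from a shared codebook. It is only a toy under stated assumptions: `nearest_code`, `mix_with_codes`, the `mix_prob` parameter, and the 2-dimensional codebook are hypothetical and not taken from the SpeechT5 implementation.

```python
import random

def nearest_code(vec, codebook):
    """Return the codebook entry closest to vec (squared Euclidean distance)."""
    return min(codebook, key=lambda c: sum((a - b) ** 2 for a, b in zip(vec, c)))

def mix_with_codes(states, codebook, mix_prob=0.1, rng=None):
    """Replace each latent state with its nearest code with probability mix_prob.

    Speech and text states share one codebook, so the replaced positions
    carry no information about which modality they came from.
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    return [nearest_code(s, codebook) if rng.random() < mix_prob else s
            for s in states]

# Toy shared codebook and latent states (assumed values).
codebook = [[0.0, 0.0], [1.0, 1.0]]
speech_states = [[0.1, 0.2], [0.9, 0.8]]
text_states = [[0.2, 0.0], [1.1, 0.9]]

all_quantized = mix_with_codes(speech_states + text_states, codebook, mix_prob=1.0)
untouched = mix_with_codes(speech_states, codebook, mix_prob=0.0)
```

With `mix_prob=1.0` every state collapses onto a shared code, and with `mix_prob=0.0` the states pass through unchanged; intermediate values give the partial mixing that the description refers to.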

4

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for ASR (BigSSL) | Product | 21/100

via “large-scale semi-supervised asr pre-training with unlabeled audio”

* ⭐ 08/2022: [MuLan: A Joint Embedding of Music Audio and Natural Language (MuLan)](https://arxiv.org/abs/2208.12415)

Unique: Combines a three-stage pipeline (SSL pre-training → self-training → fine-tuning) with 8B-parameter Conformer models trained on 1M hours of unlabeled audio, achieving state-of-the-art ASR with only 3% of the typical labeled training data. The specific SSL objective and self-training methodology are not disclosed here, but the work represents a frontier-scale semi-supervised approach for speech.

vs others: Achieves better ASR performance than supervised-only baselines while requiring 97% less labeled data, and outperforms the prior state of the art when using the full training set. The advantage over alternatives depends on access to massive unlabeled audio corpora and computational resources.
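
Since the listing does not describe the self-training stage, the sketch below shows only the generic pseudo-labeling pattern that the term usually refers to, not BigSSL's actual method. The `self_train_round` helper, the `min_conf` threshold, and the toy teacher are all hypothetical.

```python
def self_train_round(teacher, unlabeled, labeled, min_conf=0.9):
    """One generic self-training round (illustrative only).

    teacher:   callable mapping an input to (predicted_label, confidence)
    unlabeled: inputs without labels
    labeled:   list of (input, label) pairs
    Returns the labeled set extended with confident pseudo-labels.
    """
    pseudo = []
    for x in unlabeled:
        label, conf = teacher(x)
        if conf >= min_conf:  # keep only predictions the teacher is sure about
            pseudo.append((x, label))
    return labeled + pseudo

def toy_teacher(x):
    """Stand-in for a pre-trained model: sign gives the label, magnitude the confidence."""
    label = "speech" if x > 0 else "noise"
    return label, min(1.0, abs(x))

labeled = [(2.0, "speech")]
unlabeled = [0.95, -0.2, -1.5]
expanded = self_train_round(toy_teacher, unlabeled, labeled)
```

The low-confidence input (`-0.2`) is dropped, while the two confident predictions join the labeled pool that a later fine-tuning stage would train on.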

5

CS 601.471/671 NLP: Self-supervised Models - Johns Hopkins University | Product | 18/100

via “self-supervised nlp model training curriculum”

![Level: Medium](https://img.shields.io/badge/Level-Medium-yellow)

Unique: University-level curriculum at Johns Hopkins focused specifically on self-supervised NLP. It combines theoretical foundations with hands-on implementation of techniques such as masked prediction, contrastive objectives (SimCLR, MoCo), and momentum-based learning, and is taught by NLP researchers actively publishing in this space.

vs others: Deeper theoretical grounding and research-oriented perspective compared to industry bootcamp courses; provides access to cutting-edge self-supervised techniques before they become mainstream, with faculty expertise in representation learning

6

Synthetaic | Product

via “self-supervised-model-training”
