Neural Audio Compression With Encodec

1

AudioCraftRepository55/100

Meta's library for music and audio generation.

Unique: Uses residual vector quantization across multiple codebooks (typically 4) to represent audio at different frequency bands and temporal resolutions, enabling variable bitrate compression while maintaining perceptual quality. Trained end-to-end with adversarial loss for realistic reconstruction.

vs others: Achieves better perceptual quality than traditional codecs (MP3, AAC) at equivalent bitrates and enables discrete token representation required for language model-based generation; more efficient than raw waveform processing.

2

BarkRepository55/100

via “encodec-based neural audio waveform reconstruction”

Open-source text-to-audio — speech, music, sound effects, 13+ languages, runs locally.

Unique: Leverages Facebook's EnCodec neural codec for efficient, high-quality waveform reconstruction from discrete tokens, enabling end-to-end generative audio without traditional vocoder artifacts

vs others: Neural codec approach produces fewer artifacts than traditional vocoders (WaveGlow, HiFi-GAN); learned compression maintains perceptual quality at lower bitrates than hand-crafted codecs

3

AudioCraftRepository26/100

via “audio codec compression with discrete token representation”

A single-stop code base for generative audio needs, by Meta. Includes MusicGen for music and AudioGen for sounds. #opensource

Unique: Combines convolutional autoencoders with vector quantization to create a learned codec that produces discrete tokens suitable for language model training, rather than using traditional codecs (MP3, AAC) or continuous latent representations that don't integrate naturally with transformer architectures

vs others: More efficient than raw waveform generation because it reduces sequence length by 50-100x, and more flexible than traditional audio codecs because the discrete representation is learned end-to-end for the downstream task rather than optimized for human perception alone

4

High Fidelity Neural Audio Compression (EnCodec)Product22/100

via “real-time streaming audio encoding with quantized latent representation”

* ⭐ 12/2022: [Robust Speech Recognition via Large-Scale Weak Supervision (Whisper)](https://arxiv.org/abs/2212.04356)

Unique: Uses a single multiscale spectrogram adversary instead of traditional multi-discriminator approaches, combined with a novel loss balancer mechanism that decouples loss weight from loss scale, enabling more stable training of the quantized latent space. Streaming architecture supports real-time encoding/decoding without buffering entire audio segments.

vs others: Outperforms baseline codecs across speech, noisy speech, and music domains according to MUSHRA subjective evaluation, while maintaining real-time performance on standard hardware — a capability gap for traditional neural codecs that typically require offline processing or significant computational overhead.

5

BarkRepository21/100

via “encodec-based audio tokenization and reconstruction”

A transformer-based text-to-audio model. #opensource

Top Matches

Also Known As

Company