

LaDiffCodec

Open-source code for the paper "Generative De-Quantization for Neural Speech Codec via Latent Diffusion" (accepted at ICASSP 2024).

Cite as: Yang, Haici, Inseon Jang, and Minje Kim. "Generative De-Quantization for Neural Speech Codec Via Latent Diffusion." ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024.

Prerequisites

Environment

pip install -r requirements.txt

Data

LibriSpeech
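Once downloaded from OpenSLR, LibriSpeech unpacks into a speaker/chapter directory tree of `.flac` utterances. A minimal sketch for gathering the file list (a hypothetical helper for illustration, not part of this repo):

```python
from pathlib import Path

def collect_librispeech_files(root):
    """Recursively collect LibriSpeech .flac utterances under root.

    LibriSpeech unpacks as <root>/<speaker>/<chapter>/<speaker>-<chapter>-<utt>.flac,
    so a recursive glob over the split directory finds every utterance.
    """
    return sorted(Path(root).rglob("*.flac"))
```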

Dependencies

Hyper-Parameters:

| Symbol | Description |
| --- | --- |
| run_diff | Run the diffusion model |
| diff_dims | Dimension of the input feature to the diffusion model |
| cond_quantization | Whether the conditioning features should be quantized; turn it on when training the diffusion model on codecs |
| cond_bandwidth | The designated bitrate of the codec model |
| scaling_feature | Apply scaling to each feature map individually |
| scaling_global | Apply scaling globally |
| ratios | The downsampling ratios of the encoder (and decoder) |
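To illustrate how `ratios` and `cond_bandwidth` interact: the product of the encoder's downsampling ratios fixes the latent frame rate, and with residual vector quantization the bitrate is frame rate × number of codebooks × bits per codebook. The concrete numbers below (16 kHz audio, ratios [8, 5, 4, 2], 1024-entry codebooks) are illustrative assumptions, not values confirmed from this repo:

```python
import math

def latent_frame_rate(sample_rate, ratios):
    """Frame rate of the encoder latent: sample rate over total downsampling."""
    return sample_rate / math.prod(ratios)

def bitrate_bps(frame_rate, n_codebooks, codebook_size=1024):
    """RVQ bitrate: each codebook adds log2(codebook_size) bits per frame."""
    return frame_rate * n_codebooks * math.log2(codebook_size)

# Illustrative numbers (assumed, not from this repo):
fr = latent_frame_rate(16_000, [8, 5, 4, 2])  # total downsampling 320 -> 50 frames/s
print(fr)                                      # 50.0
print(bitrate_bps(fr, n_codebooks=3))          # 1500.0 bits/s, i.e. 1.5 kbps
```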

Pretrained Checkpoints:

We provide a pretrained LaDiffCodec checkpoint with scalable bitrates at link. The bitrate can be chosen from 1.5 kbps, 3 kbps, 6 kbps, 9 kbps, or 12 kbps.
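Bitrate scalability in RVQ codecs typically comes from varying how many residual codebooks are kept. A sketch of that mapping for the bitrates listed above, assuming Encodec-style defaults (50 Hz latent frames, 10-bit codebooks) that are not confirmed for this repo:

```python
def codebooks_for_bandwidth(kbps, frame_rate=50.0, codebook_bits=10):
    """Codebooks needed to reach a target bitrate: bits/s over bits per frame
    per codebook. Defaults (50 Hz, 10-bit) are assumed, not from this repo."""
    return round(kbps * 1000 / (frame_rate * codebook_bits))

print([codebooks_for_bandwidth(b) for b in (1.5, 3, 6, 9, 12)])  # [3, 6, 12, 18, 24]
```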

To use the pretrained models:
python -m srcs.main --synthesis --load_model [path]/0907_diffusor.amlt --continuous_AE [path]/continuous_AE.amlt --discrete_AE [path]/discrete_AE.amlt --cond_bandwidth [BANDWIDTH] --diff_dims 256 --input_dir [INPUT_DIR] --output_dir [OUTPUT_DIR] --orig_sampling

You can also remove --orig_sampling to use midway infilling for much faster sampling, at a slight cost in quality.
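The speed/quality trade-off comes from where the reverse diffusion starts: original sampling denoises from pure noise over all T steps, whereas midway infilling noises the conditioning latent forward to an intermediate step and denoises only from there. A schematic sketch of that idea, with a toy denoiser and an assumed DDPM-style linear schedule (not the repo's actual sampler):

```python
import numpy as np

def midway_sample(x_cond, denoise_step, T=1000, t_mid=300, rng=None):
    """Schematic midway infilling: forward-diffuse the conditioning latent
    x_cond to step t_mid, then run the reverse chain for only t_mid steps
    instead of the full T steps from pure noise."""
    rng = rng or np.random.default_rng(0)
    betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule (assumed)
    alphas_cum = np.cumprod(1.0 - betas)
    # Forward diffusion in closed form: x_t = sqrt(a_bar)*x0 + sqrt(1-a_bar)*eps
    a_bar = alphas_cum[t_mid - 1]
    x = np.sqrt(a_bar) * x_cond + np.sqrt(1 - a_bar) * rng.standard_normal(x_cond.shape)
    # Reverse only t_mid steps; each step is one denoiser network call.
    for t in reversed(range(t_mid)):
        x = denoise_step(x, t)
    return x
```

With t_mid = 300 out of T = 1000, sampling needs roughly 3.3× fewer denoiser calls, at the cost of starting from a partially-noised condition rather than the full prior.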
