Let torchaudio.load() and torchaudio.save() rely on load_with_torchcodec() and save_with_torchcodec(). #4039


Merged on Aug 18, 2025, with 30 commits:
2e25279 Add torchcodec mock with wav loading and saving (samanklesaria, Jul 18, 2025)
fe375f4 Merge branch 'main' into test_wav_hack (NicolasHug, Jul 28, 2025)
a300221 Let load and save rely on *_with_torchcodec (NicolasHug, Jul 16, 2025)
07e3b77 install torchcodec in doc job (NicolasHug, Jul 16, 2025)
92719d3 Add docstring and arguments for load and save (samanklesaria, Aug 12, 2025)
4a98ee5 Revise docstring (samanklesaria, Aug 13, 2025)
7b02754 Add typing imports (samanklesaria, Aug 13, 2025)
74edc0a Try ffmpeg>4 (samanklesaria, Aug 13, 2025)
80f5eb7 Install conda deps before pip deps (samanklesaria, Aug 13, 2025)
7f063a6 Add scipy hack for load and save (samanklesaria, Aug 13, 2025)
700c6c9 Only import scipy during testing (samanklesaria, Aug 13, 2025)
6995b21 Revert "Install conda deps before pip deps" (samanklesaria, Aug 13, 2025)
4ab5993 Revert "Try ffmpeg>4" (samanklesaria, Aug 13, 2025)
43c4602 Revert torchcodec installation changes (samanklesaria, Aug 13, 2025)
f74f004 Use existing wav_utils (samanklesaria, Aug 13, 2025)
953fc65 Support frame_offset and num_frames in load hack (samanklesaria, Aug 13, 2025)
dd3ff90 Use rand instead of randn for test_save_channels_first (samanklesaria, Aug 14, 2025)
72539b9 Merge branch 'test_wav_hack' into torchcodec_loading (samanklesaria, Aug 14, 2025)
c94e011 Remove pytest-aware code in src (samanklesaria, Aug 14, 2025)
b622d82 Remove torchcodec version check (samanklesaria, Aug 14, 2025)
93351a2 Fix bugs in torchcodec mock (samanklesaria, Aug 14, 2025)
5407163 Skip test_load_save_torchcodec (samanklesaria, Aug 14, 2025)
bd7eb52 Correct call to pytest skip (samanklesaria, Aug 14, 2025)
c3d0cc2 Remove torchcodec installation (samanklesaria, Aug 14, 2025)
d10fc19 Add torchcodec to build installation (samanklesaria, Aug 15, 2025)
92fee51 Remove redundant wav_utils (samanklesaria, Aug 15, 2025)
cc37073 Merge branch 'main' of github.com:pytorch/audio into torchcodec_loading (NicolasHug, Aug 18, 2025)
2646e59 remove sys (NicolasHug, Aug 18, 2025)
6c43c04 Add comments (NicolasHug, Aug 18, 2025)
498ce49 clarify comment (NicolasHug, Aug 18, 2025)
3 changes: 1 addition & 2 deletions .github/scripts/unittest-linux/install.sh
@@ -40,7 +40,7 @@ case $GPU_ARCH_TYPE in
;;
esac
PYTORCH_WHEEL_INDEX="https://download.pytorch.org/whl/${UPLOAD_CHANNEL}/${GPU_ARCH_ID}"
pip install --progress-bar=off --pre torch torchcodec --index-url="${PYTORCH_WHEEL_INDEX}"
pip install --progress-bar=off --pre torch --index-url="${PYTORCH_WHEEL_INDEX}"


# 2. Install torchaudio
@@ -54,6 +54,5 @@ pip install . -v --no-build-isolation
printf "* Installing test tools\n"
# On this CI, for whatever reason, we're only able to install ffmpeg 4.
conda install -y "ffmpeg<5"
python -c "import torch; import torchaudio; import torchcodec; print(torch.__version__, torchaudio.__version__, torchcodec.__version__)"

pip3 install parameterized requests coverage pytest pytest-cov scipy numpy expecttest
2 changes: 1 addition & 1 deletion .github/workflows/build_docs.yml
@@ -68,7 +68,7 @@ jobs:

GPU_ARCH_ID=cu126 # This is hard-coded and must be consistent with gpu-arch-version.
PYTORCH_WHEEL_INDEX="https://download.pytorch.org/whl/${CHANNEL}/${GPU_ARCH_ID}"
pip install --progress-bar=off --pre torch --index-url="${PYTORCH_WHEEL_INDEX}"
pip install --progress-bar=off --pre torch torchcodec --index-url="${PYTORCH_WHEEL_INDEX}"

echo "::endgroup::"
echo "::group::Install TorchAudio"
171 changes: 169 additions & 2 deletions src/torchaudio/__init__.py
@@ -1,4 +1,7 @@
from torchaudio._internal.module_utils import dropping_io_support, dropping_class_io_support
from typing import Union, BinaryIO, Optional, Tuple
import os
import torch

# Initialize extension and backend first
from . import _extension # noqa # usort: skip
@@ -7,8 +10,6 @@
    get_audio_backend as _get_audio_backend,
    info as _info,
    list_audio_backends as _list_audio_backends,
    load,
    save,
    set_audio_backend as _set_audio_backend,
)
from ._torchcodec import load_with_torchcodec, save_with_torchcodec
@@ -41,6 +42,172 @@
pass


def load(
    uri: Union[BinaryIO, str, os.PathLike],
    frame_offset: int = 0,
    num_frames: int = -1,
    normalize: bool = True,
    channels_first: bool = True,
    format: Optional[str] = None,
    buffer_size: int = 4096,
    backend: Optional[str] = None,
) -> Tuple[torch.Tensor, int]:
    """Load audio data from source using TorchCodec's AudioDecoder.

    .. note::

        As of TorchAudio 2.9, this function relies on TorchCodec's decoding
        capabilities under the hood. It is provided for convenience, but we
        recommend that you port your code to natively use ``torchcodec``'s
        ``AudioDecoder`` class for better performance:
        https://docs.pytorch.org/torchcodec/stable/generated/torchcodec.decoders.AudioDecoder.
        Because of the reliance on TorchCodec, the parameters ``normalize``,
        ``buffer_size``, and ``backend`` are ignored and accepted only for
        backwards compatibility.


    Args:
        uri (path-like object or file-like object):
            Source of audio data. The following types are accepted:

            * ``path-like``: File path or URL.
            * ``file-like``: Object with ``read(size: int) -> bytes`` method.

        frame_offset (int, optional):
            Number of samples to skip before reading starts.
        num_frames (int, optional):
            Maximum number of samples to read. ``-1`` reads all the remaining samples,
            starting from ``frame_offset``.
        normalize (bool, optional):
            TorchCodec always returns normalized float32 samples. This parameter
            is ignored and a warning is issued if set to ``False``.
            Default: ``True``.
        channels_first (bool, optional):
            When ``True``, the returned Tensor has dimension ``[channel, time]``.
            Otherwise, the returned Tensor's dimension is ``[time, channel]``.
        format (str or None, optional):
            Format hint for the decoder. May not be supported by all TorchCodec
            decoders. (Default: ``None``)
        buffer_size (int, optional):
            Not used by TorchCodec AudioDecoder. Provided for API compatibility.
        backend (str or None, optional):
            Not used by TorchCodec AudioDecoder. Provided for API compatibility.

    Returns:
        (torch.Tensor, int): Resulting Tensor and sample rate.
        Always returns float32 tensors. If ``channels_first=True``, shape is
        ``[channel, time]``, otherwise ``[time, channel]``.

    Raises:
        ImportError: If torchcodec is not available.
        ValueError: If unsupported parameters are used.
        RuntimeError: If TorchCodec fails to decode the audio.

    Note:
        - TorchCodec always returns normalized float32 samples, so the ``normalize``
          parameter has no effect.
        - The ``buffer_size`` and ``backend`` parameters are ignored.
        - Not all audio formats supported by torchaudio backends may be supported
          by TorchCodec.
    """
    return load_with_torchcodec(
        uri,
        frame_offset=frame_offset,
        num_frames=num_frames,
        normalize=normalize,
        channels_first=channels_first,
        format=format,
        buffer_size=buffer_size,
        backend=backend,
    )

def save(
    uri: Union[str, os.PathLike],
    src: torch.Tensor,
    sample_rate: int,
    channels_first: bool = True,
    format: Optional[str] = None,
    encoding: Optional[str] = None,
    bits_per_sample: Optional[int] = None,
    buffer_size: int = 4096,
    backend: Optional[str] = None,
    compression: Optional[Union[float, int]] = None,
) -> None:
    """Save audio data to file using TorchCodec's AudioEncoder.

    .. note::

        As of TorchAudio 2.9, this function relies on TorchCodec's encoding
        capabilities under the hood. It is provided for convenience, but we
        recommend that you port your code to natively use ``torchcodec``'s
        ``AudioEncoder`` class for better performance:
        https://docs.pytorch.org/torchcodec/stable/generated/torchcodec.encoders.AudioEncoder.
        Because of the reliance on TorchCodec, the parameters ``format``, ``encoding``,
        ``bits_per_sample``, ``buffer_size``, and ``backend`` are ignored and accepted
        only for backwards compatibility.

    Args:
        uri (path-like object):
            Path to save the audio file. The file extension determines the format.

        src (torch.Tensor):
            Audio data to save. Must be a 1D or 2D tensor with float32 values
            in the range [-1, 1]. If 2D, shape should be ``[channel, time]`` when
            ``channels_first=True``, or ``[time, channel]`` when ``channels_first=False``.

        sample_rate (int):
            Sample rate of the audio data.

        channels_first (bool, optional):
            Indicates whether the input tensor has channels as the first dimension.
            If ``True``, expects ``[channel, time]``. If ``False``, expects ``[time, channel]``.
            Default: ``True``.

        format (str or None, optional):
            Audio format hint. Not used by TorchCodec (format is determined by
            file extension). A warning is issued if provided.
            Default: ``None``.

        encoding (str or None, optional):
            Audio encoding. Not fully supported by TorchCodec AudioEncoder.
            A warning is issued if provided. Default: ``None``.

        bits_per_sample (int or None, optional):
            Bits per sample. Not directly supported by TorchCodec AudioEncoder.
            A warning is issued if provided. Default: ``None``.

        buffer_size (int, optional):
            Not used by TorchCodec AudioEncoder. Provided for API compatibility.
            A warning is issued if a non-default value is passed. Default: ``4096``.

        backend (str or None, optional):
            Not used by TorchCodec AudioEncoder. Provided for API compatibility.
            A warning is issued if provided. Default: ``None``.

        compression (float, int or None, optional):
            Compression level or bit rate. Maps to the ``bit_rate`` parameter in
            TorchCodec AudioEncoder. Default: ``None``.

    Raises:
        ImportError: If torchcodec is not available.
        ValueError: If input parameters are invalid.
        RuntimeError: If TorchCodec fails to encode the audio.

    Note:
        - TorchCodec AudioEncoder expects float32 samples in the [-1, 1] range.
        - Some parameters (``format``, ``encoding``, ``bits_per_sample``, ``buffer_size``,
          ``backend``) are not used by TorchCodec but are provided for API compatibility.
        - The output format is determined by the file extension in the ``uri``.
        - TorchCodec uses FFmpeg under the hood for encoding.
    """
    return save_with_torchcodec(
        uri,
        src,
        sample_rate,
        channels_first=channels_first,
        format=format,
        encoding=encoding,
        bits_per_sample=bits_per_sample,
        buffer_size=buffer_size,
        backend=backend,
        compression=compression,
    )

__all__ = [
"AudioMetaData",
"load",
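The `load()` docstring in this file describes how `frame_offset`, `num_frames`, and `channels_first` shape the result, and states that `normalize`, `buffer_size`, and `backend` are accepted but ignored. Below is a minimal pure-Python sketch of those documented semantics. `window_frames` is a hypothetical helper that works on nested lists; it is not part of torchaudio or TorchCodec. The body of `load_with_torchcodec()` is not shown in this diff, so the warning and the argument validation here are assumptions based only on the docstring.

```python
import warnings
from typing import List, Optional


def window_frames(
    frames: List[List[float]],
    frame_offset: int = 0,
    num_frames: int = -1,
    normalize: bool = True,
    channels_first: bool = True,
    buffer_size: int = 4096,
    backend: Optional[str] = None,
) -> List[List[float]]:
    """Hypothetical stand-in mimicking the documented load() semantics.

    `frames` is laid out as [time][channel].
    """
    # Per the docstring, normalize=False is ignored and triggers a warning.
    if not normalize:
        warnings.warn("normalize=False is ignored", UserWarning)
    # buffer_size and backend are accepted for compatibility and never used.

    # Assumption of this sketch: reject values the docstring does not describe.
    if frame_offset < 0 or num_frames < -1:
        raise ValueError("frame_offset must be >= 0 and num_frames >= -1")

    # Skip frame_offset frames, then keep at most num_frames (-1 keeps the rest).
    end = None if num_frames == -1 else frame_offset + num_frames
    window = frames[frame_offset:end]

    if not channels_first:
        return window  # already [time][channel]
    # Transpose to [channel][time].
    return [list(channel) for channel in zip(*window)]


stereo = [[0.0, 0.5], [0.1, 0.6], [0.2, 0.7], [0.3, 0.8]]  # 4 frames, 2 channels
print(window_frames(stereo, frame_offset=1, num_frames=2))
# [[0.1, 0.2], [0.6, 0.7]]
```

Keeping the unused parameters in the signature is what lets existing call sites keep working unchanged, at the cost of arguments that silently do nothing.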
23 changes: 23 additions & 0 deletions test/conftest.py
@@ -0,0 +1,23 @@
import sys
from pathlib import Path

# Note: [TorchCodec test dependency mocking hack]
# We are adding the `test/` directory to the system path. This causes the
# `test/torchcodec` folder to be importable, and in particular, this makes it
# possible to mock torchcodec utilities. E.g. executing:
#
# ```
# from torchcodec.decoders import AudioDecoder
# ```
# directly or indirectly when running the tests will effectively be loading the
# mocked `AudioDecoder` implemented in `test/torchcodec/decoders.py`, which
# relies on scipy instead of relying on torchcodec.
#
# So whenever `torchaudio.load()` is called from within the tests, it's the
# mocked scipy `AudioDecoder` that gets used. Ultimately, this allows us *not*
# to add torchcodec as a test dependency of torchaudio: we can just rely on
# scipy.
#
# This is VERY hacky and ideally we should implement a more robust way to mock
# torchcodec.
sys.path.append(str(Path(__file__).parent.resolve()))
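The note in `conftest.py` relies on Python's import system resolving a package name to the first match it finds on `sys.path`. The sketch below reproduces that mechanism in isolation with a throwaway package. The names `fakecodec_demo` and the `backend` attribute are hypothetical and exist only for illustration; this is not the mock shipped in `test/torchcodec`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway directory that plays the role of `test/`.
fake_root = Path(tempfile.mkdtemp())
pkg_dir = fake_root / "fakecodec_demo"  # hypothetical package name
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "decoders.py").write_text(
    "class AudioDecoder:\n"
    "    backend = 'mock'\n"
)

# Same move as conftest.py: append the directory so the package is importable.
sys.path.append(str(fake_root))
importlib.invalidate_caches()

from fakecodec_demo.decoders import AudioDecoder

print(AudioDecoder.backend)  # mock
```

Because `sys.path.append` places the directory last, an installed package with the same name would normally be found first. The approach therefore depends on the real package being absent, which fits this PR dropping torchcodec from the test installation.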