
Conversation

@scotts (Contributor) commented Dec 11, 2025

First draft of the transform tutorial. Things to consider:

  1. The order in which things are presented is, to me, a natural teaching order. But the actual feature we're demonstrating only appears a quarter of the way down the page! Does that seem okay when read, or should we try to find a way to pull the transforms usage in VideoDecoder up higher?
  2. I just copied the guarantees that are part of the DecoderTransform docstring because I feel that information is critical, and I didn't see a point in trying to rephrase it.
  3. The second "Note," after those guarantees, could be confusing. I think we have to say something on this, but we've made it difficult to talk about because we accept TorchVision transform objects. In writing, it's hard to distinguish between accepting TorchVision transform objects and applying them without getting super wordy. Let me know if you find it potentially confusing; see the sketch just below this list.
  4. I am underwhelmed by the lack of a demo for memory efficiency, but I don't have a way around it. And I think it needs to be said.
  5. The runtime guidance is subtle. Too subtle?

Once we align on 4 and 5, we should also update the performance tutorial. I think that should be a separate PR.
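To make point 3 concrete, here is a minimal sketch of the distinction: the decoder accepts TorchVision transform objects as specifications, but does not apply them itself. The `transforms` parameter name and the file path are assumptions drawn from the tutorial text quoted later in this thread, not a confirmed API reference.

# Hedged sketch, not the authoritative API: the `transforms` parameter name
# and the input path are assumed for illustration.
from torchcodec.decoders import VideoDecoder
from torchvision.transforms import v2

decoder = VideoDecoder(
    "video.mp4",  # hypothetical input file
    # The v2.Resize object is accepted as a *specification*: the decoder
    # applies an equivalent FFmpeg filter during decoding rather than
    # calling the TorchVision transform on already-decoded frames.
    transforms=[v2.Resize(size=(480, 640))],
)
frame = decoder[0]  # the frame is already resized when it reaches Python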

meta-cla bot added the "CLA Signed" label (Dec 11, 2025)
scotts marked this pull request as ready for review (December 11, 2025 04:50)
Comment on lines 203 to 204
v2.Resize(size=(480, 640)),
v2.CenterCrop(size=(315, 220))
Contributor:

It usually makes more sense to first crop and then resize, because resize will then work on a smaller surface.
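A sketch of the suggested ordering, using the same sizes as the diff; cropping first means the resize operates on the smaller cropped surface instead of the full frame:

from torchvision.transforms import v2

# Crop first, then resize: the resize now works on far fewer pixels.
transforms = [
    v2.CenterCrop(size=(315, 220)),
    v2.Resize(size=(480, 640)),
]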

Contributor Author:

Indeed it does, and curiously, it actually makes decoder transforms faster than the TorchVision version now (at least on my dev machine).

Results with the old way:

0:
decoder transforms:    times_med = 1474.17ms +- 79.85
torchvision transform: times_med = 4683.55ms +- 28.71

1:
decoder transforms:    times_med = 18486.50ms +- 165.66
torchvision transform: times_med = 16066.02ms +- 164.19

Results with the new way:

0:
decoder transforms:    times_med = 1352.46ms +- 34.86
torchvision transform: times_med = 4077.44ms +- 45.63

1:
decoder transforms:    times_med = 14771.99ms +- 148.83
torchvision transform: times_med = 16112.88ms +- 62.15
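For context, the numbers above follow the tutorial's `bench` helper output format. A minimal stand-in harness in that shape could look like this (an assumed sketch, not the tutorial's actual helper):

import statistics
import timeit

def bench(fn, num_runs=5):
    # Time fn several times and report the median +- stdev, in milliseconds.
    times = [timeit.timeit(fn, number=1) * 1000 for _ in range(num_runs)]
    return f"times_med = {statistics.median(times):.2f}ms +- {statistics.stdev(times):.2f}"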

# particularly when applying transforms that reduce the size of a frame, such
# as resize and crop. Because the transforms are applied during decoding, the
# full frame is never returned to the Python layer. As a result, there is
# significantly less pressure on the Python garbage collector.
Contributor:

I think there's another core reason why that's more memory efficient: the decompressed RGB frame is never materialized in its original resolution.

Without decoder-native transform we have:

YUV compressed frame in original res -> RGB decompressed frame in original res -> RGB decompressed frame in final (smaller) res

With the decoder-native transform we have:

YUV compressed frame in original res -> RGB decompressed frame in final (smaller) res

i.e. we can skip the "RGB decompressed frame in original res" materialization, which is the most memory-expensive bit.

The reduced pressure on the garbage collector is a consequence of that.
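A back-of-envelope calculation, with assumed example resolutions, of why that full-resolution RGB buffer is the expensive bit:

# Assumed example resolutions for illustration; uint8 RGB, 3 bytes per pixel.
full_res_bytes = 1920 * 1080 * 3   # ~6.2 MB: the full-res RGB intermediate
final_res_bytes = 480 * 640 * 3    # ~0.9 MB: the final resized frame
print(f"full res: {full_res_bytes / 1e6:.1f} MB, final: {final_res_bytes / 1e6:.1f} MB")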

Contributor Author (@scotts), Dec 12, 2025:

That's not entirely accurate - we definitely never get the "RGB decompressed frame in original res" in the Python layer, but it does exist inside FFmpeg. This is because we ensure that the FFmpeg filters get applied in the output color space. So without decoder transforms we have (parentheses indicate where each step happens: TorchCodec (TC) or TorchVision (TV)):

YUV compressed, original res (TC) -> 
RGB decompressed, original res (TC) -> 
RGB decompressed, smaller res (TV)

With decoder transforms it's:

YUV compressed, original res (TC) -> 
RGB decompressed, original res (TC) -> 
RGB decompressed, smaller res (TC)

So we really do go through the same steps in decoder transforms. That middle step - getting the RGB image in the original resolution - is because of this line:

filters_ = "format=rgb24," + filters.str();

Eliminating the explicit "format=rgb24" does improve performance a lot, but at the cost of similarity with using TorchVision transforms on full frames.

Since the filtergraph inputs and outputs are known statically, I suspect they're able to optimize things and reuse memory. That is, it's possible for them to allocate exactly the memory they need for each step and reuse it every time. But I don't know whether that's the case. I'll try to say something about all this.
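To illustrate the static construction, here is a hypothetical Python rendering of the C++ line above; the crop offsets and exact filter arguments TorchCodec generates are assumptions:

# FFmpeg's crop and scale filters take w:h(:x:y) arguments; the offsets
# here are made up for illustration.
user_filters = "crop=220:315:210:82,scale=640:480"
filtergraph = "format=rgb24," + user_filters
print(filtergraph)  # format=rgb24,crop=220:315:210:82,scale=640:480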

@Dan-Flores (Contributor):

Let's add a grid card item so this tutorial appears on the "Home" tab / index page:

.. grid-item-card:: :octicon:`file-code;1em` Performance Tips
   :img-top: _static/img/card-background.svg
   :link: generated_examples/decoding/performance_tips.html
   :link-type: url

   Tips for optimizing video decoding performance

the :class:`~torchcodec.decoders.VideoDecoder` class. This parameter allows us
to specify a list of :class:`torchcodec.transforms.DecoderTransform` or
:class:`torchvision.transforms.v2.Transform` objects. These objects serve as
transform specificiations that the :class:`~torchcodec.decoders.VideoDecoder`
Contributor:

nit: specifications

Contributor Author:

I should probably start asking Claude to do spell check on the comments. 🤔

"""

# %%
# First, a bit of boilerplate and definitions that we will use later:
Contributor:

Regarding point 1 in the PR description about the demonstration starting a quarter of the way down the page - we have a pattern of adding a link to skip past the boilerplate section, which might help this gap feel smaller:

# %%
# First, a bit of boilerplate: we'll download a video from the web, and define a
# plotting utility. You can ignore that part and jump right below to
# :ref:`sampling_tuto_start`.

print(f"torchvision transform: {bench(sample_torchvision_transforms, num_threads=1)}")

# %%
# In brief, our performance guidance is:
Contributor (@mollyxu):

Would it be worth mentioning decoder-native transforms in the performance tips docs?

Contributor Author:

@mollyxu, yes, absolutely. I'd like to do that in a follow-up PR.
