Add performance tips tutorial #1065
@@ -0,0 +1,175 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""
====================================
Performance Tips and Best Practices
====================================

This tutorial consolidates performance optimization techniques for video
decoding with TorchCodec. Learn when and how to apply various strategies
to increase performance.
"""

# %%
# Overview
# --------
#
# When decoding videos with TorchCodec, several techniques can significantly
# improve performance depending on your use case. This guide covers:
#
# 1. **Batch APIs** - Decode multiple frames at once
# 2. **Approximate Mode & Keyframe Mappings** - Trade accuracy for speed
# 3. **Multi-threading** - Parallelize decoding across videos or chunks
# 4. **CUDA Acceleration** - Use GPU decoding for supported formats
#
# We'll explore each technique and when to use it.

# %%
# 1. Use Batch APIs When Possible
# --------------------------------
#
# If you need to decode multiple frames at once, the batch methods are faster than calling single-frame decoding methods multiple times.
# For example, :meth:`~torchcodec.decoders.VideoDecoder.get_frames_at` is faster than calling :meth:`~torchcodec.decoders.VideoDecoder.get_frame_at` multiple times.
# TorchCodec's batch APIs reduce overhead and can leverage internal optimizations.
#
# **Key Methods:**
#
# For index-based frame retrieval:
#
# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_at` for specific indices
# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_in_range` for ranges
#
# For timestamp-based frame retrieval:
#
# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_played_at` for timestamps
# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_played_in_range` for time ranges
NicolasHug marked this conversation as resolved.
#
# **When to use:**
#
# - Decoding multiple frames
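To make the batch-versus-loop comparison concrete, here is a minimal sketch. It assumes TorchCodec is installed; `video.mp4` is a placeholder path, and `evenly_spaced_indices` is a hypothetical helper introduced only for this illustration (it is not part of TorchCodec).

```python
def evenly_spaced_indices(num_frames: int, num_samples: int) -> list[int]:
    """Hypothetical helper: pick up to `num_samples` indices spread over the video."""
    if num_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, num_frames)
    return [(i * num_frames) // num_samples for i in range(num_samples)]


def decode_one_by_one(path: str, num_samples: int):
    """Slower pattern: one get_frame_at() call per index."""
    # Imported lazily so the helper above can be used without TorchCodec.
    from torchcodec.decoders import VideoDecoder

    decoder = VideoDecoder(path)
    indices = evenly_spaced_indices(len(decoder), num_samples)
    return [decoder.get_frame_at(i).data for i in indices]


def decode_as_batch(path: str, num_samples: int):
    """Preferred pattern: a single get_frames_at() call for all indices."""
    from torchcodec.decoders import VideoDecoder

    decoder = VideoDecoder(path)
    indices = evenly_spaced_indices(len(decoder), num_samples)
    # The returned FrameBatch holds all requested frames in one tensor.
    return decoder.get_frames_at(indices=indices).data


# Example usage (placeholder path, requires a real video file):
# frames = decode_as_batch("video.mp4", num_samples=8)
```

Both functions decode the same frames; only the number of calls into the decoder differs, which is where the batch API is expected to save overhead.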

# %%
# .. note::
#
#     For complete examples with runnable code demonstrating batch decoding,
#     iteration, and frame retrieval, see:
#
#     - :ref:`sphx_glr_generated_examples_decoding_basic_example.py`
- #     iteration, and frame retrieval, see:
- #
- #     - :ref:`sphx_glr_generated_examples_decoding_basic_example.py`
+ #     iteration, and frame retrieval, see :ref:`sphx_glr_generated_examples_decoding_basic_example.py`
I'm not sure "random access" is really relevant here. For perf, this is less about frame access than it is about decoder initialization
- # **Performance impact:** Enables consistent, predictable performance for repeated
- # random access without the overhead of exact mode's scanning.
+ # **Performance impact:** speeds up decoder instantiation, similarly to ``seek_mode="approximate"``.
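A rough way to check the instantiation claim on a given file is to time decoder creation in both seek modes. This is only a sketch: it assumes TorchCodec is installed, `time_call` and `compare_instantiation` are hypothetical helpers written for this illustration, and the results will depend on the video and the machine.

```python
import time


def time_call(fn, repeats: int = 3) -> float:
    """Return the best wall-clock time (in seconds) over `repeats` calls of `fn`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def compare_instantiation(path: str) -> dict[str, float]:
    """Time VideoDecoder creation with seek_mode="exact" vs. "approximate"."""
    # Imported lazily; requires TorchCodec to be installed.
    from torchcodec.decoders import VideoDecoder

    return {
        mode: time_call(lambda: VideoDecoder(path, seek_mode=mode))
        for mode in ("exact", "approximate")
    }


# Example usage (placeholder path, requires a real video file):
# print(compare_instantiation("video.mp4"))
```

Only decoder construction is timed here, not frame retrieval, which matches the point of the review comment above.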
- # - **FFmpeg-based parallelism** - Using FFmpeg's internal threading capabilities for intra-frame parallelism, where parallelization happens within individual frames rather than across frames
+ # - **FFmpeg-based parallelism** - Using FFmpeg's internal threading capabilities for intra-frame parallelism, where parallelization happens within individual frames rather than across frames. For that, use the ``num_ffmpeg_threads`` parameter of :class:`~torchcodec.decoders.VideoDecoder`.
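The two kinds of parallelism can be combined in a short sketch: split the frame range into chunks, decode each chunk in its own thread, and set `num_ffmpeg_threads` per decoder. This assumes TorchCodec is installed; `chunk_ranges`, `decode_range`, and `decode_in_parallel` are hypothetical helpers for illustration, and creating one decoder per worker is a conservative choice made here rather than something the PR prescribes.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(num_frames: int, num_chunks: int) -> list[tuple[int, int]]:
    """Hypothetical helper: split [0, num_frames) into contiguous (start, stop) ranges."""
    if num_frames <= 0 or num_chunks <= 0:
        return []
    num_chunks = min(num_chunks, num_frames)
    bounds = [(i * num_frames) // num_chunks for i in range(num_chunks + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num_chunks)]


def decode_range(path: str, start: int, stop: int, ffmpeg_threads: int = 1):
    """Decode frames [start, stop) with a decoder owned by the calling thread."""
    # Imported lazily; requires TorchCodec to be installed.
    from torchcodec.decoders import VideoDecoder

    decoder = VideoDecoder(path, num_ffmpeg_threads=ffmpeg_threads)
    return decoder.get_frames_in_range(start=start, stop=stop).data


def decode_in_parallel(path: str, num_workers: int = 4):
    """Decode the whole video as chunks spread across a thread pool."""
    from torchcodec.decoders import VideoDecoder

    num_frames = len(VideoDecoder(path))
    ranges = chunk_ranges(num_frames, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(lambda r: decode_range(path, *r), ranges))


# Example usage (placeholder path, requires a real video file):
# chunks = decode_in_parallel("video.mp4", num_workers=4)
```

Keeping `ffmpeg_threads=1` inside each worker avoids oversubscribing the CPU when several chunks are decoded at once; raising it is more likely to help when a single decoder is used.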
mollyxu marked this conversation as resolved.
Let's remove these two. The first one isn't very precise, and the second one is redundant with another entry just above.
- # - GPU-intensive pipelines with transforms like scaling and cropping
- # - CPU is saturated and you want to free it up for other work
- # - You need bit-exact results
+ # - You need bit-exact results with CPU decoding
NicolasHug marked this conversation as resolved.
The "transforms" stuff is slightly misleading and might become even more misleading soon when we actually release native transforms - but they'll be CPU-only for a bit.
- # **Performance impact:** CUDA decoding can significantly outperform CPU decoding,
- # especially for high-resolution videos and when combined with GPU-based transforms.
+ # **Performance impact:** CUDA decoding can significantly outperform CPU decoding,
+ # especially for high-resolution videos and when decoding a lot of frames.
NicolasHug marked this conversation as resolved.
Let's move this section above; it should be the first one we see (before the "when to use" section). Let's also make it slightly more obvious:
- # %%
- # **Recommended Usage for Beta Interface**
- # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- #
- # .. code-block:: python
- #
- #     with set_cuda_backend("beta"):
- #         decoder = VideoDecoder("file.mp4", device="cuda")
+ # %%
+ # **Recommended: use the Beta Interface!!**
+ # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ #
+ # We recommend you use the new "beta" CUDA interface which is significantly faster than the previous one, and supports the same features:
+ #
+ # .. code-block:: python
+ #
+ #     with set_cuda_backend("beta"):
+ #         decoder = VideoDecoder("file.mp4", device="cuda")
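For readers who want the snippet above as something they can paste, here is a slightly fuller sketch with the imports spelled out and a CPU fallback. It assumes TorchCodec and PyTorch are installed and that `set_cuda_backend` is importable from `torchcodec.decoders`; `pick_device` and `make_decoder` are hypothetical helpers for illustration.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def make_decoder(path: str):
    """Create a decoder on the GPU via the beta backend, or fall back to CPU."""
    # Imported lazily; the import location of set_cuda_backend is an assumption.
    from torchcodec.decoders import VideoDecoder, set_cuda_backend

    if pick_device() == "cuda":
        # The decoder is created inside the context manager, as in the snippet above.
        with set_cuda_backend("beta"):
            return VideoDecoder(path, device="cuda")
    return VideoDecoder(path, device="cpu")


# Example usage (placeholder path, requires a real video file):
# decoder = make_decoder("file.mp4")
# frame = decoder.get_frame_at(0)
```

The fallback keeps the same calling code working on machines without a GPU, at the cost of CPU decoding speed.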
A more descriptive title would probably help with discoverability: TorchCodec Performance Tips and Best Practices.
Adding a meta directive at the top should help as well. Something like: