Commit 8669861

Merge branch 'main' of https://github.com/meta-pytorch/torchcodec into python314_on_ci
2 parents 7ecd09d + cb82662 commit 8669861

6 files changed, +66 -22 lines changed

.github/workflows/build_ffmpeg.yaml

Lines changed: 27 additions & 0 deletions
@@ -48,6 +48,33 @@ jobs:
         mkdir -p "${artifact_dir}"
         mv ffmpeg.tar.gz "${artifact_dir}/${FFMPEG_VERSION}.tar.gz"

+  LGPL-Linux-aarch64:
+    strategy:
+      fail-fast: false
+      matrix:
+        ffmpeg-version: ["4.4.4", "5.1.4", "6.1.1", "7.0.1", "8.0"]
+    uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
+    permissions:
+      id-token: write
+      contents: read
+    with:
+      job-name: Build
+      upload-artifact: ffmpeg-lgpl-linux_aarch64-${{ matrix.ffmpeg-version }}
+      repository: meta-pytorch/torchcodec
+      runner: linux.arm64.2xlarge
+      docker-image: pytorch/manylinux2_28_aarch64-builder:cpu-aarch64
+      script: |
+        export FFMPEG_VERSION="${{ matrix.ffmpeg-version }}"
+        export FFMPEG_ROOT="${PWD}/ffmpeg"
+
+        packaging/build_ffmpeg.sh
+
+        tar -cf ffmpeg.tar.gz ffmpeg/include ffmpeg/lib
+
+        artifact_dir="${RUNNER_ARTIFACT_DIR}/$(date +%Y-%m-%d)/linux_aarch64"
+        mkdir -p "${artifact_dir}"
+        mv ffmpeg.tar.gz "${artifact_dir}/${FFMPEG_VERSION}.tar.gz"
+
   LGPL-macOS:
     strategy:
       fail-fast: false

docs/requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ sphinx==5.0.0
 sphinx_design
 sphinx_copybutton
 sphinx-tabs
+sphinx-sitemap
 matplotlib
 torchvision
 ipython

docs/source/conf.py

Lines changed: 10 additions & 0 deletions
@@ -55,6 +55,7 @@
     "sphinx_tabs.tabs",
     "sphinx_design",
     "sphinx_copybutton",
+    "sphinx_sitemap",
 ]


@@ -216,6 +217,15 @@ def __call__(self, filename):
     "matplotlib": ("https://matplotlib.org/stable/", None),
 }

+# sitemap config
+html_baseurl = "https://meta-pytorch.org/torchcodec/stable/"
+sitemap_locales = [None]
+sitemap_excludes = [
+    "search.html",
+    "genindex.html",
+]
+sitemap_url_scheme = "{link}"
+

 def inject_minigalleries(app, what, name, obj, options, lines):
     """Inject a minigallery into a docstring.
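
Review note: the sketch below illustrates, under assumptions, how the sitemap settings in this hunk are likely to combine into final URLs. It assumes the extension builds each entry as `html_baseurl` followed by the formatted `sitemap_url_scheme`; the helper `sitemap_urls` and the page names are hypothetical and are not part of the commit or of sphinx-sitemap.

```python
# Values copied from the conf.py hunk.
html_baseurl = "https://meta-pytorch.org/torchcodec/stable/"
sitemap_url_scheme = "{link}"
sitemap_excludes = ["search.html", "genindex.html"]


def sitemap_urls(pages, lang="en/", version="latest/"):
    """Hypothetical helper: approximate the URLs a sitemap would list."""
    urls = []
    for link in pages:
        if link in sitemap_excludes:
            continue  # excluded pages are left out of the sitemap
        # With the "{link}" scheme, the lang and version parts are unused.
        path = sitemap_url_scheme.format(lang=lang, version=version, link=link)
        urls.append(html_baseurl + path)
    return urls


# Hypothetical page names, for illustration only.
urls = sitemap_urls(["index.html", "search.html", "install.html"])
print(urls)
```

Since `html_baseurl` already ends in `stable/`, using the `"{link}"` scheme keeps language and version segments from being added a second time.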

examples/decoding/approximate_mode.py

Lines changed: 10 additions & 9 deletions
@@ -66,7 +66,7 @@
 # Performance: ``VideoDecoder`` creation
 # --------------------------------------
 #
-# In terms of performance, the ``seek_mode`` parameter ultimately affects the
+# In terms of performance, the ``seek_mode`` parameter mainly affects the
 # **creation** of a :class:`~torchcodec.decoders.VideoDecoder` object. The
 # longer the video, the higher the performance gain.

@@ -104,7 +104,7 @@ def bench(f, average_over=50, warmup=2, **f_kwargs):
 # ---------------------------------------------
 #
 # Strictly speaking the ``seek_mode`` parameter only affects the performance of
-# the :class:`~torchcodec.decoders.VideoDecoder` creation. It does not have a
+# the :class:`~torchcodec.decoders.VideoDecoder` creation. It usually does not have a
 # direct effect on the performance of frame decoding or sampling. **However**,
 # because frame decoding and sampling patterns typically involve the creation of
 # the :class:`~torchcodec.decoders.VideoDecoder` (one per video), ``seek_mode``
@@ -168,20 +168,21 @@ def sample_clips(seek_mode):
 # duration), and also builds an internal index of frames and key-frames. This
 # internal index is potentially more accurate than the one in the file's
 # headers, which leads to more accurate seeking behavior.
-# Without the scan, TorchCodec relies only on the metadata contained in the
-# file, which may not always be as accurate.
+# Without the scan (in approximate mode), TorchCodec relies only on the metadata
+# contained in the file, which may not always be as accurate. In some rare
+# cases, relying on this less accurate data may also lead to slower frame
+# decoding, because it can involve unnecessary seeks.
 #
 # Which mode should I use?
 # ------------------------
 #
 # The general rule of thumb is as follows:
 #
 # - If you really care about exactness of frame seeking, use "exact".
-# - If you can sacrifice exactness of seeking for speed, which is usually the
-#   case when doing clip sampling, use "approximate".
-# - If your videos don't have variable framerate and their metadata is correct,
-#   then "approximate" mode is a net win: it will be just as accurate as the
-#   "exact" mode while still being significantly faster.
+# - If your videos are short (less than a few minutes) then "exact" will usually
+#   be preferable, as the scan's fixed cost will be negligible.
+# - For long videos, if you can sacrifice exactness of seeking for speed, which
+#   is usually the case when doing clip sampling, consider using "approximate".

 # %%
 shutil.rmtree(temp_dir)
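
Review note: the revised rule of thumb can be read as a small decision function. The sketch below is one possible encoding, not something this commit adds. The helper `choose_seek_mode` and the 180-second cutoff are assumptions; the docs only say "a few minutes".

```python
# Assumption: "a few minutes" is taken to mean 3 minutes; the docs give no number.
SHORT_VIDEO_SECONDS = 180.0


def choose_seek_mode(duration_seconds, need_exact_seeking):
    """Hypothetical helper encoding the rule of thumb from the tutorial."""
    if need_exact_seeking:
        return "exact"  # exactness matters more than creation time
    if duration_seconds < SHORT_VIDEO_SECONDS:
        return "exact"  # the scan's fixed cost is negligible on short videos
    return "approximate"  # long video, exactness can be traded for speed


# The result would be passed as VideoDecoder(path, seek_mode=...).
print(choose_seek_mode(45.0, need_exact_seeking=False))
print(choose_seek_mode(3600.0, need_exact_seeking=False))
print(choose_seek_mode(3600.0, need_exact_seeking=True))
```

The function returns a suggestion only; the tutorial wording ("usually", "consider") leaves the final choice to the user.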

src/torchcodec/_core/SingleStreamDecoder.cpp

Lines changed: 17 additions & 12 deletions
@@ -1092,13 +1092,6 @@ bool SingleStreamDecoder::canWeAvoidSeeking() const {
   // Returns true if we can avoid seeking in the AVFormatContext based on
   // heuristics that rely on the target cursor_ and the last decoded frame.
   // Seeking is expensive, so we try to avoid it when possible.
-  // Note that this function itself isn't always that cheap to call: in
-  // particular the calls to getKeyFrameIndexForPts below in approximate mode
-  // are sometimes slow.
-  // TODO we should understand why (is it because it reads the file?) and
-  // potentially optimize it. E.g. we may not want to ever seek, or even *check*
-  // if we need to seek in some cases, like if we're going to decode 80% of the
-  // frames anyway.
   const StreamInfo& streamInfo = streamInfos_.at(activeStreamIndex_);
   if (streamInfo.avMediaType == AVMEDIA_TYPE_AUDIO) {
     // For audio, we only need to seek if a backwards seek was requested
@@ -1145,10 +1138,10 @@ bool SingleStreamDecoder::canWeAvoidSeeking() const {
   // I P P P I P P P I P P I P
   // x j y
   // (2) is only more efficient than (1) if there is an I frame between x and y.
-  int lastKeyFrameIndex = getKeyFrameIndexForPts(lastDecodedAvFramePts_);
-  int targetKeyFrameIndex = getKeyFrameIndexForPts(cursor_);
-  return lastKeyFrameIndex >= 0 && targetKeyFrameIndex >= 0 &&
-      lastKeyFrameIndex == targetKeyFrameIndex;
+  int lastKeyFrame = getKeyFrameIdentifier(lastDecodedAvFramePts_);
+  int targetKeyFrame = getKeyFrameIdentifier(cursor_);
+  return lastKeyFrame >= 0 && targetKeyFrame >= 0 &&
+      lastKeyFrame == targetKeyFrame;
 }

 // This method looks at currentPts and desiredPts and seeks in the
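
Review note: the final comparison in `canWeAvoidSeeking()` can be modeled in a few lines. The Python sketch below covers only that last step and assumes a forward seek; the earlier checks in the real function are not shown in this hunk and are not reproduced. The function names and the key-frame timestamps are illustrative and do not come from the commit.

```python
from bisect import bisect_right


def key_frame_identifier(key_frame_pts, pts):
    """Index of the last key frame at or before pts, or -1 if there is none."""
    return bisect_right(key_frame_pts, pts) - 1


def can_avoid_seeking(key_frame_pts, last_decoded_pts, target_pts):
    """True when both timestamps fall under the same key frame."""
    last = key_frame_identifier(key_frame_pts, last_decoded_pts)
    target = key_frame_identifier(key_frame_pts, target_pts)
    return last >= 0 and target >= 0 and last == target


# Illustrative data: key frames at pts 0, 4, 8 and 11, one pts unit per frame.
key_frames = [0, 4, 8, 11]
print(can_avoid_seeking(key_frames, last_decoded_pts=1, target_pts=3))
print(can_avoid_seeking(key_frames, last_decoded_pts=1, target_pts=9))
```

In the first call no key frame lies between the two positions, so decoding forward is enough. In the second call a key frame sits in between, so seeking to it skips frames that would otherwise be decoded.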
@@ -1365,7 +1358,19 @@ torch::Tensor SingleStreamDecoder::maybePermuteHWC2CHW(
 // PTS <-> INDEX CONVERSIONS
 // --------------------------------------------------------------------------

-int SingleStreamDecoder::getKeyFrameIndexForPts(int64_t pts) const {
+int SingleStreamDecoder::getKeyFrameIdentifier(int64_t pts) const {
+  // This function "identifies" a key frame for a given pts value.
+  // We use the term "identifier" rather than "index" because the nature of the
+  // index that is returned depends on various factors:
+  // - If seek_mode is exact, we return the index of the key frame in the
+  //   scanned key-frame vector (streamInfo.keyFrames). So the returned value is
+  //   in [0, num_key_frames).
+  // - If seek_mode is approximate, we use av_index_search_timestamp() which
+  //   may return a value in [0, num_key_frames) like for mkv, but also a value
+  //   in [0, num_frames) like for mp4. It really depends on the container.
+  //
+  // The range of the "identifier" doesn't matter that much, for now we only
+  // use it to uniquely identify a key frame in canWeAvoidSeeking().
   const StreamInfo& streamInfo = streamInfos_.at(activeStreamIndex_);
   if (streamInfo.keyFrames.empty()) {
     return av_index_search_timestamp(
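
Review note: the new comment says the identifier's range does not matter because it is only compared for equality. The sketch below checks that claim on the frame sequence from the `canWeAvoidSeeking()` comment, using two made-up numbering schemes: one in [0, num_key_frames) and one in [0, num_frames). It does not call FFmpeg, and both helper functions are hypothetical.

```python
# Frame sequence from the source comment; here pts is simply the frame position.
frame_types = "IPPPIPPPIPPIP"


def id_in_key_frame_range(pts):
    """Identifier in [0, num_key_frames): rank of the governing key frame."""
    return frame_types[: pts + 1].count("I") - 1


def id_in_frame_range(pts):
    """Identifier in [0, num_frames): position of the governing key frame."""
    return frame_types.rfind("I", 0, pts + 1)


# The two schemes give different numbers but the same equality decisions.
n = len(frame_types)
agree = all(
    (id_in_key_frame_range(a) == id_in_key_frame_range(b))
    == (id_in_frame_range(a) == id_in_frame_range(b))
    for a in range(n)
    for b in range(n)
)
print(id_in_key_frame_range(9), id_in_frame_range(9), agree)
```

As long as each key frame maps to a distinct value, either numbering supports the "same key frame" test, which is consistent with renaming the function from "index" to "identifier".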

src/torchcodec/_core/SingleStreamDecoder.h

Lines changed: 1 addition & 1 deletion
@@ -282,7 +282,7 @@ class SingleStreamDecoder {
   // PTS <-> INDEX CONVERSIONS
   // --------------------------------------------------------------------------

-  int getKeyFrameIndexForPts(int64_t pts) const;
+  int getKeyFrameIdentifier(int64_t pts) const;

   // Returns the key frame index of the presentation timestamp using our index.
   // We build this index by scanning the file in
