BETA CUDA interface: support for approximate mode and time-based APIs #917

NicolasHug · 2025-10-02T17:18:46Z

This PR:

- Is a lot simpler than it seems: 80% of it is just comments and tests.
- Adds support for approximate mode.
- Adds support for time-based APIs.
- Drastically simplifies the logic of the BETA CUDA interface. We now rely on the NVCUVID callback that tells us when a frame is ready in display order:
  - We don't have to solve the frame reordering problem anymore: the callback is triggered in the proper order.
  - It correctly assigns the frame's PTS without us having to guess.

If we weren't relying on the NVCUVID callback, we would have to solve both problems above ourselves, with codec-specific solutions. As a result, this PR also drastically simplifies future support for additional codecs. Spoiler: I already added #919 and #920 for HEVC and AV1.

In #910, I described this design alternative and at the time, I thought it wasn't compatible enough with our sendPacket() / receiveFrame() architecture. With #910 now merged as a minimal clean-ish skeleton of the interface, I can reason about this more clearly. And after spending a few days trying (and failing) to solve the frame-reordering problem for H264 only, I came to the conclusion that this solution, in this PR, is well worth it.

This new simplified design does come with a minor trade-off. I explain it in a note, in the code.

Why are approximate mode and time-based APIs now supported? Let's first answer: why were they not supported before? It was because receiveFrame(avFrame, desiredPts) was only able to return a frame if we could find one with the exact desiredPts. In approximate mode, we can't guarantee that desiredPts corresponds to a frame's pts, so there generally can't be a match. Same with time-based APIs: desiredPts may not correspond to where a frame starts.

In this PR, we don't need that exact desiredPts matching logic anymore. But we can still guarantee that receiveFrame returns frames in display order, so we get approximate mode and time-based support for free.
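
To make the difference concrete, here is a minimal sketch of the two strategies. It is not the torchcodec code: `Frame`, `receiveFrameExact`, `pushReadyFrame` and `receiveFrameInOrder` are hypothetical names, and the sketch assumes that display order implies non-decreasing pts.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <queue>
#include <vector>

// Hypothetical stand-in for a decoded frame; only the pts matters here.
struct Frame {
  std::int64_t pts;
};

// Sketch of the old behaviour: succeed only if some decoded frame starts
// exactly at desiredPts.
std::optional<Frame> receiveFrameExact(
    const std::vector<Frame>& decoded, std::int64_t desiredPts) {
  for (const Frame& frame : decoded) {
    if (frame.pts == desiredPts) {
      return frame;
    }
  }
  return std::nullopt;
}

// Frames are enqueued as they become ready. The assert documents the
// assumption that display order means non-decreasing pts.
void pushReadyFrame(std::queue<Frame>& ready, Frame frame) {
  assert(ready.empty() || ready.back().pts <= frame.pts);
  ready.push(frame);
}

// Sketch of the new behaviour: hand back the next frame in display order,
// without comparing against any desired pts.
std::optional<Frame> receiveFrameInOrder(std::queue<Frame>& ready) {
  if (ready.empty()) {
    return std::nullopt;
  }
  Frame frame = ready.front();
  ready.pop();
  return frame;
}
```

With frames starting at pts 0, 1000 and 2000, asking `receiveFrameExact` for pts 1500 finds nothing, which is the approximate-mode and time-based failure described above. `receiveFrameInOrder` just returns 0, then 1000, then 2000.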

scotts · 2025-10-03T15:44:18Z

src/torchcodec/_core/BetaCudaDeviceInterface.cpp

+
+static int CUDAAPI
+pfnDisplayPictureCallback(void* pUserData, CUVIDPARSERDISPINFO* dispInfo) {
+  BetaCudaDeviceInterface* decoder =


Nit: I prefer auto when the expression on the right has the literal type we're getting on the left.

scotts · 2025-10-03T16:08:26Z

src/torchcodec/_core/BetaCudaDeviceInterface.cpp

  parserParams.pfnSequenceCallback = pfnSequenceCallback;
  parserParams.pfnDecodePicture = pfnDecodePictureCallback;
-  parserParams.pfnDisplayPicture = nullptr;
+  parserParams.pfnDisplayPicture = pfnDisplayPictureCallback;


This is the key difference, correct? That is, by registering this callback, we get the new behavior and can delete all of the relevant code?

NicolasHug · 2025-10-03

Yes, that's correct.

scotts

Amazing improvement! :)

Dan-Flores · 2025-10-07T04:14:12Z

src/torchcodec/_core/BetaCudaDeviceInterface.cpp

+int BetaCudaDeviceInterface::frameReadyInDisplayOrder(
+    CUVIDPARSERDISPINFO* dispInfo) {
+  readyFrames_.push(*dispInfo);
+  return 1; // success


To clarify for my understanding, when the frameReadyInDisplayOrder callback is triggered, the parser has written the one frame that is next according to PTS in CUVIDPARSERDISPINFO?

Are the function signatures for this and other callbacks defined somewhere in the documentation?

NicolasHug · 2025-10-07

Yes, your understanding is correct!

torchcodec/src/torchcodec/_core/nvcuvid_include/nvcuvid.h

Lines 501 to 509 in 6377dfc

typedef struct _CUVIDPARSERDISPINFO {
  int picture_index; /**< OUT: Index of the current picture */
  int progressive_frame; /**< OUT: 1 if progressive frame; 0 otherwise */
  int top_field_first; /**< OUT: 1 if top field is displayed first; 0 otherwise
                        */
  int repeat_first_field; /**< OUT: Number of additional fields (1=ivtc, 2=frame
                             doubling, 4=frame tripling, -1=unpaired field) */
  CUvideotimestamp timestamp; /**< OUT: Presentation time stamp */
} CUVIDPARSERDISPINFO;

The CUVIDPARSERDISPINFO struct contains two key fields:

- timestamp, which is the pts of the frame.
- picture_index, which uniquely identifies the frame. That's not the "frame index" as we have it in torchcodec; it's just an index internal to nvdec. That's what we then use here to "map" the frame:

torchcodec/src/torchcodec/_core/BetaCudaDeviceInterface.cpp

Lines 512 to 513 in 6377dfc

CUresult result = cuvidMapVideoFrame(
    *decoder_.get(), dispInfo.picture_index, &framePtr, &pitch, &procParams);
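
As a rough illustration of why picture_index is not a frame index, the sketch below treats it as a slot number in a pool of decoded surfaces. That is an assumption made for illustration only. `DispInfo`, `SurfacePool` and `nextFrame` are hypothetical stand-ins and not part of NVCUVID or torchcodec, and the string "surfaces" replace real GPU memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for the two CUVIDPARSERDISPINFO fields used here.
struct DispInfo {
  int picture_index;
  std::int64_t timestamp;
};

// Hypothetical pool of decoded surfaces, addressed by picture_index.
using SurfacePool = std::array<std::string, 4>;

// Pops the next entry in display order and "maps" it: the pts comes from
// timestamp, the pixels come from the slot named by picture_index.
std::pair<std::int64_t, std::string> nextFrame(
    std::queue<DispInfo>& ready, const SurfacePool& pool) {
  assert(!ready.empty());
  DispInfo info = ready.front();
  ready.pop();
  return {info.timestamp, pool.at(static_cast<std::size_t>(info.picture_index))};
}
```

For example, if slots 0, 1 and 2 hold an I, a P and a B frame in decode order, the display-order queue could carry picture_index values 0, 2, 1 with increasing timestamps. The slot numbers jump around while the pts values stay ordered.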

> Are the function signatures for this and other callbacks defined somewhere in documentation?

Not in the docs, but in the headers:

torchcodec/src/torchcodec/_core/nvcuvid_include/nvcuvid.h

Lines 529 to 533 in 6377dfc

typedef int(CUDAAPI* PFNVIDSEQUENCECALLBACK)(void*, CUVIDEOFORMAT*);
typedef int(CUDAAPI* PFNVIDDECODECALLBACK)(void*, CUVIDPICPARAMS*);
typedef int(CUDAAPI* PFNVIDDISPLAYCALLBACK)(void*, CUVIDPARSERDISPINFO*);
typedef int(CUDAAPI* PFNVIDOPPOINTCALLBACK)(void*, CUVIDOPERATINGPOINTINFO*);
typedef int(CUDAAPI* PFNVIDSEIMSGCALLBACK)(void*, CUVIDSEIMESSAGEINFO*);

Strictly speaking, this is the callback we're defining:

torchcodec/src/torchcodec/_core/BetaCudaDeviceInterface.cpp

Lines 49 to 53 in 6377dfc

static int CUDAAPI
pfnDisplayPictureCallback(void* pUserData, CUVIDPARSERDISPINFO* dispInfo) {
  auto decoder = static_cast<BetaCudaDeviceInterface*>(pUserData);
  return decoder->frameReadyInDisplayOrder(dispInfo);
}

It's a pure C function that calls the corresponding method on the Interface object. We have to go through these gymnastics because the pure C callbacks have no notion of the Interface object.
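
The same trampoline pattern can be tried without the NVCUVID headers. The sketch below is only an illustration: `DispInfo`, `DisplayCallback`, `Interface`, `displayTrampoline` and `FakeParser` are hypothetical stand-ins, the CUDAAPI calling convention is left out, and the queue is public only to keep the example short.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Hypothetical stand-in for CUVIDPARSERDISPINFO (two fields only).
struct DispInfo {
  int picture_index;
  std::int64_t timestamp;
};

// Same shape as PFNVIDDISPLAYCALLBACK, without the CUDAAPI macro.
using DisplayCallback = int (*)(void*, DispInfo*);

// Plays the role of the Interface object that owns the ready-frame queue.
class Interface {
 public:
  int frameReadyInDisplayOrder(DispInfo* dispInfo) {
    readyFrames_.push(*dispInfo); // copy: the pointer is only valid in the callback
    return 1; // success
  }

  std::queue<DispInfo> readyFrames_;
};

// Free function matching the C callback type. The opaque user-data pointer
// is the only link back to the object, so it is cast and forwarded.
static int displayTrampoline(void* userData, DispInfo* dispInfo) {
  assert(userData != nullptr);
  auto decoder = static_cast<Interface*>(userData);
  return decoder->frameReadyInDisplayOrder(dispInfo);
}

// Hypothetical parser: stores the user pointer and callback at setup time
// and invokes the callback once per displayable frame.
struct FakeParser {
  void* userData;
  DisplayCallback onDisplay;

  int emit(int pictureIndex, std::int64_t pts) {
    DispInfo info{pictureIndex, pts};
    return onDisplay(userData, &info);
  }
};
```

Constructing `FakeParser{&iface, displayTrampoline}` and calling `emit` twice leaves two entries in `iface.readyFrames_`, in the order they were emitted.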

NicolasHug added 29 commits September 25, 2025 18:11

Let's just commit 3k loc in a single commit

78ab058

Fixes

b45decc

Merge branch 'main' of github.com:pytorch/torchcodec into aeaenjfjanef

316f218

GetCache -> getCache

d0192ec

Make UniqueCUvideodecoder a pointer on CUvideodecoder, not void

515deb5

Make device and device_variant have a default instead of being std::o…

13fad10

…ptional

Remove old registerDeviceInterface

eb8de72

Call std::memset

4f7a4fb

remove unnecessary cuda_runtime.h include, update cmake accordingly

dcf3124

abstract frameBuffer_ into a FrameBuffer class

0ad7370

Cleanup BSF logic

aad142e

Return int in callback instead of unsigned char

2592888

define width and height as unsigned int

b5fe9bc

Rework frame ordering and pts matching

5605c90

Merge branch 'main' of github.com:pytorch/torchcodec into aeaenjfjanef

7494259

Fix cuda context initialization

560b376

Merge branch 'aeaenjfjanef' into nvdec-rework-frame-ordering

88196c5

Renaming

2a78b84

Comment

5d194e5

Merge branch 'main' of github.com:pytorch/torchcodec into aeaenjfjanef

d1e51b3

Skip equality check on ffmepg 4

f9c7297

Merge branch 'aeaenjfjanef' into nvdec-rework-frame-ordering

b7bbfb2

Refac, simplify

390fd7c

Update comment

f55dcc0

Define constant, add TODO for AVRational

7e4dd10

Use uint32_t types

f614846

Create packet.reset() and add P0 TODO

aa6e253

Add TODO

186eaa4

Merge branch 'aeaenjfjanef' into nvdec-rework-frame-ordering

1cb4890

meta-cla bot added the CLA Signed label Oct 2, 2025

NicolasHug added 2 commits October 2, 2025 18:37

Merge branch 'main' of github.com:pytorch/torchcodec into nvdec-rewor…

c5b32a4

…k-frame-ordering

lint

70873bf

NicolasHug changed the title ~~[WIP] BETA CUDA interface: support for approximate mode, time-based APIs~~ BETA CUDA interface: support for approximate mode, time-based APIs Oct 2, 2025

NicolasHug changed the title ~~BETA CUDA interface: support for approximate mode, time-based APIs~~ BETA CUDA interface: support for approximate mode and time-based APIs Oct 2, 2025

NicolasHug marked this pull request as ready for review October 2, 2025 17:44

This was referenced Oct 3, 2025

BETA CUDA interface: Add TODOs and more explicit initialization #918

Merged

BETA CUDA interface: H265 support #919

Merged

scotts reviewed Oct 3, 2025

scotts approved these changes Oct 3, 2025

NicolasHug added 2 commits October 3, 2025 18:26

Merge branch 'main' of github.com:pytorch/torchcodec into nvdec-rewor…

799f1dd

…k-frame-ordering

Use auto

8cc80e5

NicolasHug merged commit 6d72f11 into meta-pytorch:main Oct 3, 2025
49 of 50 checks passed

NicolasHug deleted the nvdec-rework-frame-ordering branch October 3, 2025 19:53

Dan-Flores reviewed Oct 7, 2025
