-
Notifications
You must be signed in to change notification settings - Fork 62
BETA CUDA interface: support for approximate mode and time-based APIs #917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BETA CUDA interface: support for approximate mode and time-based APIs #917
Conversation
|
||
static int CUDAAPI | ||
pfnDisplayPictureCallback(void* pUserData, CUVIDPARSERDISPINFO* dispInfo) { | ||
BetaCudaDeviceInterface* decoder = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I prefer auto
when the expression on the right has the literal type we're getting on the left.
parserParams.pfnSequenceCallback = pfnSequenceCallback; | ||
parserParams.pfnDecodePicture = pfnDecodePictureCallback; | ||
parserParams.pfnDisplayPicture = nullptr; | ||
parserParams.pfnDisplayPicture = pfnDisplayPictureCallback; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the key difference, correct? That is, by registering this callback, we get the new behavior and can delete all of the relevant code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes that's correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing improvement! :)
int BetaCudaDeviceInterface::frameReadyInDisplayOrder( | ||
CUVIDPARSERDISPINFO* dispInfo) { | ||
readyFrames_.push(*dispInfo); | ||
return 1; // success |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify for my understanding, when the frameReadyInDisplayOrder
callback is triggered, the parser has written the one frame that is next according to PTS in CUVIDPARSERDISPINFO
?
Are the function signatures for this and other callbacks defined somwhere in documentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, your understanding is correct! The CUVIDPARSERDISPINFO
struct contains two key fields:
torchcodec/src/torchcodec/_core/nvcuvid_include/nvcuvid.h
Lines 501 to 509 in 6377dfc
typedef struct _CUVIDPARSERDISPINFO { | |
int picture_index; /**< OUT: Index of the current picture */ | |
int progressive_frame; /**< OUT: 1 if progressive frame; 0 otherwise */ | |
int top_field_first; /**< OUT: 1 if top field is displayed first; 0 otherwise | |
*/ | |
int repeat_first_field; /**< OUT: Number of additional fields (1=ivtc, 2=frame | |
doubling, 4=frame tripling, -1=unpaired field) */ | |
CUvideotimestamp timestamp; /**< OUT: Presentation time stamp */ | |
} CUVIDPARSERDISPINFO; |
timestamp
, which is the pts of the frame- the
picture_index
field which uniquely identifies the frame. That's not the "frame index" as we have it in torchcodec, it's just an index internal to nvdec. That's what we then use here to "map" the frame:
torchcodec/src/torchcodec/_core/BetaCudaDeviceInterface.cpp
Lines 512 to 513 in 6377dfc
CUresult result = cuvidMapVideoFrame( | |
*decoder_.get(), dispInfo.picture_index, &framePtr, &pitch, &procParams); |
Are the function signatures for this and other callbacks defined somwhere in documentation?
Not in the docs, but in the headers:
torchcodec/src/torchcodec/_core/nvcuvid_include/nvcuvid.h
Lines 529 to 533 in 6377dfc
typedef int(CUDAAPI* PFNVIDSEQUENCECALLBACK)(void*, CUVIDEOFORMAT*); | |
typedef int(CUDAAPI* PFNVIDDECODECALLBACK)(void*, CUVIDPICPARAMS*); | |
typedef int(CUDAAPI* PFNVIDDISPLAYCALLBACK)(void*, CUVIDPARSERDISPINFO*); | |
typedef int(CUDAAPI* PFNVIDOPPOINTCALLBACK)(void*, CUVIDOPERATINGPOINTINFO*); | |
typedef int(CUDAAPI* PFNVIDSEIMSGCALLBACK)(void*, CUVIDSEIMESSAGEINFO*); |
Strictly speaking, this is the callaback we're defining:
torchcodec/src/torchcodec/_core/BetaCudaDeviceInterface.cpp
Lines 49 to 53 in 6377dfc
static int CUDAAPI | |
pfnDisplayPictureCallback(void* pUserData, CUVIDPARSERDISPINFO* dispInfo) { | |
auto decoder = static_cast<BetaCudaDeviceInterface*>(pUserData); | |
return decoder->frameReadyInDisplayOrder(dispInfo); | |
} |
It's a pure C function that calls the corresponding method on the Interface object. We have to do this gymnastic because the pure C callbacks have no notion of the Interface object.
This PR:
If we weren't relying on the NVCUVID callback, then we would have to solve both problems above ourselves, with codec-specific solutions. As a resut this PR also drastically simplifies future support for additional codecs - spoiler, I already added #919 and #920 for HEVC and AV1.
In #910, I described this design alternative and at the time, I thought it wasn't compatible enough with our
sendPacket() / receiveFrame()
architecture. With #910 now merged as a minimal clean-ish skeleton of the interface, I can reason about this more clearly. And after spending a few days trying (and failing) to solve the frame-reordering problem for H264 only, I came to the conclusion that this solution, in this PR, is well worth it.This new simplified design does come with a minor trade-off. I explain it in a note, in the code.
Why is approximate mode and time-based APIs now supported? Let's first answer: why was approximate mode and time-based APIs not supported before? It was because
receiveFrame(avFrame, desiredPts)
was only able to return a frame if we were able to find one with the exactdesiredPts
. On approximate mode, we can't guarantee that desiredPts corresponds to a frame's pts, so there generally can't be a match. Same with time-based APIs: desiredPts may not correspond to where a frame starts.In this PR, we don't need that exact
desiredPts
matching logic anymore. But we can still guarantee thatreceiveFrame
returns frames in display order, so we got approximate mode and time-based support for free.