Enable CUDA device for video encoder #1008
Conversation
src/torchcodec/_core/Encoder.cpp (outdated)

```cpp
void VideoEncoder::initializeEncoder(
    const VideoStreamOptions& videoStreamOptions) {
  if (videoStreamOptions.device.is_cuda()) {
```
I was hoping we wouldn't need to support a device parameter anywhere. Any reason we can't just rely on the input frames' device?
As per our offline discussion, I've updated this PR to not have an explicit device param, and instead use whichever device the frames Tensor is on.
src/torchcodec/_core/Encoder.cpp (outdated)

```cpp
if (device.type() != torch::kCPU) {
  TORCH_CHECK(
      frames.is_cuda(),
      "When using CUDA encoding (device=",
      device.str(),
      "), frames must be on a CUDA device. Got frames on ",
      frames.device().str(),
      ". Please move frames to a CUDA device: frames.to('cuda')");
}
```
The error above should be an internal bug: if the frames are on CUDA while the device parameter is not, it means it wasn't set properly in the custom ops. So this should be an "internal bug" error message rather than a user-facing thing, since we assume users aren't using the C++ APIs.
But, I think we should push the logic from https://github.com/meta-pytorch/torchcodec/pull/1008/files#r2565059217 further: we don't need a device parameter at all, anywhere. Having a device parameter duplicates the source of truth of the device and leads to potential bugs (like the one above that the TORCH_CHECK is preventing). So I think we shouldn't set options.device at all for encoding, and always rely on the frames. We can add a comment in src/torchcodec/_core/StreamOptions.h to indicate that.
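A minimal sketch of that direction, with illustrative names (the `frames_` member is hypothetical, not torchcodec's actual one): the encoder derives the device from the frames tensor itself, so there is no separate device option that could drift out of sync.

```cpp
// Sketch only: the frames tensor is the single source of truth for the
// device. options.device is never set for encoding, so it cannot
// contradict where the frames actually live.
void VideoEncoder::initializeEncoder() {
  const torch::Device device = frames_.device();
  if (device.is_cuda()) {
    // Take the CUDA/NVENC encoding path on this device.
  } else {
    // Default CPU encoding path.
  }
}
```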
src/torchcodec/_core/custom_ops.cpp (outdated)

```cpp
    std::optional<std::string_view> preset = std::nullopt,
    std::optional<std::vector<std::string>> extra_options = std::nullopt) {
  VideoStreamOptions videoStreamOptions;
  videoStreamOptions.device = frames.device();
```
I don't think we need to do that; it duplicates the source of truth for the device. See the longer comment below.
src/torchcodec/_core/Encoder.cpp (outdated)

```cpp
avCodec = avcodec_find_encoder(avFormatContext_->oformat->video_codec);
if (gpuEncoder_) {
  avCodec = gpuEncoder_->findEncoder(avFormatContext_->oformat->video_codec)
                .value_or(avCodec);
}
```
Is there actual value in having findEncoder? Did you notice any problem if we didn't use it? If not, then let's remove it.
If yes, then let's rename it to findCodec and add a comment similar to this one:
torchcodec/src/torchcodec/_core/CudaDeviceInterface.cpp, lines 332 to 336 in 392bab3:

```cpp
// inspired by https://github.com/FFmpeg/FFmpeg/commit/ad67ea9
// we have to do this because of an FFmpeg bug where hardware decoding is not
// appropriately set, so we just go off and find the matching codec for the CUDA
// device
std::optional<const AVCodec*> CudaDeviceInterface::findCodec(
```
and also indicate that this findCodec function exists for similar reasons as the one I linked to.
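For reference, a hedged sketch of what a renamed encoder-side `findCodec` could look like, mirroring the decoder-side helper above. It only uses FFmpeg APIs that exist (`av_codec_iterate`, `av_codec_is_encoder`, `avcodec_get_hw_config`), but it is not the actual PR code:

```cpp
// Sketch: iterate all registered codecs and return an encoder for this
// codec ID that advertises a CUDA hardware config, if any.
std::optional<const AVCodec*> findCodec(const AVCodecID& codecId) {
  void* iter = nullptr;
  const AVCodec* codec = nullptr;
  while ((codec = av_codec_iterate(&iter)) != nullptr) {
    if (codec->id != codecId || !av_codec_is_encoder(codec)) {
      continue;
    }
    const AVCodecHWConfig* config = nullptr;
    for (int i = 0; (config = avcodec_get_hw_config(codec, i)) != nullptr; ++i) {
      if (config->device_type == AV_HWDEVICE_TYPE_CUDA) {
        return codec;
      }
    }
  }
  return std::nullopt;
}
```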
Let's delete it. The intention was to find a hardware-enabled encoder if one was not specified, but as far as I can tell, the FFmpeg CLI does not support that for encoding, only for decoding via the -hwaccel flag.
src/torchcodec/_core/GpuEncoder.cpp (outdated)

```cpp
avFrame->height = static_cast<int>(tensor.size(1));
avFrame->pts = frameIndex;

int ret = av_hwframe_get_buffer(
```
Add a note here that we're letting FFmpeg allocate the CUDA memory. I think we should explore allocating the memory with PyTorch instead, so that we can automatically rely on PyTorch's CUDA memory allocator, which should be more efficient. There could be a TODO to investigate how to do that (this is related to my comment about setupEncodingContext above).
Would an example of this be allocateEmptyHWCTensor?
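For context, a sketch of the kind of helper being referred to, assuming a signature along these lines (the actual torchcodec helper may differ): the frame buffer comes from `torch::empty` on the target device, so it is owned by PyTorch's caching CUDA allocator rather than allocated by `av_hwframe_get_buffer`.

```cpp
#include <torch/torch.h>

// Sketch: allocate an uninitialized HWC uint8 tensor on the given device.
// Because the allocation goes through torch::empty, it uses PyTorch's
// caching CUDA allocator when `device` is a CUDA device.
torch::Tensor allocateEmptyHWCTensor(
    int height,
    int width,
    const torch::Device& device) {
  return torch::empty(
      {height, width, 3},
      torch::TensorOptions().dtype(torch::kUInt8).device(device));
}
```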
test/test_encoders.py (outdated)

```diff
 VideoEncoder(frames, frame_rate=30).to_file(dest=dest, **common_params)
 with open(dest, "rb") as f:
-    return torch.frombuffer(f.read(), dtype=torch.uint8)
+    return torch.frombuffer(f.read(), dtype=torch.uint8).clone()
```
why was that needed?
This was an attempt to fix a warning on this test, but it does not prevent the warning; I'll clean it up.
```
test/test_encoders.py::TestVideoEncoder::test_contiguity[cuda-to_file]
  /home/dev/torchcodec/test/test_encoders.py:835: UserWarning: The given buffer is not writable, and PyTorch does not support non-writable tensors. This means you can write to the underlying (supposedly non-writable) buffer using the tensor. You may want to copy the buffer to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_new.cpp:1581.)
    return torch.frombuffer(f.read(), dtype=torch.uint8)
```
| if b"No NVENC capable devices found" in e.stderr: | ||
| pytest.skip("NVENC not available on this system") |
Add a TODO to make sure our CI never ever skips those tests. I.e. we should have a mechanism in place that makes sure our CI fails here, instead of skipping these tests. This should be the first follow-up of this PR.
test/test_encoders.py (outdated)

```python
@pytest.mark.needs_cuda
@pytest.mark.skipif(in_fbcode(), reason="ffmpeg CLI not available")
@pytest.mark.parametrize("pixel_format", ("nv12", "yuv420p"))
```
I'm surprised to see this. I thought nvenc only supports NV12 output. Is that not the case?
Thanks for pointing this out, I took another look at my code:

- `h264_nvenc` supports multiple output pixel formats: `yuv420p nv12 p010le yuv444p p016le nv16 ...`
- The bottleneck is that `GpuEncoder::convertTensorToAVFrame` always uses `nppiRGBToNV12_8u_ColorTwist32f_C3P2R_Ctx`, which only handles `nv12`.
- Since `nv12` and `yuv420p` do the same chroma subsampling, the results appeared to be correct.

I'll add a TODO to enable using the user's selected pixel format. There are other NVIDIA functions we can use based on the target pixel format, or I could investigate using filtergraph's `scale_cuda` to handle the conversion, as is done in `maybeConvertAVFrameToNV12OrRGB24`.
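For illustration, a sketch of the dispatch that TODO could lead to. `outputFormat` is a hypothetical variable, argument plumbing is elided, and using `nppiRGBToYUV420_8u_C3P3R` here is an assumption, not the PR's code:

```cpp
// Sketch only: choose the conversion routine from the requested output
// pixel format instead of hard-coding NV12.
switch (outputFormat) {
  case AV_PIX_FMT_NV12:
    // Current path: nppiRGBToNV12_8u_ColorTwist32f_C3P2R_Ctx(...).
    break;
  case AV_PIX_FMT_YUV420P:
    // One option: nppiRGBToYUV420_8u_C3P3R(...), which writes three planes;
    // another is a scale_cuda filtergraph, as in
    // maybeConvertAVFrameToNV12OrRGB24.
    break;
  default:
    TORCH_CHECK(false, "Unsupported output pixel format for CUDA encoding");
}
```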
This PR adds CUDA support to the VideoEncoder. The CUDA encoding path is implemented in the new GpuEncoder.cpp.