@dfbakin dfbakin commented Apr 27, 2025

Test

Summary

Added an inference node that runs detection and, if an object is found, segmentation of the region of interest. It uses lock-free multithreading and LibTorch. Inference runs on a CUDA GPU and was tested on an NVIDIA Jetson AGX.

Main changes

  • Upgraded the container to JetPack 6.0, torch 2.2.0, torchvision 0.17.0, and CUDA API 12.2 so that TorchScript works without the "driver and API mismatch" error
  • Added BallInferenceNode with lock-free queues and inference loops
  • Implemented the image preprocessing that TTNet (detection) and U-Net (segmentation) require
  • Used LibTorch for matrix operations and for running inference with the pre-trained models
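
For reviewers unfamiliar with the pattern, here is a minimal sketch of a lock-free single-producer/single-consumer ring buffer of the kind that could connect a capture thread to an inference loop. It is not the code from this PR: the name `SpscQueue`, its interface, and the choice of a bounded ring buffer are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical illustration; the queue type used by BallInferenceNode may differ.
// Safe only with exactly one producer thread and one consumer thread.
template <typename T>
class SpscQueue {
public:
    // One slot is kept empty so that "full" and "empty" are distinguishable.
    explicit SpscQueue(std::size_t capacity) : buffer_(capacity + 1) {
        assert(capacity > 0);
    }

    // Producer side. Returns false instead of blocking when the queue is full.
    bool try_push(T value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % buffer_.size();
        if (next == head_.load(std::memory_order_acquire)) {
            return false;  // full
        }
        buffer_[tail] = std::move(value);
        // Release: publish the written slot to the consumer.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side. Returns false instead of blocking when the queue is empty.
    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) {
            return false;  // empty
        }
        out = std::move(buffer_[head]);
        // Release: hand the freed slot back to the producer.
        head_.store((head + 1) % buffer_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<T> buffer_;
    std::atomic<std::size_t> head_{0};  // next slot to read
    std::atomic<std::size_t> tail_{0};  // next slot to write
};
```

With a queue like this, a full buffer makes `try_push` fail, so the producer can drop a frame rather than stall. Whether dropping frames is acceptable here is a design decision this sketch does not settle.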

Ideas for further optimization:

  • Switch preprocessing from OpenCV to the base CUDA multimedia operations, which should be significantly faster. Detection takes about 8 ms and preprocessing about 10 ms, so this could help reach the required FPS
  • Implement TensorRT inference, measure it, and compare it with TorchScript
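
As a rough sanity check on the timings quoted above, the sketch below turns per-stage latencies into a frame-rate estimate. The function names are hypothetical, and the model is simplified: it covers only preprocessing and detection, because the segmentation cost is not stated here, and it ignores queueing and copy overhead.

```cpp
#include <algorithm>

// Hypothetical helpers for back-of-the-envelope estimates; not part of the PR.

// Stages run one after another on each frame: the latencies add up.
double sequential_fps(double preprocess_ms, double detect_ms) {
    return 1000.0 / (preprocess_ms + detect_ms);
}

// Stages run in separate threads on different frames: the slowest stage
// bounds the throughput.
double pipelined_fps(double preprocess_ms, double detect_ms) {
    return 1000.0 / std::max(preprocess_ms, detect_ms);
}
```

With 10 ms of preprocessing and 8 ms of detection, this gives about 55 FPS when the stages run back to back and at most 100 FPS when they overlap. In the overlapped case preprocessing is the bottleneck, which is consistent with targeting it first.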

dfbakin added 2 commits April 27, 2025 03:54
and introduced inference header
todo: sample test and test CUDA API calls if needed
@dfbakin dfbakin self-assigned this Apr 27, 2025
@dfbakin dfbakin added the enhancement New feature or request label Apr 27, 2025
@dfbakin dfbakin marked this pull request as ready for review May 20, 2025 13:17