@liligwu liligwu commented Sep 24, 2025

Backward-pass (bwd) performance optimization for ROCm.
Fix numerical issues.

@meta-cla meta-cla bot added the cla signed label Sep 24, 2025

netlify bot commented Sep 24, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit 2c4a497
🔍 Latest deploy log https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68ed6512a0f9200008a224cd
😎 Deploy Preview https://deploy-preview-4925--pytorch-fbgemm-docs.netlify.app

@facebook-github-bot

@haoyuz has imported this pull request. If you are a Meta employee, you can view this in D83116315.

q10 commented Oct 2, 2025

@liligwu, we're seeing the following error:

OSError: libtbb.so.12: cannot open shared object file: No such file or directory

We already install tbb here, so the fix may just be a matter of updating the build scripts to add the directory containing libtbb to LD_LIBRARY_PATH.

meta-codesync bot commented Oct 13, 2025

@q10 has imported this pull request. If you are a Meta employee, you can view this in D83116315.

liligwu commented Oct 13, 2025

Hi @q10, sorry I missed your message.
I actually have a commit, 4d2bfdd, that links tbb explicitly, and it works in your container. Do you have any suggestions for fixing this issue in CI?

BTW, we discovered a numerical issue in 986cceb and reverted it in 85417b4. The revert unblocks merging the bwd optimization first.

Thank you.

q10 commented Oct 14, 2025

I think commit 4d2bfdd only addresses the build step, where we need to link against tbb. For runtime, however, you might need to run find in $CONDA_PREFIX from inside the container and then either manually update LD_LIBRARY_PATH or create a symlink, something like:

(print_exec ln -s "${conda_prefix}/lib/librhash.so" "${conda_prefix}/lib/librhash.so.0") || return 1
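For reference, the runtime fix described above could be sketched roughly as follows. This is only a sketch under assumptions: the function name `fix_libtbb_runtime` is hypothetical and not part of the existing build scripts, and it assumes tbb was installed somewhere under the Conda prefix passed as the first argument. It only searches for `libtbb.so.12*`, so it never links a different major version under the `libtbb.so.12` name.

```shell
# Hypothetical helper (not from the FBGEMM build scripts): make libtbb.so.12
# resolvable at runtime for an environment rooted at the prefix given as $1.
fix_libtbb_runtime() {
  conda_prefix="$1"
  soname="libtbb.so.12"

  # Look for the library (or a versioned variant such as libtbb.so.12.x)
  # anywhere under the environment prefix.
  found="$(find "${conda_prefix}" -name "${soname}*" 2>/dev/null | sort | head -n 1)"
  if [ -z "${found}" ]; then
    echo "${soname} not found under ${conda_prefix}" >&2
    return 1
  fi

  libdir="$(dirname "${found}")"

  # Option 1: if only a versioned file exists, create the expected soname
  # as a symlink next to it.
  if [ ! -e "${libdir}/${soname}" ]; then
    ln -s "${found}" "${libdir}/${soname}" || return 1
  fi

  # Option 2: prepend the library directory to LD_LIBRARY_PATH so the
  # dynamic loader searches it first.
  export LD_LIBRARY_PATH="${libdir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
}
```

A call such as `fix_libtbb_runtime "${CONDA_PREFIX}"` before the test step would apply both options; in practice either one alone may be enough, depending on how the CI environment is set up.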
