
Add palletization/codebook support to CoreML backend #13051


Merged: 8 commits merged into main on Aug 6, 2025.

Conversation

metascroy (Contributor) commented on Jul 31, 2025:

This adds palletization support for embedding/linear layers in CoreML using TorchAO's quantize_ API.

Note: this needs to wait for pytorch/ao#2648 to land in ao, plus a pin bump in ET, before landing.
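
For context, the end-to-end flow looks roughly like the sketch below, pieced together from the test snippets quoted later in this thread. The `CodebookWeightOnlyConfig` import path, the toy model, and the `block_size` interpretation are assumptions for illustration, not the PR's exact code:

```python
# Sketch only: the CodebookWeightOnlyConfig import path is an assumption
# and may differ across torchao versions; the flow mirrors the PR's tests.
import torch
from torchao.quantization import quantize_
from torchao.prototype.quantization.codebook import CodebookWeightOnlyConfig

model = torch.nn.Sequential(torch.nn.Linear(64, 32)).eval()
example_inputs = (torch.randn(1, 64),)

# Palletize the linear weights with a 2-bit codebook (4 LUT entries).
# block_size semantics assumed: -1 spans an entire weight dim, 16 groups
# along the other.
quantize_(model, CodebookWeightOnlyConfig(dtype=torch.uint2, block_size=[-1, 16]))

# Export; the graph now contains dequantize_codebook ops that the CoreML
# backend translates as discussed in the review threads below.
ep = torch.export.export(model, example_inputs)
```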

pytorch-bot bot commented on Jul 31, 2025:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13051

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Unrelated Failures

As of commit 5ee46af with merge base 1709a83:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

The meta-cla bot added the CLA Signed label on Jul 31, 2025.

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e., would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track of your important work and include it in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

metascroy changed the title from "[Draft] Add palletization/codebook support to CoreML backend" to "Add palletization/codebook support to CoreML backend" on Aug 1, 2025.
metascroy marked this pull request as ready for review on August 1, 2025 at 00:18.
```python
nbits = inputs[2].val

# information in block_size is redundant with codebook.shape
block_size = inputs[3].val  # noqa: F841
```
metascroy (author): @YifanShenSZ is there any restriction needed on the block size here?

YifanShenSZ (collaborator): I'm not aware of any, and I don't see one in our constexpr_lut_to_dense op doc.

metascroy requested a review from YifanShenSZ on August 1, 2025 at 00:20.
YifanShenSZ (collaborator) left a review:

Nice! 💯

Speaking of the pin, we have released coremltools 9.0b1.


```python
    torch_alias=["quant::dequantize_codebook", "quant.dequantize_codebook"],
    override=False,
)
def dequantize_codebook(context, node):
```
YifanShenSZ (collaborator): qq: it seems that "codebook" corresponds to our lookup table (LUT)?

metascroy (author): Yes, codebook is the same as the LUT, and codes are the same as the indices.
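
A toy numpy illustration of that correspondence (values made up; this shows the semantics, not the PR's code):

```python
import numpy as np

nbits = 2
codebook = np.array([-1.0, -0.25, 0.25, 1.0], dtype=np.float32)  # the LUT: 2**nbits entries
codes = np.array([[0, 3, 1], [2, 2, 0]], dtype=np.int8)          # the indices into the LUT

dequantized = codebook[codes]  # lookup recovers a dense float32 weight, shape (2, 3)
print(dequantized)
```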

metascroy (author) commented:

> Nice! 💯
> Speaking of the pin, we have released coremltools 9.0b1.

Nice! Where can I learn more about what's new?

metascroy force-pushed the add-palletization-support branch from 0fa6302 to c4ca106 on August 4, 2025 at 22:40.
metascroy requested a review from GregoryComer as a code owner on August 4, 2025 at 22:40.
metascroy (author) commented:

@cccclai @digantdesai, can I get a stamp here?

@YifanShenSZ has approved, but I need an approver in PyTorch to merge.

```python
    torch_alias=["quant::dequantize_codebook", "quant.dequantize_codebook"],
    override=False,
)
def dequantize_codebook(context, node):
```
Contributor: We don't have a quantize variant because the weights are always folded?

metascroy (author): I'm enabling weight-only right now, so there is no quantize variant.


```python
    CodebookWeightOnlyConfig(dtype=torch.uint2, block_size=[-1, 16]),
)
ep = torch.export.export(model, example_inputs)
print("ORIGINAL MODEL", ep)
```
Contributor: Nit: remove?

Contributor: Do you want to assert that dequantize_codebook is present in the graph?
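
One hedged way such an assertion could look, scanning the exported program's graph (the helper name is hypothetical; `ep` is an ExportedProgram as produced by `torch.export.export`):

```python
import torch

def graph_has_dequantize_codebook(ep: torch.export.ExportedProgram) -> bool:
    # Look for any call_function node whose target is the codebook dequant op.
    return any(
        node.op == "call_function" and "dequantize_codebook" in str(node.target)
        for node in ep.graph.nodes
    )

assert graph_has_dequantize_codebook(ep), "expected dequantize_codebook in the graph"
```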

```diff
@@ -8,6 +8,7 @@
 # coremltools than is used by ExecuTorch. Each op registered here should have a link to a PR in coremltools that adds
 # the op to the coremltools library.

+import numpy as np
```
Contributor: Any constraint on the numpy version?

metascroy (author): I think most versions would work. I only use it for np.int8.

```python
model, example_inputs = self._get_test_model()
quantize_(
    model,
    CodebookWeightOnlyConfig(dtype=torch.uint3, block_size=[-1, 16]),
```
Contributor: So does coremltools recognize _construct_constexpr_lut_op followed by an embedding lookup as a special pattern for quantized embedding that gets optimized? Same for, say, LUT-quantized linear?

metascroy (author): From @cymbalrush: during on-device compilation, CoreML fuses dequant ops with linear ops into one kernel.

f"Core ML ignores output_dtype {out_np_dtype} on torchao.dequantize_affine and instead uses the native precision."
)

output = _utils._construct_constexpr_lut_op(
Contributor: I don't follow the constexpr thing here, though. What does that mean?

metascroy (author): This translates the dequantize_codebook op into a CoreML constexpr LUT op such as constexpr_lut_to_dense. These ops get fused with the linear op that follows them at runtime.
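
For readers unfamiliar with the converter, the registration in this PR is shaped roughly like the sketch below, reconstructed from the diff hunks quoted in this thread. The `_get_inputs` arity and the argument order of `_construct_constexpr_lut_op` are assumptions, not the PR's exact code:

```python
from coremltools.converters.mil.frontend import _utils
from coremltools.converters.mil.frontend.torch.ops import _get_inputs
from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op

@register_torch_op(
    torch_alias=["quant::dequantize_codebook", "quant.dequantize_codebook"],
    override=False,
)
def dequantize_codebook(context, node):
    inputs = _get_inputs(context, node, min_expected=4)  # arity assumed
    codes, codebook = inputs[0], inputs[1]
    nbits = inputs[2].val  # noqa: F841

    # information in block_size is redundant with codebook.shape (see above)
    block_size = inputs[3].val  # noqa: F841

    # Emit a constexpr LUT op; argument names/order assumed for illustration.
    output = _utils._construct_constexpr_lut_op(codes.val, codebook.val, name=node.name)
    context.add(output, torch_name=node.name)
```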

metascroy force-pushed the add-palletization-support branch from 17a2728 to 5ee46af on August 6, 2025 at 18:53.
metascroy merged commit 9574270 into main on Aug 6, 2025 (234 of 238 checks passed).
metascroy deleted the add-palletization-support branch on August 6, 2025 at 21:17.
Labels: ciflow/trunk, CLA Signed

4 participants