Add palettization/codebook support to CoreML backend #13051
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13051
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 2 Unrelated Failures
As of commit 5ee46af with merge base 1709a83:
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
nbits = inputs[2].val

# information in block_size is redundant with codebook.shape
block_size = inputs[3].val  # noqa: F841
@YifanShenSZ is there any restriction needed on the block size here?
I'm not aware of any, and I don't see one in our constexpr_lut_to_dense op doc.
Nice! 💯
Speaking of the pin, we have released coremltools 9.0b1.
    torch_alias=["quant::dequantize_codebook", "quant.dequantize_codebook"],
    override=False,
)
def dequantize_codebook(context, node):
qq: it seems that "codebook" corresponds to our look-up table (LUT)?
Yes, codebook is the same as the LUT and codes are the same as the indices.
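To make the correspondence concrete, here's a minimal numpy sketch of what codebook dequantization amounts to (illustrative only; the shapes and names below are assumptions, not the op's actual signature):

```python
import numpy as np

# Assumed example: a 3-bit codebook (LUT) with 2**3 = 8 float entries, and
# uint8 "codes" holding per-element indices into that codebook.
codebook = np.linspace(-1.0, 1.0, 8, dtype=np.float32)          # the LUT
codes = np.random.randint(0, 8, size=(4, 16), dtype=np.uint8)   # the indices

# Dequantization is a table lookup: each code selects its codebook entry.
dequantized = codebook[codes]   # shape (4, 16), dtype float32
assert dequantized.shape == codes.shape
```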
Nice! Where can I learn more about what's new?
0fa6302 to c4ca106 (Compare)
@cccclai @digantdesai can I get a stamp here? @YifanShenSZ has approved, but I need an approver in PyTorch to merge.
    torch_alias=["quant::dequantize_codebook", "quant.dequantize_codebook"],
    override=False,
)
def dequantize_codebook(context, node):
We don't have a quantize variant because the weights are always folded?
I'm enabling weight-only right now, so there is no quantize variant.
    CodebookWeightOnlyConfig(dtype=torch.uint2, block_size=[-1, 16]),
)
ep = torch.export.export(model, example_inputs)
print("ORIGINAL MODEL", ep)
Nit: remove?
Do you want to assert that dequantize_codebook is present in the graph?
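If it helps, a hedged sketch of such an assertion, assuming the quantized weights surface in the exported program as calls to a quant.dequantize_codebook op (per the torch_alias above) and reusing the model and example_inputs from this test:

```python
# Illustrative check only; the exact op target string is an assumption.
ep = torch.export.export(model, example_inputs)
has_codebook_dequant = any(
    node.op == "call_function" and "dequantize_codebook" in str(node.target)
    for node in ep.graph_module.graph.nodes
)
assert has_codebook_dequant, "expected a dequantize_codebook node in the exported graph"
```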
@@ -8,6 +8,7 @@
# coremltools than is used by ExecuTorch. Each op registered here should have a link to a PR in coremltools that adds
# the op to the coremltools library.

import numpy as np
Any constraint on the numpy version?
I think most versions would work. I only use it for np.int8.
model, example_inputs = self._get_test_model()
quantize_(
    model,
    CodebookWeightOnlyConfig(dtype=torch.uint3, block_size=[-1, 16]),
So does coremltools recognize _construct_constexpr_lut_op followed by an embedding lookup as a special pattern for quantized embedding that gets optimized? Same for, say, a LUT-quantized linear?
From @cymbalrush: during on-device compilation, Core ML fuses the dequant ops with linear ops into one kernel.
f"Core ML ignores output_dtype {out_np_dtype} on torchao.dequantize_affine and instead uses the native precision." | ||
) | ||
|
||
output = _utils._construct_constexpr_lut_op( |
I don't follow the constexpr thing here, though. What does that mean?
This translates the dequantize_codebook op to one of the following CoreML ops:
- iOS16.constexpr_ops.constexpr_lut_to_dense (https://fburl.com/hccppb8q)
- iOS18.compression.constexpr_lut_to_dense (https://fburl.com/51xpft2d)
These ops get fused with the following linear op at runtime.
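For readers unfamiliar with the coremltools torch frontend, a rough sketch of what such a translator can look like. The import paths, input ordering, and the _construct_constexpr_lut_op signature below are assumptions for illustration, not the code in this PR:

```python
# Import paths are assumptions based on coremltools' torch frontend layout.
from coremltools.converters.mil.frontend import _utils
from coremltools.converters.mil.frontend.torch.ops import _get_inputs
from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op


@register_torch_op(
    torch_alias=["quant::dequantize_codebook", "quant.dequantize_codebook"],
    override=False,
)
def dequantize_codebook(context, node):
    # Assumed input order: codes (indices), codebook (LUT), nbits, block_size.
    inputs = _get_inputs(context, node, expected=4)
    codes, codebook = inputs[0], inputs[1]

    # Emit a constexpr_lut_to_dense (the helper picks the iOS16 or iOS18 variant);
    # Core ML can then fuse it with the consuming linear op at runtime.
    output = _utils._construct_constexpr_lut_op(codes.val, codebook.val, name=node.name)
    context.add(output, torch_name=node.name)
```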
17a2728 to 5ee46af (Compare)
This adds palettization support for embedding/linear layers in the CoreML backend using TorchAO's quantize_ API.
Note: this needs pytorch/ao#2648 to land in ao, plus a pin bump in ET, before landing.
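For context, a hedged end-to-end sketch of the intended flow, pieced together from the test snippets above (the toy model and the CodebookWeightOnlyConfig import path are assumptions; the real tests live in the CoreML backend test suite):

```python
import torch
import torch.nn as nn
from torchao.quantization import quantize_
# Import path is an assumption; CodebookWeightOnlyConfig comes from torchao
# (the pytorch/ao PR referenced above).
from torchao.prototype.quantization.codebook_coreml import CodebookWeightOnlyConfig

# Toy linear model standing in for the embedding/linear layers targeted here.
model = nn.Sequential(nn.Linear(64, 32)).eval()
example_inputs = (torch.randn(1, 64),)

# LUT/codebook weight-only quantization: 3-bit codes, one LUT per block of 16 columns.
quantize_(model, CodebookWeightOnlyConfig(dtype=torch.uint3, block_size=[-1, 16]))

# Export; the dequantize_codebook nodes in this graph are what the CoreML backend
# translates to constexpr_lut_to_dense when lowering.
ep = torch.export.export(model, example_inputs)
print(ep)
```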