Commit 2a96ae1

[TRTLLM-5252][fix] Propagate mapping to intermediate layers

Signed-off-by: William Zhang <[email protected]>
1 parent 6135f75

File tree: 3 files changed (+6, -1)


examples/llm-api/quickstart_advanced.py (3 additions, 1 deletion)

@@ -150,7 +150,9 @@ def parse_arguments():
 def setup_llm(args, **kwargs):
     kv_cache_config = KvCacheConfig(
         enable_block_reuse=not args.disable_kv_cache_reuse,
-        free_gpu_memory_fraction=args.kv_cache_fraction,
+        # free_gpu_memory_fraction=args.kv_cache_fraction,
+        free_gpu_memory_fraction=0.5,
+        max_tokens=10_000,
         dtype=args.kv_cache_dtype,
     )

tensorrt_llm/_torch/models/modeling_mistral.py (2 additions, 0 deletions)

@@ -475,6 +475,7 @@ def __init__(self, model_config: ModelConfig[Mistral3Config]):
             out_features=hidden_size,
             bias=False,
             dtype=config.torch_dtype,
+            mapping=model_config.mapping,
         )

     @torch.inference_mode()
@@ -546,6 +547,7 @@ def __init__(self, model_config: ModelConfig[Mistral3Config]):
             out_features=config.text_config.hidden_size,
             bias=config.multimodal_projector_bias,
             dtype=dtype,
+            mapping=model_config.mapping,
         )

     @torch.inference_mode()

tests/integration/test_lists/test-db/l0_dgx_h100.yml (1 addition, 0 deletions)

@@ -51,6 +51,7 @@ l0_dgx_h100:
   - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_ctx_pp_gen_tp_asymmetric[MMLU-gen_tp=1-ctx_pp=2]
   - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_ctx_pp_gen_tp_asymmetric[MMLU-gen_tp=2-ctx_pp=2]
   - test_e2e.py::test_ptp_quickstart_advanced_bs1
+  - unittest/_torch/modeling/test_modeling_pixtral.py::test_tensor_parallelism
   - condition:
       ranges:
         system_gpu_count:

0 commit comments