Description
OpenVINO Version
No response
Operating System
Ubuntu 22.04 (LTS)
Device used for inference
CPU
OpenVINO installation
PyPI
Programming Language
Python
Hardware Architecture
x86 (64 bits)
Model used
GPT-2
Model quantization
No
Performance issue description
Summary
The OpenVINO backend exhibits excessive memory consumption during GPT-2 inference compared to the other Keras backends (TensorFlow, PyTorch, JAX). The spike occurs during the model compilation phase, when the Keras model is converted to OpenVINO format, and the resulting memory usage is high enough to make OpenVINO unsuitable for memory-constrained environments.
Problem: OpenVINO uses substantially more memory than the other backends during the compilation/inference phase.
📊 Complete Analysis & Benchmarks
For comprehensive performance comparison, optimization results, and technical details across all Keras backends:
Detailed Performance Report & Memory Optimization Analysis
The report includes cross-backend benchmarks, constant sharing optimization implementation, device scope analysis, and production deployment recommendations.
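A rough way to rerun the cross-backend memory comparison locally is sketched below. The harness itself is an assumption and not part of the linked report; only the backend names, the preset, and the generate call come from the reproduction steps further down. Each backend runs in a fresh child process because KERAS_BACKEND must be set before Keras is imported, so backends cannot be switched within one interpreter.

import subprocess
import sys
import textwrap

# Reproducer executed in a child process; ru_maxrss is reported in
# kilobytes on Linux (the OS used in this report), so /1024 gives MiB.
REPRO = textwrap.dedent("""
    import os, sys, resource
    os.environ["KERAS_BACKEND"] = sys.argv[1]
    import keras_hub
    lm = keras_hub.models.GPT2CausalLM.from_preset("gpt2_medium_en", dtype="float32")
    lm.generate("Hello", max_length=10)
    print("PEAK_MIB", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024)
""")

for backend in ("tensorflow", "torch", "jax", "openvino"):
    result = subprocess.run([sys.executable, "-c", REPRO, backend],
                            capture_output=True, text=True)
    # Pick our marker line out of any framework logging noise.
    peak = [l for l in result.stdout.splitlines() if l.startswith("PEAK_MIB")]
    print(backend, peak[0] if peak else "run failed")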
Step-by-step reproduction
Install Keras from source: https://github.com/keras-team/keras.git
Also apply this keras_hub PR: keras-team/keras-hub#2350
import os
os.environ["KERAS_BACKEND"] = "openvino"  # must be set before keras_hub is imported

import keras_hub

causal_lm = keras_hub.models.GPT2CausalLM.from_preset("gpt2_medium_en", dtype="float32")
output = causal_lm.generate("Hello", max_length=10)  # Memory spike occurs here
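To pinpoint where the spike happens, a minimal instrumentation sketch (assuming psutil is installed; the helper name rss_mib is hypothetical) wraps the same reproducer with resident-set-size readings:

import os
os.environ["KERAS_BACKEND"] = "openvino"

import psutil
import keras_hub

proc = psutil.Process()

def rss_mib():
    # Current resident set size of this process, in MiB
    return proc.memory_info().rss / (1024 ** 2)

print(f"baseline:          {rss_mib():.0f} MiB")
causal_lm = keras_hub.models.GPT2CausalLM.from_preset("gpt2_medium_en", dtype="float32")
print(f"after from_preset: {rss_mib():.0f} MiB")
output = causal_lm.generate("Hello", max_length=10)  # Keras-to-OpenVINO conversion happens here
print(f"after generate:    {rss_mib():.0f} MiB")

Comparing the jump between the last two readings against the same script run under the other backends quantifies the excess described in the summary.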
Issue submission checklist
- I'm reporting a performance issue. It's not a question.
- I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
- There is reproducer code and related data files such as images, videos, models, etc.