Description
OpenVINO Version
No response
Operating System
Ubuntu 22.04 (LTS)
Device used for inference
CPU
OpenVINO installation
PyPI
Programming Language
Python
Hardware Architecture
x86 (64 bits)
Model used
GPT-2
Model quantization
No
Performance issue description
Summary
The OpenVINO backend exhibits excessive memory consumption during GPT-2 inference compared to the other Keras backends (TensorFlow, PyTorch, JAX). The spike occurs during the model compilation phase, when the Keras model is converted to OpenVINO format, and the resulting memory usage is high enough to make OpenVINO unsuitable for memory-constrained environments.
Problem: OpenVINO uses substantially more memory than the other backends during the compilation/inference phase.
📊 Complete Analysis & Benchmarks
For comprehensive performance comparison, optimization results, and technical details across all Keras backends:
Detailed Performance Report & Memory Optimization Analysis
The report includes cross-backend benchmarks, constant sharing optimization implementation, device scope analysis, and production deployment recommendations.
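A rough way to rerun the cross-backend memory comparison locally is sketched below. The harness itself is an assumption and not part of the linked report; only the backend names, the preset, and the generate call come from the reproduction steps further down. Each backend runs in a fresh child process because KERAS_BACKEND must be set before Keras is imported, so backends cannot be switched within one interpreter.

import subprocess
import sys
import textwrap

# Reproducer executed in a child process; ru_maxrss is reported in
# kilobytes on Linux (the OS used in this report), so /1024 gives MiB.
REPRO = textwrap.dedent("""
    import os, sys, resource
    os.environ["KERAS_BACKEND"] = sys.argv[1]
    import keras_hub
    lm = keras_hub.models.GPT2CausalLM.from_preset("gpt2_medium_en", dtype="float32")
    lm.generate("Hello", max_length=10)
    print("PEAK_MIB", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024)
""")

for backend in ("tensorflow", "torch", "jax", "openvino"):
    result = subprocess.run([sys.executable, "-c", REPRO, backend],
                            capture_output=True, text=True)
    # Pick our marker line out of any framework logging noise.
    peak = [l for l in result.stdout.splitlines() if l.startswith("PEAK_MIB")]
    print(backend, peak[0] if peak else "run failed")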
Step-by-step reproduction
Install Keras from source: https://github.com/keras-team/keras.git
Also apply this keras_hub PR: keras-team/keras-hub#2350
import os
os.environ["KERAS_BACKEND"] = "openvino"  # must be set before keras_hub is imported

import keras_hub

causal_lm = keras_hub.models.GPT2CausalLM.from_preset("gpt2_medium_en", dtype="float32")
output = causal_lm.generate("Hello", max_length=10)  # Memory spike occurs here
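To pinpoint where the spike happens, a minimal instrumentation sketch (assuming psutil is installed; the helper name rss_mib is hypothetical) wraps the same reproducer with resident-set-size readings:

import os
os.environ["KERAS_BACKEND"] = "openvino"

import psutil
import keras_hub

proc = psutil.Process()

def rss_mib():
    # Current resident set size of this process, in MiB
    return proc.memory_info().rss / (1024 ** 2)

print(f"baseline:          {rss_mib():.0f} MiB")
causal_lm = keras_hub.models.GPT2CausalLM.from_preset("gpt2_medium_en", dtype="float32")
print(f"after from_preset: {rss_mib():.0f} MiB")
output = causal_lm.generate("Hello", max_length=10)  # Keras-to-OpenVINO conversion happens here
print(f"after generate:    {rss_mib():.0f} MiB")

Comparing the jump between the last two readings against the same script run under the other backends quantifies the excess described in the summary.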
Issue submission checklist
- I'm reporting a performance issue. It's not a question.
- I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
- There is reproducer code and related data files such as images, videos, models, etc.