
How to use generate() with inputs_embeds #395

@liechtym

Description

I hope this is the right place to ask this question. Let me know if I need to move to another repo.

Currently I'm using NeuronModelForCausalLM.

I have a use case where I need to be able to do the following:

  1. Generate input embeddings from token ids
  2. Modify those embeddings
  3. Run inference from the modified embeddings

I am able to do steps 1 & 2 currently using the following:

from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForCausalLM

model_id = 'aws-neuron/Llama-2-7b-chat-hf-seqlen-2048-bs-1'
tokenizer = AutoTokenizer.from_pretrained(model_id)
llama_model = NeuronModelForCausalLM.from_pretrained(model_id)

token_ids = tokenizer('Some prompt', return_tensors='pt').input_ids
embedded_tokens = llama_model.model.chkpt_model.model.embed_tokens(token_ids)

### Code to modify embedded_tokens

However, as far as I can tell, generation from these modified embeddings is not possible with llama_model.generate().

When I pass the 'inputs_embeds' keyword argument and set input_ids=None, I get the following:

ValueError: The following `model_kwargs` are not used by the model: ['inputs_embeds']

If this is not currently possible with NeuronModelForCausalLM.generate(), is there a way to work around it manually? If so, could you provide an example?
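For reference, the kind of manual workaround I have in mind is a greedy-decoding loop that bypasses generate(): embed the prompt, modify the embeddings, then repeatedly run a forward pass and append the embedding of each predicted token. Below is a self-contained toy sketch of that loop; the tiny "model" (EMBED table, LM_HEAD projection, mean-pool forward) is purely illustrative and not the Neuron API:

```python
# Manual greedy decoding from (possibly modified) input embeddings.
# EMBED, LM_HEAD, and forward() are toy stand-ins for the real model;
# only the loop structure is what I'd hope to reproduce with Neuron.

VOCAB, HIDDEN = 4, 3

# Toy embedding table (VOCAB x HIDDEN) and output projection (HIDDEN x VOCAB).
EMBED = [[0.1 * (i + j) for j in range(HIDDEN)] for i in range(VOCAB)]
LM_HEAD = [[0.2 * (i - j) for j in range(VOCAB)] for i in range(HIDDEN)]

def forward(embeds):
    # Toy forward pass: mean-pool the sequence, then project to vocab logits.
    pooled = [sum(step[d] for step in embeds) / len(embeds) for d in range(HIDDEN)]
    return [sum(pooled[d] * LM_HEAD[d][v] for d in range(HIDDEN)) for v in range(VOCAB)]

token_ids = [1, 2]
inputs_embeds = [list(EMBED[t]) for t in token_ids]   # step 1: embed tokens
inputs_embeds[0][0] += 0.5                            # step 2: modify embeddings

generated = list(token_ids)
for _ in range(3):                                    # step 3: decode greedily
    logits = forward(inputs_embeds)
    next_id = max(range(VOCAB), key=logits.__getitem__)
    generated.append(next_id)
    inputs_embeds.append(list(EMBED[next_id]))        # feed back its embedding

print(generated)
```

With a real model the forward pass would need to accept embeddings directly and manage the KV cache across steps, which is exactly the part I'm unsure how to do with NeuronModelForCausalLM.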

Thanks very much for your help!
