
How to use generate() with inputs_embeds #395

@liechtym

Description

I hope this is the right place to ask this question. Let me know if I need to move to another repo.

Currently I'm using NeuronModelForCausalLM.

I have a use case where I need to be able to do the following:

  1. Generate input embeddings from token ids
  2. Modify those embeddings
  3. Run inference from the modified embeddings

I am able to do steps 1 & 2 currently using the following:

from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForCausalLM

model_id = 'aws-neuron/Llama-2-7b-chat-hf-seqlen-2048-bs-1'
tokenizer = AutoTokenizer.from_pretrained(model_id)
llama_model = NeuronModelForCausalLM.from_pretrained(model_id)

token_ids = tokenizer('Some prompt', return_tensors='pt').input_ids
embedded_tokens = llama_model.model.chkpt_model.model.embed_tokens(token_ids)

### Code to modify embedded_tokens

However, as far as I can tell, generation from these modified embeddings is not possible with llama_model.generate().

When I pass the 'inputs_embeds' keyword argument and set input_ids=None, I get the following:

ValueError: The following `model_kwargs` are not used by the model: ['inputs_embeds']

If this is not currently possible with NeuronModelForCausalLM.generate(), is there a way to work around it manually? If so, could you provide an example?
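For reference, the kind of manual workaround I have in mind is a greedy-decoding loop that bypasses generate(): embed the prompt, modify the embeddings, then repeatedly run a forward pass and append the embedding of each predicted token. Below is a self-contained toy sketch of that loop; the tiny "model" (EMBED table, LM_HEAD projection, mean-pool forward) is purely illustrative and not the Neuron API:

```python
# Manual greedy decoding from (possibly modified) input embeddings.
# EMBED, LM_HEAD, and forward() are toy stand-ins for the real model;
# only the loop structure is what I'd hope to reproduce with Neuron.

VOCAB, HIDDEN = 4, 3

# Toy embedding table (VOCAB x HIDDEN) and output projection (HIDDEN x VOCAB).
EMBED = [[0.1 * (i + j) for j in range(HIDDEN)] for i in range(VOCAB)]
LM_HEAD = [[0.2 * (i - j) for j in range(VOCAB)] for i in range(HIDDEN)]

def forward(embeds):
    # Toy forward pass: mean-pool the sequence, then project to vocab logits.
    pooled = [sum(step[d] for step in embeds) / len(embeds) for d in range(HIDDEN)]
    return [sum(pooled[d] * LM_HEAD[d][v] for d in range(HIDDEN)) for v in range(VOCAB)]

token_ids = [1, 2]
inputs_embeds = [list(EMBED[t]) for t in token_ids]   # step 1: embed tokens
inputs_embeds[0][0] += 0.5                            # step 2: modify embeddings

generated = list(token_ids)
for _ in range(3):                                    # step 3: decode greedily
    logits = forward(inputs_embeds)
    next_id = max(range(VOCAB), key=logits.__getitem__)
    generated.append(next_id)
    inputs_embeds.append(list(EMBED[next_id]))        # feed back its embedding

print(generated)
```

With a real model the forward pass would need to accept embeddings directly and manage the KV cache across steps, which is exactly the part I'm unsure how to do with NeuronModelForCausalLM.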

Thanks very much for your help!
