Add support for soft prompts. #889

@arivero

Description

Is your feature request related to a problem? Please describe.

An alternative to fine-tuning a whole model, or only some of its layers, is to fine-tune an ad-hoc prompt made of new tokens. Or, if the position and extra info are irrelevant, one can just prepend an array of "already embedded" extra tokens.

This technique is described in the literature as "soft prompting". It allows very precise control over the number of trainable parameters, which is just the hidden dimension times the number of extra tokens.
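As a quick illustration of that count, here is a minimal sketch. The function name and the sizes (20 extra tokens, hidden dimension 768) are hypothetical and chosen only for the example.

```python
def soft_prompt_param_count(num_soft_tokens: int, hidden_dim: int) -> int:
    """Trainable parameters of a soft prompt: one hidden_dim-sized vector per extra token."""
    return num_soft_tokens * hidden_dim


# Hypothetical sizes, for illustration only.
count = soft_prompt_param_count(num_soft_tokens=20, hidden_dim=768)
print(count)  # 15360
```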

Now, training is subtle. In the first case, one needs to train only part of the embedding weights while keeping the original ones untouched. In the second case, one needs to add a vector that must be concatenated with the input, causing havoc in _keras_mask.
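To make the second case concrete, here is a framework-agnostic NumPy sketch of what the concatenation has to do: prepend the soft prompt to the embedded input and extend the padding mask by the same length so the two stay aligned. The function name `prepend_soft_prompt` and all shapes are assumptions for illustration; this is not an existing Keras or KerasNLP API, and a real layer would also have to propagate the new mask to downstream layers.

```python
import numpy as np


def prepend_soft_prompt(embedded, mask, soft_prompt):
    """Prepend a shared soft prompt to a batch of embedded sequences.

    embedded:    (batch, seq, hidden) float array, output of the embedding layer
    mask:        (batch, seq) boolean padding mask, True for real tokens
    soft_prompt: (n, hidden) float array, the trainable prompt vectors
    """
    batch = embedded.shape[0]
    n = soft_prompt.shape[0]
    # The same prompt is repeated for every example in the batch.
    tiled = np.broadcast_to(soft_prompt, (batch,) + soft_prompt.shape)
    out = np.concatenate([tiled, embedded], axis=1)
    # Prompt positions are never padding, so their mask entries are all True.
    prompt_mask = np.ones((batch, n), dtype=bool)
    new_mask = np.concatenate([prompt_mask, mask.astype(bool)], axis=1)
    return out, new_mask


rng = np.random.default_rng(0)
embedded = rng.normal(size=(2, 5, 8))
mask = np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]], dtype=bool)
soft_prompt = rng.normal(size=(3, 8))

out, new_mask = prepend_soft_prompt(embedded, mask, soft_prompt)
print(out.shape, new_mask.shape)  # (2, 8, 8) (2, 8)
```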

Describe the solution you'd like

A standardized way to do this, as the procedure is general to almost every LLM. I am not sure whether it should take the first form (the whole model is a black box except for the embedding matrix) or the second one (the embedding matrix is untouched, but the "soft prompts" must be concatenated deeper in the model and, in the case of transformers, with the option of doing it before or after summing the position embedding).
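The before/after choice for the position embedding can be sketched as follows, using NumPy and a single unbatched sequence. Both function names are hypothetical, and learned absolute position embeddings stored as a `(max_len, hidden)` table are an assumption; models with other positional schemes would need a different treatment.

```python
import numpy as np


def prompt_before_positions(tokens, soft_prompt, positions):
    """Concatenate first, then add positions: prompt vectors receive position info."""
    x = np.concatenate([soft_prompt, tokens], axis=0)
    return x + positions[: x.shape[0]]


def prompt_after_positions(tokens, soft_prompt, positions):
    """Add positions to the real tokens only, then prepend the prompt unchanged."""
    x = tokens + positions[: tokens.shape[0]]
    return np.concatenate([soft_prompt, x], axis=0)


rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))       # (seq, hidden) token embeddings
soft_prompt = rng.normal(size=(3, 8))  # (n, hidden) trainable prompt
positions = rng.normal(size=(16, 8))   # (max_len, hidden) position table

a = prompt_before_positions(tokens, soft_prompt, positions)
b = prompt_after_positions(tokens, soft_prompt, positions)
print(a.shape, b.shape)  # (8, 8) (8, 8)
```

Note that in the first variant the real tokens are shifted to later positions by the prompt length, while in the second they keep their original positions.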

Describe alternatives you've considered

Both methods can be done by hand. Fine-tuning an extended embedding matrix can be done by combining a mask with a stop_gradient in the product. Additional parameters after the embedding can be added at the cost of some memory wasted across the whole batch, plus rebuilding the _keras_mask. I have not considered using the attention cache, which is probably a good third alternative.
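For the extended embedding matrix, the effect of the mask can be sketched in plain NumPy as a masked SGD step in which only the newly added rows move. In an autodiff framework the same effect is typically obtained by building the matrix as `mask * E + stop_gradient((1 - mask) * E)`. The function name, the learning rate and the sizes below are hypothetical, and the gradient is a stand-in rather than one computed from a real loss.

```python
import numpy as np


def masked_sgd_step(embedding, grad, num_original, lr=0.1):
    """Apply one SGD step to the rows added after the first num_original rows only."""
    # Row mask: 0 for the original (frozen) vocabulary, 1 for the new soft-prompt tokens.
    row_mask = np.zeros((embedding.shape[0], 1), dtype=embedding.dtype)
    row_mask[num_original:] = 1.0
    # Masking the gradient leaves the original rows exactly as they were.
    return embedding - lr * row_mask * grad


embedding = np.ones((6, 4))  # 4 original tokens + 2 new tokens, hidden size 4
grad = np.ones((6, 4))       # stand-in gradient for the whole matrix
updated = masked_sgd_step(embedding, grad, num_original=4)

print(updated[:4].min(), updated[4:].max())  # original rows unchanged, new rows updated
```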

Metadata

Labels

scoping required (Features that need significant design and planning before being actionable)
type:feature (New feature or request)
