Skip to content

Conversation

@lpizzinidev
Copy link
Contributor

Updates the "Automatic Speech Recognition using CTC" example to support Keras v3.

Copy link
Contributor

@fchollet fchollet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!



# Reference: https://github.com/keras-team/keras/blob/ec67b760ba25e1ccc392d288f7d8c6e9e153eea2/keras/legacy/backend.py#L674-L711
def ctc_label_dense_to_sparse(labels, label_lengths):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than rewriting this code, you can just use the built-in Keras 3 loss function keras.losses.CTC. I expect it will also enable the code example to run with all backends.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback 👍
After removing the legacy code we still have some references to tf in the example and I'm not sure this can be made backend-agnostic.
Please let me know if I should substitute the remaining tf references.

Copy link
Contributor

@fchollet fchollet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you! You can add the generated files.

opt = keras.optimizers.Adam(learning_rate=1e-4)
# Compile the model and return
model.compile(optimizer=opt, loss=CTCLoss)
model.compile(optimizer=opt, loss=keras.losses.ctc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer using CTC() (ends up running the same thing but it's more idiomatic)

input_length = tf.cast(input_length, tf.int32)

if greedy:
(decoded, log_prob) = tf.nn.ctc_greedy_decoder(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we're going to have to use TF for this and ctc_beam_search_decoder I guess, unless we implement them as new backend ops.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, thanks for the feedback 👍
I created an issue to address this.
Please let me know if I should change the description or add/remove details.
Thanks!

@github-actions
Copy link

github-actions bot commented Aug 2, 2024

This PR is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale label Aug 2, 2024
@github-actions
Copy link

This PR was closed because it has been inactive for 28 days. Please reopen if you'd like to work on this further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants