This repository was archived by the owner on Sep 18, 2024. It is now read-only.

Description
What I expect
`Tokenizer` should not tokenize characters listed in `filters`.
What happens
With `char_level=True`, `Tokenizer` tokenizes all characters, including those listed in `filters`.
Code
❯ import keras
❯ text = "ae"
❯ tokenizer = keras.preprocessing.text.Tokenizer(filters="e")  # filter out "e"
❯ tokenizer.fit_on_texts(text)
❯ tokenizer.word_index
{'a': 1}  # "e" is filtered out, as expected
❯ tokenizer = keras.preprocessing.text.Tokenizer(char_level=True, filters="e")
❯ tokenizer.fit_on_texts(text)
❯ tokenizer.word_index
{'a': 1, 'e': 2}  # "e" is tokenized despite the filter
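
A minimal workaround, until the filtering behavior is fixed, is to strip the filter characters from the input yourself before fitting a `char_level` tokenizer. The helper below is a hypothetical sketch (`strip_filters` is not a Keras API) using plain `str.translate`:

```python
# Hypothetical workaround (not part of the Keras API): remove the filter
# characters from the text before passing it to a char_level Tokenizer,
# since the Tokenizer itself ignores `filters` in that mode.
def strip_filters(text, filters):
    # str.maketrans("", "", filters) builds a translation table that
    # deletes every character appearing in `filters`.
    return text.translate(str.maketrans("", "", filters))

print(strip_filters("ae", "e"))  # prints "a"
```

Applying `strip_filters(text, "e")` before `fit_on_texts` makes the char-level tokenizer produce `{'a': 1}`, matching the word-level behavior.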