Conversation

@kernelmachine
Contributor

This PR addresses #52, adding WordPiece tokenization tools from Hugging Face's tokenizers library.
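For context, here is a rough sketch of what WordPiece tokenization does to a single word: greedy longest-match-first splitting against a vocabulary, with continuation pieces marked by a `##` prefix. This is a toy illustration only. It is not the code added in this PR and not the tokenizers implementation; the function name `wordpiece_tokenize`, its parameters, and `toy_vocab` are made up for the example.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Greedy longest-match-first split of one word into subword pieces.

    Hypothetical helper for illustration; not part of any library.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it appears in the vocab.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Non-initial pieces carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matched at this position: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary, assumed for the example only.
toy_vocab = {"token", "##ization", "##ize", "word", "##piece"}

print(wordpiece_tokenize("tokenization", toy_vocab))  # ['token', '##ization']
print(wordpiece_tokenize("wordpiece", toy_vocab))     # ['word', '##piece']
print(wordpiece_tokenize("xyz", toy_vocab))           # ['[UNK]']
```

A real tokenizer would also handle normalization, pre-tokenization into words, and a learned vocabulary, none of which is shown here.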

kernelmachine and others added 29 commits October 20, 2019 15:33
Former-commit-id: cec4164
Former-commit-id: 5f3e396
Former-commit-id: 531873c
Former-commit-id: 8f01d54
Former-commit-id: fabff1f
Former-commit-id: 512325b
Former-commit-id: 196ce8d
Former-commit-id: 52db73b
Former-commit-id: caf3c62
Former-commit-id: 3164ef5
Former-commit-id: d84220f
Former-commit-id: 969ab37
Former-commit-id: 2885a20
Former-commit-id: ce77830
Former-commit-id: 32e91f2
Former-commit-id: ded18a2
Former-commit-id: e634ba4
Former-commit-id: 42f21ab
Former-commit-id: 503eb50
Former-commit-id: 9d89406
Former-commit-id: d2d7620
Former-commit-id: 4b3b678
Former-commit-id: 6b586f6
Former-commit-id: b969b66
Former-commit-id: ab9bacd
@kernelmachine force-pushed the wordpiece-tokenization branch from ab9bacd to 823114d on July 29, 2020 20:06

2 participants