Added DLRM notebook and generated python script #2131
Thanks for the PR! Left some comments, let me know what you think :).
Here are some notes for posterity (we don't need to implement these now in this PR):
- We should, in the future, also think about the specific case of how DLRM implements distributed training. The embeddings are sharded across devices, whereas the MLP layers are DDP'd.
- The paper mentions that they use `torch.nn.EmbeddingBag` instead of `torch.nn.Embedding`. Apparently, `torch.nn.EmbeddingBag` is more efficient than `torch.nn.Embedding` followed by some aggregation op. We should think about this at some later point (maybe `keras_rs.embeddings.DistributedEmbedding` solves this). Anyway, we can worry about this later!
    self.embedding_layers = []
    for feature_name, vocabulary in vocabularies.items():
        self.embedding_layers.append(
            keras.layers.Embedding(
                input_dim=len(vocabulary) + 1,
                output_dim=embedding_dim,
            )
        )
One of the highlights of DLRM is that it can process both categorical and dense features. We should use some dense features present in the MovieLens dataset. Do we have any such features?
The MovieLens 100k-ratings dataset mostly contains categorical features. It also has a user age feature, but it is bucketized, which makes it a categorical feature.
Hmmm, doesn't it have `raw_user_age` as a feature? The reason I'm insisting on this is that the two towers for dense and categorical features are a salient part of DLRM. And do you think we can use a normalised timestamp as a feature? Will that help/does that make sense?
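As a rough illustration of what "normalised" dense features could look like, the sketch below standardises an age column and min-max scales a timestamp column with NumPy. The sample values are made up, the helper names are hypothetical, and the choice of scaling per feature is an assumption rather than anything decided in this thread.

```python
import numpy as np


def standardize(values):
    """Zero-mean, unit-variance scaling; falls back to 1.0 if std is zero."""
    values = np.asarray(values, dtype="float64")
    std = values.std()
    return (values - values.mean()) / (std if std > 0 else 1.0)


def min_max_scale(values):
    """Scale to [0, 1]; falls back to 1.0 if all values are equal."""
    values = np.asarray(values, dtype="float64")
    span = values.max() - values.min()
    return (values - values.min()) / (span if span > 0 else 1.0)


# Made-up sample values standing in for the dataset columns.
raw_user_age = np.array([24.0, 53.0, 23.0, 33.0])
timestamp = np.array([881250949, 891717742, 878887116, 880606923])  # seconds

# Scale in float64 first: casting large timestamps to float32 before
# scaling would lose precision. Cast only the scaled result.
dense_features = np.stack(
    [standardize(raw_user_age), min_max_scale(timestamp)], axis=1
).astype("float32")  # shape (num_examples, num_dense_features)
```

The resulting `(num_examples, 2)` array is the kind of input a dense block would consume alongside the embedded categorical features.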
Sure! Like you mentioned, the two towers for dense and categorical features are a salient part of DLRM. I'll have another look at the code and see if any dense features can be used. Timestamp looks like a suitable dense feature. I'll modify the code and try using that.
And we can remove `user_bucketised_age` and use `raw_user_age`, maybe.
Sure! Will try that
Updated docstrings to match the context according to code
@kharshith-k - let me know when this is ready for another round of review. Thanks!
Thanks! Great work, this looks good to me, overall. Just one major comment on using dense features (it isn't DLRM without the two blocks for dense and categorical features)
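For reference, the two-block structure mentioned above can be sketched as a plain NumPy forward pass: a dense block for the numeric features, one embedding table per categorical feature, pairwise dot-product interactions, and a top layer. Everything here (shapes, parameter names, single-layer blocks, the sigmoid output) is an illustrative assumption, not the model in this PR; a rating-prediction model may well use a different head.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense_layer(x, w, b, relu=True):
    """Affine transform with an optional ReLU."""
    y = x @ w + b
    return np.maximum(y, 0.0) if relu else y


def dlrm_forward(dense_x, cat_ids, params):
    # Dense block: project dense features to the embedding dimension.
    bottom = dense_layer(dense_x, params["bottom_w"], params["bottom_b"])
    # Categorical block: one table lookup per categorical feature.
    embedded = [t[cat_ids[:, i]] for i, t in enumerate(params["tables"])]
    # Stack all vectors: (batch, num_categorical + 1, dim).
    vectors = np.stack([bottom] + embedded, axis=1)
    # Pairwise dot products; keep each unordered pair once.
    gram = vectors @ vectors.transpose(0, 2, 1)
    rows, cols = np.triu_indices(vectors.shape[1], k=1)
    interactions = gram[:, rows, cols]
    # Top block: dense output concatenated with the interactions.
    top_in = np.concatenate([bottom, interactions], axis=1)
    logit = dense_layer(top_in, params["top_w"], params["top_b"], relu=False)
    return 1.0 / (1.0 + np.exp(-logit))


batch, num_dense, dim = 3, 2, 4  # illustrative sizes
vocab_sizes = [5, 7]
num_vectors = len(vocab_sizes) + 1
num_pairs = num_vectors * (num_vectors - 1) // 2

params = {
    "bottom_w": rng.normal(size=(num_dense, dim)),
    "bottom_b": np.zeros(dim),
    "tables": [rng.normal(size=(v, dim)) for v in vocab_sizes],
    "top_w": rng.normal(size=(dim + num_pairs, 1)),
    "top_b": np.zeros(1),
}
dense_x = rng.normal(size=(batch, num_dense))
cat_ids = np.stack(
    [rng.integers(0, v, size=batch) for v in vocab_sizes], axis=1
)
predictions = dlrm_forward(dense_x, cat_ids, params)  # shape (batch, 1)
```

Without the dense block, `vectors` would contain only embeddings and the model would reduce to an embedding-interaction model, which is the point of the review comment.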
.DS_Store
Remove this file
examples/.DS_Store
Please remove this file. You can also add these files to .gitignore, BTW, to avoid these getting added by accident.
examples/keras_rs/.DS_Store
Please remove this file. You can also add these files to `.gitignore`, BTW, to avoid these getting added by accident.
examples/keras_rs/img/.DS_Store
Please remove
examples/keras_rs/ipynb/dlrm.ipynb
There are two copies of this notebook. Let's delete one of them
examples/keras_rs/ipynb/DLRM.ipynb
examples/keras_rs/ipynb/dlrm.ipynb
Added DLRM.ipynb and generated dlrm.py