@mehtamansi29
Collaborator

This commit adds a new example tutorial demonstrating how to build a Two-Stage Recommender System using keras_rs. The example focuses on a marketing interaction use case (Ad Click Prediction), covering both the Retrieval stage (Two-Tower model) and the Ranking stage.

@gemini-code-assist
Contributor

Summary of Changes

Hello @mehtamansi29, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new, detailed tutorial for implementing a two-stage recommender system within the keras_rs framework. It provides an end-to-end guide for a marketing interaction use case, demonstrating how to effectively predict ad click-through rates. The tutorial walks through data preparation, building a retrieval model to narrow down potential ad candidates, and then developing a ranking model to optimize the final selection, offering a complete solution for personalized ad delivery.

Highlights

  • New Tutorial Added: A new example tutorial has been added, demonstrating how to build a Two-Stage Recommender System using the keras_rs library.
  • Marketing Interaction Use Case: The tutorial focuses on a practical marketing interaction scenario, specifically Ad Click Prediction, to illustrate the recommender system's application.
  • Two-Stage Architecture: The example comprehensively covers both the Retrieval stage, utilizing a Two-Tower model, and the subsequent Ranking stage to refine recommendations.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request introduces a new example tutorial for a Two-Stage Recommender System. The overall structure and explanation are clear and provide a good overview of the system. However, there are several minor issues related to typos, unused imports, and some potentially confusing or inefficient code patterns that could be improved for clarity and robustness. Specifically, there are multiple instances of "Retrival" instead of "Retrieval", some unused imports, and a loss function named bpr_hinge_loss that does not implement BPR Hinge Loss. Additionally, the Python script version of the notebook contains shell commands that are not valid Python syntax.

Comment on lines +394 to +396
"def bpr_hinge_loss(y_true, y_pred):\n",
" margin = 1.0\n",
" return -tf.math.log(tf.nn.sigmoid(y_pred) + 1e-10)\n",

high

The bpr_hinge_loss function is misnamed. The current implementation -tf.math.log(tf.nn.sigmoid(y_pred) + 1e-10) is a form of logistic loss, not BPR Hinge Loss, which typically involves a margin and max(0, margin - (pos_score - neg_score)). Also, margin = 1.0 is defined but not used. Please rename the function to accurately reflect its implementation or implement the actual BPR Hinge Loss.

def pairwise_logistic_loss(y_true, y_pred):
    return -tf.math.log(tf.nn.sigmoid(y_pred) + 1e-10)
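To make the distinction in this comment concrete, here is a minimal, framework-free sketch of the two pairwise losses. It assumes the positive and negative scores are passed separately, whereas the notebook's function receives a single `y_pred`; the function names and signatures are illustrative and are not taken from the PR or from keras_rs.

```python
import math


def pairwise_logistic_loss(pos_score: float, neg_score: float) -> float:
    """BPR-style logistic loss: -log(sigmoid(pos_score - neg_score))."""
    diff = pos_score - neg_score
    # softplus(-diff) equals -log(sigmoid(diff)); this form avoids overflow.
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def pairwise_hinge_loss(pos_score: float, neg_score: float, margin: float = 1.0) -> float:
    """Hinge loss: max(0, margin - (pos_score - neg_score))."""
    return max(0.0, margin - (pos_score - neg_score))


# The hinge loss is exactly zero once the positive item leads by the margin,
# while the logistic loss keeps a small positive value for any finite gap.
hinge_satisfied = pairwise_hinge_loss(2.0, 0.5)
hinge_violated = pairwise_hinge_loss(0.5, 0.2)
logistic_tied = pairwise_logistic_loss(0.0, 0.0)
```

Unlike the unused `margin = 1.0` in the code under review, the margin here actually enters the hinge computation.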

Comment on lines +334 to +335
self.user_tower = user_tower
self.ad_tower = ad_tower

high

The __init__ method of RetrievalModel accepts user_tower_instance and ad_tower_instance as arguments but ignores them and assigns the global user_tower and ad_tower variables instead. This makes the passed arguments redundant and can lead to unexpected behavior if different tower instances are passed in. The constructor should use its arguments.

        self.user_tower = user_tower_instance
        self.ad_tower = ad_tower_instance
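The effect of this bug can be reproduced without Keras. The sketch below uses plain Python stand-ins (`Tower`, `BuggyRetrievalModel`, and `FixedRetrievalModel` are hypothetical names, not classes from the PR) to show that the buggy constructor silently binds the module-level objects regardless of what the caller passes.

```python
class Tower:
    """Stand-in for a tower model; only carries a name for identification."""

    def __init__(self, name: str):
        self.name = name


# Module-level instances, mirroring the globals the notebook defines.
user_tower = Tower("global_user")
ad_tower = Tower("global_ad")


class BuggyRetrievalModel:
    def __init__(self, user_tower_instance, ad_tower_instance):
        # Bug: the arguments are ignored and the globals are used instead.
        self.user_tower = user_tower
        self.ad_tower = ad_tower


class FixedRetrievalModel:
    def __init__(self, user_tower_instance, ad_tower_instance):
        # Fix: store exactly what the caller passed in.
        self.user_tower = user_tower_instance
        self.ad_tower = ad_tower_instance


custom_user = Tower("custom_user")
custom_ad = Tower("custom_ad")
buggy = BuggyRetrievalModel(custom_user, custom_ad)
fixed = FixedRetrievalModel(custom_user, custom_ad)
```

Here `buggy.user_tower` is still the global tower, while `fixed.user_tower` is the instance the caller supplied.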

Comment on lines +327 to +329
margin = 1.0
return -tf.math.log(tf.nn.sigmoid(y_pred) + 1e-10)

high

The bpr_hinge_loss function is misnamed. The current implementation -tf.math.log(tf.nn.sigmoid(y_pred) + 1e-10) is a form of logistic loss, not BPR Hinge Loss, which typically involves a margin and max(0, margin - (pos_score - neg_score)). Also, margin = 1.0 is defined but not used. Please rename the function to accurately reflect its implementation or implement the actual BPR Hinge Loss.

def pairwise_logistic_loss(y_true, y_pred):
    return -tf.math.log(tf.nn.sigmoid(y_pred) + 1e-10)

Comment on lines +84 to +87
pip install -q kaggle
# Download the dataset (requires Kaggle API key in ~/.kaggle/kaggle.json)
kaggle datasets download -d mafrojaakter/ad-click-data --unzip -p ./ad_click_dataset
"""

high

Shell commands like pip install and kaggle datasets download are specific to Jupyter notebooks and will cause a SyntaxError if this Python file is run directly as a script. These lines should be removed or commented out for a pure Python file. Also, !# is an incorrect comment for a shell command.

Suggested change
pip install -q kaggle
# Download the dataset (requires Kaggle API key in ~/.kaggle/kaggle.json)
kaggle datasets download -d mafrojaakter/ad-click-data --unzip -p ./ad_click_dataset
"""
# pip install -q kaggle
# # Download the dataset (requires Kaggle API key in ~/.kaggle/kaggle.json)
# kaggle datasets download -d mafrojaakter/ad-click-data --unzip -p ./ad_click_dataset

Comment on lines +59 to +60
!pip install -q keras-rs
"""

high

Shell commands like !pip install are specific to Jupyter notebooks and will cause a SyntaxError if this Python file is run directly as a script. These lines should be removed or commented out for a pure Python file.

Suggested change
!pip install -q keras-rs
"""
# !pip install -q keras-rs
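The SyntaxError claim is easy to check with the built-in `compile()` function, which parses source text without executing it. The helper name `is_valid_python` below is illustrative only.

```python
def is_valid_python(source: str) -> bool:
    """Return True if the text parses as Python, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


notebook_style = "!pip install -q keras-rs\n"   # IPython shell escape
bare_shell = "pip install -q kaggle\n"          # shell command without the "!"
commented_out = "# !pip install -q keras-rs\n"  # the suggested change

results = {
    "notebook_style": is_valid_python(notebook_style),
    "bare_shell": is_valid_python(bare_shell),
    "commented_out": is_valid_python(commented_out),
}
```

Only the commented-out form parses as plain Python; both the `!`-prefixed line and the bare shell command are rejected.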

Comment on lines +103 to +104
"import tensorflow_datasets as tfds\n",
"from mpl_toolkits.axes_grid1 import make_axes_locatable\n",

medium

The tensorflow_datasets and mpl_toolkits.axes_grid1 imports are not used in this notebook. Consider removing unused imports to keep the code clean.

import pandas as pd
import keras_rs
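Unused imports like these can be spotted mechanically. The following is a rough sketch using the standard-library `ast` module; it only checks whether an imported name is referenced anywhere in the file, so it is far less thorough than a real linter, and `find_unused_imports` is a hypothetical helper rather than part of any tool used in this PR.

```python
import ast


def find_unused_imports(source: str) -> list[str]:
    """Return imported names that are never referenced in the source."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds `a`; `import a.b as c` binds `c`.
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
    # Every bare identifier in the file, including the base of `pd.DataFrame`.
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


sample = '''
import pandas as pd
import tensorflow_datasets as tfds
from mpl_toolkits.axes_grid1 import make_axes_locatable

frame = pd.DataFrame({"a": [1]})
'''

unused = find_unused_imports(sample)
```

Because `ast.parse` never executes the imports, the check runs even when the listed packages are not installed.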

history = retrieval_model.fit(retrieval_train_dataset, epochs=30)

pd.DataFrame(history.history).plot(
subplots=True, layout=(1, 3), figsize=(12, 4), title="Retrival Model Metrics"

medium

Typo: "Retrival Model Metrics" should be "Retrieval Model Metrics".

    subplots=True, layout=(1, 3), figsize=(12, 4), title="Retrieval Model Metrics"

plt.show()

"""
# **Predictions of Retrival Model**

medium

Typo: "Retrival Model" should be "Retrieval Model".

# **Predictions of Retrieval Model**

Comment on lines +440 to +442
Retrieval model only calculates a simple similarity score (Dot Product). It doesn't
account for complex feature interactions.
So we need to build ranking model after words retrival model.

medium

Typo and grammatical error: "Retrival model" should be "Retrieval model", and "after words retrival model" should be "after the retrieval model".

Retrieval model only calculates a simple similarity score (Dot Product). It doesn't
account for complex feature interactions.
So we need to build a ranking model after the retrieval model.
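As a rough illustration of why the second stage exists, the sketch below retrieves candidates by dot product and then re-orders them with a separate scorer. It is plain Python with made-up vectors; `retrieve_top_k`, `rerank`, and the `ranking_scores` table are hypothetical and do not reflect the keras_rs API or the models in this PR.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(user_vec, ad_vecs, k):
    """Stage 1: score every ad by dot product and keep the k best."""
    scored = [(ad_id, dot(user_vec, vec)) for ad_id, vec in ad_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def rerank(candidates, ranking_score):
    """Stage 2: re-score only the retrieved candidates with a richer model."""
    rescored = [(ad_id, ranking_score(ad_id)) for ad_id, _ in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored


user_vec = [1.0, 0.0]
ad_vecs = {"ad_a": [0.9, 0.1], "ad_b": [0.5, 0.5], "ad_c": [-1.0, 0.2]}

# Stand-in for a trained ranking model's click probabilities (assumed values).
ranking_scores = {"ad_a": 0.2, "ad_b": 0.7, "ad_c": 0.9}

candidates = retrieve_top_k(user_vec, ad_vecs, k=2)
final_ranking = rerank(candidates, ranking_scores.get)
```

The ranking stage only sees what retrieval returned, so `ad_c` never reaches it despite its high ranking score, and the order of the two surviving candidates flips.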

top_ads = retrieval_engine.decode_results(scores, indices)[0]
final_ranked_ads = rerank_ads_for_user(sample_user, top_ads, ranking_model)
print(f"User: {sample_user['user_id']}")
print(f"{'Ad ID':<10} | {'Topic':<30} | {'Retrival Score':<11} | {'Rank Probability'}")

medium

Typo: "Retrival Score" should be "Retrieval Score".

print(f"{'Ad ID':<10} | {'Topic':<30} | {'Retrieval Score':<11} | {'Rank Probability'}")

@gemini-code-assist
Contributor

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@sachinprasadhs
Collaborator

It looks like you have added two .ipynb files; please remove the one that is not needed.
