Zero Collision Hash Benchmark Framework #3127

Open
wants to merge 1 commit into
base: main

Conversation

lizhouyu
Contributor

Differential Revision: D77033290

@facebook-github-bot added the CLA Signed label Jun 23, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D77033290

lizhouyu added a commit to lizhouyu/torchrec that referenced this pull request Jul 6, 2025
Summary: Pull Request resolved: pytorch#3127

Differential Revision: D77033290
@lizhouyu lizhouyu force-pushed the export-D77033290 branch from 729264d to f95b8fa Compare July 6, 2025 01:34
@lizhouyu lizhouyu force-pushed the export-D77033290 branch from f95b8fa to c44d954 Compare July 6, 2025 04:35
@lizhouyu lizhouyu force-pushed the export-D77033290 branch from c44d954 to 1a691d5 Compare July 6, 2025 04:59
@lizhouyu lizhouyu force-pushed the export-D77033290 branch from 1a691d5 to 61b76e8 Compare July 6, 2025 05:33
lizhouyu added a commit to lizhouyu/torchrec that referenced this pull request Jul 8, 2025
Summary: Pull Request resolved: pytorch#3127

Differential Revision: D77033290
@lizhouyu lizhouyu force-pushed the export-D77033290 branch from 61b76e8 to 115412b Compare July 8, 2025 00:09
@lizhouyu lizhouyu force-pushed the export-D77033290 branch from 115412b to c3663f5 Compare July 9, 2025 17:56
lizhouyu added a commit to lizhouyu/torchrec that referenced this pull request Jul 9, 2025
Summary: Pull Request resolved: pytorch#3127

Differential Revision: D77033290
lizhouyu added a commit to lizhouyu/torchrec that referenced this pull request Jul 14, 2025
Summary: Pull Request resolved: pytorch#3127

Differential Revision: D77033290
lizhouyu added a commit to lizhouyu/torchrec that referenced this pull request Jul 15, 2025
Summary:
Pull Request resolved: https://github.com/pytorch/torchrec/pull/3127

### Context
The high-level intention of this diff is to open-source (OSS) the benchmark testbed for hash collision management algorithms. The goal is to provide a testbed for accurate and fair benchmarking.

### Major Changes

**Summary**

This diff introduces a new benchmark testbed for hash collision management modules. The framework includes several new files and modules:

* `torchrec/distributed/benchmark/benchmark_zch/models`: a folder for the configuration files and wrapper classes of the models to benchmark.
* `torchrec/distributed/benchmark/benchmark_zch/data`: a folder for the configuration files and wrapper classes of the datasets used for benchmarking.
* `torchrec/modules/mc_adapter.py`: a new module that implements the MC Adapter algorithm, which integrates hash collision management modules into the embedding collection modules of existing and future models in a plug-and-play manner. The adapter mirrors all the APIs of the embedding collection and embedding bag collection modules, and calls a managed collision module before the embedding lookup.
* `torchrec/distributed/benchmark/benchmark_zch/data`: a new module for data-related functions, currently empty.
* `torchrec/distributed/benchmark/benchmark_zch/benchmark_zch.py`: a new script that runs the ZCH benchmark.
* `torchrec/distributed/benchmark/benchmark_zch/plots`: a new module for plotting training metrics, including an example notebook `plot_training_metrics.ipynb`.

The diff includes a significant amount of new code, including model definitions, data loading, and plotting functions. The `benchmark_zch.py` script is the main entry point for running the ZCH benchmark.

### Key Features

* Implements the MC Adapter algorithm for ZCH models.
* Includes a new benchmark framework for evaluating ZCH models.
* Provides data loading and plotting functions for training metrics.

### Implications

This diff provides a comprehensive framework for evaluating and optimizing ZCH models. The MC Adapter algorithm is a key component of this framework, and the benchmark script provides a convenient way to run and compare different ZCH models. The plotting functions allow for easy visualization of training metrics, facilitating model optimization and improvement.

Differential Revision: D77033290
lizhouyu added a commit to lizhouyu/torchrec that referenced this pull request Jul 15, 2025
Summary:
Pull Request resolved: https://github.com/pytorch/torchrec/pull/3127

### Context
The high-level intention of this diff is to open-source (OSS) the benchmark testbed for hash collision management algorithms. The goal is to provide a testbed for accurate and fair benchmarking.

### Major Changes

**Summary**

This diff introduces a new benchmark testbed for hash collision management modules. The framework includes several new files and modules:

* `torchrec/distributed/benchmark/benchmark_zch/models`: a folder for the configuration files and wrapper classes of the models to benchmark.
* `torchrec/distributed/benchmark/benchmark_zch/data`: a folder for the configuration files and wrapper classes of the datasets used for benchmarking.
* `torchrec/modules/mc_adapter.py`: a new module that implements the MC Adapter algorithm, which integrates hash collision management modules into the embedding collection modules of existing and future models in a plug-and-play manner. The adapter mirrors all the APIs of the embedding collection and embedding bag collection modules, and calls a managed collision module before the embedding lookup.
* `torchrec/distributed/benchmark/benchmark_zch/benchmark_zch.py`: the main entry point of the benchmark testbed.
* `torchrec/distributed/benchmark/benchmark_zch/plots`: a folder of plotting notebooks for training and evaluation metrics.

### Key Features

* Implements the MC Adapter algorithm for ZCH models.
* Includes a new benchmark framework for evaluating hash collision management models.
* Provides data loading and plotting functions for training metrics.
* Metrics are written to TensorBoard during training so that users can inspect results in real time.

### Implications

This diff provides a comprehensive framework for evaluating and optimizing hash collision management models. The MC Adapter algorithm is a key component of this framework, and the benchmark script provides a unified, convenient way to run and compare different hash collision management models. The plotting functions allow for easy visualization of training metrics, facilitating model optimization and improvement.

Differential Revision: D77033290
lizhouyu added a commit to lizhouyu/torchrec that referenced this pull request Jul 16, 2025
lizhouyu added a commit to lizhouyu/torchrec that referenced this pull request Jul 17, 2025
Summary:
Pull Request resolved: https://github.com/pytorch/torchrec/pull/3127

### Context
The high-level intention of this diff is to open-source (OSS) the benchmark testbed for hash collision management algorithms. The goal is to provide a testbed for accurate and fair benchmarking.

### Major Changes

**Summary**

This diff introduces a new benchmark testbed for hash collision management modules. The framework includes several new files and modules:

* `torchrec/distributed/benchmark/benchmark_zch/models`: a folder for the configuration files and wrapper classes of the models to benchmark.
* `torchrec/distributed/benchmark/benchmark_zch/data`: a folder for the configuration files and wrapper classes of the datasets used for benchmarking. It also includes a pre-hash script, `sparse_kuairand_dataset.py`, which takes the KuaiRand dataset as input and distributes the input values evenly across the input hash space.
* `torchrec/modules/mc_adapter.py`: a new module that implements the MC Adapter algorithm, which integrates hash collision management modules into the embedding collection modules of existing and future models in a plug-and-play manner. The adapter mirrors all the APIs of the embedding collection and embedding bag collection modules, and calls a managed collision module before the embedding lookup.
* `torchrec/distributed/benchmark/benchmark_zch/benchmark_zch.py`: the main entry point of the benchmark testbed.
* `torchrec/distributed/benchmark/benchmark_zch/plots`: a folder of plotting notebooks for training and evaluation metrics.
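
As a rough illustration of the adapter idea in the `mc_adapter.py` bullet, the pure-Python sketch below wraps a toy embedding table so that a collision-management step remaps raw IDs before the lookup. Every name in it (`ToyEmbeddingTable`, `ToyZeroCollisionRemapper`, `ToyMCAdapter`) is hypothetical and is not the TorchRec API; the real module works on TorchRec embedding collections and is likely more involved.

```python
from typing import Dict, List


class ToyEmbeddingTable:
    """Hypothetical stand-in for an embedding collection: row index -> vector."""

    def __init__(self, num_rows: int, dim: int) -> None:
        # Deterministic toy weights: every entry of row r equals float(r).
        self.weights = [[float(r)] * dim for r in range(num_rows)]

    def forward(self, ids: List[int]) -> List[List[float]]:
        return [self.weights[i] for i in ids]


class ToyZeroCollisionRemapper:
    """Hypothetical managed-collision step: each new raw ID gets its own free row."""

    def __init__(self, num_rows: int) -> None:
        self.num_rows = num_rows
        self.slot_of: Dict[int, int] = {}

    def remap(self, raw_ids: List[int]) -> List[int]:
        slots = []
        for raw in raw_ids:
            # Assign the next unused row while the table still has room.
            if raw not in self.slot_of and len(self.slot_of) < self.num_rows:
                self.slot_of[raw] = len(self.slot_of)
            # Unseen IDs arriving after the table is full fall back to modulo
            # hashing, where collisions become possible again.
            slots.append(self.slot_of.get(raw, raw % self.num_rows))
        return slots


class ToyMCAdapter:
    """Exposes the same forward() as the table, but remaps IDs first."""

    def __init__(self, table: ToyEmbeddingTable, remapper: ToyZeroCollisionRemapper) -> None:
        self.table = table
        self.remapper = remapper

    def forward(self, raw_ids: List[int]) -> List[List[float]]:
        return self.table.forward(self.remapper.remap(raw_ids))


table = ToyEmbeddingTable(num_rows=4, dim=2)
adapter = ToyMCAdapter(table, ToyZeroCollisionRemapper(num_rows=4))
vectors = adapter.forward([1004, 2008, 1004])
```

With plain modulo hashing, the raw IDs 1004 and 2008 would both land on row 0 of a four-row table. In this sketch they receive separate rows, and the repeated ID 1004 is mapped to the same row both times.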

### Key Features

* Implements the MC Adapter algorithm for ZCH models.
* Includes a new benchmark framework for evaluating hash collision management models.
* Provides data loading and plotting functions for training metrics.
* Metrics are written to TensorBoard during training so that users can inspect results in real time.
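
On the data-loading side, the pre-hash step described for `sparse_kuairand_dataset.py` can be pictured with the sketch below. It is only an assumption about how such a step might work, not the script's actual method: the helper names `pre_hash` and `bucket_counts`, the `salt` parameter, and the choice of BLAKE2b are all hypothetical. The point it shows is that densely clustered raw IDs end up spread roughly evenly over a large hash space.

```python
import hashlib
from typing import List


def pre_hash(raw_id: int, hash_space: int, salt: str = "feature") -> int:
    """Map a raw ID to a pseudo-uniform position in [0, hash_space)."""
    digest = hashlib.blake2b(f"{salt}:{raw_id}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % hash_space


def bucket_counts(ids: List[int], hash_space: int, num_buckets: int) -> List[int]:
    """Count how many IDs fall into each equal-width slice of the hash space."""
    counts = [0] * num_buckets
    for i in ids:
        counts[i * num_buckets // hash_space] += 1
    return counts


hash_space = 2**31
raw_ids = list(range(10_000))  # dense IDs clustered at the low end of the space
hashed_ids = [pre_hash(i, hash_space) for i in raw_ids]

raw_counts = bucket_counts(raw_ids, hash_space, num_buckets=8)
hashed_counts = bucket_counts(hashed_ids, hash_space, num_buckets=8)
```

Here `raw_counts` puts all 10,000 IDs in the first of the eight slices, while `hashed_counts` is close to uniform across them.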

### Implications

This diff provides a comprehensive framework for evaluating and optimizing hash collision management models. The MC Adapter algorithm is a key component of this framework, and the benchmark script provides a unified, convenient way to run and compare different hash collision management models. The plotting functions allow for easy visualization of training metrics, facilitating model optimization and improvement.
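
To make "compare different hash collision management models" concrete, one quantity such a comparison could look at is a collision rate. The definition below is an assumption made for illustration (the fraction of distinct raw IDs that share a slot with at least one other raw ID) and is not necessarily a metric the benchmark reports; `collision_rate` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set


def collision_rate(raw_ids: Iterable[int], remap: Callable[[int], int]) -> float:
    """Fraction of distinct raw IDs that share a slot with another distinct raw ID."""
    ids_per_slot: Dict[int, Set[int]] = defaultdict(set)
    for raw in raw_ids:
        ids_per_slot[remap(raw)].add(raw)
    total = sum(len(ids) for ids in ids_per_slot.values())
    if total == 0:
        return 0.0
    collided = sum(len(ids) for ids in ids_per_slot.values() if len(ids) > 1)
    return collided / total


ids = [0, 4, 8, 1, 2, 3]
modulo_rate = collision_rate(ids, lambda raw: raw % 4)  # 0, 4 and 8 share slot 0
identity_rate = collision_rate(ids, lambda raw: raw)    # every ID has its own slot
```

In this toy input, three of the six IDs collide under modulo-4 hashing, so `modulo_rate` is 0.5, while the collision-free mapping gives 0.0.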

Reviewed By: aporialiao

Differential Revision: D77033290
lizhouyu added a commit to lizhouyu/torchrec that referenced this pull request Jul 17, 2025
facebook-github-bot pushed a commit that referenced this pull request Jul 17, 2025

fbshipit-source-id: 298a1c6e1ab858641992db7362ccf227725ec12b
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported