 # Benchmarking for quantum machine learning models

 This repository contains tools to compare the performance of near-term quantum machine learning (QML)
-as well as standard classical machine learning models on supervised learning tasks.
+as well as standard classical machine learning models on supervised and generative learning tasks.

 It is based on pipelines using [PennyLane](https://pennylane.ai/) for the simulation of quantum circuits,
 [JAX](https://jax.readthedocs.io/en/latest/index.html) for training,
@@ -39,12 +39,12 @@ Dependencies of this package can be installed in your environment by running
 pip install -r requirements.txt
 ```

-## Adding a custom model
+## Adding a custom classifier

 We use the [Scikit-learn API](https://scikit-learn.org/stable/developers/develop.html) to create
 models and perform hyperparameter search.

-A minimal template for a new quantum model is as follows, and can be stored
+A minimal template for a new quantum classifier is as follows, and can be stored
 in `qml_benchmarks/models/my_model.py`:

 ```python
@@ -61,18 +61,23 @@ class MyModel(BaseEstimator, ClassifierMixin):

         # reproducibility is ensured by creating a numpy PRNG and using it for all
         # subsequent random functions.
-        self._random_state = random_state
-        self._rng = np.random.default_rng(random_state)
+        self.random_state = random_state
+        self.rng = np.random.default_rng(random_state)

         # define data-dependent attributes
         self.params_ = None
         self.n_qubits_ = None
+
+    def initialize(self, args):
+        """
+        Initialize the model if necessary.
+        """
+        # ... your code here ...

     def fit(self, X, y):
         """Fit the model to data X and labels y.

         Add your custom training loop here and store the trained model parameters in `self.params_`.
-        Set the data-dependent attributes, such as `self.n_qubits_`.

         Args:
             X (array_like): Data of shape (n_samples, n_features)
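
The rename of `_random_state` to `random_state` in the template above is more than cosmetic. scikit-learn's `BaseEstimator.get_params`, which `clone` and hyperparameter search rely on, reads every `__init__` argument back from an attribute with the same name. The sketch below uses two hypothetical toy classes (`RenamedModel` and `PrivateModel`, not part of the package) to show the difference; it assumes a recent scikit-learn version, where a missing attribute makes `get_params` raise `AttributeError`.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone


class RenamedModel(BaseEstimator):
    """Hypothetical toy model following the updated template."""

    def __init__(self, hyperparam1="some_value", random_state=42):
        self.hyperparam1 = hyperparam1
        # attribute name matches the __init__ argument, so get_params can find it
        self.random_state = random_state
        self.rng = np.random.default_rng(random_state)


class PrivateModel(BaseEstimator):
    """Hypothetical toy model following the old template."""

    def __init__(self, hyperparam1="some_value", random_state=42):
        self.hyperparam1 = hyperparam1
        # attribute name differs from the __init__ argument
        self._random_state = random_state
        self._rng = np.random.default_rng(random_state)


# clone() rebuilds the estimator from get_params(), as hyperparameter search does
cloned = clone(RenamedModel(random_state=7))
print(cloned.get_params())  # {'hyperparam1': 'some_value', 'random_state': 7}

params_error = None
try:
    PrivateModel(random_state=7).get_params()
except AttributeError as err:
    params_error = err
print("old template:", repr(params_error))
```

Data-dependent attributes such as `params_` keep their trailing underscore, which is the scikit-learn convention for attributes that are set during `fit` rather than passed to `__init__`.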
@@ -146,9 +151,86 @@ model.fit(X_train, y_train)
 print(model.score(X_test, y_test))
 ```

+
+## Adding a custom generative model
+
+The minimal template for a new generative model closely follows the classifier template.
+Since there are no labels, `y` defaults to `None` throughout so that the scikit-learn API still applies.
+
+```python
+import numpy as np
+
+from sklearn.base import BaseEstimator
+
+
+class MyModel(BaseEstimator):
+    def __init__(self, hyperparam1="some_value", random_state=42):
+
+        # store hyperparameters as attributes
+        self.hyperparam1 = hyperparam1
+
+        # reproducibility is ensured by creating a numpy PRNG and using it for all
+        # subsequent random functions.
+        self.random_state = random_state
+        self.rng = np.random.default_rng(random_state)
+
+        # define data-dependent attributes
+        self.params_ = None
+        self.n_qubits_ = None
+
+    def initialize(self, args):
+        """
+        Initialize the model if necessary.
+        """
+        # ... your code here ...
+
+    def fit(self, X, y=None):
+        """Fit the model to data X.
+
+        Add your custom training loop here and store the trained model parameters in `self.params_`.
+
+        Args:
+            X (array_like): Data of shape (n_samples, n_features)
+            y (array_like): not used (no labels)
+        """
+        # ... your code here ...
+
+    def sample(self, num_samples):
+        """Sample from the generative model.
+
+        Args:
+            num_samples (int): number of points to sample
+
+        Returns:
+            array_like: sampled points
+        """
+        # ... your code here ...
+
+        return samples
+
+    def score(self, X, y=None):
+        """An optional custom score function to be used with hyperparameter optimization.
+        Args:
+            X (array_like): Data of shape (n_samples, n_features)
+            y: unused (no labels for generative models)
+
+        Returns:
+            (float): score for the dataset X
+        """
+        # ... your code here ...
+        return score
+```
+
+If the model samples binary data, it is recommended to construct models that sample binary strings with entries in $\{0, 1\}$
+(rather than $\pm1$-valued strings) to match the datasets designed for generative models.
+Energy-based models can be constructed by replacing the multilayer perceptron in `DeepEBM` with
+any other differentiable network written in `flax`.
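
To illustrate the generative template, the sketch below fills it in with a deliberately simple classical model: every bit is an independent Bernoulli variable whose probability is estimated from the data. The class name `IndependentBernoulli`, the smoothing hyperparameter `alpha`, and the choice of mean log-likelihood as the `score` are assumptions made for this example, not part of the package. It maps a $\pm1$-valued dataset to $\{0, 1\}$ bits before training, samples $\{0, 1\}$-valued strings, and shows that scikit-learn's `GridSearchCV` can tune it without labels, because with no `scoring` argument it ranks candidates using the estimator's own `score` method.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import GridSearchCV


class IndependentBernoulli(BaseEstimator):
    """Hypothetical classical baseline following the generative-model template."""

    def __init__(self, alpha=1.0, random_state=42):
        # alpha is a Laplace-smoothing pseudo-count (assumed hyperparameter)
        self.alpha = alpha
        self.random_state = random_state
        self.rng = np.random.default_rng(random_state)
        self.params_ = None

    def fit(self, X, y=None):
        X = np.asarray(X)
        # smoothed estimate of P(bit_j = 1) for every position j
        self.params_ = (X.sum(axis=0) + self.alpha) / (X.shape[0] + 2 * self.alpha)
        return self

    def sample(self, num_samples):
        # bit j is 1 with probability params_[j], giving {0, 1}-valued strings
        uniform = self.rng.random((num_samples, self.params_.shape[0]))
        return (uniform < self.params_).astype(int)

    def score(self, X, y=None):
        # mean log-likelihood of the data under the fitted model (higher is better)
        X = np.asarray(X)
        log_p, log_q = np.log(self.params_), np.log(1 - self.params_)
        return float(np.mean(X @ log_p + (1 - X) @ log_q))


# a small ±1-valued dataset, mapped to {0, 1} bits before training
X_pm = np.array([[1, -1, 1], [1, 1, -1], [1, -1, -1], [1, -1, 1]])
X = (X_pm + 1) // 2

model = IndependentBernoulli(alpha=1.0).fit(X)
samples = model.sample(5)
print(samples.shape)  # (5, 3)
print(model.score(X))

# no labels are passed; GridSearchCV falls back to model.score
search = GridSearchCV(IndependentBernoulli(), {"alpha": [0.1, 1.0]}, cv=2).fit(X)
print(search.best_params_)
```

A quantum generative model would replace the body of `fit`, `sample` and `score` while keeping the same signatures.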
+
 ## Datasets

-The `qml_benchmarks.data` module provides generating functions to create datasets for binary classification.
+The `qml_benchmarks.data` module provides generating functions to create datasets for binary classification and
+generative learning.
+
 A generating function can be used like this:

 ```python
@@ -158,7 +240,7 @@ X, y = generate_two_curves(n_samples=200, n_features=4, degree=3, noise=0.1, off
 ```

 Note that some datasets might have different return data structures, for example if the train/test split
-is performed by the generating function.
+is performed by the generating function. If the dataset does not include labels, `y = None` is returned.

 The original datasets used in the paper can be generated by running the scripts in the `paper/benchmarks` folder,
 such as:
@@ -172,15 +254,18 @@ This will create a new folder in `paper/benchmarks` containing the datasets.
 ## Running hyperparameter optimization

 In the folder `scripts` we provide an example that can be used to
-generate results for a hyperparameter search for any model and dataset. The script
+generate results for a hyperparameter search for any model and dataset. The script works
+for both classifiers and generative models. It
 can be run as

 ```
-python run_hyperparameter_search.py --classifier-name "DataReuploadingClassifier" --dataset-path "my_dataset.csv"
+python run_hyperparameter_search.py --model "DataReuploadingClassifier" --dataset-path "my_dataset.csv"
 ```

-where `my_dataset.csv` is a CSV file containing the training data such that each column is a feature
-and the last column is the target.
+where `my_dataset.csv` is a CSV file containing the training data. For classification problems, every column except
+the last should correspond to a feature, and the last column to the target. For generative learning, each row
+should correspond to a binary string that specifies a single data sample, and the model should implement a `score`
+method, since it is used to compare hyperparameter settings.
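
A minimal sketch of the two file layouts described above, written with NumPy. It assumes comma-separated values without a header row, classification labels in $\{-1, 1\}$, and one column per bit for the binary strings; these are assumptions, so check how `run_hyperparameter_search.py` parses the file before relying on them. The file names are hypothetical, and the files are written to a temporary directory.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
out_dir = tempfile.mkdtemp()

# Classification: one column per feature, label in the last column (labels assumed in {-1, 1})
X = rng.normal(size=(6, 4))
y = rng.choice([-1, 1], size=6)
clf_path = os.path.join(out_dir, "my_dataset.csv")
np.savetxt(clf_path, np.column_stack([X, y]), delimiter=",")

# Generative learning: one row per sample, one {0, 1} column per bit (assumed layout)
bitstrings = rng.integers(0, 2, size=(6, 8))
gen_path = os.path.join(out_dir, "my_generative_dataset.csv")
np.savetxt(gen_path, bitstrings, delimiter=",", fmt="%d")

# Reading the files back recovers the split between features and target
data = np.loadtxt(clf_path, delimiter=",")
X_read, y_read = data[:, :-1], data[:, -1]
bits_read = np.loadtxt(gen_path, delimiter=",", dtype=int)
print(X_read.shape, y_read.shape, bits_read.shape)  # (6, 4) (6,) (6, 8)
```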

 Unless otherwise specified, the hyperparameter grid is loaded from `qml_benchmarks/hyperparameter_settings.py`.
 One can override the default grid of hyperparameters by specifying the hyperparameter list,
@@ -189,7 +274,7 @@ For example, for the `DataReuploadingClassifier` we can run:

 ```
 python run_hyperparameter_search.py \
-    --classifier-name DataReuploadingClassifier \
+    --model DataReuploadingClassifier \
     --dataset-path "my_dataset.csv" \
     --n_layers 1 2 \
     --observable_type "single" "full" \