
Conversation

@GeoffNN (Contributor) commented Mar 26, 2021

  • Added data normalization with StandardScaler (a minimal sketch follows below)
  • Modified the loss function to exactly match the loss function in objective.py

This doesn't fix the issue mentioned here.

This still needs work for the stochastic case.
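
As a rough illustration of the normalization step, here is a minimal sketch (assuming dense NumPy input; this is not the PR's actual code):

```python
# Minimal sketch of the preprocessing this PR adds (illustrative only):
# standardize each feature to zero mean and unit variance before the
# data reaches the solver.
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(100, 4))  # dense feature matrix

scaler = StandardScaler()
X_std = scaler.fit_transform(X)  # per column: (x - mean) / std

assert np.allclose(X_std.mean(axis=0), 0.0, atol=1e-10)
assert np.allclose(X_std.std(axis=0), 1.0, atol=1e-10)
```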

@GeoffNN marked this pull request as draft on March 26, 2021 at 10:09.
Review thread on solvers/chop.py (outdated):

```python
# Standardize before converting to tensors; scaling X only after
# building self.X would leave the stored tensor unscaled.
scaler = StandardScaler()
X = scaler.fit_transform(X)
self.X = torch.tensor(X).to(device)
self.y = torch.tensor(y > 0, dtype=torch.float64).to(device)
```
Contributor:
This should not be done in the solver but in the dataset, as it changes the problem.

Contributor (Author):
Can it be an option in the dataset?

Contributor:
The libsvmdata package does some preprocessing like this. I would just offer the right dataset (normalized or not).

@GeoffNN (Contributor, Author) commented Mar 28, 2021

Right, it seems this was the cause of the infinity values (since the scaler wasn't applied at eval time). Modifying.
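
To make the failure mode concrete, here is a hedged sketch (hypothetical, self-contained code, not benchopt's or the PR's actual API): weights fitted on standardized features produce enormous margins when the loss is evaluated on the raw features, so exp() overflows and the reported loss becomes infinite.

```python
# Sketch of the eval-time mismatch described above (names hypothetical).
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_raw = rng.normal(loc=1e4, scale=5e3, size=(200, 5))
y = rng.integers(0, 2, size=200) * 2 - 1  # labels in {-1, +1}

# The solver fits on standardized features...
X_scaled = StandardScaler().fit_transform(X_raw)
w = np.linalg.lstsq(X_scaled, y.astype(float), rcond=None)[0]

def logistic_loss(X, y, w):
    # log(1 + exp(-y * Xw)); exp overflows for very large margins
    return np.log1p(np.exp(-y * (X @ w))).mean()

print(logistic_loss(X_scaled, y, w))  # finite, sensible value
# ...but the objective evaluates on the raw, unscaled features:
print(logistic_loss(X_raw, y, w))     # overflow warning, typically inf
```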

Contributor:
@GeoffNN don't change the solver but the dataset; otherwise solvers will not be comparable.

@GeoffNN (Contributor, Author) commented Mar 28, 2021

Done; I added standardization as a parameter in Covtype, so it runs on both versions of the dataset (see the sketch below). We can also just keep the standardized version if you prefer.
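
For context, a minimal sketch of what such a parameterized dataset can look like, loosely in benchopt's Dataset style; the class body and the fetch_libsvm call are assumptions, not the PR's verbatim code:

```python
# Hypothetical benchopt-style dataset exposing standardization as a
# parameter; names and layout are assumptions, not the PR's exact code.
from benchopt import BaseDataset
from libsvmdata import fetch_libsvm
from sklearn.preprocessing import StandardScaler


class Dataset(BaseDataset):
    name = "covtype"

    # benchopt instantiates the dataset once per parameter value, so the
    # benchmark runs on both the raw and the standardized variant.
    parameters = {'standardized': [False, True]}

    def get_data(self):
        X, y = fetch_libsvm('covtype.binary')
        if self.standardized:
            # assumes X is dense; for sparse input, prefer
            # StandardScaler(with_mean=False) to avoid densifying
            X = StandardScaler().fit_transform(X)
        return dict(X=X, y=y)
```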

@agramfort (Contributor) left a comment

Cool!

Do you have any new figure you can share?

Thanks a lot @GeoffNN

Comment on lines +16 to +17:

```python
requirements = ['pip:https://github.com/openopt/chop/archive/master.zip',
                'pip:scikit-learn']
```

Contributor:
Suggested change:

```python
requirements = ['pip:https://github.com/openopt/chop/archive/master.zip']
```

Comment on the standardization block:

```python
if self.standardized:
    scaler = StandardScaler()
    X = scaler.fit_transform(X)
```
Contributor:
I would simplify and always standardize.

Contributor (Author):
Ok -- only for Covtype? It would make sense to do it for all non-sparse datasets, right?

@GeoffNN (Contributor, Author) commented Mar 28, 2021

I'm still working on the stochastic variant -- I'll share the figures for both ASAP.

@agramfort (Contributor) commented:

I don't know about Madelon but the others should already be scaled.
