
@smarie smarie commented Mar 25, 2019

Fixes #103 (This is the same fix that was proposed for liblinear.)

On Windows platforms, liblinear and libsvm have serious convergence issues because of the way random numbers are generated: the maximum random number on Windows is 15 bits (even on 64-bit Windows), i.e. 32767, while on Linux with GCC it is 31 bits, i.e. 2147483647 (and presumably 63 bits, i.e. 9223372036854775807, on some 64-bit systems).
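A common way around this (a sketch of the general technique, not necessarily the exact code in this PR) is to build a wider random value by combining several 15-bit `rand()` draws, so the usable range no longer depends on the platform's `RAND_MAX`. The function name `wide_rand` below is hypothetical:

```c
#include <stdlib.h>
#include <stdint.h>

/* Sketch: combine two 15-bit rand() draws into one 30-bit value.
 * The C standard only guarantees RAND_MAX >= 32767, so masking to
 * 15 bits (& 0x7FFF) is portable across MSVC and glibc. */
static uint32_t wide_rand(void)
{
    return ((uint32_t)(rand() & 0x7FFF) << 15)  /* high 15 bits */
         | (uint32_t)(rand() & 0x7FFF);         /* low 15 bits  */
}
```

The result ranges over 0..1073741823 (2^30 - 1) on every conforming platform, well above the 32767 ceiling of the MSVC `rand()`.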

If I understand correctly, these random numbers are used in the coordinate gradient descent algorithms to pick the next coordinate to act upon. When the dimensionality (e.g. the number of samples) is large, the Windows random number generator struggles to explore all coordinates; in the extreme case of more than 32768 samples, the usual `rand() % l` idiom can never even reach the higher-indexed coordinates.
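To illustrate the extreme case (a standalone simulation, not libsvm code): emulate a 15-bit Windows-style `rand()` and observe that `rand() % l` never selects any coordinate beyond index 32767 when `l` is large. The helper names are hypothetical:

```c
#include <stdlib.h>

/* Emulate the 15-bit output range of the MSVC rand(). */
static int win15_rand(void)
{
    return rand() & 0x7FFF;  /* 0..32767, as on Windows */
}

/* Draw `draws` coordinate indices via the modulo idiom and report the
 * highest index ever selected out of l coordinates. */
int max_selectable_index(int l, int draws)
{
    int max_seen = 0;
    for (int i = 0; i < draws; i++) {
        int idx = win15_rand() % l;  /* coordinate pick */
        if (idx > max_seen)
            max_seen = idx;
    }
    return max_seen;
}
```

With `l = 100000`, no matter how many draws are made, the selected index stays at or below 32767, so roughly two thirds of the coordinates are never visited.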

This is a known bug documented in the liblinear FAQ (strangely enough, not in the libsvm FAQ), but the workaround proposed there was wrong.

I made a patch for this in liblinear years ago; it was approved by several users but never merged: cjlin1/liblinear#28.

Since another user reported the same issue on libsvm as #103, here is the corresponding PR.
Note that I am proposing this fix simultaneously to the scikit-learn project (Python), as they have observed some convergence issues; some of them might be due to this platform-related bug.

…hecks are now static. Regenerated all binaries.


Development

Successfully merging this pull request may close these issues:

#103 — Possible bug: the random value generator used in svm_binary_svc_probability() function will not work well when training data size is large.
