Skip to content

PCA power-iteration does one more iteration than asked (off-by-1) #1910

@jucor

Description

@jucor

Reporting here a subtle internal math bug for tracking only.
To be clear: this does not affect the convergence of the PCA: the PCA is still valid at the end once it converges :)

But, if you ask for N power iterations, the current clojure code does N+1 iterations instead. This is due to the way the loop is coded in clojure in:

(loop [iters iters start-vector start-vector last-eigval 0]
(let [product-vector (xtxr data start-vector)
eigval (matrix/length product-vector)
normed (matrix/normalise product-vector)]
(if (or (= iters 0) (= eigval last-eigval))
normed
(recur (dec iters) normed eigval))))))

The let statement is evaluated even when iters is 0.
And on line 55 we return normed upon termination, therefore the result of evaluating the let block.

If we wanted to fix it to do exactly iters iteration, and not iters+1, line 55 should return start-vector instead of normed.

I suggest we do NOT fix this this for now, until I have full clarity of how partial-pca is computed. It will most matter much when we use a small number of iteration.

I see that

(defn partial-pca
"This function takes in the rating matrix, the current pca and a set of row indices and
computes the partial pca off of those, returning a lambda that will take the latest PCA
and make the update on that in case there have been other mini batch updates since started"
[mat pca indices & {:keys [n-comps iters learning-rate]
:or {n-comps 2 iters 10 learning-rate 0.01}}]

defaults to 10 iterations, but I am unclear how often that default is used, knowing that
(def base-conv-update-graph
"Base of all conversation updates; handles default update opts and does named matrix updating"
{:opts' (plmb/fnk [opts]
"Merge in opts with the following defaults"
;; TODO Answer and resolve this question:
;; QUESTION Does it make senes to have the defaults here or in the config.edn or both duplicated?
(merge {:n-comps 2 ; does our code even generalize to others?
:pca-iters 100
:base-iters 100
:base-k 100
:max-k 5
:group-iters 100
;; These three in particular we should be able to tune quickly
:max-ptpts 100000
:max-cmts 10000
:group-k-buffer 4}

defines a default of 100.

As to the python port in #1893 , for now I am reproducing the behaviour bug-for-bug, see 47c217b .

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions