If I run optim.checkgrad, it fails, because the tensors returned by SoftMaxTree.parameters are not actually the underlying smt.weight and smt.gradWeight tensors.
If I set SoftMaxTree.parameters = nil, then the gradient check passes for both gradInput and gradWeight.
But it looks like a lot of work went into SoftMaxTree.parameters, so I assume it is there for a reason and I'm just not understanding how it should be used?
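For reference, here is roughly how I'm checking whether parameters() hands back the module's real storage. This is just a sketch (it assumes an already-constructed smt instance and uses torch.pointer to compare storage identity); if the pointers differ, checkgrad is perturbing copies rather than the tensors the module actually reads:

```lua
-- Assumes `smt` is a constructed SoftMaxTree instance.
local params, gradParams = smt:parameters()

-- Compare storage identity: true means parameters() exposes the
-- same underlying memory as smt.weight / smt.gradWeight, so a
-- perturbation by optim.checkgrad actually reaches the module.
print(torch.pointer(params[1]:storage()) ==
      torch.pointer(smt.weight:storage()))
print(torch.pointer(gradParams[1]:storage()) ==
      torch.pointer(smt.gradWeight:storage()))
```

In my setup both comparisons come out false unless I nil out SoftMaxTree.parameters, which is what makes me suspect the custom parameters override is the culprit.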