
Commit 827f90b

eknag authored and facebook-github-bot committed

changed to proper Xavier initialization (#1927)

Summary: The existing implementation was resulting in a large negative bias, which was killing all gradients through the following relu. https://paperswithcode.com/method/xavier-initialization

Pull Request resolved: #1927
Reviewed By: davidberard98
Differential Revision: D49754019
Pulled By: xuzhao9
fbshipit-source-id: 436676afed9bcc0f464cd1b25465444a98a52b5a

1 parent 3f11b81 commit 827f90b
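
The failure mode described above is easy to reproduce in isolation. The sketch below is illustrative only (a standalone layer, not the DLRM model): a sufficiently negative bias pushes every pre-activation below zero, so the following ReLU emits all zeros and no gradient flows back to the weights.

```python
import torch

# Illustrative sketch, not the benchmark code: an exaggerated negative
# bias drives every pre-activation below zero, so the ReLU outputs all
# zeros and backpropagation delivers zero gradient to the layer.
torch.manual_seed(0)
layer = torch.nn.Linear(64, 32)
with torch.no_grad():
    layer.bias.fill_(-10.0)

x = torch.randn(8, 64)
out = torch.relu(layer(x))
out.sum().backward()

print(out.abs().max().item())                # 0.0 -- every unit is dead
print(layer.weight.grad.abs().max().item())  # 0.0 -- no gradient reaches the weights
```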

File tree

1 file changed (+1, -2 lines)


torchbenchmark/models/dlrm/dlrm_s_pytorch.py

Lines changed: 1 addition & 2 deletions
```diff
@@ -149,8 +149,7 @@ def create_mlp(self, ln, sigmoid_layer):
             mean = 0.0  # std_dev = np.sqrt(variance)
             std_dev = np.sqrt(2 / (m + n))  # np.sqrt(1 / m) # np.sqrt(1 / n)
             W = np.random.normal(mean, std_dev, size=(m, n)).astype(np.float32)
-            std_dev = np.sqrt(1 / m)  # np.sqrt(2 / (m + 1))
-            bt = np.random.normal(mean, std_dev, size=m).astype(np.float32)
+            bt = np.zeros(m).astype(np.float32)  # see upstream PR at https://github.com/facebookresearch/dlrm/pull/358
             # approach 1
             LL.weight.data = torch.tensor(W, requires_grad=True)
             LL.bias.data = torch.tensor(bt, requires_grad=True)
```
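
For reference, here is a minimal, self-contained sketch of the initialization the patch arrives at, assuming `n` is the layer's input width and `m` its output width as in `create_mlp` (the helper name `xavier_linear` is hypothetical, not part of the benchmark):

```python
import numpy as np
import torch

def xavier_linear(n: int, m: int) -> torch.nn.Linear:
    """Hypothetical helper mirroring the patched create_mlp logic:
    Xavier/Glorot normal weights and a zero bias."""
    LL = torch.nn.Linear(n, m)
    std_dev = np.sqrt(2 / (m + n))  # Glorot: sqrt(2 / (fan_in + fan_out))
    W = np.random.normal(0.0, std_dev, size=(m, n)).astype(np.float32)
    bt = np.zeros(m).astype(np.float32)  # zero bias, as in the fix
    LL.weight.data = torch.tensor(W, requires_grad=True)
    LL.bias.data = torch.tensor(bt, requires_grad=True)
    return LL
```

The same effect can be had with PyTorch's built-in initializers: `torch.nn.init.xavier_normal_(LL.weight)` followed by `torch.nn.init.zeros_(LL.bias)`.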
