Momentum, RMSprop and Adam Optimizers

So, I’m having difficulty getting RMSprop and Adam to work.

I’ve correctly implemented Momentum as an optimization algorithm: compared with plain Gradient Descent, the cost goes down much faster with Momentum, and for the same number of epochs the test-set accuracy is also higher.

Here is the code:

# only momentum
elif name == 'momentum':

    # calculate momentum for every layer
    for i in range(self.number_of_layers - 1):
        self.v[f'dW{i}'] = beta1 * self.v[f'dW{i}'] + (1 - beta1) * self.gradients[f'dW{i}']
        self.v[f'db{i}'] = beta1 * self.v[f'db{i}'] + (1 - beta1) * self.gradients[f'db{i}']

    # update parameters
    for i in range(self.number_of_layers - 1):
        self.weights[i] = self.weights[i] - self.learning_rate * self.v[f'dW{i}']
        self.biases[i] = self.biases[i] - self.learning_rate * self.v[f'db{i}']
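
For reference, here is a minimal standalone sketch of the same momentum update, pulled out of the class so it can be tried on a single parameter array in isolation. The name momentum_step and its default values are just illustrative, not part of my code:

import numpy as np

# hypothetical standalone momentum step for one parameter array W
# v is the exponential moving average of the gradients dW
def momentum_step(W, dW, v, learning_rate=0.01, beta1=0.9):
    v = beta1 * v + (1 - beta1) * dW
    W = W - learning_rate * v
    return W, v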

I’ve tried everything I could come up with to implement both RMSprop and Adam, with no success. The code is below. Any help on why it isn’t working would be much appreciated!

# only rms
elif name == 'rms':

    # calculate rmsprop for every layer
    for i in range(self.number_of_layers - 1):
        self.s[f'dW{i}'] = beta2 * self.s[f'dW{i}'] + (1 - beta2) * self.gradients[f'dW{i}']**2
        self.s[f'db{i}'] = beta2 * self.s[f'db{i}'] + (1 - beta2) * self.gradients[f'db{i}']**2

    # update parameters
    for i in range(self.number_of_layers - 1):
        self.weights[i] = self.weights[i] - self.learning_rate * self.gradients[f'dW{i}'] / (np.sqrt(self.s[f'dW{i}']) + epsilon)
        self.biases[i] = self.biases[i] - self.learning_rate * self.gradients[f'db{i}'] / (np.sqrt(self.s[f'db{i}']) + epsilon)
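
For comparison, the same kind of standalone sketch for RMSprop (again, rmsprop_step and its defaults are only illustrative):

import numpy as np

# hypothetical standalone RMSprop step for one parameter array W
# s is the exponential moving average of the squared gradients
def rmsprop_step(W, dW, s, learning_rate=0.01, beta2=0.999, epsilon=1e-8):
    s = beta2 * s + (1 - beta2) * dW**2
    W = W - learning_rate * dW / (np.sqrt(s) + epsilon)
    return W, s

From what I understand, dividing by np.sqrt(s) + epsilon makes the step roughly sign-sized rather than gradient-sized, so RMSprop and Adam may need a different (usually smaller) learning rate than the one I use for momentum; mentioning it in case it matters. My Adam attempt is below.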
# adam optimizer
elif name == 'adam':

    # counter
    # this resets every time an epoch finishes
    self.t += 1

    # loop through layers
    for i in range(self.number_of_layers - 1):

        # calculate v and s
        self.v[f'dW{i}'] = beta1 * self.v[f'dW{i}'] + (1 - beta1) * self.gradients[f'dW{i}']
        self.v[f'db{i}'] = beta1 * self.v[f'db{i}'] + (1 - beta1) * self.gradients[f'db{i}']
        self.s[f'dW{i}'] = beta2 * self.s[f'dW{i}'] + (1 - beta2) * np.square(self.gradients[f'dW{i}'])
        self.s[f'db{i}'] = beta2 * self.s[f'db{i}'] + (1 - beta2) * np.square(self.gradients[f'db{i}'])

        # bias correction
        self.v1[f'dW{i}'] = self.v[f'dW{i}'] / (1 - beta1**self.t)
        self.v1[f'db{i}'] = self.v[f'db{i}'] / (1 - beta1**self.t)
        self.s1[f'dW{i}'] = self.s[f'dW{i}'] / (1 - beta2**self.t)
        self.s1[f'db{i}'] = self.s[f'db{i}'] / (1 - beta2**self.t)

    # update parameters
    for i in range(self.number_of_layers - 1):
        self.weights[i] = self.weights[i] - self.learning_rate * np.divide(self.v1[f'dW{i}'], (np.sqrt(self.s1[f'dW{i}']) + epsilon))
        self.biases[i] = self.biases[i] - self.learning_rate * np.divide(self.v1[f'db{i}'], (np.sqrt(self.s1[f'db{i}']) + epsilon))
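
And the equivalent standalone sketch for Adam (adam_step and its defaults are illustrative). One difference from my class: in the standard formulation of Adam, t is the running count of every update step over the whole of training and is never reset, whereas my self.t resets whenever an epoch finishes.

import numpy as np

# hypothetical standalone Adam step for one parameter array W
# v: EMA of gradients, s: EMA of squared gradients
# t: total number of update steps taken so far (never reset)
def adam_step(W, dW, v, s, t, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    v = beta1 * v + (1 - beta1) * dW
    s = beta2 * s + (1 - beta2) * np.square(dW)
    v_hat = v / (1 - beta1**t)          # bias correction
    s_hat = s / (1 - beta2**t)
    W = W - learning_rate * v_hat / (np.sqrt(s_hat) + epsilon)
    return W, v, s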
# additional information
# epsilon = 1e-8
# beta1 = 0.9
# beta2 = 0.999
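
As a quick sanity check, the hypothetical adam_step sketch above (whose defaults match these constants) can be run on a one-dimensional quadratic to confirm that the parameter moves towards the minimum:

import numpy as np

# minimise f(w) = 0.5 * w**2, whose gradient is w
w = np.array([5.0])
v = np.zeros_like(w)
s = np.zeros_like(w)
for t in range(1, 1001):                 # t starts at 1 and keeps counting across all steps
    dw = w                               # gradient of 0.5 * w**2
    w, v, s = adam_step(w, dw, v, s, t, learning_rate=0.05)
print(w)                                 # should end up close to 0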