EDIT: Solved -- it was the stupidity of using different training examples for the gradients vs the optimizer update.
OK this has me totally stumped.
I have a parameter vector, let's call it w.
w = [-1.34554319, 0.86998659, 0.52366061, 2.6723526 , 0.18756115, 0.16547382]
I use compute_gradients to figure out the gradients with respect to w, and it tells me the gradient is:
dw = [-0.0251517 , 0.88050844, 0.80362262, 0.14870925, 0.10019595, 1.33597524]
My learning rate is 0.1. Ergo:
w_new = w - 0.1 * dw
w_new = [-1.34302802, 0.78193575, 0.44329835, 2.65748168, 0.17754156, 0.0318763 ]
You can check the math yourself; it checks out. However, if I run the TensorFlow code and evaluate the value of w_new, I get:
w_new_tf = [-1.27643258, 0.9212401 , 0.09922112, 2.55617223, 0.38039282, 0.15450044]
I honestly have no idea why it's doing this.
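For reference, here is the same update spelled out in NumPy (just the numbers above; nothing TensorFlow-specific):

import numpy as np

w = np.array([-1.34554319, 0.86998659, 0.52366061, 2.6723526, 0.18756115, 0.16547382])
dw = np.array([-0.0251517, 0.88050844, 0.80362262, 0.14870925, 0.10019595, 1.33597524])

# one gradient-descent step with learning rate 0.1
w_new = w - 0.1 * dw
print(w_new)  # matches the hand-computed w_new above, not w_new_tf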
Edit: Let me provide the exact code to show why it doesn't work. It might be due to indexing, as you will see.
Here is the boilerplate starter code.
import numpy as np
import tensorflow as tf
max_item = 331922
max_user = 1581603
k = 6
np.random.seed(0)
_item_biases = np.random.normal(size=max_item)
np.random.seed(0)
_latent_items = np.random.normal(size=(max_item, k))
np.random.seed(0)
_latent_users = np.random.normal(size=(max_user, k))
item_biases = tf.Variable(_item_biases, name='item_biases')
latent_items = tf.Variable(_latent_items, name='latent_items')
latent_users = tf.Variable(_latent_users, name='latent_users')
input_data = tf.placeholder(tf.int64, shape=[3], name='input_data')
Here is the custom objective function.
def objective(data, lam, item_biases, latent_items, latent_users):
    with tf.name_scope('indices'):
        user = data[0]
        rated_item = data[1]
        unrated_item = data[2]
    with tf.name_scope('input_slices'):
        rated_item_bias = tf.gather(item_biases, rated_item, name='rated_item_bias')
        unrated_item_bias = tf.gather(item_biases, unrated_item, name='unrated_item_bias')
        rated_latent_item = tf.gather(latent_items, rated_item, name='rated_latent_item')
        unrated_latent_item = tf.gather(latent_items, unrated_item, name='unrated_latent_item')
        latent_user = tf.gather(latent_users, user, name='latent_user')
    with tf.name_scope('bpr_opt'):
        difference = tf.subtract(rated_item_bias, unrated_item_bias, 'bias_difference')
        ld = tf.subtract(rated_latent_item, unrated_latent_item, 'latent_item_difference')
        latent_difference = tf.reduce_sum(tf.multiply(ld, latent_user), name='latent_difference')
        total_difference = tf.add(difference, latent_difference, name='total_difference')
    with tf.name_scope('obj'):
        obj = tf.sigmoid(total_difference, name='activation')
    with tf.name_scope('regularization'):
        reg = lam * tf.reduce_sum(rated_item_bias**2)
        reg += lam * tf.reduce_sum(unrated_item_bias**2)
        reg += lam * tf.reduce_sum(rated_latent_item**2)
        reg += lam * tf.reduce_sum(unrated_latent_item**2)
        reg += lam * tf.reduce_sum(latent_user**2)
    with tf.name_scope('final'):
        final_obj = -tf.log(obj) + reg
    return final_obj
Here is some boilerplate code to actually minimize the function. At two points I do a sess.run call on the tf.Variables to see how the values have changed.
obj = objective(input_data, 0.05, item_biases, latent_items, latent_users)
optimizer = tf.train.GradientDescentOptimizer(0.1)
trainer = optimizer.minimize(obj)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
citem_biases, clatent_items, clatent_users = \
    sess.run([item_biases, latent_items, latent_users])
print (clatent_users[1490103]) # [-1.34554319, 0.86998659, 0.52366061, 2.6723526 , 0.18756115, 0.16547382]
cvalues = sess.run([trainer, obj], feed_dict={input_data:[1490103, 278755, 25729]})
citem_biases, clatent_items, clatent_users = \
    sess.run([item_biases, latent_items, latent_users])
print (clatent_users[1490103]) #[-1.27643258, 0.9212401 , 0.09922112, 2.55617223, 0.38039282, 0.15450044]
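For context, optimizer.minimize(obj) is just compute_gradients followed by apply_gradients, so as long as the same feed_dict drives both, the resulting update should match the manual calculation. Roughly:

# what optimizer.minimize(obj) does under the hood
grads_and_vars = optimizer.compute_gradients(obj, var_list=tf.trainable_variables())
trainer = optimizer.apply_gradients(grads_and_vars)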
Finally, here is some code to actually get the gradients. These gradients are double-checked against hand-derived gradients, so they are correct. Sorry for the ugliness of the code; it's a blatant copy & paste of another SO answer:
grads_and_vars = optimizer.compute_gradients(obj, tf.trainable_variables())
sess = tf.Session()
sess.run(tf.global_variables_initializer())
gradients_and_vars = sess.run(grads_and_vars, feed_dict={input_data:[1490103, 278830, 140306]})  # NB: this triplet is not the one fed to the trainer above
print (gradients_and_vars[2][0]) #[-0.0251517 , 0.88050844, 0.80362262, 0.14870925, 0.10019595, 1.33597524]
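As the EDIT at the top says, this is where the mismatch comes from: the triplet fed here is not the one fed to the trainer. Evaluating the gradients with the trainer's triplet, e.g. as below, makes w - 0.1 * dw line up with w_new_tf:

# same triplet that was fed to the trainer above
gradients_and_vars = sess.run(grads_and_vars, feed_dict={input_data: [1490103, 278755, 25729]})
print(gradients_and_vars[2][0])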
You did not provide complete code, but I ran a similar example and it worked for me as it should. Here is my code:
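Something along these lines (a minimal sketch: a single get_variable parameter and a simple squared-error loss stand in for the full model; the exact names and loss are only illustrative, the point is that the same inputs drive compute_gradients, the manual update, and the training step):

import numpy as np
import tensorflow as tf

# a single 6-dimensional parameter, analogous to one row of latent_users
w = tf.get_variable('w', shape=[6], initializer=tf.random_normal_initializer())

# a simple squared-error loss standing in for the BPR objective
target = tf.constant([1.0, 0.0, -1.0, 2.0, 0.5, -0.5])
loss = tf.reduce_sum((w - target) ** 2)

learning_rate = np.float32(0.1)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(loss, [w])
trainer = optimizer.apply_gradients(grads_and_vars)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

w_before, dw = sess.run([w, grads_and_vars[0][0]])
print(w_before)
print(dw)
print(w_before - learning_rate * dw)  # manual gradient-descent step

sess.run(trainer)
print(sess.run(w))  # TensorFlow's own update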
Here is the output (obviously it will be a bit different each time, due to the random initialization of get_variable):
The last two lines are identical, as you would expect.