EDIT: Solved -- it was the stupidity of using different training examples for the gradients vs the optimizer update.
OK this has me totally stumped.
I have a parameter vector, let's call it w.
w = [-1.34554319, 0.86998659, 0.52366061, 2.6723526 , 0.18756115, 0.16547382]
I use compute_gradients to figure out the gradients with respect to w, and it tells me the gradient is:
dw = [-0.0251517 , 0.88050844, 0.80362262, 0.14870925, 0.10019595, 1.33597524]
My learning rate is 0.1. Ergo:
w_new = w - 0.1 * dw
w_new = [-1.34302802, 0.78193575, 0.44329835, 2.65748168, 0.17754156, 0.0318763 ]
You can check the math yourself; it checks out. However, if I run the TensorFlow code and evaluate the value of w_new, I get:
w_new_tf = [-1.27643258, 0.9212401 , 0.09922112, 2.55617223, 0.38039282, 0.15450044]
I honestly have no idea why it's doing this.
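For reference, here is the same update spelled out in NumPy (just the numbers above; nothing TensorFlow-specific):

import numpy as np

w = np.array([-1.34554319, 0.86998659, 0.52366061, 2.6723526, 0.18756115, 0.16547382])
dw = np.array([-0.0251517, 0.88050844, 0.80362262, 0.14870925, 0.10019595, 1.33597524])

# one gradient-descent step with learning rate 0.1
w_new = w - 0.1 * dw
print(w_new)  # matches the hand-computed w_new above, not w_new_tf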
Edit: Let me provide the exact code to show why it doesn't work. It might be due to indexing, as you will see.
Here is the boilerplate starter code.
import numpy as np
import tensorflow as tf
max_item = 331922
max_user = 1581603
k = 6
np.random.seed(0)
_item_biases = np.random.normal(size=max_item)
np.random.seed(0)
_latent_items = np.random.normal(size=(max_item, k))
np.random.seed(0)
_latent_users = np.random.normal(size=(max_user, k))
item_biases = tf.Variable(_item_biases, name='item_biases')
latent_items = tf.Variable(_latent_items, name='latent_items')
latent_users = tf.Variable(_latent_users, name='latent_users')
input_data = tf.placeholder(tf.int64, shape=[3], name='input_data')
Here is the custom objective function.
def objective(data, lam, item_biases, latent_items, latent_users):
    with tf.name_scope('indices'):
        user = data[0]
        rated_item = data[1]
        unrated_item = data[2]
    with tf.name_scope('input_slices'):
        rated_item_bias = tf.gather(item_biases, rated_item, name='rated_item_bias')
        unrated_item_bias = tf.gather(item_biases, unrated_item, name='unrated_item_bias')
        rated_latent_item = tf.gather(latent_items, rated_item, name='rated_latent_item')
        unrated_latent_item = tf.gather(latent_items, unrated_item, name='unrated_latent_item')
        latent_user = tf.gather(latent_users, user, name='latent_user')
    with tf.name_scope('bpr_opt'):
        difference = tf.subtract(rated_item_bias, unrated_item_bias, 'bias_difference')
        ld = tf.subtract(rated_latent_item, unrated_latent_item, 'latent_item_difference')
        latent_difference = tf.reduce_sum(tf.multiply(ld, latent_user), name='latent_difference')
        total_difference = tf.add(difference, latent_difference, name='total_difference')
    with tf.name_scope('obj'):
        obj = tf.sigmoid(total_difference, name='activation')
    with tf.name_scope('regularization'):
        reg = lam * tf.reduce_sum(rated_item_bias**2)
        reg += lam * tf.reduce_sum(unrated_item_bias**2)
        reg += lam * tf.reduce_sum(rated_latent_item**2)
        reg += lam * tf.reduce_sum(unrated_latent_item**2)
        reg += lam * tf.reduce_sum(latent_user**2)
    with tf.name_scope('final'):
        final_obj = -tf.log(obj) + reg
    return final_obj
Here is some boilerplate code to actually minimize the function. At two points I do a sess.run call on the tf.Variables to see how the values have changed.
obj = objective(input_data, 0.05, item_biases, latent_items, latent_users)
optimizer = tf.train.GradientDescentOptimizer(0.1)
trainer = optimizer.minimize(obj)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
citem_biases, clatent_items, clatent_users = \
    sess.run([item_biases, latent_items, latent_users])
print (clatent_users[1490103]) # [-1.34554319, 0.86998659, 0.52366061, 2.6723526 , 0.18756115, 0.16547382]
cvalues = sess.run([trainer, obj], feed_dict={input_data:[1490103, 278755, 25729]})
citem_biases, clatent_items, clatent_users = \
    sess.run([item_biases, latent_items, latent_users])
print (clatent_users[1490103]) #[-1.27643258, 0.9212401 , 0.09922112, 2.55617223, 0.38039282, 0.15450044]
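For context, optimizer.minimize(obj) is just compute_gradients followed by apply_gradients, so as long as the same feed_dict drives both, the resulting update should match the manual calculation. Roughly:

# what optimizer.minimize(obj) does under the hood
grads_and_vars = optimizer.compute_gradients(obj, var_list=tf.trainable_variables())
trainer = optimizer.apply_gradients(grads_and_vars)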
Finally, here is some code to actually get the gradients. These gradients are double-checked against hand-derived gradients, so they are correct. Sorry for the ugliness of the code; it's a blatant copy & paste of another SO answer:
grads_and_vars = optimizer.compute_gradients(obj, tf.trainable_variables())
sess = tf.Session()
sess.run(tf.global_variables_initializer())
gradients_and_vars = sess.run(grads_and_vars, feed_dict={input_data:[1490103, 278830, 140306]})  # NB: this triplet is not the one fed to the trainer above
print (gradients_and_vars[2][0]) #[-0.0251517 , 0.88050844, 0.80362262, 0.14870925, 0.10019595, 1.33597524]
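As the EDIT at the top says, this is where the mismatch comes from: the triplet fed here is not the one fed to the trainer. Evaluating the gradients with the trainer's triplet, e.g. as below, makes w - 0.1 * dw line up with w_new_tf:

# same triplet that was fed to the trainer above
gradients_and_vars = sess.run(grads_and_vars, feed_dict={input_data: [1490103, 278755, 25729]})
print(gradients_and_vars[2][0])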
You did not provide complete code, but I ran a similar example and it worked for me as it should. Here is my code:
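Something along these lines (a minimal sketch: a single get_variable parameter and a simple squared-error loss stand in for the full model; the exact names and loss are only illustrative, the point is that the same inputs drive compute_gradients, the manual update, and the training step):

import numpy as np
import tensorflow as tf

# a single 6-dimensional parameter, analogous to one row of latent_users
w = tf.get_variable('w', shape=[6], initializer=tf.random_normal_initializer())

# a simple squared-error loss standing in for the BPR objective
target = tf.constant([1.0, 0.0, -1.0, 2.0, 0.5, -0.5])
loss = tf.reduce_sum((w - target) ** 2)

learning_rate = np.float32(0.1)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(loss, [w])
trainer = optimizer.apply_gradients(grads_and_vars)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

w_before, dw = sess.run([w, grads_and_vars[0][0]])
print(w_before)
print(dw)
print(w_before - learning_rate * dw)  # manual gradient-descent step

sess.run(trainer)
print(sess.run(w))  # TensorFlow's own update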
Here is the output (obviously it will be a bit different each time, due to the random initialization of get_variable):
The last two lines are identical, as you would expect.