Is there a way to only update some of the variables during an eager execution update step? Consider this minimal working example:
import tensorflow as tf

tf.enable_eager_execution()
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
x = tf.Variable([1.0, 2.0])

def train(x):
    with tf.GradientTape() as tape:
        loss = x[0]**2 + x[1]**2 + 1/(x[0] + x[1])
    variables = [x]
    grads = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(grads, variables))

for _ in range(2000):
    train(x)

print(x.numpy())
This converges to [0.5, 0.5]. I'd like to pin x[0] to its initial value while keeping everything else the way it is. What I've tried so far:
- Adding an x[0].assign(1.0) operation to the training step, which works but grows the graph unnecessarily.
- variables = [x[:-1]], which gives ValueError: No gradients provided for any variable: ['tf.Tensor([1.], shape=(1,), dtype=float32)']
- grads = [grads[0][1:]], which gives tensorflow.python.framework.errors_impl.InvalidArgumentError: var and delta do not have the same shape [2] [1] [Op:ResourceApplyGradientDescent]
- Another variation, which gives TypeError: 'NoneType' object is not subscriptable
For this MWE I can easily use two separate variables, but I'm interested in the generic case where I only want to update a known slice of an array.
You can set the gradient of the indices you don't want to update to 0. In the code snippet below, the mask tensor indicates which elements we want to update (value 1) and which elements we want to leave untouched (value 0).
import tensorflow as tf

tf.enable_eager_execution()
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
x = tf.Variable([1.0, 2.0])
mask = tf.constant([0.0, 1.0])  # 0 = frozen, 1 = trainable

def train(x):
    with tf.GradientTape() as tape:
        loss = x[0]**2 + x[1]**2 + 1/(x[0] + x[1])
    variables = [x]
    # Zero out the gradient entries of the frozen elements.
    grads = [g * mask for g in tape.gradient(loss, variables)]
    optimizer.apply_gradients(zip(grads, variables))

for _ in range(100):
    train(x)

print(x.numpy())
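The masked update can also be sanity-checked without TensorFlow. The sketch below hand-rolls the same gradient descent in plain NumPy, with the analytic gradient of the loss written out by hand, and shows that x[0] stays pinned at 1.0 while x[1] converges:

```python
import numpy as np

# Masked gradient descent on f(x) = x0^2 + x1^2 + 1/(x0 + x1),
# mirroring the TensorFlow snippet above: mask[0] = 0 freezes x[0].
x = np.array([1.0, 2.0])
mask = np.array([0.0, 1.0])
lr = 0.01

def grad(x):
    # Analytic gradient of the loss.
    g_sum = -1.0 / (x[0] + x[1]) ** 2  # d/dxi of 1/(x0 + x1)
    return np.array([2 * x[0] + g_sum, 2 * x[1] + g_sum])

for _ in range(2000):
    x -= lr * mask * grad(x)

print(x)  # x[0] is still exactly 1.0; x[1] has moved to its constrained minimum
```

Because the masked entry's update is exactly zero, x[0] never drifts, even after thousands of steps.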
Another possible solution is to stop the gradient from flowing through every use of x[0], via tf.stop_gradient. For example:
import tensorflow as tf

tf.enable_eager_execution()
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
x = tf.Variable([1.0, 2.0])

def train(x):
    with tf.GradientTape() as tape:
        # Wrapping every occurrence of x[0] in stop_gradient makes its
        # partial derivative zero, so only x[1] receives an update.
        loss = tf.stop_gradient(x[0])**2 + x[1]**2 + 1/(tf.stop_gradient(x[0]) + x[1])
    variables = [x]
    grads = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(grads, variables))

for _ in range(100):
    train(x)

print(x.numpy())
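For the generic case raised in the question — freezing a known slice rather than a single index — the mask can be built programmatically. A minimal sketch (the helper name slice_mask is ours, not a TensorFlow API; the resulting array can be wrapped in tf.constant and used exactly like the hard-coded mask above):

```python
import numpy as np

def slice_mask(n, start, stop):
    """Return a float mask of length n: 0.0 on the frozen slice
    [start:stop], 1.0 everywhere else (the trainable entries)."""
    mask = np.ones(n, dtype=np.float32)
    mask[start:stop] = 0.0
    return mask

print(slice_mask(5, 0, 2))  # [0. 0. 1. 1. 1.]
```

Multiplying the gradients by tf.constant(slice_mask(...)) then freezes exactly that slice of the variable.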