tf.GradientTape giving None gradient while writing custom training loop

Al Shahreyaj

I'm trying to write a custom training loop. Here is a sample of what I'm trying to do: I have two trainable parameters, and one parameter is used to update the other. See the code below:

import tensorflow as tf

x1 = tf.Variable(1.0, dtype=tf.float32)
x2 = tf.Variable(1.0, dtype=tf.float32)

with tf.GradientTape() as tape:
    n = x2 + 4        # n depends on x2
    x1.assign(n)      # update x1 from x2
    x = x1 + 1
    y = x**2
    val = tape.gradient(y, [x1, x2])
    for v in val:
        print(v)

and the output is

tf.Tensor(12.0, shape=(), dtype=float32)
None

It seems like GradientTape is not watching the second parameter (x2). Both parameters are tf.Variables, so GradientTape should watch both of them. I also tried tape.watch(x2), which did not help either. Am I missing something?

AloneTogether

Check the docs regarding a gradient of None. The gradient with respect to x1 comes through fine: x1 is a trainable tf.Variable, so the tape watches it automatically (the explicit tape.watch(x) below is redundant, but harmless):

x1 = tf.Variable(1.0, dtype=tf.float32)
x2 = tf.Variable(1.0, dtype=tf.float32)

with tf.GradientTape() as tape:
    n = x2 + 4
    x1.assign(n)
    x = x1 + 1
    tape.watch(x)   # redundant: x already derives from the watched variable x1
    y = x**2

dv0, dv1 = tape.gradient(y, [x1, x2])
print(dv0)
print(dv1)
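
This should print 12.0 for x1 and None for x2: after the assign, x1 holds 5.0, so x = 6.0 and dy/dx1 = 2 * 6 = 12.

tf.Tensor(12.0, shape=(), dtype=float32)
None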

However, regarding x2, the output y is not connected to x2 at all: x1.assign(n) is a stateful update that the tape does not record, which is why that gradient is None. This is consistent with the docs:

State stops gradients. When you read from a stateful object, the tape can only observe the current state, not the history that led to it.

A tf.Tensor is immutable. You can't change a tensor once it's created. It has a value, but no state. All the operations discussed so far are also stateless: the output of a tf.matmul only depends on its inputs.

A tf.Variable has internal state—its value. When you use the variable, the state is read. It's normal to calculate a gradient with respect to a variable, but the variable's state blocks gradient calculations from going farther back.
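
The same mechanism in isolation (a minimal sketch; the variables a and b are hypothetical, not from the question):

import tensorflow as tf

a = tf.Variable(3.0)
b = tf.Variable(0.0)

with tf.GradientTape() as tape:
    b.assign(a * 2.0)   # stateful write: the tape does not connect b back to a
    y = b ** 2          # reads b's current state (6.0)

da, db = tape.gradient(y, [a, b])
print(da)  # None -- the path from a to y is severed by assign()
print(db)  # tf.Tensor(12.0, ...), since dy/db = 2 * b = 12.0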

If, for example, you replace the assign with a plain Python rebinding, like this:

x1 = tf.Variable(1.0, dtype=tf.float32)
x2 = tf.Variable(1.0, dtype=tf.float32)

with tf.GradientTape() as tape:
    n = x2 + 4
    x1 = n          # rebinds the Python name x1 to the tensor n; no stateful assign
    x = x1 + 1
    tape.watch(x)   # again redundant: x already derives from the watched x2
    y = x**2

dv0, dv1 = tape.gradient(y, [x1, x2])

It should work: since x1 is now just a tensor derived from x2, the tape can trace all the way back, and both gradients come out as tf.Tensor(12.0, shape=(), dtype=float32).
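
More generally, in a custom training loop the usual pattern is to keep only the differentiable computation inside the tape and to perform stateful updates (assign, assign_sub, or an optimizer step) after taking the gradient. A minimal sketch, where the quadratic loss and the learning rate are placeholder choices:

import tensorflow as tf

w = tf.Variable(5.0)
lr = 0.1

for step in range(3):
    with tf.GradientTape() as tape:
        loss = (w - 2.0) ** 2       # only differentiable work inside the tape
    grad = tape.gradient(loss, w)   # gradient is taken before any state changes
    w.assign_sub(lr * grad)         # stateful update happens outside the tape
    print(step, float(loss), float(w))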

