Defining a gradient with respect to a subtensor in Theano

jdbrody

I have what is conceptually a simple question about Theano but I haven't been able to find the answer (I'll confess upfront to not really understanding how shared variables work in Theano, despite many hours with the tutorials).

I'm trying to implement a "deconvolutional network"; specifically, I have a 3-tensor of inputs (each input is a 2D image) and a 4-tensor of codes; for the i-th input, codes[i] represents a set of codewords which together code for input i.
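
For reference, codes and dicts (and inputs) in the snippets below are shared Theano variables built from numpy arrays. A setup along the following lines would match the shapes the code assumes; the sizes and the random initialization here are only illustrative, not my actual values:

import numpy as np
import theano
import theano.tensor as T
from theano import function
from theano.tensor.nnet import conv2d

floatX = theano.config.floatX

num_inputs, im_h, im_w = 100, 64, 64      # illustrative sizes
num_filters, filt_h, filt_w = 16, 9, 9
code_h, code_w = im_h + filt_h - 1, im_w + filt_w - 1   # a 'valid' conv then reproduces the image size

# inputs: 3-tensor, one 2D image per training example
inputs = theano.shared(np.zeros((num_inputs, im_h, im_w), dtype=floatX))
# codes: 4-tensor, one stack of 2D codewords per training example
codes = theano.shared(0.01 * np.random.randn(num_inputs, num_filters,
                                             code_h, code_w).astype(floatX))
# dicts: one 2D dictionary filter per codeword channel
dicts = theano.shared(0.01 * np.random.randn(num_filters, filt_h,
                                             filt_w).astype(floatX))
learning_rate = 0.1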

I've been having a lot of trouble figuring out how to do gradient descent on the codewords. Here are the relevant parts of my code:

idx = T.lscalar()   # index of the current training example
# reconstruct input idx by convolving its codewords with the dictionary
pre_loss_conv = conv2d(input = codes[idx].dimshuffle('x', 0, 1, 2),
                       filters = dicts.dimshuffle('x', 0, 1, 2),
                       border_mode = 'valid')
loss_conv = pre_loss_conv.reshape((pre_loss_conv.shape[2], pre_loss_conv.shape[3]))
loss_in = inputs[idx]
loss = T.sum(1./2.*(loss_in - loss_conv)**2)   # squared reconstruction error

del_codes = T.grad(loss, codes[idx])           # <-- this is the line Theano rejects
delc_fn = function([idx], del_codes)
train_codes = function([input_index], loss, updates = [
    [codes, T.set_subtensor(codes[input_index], codes[input_index] -
                            learning_rate*del_codes[input_index])     ]])

(here codes and dicts are shared tensor variables). Theano is unhappy with this, specifically with defining

del_codes = T.grad(loss, codes[idx])

The error message I'm getting is:

theano.gradient.DisconnectedInputError: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: Subtensor{int64}.0

I'm guessing that it wants a symbolic variable instead of codes[idx], but I'm not sure how to get everything connected to produce the intended effect. I'm guessing I'll need to change the final line to something like

train_codes = function([input_index], loss, updates = [
    [codes, T.set_subtensor(codes[input_index], codes[input_index] -
                            learning_rate*del_codes)     ]])

Can someone give me some pointers as to how to define this function properly? I think I'm probably missing something basic about working with Theano but I'm not sure what.

Thanks in advance!

-Justin

Update: Kyle's suggestion worked very nicely. Here's the specific code I used:

input_index = T.lscalar('input_index')   # index of the current training example

# Index the shared 4-tensor once and reuse this variable everywhere below,
# so the gradient can be taken with respect to it.
current_codes = codes[input_index]
pre_loss_conv = conv2d(input = current_codes.dimshuffle('x', 0, 1, 2),
                       filters = dicts.dimshuffle('x', 0, 1, 2),
                       border_mode = 'valid')
loss_conv = pre_loss_conv.reshape((pre_loss_conv.shape[2], pre_loss_conv.shape[3]))
loss_in = inputs[input_index]
loss = T.sum(1./2.*(loss_in - loss_conv)**2)

del_codes = T.grad(loss, current_codes)        # gradient w.r.t. the subtensor variable
train_codes = function([input_index], loss)    # returns the loss; the codes update happens below
# del_dicts (the gradient of loss w.r.t. dicts) is defined analogously, e.g. T.grad(loss, dicts)
train_dicts = theano.function([input_index], loss, updates = [[dicts, dicts - learning_rate*del_dicts]])
codes_update = ( codes, T.set_subtensor(codes[input_index], codes[input_index] - learning_rate*del_codes) )
codes_update_fn = function([input_index], updates = [codes_update])

for i in xrange(num_inputs):
     current_loss = train_codes(i)
     codes_update_fn(i)

Kyle Kastner

To summarize the findings:

Assigning grad_var = codes[idx], then making a new variable such as: subgrad = T.set_subtensor(codes[input_index], codes[input_index] - learning_rate*del_codes)

Then calling train_codes = function([input_index], loss, updates = [[codes, subgrad]])

seemed to do the trick. In general, I try to make variables for as many things as possible. Tricky problems can arise from trying to do too much in a single statement, and it is also harder to debug and understand later! Also, in this case I think Theano needs a shared variable, but it has issues if the shared variable is created inside the function that requires it.
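
As I understand it, the underlying issue is that every time codes[idx] is written, Theano builds a fresh Subtensor node, so the one handed to T.grad is not the node that appears in the cost graph; indexing once into a variable like current_codes and reusing it avoids that. Pulling the pieces above together, a minimal sketch of the pattern might look like the following (this assumes the shared variables codes, dicts, inputs and learning_rate from the question; it is not code from the original answer):

input_index = T.lscalar('input_index')

# Index the shared 4-tensor once and reuse that symbolic variable everywhere,
# so the subtensor is part of the cost's computational graph.
current_codes = codes[input_index]

reconstruction = conv2d(input=current_codes.dimshuffle('x', 0, 1, 2),
                        filters=dicts.dimshuffle('x', 0, 1, 2),
                        border_mode='valid')
reconstruction = reconstruction.reshape((reconstruction.shape[2],
                                         reconstruction.shape[3]))
loss = T.sum(0.5 * (inputs[input_index] - reconstruction) ** 2)

# Gradient with respect to the subtensor variable that actually appears in the graph
del_codes = T.grad(loss, current_codes)

# Write the updated slice back into the shared 4-tensor
subgrad = T.set_subtensor(current_codes, current_codes - learning_rate * del_codes)
train_codes = function([input_index], loss, updates=[(codes, subgrad)])

Each call train_codes(i) then returns the loss for example i and writes the gradient-updated slice back into codes.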

Glad this worked for you!
