Defining a gradient with respect to a subtensor in Theano

jdbrody

I have what is conceptually a simple question about Theano but I haven't been able to find the answer (I'll confess upfront to not really understanding how shared variables work in Theano, despite many hours with the tutorials).

I'm trying to implement a "deconvolutional network"; specifically, I have a 3-tensor of inputs (each input is a 2D image) and a 4-tensor of codes; for the i-th input, codes[i] represents a set of codewords which together code for input i.
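
For reference, codes and dicts (and inputs) in the snippets below are shared Theano variables built from numpy arrays. A setup along the following lines would match the shapes the code assumes; the sizes and the random initialization here are only illustrative, not my actual values:

import numpy as np
import theano
import theano.tensor as T
from theano import function
from theano.tensor.nnet import conv2d

floatX = theano.config.floatX

num_inputs, im_h, im_w = 100, 64, 64      # illustrative sizes
num_filters, filt_h, filt_w = 16, 9, 9
code_h, code_w = im_h + filt_h - 1, im_w + filt_w - 1   # a 'valid' conv then reproduces the image size

# inputs: 3-tensor, one 2D image per training example
inputs = theano.shared(np.zeros((num_inputs, im_h, im_w), dtype=floatX))
# codes: 4-tensor, one stack of 2D codewords per training example
codes = theano.shared(0.01 * np.random.randn(num_inputs, num_filters,
                                             code_h, code_w).astype(floatX))
# dicts: one 2D dictionary filter per codeword channel
dicts = theano.shared(0.01 * np.random.randn(num_filters, filt_h,
                                             filt_w).astype(floatX))
learning_rate = 0.1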

I've been having a lot of trouble figuring out how to do gradient descent on the codewords. Here are the relevant parts of my code:

idx = T.lscalar()   # index of the current training example
# reconstruct input idx by convolving its codewords with the dictionary
pre_loss_conv = conv2d(input = codes[idx].dimshuffle('x', 0, 1, 2),
                       filters = dicts.dimshuffle('x', 0, 1, 2),
                       border_mode = 'valid')
loss_conv = pre_loss_conv.reshape((pre_loss_conv.shape[2], pre_loss_conv.shape[3]))
loss_in = inputs[idx]
loss = T.sum(1./2.*(loss_in - loss_conv)**2)   # squared reconstruction error

del_codes = T.grad(loss, codes[idx])           # <-- this is the line Theano rejects
delc_fn = function([idx], del_codes)
train_codes = function([input_index], loss, updates = [
    [codes, T.set_subtensor(codes[input_index], codes[input_index] -
                            learning_rate*del_codes[input_index])     ]])

(here codes and dicts are shared tensor variables). Theano is unhappy with this, specifically with defining

del_codes = T.grad(loss, codes[idx])

The error message I'm getting is:

theano.gradient.DisconnectedInputError: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: Subtensor{int64}.0

I'm guessing that it wants a symbolic variable instead of codes[idx], but I'm not sure how to get everything connected to produce the intended effect. I'm guessing I'll need to change the final line to something like

train_codes = function([input_index], loss, updates = [
    [codes, T.set_subtensor(codes[input_index], codes[input_index] -
                            learning_rate*del_codes)     ]])

Can someone give me some pointers as to how to define this function properly? I think I'm probably missing something basic about working with Theano but I'm not sure what.

Thanks in advance!

-Justin

Update: Kyle's suggestion worked very nicely. Here's the specific code I used:

input_index = T.lscalar('input_index')   # index of the current training example

# Index the shared 4-tensor once and reuse this variable everywhere below,
# so the gradient can be taken with respect to it.
current_codes = codes[input_index]
pre_loss_conv = conv2d(input = current_codes.dimshuffle('x', 0, 1, 2),
                       filters = dicts.dimshuffle('x', 0, 1, 2),
                       border_mode = 'valid')
loss_conv = pre_loss_conv.reshape((pre_loss_conv.shape[2], pre_loss_conv.shape[3]))
loss_in = inputs[input_index]
loss = T.sum(1./2.*(loss_in - loss_conv)**2)

del_codes = T.grad(loss, current_codes)        # gradient w.r.t. the subtensor variable
train_codes = function([input_index], loss)    # returns the loss; the codes update happens below
# del_dicts (the gradient of loss w.r.t. dicts) is defined analogously, e.g. T.grad(loss, dicts)
train_dicts = theano.function([input_index], loss, updates = [[dicts, dicts - learning_rate*del_dicts]])
codes_update = ( codes, T.set_subtensor(codes[input_index], codes[input_index] - learning_rate*del_codes) )
codes_update_fn = function([input_index], updates = [codes_update])

for i in xrange(num_inputs):
     current_loss = train_codes(i)
     codes_update_fn(i)

Kyle Kastner

To summarize the findings:

Assigning grad_var = codes[idx], then making a new variable such as: subgrad = T.set_subtensor(codes[input_index], codes[input_index] - learning_rate*del_codes)

Then calling train_codes = function([input_index], loss, updates = [[codes, subgrad]])

seemed to do the trick. In general, I try to make variables for as many things as possible. Tricky problems can arise from trying to do too much in a single statement, and it is also harder to debug and understand later! Also, in this case I think Theano needs a shared variable, but it has issues if the shared variable is created inside the function that requires it.
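
As I understand it, the underlying issue is that every time codes[idx] is written, Theano builds a fresh Subtensor node, so the one handed to T.grad is not the node that appears in the cost graph; indexing once into a variable like current_codes and reusing it avoids that. Pulling the pieces above together, a minimal sketch of the pattern might look like the following (this assumes the shared variables codes, dicts, inputs and learning_rate from the question; it is not code from the original answer):

input_index = T.lscalar('input_index')

# Index the shared 4-tensor once and reuse that symbolic variable everywhere,
# so the subtensor is part of the cost's computational graph.
current_codes = codes[input_index]

reconstruction = conv2d(input=current_codes.dimshuffle('x', 0, 1, 2),
                        filters=dicts.dimshuffle('x', 0, 1, 2),
                        border_mode='valid')
reconstruction = reconstruction.reshape((reconstruction.shape[2],
                                         reconstruction.shape[3]))
loss = T.sum(0.5 * (inputs[input_index] - reconstruction) ** 2)

# Gradient with respect to the subtensor variable that actually appears in the graph
del_codes = T.grad(loss, current_codes)

# Write the updated slice back into the shared 4-tensor
subgrad = T.set_subtensor(current_codes, current_codes - learning_rate * del_codes)
train_codes = function([input_index], loss, updates=[(codes, subgrad)])

Each call train_codes(i) then returns the loss for example i and writes the gradient-updated slice back into codes.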

Glad this worked for you!
