Backpropagation and gradient descent with Python


I am new to gradient descent and I'm completely lost on the exercise below. The first part is an explanation with a simple example. Here is that example:

When training the model, we want to find parameters (denoted as Θ) that minimize the total loss across all training examples:

Θ = argmin_Θ L(Θ)

To do this, we will iteratively reduce the error by updating the parameters in the direction that incrementally lowers the loss function. This algorithm is called gradient descent. The most naive application of gradient descent consists of taking the derivative of the loss function. Let us see how to do this.
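(As far as I understand it, each update is Θ ← Θ − η ∇L(Θ). Here is a tiny sketch of one such step that I wrote myself, with a toy quadratic loss and a learning rate eta that I picked arbitrarily:)

import torch

theta = torch.tensor([1.0, -2.0], requires_grad=True)  # made-up starting parameters
eta = 0.1                                               # learning rate (my own choice)

loss = (theta ** 2).sum()        # toy quadratic loss L(theta) = theta_1^2 + theta_2^2
loss.backward()                  # fills theta.grad with dL/dtheta = 2 * theta
with torch.no_grad():
    theta -= eta * theta.grad    # one update: theta <- theta - eta * grad(L)
theta.grad.zero_()               # reset the gradient before the next step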

As a toy example, say that we are interested in differentiating the function y = 2x⊤x with respect to the column vector x. To start, let us create the variable x and assign it an initial value.

Here is the code:

import torch

x = torch.arange(4.0)         # x = [0., 1., 2., 3.]
x.requires_grad_(True)        # track operations on x so gradients can be computed
x.grad                        # None until backward() has been called
y = 2 * torch.dot(x, x)       # y = 2 * x^T x, a scalar
y.backward()                  # backpropagate: populates x.grad with dy/dx
x.grad                        # dy/dx = 4 * x
# check that the gradient was calculated correctly
x.grad == 4 * x
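
If I print these out, I believe the expected result (with the default x = [0., 1., 2., 3.]) is:

print(x.grad)            # tensor([ 0.,  4.,  8., 12.]), i.e. 4 * x
print(x.grad == 4 * x)   # tensor([True, True, True, True])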

And now, based on the above I have to solve this:

Let f(x) = sin(x). Plot f(x) and df(x)/dx, where the latter is computed without exploiting that f′(x) = cos(x).

x = np.linspace(-np.pi, np.pi, 100)
x = torch.tensor(x, requires_grad=True)
y = torch.sin(x)

...and now what?

I tried:

y.backward()
x.grad

but I get an error saying that y is not a scalar value.

I need to pass these assertions:

assert torch.allclose(x.grad[10].float(), torch.Tensor([-0.8053]), rtol=1e-2)
assert torch.allclose(x.grad[50].float(), torch.Tensor([0.9995]), rtol=1e-2)


1 Answer

Answer by Karl:

By default, PyTorch expects you to call backward on a scalar value. This is why, if you call y.backward() on a non-scalar y, you get the error "grad can be implicitly created only for scalar outputs".
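To see the failure concretely, here is a small reproduction (my own sketch of your setup):

import math
import torch

x = torch.linspace(-math.pi, math.pi, 100, requires_grad=True)
y = torch.sin(x)     # y is a vector, not a scalar
try:
    y.backward()     # no upstream gradient supplied for a non-scalar output
except RuntimeError as err:
    print(err)       # grad can be implicitly created only for scalar outputs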

One solution is to aggregate y with a sum operation, which doesn't affect the gradients flowing back to x: the derivative of the sum with respect to each element of y is 1.

import numpy as np
import torch

x = np.linspace(-np.pi, np.pi, 100)        # 100 points in [-pi, pi]
x = torch.tensor(x, requires_grad=True)    # track gradients for x
y = torch.sin(x)                           # elementwise sine, a vector
loss = y.sum()                             # reduce to a scalar
loss.backward()                            # x.grad now holds cos(x) elementwise
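
The exercise also asks you to plot f(x) and its derivative; a minimal sketch with matplotlib (my choice of plotting library, not part of your assertions) would be:

import matplotlib.pyplot as plt

x_np = x.detach().numpy()                                  # detach before converting to NumPy
plt.plot(x_np, y.detach().numpy(), label="f(x) = sin(x)")
plt.plot(x_np, x.grad.numpy(), label="df/dx via autograd")
plt.legend()
plt.show()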

You can also call backward on a vector value if you provide an upstream gradient vector.

x = np.linspace(-np.pi, np.pi, 100)
x = torch.tensor(x, requires_grad=True)
y = torch.sin(x)
grad_vector = torch.ones_like(y)   # upstream gradient of ones, same shape as y
y.backward(grad_vector)            # vector-Jacobian product; x.grad = cos(x)
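
Either way, x.grad ends up equal to cos(x) elementwise, so both of your assertions should hold; a quick check (my own sketch):

print(torch.allclose(x.grad, torch.cos(x.detach())))   # True: autograd matches cos(x)
assert torch.allclose(x.grad[10].float(), torch.Tensor([-0.8053]), rtol=1e-2)
assert torch.allclose(x.grad[50].float(), torch.Tensor([0.9995]), rtol=1e-2)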

Both methods pass the assertions you provided.