I am trying to implement a 2D Neural Cellular Automaton (NCA) using PyTorch.
MAIN ISSUE: the weights don't change between training iterations, so the model doesn't 'progress' and stays in the state it was initialized in.
The main script seems to run properly (though I'm not entirely sure how correct it is), but I just can't make the training work.
Some technical details: my goal is to create an NCA that learns the 'rules' for recreating some image (a NumPy array) using only information about a cell and its 4 neighbors, so the model cannot see the whole grid at once. For certain reasons I want to do this without convolution.
The idea is to have a network that performs an update for every cell on the grid and (after it has updated all cells) improves itself based on some dissimilarity measure between the original and the generated (updated) array.
Code:
import numpy as np
import torch
from torch import nn
import torch.optim as optim
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(5, 32),
            nn.ReLU(),
            nn.Linear(32, 2)
        )

    def forward(self, x):
        x = self.flatten(x)
        x = self.linear_relu_stack(x)
        return x
# cellular automaton update function
def update_grid(grid, model):
    dsize = grid.shape[0]
    domain_next = grid.copy()
    for i in range(1, dsize - 1):
        for j in range(1, dsize - 1):
            inp = [grid[i][j], grid[i + 1][j], grid[i - 1][j],
                   grid[i][j + 1], grid[i][j - 1]]
            out = model(torch.tensor([inp], dtype=torch.float32))
            domain_next[i][j] = torch.argmax(out).item()
    return domain_next
if __name__ == '__main__':
    dsize = 8
    np.random.seed(42)
    domain = np.random.randint(2, size=(dsize, dsize))
    init_domain = domain.copy()
    #torch.manual_seed(43)
    model = NeuralNetwork()
    optimizer = optim.SGD(model.parameters(), lr=0.01)

    # training loop
    num_train_iter = 100
    for _ in range(num_train_iter):
        optimizer.zero_grad()
        # create domain based on rules from NN
        domain_next = update_grid(domain, model)
        domain_tensor = torch.tensor(domain, dtype=torch.float32, requires_grad=True)
        domain_next_tensor = torch.tensor(domain_next, dtype=torch.float32, requires_grad=True)
        # measure of how close generated image is to the original
        similarity = torch.mean(torch.abs(domain_next_tensor - domain_tensor))
        # to make it possible to minimize, loss is in some sense...
        # an inverse of similarity
        loss = 1 - similarity
        loss = torch.mean(torch.abs(domain_next_tensor - domain_tensor))
        loss.backward()
        optimizer.step()
        if (_ + 1) % 10 == 0:
            print(f'Iteration {_ + 1}, Loss: {loss.item()}')

    # evaluation phase
    model.eval()
    num_iterations = 10
    for _ in range(num_iterations):
        domain = update_grid(domain, model)
Output:
Iteration 10, Loss: 0.1666666716337204
Iteration 20, Loss: 0.1666666716337204
Iteration 30, Loss: 0.1666666716337204
Iteration 40, Loss: 0.1666666716337204
Iteration 50, Loss: 0.1666666716337204
Iteration 60, Loss: 0.1666666716337204
Iteration 70, Loss: 0.1666666716337204
Iteration 80, Loss: 0.1666666716337204
Iteration 90, Loss: 0.1666666716337204
Iteration 100, Loss: 0.1666666716337204
What I've tried: changing parameters like the learning rate and number of iterations; changing the loss function and optimizer; changing the network topology; restructuring the code; and a lot of other small things I don't have a real rationale for.
It looks like your code updates the domain grid in the update_grid function, which is fine on its own. But that update is not part of the computational graph that PyTorch uses to backpropagate gradients: torch.argmax(out).item() converts the model's output into a plain Python number, and the result is written into a NumPy array, so the connection to the model's weights is lost. Rebuilding domain_next as a tensor afterwards (even with requires_grad=True) doesn't restore that connection. So even though you call loss.backward(), it has no effect on the model parameters.
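One way around this (a minimal sketch, not your original design) is to keep the whole update inside torch operations and replace the hard argmax with something differentiable, for example the softmax probability of class 1. The function name update_grid_differentiable and the soft-output choice are my own; everything else mirrors your code:

import torch
import torch.nn.functional as F

# differentiable variant of update_grid: works on a float tensor and
# writes soft cell states, so gradients can flow back to the model
def update_grid_differentiable(grid_tensor, model):
    dsize = grid_tensor.shape[0]
    next_cells = grid_tensor.clone()
    for i in range(1, dsize - 1):
        for j in range(1, dsize - 1):
            inp = torch.stack([grid_tensor[i, j],
                               grid_tensor[i + 1, j],
                               grid_tensor[i - 1, j],
                               grid_tensor[i, j + 1],
                               grid_tensor[i, j - 1]]).unsqueeze(0)  # shape (1, 5)
            out = model(inp)  # raw logits, shape (1, 2)
            # probability that the cell is 'on': a soft, differentiable
            # stand-in for argmax
            next_cells[i, j] = F.softmax(out, dim=1)[0, 1]
    return next_cells

and a training step would then look something like:

target = torch.tensor(domain, dtype=torch.float32)  # original image as a plain tensor
for it in range(num_train_iter):
    optimizer.zero_grad()
    next_state = update_grid_differentiable(target, model)
    loss = torch.mean(torch.abs(next_state - target))
    loss.backward()
    optimizer.step()

With this version the loss actually depends on the model's weights, so loss.backward() produces non-zero gradients and the printed loss should change between iterations. Whether a soft cell state is the right modelling choice for your NCA is a separate question; the point is only that every step from the model's output to the loss has to stay inside torch operations for gradients to propagate.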