CrossEntropyLoss loss function type problem


I am trying to solve a classification problem, but something in my code is not working. I have a network that takes 30 inputs and outputs 4 actions. Each action takes an integer value from 0 to 4 (5 options, i.e. 5 classes).

I tried to implement this code, but it is not working properly. I'm sure I have a problem with the dimensions. I would appreciate any help! :)

This is my code:

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# Load data from CSV file
file_path = '/content/30_1_output.csv'
df = pd.read_csv(file_path)

# Extract features and target outputs
X = df.iloc[:, :-4].values  # Assuming the last 4 columns are the target outputs
Y = df.iloc[:, -4:].values   # Extracting the last 4 columns as target outputs

# Map the values according to the specified mapping
Y = np.where(Y == -2, 0,  # If value is -2, map it to class 0
            np.where(Y == -1, 1,  # If value is -1, map it to class 1
                np.where(Y == 0, 2,  # If value is 0, map it to class 2
                    np.where(Y == 1, 3,  # If value is 1, map it to class 3
                        np.where(Y == 2, 4, Y)  # If value is 2, map it to class 4
                    )
                )
            )
        )

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Convert to PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)

# Create datasets and dataloaders
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)

# Adjust the batch sizes in DataLoader to match the batch size of the model outputs
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# Define the neural network model
class NeuralNetwork(nn.Module):
    def __init__(self, input_size, num_actions):
        super(NeuralNetwork, self).__init__()
        self.layer1 = nn.Linear(input_size, 64*2)
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(64*2, 32*2)
        self.output_layer = nn.Linear(32*2, num_actions * 5)  # 5 probabilities for each action

    def forward(self, x):
        out = self.layer1(x)
        out = self.relu(out)
        out = self.layer2(out)
        out = self.relu(out)
        out = self.output_layer(out)
        out = torch.softmax(out.view(-1, 5), dim=1)  # Reshape output and apply softmax

        return out

# Initialize the model
input_size = X.shape[1]  # Number of input features
num_actions = Y.shape[1] # Number of actions
model = NeuralNetwork(input_size, num_actions)

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 30
for epoch in range(num_epochs):
    total_loss = 0
    for inputs, labels in train_loader:
        # Forward pass
        outputs = model(inputs)

        # Reshape outputs back to [batch_size, num_actions, 5]
        outputs = outputs.view(-1, num_actions, 5)
        # Take argmax along the last dimension to get the index of the class with the highest probability
        #outputs2 = torch.argmax(outputs, dim=2)
        #labels = labels.view(-1)

        loss = criterion(outputs, labels)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {total_loss / len(train_loader):.4f}')

print("Training complete")

This is the shape of the output and the labels: [image: output and label shapes]

Thank you so much!!

This is the error I get:

RuntimeError: Expected target size [64, 5], got [64, 4]

There is 1 answer

Answered by Muhammed Yunus

The label data Y has 4 columns:

Y = df.iloc[:, -4:].values # Extracting the last 4 columns as target outputs

whereas the model is outputting 5 columns:

out = torch.softmax(out.view(-1, 5), dim=1) # Reshape output and apply softmax

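The shapes of Y and the model's prediction need to be compatible, otherwise they can't be directly compared and the loss can't be computed. In the training loop the prediction is reshaped to [64, 4, 5]; nn.CrossEntropyLoss reads dimension 1 as the class dimension, so it sees 4 classes and expects a target of shape [64, 5], while Y has shape [64, 4]. That is exactly the mismatch the error message reports.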
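To make the shape rule concrete: nn.CrossEntropyLoss expects logits of shape [N, C] or [N, C, d1] and class-index targets of shape [N] or [N, d1], where C is the number of classes. The sketch below is only an illustration with random tensors standing in for the model output and the labels (the sizes 64, 4 and 5 come from the question; everything else is made up). It reproduces the mismatch and then shows one layout that the loss accepts.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Sizes taken from the question: batch of 64, 4 actions, 5 classes per action.
batch_size, num_actions, num_classes = 64, 4, 5

# Random stand-ins (assumptions) for the model output and the labels.
logits = torch.randn(batch_size, num_actions, num_classes)         # [64, 4, 5]
labels = torch.randint(0, num_classes, (batch_size, num_actions))  # [64, 4]

# Dimension 1 is read as the class dimension, so [64, 4, 5] implies
# 4 classes and a target of shape [64, 5]; the [64, 4] labels are rejected.
shape_error = None
try:
    criterion(logits, labels)
except (RuntimeError, ValueError) as err:  # exception type depends on the PyTorch version
    shape_error = str(err)
print("error:", shape_error)

# Moving the class dimension to position 1 gives [64, 5, 4],
# which pairs with targets of shape [64, 4].
loss = criterion(logits.permute(0, 2, 1), labels)
print("loss:", loss.item())
```

With the class dimension in position 1, each of the 4 actions is scored as its own 5-way classification and the result is averaged.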

Either drop the redundant column from the model so that it aligns exactly with Y, or add the required column to Y to match the model's output. The columns need to match in size and meaning, so you need to establish what the missing column represents and where it should go.

If the data has 5 classes, and a sample can only be one of those at a time, then you can discard a class and represent the data using 4 classes (when all 4 classes are zero, it implies the 5th class). Alternatively, if you have 4 classes to begin with for a 5-class problem, you can append an extra/redundant column that explicitly flags the 5th class.
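If the 4 label columns hold class indices (0 to 4) rather than one-hot flags, as the mapping in the question suggests, another way to make the shapes agree is to keep the labels as they are and flatten both tensors so that every (sample, action) pair becomes one 5-way classification. The following is a minimal sketch under that assumption, not a definitive implementation: the class name MultiActionNet and the random placeholder batch are hypothetical, and the layer sizes simply mirror the question. It also returns raw logits instead of softmax output, since nn.CrossEntropyLoss applies log-softmax internally.

```python
import torch
import torch.nn as nn


class MultiActionNet(nn.Module):  # hypothetical name, layer sizes mirror the question
    def __init__(self, input_size, num_actions, num_classes=5):
        super().__init__()
        self.num_actions = num_actions
        self.num_classes = num_classes
        self.body = nn.Sequential(
            nn.Linear(input_size, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, num_actions * num_classes),
        )

    def forward(self, x):
        # Return raw logits shaped [batch, num_actions, num_classes].
        # No softmax here: nn.CrossEntropyLoss applies log-softmax itself.
        return self.body(x).view(-1, self.num_actions, self.num_classes)


model = MultiActionNet(input_size=30, num_actions=4)
criterion = nn.CrossEntropyLoss()

# Placeholder batch (assumption): 64 samples, 30 features, labels in 0..4.
inputs = torch.randn(64, 30)
labels = torch.randint(0, 5, (64, 4))

logits = model(inputs)  # [64, 4, 5]

# Flatten so each (sample, action) pair is one row: [256, 5] against [256].
loss = criterion(logits.reshape(-1, 5), labels.reshape(-1))
loss.backward()

# Predicted class per action: argmax over the class dimension gives [64, 4].
predictions = logits.argmax(dim=2)
print(loss.item(), tuple(predictions.shape))
```

For prediction, the argmax over the last dimension yields values in 0..4 per action; subtracting 2 would map them back to the original -2..2 range used in the CSV.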