Why does the Keras relu function in R not work in the middle layer?


Sorry for asking this silly question. I am experimenting with the Keras framework, and due to convergence issues in a much more involved set-up, I am now proceeding step-by-step.

I set up a very simple 1-node neural net with ReLU. Depending on how I set it up, however, the ReLU either behaves as expected or, incorrectly, acts as a linear identity mapping.

Solution 1: input node -> identity pass through to node with relu activation -> identity pass through to output node [black curve in picture below]

Solution 2: input node -> identity pass through to output node with relu activation [red = blue curve in picture below]

Solution 3: input node -> identity pass through -> relu activation -> identity pass through to output node [blue = red curve in picture below]

Any clue as to why Solution 1 does not work? [red and blue curves overlap in the output picture below]

I find it worrying that the ReLU activation behaves differently depending on where and how it is placed in the network.

NB: GELU/SIGMOID/etc do not seem to be affected by this issue; just set mm = "sigmoid" or mm = "gelu" below.
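For instance, to reproduce the sigmoid case, the only change to the script below would be:

mm = "sigmoid"   # or mm = "gelu"; with these activations the three set-ups are reported to agree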

#### load libraries
library(tensorflow)
library(keras)

#### define a simple test grid
x = as_tensor(-5+10*(1:1e3)/1e3, dtype = tf$float32)
#### identity weights: weight matrix = 1, bias = 0 (direct pass-through)
dum1 = list(matrix(1, 1, 1), array(0, dim = 1))
mm = "relu"

#### does not work as planned; yields linear, not RELU ####
model <- keras_model_sequential(input_shape = c(1, 1)) %>%
  layer_flatten() %>%
  layer_dense(1, activation = mm, weights = dum1) %>%
  layer_dense(1, weights = dum1)
plot(x,predict(model, x), type = "l", col = "black")

#### works as planned ####
model <- keras_model_sequential(input_shape = c(1, 1)) %>%
  layer_flatten() %>%
  layer_dense(1, activation = mm, weights = dum1)
lines(x,predict(model, x), type = "l", col = "red")

#### works as planned ####
model <- keras_model_sequential(input_shape = c(1, 1)) %>%
  layer_flatten() %>%
  layer_activation_relu() %>%
  layer_dense(1, weights = dum1)
lines(x,predict(model, x), type = "l", col = "blue")

[output picture of this code; red and blue overlap]

I googled for different answers and manuals to no avail. Above is my issue stripped to the bare essentials.

There is 1 answer:

Answered by mrk:

With all weights set to 1 (and a zero bias), the output of the neuron before ReLU is simply the sum of its inputs.

Now, let's consider the effect of ReLU on this sum of inputs:

  • If the sum of inputs is positive or zero, ReLU has no effect, and the output remains the same (i.e., the sum of the inputs).
  • If the sum of inputs is negative, ReLU sets the output to zero. Since the sum of the inputs can be either positive or negative, the overall effect of the ReLU activation is to clamp any negative sum to zero and leave any positive sum unchanged.

Given this behavior, for the grid in the question (where x runs from negative to positive values), any negative sum is set to zero by ReLU, while any positive sum passes through unchanged.

Hence, with weights set to 1, the ReLU activation acts as the identity for non-negative inputs and sets negative inputs to zero, which, depending on where you look in the network, might give the appearance of a linear response across the range of input values.
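To see that piecewise behaviour on its own, here is a minimal plain-R sketch (no Keras involved; relu is just a local helper, not a library function) applied to the same grid as in the question:

relu <- function(z) pmax(z, 0)        # elementwise ReLU: 0 for negative z, identity otherwise
xg   <- -5 + 10 * (1:1e3) / 1e3       # same grid as in the question, as a plain R vector
plot(xg, relu(xg), type = "l")        # flat at 0 for xg < 0, linear with slope 1 for xg >= 0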

Note: I do not have R on my computer. Could you check whether this is the reason by monitoring the sum of your inputs?
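One way to do that check (a sketch, assuming the Solution 1 model with two dense layers is still stored in model, so that layer_flatten is model$layers[[1]] and the ReLU dense layer is model$layers[[2]]) is to build a second Keras model that exposes the middle layer's output and plot it next to the final output:

#### sub-model that stops at the middle (ReLU) dense layer
hidden <- keras_model(inputs  = model$input,
                      outputs = model$layers[[2]]$output)

xg <- -5 + 10 * (1:1e3) / 1e3               # plain R copy of the grid, for plotting
plot(xg, predict(hidden, x), type = "l")    # middle-layer output: should be 0 for x < 0 if ReLU is applied
lines(xg, predict(model, x), col = "red")   # full-model output, for comparison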