VGG16 - Different shape between R and Python. How to deal with that?


I am trying to translate a Python project with Keras to R. However, I stumbled on a strange issue with the shapes. Below you can see the VGG16 model summary with (None, 3, 224, 224) in R and (None, 224, 224, 3) in Python. What can I do about that?

I have tried to simply permute the dimensions of my input to match the R shape, but this raises an error in the MaxPooling layers.

ShapeError

ValueError: Input 0 of layer "vgg16" is incompatible with the layer: expected shape=(None, 3, 224, 224), found shape=(None, 224, 224, 3)

Tried: permuting the dimensions (did not work, because the CPU MaxPooling op only supports NHWC)

test = aperm(reshaped_img, c(1,4,2,3))
--> "Error: Default MaxPoolingOp only supports NHWC on device type CPU"
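For reference, the Python equivalent of that `aperm` call is a NumPy transpose (a sketch; `img` here is a hypothetical NHWC batch standing in for `reshaped_img`):

```python
import numpy as np

# A dummy NHWC batch: 1 image, 224x224 pixels, 3 channels.
img = np.zeros((1, 224, 224, 3), dtype=np.float32)

# Move the channel axis to position 1, i.e. NHWC -> NCHW.
# This mirrors the R call aperm(reshaped_img, c(1, 4, 2, 3)),
# shifted to Python's 0-based axes: (0, 3, 1, 2).
nchw = np.transpose(img, (0, 3, 1, 2))
print(nchw.shape)  # (1, 3, 224, 224)
```

The transpose itself succeeds; the error above comes later, because TensorFlow's pooling kernels on CPU only accept channels-last (NHWC) tensors, so feeding a channels-first model does not help on CPU.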

Tried: the input_shape parameter (did not work)

Passing the input_shape parameter when loading the model prevents it from loading at all, because of a mismatched shape. Found in the documentation.

R Code

model <- application_vgg16(weights="imagenet", include_top=TRUE)
summary(model)

Gives me:

Model: "vgg16"
________________________________________________________________________________
 Layer (type)                       Output Shape                    Param #     
================================================================================
 input_3 (InputLayer)               [(None, 3, 224, 224)]           0           
 block1_conv1 (Conv2D)              (None, 64, 224, 224)            1792        
 block1_conv2 (Conv2D)              (None, 64, 224, 224)            36928       
 block1_pool (MaxPooling2D)         (None, 64, 112, 112)            0           
 block2_conv1 (Conv2D)              (None, 128, 112, 112)           73856       
 block2_conv2 (Conv2D)              (None, 128, 112, 112)           147584      
 block2_pool (MaxPooling2D)         (None, 128, 56, 56)             0           
 block3_conv1 (Conv2D)              (None, 256, 56, 56)             295168      
 block3_conv2 (Conv2D)              (None, 256, 56, 56)             590080      
 block3_conv3 (Conv2D)              (None, 256, 56, 56)             590080      
 block3_pool (MaxPooling2D)         (None, 256, 28, 28)             0           
 block4_conv1 (Conv2D)              (None, 512, 28, 28)             1180160     
 block4_conv2 (Conv2D)              (None, 512, 28, 28)             2359808     
 block4_conv3 (Conv2D)              (None, 512, 28, 28)             2359808     
 block4_pool (MaxPooling2D)         (None, 512, 14, 14)             0           
 block5_conv1 (Conv2D)              (None, 512, 14, 14)             2359808     
 block5_conv2 (Conv2D)              (None, 512, 14, 14)             2359808     
 block5_conv3 (Conv2D)              (None, 512, 14, 14)             2359808     
 block5_pool (MaxPooling2D)         (None, 512, 7, 7)               0           
 flatten (Flatten)                  (None, 25088)                   0           
 fc1 (Dense)                        (None, 4096)                    102764544   
 fc2 (Dense)                        (None, 4096)                    16781312    
 predictions (Dense)                (None, 1000)                    4097000     
================================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
________________________________________________________________________________

Python


model = VGG16()
model = Model(inputs=model.inputs, outputs=model.output)

Gives me:

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_3 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 224, 224, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 224, 224, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 112, 112, 64)      0         
                                                                 
 block2_conv1 (Conv2D)       (None, 112, 112, 128)     73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 112, 112, 128)     147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 56, 56, 128)       0         
                                                                 
 block3_conv1 (Conv2D)       (None, 56, 56, 256)       295168    
                                                                 
 block3_conv2 (Conv2D)       (None, 56, 56, 256)       590080    
                                                                 
 block3_conv3 (Conv2D)       (None, 56, 56, 256)       590080    
                                                                 
 block3_pool (MaxPooling2D)  (None, 28, 28, 256)       0         
                                                                 
 block4_conv1 (Conv2D)       (None, 28, 28, 512)       1180160   
                                                                 
 block4_conv2 (Conv2D)       (None, 28, 28, 512)       2359808   
                                                                 
 block4_conv3 (Conv2D)       (None, 28, 28, 512)       2359808   
                                                                 
 block4_pool (MaxPooling2D)  (None, 14, 14, 512)       0         
                                                                 
 block5_conv1 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv2 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv3 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_pool (MaxPooling2D)  (None, 7, 7, 512)         0         
                                                                 
 flatten (Flatten)           (None, 25088)             0         
                                                                 
 fc1 (Dense)                 (None, 4096)              102764544 
                                                                 
 fc2 (Dense)                 (None, 4096)              16781312  
                                                                 
 predictions (Dense)         (None, 1000)              4097000   
                                                                 
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________

1 Answer

Answered by TheArmbreaker

I still do not know why that shuffled input layer exists. However, I found out that it does not happen on my local machine. The shuffled layer only occurs on an Amazon SageMaker Notebook Instance.
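If it is the same mechanism at work here (an assumption; I have not inspected the SageMaker image), this kind of channels-first/channels-last flip is what the Keras `image_data_format` option controls. It is read from `~/.keras/keras.json`, where a minimal config with the usual default looks like:

```json
{
    "image_data_format": "channels_last"
}
```

An environment whose keras.json says `"channels_first"` would produce exactly the (None, 3, 224, 224) summary shown in the question. The R keras package also exposes this at runtime via `k_image_data_format()` and `k_set_image_data_format("channels_last")`, which may be easier to run in a hosted notebook than editing the config file.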

I have posted screenshots here:

Stackoverflow Example on Imagenet Layer