How do I properly reshape the Python arrays used as input and output for a convolutional neural network in Keras4Delphi?


I'm working on a Speech2Vec algorithm that uses three neural networks. The first of these networks is for speech recognition: it converts a single (one-dimensional) FFT window to a 'sound nr.' in the range 1..1500. So I have a 1D convolutional neural network with an input of 512 (FFT size) x 3 channels and ~1500 outputs (one per 'sound nr.'). The model builds without any error, but before the training starts I get this error: "ValueError: 'logits' and 'labels' must have the same shape, received ((None, 512, 1491) vs (None, 1491, 1))." So the output shape is incorrect and seems to be mixed up with the input shape (FFT size 512). I have the input shape defined as follows:

input_shape := Tnp_shape.Create([SoundSize DIV 2, 3]);

Then I have the code for generating the x and y arrays to train:

SetLength(xtestarray, SoundCount*(SoundSize DIV 2)*3);
for i := 0 to SoundCount-1 do
  for j := 0 to (SoundSize DIV 2)-1 do
  begin
    xtestarray[i*(SoundSize DIV 2)*3+j*3+0] := SoundsFFT[i,j];
    xtestarray[i*(SoundSize DIV 2)*3+j*3+1] := SoundsFFT_DeltaA[i,j].a;
    xtestarray[i*(SoundSize DIV 2)*3+j*3+2] := SoundsFFT_DeltaA[i,j].b;
  end;
x := TNumPy.npArray<Double>(xtestarray);
x := TNDArray(x.reshape([SoundCount, (SoundSize DIV 2), 3]));
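
The fill loop above writes into a flat array and then relies on reshape to interpret it as [SoundCount, SoundSize DIV 2, 3]. As a sanity check, here is a minimal stand-alone sketch (plain Pascal, no Keras4Delphi calls) that confirms the index expression matches a row-major layout, which is NumPy's default order for reshape. The constants are small hypothetical values chosen only for illustration, and FlatIndex is a helper introduced here, not part of any library.

```pascal
program FlatIndexCheck;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  SoundCount = 4;   // hypothetical: number of samples
  SoundSize  = 16;  // hypothetical: FFT window length (SoundSize div 2 bins are used)
  Channels   = 3;   // FFT value, delta a, delta b

// Row-major offset of element [i, j, c] in an array of shape
// [SoundCount, SoundSize div 2, Channels] (last axis varies fastest).
function FlatIndex(i, j, c: Integer): Integer;
begin
  Result := (i * (SoundSize div 2) + j) * Channels + c;
end;

var
  i, j, c, n: Integer;
  ok: Boolean;
begin
  ok := True;
  n := 0;
  // Walk the 3D index space in row-major order; the offset must count up by one.
  for i := 0 to SoundCount - 1 do
    for j := 0 to (SoundSize div 2) - 1 do
      for c := 0 to Channels - 1 do
      begin
        // Generic row-major formula.
        if FlatIndex(i, j, c) <> n then
          ok := False;
        // Same expression as used in the fill loop of the question.
        if i * (SoundSize div 2) * 3 + j * 3 + c <> n then
          ok := False;
        Inc(n);
      end;
  WriteLn('elements visited: ', n);
  WriteLn('index expression matches row-major reshape: ', ok);
end.
```

If this check holds, the x array itself is laid out consistently with the declared input_shape, so the mismatch reported in the error would have to come from the label array or from the model's output shape.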

SetLength(ytestarray, SoundCount*SoundCount);
for i := 0 to SoundCount-1 do
  for j := 0 to SoundCount-1 do
    if i = j then
      ytestarray[i*SoundCount+j] := 1
    else
      ytestarray[i*SoundCount+j] := 0;
y := TNumPy.npArray<Double>(ytestarray);
y := TNDArray(y.reshape([SoundCount, SoundCount{, 1}]));
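
To make the error message easier to read, the following minimal sketch compares the per-sample shapes it reports (batch axis dropped). It is plain Pascal and does not call Keras4Delphi; SameShape is a hypothetical helper written for this illustration. The numbers are taken from the error text: the labels shape (1491, 1) corresponds to reshaping y with the trailing 1 that is commented out above, and the logits shape (512, 1491) shows that the model output still carries the 512-long axis.

```pascal
program LabelShapeCheck;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// True when both per-sample shapes have the same rank and the same sizes.
function SameShape(const a, b: array of Integer): Boolean;
var
  k: Integer;
begin
  Result := Length(a) = Length(b);
  if Result then
    for k := 0 to High(a) do
      if a[k] <> b[k] then
      begin
        Result := False;
        Exit;
      end;
end;

begin
  // Shapes exactly as reported in the ValueError.
  WriteLn('logits [512,1491] vs labels [1491,1]: ',
    SameShape([512, 1491], [1491, 1]));
  // Dropping the trailing 1 from the labels alone does not make them equal.
  WriteLn('logits [512,1491] vs labels [1491]:   ',
    SameShape([512, 1491], [1491]));
  // One-hot labels of shape [SoundCount, SoundCount] need one vector per sample.
  WriteLn('logits [1491]     vs labels [1491]:   ',
    SameShape([1491], [1491]));
end.
```

Only the last pair matches, so one-hot labels of shape [SoundCount, SoundCount] would need a model whose output per sample is a single vector of length SoundCount. One possible reading, and it is only an assumption because the model code is not shown here, is that the 512-long axis is not collapsed (for example by flattening or pooling) before the final layer.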

Does anybody know what causes this error when I run my program? How should I reshape the input and output arrays correctly?
