How can I make CNTK solve the XOR problem?


I'm trying to implement XOR in CNTK in C#. I followed the example here: https://github.com/microsoft/CNTK/blob/release/latest/Examples/TrainingCSharp/Common/LogisticRegression.cs , but it doesn't work: after training, the neural network gives the wrong result. For comparison, I ran the same architecture using AForge, and after training it gives the correct result.

int iterations = 10000;
// initialize input and output values
double[][] input = new double[4][] {
    new double[] {0, 0}, new double[] {0, 1},
    new double[] {1, 0}, new double[] {1, 1}
};
double[][] output = new double[4][] {
    new double[] {0}, new double[] {1},
    new double[] {1}, new double[] {0}
};

int inputDim = 2;
int hiddenDim = 2;
int numOutputClasses = 1;
DeviceDescriptor device = DeviceDescriptor.CPUDevice;
Variable inputVariable1 = Variable.InputVariable(new int[] { inputDim }, DataType.Float);
Variable outputVariable2 = Variable.InputVariable(new int[] { numOutputClasses }, DataType.Float);

var weightParam1 = new Parameter(new int[] { hiddenDim, inputDim }, DataType.Float, 1, device, "w");
var biasParam1 = new Parameter(new int[] { hiddenDim }, DataType.Float, 0, device, "b");
var classifierOutput0 = CNTKLib.Sigmoid(CNTKLib.Times(weightParam1, inputVariable1) + biasParam1);

var weightParam2 = new Parameter(new int[] { numOutputClasses, hiddenDim }, DataType.Float, 1, device, "ww");
var biasParam2 = new Parameter(new int[] { numOutputClasses }, DataType.Float, 0, device, "bb");
var classifierOutput1 = CNTKLib.Sigmoid(CNTKLib.Times(weightParam2, classifierOutput0) + biasParam2);

//var loss = CNTKLib.CrossEntropyWithSoftmax(classifierOutput1, outputVariable2);
var loss = CNTKLib.BinaryCrossEntropy(classifierOutput1, outputVariable2);
var evalError = CNTKLib.ClassificationError(classifierOutput1, outputVariable2);

// prepare for training
CNTK.TrainingParameterScheduleDouble learningRatePerSample = new CNTK.TrainingParameterScheduleDouble(0.01, 1);
IList<Learner> parameterLearners = new List<Learner>() { Learner.SGDLearner(classifierOutput1.Parameters(), learningRatePerSample) };
var trainer = Trainer.CreateTrainer(classifierOutput1, loss, evalError, parameterLearners);

float[] inputValuesArr = new float[input.Length * input[0].Length];
for (int i = 0; i < input.Length; i++)
{
    for(int k = 0; k < input[i].Length; k++)
    {
        inputValuesArr[i * input[i].Length + k] = (float)input[i][k];
    }
}
float[] outputValuesArr = new float[output.Length * output[0].Length];
for (int i = 0; i < output.Length; i++)
{
    for (int k = 0; k < output[i].Length; k++)
    {
        outputValuesArr[i * output[i].Length + k] = (float)output[i][k];
    }
}
Value inputValues = Value.CreateBatch<float>(new int[] { inputDim }, inputValuesArr, device);
Value outputValues = Value.CreateBatch<float>(new int[] { numOutputClasses }, outputValuesArr, device);

// train the model
for (int minibatchCount = 0; minibatchCount < iterations; minibatchCount++)
{
#pragma warning disable 618 // this TrainMinibatch overload is marked obsolete (sweepEnd is not set)
    trainer.TrainMinibatch(new Dictionary<Variable, Value>() { { inputVariable1, inputValues }, { outputVariable2, outputValues } }, device);
#pragma warning restore 618
}

var inputDataMap = new Dictionary<Variable, Value>() { { inputVariable1, inputValues } };
var outputDataMap = new Dictionary<Variable, Value>() { { classifierOutput1.Output, null } };
classifierOutput1.Evaluate(inputDataMap, outputDataMap, device);
var outputValue = outputDataMap[classifierOutput1.Output];
IList<IList<float>> actualLabelSoftMax = outputValue.GetDenseData<float>(classifierOutput1.Output);
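(Aside: the two flattening loops in the listing above just lay the samples out in row-major order before handing them to Value.CreateBatch. A NumPy equivalent, purely for illustration and not part of the CNTK code:)

```python
import numpy as np

# The same XOR input data as in the C# listing above, one sample per row.
input_rows = [[0, 0], [0, 1], [1, 0], [1, 1]]

# Manual flattening, mirroring the nested loops in the C# code:
# element k of sample i lands at index i * rowlen + k (row-major order).
flat_loop = [float(v) for row in input_rows for v in row]

# The one-call equivalent.
flat = np.asarray(input_rows, dtype=np.float32).ravel().tolist()

print(flat == flat_loop)
```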

With my implementation, the trained network gives {0.1, 0.6, 0.6, 0.7} instead of {0, 1, 1, 0}. I hope someone can help.

Update: I found a working CNTK C# XOR sample. Its network structure is 2-8-1; if I change the structure to 2-2-1 and the activation functions to sigmoid, it starts producing the same wrong result as my code. This is strange, because an AForge network with the same 2-2-1 structure gives the right result.
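To see what might be going on, I rewrote the same 2-2-1 sigmoid network in plain NumPy (my own sketch, not CNTK), with every weight initialized to the constant 1, mirroring the `new Parameter(..., 1, device, ...)` calls in my code. With that start, both hidden units receive identical gradients on every step, so they never diverge:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR data, one sample per row.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Every weight starts at the constant 1, biases at 0,
# mirroring the Parameter initializers in the CNTK code above.
W1 = np.ones((2, 2)); b1 = np.zeros(2)   # hidden layer (2 units)
W2 = np.ones((1, 2)); b2 = np.zeros(1)   # output layer (1 unit)

lr = 0.5
for _ in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1.T + b1)           # (4, 2) hidden activations
    p = sigmoid(h @ W2.T + b2)           # (4, 1) predictions
    # Backward pass (squared-error loss, for simplicity of the sketch).
    dp = (p - y) * p * (1 - p)
    dW2 = dp.T @ h;  db2 = dp.sum(0)
    dh = (dp @ W2) * h * (1 - h)
    dW1 = dh.T @ X;  db1 = dh.sum(0)
    W1 -= lr * dW1;  b1 -= lr * db1
    W2 -= lr * dW2;  b2 -= lr * db2

p = sigmoid(sigmoid(X @ W1.T + b1) @ W2.T + b2).ravel()
# The two hidden units stay exact copies of each other, so the network
# can only learn a monotone function of x1 + x2 -- note that (0,1) and
# (1,0) always get the same prediction, just like in my CNTK output.
print(np.allclose(W1[0], W1[1]), p)
```

This matches the shape of my CNTK output {0.1, 0.6, 0.6, 0.7}, where (0,1) and (1,0) also come out equal. As far as I know, AForge randomizes its initial weights, which would break this symmetry, but I don't know whether that is the whole story in CNTK.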

There are 0 answers