In this article (before diving deep into deep learning fundamentals) I would like to talk a bit about neural networks. So first of all, why do we need neural networks?

**Neural Networks**

There are several algorithms and data structures that work pretty well. For example, we can sort given items with quicksort quite efficiently. Or we can calculate the shortest path in a directed graph with shortest path algorithms. But there are other, non-trivial problems: how do we recognize a human face? For us humans it is quite straightforward … but how do we program computers to do so? Computer scientists have come to the conclusion that there is no predefined step-by-step algorithm for this problem. So maybe we should mimic the human brain itself! Maybe an artificial neural network can do facial recognition just as easily as we humans can. This is why neural networks came to be!

A neural network has **3** kinds of layers: the input layer, the hidden layer (there can be more hidden layers, not just one) and the output layer. We feed the network with input and then the network makes some prediction (which is the output). The “hello world” program for neural networks is the XOR problem.

So we have two features: **x** and **y**. That's why we have two neurons in the input layer. The output neuron should represent the **x XOR y** value. So somehow we have to train our artificial neural network. What does that mean exactly? That after the training procedure, the network should give the correct results. If **x=0** and **y=0** then the output should be **0** (according to the XOR table). If **x=0** and **y=1** then the output should be **1**. And so on…
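The XOR table mentioned above can be written down as a tiny dataset. Here is a minimal sketch in Python (the variable name `xor_table` is just an illustrative choice):

```python
# XOR truth table: each row is ((x, y), expected output)
xor_table = [
    ((0, 0), 0),
    ((0, 1), 1),
    ((1, 0), 1),
    ((1, 1), 0),
]

for (x, y), target in xor_table:
    print(f"x={x} y={y} -> {target}")
```

These four input/output pairs are the whole training set for the XOR problem.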

**Calculating the Output**

Another good question: how do we calculate the output? We have to apply the so-called activation function to the weighted sum of the inputs.

So first we have to calculate the weighted sum of the inputs. Then we can apply the activation function in order to end up with the activation. This is how we calculate the value/activation of every node in the network (of course, for the input nodes we do not have to do so, because they receive the raw data directly).
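This two-step calculation can be sketched for a single neuron. The sigmoid activation function and the concrete weight values below are illustrative assumptions, not values from the article:

```python
import math

def sigmoid(z):
    # squash the weighted sum into the (0, 1) range
    return 1.0 / (1.0 + math.exp(-z))

def activation(inputs, weights, bias):
    # step 1: weighted sum of the inputs (plus a bias term)
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # step 2: apply the activation function
    return sigmoid(z)

# example: one hidden neuron with two inputs and made-up weights
a = activation([0, 1], [0.5, -0.3], 0.1)
```

Every hidden and output node repeats this same calculation on the activations of the layer before it.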

**Training the Neural Network**

So what is the big picture? A little change in the edge weights results in a little change in the output. So we just have to keep tuning the edge weights until we get the right output. Of course, we need a dataset to know what a good output is and what is not. That's why we have the XOR table in this case, with all the input values and the associated output values. So we need something like the error: the difference between the result (the output of the neural network) and the actual value (the value from the XOR table). We want to end up with an optimization problem: we want to minimize the error terms.

So the total error **C(w)** of the network is the sum of the differences between the **y** actual values (in the XOR table) and the **y’** values (the predictions made by the network). Of course, if **C(w)** is **0** it means there is no error: the network makes exactly the predictions present in the XOR table. This is exactly what we are after!
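A minimal sketch of such an error function, assuming the common squared-error formulation (squaring keeps every difference positive so the terms cannot cancel each other out; the plain difference the article describes works the same way in spirit):

```python
def cost(predictions, targets):
    # sum of squared differences between the predictions y' and the targets y
    return sum((y - y_hat) ** 2 for y, y_hat in zip(targets, predictions))

# perfect predictions on the XOR outputs -> zero error
print(cost([0, 1, 1, 0], [0, 1, 1, 0]))  # 0
```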

If we plot the **(x,y)** values with the associated **C(w)** error values, we end up with something like this. So we have to find the minimum of this function. Why? Because if **C(w)** is very small, it means our neural network is making good predictions. Are there any optimization techniques to deal with this problem? Of course: the gradient descent and stochastic gradient descent methods.