Understanding Neural Networks – Part Two

In Part 1, we introduced the idea of a neuron as the building block for neural networks: it has inputs and outputs, and it uses an activation function to generate the output from the weighted inputs. This time we are going to explore the activation function in a bit more detail and talk about how neural networks work. Don’t worry though: whilst we are going to talk about maths, you don’t need to know any, and if all you do is remember the names of four functions and the vague shape of their curves then you know more than enough.

The Activation Function

We mentioned previously that there are different activation functions and that the data scientist needs to select the most appropriate one for any given situation. The activation function of a node determines the output of that node for a given set of inputs. You can see from this that the activation function is at the heart of how a neural network works, and that different activation functions will give very different results for a network.

We also talked previously about how you might use multiple activation functions in a network, but that every node within a layer would use the same function. Changing the activation function for a layer will significantly change the output of that layer, so it is important to select a function which gives the right sort of output for the inputs you are expecting. To understand which activation function you should select, let’s look at some of the more common ones. In each of the graphs below, the Y axis represents the output value, the X axis represents the weighted sum of the input values, and the curve itself is the activation function.

The simplest activation function is the threshold function. The output remains at zero until some threshold t of the weighted input values is reached, at which point the output jumps to one. This is a binary function: the output is either zero or one. It is common to use a function like this for output nodes in some types of network. It can signify true or false when there is a single output node, or when you have categorical outputs it can indicate which category was found, though more often we are interested in a probability in the range 0 to 1.

Another very common activation function is the sigmoid function, a smooth S-shaped curve whose output varies continuously between 0 and 1. It is also common in the final layer, as it is useful for generating a confidence value or a probability, as in the previous example for categorical output.

The rectifier function (often called ReLU) is zero below the threshold and then rises as a straight line as the input increases; unlike the sigmoid, it has no upper bound. This function is extremely common in hidden layers.

The last function we will look at is the hyperbolic tangent (tanh) function, which differs from the others in that it generates negative outputs (down to -1) for inputs below the threshold and positive outputs (up to 1) for those above.

Each of these functions can be used to create an appropriate class of activation for a layer; the sketch below shows what all four look like in code.
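
If you want to see these shapes for yourself, here is a minimal sketch of all four functions in Python using NumPy. The function names and the threshold parameter t are our own illustrative choices, not standard library code; plot each one over a range of inputs and you will see the curves described above.

```python
import numpy as np

def threshold(x, t=0.0):
    """Binary step: 0 below the threshold t, 1 at or above it."""
    return np.where(x >= t, 1.0, 0.0)

def sigmoid(x):
    """Smooth S-shaped curve squashing any input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """Rectifier: 0 below zero, then a straight line with slope 1."""
    return np.maximum(0.0, x)

def tanh(x):
    """Hyperbolic tangent: outputs between -1 (negative inputs) and 1 (positive inputs)."""
    return np.tanh(x)
```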

Now let’s consider a very simple neural network which has five independent variables or inputs, the same number of nodes in a single hidden layer, and a single output node. It doesn’t really matter what this network is going to calculate; perhaps the input nodes are variables relating to a network connection, for instance whether it is internal, or whether it is to a high-risk asset. The output could be a risk level between zero and one for this connection (which activation function would you choose for the single output node?). Each connection would give us a set of these input variables, and we would apply them one set at a time to the input nodes. The neural network then calculates in a wave moving from left to right (in our diagram): the hidden layer calculates its outputs, and then the output node in turn calculates its output based upon its inputs from the hidden layer. Once the entire network has been calculated, the output value can be read and the network is ready for the next set of observations. In our example we could establish the risk of many thousands of network connections and use the outputs to make decisions in close to real time.
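
To make that left-to-right wave of calculation concrete, here is a minimal sketch of a single forward pass through our 5-5-1 network in Python. The weights and biases are random placeholders (a real network would have learned them, as we will see in part three), and using the rectifier in the hidden layer with a sigmoid on the output node is one reasonable answer to the question above, since it gives a risk level between zero and one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder weights and biases; a trained network would have learned these.
W_hidden = rng.normal(size=(5, 5))   # 5 inputs -> 5 hidden nodes
b_hidden = np.zeros(5)
W_output = rng.normal(size=(5, 1))   # 5 hidden nodes -> 1 output node
b_output = np.zeros(1)

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(inputs):
    """One left-to-right pass: inputs -> hidden layer -> output node."""
    hidden = relu(inputs @ W_hidden + b_hidden)      # weighted sums, then activation
    output = sigmoid(hidden @ W_output + b_output)   # squashed to a value in (0, 1)
    return output

# One observation: five variables describing a network connection.
connection = np.array([1.0, 0.0, 0.3, 0.7, 0.2])
print(forward(connection))  # a single risk value between 0 and 1
```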

You can see that each layer is highly connected to the preceding layer. That is, each node has a connection to every single node in the preceding layer. This is typical in neural networks, and we will explore in more depth how this contributes to their effectiveness. Normally a neural network will have many hidden layers, and these layers also tend to be fully connected. You can imagine that this adds up to very complex matrices of connections. Just like in their biological counterparts, it is the connections which are the important element in artificial neural networks. Besides the activation function, you have probably realised that the biggest influence on the output is the weights on these input connections. Where, you may ask, do we get the weights for the input synapses of each node? Well, that is an excellent question, which we will cover in part three of this series.
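
To get a feel for how quickly those matrices of connections grow, note that a fully connected layer with m inputs and n nodes has m × n weights. Here is a quick back-of-the-envelope count; the deeper layer sizes are invented purely for illustration.

```python
# Rough count of connection weights in a fully connected network.
layer_sizes = [5, 5, 1]          # our toy example: 5 inputs, 5 hidden, 1 output
weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
print(weights)                   # 5*5 + 5*1 = 30 weights

layer_sizes = [100, 64, 64, 10]  # a modest, deeper network (illustrative sizes)
weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
print(weights)                   # 100*64 + 64*64 + 64*10 = 11136 weights
```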
