Even if you’re not working in the data science or software engineering space, it’s tough to avoid running into the term artificial neural networks.

Artificial neural networks (ANN) are ubiquitous. They are used in chatbots, medical imaging, media planning, and a ton of other areas. But have we asked with a sense of deep curiosity: what is an artificial neural network, and what can it really achieve?

We have all come across the common definition that artificial neural networks replicate the functioning of the human neural system. That explains the working principle, but most of us still don’t know what makes an ANN so special or what problem sets it’s ideal for. To clear the air, here’s the most comprehensive and yet accessible guide you will find on artificial neural networks.

When a dozen terms like artificial intelligence, machine learning, deep learning, and neural networks get thrown around, it’s easy to get confused. The actual bifurcation between these verticals is not that complicated.

AI is the universal set, the overarching subject matter at hand. It is the systematic study of how intelligent programs operate and are made. Machine learning is a subset of AI that focuses on how machines can learn by themselves. Deep learning is a further subset of ML that focuses on how layers of neural networks can be used to generate outputs. You can use this visualization to navigate the hierarchy:

So what is an artificial neural network? The answer is exactly how the popular media touts it. It’s a system of data processing and output generation that replicates the neural system to unravel non-linear relations in a large dataset. The data might come from sensory routes and might be in the form of text, pictures, or audio.

The best way to understand how an artificial neural network works is by understanding how a natural neural network inside the brain works and drawing a parallel between them. Neurons are the fundamental component of the human brain and are responsible for learning and retention of knowledge and information as we know it. You can consider them the processing unit in the brain. They take the sensory data as input, process it, and give the output data used by other neurons. The information is processed and passed until a decisive outcome is attained.

The basic neural network in the brain is connected by synapses. You can visualize them as the end-nodes of a bridge that connects two neurons. So, the synapse is the meeting-point for two neurons. Synapses are an important part of this system because the strength of a synapse would determine the depth of understanding and the retention of information.

When you are practicing an activity, you are strengthening these synaptic relations. This is how you can visualize the neural network in your brain:

All the sensory data that your brain is collecting in real-time is processed through these neural networks. They have a point of origination in the system. And as they are processed by the initial neurons, the processed form of an electric signal coming out of one neuron becomes the input for another neuron. This micro-information processing at each layer of neurons is what makes this network effective and efficient. By replicating this recurring theme of processing data across the neural network, ANNs are able to produce superior outputs.

In an ANN, everything is designed to replicate this very process. Don’t worry about the mathematical equation. That’s not the key idea to be understood right now. Every input entering the system, labeled ‘X’, is multiplied by a weight, ‘W’, to generate a weighted signal. This replicates the role of a synaptic signal’s strength in the brain. A bias variable is added to shift the result that the function produces.

So, all of this data is processed in the function and you end up with an output. That’s what a one-layer neural network, or perceptron, looks like. The idea of an artificial neural network revolves around connecting several combinations of such artificial neurons to get more potent outputs. That is why the typical artificial neural network’s conceptual framework looks a lot like this:

We’ll soon define the hidden layer, as we deep dive into how an artificial neural network functions. But as far as a rudimentary understanding of an artificial neural network is concerned, you know the first principles now.
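To make the single artificial neuron concrete, here is a minimal sketch in Python. The function name, the sample inputs, weights, and bias, and the simple "fire if above zero" rule are all assumptions picked for illustration; they are not taken from any particular library.

```python
def neuron_output(inputs, weights, bias):
    """Multiply each input X by its weight W, add the bias, then decide whether to fire."""
    weighted_signal = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Assumed rule for this sketch: the neuron fires (1) only if the signal is positive.
    return 1 if weighted_signal > 0 else 0


# Hypothetical values, chosen only to illustrate the calculation.
inputs = [1.0, 0.5]
weights = [0.6, -0.4]
bias = -0.1

result = neuron_output(inputs, weights, bias)  # 0.6 - 0.2 - 0.1 is positive, so it fires
```

Making the bias strongly negative (say, -1.0) would keep the same neuron silent for the same inputs, which is the sense in which the bias controls the output.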

This mechanism is used to decipher large datasets. The output generally tends to be an establishment of causality between the variables entered as input that can be used for forecasting. Now that you know the process, you can fully appreciate the technical definition here:

*“A network modeled after the human brain by creating an artificial neural system via a pattern-recognizing computer algorithm that learns from, interprets, and classifies sensory data.”*

Brace yourself, things are about to get interesting here. And don’t worry – you don’t have to do a ton of math right now.

The magic happens first at the activation function. The activation function does initial processing to determine whether the neuron will be activated or not. If the neuron is not activated, it passes essentially nothing forward (its output is effectively zero), so nothing happens downstream. This is critical to have in the neural network; otherwise, the system will be forced to process a ton of information that has no impact on the output. You see, the brain has limited capacity but it has been optimized to make the best use of it.

One central property common across all artificial neural networks is the concept of non-linearity. Most variables that are studied in real life have a non-linear relationship.

Take for instance the price of chocolate and the number of chocolates. Assume that one chocolate costs $1. How much would 100 chocolates cost? Probably $100. How much would 10,000 chocolates cost? Not $10,000; because either the seller will add the cost of using extra packaging to put all the chocolates together or she will reduce the cost since you are moving so much of her inventory off her hands in one go. That is the concept of non-linearity.

An activation function will use basic mathematical principles to determine whether the information is to be processed or not. The most common forms of activation functions are Binary Step Function, Logistic Function, Hyperbolic Tangent Function, and Rectified Linear Units. Here’s the basic definition of each one of these:

**Binary step function:** This function activates a neuron on the basis of a threshold. If the function’s result crosses a benchmarked value (typically, lies above it), the neuron is activated.

**Logistic function:** This function has a mathematical end-result in the shape of an ‘S’ curve and is used when probabilities are the key criteria to determine whether the neuron should be activated. So, at any point, you can calculate the slope of this curve. The value of this function lies between 0 and 1.

*Slope is calculated using a differential function. The concept is used when two variables don’t have a linear relationship. The slope at a point is the gradient of the tangent line that touches the curve at that point. The problem with the logistic function is that its output never drops below 0, so it cannot pass on a negative signal.*

**Hyperbolic tangent function:** It’s quite similar to the logistic function, except its values fall between -1 and +1. So, the problem of a negative value not being processed in the network goes away.

**Rectified linear units (ReLU):** This function’s values lie between 0 and positive infinity. ReLU simplifies a few things: if the input is positive, it will give the value of ‘x’. For all other inputs, the value would be ‘0’. You can use a Leaky ReLU, which keeps a small slope for negative inputs and so has values between negative infinity and positive infinity. It’s used when the relationship between the variables being processed is really weak and might get omitted by the activation function altogether.
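The four definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation; the default threshold of 0 and the Leaky ReLU slope of 0.01 are assumed values.

```python
import math


def binary_step(x, threshold=0.0):
    """Fires (1.0) when the input reaches the threshold, otherwise stays at 0.0."""
    return 1.0 if x >= threshold else 0.0


def logistic(x):
    """S-shaped curve; the output always lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def hyperbolic_tangent(x):
    """Similar S shape, but the output lies between -1 and +1."""
    return math.tanh(x)


def relu(x):
    """Returns x for positive inputs and 0 for everything else."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Like ReLU, but negative inputs are scaled down instead of being zeroed out."""
    return x if x > 0 else slope * x
```

Feeding the same negative input to each function shows the difference: `relu(-3.0)` drops it to zero, while `hyperbolic_tangent(-3.0)` and `leaky_relu(-3.0)` both keep a negative signal alive.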

Now you can refer to the same two diagrams of a perceptron and a neural network. What is the difference, apart from the number of neurons? The key difference is the hidden layer. A hidden layer sits right between the input layer and the output layer in a neural network. The hidden layer’s job is to refine the processing and eliminate variables which will not have a strong impact on the output.

If a dataset contains a large number of instances where a change in the value of an input variable has a noticeable impact on the output variable, the hidden layer will show that relationship. The hidden layer makes it easy for the ANN to give out stronger signals to the next layer of processing.

Even after doing all this math and understanding how the hidden layer operates, you might be wondering how an artificial neural network actually learns. Let’s start with the basic question of what learning is. Learning, in the simplest terms, is establishing causality between two things (activities, processes, variables, etc.). When you ‘learn’ how to throw a curveball, you are establishing causality between the physical action of throwing the ball a certain way and getting the ball’s trajectory to curve a certain way.
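As a rough sketch of where the hidden layer sits, the Python below passes two inputs through three hidden neurons and then one output neuron. The layer sizes, every weight and bias, and the choice of the logistic activation are hypothetical values picked for illustration only.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """One fully connected layer: each row of `weights` belongs to one neuron."""
    return [
        logistic(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical parameters: 2 inputs -> 3 hidden neurons -> 1 output neuron.
hidden_weights = [[0.5, -0.2], [0.1, 0.8], [-0.6, 0.3]]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [[0.7, -0.5, 0.2]]
output_biases = [0.05]


def forward(inputs):
    """The hidden layer's outputs become the inputs of the output layer."""
    hidden = layer_forward(inputs, hidden_weights, hidden_biases)
    return layer_forward(hidden, output_weights, output_biases)[0]


prediction = forward([1.0, 2.0])
```

In a trained network, the hidden weights attached to an unimportant input would have shrunk towards zero, which is one way to picture the hidden layer filtering out variables that barely affect the output.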

Now, this causality is very difficult to establish. Remember the saying correlation does not equal causation? It’s fairly easy to determine when two variables are moving in the same direction. It is very difficult to say with absolute certainty which variable is causing the movement in which variable. Obviously, we are often able to establish this intuitively; but how do you make an algorithm understand intuition?

You use a cost function. Mathematically, it is the squared difference between the actual value in the dataset and the output value produced by the network. You can think of it as the degree of error. We square it because the raw difference can be negative, and squaring keeps errors of opposite sign from cancelling each other out.

You can score each cycle of input-to-output processing with the cost function. Your and the ANN’s job is to minimize the cost function to its lowest possible value. You achieve it by adjusting the weights in the ANN. (Remember the synaptic relations, aka the weights? That’s what we are talking about). There are several ways of doing this, but as long as you understand the principle, you would just be using different tools to execute it.
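Here is a minimal sketch of that cost function in Python. Averaging the squared differences over several examples (the mean squared error) is a common convention and is an assumption added here; the function names are illustrative.

```python
def squared_error(actual, predicted):
    """Squared difference between the true value and the network's output."""
    return (actual - predicted) ** 2


def mean_squared_error(actuals, predictions):
    """Average of the squared errors across all examples."""
    total = sum(squared_error(a, p) for a, p in zip(actuals, predictions))
    return total / len(actuals)
```

Because of the squaring, overshooting by 2 and undershooting by 2 are penalized equally: `squared_error(3.0, 5.0)` and `squared_error(5.0, 3.0)` both come out to 4.0.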

With each cycle, we aim to minimize the cost function. The process of going from input to output is called forward propagation. And the process of using output data to minimize the cost function by adjusting weight in reverse order from the last hidden layer to the input layer is called backward propagation.

You can keep adjusting these weights using either the Brute Force method, which becomes inefficient when the dataset is too big, or Batch Gradient Descent, which is an optimization algorithm. Now you have an intuitive understanding of how an artificial neural network learns.

Understanding two specialized forms of neural networks, recurrent and convolutional, can also be your introduction to two different facets of AI application: computer vision and natural language processing. In the simplest form, these two branches of AI help a machine visually identify objects and understand the context of linguistic data. As you can imagine, these branches are already applied in self-driving cars and virtual assistants like Siri.
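The loop below is a minimal sketch of batch gradient descent for the simplest possible case: a single neuron with one weight, one bias, and no activation function. The toy data, the learning rate, and the number of cycles are assumptions chosen so that the example converges; a real network repeats the same idea across many weights and layers.

```python
def train(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by repeatedly nudging w and b to reduce the squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Forward propagation: compute the outputs with the current weights.
        predictions = [w * x + b for x in xs]
        # Backward step: gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(predictions, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(predictions, ys)) / n
        # Move each parameter a small step in the direction that lowers the cost.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy dataset generated from y = 2x + 1 (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = train(xs, ys)  # w ends up close to 2 and b close to 1
```

It is called "batch" because every update looks at the whole dataset at once, instead of trying every possible weight combination the way a brute-force search would.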

Now, each of these branches has its own established neural network. NLP is highly dependent on recurrent neural networks (RNNs). The difference between an RNN and a plain feedforward ANN is that in the plain ANN, each input signal is considered to be independent of the next input signal. So, two successive inputs are treated as having no relationship to each other.

In reality, that is not the case. When we are communicating, each word sets the context for the next word. Hence, the fundamental nature of language is that it creates interdependencies between information that is inputted earlier and the information that is inputted later. RNNs account for this by maintaining a parallel memory that carries the relationship between these inputs forward to clarify the context.
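A stripped-down sketch of that memory, assuming a single recurrent neuron with scalar weights (all values hypothetical), could look like this in Python. The `hidden` value plays the role of the memory that is carried from one input to the next.

```python
import math


def rnn_step(x, hidden, w_input, w_hidden, bias):
    """Combine the current input with the memory of everything seen so far."""
    return math.tanh(w_input * x + w_hidden * hidden + bias)


def run_rnn(sequence, w_input=0.5, w_hidden=0.8, bias=0.0):
    """Process a sequence one element at a time, carrying the hidden state forward."""
    hidden = 0.0
    states = []
    for x in sequence:
        hidden = rnn_step(x, hidden, w_input, w_hidden, bias)
        states.append(hidden)
    return states


# The two sequences differ only in their first element.
states_a = run_rnn([1.0, 0.0, 0.0])
states_b = run_rnn([0.0, 0.0, 0.0])
```

Even though the last two inputs are identical, the final states differ, because the first input of `states_a` is still echoing through the hidden state. That lingering influence is what lets an RNN treat a word differently depending on the words that came before it.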

**Convolutional neural networks** are ideally used for computer vision. Apart from the generally used activation functions, they add a pooling function and a convolution function. A convolution function, in simpler terms, shows how an input image combined with a second, smaller image (a filter) results in a third image (the result). You can imagine this by visualizing it as a filter (a new set of pixel values) sliding over your input image (original set of pixel values) to get a resulting image (changed pixel values).

A pooling function will take the maximum or minimum value, depending on the added function, to make processing on this set of information easy. Here is how you can visualize them:
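Both operations can be sketched in plain Python on a tiny grid of pixel values. The 5x5 image and 2x2 filter are made up for illustration, the convolution here slides the filter without flipping it (the convention most CNN libraries use), and the pooling shown is max pooling over 2x2 blocks.

```python
def convolve2d(image, kernel):
    """Slide the filter over the image and sum the element-wise products (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep only the largest value from each non-overlapping size x size block."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical 5x5 "image" and 2x2 filter.
image = [
    [1, 2, 3, 0, 1],
    [0, 1, 2, 3, 0],
    [1, 0, 1, 2, 3],
    [2, 1, 0, 1, 2],
    [3, 2, 1, 0, 1],
]
kernel = [[1, 0], [0, 1]]

feature_map = convolve2d(image, kernel)  # 4x4 result
pooled = max_pool2d(feature_map)         # 2x2 result: [[4, 6], [4, 4]]
```

The convolution turns 25 pixel values into a 16-value feature map, and pooling shrinks that to 4 values, which is why pooling makes later processing cheaper.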

What we’ve talked about so far was all going on underneath the hood. Now we can zoom out and see these ANNs in action to fully appreciate their bond with our evolving world:

One of the earliest applications of ANNs has been personalizing eCommerce platform experiences for each user. Do you remember the really effective recommendations on Netflix? Or the just-right product suggestions on Amazon? They are a result of the ANN.

There is a ton of data being used here: your past purchases, demographic data, geographic data, and the data that shows what people who bought the same product bought next. All of these serve as the inputs to determine what might work for you. At the same time, what you really buy helps the algorithm get optimized. With every purchase, you are enriching the company and the algorithm that empowers the ANN. At the same time, every new purchase made on the platform will also improve the algorithm’s prowess in recommending the right products to you.

Not long ago, chat boxes had started picking up steam on websites. An agent would sit on one side and help you out with your queries typed in the box. Then, a technique called natural language processing (NLP) was introduced to chatbots and everything changed.

NLP generally uses statistical rules to replicate human language capacities, and like other ANN applications, gets better with time. Your punctuation, intonation and enunciation, grammatical choices, syntactical choices, word and sentence order, and even the language of choice can serve as inputs to train the NLP algorithm.

The chatbot becomes conversational by using these inputs to both understand the context of your queries and to formulate answers in a way that would best suit your style. The same NLP is also being used for audio editing in music and security verification purposes.

Most of us follow the outcome predictions being made by AI-powered algorithms during the presidential elections as well as the FIFA World Cup. Since both events are phased, the algorithm can quickly gauge its efficacy and minimize the cost function as teams and candidates get eliminated. The real challenge in such situations is the sheer number of input variables. From candidates to player stats to demographics to anatomical capabilities, everything has to be incorporated.

In stock markets, predictive algorithms that use ANNs have been around for a while now. News updates and financial metrics are the key input variables used. Thanks to this, most exchanges and banks are easily able to trade assets under high-frequency trading initiatives at speeds that far exceed human capabilities.

The problem with stock markets is that the data is always noisy. Randomness is very high because so much subjective judgment can impact the price of a security. Nevertheless, ANNs are being used in market-making activities by every leading bank these days.

Actuarial tables were already being used to determine the risk factors associated with each insurance applicant. ANNs have taken all that data a notch higher.

All the lenders can run through the decades of data they possess with the strongly established weights in the system and use your information as input to determine the appropriate risk profile associated with your loan application. Your age, gender, city of residence, school of graduation, industry of engagement, salary, and savings ratio are all used as inputs to determine your credit risk scores.

What was earlier heavily dependent on your individual credit score has now become a much more comprehensive mechanism. That is the reason why several private fintech players have jumped into the personal loans space to run the same ANNs and lend to people whose profiles are considered too risky by banks.

Tesla, Waymo, and Uber have been using similar ANNs. The inputs and product engineering might have differed, but they were deploying sophisticated visual computing to make self-driving cars a reality.

Much of self-driving has to do with processing information that comes from the real-world in the form of nearby vehicles, road signs, natural and artificial lights, pedestrians, buildings, and so on. Obviously, the neural networks powering these self-driving cars are more complicated than the ones we discussed here, but they do operate on the same principles that we expounded.

ANNs are getting more and more sophisticated day by day. NLP is now helping in early diagnosis of mental health issues, computer vision is being used in medical imaging, and ANNs are powering drone delivery. As ANNs become more complex and layered, the need for human intelligence in this system would become less. Even areas like design have started deploying AI solutions with generative design.

The eventual evolution of all the ANNs put together would be General Intelligence: a form of intelligence so sophisticated that it can learn and perceive all the information known and unknown to humanity. While it is a very distant reality, if even possible, it has become a conceivable concept thanks to the wide adoption of ANNs.

Hardik Shah

Hardik Shah is a Tech Consultant at Simform, an application development services company. He leads large scale mobility programs that cover platforms, solutions, governance, standardization, and best practices. Connect with him to discuss the best practices of software methodologies.