Understanding Neural Networks and What They Learn

Neural networks are a fundamental technology in the field of artificial intelligence and machine learning. They are designed to simulate the way the human brain processes information, enabling machines to learn from data, identify patterns, and make decisions. Here’s an in-depth look at how they work and what they actually learn.

Structure of Neural Networks

A neural network consists of layers of nodes, or neurons, that are connected by edges. These layers typically include:

  1. Input Layer: This layer receives the initial data. Each neuron in the input layer represents a feature of the data.
  2. Hidden Layers: These layers process the inputs received from the input layer. A neural network can have multiple hidden layers, each performing complex computations and transformations on the data.
  3. Output Layer: This layer produces the final output of the network, such as a classification or prediction.
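To make this structure concrete, here is a minimal NumPy sketch of a network with these three layer types; the sizes (4 input features, one hidden layer of 8 neurons, 3 outputs) are arbitrary choices for illustration only.

    import numpy as np

    # Arbitrary example sizes: 4 input features, one hidden layer of 8 neurons, 3 outputs
    layer_sizes = [4, 8, 3]

    # Each pair of consecutive layers is connected by a weight matrix and a bias vector
    rng = np.random.default_rng(0)
    weights = [rng.standard_normal((m, n)) * 0.1
               for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
    biases = [np.zeros(n) for n in layer_sizes[1:]]

    x = rng.standard_normal(4)                    # one input example (the input layer)
    hidden = np.tanh(x @ weights[0] + biases[0])  # hidden layer activations
    output = hidden @ weights[1] + biases[1]      # output layer, e.g. class scores
    print(output.shape)                           # (3,)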

How Neural Networks Work

  1. Initialisation: The network starts with small random weights (and biases). Weights are the parameters that determine how strongly each input influences a neuron’s output.
  2. Forward Propagation: Data is passed through the network from the input layer to the output layer. Each neuron’s output is determined by applying an activation function to the weighted sum of its inputs.
  3. Activation Functions: These functions introduce non-linearity into the model, allowing the network to learn more complex patterns. Common activation functions include ReLU (Rectified Linear Unit), sigmoid, and tanh (1).
  4. Loss Function: After forward propagation, the network’s output is compared to the actual target values using a loss function, which measures the error.
  5. Backpropagation: The network adjusts its weights to minimise the error. This involves computing the gradient of the loss function with respect to each weight and updating the weights accordingly.
  6. Iteration: Steps 2-5 are repeated over many passes through the training data (epochs), with the network gradually improving its accuracy as the weights are adjusted.
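To make steps 1-6 concrete, below is a small, self-contained NumPy sketch of one such training loop for a one-hidden-layer network trained with plain gradient descent. The toy data, layer sizes, and learning rate are arbitrary illustrative choices, not recommendations.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: 2 input features, binary target (arbitrary example problem)
    X = rng.standard_normal((200, 2))
    y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # 1. Initialisation: random weights, zero biases
    W1, b1 = rng.standard_normal((2, 8)) * 0.5, np.zeros((1, 8))
    W2, b2 = rng.standard_normal((8, 1)) * 0.5, np.zeros((1, 1))
    lr = 0.5

    for epoch in range(2000):
        # 2-3. Forward propagation with non-linear activation functions
        h = np.tanh(X @ W1 + b1)        # hidden layer
        p = sigmoid(h @ W2 + b2)        # output layer: predicted probability

        # 4. Loss function: binary cross-entropy between prediction and target
        loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

        # 5. Backpropagation: gradient of the loss with respect to each weight
        dz2 = (p - y) / len(X)                         # gradient at the output
        dW2, db2 = h.T @ dz2, dz2.sum(axis=0, keepdims=True)
        dz1 = (dz2 @ W2.T) * (1 - h**2)                # tanh'(a) = 1 - tanh(a)^2
        dW1, db1 = X.T @ dz1, dz1.sum(axis=0, keepdims=True)

        # Update: move each weight a small step against its gradient
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    # 6. After many iterations the error is much lower than at initialisation
    print(f"final loss: {loss:.3f}, accuracy: {np.mean((p > 0.5) == y):.2f}")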

What Neural Networks Actually Learn

Neural networks learn patterns and representations in the data. Here’s a detailed look at what they actually learn:

  1. Weights and Biases: During training, a neural network learns the optimal values of weights and biases that minimise the loss function. These values determine how input features are combined and transformed through the layers.
  2. Feature Extraction: In the hidden layers, the network learns to extract relevant features from the raw input data. Early layers might learn simple features such as edges in an image, while deeper layers combine these simple features to recognise complex patterns like shapes and objects.
  3. Data Representation: Neural networks transform the input data into internal representations that are more useful for the task at hand. For instance, in image recognition, an internal representation might capture the presence of certain textures or patterns.
  4. Decision Boundaries: For classification tasks, the network learns to draw boundaries in the feature space that separate different classes. These boundaries are learned in such a way that the network can accurately classify new, unseen data.
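As an illustration of the last two points, the sketch below trains a small network on a toy two-class problem and then queries it on a grid of points; the 0/1 pattern in the printed grid traces the decision boundary the network has learned. It assumes scikit-learn is installed, which the article itself does not prescribe.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)

    # Toy 2-D data: two classes separated by a non-linear (circular) boundary
    X = rng.standard_normal((300, 2))
    y = (X[:, 0]**2 + X[:, 1]**2 < 1.0).astype(int)

    clf = MLPClassifier(hidden_layer_sizes=(16,), activation="tanh",
                        max_iter=3000, random_state=0).fit(X, y)

    # Query the trained network on a grid: the printed 0/1 pattern shows the
    # learned decision boundary separating the two classes
    xs = np.linspace(-2, 2, 9)
    grid = np.array([[a, b] for b in xs for a in xs])
    print(clf.predict(grid).reshape(9, 9))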

Practical Example: Image Classification

Consider a neural network trained to classify images of cats and dogs. Here’s what it might learn at various stages:

  • Early Layers: Detect simple features like edges, colours, and textures.
  • Intermediate Layers: Recognise more complex patterns such as fur patterns, eyes, or noses.
  • Final Layers: Combine these patterns to identify the overall structure of a cat or a dog.

Through this hierarchical learning process, the network develops a robust understanding of the visual characteristics that distinguish cats from dogs.
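This layer-by-layer division maps naturally onto a convolutional network. Below is a minimal Keras sketch of such an architecture, untrained and without any data pipeline; the framework choice (TensorFlow/Keras) and all layer sizes are assumptions made only for illustration.

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(128, 128, 3)),             # RGB input image
        # Early layers: small filters responding to edges, colours and textures
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        # Intermediate layers: combine simple features into larger patterns
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        # Final layers: combine those patterns into a single cat-vs-dog decision
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # probability of one class
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.summary()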

Conclusion

Neural networks are powerful tools for learning from data. They work by adjusting weights through forward and backward propagation to minimise errors. What they learn includes optimal weights and biases, feature extraction, internal data representations, and decision boundaries. This capability allows them to perform complex tasks such as image classification, natural language processing, and more, by learning intricate patterns and representations from raw data.

(1) Sigmoid and Tanh

In the context of neural networks, the sigmoid and tanh functions are commonly used activation functions. These functions are crucial for introducing non-linearity into the model, allowing it to learn complex patterns.

Sigmoid Function

The sigmoid function, also known as the logistic function, is defined mathematically as:

    \[ \sigma(x) = \frac{1}{1 + e^{-x}} \]

where \( e \) is the base of the natural logarithm.

Characteristics of Sigmoid:

  1. Output Range: The sigmoid function outputs values between 0 and 1. This makes it useful for models where we want to predict probabilities, such as binary classification tasks.
  2. S-shape Curve: The sigmoid function has an S-shaped curve, which means that small changes in the input \( x \) around 0 result in significant changes in the output, but as \( x \) moves far from 0 (either positive or negative), the output changes very slowly.
  3. Non-linearity: The non-linear nature allows the neural network to learn and model complex data patterns that a linear function could not.
  4. Gradient: One downside is that the gradient becomes very small for large positive or negative input values, leading to the vanishing gradient problem during backpropagation. This can slow down or halt the training of deep networks.
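A quick NumPy sketch of these last two points: the sigmoid’s derivative is \( \sigma(x)(1 - \sigma(x)) \), which peaks at 0.25 at \( x = 0 \) and shrinks towards zero as \( |x| \) grows.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_grad(x):
        s = sigmoid(x)
        return s * (1.0 - s)   # derivative of the sigmoid: sigma(x) * (1 - sigma(x))

    for x in [0.0, 2.0, 5.0, 10.0]:
        print(f"x = {x:5.1f}   sigmoid = {sigmoid(x):.4f}   gradient = {sigmoid_grad(x):.6f}")
    # The gradient is at most 0.25 (at x = 0) and collapses towards zero for large |x|,
    # which is the vanishing-gradient behaviour described in point 4 above.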

Tanh Function

The tanh function, or hyperbolic tangent function, is defined as:

    \[ \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} \]

Characteristics of Tanh:

  1. Output Range: The tanh function outputs values between -1 and 1. This symmetric range makes it centred around 0, often leading to faster convergence in training because the mean of the activations is closer to zero.
  2. S-shape Curve: Similar to the sigmoid function, tanh also has an S-shaped curve but is steeper, leading to a more pronounced gradient.
  3. Non-linearity: The non-linear nature allows for complex pattern learning, just like the sigmoid function.
  4. Gradient: Although tanh suffers from the vanishing gradient problem like the sigmoid function, it tends to have stronger gradients compared to the sigmoid function, which can make it preferable in some scenarios.
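A short NumPy comparison illustrates point 4: the tanh derivative is \( 1 - \tanh^2(x) \), which reaches 1 at \( x = 0 \), whereas the sigmoid’s derivative never exceeds 0.25.

    import numpy as np

    xs = np.array([0.0, 1.0, 2.0, 3.0])
    sig = 1.0 / (1.0 + np.exp(-xs))

    print("sigmoid gradient:", np.round(sig * (1.0 - sig), 4))     # maximum 0.25 at x = 0
    print("tanh gradient:   ", np.round(1.0 - np.tanh(xs)**2, 4))  # maximum 1.0 at x = 0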

Comparison and Use Cases

  • Sigmoid: Often used in the output layer of binary classification problems since it outputs a probability value between 0 and 1. Its use in hidden layers is less common due to the vanishing gradient issue.
  • Tanh: Preferred over sigmoid in hidden layers because its output range (-1 to 1) can lead to a mean activation closer to zero, which can improve the convergence during training.

In summary, while both sigmoid and tanh functions play similar roles in introducing non-linearity to neural networks, they have different characteristics that make them suitable for different parts of the network. Understanding these differences is key to effectively applying them in neural network design.
