Introduction
In this post, I’ll be implementing a basic feedforward neural network from scratch in Python. This is the tenth post in the “Machine Learning from Scratch” series.
Neural networks are the foundation of modern deep learning. While frameworks like TensorFlow and PyTorch make them easy to use, building one from scratch helps understand the core concepts of forward propagation, backpropagation, and gradient descent.
Neural Network
A neural network consists of layers of interconnected neurons. Each connection has a weight, and each neuron applies an activation function to its weighted inputs plus a bias term.
The key components are:
- Forward Propagation: Pass input through the network to get predictions
- Loss Calculation: Measure how wrong the predictions are
- Backpropagation: Calculate gradients of the loss with respect to weights
- Weight Update: Adjust weights using gradient descent
For this implementation, I’ll build a simple network with one hidden layer using the sigmoid activation function.
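For reference, the sigmoid activation is σ(z) = 1 / (1 + e^(−z)), and its derivative can be written in terms of its own output: σ′(z) = σ(z)(1 − σ(z)). That second form is why the derivative helper in the code below takes the activation value (a1 or a2) rather than the pre-activation.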
Implementation
I’m using NumPy for numerical computations, and scikit-learn to generate binary classification data for testing.
The NeuralNetwork class has the following methods:
- __init__: Constructor to set the network architecture and hyperparameters.
- fit: Method to train the network using backpropagation.
- predict: Method to make predictions on new data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets


class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size, lr=0.01, num_iter=1000):
        self.lr = lr
        self.num_iter = num_iter
        # Small random weights and zero biases for a single hidden layer
        self.W1 = np.random.randn(input_size, hidden_size) * 0.01
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * 0.01
        self.b2 = np.zeros((1, output_size))

    def _sigmoid(self, x):
        # Clip to avoid overflow in exp for large-magnitude inputs
        return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

    def _sigmoid_derivative(self, x):
        # Expects the sigmoid *output*, since sigma'(z) = sigma(z) * (1 - sigma(z))
        return x * (1 - x)

    def fit(self, X, y):
        y = y.reshape(-1, 1)
        for _ in range(self.num_iter):
            # Forward propagation
            z1 = np.dot(X, self.W1) + self.b1
            a1 = self._sigmoid(z1)
            z2 = np.dot(a1, self.W2) + self.b2
            a2 = self._sigmoid(z2)

            # Mean squared error loss (not used in the update, but handy for monitoring)
            loss = np.mean((y - a2) ** 2)

            # Backpropagation: gradients of the loss w.r.t. weights and biases
            dz2 = (a2 - y) * self._sigmoid_derivative(a2)
            dW2 = np.dot(a1.T, dz2) / X.shape[0]
            db2 = np.sum(dz2, axis=0, keepdims=True) / X.shape[0]
            dz1 = np.dot(dz2, self.W2.T) * self._sigmoid_derivative(a1)
            dW1 = np.dot(X.T, dz1) / X.shape[0]
            db1 = np.sum(dz1, axis=0, keepdims=True) / X.shape[0]

            # Gradient descent weight update
            self.W2 -= self.lr * dW2
            self.b2 -= self.lr * db2
            self.W1 -= self.lr * dW1
            self.b1 -= self.lr * db1

    def predict(self, X):
        # Forward pass, then threshold the output probability at 0.5
        z1 = np.dot(X, self.W1) + self.b1
        a1 = self._sigmoid(z1)
        z2 = np.dot(a1, self.W2) + self.b2
        a2 = self._sigmoid(z2)
        return (a2 > 0.5).astype(int).flatten()
Now let’s test the neural network on a binary classification problem.
def accuracy(y_test, predictions):
    return np.sum(y_test == predictions) / len(y_test)


if __name__ == '__main__':
    # Generate a synthetic binary classification dataset
    X, y = datasets.make_classification(
        n_samples=1000, n_features=10, n_informative=8,
        n_redundant=2, n_classes=2, random_state=42
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    model = NeuralNetwork(
        input_size=10, hidden_size=16, output_size=1,
        lr=0.1, num_iter=5000
    )
    model.fit(X_train, y_train)

    predictions = model.predict(X_test)
    acc = accuracy(y_test, predictions)
    print(f"Accuracy: {acc}")
The neural network learns to classify the data with high accuracy. Even this simple single-hidden-layer network can learn complex non-linear decision boundaries that linear models cannot.
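If you want to check that claim yourself, one quick (optional) comparison is to fit a linear baseline on the same split and compare test accuracies. The sketch below uses scikit-learn's LogisticRegression, which is not part of the implementation above; it reuses X_train, X_test, y_train, y_test and the accuracy helper from the script.

from sklearn.linear_model import LogisticRegression

# Optional baseline: a linear classifier on the same train/test split.
# Comparing its accuracy with the network's shows how much the hidden
# layer's non-linearity helps on this dataset.
baseline = LogisticRegression(max_iter=1000)
baseline.fit(X_train, y_train)
baseline_acc = accuracy(y_test, baseline.predict(X_test))
print(f"Logistic regression accuracy: {baseline_acc}")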
This implementation covers the fundamentals, but modern neural networks use techniques like ReLU activation, batch normalization, dropout, and more sophisticated optimizers like Adam. These improvements make networks train faster and generalize better.
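As a small taste of those improvements, here is a minimal sketch of how the hidden layer's sigmoid could be swapped for ReLU. The relu and relu_derivative helpers are illustrative names I'm introducing here, not part of the class above.

import numpy as np

def relu(z):
    # ReLU passes positive values through and zeroes out negatives
    return np.maximum(0, z)

def relu_derivative(z):
    # Gradient is 1 where the pre-activation was positive, 0 elsewhere
    return (z > 0).astype(float)

# In fit(), the hidden layer would become:
#   a1 = relu(z1)
# and the corresponding backprop term:
#   dz1 = np.dot(dz2, self.W2.T) * relu_derivative(z1)
# Note: unlike the sigmoid helper, this derivative takes the
# pre-activation z1, not the activation a1.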
That’s all for this post. Thanks for reading!