Introduction
In this post, I’ll be implementing a basic feedforward neural network from scratch in Python. This is the tenth post in the “Machine Learning from Scratch” series.
Neural networks are the foundation of modern deep learning. While frameworks like TensorFlow and PyTorch make them easy to use, building one from scratch helps understand the core concepts of forward propagation, backpropagation, and gradient descent.
Neural Network
A neural network consists of layers of interconnected neurons. Each connection has a weight, and each neuron applies an activation function to its weighted inputs plus a bias term.
The key components are:
- Forward Propagation: Pass input through the network to get predictions
- Loss Calculation: Measure how wrong the predictions are
- Backpropagation: Calculate gradients of the loss with respect to weights
- Weight Update: Adjust weights using gradient descent
For this implementation, I’ll build a simple network with one hidden layer using the sigmoid activation function.
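For reference, the sigmoid activation is σ(z) = 1 / (1 + e^(−z)), and its derivative can be written in terms of its own output: σ′(z) = σ(z)(1 − σ(z)). That second form is why the derivative helper in the code below takes the activation value (a1 or a2) rather than the pre-activation.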
Implementation
I’m using NumPy for numerical computations, and scikit-learn to generate binary classification data for testing.
The NeuralNetwork class has the following methods:
- __init__: Constructor to set the network architecture and hyperparameters.
- fit: Method to train the network using backpropagation.
- predict: Method to make predictions on new data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets


class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size, lr=0.01, num_iter=1000):
        self.lr = lr
        self.num_iter = num_iter
        # Small random weights and zero biases for a single hidden layer
        self.W1 = np.random.randn(input_size, hidden_size) * 0.01
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * 0.01
        self.b2 = np.zeros((1, output_size))

    def _sigmoid(self, x):
        # Clip to avoid overflow in exp for large-magnitude inputs
        return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

    def _sigmoid_derivative(self, x):
        # Expects the sigmoid *output*, since sigma'(z) = sigma(z) * (1 - sigma(z))
        return x * (1 - x)

    def fit(self, X, y):
        y = y.reshape(-1, 1)
        for _ in range(self.num_iter):
            # Forward propagation
            z1 = np.dot(X, self.W1) + self.b1
            a1 = self._sigmoid(z1)
            z2 = np.dot(a1, self.W2) + self.b2
            a2 = self._sigmoid(z2)

            # Mean squared error loss (not used in the update, but handy for monitoring)
            loss = np.mean((y - a2) ** 2)

            # Backpropagation: gradients of the loss w.r.t. weights and biases
            dz2 = (a2 - y) * self._sigmoid_derivative(a2)
            dW2 = np.dot(a1.T, dz2) / X.shape[0]
            db2 = np.sum(dz2, axis=0, keepdims=True) / X.shape[0]
            dz1 = np.dot(dz2, self.W2.T) * self._sigmoid_derivative(a1)
            dW1 = np.dot(X.T, dz1) / X.shape[0]
            db1 = np.sum(dz1, axis=0, keepdims=True) / X.shape[0]

            # Gradient descent weight update
            self.W2 -= self.lr * dW2
            self.b2 -= self.lr * db2
            self.W1 -= self.lr * dW1
            self.b1 -= self.lr * db1

    def predict(self, X):
        # Forward pass, then threshold the output probability at 0.5
        z1 = np.dot(X, self.W1) + self.b1
        a1 = self._sigmoid(z1)
        z2 = np.dot(a1, self.W2) + self.b2
        a2 = self._sigmoid(z2)
        return (a2 > 0.5).astype(int).flatten()
Now let’s test the neural network on a binary classification problem.
def accuracy(y_test, predictions):
    return np.sum(y_test == predictions) / len(y_test)


if __name__ == '__main__':
    # Generate a synthetic binary classification dataset
    X, y = datasets.make_classification(
        n_samples=1000, n_features=10, n_informative=8,
        n_redundant=2, n_classes=2, random_state=42
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    model = NeuralNetwork(
        input_size=10, hidden_size=16, output_size=1,
        lr=0.1, num_iter=5000
    )
    model.fit(X_train, y_train)

    predictions = model.predict(X_test)
    acc = accuracy(y_test, predictions)
    print(f"Accuracy: {acc}")
The neural network learns to classify the data with high accuracy. Even this simple single-hidden-layer network can learn complex non-linear decision boundaries that linear models cannot.
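If you want to check that claim yourself, one quick (optional) comparison is to fit a linear baseline on the same split and compare test accuracies. The sketch below uses scikit-learn's LogisticRegression, which is not part of the implementation above; it reuses X_train, X_test, y_train, y_test and the accuracy helper from the script.

from sklearn.linear_model import LogisticRegression

# Optional baseline: a linear classifier on the same train/test split.
# Comparing its accuracy with the network's shows how much the hidden
# layer's non-linearity helps on this dataset.
baseline = LogisticRegression(max_iter=1000)
baseline.fit(X_train, y_train)
baseline_acc = accuracy(y_test, baseline.predict(X_test))
print(f"Logistic regression accuracy: {baseline_acc}")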
This implementation covers the fundamentals, but modern neural networks use techniques like ReLU activation, batch normalization, dropout, and more sophisticated optimizers like Adam. These improvements make networks train faster and generalize better.
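As a small taste of those improvements, here is a minimal sketch of how the hidden layer's sigmoid could be swapped for ReLU. The relu and relu_derivative helpers are illustrative names I'm introducing here, not part of the class above.

import numpy as np

def relu(z):
    # ReLU passes positive values through and zeroes out negatives
    return np.maximum(0, z)

def relu_derivative(z):
    # Gradient is 1 where the pre-activation was positive, 0 elsewhere
    return (z > 0).astype(float)

# In fit(), the hidden layer would become:
#   a1 = relu(z1)
# and the corresponding backprop term:
#   dz1 = np.dot(dz2, self.W2.T) * relu_derivative(z1)
# Note: unlike the sigmoid helper, this derivative takes the
# pre-activation z1, not the activation a1.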
That’s all for this post. Thanks for reading!