1. Neural Networks from First Principles
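A neural network stacks layers of neurons. Each neuron computes a weighted sum of its inputs followed by a non-linear activation \(\sigma\). For layer \(l\) with weights \(W^{[l]}\), bias \(b^{[l]}\), and incoming activations \(a^{[l-1]}\):
\[ z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}, \qquad a^{[l]} = \sigma\left(z^{[l]}\right) \]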
Common activations: ReLU \(\max(0, z)\), sigmoid, tanh, softmax (output layer for multi-class).
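To make the weighted sum and the activations concrete, here is a minimal sketch of a single neuron in plain Python. The helper names (`relu`, `sigmoid`, `softmax`, `neuron`) and the sample inputs are illustrative assumptions, not part of any library.

```python
import math

def relu(z):
    # ReLU: max(0, z)
    return max(0.0, z)

def sigmoid(z):
    # Squashes z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    # Subtract the max before exponentiating for numerical stability
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def neuron(x, w, b, activation):
    # Weighted sum of inputs plus bias, then the non-linearity
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(z)

# Hypothetical inputs: z = 0.5*1.0 + (-0.25)*2.0 + 0.1 = 0.1
x = [1.0, 2.0]
w = [0.5, -0.25]
b = 0.1

print(neuron(x, w, b, relu))
print(neuron(x, w, b, sigmoid))
print(neuron(x, w, b, math.tanh))
print(softmax([1.0, 2.0, 3.0]))  # three probabilities that sum to 1
```

Softmax differs from the other three in that it acts on a whole vector of scores rather than on one value, which is why it appears only at the output layer.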
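import torch
import torch.nn as nn


class MLP(nn.Module):
    """Fully connected classifier with two hidden layers."""

    def __init__(self, input_dim, hidden, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            # Final layer emits raw logits; nn.CrossEntropyLoss applies
            # log-softmax internally, so no softmax layer is added here.
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        # x: (batch, input_dim) -> logits: (batch, num_classes)
        return self.net(x)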
Visualize architectures in our Neural Network Simulator.
2. Backpropagation & Gradient Descent
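Training minimizes the loss \(L\) via gradient descent, updating each weight matrix as \(W^{[l]} \leftarrow W^{[l]} - \eta \, \partial L / \partial W^{[l]}\) with learning rate \(\eta\). Backpropagation applies the chain rule to compute \(\partial L / \partial W^{[l]}\) efficiently, reusing the error signal \(\delta^{[l]} = \partial L / \partial z^{[l]}\) from the layer above:
\[ \delta^{[l]} = \left( (W^{[l+1]})^{\top} \delta^{[l+1]} \right) \odot \sigma'\left(z^{[l]}\right), \qquad \frac{\partial L}{\partial W^{[l]}} = \delta^{[l]} \left(a^{[l-1]}\right)^{\top} \]
Here \(z^{[l]}\) is the pre-activation of layer \(l\), \(a^{[l-1]}\) is the activation feeding into it, and \(\odot\) denotes element-wise multiplication.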
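The chain rule can be checked numerically on a small network. The sketch below computes the weight gradients by hand and compares them with PyTorch's autograd. The setup is an illustrative assumption: two layers without biases, a tanh hidden activation, a linear output, and a squared-error loss.

```python
import torch

torch.manual_seed(0)

# Hypothetical tiny network: x -> W1 -> tanh -> W2 -> squared-error loss
x = torch.randn(3, 1)                        # input column vector
y = torch.randn(2, 1)                        # target column vector
W1 = torch.randn(4, 3, requires_grad=True)
W2 = torch.randn(2, 4, requires_grad=True)

# Forward pass
z1 = W1 @ x
a1 = torch.tanh(z1)
z2 = W2 @ a1                                 # linear output layer
loss = 0.5 * ((z2 - y) ** 2).sum()

# Reference gradients from autograd
loss.backward()

# Manual backpropagation via the chain rule
with torch.no_grad():
    delta2 = z2 - y                               # dL/dz2
    grad_W2 = delta2 @ a1.T                       # dL/dW2
    delta1 = (W2.T @ delta2) * (1 - a1 ** 2)      # dL/dz1, using tanh' = 1 - tanh^2
    grad_W1 = delta1 @ x.T                        # dL/dW1

print(torch.allclose(grad_W1, W1.grad, atol=1e-5))
print(torch.allclose(grad_W2, W2.grad, atol=1e-5))
```

The error signal for the first layer is built from the one already computed for the second, which is what makes backpropagation cheaper than differentiating each weight independently.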
Modern optimizers (Adam, AdamW) adapt learning rates per parameter. Use learning rate schedules and gradient clipping for stability.
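One way these pieces might fit together is sketched below: AdamW, a cosine learning rate schedule, and gradient-norm clipping in a single training loop. The synthetic data, layer sizes, and hyperparameters are placeholder assumptions chosen only so the example runs quickly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic, learnable data (hypothetical): the label is the index of the
# largest of the first three features.
X = torch.randn(256, 20)
y = X[:, :3].argmax(dim=1)

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 3))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)

losses = []
for epoch in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    # Rescale gradients so their global L2 norm is at most 1.0
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()          # decay the learning rate once per epoch
    losses.append(loss.item())

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

Clipping happens after `backward()` and before `optimizer.step()`, so the optimizer only ever sees the rescaled gradients.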
3. Convolutional Neural Networks
CNNs exploit spatial locality in images via convolution filters, pooling, and hierarchical feature learning. Architectures: LeNet, AlexNet, ResNet, EfficientNet.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 10),
)
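To see how each layer changes the tensor shape, the sketch below rebuilds the same model and pushes a dummy batch through it one layer at a time. The batch size of 4 and the 32x32 input resolution are assumptions made for illustration.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 10),
)

# Hypothetical batch: 4 RGB images of 32x32 pixels
x = torch.randn(4, 3, 32, 32)
for layer in model:
    x = layer(x)
    print(f"{layer.__class__.__name__:<18} {tuple(x.shape)}")
```

With `padding=1` and a 3x3 kernel the convolutions preserve height and width, the max pool halves them, and the adaptive average pool collapses whatever spatial size remains to 1x1. That last step is why the final linear layer can take 64 inputs regardless of image resolution.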
4. Recurrent Networks & Sequences
RNNs, LSTMs, and GRUs process sequential data such as text and time series. The hidden state \(h_t\) depends on the previous step: \(h_t = f(W x_t + U h_{t-1} + b)\). Transformers have largely replaced RNNs for NLP because they process all positions of a sequence in parallel and capture long-range dependencies more directly.
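The recurrence can be written out directly. This is a minimal sketch that assumes \(f = \tanh\), randomly initialized weights, and arbitrary dimensions. It is not how `nn.RNN` is implemented internally, only the same update rule unrolled by hand.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes
input_dim, hidden_dim, seq_len = 8, 16, 5

W = torch.randn(hidden_dim, input_dim) * 0.1    # input-to-hidden weights
U = torch.randn(hidden_dim, hidden_dim) * 0.1   # hidden-to-hidden weights
b = torch.zeros(hidden_dim)

xs = torch.randn(seq_len, input_dim)            # one sequence of 5 steps
h = torch.zeros(hidden_dim)                     # initial hidden state h_0

states = []
for x_t in xs:
    # h_t = tanh(W x_t + U h_{t-1} + b)
    h = torch.tanh(W @ x_t + U @ h + b)
    states.append(h)

states = torch.stack(states)
print(states.shape)  # torch.Size([5, 16])
```

Each iteration needs the `h` produced by the previous one, so the loop cannot be parallelized across time steps.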
5. Transformers & Attention
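Self-attention computes relationships between all token pairs. Scaled dot-product attention:
\[ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left( \frac{Q K^{\top}}{\sqrt{d_k}} \right) V \]
where \(Q\), \(K\), and \(V\) are the query, key, and value matrices and \(d_k\) is the key dimension.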
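Multi-head attention runs several attention heads in parallel, each on its own learned projection of the input, and concatenates their outputs. Transformer variants power GPT (decoder-only), BERT (encoder-only), and modern vision models such as ViT.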
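A minimal sketch of scaled dot-product attention applied across several heads at once follows. The function below is written out for illustration, and the tensor sizes are arbitrary assumptions. Masking, dropout, and the learned projections of a full multi-head layer are left out.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # softmax(Q K^T / sqrt(d_k)) V, computed over the last two dimensions
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (..., tokens, tokens)
    weights = torch.softmax(scores, dim=-1)             # each row sums to 1
    return weights @ v, weights

torch.manual_seed(0)

# Hypothetical sizes: 2 sequences, 4 heads, 6 tokens, 8 dimensions per head
batch, heads, tokens, d_k = 2, 4, 6, 8
q = torch.randn(batch, heads, tokens, d_k)
k = torch.randn(batch, heads, tokens, d_k)
v = torch.randn(batch, heads, tokens, d_k)

out, weights = scaled_dot_product_attention(q, k, v)
print(out.shape)      # torch.Size([2, 4, 6, 8])
print(weights.shape)  # torch.Size([2, 4, 6, 6])
```

The heads sit in their own tensor dimension, so a single batched matrix multiply evaluates all of them together. PyTorch 2.0 and later also provide `torch.nn.functional.scaled_dot_product_attention`, which is usually the better choice in practice.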
6. Production Machine Learning
- Data versioning — DVC, lakehouse patterns
- Experiment tracking — MLflow, Weights & Biases
- Model serving — ONNX, TorchServe, Triton
- Monitoring — data drift, concept drift, latency SLAs
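As one illustration of the monitoring item, the sketch below computes a population stability index (PSI) between a training-time feature distribution and live data. PSI is only one of several possible drift statistics. The function name `psi`, the bin count, and the synthetic Gaussian data are all assumptions made for this example.

```python
import math
import random

def psi(expected, actual, bins=10, eps=1e-6):
    """Population stability index between two samples of one feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0   # fall back to 1.0 if all values are equal

    def fractions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped to the edge bins
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps avoids log(0) for empty bins
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

random.seed(0)
train = [random.gauss(0.0, 1.0) for _ in range(5000)]          # reference data
live_same = [random.gauss(0.0, 1.0) for _ in range(5000)]      # same distribution
live_shifted = [random.gauss(1.0, 1.0) for _ in range(5000)]   # mean shifted by 1

print(f"no drift:   {psi(train, live_same):.3f}")
print(f"with drift: {psi(train, live_shifted):.3f}")
```

A check like this detects data drift (a change in the inputs). Concept drift, where the relationship between inputs and labels changes, generally requires monitoring prediction quality against fresh labels instead.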
Responsible Deployment
Evaluate fairness across demographic groups, document limitations, and maintain human oversight for high-stakes decisions.
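A simple starting point for a per-group evaluation is to break standard metrics down by group, as in the sketch below. The function name, the binary labels, and the group identifiers `"a"` and `"b"` are hypothetical. Which metrics and groups matter depends on the application.

```python
from collections import defaultdict

def per_group_metrics(y_true, y_pred, groups):
    """Accuracy and positive-prediction rate for each group."""
    stats = defaultdict(lambda: {"n": 0, "correct": 0, "positive": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        s = stats[g]
        s["n"] += 1
        s["correct"] += int(t == p)
        s["positive"] += int(p == 1)
    return {
        g: {
            "accuracy": s["correct"] / s["n"],
            "positive_rate": s["positive"] / s["n"],
        }
        for g, s in stats.items()
    }

# Hypothetical labels, predictions, and group membership
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

report = per_group_metrics(y_true, y_pred, groups)
for g, m in sorted(report.items()):
    print(g, m)
```

Large gaps between groups in either column are a signal to investigate further, not a verdict on their own.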