Deep Learning Foundations: Building from Scratch
Nov 26, 2025
Understanding deep learning requires more than just using high-level frameworks. This post chronicles my journey building neural networks from the ground up, implementing everything from basic classifiers to CNNs, RNNs, and transformers, and gathering the insights that those frameworks usually abstract away.
Deep Learning Foundations
- Core ML Algorithms: Implemented fundamental algorithms from scratch, including K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Softmax classifiers, and two-layer neural networks. Working at this level exposed the mathematical foundations that modern frameworks hide.
- Vectorization & Performance: Applied vectorization for performance, replacing nested Python loops with NumPy matrix operations for 5x+ speedups, and learned why efficient matrix arithmetic is essential for scalable deep learning (see the distance-computation sketch after this list).
- Modular Neural Networks: Built modular neural network components, including forward/backward propagation, multiple optimizers (SGD, Momentum, Adam), and numerical gradient verification to ensure correctness (a gradient-check sketch also follows this list).
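To make the vectorization point concrete, here is a minimal sketch of the distance computation behind the KNN classifier, first with two Python loops and then fully vectorized using the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y. Function names are illustrative rather than the exact ones from my code.

```python
import numpy as np

def pairwise_distances_loops(X_test, X_train):
    """Naive version: one distance at a time with two Python loops."""
    num_test, num_train = X_test.shape[0], X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        for j in range(num_train):
            dists[i, j] = np.sqrt(np.sum((X_test[i] - X_train[j]) ** 2))
    return dists

def pairwise_distances_vectorized(X_test, X_train):
    """Same result from a single matrix multiply plus broadcasting."""
    test_sq = np.sum(X_test ** 2, axis=1, keepdims=True)   # (num_test, 1)
    train_sq = np.sum(X_train ** 2, axis=1)                 # (num_train,)
    cross = X_test @ X_train.T                              # (num_test, num_train)
    # Clip tiny negatives caused by floating-point error before the sqrt.
    return np.sqrt(np.maximum(test_sq + train_sq - 2 * cross, 0.0))
```

The single matrix multiply in the second version is where the speedups reported above come from: the work moves from the Python interpreter into optimized BLAS routines.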
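Likewise, a sketch of the numerical gradient verification mentioned above: a centered-difference approximation compared against a known analytic gradient. The helper name and step size are illustrative.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Centered differences: df/dx_i ~ (f(x + h*e_i) - f(x - h*e_i)) / (2h)."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"], op_flags=["readwrite"])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + h
        f_plus = f(x)
        x[idx] = old - h
        f_minus = f(x)
        x[idx] = old                      # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
        it.iternext()
    return grad

# Sanity check on a function with a known gradient: f(x) = sum(x^2), grad = 2x.
x = np.random.randn(3, 4)
num_grad = numerical_gradient(lambda v: np.sum(v ** 2), x)
rel_error = np.abs(num_grad - 2 * x) / np.maximum(np.abs(num_grad) + np.abs(2 * x), 1e-8)
print(rel_error.max())                    # on the order of 1e-7 or smaller
```

Running a check like this against each layer's analytic backward pass catches most backpropagation bugs before the layers are wired into a full network.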
Advanced Deep Learning
- Regularization Techniques: Implemented modern regularization and normalization methods, including batch normalization, layer normalization, group normalization, and dropout. Building them from scratch showed how normalization stabilizes the training of deeper networks and how dropout prevents overfitting (a batch-norm forward pass is sketched after this list).
- Convolutional Neural Networks: Built CNNs from scratch, including convolution layers, max pooling, and spatial normalization, then applied the same ideas in PyTorch on the CIFAR-10 dataset to bridge theory and practical implementation (a naive convolution forward pass follows this list).
- Recurrent Neural Networks: Developed RNNs for sequence modeling by implementing a vanilla RNN with backpropagation through time (BPTT), then applied the architecture to image captioning on the COCO dataset, learning how to handle sequential data and temporal dependencies (a single RNN step is sketched below).
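As an illustration of the normalization layers listed above, here is a minimal training-mode batch normalization forward pass in NumPy. The running-statistics bookkeeping needed at test time and the backward pass are omitted, and the function name is illustrative.

```python
import numpy as np

def batchnorm_forward_train(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm for a batch of activations x with shape (N, D)."""
    mu = x.mean(axis=0)                      # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalize to zero mean, unit variance
    out = gamma * x_hat + beta               # learnable scale and shift
    cache = (x_hat, mu, var, gamma, eps)     # saved for the backward pass
    return out, cache
```

Layer and group normalization follow the same normalize-then-scale pattern but compute the statistics per example rather than per batch, which is why they behave identically at train and test time.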
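The convolution forward pass, written with explicit loops as the readable reference version (from-scratch implementations usually pair it with a faster vectorized variant). Shapes follow the usual (N, C, H, W) layout; the function name and defaults are illustrative.

```python
import numpy as np

def conv_forward_naive(x, w, b, stride=1, pad=1):
    """x: (N, C, H, W) images, w: (F, C, HH, WW) filters, b: (F,) biases."""
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    H_out = 1 + (H + 2 * pad - HH) // stride
    W_out = 1 + (W + 2 * pad - WW) // stride
    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode="constant")
    out = np.zeros((N, F, H_out, W_out))
    for n in range(N):                       # every image
        for f in range(F):                   # every filter
            for i in range(H_out):           # every output row
                for j in range(W_out):       # every output column
                    window = x_pad[n, :, i * stride:i * stride + HH,
                                         j * stride:j * stride + WW]
                    out[n, f, i, j] = np.sum(window * w[f]) + b[f]
    return out
```

Max pooling has the same loop structure with np.max over each window instead of a weighted sum, and no learnable parameters.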
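And a single vanilla RNN step together with the loop that unrolls it over a sequence; backpropagation through time walks this loop in reverse, accumulating gradients into the shared weights. Names and shapes are illustrative.

```python
import numpy as np

def rnn_step_forward(x, prev_h, Wx, Wh, b):
    """One timestep: h_t = tanh(x_t Wx + h_{t-1} Wh + b).

    x: (N, D) inputs, prev_h: (N, H) previous hidden state,
    Wx: (D, H), Wh: (H, H), b: (H,).
    """
    next_h = np.tanh(x @ Wx + prev_h @ Wh + b)
    cache = (x, prev_h, Wx, Wh, next_h)      # saved for backprop through time
    return next_h, cache

def rnn_forward(x_seq, h0, Wx, Wh, b):
    """Unroll the step over a (N, T, D) sequence of inputs."""
    N, T, D = x_seq.shape
    h = np.zeros((N, T, h0.shape[1]))
    prev_h = h0
    for t in range(T):
        prev_h, _ = rnn_step_forward(x_seq[:, t], prev_h, Wx, Wh, b)
        h[:, t] = prev_h
    return h
```

In the image-captioning setup, the initial hidden state is typically derived from CNN image features and each x_t is the embedding of the previous caption word.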
Advanced Deep Learning Architectures
- Transformer Networks for Vision: Implemented multi-headed attention, positional encoding, and complete transformer encoder/decoder architectures from scratch in PyTorch. Built a Vision Transformer (ViT) that converts images into patch sequences and processes them with self-attention layers, reaching 45.8% accuracy on CIFAR-10 after just 2 epochs. Also applied transformers to image captioning on the COCO dataset by combining CNN feature extraction with a transformer decoder (a minimal attention block is sketched after this list).
- Self-Supervised Learning with SimCLR: Implemented the SimCLR contrastive learning framework, including its data augmentation pipeline and the normalized temperature-scaled cross-entropy (NT-Xent) loss, in both naive and vectorized forms. Demonstrated the value of self-supervised pretraining: with just 10% of the CIFAR-10 training data, the pretrained model reached 82.4% classification accuracy versus only 15.3% without pretraining (a vectorized NT-Xent sketch follows this list).
- Attention Mechanisms & Scalability: Developed a working understanding of self-attention's O(L²·d) cost (for sequence length L and embedding dimension d) and the architectural trade-offs it imposes on transformer models. Explored how multi-headed attention lets a model capture several kinds of relationships simultaneously, and how transformers differ from RNNs by processing whole sequences in parallel and capturing long-range dependencies directly. Applied these insights both to NLP-inspired image captioning and to pure vision tasks with the ViT (a rough cost model follows this list).
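A compact PyTorch sketch of multi-headed self-attention, the block at the core of both the transformer decoder and the ViT described above. Masking, dropout, and other production details are left out, and the class name is illustrative.

```python
import math
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Minimal multi-head self-attention over a (N, L, embed_dim) sequence."""

    def __init__(self, embed_dim, num_heads):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)    # fused Q, K, V projection
        self.out = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):
        N, L, E = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split each projection into heads: (N, num_heads, L, head_dim).
        def split_heads(t):
            return t.view(N, L, self.num_heads, self.head_dim).transpose(1, 2)
        q, k, v = split_heads(q), split_heads(k), split_heads(v)
        # Scaled dot-product attention; the (L, L) score matrix is the O(L^2 * d) cost.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        attn = scores.softmax(dim=-1)
        out = attn @ v                                    # (N, num_heads, L, head_dim)
        out = out.transpose(1, 2).reshape(N, L, E)        # merge the heads back together
        return self.out(out)
```

In the ViT, x is the sequence of patch embeddings plus positional encodings; in the captioning decoder, the same block attends over previously generated tokens with a causal mask added.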
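A vectorized sketch of the NT-Xent loss behind the SimCLR numbers above. The temperature default and function name are illustrative; the naive version mentioned earlier computes the same quantity one pair at a time.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """Normalized temperature-scaled cross-entropy (NT-Xent).

    z1, z2: (N, D) projection-head outputs for two augmented views of the same N images.
    """
    N = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)    # (2N, D), unit-length rows
    sim = z @ z.t() / temperature                         # (2N, 2N) scaled cosine similarities
    # An example must never count as its own negative, so mask out the diagonal.
    self_mask = torch.eye(2 * N, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))
    # Row i's positive is the other view of the same image, at index (i + N) mod 2N.
    targets = (torch.arange(2 * N, device=z.device) + N) % (2 * N)
    # Cross-entropy over each row treats the positive pair as the correct "class".
    return F.cross_entropy(sim, targets)

# Toy usage; in practice z1 and z2 come from the encoder and projection head.
loss = nt_xent_loss(torch.randn(8, 32), torch.randn(8, 32))
```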
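Finally, to make the O(L²·d) point concrete, a back-of-the-envelope cost model for one self-attention layer, counting only the dominant matrix multiplies (the head count does not change the asymptotics):

```python
def self_attention_cost(L, d):
    """Approximate multiply-add counts for one self-attention layer.

    scores = Q @ K^T     -> L * L * d
    out    = attn @ V    -> L * L * d
    Q/K/V/output linears -> 4 * L * d * d
    """
    attention = 2 * L * L * d
    projections = 4 * L * d * d
    return attention, projections

for L in (128, 256, 512, 1024):
    attn, proj = self_attention_cost(L, d=256)
    print(f"L={L:5d}  attention={attn:,}  projections={proj:,}")
```

Doubling the sequence length quadruples the attention term but only doubles the projection term, which is why sequence length, rather than embedding width, becomes the bottleneck for long inputs.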
Key Takeaways
- Mathematics Matters: Understanding gradient descent, backpropagation, and the chain rule is essential
- Vectorization is Critical: Matrix operations provide massive performance gains over loops
- Modular Design: Building reusable components makes complex architectures manageable
- Theory + Practice: Implementing from scratch and then using frameworks provides the deepest understanding
- Debugging Skills: Numerical gradient checking is invaluable for validating implementations