
Search-Supervised Deep Learning

Training and Deploying Neural Connect 4 Agents

A full-stack ML system that trains neural networks on MCTS self-play data and deploys them as real-time interactive bots via AWS Lightsail and Anvil. CNN and hybrid CNN-Transformer architectures predict expert moves without tree search at inference, with the hybrid reaching 81.7% Top-1 accuracy.

GitHub Repository
Project Focus: Game AI, Supervised Learning, Cloud Deployment, Architecture Comparison
Training Stack: Python, TensorFlow/Keras, NumPy, MCTS, Google Colab
Deployment: AWS Lightsail, Docker, Anvil (Python web app)
Dataset: ~40,000 board states from 1,500 MCTS self-play games

System Architecture

This project was architected as a production-style ML pipeline rather than a standalone notebook. Clear separation of responsibilities mirrors real-world deployment patterns.

MCTS Self-Play → Dataset Generation → Model Training (Colab GPU)
→ Model Serialization (.h5)
→ Dockerized Inference API (AWS Lightsail)
→ Anvil Frontend (Authenticated UI)
→ Human vs Bot Gameplay
Layer            Responsibility
Training         Model development and experimentation
AWS Backend      Stateless inference only
Docker           Environment reproducibility
Anvil Frontend   Authentication, UI, state management

Data Generation

Rather than hand-labeling positions, MCTS was used as a high-quality move generator. The pipeline ran 1,500 self-play games with 1,200 rollouts per move over ~15 hours. Randomized early moves increased diversity; duplicate board states were consolidated via majority vote.
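
The consolidation step is simple enough to sketch; this assumes each training record pairs a hashable board encoding (e.g. a flattened tuple) with the MCTS-chosen column, with names illustrative rather than taken from the repository:

    from collections import Counter, defaultdict

    def consolidate(records):
        """Merge duplicate board states, labeling each with its majority-vote move.

        records: iterable of (board_key, move) pairs, where board_key is a
        hashable board encoding and move is the MCTS-chosen column (0-6).
        """
        votes = defaultdict(Counter)
        for board, move in records:
            votes[board][move] += 1
        # One training example per unique board: keep the most frequent move
        return [(board, counts.most_common(1)[0][0])
                for board, counts in votes.items()]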

  • 1,500 self-play games
  • 1,200 rollouts per move
  • ~40,000 board states
  • ~15 h of compute time

Model Results

Metric           CNN                                Hybrid Transformer
Top-1 Accuracy   78.3%                              81.7%
Top-2 Accuracy   92.4%                              94.1%
Inference Time   1.2 ms                             3.8 ms
Training Time    ~20 min                            ~2 hours
Best Use Case    Real-time, lightweight deployment  Maximum move prediction accuracy

The hybrid model improves top-1 accuracy by 3.4 points. The CNN trades a small accuracy drop for 3x faster inference and 6x faster training, making it ideal for real-time deployment.

Architecture Journey

CNN: Iterative Refinement

Initial models exposed classic failure modes: a shallow CNN underfit (~60% accuracy), a deep CNN overfit, and an over-regularized variant collapsed in capacity. The final CNN balanced depth and generalization with progressive convolution blocks (32 → 256 filters), batch normalization, dropout scheduling, and Global Average Pooling.
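
A minimal Keras sketch of the final CNN's shape, assuming the board is encoded as a 6×7×2 tensor (one plane per player); the exact dropout rates are illustrative, but the 32 → 256 filter progression, batch normalization, dropout scheduling, and GAP follow the description above:

    from tensorflow.keras import layers, models

    def build_cnn(input_shape=(6, 7, 2), n_moves=7):
        inputs = layers.Input(shape=input_shape)
        x = inputs
        # Progressive conv blocks: filters grow 32 -> 256, dropout grows with depth
        for filters, rate in [(32, 0.1), (64, 0.2), (128, 0.3), (256, 0.4)]:
            x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
            x = layers.BatchNormalization()(x)
            x = layers.Dropout(rate)(x)
        # GAP instead of Flatten: far fewer parameters, less overfitting
        x = layers.GlobalAveragePooling2D()(x)
        outputs = layers.Dense(n_moves, activation="softmax")(x)  # one probability per column
        return models.Model(inputs, outputs)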

Pure Transformer

Performance plateaued at 46–55% accuracy. The 6×7 board is too small to yield a diverse set of tokens, and transformers lack the spatial inductive bias that convolutions provide. Conclusion: transformers without a convolutional feature extractor underperform on compact spatial domains.

Hybrid CNN–Transformer

A CNN feature extractor compresses the board into 3×3 spatial tokens, which feed a 4-layer Transformer encoder and a dense classification head. Combining spatial priors with global attention produced the best overall accuracy.
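
A hedged sketch of the hybrid under the same 6×7×2 input assumption; pooling the 6×7 feature map down to 3×3 yields nine tokens for the encoder, while the head count and model width are assumptions rather than the trained configuration:

    from tensorflow.keras import layers, models

    def build_hybrid(input_shape=(6, 7, 2), n_moves=7, d_model=128, n_layers=4, n_heads=4):
        inputs = layers.Input(shape=input_shape)
        # CNN stem: local feature extraction, then pool 6x7 -> 3x3
        x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
        x = layers.Conv2D(d_model, 3, padding="same", activation="relu")(x)
        x = layers.AveragePooling2D(pool_size=2)(x)  # (6, 7) -> (3, 3) with 'valid' padding
        tokens = layers.Reshape((9, d_model))(x)     # nine spatial tokens
        # (A learned positional embedding could be added here; omitted for brevity.)
        for _ in range(n_layers):  # 4 pre-norm Transformer encoder layers
            attn_in = layers.LayerNormalization()(tokens)
            attn = layers.MultiHeadAttention(num_heads=n_heads,
                                             key_dim=d_model // n_heads)(attn_in, attn_in)
            tokens = layers.Add()([tokens, attn])    # residual connection
            ff_in = layers.LayerNormalization()(tokens)
            ff = layers.Dense(4 * d_model, activation="relu")(ff_in)
            ff = layers.Dense(d_model)(ff)
            tokens = layers.Add()([tokens, ff])
        x = layers.GlobalAveragePooling1D()(tokens)
        outputs = layers.Dense(n_moves, activation="softmax")(x)  # dense classification head
        return models.Model(inputs, outputs)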

Tactical Error Analysis

High Confidence (>95%)

  • Immediate wins: 98%+ accuracy
  • Forced defensive blocks
  • Opening central control

Failure Modes (<65%)

  • Multi-move traps
  • Dense endgames
  • Zugzwang states

A supervised policy network learns pattern recognition but cannot simulate future branches. These failure modes mirror the historical shift from AlphaGo's supervised policy network to AlphaZero-style policy + search hybrids.

Deployment

AWS Lightsail Backend

  • Dockerized TensorFlow inference service (sketched after this list)
  • Model loaded at startup, stateless requests
  • Returns probability distribution over 7 moves
  • Deterministic, low-latency inference (<4 ms)
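
A minimal sketch of such a service, assuming Flask, a /predict route, and a JSON payload schema (none of which are documented here); both deployed models are loaded once at startup so every request is a pure forward pass:

    # Hypothetical inference service: route, payload, and file names are assumptions.
    import numpy as np
    import tensorflow as tf
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    MODELS = {  # loaded once at container startup; requests stay stateless
        "cnn": tf.keras.models.load_model("connect4_cnn.h5"),
        "hybrid": tf.keras.models.load_model("connect4_hybrid.h5"),
    }

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json()
        model = MODELS[payload.get("model", "cnn")]
        board = np.asarray(payload["board"], dtype=np.float32)  # 6x7x2 player planes
        probs = model.predict(board[np.newaxis, ...], verbose=0)[0]
        return jsonify({"probabilities": probs.tolist()})  # distribution over 7 columns

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)

Loading the models at import time means each request only runs a forward pass, which keeps the service stateless and latency low.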

Anvil Frontend

  • Authenticated UI (email/password, no auto signup)
  • Model selector (CNN vs Transformer)
  • Real-time human vs bot gameplay (see the server-module sketch below)
  • Tab navigation: Play Game | Training Description
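
On the Anvil side, the game form can delegate move selection to a server module that forwards the board to the Lightsail API; the function name, URL, and payload here are hypothetical and mirror the service sketched above:

    # Anvil server module (function name and URL are hypothetical)
    import anvil.server
    import anvil.http

    API_URL = "https://<lightsail-host>/predict"  # placeholder for the real endpoint

    @anvil.server.callable
    def get_bot_move(board, model_name):
        # Forward the board to the Dockerized inference API; json=True both
        # serializes the request body and parses the JSON response
        resp = anvil.http.request(API_URL, method="POST",
                                  data={"board": board, "model": model_name},
                                  json=True)
        probs = resp["probabilities"]
        return max(range(7), key=lambda col: probs[col])  # best-scoring column

The form then calls anvil.server.call('get_bot_move', board, selected_model) whenever it is the bot's turn.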

Anvil UI

The frontend was designed to resemble a polished consumer product: gradient background, rounded board container, elevated shadows, distinct yellow/red piece styling, animated feedback messages.

[Screenshot: Game interface with model selector and board]

[Screenshot: Interactive gameplay view]

Model Performance Visualization

[Figure: High-confidence board states where models excel]

[Figure: Challenging scenarios (multi-move traps, endgames)]

Engineering Tradeoffs

Dimension                  CNN        Hybrid Transformer
Accuracy                   Strong     Best
Latency                    Excellent  Moderate
Training Cost              Low        High
Implementation Complexity  Moderate   High

Both models were deployed to allow direct comparison in the live Anvil app.

Key Takeaways

  • Inductive bias matters more than model novelty; small spatial grids favor CNNs
  • Transformers benefit from hybridization with spatial feature extractors
  • Supervised imitation from MCTS labels has inherent planning limits; neural-guided search (AlphaZero-style) is the natural next step
  • Production deployment adds non-trivial engineering overhead: containerization, cloud hosting, authenticated UI

Future Improvements

  • Neural-guided MCTS (AlphaZero-style)
  • Reinforcement learning fine-tuning
  • Larger dataset (5k–10k games)
  • Lightweight inference-optimized CNN
  • Specialized endgame classifier

Outcome

The final system achieves 81.7% Top-1 move prediction accuracy, avoids catastrophic tactical blunders, runs inference in under 4 ms, supports authenticated interactive gameplay, and is fully containerized and cloud-hosted. It reflects production-ready ML system design with measurable outcomes across search supervision, architecture experimentation, regularization tuning, Docker containerization, cloud deployment, and frontend integration.