Search-Supervised Deep Learning

Training and Deploying Neural Connect 4 Agents

A full-stack ML system that trains neural networks on MCTS self-play data and deploys them as real-time interactive bots via AWS Lightsail and Anvil. With no tree search at inference, the CNN reaches 78.3% Top-1 accuracy and the hybrid CNN-Transformer reaches 81.7%.

GitHub Repository

Project Focus: Game AI, Supervised Learning, Cloud Deployment, Architecture Comparison
Training Stack: Python, TensorFlow/Keras, NumPy, MCTS, Google Colab
Deployment: AWS Lightsail, Docker, Anvil (Python web app)
Dataset: ~40,000 board states from 1,500 MCTS self-play games

System Architecture

This project was architected as a production-style ML pipeline rather than a standalone notebook. Clear separation of responsibilities mirrors real-world deployment patterns.

MCTS Self-Play → Dataset Generation → Model Training (Colab GPU)
→ Model Serialization (.h5)
→ Dockerized Inference API (AWS Lightsail)
→ Anvil Frontend (Authenticated UI)
→ Human vs Bot Gameplay

Layer	Responsibility
Training	Model development and experimentation
AWS Backend	Stateless inference only
Docker	Environment reproducibility
Anvil Frontend	Authentication, UI, state management

Data Generation

Rather than hand-labeling positions, the pipeline used MCTS as a high-quality move generator. It ran 1,500 self-play games with 1,200 rollouts per move over ~15 hours. Randomized early moves increased diversity, and duplicate board states were consolidated by majority vote on their move labels.

Statistic	Value
Self-play games	1,500
Rollouts per move	1,200
Board states	~40k
Compute time	15h
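The majority-vote consolidation step can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name, the tuple-of-42-ints board encoding, and the tie-breaking behavior are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def consolidate_by_majority(samples):
    """Collapse duplicate boards to one label: the most frequently chosen move.

    `samples` is an iterable of (board, move) pairs. `board` must be hashable
    (assumed here: a tuple of 42 ints) and `move` is a column index 0-6.
    """
    votes = defaultdict(Counter)
    for board, move in samples:
        votes[board][move] += 1
    # most_common(1) yields the top (move, count) pair. On a tie it returns
    # the move that was seen first, which is an arbitrary choice in this sketch.
    return {board: counts.most_common(1)[0][0] for board, counts in votes.items()}


# Usage: the same empty board appears three times with two different labels.
empty = tuple([0] * 42)
dataset = consolidate_by_majority([(empty, 3), (empty, 3), (empty, 2)])
```

After consolidation each unique board maps to a single move, so the training set contains no conflicting labels for the same position.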

Model Results

Metric	CNN	Hybrid Transformer
Top-1 Accuracy	78.3%	81.7%
Top-2 Accuracy	92.4%	94.1%
Inference Time	1.2 ms	3.8 ms
Training Time	~20 min	~2 hours
Best Use Case	Real-time, lightweight deployment	Maximum move prediction accuracy

The hybrid model improves Top-1 accuracy by 3.4 points (81.7% vs 78.3%). The CNN gives up that accuracy for roughly 3x faster inference (1.2 ms vs 3.8 ms) and about 6x faster training (~20 min vs ~2 hours), which makes it the better fit for real-time deployment.
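For reference, Top-1 and Top-2 accuracy as reported in the table can be computed along these lines. This is a hedged sketch in plain Python: the function name is hypothetical, and the toy example uses 3 columns instead of 7 to keep it short.

```python
def top_k_accuracy(prob_rows, labels, k):
    """Fraction of positions whose labeled move is among the k most probable columns."""
    hits = 0
    for probs, label in zip(prob_rows, labels):
        # Column indices ordered from most to least probable.
        ranked = sorted(range(len(probs)), key=lambda col: probs[col], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy predictions for four positions (3 columns each) and their MCTS labels.
predictions = [
    [0.1, 0.6, 0.3],
    [0.5, 0.2, 0.3],
    [0.2, 0.3, 0.5],
    [0.7, 0.2, 0.1],
]
mcts_labels = [1, 2, 0, 0]

top1 = top_k_accuracy(predictions, mcts_labels, k=1)  # 0.5
top2 = top_k_accuracy(predictions, mcts_labels, k=2)  # 0.75
```

Top-2 is always at least as high as Top-1, which is why the gap between the two (about 12 to 14 points in the table) indicates how often the labeled move was the model's second choice.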

Architecture Journey

CNN: Iterative Refinement

Initial models exposed classic failure modes: a shallow CNN underfit (~60% accuracy), a deeper CNN overfit, and heavy regularization caused capacity collapse. The final CNN balanced depth and generalization with progressive convolution blocks (32 → 256 filters), batch normalization, dropout scheduling, and Global Average Pooling.
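One way to see why Global Average Pooling helps generalization is to count the parameters in the classification head. The arithmetic below is an illustration under stated assumptions, not the project's measured figures: it assumes the final feature map keeps the full 6×7 resolution with 256 channels and that the head is a single dense layer to 7 outputs.

```python
def dense_params(n_inputs, n_outputs):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_outputs + n_outputs


ROWS, COLS = 6, 7      # Connect 4 board
CHANNELS = 256         # final filter count from the text
MOVES = 7              # one output per column

# Assumed alternative 1: flatten every cell of every channel into the head.
flatten_head = dense_params(ROWS * COLS * CHANNELS, MOVES)

# Assumed alternative 2: average each channel to one number first (GAP).
gap_head = dense_params(CHANNELS, MOVES)

print(flatten_head, gap_head)  # 75271 1799
```

Under these assumptions the pooled head has about 42x fewer parameters, which leaves less room to memorize individual positions in a ~40k-sample dataset.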

Pure Transformer

A pure Transformer plateaued at 46–55% accuracy. The 6×7 board is too small for effective token diversity, and transformers lack a built-in spatial inductive bias. Conclusion: on this compact spatial domain, a Transformer without convolutional feature extraction underperformed.

Hybrid CNN–Transformer

A CNN feature extractor compresses the board to a 3×3 grid of spatial tokens, which feed a 4-layer Transformer encoder followed by a dense classification head. This combined spatial priors with global attention and gave the best overall accuracy.
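The hand-off from CNN to Transformer amounts to turning a 3×3 feature map into a sequence of 9 tokens. The sketch below shows that step in plain Python with nested lists; the function name and the channel count of 4 are placeholders for illustration (in Keras the same step is typically a Reshape layer), and how the project's CNN actually reduces 6×7 to 3×3 is not stated in the source.

```python
def feature_map_to_tokens(feature_map):
    """Flatten an H x W x C feature map into H*W tokens of length C, in row-major order."""
    return [list(cell) for row in feature_map for cell in row]


# Toy 3x3 feature map with 4 channels; each cell records its own (row, col).
feature_map = [
    [[row, col, 0.0, 0.0] for col in range(3)]
    for row in range(3)
]

tokens = feature_map_to_tokens(feature_map)
print(len(tokens), len(tokens[0]))  # 9 4
print(tokens[5])                    # [1, 2, 0.0, 0.0] -> cell at row 1, col 2
```

Each token summarizes a region of the board rather than a single cell, so the encoder attends over 9 information-rich tokens instead of 42 near-identical ones.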

Tactical Error Analysis

High Confidence (>95%)

•Immediate wins: 98%+ accuracy
•Forced defensive blocks
•Opening central control

Failure Modes (<65%)

•Multi-move traps
•Dense endgames
•Zugzwang states

A policy network evaluated in a single forward pass recognizes patterns but does not explicitly simulate future branches, which explains the weakness on multi-move tactics. This is the same limitation that led systems such as AlphaGo and AlphaZero to pair a learned policy with tree search.
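Bucketing evaluation positions into categories such as "immediate win" requires a rule-based check that is independent of the model. A minimal sketch of such a check is shown below; the board encoding (6 rows of 7 ints, row 0 at the top, 0 for empty, 1 and 2 for the players) and all function names are assumptions for this example, not the project's code.

```python
ROWS, COLS = 6, 7


def drop_row(board, col):
    """Return the row where a piece dropped into `col` would land, or None if full."""
    for row in range(ROWS - 1, -1, -1):  # scan from the bottom row upward
        if board[row][col] == 0:
            return row
    return None


def is_winning_move(board, col, player):
    """True if `player` dropping into `col` completes four in a row."""
    row = drop_row(board, col)
    if row is None:
        return False
    # Horizontal, vertical, and the two diagonals.
    for d_row, d_col in ((0, 1), (1, 0), (1, 1), (1, -1)):
        count = 1  # the piece being placed
        for sign in (1, -1):  # walk both ways along the line
            r, c = row + sign * d_row, col + sign * d_col
            while 0 <= r < ROWS and 0 <= c < COLS and board[r][c] == player:
                count += 1
                r, c = r + sign * d_row, c + sign * d_col
        if count >= 4:
            return True
    return False


def immediate_wins(board, player):
    """All columns that win on the spot for `player`."""
    return [col for col in range(COLS) if is_winning_move(board, col, player)]


# Player 1 holds the three leftmost cells of the bottom row.
board = [[0] * COLS for _ in range(ROWS)]
board[5][0] = board[5][1] = board[5][2] = 1
print(immediate_wins(board, 1))  # [3]
```

Calling the same function for the opponent identifies forced defensive blocks, so one helper can tag both of the high-confidence categories listed above.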

Deployment

AWS Lightsail Backend

•Dockerized TensorFlow inference service
•Model loaded at startup, stateless requests
•Returns probability distribution over 7 moves
•Deterministic, low-latency inference (<4 ms)
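A stateless inference handler of this kind can be sketched as below. This is an illustration only: the `model` argument stands in for the network loaded at startup, the response keys are hypothetical, and masking full columns before renormalizing is an assumption that the source does not confirm.

```python
import math

COLS = 7


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_move(board, model):
    """Stateless handler: the request carries the whole board, nothing is stored.

    `model` is any callable mapping a board to 7 raw scores. In a real
    service it would be the network loaded once at startup.
    """
    probs = softmax(model(board))
    # A column is playable if its top cell (row 0) is empty.
    legal = [board[0][col] == 0 for col in range(COLS)]
    masked = [p if ok else 0.0 for p, ok in zip(probs, legal)]
    total = sum(masked)
    if total == 0.0:
        raise ValueError("no legal moves: board is full")
    probs = [p / total for p in masked]
    best = max(range(COLS), key=probs.__getitem__)
    return {"probabilities": probs, "move": best}


# Usage with a stub model that prefers column 3, which happens to be full.
board = [[0] * COLS for _ in range(6)]
for row in range(6):
    board[row][3] = 1
response = predict_move(board, lambda b: [0.0, 0.0, 0.0, 5.0, 1.0, 0.0, 0.0])
print(response["move"])  # 4
```

Because the handler depends only on its arguments, identical requests always produce identical responses, which matches the deterministic behavior described above.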

Anvil Frontend

•Authenticated UI (email/password, no auto signup)
•Model selector (CNN vs Hybrid Transformer)
•Real-time human vs bot gameplay
•Tab navigation: Play Game | Training Description

Anvil UI

The frontend was designed to resemble a polished consumer product: gradient background, rounded board container, elevated shadows, distinct yellow/red piece styling, animated feedback messages.

Figure: Game interface with model selector and board

Figure: Interactive gameplay view

Model Performance Visualization

Figure: High-confidence board states where models excel

Figure: Challenging scenarios (multi-move traps, endgames)

Engineering Tradeoffs

Dimension	CNN	Hybrid Transformer
Accuracy	Strong	Best
Latency	Excellent	Moderate
Training Cost	Low	High
Implementation Complexity	Moderate	High

Both models were deployed to allow direct comparison in the live Anvil app.

Key Takeaways

• Inductive bias matters more than model novelty; small spatial grids favor CNNs
• Transformers benefit from hybridization with spatial feature extractors
• Supervised imitation from MCTS labels has inherent planning limits; neural-guided search (AlphaZero-style) is the natural next step
• Production deployment adds non-trivial engineering overhead: containerization, cloud hosting, authenticated UI
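To make the "neural-guided search" takeaway concrete, the sketch below shows the PUCT selection rule commonly used in AlphaZero-style MCTS, where the network's move probabilities act as priors that steer exploration. It is not part of this project; the exploration constant `c_puct=1.5` and the dictionary layout are arbitrary choices for the example.

```python
import math


def puct_score(q_value, prior, parent_visits, child_visits, c_puct=1.5):
    """Mean value of a move plus an exploration bonus scaled by the network prior."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q_value + exploration


def select_child(children, parent_visits):
    """Pick the move with the highest PUCT score.

    `children` maps move -> {"q": mean value, "prior": network probability,
    "visits": visit count}.
    """
    return max(
        children,
        key=lambda move: puct_score(
            children[move]["q"],
            children[move]["prior"],
            parent_visits,
            children[move]["visits"],
        ),
    )


# Column 2 has been explored; column 3 is unvisited but the network likes it.
children = {
    2: {"q": 0.1, "prior": 0.2, "visits": 10},
    3: {"q": 0.0, "prior": 0.7, "visits": 0},
}
print(select_child(children, parent_visits=10))  # 3
```

In such a setup the models trained here would supply the priors, and the search would supply the lookahead that a single forward pass lacks.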

Future Improvements

• Neural-guided MCTS (AlphaZero-style)
• Reinforcement learning fine-tuning
• Larger dataset (5k–10k games)
• Lightweight inference-optimized CNN
• Specialized endgame classifier

Outcome

The final system achieves 81.7% Top-1 move prediction accuracy with the hybrid model, handles immediate wins and forced blocks reliably, runs inference in under 4 ms, supports authenticated interactive gameplay, and is fully containerized and cloud-hosted. The project spans search supervision, architecture experimentation, regularization tuning, Docker containerization, cloud deployment, and frontend integration, with measurable outcomes at each stage.