
Smart Doc Approver

An agentic, ensemble-based machine learning system for automated receipt classification, field extraction, anomaly detection, and approval routing.

Course: MIS 382N (Advanced Machine Learning), UT Austin
Stack: Python, PyTorch, Hugging Face, LangGraph, OCR Tooling, Gradio

Technical Overview

Smart Doc Approver is designed as a modular, agentic pipeline that mimics how a human analyst reviews receipts. Instead of relying on a single model, the system orchestrates multiple specialized models and routes documents based on confidence and validation checks.

  • Document classification using an ensemble of vision models (ResNet + ViT)
  • OCR ensemble with confidence-based retries for difficult receipts
  • Field extraction via LayoutLM-based token classification, rules, and NER
  • Anomaly detection using an ensemble of unsupervised and tree-based models
  • Agentic orchestration using LangGraph to route approve / review / reject decisions
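The classification ensemble in the first bullet can be illustrated with soft voting: each model emits per-class probabilities, and the ensemble averages them and reports the winning class together with its averaged probability as a confidence score. This is a minimal sketch, not the project's actual code. The function name `soft_vote`, the class labels, the equal default weights, and the choice of soft voting itself are assumptions made for illustration; the project may combine ResNet and ViT outputs differently.

```python
from typing import Dict, Optional, Sequence, Tuple


def soft_vote(
    prob_sets: Sequence[Dict[str, float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[str, float]:
    """Average per-class probabilities from several models.

    Returns (label, confidence), where confidence is the weighted mean
    probability of the winning label. Hypothetical helper for illustration.
    """
    if not prob_sets:
        raise ValueError("need at least one model output")
    if weights is None:
        weights = [1.0] * len(prob_sets)  # assumption: equal weighting
    if len(weights) != len(prob_sets):
        raise ValueError("one weight per model output is required")

    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Sort labels so that ties are broken deterministically.
    labels = sorted(set().union(*prob_sets))
    averaged = {
        label: sum(w * p.get(label, 0.0) for w, p in zip(weights, prob_sets)) / total
        for label in labels
    }
    best = max(labels, key=lambda label: averaged[label])
    return best, averaged[best]


# Made-up probabilities standing in for ResNet and ViT softmax outputs.
resnet_probs = {"receipt": 0.9, "invoice": 0.1}
vit_probs = {"receipt": 0.6, "invoice": 0.4}
label, confidence = soft_vote([resnet_probs, vit_probs])
```

The returned confidence is what a downstream routing step would compare against a threshold, so a disagreement between the two models lowers it and makes review more likely.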

Why an Agentic Approach

Traditional document pipelines send every document down the same fixed path, so they tend to fail on edge cases and cannot adapt to them. This system keeps a shared state, applies confidence thresholds, and routes conditionally, so that an uncertain document triggers additional processing or human review instead of a hard failure.
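The routing idea can be sketched as a plain function over a shared state dictionary, which is the general shape of a conditional-edge function in a graph orchestrator such as LangGraph. The sketch deliberately avoids importing LangGraph so that it stays self-contained. Every name here (`ReceiptState`, `route`, the state keys) and every threshold value is a hypothetical placeholder, not taken from the project.

```python
from typing import TypedDict


class ReceiptState(TypedDict):
    """Shared state passed between pipeline steps (hypothetical fields)."""
    doc_confidence: float   # confidence of the document classifier ensemble
    ocr_confidence: float   # confidence reported by the OCR step
    ocr_attempts: int       # how many OCR passes have run so far
    anomaly_score: float    # 0.0 (normal) to 1.0 (highly anomalous)
    fields_valid: bool      # whether extracted fields passed validation


# Assumed thresholds, chosen only for illustration.
OCR_MIN_CONFIDENCE = 0.80
MAX_OCR_ATTEMPTS = 2
DOC_MIN_CONFIDENCE = 0.70
ANOMALY_REJECT = 0.90
ANOMALY_REVIEW = 0.50


def route(state: ReceiptState) -> str:
    """Return the next step: 'retry_ocr', 'review', 'reject', or 'approve'."""
    # Low OCR confidence triggers more processing before any decision.
    if state["ocr_confidence"] < OCR_MIN_CONFIDENCE:
        if state["ocr_attempts"] < MAX_OCR_ATTEMPTS:
            return "retry_ocr"
        return "review"  # retries exhausted: hand off to a human

    # Strong anomaly signal: reject outright.
    if state["anomaly_score"] >= ANOMALY_REJECT:
        return "reject"

    # Any remaining uncertainty goes to human review rather than failing.
    if (
        state["doc_confidence"] < DOC_MIN_CONFIDENCE
        or not state["fields_valid"]
        or state["anomaly_score"] >= ANOMALY_REVIEW
    ):
        return "review"

    return "approve"


clean: ReceiptState = {
    "doc_confidence": 0.95,
    "ocr_confidence": 0.92,
    "ocr_attempts": 1,
    "anomaly_score": 0.10,
    "fields_valid": True,
}
decision = route(clean)
```

Because the function only reads the shared state and returns a label, the retry step can update `ocr_confidence` and `ocr_attempts` and call the same router again, which is how an uncertain document ends up in review once the retries are exhausted.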

Outcomes

  • Improved end-to-end accuracy through ensemble modeling
  • Reduced unnecessary compute via confidence-based routing
  • Human-in-the-loop feedback designed for continual improvement
  • Fully interactive demo deployed via Hugging Face Spaces