Architecture
Faultline is an ML training continuity and recovery platform. Training scripts send metrics and checkpoints to the Cloud API over HTTPS, authenticated with an API key. PostgreSQL stores metadata, object storage (MinIO or S3) holds checkpoint blobs, and the Next.js dashboard proxies its API requests through your login session.
[Trainer / HF / Lightning / SDK]
| HTTPS + API key
Cloud API (FastAPI)
|
PostgreSQL + MinIO/S3
|
Dashboard (Next.js BFF + JWT)
Built for long-running ML jobs on laptops, HPC clusters, and cloud GPUs. The local Rust runtime remains available for offline checkpoint experiments.
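To make the first hop in the diagram concrete (trainer to Cloud API over HTTPS with an API key), here is a minimal sketch of how a training script might build such a request using only the Python standard library. The base URL, the route, the `X-API-Key` header name, and the JSON payload shape are all assumptions made for illustration; this page does not specify them, and the actual SDK and integrations may work differently.

```python
import json
import urllib.request

# Hypothetical values: the real base URL and route are not given in this document.
API_BASE = "https://faultline.example.com"
METRICS_PATH = "/v1/runs/{run_id}/metrics"


def build_metrics_request(
    run_id: str, step: int, metrics: dict, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) an HTTPS POST carrying one step of metrics."""
    body = json.dumps({"step": step, "metrics": metrics}).encode("utf-8")
    return urllib.request.Request(
        url=API_BASE + METRICS_PATH.format(run_id=run_id),
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,  # assumed header name, not confirmed by the docs
        },
    )


# Build a request for step 100 of a hypothetical run; nothing is sent here.
req = build_metrics_request("run-123", 100, {"loss": 0.42}, api_key="test-key")
```

Passing `req` to `urllib.request.urlopen` would send it. Building the request separately from sending it keeps the sketch testable without network access.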
Full details in the repository: docs/ARCHITECTURE.md