From training to recovery in four steps

This is a simulated run — no signup required. Click each step below (or let it auto-play) to see how Faultline tracks metrics, stores checkpoints, handles crashes, and tells you exactly how to resume.

Demo mode: sample data only. Sign up to connect real training runs.

Metrics stream while the job runs

Your trainer reports loss and step to Faultline in real time. You can watch progress without SSHing into the node.
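As a rough illustration of what "reporting loss and step" could look like from inside a training loop, here is a minimal sketch. It is not Faultline's actual client: the `MetricReporter` class, the field names (`run`, `step`, `loss`, `ts`), and the batching behaviour are all assumptions made for the example, and the `send` callable stands in for whatever transport a real integration would use.

```python
import json
import time
from typing import Callable, Dict, List


def make_metric_event(run: str, step: int, loss: float) -> Dict[str, object]:
    """Build one metric record. Field names are illustrative assumptions."""
    return {"run": run, "step": step, "loss": loss, "ts": time.time()}


class MetricReporter:
    """Buffers metric events and hands them to a caller-supplied send function.

    Hypothetical helper: a real setup would pass a function that ships the
    JSON string to the tracking service instead of printing it.
    """

    def __init__(self, run: str, send: Callable[[str], None], flush_every: int = 10):
        self.run = run
        self.send = send
        self.flush_every = flush_every
        self.buffer: List[Dict[str, object]] = []

    def report(self, step: int, loss: float) -> None:
        """Record one (step, loss) pair and flush once the buffer is full."""
        self.buffer.append(make_metric_event(self.run, step, loss))
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self) -> None:
        """Send all buffered events as one JSON array, then clear the buffer."""
        if self.buffer:
            self.send(json.dumps(self.buffer))
            self.buffer = []


# Usage: report the step and loss shown in the simulated run, printing the payload.
reporter = MetricReporter("resnet-finetune-live", send=print, flush_every=1)
reporter.report(step=180, loss=0.420)
```

Passing the transport in as a function keeps the training loop independent of how metrics are delivered, so the same call works against a print statement, a file, or a network client.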

Simulated run: resnet-finetune-live (faultline-demo), running

Latest step: 180
Checkpoint: 160
Loss: 0.420

Live metrics

Loss curve for this run — updates as training progresses in the real product.

Checkpoints

Committed blobs you can restore from after a failure.

  • Step 160, 45.78 MB, committed
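To make the idea of restoring from a committed checkpoint concrete, the sketch below picks the newest committed checkpoint at or before the last reported step and computes how many steps would need to be re-run. This is an assumption about how resume logic might work, not a description of Faultline's internals: the `Checkpoint` type, both function names, and the `"uploading"` status are hypothetical, and only the `"committed"` status appears on this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Checkpoint:
    step: int
    size_mb: float
    status: str  # "committed" appears on the page; other values are assumed


def pick_resume_checkpoint(
    checkpoints: List[Checkpoint], last_step: int
) -> Optional[Checkpoint]:
    """Return the newest committed checkpoint at or before last_step, if any."""
    candidates = [
        c for c in checkpoints
        if c.status == "committed" and c.step <= last_step
    ]
    return max(candidates, key=lambda c: c.step, default=None)


def steps_to_replay(checkpoint: Optional[Checkpoint], last_step: int) -> int:
    """Number of steps lost between the checkpoint and the last reported step."""
    start = checkpoint.step if checkpoint is not None else 0
    return last_step - start


# Usage with the values from the simulated run: checkpoint at 160, latest step 180.
checkpoints = [Checkpoint(step=160, size_mb=45.78, status="committed")]
resume_from = pick_resume_checkpoint(checkpoints, last_step=180)
print(resume_from.step, steps_to_replay(resume_from, last_step=180))
```

With the numbers shown in the demo, a crash right after step 180 would mean resuming from step 160 and repeating 20 steps.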

Want the full dashboard with three sample runs? Browse the demo project or sign up to connect your own jobs.