Demo mode: sample data only.
failed · Lightning · recoverable
slurm-protein-exp7
Step 400 · loss 0.1800 · 7/3/2026, 12:17:06 AM
Recovery
resume_from_checkpoint
python -m faultline.cli resume demo-run-failed
Metrics
Checkpoints
Latest checkpoint: step 400 · 10.0m ago · committed
100
200
300
400
Events
- info · faultline.run.resume_completed · 7/3/2026, 12:18:06 AM
recovery succeeded from checkpoint step 400
Recovery successful
- error · faultline.run.failed · 7/3/2026, 12:17:06 AM
Slurm node eviction (demo)
- info · faultline.checkpoint.saved · 7/3/2026, 12:16:06 AM
checkpoint step 400
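
The events above (checkpoint saved, run failed, resume completed) suggest that recovery restarts from the newest committed checkpoint at or before the failure step. The sketch below illustrates that selection only. It is not faultline's actual implementation: the `Checkpoint` class, the `latest_committed` helper, and the assumption that steps 100 to 300 are also committed are all hypothetical (the page only marks step 400 as committed).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical record for one saved checkpoint."""
    step: int
    committed: bool  # True once the checkpoint is fully written


def latest_committed(
    checkpoints: Iterable[Checkpoint], failed_at_step: int
) -> Optional[Checkpoint]:
    """Return the newest committed checkpoint at or before the failure step.

    Returns None when no usable checkpoint exists.
    """
    candidates = [
        c for c in checkpoints if c.committed and c.step <= failed_at_step
    ]
    return max(candidates, key=lambda c: c.step, default=None)


# Steps taken from the Checkpoints list; commit status of 100-300 is assumed.
checkpoints = [
    Checkpoint(100, True),
    Checkpoint(200, True),
    Checkpoint(300, True),
    Checkpoint(400, True),
]

resume_point = latest_committed(checkpoints, failed_at_step=400)
if resume_point is not None:
    print(f"resume from checkpoint step {resume_point.step}")
```

Filtering on `committed` means a checkpoint that was still being written when the node was evicted is skipped in favor of the previous complete one.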