Troubleshooting Guide

Use this page to monitor in-flight runs, restart cleanly after interruptions, and resolve the most common configuration or data issues. For a full catalogue of generated files, see the Output Data Description.

Monitor & Restart

| Artefact / Setting | Applies to | How to use |
| --- | --- | --- |
| Stdout / stderr logs | All runs | Tail logs in real time or redirect to run.log to capture warnings and configuration echoes. |
| epoch_<N>_PPmodel.pth | Gradient runs | Copy the desired epoch checkpoint to <inputs>/init_PPmodel.pth (and init_AdamState.pth if preserving optimizer state) to resume from that epoch. |
| best_pot.*, best_plotPP.* | Monte Carlo | Promote the chosen file to init_PPmodel.pth and change the parameters to rerun for the next stage. |
| separateKptGrad, checkpoint | Memory relief | Enable separateKptGrad = 1 to process \(\mathbf{k}\)-points sequentially; add checkpoint = 1 if memory pressure persists (expect slower runtimes). This changes only how the band loss is accumulated; defPot and coupling targets remain system-level terms in gradient runs. |
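
The restart step for gradient runs can be scripted. The following is a minimal sketch, not part of the package: the helper name promote_checkpoint and the directory arguments are hypothetical, and the optimizer-state source file has to be passed in explicitly because this page does not say what that file is called on disk.

```python
import shutil
from pathlib import Path


def promote_checkpoint(results_dir, inputs_dir, epoch, optimizer_state=None):
    """Copy epoch_<N>_PPmodel.pth to <inputs>/init_PPmodel.pth.

    results_dir, inputs_dir and optimizer_state are hypothetical arguments;
    point them at wherever your run wrote its files.
    Returns the list of files that were written.
    """
    results_dir, inputs_dir = Path(results_dir), Path(inputs_dir)
    src = results_dir / f"epoch_{epoch}_PPmodel.pth"
    if not src.is_file():
        raise FileNotFoundError(f"No checkpoint found at {src}")

    inputs_dir.mkdir(parents=True, exist_ok=True)
    written = [Path(shutil.copy2(src, inputs_dir / "init_PPmodel.pth"))]

    # Optional: carry the optimizer state along so the restart keeps its
    # Adam moments. The source file name is supplied by the caller.
    if optimizer_state is not None:
        opt_src = Path(optimizer_state)
        if not opt_src.is_file():
            raise FileNotFoundError(f"No optimizer state found at {opt_src}")
        written.append(Path(shutil.copy2(opt_src, inputs_dir / "init_AdamState.pth")))
    return written
```

For example, promote_checkpoint("results", "inputs", 120) would stage epoch 120 as the starting point of the next run. Copying rather than moving keeps the original checkpoint available if a different epoch turns out to be a better restart point.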

Quick Fixes

| Symptom | Likely cause | Recommended fix |
| --- | --- | --- |
| Parser errors about missing keys | Required entries absent in NN_config.par | Ensure PPmodel, hiddenLayers, nSystem, and any mode-specific knobs (e.g., max_num_epochs, mc_iter) are present. |
| Band-count mismatch | nBands disagrees with bandWeights_X.par or expBandStruct_X.par | Regenerate inputs so len(bandWeights) = nBands = columns(expBandStruct) - 1. |
| Divergent or unstable loss | Step size too large or band weights skewed | Reduce optimizer_lr, confirm scheduler_gamma < 1, and inspect bandWeights_X.par for extreme values. |
| Memory exhaustion | Large maxKE, too many cores, no gradient splitting | Set separateKptGrad = 1, consider checkpoint = 1, reduce maxKE, or run with num_cores = 0. |
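
The first two rows can be checked before a run starts. The sketch below is illustrative only and rests on assumptions about file layout that this page does not confirm: that NN_config.par holds "key = value" lines with "#" starting a comment, and that the bandWeights and expBandStruct files are whitespace-separated numeric text. All function names are hypothetical.

```python
from pathlib import Path


def read_par_keys(config_path):
    """Collect keys from a 'key = value' file (ASSUMED format)."""
    keys = set()
    for line in Path(config_path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # ASSUMPTION: '#' starts a comment
        if "=" in line:
            keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_keys(config_path, required=("PPmodel", "hiddenLayers", "nSystem")):
    """Return the required keys that are absent, sorted by name."""
    return sorted(set(required) - read_par_keys(config_path))


def read_rows(path):
    """Parse whitespace-separated numeric rows, skipping blank lines."""
    rows = []
    for line in Path(path).read_text().splitlines():
        if line.strip():
            rows.append([float(tok) for tok in line.split()])
    return rows


def band_count_problems(n_bands, band_weights_path, exp_band_struct_path):
    """Check len(bandWeights) = nBands = columns(expBandStruct) - 1."""
    problems = []

    # Count every weight, whether the file stores one per line or one row.
    n_weights = sum(len(row) for row in read_rows(band_weights_path))
    if n_weights != n_bands:
        problems.append(f"bandWeights has {n_weights} entries, expected {n_bands}")

    # Each expBandStruct row carries one extra leading column.
    for i, row in enumerate(read_rows(exp_band_struct_path), start=1):
        if len(row) - 1 != n_bands:
            problems.append(
                f"expBandStruct row {i} has {len(row) - 1} band columns, expected {n_bands}"
            )
            break  # one report is enough to flag the file
    return problems
```

An empty list from both missing_keys and band_count_problems means the inputs pass these two checks; it does not validate mode-specific keys such as max_num_epochs or mc_iter unless they are added to required.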

Utilities & Self-Checks

  • inflate_kpoints.py densifies \(\mathbf{k}\)-point paths for debugging convergence.
  • plot_BS_from_file.py and plot_SOC_NL_T_Vloc.py visualize existing results without rerunning training.
  • utils/pp_func.py exposes Fourier-transform helpers for notebooks.