
Global Configuration Settings (NN_config.par)

NN_config.par contains all the global runtime and calculation settings.

The parser accepts key = value pairs, ignores blank lines, and strips anything after #. Booleans are written as 0/1, and integer lists such as hiddenLayers are space-separated.
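The parsing rules above can be sketched in a few lines. This is an illustrative reimplementation, not the package's actual parser; the real one lives in the codebase and may differ in details such as type casting:

```python
# Minimal sketch of the NN_config.par parsing rules described above:
# key = value pairs, blank lines ignored, "#" starts a comment.
def parse_config(text):
    config = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue                          # skip blank/comment-only lines
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()   # values stay strings; callers
    return config                             # cast to int/float/list as needed

cfg = parse_config("""
PPmodel = Net_relu_xavier   # model class
hiddenLayers = 32 32
SObool = 0
""")
# cfg["hiddenLayers"].split() -> ["32", "32"]
```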

Minimum Required Keys

| Key | Type | Purpose / Options |
| --- | --- | --- |
| `PPmodel` | string | Class name from `utils/nn_models.py`. Common choices: `Net_relu_xavier`, `Net_celu_HeInit_decayGaussian`, `Net_relu_xavier_BN_dropout_decayGaussian`, `Net_sigmoid_xavier_decayGaussian`, `Net_celu_RandInit_decayGaussian`, `ZeroFunction`. Each suffix encodes the activation (`relu`/`celu`/`sigmoid`), initialization (`xavier`/`HeInit`/`RandInit`), normalization (`BN`), dropout, and gating (`decay`, `decayGaussian`). |
| `hiddenLayers` | list of int | Space-separated layer widths, e.g., `32 32`. The code automatically prepends a 1D input layer (\(\|\mathbf{G}\|\)) and appends an output layer whose size equals the number of unique atoms inferred from `system_X.par`. |
| `nSystem` | int | Number of `system_X` datasets to ingest. Must match the number of indexed input bundles (`system_0.par`, `system_1.par`, ...). |
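A minimal `NN_config.par` containing only the required keys might look like this (the values are illustrative, not recommendations):

```
# Minimal NN_config.par -- illustrative values
PPmodel = Net_relu_xavier
hiddenLayers = 32 32    # two hidden layers, 32 nodes each
nSystem = 1             # expects system_0.par
```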

Execution Controls & Diagnostic Flags

| Key | Type | Effect / Options |
| --- | --- | --- |
| `num_cores` | int | `0` disables multiprocessing (default). A positive value spawns that many worker processes for Hamiltonian assembly and caches SO/NL matrices in shared memory. The value is clipped to the available CPU count. |
| `SHOWPLOTS` | bool (0/1) | `1` opens interactive Matplotlib windows for band/potential plots. Set `0` for cluster or batch runs to avoid GUI errors. |
| `separateKptGrad` | bool (0/1) | When `1`, evaluates the band-structure loss separately at each \(\mathbf{k}\)-point and accumulates gradients manually. Recommended for memory-heavy gradient runs: it slightly increases runtime but drastically shrinks peak memory usage. Auxiliary observables such as deformation potentials and coupling targets are still evaluated as global system-level loss terms, so they remain consistent with the non-separated workflow. |
| `checkpoint` | bool (0/1) | Enables PyTorch gradient checkpointing. Helpful when the memory limit is a major constraint; not generally recommended otherwise. If both `checkpoint` and `separateKptGrad` are `1`, a warning is emitted because the run becomes very slow. |
| `memory_flag` | bool (0/1) | Generates memory-usage reports for debugging (run under `mprof` to produce the report and plots). |
| `runtime_flag` | bool (0/1) | Enables wall-clock timers and debugging print statements on stdout. |
| `printGrad` | bool (0/1) | Prints per-layer gradient norms each epoch. Helps diagnose vanishing/exploding gradients or confirm that all atoms receive updates. |
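The `num_cores` clipping described above amounts to the following; the function name is illustrative, not the package's actual helper:

```python
import os

def effective_workers(num_cores):
    """Clip the requested worker count to the available CPUs (0 -> serial)."""
    if num_cores <= 0:
        return 0                              # multiprocessing disabled
    return min(num_cores, os.cpu_count() or 1)
```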

Spin-Orbit

See Long-range (LR), spin-orbit coupling (SOC), nonlocal (NL), and strain terms for additional context on how these channels enter the Hamiltonian.

| Key | Type | Effect / Options |
| --- | --- | --- |
| `SObool` | bool (0/1) | Toggles the spin-orbit coupling matrices. Enable only when the bundle includes SO information in `init_<atom>Params.par` and the training data expects SOC splittings, since constructing the SOC Hamiltonian drastically increases runtime and memory. |
| `cacheSO` | bool (0/1) | Valid only when `SObool = 1`. Caches the SO/NL matrices in shared memory for faster repeated access, at the cost of extra memory; `0` recomputes the matrices each epoch (slower but safer on limited RAM). Strongly recommended when training with SO/NL parameters. |

Model Architecture Details

For a broader discussion of the available NN architectures and analytic forms, see Pseudopotential Model Selection.

| Key | Type | Effect / Options |
| --- | --- | --- |
| `PPmodel_decay_rate`, `PPmodel_decay_center` | float | Used only by `*_decay` variants. `PPmodel_decay_rate` sets the steepness of the logistic tail; `PPmodel_decay_center` is the \(\|\mathbf{G}\|\) value (in Bohr\(^{-1}\)) where the decay begins. Increase `PPmodel_decay_rate` for sharper cutoffs. |
| `PPmodel_gaussian_std` | float | Used by `*_decayGaussian` variants. Controls the width of the Gaussian envelope in reciprocal space. |
| `PPmodel_scale` | list of float | Exclusive to `Net_celu_HeInit_scale_decayGaussian`. Provide one scaling factor per atom species to bias certain channels (e.g., heavier atoms) during initialization. |
| `penalize_starting` | float | Lower bound (in Bohr\(^{-1}\)) of the reciprocal-space range where the penalty is applied. |
| `penalize_lambda` | float | Strength of the penalty that discourages deviations from the initialization curve beyond `penalize_starting`. Increase to keep the NN closer to the analytic prior. |
| `penalize_mag_threshold` | float | Global threshold applied independently to each atom-channel value of the reciprocal-space pseudopotential. The magnitude penalty acts on the excess max(\|v(G)\| - `penalize_mag_threshold`, 0) over the full sampled range \([0, 12]\) Bohr\(^{-1}\). |
| `penalize_mag_lambda` | float | Prefactor for the magnitude penalty. Set `<= 0` to disable this term entirely. |
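The gating and penalty terms above can be sketched as follows. The exact functional forms live in `utils/nn_models.py` and the training code, so the expressions here are illustrative assumptions rather than the package's definitive implementation:

```python
import numpy as np

def decay_gate(G, rate, center):
    """Logistic tail of the *_decay variants (assumed form): ~1 below
    `center`, rolling off with steepness `rate` above it."""
    return 1.0 / (1.0 + np.exp(rate * (G - center)))

def gaussian_envelope(G, std):
    """Gaussian envelope of the *_decayGaussian variants (assumed form)."""
    return np.exp(-0.5 * (G / std) ** 2)

def magnitude_penalty(v, threshold, lam):
    """Penalize the excess max(|v(G)| - threshold, 0), summed over the
    sampled reciprocal-space grid."""
    return lam * np.sum(np.maximum(np.abs(v) - threshold, 0.0))
```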

Tip: browse utils/nn_models.py to see the full catalog and the mathematical form of each PPmodel. The suffixes encode activation (relu, celu, sigmoid), initializer (xavier, HeInit, RandInit), and whether batch norm (BN), dropout, or decay gates are included.

Numerical note: extremely large penalize_mag_threshold values are not useful in practice and can trigger numerical instability in the current implementation. The code emits a warning when penalize_mag_threshold > 100.

Initialization Calculation Settings

Additional guidance on these options appears in Initialization Strategies.

These knobs control the pre-training of the neural-network pseudopotential against analytic potentials. This initialization runs only when init_Zunger_num_epochs > 0 and analytic init_<atom>Params.par vectors (or init_qSpace_pot.par) are present.

| Key | Type | Effect / Options |
| --- | --- | --- |
| `init_Zunger_num_epochs` | int | `0` (default) skips analytic pre-training. Any positive value trains against the Zunger (or tabulated) curves for that many epochs before fitting bands. |
| `init_Zunger_optimizer` | string | Optimizer for the initialization stage. Supported: `adam` (default) or `sgd`. |
| `init_Zunger_optimizer_lr` | float | Learning rate during initialization. Required when `init_Zunger_num_epochs > 0`. |
| `init_Zunger_scheduler_gamma` | float | Exponential LR decay applied every `schedulerStep` init epochs. Leave unset (or `= 1.0`) to keep the LR constant. |
| `init_Zunger_plotEvery` | int | Frequency for saving `initZunger_plotPP.*` or `initZunger_epoch_*` outputs. Use large values (e.g., `500`) to minimize I/O, or `-1` to skip. |
| `init_Zunger_printGrad` | bool (0/1) | `1` prints gradient statistics during initialization, useful for diagnosing stagnant curves. |

Order of initialization:

1. Load `init_PPmodel.pth`/`init_AdamState.pth` if present;
2. otherwise fit to `init_qSpace_pot.par`/`init_<atom>Params.par`;
3. otherwise fall back to the weight initializer baked into `PPmodel` (Xavier/He/etc.).

Gradient-Based Training Loop

Active only when max_num_epochs > 0 and mc_bool = 0. See Workflow Selection for guidance on when to use this mode versus Monte Carlo.

| Key | Type | Effect / Options |
| --- | --- | --- |
| `max_num_epochs` | int | A positive integer activates gradient training. |
| `optimizer` | string | PyTorch optimizer name. Supported: `adam` (default), `sgd`, `asgd`, `lbfgs`, `adadelta`, `adagrad`, `adamw`, `sparseadam`, `adamax`, `nadam`, `radam`, `rmsprop`. |
| `optimizer_lr` | float | Initial learning rate for the chosen optimizer. |
| `sgd_momentum`, `adam_beta1`, `adam_beta2` | float | Optional overrides for the optimizer hyperparameters. Ignored unless the optimizer supports them. |
| `scheduler_gamma` | float | Multiplicative decay factor for the exponential LR scheduler (values `< 1` decay the LR). |
| `schedulerStep` | int | Number of epochs between scheduler updates. Combine with `scheduler_gamma` to control how quickly the LR decays. |
| `plotEvery` | int | Save `epoch_<N>_PPmodel.pth`, band plots, and cost snapshots every `plotEvery` epochs. |
| `patience` | int | Early-stopping patience measured in epochs. Defaults to `max_num_epochs + 1` (effectively disables early stopping). |
| `perturbEvery` | int | If `> 0`, injects a random perturbation into the NN weights every `perturbEvery` epochs. Set `-1` or omit to disable. |
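Assuming a step-wise exponential scheduler, `scheduler_gamma` and `schedulerStep` combine into a staircase decay of the learning rate; a minimal sketch of the effective LR:

```python
def effective_lr(optimizer_lr, scheduler_gamma, scheduler_step, epoch):
    """LR after `epoch` epochs: multiplied by `scheduler_gamma`
    every `scheduler_step` epochs (step-wise exponential decay)."""
    return optimizer_lr * scheduler_gamma ** (epoch // scheduler_step)

# e.g. optimizer_lr = 0.01, scheduler_gamma = 0.5, schedulerStep = 100:
# epochs 0-99 keep 0.01, epochs 100-199 use 0.005, and so on.
```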

When switching optimizers mid-project, remove any stale init_AdamState.pth files; otherwise PyTorch will attempt to load incompatible states.

Pre-adjustment Nudges (Optional)

| Key | Type | Effect / Options |
| --- | --- | --- |
| `pre_adjust_moves` | int | Number of single-parameter "pre-adjust" iterations to run before epoch 0. Each move optimizes the weight scalar with the largest gradient magnitude. Handy when the initial loss is huge, since each move is guaranteed to decrease the loss when the step size is small. |
| `pre_adjust_stepSize` | float | Optional learning rate for the pre-adjust stage. Defaults to `optimizer_lr` if omitted. |
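A single pre-adjust move can be sketched as below. The real routine operates on the model's weight tensors; this flat-array version and its name are illustrative:

```python
import numpy as np

def pre_adjust_move(params, grads, step_size):
    """One illustrative pre-adjust move: step only the single parameter
    with the largest gradient magnitude, opposite to its gradient."""
    i = int(np.argmax(np.abs(grads)))   # pick the steepest coordinate
    params = params.copy()
    params[i] -= step_size * grads[i]   # gradient-descent step on it alone
    return params
```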

Monte Carlo Mode

See Workflow Selection for the decision chart comparing Monte Carlo refinement and gradient training.

Enable by setting mc_bool = 1 while keeping max_num_epochs = 0. The Monte Carlo kernel honors the following keys:

| Key | Type | Effect / Options |
| --- | --- | --- |
| `mc_bool` | bool (0/1) | `1` activates Monte Carlo refinement. Requires `max_num_epochs = 0`. |
| `mc_iter` | int | Number of trial moves per Monte Carlo block. After each block the code logs cost traces and (optionally) writes plots. |
| `mc_perturb_mode` | int | Selects one of the perturbation kernels defined in `utils/NN_train.py::perturb_model` (\(\varepsilon\) = `mc_percentage`):<br>**Mode 1** – multiply every NN weight by a random factor in \([1-\varepsilon, 1+\varepsilon]\); SOC/NL parameters scale by \((1 \pm \varepsilon/100)\).<br>**Mode 2** – magnitude-aware perturbations: weights with absolute value above 20 are nudged back toward zero by at most \(\varepsilon \cdot \lvert w\rvert\), near-zero weights are amplified by up to \(10\varepsilon \cdot w\), and SOC/NL parameters scale by \((1 \pm \varepsilon/1000)\).<br>**Mode 3** – normalize each tensor, randomly shift 50% of entries by \(N(0, \varepsilon)\) in normalized space, then denormalize (NN weights only).<br>**Mode 4** – add absolute steps of \(\pm \varepsilon\) to ~60% of NN weights and \(\pm \varepsilon/1000\) to SOC/NL parameters (larger NN-weight perturbations than SOC/NL).<br>**Mode 5** – add absolute steps of \(\pm \varepsilon/10\) to ~60% of NN weights and \(\pm \varepsilon\) to SOC/NL parameters (larger SOC/NL perturbations than NN weights).<br>**Mode 6** – keep NN weights fixed; perturb SOC parameters only by \(\pm \varepsilon\).<br>**Mode 7** – keep NN weights fixed; perturb NL parameters only by \(\pm \varepsilon\). |
| `mc_percentage` | float | The \(\varepsilon\) magnitude passed to the selected perturbation mode. Increase for larger exploration steps; decrease for conservative proposals. |
| `mc_beta` | float | Inverse temperature \(\beta\) in the Metropolis acceptance criterion. Larger \(\beta\) makes the search greedier; smaller \(\beta\) increases acceptance of uphill moves. |
| `mc_beta_schedule` | string | Optional path to a text file of `iteration_index beta` pairs. Overrides `mc_beta` to implement annealing/tempering schedules. |

The Monte Carlo workflow often reads companion files described in System & Data Files: mcOpts*.par (global MC settings), <atom>ParamSteps.par (per-atom step sizes), and mc_beta_schedule files.
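The role of `mc_beta` is the standard Metropolis acceptance rule; a minimal sketch (the function name is illustrative, not the package's actual kernel):

```python
import math
import random

def metropolis_accept(old_cost, new_cost, beta, rng=random.random):
    """Metropolis criterion: always accept downhill moves; accept uphill
    moves with probability exp(-beta * (new_cost - old_cost))."""
    delta = new_cost - old_cost
    if delta <= 0:
        return True
    return rng() < math.exp(-beta * delta)
```

Larger `beta` shrinks `exp(-beta * delta)`, so uphill proposals are rejected more often and the search turns greedy.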

Eigenvalue Reordering

| Key | Type | Effect / Options |
| --- | --- | --- |
| `smooth_reorder` | bool (0/1) | Enables the smooth eigenvalue-matching heuristic. Use when frequent band crossings cause large loss spikes. |
| `eigvec_reorder` | bool (0/1) | Forces eigenvector-based band sorting (more robust but slower). |
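The idea behind eigenvector-based sorting can be sketched as follows: match each band at the current \(\mathbf{k}\)-point to the previous \(\mathbf{k}\)-point's band with the largest wavefunction overlap. This greedy version is an illustrative assumption about the method, not the package's implementation (a production code would also resolve duplicate matches, e.g. via linear assignment):

```python
import numpy as np

def reorder_by_overlap(prev_vecs, vecs, energies):
    """Greedy band matching across neighboring k-points: columns of
    `prev_vecs`/`vecs` are eigenvectors; pick, for each previous band,
    the current band with the largest |<psi_prev | psi_curr>|."""
    overlap = np.abs(prev_vecs.conj().T @ vecs)  # (n_prev, n_curr) overlaps
    order = np.argmax(overlap, axis=1)           # best match per previous band
    return energies[order], vecs[:, order]
```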