Brief Intro. to Inputs and Commands#
The following files are the central input files for DeePTB. Before executing the program, please make sure these files are prepared and stored in the working directory. Here we give brief descriptions; for more details, users should consult the Advanced section.
Inputs#
Data#
The dataset files contain both the atomic structures and the training labels.
The atomic structure consists of the atomic positions, the unit-cell vectors and the atomic numbers. These must be included in your data files for every task. The training labels depend on the task: for DeePTB-SK mode, eigenvalues and k-points are needed; for DeePTB-E3 mode, the Hamiltonian/density matrix under an LCAO basis must be provided, while the overlap matrix is optional (but we suggest providing it for convenience).
The atomic structures should be prepared either in the ASE trajectory binary format or in plain text format. We highly recommend using the tool dftio for data preparation; it can automatically transform DFT output into the target format. For completeness, we introduce the format of each type below.
For the ASE trajectory binary format, each structure is stored as an Atoms object as defined in the ASE package. The provided trajectory file must have the suffix .traj, and the length of the trajectory is nframes.
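As an illustration, such a trajectory could be written with ASE as in the following sketch (the two toy carbon frames are placeholders for your own structures, assuming ASE is installed):

from ase import Atoms
from ase.io import write

# Two toy carbon frames; replace them with your own structures.
frames = [
    Atoms("C2", positions=[[0, 0, 0], [0.9, 0.9, 0.9]], cell=[3.57, 3.57, 3.57], pbc=True),
    Atoms("C2", positions=[[0, 0, 0], [0.9, 0.9, 1.0]], cell=[3.57, 3.57, 3.57], pbc=True),
]
write("xdat.traj", frames)  # a .traj trajectory with nframes = 2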
For the plain text format, three separate text files describing the atomic structures need to be provided: atomic_numbers.dat, cell.dat and positions.dat. The length unit used in cell.dat and positions.dat (if Cartesian coordinates are used) is Angstrom.
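The following sketch (not a DeePTB utility; the toy arrays stand in for your own data) shows how these text files could be written with NumPy in the expected shapes:

import numpy as np

# Toy arrays standing in for your own data:
# positions: (nframes, natoms, 3) Cartesian coordinates in Angstrom
# cells:     (nframes, 3, 3) lattice vectors in Angstrom
# numbers:   (natoms,) atomic numbers, same ordering in every frame
nframes, natoms = 2, 2
positions = np.zeros((nframes, natoms, 3))
cells = np.tile(np.eye(3) * 3.57, (nframes, 1, 1))
numbers = np.array([6, 6])

np.savetxt("positions.dat", positions.reshape(-1, 3))        # nframes*natoms rows, 3 columns
np.savetxt("cell.dat", cells.reshape(-1, 3))                 # nframes*3 rows, 3 columns
np.savetxt("atomic_numbers.dat", np.tile(numbers, nframes)[:, None], fmt="%d")  # nframes*natoms rows, 1 column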
For training a DeePTB-SK model, we need to prepare the eigenvalues label, which consists of eigenvalues.npy and kpoints.npy. A typical dataset for a DeePTB-SK task looks like:
data/
-- set.x
-- -- eigenvalues.npy    # numpy array of fixed shape [nframes, nkpoints, nbands]
-- -- kpoints.npy        # numpy array of fixed shape [nkpoints, 3]
-- -- xdat.traj          # ase trajectory file with nframes
-- -- info.json          # defining the parameters used in building AtomicData graph data
The band structure data include the k-point list and the eigenvalues in binary .npy format. The shape of the k-point data is fixed as [nkpoints, 3] and that of the eigenvalues as [nframes, nkpoints, nbands]. The nframes here must be the same as in the atomic structure files.
Important: eigenvalues.npy should not contain bands contributed by core electrons, which are not represented by the TB orbitals in the model setting.
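As a sketch, the two .npy files could be prepared as follows; the input file names and the number of core bands n_core are hypothetical placeholders to adapt to your own DFT output:

import numpy as np

# Hypothetical raw DFT results (file names are placeholders):
# kpts: shape (nkpoints, 3); eigs: shape (nframes, nkpoints, nbands_dft)
kpts = np.load("dft_kpoints.npy")
eigs = np.load("dft_eigenvalues.npy")

n_core = 4  # hypothetical number of core-electron bands to drop; adjust to your system
np.save("kpoints.npy", kpts)                     # shape [nkpoints, 3]
np.save("eigenvalues.npy", eigs[:, :, n_core:])  # shape [nframes, nkpoints, nbands], core bands removed
# Place both files under data/set.x/ next to the structure files and info.json.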
For a typical DeePTB-E3 task, we need to prepare the Hamiltonian/density matrix, along with the overlap matrix, as labels. They are stored in hdf5 binary format and named hamiltonians.h5 / density_matrices.h5 and overlaps.h5, respectively. A typical dataset for DeePTB-E3 looks like:
data/
-- set.x
-- -- positions.dat       # a text file with nframes x natoms rows and 3 columns
-- -- cell.dat            # a text file with nframes x 3 rows and 3 columns, or 3 rows and 3 columns
-- -- atomic_numbers.dat  # a text file with nframes x natoms rows and 1 column
-- -- hamiltonians.h5     # an hdf5 dataset file with groups named "0", "1", ...; each group contains a dict of {"i_j_Rx_Ry_Rz": numpy.ndarray}
-- -- overlaps.h5         # an hdf5 dataset file with groups named "0", "1", ...; each group contains a dict of {"i_j_Rx_Ry_Rz": numpy.ndarray}
-- -- info.json
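In practice dftio generates these hdf5 files for you, but the following sketch illustrates the layout described above (the block key and its shape are toy values; the exact index convention should follow your data-conversion tool):

import h5py
import numpy as np

# One group per frame, named "0", "1", ...; each entry inside a group is one block
# keyed as "i_j_Rx_Ry_Rz" (atom indices i, j and the cell shift R of atom j).
with h5py.File("hamiltonians.h5", "w") as f:
    frame = f.create_group("0")
    # toy 4x4 block between two atoms in the home cell; real blocks have shape
    # (norb_i, norb_j) set by the LCAO basis of the two atoms
    frame["0_1_0_0_0"] = np.zeros((4, 4))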
Data settings: info.json#
In DeePTB, the atomic structure and band structure data are stored in the AtomicData graph structure. info.json defines the key parameters used in building the AtomicData graph dataset, and looks like:
{
"nframes": 1,
"pos_type": "ase/cart/frac",
"pbc": [true, true, true]
}
nframes is the length of the trajectory, as defined in the previous section. pos_type sets the input format of the atomic structures: use ase if an ASE .traj file is provided, and cart or frac if Cartesian or fractional coordinates are given in the positions.dat file. pbc specifies the periodic boundary conditions of the system; the three values correspond to the three lattice vectors given in the unit-cell information of the atomic data file (e.g. [true, true, false] for a slab that is periodic only along the first two lattice vectors).
For DeePTB-SK mode, we should also specify in info.json the parameters that control the fitting of eigenvalues:
{
"nframes": 1,
"pos_type": "ase/cart/frac",
"pbc": [true, true, true],
"bandinfo": {
"band_min": 0,
"band_max": 6,
"emin": null, # optional
"emax": null # optional
}
}
bandinfo defines the fitting target. emin and emax define the fitting energy window of the bands, while band_min and band_max select which bands are targeted.
Note: the zero of energy is placed at the lowest eigenvalue of the band selected by band_min. band_min should therefore refer to the same band index for the eigenvalues from DFT and from DeePTB.
Train config: input.json#
DeePTB provides input config templates for quick setup. Users can run:
dptb config -tr [[-e3] <e3tb>] [[-sk] <sktb>] [[-skenv] <sktbenv>] PATH
The template config file will be generated at PATH. We provide several templates for the different DeePTB modes; please run dptb config -h to check them out.
In general, the input.json file contains the following parts:
common_options#
provides vital information for building a DeePTB model.
For DeePTB-SK mode, an example of common_options is:
"common_options": {
"basis": {
"C": ["2s", "2p"],
"N": ["2s", "2p", "d*"]
},
"device": "cpu",
"overlap": false,
"dtype": "float32",
"seed": 42
}
For DeePTB-E3 mode, the basis has a similar format but is given as a string instead of a list. For C and N with a DZP basis, the basis should be defined as:
"basis": {
"C": "2s2p1d",
"N": "2s2p1d"
}
train_options#
specifies the training procedure.
"train_options": {
"num_epoch": 500,
"batch_size": 1,
"optimizer": {
"lr": 0.05,
"type": "Adam"
},
"lr_scheduler": {
"type": "exp",
"gamma": 0.999
},
"loss_options":{
"train": {"method": "eigvals"}
},
"save_freq": 10,
"validation_freq": 10,
"display_freq": 10
}
The loss_options section specifies the loss function used in training. The method key selects the loss function, which can be eigvals, hamil_abs, or hamil_blas. The eigvals loss is used for the DeePTB-SK model, while hamil_abs and hamil_blas are used for the DeePTB-E3 model. The Adam optimizer is generally preferred for its convergence speed, while for the lr_scheduler we recommend "rop" (reduce-on-plateau), as:
"lr_scheduler": {
"type": "rop",
"factor": 0.8,
"patience": 50,
"min_lr": 1e-6
}
More details about rop are available at: https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.ReduceLROnPlateau.html
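For reference, these settings roughly correspond to the following PyTorch scheduler (a sketch for illustration only; DeePTB constructs the optimizer and scheduler internally from the json config):

import torch

model = torch.nn.Linear(4, 4)  # stand-in for the DeePTB model parameters
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.8, patience=50, min_lr=1e-6
)
# scheduler.step(val_loss) is then called once per epoch with the monitored loss.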
model_options#
the key settings to build DeePTB models.
For a DeePTB-SK model without environment correction, only the nnsk section is needed. An example of an nnsk model is:
"model_options": {
"nnsk": {
"onsite": {"method": "uniform"},
"hopping": {"method": "powerlaw", "rs":2.6, "w": 0.3},
"freeze": false,
"push": false
}
}
Different methods for onsite and hopping have their own specific parameter requirements; please check out the full parameter list.
For a DeePTB-SK model with environment dependency, the embedding, prediction and nnsk sections are all required, as in this example:
"model_options": {
"embedding":{
"method": "se2", "rs": 2.5, "rc": 5.0,
"radial_net": {
"neurons": [10,20,30]
}
},
"prediction":{
"method": "sktb",
"neurons": [16,16,16]
},
"nnsk": {
"onsite": {"method": "uniform"},
"hopping": {"method": "powerlaw", "rs":5.0, "w": 0.1},
"freeze": true
}
}
For a DeePTB-E3 model, only the embedding and prediction sections are required, as:
"model_options": {
"embedding": {
"method": "slem/lem", # s in slem stands for strict localization
"r_max": {
"C": 7.0,
"N": 7.0
},
"irreps_hidden": "32x0e+32x1o+16x2e+8x3o+8x4e+4x5o",
"n_layers": 3,
"n_radial_basis": 18,
"env_embed_multiplicity": 10,
"avg_num_neighbors": 51,
"latent_dim": 64,
"latent_channels": [
32
],
"tp_radial_emb": true,
"tp_radial_channels": [
32
],
"PolynomialCutoff_p": 6,
"cutoff_type": "polynomial",
"res_update": true,
"res_update_ratios": 0.5,
"res_update_ratios_learnable": false
},
"prediction":{
"method": "e3tb",
"scales_trainable":false,
"shifts_trainable":false,
"neurons": [64,64] # optional, required when overlap in common_options is True
}
}
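The irreps_hidden string follows the e3nn-style irreps notation, i.e. multiplicity x (angular momentum, parity). As a sketch (assuming the e3nn package is available), such a string can be decoded as:

from e3nn import o3

irreps = o3.Irreps("32x0e+32x1o+16x2e+8x3o+8x4e+4x5o")
print(irreps)      # 32x0e+32x1o+16x2e+8x3o+8x4e+4x5o
print(irreps.dim)  # 380: total feature dimension (32*1 + 32*3 + 16*5 + 8*7 + 8*9 + 4*11)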
data_options#
assigns the datasets used in training.
"data_options": {
"train": {
"type": "DefaultDataset", # optional, default "DefaultDataset"
"root": "./data/",
"prefix": "kpathmd100",
"get_Hamiltonian": false, # optional, default false
"get_eigenvalues": true, # optional, default false
"get_overlap": false, # optional, default false
"get_DM": false # optional, default false
},
"validation": {
"type": "DefaultDataset",
"root": "./data/",
"prefix": "kpathmd100",
"get_Hamiltonian": false,
"get_eigenvalues": true,
"get_overlap": false,
"get_DM": false
}
}
Commands#
Training#
Once the data and the input config file are prepared, we are ready to train the model:
dptb train <input config> [[-o] <output directory>] [[-i|-r] <deeptb checkpoint path>]
Here -o sets the output directory, while -i or -r supplies an existing DeePTB checkpoint, used to initialize the model from that checkpoint or to restart an interrupted training, respectively.